A Review of VoIP Communication Protocols
WhatsApp was hacked last week, and many users have yet to update the app on their mobile phones. Any time 1.5 billion users worldwide are potentially exposed to a hack, it’s worth looking into how and why. Standards-based VoIP apps use three protocols to enable calls: SIP, RTP and RTCP.
SIP (Session Initiation Protocol) is a very simple, text-based protocol that closely resembles HTTP. It is used to establish sessions between two or more parties, which matters because IP addresses can change as we move across different networks. SIP is a bit more difficult to hack, since it is typically protected by TLS (we hope), but attacks are still possible through advanced DNS hijacking (https://www.wired.com/story/iran-dns-hijacking/).
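To make the HTTP resemblance concrete, here is a minimal Python sketch that parses the start line and headers of a SIP request. The sample INVITE, its host names and addresses are hypothetical, and `parse_sip_request` is an illustrative helper, not a real library API. The parser deliberately ignores SIP details such as compact header names, repeated headers and message bodies.

```python
from dataclasses import dataclass, field


@dataclass
class SipRequest:
    method: str
    request_uri: str
    version: str
    headers: dict[str, str] = field(default_factory=dict)


def parse_sip_request(raw: str) -> SipRequest:
    """Parse the start line and headers of a SIP request (body ignored)."""
    # As in HTTP, the header section ends at the first empty line.
    head, _, _body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    # Start line: METHOD Request-URI SIP-Version
    method, request_uri, version = lines[0].split(" ", 2)
    if version != "SIP/2.0":
        raise ValueError(f"unsupported SIP version: {version!r}")
    headers: dict[str, str] = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed header line: {line!r}")
        # Header names are case-insensitive, as in HTTP. Simplification:
        # a repeated header (e.g. several Via lines) overwrites earlier ones.
        headers[name.strip().lower()] = value.strip()
    return SipRequest(method, request_uri, version, headers)


# Hypothetical INVITE from Alice to Bob. The Contact header advertises the
# address where Alice can currently be reached, which changes as she roams.
INVITE = (
    "INVITE sip:bob@example.com SIP/2.0\r\n"
    "Via: SIP/2.0/TLS 198.51.100.7:5061;branch=z9hG4bK776asdhds\r\n"
    "From: Alice <sip:alice@example.com>;tag=1928301774\r\n"
    "To: Bob <sip:bob@example.com>\r\n"
    "Call-ID: a84b4c76e66710\r\n"
    "CSeq: 314159 INVITE\r\n"
    "Contact: <sips:alice@198.51.100.7:5061>\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)

if __name__ == "__main__":
    req = parse_sip_request(INVITE)
    print(req.method, req.request_uri)  # INVITE sip:bob@example.com
    print(req.headers["contact"])       # <sips:alice@198.51.100.7:5061>
```

Apart from the `SIP/2.0` version string and the SIP-specific headers, this is essentially the loop an HTTP/1.x request parser runs.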
RTP (Real-time Transport Protocol) carries the media of a call between two or more parties: digitized audio packets and encoded video streams. The algorithms for compressing and decompressing streaming media (codecs) can be very complex, so special care must be taken when writing decompression code to prevent attacks.
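A real codec is far too large for a short example, but the same discipline applies one layer down, when parsing the RTP packet that carries the media. The Python sketch below parses the RTP fixed header defined in RFC 3550 (section 5.1) and checks every length taken from the wire against the bytes actually received before using it. `parse_rtp` and `RtpHeader` are hypothetical names, and the sketch ignores encryption (such as SRTP) entirely.

```python
import struct
from dataclasses import dataclass

RTP_FIXED_HEADER_LEN = 12


@dataclass
class RtpHeader:
    version: int
    padding: bool
    extension: bool
    marker: bool
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int
    csrcs: list[int]
    payload: bytes


def parse_rtp(packet: bytes) -> RtpHeader:
    """Parse an RTP packet (RFC 3550, section 5.1), validating every length."""
    if len(packet) < RTP_FIXED_HEADER_LEN:
        raise ValueError("packet shorter than the 12-byte RTP fixed header")
    b0, b1, seq, ts, ssrc = struct.unpack_from("!BBHII", packet, 0)
    version = b0 >> 6
    if version != 2:
        raise ValueError(f"unexpected RTP version {version}")
    padding = bool(b0 & 0x20)
    extension = bool(b0 & 0x10)
    csrc_count = b0 & 0x0F
    offset = RTP_FIXED_HEADER_LEN
    # Never trust the CSRC count: check it against the bytes actually received.
    if len(packet) < offset + 4 * csrc_count:
        raise ValueError("CSRC list runs past end of packet")
    csrcs = list(struct.unpack_from(f"!{csrc_count}I", packet, offset))
    offset += 4 * csrc_count
    if extension:
        # Extension: 16-bit profile field, then length in 32-bit words.
        if len(packet) < offset + 4:
            raise ValueError("truncated header extension")
        _profile, ext_words = struct.unpack_from("!HH", packet, offset)
        offset += 4 + 4 * ext_words
        if len(packet) < offset:
            raise ValueError("header extension runs past end of packet")
    end = len(packet)
    if padding:
        # The last octet counts the padding octets, including itself.
        pad_len = packet[-1]
        if pad_len == 0 or offset + pad_len > end:
            raise ValueError("invalid padding length")
        end -= pad_len
    return RtpHeader(version, padding, extension, bool(b1 & 0x80), b1 & 0x7F,
                     seq, ts, ssrc, csrcs, packet[offset:end])


if __name__ == "__main__":
    # Hypothetical 20 ms G.711 (payload type 0) packet: 160 samples at 8 kHz.
    pkt = struct.pack("!BBHII", 0x80, 0x80, 1, 160, 0x12345678) + bytes(160)
    hdr = parse_rtp(pkt)
    print(hdr.payload_type, hdr.sequence, len(hdr.payload))  # 0 1 160
```

Only after these checks pass would the payload be handed to the decoder, which must apply the same skepticism to every length inside the compressed data.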
So How Was WhatsApp Hacked?
The last protocol, and the one used in the WhatsApp hack (https://research.checkpoint.com/the-nso-whatsapp-vulnerability-this-is-how-it-happened/), is RTCP, the RTP Control Protocol. RTCP is a much more complex protocol. For simple end-to-end calls, it is used to optimize voice quality by tuning the packet rate and the size of RTP packets (the sample rate). For multimedia calls (e.g. audio and video), RTCP keeps media sampled at different rates synchronized, so that our voices and lips match.
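Synchronization works because each RTCP Sender Report pairs an NTP wallclock timestamp with the RTP timestamp of the same instant (RFC 3550, section 6.4.1). A receiver can therefore map an audio sample and a video frame, each stamped in its own clock rate, onto one common timeline. The Python sketch below assumes an 8,000 Hz audio clock and the 90,000 Hz clock customary for RTP video; the report values and the helper `rtp_to_wallclock` are made up for illustration.

```python
from dataclasses import dataclass

NTP_FRACTION = 2 ** 32  # NTP timestamps carry a 32-bit fraction of a second


@dataclass
class SenderReport:
    ntp_seconds: int    # NTP timestamp, integer seconds
    ntp_fraction: int   # NTP timestamp, fractional part in units of 1/2**32 s
    rtp_timestamp: int  # RTP timestamp sampled at the same instant


def rtp_to_wallclock(sr: SenderReport, rtp_ts: int, clock_rate: int) -> float:
    """Map an RTP timestamp to sender wallclock seconds using the latest SR."""
    ntp = sr.ntp_seconds + sr.ntp_fraction / NTP_FRACTION
    # RTP timestamps are 32-bit and wrap around; treat the difference as signed.
    delta = (rtp_ts - sr.rtp_timestamp) & 0xFFFFFFFF
    if delta >= 2 ** 31:
        delta -= 2 ** 32
    return ntp + delta / clock_rate


if __name__ == "__main__":
    # Hypothetical reports sent at the same wallclock instant for each stream.
    audio_sr = SenderReport(ntp_seconds=1000, ntp_fraction=0, rtp_timestamp=16000)
    video_sr = SenderReport(ntp_seconds=1000, ntp_fraction=0, rtp_timestamp=900000)
    audio_t = rtp_to_wallclock(audio_sr, 16800, clock_rate=8000)    # +800 ticks
    video_t = rtp_to_wallclock(video_sr, 909000, clock_rate=90000)  # +9000 ticks
    # Both land 100 ms after the reports, so they should be played together.
    print(f"{audio_t:.3f} {video_t:.3f}")  # 1000.100 1000.100
```

The receiver then delays whichever stream is ahead until the two wallclock times line up.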
This attack relies on a specially crafted malicious RTCP message. What is surprising is that the attack requires no human trigger, such as accepting an unknown phone call, because RTCP kicks off before the call is accepted. This is reminiscent of the Apple FaceTime bug, in which a multi-party call session started before the recipient accepted it (https://www.nytimes.com/2019/01/28/technology/personaltech/facetime-bug-iphone-hack.html). In most VoIP implementations, RTCP does not become active until a call is accepted, but evidently more implementations are adopting this early-start approach.
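The sketch below is not WhatsApp's code and does not reproduce the actual bug. It only illustrates, in Python, the general defense against a crafted RTCP message: walk a compound RTCP datagram packet by packet and reject any packet whose 16-bit length field (RFC 3550, section 6.4: the length in 32-bit words minus one, header included) claims more bytes than were received. Python slicing cannot overrun memory anyway, so the explicit check stands in for the bounds check that C or C++ code must perform itself. `iter_rtcp_packets` is a hypothetical helper, and the sketch ignores other RFC 3550 rules such as which packet types must come first.

```python
import struct
from collections.abc import Iterator

RTCP_HEADER_LEN = 4


def iter_rtcp_packets(datagram: bytes) -> Iterator[tuple[int, int, bytes]]:
    """Yield (packet_type, count, body) for each packet in a compound RTCP datagram.

    Every length field is checked against the bytes actually received,
    so a crafted packet cannot make the parser read past the buffer.
    """
    offset = 0
    while offset < len(datagram):
        if len(datagram) - offset < RTCP_HEADER_LEN:
            raise ValueError("truncated RTCP header")
        b0, packet_type, length_words = struct.unpack_from("!BBH", datagram, offset)
        if b0 >> 6 != 2:
            raise ValueError("unexpected RTCP version")
        # The length field counts 32-bit words minus one, header included.
        packet_len = (length_words + 1) * 4
        if offset + packet_len > len(datagram):
            raise ValueError(
                f"RTCP packet claims {packet_len} bytes, "
                f"only {len(datagram) - offset} remain")
        body = datagram[offset + RTCP_HEADER_LEN : offset + packet_len]
        yield packet_type, b0 & 0x1F, body
        offset += packet_len


if __name__ == "__main__":
    # Hypothetical compound packet: an empty Receiver Report (PT 201), then a BYE (PT 203).
    rr = struct.pack("!BBHI", 0x80, 201, 1, 0x1234)
    bye = struct.pack("!BBHI", 0x81, 203, 1, 0x1234)
    print([pt for pt, _count, _body in iter_rtcp_packets(rr + bye)])  # [201, 203]
    # A crafted header claiming 65,536 words inside an 8-byte datagram is rejected.
    evil = struct.pack("!BBHI", 0x80, 201, 0xFFFF, 0x1234)
    try:
        list(iter_rtcp_packets(evil))
    except ValueError as err:
        print("rejected:", err)
```

Because this check runs on every packet before any field inside it is trusted, it matters just as much when RTCP processing starts before the user accepts the call.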
What Vulnerability Does the WhatsApp Hack Take Advantage Of?
This is speculation, but my best guess is that WhatsApp wanted to provide a better user experience by letting a call begin the moment it is accepted, with no perceptible pause for end-to-end encryption setup (e.g. key agreement, X.509v3 certificate chain validation, key signing and signature verification).
The root problem is not the WhatsApp buffer overflow itself. The attacker’s payload either leverages a second vulnerability in the underlying operating system (iOS or Android) or triggers a vulnerability in a built-in app (Safari or Chrome). This means other vulnerable apps can be weaponized with the same payload and only a trivial modification. Dellfer ZeroDayGuard detects and prevents this class of attack, and sends notifications when it occurs.