Motivated by the need to use less bandwidth, some providers of Internet phone calls use a compression technique called variable bit rate (VBR) compression. The technique produces differently sized packets of data for different sounds.
In short, the encoder keeps the bit rate high for long, complex sounds such as “ow” and cuts it for simple consonants such as “c”; the sampling rate itself does not change. This variable approach saves bandwidth while maintaining sound quality.
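To make the idea concrete, here is a minimal Python sketch of a toy VBR encoder. The sound classes, the bit budgets in `FRAME_BITS`, and the helper names are hypothetical and chosen only for illustration; real codecs choose among their own internal modes. The sketch shows that each frame's payload size depends on the sound being encoded, and that the total is smaller than a constant bit rate encoding of the same frames.

```python
# Hypothetical bit budgets per frame for a toy VBR encoder.
# The classes and numbers are illustrative, not taken from any real codec.
FRAME_BITS = {
    "silence": 40,
    "fricative": 120,  # e.g. "s", "f"
    "stop": 160,       # e.g. the hard "c" in "cat"
    "vowel": 300,      # e.g. "ow"
}

def frame_bytes(bits: int) -> int:
    """Round a bit budget up to whole bytes."""
    return (bits + 7) // 8

def vbr_frame_sizes(sound_classes):
    """Payload size in bytes for each frame, chosen per sound class."""
    return [frame_bytes(FRAME_BITS[c]) for c in sound_classes]

def cbr_frame_sizes(n_frames, bits_per_frame=FRAME_BITS["vowel"]):
    """Constant bit rate: every frame gets the worst-case budget."""
    return [frame_bytes(bits_per_frame)] * n_frames

frames = ["stop", "vowel", "vowel", "silence"]
print(vbr_frame_sizes(frames))            # [20, 38, 38, 5]
print(sum(vbr_frame_sizes(frames)))       # 101
print(sum(cbr_frame_sizes(len(frames))))  # 152
```

The sequence `[20, 38, 38, 5]` is exactly the kind of size pattern that reveals which sounds were spoken.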
Although Voice over IP (VoIP) is rapidly being adopted, its security implications are not yet fully understood. Because VoIP calls may traverse untrusted networks, their packets should be encrypted to ensure confidentiality.
Now comes a report from researchers at MIT Lincoln Laboratory, Google, the University of North Carolina at Chapel Hill, and Johns Hopkins University stating that it is possible to identify phrases spoken within encrypted VoIP calls when the audio is encoded with variable bit rate codecs.
Wow! This means VoIP services such as Skype and Vonage, which use variable bit rate compression (the bit rate varies according to the sounds being spoken), could be vulnerable to such an attack: encryption hides what is inside each packet, but not how big the packet is.
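Encryption does not help here because common VoIP encryption is length-preserving. SRTP, for example, defaults to AES in counter mode, which yields ciphertext the same length as the plaintext, followed by a fixed-length authentication tag. The Python sketch below assumes exactly that shape. Since Python's standard library has no AES, it substitutes a toy SHA-256 counter-mode keystream (not vetted cryptography, for illustration only) and a truncated HMAC tag; the frame sizes and the 10-byte tag length are hypothetical.

```python
import hashlib
import hmac
import os

TAG_LEN = 10  # hypothetical fixed authentication-tag length in bytes

def toy_keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Counter-mode keystream built from SHA-256. Toy construction, illustration only."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def encrypt_packet(enc_key: bytes, mac_key: bytes, seq: int, payload: bytes) -> bytes:
    """XOR the payload with the keystream, then append a truncated HMAC tag."""
    nonce = seq.to_bytes(8, "big")
    ks = toy_keystream(enc_key, nonce, len(payload))
    ciphertext = bytes(p ^ k for p, k in zip(payload, ks))
    tag = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()[:TAG_LEN]
    return ciphertext + tag

enc_key, mac_key = os.urandom(32), os.urandom(32)
frame_sizes = [20, 38, 38, 5]  # hypothetical VBR payload sizes in bytes
packets = [encrypt_packet(enc_key, mac_key, seq, os.urandom(n))
           for seq, n in enumerate(frame_sizes)]

# What an eavesdropper on the network can still see:
observed = [len(p) for p in packets]
print(observed)  # [30, 48, 48, 15]
```

Whatever the keys, each packet is its frame size plus a constant, so an eavesdropper recovers the full size sequence without breaking the cipher.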
How they did it
The researchers trained a hidden Markov model using only knowledge of the phonetic pronunciations of words, such as those provided by a dictionary, and searched packet sequences for instances of specified phrases. The approach did not require examples of the speaker’s voice, or even example recordings of the words that make up the target phrase.
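Here is a heavily simplified Python sketch of the scoring step. It assumes the attacker already has a table of how likely each packet size is for each phoneme (the hypothetical `EMIT` table below; the researchers instead derived their model from dictionary pronunciations, and their actual model is more elaborate). Each phoneme of the target phrase becomes one state in a left-to-right hidden Markov model, the forward algorithm computes the likelihood of an observed window of packet sizes, and that likelihood is compared against a background model in which sizes are unrelated to the phrase. A positive score means the window looks more like the phrase than like noise.

```python
import math

# Hypothetical emission table: P(packet-size bin in bytes | phoneme).
# Illustrative numbers only; not the researchers' model.
EMIT = {
    "sil": {5: 0.90, 20: 0.08, 38: 0.02},
    "k":   {5: 0.10, 20: 0.80, 38: 0.10},
    "ow":  {5: 0.02, 20: 0.18, 38: 0.80},
}
# Background model: every size bin equally likely, unrelated to the phrase.
BACKGROUND = {5: 1 / 3, 20: 1 / 3, 38: 1 / 3}

def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    if m == -math.inf:
        return m
    return m + math.log(sum(math.exp(x - m) for x in xs))

def phrase_log_likelihood(phones, sizes, p_stay=0.5):
    """Forward algorithm for a left-to-right HMM with one state per phoneme.

    At each packet the model either stays in the current phoneme (p_stay)
    or advances to the next one. The window must start in the first
    phoneme and end in the last.
    """
    log_stay, log_move = math.log(p_stay), math.log(1 - p_stay)
    # alpha[j] = log P(packets seen so far, current state = phoneme j)
    alpha = [-math.inf] * len(phones)
    alpha[0] = math.log(EMIT[phones[0]][sizes[0]])
    for size in sizes[1:]:
        new_alpha = []
        for j, phone in enumerate(phones):
            incoming = [alpha[j] + log_stay]
            if j > 0:
                incoming.append(alpha[j - 1] + log_move)
            new_alpha.append(logsumexp(incoming) + math.log(EMIT[phone][size]))
        alpha = new_alpha
    return alpha[-1]

def background_log_likelihood(sizes):
    """Log-likelihood of the window if sizes are independent of the phrase."""
    return sum(math.log(BACKGROUND[s]) for s in sizes)

def phrase_score(phones, sizes):
    """Log-likelihood ratio: > 0 means the window fits the phrase better than background."""
    return phrase_log_likelihood(phones, sizes) - background_log_likelihood(sizes)

target = ["k", "ow"]  # hypothetical phoneme spelling of a target word
print(phrase_score(target, [20, 20, 38, 38]) > 0)  # True
print(phrase_score(target, [5, 5, 5, 5]) > 0)      # False
```

A full search would slide a scorer like this over every window of a call's packet stream and flag windows whose score exceeds a tuned threshold.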
They evaluated their techniques on a standard speech recognition corpus containing over 2,000 phonetically rich phrases spoken by 630 distinct speakers from across the continental United States. The results indicated that they could identify phrases within encrypted calls with an average accuracy of 50%, and with accuracy greater than 90% for some phrases.
Clearly, such an attack calls into question the efficacy of current VoIP encryption standards. There is, however, a potential solution: VoIP providers could pad the data packets to an equal length. The downside is that this approach would reduce the extent of the compression and hence use more bandwidth.
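A minimal Python sketch of the padding countermeasure and its bandwidth cost follows. It borrows RTP's padding convention (RFC 3550), in which the last padding octet records how many padding octets to strip, including itself; padding would be applied before encryption. The frame sizes and the choice to pad every frame to one byte more than the largest frame are assumptions for illustration.

```python
# Hypothetical payload sizes (bytes) for four VBR frames.
frame_sizes = [20, 38, 38, 5]

def pad_frame(payload: bytes, target: int) -> bytes:
    """Pad payload to exactly `target` bytes, RTP-style: the final octet
    holds the number of padding octets, including itself."""
    count = target - len(payload)
    if not 1 <= count <= 255:
        raise ValueError("target must exceed the payload by 1 to 255 bytes")
    return payload + bytes(count - 1) + bytes([count])

def unpad_frame(padded: bytes) -> bytes:
    """Strip the padding added by pad_frame."""
    return padded[:-padded[-1]]

# Pad every frame to one byte more than the largest frame, so that
# even the largest frame carries at least the count octet.
target = max(frame_sizes) + 1
payloads = [bytes([0xAB]) * n for n in frame_sizes]
padded = [pad_frame(p, target) for p in payloads]

assert all(unpad_frame(q) == p for p, q in zip(payloads, padded))
vbr_bytes = sum(frame_sizes)                # 101 bytes
padded_bytes = sum(len(q) for q in padded)  # 4 * 39 = 156 bytes
overhead = padded_bytes / vbr_bytes - 1     # about 54% more payload data
```

Padding every frame, even the largest, also keeps RTP's padding flag identical across packets; under SRTP that flag sits in the RTP header, which is authenticated but not encrypted. The roughly 54% overhead in this toy case is the bandwidth price the article warns about.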
There is always a cost to security: you can pay now or you can pay later, but it will cost.