As voice assistants like Google Assistant and Alexa increasingly make their way into internet of things devices, it’s becoming harder to track when audio recordings are sent to the cloud and who might gain access to them. To spot transgressions, researchers at the University of Darmstadt, North Carolina State University, and the University of Paris Saclay developed LeakyPick, a platform that periodically probes microphone-equipped devices and monitors subsequent network traffic for patterns indicating audio transmission. They say it identified “dozens” of words that accidentally trigger Amazon Echo speakers.
Voice assistant usage might be on the rise — as of 2019, there were an estimated 4.25 billion assistants in use in devices around the world, according to Statista — but privacy concerns haven’t abated. Reporting has revealed that accidental activations have exposed contract workers to private conversations. The risk is such that law firms including Mishcon de Reya have advised staff to mute smart speakers when they talk about client matters at home.
LeakyPick is designed to identify hidden voice audio recordings and transmissions as well as to detect potentially compromised devices. The researchers’ prototype, which was built on a Raspberry Pi for less than $40, operates by periodically generating audible noises when a user isn’t home and monitoring traffic using a statistical approach that’s applicable to a range of voice-enabled devices.
LeakyPick — which the researchers claim is 94% accurate at detecting speech traffic — works for both devices that use a wake word and those that don’t, like security cameras and smoke alarms. In the case of the former, it’s preconfigured to prefix probes with known wake words and noises (e.g., “Alexa,” “Hey Google”), and on the network level, it looks for “bursting,” where microphone-enabled devices that don’t typically send much data cause increased network traffic. A statistical probing step serves to eliminate cases where traffic bursts result from non-audio transmissions.
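The burst-detection idea can be sketched in a few lines. The following is a minimal illustration, not the paper’s implementation: it bins a device’s outbound traffic into fixed windows and flags any window whose volume jumps well above the device’s idle baseline. The window size and threshold here are illustrative assumptions.

```python
# Hypothetical sketch of LeakyPick-style burst detection: flag windows where a
# normally quiet device's outbound traffic jumps well above its idle baseline.
# Window size and threshold are illustrative assumptions, not the paper's values.

def detect_bursts(samples, window=2, threshold=5.0):
    """samples: list of (timestamp_s, bytes_sent) per packet, sorted by time.
    Returns start times of windows whose volume exceeds threshold x baseline."""
    if not samples:
        return []
    start, end = samples[0][0], samples[-1][0]
    # Aggregate bytes into fixed-size time windows
    buckets = {}
    for t, b in samples:
        idx = int((t - start) // window)
        buckets[idx] = buckets.get(idx, 0) + b
    volumes = [buckets.get(i, 0) for i in range(int((end - start) // window) + 1)]
    baseline = sorted(volumes)[len(volumes) // 2] or 1  # median idle volume
    return [start + i * window for i, v in enumerate(volumes)
            if v > threshold * baseline]
```

A real deployment would feed this from captured packets and follow up with the statistical probing step the researchers describe, to rule out bursts caused by non-audio transmissions such as firmware updates.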
To identify words that might mistakenly trigger a voice recording, LeakyPick draws on a phoneme dictionary, selecting words with the same or a similar phoneme count as the actual wake words. (Phonemes are the perceptually distinct units of sound in a language that distinguish one word from another — for example p, b, d, and t in the English words pad, pat, bad, and bat.) It also verbalizes random words from a simple English word list.
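The word-selection step might look something like the sketch below, which is an assumption on my part rather than the authors’ code: given a CMU-style pronunciation lexicon, it keeps words whose phoneme count falls within a small tolerance of the wake word’s. The tiny inline dictionary stands in for a full lexicon.

```python
# Hypothetical sketch: pick probe-word candidates whose phoneme count is close
# to the wake word's. The inline dictionary is a stand-in for a full CMU-style
# pronunciation lexicon; entries and tolerance are illustrative.

PHONEMES = {
    "alexa":   ["AH", "L", "EH", "K", "S", "AH"],
    "alachah": ["AH", "L", "AE", "CH", "AH"],
    "lechner": ["L", "EH", "K", "N", "ER"],
    "banana":  ["B", "AH", "N", "AE", "N", "AH"],
    "cat":     ["K", "AE", "T"],
}

def candidate_probe_words(wake_word, lexicon, tolerance=1):
    """Return words whose phoneme count is within `tolerance` of the wake word's."""
    target = len(lexicon[wake_word])
    return sorted(
        w for w, ph in lexicon.items()
        if w != wake_word and abs(len(ph) - target) <= tolerance
    )
```

With a full lexicon, a filter like this yields thousands of phonetically plausible near-misses to play at the device.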
To evaluate LeakyPick’s performance, the researchers tested it with an Echo Dot, a Google Home, a HomePod, a Netatmo Welcome, a Netatmo Presence, a Nest Protect, a Hive Hub 360, and a Hive View. After creating baseline burst and statistical probing data sets, they monitored the eight devices’ live traffic and randomly tested a set of 50 words out of the 1,000 most-used words in the English language, combined with a list of known wake words of voice-activated devices. Then, they had users in three households interact with the three smart speakers — the Echo Dot, HomePod, and Google Home — over a period of 52 days.
The researchers measured LeakyPick’s accuracy by recording timestamps of when the devices began listening for commands, taking advantage of indicators like the light ring around the Echo Dot. A light sensor enabled LeakyPick to mark each time the devices were activated, while a 3-Watt speaker connected to the Pi via an amplifier generated sound and a Wi-Fi USB dongle captured network traffic.
In one experiment intended to test LeakyPick’s ability to identify unknown wake words, the researchers configured the Echo Dot to use the standard “Alexa” wake word and had LeakyPick play different audio inputs, waiting two seconds to ensure the smart speaker “heard” the input. According to the researchers, the Echo Dot “reliably” reacted to 89 words across multiple rounds of testing, some of which were phonetically very different from “Alexa,” like “alachah,” “lechner,” and “electrotelegraphic.”
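The play-wait-check cycle described above can be sketched as a simple probe loop. This is a hedged illustration: `play_audio` and `device_reacted` are hypothetical stand-in callbacks (e.g., text-to-speech output and a light-sensor or traffic check), not real APIs from the paper.

```python
# Hypothetical probe loop sketch: play each candidate word, pause, then check
# whether the speaker reacted (e.g., via a light sensor or a traffic burst).
# `play_audio` and `device_reacted` are stand-in callbacks, not real APIs.
import time

def probe_wake_words(words, play_audio, device_reacted, settle_s=2.0):
    """Return the subset of `words` that triggered the device."""
    triggers = []
    for word in words:
        play_audio(word)        # e.g., synthesize the word through the Pi's speaker
        time.sleep(settle_s)    # give the device time to "hear" and react
        if device_reacted():
            triggers.append(word)
    return triggers
```

In the researchers’ setup, the reaction check was grounded in hardware: a light sensor watching the Echo Dot’s light ring confirmed each activation independently of the network trace.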
All 89 words unexpectedly streamed audio recordings to Amazon — findings that aren’t surprising in light of another study identifying 1,000 phrases that incorrectly trigger Alexa-, Siri-, and Google Assistant-powered devices. The coauthors of that paper, which has yet to be published, told Ars Technica the devices in some cases send the audio to remote servers, where “more robust” checking mechanisms also mistake the words for wake terms.
“As smart home IoT devices increasingly adopt microphones, there is a growing need for practical privacy defenses,” the LeakyPick creators wrote. “LeakyPick represents a promising approach to mitigate a real threat to smart home privacy.”