Well, "can get useful info through unheard sound on typical hardware at a range of 65 ft" is interesting. Not shocking, and horrifically oversold, but interesting.
Imagine a world in which Google Glass and other speech-activated devices are the norm. A virus like this could potentially spread from person to person as they passed each other in the street, without anyone knowing, if it exploited a bug in the speech recognition tech. It's not interesting if it relies on the other computer already being infected, but exploits in image/sound parsing are not uncommon and could be combined with this. Another cool hack would be a physical real-world shape/pattern that exploited the image recognition software in something like Glass to take over the device.
It's an interesting idea, I think, one that will have more applications in the future than it does now, partly because more computers are becoming always on, listening and watching, but mostly because audio and video aren't infection vectors we take seriously yet in the way we do network traffic.
That's how it seems to me. But a well-hidden bit of malware won't have a problem turning on the mic for a few seconds on the hour every hour to listen for a handshake chirp or the like. It's a limitation, but far from an insurmountable one.
For the purposes of this article, it is assumed the air-gapped computer is already running the malware, having been infected by some other means (e.g. a thumb drive). The ultrasound communications provide a continuous (albeit slow) link between two infected computers.
It would be quite impressive, though, if a vulnerability in an audio driver allowed an uninfected computer to be infected simply by "hearing" the exploit sound!
That would be an infection vector, yes, but the study was purely about two computers with very specialized software trying to covertly communicate using ultrasound.
But it does mean you still have to worry about whether your air-gapped machine is infected, since it could secretly leak info even with no network cable plugged in.
The transfer rate of ~20 bytes per second is tiny, but of course that tiny amount could be the difference between two machines appearing to communicate and not appearing to. If your network traffic is confirmed to be zero, that's a state of confidence that's easy to take advantage of, and a deeply-rooted bit of malware like this could strike in extremely subtle and devious ways.
It's 20 bits per second, which is only two and a half bytes, but you're right that even that is enough.
What surprises me is how many people regard this as "novel". Not too long ago quite a few people used a "modem", a mythical device that used the phone network to transmit information. Regular phones would pick it up as sound.
Sure, there are a few technical hurdles (covert communication and stringent error correction being the most visible), but the concepts are remarkably similar.
It's surprising how many people either forget about modems or never had internet in the days of modems. I've had numerous nerdy conversations and theory crafting scenarios where someone will bring up the crazy idea of using sound to transmit data. I'm just like "Yeah, we've done that already. Remember dialup?".
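To make the dialup comparison concrete, here's a toy sketch of binary FSK, which is roughly what the slowest modems did: one audio tone for a 0 bit, another for a 1 bit. The tone frequencies, baud rate, and output file name here are illustrative choices of mine, with no framing, start/stop bits, or error correction, so this is the spirit of the thing rather than any real standard.

    import wave
    import numpy as np

    SAMPLE_RATE = 44100
    BAUD = 300                       # bits per second, dialup-era slow
    F_SPACE, F_MARK = 1070, 1270     # tone for a 0 bit, tone for a 1 bit (illustrative)

    def modulate(data: bytes) -> np.ndarray:
        """Turn bytes into a string of sine-wave bursts, one tone per bit."""
        bits = [(byte >> i) & 1 for byte in data for i in range(8)]
        t = np.arange(SAMPLE_RATE // BAUD) / SAMPLE_RATE
        return np.concatenate(
            [np.sin(2 * np.pi * (F_MARK if b else F_SPACE) * t) for b in bits])

    signal = modulate(b"hello")
    with wave.open("fsk.wav", "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)            # 16-bit samples
        w.setframerate(SAMPLE_RATE)
        w.writeframes((signal * 32767).astype(np.int16).tobytes())

The scheme in the article works in the same spirit, just shifted up near the 17-20 kHz band where most adults can't hear it.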
Also, modems didn't really seem very different from ethernet. My initial reaction as an early adolescent to first encountering ethernet was along the lines of, "oh, so it's just a wider port? what's all the fuss about?" Besides all the funny noises on connection (which, as an aside - why did the handshake have to actually be audible?), everything seemed to work the same, just more slowly. You don't have to forget about modems to be unaware that using sound to transmit computer data was once common.
Generally the handshake was audible so that you could tell if someone was already on the phone when you started connecting, or if someone answered the phone instead of a computer.
>It's surprising how many people either forget about modems or never had internet in the days of modems.
Pardon the snark, but what is surprising about that? Every person with access to the internet born since about 1997* will have no memory of using dialup. The ratio of people who used dialup relative to the people on the internet is going to continue to dwindle rapidly from here on out. Even then, I probably spent 6 or 7 years dialing up, and the connection of that to this story never happened in my brain.
* Say 2003-2004 was when broadband went mainstream; at that point, people born in 1997 would have been 6-7 years old.
Yeah, in the situations I've described I didn't really give much background information.
For example: I was having a conversation with some techie friends back when the SOPA scare was still at its peak. We were discussing purely hypothetical doomsday scenarios that would involve a "darknet" and the technical implications of creating an entirely subversive network. The idea was brought up that we could use our existing telephone infrastructure to send data, and that only a way to transfer data via sound would need to be developed. This was discussed amongst a few of the participants until I piped up and said "you know dialup has done this already, right? The information is already there."
This is a conversation amongst people who are involved with and passionate about technology and are around my age or older. These are the kinds of people I'd expect, if not to think of the specifics right away, then at least to be led to something as pervasive as dialup. True, I don't expect an 18-year-old of today to know or think about it, or even a 20-year-old, but a 25+ hacker, developer, or general tech geek? I'm just surprised at how many in that latter category just don't remember it.
Smartmodems of the 1980s didn't use sound through the air to transmit data; they modulated an electrical signal directly onto the line. They were silent (except during call setup, and even that was only made audible so you could confirm setup was proceeding in the normal way).
Beyond modems, there are also some obscure biological machines that transmit information by modulating sound waves in the air that have been around for some millions of years.
I have 6 desktops and 2 laptops in my house. None of the 6 desktops have a microphone. Testing now on the two laptops, both have a built-in microphone. I was expecting the microphone would be dependent upon the built-in webcam and at least light up the webcam light when it was "turned on" but that is apparently not the case with either one. I used the built-in Windows Sound Recorder to record ambient sound in the room I'm in just now and there was no indication that it was recording anything. Scary stuff.
It's scary just how much SCADA software (electricity, water, fuel, prisons, etc.) is absolutely riddled with vulnerabilities, protected ONLY by that air gap. Typically this internal network will be accessed by terminals sitting immediately adjacent to terminals on the more general network (so a user can quickly switch between them across the air gap).
They'd both need to be infected, it's true, but that is quite achievable with USB social engineering or if an attacker can gain physical access to any of the terminals on the network. If that were the case, an attacker could get any information out that they wanted (flight data, prison routines, defence asset refueling movements, even just enumeration of devices and vulnerabilities on the network).
The terminal probably wouldn't have a microphone, it's true (typically very old hardware that everyone is too scared to upgrade), but if it did, it would also give remote trigger access to abuse that infrastructure.
It's actually good information for security architects: if you can't get approval to apply software updates, make sure your damn microphones are turned off.
I bet that someone capable of infecting enough offline computers to form this sonic bridge most likely already has simpler and more reliable options for transferring data and commands.
OK, so on a (hypothetical) air-gapped laptop you just blu-tak the microphone port and stick a jack in the headphone output. Then for complete tin-foil control, you just open up the case and disconnect the speaker cables.
I wonder how practical this would be in reality; the frequency response of a typical built-in PC speaker and microphone is probably severely limited. On top of that, background noise would probably be a big problem, and it would get worse as the distance between the machines grows. I guess you could use error correction and handshakes to retransmit corrupted data, but that would limit the transmission speed even further.
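To put rough numbers on that last point, here's a back-of-the-envelope sketch of how framing overhead and stop-and-wait retransmission would eat into a ~20 bit/s acoustic channel. The frame size, header size, and error rate below are assumptions I picked for illustration, not measurements from the paper.

    # Raw rate is the ballpark figure from the article; everything else is assumed.
    RAW_BPS = 20            # raw channel rate, bits per second
    FRAME_BITS = 80         # payload bits per frame (assumed)
    OVERHEAD_BITS = 24      # header + checksum bits per frame (assumed)
    FRAME_ERROR_RATE = 0.2  # fraction of frames corrupted by background noise (assumed)

    bits_on_air = FRAME_BITS + OVERHEAD_BITS
    expected_sends = 1 / (1 - FRAME_ERROR_RATE)   # stop-and-wait: resend until a frame gets through
    goodput = RAW_BPS * FRAME_BITS / (bits_on_air * expected_sends)
    print(f"effective goodput: {goodput:.1f} bits per second")  # ~12.3 with these numbers

So even fairly generous assumptions cut the usable rate roughly in half; it stays a covert trickle rather than a data link.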
Most 'white noise generators' aren't producing white noise. White noise has equal power in each linear band, e.g. 100Hz to 120Hz would have the same spectral power as 19,000Hz to 19,020Hz. Humans hear in a logarithmic fashion so real white noise is actually incredibly annoying to listen to. To the ear it has a lot of high frequency content in it, it sounds like a high frequency fuzz.
Most 'white noise generators' are really outputting some sort of spectrally shaped pseudo-random noise, e.g. sound masking systems in offices closely follow the response curve of the human voice and output basically nothing under 200Hz nor over 7kHz. Even if playing pure 'pink noise', which is logarithmically flat and thus much more reasonable to listen to, there would be very little power in the band the authors of the paper are using (17kHz to 20kHz). I would also doubt that many white noise generator products are capable of producing usable output at those frequencies in the first place. Most laptop speakers probably can as a result of their size and design. 'White noise generators' should be targeting low end extension over high frequency output and I doubt many have multiple drivers per channel to accomplish both goals. Given all this, I highly doubt off the shelf 'white noise generator' products would have much effect on this communication method.
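For anyone who wants to sanity-check the spectral argument numerically, here's a rough sketch comparing how much white versus pink noise energy lands in the 17kHz to 20kHz band the paper uses. The sample rate, duration, and seed are arbitrary, and the pink noise is made by simple 1/sqrt(f) spectral shaping rather than any particular product's filter.

    import numpy as np

    fs = 48000
    n = fs * 10                      # 10 seconds of noise
    white = np.random.default_rng(0).standard_normal(n)

    # Shape white noise by 1/sqrt(f) to get ~1/f power density (pink noise).
    spectrum = np.fft.rfft(white)
    freqs = np.fft.rfftfreq(n, 1 / fs)
    scale = np.ones_like(freqs)
    scale[1:] = 1 / np.sqrt(freqs[1:])
    pink = np.fft.irfft(spectrum * scale, n)

    def band_fraction(x, lo=17000, hi=20000):
        """Fraction of the signal's total power that falls in [lo, hi) Hz."""
        power = np.abs(np.fft.rfft(x)) ** 2
        f = np.fft.rfftfreq(len(x), 1 / fs)
        return power[(f >= lo) & (f < hi)].sum() / power.sum()

    for name, sig in (("white", white), ("pink", pink)):
        print(f"{name}: {band_fraction(sig):.1%} of total power in 17-20 kHz")

With these settings the white noise puts roughly an eighth of its power in that band (3 kHz out of a 24 kHz Nyquist range), while the pink noise puts only a percent or two there, which matches the point above.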
There are ultrasonic 'blasters', for lack of a better term, that may work. I know some convenience stores mount them outside the store to deter younger folks from loitering. As you age, the upper limit of your high-frequency hearing drops in a fairly predictable way; e.g. at age 15 you may be able to hear up to 22kHz, while at 35 you may only be able to hear up to 18kHz. If you want to drive away younger people, blast out noise in the 19kHz to 22kHz range; it's really annoying to listen to and older folks won't even notice it. A similar thing may work to block this communication channel, as long as you don't have too many young people around, or older folks with exceptional hearing range for their age.
> (hint, hit more and enable stereo, I find that much more interesting for some reason)
I haven't played with this site in particular, but they are probably using incoherent sources for each channel, i.e. two separate random noise generators that aren't working off the same seed value. Even though the spectral content of each channel may be the same, it's not the same at any instant, which causes your brain to get a bit 'lost': it doesn't sound like a point source anymore but instead like a 'room filling' sound that you can't pinpoint. The same principle (in a more targeted manner) is used in mixing stereo music to create a 'sound stage'; usually only the voice mix is actually identical in both channels, even though you can hear the guitar in each channel independently.
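A minimal sketch of that coherent-versus-incoherent distinction, assuming nothing about what the site actually does: both versions have the same spectrum in each channel, but only the copied signal is correlated sample by sample, which is the cue your brain uses to localise a point source.

    import numpy as np

    fs, seconds = 44100, 2
    left = np.random.default_rng(1).standard_normal(fs * seconds)
    right_incoherent = np.random.default_rng(2).standard_normal(fs * seconds)  # different seed
    right_coherent = left.copy()                                               # identical signal

    def corr(a, b):
        return float(np.corrcoef(a, b)[0, 1])

    print("coherent L/R correlation:  ", round(corr(left, right_coherent), 3))    # ~1.0
    print("incoherent L/R correlation:", round(corr(left, right_incoherent), 3))  # ~0.0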
Given the amazing array of ways to slice and dice audio signals, that would be a very dangerous security measure to depend on. Better to just clip wires on the microphones and speakers.
This just shows my ignorance with regard to this topic, but is the speaker cutoff from plugging in headphones always a hardware switch? I.e., can software override it and direct sound to the speakers even if one has headphones plugged in?
Of course, if one were playing music or something, it would be obvious that the switch had been circumvented.
I have seen this behavior both controlled by an extra mechanical pin in the audio jack and also by a feature on the audio chip which was ultimately controlled in software. So it's hard to tell whether that would work.
Also consider that if you played the sound at maximum volume it might actually be audible some distance away from the headphones.
On Linux you can choose different behavior, so it's apparently software, at least on the machines I've worked on. Depending on that dropdown, I can plug in headphones and the built-in speakers will continue to output.
No, on my current Mac there is a delay of one or two seconds between when the plug is fully inserted and when the speakers/headphones start working. Thus, I presume it is some silicon or software.
On one computer I owned, it was entirely in the audio driver -- so when I switched to Linux, I was surprised to see that the headphones and speaker had independent volume controls. (The latter was labelled "Mono Out", and the behaviour where plugging in headphones cut off the speakers didn't happen on that Linux install.)
No, it's not always hardware; things have been shifting over to software switching over the past few years. One way to bypass this completely is to short your microphone input, or to make a feedback loop between the mic and headphone jacks (this hurts just thinking about it, but it should work).
What else would you exfiltrate/steal besides passphrases and private keys? Serial numbers/IPs/hostnames of any discovered devices on the air-gapped network?
Send software updates and new commands to run to the air-gapped computer(s)? 20 bytes per second is enough for that, or is it bits? Either way, a shellcode is about 40 bytes or less; even at 20 bits per second, that's 320 bits, or roughly 16 seconds of transmission.
I guess if you want to guard against this, the reasonable thing to do would be to physically take out the mic and speakers of any to-be-secured computers. Or have one computer on the perimeter listen in on these high frequencies. It would be interesting to discover chats that way.
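A crude sketch of that kind of perimeter listener, assuming you already have a 16-bit mono or stereo WAV recording of the room (the file name and the 5% threshold are placeholders; the 17-20 kHz band is the one the paper reports using):

    import wave
    import numpy as np

    def band_fraction(path, lo=17000, hi=20000, chunk_seconds=1.0):
        """Yield (time, fraction of power in [lo, hi) Hz) for each chunk of a 16-bit WAV."""
        with wave.open(path, "rb") as w:
            fs = w.getframerate()
            raw = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)
            if w.getnchannels() == 2:
                raw = raw[::2]                      # keep the left channel only
        chunk = int(fs * chunk_seconds)
        for start in range(0, len(raw) - chunk, chunk):
            x = raw[start:start + chunk].astype(float)
            power = np.abs(np.fft.rfft(x)) ** 2
            f = np.fft.rfftfreq(chunk, 1 / fs)
            yield start / fs, power[(f >= lo) & (f < hi)].sum() / max(power.sum(), 1e-12)

    for t, frac in band_fraction("room_recording.wav"):
        if frac > 0.05:                             # arbitrary alert threshold
            print(f"suspicious near-ultrasonic energy at t={t:.0f}s ({frac:.1%})")

In practice you'd want longer windows and a smarter statistic than raw band energy, since plenty of benign things (coil whine from power supplies, old CRTs, some lighting) leak a bit of energy up there too.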