At first I thought the file sharing itself was over sound, but the sound is just to negotiate details of the WebRTC session which is then actually used to transmit the data. Neat and handy.
I was thinking they had made something similar to the Fldigi suite (https://en.wikipedia.org/wiki/Fldigi#The_Fldigi_Suite), software that's fairly common in the ham radio world. You can use it to encode arbitrary binary data into a lot of different modulations in the audible range, and send files between computers without any IP networking at all.
There do seem to be a few projects related to this on GitHub if you browse the tag; enough that there are firewall apps to counter this non-typical network telemetry. Allegedly, Google, Amazon, etc. are using this. Interesting one way or another.
I wonder if it’s related to the ad-tech that embedded “sonic markers” in TV/YouTube ads, meant to be picked up by ad-tech SDKs embedded in apps.
If someone had granted microphone access, the SDK could wait until it heard a signal embedded in an ad and then pass back whatever data it had accumulated.
I'd estimate somewhere in the range of 100 kbit/s to 500 kbit/s at the upper end.
It would take a sophisticated modulation scheme, like a modem's; the sound would be similar to white noise. Software that's getting 20 bit/s or whatever is using old-school tone-based modulation, which is quite robust in the presence of other sounds.
That assumes the following (a rough back-of-envelope calculation follows the list):
- Frequency response ~20kHz.
- Audio ADC/DAC sample rates configurable well in excess of the Nyquist rate for the frequency-response range of the speakers and mics.
- Good signal to noise (~90dB), which equates to ~15 bits at max volume range.
- Not playing at max volume, but a reasonable level ~18dB down, so ~12 bits.
- A very quiet environment, or one where the background sound is very predictable.
- Stereo laptop speakers and stereo mics, to make a 2x2 MIMO spatially modulated channel.
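For a rough sanity check, here is a back-of-envelope ceiling in Python using exactly those assumptions (idealized; it ignores multipath, imperfect MIMO separation, and sync/coding overhead, which is why the practical estimate above is lower):

    bandwidth_hz = 20_000      # ~20 kHz of usable audio bandwidth
    bits_per_sample = 12       # ~90 dB SNR minus ~18 dB volume headroom
    streams = 2                # 2x2 MIMO: stereo speakers + stereo mics

    symbols_per_sec = 2 * bandwidth_hz                     # Nyquist rate for a 20 kHz channel
    ceiling = streams * symbols_per_sec * bits_per_sample  # bits per second
    print(f"~{ceiling / 1000:.0f} kbit/s")                 # prints ~960 kbit/s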
Because of this I just figured out intuitively why white noise is truly random and can't communicate any data. I wasn't thinking of it that way before, even though I knew conceptually about randomness, entropy, and cosmic background radiation.
Thanks!
Interesting...
If white noise “exists” and is constantly being received, does it have an energetic value? Hmm.
A lot of the attributes you describe are the same attributes that define the range of human hearing and perception, no?
It would be interesting if there is a particular reason our bodies converged on this range because of a link between sound and resilient information transfer. I need to think on that one.
For the FSK modulation scheme that I use in wave-share, I manage to achieve 8-16 bytes/s at reasonable in-room distances and with regular surrounding noise. It also depends on the speaker/microphone quality, but overall, if you want a reliable connection between air-gapped devices, the speed is nowhere near what modems can achieve. Or at least that is my experience.
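For anyone curious what this kind of FSK looks like in its simplest possible form, here is a toy Python sketch (much simplified compared to the real implementation; the tone grid and symbol length are made up, and a real decoder also needs sync, equalization, and error correction):

    import numpy as np

    SAMPLE_RATE = 48_000
    SYMBOL_SEC = 0.05                  # 20 symbols/s * 4 bits/symbol = 10 bytes/s
    BASE_HZ, STEP_HZ = 2_000.0, 50.0   # made-up grid of 16 tones

    def fsk_modulate(data: bytes) -> np.ndarray:
        """Map each 4-bit nibble to one of 16 tones, held for SYMBOL_SEC."""
        t = np.arange(int(SAMPLE_RATE * SYMBOL_SEC)) / SAMPLE_RATE
        nibbles = [n for b in data for n in (b >> 4, b & 0x0F)]
        tones = [np.sin(2 * np.pi * (BASE_HZ + n * STEP_HZ) * t) for n in nibbles]
        return np.concatenate(tones).astype(np.float32)

    waveform = fsk_modulate(b"hi")     # play back at SAMPLE_RATE to transmit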
I need to study more. This brings up interesting questions for me about the ultrasound spectrum and other things related to information loss. I hear bass more clearly over noisy distances, but I hear high pitches better over quiet distances. Speakers have filters on them to prevent output at particular frequencies; I'm unsure what quality corresponds to on mics.
Cool stuff to understand. Thanks. Now I must learn more.
How do I reason about the range of frequencies that air itself can sustain? I'd imagine there's a particular range inherent to that type of matter. Sound is only a human description of a waveform perturbation within the air... I'd imagine that if the air vibrates too much, it could explode or produce some other effects.
Radio waves are electromagnetic while sound waves are a physical wave. I assume that having to actually move the air molecules would require a lot more power and time than electrons in an antenna as well as having to deal with air density and environmental sound interference. I would be interested in knowing the limit of bandwidth here though.
It also brings up an interesting thought I never had before. With early home computers, and I assume before that as well, you would place a phone handset onto a coupler to receive data. It is odd to think now that there is a little air in there that is transferring sound between the handset speaker and the coupler microphone. I assume that without that little bit of air it wouldn't function.
Also, if air were such an issue, then the dynamics of a song being played live would show a series of inconsistent patterns, leading to errors in rate based on environmental factors. The rate would be whatever the air density allows, but whether you received at that rate is a different matter.
Song recorded vs song played vs song heard when recorded vs song heard when played over speakers.
The waveforms of both should be highly similar, in a way that wouldn't be considered error beyond amplitude or echoes from room acoustics.
So it’s a distributed message sensing problem?
From my perspective, I just saw the waveform of a techno song as the rate of data being communicated.
120 bpm is 120 bpm. Loss over distance is its own factor, based on the ability to ack.
All you need to know is the channel bandwidth (say 20-50 kHz) and the noise floor power relative to the received power to get the maximum data transfer rate (Shannon-Hartley theorem): Capacity = Bandwidth * log2(1 + receive_power / noise_floor_power). There is also a lot of EM noise going on, just like with sound. I would expect sound to be many orders of magnitude less power-efficient per bit than EM, but the principles are basically the same.
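To put some illustrative numbers on that (the SNR figures here are pure assumptions, since real values depend heavily on the room, volume, and hardware):

    import math

    def capacity_bps(bandwidth_hz: float, snr_db: float) -> float:
        """Shannon-Hartley: C = B * log2(1 + S/N)."""
        return bandwidth_hz * math.log2(1 + 10 ** (snr_db / 10))

    print(capacity_bps(20_000, 60) / 1000)  # quiet room, close range:   ~399 kbit/s
    print(capacity_bps(20_000, 30) / 1000)  # typical background noise:  ~199 kbit/s
    print(capacity_bps(5_000, 10) / 1000)   # narrow robust band, noisy: ~17 kbit/s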
Thanks for adding color to the thoughts there. Appreciate it. Now I must go figure out intuitively why log is baked into the universe as a mathematical property...
Not a lot. I'm not sure where these things usually top out, but at least for the software linked above the normal modes are often around 20 baud. They're intended to be used with high-frequency radio transmissions, more for keyboard-to-keyboard typing interfaces than arbitrary digital data, but they can easily be used for any arbitrary data.
I've used it on a few occasions to send out a text email or some other kind of small text document. Sending out something like a several megabyte image would be very slow.
Transmission power hasn't been the cap for a long time.
The analogue phone line signal is digitised at the exchange, and the digital channel is explicitly 64kbit/s.
Modems can only do "56k" by co-operating with the digital system; one of the modems actually has a digital ISDN connection.
Without that ISDN connection, analogue modems reach up to 33.6k. Theoretically they could do a bit more (but never more than 64k), but in practice 33.6k was the last standard produced before the semi-analogue, semi-digital 56k standard.
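To put rough numbers on where those figures come from (the analogue-loop SNR here is an assumption for illustration):

    import math

    pcm_rate = 8_000 * 8                              # digital voice channel: 8 kHz * 8 bits = 64 kbit/s
    analogue_loop = 3_100 * math.log2(1 + 10 ** 3.5)  # ~3.1 kHz bandwidth, ~35 dB SNR (assumed)
    print(pcm_rate, round(analogue_loop))             # 64000, ~36000: roughly why 33.6k was the ceiling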
Acoustic communication is quite nice for this kind of ad-hoc provisioning, because it doesn't require any special hardware or prior pairing.
One security issue is that if you do this in public, an attacker could initiate malicious connections and remove your own transmission via signal cancellation. It's hard to notice that when using an inaudible frequency band. To defend against this, our research team has described a method to protect the integrity of acoustic communication on the physical layer, without requiring any prior key exchange: https://dl.acm.org/doi/10.1145/3395351.3399420 PDF: https://arxiv.org/pdf/2005.08572
A regular Diffie-Hellman key exchange is not authenticated, meaning that you don't know who you establish a key with. To authenticate this key exchange, you would need some prior security context (e.g., some shared secret or TLS certificates).
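To make that concrete, here is a toy X25519 sketch (using the Python cryptography package) where a man-in-the-middle simply runs the exchange once with each side; without some prior authentication, neither endpoint can tell the difference:

    from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey

    # Unauthenticated DH: each party only sees "some public key" on the channel.
    alice, bob, mallory = (X25519PrivateKey.generate() for _ in range(3))

    # Mallory intercepts and substitutes her own public key in both directions,
    # ending up with one shared secret per victim.
    k_alice = alice.exchange(mallory.public_key())  # Alice thinks this is Bob's key
    k_bob = bob.exchange(mallory.public_key())      # Bob thinks this is Alice's key

    # Mallory derives both secrets herself and can relay or modify traffic at will.
    assert k_alice == mallory.exchange(alice.public_key())
    assert k_bob == mallory.exchange(bob.public_key())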
In pairing scenarios, you often do not have a prior security context, since the goal is to establish one in the first place. Our approach works without a prior security context by using the physical signal propagation properties of the acoustic signals.
No, because 3DH requires pre-exchanged public keys. For Signal, this key exchange might happen automatically, but you still need to manually compare the fingerprints to achieve authentication.
In the former GDR (East Germany) there was a computer show on the radio where programs were broadcast this way. Listeners could record the programs on tape and then run them on their computers.
Ah yes, we had something similar in Romania too, only it was on national television. I got so many games and programs from it. Since they had reception everywhere in the country, I was able to record them from the countryside or a monastery too.
A technology like this could be very useful for password management. You store the passwords in your phone, and transmit them to the computer you are at through sound. No passwords to sync, no need to have an encrypted cloud location. If using a 3.5 mm male-to-male audio cable rather than speakers, no one in the vicinity can overhear either.
True, if something like that is in the vicinity, that's not good. But with a cable as suggested, you make it harder. I am thinking of the workplace.
Edit: Also, the communication could be encrypted. The receiver could show a QR code with a one-time key, which your phone recognizes and encrypts with.
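Roughly like this, as a sketch (AES-GCM from the Python cryptography package stands in for whatever cipher you would actually pick; rendering the QR code and the acoustic send are left abstract):

    import os, base64
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    # Receiver: generate a one-time key and display it as a QR code.
    key = AESGCM.generate_key(bit_length=256)
    qr_payload = base64.b64encode(key).decode()   # render this string as the QR code

    # Phone: scan the QR code, encrypt the password, transmit over sound.
    nonce = os.urandom(12)
    ciphertext = AESGCM(key).encrypt(nonce, b"my-secret-password", None)
    # ...send nonce + ciphertext acoustically; only this receiver can decrypt it.

    # Receiver: decrypt what it heard.
    password = AESGCM(key).decrypt(nonce, ciphertext, None)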
Not to mention a simple analog channel vs. a complex modern protocol.
E.g., it's unlikely the computer will try to take over your mobile device over an analogue audio channel, while there could be viable attacks over a USB connection.
My goodness. I'd heard about transmitting data using sound, but I didn't realize its serious security implications.
From the repo description:
> SoniControl is a novel technology for the recognition and masking of acoustic tracking information. The technology helps end-users to protect their privacy.
> Technologies like Google Nearby and Silverpush build upon ultrasonic sounds to exchange information. More and more of our devices communicate via this inaudible communication channel. Every device with a microphone and a speaker is able to send and receive ultrasonic information. The user is usually not aware of this inaudible and hidden data transfer.
> To overcome this gap SoniControl detects ultrasonic activity, notifies the user and blocks the information on demand. Thereby, we want to raise the awareness for this novel technology.
Yeah, also consider that there have been no studies to ensure we aren't damaging our pets' hearing by using these sorts of frequencies. It depends on the dB, but dogs, for example, can hear a much wider range of frequencies, and at greater distances, than humans.
In the README it mentions that mDNS candidates break things[0]. This should still work if you are on the same LAN though; the mDNS name just needs to be encoded properly? Maybe the author was trying to connect across networks that don't support multicast?
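For reference, an mDNS host candidate in the SDP looks roughly like this (the UUID hostname and the other values here are made up), so as long as both peers can resolve each other's .local names over multicast DNS, a same-LAN connection should work:

    a=candidate:0 1 UDP 2122252543 9b36eaac-bb2e-49bb-b86f-5f80f9d1e7c0.local 54321 typ host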
However, mDNS candidates made this even easier! I stumbled upon this pattern and have been trying to convince people that it will unlock real use cases.
Just to clarify: the original intent was to use this only between devices on the same LAN, because that way you don't need a server. If you have a server, there are much better ways to transfer files.
I am a big fan of WebRTC. Even if I had a server, I would try to use SCTP over DTLS. I have gotten better transfer rates, without having to worry about HOL blocking or the overhead of TCP.
The idea of computers transferring information through sounds reminds me of badBIOS malware that sends network packets over ultra-high frequency sound waves to bypass airgaps. [1,2]
Thanks for making this lib. We tested it a while back on a project that needed air-gapped data transfer. We ended up doing an animated QR flow but we spent some time with this.
I remember there being a working implementation from chirp.io, with mobile apps and everything. They eventually axed it some years ago and tried selling an SDK to... people? Before getting bought by Sonos. I've always missed their service because I used to use it to transfer stuff between my phone and my work computer. It'd be really cool if this could replace Chirp for that use case.
This is similar to how Webex room systems communicate using Proximity: ultrasound emitted by a room device is picked up by participants' laptops/mobile devices to exchange info on how to join. So as you walk into a room, the Telepresence unit recognizes you and allows you a set of actions, but someone in the room next door won't be able to access it.
This project is cool in that it's exchanging an actual SDP, which could be used for any media negotiation.
There are companies already trying to apply this technology, even for payments. Although I don't really see that last use case, there are other creative applications.
This thing is giving me huge cyber punk vibes!
Nicely executed (the sound itself is pleasant to hear) and a fine balance (the sound is used only to avoid a signaling server; the transmission happens through a standard WebRTC data channel).
This one happened to me just once, so I decided to put it in the README just in case. I have never been able to reproduce it since, so it might just be a false positive.
Maybe interesting for making apps on the LAN more resilient:
* No need for port forwarding between network segments
* Bypasses annoying local firewalls and security approval
* On limited hardware you might literally have no Ethernet, Wi-Fi, or Bluetooth -- could use this instead
* Connectionless -- no zombie sockets tying up the same 'port' when testing client protocols
* Can prevent packet sniffing + having to expose services to every node on the LAN directly. Neat one-time, local aspect to it. Have to be in range of devices
Probably other benefits. Disadvantage is it would be slow and you would have to add support for it.
This would be a great way to set up an IoT device. It is usually quite time-consuming to initialize a new device - negotiating Bluetooth, downloading the corresponding app, setting up Alexa integrations, etc. If I could just hold up my phone playing a sound that tells the device my WiFi password, I could be up and running in a minute or less.
Thank you for the perspective! I remember back in the day I would rig a photoresistor to an Arduino so I could flash instructions to it with an LED without needing to reprogram it.
SDP over sound seems pretty brilliant since it removes the weakest link of p2p connecting on a LAN. I'm excited to play with this - I think it could be a good solution for a VR use case I am going to be working on where you want to stream a computer screen to a VR headset with minimal friction.
The old TRS-80 used audio tape to save/load programs.
As a 6th grader, I was amused that there was a "TRS-80 network" setup whereby you could cable a classroom of computers together, set all but one of them to "load from tape", then set the teacher's computer to "save to tape" and it all worked. I thought that was a clever bit of engineering/hackery, actually using sound for file-sharing.
This is super cool and I can see tons of practical applications. I wonder if you could utilize this technology with SOFAR Channels in the ocean to create long range passive data transfers.