At first I thought the file sharing itself was over sound, but the sound is just to negotiate details of the WebRTC session which is then actually used to transmit the data. Neat and handy.
I was thinking they had made something similar to the Fldigi suite (https://en.wikipedia.org/wiki/Fldigi#The_Fldigi_Suite), software that's fairly common in the ham radio world. You can use it to encode arbitrary binary data into a lot of different modulations in the audible range, and send files between computers without any IP networking at all.
There do seem to be a few projects related to this on GitHub if you browse the tag; enough that there are firewall apps to counter this non-typical network telemetry. Allegedly, Google, Amazon, etc. are using this. Interesting one way or another.
I wonder if it’s related to the ad-tech that embedded “sonic markers” in TV/YouTube ads, meant to be picked up by ad-tech SDKs embedded in apps.
If someone had granted microphone access, the SDK could wait until it heard a signal embedded in an ad and then pass back whatever data it had accumulated.
I'd estimate somewhere in the range of 100 kbit/s to 500 kbit/s at the upper end.
It would take a sophisticated modulation scheme, like a modem's; the sound would be similar to white noise. Software that's getting 20 bit/s or whatever is using old-school tone-based modulation, which is quite robust in the presence of other sounds.
That assumes the following (a rough back-of-envelope calculation follows the list):
- Frequency response ~20kHz.
- Audio ADC/DAC sample rates configurable well in excess of the Nyquist rate for the frequency-response range of the speakers and mics.
- Good signal to noise (~90dB), which equates to ~15 bits at max volume range.
- Not playing at max volume, but a reasonable level ~18dB down, so ~12 bits.
- A very quiet environment, or one where the background sound is very predictable.
- Stereo laptop speakers and stereo mics, to make a 2x2 MIMO spatially modulated channel.
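For a rough sanity check, here is a back-of-envelope ceiling in Python using exactly those assumptions (idealized; it ignores multipath, imperfect MIMO separation, and sync/coding overhead, which is why the practical estimate above is lower):

    bandwidth_hz = 20_000      # ~20 kHz of usable audio bandwidth
    bits_per_sample = 12       # ~90 dB SNR minus ~18 dB volume headroom
    streams = 2                # 2x2 MIMO: stereo speakers + stereo mics

    symbols_per_sec = 2 * bandwidth_hz                     # Nyquist rate for a 20 kHz channel
    ceiling = streams * symbols_per_sec * bits_per_sample  # bits per second
    print(f"~{ceiling / 1000:.0f} kbit/s")                 # prints ~960 kbit/s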
Because of this I just figured out intuitively why white noise is truly random and can't communicate any data. I wasn't thinking of it that way before, even though I knew conceptually about randomness, entropy, and cosmic background radiation.
Thanks!
Interesting...
If white noise “exists” and is constantly being received, does it have an energetic value? Hmm.
A lot of the attributes you describe are the same attributes that define the range of human hearing and perception, no?
It would be interesting if there is a particular reason our bodies converged on this range because of a link between sound and resilient information transfer. I need to think on that one.
For the FSK modulation scheme that I use in wave-share, I manage to achieve 8-16 bytes/s at reasonable in-room distances and with regular surrounding noise. It also depends on the speaker/microphone quality, but overall, if you want a reliable connection between air-gapped devices, the speed is nowhere near what modems can achieve. Or at least that is my experience.
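For anyone curious what this kind of FSK looks like in its simplest possible form, here is a toy Python sketch (much simplified compared to the real implementation; the tone grid and symbol length are made up, and a real decoder also needs sync, equalization, and error correction):

    import numpy as np

    SAMPLE_RATE = 48_000
    SYMBOL_SEC = 0.05                  # 20 symbols/s * 4 bits/symbol = 10 bytes/s
    BASE_HZ, STEP_HZ = 2_000.0, 50.0   # made-up grid of 16 tones

    def fsk_modulate(data: bytes) -> np.ndarray:
        """Map each 4-bit nibble to one of 16 tones, held for SYMBOL_SEC."""
        t = np.arange(int(SAMPLE_RATE * SYMBOL_SEC)) / SAMPLE_RATE
        nibbles = [n for b in data for n in (b >> 4, b & 0x0F)]
        tones = [np.sin(2 * np.pi * (BASE_HZ + n * STEP_HZ) * t) for n in nibbles]
        return np.concatenate(tones).astype(np.float32)

    waveform = fsk_modulate(b"hi")     # play back at SAMPLE_RATE to transmit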
I need to study more. This brings up interesting questions for me about the ultrasound spectrum and other things related to information loss. I hear bass more clearly over noisy distances, but I hear high pitches better over quiet distances. Speakers have filters on them to prevent output at particular frequencies; I'm unsure what quality corresponds to on mics.
Cool stuff to understand. Thanks. Now I must learn more.
How do I reason about the range of frequencies that air itself can sustain? I'd imagine there's a particular range inherent to that type of matter. Sound is only a human description of a waveform perturbation within the air... I'd imagine that if the air vibrates too much, it could explode or produce some other effects.
Radio waves are electromagnetic while sound waves are a physical wave. I assume that having to actually move the air molecules would require a lot more power and time than electrons in an antenna as well as having to deal with air density and environmental sound interference. I would be interested in knowing the limit of bandwidth here though.
It also brings up an interesting thought I never had before. With early home computers, and I assume before that as well, you would place a phone handset onto a coupler to receive data. It is odd to think now that there is a little air in there that is transferring sound between the handset speaker and the coupler microphone. I assume that without that little bit of air it wouldn't function.
Also, if air were such an issue, then the dynamics of a song being played live would show a series of inconsistent patterns, leading to errors in rate based on environmental factors. The rate would be whatever the air density allows, but whether you received at that rate is a different matter.
Song recorded vs song played vs song heard when recorded vs song heard when played over speakers.
The waveforms of both should be highly similar, in a way that wouldn't be considered error beyond amplitude or echoes from room acoustics.
So it’s a distributed message sensing problem?
From my perspective, I just saw the waveform of a techno song as the rate of data being communicated.
120 bpm is 120 bpm. Loss over distance is its own factor, based on the ability to ack.
All you need to know is the channel bandwidth (say 20-50 kHz) and the noise floor power relative to the received power to get the maximum data transfer rate (Shannon-Hartley theorem): Capacity = Bandwidth * log2(1 + receive_power / noise_floor_power). There is also a lot of EM noise going on, just like with sound. I would expect sound to be many orders of magnitude less power-efficient per bit than EM, but the principles are basically the same.
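To put some illustrative numbers on that (the SNR figures here are pure assumptions, since real values depend heavily on the room, volume, and hardware):

    import math

    def capacity_bps(bandwidth_hz: float, snr_db: float) -> float:
        """Shannon-Hartley: C = B * log2(1 + S/N)."""
        return bandwidth_hz * math.log2(1 + 10 ** (snr_db / 10))

    print(capacity_bps(20_000, 60) / 1000)  # quiet room, close range:   ~399 kbit/s
    print(capacity_bps(20_000, 30) / 1000)  # typical background noise:  ~199 kbit/s
    print(capacity_bps(5_000, 10) / 1000)   # narrow robust band, noisy: ~17 kbit/s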
Thanks for adding color to the thoughts there. Appreciate it. Now I must go figure out intuitively why log is baked into the universe as a mathematical property...
Not a lot. I'm not sure where these things usually top out, but at least for the software linked above the normal modes are often around 20 baud. They're intended to be used with high-frequency radio transmissions, more for keyboard-to-keyboard typing interfaces than arbitrary digital data, but they can easily be used for any arbitrary data.
I've used it on a few occasions to send out a text email or some other kind of small text document. Sending out something like a several megabyte image would be very slow.
Transmission power hasn't been the cap for a long time.
The analogue phone line signal is digitised at the exchange, and the digital channel is explicitly 64kbit/s.
Modems can only do "56k" by co-operating with the digital system; one of the modems actually has a digital ISDN connection.
Without that ISDN connection, analogue modems reach up to 33.6k. Theoretically they could do a bit more (but never more than 64k), but in practice 33.6k was the last standard produced before the semi-analogue, semi-digital 56k standard.
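To put rough numbers on where those figures come from (the analogue-loop SNR here is an assumption for illustration):

    import math

    pcm_rate = 8_000 * 8                              # digital voice channel: 8 kHz * 8 bits = 64 kbit/s
    analogue_loop = 3_100 * math.log2(1 + 10 ** 3.5)  # ~3.1 kHz bandwidth, ~35 dB SNR (assumed)
    print(pcm_rate, round(analogue_loop))             # 64000, ~36000: roughly why 33.6k was the ceiling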
Acoustic communication is quite nice for this kind of ad-hoc provisioning, because it doesn't require any special hardware or prior pairing.
One security issue is that if you do this in public, an attacker could initiate malicious connections and remove your own transmission via signal cancellation. It's hard to notice that when using an inaudible frequency band. To defend against this, our research team has described a method to protect the integrity of acoustic communication on the physical layer, without requiring any prior key exchange: https://dl.acm.org/doi/10.1145/3395351.3399420 PDF: https://arxiv.org/pdf/2005.08572
A regular Diffie-Hellman key exchange is not authenticated, meaning that you don't know who you establish a key with. To authenticate this key exchange, you would need some prior security context (e.g., some shared secret or TLS certificates).
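To make that concrete, here is a toy X25519 sketch (using the Python cryptography package) where a man-in-the-middle simply runs the exchange once with each side; without some prior authentication, neither endpoint can tell the difference:

    from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey

    # Unauthenticated DH: each party only sees "some public key" on the channel.
    alice, bob, mallory = (X25519PrivateKey.generate() for _ in range(3))

    # Mallory intercepts and substitutes her own public key in both directions,
    # ending up with one shared secret per victim.
    k_alice = alice.exchange(mallory.public_key())  # Alice thinks this is Bob's key
    k_bob = bob.exchange(mallory.public_key())      # Bob thinks this is Alice's key

    # Mallory derives both secrets herself and can relay or modify traffic at will.
    assert k_alice == mallory.exchange(alice.public_key())
    assert k_bob == mallory.exchange(bob.public_key())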
In pairing scenarios, you often do not have a prior security context, since the goal is to establish one in the first place. Our approach works without a prior security context by using the physical signal propagation properties of the acoustic signals.
No, because 3DH requires pre-exchanged public keys. For Signal, this key exchange might happen automatically, but you still need to manually compare the fingerprints to achieve authentication.
In the former GDR (East Germany) there was a computer show on the radio where programs were broadcast this way. Listeners could record the programs on tape and then run them on their computers.
Ah yes, we had something similar in Romania too, only it was on national television. I got so many games and programs from it. Since they had reception everywhere in the country, I was able to record them from the countryside or a monastery too.
A technology like this could be very useful for password management. You store the passwords in your phone, and transmit them to the computer you are at through sound. No passwords to sync, no need to have an encrypted cloud location. If using a 3.5 mm male-to-male audio cable rather than speakers, no one in the vicinity can overhear either.
True, if something like that is in the vicinity, that's not good. But with a cable as suggested, you make it harder. I am thinking of the workplace.
Edit: Also, the communication could be encrypted. The receiver could show a QR code with a one-time key, which your phone recognizes and encrypts with.
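Roughly like this, as a sketch (AES-GCM from the Python cryptography package stands in for whatever cipher you would actually pick; rendering the QR code and the acoustic send are left abstract):

    import os, base64
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    # Receiver: generate a one-time key and display it as a QR code.
    key = AESGCM.generate_key(bit_length=256)
    qr_payload = base64.b64encode(key).decode()   # render this string as the QR code

    # Phone: scan the QR code, encrypt the password, transmit over sound.
    nonce = os.urandom(12)
    ciphertext = AESGCM(key).encrypt(nonce, b"my-secret-password", None)
    # ...send nonce + ciphertext acoustically; only this receiver can decrypt it.

    # Receiver: decrypt what it heard.
    password = AESGCM(key).decrypt(nonce, ciphertext, None)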
Not to mention a simple analog channel vs. a complex modern protocol.
E.g., it's unlikely the computer will try to take over your mobile device over an analogue audio channel, while there could be viable attacks over a USB connection.
My goodness. I'd heard about transmitting data using sound, but I didn't realize its serious security implications.
From the repo description:
> SoniControl is a novel technology for the recognition and masking of acoustic tracking information. The technology helps end-users to protect their privacy.
> Technologies like Google Nearby and Silverpush build upon ultrasonic sounds to exchange information. More and more of our devices communicate via this inaudible communication channel. Every device with a microphone and a speaker is able to send and receive ultrasonic information. The user is usually not aware of this inaudible and hidden data transfer.
> To overcome this gap SoniControl detects ultrasonic activity, notifies the user and blocks the information on demand. Thereby, we want to raise the awareness for this novel technology.
Yeah, also consider that there have been no studies to ensure we aren't damaging our pets' hearing by using these sorts of frequencies. It depends on the dB, but dogs, for example, can hear a much wider range of frequencies, and at greater distances, than humans.
In the README it mentions that mDNS candidates break things[0]. This should still work if you are on the same LAN though; the mDNS name just needs to be encoded properly? Maybe the author was trying to connect across networks that don't support multicast?
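For reference, an mDNS host candidate in the SDP looks roughly like this (the UUID hostname and the other values here are made up), so as long as both peers can resolve each other's .local names over multicast DNS, a same-LAN connection should work:

    a=candidate:0 1 UDP 2122252543 9b36eaac-bb2e-49bb-b86f-5f80f9d1e7c0.local 54321 typ host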
However, mDNS candidates made this even easier! I stumbled upon this pattern and have been trying to convince people that it will unlock real use cases.
Just to clarify: the original intent was to use this only between devices on the same LAN, because that way you don't need a server. If you have a server, there are much better ways to transfer files.
I am a big fan of WebRTC. Even if I had a server, I would try to use SCTP over DTLS. I have gotten better transfer rates, without having to worry about HOL blocking or the overhead of TCP.
The idea of computers transferring information through sounds reminds me of badBIOS malware that sends network packets over ultra-high frequency sound waves to bypass airgaps. [1,2]
Thanks for making this lib. We tested it a while back on a project that needed air-gapped data transfer. We ended up doing an animated QR flow but we spent some time with this.
I remember there being a working implementation from chirp.io, with mobile apps and everything. They eventually axed it some years ago and tried selling an SDK to... people? Before getting bought by Sonos. I've always missed their service because I used to use it to transfer stuff between my phone and my work computer. It'd be really cool if this could replace Chirp for that use case.
This is similar to how Webex room systems communicate using Proximity: ultrasound emitted by a room device is picked up by participants' laptops/mobile devices to exchange info on how to join. So as you walk into a room, the Telepresence unit recognizes you and allows you a set of actions, but someone in the room next door won't be able to access it.
This project is cool in that it's exchanging an actual SDP, which could be used for any media negotiation.
There are companies already trying to apply this technology, even for payments. Although I don't really see that last use case, there are other creative applications.
This thing is giving me huge cyber punk vibes!
Nicely executed (the sound itself is pleasant to hear) and a fine balance (the sound is used only to avoid a signaling server; the transmission happens through a standard WebRTC data channel).
This one happened to me just once, so I decided to put it in the README just in case. I have never been able to reproduce it since, so it might just be a false positive.
Maybe interesting for making apps on the LAN more resilient:
* No need for port forwarding between network segments
* Bypasses annoying local firewalls and security approval
* On limited hardware you might literally have no Ethernet, Wi-Fi, or Bluetooth -- could use this instead
* Connectionless -- no zombie sockets tying up the same 'port' when testing client protocols
* Can prevent packet sniffing + having to expose services to every node on the LAN directly. Neat one-time, local aspect to it. Have to be in range of devices
Probably other benefits. Disadvantage is it would be slow and you would have to add support for it.
This would be a great way to set up an IoT device. It is usually quite time-consuming to initialize a new device - negotiating Bluetooth, downloading the corresponding app, setting up Alexa integrations, etc. If I could just hold up my phone playing a sound that tells the device my WiFi password, I could be up and running in a minute or less.
Thank you for the perspective! I remember back in the day I would rig a photoresistor to an Arduino so I could flash instructions to it with an LED without needing to reprogram it.
SDP over sound seems pretty brilliant since it removes the weakest link of p2p connecting on a LAN. I'm excited to play with this - I think it could be a good solution for a VR use case I am going to be working on where you want to stream a computer screen to a VR headset with minimal friction.
The old TRS-80 used audio tape to save/load programs.
As a 6th grader, I was amused that there was a "TRS-80 network" setup whereby you could cable a classroom of computers together, set all but one of them to "load from tape", then set the teacher's computer to "save to tape" and it all worked. I thought that was a clever bit of engineering/hackery, actually using sound for file-sharing.
This is super cool and I can see tons of practical applications. I wonder if you could utilize this technology with SOFAR Channels in the ocean to create long range passive data transfers.