Back in about 2000 I remember exchanging emails with a guy who had built a modified renderer for (I think) the Half-Life 1 engine. He was using audio rendering to enable blind players to play first-person shooters. My recollection is that he used a series of horizontal scan lines across the display region, with different octaves (or similar) indicating which scan line was being "rendered" and different tones indicating the brightness value along the sweep of each scan line. I think he was primarily using mono audio at the time, but it's been long enough that I can't entirely recall.
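His exact mapping may have differed, and I'm reconstructing from memory, but here's a minimal sketch of that family of scheme (pitch encodes the row, a left-to-right sweep encodes the column, and I've mapped brightness to loudness, as in the vOICe mapping mentioned below). All names, frequencies, and parameters are my own assumptions, not his implementation:

```python
# Toy sketch of a scan-line image-to-audio sweep (not the original
# implementation): each image row gets its own pitch (top row = highest
# pitch), the image is swept left to right over one second, and pixel
# brightness controls the loudness of that row's tone at that moment.
import numpy as np

SAMPLE_RATE = 22050            # audio sample rate in Hz (assumed)
SWEEP_SECONDS = 1.0            # time to sweep across the whole image (assumed)
F_LOW, F_HIGH = 200.0, 4000.0  # pitch range mapped onto image rows (assumed)

def image_to_sweep(image: np.ndarray) -> np.ndarray:
    """image: 2D array of brightness values in [0, 1], shape (rows, cols).
    Returns a mono audio buffer of the left-to-right sweep."""
    rows, cols = image.shape
    n_samples = int(SAMPLE_RATE * SWEEP_SECONDS)
    t = np.arange(n_samples) / SAMPLE_RATE

    # One sine oscillator per row; row 0 (top of image) gets the highest pitch.
    freqs = np.geomspace(F_LOW, F_HIGH, rows)[::-1]

    # Which image column is "under the cursor" at each audio sample.
    col_at_sample = np.minimum((np.arange(n_samples) * cols) // n_samples, cols - 1)

    audio = np.zeros(n_samples)
    for r in range(rows):
        amplitude = image[r, col_at_sample]   # brightness along the sweep
        audio += amplitude * np.sin(2 * np.pi * freqs[r] * t)

    return audio / max(rows, 1)               # crude normalization

# Example: a bright diagonal line is heard as a pitch rising over the second.
demo = np.zeros((16, 64))
for c in range(64):
    demo[15 - (c * 16) // 64, c] = 1.0
buffer = image_to_sweep(demo)  # write to a WAV with e.g. scipy.io.wavfile.write
```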
The blog post links to a demo app where you can practice navigating a maze with your eyes closed, using only binaural audio to guide you. Anyone can do it with very little practice!
I would love to spend more time on this in the future, and I'd like to consider adopting the vOICe scheme.
This feels really groundbreaking to me. It's like echolocation, although it also seems to be projecting 3D onto 2D, while true echolocation gives you a 3D picture(?)
There are stories of blind people using echolocation to get around, so why not do a synthetic version of that instead of this made-up encoding scheme (as cool as it is)?
I.e., model how sound would propagate in an environment and create the sound an echolocator would actually hear reflected back.
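To make that concrete, here's a toy sketch of what I mean, under heavy simplifying assumptions (a handful of point reflectors standing in for a real 3D scene, simple 1/d^2 attenuation, and a crude interaural delay instead of a proper HRTF). Every name and constant here is illustrative, not a real acoustic model:

```python
# Minimal sketch of synthetic echolocation: given a set of reflecting points
# (which in a real system would come from a depth camera or a 3D model of the
# scene), synthesize what a short click would sound like after bouncing off
# each one, with rough left/right cues. All parameters are assumptions.
import numpy as np

SAMPLE_RATE = 44100
SPEED_OF_SOUND = 343.0   # m/s
EAR_SPACING = 0.2        # metres between ears (assumed)

def click(duration=0.002):
    """A short broadband click to 'emit' into the scene."""
    n = int(SAMPLE_RATE * duration)
    return np.random.default_rng(0).uniform(-1, 1, n) * np.hanning(n)

def echolocate(points, buffer_seconds=0.1):
    """points: list of (x, y) positions in metres, listener at origin facing +y,
    positive x to the right. Returns stereo audio of shape (n_samples, 2),
    channel 0 = left ear, channel 1 = right ear."""
    emitted = click()
    n = int(SAMPLE_RATE * buffer_seconds)
    out = np.zeros((n, 2))
    for x, y in points:
        dist = max(np.hypot(x, y), 0.1)   # assume reflectors aren't at the listener
        # Round-trip travel time and a simple 1/d^2 attenuation (assumption).
        delay = int(2 * dist / SPEED_OF_SOUND * SAMPLE_RATE)
        gain = 1.0 / max(dist * dist, 0.25)
        # Crude binaural cues: extra delay and a quieter level on the far ear.
        itd = int(abs(x) / dist * EAR_SPACING / SPEED_OF_SOUND * SAMPLE_RATE)
        near, far = (1, 0) if x > 0 else (0, 1)
        for ch, extra_delay, level in ((near, 0, 1.0), (far, itd, 0.7)):
            start = delay + extra_delay
            end = min(start + len(emitted), n)
            if start < n:
                out[start:end, ch] += gain * level * emitted[: end - start]
    return out

# Example: a wall segment about 2 m ahead and slightly to the left.
audio = echolocate([(-0.5, 2.0), (-0.3, 2.0), (-0.1, 2.0)])
```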
What I was getting at is that that would be much more complex, because this uses a camera (2D), while echolocation (and our ears) gives you 3D when all is said and done. So to do what you're suggesting, you'd first need some way to create a 3D model of the world from the camera's 2D image, and then encode that into sound.
As I understand it, echolocation gives you a rough 3D model of everything around you, but no indication of flat textures on things. This project would seem to render a 2D image as sound.