My intuition is that it would make a lot of things easier: you can work out a lot of geometry from a prominent sound like a car horn, whereas with visual analysis you have to extract the signal from far more noise. I'm not an ML engineer at all, though, nor even well-read on these topics.
I've heard that submarines can now extract information from ambient ocean noise. Car horns and traffic noise have got to be at least as useful on the roads. It would require adding a microphone array to the car.