Recognizing Speech from Smartphone Gyroscopes (2014) (stanford.edu)
90 points by vladmiller on Jan 17, 2016 | 13 comments



This is a fantastic idea, but do web pages really not require special permissions to access gyroscope data?

On Android (like, 4.2 or something) with Firefox, the page and the CSV just display `null` for alpha, beta and gamma. And the calculated rate is 9.1 samples per second, which seems way too low to extract any speech details even if the data were valid.
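For reference, a rate like that can be estimated from event timestamps. This is only a sketch: the function name and the 110 ms spacing are made up for illustration, and it says nothing about how the demo page actually computes its figure.

```python
def estimate_sample_rate(timestamps_ms):
    """Return the mean sample rate in Hz from event timestamps in milliseconds."""
    if len(timestamps_ms) < 2:
        raise ValueError("need at least two timestamps")
    duration_s = (timestamps_ms[-1] - timestamps_ms[0]) / 1000.0
    if duration_s <= 0:
        raise ValueError("timestamps must be increasing")
    # N timestamps span N - 1 sampling intervals.
    return (len(timestamps_ms) - 1) / duration_s


# Hypothetical timestamps spaced 110 ms apart, close to the rate reported above.
timestamps = [i * 110 for i in range(11)]
rate = estimate_sample_rate(timestamps)
print(f"{rate:.1f} samples/s")
```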

This is distinct from the "No device motion data" message presented on my desktop though. So it seems to me that the page sees that a gyroscope is present, and requests data from it, but Firefox is just feeding it null values?

I haven't tried the apk yet.


Re. web pages, the feature is called DeviceOrientation Event and no, the spec doesn't require user permission: https://www.w3.org/TR/orientation-event/


Tried it on Mobile Safari (iOS 9.2) - works. Pretty crazy, there was no need to give permissions.


Chrome on Android shows the proper data - with no special permissions or warnings. It updates at ~57 samples/s.


If you check the paper, the authors do mention some rate limitations when using Chrome. Firefox, however, provides samples at 200 Hz. Also, every application can access those measurements without any permissions.


As the authors note in the full paper, even a sample rate of 200 Hz only results in capturing audio information for frequencies up to 100 Hz.
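To make the 200 Hz / 100 Hz relationship concrete, here is a small sketch of the Nyquist limit. The helper names are mine, and the folding formula assumes an idealized sampler with no anti-aliasing filter in front of it, which may or may not match a real gyroscope.

```python
import math


def nyquist_hz(sample_rate_hz):
    """Highest frequency that can be represented unambiguously at this rate."""
    return sample_rate_hz / 2.0


def aliased_frequency_hz(tone_hz, sample_rate_hz):
    """Apparent frequency of a pure tone sampled with no anti-aliasing filter."""
    folded = tone_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


fs = 200.0
print(nyquist_hz(fs))                   # limit at 200 Hz sampling
print(nyquist_hz(57.0))                 # limit at the ~57 samples/s seen in Chrome
print(aliased_frequency_hz(150.0, fs))  # where a 150 Hz tone would appear

# Numerical check: sampled at 200 Hz, a 150 Hz sine is a sign-flipped 50 Hz sine.
samples_150 = [math.sin(2.0 * math.pi * 150.0 * n / fs) for n in range(40)]
samples_50 = [math.sin(2.0 * math.pi * 50.0 * n / fs) for n in range(40)]
max_diff = max(abs(a + b) for a, b in zip(samples_150, samples_50))
```

In other words, anything above 100 Hz either gets filtered out or lands on top of a lower frequency, so it cannot be told apart from it afterwards.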

If one applies a 100 Hz filter to some samples of human speech (even low-pitched male voices, as the authors of the paper suggest), the result is (unsurprisingly) the sort of "womp womp" that one might expect to hear from a car with giant subwoofers, not anything containing recognizable words.

The authors of the paper admit that they were unable to recover the content of speech, but focused on identifying the speaker. That's a long way from capturing credit card numbers (which is what their promo video implies).

Basically, I'll believe it when I see it. The approach is really interesting, though, and I think could be useful for other things.


Would accelerometers also be usable in a similar way? Presumably magnetometers wouldn't be quite as sensitive to such vibrations.


Seems like if auto rotate is off, this does not work.


2014


Solution: low-pass filter the gyroscope sensor data before the application can access it. Is there a legitimate reason for an application to need gyroscope data in the range of ~200 Hz?


I think this is a legitimate usage for some games, for example, racing games in which you steer via tilt. The bigger issue in this case is not the frequencies visible to the application (the relevant hand motion is well below 200 Hz), but rather the latency of these movements. If possible, a game should be able to respond to a tilt action within the duration of a single frame.

But it's impossible to filter a signal in real time (i.e. with a causal filter that only sees past samples) without introducing a phase shift (i.e. a delay). As a result, you cannot remove that extra frequency information before sending the data to an application without making sacrifices on latency.
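A rough illustration of the trade-off, using a causal moving average as a crude stand-in for a low-pass filter. The 9-tap length, the 200 Hz rate and the 60 fps frame time are assumptions picked for the example, not values from the paper or any OS.

```python
def moving_average(signal, taps):
    """Causal moving average: each output uses only the current and past samples."""
    out = []
    window_sum = 0.0
    for i, x in enumerate(signal):
        window_sum += x
        if i >= taps:
            window_sum -= signal[i - taps]  # drop the sample that left the window
        out.append(window_sum / taps)
    return out


def group_delay_ms(taps, sample_rate_hz):
    """Delay of a symmetric FIR filter: (taps - 1) / 2 samples, in milliseconds."""
    return (taps - 1) / 2.0 / sample_rate_hz * 1000.0


# A sudden tilt modeled as a step at sample 50.
step_index = 50
tilt = [0.0] * step_index + [1.0] * 50
smoothed = moving_average(tilt, taps=9)

# First sample at which the filtered output has reached half of the step.
half_index = next(i for i, y in enumerate(smoothed) if y >= 0.5)
delay_samples = half_index - step_index

delay_ms = group_delay_ms(9, 200.0)
frame_ms = 1000.0 / 60.0  # assumed 60 fps display
print(delay_samples, delay_ms, frame_ms)
```

With these numbers the filter alone costs about 20 ms, which is already longer than one frame at 60 fps, and a sharper cutoff needs more taps and therefore more delay.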


Alternative solution: make the unfiltered gyro data available through an explicit permission.


Perhaps it's possible to use some sort of active cancellation - like what noise-cancelling headphones do - only in this case you would be trying to minimize the correlation between gyroscopic data & microphone data.
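One standard way to do that kind of decorrelation is an LMS adaptive filter, sketched below on synthetic data. Everything here is an assumption for illustration: the "mic" signal is white noise rather than speech, the leakage coefficients (0.3 and 0.1) are invented, and nothing suggests any phone OS works this way.

```python
import math
import random


def lms_cancel(primary, reference, taps=4, step=0.002):
    """Subtract the part of `primary` that is linearly predictable from `reference`."""
    weights = [0.0] * taps
    cleaned = []
    for n in range(len(primary)):
        # Newest-first window of reference samples, zero-padded before the start.
        window = [reference[n - k] if n >= k else 0.0 for k in range(taps)]
        estimate = sum(w * x for w, x in zip(weights, window))
        error = primary[n] - estimate
        # LMS update: nudge the weights to shrink the error correlated with the window.
        weights = [w + 2.0 * step * error * x for w, x in zip(weights, window)]
        cleaned.append(error)
    return cleaned, weights


def correlation(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


random.seed(0)
n_samples = 5000
# Synthetic stand-ins: white-noise "microphone" and a slow sinusoidal hand motion.
mic = [random.gauss(0.0, 1.0) for _ in range(n_samples)]
motion = [0.5 * math.sin(2.0 * math.pi * n / 500.0) for n in range(n_samples)]
# Hypothetical gyro reading: real motion plus sound leaking in over two taps.
gyro = [
    motion[n] + 0.3 * mic[n] + 0.1 * (mic[n - 1] if n >= 1 else 0.0)
    for n in range(n_samples)
]

cleaned, weights = lms_cancel(gyro, mic)

# Compare correlation with the microphone over the last 1000 samples.
tail = slice(n_samples - 1000, n_samples)
before = correlation(gyro[tail], mic[tail])
after = correlation(cleaned[tail], mic[tail])
print(before, after, weights)
```

Note that this needs the microphone running as the reference signal, and it only removes the component of the gyro data that is linearly related to what the microphone hears.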



