If you check the paper, the authors do mention rate limitations in Chrome. Firefox, however, provides samples at 200 Hz.
Also, every application can access those measurements without any permissions.
As the authors note in the full paper, even a sample rate of 200 Hz only results in capturing audio information for frequencies up to 100 Hz.
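To make the arithmetic behind that 100 Hz ceiling concrete, here is a minimal sketch of the Nyquist limit. The function names are mine, not from the paper, and the folding calculation assumes there is no anti-aliasing filter in front of the sensor.

```python
def nyquist_limit_hz(sample_rate_hz):
    """Highest frequency that can be represented unambiguously."""
    return sample_rate_hz / 2.0


def apparent_frequency_hz(tone_hz, sample_rate_hz):
    """Frequency at which a pure tone appears after sampling.

    Tones above the Nyquist limit fold back into the range
    0..sample_rate/2 (assumes no anti-aliasing filter).
    """
    remainder = float(tone_hz) % float(sample_rate_hz)
    return min(remainder, sample_rate_hz - remainder)


if __name__ == "__main__":
    print(nyquist_limit_hz(200))
    print(apparent_frequency_hz(150, 200))
```

Under these assumptions, a 200 Hz sample rate gives a 100 Hz limit, and a 150 Hz tone would show up as a 50 Hz one rather than at its true pitch.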
If one applies a 100 Hz low-pass filter to some samples of human speech (even low-pitched male voices, as the authors of the paper suggest), the result is (unsurprisingly) the sort of "womp womp" that one might expect to hear from a car with giant subwoofers, not anything containing recognizable words.
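If you want to try the filtering experiment yourself, here is a rough sketch using only the standard library. It is not the paper's method: it uses a simple one-pole low-pass filter with a 100 Hz cutoff on synthetic tones at an assumed 8 kHz audio rate, and all the names are hypothetical. A one-pole filter rolls off far more gently than actually sampling at 200 Hz would, so if anything it flatters the result.

```python
import math


def one_pole_lowpass(samples, cutoff_hz, sample_rate_hz):
    """First-order (RC-style) low-pass filter; a gentle stand-in for a 100 Hz limit."""
    dt = 1.0 / sample_rate_hz
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)
    out, y = [], 0.0
    for x in samples:
        y += alpha * (x - y)  # move the output a fraction of the way toward the input
        out.append(y)
    return out


def tone(freq_hz, sample_rate_hz, seconds=1.0):
    """Unit-amplitude sine wave."""
    n = int(sample_rate_hz * seconds)
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate_hz) for i in range(n)]


def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def surviving_fraction(freq_hz, cutoff_hz=100.0, sample_rate_hz=8000.0):
    """Ratio of output RMS to input RMS for a pure tone passed through the filter."""
    x = tone(freq_hz, sample_rate_hz)
    return rms(one_pole_lowpass(x, cutoff_hz, sample_rate_hz)) / rms(x)


if __name__ == "__main__":
    for f in (80, 300, 1000, 3000):
        print(f, round(surviving_fraction(f), 3))
```

Even with this gentle filter, an 80 Hz tone passes mostly intact while a 1 kHz tone, where much of the information that distinguishes words lives, is cut to a small fraction of its original level.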
The authors of the paper admit that they were unable to recover the content of speech and instead focused on identifying the speaker. That's a long way from capturing credit card numbers (which is what their promo video implies).
Basically, I'll believe it when I see it. The approach is really interesting, though, and I think it could be useful for other things.