I wouldn't blame the audio quality. It's more that different people have different accents, which makes the same word sound different. Considering that variation, I'd say a 6% error rate is pretty good compared to where we started.
It might be, but the resource demands of higher-fidelity acoustic models slow processing down. 44.1 kHz/16-bit audio has an order of magnitude greater bitrate than 8 kHz/8-bit.
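Back-of-the-envelope, assuming uncompressed mono PCM (raw bitrate is just sample rate × bit depth; the snippet below only illustrates the arithmetic):

    # Raw PCM bitrate in bits per second: sample_rate * bit_depth * channels.
    def pcm_bitrate(sample_rate_hz: int, bit_depth: int, channels: int = 1) -> int:
        return sample_rate_hz * bit_depth * channels

    cd_quality = pcm_bitrate(44_100, 16)    # 705,600 bit/s
    phone_quality = pcm_bitrate(8_000, 8)   #  64,000 bit/s
    print(f"44.1 kHz / 16-bit: {cd_quality:,} bit/s")
    print(f"8 kHz / 8-bit:     {phone_quality:,} bit/s")
    print(f"ratio: {cd_quality / phone_quality:.1f}x")  # ~11x, i.e. an order of magnitude

So every second of CD-quality audio carries roughly eleven times the raw data of phone-quality audio, before any feature extraction even starts.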
The base product use case has been handling phone-fidelity audio for many years. Think: legal dictation, retail digital recording hardware (phones), and medical transcription. Speech-to-text for recordings fed by a Telefunken U-47 is highly niche. :)
heh. It's not so much the microphones that make the data sets THAT narrowband; it's phone bandwidth limitations. Even the cheapest electret and MEMS mics have pretty good frequency response, far beyond the 4 kHz limit (the Nyquist frequency of the 8 kHz sample rate) this data set uses.
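A minimal sketch of what that band limit does, assuming numpy/scipy and using a hypothetical 6 kHz test tone: anything above the 4 kHz Nyquist frequency simply can't survive an 8 kHz sample rate, no matter how good the mic is.

    import numpy as np
    from scipy.signal import resample_poly

    rate = 44_100
    t = np.arange(rate) / rate  # one second of audio
    # A 6 kHz tone: well within cheap-mic response, outside phone bandwidth.
    tone = np.sin(2 * np.pi * 6_000 * t)

    # 8000/44100 reduces to 80/441. resample_poly applies an anti-aliasing
    # low-pass filter before decimating, so content above 4 kHz is removed.
    narrowband = resample_poly(tone, up=80, down=441)
    print(len(narrowband))           # 8000 samples: one second at 8 kHz
    print(np.abs(narrowband).max())  # near zero: the 6 kHz tone was filtered out

The same thing happens in the phone network itself: the codec band-limits the signal, so the training data inherits that ceiling regardless of the capture hardware.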