2020-11-24
Mainline Android phones and tablets benefit tremendously from easy, accessible speech-to-text. Want to avoid typing out that long reply? Hit the microphone icon and speak instead. The result is inserted at the cursor position. Easy.
Linux does not have accessible speech-to-text. Or at least, none of the slimmed-down, minimal Linux environments I regularly use have accessible speech-to-text. Sure, we’ve had Sphinx and Kaldi for ages, but the gap in recognition quality between them and the newer generation of speech recognition tools is huge.
This is changing. Mozilla’s DeepSpeech efforts have gradually increased in quality to the point that I’m now able to rely on it for jotting down notes.
Various wrappers are available, but I wanted something utterly turn-key. Something boring that just worked. Something that was as easy to use as the speech-to-text on Android devices. My solution was to package DeepSpeech, the latest language models, and some simple pre-processing code into a Docker container.
A helper script is available to record microphone input, stop when silence is detected (using SoX), run DeepSpeech on the recording, and optionally type the transcribed text as simulated keyboard input (using xdotool).
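
To make that pipeline concrete, here is a rough Python sketch of the same three steps. It is not the helper script itself: the function names, the default WAV path, the silence thresholds, and the model and scorer paths are all placeholders I made up for illustration, and it assumes the `rec`, `deepspeech`, and `xdotool` commands are on the PATH rather than inside the Docker container.

```python
import subprocess
from typing import Callable, List


def record_command(wav_path: str, silence_seconds: float = 2.0,
                   threshold: str = "3%") -> List[str]:
    """Build a SoX `rec` command that stops after a stretch of silence."""
    return [
        "rec", "-q",
        # 16 kHz, mono, 16-bit: the format the released English models expect.
        "-r", "16000", "-c", "1", "-b", "16",
        wav_path,
        "silence",
        "1", "0.1", threshold,                 # start once sound is detected
        "1", str(silence_seconds), threshold,  # stop after this much silence
    ]


def deepspeech_command(wav_path: str, model_path: str,
                       scorer_path: str) -> List[str]:
    """Build a call to the DeepSpeech command-line client."""
    return ["deepspeech", "--model", model_path,
            "--scorer", scorer_path, "--audio", wav_path]


def type_command(text: str) -> List[str]:
    """Build an xdotool call that types `text` into the focused window."""
    return ["xdotool", "type", "--clearmodifiers", text]


def dictate(model_path: str, scorer_path: str,
            wav_path: str = "/tmp/dictation.wav",  # hypothetical location
            type_out: bool = True,
            run: Callable = subprocess.run) -> str:
    """Record, transcribe, and optionally type the result. Returns the text."""
    run(record_command(wav_path), check=True)
    result = run(deepspeech_command(wav_path, model_path, scorer_path),
                 check=True, capture_output=True, text=True)
    text = result.stdout.strip()  # the client prints the transcript on stdout
    if type_out and text:
        run(type_command(text), check=True)
    return text
```

Calling `dictate("model.pbmm", "model.scorer")` with real model files would record until about two seconds of silence, transcribe the recording, and type the result at the cursor. The `run` parameter exists only so the command sequence can be checked without a microphone or an X session.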
If the script is mapped to a hotkey, it’s almost as easy to use as Android speech-to-text!
All code is available here. Pull requests are welcome!