Speech-to-text on Linux

Hal Clark

2020-11-24

Mainline Android phones and tablets benefit tremendously from easy, accessible speech-to-text. Want to avoid typing out that long reply? Hit the microphone icon and speak instead. The result is inserted at the cursor position. Easy.

Linux does not have accessible speech-to-text, or at least none of the slimmed-down, minimal Linux environments I regularly use do. Sure, we’ve had CMU Sphinx and Kaldi for ages, but the gap in transcription quality between them and the newer generation of speech recognition tools is huge.

This is changing. Mozilla’s DeepSpeech has gradually improved to the point that I can now rely on it for jotting down notes.

Various wrappers are available, but I wanted something utterly turn-key. Something boring that just worked. Something that was as easy to use as the speech-to-text on Android devices. My solution was to package DeepSpeech, the latest language models, and some simple pre-processing code into a Docker container.
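The post does not say what the pre-processing code does. One plausible step, shown here as an assumption rather than the container's actual code, is checking that a recording matches what DeepSpeech's released models expect (16 kHz, mono, 16-bit PCM) and converting it with SoX when it does not. The helper names below are hypothetical.

```python
from __future__ import annotations

import subprocess
import wave

# DeepSpeech's released English models expect 16 kHz, mono, 16-bit PCM audio.
TARGET_RATE = 16000
TARGET_CHANNELS = 1
TARGET_SAMPLE_WIDTH = 2  # bytes per sample, i.e. 16-bit


def needs_conversion(path: str) -> bool:
    """Return True if the WAV file at `path` is not already 16 kHz mono 16-bit."""
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() != TARGET_RATE
            or wav.getnchannels() != TARGET_CHANNELS
            or wav.getsampwidth() != TARGET_SAMPLE_WIDTH
        )


def sox_convert_command(src: str, dst: str) -> list[str]:
    """Build a SoX command that resamples and downmixes `src` into `dst`."""
    # Output format options (-r, -c, -b) must come before the output filename.
    return ["sox", src, "-r", str(TARGET_RATE), "-c", str(TARGET_CHANNELS),
            "-b", "16", dst]


def prepare(src: str, dst: str) -> str:
    """Return a path to a DeepSpeech-ready WAV, converting with SoX only if needed."""
    if not needs_conversion(src):
        return src
    subprocess.run(sox_convert_command(src, dst), check=True)
    return dst
```

Skipping the conversion when the file already matches avoids a needless SoX pass on recordings that were captured at the right format in the first place.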

A helper script is available that records microphone input until silence is detected (using SoX), runs DeepSpeech on the recording, and optionally types the transcribed text by simulating keystrokes (using xdotool).
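The helper script's record, transcribe, and type pipeline can be sketched in Python roughly as follows. This is not the author's script: it assumes SoX's `rec`, the `deepspeech` command-line client, and `xdotool` are installed on the host, and it calls `deepspeech` directly rather than going through the Docker container. The model and scorer file names, the 1% silence threshold, and the 2-second stop delay are illustrative assumptions, and the transcript is assumed to arrive on stdout.

```python
from __future__ import annotations

import shutil
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical model file names; substitute whichever DeepSpeech release you use.
MODEL = "deepspeech-0.9.3-models.pbmm"
SCORER = "deepspeech-0.9.3-models.scorer"


def record_command(out_wav: str, stop_after_s: float = 2.0, threshold: str = "1%") -> list[str]:
    """SoX `rec` call: start once sound is heard, stop after `stop_after_s` of silence."""
    return [
        "rec", "-q", "-c", "1", "-r", "16000", "-b", "16", out_wav,
        # silence <above-periods> <duration> <threshold> <below-periods> <duration> <threshold>
        "silence", "1", "0.1", threshold, "1", str(stop_after_s), threshold,
    ]


def transcribe_command(wav: str) -> list[str]:
    """deepspeech CLI call (run on the host here, not inside a container)."""
    return ["deepspeech", "--model", MODEL, "--scorer", SCORER, "--audio", wav]


def type_command(text: str) -> list[str]:
    """xdotool call that types `text` into the currently focused window."""
    return ["xdotool", "type", "--clearmodifiers", text]


def main(type_text: bool = True) -> str:
    """Record, transcribe, optionally type the result, and return the transcript."""
    tools = ["rec", "deepspeech"] + (["xdotool"] if type_text else [])
    for tool in tools:
        if shutil.which(tool) is None:
            sys.exit(f"missing dependency: {tool}")
    with tempfile.TemporaryDirectory() as tmp:
        wav = str(Path(tmp) / "speech.wav")
        subprocess.run(record_command(wav), check=True)
        result = subprocess.run(transcribe_command(wav), check=True,
                                capture_output=True, text=True)
    # Assumes the transcript is on stdout and loading messages go to stderr.
    text = result.stdout.strip()
    if type_text and text:
        subprocess.run(type_command(text), check=True)
    return text
```

With these arguments, SoX's `silence` effect begins keeping audio once the level stays above 1% for 0.1 s and ends the recording after 2 s below 1%, so a call to `main()` returns as soon as you stop talking. Building each command as a list (rather than a shell string) keeps the transcript from being interpreted by a shell before xdotool types it.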

If the script is mapped to a hotkey, it’s almost as easy to use as Android speech-to-text!

All code is available here. Pull requests are welcome!