Automatic speech recognition — Experimentations

Ishara Usoof
7 min read · Jun 8, 2024

In this article, I’ll take you through how I tried out automatic speech recognition on my PC. The beginning is always weak, but it is important to embrace the weakness. In my bachelor’s degree, we had a module called Signals and Systems. In that module, we learned how audio signal processing takes place, that is, how analog sound waves are converted into digital signals that a machine can store and manipulate. Nowadays we have Google Cloud Speech Recognition, Sphinx, Wit.ai, Microsoft Bing Voice Recognition, Microsoft Azure Speech, Houndify, IBM Speech to Text, Whisper, and the Whisper API (from OpenAI). Let’s try the ones that don’t need any APIs.

Photo by Soundtrap on Unsplash

SECTION 1 — Educate myself about automatic speech recognition

It helps to have a quick primer on the key terms around speech recognition; afterward, it will be easier to absorb what happens in the experiments.

The first basic thing to know is the process of audio signal processing.

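  1. Sampling: The amplitude of the analog sound wave is measured at regular time intervals; the number of measurements per second is the sampling rate.
  2. Quantization: Each measured amplitude is rounded to the nearest of a fixed number of discrete levels.
  3. Encoding: Each quantized level is assigned a binary code, so that a machine can store and manipulate it.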
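
To make the three steps concrete, here is a minimal sketch in plain Python. It is only an illustration under my own assumptions, not how any particular speech recognition library does it: the function names (`sample`, `quantize`, `encode`, `tone`), the 440 Hz test tone, the 8000 Hz sampling rate, and the 3-bit depth are all made up for the example.

```python
import math


def sample(signal, duration_s, sample_rate_hz):
    """Sampling: measure the continuous signal at evenly spaced instants."""
    n = round(duration_s * sample_rate_hz)
    return [signal(i / sample_rate_hz) for i in range(n)]


def quantize(samples, bit_depth):
    """Quantization: map each amplitude in [-1.0, 1.0] to the nearest
    of 2**bit_depth integer levels (0 .. 2**bit_depth - 1)."""
    max_level = 2 ** bit_depth - 1
    levels = []
    for x in samples:
        x = max(-1.0, min(1.0, x))  # clip anything outside the expected range
        levels.append(round((x + 1.0) / 2.0 * max_level))
    return levels


def encode(levels, bit_depth):
    """Encoding: write each quantized level as a fixed-width binary string."""
    return [format(level, f"0{bit_depth}b") for level in levels]


def tone(t):
    """Hypothetical 'analog' input: a 440 Hz sine wave as a function of time."""
    return math.sin(2 * math.pi * 440.0 * t)


# One millisecond of audio at 8000 samples per second, stored with 3 bits per sample.
samples = sample(tone, duration_s=0.001, sample_rate_hz=8000)
levels = quantize(samples, bit_depth=3)
codes = encode(levels, bit_depth=3)

for s, q, c in zip(samples, levels, codes):
    print(f"amplitude={s:+.3f}  level={q}  code={c}")
```

With only 3 bits there are just 8 possible levels, so the rounding error is large. Real audio usually uses far more bits per sample (16 is common), which is why quantization is normally inaudible.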


Ishara Usoof

I enjoy doing experiments in the field of machine learning, especially with Kaggle and AWS.