How to convert your voice to text data

BalA VenkatesH
Jul 17, 2018 · 4 min read
speech to text

Nowadays, people are so busy with their work that nobody wants to spend extra time typing. Google Translate is a great tool that lets you translate text by voice. When we use voice as the input for translation, it relies on a technology called speech-to-text conversion.

First, let's see how it works and how speech is converted to text data.

The first step in speech recognition is obvious — we need to feed sound waves into a computer.

Everybody knows that sound is transmitted as waves, but a computer only understands numbers. So first, we need to convert the sound waves into numbers. Sound waves are one-dimensional: at every moment in time, they have a single value based on the height of the wave. Let’s zoom in on one tiny part of the sound wave and take a look:

To turn this sound wave into numbers, we just record the height of the wave at equally spaced points:

Sampling wave

This is called sampling. We take a reading thousands of times a second and record a number representing the height of the sound wave at that point in time.

Let’s sample our “Hello” sound wave 16,000 times per second. Here are the first 100 samples:

Each number represents the amplitude of the sound wave at intervals of 1/16,000th of a second.
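
To make the sampling step concrete, here is a minimal Python sketch. It does not use the real “Hello” recording; as an assumption, a pure 440 Hz tone stands in for the voice, and the helper names (`sample_wave`, `tone`, `to_int16`) are made up for this illustration. It only shows the idea of reading the wave height 16,000 times per second and storing each reading as a number.

```python
import math

SAMPLE_RATE = 16_000  # samples per second, the rate used in this article


def tone(t, freq_hz=440.0):
    """Stand-in for a real voice recording: height of a 440 Hz tone at time t."""
    return math.sin(2 * math.pi * freq_hz * t)


def sample_wave(wave, duration_s, sample_rate=SAMPLE_RATE):
    """Record the height of the wave at equally spaced points in time."""
    n_samples = int(duration_s * sample_rate)
    return [wave(i / sample_rate) for i in range(n_samples)]


def to_int16(samples):
    """Scale heights in [-1.0, 1.0] to 16-bit integers, a common audio format."""
    return [int(round(s * 32767)) for s in samples]


samples = to_int16(sample_wave(tone, duration_s=1.0))
first_100 = samples[:100]

print(len(samples))    # one second of audio gives 16000 numbers
print(first_100[:10])  # the first few amplitude readings
```

With a real recording, the list of numbers would come from a microphone or an audio file instead of the `tone` function, but the result has the same shape: one amplitude value every 1/16,000th of a second.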

Recognizing Characters from Short Sounds

Now that we have our audio in a format that’s easy to process, we will feed it into a deep neural network. The input to the neural network will be 20-millisecond audio chunks. For each little audio slice, it will try to figure out the letter that corresponds to the sound currently being spoken.
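
As a rough sketch of the chunking step, the snippet below cuts one second of samples into 20-millisecond slices. The function name `split_into_chunks` and the placeholder sample values are assumptions for illustration, and real systems often use overlapping windows and extra feature extraction, which is left out here. At 16,000 samples per second, each 20 ms chunk holds 320 samples.

```python
def split_into_chunks(samples, sample_rate=16_000, chunk_ms=20):
    """Cut a list of samples into consecutive, non-overlapping chunks."""
    chunk_size = sample_rate * chunk_ms // 1000  # 320 samples for 20 ms at 16 kHz
    last_start = len(samples) - chunk_size
    # Any leftover samples that do not fill a whole chunk are dropped.
    return [samples[i:i + chunk_size] for i in range(0, last_start + 1, chunk_size)]


# Placeholder data: one second of "audio" (the values themselves do not matter here).
one_second = list(range(16_000))
chunks = split_into_chunks(one_second)

print(len(chunks))     # 50 chunks in one second
print(len(chunks[0]))  # 320 samples per chunk
```

Each of these chunks would then be handed to the neural network, one after another.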

We’ll use a recurrent neural network — that is, a neural network that has a memory that influences future predictions. That’s because each letter it predicts should affect the likelihood of the next letter it will predict too. For example, if we have said “HEL” so far, it’s very likely we will say “LO” next to finish out the word “Hello”. It’s much less likely that we will say something unpronounceable next like “XYZ”. So having that memory of previous predictions helps the neural network make more accurate predictions going forward.
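
The “memory” idea can be sketched with a tiny recurrent step in plain Python. This is only an illustration under loud assumptions: the weights are random and untrained, the sizes (`HIDDEN`, `FEATURES`), the `ALPHABET`, and the function names are invented for this example, and the predicted letters are therefore meaningless. The point is only that the hidden state from one chunk is passed into the next step, so the same sound can produce different predictions depending on what came before.

```python
import math
import random

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ _"  # hypothetical output symbols
HIDDEN = 8     # size of the network's memory (assumption)
FEATURES = 4   # numbers describing one audio chunk (assumption)

rng = random.Random(0)


def rand_matrix(rows, cols):
    """Random, untrained weights; a real model would learn these from data."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


W_in = rand_matrix(HIDDEN, FEATURES)       # chunk features -> hidden state
W_rec = rand_matrix(HIDDEN, HIDDEN)        # previous hidden state -> hidden state
W_out = rand_matrix(len(ALPHABET), HIDDEN) # hidden state -> letter scores


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rnn_step(features, hidden):
    """One step: combine the current chunk with the memory of earlier chunks."""
    from_input = matvec(W_in, features)
    from_memory = matvec(W_rec, hidden)
    new_hidden = [math.tanh(a + b) for a, b in zip(from_input, from_memory)]
    letter_probs = softmax(matvec(W_out, new_hidden))
    return letter_probs, new_hidden


chunk = [0.1, -0.3, 0.7, 0.2]  # made-up features for one audio chunk
hidden = [0.0] * HIDDEN        # empty memory before any audio is heard

probs_first, hidden = rnn_step(chunk, hidden)
probs_second, hidden = rnn_step(chunk, hidden)  # same sound, but memory has changed

print(probs_first != probs_second)  # the memory changes the prediction
```

In a trained network, this carried-over state is what lets “HEL” make “L” more likely than “X” for the next letter.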

Wait a second!

You might be thinking “But what if someone says ‘Hullo’? It’s a valid word. Maybe ‘Hello’ is the wrong transcription!”

Try it out! If your phone is set to American English, try to get your phone’s digital assistant to recognize the word “Hullo.” You can’t! It refuses! It will always understand it as “Hello.”

Not recognizing “Hullo” is a reasonable behavior, but sometimes you’ll find annoying cases where your phone just refuses to understand something valid you are saying. That’s why these speech recognition models are always being retrained with more data to fix these edge cases.

flow of speech to text converter

For a company like Google or Amazon, hundreds of thousands of hours of spoken audio recorded in real-life situations is gold. That’s the single biggest thing that separates their world-class speech recognition system from your hobby system. The whole point of putting Google Now!

So if you are looking for a start-up idea, I wouldn’t recommend trying to build your own speech recognition system to compete with the Google Cloud Speech-to-Text API. Instead, figure out a way to get people to give you recordings of themselves talking for hours, and try them with DeepSpeech. The data can be your product instead.

In the next post, I will write about how to use the Google Cloud Speech API to convert speech to text.

Happy learning!

