Artificial Speech Recognition

Prim Wong
Super AI Engineer
5 min read · Feb 28, 2021

ASR enables machines to receive, recognize and understand human utterances

Automatic Speech Recognition, or ASR, is a machine learning technology that translates speech to text so that machines can recognize and, to some degree, understand human languages.

Image courtesy of https://www.hindipanda.com/speech-recognition-technology/

Overview

  1. Oral
  2. Application
  3. Challenges
  4. History
  5. Model
  6. HMM (Hidden Markov Model)

Oral Languages

Talking! Everyone talks in order to communicate with other people. Each voice produces a specific sound wave, and we understand one another through it. The sound wave itself, however, does not spell out any specific word. So how can a model (a computer) understand us? Just as we use a CNN (Convolutional Neural Network) to recognize images, we use ASR (Automatic Speech Recognition), or a “linguistic model”, to recognize and even understand human languages.
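
To see why a raw sound wave "doesn't show the word", here is a minimal sketch, assuming NumPy is installed. It builds a synthetic 440 Hz tone (not real speech; the sample rate and tone are illustrative choices) and shows that the computer only receives an array of numbers. A frequency analysis such as the FFT is one common first step toward features a model can work with.

```python
import numpy as np

# Illustrative assumptions: 16 kHz sampling, one second of a pure 440 Hz tone.
sample_rate = 16000
duration = 1.0
t = np.arange(int(sample_rate * duration)) / sample_rate

# To the computer, "sound" is just this array of amplitudes.
waveform = np.sin(2 * np.pi * 440.0 * t)

# The magnitude spectrum reveals which frequencies are present.
spectrum = np.abs(np.fft.rfft(waveform))
freqs = np.fft.rfftfreq(len(waveform), d=1 / sample_rate)
peak_hz = float(freqs[spectrum.argmax()])

print(waveform[:5])   # raw samples: no words in sight
print(peak_hz)        # dominant frequency of the tone
```

Real speech contains many frequencies that change over time, so ASR systems typically analyse short overlapping windows rather than the whole recording at once.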

Communication (image courtesy of https://medium.com/@raihamalik/effective-communication-5321d663ee5a)

High Technology and Oral Instructions

We usually access technology and social media by typing on a keyboard. Daily life would be far easier, more convenient and more intuitive if we could use our “voice” to instruct computers and other high-tech devices. For this reason, Artificial Intelligence is widely used in many applications and has been adapted and implemented in numerous devices and services.

Application

There are various applications of ASR in daily life; the most popular is the virtual assistant.

Virtual assistants include Google Assistant, Siri and Amazon Alexa.

Language learning

Robotics

Home automation

Medical transcription

Challenges

  1. Microphone Distance
    Users speak at different distances from the microphone, and each setting needs its own processing.
    1.1 Near-field: official sound recording and reproduction
    1.2 Mid-field: a normal online meeting (the speaker sits in front of the computer)
    1.3 Far-field: a large seminar with microphones placed at various distances around the hall
  2. Speech Compression
    Lossy: reduces file size by removing data
    Lossless: compresses without losing detail
Image courtesy of https://www.bbc.co.uk/bitesize/guides/zqyrq6f/revision/4

  3. Interaction Type
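
The lossy versus lossless distinction in item 2 can be made concrete with a small sketch, assuming NumPy is installed. The audio here is a synthetic tone, and the "lossy" step simply throws away the low 8 bits of each 16-bit sample. That is an illustrative stand-in chosen for this example, not how real audio codecs work.

```python
import zlib
import numpy as np

# Hypothetical input: one second of a 440 Hz tone as 16-bit mono samples.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
samples = (0.5 * np.sin(2 * np.pi * 440 * t) * 32767).astype(np.int16)
raw = samples.tobytes()

# Lossless: zlib shrinks the bytes, and decompression restores every sample.
packed = zlib.compress(raw)
restored = np.frombuffer(zlib.decompress(packed), dtype=np.int16)
lossless_ok = bool(np.array_equal(restored, samples))

# Lossy (toy version): keep only the top 8 bits of each sample.
quantized = (samples >> 8).astype(np.int8)
approx = quantized.astype(np.int16) << 8
max_error = int(np.max(np.abs(samples.astype(np.int32) - approx.astype(np.int32))))

print(len(raw), len(packed), len(quantized.tobytes()))
print(lossless_ok)   # the lossless round trip is exact
print(max_error)     # the lossy round trip is not
```

The lossy version is half the size but can never be restored exactly, which is the trade-off an ASR system has to cope with when its input audio has been compressed.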

History

https://journal.theoneoff.com/trends/the-rise-of-vui

Model

HMM (Hidden Markov Model)

An HMM builds on the Markov assumption: the probability of the current state depends only on the previous state (or a fixed number of previous states), not on the entire history.
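
As a minimal sketch of what this assumption buys us, assuming NumPy is installed: if each state depends only on the one before it, the probability of a whole sequence is just the starting probability multiplied by one transition probability per step. The numbers below reuse the sun/rain weather example that appears later in this article; the function name `sequence_probability` is made up for this illustration.

```python
import numpy as np

states = ["sun", "rain"]
start = np.array([0.5, 0.5])             # P(first day)
transition = np.array([[0.8, 0.2],       # from sun:  P(sun), P(rain)
                       [0.3, 0.7]])      # from rain: P(sun), P(rain)

def sequence_probability(sequence):
    """Probability of a non-empty state sequence under the Markov assumption."""
    idx = [states.index(s) for s in sequence]
    prob = start[idx[0]]
    # Each step only looks at the previous state.
    for prev, cur in zip(idx, idx[1:]):
        prob *= transition[prev, cur]
    return float(prob)

print(sequence_probability(["sun", "sun", "rain"]))  # 0.5 * 0.8 * 0.2
```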

Hidden Markov Models

A hidden Markov model is a type of Markov model for a system with hidden states that generate some observed event. This means that sometimes the AI has some measurement of the world but no access to the precise state of the world. In these cases, the state of the world is called the hidden state, and whatever data the AI has access to are the observations. Here are a few examples:

For a robot exploring uncharted territory, the hidden state is its position, and the observation is the data recorded by the robot’s sensors.

In speech recognition, the hidden state is the words that were spoken, and the observation is the audio waveforms.

When measuring user engagement on websites, the hidden state is how engaged the user is, and the observation is the website or app analytics.

For our discussion, we will use the following example. Our AI wants to infer the weather (the hidden state), but it only has access to an indoor camera that records how many people brought umbrellas with them. Here is our sensor model (also called emission model) that represents these probabilities:

Code courtesy of https://cs50.harvard.edu/ai/2020/weeks/2/

# Note: this snippet uses the pomegranate API from before version 1.0;
# later releases changed these class names.
import numpy
from pomegranate import *

# Observation (emission) model for each hidden state
sun = DiscreteDistribution({
    "umbrella": 0.2,
    "no umbrella": 0.8
})

rain = DiscreteDistribution({
    "umbrella": 0.9,
    "no umbrella": 0.1
})

states = [sun, rain]

# Transition model: row = today's weather, column = tomorrow's weather
transitions = numpy.array(
    [[0.8, 0.2],  # Tomorrow's predictions if today = sun
     [0.3, 0.7]]  # Tomorrow's predictions if today = rain
)

# Starting probabilities
starts = numpy.array([0.5, 0.5])

# Create the model
model = HiddenMarkovModel.from_matrix(
    transitions, states, starts,
    state_names=["sun", "rain"]
)
model.bake()
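
Once the model is defined, the usual question is: given a series of umbrella observations, what was the most likely weather on each day? The sketch below answers that with the Viterbi algorithm written directly in NumPy, so it does not depend on any particular pomegranate version. It reuses the same probabilities as the model above; the function name `viterbi` and the variable names are choices made for this illustration, not part of the original code.

```python
import numpy as np

states = ["sun", "rain"]
symbols = ["umbrella", "no umbrella"]

# Same numbers as the model above.
start = np.array([0.5, 0.5])
transition = np.array([[0.8, 0.2],
                       [0.3, 0.7]])
emission = np.array([[0.2, 0.8],    # sun:  P(umbrella), P(no umbrella)
                     [0.9, 0.1]])   # rain: P(umbrella), P(no umbrella)

def viterbi(observations):
    """Return the most likely hidden-state sequence for the observations."""
    if not observations:
        return []
    obs = [symbols.index(o) for o in observations]
    n_steps, n_states = len(obs), len(states)
    # Work in log space so long sequences do not underflow.
    log_t, log_e = np.log(transition), np.log(emission)
    score = np.zeros((n_steps, n_states))
    back = np.zeros((n_steps, n_states), dtype=int)
    score[0] = np.log(start) + log_e[:, obs[0]]
    for step in range(1, n_steps):
        # candidate[i, j]: best path ending in state i, then moving to j
        candidate = score[step - 1][:, None] + log_t
        back[step] = candidate.argmax(axis=0)
        score[step] = candidate.max(axis=0) + log_e[:, obs[step]]
    # Follow the back-pointers from the best final state.
    path = [int(score[-1].argmax())]
    for step in range(n_steps - 1, 0, -1):
        path.append(int(back[step, path[-1]]))
    path.reverse()
    return [states[i] for i in path]

print(viterbi(["umbrella", "umbrella", "no umbrella"]))
```

In speech recognition the same idea applies at a much larger scale: the observations are audio features and the hidden states are the sounds or words being spoken.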

References:

Credit: Kwanchiva Thangthai, PhD (Speech and Text Understanding, NECTEC)
