Image Recognition and Speech Recognition: Machine Learning Applications in the Real World

venkat k
3 min read · Oct 16, 2019

Machine learning uses iterative algorithms to learn from data, allowing computers to find hidden insights without being explicitly programmed where to look. The iterative aspect of machine learning is important because when models are exposed to new data, they can adapt independently. Machine learning systems can quickly apply what they have learned from large datasets to tasks such as facial recognition, speech recognition, and more.

Image Recognition

One of the most common uses of machine learning is image recognition. There are many situations in which you need to classify an object in a digital image. For digital images, the measurements (features) describe the output of each pixel in the image.

In a black-and-white (grayscale) image, the intensity of each pixel serves as one measurement. So if the image contains N × N pixels, the total number of pixels, and hence the dimension of the input, is N².

In a color image, each pixel provides 3 measurements: the intensities of the 3 main color components, red, green, and blue (RGB). So an N × N color image has 3N² dimensions.
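
The pixel counts above can be checked with a small sketch. This is only an illustration in plain Python, not how any particular library stores images; the helper names `flatten_grayscale` and `flatten_rgb` and the tiny 4 × 4 sample images are made up for the example.

```python
def flatten_grayscale(image):
    """Flatten an N x N grid of pixel intensities into a list of N*N features."""
    return [pixel for row in image for pixel in row]


def flatten_rgb(image):
    """Flatten an N x N grid of (R, G, B) tuples into a list of 3*N*N features."""
    return [channel for row in image for pixel in row for channel in pixel]


N = 4  # hypothetical image side length

# Made-up sample data: one intensity per pixel, or one (R, G, B) tuple per pixel.
gray = [[(r * N + c) % 256 for c in range(N)] for r in range(N)]
color = [[(r, c, 255) for c in range(N)] for r in range(N)]

gray_features = flatten_grayscale(gray)
color_features = flatten_rgb(color)

print(len(gray_features))   # N * N = 16
print(len(color_features))  # 3 * N * N = 48
```

A classifier would then receive one such feature list per image, so a color image costs three times as many inputs as a grayscale image of the same size.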

For face recognition: the categories may be face versus no face. Alternatively, each person in a database of many people may have their own category.
For character recognition: a piece of writing can be segmented into smaller images, each containing a single character. The categories may consist of the 26 letters of the English alphabet, the 10 digits, and some special characters.

Google uses machine-learning-based image recognition in products such as Google Photos, Google Search, and Google Drive.
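
To make the idea of categories concrete, here is a rough sketch of how the character classes described above could be mapped to integer labels, the form most classifiers expect. The article does not say which special characters are included, so the `SPECIAL` set below is an assumption, as are the names `CATEGORIES`, `LABEL_TO_INDEX`, and `encode`.

```python
import string

# Assumed category set: 26 lowercase letters, 10 digits, and a few
# illustrative special characters (the article does not list which ones).
SPECIAL = "!?.,"
CATEGORIES = list(string.ascii_lowercase + string.digits + SPECIAL)

# Map each category to an integer class index.
LABEL_TO_INDEX = {label: i for i, label in enumerate(CATEGORIES)}


def encode(text):
    """Convert a string of known characters to a list of class indices."""
    return [LABEL_TO_INDEX[ch] for ch in text]


print(len(CATEGORIES))  # 26 + 10 + 4 = 40
print(encode("ab1"))    # [0, 1, 27]
```

Face recognition follows the same pattern: two categories for face versus no face, or one category per person in the database.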

Speech Recognition

Speech Recognition (SR) is the translation of spoken words into text. It is also known as “Automatic Speech Recognition” (ASR), “Computer Speech Recognition” or “Speech to Text” (STT).

In speech recognition, a software application recognizes spoken words. The measurements in this application may be a set of numbers that represent the speech signal. We can segment the signal into portions that contain distinct words or phonemes. In each segment, we can represent the speech signal by the intensities or energy in different time-frequency bands.

Although the details of signal representation are outside the scope of this article, the signal can be represented by a set of real values.
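
As a rough illustration of the time-frequency idea, the sketch below cuts a signal into fixed-length frames and sums the power in a few frequency bands per frame. It makes several simplifying assumptions: the "speech" is a synthetic 1000 Hz tone, frames are fixed-length rather than aligned to words or phonemes, the bands are equal-width, and a naive discrete Fourier transform is used for clarity. Production systems typically use richer features, and all function names here are hypothetical.

```python
import cmath
import math


def frame_signal(samples, frame_len):
    """Split samples into consecutive non-overlapping frames (remainder dropped)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def power_spectrum(frame):
    """Power in frequency bins 0 .. n/2 - 1, computed with a naive O(n^2) DFT.

    The Nyquist bin (n/2) is left out so the bins divide evenly into bands.
    """
    n = len(frame)
    spectrum = []
    for k in range(n // 2):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        spectrum.append(abs(coeff) ** 2 / n)
    return spectrum


def band_powers(frame, num_bands):
    """Sum the power spectrum into num_bands equal-width frequency bands."""
    spectrum = power_spectrum(frame)
    width = len(spectrum) // num_bands
    return [sum(spectrum[b * width:(b + 1) * width]) for b in range(num_bands)]


SAMPLE_RATE = 8000  # samples per second (assumed)
FRAME_LEN = 64      # samples per frame (assumed)
NUM_BANDS = 4       # each band spans 1000 Hz at these settings

# Synthetic stand-in for speech: a pure 1000 Hz tone, 256 samples long.
samples = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(256)]

# One list of real values (band powers) per frame.
features = [band_powers(frame, NUM_BANDS)
            for frame in frame_signal(samples, FRAME_LEN)]

for row in features:
    print([round(p, 3) for p in row])
```

Each frame becomes a short list of real numbers, and for this tone nearly all of the power lands in the second band (1000 to 2000 Hz).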

Speech recognition applications include voice user interfaces such as voice dialing, call routing, and control of home automation (domotic) appliances. Speech recognition is also used for simple data entry, preparation of structured documents, speech-to-text processing, and in aircraft, where it is usually called direct voice input.

Using machine learning, Baidu’s research and development department has created a tool called Deep Voice, a deep neural network that can produce artificial voices that are difficult to distinguish from a real human voice. The network can “learn” features of rhythm, voice, pronunciation, and tone to recreate a speaker’s voice. Additionally, Google uses machine learning in other voice-related and translation products such as Google Translate, Google Text-to-Speech, and Google Assistant.

In addition to audio and image recognition, machine learning is also applied to medical analysis, classification, and data analysis and assessment in areas such as healthcare, financial services, transport, and marketing and sales. In the near future, equipment and applications based on machine learning technology are likely to be found in most aspects of human life.

Want to know more about AI services? Visit USM Business Systems.

venkat k

We at USM Business provide unique edge solutions related to AI services, ML services, data quality solutions, deep learning services, and permanent staffing solutions.