Persisting Through Adversity
Many important projects have suffered major setbacks, yet the people behind them persevered and turned them into successes. Most important machine learning projects have run into such setbacks too, and have overcome them to produce remarkably useful products. One visible example is computer speech recognition, a field that received serious research as early as the 1970s but has only produced widespread, accurate results since 2012.
Speech recognition is a vital tool that we now expect in all of our smartphones and most of our operating systems. However, the technology went through many difficulties in its creation and improvement. Both the Wikipedia page and this article describe its history and some of the issues that had to be overcome to make speech recognition accurate. Some of the hardest problems were a lack of training data and a lack of computing power. Building systems that used unsupervised learning was a further difficulty. Overcoming these difficulties was essential to the accurate speech recognition we enjoy now.
The lack of computing power, one of the main problems, was solved over time largely by computer engineers outside the sphere of machine learning. More power was necessary because many machine learning algorithms that use deep neural networks would not fit in the memory of the available systems. The problem was solved through the steady growth in computing power and storage associated with Moore’s law, as well as through the use of clusters of machines working in parallel.
However, a more difficult problem to overcome was the lack of training data and the need for unsupervised learning. Even though data is abundant today, a machine learning problem as complex as speech recognition can always use more. Luckily, thanks mainly to the internet, large quantities of everyday human speech are now available. Also, while unsupervised learning techniques may be more difficult than supervised learning, it is all but impossible to hand-label data as expansive as human language for supervised learning. Unsupervised learning has nonetheless been successful at learning from this data, and the difficulty has been worked out.
These days, the speech recognition found in our phones is mainly based on hidden Markov models, which are statistical models of the signals that speech creates. These models are used along with deep neural networks to accurately model the complexity of human speech. Although this is a relatively new approach, it has become the standard in commercial speech recognition systems.
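To give a rough feel for what a hidden Markov model does, here is a minimal sketch of Viterbi decoding, a standard way to find the most likely sequence of hidden states behind a series of observations. Everything in it is an assumption made for illustration: the two phoneme-like states ("ah" and "ee"), the "low" and "high" acoustic observations, and all of the probabilities are invented, and a real recognizer would use far larger models with neural networks supplying the acoustic scores.

```python
def viterbi(observations, states, start_prob, trans_prob, emit_prob):
    """Return the most likely hidden-state path and its probability."""
    # best[t][s]: probability of the best path that ends in state s at time t
    best = [{s: start_prob[s] * emit_prob[s][observations[0]] for s in states}]
    # back[t][s]: the previous state on that best path
    back = [{}]

    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # Pick the predecessor that maximizes the path probability into s.
            prev, score = max(
                ((p, best[t - 1][p] * trans_prob[p][s]) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score * emit_prob[s][observations[t]]
            back[t][s] = prev

    # Start from the best final state and follow the back-pointers.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


# Hypothetical toy model: two phoneme-like states and two coarse observations.
states = ("ah", "ee")
start_prob = {"ah": 0.6, "ee": 0.4}
trans_prob = {
    "ah": {"ah": 0.7, "ee": 0.3},
    "ee": {"ah": 0.4, "ee": 0.6},
}
emit_prob = {
    "ah": {"low": 0.8, "high": 0.2},
    "ee": {"low": 0.1, "high": 0.9},
}

path, prob = viterbi(["low", "low", "high"], states, start_prob, trans_prob, emit_prob)
print(path, prob)
```

With these made-up numbers, the decoder concludes that two "low" observations followed by a "high" one were most likely produced by the states "ah", "ah", "ee". The same idea, scaled up enormously, is how a recognizer can turn a stream of acoustic measurements into a sequence of speech sounds.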
Though speech recognition has had its difficulties in the past, engineers and researchers have overcome them to make it a pivotal part of our daily lives.