Inception started listening too..

Vishnu ks
Published in techietalks
2 min read · Mar 18, 2018

When we hear the word Inception, the first thing that comes to mind is the movie Inception, a sci-fi thriller. Here, it is an open-source image classification model developed and maintained by Google. Inception is one of the best image classification models available. It is trained on 1000 classes of objects, so it can recognize 1000 types of objects. If required, we can re-train the final layer of Inception with our own dataset. That sparked my curiosity, and I started getting my hands dirty with Inception. I tried training Inception with my images and my friend’s images, and it recognized us very well. I was amazed. I even tried it with some complex design patterns, and the success rate was still more than 90%.
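To give a feel for what "re-training the final layer" means, here is a toy sketch in plain Python. It assumes the earlier layers are frozen and simply hand us a feature vector per image, and it trains only a small softmax layer on top of those vectors. The feature vectors, the function names `train_final_layer` and `predict`, and the learning rate are all made up for illustration. This is not the actual Inception retraining code and these are not real Inception features.

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_scores(weights, biases, x):
    """One linear score per class: dot(weights[c], x) + biases[c]."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def train_final_layer(features, labels, num_classes, epochs=200, lr=0.5):
    """Train only a softmax layer with plain SGD; the features stay fixed."""
    dim = len(features[0])
    weights = [[0.0] * dim for _ in range(num_classes)]
    biases = [0.0] * num_classes
    for _ in range(epochs):
        for x, y in zip(features, labels):
            probs = softmax(class_scores(weights, biases, x))
            for c in range(num_classes):
                # Gradient of cross-entropy loss w.r.t. the score of class c.
                grad = probs[c] - (1.0 if c == y else 0.0)
                for i in range(dim):
                    weights[c][i] -= lr * grad * x[i]
                biases[c] -= lr * grad
    return weights, biases


def predict(weights, biases, x):
    """Return the index of the class with the highest score."""
    scores = class_scores(weights, biases, x)
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical "frozen network" features: class 0 = me, class 1 = my friend.
features = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.8]]
labels = [0, 0, 1, 1]
weights, biases = train_final_layer(features, labels, num_classes=2)
print(predict(weights, biases, [0.95, 0.15]))
print(predict(weights, biases, [0.15, 0.9]))
```

The point of the design is that only `weights` and `biases` change during training, which is why re-training needs far less data and time than training the whole network.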

Spectrogram

From my experiments with the Inception model, I concluded that it can outperform humans at image classification, but only after this last experiment. I collected a few pop and classical music tracks, generated their spectrograms, and then trained the Inception model on those spectrograms. If you have ever seen a spectrogram, you can imagine how hard it is to find a pattern in a set of them. It is almost impossible for a human to do. After training, I tested the model and, amazingly, it gave correct predictions. Believe it or not, Google’s Inception model can be used for audio classification too. You just need to follow 3 steps.

  1. Collect the audio files
  2. Generate spectrograms
  3. Train the model with spectrograms
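Step 2 is the least obvious of the three, so here is a minimal sketch of what generating a spectrogram involves. It uses only the Python standard library: the audio is cut into overlapping frames, each frame is windowed, and a discrete Fourier transform gives the magnitude per frequency bin. The frame size, hop size, and the synthetic 1 kHz test tone are assumptions for illustration. Decoding mp3 files and saving the result as an image are left out, and the project itself may well use a different tool for this step.

```python
import cmath
import math


def hann_window(size):
    """Hann window, used to reduce spectral leakage at the frame edges."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (size - 1))
            for n in range(size)]


def dft_magnitudes(frame):
    """Magnitudes of the non-negative frequency bins of a naive DFT."""
    n = len(frame)
    magnitudes = []
    for k in range(n // 2 + 1):
        total = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        magnitudes.append(abs(total))
    return magnitudes


def spectrogram(samples, frame_size=64, hop=32):
    """Short-time Fourier transform; result is rows[time][frequency_bin]."""
    window = hann_window(frame_size)
    rows = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        chunk = samples[start:start + frame_size]
        frame = [s * w for s, w in zip(chunk, window)]
        rows.append(dft_magnitudes(frame))
    return rows


# Synthetic stand-in for real audio: a 1 kHz sine tone sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(512)]

spec = spectrogram(tone)
# Each bin is sample_rate / frame_size = 125 Hz wide, so 1 kHz falls in bin 8.
peak_bin = max(range(len(spec[0])), key=spec[0].__getitem__)
print(len(spec), len(spec[0]), peak_bin)
```

Plotting `spec` with time on one axis and frequency on the other, and brightness for magnitude, gives the kind of image that is fed to the model in step 3. For real recordings, a library with a fast Fourier transform would be a much faster choice than this naive DFT.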

I know you will not believe me unless you see the results for yourself, so I have created a project in my GitHub repository. You can clone the code from there and start experimenting. As a sample, I built a project that recognizes the voices of Dr. APJ Abdul Kalam and Shashi Tharoor. I just pulled 2 speeches by both of them from YouTube as mp3 files and did some data processing. The project is written in Python. You can get more information from here.

Model predicting Kalam’s voice

Happy coding!!
