Published in Analytics Vidhya

Voice Recognition to Perform Tasks Done on a Daily Basis


Speech Recognition :

Speech recognition is the ability of a machine or program to identify words and phrases in spoken language and convert them to a machine-readable format. Rudimentary speech recognition software has a limited vocabulary of words and phrases, and it may identify these only if they are spoken very clearly.

Today, this is done on a computer with ASR (automatic speech recognition) software. Many ASR programs require the user to “train” them to recognize their voice so that they can convert speech to text more accurately. For example, you could say “open Internet” and the computer would open the web browser.

This is not limited to opening things: we can also use voice commands to perform tasks done on a daily basis (done repeatedly), like reading data, pivoting, and several other tasks that don’t require any change in code other than the input.

To make things work, we first need to install a few things.

  1. pyttsx3 : It is a text-to-speech conversion library in Python. Unlike alternative libraries, it works offline, and is compatible with both Python 2 and 3.
  2. SpeechRecognition : A library for performing speech recognition, with support for several engines and APIs, online and offline. Supported speech recognition engines/APIs:
  • CMU Sphinx (works offline)
  • Google Speech Recognition
  • Google Cloud Speech API
  • Microsoft Bing Voice Recognition
  • Houndify API
  • IBM Speech to Text
  • Snowboy Hotword Detection (works offline)

3. PyAudio : PyAudio provides Python bindings for PortAudio, the cross-platform audio I/O library. With PyAudio, you can easily use Python to play and record audio on a variety of platforms, such as GNU/Linux, Microsoft Windows, and Apple Mac OS X / macOS.

4. datetime : This module (part of the Python standard library, so it needs no installation) supplies classes for manipulating dates and times in both simple and complex ways. While date and time arithmetic is supported, the focus of the implementation is on efficient attribute extraction for output formatting and manipulation. For related functionality, see also the time and calendar modules.

5. webbrowser : The webbrowser module (also part of the standard library) provides a high-level interface for displaying Web-based documents to users. Under most circumstances, simply calling the open() function from this module will do the right thing. The module can also be run from the command line as python -m webbrowser, which accepts a URL as the argument.

6. wikipedia : A Python library that makes it easy to access and parse data from Wikipedia: search Wikipedia, get article summaries, get data like links and images from a page, and more. It wraps the MediaWiki API so you can focus on using Wikipedia data, not getting it.

Complications Faced While Installing Libraries :

While installing the modules above, you may run into problems, especially with PyAudio. At the time of writing there was no wheel (prebuilt package) for Python 3.7 on Windows (wheels existed only for Python 2.7 and 3.4 to 3.6), so installing it with pip requires a build environment on your PC. Alternatively, download a prebuilt wheel from this link: search for “pyaudio” on that page and download the wheel that matches your Python version and system specifications. Then install it with pip. For instance, if the file is in your Downloads folder, run “!pip install /Downloads/filename.whl” in a notebook cell, or open a command prompt and type the same command without the “!”. This installs the package on your system.

Let’s Start :


Using pyttsx3 :

An application invokes the pyttsx3.init() factory function to get a reference to a pyttsx3.Engine instance. During construction, the engine initializes a pyttsx3.driver.DriverProxy object responsible for loading a speech engine driver implementation from the pyttsx3.drivers module.

The optional driverName argument of init() is the name of the pyttsx3.drivers module to load and use. It defaults to the best available driver for the platform, currently:

  • sapi5 — SAPI5 on Windows
  • nsss — NSSpeechSynthesizer on Mac OS X
  • espeak — eSpeak on every other platform

setProperty(name, value) : Queues a command to set an engine property. The new property value affects all utterances queued after this command.

getProperty(name) : Gets the current value of an engine property. The following property names are valid for all drivers.

rate : Integer speech rate in words per minute. Defaults to 200 words per minute.

voice : String identifier of the active voice.

voices : List of pyttsx3.voice.Voice descriptor objects.

volume : Floating point volume in the range of 0.0 to 1.0 inclusive. Defaults to 1.0.
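The article’s own code for this step is not shown here. As a minimal sketch under stated assumptions, the speaking side (what the article calls computer) could wrap pyttsx3 as follows, using only init(), getProperty(), setProperty(), say() and runAndWait(). The helper names and the rate/volume defaults are illustrative choices, not the author’s code, and the engine is passed in as an argument so the helpers can be tried without speakers.

```python
def make_engine(driver_name=None):
    """Create a pyttsx3 engine; None lets pyttsx3 pick sapi5/nsss/espeak for the platform."""
    import pyttsx3  # third-party; imported here so the helpers below work without it
    return pyttsx3.init(driver_name)


def configure_engine(engine, rate=170, volume=0.9, voice_index=0):
    """Pick a voice and set rate/volume (illustrative defaults, not required values)."""
    voices = engine.getProperty("voices")
    if voices:
        # Fall back to the last installed voice if the index is out of range.
        engine.setProperty("voice", voices[min(voice_index, len(voices) - 1)].id)
    engine.setProperty("rate", rate)  # words per minute
    engine.setProperty("volume", max(0.0, min(1.0, volume)))  # clamp to 0.0-1.0
    return engine


def speak(engine, text):
    """Queue the text and block until it has been spoken."""
    engine.say(text)
    engine.runAndWait()
```

Typical use: `speak(configure_engine(make_engine()), "Hello, how can I help you?")`. On Windows, passing "sapi5" to make_engine() selects the SAPI5 driver explicitly.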


The two functions above, computer (which speaks) and user (which listens), let the computer recognize what the user wants and respond accordingly.

The Recognizer class :

All of the magic in SpeechRecognition happens with the Recognizer class. The primary purpose of a Recognizer instance is, of course, to recognize speech. Each instance comes with a variety of settings and seven methods for recognizing speech from an audio source using various APIs. These are:

  • recognize_bing() : Microsoft Bing Speech
  • recognize_google() : Google Web Speech API
  • recognize_google_cloud() : Google Cloud Speech - requires installation of the google-cloud-speech package
  • recognize_houndify(): Houndify by SoundHound
  • recognize_ibm() : IBM Speech to Text
  • recognize_sphinx() : CMU Sphinx - requires installing PocketSphinx
  • recognize_wit() : Wit.ai
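A minimal sketch of the listening side (what the article calls user), assuming the free Google Web Speech API through recognize_google(); any of the other recognize_*() methods listed above could be substituted. The split into a listen() helper and a user() wrapper, and the 0.5-second noise calibration, are illustrative choices rather than the author’s code.

```python
def listen(recognizer, microphone, language="en-US"):
    """Record one phrase from the microphone and return it as lowercase text."""
    with microphone as source:
        # Calibrate the energy threshold to the room's background noise.
        recognizer.adjust_for_ambient_noise(source, duration=0.5)
        audio = recognizer.listen(source)
    # Google Web Speech API: needs an internet connection.
    return recognizer.recognize_google(audio, language=language).lower()


def user(language="en-US"):
    """Listen for a spoken command; return "" if nothing usable was heard."""
    import speech_recognition as sr  # third-party; Microphone also needs PyAudio
    recognizer = sr.Recognizer()
    try:
        return listen(recognizer, sr.Microphone(), language)
    except sr.UnknownValueError:
        return ""  # audio was captured but could not be understood
    except sr.RequestError as err:
        print(f"Speech service unavailable: {err}")
        return ""
```

UnknownValueError is raised when speech is unintelligible and RequestError when the API cannot be reached, so both are turned into an empty command here.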

The greet function makes the assistant greet you.
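A minimal sketch of such a greet function using datetime, assuming the greeting switches at 12:00 and 18:00 (arbitrary cut-offs, not taken from the article). The say parameter defaults to print; in the assistant you would pass a function that speaks a string with pyttsx3.

```python
import datetime


def greeting_for_hour(hour):
    """Map an hour (0-23) to a greeting; the cut-off hours are arbitrary choices."""
    if hour < 12:
        return "Good morning!"
    if hour < 18:
        return "Good afternoon!"
    return "Good evening!"


def greet(say=print, now=None):
    """Greet the user based on the current (or given) time."""
    now = now or datetime.datetime.now()
    say(greeting_for_hour(now.hour) + " How can I help you?")
```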

Now the last step is to put these functions to use and have the system run the commands coded in the main function. This part of the code decides which function to run for a given command.


Here, the user function is called to recognize the command given by the user. If the command contains ‘open youtube’ or ‘open google’, the user is taken directly to the assigned web page.
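A minimal sketch of the command-routing step, assuming the recognized text arrives as a plain string. Only the phrases ‘open youtube’ and ‘open google’ come from the article; the URLs, the wikipedia branch (wikipedia.summary() with an assumed two-sentence limit) and the function names are illustrative. The browser opener and lookup function are parameters so the routing can be checked offline.

```python
import webbrowser

# Trigger phrase -> page to open (URLs are assumptions; extend as needed).
SITES = {
    "open youtube": "https://www.youtube.com",
    "open google": "https://www.google.com",
}


def wiki_summary(query, sentences=2):
    import wikipedia  # third-party; imported lazily so routing works offline
    return wikipedia.summary(query, sentences=sentences)


def run_command(command, open_url=webbrowser.open, lookup=wiki_summary):
    """Route a recognized command to an action and return what was done."""
    command = command.lower().strip()
    for phrase, url in SITES.items():
        if phrase in command:
            open_url(url)
            return f"Opening {url}"
    if "wikipedia" in command:
        query = command.replace("wikipedia", "").strip()
        return lookup(query)
    return "Sorry, I did not understand that."
```

In a main loop, the string returned by the user function would be passed to run_command(), and the returned text could be spoken back.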

Similarly, we can use this to automate tasks done on a daily basis, such as reading data, pivoting data, saving a file in a different format, and many other tasks that need no code changes other than the input files.

(reading data)
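A minimal sketch of hooking such repetitive data tasks to voice commands, assuming pandas is installed. The file names, the column names region/month/amount and the trigger phrases are hypothetical placeholders; only the input file needs to change from day to day.

```python
# Hypothetical input/output files and column names; replace with your own.
INPUT_FILE = "sales.csv"
OUTPUT_FILE = "sales_pivot.xlsx"


def read_data(path=INPUT_FILE):
    import pandas as pd  # imported lazily: only needed when a task actually runs
    return pd.read_csv(path)


def pivot_data(df):
    # Assumes columns "region", "month" and "amount" exist in the data.
    return df.pivot_table(index="region", columns="month",
                          values="amount", aggfunc="sum")


def daily_report():
    """Read -> pivot -> save: the same steps every day, only the input changes."""
    table = pivot_data(read_data())
    table.to_excel(OUTPUT_FILE)  # .xlsx output needs an engine such as openpyxl
    return table


# Spoken phrase -> task to run (trigger phrases are illustrative).
TASKS = {
    "read data": read_data,
    "pivot data": lambda: pivot_data(read_data()),
    "daily report": daily_report,
}


def find_task(command):
    """Return the task whose trigger phrase appears in the command, or None."""
    command = command.lower()
    for phrase, task in TASKS.items():
        if phrase in command:
            return task
    return None
```

Usage sketch: `task = find_task(command)` and, if it is not None, call `task()`. To process a different day’s file, only INPUT_FILE changes.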

Other tasks can be performed in the same way.

Can Make It More Interactive :

This can also be used for other purposes; for example, you can make a quiz show just like ‘KBC’ (Kaun Banega Crorepati) or ‘Who Wants to Be a Millionaire’. You can insert recorded audio too.

For that, first download the audio and import the playsound module. Use playsound.playsound('location/file.mp3', True) to play the audio. You can even add Amitabh Bachchan’s famous line ‘Aadbhut’ for every correct answer. 😁😁😁😁
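A minimal sketch of one quiz round, assuming the playsound package and its playsound(path, block) function for the recorded clip. The question bank, the substring answer check and the function names are illustrative; speaking and listening are passed in so the round can be wired to pyttsx3 and speech recognition, or tried with print and input.

```python
# Hypothetical question bank; 'location/file.mp3' is the placeholder path from the text.
QUESTIONS = [
    {
        "question": "Which planet is known as the Red Planet?",
        "options": ["Venus", "Mars", "Jupiter", "Saturn"],
        "answer": "mars",
    },
]
CORRECT_SOUND = "location/file.mp3"


def play(path):
    from playsound import playsound  # third-party; imported only when a clip plays
    playsound(path, True)  # True = block until the clip has finished


def is_correct(spoken, answer):
    # Simple substring check; a real show would need stricter matching.
    return answer in spoken.lower()


def ask(q, speak=print, listen=input, play_sound=play):
    """Ask one question, play the clip on a correct answer, return True/False."""
    speak(q["question"] + " Options: " + ", ".join(q["options"]))
    spoken = listen()
    if is_correct(spoken, q["answer"]):
        play_sound(CORRECT_SOUND)
        return True
    speak("Sorry, the correct answer is " + q["answer"].title())
    return False
```

Calling `ask(QUESTIONS[0])` with the defaults prints the question, reads the answer from the keyboard, and plays the clip if the answer is right.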


