Build Iron Man’s JARVIS using Python

Published in hackerdawn, 7 min read, Apr 21, 2021

We have witnessed the coolest stuff that JARVIS does in Iron Man movies. Wouldn’t it be a marvelous idea to create a JARVIS for ourselves which will manage our day-to-day tasks just by taking voice commands from us?

What will we build?

We will build a JARVIS for ourselves that talks to us solely through voice, taking our commands and carrying them out. Let’s go through the exact tasks JARVIS will be able to do for us.

Play music

To play music, we will utter:

‘play music’

and JARVIS will play a random song from the songs present in our music directory. This will help us relax when we are feeling exhausted.

Open Google Chrome

To open Google Chrome, we will say:

‘open google chrome’.

This will trigger JARVIS to open Google Chrome for us. We can then browse our favorite websites from there.

Open VS Code

In the mood to write some code? Sure. Just say:

‘open vs code’

or ‘i want to write code’ and JARVIS will fire up Visual Studio Code for you.

Take a note

Want to take notes of items that are of high importance to you? JARVIS has got you covered here. Say:

‘take a note’

and JARVIS will ask what you want to write. Dictate the note and it will be added to your notes file.

Hear a Joke

In a light mood and want to keep up with the latest technology jokes? JARVIS is here to help you! Just say:

‘tell me a joke’

and JARVIS will crack one right there for you.

Get Weather Info

Do you want to know what’s up with the weather? Say:

‘how is the weather’.

And then, JARVIS will ask you the city name for which you want to know the weather. Just utter the city name, for example, ‘Seattle’ and you will hear JARVIS telling you the current weather details including temperature, humidity, and pressure.

Shutdown System

Done with the day’s work? Say:

‘shutdown in 12 minutes’

and JARVIS will schedule a shutdown for the system in 12 minutes.

Excited to build JARVIS? In the next section, we will write the code to build JARVIS.

Building JARVIS

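To build JARVIS, we will require the following libraries: os, subprocess, random, datetime, requests, pyjokes, speech_recognition, gtts, and playsound. The first four ship with Python. The rest are third-party packages; if any of them is not installed on your system, you can run pip install ‘PackageName’ to install it. Note that speech_recognition is published on PyPI as SpeechRecognition and gtts as gTTS.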

Importing Libraries

Let’s import the libraries required. We will also discuss each of these libraries separately.

import os
import subprocess
import random
import datetime
import requests 
import pyjokes
import speech_recognition
import gtts
import playsound

Besides these libraries, we also need a few system-level dependencies. On Debian or Ubuntu, run the following commands in your terminal to install them.

sudo apt-get install python3-pyaudio
sudo apt-get install ffmpeg libavcodec-extra

os: A built-in module that gives us access to operating system-dependent functionality, such as listing directories and running shell commands.

subprocess: This module allows us to run programs by spawning individual processes for them.

random: This module helps us create random numbers in a specified range. In this story, we are going to need them in one place.

datetime: The datetime module provides us with classes for manipulating dates and times. For example, we can get the current timestamp using this module.

requests: The requests module has several built-in methods to make HTTP requests to a specified URL. We can use the HTTP methods GET, POST, PUT, PATCH, and HEAD with the requests. An HTTP request enables us to retrieve data from a specified URL as well as push data to a server.

pyjokes: The pyjokes library returns one-liner programming jokes. Each call picks a joke at random from its built-in collection. As we want JARVIS to tell programming jokes, this library will come in handy.

speech_recognition: This library lets us perform speech recognition, with support for several engines and APIs, both online and offline. Building JARVIS involves recognizing the user’s voice command, and the speech_recognition library makes this task much easier.

gtts: gtts (Google Text-to-Speech) is a library and CLI tool to interface with Google Translate’s text-to-speech API. Using gtts, we can convert a piece of text to speech and save it as an mp3 file.

playsound: A simple package to play a wav or mp3 file. The library is written purely in Python and gives us basic playback.

Taking User Voice Command

We need a way to listen to what the user says. For this purpose, we’ll use the speech_recognition library. We’ll define a function named recordCommand to take the user command. It uses the default microphone as the audio source, listens for the first phrase the user says, and extracts it as audio data. The pause_threshold is set to 1.2 seconds, which means that 1.2 seconds of silence is treated as the end of the phrase. The speech is then recognized using Google Speech Recognition and returned as lowercase text. If recognition fails, JARVIS apologizes and the function returns None.

def recordCommand():
    sr = speech_recognition 
    r = sr.Recognizer()
     
    with sr.Microphone() as source:  
        print("Listening...")
        r.pause_threshold = 1.2
        audio = r.listen(source)
  
    try:
        print("Recognizing...")   
        intent = r.recognize_google(audio, language ='en-in')
        print(f'What I heard: {intent}\n')
  
    except Exception as e:
        print(e)   
        replyToUser("Sorry, didn't understand that.")
        talk()
        return None
     
    return intent.lower()

Making JARVIS speak

We want JARVIS to interact with us via voice. To make this possible, we create the functions named replyToUser and talk. The function replyToUser converts the text response to speech and talk plays it. Using gTTS, the text argument is converted into speech and stored as ‘talk.mp3’. The playsound function then plays the stored audio file (‘talk.mp3’).

def replyToUser(text):
    tts = gtts.gTTS(text, lang='en', slow=True)
    tts.save('talk.mp3')
    
def talk():
    playsound.playsound('talk.mp3')

Playing music

We will record the user’s command first using the recordCommand function. If the command is not None, we start matching it against the list of tasks that JARVIS can perform for us. If the phrase ‘play music’ or ‘play a song’ is present in the intent (the command we spoke), JARVIS lists the music directory and picks a random song from it, using the randint function from the random library. Ensure that you have created a music directory filled with your favorite songs before you ask JARVIS to ‘play music’.

intent = recordCommand()

if intent is None:
    replyToUser('Sorry, could not hear anything')
    talk()
    continue

if 'play music' in intent or 'play a song' in intent:
    music_dir = "path to music directory"
    songs = os.listdir(music_dir)
    os.system('xdg-open ' + os.path.join(music_dir, songs[random.randint(0,len(songs)-1)]))
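
The song selection can be made a little more defensive. The sketch below is only an illustration: the helper name pick_random_song and the list of audio extensions are assumptions, not part of the original code. It uses random.choice instead of indexing with randint, ignores files that do not look like audio, and returns None when the directory has nothing to play.

```python
import os
import random

# Assumed set of audio extensions; adjust it to match your own library.
AUDIO_EXTENSIONS = ('.mp3', '.wav', '.ogg', '.flac')

def pick_random_song(music_dir):
    """Return the full path of a random audio file in music_dir, or None if there is none."""
    songs = [name for name in os.listdir(music_dir)
             if name.lower().endswith(AUDIO_EXTENSIONS)]
    if not songs:
        return None
    return os.path.join(music_dir, random.choice(songs))
```

The returned path could then be handed to xdg-open, for example with subprocess.call(['xdg-open', path]), which sidesteps shell quoting problems when a file name contains spaces.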

Opening Google Chrome

If the phrase ‘google chrome’ or ‘browser’ is present in the intent (command you spoke out), JARVIS will start Google Chrome. It will do this by using the call function of the subprocess library. This will run Google Chrome by creating a new process for it which is similar to how we spin up Google Chrome from the command line.

if 'google chrome' in intent or 'browser' in intent:
    subprocess.call('google-chrome')

Opening VS Code

If the phrase ‘vs code’ or ‘write code’ is present in the intent, Visual Studio Code will be run. This will again be done by creating a new process for Visual Studio Code.

if 'vs code' in intent or 'write code' in intent:
    subprocess.call('code')
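
One thing to keep in mind is that subprocess.call waits for the launched program to exit, so JARVIS may stop listening until that program returns. A possible alternative, shown here as a sketch rather than the original approach, is subprocess.Popen, which starts the process and returns immediately. The helper name launch_in_background is hypothetical.

```python
import subprocess
import sys

def launch_in_background(command):
    """Start command (a list of arguments) without waiting for it to exit."""
    return subprocess.Popen(command)

# Demonstration with the Python interpreter itself, since it is always available;
# in JARVIS the command would be ['google-chrome'] or ['code'].
process = launch_in_background([sys.executable, '-c', 'pass'])
process.wait()  # only for the demo, so that the exit code can be inspected
print(process.returncode)
```

In the assistant itself there would be no call to wait, so the loop could go straight back to listening.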

Taking a Note

If the intent contains the phrase ‘take a note’, JARVIS will additionally ask us to dictate the note that we want to be written. Once we dictate the note, JARVIS opens up our notes.txt file and adds the note to it along with the current timestamp (generated using the datetime.now function). It then closes the notes file.

if 'take a note' in intent:
    replyToUser('What should I write?')
    talk()
    note_text = recordCommand()
    if(note_text!=None):
        f = open('notes.txt','a')
        timestamp = datetime.datetime.now().strftime("%H:%M:%S")
        f.write(timestamp + '\n')
        note = note_text + '\n\n'
        f.write(note)
        f.close()
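
The same note-taking step could also be wrapped in a small function. This is a sketch under a few assumptions: the name append_note and its path and now parameters are hypothetical, added so the function can be tried without a microphone. It uses a with block, which closes the file even if writing fails.

```python
import datetime

def append_note(note_text, path='notes.txt', now=None):
    """Append note_text to the file at path, preceded by an HH:MM:SS timestamp."""
    now = now or datetime.datetime.now()
    timestamp = now.strftime('%H:%M:%S')
    # Same layout as the snippet above: timestamp line, note line, blank line.
    with open(path, 'a') as f:
        f.write(f'{timestamp}\n{note_text}\n\n')
```

Inside the ‘take a note’ branch, this would be called as append_note(note_text) once recordCommand has returned something other than None.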

Getting Weather Info

If the word ‘weather’ is present in the intent, JARVIS asks us for the city we need weather information for. Then, it makes a request to the URL ‘http://api.openweathermap.org/data/2.5/weather’ using the city name and the API key. The temperature, pressure, humidity, and description are extracted from the API response and combined into a single string. JARVIS then uses the replyToUser and talk functions to speak this weather information to the user. If the city name is not present in the API provider’s database, JARVIS speaks ‘Sorry, could not find the city’.

if 'weather' in intent:
    replyToUser('Which city')
    talk()
    city = recordCommand()
    apiKey = 'your api key'
    response = requests.get(f'http://api.openweathermap.org/data/2.5/weather?q={city}&appid={apiKey}&units=metric')
    x = response.json()
    # A successful response carries cod 200; anything else (such as "404") is treated as a failure
    if str(x.get("cod")) == "200":
        main = x['main']
        temperature = main["temp"]
        pressure = main["pressure"]
        humidity = main["humidity"]
        desc = x["weather"][0]["description"]
        # units=metric in the request means the temperature is in degrees Celsius
        weather_detail = f'Current temperature is {temperature} degrees Celsius, pressure is {pressure} hPa, humidity is {humidity} %, Weather condition is {desc}'
        replyToUser(weather_detail)
        talk()
    else:
        replyToUser('Sorry, could not find the city')
        talk()
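
To try the URL building and response handling without a microphone or network access, the two steps can be separated into plain functions. The sketch below makes a few assumptions: the names build_weather_url and describe_weather are hypothetical, the sample payload is made up and trimmed to the fields used here, and a successful response is assumed to carry a cod value of 200. It uses urlencode from the standard library so that city names containing spaces are encoded properly.

```python
from urllib.parse import urlencode

BASE_URL = 'http://api.openweathermap.org/data/2.5/weather'

def build_weather_url(city, api_key):
    """Build the request URL, encoding the city name and the other parameters."""
    query = urlencode({'q': city, 'appid': api_key, 'units': 'metric'})
    return f'{BASE_URL}?{query}'

def describe_weather(payload):
    """Turn a parsed API response into a sentence, or return None for an error response."""
    # Assumption: success is signalled by cod 200 (as a number or a string).
    if str(payload.get('cod')) != '200':
        return None
    main = payload['main']
    desc = payload['weather'][0]['description']
    return (f"Current temperature is {main['temp']} degrees Celsius, "
            f"pressure is {main['pressure']} hPa, "
            f"humidity is {main['humidity']} %, "
            f"weather condition is {desc}")

# Hypothetical response, trimmed to the fields used above.
sample = {'cod': 200,
          'main': {'temp': 11.5, 'pressure': 1012, 'humidity': 81},
          'weather': [{'description': 'light rain'}]}

print(build_weather_url('new york', 'your api key'))
print(describe_weather(sample))
```

When describe_weather returns None, JARVIS would fall back to the ‘Sorry, could not find the city’ reply.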

Hearing a Joke

If the word ‘joke’ is present in the intent, JARVIS uses the get_joke function from the pyjokes library to fetch a random programming joke. Then, using the replyToUser and talk functions, JARVIS speaks the joke to the user.

if 'joke' in intent:
    joke = pyjokes.get_joke()
    replyToUser(joke)
    talk()

Shutting Down System

If the words ‘shutdown’ and ‘minutes’ are both present in the intent, JARVIS picks the first number in the command and runs ‘shutdown +{minutes}’ using the os library to schedule a system shutdown after that many minutes.

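if 'shutdown' in intent and 'minutes' in intent:
    minutes = None
    # Take the first number in the command, e.g. '12' in 'shutdown in 12 minutes'
    for word in intent.split():
        if word.isdigit():
            minutes = word
            break
    # Only schedule the shutdown if a number was actually heard
    if minutes is not None:
        os.system(f'shutdown +{minutes}')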
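
The number extraction can also be written with a regular expression and tested on its own. This is a minimal sketch; the function name parse_shutdown_minutes is hypothetical, and it assumes the recognizer returns the number as digits (as in ‘shutdown in 12 minutes’) rather than spelled out.

```python
import re

def parse_shutdown_minutes(intent):
    """Return the first whole number in intent as an int, or None if there is none."""
    match = re.search(r'\b\d+\b', intent)
    return int(match.group()) if match else None
```

Returning None when no number is found lets the caller skip the shutdown command instead of running it with a missing value.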

Stopping JARVIS

We also want a way to turn JARVIS off. The code below implements this: when JARVIS hears the command ‘stop’, it breaks out of the continuous loop.

if 'stop' in intent:
    replyToUser("Stopping")
    talk()
    break

We have written all the code snippets for implementing different functionalities of JARVIS. We’ll put together all these snippets inside an infinite loop so that JARVIS runs continuously. Below is the entire code for JARVIS.
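
The overall shape of that loop can be sketched as follows. This is an illustrative outline rather than the complete program: run_assistant, listen, speak, and handlers are hypothetical names, with listen standing in for recordCommand and speak standing in for a replyToUser call followed by talk. Passing them in as parameters makes it possible to try the loop with scripted commands instead of a microphone.

```python
def run_assistant(listen, handlers, speak=print):
    """Dispatch spoken commands until the user says 'stop'.

    listen   -- callable returning the command as lowercase text, or None
    handlers -- list of (keywords, action) pairs; the first pair with a
                keyword contained in the command has its action called
    speak    -- callable used for JARVIS's own replies
    """
    while True:
        intent = listen()
        if intent is None:
            speak('Sorry, could not hear anything')
            continue
        if 'stop' in intent:
            speak('Stopping')
            break
        for keywords, action in handlers:
            if any(keyword in intent for keyword in keywords):
                action(intent)
                break

# Scripted demo: canned commands stand in for the microphone.
commands = iter(['tell me a joke', None, 'stop'])
log = []
run_assistant(
    listen=lambda: next(commands),
    handlers=[(('joke',), lambda intent: log.append('joke requested'))],
    speak=log.append,
)
print(log)
```

Unlike the separate if statements in the snippets above, this sketch runs only the first handler that matches a command, which is a design choice and not something the original code requires.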

We have completed building JARVIS. Now, you can go ahead and offload some manual work to JARVIS. Happy Programming!