How Does Your Google Assistant Really Work?

Mansi Katarey
Published in The Startup
4 min read · Nov 30, 2020


A technical rundown of how your Google Home device works.

Wake Word: “OK Google”

A wake word is a short phrase that activates a device when spoken, and it is picked up by a very small-scale algorithm that is always running on the device. It can also be referred to as a “trigger word” or “wake-up word.” The wake word used to activate your Google Assistant is “OK Google.” When the device hears the wake word, it begins recording, and you will see four circles light up on the top, which means the device is activated. The microphones also determine the direction the word comes from, so the device can focus in that direction.
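To get a feel for how microphones can tell where a voice is coming from, here is a minimal sketch that assumes just two microphones. A sound from the side reaches one microphone slightly before the other, and that tiny delay can be turned into an angle. The function names, the 16 kHz sample rate, and the 10 cm spacing are illustrative assumptions, not details of how Google's hardware actually does it.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second, roughly, at room temperature


def estimate_delay(left, right, max_lag):
    """Return the lag (in samples) at which `right` best lines up with `left`.

    A positive lag means the sound reached the right microphone later.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Cross-correlation: multiply the two signals at this offset and add up.
        score = 0.0
        for i, sample in enumerate(left):
            j = i + lag
            if 0 <= j < len(right):
                score += sample * right[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def angle_from_delay(lag, sample_rate, mic_spacing):
    """Convert a sample delay into an arrival angle in degrees (0 = straight ahead)."""
    delay_seconds = lag / sample_rate
    # Clamp so rounding never pushes the ratio outside asin's valid range.
    ratio = max(-1.0, min(1.0, delay_seconds * SPEED_OF_SOUND / mic_spacing))
    return math.degrees(math.asin(ratio))


# Toy example: the same "click" arrives two samples later on the right microphone.
left_mic = [0, 0, 1, 0, 0, 0, 0, 0]
right_mic = [0, 0, 0, 0, 1, 0, 0, 0]
lag = estimate_delay(left_mic, right_mic, max_lag=3)
angle = angle_from_delay(lag, sample_rate=16000, mic_spacing=0.1)
```

With these toy numbers the angle comes out at about 25 degrees off centre. A real device has more microphones and has to cope with echoes and background noise, so treat this as the idea only.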

Photo by Luis Cortés on Unsplash

Cloud Computing

Once the wake word has been heard, most of the work happens in the cloud. And no, not the clouds you see in the sky on a rainy day. Cloud computing refers to servers that are accessed over the Internet, and the software and databases that run on those servers. Cloud servers are located in data centers all over the world. With cloud computing, a device can store and process information on those remote servers instead of on the device itself. That saves space on the device and allows far more information to be stored.

However, integrating the cloud into virtual assistants took a fairly long time. One problem that had to be solved was that the device had to respond quickly when called. The system couldn’t stream what its microphone heard to a cloud service continuously; that would result in lag and would slow down wake word recognition enough to impact the user experience. This problem was resolved by detecting the wake word on the device itself, so audio is only sent to the cloud after the wake word has been heard.
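One way to avoid streaming everything is to check for the wake word locally and only pass audio along afterwards. The sketch below shows that gating idea. `frames_to_upload` and the `is_wake_word` detector are hypothetical names made up for this example, and the "audio" is just a list of words to keep things readable.

```python
from collections import deque


def frames_to_upload(frames, is_wake_word, window=3):
    """Yield only the audio frames that follow a locally detected wake word.

    `frames` is any iterable of audio chunks. `is_wake_word` is a hypothetical
    on-device detector that looks at the most recent chunks and returns a bool.
    """
    recent = deque(maxlen=window)  # small rolling buffer; older audio is discarded
    awake = False
    for frame in frames:
        if awake:
            yield frame  # only from here on would audio leave the device
            continue
        recent.append(frame)
        if is_wake_word(list(recent)):
            awake = True


# Toy example: words stand in for chunks of audio.
heard = ["tv", "noise", "ok", "google", "what", "time"]
detector = lambda recent: recent[-2:] == ["ok", "google"]
uploaded = list(frames_to_upload(heard, detector))  # ["what", "time"]
```

Everything before the wake word stays in the small buffer and is thrown away. A real assistant would also stop uploading once the request is finished, which this sketch leaves out.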

Intense Training

Yes, your Google Assistant had to go through some intense training before you guys became best friends, but not in the way you think. Wake word detection is based on a neural network. These Neural Networks (NN) need to be trained to work the way we want them to. It can be thought of as working out at the gym. When you work out, you’re training your muscles.

The more training data that is fed into the NN, the more accurate the results tend to be. However, this training data needs to be diverse. Going back to our gym example, if you only do push-ups, the muscles in your arms will become really strong, but the muscles in your legs won’t. So, you will have a much harder time going for a long run than doing 50 push-ups. The same applies to training Neural Networks. If the training data only consists of women speaking, even if it’s millions of them, there will be more errors when men try to activate the system. The same problem will occur when people with different accents try to activate the device. So, it is important that the training sets used to train the system are diverse.
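One simple way to keep a training set from leaning too heavily on one kind of speaker is to draw the same number of examples from each group. The sketch below does exactly that. The function name, the group labels, and the clip names are invented for illustration, and real training pipelines use more careful methods.

```python
import random
from collections import defaultdict


def balanced_sample(examples, group_of, per_group, seed=0):
    """Pick the same number of training examples from every speaker group."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    by_group = defaultdict(list)
    for example in examples:
        by_group[group_of(example)].append(example)
    sample = []
    for group in sorted(by_group):
        members = by_group[group]
        # Cap at the group size so a small group never raises an error.
        sample.extend(rng.sample(members, min(per_group, len(members))))
    return sample


# Toy data set: 90 clips of women speaking, only 10 clips of men speaking.
clips = [("clip%d" % i, "women") for i in range(90)]
clips += [("clip%d" % i, "men") for i in range(90, 100)]
picked = balanced_sample(clips, group_of=lambda clip: clip[1], per_group=10)
```

Here `picked` ends up with 10 clips from each group instead of a 90/10 split. The catch is that a balanced sample can only be as large as the smallest group allows, so collecting more diverse recordings is still the real fix.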

How does Google come up with its answers?

When you ask Google a question, the device records it and sends it over the internet to the cloud, where it is used to search for potential answers. Your words and the tone of your question or request are analyzed by an algorithm and then matched with the command that the device thinks you asked for. In essence, the device is saying, “I’m 90% sure you said this.”
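To make the “I’m 90% sure” idea concrete, here is a small sketch that turns raw match scores into confidence values using a softmax, a common way to convert scores into probabilities. The scores and command names are made up, and this is not a description of Google's actual model.

```python
import math


def command_confidences(scores):
    """Turn raw match scores into probabilities that sum to 1 (a softmax)."""
    highest = max(scores.values())
    # Subtracting the highest score keeps exp() numerically stable.
    weights = {cmd: math.exp(score - highest) for cmd, score in scores.items()}
    total = sum(weights.values())
    return {cmd: weight / total for cmd, weight in weights.items()}


def best_guess(scores):
    """Return the most likely command and how confident the guess is."""
    confidences = command_confidences(scores)
    command = max(confidences, key=confidences.get)
    return command, confidences[command]


# Hypothetical scores for how well the audio matched each known command.
scores = {
    "turn on the lights": 4.0,
    "turn off the lights": 1.5,
    "play music": 0.5,
}
command, confidence = best_guess(scores)
```

With these numbers the top guess is "turn on the lights" with a confidence of roughly 0.90. The remaining 10% is shared by the other commands, which is why the device occasionally gets it wrong.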

Of course, the algorithm isn’t going to be 100% sure. This is the most common reason why you don’t get the answer you were looking for. Alongside the algorithm, the main device connected to your Google Home (usually a phone) checks whether it can process your command locally over Wi-Fi or Bluetooth. For example, if you ask your Google Home to turn on the lights, your phone will take care of that command. However, for more complicated commands, such as “what does ‘bye’ mean in French,” your Google device will need to connect to the server to answer your question.
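The split between simple and complicated commands can be pictured as a lookup: if the command is in a short list of things that can be handled nearby, handle it there, otherwise send it to the server. The table and the `route` function below are hypothetical and only illustrate the decision, not how Google Home is actually wired up.

```python
# Hypothetical table of commands that can be handled without the server.
LOCAL_COMMANDS = {
    "turn on the lights": "lights_on",
    "turn off the lights": "lights_off",
}


def route(command, local_commands=LOCAL_COMMANDS):
    """Decide where a recognised command should be handled."""
    key = command.strip().lower()
    if key in local_commands:
        return ("local", local_commands[key])
    # Anything not in the local table is forwarded to the server.
    return ("cloud", key)


lights = route("Turn on the lights")
translation = route("what does 'bye' mean in French")
```

In this sketch `lights` is handled locally, while `translation` is marked for the cloud because translating a word needs far more knowledge than a small table can hold.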

Is your Google Home device listening to everything you say?

Ah, that infamous question! Remember the concept of a wake word? Your Google device is only activated and only starts recording when you say “OK Google.” Yes, your device is constantly listening for the wake word, but don’t worry, it doesn’t understand anything until the wake word is heard. So, the short answer is no, your device isn’t listening to everything you’re saying. At least, it’s not understanding it…

Thanks for reading! If you have any questions or want to chat further, contact me at


or contact me through LinkedIn




Passionate about AI and how it can solve problems around the world!