Voice in Apps: YouTube

Vinayak Jhunjhunwala
Sep 21, 2019 · 4 min read

In this week’s ‘Voice in Apps’, we break down an app we are all familiar with: YouTube. For the uninitiated, Slang Labs has started a new series called ‘Voice in Apps’. Every week we take an app that has integrated voice and break it down. We do this because we think it is important to recognize the trendsetters and to show how they are adding voice to their apps and what results they are seeing. Last week we broke down ‘Gaana’, a music streaming app by Times Internet. If you haven’t read it, you can read it here.

YouTube’s voice search before the overhaul

Old YouTube Voice Search

Earlier versions of the YouTube app had a mic button in the search bar. On tapping this mic button, the user’s voice query was transcribed into the search bar verbatim.

Whenever users spoke free-form sentences like “I want to watch Naagin”, the search returned no results, because the whole sentence was treated as the query. Hindi voice search was a nightmare.

YouTube’s major Voice search overhaul

YouTube did a major overhaul of its voice feature in January 2019. The update brought major UI and functional changes to improve voice search.

Visual Changes:

New User eXperience

NUX for the new Voice Search

The app starts by training users on the new voice search feature. It does this by showing a coach mark that says “New ways to search with your voice! Show me trending videos” in a blue dialog box hovering over the mic button.

UI Change

What the new YouTube voice search looks like

With the new UI, tapping the mic brings up a white overlay with a pulsating red mic that takes over the whole screen. YouTube hasn’t forgotten about dark mode: there, the mic is overlaid on a black screen instead. Right above the mic there are hints, such as ‘Play Charlie Puth’, that are personalized to the user. When the user speaks, the utterance is transcribed on the screen in a large, clearly visible font. We are seeing this trend of large font sizes in other apps that Google has made for the Next Billion Users, e.g. Neighbourly.

Functional Change:

Thought to Action:

Earlier, a user had to tap the mic button and speak the utterance, which was transcribed into the search bar. The search then showed a list of results, and the user selected a video from that list. It was time-consuming, to say the least.

Now, the user just has to tap the mic button and say the utterance, e.g. “Play A R Rahman”, and YouTube directly plays the song, thereby reducing the thought-to-action latency. This removes the time spent browsing for videos. The more time a user spends watching videos, the more money YouTube makes. It is important to note that this happens only when the intent of the user is very clear, for example ‘play’. Other intents like ‘show’ still open the list of videos, where the user can select one.
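The play-versus-show behaviour described above can be sketched as a small intent router. This is only an illustration of the pattern, not YouTube's implementation, which is not public: the function name `route_utterance`, the `VoiceResult` type, the action labels and the keyword sets are all hypothetical, and the router assumes the intent verb is the first word of the utterance.

```python
from dataclasses import dataclass

# Hypothetical keyword sets. The article only mentions 'play' (direct action)
# and 'show' (listing); a real app would use a trained intent model.
DIRECT_ACTION_VERBS = {"play"}
LISTING_VERBS = {"show"}


@dataclass
class VoiceResult:
    action: str  # "play_top_result" or "show_results" (hypothetical labels)
    query: str   # the text to search for


def route_utterance(utterance: str) -> VoiceResult:
    """Decide whether an utterance plays directly or opens a results list."""
    text = utterance.strip()
    words = text.split()
    if not words:
        return VoiceResult(action="show_results", query="")

    verb = words[0].lower()
    rest = " ".join(words[1:])

    if verb in DIRECT_ACTION_VERBS and rest:
        # Clear intent: skip the listing and act on the best match.
        return VoiceResult(action="play_top_result", query=rest)
    if verb in LISTING_VERBS and rest:
        return VoiceResult(action="show_results", query=rest)

    # Unclear intent: fall back to a plain search on the whole utterance.
    return VoiceResult(action="show_results", query=text)
```

Under these assumptions, `route_utterance("Play A R Rahman")` takes the direct-play path, while an utterance starting with "Show", or one with no recognized verb, falls through to the ordinary results list.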

This is a pattern we are seeing in other apps like Gaana as well, where a user’s voice search results directly in an action. We broke down the specifics here.

Navigation via Voice:

Another important feature that YouTube enabled is the ability to navigate parts of the app by voice. You can tap the mic and say “Show me my history” and YouTube will take you there. This helps users get through the treacherous hierarchies of the app with a single voice command, essentially rendering the entire app flat. Currently, not all menus and submenus are accessible by voice commands; users still have to reach various parts of the app through the visual interface.
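One way to picture "rendering the app flat" is a single lookup from spoken phrases to screens, regardless of how deep those screens sit in the menu hierarchy. The sketch below is an assumption-laden illustration: the screen names, the `NAVIGATION_TARGETS` table and `resolve_navigation` are hypothetical, and only the "history" phrase comes from the example above.

```python
from typing import Optional

# Hypothetical phrase-to-screen table. Every destination is one lookup away,
# no matter how many taps it would take to reach through the menus.
NAVIGATION_TARGETS = {
    "history": "HistoryScreen",
    "watch later": "WatchLaterScreen",
}


def resolve_navigation(utterance: str) -> Optional[str]:
    """Return the screen a voice command points to, or None if unsupported."""
    text = utterance.lower()
    for phrase, screen in NAVIGATION_TARGETS.items():
        if phrase in text:
            return screen
    # Not voice-reachable: the caller can fall back to search or tell the user.
    return None
```

Returning `None` for unsupported destinations makes the boundary of voice navigation explicit, so the app can tell the user that a screen cannot be reached by voice instead of silently running a search.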

What’s still missing?

Multilingual support: The ability to search and navigate via voice in different Indian languages like Hindi and Tamil. With a 400% YoY increase in Hindi searches, it is necessary to make the search at least bilingual.

Navigational support: Currently, voice navigation is hit or miss because users are not aware of the boundaries of the voice search. YouTube needs to either expand voice navigation to all parts of the app or inform users of its boundaries.

Better NLP capabilities: The NLP capabilities of the voice search can be improved significantly, so that users can speak free-form sentences and ask for videos as naturally as they want.
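To make the free-form sentence problem concrete, here is a minimal sketch of stripping conversational filler from an utterance before it is used as a search query, so that “I want to watch Naagin” searches for “Naagin” rather than the whole sentence. The filler patterns and the `extract_query` function are hypothetical; a production system would more likely rely on a trained language-understanding model than on a hand-written regular expression.

```python
import re

# Hypothetical list of filler prefixes, for illustration only.
_FILLER_PREFIX = re.compile(
    r"^(?:i (?:want|would like) to (?:watch|see|listen to)"
    r"|please (?:play|show)(?: me)?)\s+",
    re.IGNORECASE,
)


def extract_query(utterance: str) -> str:
    """Drop a leading filler phrase and return the remaining search query."""
    cleaned = utterance.strip()
    return _FILLER_PREFIX.sub("", cleaned, count=1)
```

Utterances that do not start with a known filler phrase pass through unchanged, so this step is safe to run before any other query handling.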

This major overhaul of the Voice UI was a long time coming. Google is seeing a 270% YoY increase in voice search across all its properties in India. We expect to see a lot more functionality added to voice search, and to see this change replicated across many other apps, not just by Google but by other brands as well. Slang allows you to add voice to your apps in the easiest and fastest way possible. Reach out to us at 42@slanglabs.in if you are interested in adding voice to your apps as well.

