Gemini Live & Its Impact on the AI Assistant Race

The competition circles back to the Android vs. iOS mobile duopoly

Richard Yao
IPG Media Lab
Aug 16, 2024

Image credit: Google

Following its AI-heavy Pixel event on Tuesday, Google started to roll out Gemini Live, an enhanced version of its advanced voice mode that allows users to have in-depth voice chats with Gemini, Google’s generative AI-powered chatbot, on smartphones.

Gemini Live is now the default assistant on the upcoming Pixel 9 phones, and it is available today to all Gemini Advanced subscribers on Android. In fact, it has started rolling out to Samsung smartphones ahead of the launch of Pixel 9. And yes, it will be replacing Google Assistant as the new default voice assistant on Android devices. Google even says it’s coming to iOS soon, although no date has been specified.

Given Android’s massive global reach, this strong push by Google will have major ramifications for the consumer tech market.

Conversational Fidelity

In her review of Gemini Live, Joanna Stern of the Wall Street Journal found it to sound incredibly human and conversational. The enhanced speech engine behind Gemini Live delivers more consistent and emotionally expressive dialogue, creating a realistic interaction that feels more like conversing with a person than a machine. This development represents a significant leap forward in AI technology, where the focus is not just on providing accurate responses, but on creating a more engaging and natural user experience.

One of the standout features of Gemini Live is its ability to adapt to user speech patterns in real time. Users can interrupt the AI mid-sentence to ask follow-up questions, and the system will seamlessly adjust its response, making the interaction feel fluid and dynamic. This real-time adaptation is crucial for applications like job interview practice, where users can simulate real-world scenarios and receive feedback on their speaking habits.

The ability to handle high-fidelity voice conversation is slowly shifting AI from a chatbot we type prompts into to an intelligence that we collaborate and consult with in real time. As the world’s anticipation for OpenAI’s unreleased products grows, Google has swooped in to steal the spotlight as the first to roll out advanced AI voice features widely.

That being said, some reviewers have pointed out that Gemini might not be quite ready to replace Google Assistant as the default choice yet. For example, Jared Newman at Fast Company wrote about the various tasks that Gemini failed at, from dictating notes to getting turn-by-turn navigation. Some of these failures seem to be the result of a rushed launch and could be smoothed out before long, but Gemini’s reputation might suffer if the rollout scales too quickly and widely.

Google’s Ecosystem Play

Google has been comparatively slow to roll out integrated AI features, but Gemini Live is a major exception. Similar to the upcoming Apple Intelligence features, Gemini Live integrates directly with Google’s services to provide context-aware answers without requiring users to switch between apps. For instance, after summoning Gemini, users can tap “Ask about this screen” or “Ask about this video” to get relevant, contextual replies based on what they are currently viewing. This capability allows Gemini to perform tasks like adding a list of restaurants from a YouTube travel video directly to Google Maps, streamlining the user experience and making the AI a more intuitive assistant.

A new Pixel Screenshots feature is reminiscent of Microsoft’s much-maligned Recall feature, which uses AI to track everything you do on your computer via constant screenshots, except that it’s more of a manual affair. If you want to keep track of something like an event you’re planning with friends or a recipe to make for dinner, you can take a screenshot and then conversationally search for the information later. This feature is notable as a Pixel exclusive, unlike the other features that Gemini will bring to other Android phones.

However, it is worth noting that Google essentially plans to charge $20 a month for Gemini Live access. This approach will certainly help bring in some additional revenue while helping position Gemini Live as a premium service. But it also creates a barrier to adoption, risks a competitive disadvantage, and reduces data collection opportunities. The strategy’s success hinges on Google’s broader goals: whether it is targeting a premium market or maximizing user adoption. A hybrid approach, offering a free basic version with paid advanced features, could balance revenue generation with building a large, engaged user base. But it’s unclear at this point what Google will do.

Cloud as a Competitive Edge

As Google pushes forward with Gemini Live, it’s clear that the company is betting on cloud integration as its competitive advantage in the AI arms race. As analyst Ben Thompson points out, the ultimate focus for Google’s AI integration is between Android and Google’s cloud services, rather than the hardware itself, be it a Pixel or a Samsung phone. “What Google is proposing is something different entirely: you can pick your device, but your AI will be integrated with your data primarily via the cloud.”

This approach contrasts with Apple’s strategy, which is far more device-centric, with AI features that are tightly integrated with the hardware and personal data stored on the device. At the end of the day, Apple is a hardware company that makes most of its revenue selling iPhones, and Google is a services company that makes most of its money from ads. For all the talk about Pixel, it accounts for only about 5% of the market in the U.S., and merely 1% worldwide.

While Apple’s method offers strong privacy controls and localized processing, Google’s cloud-based approach provides scalability and flexibility, allowing for continuous updates and improvements without the need for hardware upgrades. The real power of Gemini AI lies in its ability to integrate across the Android ecosystem via the cloud, rather than being confined to a specific device.

The funny thing is, as the race to leverage generative AI to supercharge existing AI assistants continues to unfold, it has started to circle back to the mobile duopoly of Android vs. iOS. It turns out, at the end of the day, that controlling access to personal data on mobile devices and leveraging it to deliver more personalized, contextual answers is the key to unlocking the next generation of AI assistants. And since Apple and Google are now both making the push for their own AI assistants, there will presumably be less space for other AI chatbots like OpenAI’s ChatGPT or Anthropic’s Claude to really gain a foothold on mobile. Apple Intelligence will at least offload some general inquiries to third-party partners like ChatGPT; Google, on the other hand, seems to want Gemini to do it all.

Brand Implications

As AI systems evolve from simple, text-based chatbots to highly sophisticated, conversational intelligences, marketing strategies must evolve in tandem. For example, Gemini Live’s ability to provide context-aware answers without switching apps enables it to pull relevant information from various sources, such as a YouTube video or a Google search, and incorporate it into the conversation. For marketers, this means that campaigns can be designed to integrate seamlessly into the broader digital ecosystem. Google has not yet shared how advertisers can integrate into Gemini, but the conventional wisdom on best practices for voice assistants may apply.

To prepare for a conversational, voice-driven mobile experience, brands should optimize content for voice search and invest in conversational AI for voice-activated interactions. Enhancing personalization, ensuring multi-platform integration, and prioritizing accessibility are key to success.

Brands must also address privacy concerns by being transparent about data usage and ensuring security. Continuous monitoring and adaptation to evolving technology and user expectations will help brands stay ahead of the curve in the voice-driven landscape.
