A Recap of Google I/O 2024

Abirami Vina
Published in Nerd For Tech · 7 min read · Jun 3, 2024

“If what we are doing is not seen by some people as Science Fiction, it’s probably not transformative enough” — Sergey Brin, co-founder of Google.

The AI announcements made at Google I/O 2024 seemed to be straight out of a sci-fi movie, from emails that summarize themselves to AI assistants that remember where you last left your glasses. Google is pushing the boundaries of what AI can do, making it smarter, faster, and more intuitive than ever before. Let’s dive in and recap Google’s latest updates, which promise to transform how we interact with technology every day.


The Gemini Refresh: What’s New?

Google is spicing up its search engine with Gemini. Search results will now be topped by a feature called “AI Overviews,” which generates an answer to your question by drawing on a variety of websites, giving you a quick summary of the topic along with source links so you can dig deeper.

AI Overviews on Google Search | Source

That’s not all. Gemini is now far more context-aware than ever, with Gemini 1.5 Pro being updated to a context window of 2 million tokens (a huge leap from the previous 1 million). It can now anticipate what you’re trying to do and offer helpful insights and suggestions. Gemini can even be personalized using ‘Gems’: personal, context-aware AI experts that you can create and save. And Google is not stopping there; according to its CEO, Sundar Pichai, this is just the first of many steps toward its goal of infinite context.
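To get a feel for what a 2-million-token window means in practice, here is a minimal sketch using the google-generativeai Python SDK to count tokens before sending a large document to Gemini 1.5 Pro. The API key, file path, and prompt are illustrative placeholders, not anything from Google’s keynote:

```python
# A minimal sketch using the google-generativeai SDK (pip install google-generativeai).
# The API key, file path, and prompt below are placeholders for illustration.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro")

# Load a large document; a 2-million-token window can hold
# roughly the text of several long novels at once.
with open("big_document.txt") as f:
    document = f.read()

# Check how much of the context window the document consumes.
print("Tokens:", model.count_tokens(document).total_tokens)

# Ask a question grounded in the entire document.
response = model.generate_content(
    ["Summarize the key decisions described in this document.", document]
)
print(response.text)
```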

Google is also rolling out a new Gemini feature called ‘Live,’ powered by its latest speech models. You can now have in-depth conversations with Gemini; it will understand you better and answer questions more naturally. Google is also integrating Gemini into NotebookLM to create AI-generated voice discussions, with inputs that can be almost anything: images, PDFs, or docs. You can even join mid-way and interrupt the discussion to steer it wherever you want.

And let’s not forget ‘Ask Photos,’ made possible by Gemini’s integration with Google Photos. Think of it as your very own personal album keeper: it understands the context behind your photos and can answer questions about them. If you have hundreds or thousands of photos stored, ‘Ask Photos’ makes searching easy.

Wait, there’s more?

Yup, there’s more!

What if you had an AI colleague who knew basically everything? That’s exactly what Google Workspace is offering with its new ‘AI Teammate.’ Your new colleague is present in all your chat conversations and can go through all of your work-related material, using that collective knowledge to answer questions about work. For example, if anyone in a work group chat asks, ‘Does anyone know if our work has been approved?’, the AI can jump in and answer (since it has been following all the work-related conversations). But the Teammate’s collective memory is only as good as the conversations it has access to: the more group chats you add ‘AI Teammate’ to, the better its collective memory.
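Google hasn’t shared how AI Teammate is built under the hood, but that ‘answer from whatever chats it can see’ behavior can be sketched as simple retrieval over indexed messages. The class and matching logic below are purely hypothetical illustrations, not Google’s implementation:

```python
# Hypothetical sketch: an "AI Teammate" whose knowledge is only as good
# as the group chats it has been added to. Not Google's implementation.
class AITeammate:
    def __init__(self):
        self.memory = []  # (chat, author, message) tuples it has seen

    def add_to_chat(self, chat_name, messages):
        # The more chats the teammate joins, the larger its collective memory.
        self.memory.extend((chat_name, author, text) for author, text in messages)

    def answer(self, question):
        # Naive keyword recall; a real system would pair an LLM with retrieval.
        words = {w.strip("?.,!").lower() for w in question.split() if len(w) > 3}
        hits = [text for _, _, text in self.memory
                if any(w in text.lower() for w in words)]
        return hits[-1] if hits else "I haven't seen anything about that."

teammate = AITeammate()
teammate.add_to_chat("design", [("Priya", "The client approved our work today!")])
print(teammate.answer("Does anyone know if our work has been approved?"))
```

Now let’s move on to something different (but not really).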

Introducing Gemini 1.5 Flash: The Speed Demon

That’s right! Google rolled out a brand-new language model called Gemini 1.5 Flash, designed to be more lightweight than Gemini 1.5 Pro. The new model is fast and cost-efficient while still offering multimodal reasoning and a long context window of 1 million tokens, and it is optimized for tasks where low latency and efficiency matter most. You can use both 1.5 Flash and 1.5 Pro with up to 1 million tokens in Google AI Studio and Vertex AI, and developers can sign up to use 2 million tokens.
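Since low latency is Flash’s selling point, here is a minimal sketch of streaming a response from it with the same google-generativeai SDK, so text prints as it arrives. The prompt and API key are placeholders:

```python
# A minimal sketch, assuming an API key from Google AI Studio.
# Streaming suits Flash's low-latency focus: text prints as it arrives.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-flash")

response = model.generate_content(
    "Explain what a context window is, in two sentences.",
    stream=True,
)
for chunk in response:
    print(chunk.text, end="", flush=True)
```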


Project Astra: AI That Sees and Remembers

Built on the existing Gemini model, Project Astra is Google’s latest step toward the future of AI assistants. These agents process information in real time by continuously encoding video frames. They also combine the user’s video and audio inputs into a timeline of events and cache it for fast, efficient recall later on.

For instance, in the keynote demo, the demonstrator used Project Astra to describe things it saw through the camera and later asked whether it had seen their glasses. It accurately remembered where the glasses were in the video (on the desk, next to an apple). The Gemini app will also get Project Astra features, giving Gemini the ability to see what you see through the camera and respond to your surroundings in real time.

Project Astra demonstration | Source
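Google hasn’t published Astra’s internals, but the ‘encode frames, keep a timeline, recall on demand’ pattern described in the keynote is easy to sketch. The code below is a purely illustrative outline; encode_frame and the in-memory timeline are hypothetical stand-ins, not Google’s actual system:

```python
# Purely illustrative: an Astra-style rolling memory of encoded frames.
# encode_frame() is a hypothetical stand-in for a real vision encoder.
import time
from collections import deque

def encode_frame(frame):
    # A real system would produce an embedding or rich caption here.
    return f"caption: {frame}"

class VideoMemory:
    def __init__(self, max_events=1000):
        # Bounded timeline: the oldest events fall off as new ones arrive.
        self.timeline = deque(maxlen=max_events)

    def observe(self, frame):
        # Continuously encode frames into (timestamp, description) events.
        self.timeline.append((time.time(), encode_frame(frame)))

    def recall(self, query):
        # Naive recall over the cached timeline; a real assistant
        # would use embedding similarity search instead.
        return [e for e in self.timeline if query.lower() in e[1].lower()]

memory = VideoMemory()
for frame in ["desk with glasses next to an apple", "whiteboard with diagrams"]:
    memory.observe(frame)
print(memory.recall("glasses"))  # finds the cached event mentioning glasses
```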

Crafting the Future of Creativity

Google announced a whole bunch of new tools that can generate images, music, and videos. Let’s learn more about them.

Imagen 3

We all love playing around with image-generation models, whether for projects or just to see an iguana playing the piano (who doesn’t want to see that?). Google announced its most capable and highest-quality image generation model yet, called Imagen 3. It understands naturally written prompts, and the more creative and detailed you are, the better the generated images. Imagen 3 is also Google’s best model yet at rendering text, which has been a real challenge for previous image generation models. You can try Imagen 3 today in ImageFX, part of labs.google’s AI tools.

A generated image of an iguana playing the piano.

Music AI Sandbox

AI-generated music has been around for a while, and Google is taking it to the next level. Its ‘Music AI Sandbox’ is a suite of professional music AI tools that can create new instrumental sections (loops) from scratch, transfer styles between tracks, and more. The tool, shown briefly in the keynote, appears to accept text input and produce short audio clips or “stems” based on the prompt, complete with a waveform view of the newly generated sounds. Google tested this technology with professional musicians to create new music that would not have been possible otherwise. Check out the demonstration video to learn more and hear the results.

Veo

The OpenAI Sora alternative we have all been waiting for is finally here, and it’s called Veo. Veo is Google DeepMind’s newest and most capable generative video model, creating high-quality 1080p videos from text, image, and video prompts. It can capture the details of your instructions and render them in different visual and cinematic styles, such as aerial shots of a landscape or timelapse videos. You can also use Veo to further edit generated videos with additional prompts.

An example of video editing using Veo. (Source)

The Next Generation of Processing Power

The modern world is powered by CPUs, GPUs, and TPUs. AI technologies need a lot of processing power, and AI development is not slowing down, so it’s important to advance our processing technology as well. Google announced its sixth-generation TPU (Tensor Processing Unit), called Trillium, which delivers a 4.7x improvement in compute performance over the previous generation. According to Google, Trillium is its most efficient and performant TPU to date, and it will be available to Google Cloud customers later this year.

Alongside TPUs, Google also offers CPUs and GPUs that can support any type of workload, including its own Axion CPUs and NVIDIA’s Blackwell GPUs.

Axion Processors are Google’s first custom Arm-based CPUs, designed for industry-leading performance and energy efficiency. Google will also be one of the first cloud providers to roll out NVIDIA’s latest cutting-edge Blackwell GPUs, which will be available in early 2025.

Other Notable Announcements

Let’s check out other notable announcements made at the Google I/O ’24 keynote.

  • Android 15 with AI Core: Android 15 brings AI to the core of the mobile experience, with Google Pixel phones the first to showcase these breakthroughs later this year.
  • Gemma 2: A new 27B-parameter model, optimized by NVIDIA and outperforming larger models.
  • PaliGemma: Google’s first open vision-language model, for image captioning and visual Q&A.
  • Project Navarasa: An AI model built on Gemma for Indian languages, making AI technologies accessible to all corners of India.
  • LearnLM: A new family of AI models for education, powering an interactive feature on YouTube that makes educational videos more engaging.
  • SynthID: A technology that watermarks AI-generated content to help counter misinformation.

Conclusion

Google I/O 2024 truly showcased the future of AI with these groundbreaking updates. Whether you’re a developer, an artist, or an everyday user, there’s something here that will make your life a bit easier, more productive, and a lot more fun.

Always remember to keep your eyes and ears open for the latest in AI. Thanks for reading and learning with me. Farewell, till our next adventure into AI.


Abirami Vina

Vanakkam! I'm a computer vision engineer who writes because it's the next best thing to Dumbledore's Pensieve. I believe in love, kindness, and dreaming.