Google Cloud Next ’23 Accelerates AI: Key Takeaways

Sanjeev Mohan
AI monks.io
10 min read · Sep 6, 2023

It is no surprise that the protagonist in Google Cloud’s 2023 user conference was AI. If it wasn’t explicitly in your face, it was just lurking in the background of most new announcements.

And why not? In my decades of speaking at and attending vendor conferences, I have never seen businesses as energized. At Google Cloud Next, the company acknowledged that any technical advances must serve business needs. For example, an AI assistant called Duet AI now allows users to ask questions against their data, metadata, and infrastructure. Whether you are a coder, a cloud architect, a business user, or a data steward, Duet AI is at your service to help you be more productive.

But I am jumping ahead. This document showcases key announcements. It is not meant to be an exhaustive list, but it highlights the ones most important to the clients and partners I speak to. To explain the announcements in a digestible way, I have divided them into the vertical stack you see in the figure below.

Many of us grew up on the seminal Open Systems Interconnection (OSI) model, whose seven layers clearly delineated the abstractions needed for computers to communicate with each other in the early days of the internet. Work on this standard started in the late 1970s, and ISO published it in 1984. Fast forward to 2023: we asked ourselves whether the OSI model could be applied to generative AI, and what that would look like. This exercise provides clarity to better understand how, in theory, we can deliver AI workloads at scale, securely, and cost effectively. Developments in each layer of the stack are needed to make AI mainstream.

Figure: OSI 7-layer architecture reimagined to Generative AI with Google Cloud Next ’23 announcements

Now let’s apply the new announcements to each of the layers, taking a bottom-up approach.

1. Infrastructure

Google Cloud’s infrastructure delivers performant hardware for storage, networking, compute, and security. The common storage layer, based on Colossus, is shared by its data stores, like BigQuery, AlloyDB, Spanner, Bigtable, and the Pub/Sub eventing system. The compute layer uses Borg, a serverless and disaggregated subsystem with workload isolation. Borg runs queries and is used to launch various workloads, such as training and inference jobs.

Training models requires vast amounts of resources; OpenAI’s GPT-4 was reportedly trained on 10,000 NVIDIA GPUs. With such large farms, sustainability becomes a critical factor, and efficient silicon is needed to drive down the cost and environmental overhead of handling LLMs. Google announced a preview of its cost-efficient and scalable Cloud TPU v5e, which Google says delivers up to 2x the training performance and 2.5x the inference performance per dollar of the prior generation. By the way, TPU stands for tensor processing unit; a tensor is a multi-dimensional array that generalizes vectors and matrices, and it is the data structure LLMs manipulate to semantically search and respond to queries.

Inferencing requires physical infrastructure that can scale horizontally in a cost-effective way. Google Cloud is exploring ways to use inexpensive CPUs for this.

NVIDIA’s CEO, Jensen Huang, has been making the rounds at several keynotes this year. He was on hand to announce that Google Cloud will offer the newest NVIDIA H100 GPUs in A3 VMs, which deliver 3x faster training and integrate with Google Compute Engine, GKE, and Vertex AI.

Several other infrastructure announcements included:

  • Hybrid and multi-cloud support via Google Distributed Cloud (GDC). GDC is a fully managed hardware and software solution built for data and AI that can be operated either fully connected to Google or fully air-gapped. As an example, AlloyDB Omni, a downloadable edition of Google’s PostgreSQL-compatible AlloyDB, can run on-premises or in other clouds, like AWS and Microsoft Azure. GDC also offers Vertex AI integration and Dataproc Spark.
  • Titanium — a system of custom-built silicon and scale-out offloads that delivers better cost performance, reliability, and security for customer workloads.
  • Cross-Cloud Network. This programmable network platform targets three key use cases for multi-cloud applications: building distributed applications, global application delivery, and securing the hybrid workforce. Distributed applications can span on-premises environments and other cloud providers, like AWS, Microsoft Azure, Oracle Cloud Infrastructure (OCI), and Alibaba Cloud, with security and workload performance optimization. Google claims 40% lower TCO and four-nines uptime.

Before we close out the discussion of the physical layer, a new version of Google Kubernetes Engine (GKE) Enterprise was also announced. In past years, Kubernetes would have been the center of attention; this time the baton was passed to AI, which takes us to the next level up the stack.

2. Models

“70% of Gen AI unicorns run on Google Cloud,” said Thomas Kurian during his keynote.

Prior to foundation models, training was accomplished by labeling a large corpus of data, which is time consuming and can be inaccurate. In the next iteration, neural networks, like convolutional and recurrent networks, used very large numbers of compute nodes to train across many layers. Then a seminal 2017 paper by Google engineers, “Attention is All You Need,” transformed this space: it proposed dispensing with recurrence and convolutions in favor of an attention mechanism that weighs the relevance of every token against every other token. This approach drastically reduced training time and resources through a simple, highly parallelizable network architecture called the Transformer. That watershed moment opened the floodgates to Generative Pre-trained Transformer (GPT) models.
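To make the core idea concrete, below is a minimal NumPy sketch of the scaled dot-product attention the paper introduced; the toy dimensions and self-attention usage are purely illustrative, not any production implementation.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention from "Attention is All You Need".

    Q, K, V: arrays of shape (seq_len, d_k) for queries, keys, and values.
    Returns a (seq_len, d_k) array in which every position is a weighted
    blend of all values, weighted by query-key similarity.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # relevance of every token to every other token
    # Softmax over the key axis turns raw scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # blend the values by relevance

# Toy self-attention over 4 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (4, 8)
```

Because the token-to-token comparison is a single matrix multiplication rather than a sequential recurrence, the whole sequence can be processed in parallel, which is what slashed training time and resource needs.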

While OpenAI took an early lead, it was clear at this conference that Google Cloud wants to own the space it invented.

Google Cloud’s PaLM 2 (Pathways Language Model) family of models, launched in May 2023, is unique in supporting multiple modalities (text, image, and code), with sound and video to be added later in 2023. In addition, the models can translate about 150 languages (38 are GA). New announcements include:

  • Imagen — Style Tuning allows organizations to apply their own corporate brand guidelines to the generated images.
  • Codey — Codey is a text-to-code foundation model that helps improve developer velocity through code generation and code completion, as well as improving code quality. At Next, Google announced it has improved the quality of Codey by up to 25%. Not only does Codey help write code snippets, it also shows references to the libraries it used so as not to infringe on any copyrighted content. It can also explain code and generate test cases.

In addition, developers can now also choose to use Meta’s Code Llama as a code assistant.

Duet AI, which is powered by Codey, was probably the most pervasive announcement. It is being embedded all across the stack to bring the power of AI to every user. The PaLM 2 models are part of the expanding Vertex AI platform, which saw a slew of updates.

3. ML Platform

Training models involves many steps: selecting the right features, connecting to data sources, experimentation, testing, and deployment. The Vertex AI platform provides tools across the ML lifecycle. This unified platform has been expanded to support prototyping and building generative AI apps, in addition to MLOps for automating and managing ML projects and notebooks.

I would usually put the Responsible AI and privacy category lower in my list of announcements, but Google Cloud paid such close attention to this area that it deserves top billing. Digital watermarking using Google DeepMind’s SynthID embeds an invisible watermark in images produced by the Imagen foundation model to certify their provenance.

Vertex AI Model Garden has expanded beyond first-party PaLM 2 models to include external models, like Llama 2 from Meta, Falcon from Technology Innovation Institute and soon Claude 2 from Anthropic. It already has over 100 foundation models.

Google has renamed its Generative AI App Builder to Vertex AI Search and Vertex AI Conversation, which became GA at the conference. These can be used to build search experiences, chatbots, and voice bots on an organization’s own business data.

Finally, Vertex AI Extensions is an exciting development, as it provides tools to go beyond the LLM itself and call APIs to take real-world actions. For example, developers can perform a vector search in BigQuery and then call an API against an HR system, orchestrated with a framework like LangChain; a hypothetical sketch of this pattern follows below. LLM workflows will, in the future, become as prevalent as today’s data transformation pipelines.
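Google did not show the Extensions API surface in detail, so the sketch below only illustrates the general tool-calling workflow the demo implied; the HR endpoint, tool name, and plan format are all hypothetical.

```python
import requests

def get_vacation_balance(employee_id: str) -> str:
    """Hypothetical HR-system call standing in for a real extension."""
    resp = requests.get(f"https://hr.example.com/api/v1/balance/{employee_id}")
    resp.raise_for_status()
    return str(resp.json()["days_remaining"])

# A registry of callable tools; an orchestrator (e.g. LangChain) or the
# extension layer maps the model's structured output onto real API calls.
TOOLS = {"get_vacation_balance": get_vacation_balance}

def run_step(llm_plan: dict) -> str:
    """Execute one step of an LLM-planned workflow.

    llm_plan is the model's structured output, e.g.
    {"tool": "get_vacation_balance", "args": {"employee_id": "E123"}}.
    The result would be fed back to the model to compose a final answer.
    """
    tool = TOOLS[llm_plan["tool"]]
    return tool(**llm_plan["args"])
```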

4. Data & Analytics

Viral adoption of foundation models will only be possible when the models utilize enterprise data in a cost-effective manner. What is needed is the ability to generate domain-specific, contextual prompts for the LLMs. This is the retrieval-augmented generation (RAG) pattern. To support RAG, databases must offer semantic search based on vector embeddings, in addition to deterministic keyword-based queries.
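To make the pattern concrete, here is a minimal, self-contained RAG sketch. The word-hashing “embedding” and the sample documents are toy stand-ins for a real embedding model (for instance, a Vertex AI text-embedding endpoint) and a vector-capable database.

```python
import hashlib
import numpy as np

def embed_text(text: str, dim: int = 64) -> np.ndarray:
    """Toy embedding: hash each word into a fixed-size vector.
    A real system would call an embedding model instead."""
    vec = np.zeros(dim)
    for word in text.lower().split():
        h = int(hashlib.md5(word.encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

# Enterprise documents, embedded ahead of time (the "vector store").
DOCS = [
    "Expense reports are due by the 5th of each month.",
    "VPN access requires a hardware security key.",
    "The annual sales kickoff is held in January.",
]
DOC_VECS = [embed_text(d) for d in DOCS]

def retrieve(question: str, top_k: int = 2) -> list:
    """Semantic search: rank documents by similarity to the question."""
    q = embed_text(question)
    scores = [float(q @ v) for v in DOC_VECS]
    best = np.argsort(scores)[::-1][:top_k]
    return [DOCS[i] for i in best]

question = "When do I submit my expense report?"
context = "\n".join(retrieve(question))
# The grounded, domain-specific prompt the LLM would receive.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(prompt)
```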

Google Cloud has gone all out to support vectors in its databases. PostgreSQL-based AlloyDB and Cloud SQL have both added support for the pgvector extension. AlloyDB AI adds enhancements that increase the speed of vector queries by 10x and prompt sizes by 4x, expanding context windows to 32K. Google Cloud plans to contribute these enhancements back to the community. Google also announced real-time model inferencing and vector embeddings for BigQuery.
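With pgvector enabled, semantic search becomes plain SQL. Here is a hedged sketch against AlloyDB or Cloud SQL for PostgreSQL using psycopg2; the connection string, table schema, and 768-dimension embeddings are placeholders for illustration.

```python
import psycopg2

# Placeholder connection details: point at an AlloyDB or Cloud SQL for
# PostgreSQL instance where the pgvector extension is available.
conn = psycopg2.connect("dbname=appdb user=app host=10.0.0.5")
cur = conn.cursor()

cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
cur.execute("""
    CREATE TABLE IF NOT EXISTS documents (
        id        bigserial PRIMARY KEY,
        body      text,
        embedding vector(768)  -- match your embedding model's dimensions
    );
""")
conn.commit()

# Nearest-neighbor semantic search: '<->' is pgvector's L2 distance operator.
query_embedding = [0.01] * 768  # in practice, produced by an embedding model
cur.execute(
    "SELECT body FROM documents ORDER BY embedding <-> %s::vector LIMIT 5;",
    (str(query_embedding),),
)
for (body,) in cur.fetchall():
    print(body)
```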

Duet AI is being expanded to Spanner, BigQuery, Looker, AlloyDB, and Cloud SQL to generate code and manage services using natural language. Interestingly, it is also being embedded into the Database Migration Service (DMS) to automate conversion of database schema and code from source to target; the demo showed Oracle-specific functions in stored procedures being converted to equivalent PostgreSQL code. Finally, Duet AI is also being added to Dataplex, the data governance and data fabric component of Google Data Cloud, giving users a view of their ML assets and datasets through natural language metadata search.

BigQuery gets a unified collaborative workspace, called BigQuery Studio, that developers can use for ETL, data prep, visualization, data catalog, security, CI/CD, training and inferencing workloads. The single interface supports SQL, Spark, Python, and natural language. And, yes, Duet AI support is also being added.

Data federation, a unified approach to running analytics on operational data, gets a boost via Spanner Data Boost. Operational data in Spanner can now be analyzed from within BigQuery, Dataproc (using Spark), or Dataflow, with no data movement and no impact on production workloads.
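From the BigQuery side, this looks like an ordinary federated query that reads Spanner tables in place; in the sketch below the project, connection, and table names are illustrative, and Data Boost is switched on as an option of the external connection (check Google’s documentation for the exact setting).

```python
from google.cloud import bigquery

client = bigquery.Client()

# EXTERNAL_QUERY runs the inner SQL against the Spanner database that the
# named BigQuery connection points to, without copying data into BigQuery.
sql = """
SELECT *
FROM EXTERNAL_QUERY(
  'projects/my-project/locations/us/connections/my-spanner-conn',
  'SELECT order_id, total FROM orders WHERE status = "open"'
)
"""
for row in client.query(sql).result():
    print(row["order_id"], row["total"])
```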

AlloyDB Omni makes AlloyDB’s PostgreSQL-compatible engine available on any cloud provider, on your laptop, or in an on-premises data center. This is unique among cloud hyperscalers.

5. Applications

Foundation models trained on extreme data volumes need applications of commensurate scale to test them for scale and accuracy. Grounding the models in application data also reduces hallucination.

Google has several “extreme-scale” applications with over a billion daily users, like Ads, Search, Gmail, Workspace, YouTube, Maps, Play, and Android. Many of these were augmented with language models over the years, before LLMs became fashionable. Capabilities like drafting emails in Gmail demonstrate the new generation of features.

Google Workspace, Google Meet, and Google Chat showed tantalizing capabilities, courtesy of Duet AI, that will make users more productive. Demonstrations showed Duet AI “taking notes”, summarizing documents, and automatically creating Google Slides after a natural language query in Looker that queried BigQuery.

A glimpse of the future: domain-specific LLMs trained with subject matter experts, like Med-PaLM 2 for healthcare and Sec-PaLM for cybersecurity, will become more prevalent once the entire AI ecosystem matures and becomes cost efficient.

6. Devices

Once LLMs can be produced cost effectively, models should run closer to the consumer to achieve the lowest latency, whether in on-premises data centers or at the edge. Google Cloud has an advantage here, as it builds both devices and the Android operating system.

The Gmail feature to compose emails, mentioned earlier, runs in Gmail on Android-powered devices via an embedded foundation model. Google is working on running PaLM 2 models in very small form factors, from browsers to connected devices and connected vehicles.

Gecko, a small model designed for devices, is expected to be GA in 2024.

7. Ecosystem

We discuss the ecosystem last because it impacts every preceding layer. For example, Google Cloud’s custom TPU silicon is available alongside newer chips from NVIDIA, ARM, Intel, and AMD. In addition, Model Garden has grown to support over 100 foundation models from Google Cloud and third parties.

The Google Cloud Partner Advantage Program now has over 100K partners, including service providers. They will play a crucial role in the future of generative AI as they embark upon creating domain-specific solutions for global businesses. The Google Cloud Next ’23 show floor was packed with system integrators — global and regional.

ISVs were also represented in strength; SingleStore, Neo4j, and ChaosSearch were just a few of the database vendors showcasing their partnerships and offerings running on Google Cloud. As mentioned earlier, Vertex AI Extensions are being pre-built for database partners like DataStax, MongoDB, and Redis.

Finally, Google Cloud customers are building innovative solutions and pushing the boundaries. Some of them, like Wendy’s drive-through, Deutsche Bank, and Major League Baseball (MLB) had large booths to showcase their solutions. MLB ingests 25M data points from each game into BigQuery for real-time analysis and shares some of the data in a marketplace. In addition, it is digitizing its videos and images from the early part of the last century and using the power of LLMs to search them.

Conclusion

Google Cloud addresses every layer of this OSI-inspired stack for AI. It is also approaching the stack through principles of simplification, unification, and open standards. The breadth of offerings and these guiding principles help customers avoid risky and time-consuming integration of technologies that are still early in their maturity.

The conference also showed Google Cloud’s commitment to security and privacy. Organizations want to build and deploy AI workloads in a secure and reliable way. It is also crucial that the solution not only be cost effective but also be easy to build and use.

The focus of this document is primarily on the data, analytics, and AI aspects, although we briefly visited related infrastructure and collaboration announcements. Some announcements are GA and some are in preview; please consult the latest Google Cloud resources for the current status in your region.

If you have reached the end of this document, thank you very much for your interest. You may find this analyst podcast insightful for additional learnings and perspectives from the event.

Sanjeev researches the space of data and analytics. Most recently he was a research vice president at Gartner. He is now a principal with SanjMo.