Google Cloud - Community

A collection of technical articles and blogs published or curated by Google Cloud Developer Advocates. The views expressed are those of the authors and don't necessarily reflect those of Google.

Session Management with Google’s Multimodal Live API

4 min read · May 14, 2025

Welcome back to this Multimodal Live API article series. 📚 To explore previous articles in this series, head over to the article series overview.

If you’ve been following along, you’ve seen how we can create incredibly dynamic and natural interactions.

But what happens when the internet hiccups? Or when a conversation gets really, really long (yeah, some of us like to talk)?

A dropped connection or a session hitting its limits can be frustrating for users and a nightmare for developers. Imagine being halfway through assembling that shelf with the help of our AI when, mid-sentence, poof: silence. Not ideal.

Don’t worry, Google has equipped the Live API with fantastic features to manage these challenges.

In this article, we’re going to explore how to build even more resilient and long-lasting conversational applications by using:

  • Session Resumption
    Picking up where you left off after a disconnect.
  • Graceful Disconnect Notifications
    Knowing when the server is about to say goodbye so that you can handle it smoothly.
  • Context Window Compression
    Keeping the conversation going for longer without losing the thread.

The Challenge and Why Stable, Long-Lived Sessions Matter

As we build more sophisticated applications, the need for stable, long-running sessions becomes critical.

  1. Temporary network issues are a fact of life. Without a way to recover, users might have to restart the entire interaction, losing context and patience.
  2. Sometimes, servers need to reset connections. If your application isn’t prepared, the session dies abruptly.
  3. An AI application’s memory is limited by the model’s context window. As a conversation grows, this window can fill up. Traditionally, the AI might start forgetting earlier parts of the dialogue, or the session might terminate.

Fortunately, the Live API now provides elegant solutions.

Session Resumption: Keeping the Conversation Alive

The Live API supports server-side session state storage for up to 24 hours. This means that the conversation history isn't lost even if the WebSocket connection is interrupted.

Here’s how it works:

  • The server periodically sends a unique handle that represents the session's current state.
  • You can provide this handle to the API when reconnecting, which will resume the session from where it left off.

Here’s how to enable session resumption in the configuration. It lets you resume a previous session by providing its handle (the session ID). If no handle is provided, a fresh session starts.

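from google.genai import types
from google.genai.types import (
    LiveConnectConfig,
    PrebuiltVoiceConfig,
    SpeechConfig,
    VoiceConfig,
)

CONFIG = LiveConnectConfig(
    response_modalities=["AUDIO"],
    # Pass a stored handle to resume that session; without one, a fresh session starts.
    session_resumption=types.SessionResumptionConfig(
        handle="93f6ae1d-2420-40e9-828c-776cf553b7a6"
    ),
    speech_config=SpeechConfig(
        voice_config=VoiceConfig(
            prebuilt_voice_config=PrebuiltVoiceConfig(voice_name="Puck")
        )
    ),
    system_instruction="You are super friendly, sometimes a bit too friendly.",
)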

During the conversation loop, you continuously receive new session IDs (handles) as the session gathers new information. Here’s how to retrieve the latest one:

if response.session_resumption_update:
    update = response.session_resumption_update
    if update.resumable and update.new_handle:
        # The handle should be retained and linked to the session.
        print(f"new SESSION: {update.new_handle}")

In a real-world application, this session ID would be stored in a cookie, local storage, or a database for later retrieval if the session drops.
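As a minimal sketch of that storage step, the class below keeps the most recent handle per user in a JSON file. `HandleStore`, the `user_id` key, and the file layout are hypothetical names chosen for illustration; they are not part of the Live API, and a production app would more likely use a database or a cookie as described above.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


class HandleStore:
    """Hypothetical helper: keeps the latest resumption handle per user in a JSON file."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def _read_all(self) -> dict:
        # A missing file simply means no handle has been stored yet.
        if not self.path.exists():
            return {}
        return json.loads(self.path.read_text(encoding="utf-8"))

    def save(self, user_id: str, handle: str) -> None:
        # Newer handles replace older ones; only the latest is kept.
        data = self._read_all()
        data[user_id] = handle
        self.path.write_text(json.dumps(data), encoding="utf-8")

    def load(self, user_id: str) -> Optional[str]:
        # Returns None when the user has no stored handle.
        return self._read_all().get(user_id)


if __name__ == "__main__":
    store = HandleStore(Path(tempfile.mkdtemp()) / "handles.json")
    store.save("user-1", "first-handle")
    store.save("user-1", "second-handle")
    print(store.load("user-1"))
```

In the receive loop you would call `store.save(user_id, update.new_handle)` each time a new handle arrives, and pass `store.load(user_id)` as the `handle` when building the configuration for a reconnect.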

Graceful Disconnect Notifications: Preparing for Closure

Sometimes, the server needs to close the connection. Instead of abruptly terminating it, the Live API sends a GoAway message to notify your application beforehand.

This message includes a timeLeft field, indicating how much time you have before the connection is closed. This allows your application to:

  • Save any critical data.
  • Inform the user about the impending disconnection.
  • Attempt to reconnect and resume the session, using the session ID provided by session resumption.

Here’s how to handle the GoAway message.

async for response in session.receive():
    if response.go_away is not None:
        # The connection will soon be terminated.
        print(response.go_away.time_left)
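To see how GoAway and session resumption can work together, here is a rough sketch of a reconnect loop. It deliberately does not call the real API: `open_stream`, `fake_stream`, and the `Fake*` classes are hypothetical stand-ins that only mimic the `go_away` and `session_resumption_update` fields used above, so the control flow can run offline. In a real app, `open_stream(handle)` would presumably wrap your Live API connection and yield its responses.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Callable, List, Optional


@dataclass
class FakeGoAway:  # stand-in for the GoAway message
    time_left: str


@dataclass
class FakeUpdate:  # stand-in for a session resumption update
    resumable: bool
    new_handle: Optional[str]


@dataclass
class FakeResponse:  # stand-in for a server response
    go_away: Optional[FakeGoAway] = None
    session_resumption_update: Optional[FakeUpdate] = None


async def run_with_resumption(
    open_stream: Callable[[Optional[str]], AsyncIterator[FakeResponse]],
    max_connections: int = 3,
) -> List[Optional[str]]:
    """Reconnects after each GoAway; returns the handle used for every connection."""
    handle: Optional[str] = None
    used: List[Optional[str]] = []
    for _ in range(max_connections):
        used.append(handle)
        got_go_away = False
        async for response in open_stream(handle):
            update = response.session_resumption_update
            if update and update.resumable and update.new_handle:
                handle = update.new_handle  # always remember the latest handle
            if response.go_away is not None:
                got_go_away = True  # server is about to close: reconnect now
                break
        if not got_go_away:
            break  # the stream ended normally, nothing to resume
    return used


async def fake_stream(handle: Optional[str]) -> AsyncIterator[FakeResponse]:
    # First connection: hands out a handle, then announces a disconnect.
    if handle is None:
        yield FakeResponse(session_resumption_update=FakeUpdate(True, "handle-1"))
        yield FakeResponse(go_away=FakeGoAway("5s"))
    else:
        # Resumed connection: issues a newer handle and ends normally.
        yield FakeResponse(session_resumption_update=FakeUpdate(True, "handle-2"))


if __name__ == "__main__":
    print(asyncio.run(run_with_resumption(fake_stream)))
```

The `max_connections` cap is a design choice for the sketch so that a misbehaving server cannot trap the client in an endless reconnect loop; a real application might add backoff or user-facing messaging at the same point.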

Context Window Compression for Long Conversations

Large language models have a limit to how much conversation history (or “context”) they can remember.

The Live API addresses this with context window compression. This feature automatically manages the context length, allowing extended interactions without hitting those limits.

Here’s how to enable it:

from google.genai import types

config = types.LiveConnectConfig(
    response_modalities=["AUDIO"],
    context_window_compression=(
        # Configures compression with default parameters.
        types.ContextWindowCompressionConfig(
            sliding_window=types.SlidingWindow(),
        )
    ),
)

With context window compression configured, the API uses a sliding-window mechanism to compress older parts of the conversation, so the model retains the most relevant information and the connection is not terminated abruptly. The number of tokens that triggers compression (triggerTokens) can be configured.
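To build some intuition for what a trigger threshold means, here is a toy model in plain Python. It is not the API's actual algorithm: it simply assumes that, once the total token count exceeds a trigger value, the oldest turns are dropped until the total falls to a target value. The function name, the `target_tokens` parameter, and the per-turn token counts are all made up for illustration.

```python
from typing import List, Tuple

Turn = Tuple[str, int]  # (text, token_count); token counts here are invented


def slide_window(turns: List[Turn], trigger_tokens: int, target_tokens: int) -> List[Turn]:
    """Toy sliding window: drop the oldest turns once the trigger is exceeded."""
    total = sum(tokens for _, tokens in turns)
    if total <= trigger_tokens:
        return list(turns)  # below the trigger: keep everything

    kept = list(turns)
    while kept and total > target_tokens:
        _, dropped_tokens = kept.pop(0)  # discard from the start of the history
        total -= dropped_tokens
    return kept


if __name__ == "__main__":
    history = [("intro", 40), ("shelf step 1", 40), ("shelf step 2", 40)]
    print(slide_window(history, trigger_tokens=100, target_tokens=60))
```

The takeaway is only the relationship between the two numbers: a lower trigger compresses sooner, and in this toy model the gap between trigger and target decides how much history is cut in one go.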

Get The Full Code 💻

The concepts and snippets discussed here will be integrated into our ongoing project. You can find the complete Python scripts and follow along with the developments in our GitHub repository. Check the websockets folder for the code for this article.

You Made It To The End. (Or Did You Just Scroll Down?)

Either way, I hope you enjoyed the article.

Got thoughts? Feedback? Discovered a bug while running the code? I’d love to hear about it.

  • Connect with me on LinkedIn. Let’s network! Send a connection request, tell me what you’re working on, or just say hi.
  • AND Subscribe to my YouTube Channel ❤️


Published in Google Cloud - Community


Written by Sascha Heyer

Hi, I am Sascha, Senior Machine Learning Engineer at @DoiT. Support me by becoming a Medium member 🙏 bit.ly/sascha-support