Multimodal Live API and WebSockets
A Simple Guide to Real-Time Communication, an essential building block for using Google's Live API.
Welcome back to this Multimodal Live API article series. 📚 To explore previous articles in this series, head over to the article series overview.
WebSockets play a crucial role when implementing applications with Google's Multimodal Live API, like the voice and video-enabled web app in our series.
- Our application uses WebSockets to establish the real-time audio stream between the user’s browser (frontend) and our Python server (backend).
- The Google Generative AI SDKs interact with the Live API using WebSockets as the underlying communication protocol.
Understanding WebSockets is therefore important for building our own application’s communication layer and appreciating the Live API's real-time nature.
Ready to demystify WebSockets with me? As an AI Engineer, I never expected WebSockets to land on my desk, but here they are. Let's embrace them, because they open the door to low-latency, real-time AI conversations.
Don’t expect a rocket-science deep dive into WebSockets. This article covers just enough to give you a good understanding of how we use them in this series.
What Are WebSockets?
Imagine you’re on a phone call. Once the call is connected, both you and the person on the other end can talk and listen simultaneously, or one can speak while the other listens. The line stays open until one of you hangs up.
WebSockets create a similar persistent, bidirectional communication channel between a client (typically a web browser) and a server over a single connection.
- Persistent: Unlike the traditional HTTP request-response cycle, where every exchange starts with a client request and ends with the response, a WebSocket connection remains open, allowing for continuous data exchange.
- Bidirectional: Once the connection is established, both the client and the server can send messages to each other independently at any time.
- Low latency: Because the connection is already open, messages arrive with less overhead than with repeated HTTP requests. This is ideal for applications needing real-time updates, like voice.
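To make the three properties above more tangible, here is a minimal sketch that uses plain asyncio TCP streams from the standard library, not WebSockets. It only illustrates the idea of one connection that stays open, where the server can speak first and several messages travel in both directions. The names handle and demo are made up for this illustration.

```python
import asyncio


async def handle(reader, writer):
    # The server speaks first: it does not have to wait for a request.
    writer.write(b"server: welcome\n")
    await writer.drain()
    # It then keeps answering on the same connection until the client hangs up.
    while line := await reader.readline():
        writer.write(b"server: got " + line)
        await writer.drain()
    writer.close()


async def demo():
    # Port 0 asks the operating system for any free port.
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    received = []
    async with server:
        reader, writer = await asyncio.open_connection("127.0.0.1", port)
        # First message arrives without the client having sent anything.
        received.append((await reader.readline()).decode().strip())
        # Several messages are exchanged over the one open connection.
        for text in ("hello", "still here"):
            writer.write(text.encode() + b"\n")
            await writer.drain()
            received.append((await reader.readline()).decode().strip())
        writer.close()
        await writer.wait_closed()
    return received
```

Calling asyncio.run(demo()) returns the welcome message followed by one reply per message sent, all over a single connection. WebSockets add a handshake and message framing on top of this idea so that it works from a browser.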
A Simple WebSocket Server with Python
Let’s create a very basic Python server using the websockets library. This server will wait for a client to connect. Once connected, it will receive messages from the client, print them to the console, and then send a reply back to the client.
pip install websockets
We start by importing the necessary libraries: asyncio for asynchronous programming and websockets.
import asyncio
import websockets
Next, we define an asynchronous function called handler. This function is executed for each new client that connects to our server. It takes two arguments: websocket, which represents the connection to the client, and path, which represents the path the client connected to. Recent versions of the websockets library pass only the connection, which is why path defaults to None. Inside this handler, we first print a message indicating a new client has connected.
# This handler is called for each new client connection
async def handler(websocket, path=None):
    print(f"Client connected from {websocket.remote_address}")
The core of our handler is an async for loop. This loop keeps running for as long as the connection is open, waiting for messages to arrive from the connected client, and ends when the client disconnects. Each message is printed to the server's console when it is received.
    # Keep the connection open and process messages
    async for message in websocket:
        print(f"Received from client: '{message}'")
The server constructs a reply after receiving and printing the client's message. In this simple case, we echo back what the server received. This reply is then sent back to the client using await websocket.send(reply).
        reply = f"Server received: '{message}'"
        await websocket.send(reply)
We need a main asynchronous function. This is where we set up and start our WebSocket server on port 8765 (a common choice for development).
The websockets.serve(handler, host, port) function starts the server, telling it to use our handler function for incoming connections. await asyncio.Future() keeps the server running indefinitely until it's manually stopped.
async def main():
    # Define the host and port for the server
    host = "localhost"
    port = 8765
    # Start the WebSocket server
    async with websockets.serve(handler, host, port):
        print(f"WebSocket server started on ws://{host}:{port}")
        await asyncio.Future()  # Run forever until interrupted

# Start the event loop and run the server when the script is executed
if __name__ == "__main__":
    asyncio.run(main())
I guess it's now time for some HTML & JavaScript
Let us create a basic HTML page with JavaScript to act as our client. We want to have a customer-facing web app, right?
This client will connect to our Python WebSocket server, allow the user to send messages through an input field, and display messages received from the server in a chat log.
<!DOCTYPE html>
<html lang="en">
<body>
    <h1>Simple WebSocket Chat</h1>
    <div id="chatLog">
        <div class="status">Attempting to connect to WebSocket server...</div>
    </div>
    <input type="text" id="messageInput" placeholder="Type message here...">
    <button onclick="sendMessage()">Send</button>
    <script>
        // JavaScript will go here
    </script>
</body>
</html>
Inside the <script> tags, we first get references to our HTML elements and also define variables like our serverUrl.
const chatLog = document.getElementById('chatLog');
const messageInput = document.getElementById('messageInput');
const serverUrl = 'ws://localhost:8765'; // Ensure this matches your server
let websocket;
With a little helper function, we append messages to our chatLog. This is basically just writing our messages back to the frontend.
function logMessage(message, type = 'status') {
    const messageDiv = document.createElement('div');
    messageDiv.classList.add('message');
    if (type === 'client') {
        messageDiv.classList.add('client-message');
        messageDiv.textContent = `You: ${message}`;
    } else if (type === 'server') {
        messageDiv.classList.add('server-message');
        messageDiv.textContent = `Server: ${message}`;
    } else { // status or error
        messageDiv.classList.add('status');
        messageDiv.textContent = message;
    }
    chatLog.appendChild(messageDiv);
    chatLog.scrollTop = chatLog.scrollHeight; // Auto-scroll to bottom
}
The connect() function initiates the WebSocket connection using new WebSocket(serverUrl).
It then sets up essential event handlers: onopen for when the connection succeeds, onmessage to process incoming data from the server, onclose for when the connection terminates, and onerror to catch any connection errors. Each handler logs relevant information.
function connect() {
    websocket = new WebSocket(serverUrl);
    websocket.onopen = function(event) {
        logMessage('Connected to WebSocket server!', 'status');
        console.log('WebSocket connection opened:', event);
    };
    websocket.onmessage = function(event) {
        const messageFromServer = event.data;
        logMessage(messageFromServer, 'server');
        console.log('Message received from server:', messageFromServer);
    };
    websocket.onclose = function(event) {
        logMessage('Disconnected from WebSocket server.', 'status');
        console.log('WebSocket connection closed:', event.code, event.reason);
        // Optionally, try to reconnect here
    };
    websocket.onerror = function(event) {
        // Fires when the connection fails or breaks unexpectedly
        logMessage('WebSocket error. See the console for details.', 'status');
        console.error('WebSocket error:', event);
    };
}
The sendMessage() function is called when the "Send" button is clicked. It checks that the WebSocket is connected and open, skips empty input, and sends the message to the server using websocket.send(message).
function sendMessage() {
    if (websocket && websocket.readyState === WebSocket.OPEN) {
        const message = messageInput.value;
        if (message.trim() === "") return; // Don't send empty messages
        websocket.send(message);
        logMessage(message, 'client');
        console.log('Sent message to server:', message);
        messageInput.value = ''; // Clear input field
    } else {
        logMessage('WebSocket is not connected. Cannot send message.', 'status');
    }
}
Finally, we call connect() when the page loads, so the client automatically attempts to establish the WebSocket connection.
// Attempt to connect when the page loads
connect();
How They Talk with Each Other
When index.html loads, new WebSocket('ws://localhost:8765') initiates a connection. This starts with an HTTP GET request from the browser to the server, but with special headers (Upgrade: websocket, Connection: Upgrade).
If the server understands WebSockets and agrees, it responds with an HTTP 101 Switching Protocols status, and the connection is "upgraded" from HTTP to the WebSocket protocol.
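The upgrade described above includes one detail not mentioned so far: according to RFC 6455, the browser also sends a random Sec-WebSocket-Key header, and the server proves that it speaks WebSocket by answering with a matching Sec-WebSocket-Accept value. The websockets library does this for us, so the following is only an illustrative sketch of that calculation. The function names accept_value and handshake_response are made up for this example.

```python
import base64
import hashlib

# Fixed GUID that RFC 6455 tells every server to append to the client's key.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_value(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value for a client's key."""
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def handshake_response(sec_websocket_key: str) -> str:
    """Build a minimal 101 response that upgrades the connection."""
    return (
        "HTTP/1.1 101 Switching Protocols\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Accept: {accept_value(sec_websocket_key)}\r\n"
        "\r\n"
    )
```

For the sample key from the RFC, dGhlIHNhbXBsZSBub25jZQ==, accept_value returns s3pPLMBiTxaQ9kYGzzhZRbK+xOo=. After this response, both sides stop speaking HTTP and exchange WebSocket messages on the same connection.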
When you type a message and click “Send,” websocket.send(message) sends your text directly over the open WebSocket connection to the Python server.
The Python server's async for loop receives your message, builds a reply, and sends it back to the browser with websocket.send(reply) over the same open connection.
The browser's onmessage event handler fires, receiving and displaying the server's reply.
This cycle can repeat as many times as needed, with either side initiating messages, until the connection is closed by either the client or the server, or due to a network issue.
While using the Google Gen AI SDK is the recommended and generally more straightforward approach, it’s also possible to communicate directly with the Live API’s WebSocket endpoint if you have specific needs or want to understand the raw protocol.
To start a session by directly connecting to the WebSocket, you would use an endpoint similar to:
wss://generativelanguage.googleapis.com/ws/google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent
API details can be found at ai.google.dev/api/live. Interacting directly with this endpoint requires careful handling of authentication, message formatting (JSON messages that follow the schema in the API reference), and the entire session lifecycle, which the SDKs manage for you. Be cautious here.
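To give a rough idea of what the first step of a direct connection could look like, here is a small sketch that only builds the connection URL and an initial setup message, without opening a connection. Treat all of it as an assumption to verify against ai.google.dev/api/live: passing the API key as a key query parameter, starting the session with a JSON setup message that names the model, and the helper names build_live_api_url and build_setup_message are illustrative and may not match the current API.

```python
import json
from urllib.parse import urlencode

# Endpoint as quoted in this article.
LIVE_API_ENDPOINT = (
    "wss://generativelanguage.googleapis.com/ws/"
    "google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent"
)


def build_live_api_url(api_key: str) -> str:
    # Assumption: the API key is passed as a "key" query parameter.
    return f"{LIVE_API_ENDPOINT}?{urlencode({'key': api_key})}"


def build_setup_message(model: str) -> str:
    # Assumption: the first message of a session is a JSON object with a
    # "setup" field naming the model. Check the API reference for the schema.
    return json.dumps({"setup": {"model": model}})
```

A client would open a WebSocket to the URL, send the setup message first, and only then start streaming content. The SDK takes care of these steps, which is one reason it is the recommended route.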
Conclusion
Understanding this fundamental client-server WebSocket interaction and the broader concept of persistent streaming connections is key to building the next generation of dynamic web experiences, especially when working with Google’s Multimodal Live API.
We will use WebSockets to finally give our solution a web UI, which is probably what you have all been waiting for, so stay tuned for the following articles.
Here is another little sneak peek of what this will look like.
Get The Full Code 💻
The concepts and snippets discussed here will be integrated into our ongoing project. You can find the complete Python scripts and follow along with the developments in our GitHub repository. Check the websockets folder for the code for this article.
You Made It To The End. (Or Did You Just Scroll Down?)
Either way, I hope you enjoyed the article.
Got thoughts? Feedback? Discovered a bug while running the code? I’d love to hear about it.