When Users Message Faster Than AI Can Think — part 3
TL;DR: In parts one and two of this series, we explored how to use Redis to solve a common problem for GenAI agents: handling users who send multiple messages rapidly, as they might on WhatsApp. This third and final part presents two architectures based on Google Cloud Platform services like Memorystore, highlighting its advantages for production-ready solutions, like high availability and automatic failover.
Secure your Redis and make it frictionless for you and your customers — Memorystore
Transitioning from an idea or a proof-of-concept (PoC) to a production-ready solution is inherently complex. One often overlooked aspect is the cost of ensuring observability and ongoing maintenance. This includes integrating a monitoring solution, meeting availability requirements, and applying security updates.
Customers may lack a team with the necessary expertise or may be reluctant to invest in professional services. This is where Memorystore comes in. Memorystore is a fully managed service from Google Cloud that supports various engines for managing in-memory databases, including Redis instances and Redis clusters, both based on the OSS version of Redis. In my view, Memorystore offers several essential features for any production-ready system:
High Availability
Memorystore offers a 99.99% availability SLA for both Redis instances and Redis clusters. Insane!
Under the hood, Google built high availability into the core of the service, instead of bolting it on with an external tool like Sentinel as the OSS version does. If you are curious about the implementation details, there is an excellent Google Cloud blog post that covers the topic in depth: Zero-downtime scaling in Memorystore for Redis Cluster | Google Cloud Blog
Monitoring
Memorystore is also integrated with Cloud Monitoring, providing valuable insights into your Redis cluster’s performance and health. It automatically collects key metrics such as the number of connected clients, the total number of keys stored, and resource utilization like memory and CPU usage. Moreover, to proactively identify and address potential issues, you can configure alert policies directly from the Memorystore for Redis page in the Google Cloud console.
Encryption in-transit and at-rest
Memorystore for Redis makes it really easy to configure in-transit encryption using TLS. This means all communication between Redis clients and the cluster can be secured, preventing unauthorized access by blocking any Redis clients that aren’t configured for TLS.
Memorystore also supports customer-managed encryption keys (CMEK) to encrypt data at rest with keys that you control.
Replication and Failover
Memorystore for Redis cluster ensures your application remains highly available even in the face of unexpected disruptions. With automatic failover within a shard, if the primary node experiences an issue or undergoes maintenance, a replica seamlessly takes over with minimal impact on your application.
Since I implemented the solutions described in this series for business-critical applications, this feature is a must for me!
Considering Memorystore’s capabilities, the patterns discussed in part 2 were implemented with Memorystore at the core of the solution, as detailed below.
Architecting the synchronous aggregation solution on Google Cloud
As discussed in Part 2, when you lack control over your application’s frontend, you must adapt to existing mechanisms. In recent months, I had the opportunity to integrate our aggregation mechanism into a customer support solution where the frontend only supported synchronous HTTP webhooks. This meant that when a user sent a message, the webhook URL was contacted, but the actual response had to be returned within the same HTTP request; otherwise, the end user wouldn’t receive anything in their chat.
Let’s delve into the architecture by exploring its key components:
Message middleware API
This serves as the entry point of our backend, handling requests from the third-party frontend. To prevent unauthorized access and misuse, we protected the API with OAuth2 authentication and a Web Application Firewall (WAF). This component writes each message segment to Memorystore and waits a specified duration for new messages. If a new message arrives (indicated by a newer last-message ID in Memorystore), the current HTTP request receives an empty response, which the third-party frontend is designed to ignore. Otherwise, the component makes a synchronous HTTP request to the message concatenator.
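The wait-and-check logic above can be sketched as follows. This is a minimal illustration, not the production code: a plain dict stands in for Memorystore, the fixed `time.sleep` stands in for however the real middleware waits, and names like `handle_message` and the key layout (`segments:<user>`, `last_id:<user>`) are my own for this example.

```python
import time


def handle_message(store, user_id, msg_id, text, wait_s=2.0, concatenate=None):
    """Buffer one message segment, then wait to see whether the user
    keeps typing before triggering aggregation.

    `store` is any mapping-like client (the real system uses Memorystore);
    `concatenate` is the call-through to the message concatenator.
    """
    # Append the segment and record this message as the latest one seen.
    store.setdefault(f"segments:{user_id}", []).append((msg_id, text))
    store[f"last_id:{user_id}"] = msg_id

    # Wait a fixed window for newer messages from the same user.
    time.sleep(wait_s)

    # A newer segment arrived meanwhile: answer with an empty response,
    # which the third-party frontend is configured to ignore.
    if store[f"last_id:{user_id}"] != msg_id:
        return None

    # This request holds the latest message, so it triggers aggregation
    # synchronously and returns the real answer up the HTTP call stack.
    return concatenate(user_id) if concatenate else ""
```

Only the request carrying the most recent segment ever reaches the concatenator; every earlier request resolves to the empty response the frontend discards.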
Message Concatenator
Once triggered, this API retrieves all message segments associated with a specific user from Memorystore and aggregates them. Additionally, it uses the Sensitive Data Protection API to de-identify sensitive data before storage, and it encrypts the data in transit to the AI Agent Backend, which generates a response to the user’s request.
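The aggregation step itself is simple, but ordering matters: segments must be merged by message ID so that out-of-order writes don’t scramble the user’s text. A minimal sketch, again with a dict standing in for Memorystore and the Sensitive Data Protection call injected as a plain callable (both the function name and the key layout are illustrative):

```python
def aggregate_segments(store, user_id, deidentify=lambda text: text):
    """Collect all buffered segments for a user and de-identify the
    merged text before it leaves the service.

    `store` is any mapping-like client standing in for Memorystore;
    `deidentify` stands in for the Sensitive Data Protection API call.
    """
    # Pop (read and delete) the buffered segments for this user.
    segments = store.pop(f"segments:{user_id}", [])

    # Sort by message ID so the merged text follows the order in which
    # the user actually sent the segments.
    merged = " ".join(text for _, text in sorted(segments))

    return deidentify(merged)
```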
AI Backend
Lastly, this service uses the Sensitive Data Protection API to re-identify the relevant information within the user query and then generates a response using Gemini. The response is relayed back to the third-party chat UI through the HTTP call stack, ultimately reaching the user.
Architecting the asynchronous aggregation solution on Google Cloud
The asynchronous aggregation solution differs primarily in the decoupling of components and how AI-generated responses reach the user.
For a client, we developed a Flutter-based chat application that allows users to interact with a virtual agent. As discussed in Part 2, leveraging the asynchronous aggregation pattern requires asynchronous communication between the frontend and backend. To achieve this, we integrated the application with a Firebase Realtime Database (with read-only permissions). This database securely stores encrypted user messages, which can only be decrypted using the user’s key.
Message middleware API
We repurposed the message middleware API from the synchronous solution, removing the waiting mechanism. The API now follows a fire-and-forget pattern, consistently returning a success code to the frontend upon receiving a new message.
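The fire-and-forget variant reduces the middleware to two writes: append the segment, then (re)arm an expiring trigger key whose TTL acts as the debounce window. A sketch under the same caveats as before: `buffer_message`, the key names, and the fake-client interface are illustrative, and in the real system `client` is a Memorystore Redis cluster client.

```python
def buffer_message(client, user_id, text, debounce_s=3):
    """Fire-and-forget handler: store the segment, (re)arm the expiring
    trigger key, and immediately acknowledge the frontend.

    `client` is any object exposing redis-like rpush/set methods.
    """
    # Keep segments in a per-user list.
    client.rpush(f"segments:{user_id}", text)

    # Re-setting the trigger key resets its TTL, so the key only expires
    # (and only fires a notification) once the user stops sending
    # messages for `debounce_s` seconds.
    client.set(f"trigger:{user_id}", "1", ex=debounce_s)

    return {"status": "accepted"}  # the frontend gets an immediate 200
```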
Again, Memorystore forms the core of the solution. This time, however, we enabled keyspace notifications on the Redis cluster, allowing it to alert the aggregator when a key expires. This triggers the message concatenator asynchronously.
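On the consuming side, a subscriber listens on the `__keyevent@<db>__:expired` channel (Redis publishes the name of each expired key there once `notify-keyspace-events` includes at least `Ex`) and reacts only to trigger keys. The sketch below shows just the per-message handler, with the pubsub message passed in as a dict so it can be exercised without a live Redis; the `trigger:` prefix and `handle_expired_event` name are my own, and a real deployment would feed it from something like redis-py’s `pubsub().psubscribe("__keyevent@*__:expired")` loop.

```python
TRIGGER_PREFIX = "trigger:"


def handle_expired_event(message, on_user_idle):
    """React to one Redis keyspace notification.

    `message` is a pubsub message dict as delivered by a pattern
    subscription; `on_user_idle` is the callback that kicks off the
    message concatenator for the now-idle user.
    """
    if message.get("type") != "pmessage":
        return None  # ignore subscribe confirmations and other noise

    key = message["data"]  # for expired events, the payload is the key name
    if isinstance(key, bytes):
        key = key.decode()

    if not key.startswith(TRIGGER_PREFIX):
        return None  # some unrelated key expired

    user_id = key[len(TRIGGER_PREFIX):]
    on_user_idle(user_id)
    return user_id
```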
Message Concatenator
Upon receiving the notification from Memorystore, the message concatenator retrieves the message segments, performs de-identification and encryption, and then publishes the result to a Pub/Sub topic with a push subscription, decoupling the solution even further.
AI Backend
The AI Backend, subscribed to this Pub/Sub topic, functions identically to its synchronous counterpart. However, it writes its output to Firebase’s Realtime Database, making the response visible to the user in the chat application.
Conclusions
This concludes our series. In this final part we have seen that, by leveraging Memorystore for Redis and thoughtfully designing our architecture, it is possible to build an efficient message-buffering solution that integrates seamlessly with other Google Cloud services. Whether you opt for a synchronous or an asynchronous approach, Memorystore’s capabilities provide benefits that are essential for a production-ready solution.
I hope you enjoyed the journey through this series of stories. For any questions, feel free to contact me.