GCP: How to Achieve High-Performance Synchronous Pull with Pub/Sub and Spring Boot

Hesham Hussen
Published in Javarevisited
4 min read · Sep 10, 2021

GCP PubSub episode

Intro

Google Cloud Platform (GCP) offers a suite of cloud computing services that run on the same infrastructure Google uses internally for its end-user products.

To power messaging and ingestion for event-driven systems and streaming analytics, GCP offers the Pub/Sub service. It provides scalable, in-order message delivery with pull and push modes, auto-scaling, auto-provisioning, and many other cool features. For more info, visit the docs.

For subscribing to a Pub/Sub topic, Google Cloud offers two subscription models. The first is Streaming Pull, which is great if you need high throughput and don't have much processing to do for the messages you receive.

Then comes Synchronous Pull. It is a lot slower than Streaming Pull, but it is better suited if you have processing logic that must run against a capped batch of messages: to be precise, at most 1,000 messages per pull request. In this article, I will introduce a way to make this option really fast while still enjoying the benefits that come with it.

Use Case

My team at Hermes works mainly with messages and brokers; we move them from A to B, or from wherever to wherever, all the time. A while ago, I took on a task where we needed to connect to a topic in Pub/Sub where the messages reside and move them to our ActiveMQ broker.

The most important acceptance criterion for that task was speed: no fewer than 10 million messages per hour.

This means that messages in Pub/Sub could ONLY be acknowledged once they were successfully committed to our ActiveMQ broker. So I had to pull messages from Pub/Sub, send them to the broker, commit that transaction successfully, and only then acknowledge the messages.

Design

As I mentioned earlier, each synchronous pull request to Pub/Sub can't retrieve more than 1,000 messages, so on its own this would be too slow to fulfill my criteria. Fortunately, the docs mention the following:

Note that to achieve low message delivery latency with synchronous pull, it is important to have many simultaneously outstanding pull requests. As the throughput of the topic increases, more pull requests are necessary.

The keyword here is simultaneous pull requests. So I decided to design the task around a ThreadPoolExecutor to have as many simultaneous pull requests as I want.

Thread Pool Executor diagram

The thread pool executor lets you define a pool of threads. These threads sit there waiting for work (tasks); you submit tasks to the pool, and a free thread picks a task from the queue and starts working on it. The cool thing about this setup is that all of it is already implemented for you: you only need two lines of code to set it up.
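A minimal sketch of that setup with plain `java.util.concurrent` (the pool size and the task here are just examples):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ExecutorExample {
    public static void main(String[] args) throws InterruptedException {
        // The two essential lines: create a fixed pool and submit a task to it.
        ExecutorService executor = Executors.newFixedThreadPool(4);
        executor.submit(() -> System.out.println("running on " + Thread.currentThread().getName()));

        // Shut the pool down once no more tasks will be submitted.
        executor.shutdown();
        executor.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```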

Implementation

With the executor service at my disposal, I put the subscribing code in a Task, which is just a class that implements the Runnable interface. It looked something like this.
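The original snippet is not reproduced here, so the following is only a sketch of such a Task. The class name `PubSubPullTask` and the `pullAndForward` helper are my own invention, and I assume the google-cloud-pubsub client library:

```java
import com.google.cloud.pubsub.v1.stub.GrpcSubscriberStub;
import com.google.cloud.pubsub.v1.stub.SubscriberStub;
import com.google.cloud.pubsub.v1.stub.SubscriberStubSettings;

// Hypothetical task class: each instance performs one synchronous pull against Pub/Sub.
public class PubSubPullTask implements Runnable {

    private final String subscriptionName; // e.g. "projects/<project>/subscriptions/<sub>"

    public PubSubPullTask(String subscriptionName) {
        this.subscriptionName = subscriptionName;
    }

    @Override
    public void run() {
        SubscriberStub subscriber = null;
        try {
            // Create a gRPC subscriber stub for this task.
            subscriber = GrpcSubscriberStub.create(SubscriberStubSettings.newBuilder().build());
            pullAndForward(subscriber); // pull, process, acknowledge
        } catch (Exception e) {
            // log the failure and let the pool carry on with the next task
        } finally {
            // Shutting the stub down is essential to avoid resource leaks.
            if (subscriber != null) {
                subscriber.close();
            }
        }
    }

    private void pullAndForward(SubscriberStub subscriber) {
        // pull from `subscriptionName`, forward to the broker, then ack
    }
}
```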

It simply creates a subscriber, starts pulling from Pub/Sub, and when finished, shuts the subscriber down. The shutdown step is very important to make sure you have no leaks.

To pull from Pub/Sub you will need something that looks like this:
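Again as a sketch rather than the original snippet: with the google-cloud-pubsub `SubscriberStub`, a synchronous pull could look like the following. The method name `pullOnce` is my own, and the ActiveMQ forwarding is only indicated in a comment:

```java
import com.google.cloud.pubsub.v1.stub.SubscriberStub;
import com.google.pubsub.v1.AcknowledgeRequest;
import com.google.pubsub.v1.PullRequest;
import com.google.pubsub.v1.PullResponse;
import com.google.pubsub.v1.ReceivedMessage;
import java.util.ArrayList;
import java.util.List;

public class PullExample {

    // One synchronous pull: fetch up to 1,000 messages, process them, then ack.
    static void pullOnce(SubscriberStub subscriber, String subscriptionName) {
        PullRequest pullRequest = PullRequest.newBuilder()
                .setSubscription(subscriptionName) // "projects/<project>/subscriptions/<sub>"
                .setMaxMessages(1000)              // the per-request cap mentioned above
                .build();

        PullResponse pullResponse = subscriber.pullCallable().call(pullRequest);

        List<String> ackIds = new ArrayList<>();
        for (ReceivedMessage message : pullResponse.getReceivedMessagesList()) {
            // forward message.getMessage() to ActiveMQ here; collect the ack id
            // only once the send/commit has succeeded
            ackIds.add(message.getAckId());
        }

        if (!ackIds.isEmpty()) {
            // Acknowledge only after the messages are safely committed downstream.
            AcknowledgeRequest ackRequest = AcknowledgeRequest.newBuilder()
                    .setSubscription(subscriptionName)
                    .addAllAckIds(ackIds)
                    .build();
            subscriber.acknowledgeCallable().call(ackRequest);
        }
    }
}
```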

That is it for organizing the subscription logic into Tasks, where each task is one pull request to Pub/Sub. What is left is to submit these tasks to the thread pool executor and start pulling.
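Submitting is then just a loop over the pool. To keep this sketch self-contained, a simple counter stands in for the real pull task:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SubmitTasksExample {
    public static void main(String[] args) throws InterruptedException {
        int simultaneousPulls = 4; // tune to your cores and your downstream broker
        ExecutorService pool = Executors.newFixedThreadPool(simultaneousPulls);

        AtomicInteger completed = new AtomicInteger();
        // In the real setup, each submitted task would be one synchronous pull
        // against Pub/Sub; here a counter increment stands in for that work.
        for (int i = 0; i < simultaneousPulls; i++) {
            pool.submit(completed::incrementAndGet);
        }

        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println("completed tasks: " + completed.get());
    }
}
```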

I reached about 30 million messages per hour with this setup, which was really cool.

Note on Multithreading

It is important to understand the limits of your system to make use of such a setup. A CPU core can roughly run one thread at a time (if hyperthreading is not activated), so if the pod where you will eventually deploy your application runs with one core and you configure a thread pool with 10 threads, that is a waste of resources and is essentially pointless.
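If you want to derive the pool size from the hardware instead of hard-coding it, the JDK can tell you how many cores are available:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PoolSizing {
    public static void main(String[] args) {
        // availableProcessors() counts logical cores (hyperthreads included).
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, cores));
        System.out.println("pool sized to " + cores + " threads");
        pool.shutdown();
    }
}
```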

It is also important to know that more threads do not necessarily mean more speed; most of the time the bottleneck is somewhere else. In our case, for example, one could find that performance is a bit slow and try to optimize it by running 20 Tasks simultaneously (20 threads). That means about 20K messages being sent to the ActiveMQ broker at the same time.

While this could work from the Pub/Sub end, it may be a problem for ActiveMQ to handle such a load. In that case, the optimization has been applied to the wrong part of the system. A great book about this topic and many more is Java Performance: The Definitive Guide.

Feel free to share your thoughts about the article and its content, and if you find it interesting or useful, clap and share.
