Fine-tuning Pub/Sub performance with batch and flow control settings

Chirag Shankar
Google Cloud - Community
Dec 14, 2020

Google Cloud Pub/Sub is often used as an event ingestion and delivery service for large-scale streaming analytics pipelines. When designing large-scale pipelines with downstream dependencies, it's important to take publisher batching and subscriber flow control properties into consideration to find the right balance among cost, latency, and throughput when utilizing Cloud Pub/Sub in your service. Within the context of this article, latency refers to the time it takes for downstream dependencies to receive published data, and throughput refers to the number of messages published per second.

Batching

A batch, within the context of Cloud Pub/Sub, refers to a group of one or more messages published to a topic by a publisher in a single publish request. Batching is performed by default in the client library or can be configured explicitly by the user. The purpose of this feature is to allow for higher message throughput while also providing a more efficient way for messages to travel through the various layers of the service. Adjusting the batch size (i.e., how many messages or bytes are sent per publish request) is one way to achieve the desired level of throughput.

Ideally, if cost is not a consideration and the use case requirement is met, instances of the publisher can be created on an as-needed basis with batching disabled. This minimizes latency and maximizes throughput by scaling horizontally on the number of publishers.
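
As a rough sketch of what this looks like in the Java client library (the project and topic names below are placeholders), batching can be disabled so that each publish() call maps to its own request:

    import com.google.api.gax.batching.BatchingSettings;
    import com.google.cloud.pubsub.v1.Publisher;
    import com.google.pubsub.v1.TopicName;

    public class NoBatchingPublisher {
      public static void main(String[] args) throws Exception {
        // Placeholder names; substitute your own project and topic.
        TopicName topic = TopicName.of("my-project", "my-topic");

        // Disable batching entirely: each publish() call becomes its own
        // request, minimizing latency at the cost of more billable requests.
        Publisher publisher =
            Publisher.newBuilder(topic)
                .setBatchingSettings(
                    BatchingSettings.newBuilder().setIsEnabled(false).build())
                .build();

        // ... publish messages, then release resources.
        publisher.shutdown();
      }
    }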

Note: 1000 bytes is the minimum request size, so if a request is smaller than that, it will be rounded up to 1000 bytes for cost purposes.

However, in most cases cost is a consideration, and sending multiple messages in a single publish request is one way to reach equivalent throughput with fewer publishers. Since messages are held in order to fill batches, this can result in increased latency.

Batching Features

Features specific to batching on the publisher side include setElementCountThreshold(), setRequestByteThreshold(), and setDelayThreshold(), all part of setBatchSettings() on a publisher client (the naming varies slightly across client libraries). These settings can be used to fine-tune batching behavior and find a better balance among cost, latency, and throughput.

setElementCountThreshold() and setRequestByteThreshold() control the maximum size of a publish request by specifying the maximum number of messages or bytes. The goal is to find the right balance between high throughput and the cost associated with subscriber resources needed to handle the high throughput.

Note: A single publish request (batch) can contain at most 1000 messages or 10 MB.

setDelayThreshold() provides flexibility to control how long to wait before sending a batch, specifically the amount of time messages are held in order to fill a batch. Decreasing this value improves latency, while increasing it makes full batches more likely, which suits applications that are less sensitive to latency.
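
As an illustration in the Java client library (the threshold values below are arbitrary, not recommendations), a batch is sent as soon as any one of the three limits is reached:

    import com.google.api.gax.batching.BatchingSettings;
    import com.google.cloud.pubsub.v1.Publisher;
    import com.google.pubsub.v1.TopicName;
    import org.threeten.bp.Duration;

    public class BatchingPublisher {
      public static void main(String[] args) throws Exception {
        TopicName topic = TopicName.of("my-project", "my-topic"); // placeholders

        // A batch is published once it holds 100 messages, or 1000 bytes,
        // or 10 ms have elapsed since the first message, whichever comes first.
        BatchingSettings batchingSettings =
            BatchingSettings.newBuilder()
                .setElementCountThreshold(100L)
                .setRequestByteThreshold(1000L)
                .setDelayThreshold(Duration.ofMillis(10))
                .build();

        Publisher publisher =
            Publisher.newBuilder(topic)
                .setBatchingSettings(batchingSettings)
                .build();
      }
    }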

An example of these batching properties can be found in the Publish with batching settings documentation.

Flow Control

Data pipelines often receive sporadic spikes in published traffic, which can overwhelm subscribers as they try to catch up. The usual response to high published throughput on a subscription is to dynamically autoscale subscriber resources to consume more messages. However, this can incur unwanted costs (for instance, you may need more VMs), which in turn requires additional capacity planning.

Flow control features on the subscriber side help protect the pipeline from this unhealthy behavior by allowing the subscriber to regulate the rate at which messages are ingested. They also adjust how sensitive the service is to sudden spikes or drops in published throughput, mitigating the unwanted costs of auto-scaling to sustain higher throughput.

Ideally, a service with limitless subscriber capacity could minimize latency and maximize throughput by scaling horizontally on subscriber resources. In most cases, though, services are limited by subscriber capacity, and flow control lets the user define what that capacity is. These flow control features can be used to find an optimal balance between throughput and end-to-end latency.

Flow Control Features

Some features that are helpful for adjusting flow control and other settings on the subscriber are setMaxOutstandingElementCount(), setMaxOutstandingRequestBytes(), and setMaxAckExtensionPeriod() (again, the naming varies slightly in the different client libraries).

setMaxOutstandingElementCount() and setMaxOutstandingRequestBytes() are part of setFlowControlSettings() on a subscriber client. They cap, respectively, the number and the total size in bytes of messages whose acks or nacks have not yet been received by Pub/Sub. Once either limit is hit, the client stops pulling more messages until some of the already-pulled messages are acked. This provides a way to align throughput with the cost of running more subscribers.

setMaxAckExtensionPeriod() on a subscriber client sets the maximum amount of time Pub/Sub will wait for an ack or nack. It prevents messages from getting stuck on any one subscriber client by letting Pub/Sub redeliver them to another client after that period. The Pub/Sub client libraries automatically call modifyAckDeadline() on messages they are processing until the max ack extension period, which defaults to 1 hour, passes. If a message is neither acked nor nacked within this period, it expires and becomes eligible for redelivery.
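
A sketch of both settings on a Java subscriber client (the limits and names below are placeholders chosen for illustration):

    import com.google.api.gax.batching.FlowControlSettings;
    import com.google.cloud.pubsub.v1.MessageReceiver;
    import com.google.cloud.pubsub.v1.Subscriber;
    import com.google.pubsub.v1.ProjectSubscriptionName;
    import org.threeten.bp.Duration;

    public class FlowControlledSubscriber {
      public static void main(String[] args) {
        ProjectSubscriptionName subscription =
            ProjectSubscriptionName.of("my-project", "my-subscription");

        MessageReceiver receiver =
            (message, consumer) -> {
              // ... process the message, then ack it.
              consumer.ack();
            };

        Subscriber subscriber =
            Subscriber.newBuilder(subscription, receiver)
                // Hold at most 1000 messages or 100 MiB un-acked at a time.
                .setFlowControlSettings(
                    FlowControlSettings.newBuilder()
                        .setMaxOutstandingElementCount(1_000L)
                        .setMaxOutstandingRequestBytes(100L * 1024L * 1024L)
                        .build())
                // Stop extending ack deadlines after 30 minutes so stuck
                // messages become eligible for redelivery sooner.
                .setMaxAckExtensionPeriod(Duration.ofMinutes(30))
                .build();

        subscriber.startAsync().awaitRunning();
      }
    }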

Examples of these settings being used can be found in the Subscribe with flow control documentation.

Flow Control & Batching

While flow control features work on the subscriber side and batching features work on the publisher side, the two can be used in conjunction to trade off cost, latency, and throughput and find the balance that best satisfies the service requirements.

A service that can create publishers and subscribers on an as-needed basis yields low end-to-end latency and high throughput at the expense of higher costs associated with more compute resources. Batching and flow control optimizations can be considered to keep Cloud Pub/Sub service costs within budget.

Batching and flow control both optimize cost by limiting resources; however, they do so in opposite ways. Batching focuses on providing higher throughput with fewer publisher resources, while flow control focuses on reducing subscriber resources at the expense of throughput. Similarly, both features trade away latency, which is dictated by resources, in order to reduce cost. When using these features, it is important to examine how the use case benefits from these optimizations.

The benefit of batching is higher throughput, but this is only effective if the subscriber resources required for downstream processing can support it. Depending on the use case, subscribers can be created as needed to maximize throughput and minimize end-to-end latency, since more work can be distributed across a larger pool of compute resources. However, when published throughput is hard to predict and budgetary restrictions apply, spikes or sustained periods of high throughput can be costly and cause unhealthy pipeline behavior, leading to undesired latency and downstream impact. This unwanted behavior can be mitigated with flow control features, which restrict compute resources and thereby cap the rate at which messages are received. Generally speaking, this gives you control over how your subscribers react to periods of high throughput, at the cost of increased latency.

Message Redelivery & Duplication Rate

There is, however, one place where the publisher's batching configuration can affect subscribers: all messages in a batch must be acknowledged before the batch's acknowledgement deadline, or the entire batch is re-queued for delivery. While the server caches individual message acknowledgements in memory to try to avoid redelivery, the longer it takes to ack messages, the more likely it is that the subscriber will move to a different server, or that the server will restart and lose state. Records of previously acked messages within these unacknowledged batches are then lost, which may result in a higher rate of duplicates. Because Cloud Pub/Sub guarantees at-least-once message delivery, a duplication rate of ~0.1% is expected. If the duplication rate rises past this expected value, there are a few things that can be done to better tune the service:

  • Decrease ack latency by ack’ing messages at a higher rate.
  • Increase the ack deadline by adjusting setMaxAckExtensionPeriod() on a subscriber client. This gives subscribers more time to process messages.
  • Check for bugs in the code where some messages are not getting acked, often the result of uncaught exceptions. To debug this, it can be helpful to log a message ID as soon as the message is received and then log the same ID right after calling ack() on the message, as in the sketch after this list. Make sure every message received has a corresponding ack() call.
  • Publishing in smaller batches can help if there is a large variance in ack latency among messages, or if some messages cannot be acked at all. Smaller batches reduce the number of messages that must be acked to mark an entire batch as acknowledged.
  • Reduce setMaxOutstandingElementCount() or setMaxOutstandingRequestBytes() if the subscriber is too slow to ack messages (typically when several operations are performed between message pull and message ack). Having fewer messages to process as well as ack within the same amount of time prevents messages from expiring too early and getting redelivered.
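
For the logging suggestion above, a minimal Java sketch might look like the following, where process() is a hypothetical stand-in for the subscriber's downstream work:

    import com.google.cloud.pubsub.v1.MessageReceiver;
    import com.google.pubsub.v1.PubsubMessage;

    public class AckLoggingReceiver {
      // Hypothetical downstream work; replace with the real processing step.
      static void process(PubsubMessage message) { /* ... */ }

      static final MessageReceiver RECEIVER =
          (message, consumer) -> {
            // Log the ID as soon as the message arrives...
            System.out.println("received " + message.getMessageId());
            try {
              process(message);
              consumer.ack();
              // ...and again right after acking, so unmatched IDs stand out.
              System.out.println("acked " + message.getMessageId());
            } catch (Exception e) {
              // An uncaught exception would otherwise skip the ack silently;
              // nack instead so the message is redelivered promptly.
              consumer.nack();
              System.out.println("nacked " + message.getMessageId());
            }
          };
    }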

In summary, the goal is to use message size, publisher throughput, and subscriber processing latency to optimize for cost, latency, and throughput. These optimizations come from a combination of tradeoffs among the number of subscribers, flow control, and batching.
