Taking GCP PubSub reliability out for a spin

Sushil Kumar
Google Cloud - Community
8 min read · Jul 11, 2023

Google’s PubSub service is a serverless offering with built-in reliability. It also provides failover mechanisms for zonal and regional failures. In theory everything should just work, but I had never actually tested it, so I decided to see how these guarantees hold up in the real world.

In this post, we’ll start PubSub publishers and consumers in different regions and then try a few failover scenarios to verify that what the documentation says actually happens in the real world.

[Image generated with Midjourney]

Demo Setup

We’ll start by setting up the infrastructure and code. We’ll create a topic and subscription pair. Since PubSub is a global service, our publishers and subscribers in different regions (US and India) will publish to the same topic and consume from the same subscription.

First let us create the topic.

gcloud pubsub topics create pubsub-reliability-topic

Next let us attach a subscription to this topic.

gcloud pubsub subscriptions create pubsub-reliability-subscription --topic=projects/PROJECT-ID/topics/pubsub-reliability-topic

Once the infra pieces are out of the way, let us write the code for the publisher and subscriber. As usual, we’ll use Spring Boot to wire up our code.

You can take a look at the full code on my GitHub, but I’m posting the relevant beans below.

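import com.google.cloud.spring.pubsub.core.PubSubTemplate; // package name in Spring Cloud GCP 2.x and later
import java.time.Instant;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Service;

@Slf4j
@Service
public class ReliabilityPublisher implements Runnable {

    @Value("${pubsub.region}")
    private String region;

    @Value("${pubsub.topic}")
    private String topic;

    @Autowired
    private PubSubTemplate template;

    @Override
    public void run() {
        log.info("Starting publisher in region {} for topic {}", region, topic);

        while (!Thread.currentThread().isInterrupted()) {
            log.debug("Lived for one more cycle ;)");

            try {
                long startMillis = Instant.now().toEpochMilli();
                int num = (int) (Math.random() * 100);
                String message = String.format("Message from region : %s with number %d", region, num);
                // Wait at most 5 seconds for PubSub to acknowledge the publish.
                String id = template.publish(topic, message).get(5, TimeUnit.SECONDS);
                log.info("Published a message with id {}", id);
                long endMillis = Instant.now().toEpochMilli();

                // Sleep for the rest of the 2-second cycle. Clamp at zero because
                // Thread.sleep rejects negative values.
                long diff = Math.max(0, 2000 - (endMillis - startMillis));
                Thread.sleep(diff);
            } catch (InterruptedException ex) {
                log.error("Thread interrupted. Stopping the publisher.");
                // Restore the interrupt flag so that the loop condition ends the loop.
                Thread.currentThread().interrupt();
            } catch (ExecutionException | TimeoutException e) {
                throw new RuntimeException(e);
            }
        }
    }
}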

Every 2 seconds, this publisher publishes a message containing a random number and the region it’s deployed in. The region in the message content will help us tell apart messages from publishers in different regions.
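The pacing arithmetic in the loop above deserves a closer look: the publisher sleeps for whatever is left of the 2-second period once the publish call has returned. The sketch below isolates that calculation using only the JDK. `PublishPacing` and `remainingDelayMillis` are hypothetical names I’m using for illustration, not part of the project. The result is clamped at zero because `Thread.sleep` throws an `IllegalArgumentException` for negative values, which could otherwise happen if a publish takes longer than the period.

```java
final class PublishPacing {

    private PublishPacing() {
    }

    /**
     * Returns how long to sleep so that iterations start roughly every
     * periodMillis. Never returns a negative value.
     */
    static long remainingDelayMillis(long periodMillis, long startMillis, long endMillis) {
        long elapsed = endMillis - startMillis;
        return Math.max(0L, periodMillis - elapsed);
    }

    public static void main(String[] args) {
        // A publish that took 350 ms leaves 1650 ms of the 2000 ms period.
        System.out.println(remainingDelayMillis(2000, 1_000, 1_350)); // 1650
        // A publish that took 5 s overruns the period, so there is nothing left to sleep.
        System.out.println(remainingDelayMillis(2000, 1_000, 6_000)); // 0
    }
}
```

If you want stricter timing than a sleep loop gives you, a `ScheduledExecutorService` is the usual alternative, but for a demo like this the simple loop is easier to follow.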

Similarly we have the code for subscriber.

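import com.google.cloud.spring.pubsub.core.PubSubTemplate; // package name in Spring Cloud GCP 2.x and later
import com.google.pubsub.v1.PubsubMessage;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Service;

@Slf4j
@Service
public class ReliabilityConsumer implements Runnable {

    @Value("${pubsub.region}")
    private String region;

    @Value("${pubsub.subscription}")
    private String subscription;

    @Autowired
    private PubSubTemplate pubSubTemplate;

    @Override
    public void run() {
        log.info("Starting consumer in region {}", region);

        // subscribe(...) returns a Subscriber that has already been started,
        // so there is no need to call startAsync() on it again.
        pubSubTemplate.subscribe(subscription, basicAcknowledgeablePubsubMessage -> {
            PubsubMessage message = basicAcknowledgeablePubsubMessage.getPubsubMessage();
            String data = message.getData().toStringUtf8();
            // Log the consuming region next to the payload, which carries the publishing region.
            log.info("Sub Region : {} , Message {}", region, data);
            // Acknowledge the message so that PubSub does not redeliver it.
            basicAcknowledgeablePubsubMessage.ack();
        });
    }
}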

This subscriber logs the region it is running in alongside the message content, so each log line tells us both where a message was published and where it was consumed.

We will deploy our publishers and subscribers as Docker containers. Below is a simple Dockerfile to containerise our applications.

FROM eclipse-temurin:17-jdk-jammy
VOLUME /tmp
ARG JAR_FILE
COPY ${JAR_FILE} app.jar
ENTRYPOINT ["java","-jar","/app.jar"]

You can then use this file to build Docker images for both your publisher and subscriber.

docker build --build-arg "JAR_FILE=target/*.jar" -t <IMAGE-NAME-WITH-REPO>:<IMAGE-TAG> .

I have also pushed my images to Docker Hub in case you don’t wish to build your own.

docker pull kaysush/pubsub-reliability-publisher:0.3
docker pull kaysush/pubsub-reliability-subscriber:0.3

Once you have either built your own Docker images or pulled mine, we’ll push these images to our project’s GCP Artifact Registry.

docker tag <YOUR-OLD-PUBLISHER-IMAGE-WITH-TAG> <YOUR-GCP-ARTIFACT-REPOSITORY>/pubsub-reliability-publisher:0.1
docker tag <YOUR-OLD-SUBSCRIBER-IMAGE-WITH-TAG> <YOUR-GCP-ARTIFACT-REPOSITORY>/pubsub-reliability-subscriber:0.1
docker push <YOUR-GCP-ARTIFACT-REPOSITORY>/pubsub-reliability-publisher:0.1
docker push <YOUR-GCP-ARTIFACT-REPOSITORY>/pubsub-reliability-subscriber:0.1

Once your images are pushed to Artifact Registry, we are ready to deploy our publishers and consumers.

We’ll deploy our publishers and consumers to Compute Engine VMs in the following configuration and order.

  1. Publisher in us-central1-a zone.
  2. Subscriber in us-central1-a zone.
  3. Subscriber in asia-south2-a zone.
  4. Publisher in asia-south2-a zone.

Below is the command to create a VM with a given Docker image and environment variables. We’ll pass the PROJECT_ID and REGION environment variables to our containers.

First let us deploy our US publisher.

gcloud compute instances create-with-container publisher-us-1 \
--project=<PROJECT_ID> \
--zone=us-central1-a \
--machine-type=e2-small \
--network-interface=network-tier=PREMIUM,subnet=default \
--provisioning-model=STANDARD \
--service-account=486774505123-compute@developer.gserviceaccount.com \
--scopes=https://www.googleapis.com/auth/cloud-platform \
--image=projects/cos-cloud/global/images/cos-stable-105-17412-101-42 \
--boot-disk-size=10GB \
--boot-disk-type=pd-balanced \
--boot-disk-device-name=publisher-us-1 \
--container-image=<YOUR-GCP-ARTIFACT-REPOSITORY>/pubsub-reliability-publisher:0.1 \
--container-restart-policy=always \
--container-env=PROJECT_ID=<PROJECT_ID>,REGION=US \
--no-shielded-secure-boot \
--shielded-vtpm \
--shielded-integrity-monitoring \
--labels=goog-ec-src=vm_add-gcloud,container-vm=cos-stable-105-17412-101-42

Once it has started, you can SSH into the VM and check the logs for your container. You’ll see that it has started publishing messages from the US region.

Starting publisher in region US for topic projects/<PROJECT_ID>/topics/pubsub-reliability-topic
Published a message with id 7973872392557277
Published a message with id 7973906968685436
Published a message with id 7973871167153357
Published a message with id 7973912284213252
Published a message with id 7973871044583520
Published a message with id 7973890223696195
Published a message with id 7973872256772134

Now let us start our US consumer.

gcloud compute instances create-with-container subscriber-us-1 \
--project=<PROJECT_ID> \
--zone=us-central1-a \
--machine-type=e2-small \
--network-interface=network-tier=PREMIUM,subnet=default \
--provisioning-model=STANDARD \
--service-account=486774505123-compute@developer.gserviceaccount.com \
--scopes=https://www.googleapis.com/auth/cloud-platform \
--image=projects/cos-cloud/global/images/cos-stable-105-17412-101-42 \
--boot-disk-size=10GB \
--boot-disk-type=pd-balanced \
--boot-disk-device-name=subscriber-us-1 \
--container-image=<YOUR-GCP-ARTIFACT-REPOSITORY>/pubsub-reliability-subscriber:0.1 \
--container-restart-policy=always \
--container-env=PROJECT_ID=<PROJECT_ID>,REGION=US \
--no-shielded-secure-boot \
--shielded-vtpm \
--shielded-integrity-monitoring \
--labels=goog-ec-src=vm_add-gcloud,container-vm=cos-stable-105-17412-101-42

As with the publisher, check the logs and you’ll see that this consumer is consuming messages published from the US region. There is only a single publisher, in the US, right now, so this is kind of obvious, but hang on until we deploy our next consumer in the India region.

Starting consumer in region US
Sub Region : US , Message Message from region : US with number 82
Sub Region : US , Message Message from region : US with number 22
Sub Region : US , Message Message from region : US with number 56
Sub Region : US , Message Message from region : US with number 75
Sub Region : US , Message Message from region : US with number 66
Sub Region : US , Message Message from region : US with number 68

Next let us deploy our India consumer.

gcloud compute instances create-with-container subscriber-ind-1 \
--project=<PROJECT_ID> \
--zone=asia-south2-a \
--machine-type=e2-small \
--network-interface=network-tier=PREMIUM,subnet=default \
--provisioning-model=STANDARD \
--service-account=486774505123-compute@developer.gserviceaccount.com \
--scopes=https://www.googleapis.com/auth/cloud-platform \
--image=projects/cos-cloud/global/images/cos-stable-105-17412-101-42 \
--boot-disk-size=10GB \
--boot-disk-type=pd-balanced \
--boot-disk-device-name=subscriber-ind-1 \
--container-image=<YOUR-GCP-ARTIFACT-REPOSITORY>/pubsub-reliability-subscriber:0.1 \
--container-restart-policy=always \
--container-env=PROJECT_ID=<PROJECT_ID>,REGION=IND \
--no-shielded-secure-boot \
--shielded-vtpm \
--shielded-integrity-monitoring \
--labels=goog-ec-src=vm_add-gcloud,container-vm=cos-stable-105-17412-101-42

Now when you check the logs, you’ll be surprised to see that this consumer is not consuming any messages 😮

You’ll only see the following log line:

Starting consumer in region IND

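Now let us deploy our India publisher (the same command as for the US publisher, with its own instance name, the zone set to asia-south2-a and REGION=IND) and see how our India consumer springs into action.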

Sub Region : IND , Message Message from region : IND with number 85
Sub Region : IND , Message Message from region : IND with number 1
Sub Region : IND , Message Message from region : IND with number 90
Sub Region : IND , Message Message from region : IND with number 7
Sub Region : IND , Message Message from region : IND with number 86
Sub Region : IND , Message Message from region : IND with number 47

So far so good. The location awareness of the PubSub service is on point.

By default, PubSub tries to deliver a message along the shortest path. Since each region now has both a publisher and a subscriber, PubSub delivers each message to the subscriber in the region where it was published.

You can also confirm from the logs that our US consumer is not consuming any messages published in the IND region.

Reliability FTW

Now let us see how reliability works if the consumer in one of the locations fails. We’ll also check how delivery rebalances when we start the consumer again.

Shutdown IND Consumer

This can be done in multiple ways. If you are logged into the VM, you can docker rm -f the subscriber container. You can also delete the VM and later use the earlier command to re-create it.

I’ll delete the VM (subscriber-ind-1) itself for ease of use. As soon as PubSub detects that the consumer in the India region is unavailable, it starts delivering messages to the US consumer.

Sub Region : US , Message Message from region : IND with number 72
Sub Region : US , Message Message from region : IND with number 70
Sub Region : US , Message Message from region : IND with number 36
Sub Region : US , Message Message from region : IND with number 89
Sub Region : US , Message Message from region : IND with number 23
Sub Region : US , Message Message from region : IND with number 62
Sub Region : US , Message Message from region : IND with number 53
Sub Region : US , Message Message from region : US with number 45
Sub Region : US , Message Message from region : IND with number 34
Sub Region : US , Message Message from region : IND with number 11
Sub Region : US , Message Message from region : US with number 38
Sub Region : US , Message Message from region : IND with number 50
Sub Region : US , Message Message from region : IND with number 97
Sub Region : US , Message Message from region : US with number 32
Sub Region : US , Message Message from region : IND with number 91
Sub Region : US , Message Message from region : IND with number 35
Sub Region : US , Message Message from region : US with number 95
Sub Region : US , Message Message from region : IND with number 6
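With this many lines, counting by eye gets tedious. Below is a minimal sketch that tallies consumer log lines by publishing region and consuming region. It assumes the exact log format produced by the subscriber in this post (`Sub Region : X , Message Message from region : Y with number N`). The `DeliveryTally` class and its method are names I made up for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DeliveryTally {

    // Matches lines such as:
    // Sub Region : US , Message Message from region : IND with number 72
    private static final Pattern LINE = Pattern.compile(
            "Sub Region : (\\w+) , Message Message from region : (\\w+) with number (\\d+)");

    private DeliveryTally() {
    }

    /** Counts deliveries keyed by "publisherRegion->subscriberRegion". */
    static Map<String, Integer> tally(List<String> logLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : logLines) {
            Matcher m = LINE.matcher(line);
            // Lines that are not delivery logs (startup messages etc.) are skipped.
            if (m.find()) {
                String key = m.group(2) + "->" + m.group(1);
                counts.merge(key, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
                "Sub Region : US , Message Message from region : IND with number 72",
                "Sub Region : US , Message Message from region : IND with number 70",
                "Sub Region : US , Message Message from region : US with number 45",
                "Starting consumer in region US");
        System.out.println(tally(sample)); // {IND->US=2, US->US=1}
    }
}
```

Fed the 18 log lines above, this would report 14 deliveries for `IND->US` and 4 for `US->US`, which makes the failover to the US consumer easy to see at a glance.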

Let us start the IND consumer again and see if PubSub rebalances the delivery of messages.

One interesting behaviour I saw is that it took around 5 minutes for PubSub to start delivering messages to the IND consumer again, and even then it was still sending some messages to the US consumer. So if you are looking for isolation, you should have two sets of topic and subscription, one for each region.
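If you go down the isolation route, each deployment has to pick its own topic and subscription based on the REGION value it is started with. Here is a minimal sketch of that lookup. The `-us` / `-ind` suffix convention and the `RegionalResources` class are assumptions of mine, not something the demo project does, and the per-region topics and subscriptions would still need to be created separately.

```java
import java.util.Locale;

final class RegionalResources {

    // Base names taken from the topic and subscription created earlier in this post.
    private static final String TOPIC_PREFIX = "pubsub-reliability-topic";
    private static final String SUBSCRIPTION_PREFIX = "pubsub-reliability-subscription";

    private RegionalResources() {
    }

    /** Fully qualified topic name for the given region, e.g. REGION=US. */
    static String topicFor(String projectId, String region) {
        return String.format("projects/%s/topics/%s-%s",
                projectId, TOPIC_PREFIX, suffix(region));
    }

    /** Fully qualified subscription name for the given region. */
    static String subscriptionFor(String projectId, String region) {
        return String.format("projects/%s/subscriptions/%s-%s",
                projectId, SUBSCRIPTION_PREFIX, suffix(region));
    }

    // Hypothetical convention: lower-cased REGION value used as the name suffix.
    private static String suffix(String region) {
        if (region == null || region.trim().isEmpty()) {
            throw new IllegalArgumentException("region must be set");
        }
        return region.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(topicFor("my-project", "US"));
        System.out.println(subscriptionFor("my-project", "IND"));
    }
}
```

The trade-off is that with separate resources per region you give up the automatic cross-region failover shown above, so a regional consumer outage means messages wait in that region’s subscription until the consumer is back.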

Conclusion

In this post we only covered subscriber reliability and failover mechanisms. You can read more about publisher failover in the official documentation.

Learnings

  1. PubSub tries to deliver each message to the consumer nearest to the publish location.
  2. If the consumer in one location shuts down, PubSub delivers its messages to the consumer in the other location, so that no messages are lost or left undelivered.
  3. You should not rely on PubSub location awareness for isolation. If you need regional isolation, you should have a separate set of topic and subscription per region.

If you find any bug in my code snippets or have any questions, feel free to drop a comment below.

Till then Happy Coding :)

Sushil Kumar
Google Cloud - Community

A polyglot developer with a knack for Distributed systems, Cloud and automation.