Deep Dive into GCP Pub/Sub: Unraveling the Architecture and Benefits
Overview
Pub/Sub is an asynchronous messaging service designed to be highly reliable and scalable. This article describes the architecture of Pub/Sub and its related components. Most of the theory is drawn from Google's documentation, because I do not want to misrepresent what they have publicly shared (their explanation is good, though I think the diagrams could be improved). The main purpose of this article is to present the internal architecture of Pub/Sub diagrammatically so that it can be understood quickly.
Key Concepts
Pub/Sub is a publish/subscribe (Pub/Sub) service: a messaging service where the senders of messages are decoupled from the receivers of messages. There are several key concepts in a Pub/Sub service:
- Message: the data that moves through the service.
- Topic: a named entity that represents a feed of messages.
- Subscription: a named entity that represents an interest in receiving messages on a particular topic.
- Publisher (also called a producer): creates messages and sends (publishes) them to the messaging service on a specified topic.
- Subscriber (also called a consumer): receives messages on a specified subscription.
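To make the decoupling concrete, here is a minimal in-memory sketch of these concepts. It is not the Pub/Sub API: the `Topic` class, its methods, and the names `orders`, `billing`, and `shipping` are hypothetical and exist only for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Topic:
    """A named feed of messages (hypothetical in-memory stand-in)."""
    name: str
    # subscription name -> queue of messages waiting for that subscription
    subscriptions: dict = field(default_factory=dict)

    def subscribe(self, subscription_name):
        # A subscription registers interest in this topic and gets its own queue.
        self.subscriptions[subscription_name] = deque()

    def publish(self, data):
        # The publisher only names the topic; it never addresses a subscriber.
        for queue in self.subscriptions.values():
            queue.append(data)

    def pull(self, subscription_name):
        # A subscriber reads from a subscription, not from the publisher.
        queue = self.subscriptions[subscription_name]
        return queue.popleft() if queue else None


orders = Topic("orders")
orders.subscribe("billing")
orders.subscribe("shipping")
orders.publish(b"order-1001")

billing_message = orders.pull("billing")
shipping_message = orders.pull("shipping")
```

Each subscription receives its own copy of the message, and the publisher does not need to know how many subscriptions exist.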
Visualizing the Architecture
Architectural Insights: Understanding the Foundation and Components
Pub/Sub is divided into two primary parts:
- Control plane: handles the assignment of publishers and subscribers to servers on the data plane.
- Data plane: handles moving messages between publishers and subscribers.
The servers in the data plane are called forwarders, and the servers in the control plane are called routers.
Control Plane
- The Pub/Sub control plane distributes clients to forwarders in a way that provides scalability, availability, and low latency for all clients. Any forwarder is capable of serving clients for any topic or subscription. When a client connects to Pub/Sub, the router decides the data centers the client should connect to based on shortest network distance, a measure of the latency on the connection between two points.
- Within any given data center, the router tries to distribute overall load across the set of available forwarders. The router must balance two different goals when performing this assignment: (a) uniformity of load and (b) stability of assignments.
- The router uses a variant of consistent hashing developed by Google Research to achieve a tunable balance between consistency and uniformity. The router provides the client with an ordered list of forwarders it can consider connecting to. This ordered list may change based on forwarder availability and the shape of the load from the client.
- A client takes this list of forwarders and connects to one or more of them. The client prefers connecting to the forwarders most recommended by the router, but also takes into consideration any failures that have occurred, e.g., it may decide to try forwarders in a different data center if several attempts to the nearest ones have failed. In order to abstract Pub/Sub clients away from these implementation details, there is a service proxy between the clients and forwarders that performs this connection optimization on behalf of clients.
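The idea of an ordered forwarder list with fallback can be sketched in a few lines. This sketch uses rendezvous (highest-random-weight) hashing, which is a different and simpler technique than the Google Research variant mentioned above; I chose it only because it naturally produces a stable ordering. The function names, forwarder names, and client ID are all hypothetical, and the sketch illustrates stability of assignments only, not load balancing.

```python
import hashlib


def rank_forwarders(client_id, forwarders):
    """Order forwarders by preference for one client (rendezvous hashing)."""
    def score(forwarder):
        # Each (client, forwarder) pair gets an independent pseudo-random score.
        digest = hashlib.sha256(f"{client_id}:{forwarder}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return sorted(forwarders, key=score, reverse=True)


def connect(ranked, is_healthy):
    """Pick the most preferred forwarder that is currently reachable."""
    for forwarder in ranked:
        if is_healthy(forwarder):
            return forwarder
    return None


forwarders = ["fwd-a", "fwd-b", "fwd-c", "fwd-d"]
ranked = rank_forwarders("client-42", forwarders)

# Stability: removing one forwarder does not reorder the remaining ones.
survivors = [f for f in forwarders if f != ranked[0]]
ranked_after_failure = rank_forwarders("client-42", survivors)

# Fallback: if the top choice is down, the client moves to the next entry.
chosen = connect(ranked, is_healthy=lambda f: f != ranked[0])
```

Because each score depends only on the client and the forwarder, losing one forwarder leaves the relative order of the others unchanged, which is the kind of assignment stability the router is trying to preserve.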
Data Plane
The data plane receives messages from publishers and sends them to subscribers. The steps below trace how a message moves through the service:
- A publisher sends a message.
- The message is written to storage.
- Pub/Sub sends an acknowledgement to the publisher that it has received the message and guarantees its delivery to all attached subscriptions.
- At the same time as writing the message to storage, Pub/Sub delivers it to subscribers.
- Subscribers send an acknowledgement to Pub/Sub that they have processed the message.
- Once at least one subscriber for each subscription has acknowledged the message, Pub/Sub deletes the message from storage.
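The lifecycle above can be modelled with a toy in-memory class. This is only a sketch under simplifying assumptions (a single process, no redelivery, no ack deadlines); `MiniPubSub` and its methods are hypothetical and are not part of any Google client library.

```python
class MiniPubSub:
    """Toy in-memory model of the publish/ack lifecycle (not the real service)."""

    def __init__(self, subscriptions):
        self.subscriptions = list(subscriptions)
        self.storage = {}        # message_id -> data
        self.pending_acks = {}   # message_id -> subscriptions that have not acked
        self.next_id = 0

    def publish(self, data):
        message_id = self.next_id
        self.next_id += 1
        # Write the message to storage before acknowledging the publisher.
        self.storage[message_id] = data
        self.pending_acks[message_id] = set(self.subscriptions)
        return message_id  # returning the ID stands in for the publish ack

    def deliver(self, subscription):
        # Hand over every stored message this subscription has not acked yet.
        return [(mid, self.storage[mid])
                for mid, waiting in self.pending_acks.items()
                if subscription in waiting]

    def ack(self, subscription, message_id):
        waiting = self.pending_acks.get(message_id)
        if waiting is None:
            return  # already fully acknowledged and deleted
        waiting.discard(subscription)
        # Delete the message once every subscription has acknowledged it.
        if not waiting:
            del self.pending_acks[message_id]
            del self.storage[message_id]


service = MiniPubSub(["billing", "shipping"])
mid = service.publish(b"order-1001")
delivered = service.deliver("billing")
service.ack("billing", mid)
still_stored = mid in service.storage   # shipping has not acked yet
service.ack("shipping", mid)
deleted = mid not in service.storage
```

The message stays in storage after the first acknowledgement and is removed only when the last subscription has acknowledged it.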
Conclusion
There are many such solutions available in the market. Some are enterprise software, and some are open source or cloud based. One should carefully compare the features and choose the one that fulfills the needs. Security concerns such as authentication, authorization, and encryption should be at the top of the list during the comparison.
Reference: