Beyond Databases: Postgres as the Ultimate Data Protocol
Listen up, fellow data aficionados, for we are about to embark on an exhilarating journey through the wild frontiers of data communication. In this captivating discourse, we shine a spotlight on Postgres, a database management system that single-handedly revolutionizes the art of data communication. Brace yourselves as we unravel the tale of how Postgres, armed with only two weapons (SQL for accessing data at rest and the transformative magic of logical replication for Change Data Capture, or CDC), paves the way for a new era of seamless data communication. Get ready to witness Postgres transcend from mere DB to data protocol.
Ah, protocols — the backbone of effective communication in the digital realm. When it comes to defining what makes a good protocol, we must peel back the layers and delve into the heart of the matter. Several key attributes rise to the surface, shaping our perception of what defines excellence. So, dear reader, allow me to share my musings on the elements that, in my humble opinion, distinguish a good protocol from the rest.
Simplicity: The Elegance of Clarity
A good protocol should embrace the elegance of simplicity. It should strive to convey its intentions with clarity, avoiding unnecessary complexity that can hinder understanding and adoption. Simplicity enables developers to grasp the protocol’s essence swiftly, reducing the cognitive load and allowing them to focus on implementation and innovation. Like a well-crafted symphony, a good protocol harmonizes simplicity and clarity to create an intuitive and seamless communication experience.
Flexibility: The Chameleon’s Dance
In the ever-evolving digital landscape, adaptability is paramount. A good protocol should possess the versatility to accommodate a wide range of use cases and scenarios and empower developers to craft innovative solutions. Just as a skilled dancer effortlessly adapts their movements to different rhythms, a good protocol gracefully adjusts to the changing demands of communication, ensuring compatibility and extensibility.
Reliability: Trusting the Foundation
A reliable protocol instills confidence in its users. It should exhibit robustness, demonstrating a consistent ability to transmit and receive data accurately and efficiently. Reliability breeds trust, establishing a solid foundation for communication. A good protocol shields its users from the turbulent seas of data transmission, providing assurances that their messages will reach their intended destinations reliably and securely.
Interoperability: Bridging the Divides
In a world teeming with disparate systems and technologies, interoperability becomes the unifying force that bridges the divides. A good protocol transcends boundaries, enabling seamless communication between diverse platforms. It fosters an ecosystem of collaboration and integration, empowering disparate components to work together harmoniously. Like a universal translator, a good protocol eradicates barriers, facilitating the exchange of data without discrimination.
Evolvability: The Ever-Forward March
The digital landscape is a relentless march forward, and a good protocol must be able to adapt and evolve alongside it. It should possess mechanisms that allow for progressive enhancement and versioning, accommodating the changing needs and demands of the communication ecosystem. A good protocol embraces the spirit of continuous improvement, providing avenues for updates and extensions that ensure its relevance and longevity.
At Lassoo.io, when we began developing our SaaS behavioral data platform, we recognized the need for a flexible way to make our data available to data people: data scientists, data analysts, data wranglers from all corners of the galaxy. We needed a specialized protocol. Essentially, we were faced with the task of recording, storing, and manipulating billions of records and making all of that available to the people who use it. Now, I know what you’re thinking: most computer science problems can be boiled down, in essence, to reading, writing, and operating on data. But stay with me till the punch line.
Faced with the above challenge, we could have approached it in a number of ways.
An open API of some sort? GraphQL is all the rage. We could’ve published an API. It would be RESTful, work with JSON, all the good stuff. Of course, we’d have to handle authentication and authorization (probably OAuth), pagination, throttling, and then there’s maintaining the damn thing for years to come. Just the thought of it makes me sick.
OK, let’s forget about the REST API. How else can we get the data out to our users? Well, we live in a connector world, so why not just make connectors? A proprietary connector that can be published in any app-store-like ecosystem: a connector for Salesforce, a connector for Snowflake, a connector for… I think you can see where this is going. Writing and maintaining a boatload of connectors does not seem fun either.
The best option: Make the data available in a way that’s already ubiquitous, familiar and loved. Don’t reinvent the wheel when making a new wagon.
We chose SQL as our savior (specifically, Postgres). Here’s why:
- As of July 2023, PostgreSQL consistently ranks among the top four most popular relational database management systems according to the DB-Engines Ranking. It has been steadily climbing in popularity over the years.
- In the Stack Overflow Developer Survey 2021, PostgreSQL was the fourth most popular database among professional developers worldwide.
- PostgreSQL has a significant presence on GitHub, with thousands of stars and forks on its official repository. It also has an active community contributing to its development and maintenance.
- PostgreSQL has been downloaded millions of times, indicating its widespread usage. It is available for various operating systems, making it accessible to a broad user base.
- Job market demand for PostgreSQL skills is consistently high, with numerous job postings seeking professionals experienced in working with PostgreSQL.
- PostgreSQL conferences and events attract a substantial number of attendees from around the world, showcasing the interest and engagement within the community.
- It’s free and open source. Duh!
So let’s see how our champion measures up against our requirements for a good protocol.
Simplicity. I often compare SQL to chess: easy to learn, a lifetime to master. SQL is one of the most accessible languages out there, and it’s the epitome of declarative programming. If you’re a data person and you’ve never touched SQL… well, then you’re not a data person.
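To make that concrete: a single declarative statement describes the result you want and leaves the how to the planner. (The `events` table here is a hypothetical stand-in, not Lassoo’s actual schema.)

```sql
-- Top ten most active users by page views: pure "what", no "how".
SELECT user_id, COUNT(*) AS page_views
FROM events
WHERE action = 'page_view'
GROUP BY user_id
ORDER BY page_views DESC
LIMIT 10;
```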
Flexibility. SQL doesn’t care about your use case, only that your data is (or can be) structured. Show me a data model and I’ll build you a schema!
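As a minimal sketch of that claim, here is one way the hypothetical `events` table from above might be declared, with a `jsonb` column soaking up whatever semi-structured payload a use case throws at it:

```sql
CREATE TABLE events (
    id         bigserial   PRIMARY KEY,
    user_id    bigint      NOT NULL,
    action     text        NOT NULL,           -- e.g. 'page_view', 'click'
    properties jsonb,                          -- semi-structured payload
    created_at timestamptz NOT NULL DEFAULT now()
);
```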
Reliability. The Postgres logo is an elephant, of course, and not a bad symbol of reliability. Beneath the mascot sit the real guarantees: full ACID compliance, write-ahead logging, and MVCC, which keep data consistent even under heavy concurrent load.
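A tiny illustration, using a hypothetical `accounts` table: a transaction either lands in its entirety or not at all, and once committed it survives a crash.

```sql
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;  -- atomic: both updates apply together, or neither does
```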
Interoperability. With virtually every ETL tool on the planet supporting PostgreSQL connections, I’d say this one is a no-brainer.
Evolvability. PostgreSQL is on version 15 as of this writing. PostgreSQL 6.3 came out on March 1, 1998. You were still in high school (probably) and Titanic was the first film to gross over $1 billion worldwide. Since then, Postgres has steadily improved and expanded, thanks in no small part to the open source community.
So, Postgres checks all the boxes, but let’s delve deeper into how it can be thought of as a data protocol.
SQL Access: The Universal Language of Data
Postgres, and more generally SQL, enables interaction with data at rest. With SQL as the lingua franca, data people can express their queries, transformations, and analyses with unparalleled clarity and elegance. Whether you’re a seasoned data scientist or a budding analyst, mastery of SQL opens the doors to Postgres’ data kingdom, empowering you to extract insights and unravel the mysteries hidden within the data.
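For a taste of what that looks like in practice, here is a sketch against the hypothetical `events` table from earlier: a window function turns a raw event stream into per-user gaps between events, a building block for sessionization and funnel analysis.

```sql
-- Time elapsed since each user's previous event.
SELECT user_id,
       created_at,
       created_at - LAG(created_at) OVER (
           PARTITION BY user_id
           ORDER BY created_at
       ) AS gap_since_last_event
FROM events;
```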
Change Data Capture (CDC) via Logical Replication: Unleashing Real-Time Streaming
In the fast-paced world of data communication, real-time streaming data is the lifeblood that fuels innovation. Postgres, ever the trailblazer, equips us with a powerful weapon: logical replication. Through logical replication, a form of table-level data synchronization, Postgres captures and propagates data changes (the PostgreSQL documentation covers the detailed mechanics).

It does so via an extensible plugin architecture, with the default plugin, pgoutput, at its core. This plugin converts data modifications (inserts, updates, and deletes) into a stream of logical replication messages. These messages adhere to the Logical Streaming Replication Protocol, facilitating communication between the source and destination systems. Thanks to the plugin architecture, users can customize and extend the replication capabilities according to their specific requirements.

Logical replication, powered by pgoutput, provides an efficient mechanism for maintaining data consistency and supporting real-time data updates across distributed systems. Using the typical pub/sub model, you can create fine-grained publications to stream changes from individual tables or an entire schema. Logical replication and CDC open up new horizons, enabling applications to react in real time to evolving data. It breathes life into event-driven architectures, facilitates real-time analytics, and empowers businesses to make decisions at the speed of thought.
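As a minimal sketch of that pub/sub setup (the table, database, and host names here are hypothetical), wiring logical replication between a source and a destination takes only a few statements:

```sql
-- On the source database (requires wal_level = logical in
-- postgresql.conf, followed by a restart).

-- Publish changes from specific tables...
CREATE PUBLICATION events_pub FOR TABLE events, sessions;

-- ...or from every table in the database.
CREATE PUBLICATION firehose_pub FOR ALL TABLES;

-- On the destination database: subscribing creates a replication
-- slot on the source and starts streaming changes, decoded by
-- pgoutput, over the Logical Streaming Replication Protocol.
CREATE SUBSCRIPTION events_sub
    CONNECTION 'host=source.example.com dbname=lassoo user=replicator'
    PUBLICATION events_pub;
```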
The Dance of SQL and CDC: A Harmonious Synchronization
Plain old SQL access and CDC via logical replication form a harmonious dance, a symbiotic relationship. SQL lets us explore, query, and manipulate data at rest with grace and precision, while logical replication adds a dynamic dimension, streaming changes in real time. This powerful combination transforms Postgres into more than just a database; it becomes a conduit for real-time data communication.
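You can watch both halves of the dance from a single psql session. The sketch below uses test_decoding, a demo output plugin that ships with Postgres, rather than pgoutput (which speaks a binary protocol), purely because its output is human-readable; the table is the same hypothetical `events` table as before, and the output shown is illustrative.

```sql
-- Create a slot that decodes changes with the test_decoding plugin.
SELECT pg_create_logical_replication_slot('peek_slot', 'test_decoding');

-- Data at rest: an ordinary write.
INSERT INTO events (user_id, action) VALUES (42, 'page_view');

-- Data in motion: the same write, replayed as a CDC message.
SELECT data FROM pg_logical_slot_get_changes('peek_slot', NULL, NULL);
-- Illustrative output:
--   BEGIN 739
--   table public.events: INSERT: id[bigint]:1 user_id[bigint]:42 ...
--   COMMIT 739

SELECT pg_drop_replication_slot('peek_slot');  -- clean up
```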
As a data protocol, Postgres shines as a beacon of excellence. The power of SQL combined with logical replication demonstrates that Postgres can be a formidable protocol, offering a gateway to the world of data. As we at Lassoo.io journey through the ever-evolving data landscape, we have embraced the concept of Postgres as a data protocol. May it propel us into a future where data communication knows no bounds.
By Max Kremer (with help from ChatGPT)