For the Love of Sharding

Published in SignalFire, May 23, 2019

It was just over a year ago, on a Thursday, that we first met Jiten and Sugu to hear them talk about their new database startup, PlanetScale. A day later I sent them our term sheet to lead their seed round: it was the venture capital version of love at first sight. The feeling turned out to be mutual. Jiten and Sugu had a number of options for their round, and we were thrilled when they chose SignalFire. A lot has happened in the past year, and some of the world’s biggest enterprises that adopted the Vitess open source database are now PlanetScale customers. Slack and GitHub are examples (as I write this blog post, we’re still waiting on permission to mention some other customers whose names you would recognize). Today, PlanetScale is announcing its Series A fundraise, led by Peter Levine at Andreessen Horowitz. I thought this would be a good opportunity to explain why PlanetScale resonated so strongly with us, and to do that, let’s first take a quick history tour of databases and their development.

Ask software engineers which software systems are the most demanding and challenging to build, and distributed databases will be near the top of most lists. Databases are the mission-critical heart of the majority of software applications, and they involve every aspect of computer science and software engineering, from low-level storage and networking all the way to algorithms and theory.

In the 1980s and 1990s, relational databases with SQL support emerged as the winners and workhorses of the database space. I was drawn to the problem space early, and began my career as a software engineer in the database group at Oracle 20 years ago. Oracle was then the dominant vendor in SQL databases. SQL databases gave developers a rich set of features with which to build applications, key among them expressive, easy-to-use query capabilities and strong guarantees on data consistency.

After a few years at Oracle, I joined Google. Right around that time, Google published three papers (Google File System (2003), MapReduce (2004), and BigTable (2006)) which shared Google’s new approach to distributed data systems: do away with the relational SQL model because it didn’t scale to the levels required by Google’s huge data needs. Non-relational databases could scale further, but lacked the powerful query capabilities and consistency features of SQL. That was the tradeoff. The papers touched off a wave of popularity for so-called “NoSQL” systems (for example, Hadoop, MongoDB and Cassandra). After all, if a technology leader like Google was moving away from SQL databases, NoSQL must be the way of the future!

However, there was a lot of irony in this because Google had never stopped using SQL databases! In fact, for its most mission-critical systems where data consistency was of paramount importance, SQL databases were the technology of choice. One example was the AdWords database, which contained advertiser data. Another was the database powering YouTube, codenamed Vitess, which held all the video metadata. None of these systems had moved over to NoSQL (although I did hear an early account of how one of the Google founders wanted to standardize all of Google on a NoSQL database and had to be convinced by senior engineers, including Jeff Dean, a co-author of the MapReduce and BigTable papers cited above, that basing the AdWords database on a NoSQL technology would be a bad idea).

Google engineers’ jobs were made far easier if SQL databases were used for those applications because of the rich features of SQL. But what about scalability bottlenecks? In order to solve the tremendous scalability challenges for YouTube, Sugu, Jiten and the rest of the Vitess team came up with a novel idea: allow individual SQL database servers to be stitched together using a software layer on top, which would make those separate databases appear as one to the applications using them. You could now add as many servers as you need and scale horizontally without the application having to change. This approach is called “sharding” and is at the heart of Vitess.
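To make the idea concrete, here is a minimal sketch in Python of a routing layer that makes several independent SQL databases look like one. It is an illustration only and not how Vitess is implemented: the `ShardedUsers` class, its method names, the `users` table, and the CRC32-modulo routing rule are all hypothetical, and in-memory SQLite databases stand in for separate database servers.

```python
import sqlite3
import zlib


class ShardedUsers:
    """Hypothetical routing layer: many small SQL databases behind one interface."""

    def __init__(self, num_shards):
        # Each in-memory SQLite connection stands in for a separate SQL server.
        self.shards = [sqlite3.connect(":memory:") for _ in range(num_shards)]
        for db in self.shards:
            db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

    def _shard_for(self, user_id):
        # A stable hash of the sharding key decides which server owns the row.
        index = zlib.crc32(str(user_id).encode()) % len(self.shards)
        return self.shards[index]

    def insert(self, user_id, name):
        db = self._shard_for(user_id)
        db.execute("INSERT INTO users (id, name) VALUES (?, ?)", (user_id, name))
        db.commit()

    def get(self, user_id):
        # A lookup by key touches exactly one shard.
        row = self._shard_for(user_id).execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return row[0] if row else None

    def count(self):
        # A query without a key is sent to every shard and the results are merged.
        return sum(
            db.execute("SELECT COUNT(*) FROM users").fetchone()[0]
            for db in self.shards
        )


# The caller never sees which server holds which row.
users = ShardedUsers(num_shards=4)
for i in range(100):
    users.insert(i, f"user-{i}")
```

The application calls `insert`, `get`, and `count` as if there were a single database. One caveat of this toy version: with plain modulo hashing, changing the number of shards would reassign most keys, so a production system needs a more careful way to move data when servers are added.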

This brings us back to PlanetScale: Vitess powered YouTube during its hyper-growth to billions of users and was open-sourced by Google. In the open source community, it was adopted by some of the largest Internet companies in the world and was accepted into the CNCF. In 2018, Jiten and Sugu founded PlanetScale to provide support and continued development of Vitess and open it up to companies beyond Google. When they shared their story with me, I immediately understood the power of their approach: scalability without sacrificing the best features of SQL databases, such as data consistency and expressive query capabilities. It made great sense to me, having seen the evolution of these systems inside Google firsthand. Additionally, because Vitess was developed to run on Google’s compute grid, it runs seamlessly on Kubernetes, and I consider it the most battle-hardened cloud database in the world. Instead of NoSQL, I believe “NewSQL”, exemplified by PlanetScale, is the future of database technology. Jiten and Sugu are founders with phenomenal domain expertise who have the vision and drive to make PlanetScale a new standard in databases. And that’s why for us at SignalFire, it was love at first sight.


Written by Ilya Kirnos