Open Adoption Software Interviews: Cockroach Labs

Published in @Accel, Jun 20, 2017

Cockroach Labs is the company behind CockroachDB, an open source, cloud-native and globally scalable SQL database inspired by Google Spanner. The company was founded in 2015, and the production-ready CockroachDB 1.0 was released in May 2017.

In the latest of our series of interviews with Open Adoption Software (OAS) founders and executives, Cockroach Labs co-founder and CEO Spencer Kimball talks about why he and his co-founders built CockroachDB and how it delivers Spanner’s core capabilities despite fundamental differences in design. Kimball also shares his thoughts on cloud provider lock-in and OAS business models, explaining CockroachDB’s unique “honor system” approach to enterprise licensing.

JAKE FLOMENBERG: Tell us about the problem that Cockroach Labs, via CockroachDB, is trying to solve.

SPENCER KIMBALL: The easiest way to describe CockroachDB is to talk a little bit about the evolution. It’s a general-purpose database that would compete with things like Oracle, Postgres and MySQL. The big difference between Cockroach and those more-traditional relational databases is that Cockroach has an inherently scalable architecture and a built-in high-availability solution. Both of those are significantly different from what has been applied to relational databases in the past.

When the dotcom boom hit, databases like MySQL and Oracle could not scale to anywhere near the requirements necessary to host 100 million users, or 100 billion entities, or whatever the data model needed to capture. So the solution, which was embraced by most companies, including Google and Facebook — and is still being embraced consistently across the ecosystem — is to just split up your customer base. You partition it so some fraction are on one shard, some fraction are on the next, and so on.

But this has a lot of costs. It means every company is building their own unique technological snowflake, in terms of how to manage and scale these things. And the application is keenly aware of this partitioning and needs to deal with it, so that’s a lot of complexity. Ultimately, instead of having one very scalable database, you have a lot of smaller instances that are essentially separate isolated databases. This means you lose the database’s guarantees, which are very important. If you have two different entities that happen to be on different partitions, you no longer can modify them in one operation transactionally. That is a big problem. You also cannot query across the database.
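To make the cost of manual partitioning concrete, here is a minimal Python sketch of application-level sharding. It is an illustration only, not how any particular company or database does it: the shard count, the hash function, and the names `shard_for`, `put`, `get` and `count_all` are all assumptions made for the example. Each shard is a plain dictionary standing in for a separate database instance.

```python
import zlib

NUM_SHARDS = 4  # assumed shard count, chosen for illustration

# Each dict stands in for an independent database instance.
shards = [dict() for _ in range(NUM_SHARDS)]


def shard_for(customer_id: str) -> int:
    """Pick a shard by hashing the customer ID (application-side routing)."""
    return zlib.crc32(customer_id.encode("utf-8")) % NUM_SHARDS


def put(customer_id: str, record: dict) -> None:
    """Write a record to whichever shard owns this customer."""
    shards[shard_for(customer_id)][customer_id] = record


def get(customer_id: str):
    """Read a record back from the owning shard, or None if absent."""
    return shards[shard_for(customer_id)].get(customer_id)


def count_all(predicate) -> int:
    """A 'cross-database query': the application fans out to every
    shard and merges the results itself."""
    return sum(1 for shard in shards for rec in shard.values() if predicate(rec))
```

Note that nothing here can update two customers on different shards atomically; the application would have to write to each shard separately and handle a failure in between on its own.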

However, since 2012 when Google released its Spanner paper, and then the subsequent F1 paper that added SQL, people have come to the conclusion that, in fact, these two things can be done together. In fact, they should be. We don’t have to sacrifice consistency and an elegant query language in order to have scale.

We built CockroachDB to solve this problem for everyone who’s not Google. There are three co-founders at Cockroach Labs: Peter Mattis, Ben Darnell and myself. We had been to Google, and then we left Google and did startups. But when you get out of Google, you realize what they were working on, and how useful and interesting it is. Mostly out of frustration, we decided we should build an open source version of what Google has with Spanner and F1 so that everyone could use it.

How does Cockroach deliver on this promise?

Traditionally, if you have a relational database, and you want to have a disaster-proof failover plan, you use a high-availability solution of some sort. With Oracle, that’s something called GoldenGate, and there are various things for MySQL and Postgres. What they do is essentially provide an asynchronous replication between a primary and secondary system. However, because that replication is asynchronous, it means that you can have inconsistencies in the event of a failover. But it’s also quite fast, in the sense that you don’t have to wait for the secondary to be updated. That’s a double-edged sword.
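The trade-off described above can be shown with a toy model in Python. This is a sketch under simplifying assumptions, not a model of GoldenGate or any real replication product: the class `AsyncReplicatedStore` and its methods are hypothetical, and replication is reduced to copying queued writes from one dictionary to another.

```python
class AsyncReplicatedStore:
    """Toy primary/secondary pair with asynchronous replication."""

    def __init__(self):
        self.primary = {}
        self.secondary = {}
        self.pending = []  # writes acknowledged but not yet replicated

    def write(self, key, value):
        """Acknowledge as soon as the primary has the write (fast path)."""
        self.primary[key] = value
        self.pending.append((key, value))
        return "ack"

    def ship(self):
        """Background replication: apply queued writes to the secondary."""
        for key, value in self.pending:
            self.secondary[key] = value
        self.pending.clear()

    def failover(self):
        """Promote the secondary. Any write still in the queue is lost,
        even though the client already received an acknowledgement."""
        lost = list(self.pending)
        self.pending.clear()
        self.primary = self.secondary
        return lost
```

If the primary fails between `write` and `ship`, the promoted secondary is missing data the client believes was committed. That is the inconsistency window, and it is the price paid for not waiting on the secondary.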

Cockroach is embracing a completely new type of availability, which we call multi-active availability. (Google calls it “multi-homing” internally, but that is a Google-ism.) The idea is that you use consensus for applications in order to have consistency, but also consistency with replication that tolerates failures. Essentially, you just need a consensus, meaning a majority of replication sites, to agree on something, and then it becomes committed.

And you can lose any minority and it still works. If you have three, you can lose one. If you have five, you can lose two. As long as that majority’s available, you’re able to read and write the correct values for data. Multi-active availability and scalable SQL are where Cockroach is providing dramatically new and better capabilities for databases.
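The majority arithmetic above is easy to check with a few lines of Python. This is only the counting rule, not a consensus protocol; the function names `quorum_size`, `tolerated_failures` and `is_committed` are made up for this sketch.

```python
def quorum_size(replicas: int) -> int:
    """Smallest number of replicas that forms a strict majority."""
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """How many replicas can be lost while a majority remains."""
    return replicas - quorum_size(replicas)


def is_committed(acks: int, replicas: int) -> bool:
    """A write counts as committed once a majority has agreed to it."""
    return acks >= quorum_size(replicas)
```

With three replicas the quorum is two and one failure is tolerated; with five, the quorum is three and two failures are tolerated. An even replica count adds no tolerance: four replicas still survive only one failure.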

Where is the company in its OAS evolution: Project, Product or Profit?

We’re definitely at the product stage. We’ve probably been there for the last year, as companies have been putting us into production and figuring out which use cases make the most sense. We’re definitely not at the profit stage, but I would hope we’re going to start to creep in there sometime in 2018.

How does Cockroach Labs think about making money in OAS, including which features to include in the community and enterprise versions of the product?

There’s quite a detailed blog post on our business model, and what we’re planning to do to make sure this remains open source, but also becomes a profitable company. Essentially, it’s an “open core” model, where the bulk of the code and the most-useful capabilities of the system — including scalable SQL and built-in multi-active availability — are all in the core. It’s an Apache version 2 license, and it will remain open source.

I think it’s very important that your core is not hobbled. You want to make sure that people, especially people doing new ventures, can look at that core and feel very assured that they’re not going to hit a paywall in the course of doing everything they need to do to succeed as a business.

But when you actually start to succeed as a business, you need additional things from the database. For example, once you have a lot of data inside a mission-critical database, you start to need a very sophisticated backup and restore mechanism. That’s something where we’re offering an additional enterprise-tier feature. Another one that we’re planning to do is going to be something called “geo-partitioning.” This lets you have a global data architecture that, to the developer and to the operator, looks like a single, logical database.

This is a pretty big step forward for companies that are wrestling with the problem of data sovereignty and having users in different regions around the world. They want to have a global service, not just a bunch of partitions, which requires being able to glue all of them together somehow.

But we’re taking a very different approach to enterprise-tier features. Many companies create a sort of “stub” in the open source project, and then they’ll build a closed source, proprietary plugin or extension. Maybe even a completely different binary. They’re essentially providing those enterprise-grade features as closed-source object files.

On the other hand, we are maintaining everything, even the enterprise features, in the same GitHub repository. This means you can go in there as a user and you can debug through the stack. You can customize our features as necessary to meet your needs. If you want to add something that’s ahead of our roadmap, or that’s never going to be on our roadmap, those things are possible because the source code is available.

However, even though the source code’s available, these enterprise features are not open source in the strict definition of the term. They have all of the criteria for what constitutes open source software, except for free redistribution rights. So, if you’re going to use those features in a commercial setting, then you have to negotiate the license with us in order to use them past 30 days.

And so what does that practically mean from a licensing perspective? How do you accomplish this?

The enterprise features operate under what’s called the Cockroach Community License, which we abbreviate as CCL. It’s based on the Apache license, so things are very similar, it just comes down to whether you can freely redistribute the software.

Practically, it’s an honor system thing. The software itself will tell you to register if you start trying to use it. It won’t just work without you doing anything; you do have to go register. You could lie and say you’re not a commercial enterprise, but our contention is that the big companies that are actually going to pay us meaningful compensation for the use of the enterprise version, are going to abide by the terms of the license. Users in large companies trying to use the software for more than 30 days would have to actually lie in terms of how they’re going to use the software, which we think is a bridge too far.

There’s a very practical reason that we’ve taken this approach, which goes beyond altruism. It’s so much simpler for us to have everything in one repository, and to have it all open. It means that we don’t have to spend a lot of engineering time managing two different repositories, or refactoring well enough to the point where there are clean stubs between these two things. By doing it this way, I’m estimating we’re going to be able to do another major feature per year — like a major feature.

Given this model, what role does the CockroachDB community play in the development of the software?

We still do look to the communities for development. I think where open source is critically additive is in the insane surface area around the edges. There’s a lot of the different ways to integrate with Cockroach via ORMs and little features and things that different environments need to run it. That’s where open source adds tremendous value.

In terms of the core and big features, it’s rare to get significant help from the open source community. We’re starting to see companies that want to actually put people onto the project, in order to add certain things ahead of our roadmap. But typically, when we’ve found somebody that’s doing that in open source in their leisure time, we hire them — because if they’re doing that in their leisure time, they probably want to work at Cockroach full-time.

That said, we very much are cognizant and recognize the friction that this non-pure open source model might — and I say might, because I don’t think we know yet — have with the open source community at large. That’s why we wrote a long blog post about it. We ripped the Band-Aid off early, back in January. And, honestly, the feedback was virtually all positive. A lot of people felt the loss of RethinkDB, so they understand the trade-off between a good piece of open source software and a business model that can sustain a lasting business.

Can you talk a little bit about your view of Google? Insofar as CockroachDB is based on the ideas behind Spanner, and Google is now offering a proprietary, cloud-based Spanner service?

As I said, Google is really the inspiration behind CockroachDB. Really, the first two years of building CockroachDB have been about fast-following Google and building our MVP, which has the same capabilities as Spanner. But we had to build it in a tremendously different fashion.

The architecture is logically similar, but in terms of the actual nuts and bolts, it just had to be built in a very different way. At Google, they build systems that are very much layered on other systems, and depend on other systems. It works well there because you have all these dedicated teams managing these different moving parts of the architecture. We didn’t want that to be the case for Cockroach. We wanted this thing to be incredibly simple to run, because that makes a big difference for the velocity of adoption.

Cloud Spanner being a proprietary service only on Google’s cloud is a hurdle for them.

Google has to play catch-up, which makes it a harder sell. They have to tell people, “Well, you’re in Amazon right now, but you can use Spanner, which is better technology than Aurora or DynamoDB. But you’re going to have to move things into Google, and you’re going to be staying in our cloud, because where are you going to go? Your data’s here.”

I think it’s pretty easy to see how Amazon and Google look at database services as being the ultimate vendor lock-in. They also can get very expensive, and you’re not going to move off of them without downtime. Especially with something like DynamoDB, you’re going to have to re-architect your system.

But Cloud Spanner also doesn’t look like SQL. Google’s taken a rather Google-y approach to things and built its own APIs. The query language looks like SQL, but their data-modification language is custom RPC. Maybe they’ll change that, but what that means right now is that it’s less portable. You can’t just use an ORM, so that’s actually going to hurt adoption.

So I don’t see Google as our deep competitor, by any means. Although it may come to that point. Right now, they are incredibly complementary to us and Google is doing us a huge service by educating the marketplace, which is a tall task.

Google is certainly aware of CockroachDB. Do you work with Google at all?

We haven’t worked too closely with them. We’ve talked to them, for instance, about standardizing on a SQL dialect, and we’re perfectly happy working more with them. To the extent that our products look similar and act similarly, they can serve as on-ramps for each other.

Also, I think we’re providing them with an answer to the question about vendor lock-in. They can say, “Yeah, start with us, we’re going to be great. But if you do decide you need to move into some sort of hybrid cloud environment, or you want to run everything yourself, like Dropbox did eventually with Amazon, then you’re going to have that ability.” There’s a story there that can mollify the natural reluctance of a company to lock themselves into a proprietary solution.

What about Amazon? Would it represent validation or competition if there were, two years from now, an Amazon-hosted CockroachDB offering made by Amazon?

I think that suggests success, 100 percent. This happened to Elastic and I think it has actually been something that they’ve kept ahead of. If Amazon does that, it’s a pretty clear sign that your product has succeeded in the marketplace. Especially because, unlike with Elastic, Amazon has competing products with CockroachDB.

I think where we actually are able to compete with Amazon is, again, on this vendor lock-in issue. CockroachDB is going to be able to let you run in AWS, in your private cloud, or across clouds if you really want to have a fully cross-cloud deployment. You can migrate between clouds if it’s incredibly important — say a major security issue pops up — which is something no cloud provider is going to provide with a database service they’re running.

We can also stay ahead of Amazon. They’re not going to build a service that’s going to be better than the service we’re going to build. Already, our first enterprise feature, which is this distributed backup restore, is something that they would need to run a service. Yes, they could write their own version of backup restore, but it’s going to require them to fork our product and maintain that fork.

I also mentioned geo-partitioning earlier. Let’s say you’re a successful startup, and you’ve got customers in the E.U. and U.S. You really want, or need, the E.U. customer data to be located in the E.U. for data sovereignty reasons, but also for latency. A CockroachDB service that has geo-partitioning enabled makes this trivial. And you can expand into Asia and do the same thing. It’s a huge deal in terms of a global use case. Amazon would need to fork our product and build their own version of geo-partitioning too.
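As a rough illustration of the idea, the Python sketch below assigns each user's data to a home region based on country. It is not CockroachDB's geo-partitioning syntax or implementation, which the interview does not describe; the country-to-region table, the default region, and the names `User`, `home_region` and `place` are all hypothetical.

```python
from dataclasses import dataclass

# Assumed mapping for the example; a real deployment would define its own.
REGION_FOR_COUNTRY = {"DE": "eu", "FR": "eu", "US": "us"}
DEFAULT_REGION = "us"


@dataclass(frozen=True)
class User:
    user_id: int
    country: str


def home_region(user: User) -> str:
    """Region where this user's rows should live."""
    return REGION_FOR_COUNTRY.get(user.country, DEFAULT_REGION)


def place(users):
    """Group user IDs by the region that should store their data."""
    placement = {}
    for user in users:
        placement.setdefault(home_region(user), []).append(user.user_id)
    return placement
```

For example, `place([User(1, "DE"), User(2, "US"), User(3, "FR")])` groups users 1 and 3 under `"eu"` and user 2 under `"us"`. Expanding into Asia would, in this sketch, mean adding entries to the mapping rather than standing up a separate database per region.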

The barrier there keeps going up in these enterprise features, and putting them all in the same codebase is actually somewhat prophylactic for us. So I think we’re going to be competing with DynamoDB and Aurora and whatever else Amazon has got coming, not with Amazon running us as a service.

How do you sell the viability of your open source database model in a world where popular open source products have struggled, but where Oracle, Microsoft and even proprietary cloud database services are thriving?

I’m talking all the time to enterprise CIOs, CTOs and even CEOs. Some of these are literally from the biggest companies in the world. And these guys love open source. I wouldn’t say they require it, but it’s becoming close to a requirement. Everyone’s moving into the public cloud and these uncertain environments, and the last thing you need is something that’s licensed the way Oracle is licensed. That part works.

The question is: Do they have trust that our model’s going to give us the longevity that they require in order to adopt our solution? That’s really where getting our enterprise story nailed down and well communicated is already working to our advantage. What are the things on the roadmap that are enterprise-tier features, and when do we expect them? It’s very important to communicate that externally, so people see where you’re going to go.

Any big company, they immediately are going to be using the enterprise version, because they need the incremental distributed backup restore. They’re not coming into this with the expectation that they’re not going to pay us money. They’re coming into this with the expectation that hopefully they won’t have to pay us as much as Oracle.

The reality is that they would probably pay us even if the whole thing was open source, because big companies would want support contracts. But I wouldn’t want to build a business on that. It’s just hard to keep the attrition low on a support-only model.

What’s very interesting is we’re getting these interested buyers from very large companies telling us they’re seeing Cockroach as the first database post-Oracle that they’re looking at as a 5-year solution. They’re talking about these mission-critical, tier-one applications that are all built on Oracle right now. Some of that has to do with the way Oracle deals with customers, but I think there’s a much larger thing going on, which is the growth of the public cloud and all these new things emerging, such as containerization and Kubernetes.

Everyone is desperate to move into that kind of lower-cost operational environment. Faster moving, faster iterations, continuous deployments — all this stuff. But how do you move convincingly into that new ecosystem with one foot permanently in the past? CockroachDB provides the same guarantees as Oracle, but it’s more scalable, more available and, with geo-partitioning, will soon have capabilities that Oracle doesn’t even have.

Written by Jake Flomenberg