MongoDB co-creator explains why ‘NoSQL’ came to be, and why open source mastery is an elusive goal
In 2007, Eliot Horowitz and Dwight Merriman invented MongoDB as a database designed for the types of applications they wanted to build. Today, MongoDB is a hugely popular open source technology, managed by a software company of the same name (formerly known as 10gen).
In this interview, Horowitz, now MongoDB CTO, explains how MongoDB and its peers came to be and continue to evolve—and why open source is both a boon for innovation and a tough nut to crack for businesses.
SCALE: Can you walk back through the inception of MongoDB about 8 years ago, and why you and your co-founders created the technology?
ELIOT HOROWITZ: We started working on the idea in the summer of 2007. We were actually working on some other products and realized that, once again, we would have to work around databases. In roughly the decade before we started working on MongoDB, between Dwight Merriman (the other co-founder) and myself, we realized we had probably built 12 custom solutions to work around database problems. Whether it was Oracle or BerkeleyDB or MySQL, nothing ever quite worked for what we needed it to. There were a number of reasons for this. One was scalability and the other, frankly, was just developer productivity and how much we can get done with the database.
So we really set out to design something that solves two huge problems. One is developer productivity; you want a data model that makes more sense for developers. Tables and rows are great for what they’re designed for. Relational data models are great for what they’re designed for. They were designed for accounting, for bookkeeping — anything you would use Excel for. And for those problems, they work unbelievably well.
But if you look at the kinds of things people are storing in databases today and the programming languages they’re using today, relational databases don’t quite fit that model. Most people are developing in languages that have objects or structures, whether it’s in Java or Ruby or Python. People develop classes, they develop rich structures. What they do then is take those rich structures and attach some object-relational mapping to those structures, and try to store it in a relational database.
“We really wanted to take distributed systems to the next level, where we could build a distributed database system that was both accessible and easy enough for every company out there to use.”
This has a number of problems. One, it’s complicated: mapping a complex structure in a programming language back to a relational data model is complicated; it’s fraught with problems and inconsistencies; and it’s not very human. If you go look at a deconstructed model inside of Oracle, it is very complicated. For a typical enterprise application, a user profile takes around 75 tables.
Whereas in Mongo, you can usually store that as a single document. It’s just simpler to work with, and it lets people more intuitively understand what’s going on inside the database. When people understand it more intuitively and when it maps better to their programming languages, it’s a more productive tool.
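To make the contrast concrete, here is a minimal Python sketch that uses plain dictionaries and lists rather than a real database. The table names, field names and three-table layout are hypothetical, chosen only for illustration; a real enterprise schema would spread a profile over many more tables.

```python
# Hypothetical user profile, normalized into relational-style "tables"
# (lists of rows). All table and column names are made up for illustration.
users = [{"user_id": 1, "name": "Ada"}]
emails = [
    {"user_id": 1, "email": "ada@example.com"},
    {"user_id": 1, "email": "ada@work.example.com"},
]
addresses = [{"user_id": 1, "city": "New York", "country": "US"}]


def load_profile_relational(user_id):
    """Reassemble one profile by collecting the user's rows from every table."""
    user = next(row for row in users if row["user_id"] == user_id)
    return {
        "user_id": user["user_id"],
        "name": user["name"],
        "emails": [r["email"] for r in emails if r["user_id"] == user_id],
        "addresses": [
            {"city": r["city"], "country": r["country"]}
            for r in addresses
            if r["user_id"] == user_id
        ],
    }


# The same profile as a single document: the nested structure the
# application works with is stored as-is, with no reassembly step.
profile_document = {
    "user_id": 1,
    "name": "Ada",
    "emails": ["ada@example.com", "ada@work.example.com"],
    "addresses": [{"city": "New York", "country": "US"}],
}
```

Both forms hold the same information. The difference is that the relational form needs a mapping step in each direction, and that step grows with every table added.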
The other big thing is distributed systems. A lot of very big companies like Google and Facebook have spent a ton of time building unbelievably good distributed systems inside their organizations. At that point, no one had really designed a distributed system for most companies that gave them enough flexibility that they can actually take advantage of all the great stuff in distributed systems, but that also is simple enough to operate. That’s really what we wanted to do.
For example, horizontal scalability is becoming more and more important. Single machines can no longer quite handle the workloads. With cloud computing, you want to have lots of the same machines rather than having a special machine, so you really care about horizontal scalability. You also care about multi-datacenter and geographic issues. You don’t want to have to have users from Australia wait to go back to New York to talk to the database. You’d like to have databases spread around the world. You care about things like data governance, where you want to be able to keep certain data in certain countries. You want to be able to do things like auto-archiving data that’s always on, but on cheaper machines. All these things are distributed systems problems.
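Two of these ideas, spreading data across many identical machines and keeping certain data in certain countries, can be sketched in a few lines of Python. This is a generic illustration, not a description of how MongoDB routes data: the machine names, the country-to-region rule and the `choose_shard` helper are all hypothetical.

```python
import zlib

# Hypothetical cluster layout: region name -> identical machines in that region.
REGION_SHARDS = {
    "au": ["au-1", "au-2"],
    "us": ["us-1", "us-2", "us-3"],
}

# Hypothetical governance rule: country code -> region where its data must live.
COUNTRY_REGION = {"AU": "au", "US": "us"}


def choose_shard(user_id, country):
    """Pick a machine for a user's data.

    The country pins the data to a region (data governance, and lower
    latency for nearby users); a stable hash of the user id then spreads
    users across that region's machines (horizontal scalability).
    """
    shards = REGION_SHARDS[COUNTRY_REGION[country]]
    index = zlib.crc32(user_id.encode("utf-8")) % len(shards)
    return shards[index]
```

Adding capacity in this sketch means adding another entry to a region's list; a production system would also have to move existing data when the list changes, which this example leaves out.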
“[W]hen we started designing MongoDB, it really was a research experiment in some ways. We were designing something that we thought we wanted but we didn’t know if anyone else wanted.”
When did you realize, “Hey, we might have something here. We might be able to make a company out of this”?
It took us about a year and a half. We started working on it roughly in the fall of 2007. The first public release was in February 2009. No one knew what we were; we just put out some stuff and we talked on the blog. We basically started talking at any user group, any meeting of developers we possibly could.
At that point, the first user that took it on for real was actually SourceForge. SourceForge, at that point, was still pretty large, and they were rebuilding their entire system and they wanted to do it in a very modern way that would be more flexible. They built it on top of MongoDB, and they wrote a blog post saying how great it was. And they were using it pre-1.0 — they were making a very big bet on a very early technology. It was incredibly successful for them, and they started writing about it and people started catching on. That was the summer of 2009.
The spring of 2010 was when we had our first MongoDB Day in San Francisco, which was way more successful and way more crowded than we ever thought possible. At that point I think there were probably around 10 or 15 people at the company, and that’s when we sort of realized that “Wow, this is a real thing.” [Ed. note: Here’s my Gigaom story about the MongoSV conference in 2011.]
It’s also very interesting because when we started designing MongoDB, it really was a research experiment in some ways. We were designing something that we thought we wanted but we didn’t know if anyone else wanted. We did a lot of things that we thought were the right things. We’d focus on things like the data model and distributed systems.
And everything else that was good from relational databases we took—indexes, the idea of a query language, the idea of a shell. We really tried to keep all the good things in relational databases, of which there are many, but change the two things that we think really needed to change. That let us move a lot faster than if we were trying to reinvent everything from scratch.
“By being open source, by being able to leverage this huge community, it lets us move very quickly and in very interesting ways without being omniscient—because we definitely are not.”
Around 2010 or 2011, it seemed MongoDB was suddenly everywhere. Did you capture lightning in a bottle, with cloud computing hitting critical mass and a new class of developers building web and mobile apps?
I think adoption was because of the latter of the things you said, the whole new generation of developers. A document model is a simpler, more intuitive model for developers. There is a whole new class of developers with whole new types of applications, where people want to move faster and faster on the product side, and where six months is way too long to get a new version of their product out. When you want the next version of your iPad app three months later, you need a database that can be as agile and as flexible as your product teams, and as intuitive. You don’t want technologies where, as you add more and more features, they get more and more complicated such that at some point you are stuck.
One of our big early customers actually told us they were 18 months behind on their product roadmap entirely because they couldn’t design their way out of the relational maps. They actually spent a year porting their entire application to MongoDB and ended up being able to catch up on their product roadmap. That’s not to say they couldn’t have done the same thing if they had known what they knew at the end, and they had started to design their relational application from Day One knowing that.
But that’s not how applications are built. Applications start and they evolve over the course of a year, 5 years, 10 years, 20 years, 40 years. Being able to maintain the ability to innovate and to adapt is what documents really bring to the table.
Making the right bets on technology and open source
How does MongoDB keep up technologically with what developers are doing?
Rather than any specific technology, I think the key is a few of our core beliefs. One is that being open source we get to take advantage of a huge community. For example, all of our drivers are open source and the protocol is open source. When Node.js started taking off, you had a Node.js driver that was incredibly high-quality and was written by the community within a very short amount of time. The same thing with Go and other languages.
By being open source, we get this massive benefit of having this really great community that builds these things. Whatever the new cutting-edge technology is, we don’t actually have to even know what it is or be ahead of the curve. We can actually let the community help us.
The other interesting benefit that we get from the community is a lot of focus on what are the things that really matter. For example, one of the biggest changes on the technology side of MongoDB is when we acquired WiredTiger, just about a year ago now. This is interesting because the community was asking for a whole number of features around concurrency, encryption and compression.
What actually happened was because we were open source, and the team at WiredTiger knew that we were working on these things, they did the first integration with MongoDB on their own, sent it to me and said, “Hey do you think this is useful?” I looked at it for a couple of days and said, “Wow, this is actually incredibly useful.” And roughly six months later we ended up acquiring them.
By being open source, by being able to leverage this huge community, it lets us move very quickly and in very interesting ways without being omniscient—because we definitely are not.
“You want to be as open source as possible and as transparent as possible, but at the same time, to actually innovate and do interesting things with the technology requires a lot of engineers and a lot of experimentation. . . . I don’t think we have the perfect answer today, but we are iterating quickly to get there.”
How did you decide what open source model to adopt? And do you have a sense of the best approach for maximizing innovation, control and other factors?
To be perfectly frank, I think this is still evolving over time. If you look at first-generation open source, a lot of what you saw was people building open source alternatives to closed source products. That’s interesting, but different. With MongoDB, what we’re really doing is inventing a whole new type of thing. You see more and more of this in the open source community now, but it’s sort of still new. The business models around those are still evolving very rapidly.
But I firmly believe that you’ll never see a big piece of system software that’s closed source ever again. Whether it’s operating systems or databases, the open source software is so good and so compelling that I can’t imagine a closed source database taking off at this point. That was never really a consideration for us. We were assuming from Day One that this was going to be open source and that was a huge part of what we cared about. Nothing else even seemed plausible.
How that evolved over time … it is a challenge. You want to be as open source as possible and as transparent as possible, but at the same time, to actually innovate and do interesting things with the technology requires a lot of engineers and a lot of experimentation. One of my favorite engineering principles is letting people experiment with things that may fail. That’s the way you encourage innovation, by letting people go off on tangents and explore concepts that, frankly, very often fail because they are research.
That’s something that’s very hard to do in a company if you’re not able to actually monetize well. I don’t think we have the perfect answer today, but we are iterating quickly to get there.
“We weren’t really focused on competitors at that point, frankly, because we didn’t have the bandwidth to do it.”
When a movement becomes a market
How did MongoDB manage the competitive landscape when NoSQL evolved from a very kumbaya, rising-tide-floats-all-boats movement into a competitive market?
To a large extent, we were mostly just focused on our product. During the years that you’re really talking about, we were so incredibly—busy is an understatement—inundated with things we had to do to our own product, that that’s really what we spent all of our time worrying about. We weren’t really focused on competitors at that point, frankly, because we didn’t have the bandwidth to do it.
I think what happened is that developers found MongoDB to be the most compelling database out there and that is what drove the growth. We had an obsessive need to make developers successful and productive. I’ve been an engineering manager for a long time, so making engineers efficient and successful is what we’re here for.
Historically, databases have been the thing that no one wants to worry about, they just want it to work. I would say that no one thinks we’re quite there, but that really is the mission. Developers should be able to store their data, to get access to it from whatever systems they need, be able to use it effectively, and not have to worry about it.
Other companies should be focused on their products. If you’re building the next great tool, you shouldn’t be worried about the database. You should be worried about how do you add value to your users.
How many databases do you think companies are really going to pay for in the end? Will they invest in a document store and a graph store and a key-value store, or will it boil down to a relational database and a non-relational database?
If you look at what happened, nothing interesting happened in the database industry for a very long time. Relational won and you had a few big players that kind of owned the market and it was pretty boring. Then what you saw was an explosion of new technologies, including us.
It really was an explosion. It wasn’t like 10. It was like 30 or 40 — a lot of new technologies, all with slightly different focuses and slightly different ideas. And all — and this is the important part — very immature.
I think that in the future, a real company is not going to want to have 20 different databases. Just from a management standpoint, from a sanity standpoint, from “how do I interoperate with my tools with all these things?” standpoint, it doesn’t make sense.
At the very least what you’ll see is a relational database, your tried-and-true technology for your legacy applications, and a document database. The great thing with documents is that they really are, from a data model standpoint, a superset of other things. You can easily put a graph model or a key-value model or even a relational model inside of a document model. So from a modeling standpoint, documents really are the superset.
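The superset claim can be illustrated with plain Python dictionaries standing in for documents. This is only a modeling sketch: the field names and the `reachable` helper are hypothetical (apart from `_id`, the conventional MongoDB identifier field), and nothing here touches a real database.

```python
# Key-value model: a document with a key and a single value field.
kv_doc = {"_id": "session:42", "value": "logged-in"}

# Relational model: a row is a flat document whose fields are the columns.
row_doc = {"_id": 1, "name": "Ada", "country": "US"}

# Graph model: each node is a document that lists its outgoing edges.
graph_docs = [
    {"_id": "a", "edges": ["b", "c"]},
    {"_id": "b", "edges": ["c"]},
    {"_id": "c", "edges": []},
]


def reachable(docs, start):
    """Return the ids of all nodes reachable from `start`, including itself."""
    by_id = {doc["_id"]: doc for doc in docs}
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(by_id[node]["edges"])
    return seen
```

Representing a model is not the same as serving it efficiently: the traversal above runs in application code, whereas a dedicated graph engine would do that work inside the database.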
The challenge then becomes features. For example, Mongo 3.2 added our first foray into joins. It’s still quite primitive, it’s really just for reporting, but it’s a pretty big step. I think what you’ll see is us adding more and more features that push the limits of what MongoDB can do. All the features that make sense from other databases with documents, we should add over time.
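The join feature that arrived in MongoDB 3.2 is the `$lookup` aggregation stage, which performs a left outer join between two collections. The sketch below emulates that behavior in plain Python so it runs without a server. The `orders` and `customers` collections and their fields are hypothetical, and the `lookup` helper is an illustration of the semantics, not MongoDB’s implementation.

```python
# Hypothetical collections, represented as lists of documents.
orders = [
    {"_id": 1, "customer_id": "c1", "total": 30},
    {"_id": 2, "customer_id": "c2", "total": 15},
    {"_id": 3, "customer_id": "c9", "total": 99},  # no matching customer
]
customers = [
    {"_id": "c1", "name": "Ada"},
    {"_id": "c2", "name": "Grace"},
]

# Shape of the aggregation stage that would express this join.
lookup_stage = {
    "$lookup": {
        "from": "customers",
        "localField": "customer_id",
        "foreignField": "_id",
        "as": "customer",
    }
}


def lookup(local_docs, foreign_docs, local_field, foreign_field, as_field):
    """Left outer join: keep every local document and attach the list of
    matching foreign documents under `as_field` (empty if none match)."""
    joined = []
    for doc in local_docs:
        matches = [
            other
            for other in foreign_docs
            if other.get(foreign_field) == doc.get(local_field)
        ]
        joined.append({**doc, as_field: matches})
    return joined


spec = lookup_stage["$lookup"]
report = lookup(
    orders, customers, spec["localField"], spec["foreignField"], spec["as"]
)
```

Every order survives the join, and an order with no matching customer simply gets an empty list, which fits the reporting use case described above.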
But it’s going to take time. If you look at the best relational databases, Oracle is 40 years old. Postgres is at least 23 years old, maybe even older. These aren’t new things. They’re systems that are very mature and have been around a long time. MongoDB is just about eight years old now. It’s going to take us time to add all the features that we really want to add.
“[W]e don’t have a hardened philosophy on [monetizing features] as of yet. I think it’s evolving and it’s still a little bit of a gut check on big decisions.”
Here comes the cloud
Speaking of adding features, where and why did Mongo draw the line in terms of monetization? Specifically, deciding what’s free and what is an enterprise feature.
First off, we don’t have a hardened philosophy on this as of yet. I think it’s evolving and it’s still a little bit of a gut check on big decisions. But there are some core tenets that are definitely true. If you’re a startup, you want the open source database to work great out of the box. It should be easy to manage, easy to use and make everyone productive. That’s total table stakes. That’s completely obvious.
On the other side of the spectrum, you’ve got integrations with very “enterprisey” tools. For example, Kerberos. I don’t know any startup that’s ever interested in Kerberos authentication for their database. That is something that only big enterprises want. That one makes sense. Maybe we should monetize Kerberos.
Everything else in the middle is kind of a gut check. One of the things we would like to do is figure out a better answer for that but, frankly, there isn’t one.
“What it means to be an open source database in the cloud era is a very different kind of thing.”
One thing that is really interesting about this is how open source databases are interacting with cloud. What it means to be an open source database in the cloud era is a very different kind of thing.
For example, we have our Cloud Manager product which offers monitoring, backup and automation in the cloud. That is something that you pay for but it’s not traditional enterprise licensing, it is SaaS licensing. It is sort of what you’d expect from a cloud provider. So, yes, the products themselves are closed source, but it is a very different consumption model. It’s a consumption model designed for the cloud.
So the intersection of open source databases and cloud is incredibly interesting. I wish I knew exactly where it was going to be in three years, but we see cloud adoption continuing to grow, and a big focus of ours is exactly where we should go with that. How do we make MongoDB incredibly suited for the cloud computing space?
“A database in the cloud that is magical is pretty appealing. No one is there yet.”
I’m curious then what you make of something like Amazon DynamoDB, which kind of sits in the middle of what you’re talking about — a proprietary database that invokes SaaS consumption and billing models?
A database in the cloud that is magical is pretty appealing. No one is there yet. I think it’s going to be a race to getting there.
DynamoDB has one angle on it, they’re working in one direction. We’re working in a very different direction. I think the idea of a fully managed database in the cloud is incredibly appealing: as a developer I just talk to it and I don’t have to manage it, I don’t have to worry about uptime. I don’t worry about scaling it. It’s just knobs for me and I just change the knob and someone else is worrying about it.
DynamoDB is a great example of a plethora of databases that have come out in the last five years. I think there will be a lot of convergence. But the idea of a fully managed database in the cloud is obviously incredibly appealing to a lot of people.
Do you think it will be new application types — the Internet of Things, for example — that push more developers toward new databases, or will it be the continued migration toward cloud platforms and all the architectural changes they enable?
I think it’s developers. The way cloud interacts is slightly different in my mind. Costs, when you start building an application or start deploying an application, are becoming more people costs and less upfront costs. So the great thing about cloud and databases is you can now take MongoDB using our management tools, spin up a thing on Amazon in 5 minutes that costs you very little per month, and get going.
With all these things people want to do—the Internet of Things—and all these applications that enterprises want, the world is evolving so rapidly that they need tools that can evolve and make their developers as efficient as possible. If you’re at a traditional enterprise and you want to start building an application, and it’s going to take you two months to get your database provisioned, that’s two months of lost productivity.
The costs behind most applications today are developer cost and time to market. Those are the two things we care the most about. Making developers as productive as possible, so it’s efficient from a cost standpoint, and customer time to market, so you’re not lagging on getting your products out and getting new features out.
Cloud computing also helps dramatically in those two things. They’re not necessarily because of each other, but they are both pushing in the same direction.
“If you’re at a traditional enterprise and you want to start building an application, and it’s going to take you two months to get your database provisioned, that’s two months of lost productivity.”
Making distributed computing consumable
You mentioned the idea of distributed computing and distributed systems before. Is the goal a nirvana state where the complexity of these things is entirely abstracted from developers?
Yes, ideally. But there’s always going to be some knobs, there’s always going to be tradeoffs between latency and throughput and complexity. I think we’re headed toward simpler knobs that really are just a set of choices, and the complexity of managing those choices is entirely on the database or on the services side.
How long it takes to get there is an open question. I think that’s the world that most people want to get to. It’s certainly the world I wanted to get to when I was on the application developer side.
“For almost every person using MongoDB, it is their first experience with a distributed system. I think we underestimated the impact of that a little when we started, maybe a lot.”
Almost 9 years into the company, if you had to go back would you have done things the same? Are there mistakes you learned along the way, either technologically or from the business side?
Technically, are there mistakes we made? Absolutely. Would I change them? It’s hard to know. You never know exactly the set of facts or the set of decisions that got you here. Maybe if you focused on one thing you would have ended up with something else.
I think, technically, we made some mistakes around distributed systems, for example. I think the way we explained them, the way we talked about them, were a little naïve from a user standpoint.
The biggest challenge in distributed systems is that the concepts are so complicated and so arcane, and what we’re trying to do is to make them consumable by every company. The way in which we introduced new terms was not thought out enough. We’ve got a lot of very powerful features and making sure it’s incredibly clear how those features are useful, when to use which ones and how to think about that, is an area where we could have done a lot more work.
For almost every person using MongoDB, it is their first experience with a distributed system. I think we underestimated the impact of that a little when we started, maybe a lot. There aren’t a lot of distributed systems that most developers interact with before this.
That’s a huge blessing. That’s what we set out to do. We want people to use distributed systems, but there’s a learning curve—both on our side and on the user side about what that means.