Marty Weiner | Pinterest engineer, BlackOps
A good portion of the startups I meet and advise want to use the newest, hottest technology to build something that’s cool, but not technologically groundbreaking. I have yet to meet a startup building a time machine, teleporter or quantum social network that would actually require some amazing new tech. They have awesome new ideas with down-to-earth technical requirements, so I kept wondering why they choose this shiny (and risky) new stuff when all they need is a good ol’ trustworthy database. I think it’s because many assume that building the latest and greatest needs the latest and greatest!
It turns out that’s only one of three bad reasons (traps) why people go for the shiny and new. Reason two is people mistakenly assume older stuff is slow, not feature rich or won’t scale. “MySQL is sluggish,” they say. “Java is slow,” I’ve heard. “Python won’t scale,” they claim. None of it’s true.
The third reason people go for shiny is because older tech isn’t advertised as aggressively as newer tech. The younger companies needs to differentiate from the old guard and be bolder, more passionate and promise to fulfill your wildest dreams. But most new tech sales pitches aren’t generally forthright about their many failure modes.
In our early days, we fell into this third trap. We had a lot of growing pains as we scaled the architecture. The most vocal and excited database companies kept coming to us saying they’d solve all of our scalability problems. But nobody told us of the virtues of MySQL, probably because MySQL just works, and people know about it.
Through the gauntlet, two of the most important lessons I learned building Pinterest were:
- Don’t be the biggest. If you’re the biggest user of a technology, your challenges will be greatly amplified.
- Keep it simple. No matter what technology you’re using, it will fail.
After about a year of fast, sleep-defying scaling at Pinterest, we had MySQL, Memcache, MongoDB, Redis, Cassandra, Membase and Elastic Search. Everything was on fire and breaking in their own special ways. We wanted to simplify and get rid of all the fancy stuff, but we also wanted something that would scale to the moon with us. It was time to start our grand re-architecture. To help guide us and our choices, we built a set of questions to apply to every different technology:
- Does the technology meet your needs?
Solidify your requirements. Do you need simple lookups, ordered lookups and/or lookup by geography? Graph walking? One technology may not be able to support all of your requirements, but it’s nice when it does.
- How well does it scale?
Some technology is designed for massive scale and some is not. You may have to layer a little scale magic on top. With scale comes more complexity and less agility. Keep that in mind, and avoid scaling prematurely.
- Is the cost justified?
Consider the support contract or licensing and whether or not you can get by without a support contract. If you’re an angel funded startup, ideally you’d like to spend your money on the lights and Ramen, and not on a mainframe.
- How mature is the technology?
Maturity of a product is the most important question you can ask next to basic requirements. This is where I’ll spend the lion share of this post.
Maturity is natural
The harder and more passionately people push on a technology, the faster they will run across bugs or performance problems, fix them and hopefully contribute fixes back for the whole community to use.
Maturity can come a hell of lot faster if the technology is simple. Getting a five line Python program to print “Hello, World” to be bug free should take 30 person-seconds, whereas a program to handle world-wide distributed banking transactions will take many person-years to mature due to the complexity that must be hammered out.
So, let’s equation-ify this. Maturity increases with blood and sweat, but comes slower with more complexity.
Maturity = (Blood + Sweat) / Complexity
(I really wanted to throw in some e, i, and π into the equation, but just couldn’t justify it. Yet…)
As a technology hardens, collaboration occurs, understanding gets deeper and a wealth of knowledge is built out. As the technology itself gets more stable and beautiful, documentation and simplification occur and the frontiers and boundaries are tested and widened.
You don’t want to be on the wrong end of the maturity equation. There be dragons there:
- Hiring will be more difficult
Try searching on Google for MySQL admin and Cassandra admin. Try walking out into a busy San Francisco walkway and yell out that you need a MySQL admin. You’ll find one. Hbase, unlikely.
- You’ll find minimal community
It’s 2 a.m. and you’re getting a weird error message “48fe1a is opaque” (an actual error message I got from Membase). What the f!*k does that mean? Crap, there are there no answers anywhere on Google. Conversely, I can’t remember the last time a MySQL question I had wasn’t already answered along with somebody calling the questioner a nOObXor.
- You’re more likely to fail, possibly catastrophically
We had an unexpected loss of data on nearly every technology we used at one time or another, except MySQL. Even on the worst days, when the hard drive crashed, we still managed to find a script somebody wrote to do magic voodoo to get our MySQL data out again and live another day. Other technologies left us dead in our tracks because nobody had encountered the same problems, or they hadn’t taken the effort to dive deep to recover their data or they hadn’t contributed the fix back to the community. Incidentally, I’m super thankful we never trusted the golden copy of our data to any other system except MySQL in those early days.
Sometimes you have to be on the bad end of this maturity ratio. For instance, if you HAVE to have a Flux Capacitor for your time machine, recognize that you won’t be able to hire for it easily. There will be minimal community online to help you debug why you only went back in time 100 years and not 1,000 years. Support may not be able to help you understand why Marty McFly is now stuck in a supernova.
If you’re on the frontier, you’ll hit new bugs and issues that the rest of the world has never seen. They’ll be 10x harder to debug and will likely require a depth of knowledge that goes outside the comfort zone of your current engineers. You’ll have to dig in, push hard and learn fast. I send you my virtual hugs and admiration. I’ve been there. It will be tough. Blog what you find, collaborate and communicate.
If you’re starting or growing a company, and your scale is smaller than huge, consider maturity to be your most important factor aside from basic requirements. Ask yourself — does MySQL sufficiently meet my needs? If so, use it. If you’re wondering if MySQL will be fast enough, the answer is YES. Even better than fast, MySQL’s performance will be consistent.
So I’ve wailed away on a bunch of technologies, but I seem to have a near-romantic thing for MySQL. I’d like to take a moment to mention that MySQL, while mature, does not solve all your problems. Sometimes you’ll have to venture away from the comforting warming glow of maturity.
- Cartesian Distance
If you need to search for nearby points in two dimensions, storing coordinates as Geohashes in MySQL would work well (here’s an XKCD comic to help). Three dimensions would probably also work well. But if you need large N-dimensional search spaces, I don’t know of a good way to store and retrieve them in MySQL efficiently. You might find yourself needing to store N-dimensional points if, for instance, you have created a model that produces feature vectors for some input and you want to see if two inputs are similar. Classic examples include determining if two images are similar but not exactly the same. For these sorts of situations, consider building a distributed RP/KD tree (would love to collaborate! Email me!).
- Speed of writes
MySQL delivers full write consistency. If you’re willing to trade off “full” for “eventual” consistency, your writes can be much faster. HBase, Cassandra and other similar technologies write to an update log incredibly fast, at the expense of making reads slower (reads must now read the stored info and walk the update log). This is a nice inversion, because it’s easier to cache reads and make them fast.
- FIFOs, such as feeds
My biggest complaint about MySQL is that it’s still living in 1994 (with baggy pants and the.. erk.. Macarena). Many uses of databases back then needed relational queries. There were no social networks. MySpace wouldn’t come around for another nine years! And so MySQL is built out of trees and has no good notion of queues. To insert into a B-tree is an O(lg(N)) operation (assuming happy balance). But today, social networks are a major force on the internet, and they depend heavily on queues. We want uber fast O(1) enqueuing! My suggestion is to not use MySQL for feeds. It’s too much overhead. Instead, consider Redis, especially if you’re still a small team. Redis is super fast and has lists with fast insertion and retrieval. If you’re a larger company and can hire the folks to maintain it, consider HBase. It’s working well for our feeds.
For the love of everything holy, don’t store logs in MySQL. As mentioned in the previous paragraph, MySQL stores things in trees. Logs should not live in trees (that’s a weird thing to say…). Send your logs to Kafka, then use Secor to read from Kafka and ship up to S3. Then go nuts with EMR, Qubole or your map-reduce platform du jour.
- Scale beyond one box
If you’re in this position and you’ve optimized all queries (no joins, foreign keys, distinct, etc.), and you’re now considering using read slaves, you need to scale to beyond one MySQL server. MySQL won’t do this for you out of the box, but it’s not hard. And we have a solution, which you’ll be able to read about and learn how we did this once I finish writing the blog post. In the meantime, bug me at email@example.com, and I’ll get you up and running now.