“Killing Google With My Bare Hands” and Other Lessons Learned in Scaling with Jack Levin
Jack Levin was an early employee at Google and later started ImageShack, and is most recently at Nventify. In his keynote at the 2016 Annual LDV Vision Summit he spoke about his perspective going back to the early days of Google up to today. Tickets on sale now for the upcoming Annual LDV Vision Summit.
I was privileged to be at Google from 1999 to 2005, essentially in the beginning of my career. I was twenty-three or twenty-four years old when I started.
I’m here to talk about scale. What it means, how to actually scale large systems, and not essentially burn yourself and your infrastructure out. So after Google, I did spend some time running a company called ImageShack doing about two billion web hits a day, which was a pretty good scale story in itself.
So that picture is actually a rack that I built myself. It happens to be in a computer history museum in Mountain View. You can see that little label there, “JJ 17”. I put that label there, it has some of my blood DNA because, as you can see, this thing is a mess and I would often scratch my hands on that network card right there. So that was my first day at Google, essentially went into the data center, had to wire all of this, bring it online, and a week later we needed to launch netscape.com, which was the first really big client for Google.
Three hundred servers, but then the next week it was plus another two thousand, which was crazy. I had no idea what I was supposed to be doing, Larry Page said, “Hey, here’s a bunch of cables, plug them in, you’re good to go.” That was the story for the first couple years, a few years forward, that’s when I stopped going to the Data Center because it was a team of twenty five people managing all of this.
So that’s the second or maybe third generation of uber-racks. Tons of hardware, a lot of kilowatts being consumed, a lot of heat being generated, but a lot of queries being served.
So let’s talk a little bit about disasters. I claim responsibility for killing Google with my bare hands a couple times. We didn’t know what we were doing, I clearly didn’t know what I was doing, I would push the wrong button, Google would go down. I would jump on my motorcycle or scooter at the time, and literally run through the data center, unplug all the power supplies, plug them back in to wipe out my configs, and everything would be back. That happened a few times until I figured out, “Perhaps I should have a dial-up to a data center so I can dial in and undo my work.”
But that comes with experience. We had a team of really smart people when it came to development, but we had no clue in operations. I had no idea what I was doing, the guys who were hired were mostly IT on the corporate side. One of the biggest problems for startups that need to scale up quickly is that they are not IT people, they are not operations people — the Founders that is. They hire guys to run their data centers, but nobody knows what they’re doing, or nobody knows where the pin points are, so on and so forth.
For the longest time at Google, we didn’t know what might kill Google. We had a bunch of monitoring, but we didn’t know how to interpret it. Back in the day, about the year 2000, the way you would kill Google is that you would send it a query, something like, “theological silhouette”. The words have no meaning when combined together, but Google would search all the way to the bottom of an index. If you sent five queries per second from your laptop, you could actually kill the whole Google search engine.
That was an interesting thing, and we learned that monitoring of queries and spam detection is really important.
That’s actually one of my favorite slides. It’s not specifically about Twitter, but you know that back in the day Twitter would go down all the time, you know. What’s going on, why is it always down? The interesting thing about Twitter is that it’s not the language — Ruby isn’t bad, Python isn’t bad. What it is, Twitter kind of, should I say, “flew away” from the company that tried to build it. They just go so popular so quickly and it was very difficult to scale. Sometimes it takes luck, persistence, people working more than nine to five, twenty four hours several days just to get things up and running in the right kind of way.
Eighty percent of the time, you just don’t get it right. This is mostly about the operations teams. A lot of these startups, when they hire their ops teams, they don’t claim responsibility and, more often than not, it’s because they’re a disenfranchised group. Most of the people who call the shots are their founders, and operations folks are just trying to run things, but aren’t really on the forefront of the company business.
Sometimes it takes luck, persistence, people working more than nine to five, twenty four hours several days just to get things up and running in the right kind of way.
Postmortems are very important. When things break, you do need to talk about them and you need to have your peers discuss them, but more often than not you need to think about the future. What can possibly go wrong? That’s a premortem. So premortem can help you to envision the possibility of different kinds of disasters that you generally don’t think about. If you don’t think about, then likely it’s not going to work for you.
In the early years of Google, I had no backout plan. It’s not that I like to live dangerously, I just really didn’t know what I was doing. A backout plan is very important, right now at Nventify, the second company that I co-founded, it’s very important for me to ask my engineers, “So you’re going to make all those changes, do we know how to roll back?” Usually the answer I would get is, “Well, I know what I’m doing.” More often than not, that’s not the case and you need a backout plan.
How to scale. So that quad-copter actually seen at sea is pretty great, great picture of scale which actually does work. So when do you scale, and how? Interestingly, most of the blocks you need are already available on GitHub. Just go to GitHub, get your building blocks downloaded and tried by your engineers, there’s no point rebuilding the whole thing. Especially if it’s not your core business. So the company should really be focusing on the core business, and not necessarily inventing the new building blocks.
Surprisingly, you’re a better engineer if you know how to use Google. You can find a lot of things have already been solved. So using Google is a skill that every engineer should have.
That’s a very important slide here. So NIH stands for Not Invented Here. A lot of bigger companies, when they scale up, usually have one of the people who say, “Hey, we never want to buy anything. We don’t want to get anything that’s open source. We want to create everything locally so we know exactly what is going on with our libraries.” That’s often a fallacy, it actually is expensive and slows down your progress and ability to deliver product to end users.
I want to spend a few minutes talking about the future. Clearly, it’s cheaper and cheaper to store your files. Likely all of the great companies that you see on this screen are competing against each other in the future, likely within five years the cost of storage will go down to zero and you will end up paying for something else.
The way I see that, especially with Google efforts of delivering Fiber and satellite connections everywhere — and so is Facebook — it’s very likely that everybody is going to have free internet and essentially be plugged in. That’s an interesting concept. So we talk about storage nodes and we talked about it at Google as well. Likely what we’re going to see in between five to fifteen years are storage nodes that are actually self aware of driven by conditions and some by AI. This way if you go on the plane and have your data with you, it won’t be on your phone but it might be following you from the terminal into the plane, load up on the plane, and all of your movies and what you have are right there.
I call it AI Powered Peer-to-Peer Storage. I know, it’s kind of cool. There’s more and more interesting technology being developed when it comes to consuming data, specifically I’ve seen some interesting car windshield glasses, there’s talk at Google about contact lenses that can create a VR feeling right in your eyes rather than wearing anything but contact lenses. This is Google Glass, currently we use text to query for things and find things. It’s very likely that, if it isn’t Google Glass it’ll be something similar, where the visual information will be used for searches.
This is maybe twenty to twenty five years from now, where advanced technology will give people the ability to record all of your experience from your visual cortex, your feeling of whatever you’re touching, and eventually share this data between humans. It’s not telepathy, but more like close-range communication mind-to-mind which will be possible with technology.
Augmented reality and VR will likely merge, and we’re likely to see an absence of keyboards and just using our minds and hands to manipulate and interact with data.