What You Must Know About Big Data and Machine Learning

Christopher Nguyen
Published in Deep Learning 101
Jun 10, 2015


Why “Volume, Velocity, and Variety” are wrong

Originally published at blog.adatao.com

A few weeks ago, Sonal Chokshi of Andreessen Horowitz and I chatted on an a16z podcast. Here’s her summary of that conversation:

On this episode of the a16z Podcast, Nguyen puts on his former computer science professor hat to describe “Big Data” in relation to “Machine Learning”— as well as what comes next with “Deep Learning”. Finally, the former Google exec shares how Hadoop and Spark evolved from the efforts of companies dealing with massive amounts of real-time information; what we need to make machine learning a property of every application (Why would we even want to?); and how we can make all this intelligence accessible to everyone.

“Machine Learning is to Big Data as
Human Learning is to Life Experiences”

We’ve heard from many people that this made so much more sense of Big Data and Machine Learning for them. So, I hope you’ll enjoy listening to this conversation about Big Data, Machine Learning, and the future of Deep Learning.

Listen to podcast on “Making Sense of Big Data and Machine Learning”

Follow me on Twitter to keep informed of interesting developments on these topics.

Transcript

Sonal Chokshi: Hi everyone, welcome to the a16z Podcast. This is Sonal, and I’m here today with Christopher Nguyen from Adatao, a big data company whose mission is to democratize data intelligence and help people collaborate across the enterprise.

The best way to describe him is as an entrepreneurial scientist. He got his PhD from Stanford in device physics, he’s a former Google executive, and as a professor he started a computer engineering program at the Hong Kong University of Science and Technology. He’s merged the worlds of academia and startups.

Welcome, Christopher.

Christopher N.: Thank you, Sonal.

Sonal Chokshi: Actually, Christopher, maybe the way to kick this off… I just want to start by talking with you about big data. That’s a term people throw around all the time; it’s completely overloaded, it’s a buzzword, it means so many things to so many different people. Could you start by telling me what your definition and take on big data is?

Christopher N.: There are two ways you can think of the term big data. There’s what I think most of the world thinks of when people talk about big data: the V’s. It started out as three V’s, volume, variety, and velocity, and I think it’s now up to seven or eight different V’s: veracity, variance, and so on.

I actually don’t like that definition. It’s functionally correct, but it focuses on the problems of big data. Those are the challenges you have to deal with when you work with big data, but the definition skips the question, “Why would you want to deal with these problems at all?” It turns out the reason for big data is machine learning.

“The reason for Big Data is Machine Learning”

Sonal Chokshi: The reason for big data is machine learning. That’s actually kind of counter-intuitive, because I’ve heard it the other way around, that machine learning exists because of big data.

Christopher N.: I like something that Peter Norvig, the Director of Research at Google, said about big data: “Big data is not just quantitatively different, it’s qualitatively different.” In other words, something happens when you have enough data; it crosses a certain threshold.

For example, if you want to learn whether it hurts to hit your head against a brick wall, about five samples is probably plenty of data. If you want to learn how to classify images on the Internet, maybe two million samples are not enough. It’s not so much a matter of how much data you have but whether you have enough to learn from.

Take a company like Google, which I would say is one of the original big data companies: when it started its life, the very first batch of data it dealt with was big data. The term big data doesn’t really exist inside these companies. They’ve always learned to take advantage of this data to make a lot of decisions.

Sonal Chokshi: The way I’ve heard it is that big data … that machine learning is one of many uses for big data.

Christopher N.: Right.

Sonal Chokshi: But you’re basically arguing for something different, can you describe what that is and why?

Christopher N.: Sure. If you think about the V’s definition of big data, the V’s are all problems. We tend not to want problems unless there’s a reason, a greater benefit that justifies the cost. The benefit of big data is that we can unleash algorithms on it, and these algorithms can automatically detect patterns.

I want to jump into that right away, because a lot of us in machine learning say this all the time, “detect patterns” and so on. People take that for granted, but it’s a little fuzzy. The way I think about it is that machines learning from big data is very much like human beings learning from life experiences.

“Machines learn from Big Data like
Humans learn from Life Experiences”

Sonal Chokshi: That’s actually interesting. I just want to hear more about why you make that analogy. You’re basically saying that machine learning is the way humans learn from life experiences, do you mean the way like a kid learns to navigate the world for the first time?

Christopher N.: Absolutely, that’s exactly right. For example, let’s flip that around and imagine: would you want a child to develop without any experiences? After 20 years, what would that person be like? And why is it that we generally ascribe wisdom to older people rather than younger people?

Our brain capacity remains essentially constant after a certain age, 16, 18, 20, depending on which research you read, and yet wisdom continues to grow and accumulate. As the brain incorporates life experiences, it is taking in a lot of big data, just like what machine learning algorithms do with data.

The opposite of that is rule-based computing or rule-based expert systems. You can come up with 10, 20, 30 rules and so on, but you can never come up with enough rules to handle the exceptions.

Sonal Chokshi: Exactly. Is machine learning for the exception handling then, or for everything? How does that work when you’re talking about computing?

Christopher N.: In a very real sense it is. You can think of it as exception handling, but I like to think of it, by analogy, as wisdom. You have the rules, but you also know when the rules don’t apply. The reason you know is that you’ve seen three or four or five corner cases before. Somehow, “intuitively,” you sense that in this situation the rule doesn’t apply; but what we think of as intuition, you can think of as parameters inside a machine-learning model.

Sonal Chokshi: That’s interesting but how does … Just to be more concrete about that, that makes a lot of sense logically but concretely for businesses, like when you think about the business intelligence space and where we’ve been and where we are now, what’s different here? What’s happening? What do we get out of it? Basically, I guess I’m asking.

Christopher N.: Right, that’s a great question. Even with a term like business intelligence, sometimes we’re captured by what we meant in the past, and what we said in the past was BI.

Sonal Chokshi: BI, being business intelligence.

Christopher N.: Exactly. Business intelligence can be self-limiting. In other words, what business intelligence was, was limited by what was available, and what was available was the ability to essentially look backward. You can ask a lot of questions using what we call aggregations.

Sonal Chokshi: Aggregations.

Christopher N.: You have a whole bunch of transactions coming in from all over the world, and you can ask, “Well, how much revenue did we make yesterday from that particular region of the world?” That’s backward-looking information, because that’s all we were capable of, and because something in particular was lacking, and that something was big data. With enough data from all of those experiences, we can build a model and project into the future.
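An aggregation like that revenue question can be sketched in a few lines of Python; the region names and amounts below are invented purely for illustration:

```python
from collections import defaultdict

# Illustrative transactions as (region, revenue) pairs; the data is made up.
transactions = [
    ("EMEA", 120.0), ("APAC", 75.5), ("EMEA", 30.0),
    ("AMER", 200.0), ("APAC", 24.5),
]

# The backward-looking BI question: how much revenue per region?
revenue_by_region = defaultdict(float)
for region, amount in transactions:
    revenue_by_region[region] += amount

print(dict(revenue_by_region))
# {'EMEA': 150.0, 'APAC': 100.0, 'AMER': 200.0}
```

On a single machine this is trivial; the rest of the conversation is about what happens when the same question has to run over billions of rows.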

You can think of business intelligence going forward as the ability to apply machine learning algorithms to big data and not just look at past questions but also future questions. We’re asking to predict the unknowns from the knowns.

“Business Intelligence will become predicting the Unknowns from the Knowns”

Sonal Chokshi: What’s changed to make that possible? Because in the days of business intelligence, I think of the products that SAP and similar companies put out. What’s changed to make big data possible? I know the big obvious answer is more computing power, but more concretely, what’s physically making it possible to parse all this data and get these insights out of it?

Christopher N.: Right. If you think about it, a lot of people have pointed out that big data has always existed. It’s always been there; we just didn’t collect it. The second insight is that we don’t necessarily get smarter over time. It’s just that certain technologies get cheaper and become more available.

Machine learning algorithms have always been around. The data you could collect has always been around. But it wasn’t until the advent of things like the Hadoop project, and the launch of companies like Cloudera and MapR around 2009, that it became affordable for many, many more companies to begin acquiring and storing a lot of this data.

Sonal Chokshi: I’m actually glad you brought up Hadoop, Christopher, because one of the things I see a lot in reading about the big data space is myths and misconceptions around what Hadoop is and what Spark is. Because now we talk a lot about Apache Spark, and, full disclosure, at a16z we have investments at every level of BDAS, the Berkeley Data Analytics Stack coming out of the AMPLab.

Can you talk to us a little bit more about what exactly Hadoop does, and what Spark does, and how they all live together, and then, how that actually fits in to big data? For people who don’t actually crunch those numbers behind the scenes?

Christopher N.: Sure. I think we can look at it from two perspectives. I think that there is a top-down view and there is the bottom-up view. Let me start with the bottom up view because that’s how technology is always developed. We always build things from the bottom up and then we realize there’s a pattern here, and then we look top-down again.

From the bottom-up view, Hadoop is primarily a storage layer. There’s HDFS, the Hadoop Distributed File System. The essential difference between that file system and file systems of the past is that it is highly parallel.

Sonal Chokshi: Parallel, in terms of parallel processing?

Christopher N.: Parallel in storage, with replication and so on, so that you get a lot of resiliency, and it’s also capable of running on commodity hardware. For the first time, people could afford to buy many terabytes of storage, store data reliably, and still pay relatively little for it.
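The resiliency Nguyen mentions comes from replication: each block of a file is stored on several machines, so no single failure loses data. Here is a toy sketch of the idea in Python. The node names and round-robin placement policy are invented (real HDFS uses rack-aware placement), though the replication factor of 3 is HDFS’s well-known default:

```python
import itertools

REPLICATION = 3  # HDFS's default replication factor
nodes = ["node1", "node2", "node3", "node4", "node5"]

def place_blocks(blocks, nodes, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes (toy round-robin)."""
    placement = {}
    starts = itertools.cycle(range(len(nodes)))
    for block in blocks:
        start = next(starts)
        placement[block] = [nodes[(start + i) % len(nodes)]
                            for i in range(replication)]
    return placement

placement = place_blocks(["blk_0", "blk_1", "blk_2"], nodes)
print(placement["blk_0"])  # ['node1', 'node2', 'node3']

# Simulate losing node2: every block still has at least two live replicas.
survivors = {b: [n for n in ns if n != "node2"] for b, ns in placement.items()}
assert all(len(ns) >= REPLICATION - 1 for ns in survivors.values())
```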

Sonal Chokshi: Sorry, just to take a step back for a moment: were Hadoop and its ilk able to run on commodity hardware because the hardware had gotten cheap enough, or because the way it processes data, the way it’s architected, is optimized for that? They could turn out to be the same thing in the end, but I do think it’s important to understand what the driver is.

Christopher N.: I think it’s both. It’s a supply-and-demand thing, where sometimes demand creates supply and sometimes supply creates demand. You can trace it back, again, to companies like Google that started in the late ’90s and early 2000s and began using a lot of this commodity hardware.

Then there’s Moore’s Law making everything cheaper, essentially doubling the capacity you can afford every 18 months. Add to that the companies that have gone down this path, proving that there’s something valuable about accumulating all this data and making decisions from it.

It’s all that intuition, plus the actual economics of hardware prices going down and the availability of open source projects. All of these elements came together to create the big data movement.

Sonal Chokshi: Where does Spark fit into that?

Christopher N.: Spark. Continuing with this bottom-up view: if you start from the storage level, you know that storage alone is not enough. You can’t just store things …

Sonal Chokshi: Right, you’re not going to get insights out of just collecting them.

Christopher N.: Exactly. Interestingly, in lots of database deployments at companies, people actually do put data in and never get anything out. In any computing stack you need more than just storage. You need a compute layer.

Sonal Chokshi: The first layer that you’re describing is the big data layer. That’s how you’re describing big data, it’s like storage.

Christopher N.: That’s exactly right.

Sonal Chokshi: I think right now people think of big data as actually getting the insights and analytics out of it, but you’re saying big data is just gathering all those signals and saving them somewhere.

Christopher N.: For the purpose of being precise, I’m going to slice this up into levels so that we can refer to them more accurately. At the bottom layer we’ve got this big data and then above that we need big compute, in order to process all of this big data.

Sonal Chokshi: The storage layer, the processing layer, and what is big compute?

Christopher N.: Big compute. The first example of big compute you can think of is MapReduce. By MapReduce I don’t mean the algorithm but the actual implementation in the Hadoop project, Hadoop MapReduce. That’s a parallelized computing system that can take all this data, do some computation with it, and put it back. Maybe an aggregation, for example. The same question I gave earlier, “How much money did we make off this widget in Europe yesterday?”, is an aggregation question.

If you have a thousand transactions you can do it with one machine, but if you have somehow stored 100 billion of these rows and you want to ask the same question, you have to parallelize it. That’s what MapReduce lets you do. Unfortunately, MapReduce was not originally designed to handle interactive queries.
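The map/reduce pattern behind that parallel aggregation can be sketched in plain Python. A real Hadoop job would spread the map and reduce phases over many machines, but the data flow is the same; the rows here are invented:

```python
from collections import defaultdict

rows = [("EU", 10), ("US", 5), ("EU", 7), ("ASIA", 3), ("US", 2)]

# Map phase: each mapper emits (key, value) pairs.
mapped = [(region, amount) for region, amount in rows]

# Shuffle phase: group values by key (the framework does this between phases).
groups = defaultdict(list)
for key, value in mapped:
    groups[key].append(value)

# Reduce phase: each key is summed independently; that independence is
# exactly what lets the work spread over thousands of machines.
totals = {key: sum(values) for key, values in groups.items()}
print(totals)  # {'EU': 17, 'US': 7, 'ASIA': 3}
```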

Sonal Chokshi: Is that because it only has two functions, map and reduce, or is it because …

Christopher N.: Actually the reason is a little deeper and more pragmatic than that. Interestingly a lot of people may not realize that MapReduce was designed to be slow.

Sonal Chokshi: That is interesting. I didn’t know that.

Christopher N.: Let me unpack that a little bit. MapReduce was implemented at Google by Jeff Dean and Sanjay Ghemawat back in the early 2000s, and they published their work in 2004. That MapReduce engine at Google was intended to do one particular job: crawl and index the web. Google’s approach was to parallelize that over thousands of machines.

When you have thousands of commodity machines doing a task that may last half a day, the probability of one of those machines going down approaches 1. In fact it is about 1: some machine will go down. When a machine goes down, the question comes up: “Do we start the job over?” Certainly you don’t want to do that, because then the job would never finish. So it’s designed in such a way that if any single machine goes down, another machine can be brought up and pick up where the first left off.

Sonal Chokshi: Hence it’s slow enough to be able to do that.

Christopher N.: Right, and the way you ensure that reliability is to write everything down at every step of the way. You do job A and write out the results of job A; then you do job B and write out the results of job B.
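That write-everything-down discipline can be pictured as stages that communicate only through files on disk. If the machine running stage B dies, a replacement can re-read stage A’s checkpoint and carry on; nothing upstream has to rerun. The two-stage pipeline below is invented for illustration:

```python
import json
import os
import tempfile

def stage_a(numbers, out_path):
    # Job A: square each number, then persist the result before moving on.
    with open(out_path, "w") as f:
        json.dump([n * n for n in numbers], f)

def stage_b(in_path, out_path):
    # Job B: re-read A's checkpoint from disk (a restarted machine could do
    # the same), then sum and persist again.
    with open(in_path) as f:
        squares = json.load(f)
    with open(out_path, "w") as f:
        json.dump(sum(squares), f)

workdir = tempfile.mkdtemp()
a_out = os.path.join(workdir, "stage_a.json")
b_out = os.path.join(workdir, "stage_b.json")

stage_a([1, 2, 3, 4], a_out)
stage_b(a_out, b_out)  # could run later, elsewhere, or after a crash

with open(b_out) as f:
    print(json.load(f))  # 30
```

The safety comes at the cost of a disk round-trip between every pair of stages, which is exactly the slowness Spark set out to remove.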

Sonal Chokshi: What does Spark do differently?

Christopher N.: Spark takes a different approach. As I said earlier, it’s not that we got smarter; it’s that the constraints changed. Spark’s goal is to do a lot of these queries very, very fast. We’ve always known, independent of the economics of hardware and software, that access to RAM is much faster than access to disk. From CPU to RAM, you’re talking about 40 nanoseconds.

Sonal Chokshi: Sorry, just to be clear, when you say accessing RAM is a lot faster than accessing disk, you’re just talking about how long it takes to get to the data in memory.

Christopher N.: That’s right, but generally the whole system will feel that speed. Getting to information stored in RAM is orders of magnitude faster than getting to information stored on disk. Spark’s approach is to use memory. Now, if Spark had been created five or six years before its time, it would have completely failed, because memory was so much more expensive.

Sonal Chokshi: Right, the hardware constraints were lifted there, that’s right.

Christopher N.: Exactly. And if it had come a few years later, of course, something else would have arrived before Spark. The timing of Spark has a lot to do with its success. What Spark gives you is very fast query processing that Hadoop’s MapReduce implementation doesn’t.
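The contrast can be sketched as loading a dataset into memory once and answering repeated queries from there, instead of going back to disk each time. This is only a conceptual sketch in plain Python, not the Spark API; Spark’s RDDs layer partitioning, lineage, and fault tolerance on top of this caching idea:

```python
class CachedDataset:
    """Load once, then serve repeated queries from memory."""

    def __init__(self, load_fn):
        self._load_fn = load_fn
        self._data = None
        self.loads = 0  # how many times we touched the slow "disk"

    def _materialize(self):
        if self._data is None:  # roughly what persisting/caching enables
            self._data = self._load_fn()
            self.loads += 1
        return self._data

    def query(self, fn):
        return fn(self._materialize())

# Stand-in for an expensive disk read.
ds = CachedDataset(lambda: list(range(1_000_000)))

total = ds.query(sum)    # the first query pays the load cost
largest = ds.query(max)  # later queries hit memory
print(total, largest, ds.loads)  # 499999500000 999999 1
```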

Sonal Chokshi: That helps us understand a little bit more of the difference between Hadoop and Spark. You’re basically talking about the in-memory aspect of it, being able to do things a lot faster.

Christopher N.: That’s right.

Sonal Chokshi: What does that give us concretely for big data and machine learning?

Christopher N.: That takes me to the top-down view. From the top down, we know that we always want things fast, but we also want them cheap.

Sonal Chokshi: Sorry, just to be … cantankerous for a second, why do we want things to be fast? Actually, like why do we always want them to be fast? What do we actually get out of that?

Christopher N.: Fast is competitiveness. If you can get your answer five minutes before I can, you’ll make the decision, you’ll make that purchase, that buy, that supply move, whatever it is, before I get there, and you win.

Sometimes it’s implicitly obvious that we want everything faster because fast is competitive, but it turns out the difference between fast and slow is very, very critical. When you can get something in real time or you can get something in five seconds as opposed to five minutes, you will actually change your workflow.

That’s what I learned from the consumer perspective with things like Gmail. We had a phrase we called the “five-second barrier”: if users can’t get something done within five seconds, they won’t ever do it. It’s not that they’ll do it at twice the latency; they just won’t. Fast enables new, different use cases that otherwise would not happen.

“We had something called ‘The Five-Second Barrier’”

Sonal Chokshi: That’s actually helpful because I think we tend to take it for granted that fast is better. I know obviously we want that information faster, but you’re basically talking about enabling entirely new workflows and use cases. Going back to what you were saying about the top down approach and where this fits …

Christopher N.: From the top-down perspective, now we have the capability. We have this big compute layer and this big data layer, so we can actually do things very fast on massive amounts of data. We can bring all these algorithms to bear on big data and get a lot of insight. But that’s still not enough, because we haven’t put the human in the picture yet. It’s still machines, and that’s the problem with the bottom-up approach.

When you look from the top down, there are humans sitting at command and control, at the insight layer, and they have to make decisions. So far, our industry has not built the bridge from all of this machinery to that human user. There’s a layer missing.

Sonal Chokshi: The learning basically.

Christopher N.: The learning, as well as the application layer. The interfaces, the user experience, all of that put together can be thought of as the Big Apps layer. I like to think of things in terms of big apps, on top of big compute, on top of big data. When you have those three working effectively and harmoniously together, you have a very, very good big data stack.

“Big Apps, on top of Big Compute, on top of Big Data”

Sonal Chokshi: Taking a step back for a moment, it seems obvious why natural interfaces are important for people to be able to interact with this. Let’s face it, the reason most of us can do any kind of computing at all is the GUI: a graphical user interface that lets us avoid seeing the plumbing behind the scenes. It seems pretty obvious that we need that.

What does that actually get you in the big data world? Sure, you can more easily read your data and get some insights from it, but I feel like we throw around the claim that we need a better interface to our data too much. What does it really get us?

Christopher N.: I think one way to understand it is to look back at the past. We went from the typewriter to the computer keyboard, to the mouse, and now to the touch screen and so on. You could ask the same question: what do touch screens get us, and why didn’t we have them before?

The reason touch screens and finger gestures and so on are valuable is because they’re much more natural than using a keyboard, but the reason we didn’t have that before is because the hardware and the software to make that happen were not available, or too expensive to do so.

The same analogy applies to big data and machine learning. We could imagine all these capabilities before, but they were too expensive. We didn’t have the storage capacity for all of the data, and we didn’t have the big compute capacity to process it. Now that we do, and they’re affordable, what you will see is machine learning becoming a property of every application.

“Machine Learning will be a property of every application”

Sonal Chokshi: What does that mean? I’ve heard Peter [Levine] say that as well. He makes the argument as well, that machine learning will be a property of every application as opposed to a stand-alone, isolated function, what does that actually mean?

Christopher N.: Imagine a world where … We work with a lot of people, and we expect our colleagues to remember what we say, to learn from our interactions, and so on. Can you imagine a world where your colleagues are simple automatons who don’t understand what you’re saying, where you tell them something and they don’t remember it the next day, and their behavior doesn’t change as a result?

I claim there will be a day, very soon, when you will feel the same way about the machines you work with. In other words, you will expect learning to be a property of all of these machines.

“We would expect Learning to be a property of all our machines”

Sonal Chokshi: You’re right. I think we already do expect it because we carry mobile phones with us all around, and it’s frustrating when you have a certain experience there that you can’t have with an application you’re using at work, on your desktop or anything else.

Christopher N.: Exactly.

Sonal Chokshi: I definitely think you’re right, that we might already even be there in some way, or that we need to be expecting that, but what does that really give us? Because when I think about big data, I think about it in the abstract. It’s still not clear to me what machine learning being a property of every application, what does that do for us?

Christopher N.: I’m going to give you an example, a story from one of the … There’s something called TGIF at Google, which we actually do at our company, Adatao, today as well: every Friday the execs basically come out and talk about almost every company secret possible to the whole company, and people can ask any questions they want.

I remember one time at Google we were dealing with the problem of latency. Google cares a lot about speed, and Larry was pushing everyone to make their services much faster. Someone asked, “Hey Larry, we went from a one-second search delay to 500 milliseconds, then 300, then 100. What do you want? I mean, what happens when we get to zero?” And what Larry said was, “Why stop at zero? Why can’t it be negative latency?”

“Why can’t it be negative latency?”

Sonal Chokshi: Right.

Christopher N.: Essentially what he meant was: why can’t our machines anticipate what we need, what we want to do? That’s actually not as ridiculous as it may sound. Certainly, as human beings we anticipate each other. If you see that I’m coughing, you just go get me a cup of water. Right now, I still have to tell the machine; even if we had a robot today, I would still have to ask that robot to do it.

Sonal Chokshi: You have to specify …

Christopher N.: What we get with predictive algorithms, with machine learning, with big data (remember, big data is just life experiences) is that our machines will be able to learn. They will be able to anticipate. They will be able to predict. They will have behaviors that we normally expect of humans, of intelligent beings.

Sonal Chokshi: When we take that to every application, though: why is it not okay for it to be an isolated, stand-alone thing? What do we get when it becomes part of every application?

Christopher N.: I think when it becomes part of every application, every component of the application will be receiving data all the time: maybe the screen is receiving my gestures, maybe my calendar is receiving the appointments I’m making, maybe even my location. Then they will be able to learn from all of this and make intelligent decisions about what calendar events to insert, what gestures to accept; and maybe I don’t even have to ask, it will just do it ahead of time for me.

In my view, in that world things will happen a lot better for me. It will become a lot easier for me to move around. It will become a lot easier for me to make decisions, and maybe a lot of decisions will also be suggested to me before I even have to think about it too much.

Sonal Chokshi: A lot of what you’re talking about is machines inferring and really aiding, learning like humans and helping augment human intelligence. What happens next?

Christopher N.: I think that’s a great question. If you back up and think about human evolution, there’s one variable that’s inexorably increasing. We may get taller or shorter, we may move from one continent to the next, but one variable has kept changing in one direction, and that’s human intelligence. In fact, our species’ intelligence. There’s absolutely no reason to think we’re at the end of that. I think we’re just at the very beginning of that increase in intelligence.

“Intelligence is the inexorably increasing property in Evolution”

A lot of what we’re learning about machine learning itself, I’m really excited about. If you look at the research in deep learning, at what’s happening there in just the last 12 or 24 months, the exciting thing to me is that we’re learning so much about how our own brains might work. It’s not just what the machines can do, but what they teach us about ourselves.

If you think about it from that perspective and think about how these algorithms are evolving, you actually see this very near future where human intelligence is going to be boosted by all this machine intelligence. That will actually change how we think about Evolution.

Sonal Chokshi: It’s interesting, because people sometimes treat deep learning as just great for machine learning, but you’re basically putting it on the same continuum and saying it’s just more machine learning?

Christopher N.: I think deep learning just happens to be the moniker of today, but it’s a very important one, because it’s showing some of us glimpses of the future, more so than at any time in the past. I think that’s the exciting thing.

Coming back to what we’re doing: the software we’re building is essentially machine intelligence aiding human intelligence. Today, I would say, in very primitive ways; I think it’s very helpful to the enterprise, but we’re just at the beginning. Our next set of products will build in deep-learning capabilities. We already have machines inside the company that can talk to each other.

It’s happening a lot faster than people realize, and I see it as our job to make sure that we, as the human species, continue to leverage that power, as opposed to one day being subjugated by it.

“We want our species to leverage that power, not be subjugated by it”

Sonal Chokshi: No, totally. Just one last question, then: concretely, what do we get out of that? It’s interesting academically, and clearly it’s interesting beyond academia, because companies are investing in it left and right, in fact more so in the corporate sphere than in the university sphere. But what do we get out of deep learning? What concretely comes out of it?

Christopher N.: I like to think of it in two ways, and I think they’re both concrete, though perhaps one is more concrete than the other in some people’s view. Certainly, companies are helped when they have more intelligence about their data. In the past you didn’t even know what was going on at the company level, let alone make decisions based on it.

We’re coming to an age where you know what’s going on and the machines are also helping you make decisions. What you get out of it is competitiveness. Companies that invest in this and are good at it, that are data-intelligent and data-driven, will win. That’s a competitive edge; that’s inevitable. But the larger picture is that as a species we’re explorers. It’s built into our genes, and you can count on that as inevitable.

“Like space exploration, this is exploration of the mind”

Left alone, we’ll figure out that these are exciting frontiers to explore. We will always want to build intelligence. We will always want to build images of ourselves, if you will; maybe the intelligence that emerges won’t be the same as human intelligence, but we will attempt all of this. Just like space exploration, this is exploration of the mind.

Sonal Chokshi: Using the computer. That’s great. Thank you, Christopher from Adatao, and that’s another episode of A16Z Podcast. Thanks everyone.


Christopher Nguyen

@arimoinc CEO & Co-Founder. Leader, Entrepreneur, Hacker, Xoogler, Executive, Professor. #DataViz #ParallelComputing #DeepLearning & former #GoogleApps.