Why is machine learning happening now?

We recently learned that machine learning was a lot like teenage sex. There’s no denying that everybody is talking about it and is claiming they do it. But, even the hottest topic in machine learning today, deep learning, is almost as old as some teenagers. The secret’s been out for a while. Why is machine learning happening now? Why are people doing it?

It all comes down to two things: possibility and accessibility. These two hormones of machine learning have finally been unleashed, and now people are raring to go.

Possibility

Machine learning is happening now because it’s finally feasible and practical. Exponential growth in the amount of data, together with incredible hardware advances, has given life to machine learning ideas that once lived only on paper.

It’s estimated that we store zettabytes of data

For machine learning to be effective, it needs a lot of data. Recent technologies in both data capture and data management have made large data warehouses possible. Data, like whatever happens in Vegas, stays put once it is captured. We are now able to fill millions of hard drives, which have themselves become very large, with data.

But what good are all these new numbers if you can’t crunch them? For a long time, the computing power just wasn’t available. This pushed computationally intensive solutions, like neural networks, aside in favor of less demanding ones. Research suggests that the resurgence in popularity of these complex methods comes down to advances in hardware. Enter the graphics processing unit (GPU).

Known more for powering video games for millions of players than for identifying cancer, GPUs are a massive boon to machine learning. Although they were developed to render graphics quickly, it turns out that GPUs are just as adept at training complex machine learning algorithms. This is because graphics are usually processed as mathematical objects called matrices, which are like grids of numbers. Data is stored in matrices too: think of how you would store it in Excel, in rows and columns. See the similarity? The fit is natural.

Matrices (left) are used to render computer graphics (including the shadows!)
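
To make the spreadsheet comparison concrete, here is a minimal Python sketch. The numbers, the `weights` list, and the `matvec` helper are all made up for illustration and do not come from any real dataset or library. The point is only that each row's result is computed independently of the others, which is the kind of work that can be spread across many GPU cores. This version runs on an ordinary CPU, one row at a time.

```python
# Hypothetical example data: each row is one record, each column one feature,
# much like rows and columns in a spreadsheet.
data = [
    [1.0, 2.0],
    [3.0, 4.0],
    [5.0, 6.0],
]

# Hypothetical model weights, one per feature (column).
weights = [0.5, -1.0]


def matvec(matrix, vector):
    """Multiply a matrix by a vector, one row at a time."""
    result = []
    for row in matrix:
        # Each row's dot product is independent of every other row's,
        # so in principle all rows could be computed at the same time.
        result.append(sum(x * w for x, w in zip(row, vector)))
    return result


scores = matvec(data, weights)
print(scores)  # [-1.5, -2.5, -3.5]
```

A real model repeats this kind of multiply-and-add over far larger matrices, which is why hardware built for it matters.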

For highly parallel workloads, GPUs offer massive computational speedups over the more conventional central processing unit (CPU). In fact, many of the world’s top supercomputers use GPUs to stay energy efficient while still delivering enormous numerical throughput.

CPU vs. GPU structure

At its core (no pun intended), a GPU consists of many “cores”, which are units that perform calculations. GPUs were engineered with the sole purpose of calculating things, while CPUs were developed for much more general tasks, including those that involve logic. GPUs do many “dumb” calculations, quickly and in parallel. While the computational power of CPUs has also increased rapidly, it is GPUs that have allowed absolutely massive machine learning models, like the one Google built to identify cats, to be trained.

Accessibility

While the development of specialized hardware has made machine learning feasible, other technologies have democratized its benefits. Cloud computing, along with overall decreases in computing cost, has freed up once costly computing resources needed to train complex machine learning algorithms. From small startups to Netflix, organizations can now dynamically provision computational resources. With the cloud increasingly becoming commoditized, the cost required to undertake machine learning projects will fall, and firms will be able to quickly integrate data science into their operations.

Cloud computing allows for multiple tenants in one computer

What is the cloud? For a long time, if you wanted to use a server, you had to buy your own physical hardware and install it somewhere. Through the cloud, you’re able to access computing resources on demand. The way cloud companies do this is through virtual machines (VMs). Virtual machines essentially allow companies to share resources. So, on one physical machine, there could be a VM for Netflix, a VM for you and a VM for me. But to us, it looks like we are each using an individual machine. Cloud computing allows organizations to quickly scale their computing needs. With this scale comes a need for distributed computing technology.

There are major efficiency gains from utilizing distributed computing. Usually we think of a computer as a single machine. The distributed computing paradigm, exactly as it sounds, allows multiple processing units (computers) to share a workload (computations). Splitting the work so that many machines each handle a piece at the same time makes the whole job finish faster. This can bring drastic speedups for “compute-bound” algorithms, those which take a long time to run because of the number of computations they must make. Many machine learning algorithms are good examples of compute-bound applications. For a long time, distributed computing was just plain hard and expensive. Open-source tools such as Spark, running in cloud environments, have made parallel computing a reality for many.
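
The split-the-work idea can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the function names are hypothetical, the task (a sum of squares) is a toy, and threads on one machine stand in for the separate computers of a real cluster. In standard Python, threads will not actually speed up compute-bound work, so the sketch shows the pattern (split, compute the pieces independently, combine), not the speedup. It is not how Spark itself is implemented.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_chunks):
    """Split a list into n_chunks contiguous pieces of nearly equal size."""
    size, extra = divmod(len(items), n_chunks)
    chunks = []
    start = 0
    for i in range(n_chunks):
        # The first `extra` chunks each take one leftover item.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def partial_sum_of_squares(chunk):
    """The work one worker does on its own share of the data."""
    return sum(x * x for x in chunk)


def distributed_sum_of_squares(items, n_workers=4):
    """Hand each worker a chunk, then combine the partial results."""
    chunks = split_into_chunks(items, n_workers)
    # Threads stand in for separate machines in this toy version.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_sum_of_squares, chunks))
    return sum(partials)


numbers = list(range(1, 101))
total = distributed_sum_of_squares(numbers)
print(total)  # 338350
```

The design choice that matters is that no worker needs another worker's result until the final combining step, which is what lets the pieces run side by side.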

Machine learning is stronger than ever. The confluence of possibility, via advances in hardware and growth in data, with accessibility, through cloud computing and open-source tools, has finally baked the machine learning cake that was sitting in the oven. Now that it’s ready, let’s all get our own slice.


Thanks for stopping by. I hope this piqued your interest in the computer architecture and infrastructure that go hand in hand with machine learning.

I encourage you to view some of my published research on the performance and cost characteristics of big data analytics as well as on the computing architectures for data analytics at one of the largest computational facilities in the United States.

You can reach me at @peterxeno or at www.peterxeno.com