Big Data is a Big Deal

Craig Dennis
Treehouse
Jul 31, 2017 · 3 min read

Every day, we create 2.5 quintillion bytes of data. At that pace, 90% of the data in the world today has been created in the last two years alone.

Big Data is a bit of a catch-all term. It’s used to describe data sets that are so large or complex that our traditional data processing software just can’t deal with them anymore. The term is also used to describe the tools and services that have been introduced to handle the problems that having this much data creates.

Why do we suddenly have so much data?

Quite frankly, it’s your fault. Well, all of our faults really. Application creators have learned that the more data they can collect, analyze and deliver, the better they can serve their users’ wants and needs. We all benefit greatly from the massive amounts of data being ingested, processed and served.

We have grown to demand features such as recommendations and streaming content. We want our searches fast and our notifications immediate. What was once a killer feature quickly becomes an expectation that all new applications must meet, and at scale.

Netflix uses Big Data to deliver personalized recommendations of their content. Facebook needs to cache images for billions of users around the world. Google processes millions of search queries per second. Target predicts purchasing decisions of its customers. PayPal uses Big Data to predict fraud across millions of transactions.

How can we possibly handle all this data?

That’s a lot of data, especially all at once. There is often no way to store all that data on a single machine. A common solution is to distribute the data across multiple machines. That data also needs to be retrieved reliably and quickly. Now how do you plan on monitoring and maintaining those multiple machines? With Big Data comes big responsibility.
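To make the idea of spreading data across machines concrete, here is a minimal sketch of hash-based partitioning (often called sharding). The node names and record keys are purely hypothetical, for illustration only; real systems like Hadoop's HDFS add replication, failure handling and rebalancing on top of this basic idea.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # imaginary storage machines

def node_for(key: str) -> str:
    """Pick a node deterministically by hashing the record's key."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]

# Every read and write for the same key lands on the same node,
# so records can be stored and found again without a central index.
for key in ["user:1001", "user:1002", "order:7734"]:
    print(key, "->", node_for(key))
```

Because the mapping is deterministic, any machine in the cluster can compute where a record lives. The trade-off is that adding or removing a node reshuffles most keys, which is why production systems typically use consistent hashing instead of a plain modulo.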

The good news is that there is a wide ecosystem of mostly open source tools and services that can be used to handle the different problem domains.

What can we developers do about it?

All organizations can benefit from the power that Big Data brings. You don’t need to be a large organization to leverage the data you have available on a large scale.

Being familiar with the common use cases allows you to provide tremendous value to your company or organization. They need your expertise in recognizing the challenges and suggesting viable systems to use to solve them.

When should I start exploring Big Data solutions?

Big Data tools are all around you, and you’ve probably started hearing good things about them. Maybe you’ve heard of Hadoop and its highly distributed file system. Maybe your interest was sparked by the computation engine Apache Spark. Perhaps you’ve stretched your knowledge about streaming searches using Elasticsearch. Has someone connected you with the graph database Neo4J? Perhaps you’ve been attempting to get into the flow of machine learning with TensorFlow or scikit-learn. Maybe you got the message from Apache Kafka. All those bad puns barely cover the tools in the ever expanding toolbox known as Big Data. The tools support domains like data storage, computation and infrastructure, so you are bound to run into them during your coding adventures.

Over at Treehouse, we recently launched a new course titled Introduction to Big Data. It’s a fun course, and I really think you are going to enjoy it. Our hope is that we can help you to get a bird’s eye view of how Big Data is currently being used through real world examples. It provides a warm introduction to the tools, problem spaces and overall ecosystem of Big Data.

It is an exciting time to be a developer and Big Data is powering many new solutions the world has yet to see. What will you build?
