Big Data vs. Small Data

stay trying. · Published in DataSeries · 2 min read · Jan 27, 2019

Photo by Dennis Kummer on Unsplash

As I was riding down the freeways of Houston, I put on a podcast about artificial intelligence (AI), by far one of my favorite subjects to hear subject matter experts discuss in detail.

Dr. Tomaso Poggio has so much knowledge to give to the world about AI, and today we will explore one tidbit he and Dr. Lex Fridman discussed: the idea of big data versus small data.

We all have heard the term “big data”, and if you haven’t, a quick Google search should get you up to speed. For many years now, companies have been realizing that they are sitting on value, as people uncover real, actionable insights within their numbers.

These could be numbers on users, traffic, and interactions between people, or even industry-specific data such as video surveillance, stock market movements, and oil rig signal data.

The AI community has made significant strides in modeling and predicting future outcomes from this huge repository of retrospective data, and it continues to do so (again, Google is your best friend).

However, sometimes (maybe even most times) people and businesses only have a few hundred or a few thousand rows/examples of data. This problem pervades large industries like healthcare, where the number of non-sick people far outweighs the number of sick people. This is the “small data” problem, and it raises the question:

Are they allowed to play with these complex, non-linear models?

Of course, the answer should be YES!

Dr. Poggio described this as a shift from how we previously thought about learning, where the amount of data approaches n = infinity, to a regime where it approaches n = 1.

This is one of the new frontiers in data and AI: the idea that we need to create better algorithms that learn from smaller amounts of data yet generalize just as well.

If we truly want to create a general intelligence, modeled on the human brain, we need to move toward this goal of learning a great deal from a small number of examples. This is how babies learn. This is how adults learn.

We do not have to see thousands of images of apples to understand what an apple looks like.

Now, there are definitely algorithms and methods that can exploit the information in small datasets, but we are far from a winner-take-all algorithm. These are exciting times.
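To make that concrete, here is a minimal sketch (my own illustration, not something from the podcast) of one common small-data workflow: a strongly regularized, class-weighted model evaluated with cross-validation. The dataset is synthetic and every number is a placeholder, so the score will hover around chance; the point is the workflow, not the result.

```python
# A rough sketch of squeezing signal out of small, imbalanced data:
# regularization + class weighting + cross-validation.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Pretend "small data": 300 patients, 10 measurements each,
# with only ~10% labeled as sick (class imbalance).
X = rng.normal(size=(300, 10))
y = (rng.random(300) < 0.10).astype(int)

# Strong regularization (small C) and balanced class weights help the
# model avoid simply memorizing the majority "non-sick" class.
model = LogisticRegression(C=0.1, class_weight="balanced", max_iter=1000)

# With so few examples, cross-validation gives a more honest estimate
# of generalization than a single train/test split.
scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
print(f"Mean ROC AUC over 5 folds: {scores.mean():.3f}")
```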

Thanks for reading.
