The places we’ll go

A statistical essay about the future in four short parts

It’s been quite a while since Isaac Newton, the brilliant English physicist, began working on his mechanics of universal gravitation. We all know the story: the apple tree, the brilliant intuition. What not everyone realizes is the time Sir Isaac needed to translate his measured intuition into a single formula. For eight years he worked relentlessly, burning the midnight oil in order to uncover the secrets of the three universal laws of motion.

Three centuries later, it takes a single home computer about 15 seconds to achieve exactly the same.

Part I: standing on the shoulders of giants

Wait, what? There’s no way a single computer could ever equal the likes of Sir Isaac, and sure enough, that is not the point. Computers are still dumb, humans still smarter; there are no robot overlords in sight, and no law of gravitation has ever been written without human supervision, yet. What we’re talking about is something much more interesting: the increasing convergence of mathematics, statistics and computing. Computational statistics has been around for a long time now, but it’s only in the last decade that it has started growing exponentially in value, headcount and importance in the modern world. To understand why, we only need to look back at our beloved Sir Isaac, who perfected and patiently explained the deductive method for all of us poor peasants. In today’s language it would probably read like this:

  • Step one: gather data to make sense of the problem
  • Step two: formulate an explanation based on your experience
  • Step three: test the hypothesis. Rinse. Repeat.

If we assume that humans’ reasoning capacity has more or less stayed the same (the Flynn effect be damned), it’s clear that it’s our tools that have evolved to accommodate our growing need for knowledge. Gather data? Check. The amount of data stored and transferred is increasing exponentially, with mobile connectivity making it easier for emerging users to leapfrog missing infrastructure, allowing unprecedented flexibility in the range of possible connections, which in turn increases the amount of data collected. And testing our hypotheses has (literally) never been so easy, with Moore’s law still ticking and tocking away, letting us crunch enormous quantities of data extremely fast in order to prove (or disprove) our conclusions.

It’s only in step two, the formulation of an explanation based on experience, that humans have managed to maintain a competitive edge. We still need a Sherlock Holmes to make sense of the ever-increasing sea of data, and the computer is relegated to the role of sidekick, trotting behind his master and taking care of logistics and practical matters. There’s no Sherlock to be found amongst machines.

Only a Watson.

Part II: when bigger beats smarter

IBM Watson is a machine whose single purpose is to thoroughly mop the floor with our fellow humans at Jeopardy. A faceless set of screens slapped on a big box, his only interaction with the outside world is a synthesized voice and a slender iron arm he uses to push the answer button. He comes equipped with a database of knowledge larger than the Encyclopædia Britannica (step 1), which he uses to formulate answers (step 2) based on previous experience and to cross-reference them with similar items in order to statistically prove or disprove his guess (step 3). Watson is Newton’s method incarnate, and in 2011 he demonstrated it by beating the two best Jeopardy contestants in human history.

Even though Watson might look like magic to the untrained eye (and sure enough, the technological breakthrough is hard to fathom), what he actually does is pretty straightforward. Watson reads huge databases, looking for patterns, connections and shared information points. He formulates hypotheses, only to have them quickly disproved by his mentors. Learning from his errors, he gets better and better at what he does, and after a while he tries to leap further, making implicit connections and extrapolating new insights. If it sounds hard to understand, think of the human equivalent: learning. We look to our mentors, books, movies and more in order to see facts, and in today’s world there’s plenty of those. But what we are able to do afterwards is the single biggest differentiating point between us and every other animal and machine on this Earth. We understand meaning. We don’t see facts and variables, but symbols. That’s what makes us special. But not anymore.

Part III: a case study on linear regression

We’ve always had data, and we’ve always had people trying to make sense of it. One of the greatest contributors to this particular field was Sir Francis Galton, who tabulated thousands and thousands of measurements in order to understand the transmissibility of hereditary factors (“if my father is tall, how likely am I to be tall as well?”). And sure enough, his penchant for drawing led to one of the greatest breakthroughs in statistics: linear regression.

“if my father is tall, how likely am I to be tall as well?”

How does this work? Let’s say we take a stroll through a crowded shopping street, and ask a random set of people how tall their parents are, and take their measurements as well. We’ll be able to plot the data we found in a two-dimensional graph like this one:

Source: WTPresearch, adapted from the MASS R cats dataset

For example, the upper right data point is a 190 cm male whose father towers above all other parents at 225 cm. Since there is a very clear correlation (and possible causation) between how tall your father or mother is and how tall you are, it is very natural for us humans to start looking for a pattern. We’d intuitively draw a line trying to ‘fit’ our data in the best possible manner; and if we stop to think, we’d realize the best fit is obtained when the sum of squared vertical distances between each point and the line is minimized. And there it is:

Source: WTPresearch, adapted from the MASS R cats dataset

The red line is called our regression line, hence the name ‘linear’ regression. Its use is very intuitive: if we knew your dad was 160 cm tall, we’d find 160 on the x-axis and follow it up to the line to read off a predicted height of 180 cm. After a number of tests we’d conclude our model is satisfactory and realize we have just followed Newton’s deductive method: we have collected data (step 1), on the street or (as in this case) on the internet; we have formulated a causation hypothesis (step 2); and finally we have let our computer do the grunt work, minimizing the distance between the points and the regression line and running a number of tests in order to prove our hypothesis (the third and final step).
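The whole procedure fits in a few lines of code. Here is a minimal sketch in Python with NumPy; the father/child heights below are made-up illustrative numbers, not the survey data from the plot:

```python
import numpy as np

# Hypothetical father/child height pairs in cm (illustrative only,
# not the plotted survey data)
father = np.array([160.0, 165, 170, 175, 180, 185, 190, 195])
child = np.array([168.0, 170, 173, 175, 179, 181, 186, 188])

# np.polyfit with degree 1 performs ordinary least squares: it picks
# the slope and intercept that minimize the sum of squared vertical
# distances between the points and the line
slope, intercept = np.polyfit(father, child, 1)

def predict(father_height):
    """Predicted child height, read off the regression line."""
    return slope * father_height + intercept

print(slope, intercept, predict(160))
```

Note that the same line will also cheerfully extrapolate: ask it about a 220 cm father and it answers just as confidently, prediction range be damned.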

This seems an ideal workload distribution, and of course it’s brilliant on paper: subcontract the boring, repetitive, menial work to the machine and reap the benefits of your insights and experience. Unfortunately, the world is a very complicated place and (unlike what a number of zealous statisticians, consultants and the like would have you believe) multivariate linear regression is not the holy grail it’s made out to be.

You see, your height is not defined by genetic factors alone, but also by your posture (do you do ballet, or slouch on the couch?), your nutrition (there’s a strong correlation between dairy consumption and bone size) and illness; and then of course there are problems with the accuracy of measured results (men magically grow 5 cm on average when registering on a dating site) and with prediction range (for a 220 cm father the model predicts a two-and-a-half-meter son. Must be quite the sight!).

And you know all of this, but there you are, staring helplessly into the ocean of data and hoping your black-box model might actually look believable enough to help you stay afloat. As for your computer, he doesn’t know anything. He sits in a blissful void of zeros and ones, churning away at the data you feed him, humming blindly while blowing up rockets and losing billions.

So now you take a beautiful consequence of Sir Newton’s second law, the damped oscillator equation, which describes the motion of a spring or a pendulum under normal conditions, and of course it looks like this:

Source: WTPresearch, ggvis plot of a damped oscillator

For a human it is immediately clear what the pattern is. You don’t see points: you see a ball bouncing, a rope swinging, a harmonic wave on an oscilloscope. You don’t see facts, but symbols. But it’s 1679 and you’re Isaac Newton, the most brilliant scientist in the whole British Empire, and yet it takes you eight and a half years to get to this:

Newton’s second law: the underdamped oscillator
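In modern notation, the underdamped solution Newton was chasing is an exponential envelope wrapped around a cosine: x(t) = A·e^(−γt)·cos(ω_d·t + φ), with ω_d = √(ω₀² − γ²), where γ is the damping coefficient, ω₀ the natural frequency, and A and φ are set by the initial conditions.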

But it’s inevitable, isn’t it? It’s creative scientific work, it’s beautiful and refined and only a human could do it. Because when you feed this very same data to your silicon companion and have it perform your standard regression, what you get is this:

Source: WTPresearch. Original data vs fitted linear regression (in red)

A perfectly accurate flat line of soullessness and missing intellect, a failure across the whole, red, lifeless line. So right, and yet so wrong, and above all, so exactly what we have come to expect from our mindless machines.
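You can reproduce the failure yourself in a few lines. Here is a minimal sketch in Python (with made-up oscillator parameters): fit a straight line to damped-oscillator data and it collapses to a nearly flat line, because the upswings and downswings cancel each other out.

```python
import numpy as np

# Synthetic underdamped oscillator data, x(t) = e^(-0.1 t) * cos(2 t)
# (parameters chosen purely for illustration)
t = np.linspace(0, 20, 500)
x = np.exp(-0.1 * t) * np.cos(2.0 * t)

# Ordinary least squares line fit: x ~ slope * t + intercept
slope, intercept = np.polyfit(t, x, 1)

# Both coefficients come out close to zero: the oscillations cancel,
# and the "best" line is essentially flat
print(slope, intercept)
```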

Enter Hod Lipson and the team at Nutonian. They are here to redefine your expectations.

Part IV: a case study on symbolic regression

Computational statistics as we know it operates in a very rigid framework of constraints. The very name of forecasted variables, the so-called fitted values, signals how real-world scenarios are basically shoehorned into existing sets of equations and parametrized ad nauseam until they look acceptable enough.

But the world is not linear, and as we venture into the unknown we realize we understand less and less what the new hypothesis (step 2!) actually looks like. Wouldn’t it be nice if we could count on our machines to make sense of the world around us, together?

Wouldn’t it be nice if we could count on our machines to make sense of the world around us, together?

That’s the very same dilemma professor Lipson and his students at Cornell set out to solve in 2007. The answer is called Eureqa, by Nutonian: a free (if you can provide an academic e-mail address) piece of software launched in 2009, and probably the most impressive piece of technology I’ve ever run. The implementation is very hard, but the idea is quite simple: feed your PC the same data as before, but don’t provide a safe framework. Let it formulate its own model and rapidly iterate on its own errors, which is exactly the way you learned everything you know.

And in a stunning display of ingenuity and brilliance, it just works.

problem formulation and solution (green), in real time

You select your building blocks, authorize the search, and the genetic solver will quickly iterate between solutions, weighing them to balance accuracy against complexity. And in a handful of seconds, you’ll see the formula, one of billions, emerge from the realm of possibilities.
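Eureqa itself is closed source, but the core idea, searching over candidate formula structures and scoring each one on fit error plus a complexity penalty, can be sketched in toy form. Everything below is a hypothetical illustration: the candidate structures are hand-picked, and a crude random-restart hill climber stands in for a real genetic solver.

```python
import numpy as np

rng = np.random.default_rng(42)

# Target: noiseless samples of an underdamped oscillator, the "unknown law"
t = np.linspace(0, 6, 200)
y = np.exp(-0.3 * t) * np.cos(2.0 * t)

# Candidate formula structures, each with a complexity score.
# A real symbolic-regression engine evolves the structures themselves;
# here they are hand-picked for illustration.
candidates = {
    "line":       (1, lambda p: p[0] * t + p[1]),
    "quadratic":  (2, lambda p: p[0] * t**2 + p[1] * t + p[2]),
    "damped_cos": (3, lambda p: p[0] * np.exp(-p[1] * t) * np.cos(p[2] * t)),
}

def mse(fn, p):
    """Mean squared error of a candidate against the data."""
    return float(np.mean((fn(p) - y) ** 2))

def fit(fn, restarts=30, steps=400):
    """Crude random-restart hill climber over three parameters."""
    best_p, best_e = None, np.inf
    for _ in range(restarts):
        p = rng.uniform(-3, 3, size=3)
        e = mse(fn, p)
        for i in range(steps):
            q = p + rng.normal(0.0, 0.3 * 0.99 ** i, size=3)  # shrinking steps
            eq = mse(fn, q)
            if eq < e:
                p, e = q, eq
        if e < best_e:
            best_p, best_e = p, e
    return best_p, best_e

# Final score = fit error + a small penalty per unit of complexity
scores = {name: fit(fn)[1] + 1e-3 * comp
          for name, (comp, fn) in candidates.items()}
winner = min(scores, key=scores.get)
print(winner)
```

A real engine like Eureqa’s crosses and mutates the formula trees themselves and reports a whole frontier of formulas ordered by complexity, rather than a single winner; but even this toy version prefers the damped cosine structure over the lines and parabolas it is competing against.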

formulas in growing order of complexity

And it’s exactly the one you were looking for.

Source: WTPresearch. Original data (blue) vs fitted line (in red)

It’s not overfitted. It’s not underfitted. And the parameters are exactly right. The software didn’t know anything about the data you gave him; he learned everything on his own. The perfect synthesis, just like Newton’s. Except it took you fifteen seconds instead of eight years.

Eureqa’s best guess. Parameters are correct, structure is identical to Newton’s equation.

Epilogue: the places we’ll go

In 1610 Francesco Sizzi, a Florentine astronomer, stated that ‘there are as many stars in the sky as days in the week: seven, that is’. His colleague and rival Galileo refused to believe him, and following a naive yet brilliant intuition put two lenses in a tube and built his own telescope. He spent the following thirty years observing the sky with curious eyes, discovering and annotating the phases and locations of more than 1000 celestial bodies, paving the way for great astronomers such as Huygens, Cassini, Hooke and finally Newton.

300 years later, the Sloan Digital Sky Survey has catalogued half a billion stars and planets. All of this data was marked and tentatively sorted by computers; the most interesting items were then uploaded to Galaxy Zoo, where they can be seen and judged by a crowd of human supervisors, most of whom are normal people like you and me who have never used a telescope in their lives.

Times have changed. The creative spark we’ve always had is still there, but now we have new, amazing tools at our disposal acting as a firestarter. When the next Newton sees the proverbial apple falling from the tree, he’ll need minutes, not years, to understand what he saw. He’ll be able to cross-pollinate and validate his answers with thousands of other people working on similar or parallel problems. And he’ll be able to count on smart machines, not just dumb glorified calculators, to understand where his discovery will lead us. It’s a paradigm shift, one that’s happening right now. It’s a journey we’ve just started.

Just think of the places we’ll go

“The places we’ll go”, written by Fede Torri for wethepeople research, originally appeared on LinkedIn under the title “The places we’ll go — on statistics, innovations and how computers will change the way we do things”.