The Data and The Story

Himanshu Nautiyal

Published in The Startup, Jul 1, 2019

Oscar Wilde wrote a short story called The Nightingale and the Rose. Read it if you have time — my precis below does it little justice. What this has to do with data will become clearer soon.

‘She said that she would dance with me if I brought her red roses,’ cried the young Student; ‘but in all my garden there is no red rose.’

‘No red rose in all my garden!’ he cried, and his beautiful eyes filled with tears.

The Nightingale hears him and feels for him. She scours the gardens for a red rose bush but finds only a barren one, old and withered. The bush tells the Nightingale that to create a rose she must sing all night and colour it red with her heart’s blood. She sings through the night of the elements of love, what makes it grow and how it reaches its zenith in sacrifice. By the end of the night, the Nightingale is dead and the Red Rose is complete. The Student takes it to the Professor’s daughter, who replies that it will not go with her dress and that a richer rival is giving her jewellery instead. Frustrated, the Student throws the Rose into the street and concludes…

‘What a silly thing Love is,’ said the Student as he walked away. ‘It is not half as useful as Logic, for it does not prove anything, and it is always telling one of things that are not going to happen, and making one believe things that are not true. In fact, it is quite unpractical, and, as in this age to be practical is everything, I shall go back to Philosophy and study Metaphysics.’

So he returned to his room and pulled out a great dusty book, and began to read.

If you read the full story, you will see that both the Nightingale and the Student have ALL the DATA about what Love is. But only one of them is able to draw the STORY from it: the why and the how of Love.

The Economist has called data the new oil.

Words shape thoughts.

Since data is the new oil, the world has rushed off to explore the depths of its data stores. From them shall be produced crude data, to be refined by distilling insights, each of which can be pumped as fuel into strategies or even be directly monetized.

This is Student thinking, and it results in A/B testing even the minutest of decisions. It also results in data being sold outright, because you don’t know what to do with it, yet people keep insisting it has value.

The Nightingale’s sisters reside in the unlikeliest of lands, the land of science.

In science, data takes us, step by step, towards a story, a full explanation of what is, beyond a simple observation of correlations in data. This is not to underestimate the importance of the correlation, because it is the first step and requires painstaking, disciplined work. But, if we miss the story, there is work that remains undone, truth unrevealed, and better courses of action left untraversed.

In the first stage of developing a theory, we observe macroscopic variables, and data points us to the relationships between them. Boyle’s law, the claim that at constant temperature the pressure and volume of a fixed mass of gas are inversely related, is one such observation that data brings forth.

Over time, purely by observing data, we accumulated many such laws:
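A minimal sketch of how data "brings forth" Boyle’s law: given pressure and volume readings for a fixed mass of gas at constant temperature, a least-squares fit of log P against log V should give a slope close to -1, which is the data’s way of saying P is proportional to 1/V. The readings are invented for illustration (they are not Boyle’s measurements), with small deviations standing in for measurement noise.

```python
import math

# Hypothetical readings for a fixed mass of gas at constant temperature.
# Volumes in litres, pressures in kPa; small deviations mimic measurement noise.
volumes = [1.0, 2.0, 4.0, 5.0, 8.0, 10.0]
pressures = [101.0, 50.8, 25.2, 20.3, 12.6, 10.1]


def fit_loglog_slope(xs, ys):
    """Ordinary least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    var = sum((a - mean_x) ** 2 for a in lx)
    return cov / var


slope = fit_loglog_slope(volumes, pressures)
products = [p * v for p, v in zip(pressures, volumes)]

print(f"slope of log P vs log V: {slope:.3f}")  # close to -1, i.e. P proportional to 1/V
print("P*V:", [round(x, 1) for x in products])  # roughly constant
```

Note that the fit only establishes the relationship; it says nothing about why it holds.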

Boyle’s Law PV = const if (m, T constant)

Charles’s Law V/T = const if (m, P constant)

Gay-Lussac’s Law P/T = const if (V, m constant)

which gave what may have been the Unified Theory of Gases, the

Combined Gas Law PV/T = const if (m constant)

The further observation of

Avogadro’s Law V/n = const if (P, T constant)

and full combination of these brought us the

Ideal Gas Law PV = nRT

which related everything that could be measured about a gas across a wide range of conditions.
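The step from the individual laws to the combined law, and then to the ideal gas law, is a standard textbook argument, sketched below for a fixed mass of gas taken from state 1 to state 2 through an intermediate state (P_2, V', T_1). Holding V fixed in the result gives back Gay-Lussac’s law, so each earlier law appears as a special case of the combined one.

```latex
\begin{aligned}
&\text{Boyle } (T_1 \text{ fixed}):\quad P_1 V_1 = P_2 V' \\
&\text{Charles } (P_2 \text{ fixed}):\quad \frac{V'}{T_1} = \frac{V_2}{T_2} \\
&\Rightarrow\ V' = \frac{P_1 V_1}{P_2} = \frac{V_2 T_1}{T_2}
 \ \Rightarrow\ \frac{P_1 V_1}{T_1} = \frac{P_2 V_2}{T_2}
 \quad (\text{combined law, } m \text{ fixed}) \\
&\text{Avogadro } (P, T \text{ fixed}):\quad V \propto n
 \ \Rightarrow\ \frac{PV}{T} \propto n
 \ \Rightarrow\ \frac{PV}{nT} = R
 \ \Rightarrow\ PV = nRT
\end{aligned}
```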

However, all of this is still data being related to data by observation. The STORY here came from the kinetic theory of gases. This theory makes a very small set of simplifying assumptions (for example, that a gas consists of many tiny molecules in random motion that collide elastically and exert no forces on one another between collisions) and then applies classical physics to explain why and how the Ideal Gas Law comes about.
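As a rough illustration of how the kinetic theory turns molecular motion into the ideal gas law, here is a minimal sketch. It assumes the textbook simplifications: point molecules in a cubic box, elastic collisions with the walls, no forces between molecules, and each velocity component drawn from a normal distribution with variance k_B T / m (the Maxwell-Boltzmann distribution). A molecule with x-velocity v_x hits a given wall every 2L/|v_x| seconds and delivers 2m|v_x| of momentum each time, so its time-averaged force on that wall is m v_x^2 / L. Summing over molecules and dividing by the wall area should reproduce N k_B T / V, which is PV = nRT written per molecule. The argon-like mass, temperature, box size and molecule count are arbitrary choices for the sketch, and the code uses the time-averaged formula rather than simulating individual collisions.

```python
import math
import random

K_B = 1.380649e-23  # Boltzmann constant, J/K


def kinetic_pressure(n_molecules, mass, temperature, box_side, seed=0):
    """Time-averaged pressure on one wall of a cubic box, from molecular motion.

    Assumes each velocity component is normal with variance k_B*T/m.
    A molecule with x-velocity vx exerts an average force m*vx**2 / L
    on the wall at x = L.
    """
    rng = random.Random(seed)
    sigma = math.sqrt(K_B * temperature / mass)
    total_force = 0.0
    for _ in range(n_molecules):
        vx = rng.gauss(0.0, sigma)
        total_force += mass * vx * vx / box_side
    return total_force / box_side**2  # force divided by wall area


N = 200_000
MASS = 6.63e-26  # roughly one argon atom, kg (illustrative)
T = 300.0  # temperature, K
L = 1e-6  # box side, m
V = L**3

p_kinetic = kinetic_pressure(N, MASS, T, L)
p_ideal = N * K_B * T / V  # ideal gas law per molecule: PV = N k_B T
print(f"kinetic: {p_kinetic:.4g} Pa, ideal gas law: {p_ideal:.4g} Pa")
```

The two numbers agree to within sampling noise, which is the point: the macroscopic law falls out of a mechanism, not just out of more measurements.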

You can keep measuring how click-through rates vary across shades of blue, and you will certainly find the best shade of blue to use. But you may completely miss the STORY, the mechanism that explains the why and how of variations in user behaviour. Once you understand the mechanism, you may get much better results with a different colour, or indeed with a completely different scheme.

But isn’t this a very difficult ask? Time and budget are limited. Running experiments on blues is doable. How will thinking about theories help us transcend the blue neighbourhood and make fundamental deductions about other colours or other schemes, without having to run the same A/B tests in combinatorially exploding numbers?
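For concreteness, this is roughly what a single "shades of blue" comparison looks like: a two-proportion z-test on the click-through rates of two variants. It is a minimal sketch; the shade labels and click counts are made up, and a real experiment would also need a planned sample size and a correction for testing many shades. It tells you which shade won, not why.

```python
import math


def two_proportion_z_test(clicks_a, views_a, clicks_b, views_b):
    """Two-sided z-test for a difference between two click-through rates."""
    p_a = clicks_a / views_a
    p_b = clicks_b / views_b
    # Pooled rate under the null hypothesis that both shades perform the same.
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_a, p_b, z, p_value


# Hypothetical results for two shades of blue.
ctr_a, ctr_b, z, p = two_proportion_z_test(
    clicks_a=1_200, views_a=50_000,  # shade "A"
    clicks_b=1_320, views_b=50_000,  # shade "B"
)
print(f"CTR A={ctr_a:.2%}, CTR B={ctr_b:.2%}, z={z:.2f}, p={p:.3f}")
```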
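The "combinatorially exploding" part is plain arithmetic. With made-up numbers of options per design dimension (the dimensions below are hypothetical), the number of full variants is the product of the option counts, and the number of possible head-to-head A/B tests grows roughly with the square of that.

```python
import math
from itertools import product

# Hypothetical design dimensions and how many options each has.
options = {
    "colour": 10,
    "font": 5,
    "layout": 4,
    "button text": 3,
}

variants = math.prod(options.values())  # every combination of choices
pairwise_tests = math.comb(variants, 2)  # head-to-head A/B comparisons

print(f"{variants} variants, {pairwise_tests} possible pairwise A/B tests")

# The same count, enumerated explicitly (feasible only while it stays small).
all_variants = list(product(*(range(n) for n in options.values())))
assert len(all_variants) == variants
```

Adding one more dimension with k options multiplies the variant count by k, which is why brute-force testing stops scaling long before curiosity does.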

Gradient descent works, after all. Yes, one must make some jumps, but is there such a thing as an informed jump?
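To make the gradient-descent analogy concrete, here is a minimal sketch on a made-up one-dimensional cost curve with two valleys. Small gradient steps only ever reach the valley nearest the starting point; an "informed jump", modelled here simply as restarting from a point that a hypothetical theory suggests, can land in the deeper one. The function, learning rate and starting points are all assumptions chosen for illustration.

```python
def f(x):
    """Made-up cost curve: shallow valley near x = 1.13, deeper one near x = -1.30."""
    return x**4 - 3 * x**2 + x


def grad_f(x):
    """Derivative of f."""
    return 4 * x**3 - 6 * x + 1


def gradient_descent(x, lr=0.01, steps=2_000):
    """Plain gradient descent: repeatedly step against the local slope."""
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x


# Local search from where we already are (e.g. the current shade of blue).
x_local = gradient_descent(2.0)

# An "informed jump": a hypothetical theory says to start on the other side.
x_informed = gradient_descent(-2.0)

print(f"local search:  x={x_local:.3f}, cost={f(x_local):.3f}")
print(f"informed jump: x={x_informed:.3f}, cost={f(x_informed):.3f}")
```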

That’s where human ingenuity and Occam’s razor save the day. Make a few small, simplifying assumptions and you will often stumble upon the mechanism. Here is an example.

In 1854, there was a severe outbreak of cholera in Soho, London. Two competing theories were offered about its cause. The miasma theory posited that some sort of foul airborne substance spread the disease. The germ theory posited that some sort of waterborne, self-reproducing entity (this was before Louis Pasteur’s work) was spreading it.

John Snow (sic!) traced the source of the 1854 cholera outbreak by mapping cholera cases by location, as in the map beside. The source turned out to be a hand pump at the intersection of Broad Street and Cambridge Street. The pump was disabled by removing its handle, and the disease rapidly declined. Fastidious researcher that he was, Snow acknowledged that the outbreak may already have been declining as people fled the area. But the whole sequence showed that the disease spread through the water, not the air. With this, he was able to change the fate of several areas simply by changing their source of water. Without the STORY and the mechanism, this would not have been possible. The mere observation that the outbreak was centred on the hand pump would not have prompted any action in other areas that had outbreaks.

One of the two theories of disease transmission turned out to be true. Why did researchers limit themselves to these two possible theories? This is Occam’s razor in action. The choice between the two went to whichever had fewer inconsistencies with the observed data.
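The core move of the mapping, tallying cases by which pump is nearest, can be sketched in a few lines. The coordinates are invented for illustration (they are not Snow’s data), the labels "pump B" and "pump C" are hypothetical, and straight-line distance is a simplification; distance walked along the streets would be more realistic.

```python
import math
from collections import Counter

# Invented pump locations on an arbitrary grid (not Snow's actual map).
pumps = {
    "Broad Street": (0.0, 0.0),
    "pump B": (4.0, 1.0),  # hypothetical
    "pump C": (-3.0, 3.5),  # hypothetical
}

# Invented locations of cholera cases.
cases = [
    (0.2, 0.1), (-0.4, 0.3), (0.5, -0.6), (0.1, 0.9), (-0.8, -0.2),
    (1.0, 0.4), (0.3, -1.1), (-1.2, 0.8), (3.8, 1.3), (-2.9, 3.1),
]


def nearest_pump(point):
    """Name of the pump closest to a point, by straight-line distance."""
    return min(pumps, key=lambda name: math.dist(point, pumps[name]))


tally = Counter(nearest_pump(c) for c in cases)
print(tally.most_common())
```

A tally like this locates the outbreak; only the waterborne mechanism tells you to go and check the water supply in other districts too.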
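The selection rule, keeping whichever theory contradicts the observations least, can be sketched as follows. The observations are invented for illustration, and each theory is reduced to a crude, hypothetical prediction rule: miasma predicts illness for anyone living in the outbreak area (exposed to its air), while the waterborne (germ) theory predicts illness for anyone who drank from the suspect pump.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    lives_in_outbreak_area: bool  # exposed to the local "bad air"
    drank_pump_water: bool  # drank from the suspect pump
    fell_ill: bool


# Invented observations for illustration.
observations = [
    Observation(True, True, True),
    Observation(True, True, True),
    Observation(True, False, False),  # lived nearby but used another water source
    Observation(True, False, False),
    Observation(False, True, True),  # lived elsewhere but drank the pump water
    Observation(False, False, False),
]


def miasma_predicts(obs):
    return obs.lives_in_outbreak_area


def waterborne_predicts(obs):
    return obs.drank_pump_water


def inconsistencies(theory, data):
    """Count observations whose outcome contradicts the theory's prediction."""
    return sum(theory(obs) != obs.fell_ill for obs in data)


theories = {"miasma": miasma_predicts, "waterborne": waterborne_predicts}
scores = {name: inconsistencies(rule, observations) for name, rule in theories.items()}
best = min(scores, key=scores.get)
print(scores, "->", best)
```

The code only scores the candidates it is given; deciding that these two were the ones worth scoring is the Occam’s razor step, and that part stays human.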

Data is useful to illuminate the path, but keep following the path to find the full story.
