What Nature taught me about Data Science

Audrey Lobo-Pulo
Phoensight
Jul 27, 2020 · 6 min read
Original photo by Lulucmy on Unsplash, modified by Audrey Lobo-Pulo

I was nine when I really discovered that we weren’t living on a Flat Earth — it was when an orange I held in my hand didn’t conform to Euclidean Space. In school I was taught that the angles in a triangle summed to 180 degrees, but I could see that the triangle I’d drawn on my orange had at least two right angles!
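The orange makes the point precisely: on a sphere, a triangle's angles sum to more than 180 degrees, and the excess grows with the triangle's area (Girard's theorem). A triangle covering one octant of the orange, for instance, really does have three right angles:

```latex
% Girard's theorem: on a sphere of radius R, a triangle of area A
% has angle sum
\alpha + \beta + \gamma = \pi + \frac{A}{R^2}
% An octant triangle has A = \tfrac{1}{8}\,(4\pi R^2) = \tfrac{\pi R^2}{2},
% so the sum is \pi + \tfrac{\pi}{2} = \tfrac{3\pi}{2} = 270^\circ.
```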

Fast forward a few decades, and I’m still in awe when nature refuses to fit into the models and paradigms we create. Not just oranges, but people too. I continue to be fascinated by the differences between the outcomes predicted by ‘data-driven’ models and the way people respond. And for a long time I’d laid much of the blame for this discrepancy on the limitations of these models and the data used.

And while I still believe this to be true, it took a shift in my perception of these ‘limitations’ to uncover some new insights. You see, the assumption of a flat surface is critical for the angles in a triangle to sum to a straight angle — but the many varied ways and contexts in which this assumption fails are where the interesting phenomena and insights lie undiscovered.

Outliers on the Inside

A longstanding student of data modelling, I’d fallen into the trap of swapping ‘reality’ and ‘abstraction’ — and the ‘outliers’ in my data, apart from causing me much angst, were rarely the main game. Often omitted and side-stepped, these were put into the “difficult” basket with the understanding that if there was an underlying reason for the deviation, it would require a different model, separate study or at least an alternative hypothesis to be tested.

At the same time, there was always an uneasiness about ignoring outliers — “don’t throw out the baby with the bath water!”, was what I got told by more seasoned data professionals. “You might be dismissing valuable information which could be the missing piece of the puzzle.” I learnt that data scientists had developed many methods in their toolkits for dealing with outliers.

Copyright © Audrey Lobo-Pulo (CC BY-NC-SA), 2020

But what was missing in my early attempts at coaxing these outliers back into the ‘modelling fold’ was a deeper understanding of the various contexts from which this data was being extracted.

Instead of my models truly describing reality, they were ‘fitted’ to the data. For the physical sciences, this worked remarkably well — for living systems, however, my results were mostly inconclusive. I struggled to find models that really suited the data, and nature found it difficult to fit into my human-made abstractions!

But still I persisted — fine tuning assumptions, including higher order terms and fitting different types of curves to the data. Yet, at no point during those data cleansing and modelling processes did it occur to me that some of these ‘data critters’ were part of a tangle of contexts that held information of a different sort — information that might be difficult to capture through recorded data. Information that refused to be pickled in a database!
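That fine-tuning can be sketched in a few lines (hypothetical data here, not anything from the article): each extra polynomial term drives the in-sample residual down, which feels like progress, even when the added flexibility is only chasing the noise.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * x + 0.5 + rng.normal(0.0, 0.1, x.size)  # a linear "truth" plus noise

# Higher-degree fits always shrink the in-sample residual...
for degree in (1, 3, 5):
    coeffs = np.polyfit(x, y, degree)
    residual = float(np.sum((np.polyval(coeffs, x) - y) ** 2))
    print(f"degree {degree}: residual {residual:.4f}")
# ...but past degree 1 the extra terms are fitting the noise,
# not describing the underlying line.
```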

What I’d often put down to ‘noise’ in a stochastic process, was still objective in my mind — random, but objective. And any information that could be categorised as ‘subjective’ was still being processed using the same ‘data pipelines’. To me, ‘subjective data’ was information that was somehow biased because it came from particular perspectives.

And describing nature was one of those messy things — the nearest I got to that was via qualitative information. Still, nature refused to stay true to that description, and kept on moving and changing…

Nature’s Alchemy of Objective and Subjective Data

The novelty of being able to reproduce scientific results was what first made me love the scientific method — rediscovering, many years after the fact, that the acceleration due to gravity was the same no matter how heavy the object in free-fall, was like a parlour trick! There’s something about being able to ‘objectively’ verify results and conclusions that gives them credence.

Experiment, change a variable — holding everything else constant, and then experiment again…

The ability to collectively predict an outcome using the same methodology comes with its own confidence — and this also allows others to build on your work. There’s a certain rigour and transparency that allows for greater agreement on decision-making with ‘objective’ information. And ‘objectivity’ lends itself well when ‘scaling’ solutions. So far, all’s well — until it’s not!

Copyright © Audrey Lobo-Pulo (CC BY-NC-SA), 2019

Living systems cannot be ‘predicted’ so easily. At many points along my journey I blamed my ‘imperfect datasets’, with a range of complaints: small/sparse sample sizes; self-selection biases; missing information or inaccuracies — thinking that if I had the perfect dataset my models would describe reality perfectly. Looking back, I had discounted the many contexts that the data from living systems had come from.

Why? Because, for the most part, in an ‘objective world’ the context of these solutions doesn’t play a significant role. Subjective data, however, is different — context is everything! Context creates the conditions for how we perceive and respond in our world. Not only that, but living systems are continually learning and adapting through numerous interactions in their environment.

When the Greek philosopher, Heraclitus, observed that, “You could not step twice into the same river,” he was alluding to this elusive, continually changing quality of nature and living systems. One that won’t allow for Ceteris Paribus (holding everything else constant).

My inability to work with continually shifting subjective information, together with my confidence in working with ‘objective’ data, had me subconsciously preferring the latter. Yet, subjective data was no less important — and thinking that qualitative data was the same thing was missing the visceral nature of this information... Nature seemed to be home to both kinds of information.

My data science journey with objective and subjective data was at a dead end — that is, until I discovered Nora Bateson’s “Warm Data Labs”. Nora describes “warm data” as information within the inter-relationships that’s ‘interwoven in complex systems’.

“Warm Data is not meant to replace or in any way diminish other data, but rather it is meant to keep data of certain sorts “warm” — with a nest of relations intact.”

— Nora Bateson

This continually changing subjective information may resist collection, but working with it, by understanding its many contexts and dependencies, allows for new insights while maintaining the integrity of this ‘warm data’.

By giving subjective data the respect it deserves, and allowing it to flow freely outside our human-created models, it might become a willing partner to what we call ‘objective’ data. But even then the distinction can become blurry — how do we really know which is which? To do that we need to be vigilant and look for what’s hidden within a forest of changing contexts…

Phoensight is an international consultancy dedicated to supporting the interrelationships between people, public policy and technology, and is accredited by the International Bateson Institute to conduct Warm Data Labs.



Founder of Phoensight, Public Interest Technologist, Tech Reg, Open Gov & Public Policy geek. Supporting the interrelationships between people, society & tech.