Data’s journey into science

How we’re all hoping that the model makers can predict our future

Carl Follows
Ricoh Digital Services
Sep 16, 2019


What happened?

I remember a time, way back, when I could do most of my job just writing SQL — I’d often joke that English was my second language.
I’d pull a few disparate data sources into a unified model, demonstrate how recent performance trends compared with last year, and my customers would be amazed at the understanding of their business I’d given them.

But it seems that’s no longer possible; now the expectation is not just to have yesterday’s data, but to know what’s going to happen next. There is a realisation that companies who use their data effectively and bring their customers into their ethos are not just more efficient, but can bring transformational change to the whole market, creating new opportunities and potentially building monopolies. Seeing how quickly some mega companies have been created out of nothing in recent years, everyone is thinking: if only we could make use of our data, we could make money like them.

“What is our data telling us to do?”
- Companies are asking

What will happen?

Whilst the question is concise, the route to its answer is less so.
To predict the future requires us to build a model that behaves the same as a little piece of our world, so we can see how our world might behave if the inputs are changed.

Our models aren’t made from bits of balsa wood or plastic, though; working with data means our models are built from equations and by computers. It’s now easier than ever to create such a model, but having access to unlimited on-demand cloud computing power and statistical wizardry doesn’t always open the door to a new nirvana. Many questions cannot be answered just by following a trend. Unfortunately, mathematical formulae don’t know that: they will always give an answer, no matter how foolish. We must therefore evaluate the accuracy of our model to know how likely a prediction is to be correct.
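To make that point concrete, here is a minimal Python sketch, assuming some made-up monthly sales that sit flat most of the year and spike every December. A straight-line trend fitted by ordinary least squares will happily return a forecast for any month, even a century ahead; only checking it against months it was not trained on reveals how poorly it fits. The figures and the helper names (`fit_line`, `predict`, `mean_absolute_error`) are illustrative assumptions, not taken from any particular tool.

```python
# Minimal sketch: a least-squares straight line always returns a forecast,
# so its accuracy has to be checked against data it has not seen.
# Assumption: the monthly sales figures are invented for illustration.


def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, x):
    """Evaluate the fitted line at x; it never refuses to answer."""
    slope, intercept = model
    return slope * x + intercept


def mean_absolute_error(model, xs, ys):
    """Average size of the miss between prediction and actual value."""
    return sum(abs(predict(model, x) - y) for x, y in zip(xs, ys)) / len(xs)


# Hypothetical sales: 100 in most months, 300 every December.
months = list(range(24))
sales = ([100] * 11 + [300]) * 2

# Fit on the first 18 months and hold back the last 6 as "fresh" data.
model = fit_line(months[:18], sales[:18])

print(predict(model, 1200))  # a forecast 100 years out, however foolish
print(mean_absolute_error(model, months[18:], sales[18:]))
```

The line has no notion of seasonality, so it misses the December spike in the held-out months; the error figure, not the forecast itself, is what tells us whether to trust it.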

With so many models possible, finding the best requires many iterations.
With each iteration we are not guaranteed an improvement, so it’s important to quantify and assess each one. It is this process that is akin to the Scientific Method:

  • Observe carefully with rigorous scepticism.
  • Form hypotheses based on the observations
  • Make predictions from the hypotheses
  • Experiment to prove (or not) your predictions

Which in the world of data science becomes:

  • Observe which inputs might affect the output
    Identify data which track them
    Gather, cleanse and classify it
    Visualise to look for correlations
  • Hypothesise about the type of influence the data has on the output.
    Choose a machine learning algorithm to reflect this
    Train it with data to produce a model
  • Test the model (hypothesis) with fresh data to see what it predicts
  • Evaluate the accuracy of the predictions to prove their validity
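The loop above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the monthly figures are invented, cleansing is reduced to dropping a missing value, and the three candidate hypotheses (overall mean, linear trend, same month last year) and the mean-absolute-error yardstick are choices made for the example rather than a recommended method.

```python
# Minimal sketch of observe -> hypothesise -> test -> evaluate.
# Assumptions: invented data, three illustrative hypotheses, MAE as the yardstick.
from statistics import mean

# Observe: gather the data and cleanse it (here, drop the month with no value).
raw = [120, 95, None, 110, 130, 160, 190, 185, 150, 125, 115, 240,
       125, 100, 105, 115, 135, 170, 200, 190, 155, 130, 120, 250]
history = [(month, value) for month, value in enumerate(raw) if value is not None]

# Keep the last six months back as fresh data the models never see in training.
train = [(m, v) for m, v in history if m < 18]
test = [(m, v) for m, v in history if m >= 18]


# Hypothesise: each candidate is trained on history and returns a predictor.
def overall_mean(rows):
    level = mean(v for _, v in rows)
    return lambda month: level


def linear_trend(rows):
    xs = [m for m, _ in rows]
    ys = [v for _, v in rows]
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return lambda month: my + slope * (month - mx)


def same_month_last_year(rows):
    by_month = dict(rows)
    fallback = mean(list(by_month.values()))  # used if last year's month is missing
    return lambda month: by_month.get(month - 12, fallback)


# Test and evaluate: score every hypothesis on the held-out months.
def mae(predictor, rows):
    return mean(abs(predictor(m) - v) for m, v in rows)


candidates = {
    "overall mean": overall_mean,
    "linear trend": linear_trend,
    "same month last year": same_month_last_year,
}
scores = {name: mae(build(train), test) for name, build in candidates.items()}
best = min(scores, key=scores.get)

for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name:>22}: MAE {score:.1f}")
print("Best hypothesis so far:", best)
```

Each pass through the loop can add or refine a candidate. Because an iteration is not guaranteed to improve on the last, it is the held-out score, rather than how clever the model looks, that decides which hypothesis survives.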

How to make it happen

Software engineering long ago moved towards an iterative approach, with a philosophy of agile development and failing fast; now data is making the same journey. It is not that the “traditional” data warehouse is dead: businesses still need to monitor their performance, but that only gives the rear-view mirror perspective. Predicting the future builds on this solid understanding of key corporate data, combining it with many other sources in an attempt to identify what’s truly influencing the company’s performance.

Whilst some companies are trying to do this themselves, few really have the ability, time or money necessary to achieve it. Unless managing data is their business, they just can’t give it the focus required to make it work. Software vendors are developing products for business users to manipulate their data, democratising these new capabilities and suggesting anyone can spin gold from their data. But truthfully, it takes a certain kind of person to understand how to pose the right question and understand the response. Not every company has such a person, so they come to specialist consultancies such as ours expecting us to give them all the answers.

But what does that mean for the Data and Analytics consultants like myself?

“What skills do I need? How do I stay current?”
- I’m asking

My discipline used to be referred to as Business Intelligence (or just BI), a name tightly associated with skills across Data Engineering and Data Visualisation. With these new Data Science skills, some rebranding was required, and my discipline grew to become Data & Analytics. Alongside this introduction of Data Science came the move to the cloud and the demand for mobile analysis, which brought new abilities and challenges in Data Engineering and Data Visualisation. As is the case in many industries, this increase in the abilities (and complexity) of each domain has led individuals to subspecialise. However, this specialisation can lead to a loss of focus on the original point of the discipline: to surface the right data to the right person, and provide them with insights they can use to drive change. Whilst the domains require slightly different mindsets, customers will expect me to use whichever mix is appropriate for their problem. It’s not acceptable to define myself based on the technical skills I have; rather, I must define myself based on the problems I solve, and learn the skills to answer those questions as elegantly as possible.

I used to ask my bosses for my job description, wanting to understand what was expected of me so I could make sure I could achieve it (and hopefully more). Invariably, I never received this description, or when I did it was woefully vague or out of date, so I attempted to achieve the right mix of delivery and innovation. Recently we’ve been recruiting, and I’ve been trying to quantify what I expect of my future team, which highlighted again the difficulty of expressing what future skills are required. The technology I’m using this year wasn’t even available to me two years ago, so how can I assess a potential employee against the skills we might need in the future?

Ultimately what’s needed is:

  1. An understanding of data and how to ask it the right questions.
  2. To keep thinking “what information would drive a change in behaviour?”

Because although the tools and techniques will continue to improve,
the constant is creating information from data.

Data Analytics Solutions Architect @ Version 1 | Practical Data Modeller | Builds Data Platforms on Azure