Please stop saying you need data to start on AI

Jarno Kartela
The Hands-on Advisors
3 min read · Apr 18, 2018

Companies, especially advisory and consultancy firms working on AI, always seem to open their talks with data. If you want to do AI, or machine learning in particular, you need data. Lots of it. In a warehouse, preferably. Data is the fuel of AI. Data is the new oil.

It’s simply not true.

Data is, however, an incomplete view of historical events, produced with plenty of errors by us as humans and workers. It’s not really the “fuel” you’d want to feed to systems that then learn to autonomously predict and run your operations. Sure, there is a lot to be achieved by using supervised learning to understand which features have affected, say, customer churn, and to predict it. There are endless possibilities with unsupervised learning: it can generate new and better metadata out of problematic unstructured data, and it can be used as a generative feature engine for supervised learning when predicting sales and demand figures with high-variation, high-volume effects. Heck, someone even tackled the travelling salesman problem with self-organizing maps [1]. But it’s not what the future has in store. The future is about machines learning autonomously from their own inputs and simulations.

The gist is that we as people are suboptimal. Whenever a problem can be represented as numbers, we are worse at it than computers. So why on earth would you say that you need data to start on AI? You don’t. You need creativity.

Creativity, in my experience, is combining things and, when needed, breaking (or slightly modifying) the rules to create new things. Look at dynamic pricing. You can look at previous decisions and pinpoint what’s working and what’s not. Or you can let a machine learning system try things for itself, learn from its successes and failures, and find a whole new optimum for pricing or personalization through contextual bandits and reinforcement learning [see, e.g., 2, 3]. With historical data alone, you will never break the rules and find completely new solutions to a given business problem, be that marketing, pricing, or anything else that’s customer-facing and can be tried out in volume.

Picture 1: Simulated reinforcement learning in media. Colours are the media topics chosen by a bandit algorithm given a context; we’re optimizing for reward over time.
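To make the pricing idea a bit more concrete, here is a minimal sketch of a bandit that starts with no historical data at all and learns a price per context purely from its own trials. It is a plain per-context epsilon-greedy bandit, not the algorithms from [2] or [3], and everything in it is an assumption made up for illustration: the price points, the two contexts, and the `BUY_PROB` table that stands in for real customers.

```python
import random

# Hypothetical price points and contexts, invented for this sketch.
PRICES = [5.0, 10.0, 15.0]
CONTEXTS = ["weekday", "weekend"]

# Hypothetical "true" purchase probabilities. The bandit never reads this
# table directly; it only observes the outcome of each simulated sale.
BUY_PROB = {
    "weekday": {5.0: 0.80, 10.0: 0.20, 15.0: 0.05},
    "weekend": {5.0: 0.90, 10.0: 0.60, 15.0: 0.60},
}


def simulate_sale(context, price, rng):
    """Return the revenue from one simulated customer: the price or nothing."""
    return price if rng.random() < BUY_PROB[context][price] else 0.0


def run_bandit(rounds=20000, epsilon=0.1, seed=0):
    """Learn the best price per context with an epsilon-greedy bandit."""
    rng = random.Random(seed)
    counts = {c: {p: 0 for p in PRICES} for c in CONTEXTS}
    values = {c: {p: 0.0 for p in PRICES} for c in CONTEXTS}

    for _ in range(rounds):
        context = rng.choice(CONTEXTS)
        if rng.random() < epsilon:
            # Explore: try a random price.
            price = rng.choice(PRICES)
        else:
            # Exploit: use the price with the best estimated revenue so far.
            price = max(PRICES, key=lambda p: values[context][p])

        reward = simulate_sale(context, price, rng)

        # Incremental update of the average revenue for this (context, price).
        counts[context][price] += 1
        n = counts[context][price]
        values[context][price] += (reward - values[context][price]) / n

    return {c: max(PRICES, key=lambda p: values[c][p]) for c in CONTEXTS}


if __name__ == "__main__":
    print(run_bandit())
```

In this toy setup the learner ends up charging less on weekdays and more on weekends, without ever having seen a single historical pricing decision. In a real system the simulated sale would be replaced by actual customer responses, and the exploration rate would need more care than a fixed `epsilon`.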

What’s more, there are major optimization cases that do not need, and may even be harmed by, historical data. Examples include energy network optimization, aviation logistics optimization, public transport optimization, warehouse optimization, and so on. In these cases we do need constraints as data, but we don’t really need historical event data for anything other than understanding the problem and its context. Using historical data will lead to suboptimal solutions, since history is made by us as workers and employees under stress and a heavy workload. And we as people are, in general, always suboptimal and even irrational in our decision-making.
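As a rough illustration of optimizing from constraints alone, the sketch below assigns vehicles to delivery routes using nothing but capacities, demands, distances, and costs. No record of how dispatchers assigned vehicles in the past is involved. All names and numbers are hypothetical, and brute-force search is used only because the example is tiny; a real logistics problem would call for a proper solver.

```python
from itertools import permutations

# Hypothetical constraint data, invented for this sketch.
CAPACITY = {"van": 10, "truck": 30, "lorry": 50}        # units a vehicle can carry
COST_PER_KM = {"van": 1.0, "truck": 2.0, "lorry": 3.5}  # running cost per km
DEMAND = {"north": 8, "south": 9, "east": 25}           # units needed per route
DISTANCE_KM = {"north": 40, "south": 20, "east": 30}    # route length in km


def best_assignment():
    """Try every one-to-one vehicle-to-route plan and keep the cheapest feasible one."""
    vehicles = list(CAPACITY)
    best_plan, best_cost = None, float("inf")

    for routes in permutations(DEMAND):
        plan = dict(zip(vehicles, routes))

        # Constraint: each vehicle must be able to carry its route's demand.
        if any(CAPACITY[v] < DEMAND[r] for v, r in plan.items()):
            continue

        cost = sum(COST_PER_KM[v] * DISTANCE_KM[r] for v, r in plan.items())
        if cost < best_cost:
            best_plan, best_cost = plan, cost

    return best_plan, best_cost


if __name__ == "__main__":
    plan, cost = best_assignment()
    print(plan, cost)
```

The only inputs are the rules of the problem. Historical assignments would at best help you understand the context, and at worst anchor the search to whatever people happened to do under pressure.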

Saying you need data (and data warehouses, for god’s sake) to start on AI and machine learning might be one of the most harmful things you could do, and it may set you back in achieving competitive advantage with AI.

[1] https://github.com/DiegoVicen/ntnu-som

[2] http://www.kdd.org/kdd2017/papers/view/an-efficient-bandit-algorithm-for-realtime-multivariate-optimization

[3] https://www.gsb.stanford.edu/sites/gsb/files/mkt_10_17_misra.pdf
