A Change in the Prevailing Winds
At the beginning of 2020, which feels like a lifetime ago and yesterday in equal measure, I sat down and drafted five big data trends for the year to come with every intention to publish my thoughts before March.
But seismic events have shocked our lives beyond comprehension and forced a re-think. Or so I thought. Interestingly enough, the pandemic has thrust data innovations, previously happening on the fringes, to the top of every agenda.
As I update my original thinking in light of the world we face today and will live in tomorrow, it’s not such a change of direction in the prevailing wind — more a change in strength. From a gentle breeze to a gale force 10.
Data will set you free . . . literally!
“Guided by evidence, data and science” is quoted by politicians every day at 5pm, when you tune into the government’s daily briefing.
Journalists compound the search for truth by asking what metrics leaders will use in order to open up countries, and stock market fluctuations seem to hinge on whatever health data is making headlines.
If there’s one silver lining from this horrific pandemic, it’s that the value of data is being realised beyond the back office, as the public witnesses a rallying cry for access to both personal and professional data. For everyone in lockdown, data holds the key to their release.
But how countries react to the same data has differed widely across the world, based on their backgrounds, political leanings and, in some cases, contextual data from their own countries. As I outline three of my original trends for the rest of the year, it is important to realise that data is no longer the reserve of major enterprises and organisations with the budget to interpret volumes of insight. It is now as important to humankind as it is to companies, large and small.
What is clear from this crisis, however, is that the need for data to make business decisions is stronger than ever, and that we have taken for granted how data runs our day to day. Only when the landscape shifts drastically and we no longer have the right data to run our operations do we realise the role data already plays in our business decision making. It is, therefore, a no-brainer that right now (2020) is the right time to focus on the right investment in the right amount of data, with the right people involved in building your data and AI, and the right operating model, to ensure that you can make the right decisions, if you haven’t already done so.
Data at Scale
As businesses come to terms with building data programs, one of the major trends comes from traditional companies beginning to understand ‘Data at Scale’. Bear in mind that most traditional businesses, up to this point, have used data sparingly and extremely purposefully, perhaps starting to build data lakes to store the data they have amassed over time.
But bridging data and analytics (data science included) is still relatively uncharted territory for traditional companies, especially in Europe. As a consequence, the management and use of a growing volume of data presents growing pains, never previously experienced, around scale.
In fact, because data management has been around for a while, learning how to scale it is arguably more difficult than scaling digital ways of working. The reason: eCommerce and digital operating models are new (relatively speaking), so there’s no prescribed way of working to unlearn. With data, however, companies find themselves trying to scale on top of the outdated processes and policies they already use to manage it, hitting the limits of a centralized data-at-scale model as a result. I call it the Excel mindset: “I know that there is data, we just can’t scale it.”
Companies and businesses will have to unlearn old habits to give scaling a chance. This includes new architecture, skillsets, organizational structures, ways of working, tools and software, models of operating and governance, and sometimes even having to pry things out of the hands of a company’s Chief CSV Officer (thanks to Ryan den Rooijen for the term). And you can’t solve just one of these components in isolation. Each impacts the others, and to avoid a chain reaction you have to modernise all of them in some way, shape or fashion to scale.
When you finally scale, you come to realise the value of data you never knew existed, especially in unforeseen circumstances, when it enables agility in business decisions that your company has never been able to make before. Catching onto this trend, and truly benefiting from your data, requires investing in new ways of working and architectural patterns, and bringing in the right skillsets. It is not for the faint of heart, but with purpose-driven (read: use case driven) management and monetization of your data, a use case roadmap in which each use case builds on the previous one, and the right skills to enable them, it’ll be hard to go wrong.
Right data, not big data
This year marks the 10th birthday of the term ‘Data Lake’. And as with anything in data these days, we mourn its sad, pre-teen passing. The term big data had all the promise in the world, and for statisticians like myself, it held the allure of making our models more accurate and salient.
But in 2015, five years after the first data lakes were formed, Gartner pulled the term Big Data from the Hype Cycle. Since the dawn of big data, one of the biggest challenges has been relevancy. However much data was collected, whether in depth (e.g. a longer history of the data) or in breadth (e.g. more variables, features or descriptors), the issue remained the same: an additional row or column could never, on its own, reveal why a problem exists. In fact, adding too much data could reduce a model’s capability to explain the issue or predict whether it might happen again.
In the realm of marketing analytics, the area of application I know best, every time new data appeared in breadth (like a new type of media available to purchase) or in depth (another historical data point), it added complications to a model that required even more data to explain. It was a catch-22, and new modelling techniques have only been able to predict things better in the vacuum of training and test data, while struggling in the real world.
Enter, stage left, ‘Alternative Data’. In its strictest definition, it seems applicable only to investors looking to improve their capability to value a business against market conditions and independent performance. What sits behind this demand for alternative data is that the current descriptors of a business’ performance, drawn from existing company data, are sparse. Analysts therefore need to obtain relevant (or statistically correlated) data aligned to their KPIs in order to value a business accurately. The higher the correlation, the better the accuracy of the model. In most cases, this data comes not from the subject business itself but from third-party sources.
The better you are able to introduce external data sources that are far more correlated to the question and/or the intended outcome, the better your chances of correctly predicting and recommending a way forward, especially in times of such uncertainty. More of the same data, or better descriptions of your current data, may help incrementally, but are unlikely to make a huge impact. What businesses should focus on is more of the right data, and on giving engineering and analytical teams the flexibility to integrate this data into the business, correlating it against the problems they are trying to solve.
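As a minimal sketch of what that correlation screening might look like, the snippet below ranks candidate external series against a KPI by absolute correlation. The series names and data are hypothetical, invented purely for illustration; a real pipeline would use your own KPI and vendor feeds.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical weekly KPI (e.g. sales) modelled as a random walk.
weeks = 52
kpi = np.cumsum(rng.normal(size=weeks))

# Candidate external sources: one strongly related, one weakly, one pure noise.
candidates = {
    "foot_traffic_index": kpi + rng.normal(scale=0.5, size=weeks),
    "search_interest": 0.4 * kpi + rng.normal(scale=2.0, size=weeks),
    "unrelated_series": rng.normal(size=weeks),
}

# Rank candidate sources by absolute Pearson correlation with the KPI.
ranked = sorted(
    candidates.items(),
    key=lambda kv: abs(np.corrcoef(kpi, kv[1])[0, 1]),
    reverse=True,
)
for name, series in ranked:
    print(f"{name}: r = {np.corrcoef(kpi, series)[0, 1]:+.2f}")
```

A screen like this is only a first filter: high correlation is necessary but not sufficient, so shortlisted sources still need a causal sanity check before they feed a model.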
The current situation amplifies this need. The word ‘unprecedented’ has itself seen unprecedented use; in data parlance, it means we now live in a time when much of the data we previously held dear as assets is unlikely to be useful in describing what will happen next. We reach for data from 2003 (SARS) or 2014 (Ebola), well beyond the shelf lives of most data retention policies, to help us comprehend the world today. Companies will begin to realise that the right data, not just more data, is what will be valuable to decision making, and those that position themselves as providers or consumers of it will find the recovery period of 2020 a little less confounding.
What should the next steps be for companies to benefit from this trend? Assess the data you have in your organization, map your company’s most strategic questions that drive the direction of the business, and ensure your analytical teams have the right empowerment (technology, budget or people coverage) to explore what matters most to answering those questions, rather than just adding more of the same data. Ensure your technology and analytics functions are in lock-step in solving these problems together.
Adding the human to AI
The scaremongering that humans are somehow inferior to computer algorithms, which would one day replace our jobs, seems to have tempered, especially since AI is stepping up in the race to develop a vaccine for the virus. However, we are realizing that to create relevant products, services and solutions using data and AI, or to improve the adoption of AI, we need to include the human element.
Data is only as good as what’s collected, and what’s collected is usually a tiny subset of what we can actually measure. Even with the massive amounts of data we already produce and store across the globe, only a tiny fraction is relevant to most problems we’re trying to solve with data. And noticing and/or defining these problems is half the battle, and one for humans. For example, the global situation has caused a major change in our behavior and attitudes toward purchasing, health, wellbeing, food and entertainment, which has rendered historical consumer data less reliable for predicting things to come. So different techniques are needed to understand a new reality and intelligently create new data as a result. This human-centricity is so important that the EU has now set up a committee around the topic.
Once created, the data may reveal ‘what’ has changed, but not necessarily ‘why’. It also takes the human element to interpret results and make sense of the changes that show up in the data. As an example, through AI the suggestion has been made that the virus may invade brain tissue. Now it is up to researchers and doctors to establish why!
Whether it is about designing for adoption that encourages people to use the data, or utilizing a human-centred design approach to understand customer needs prior to building a product powered by AI, having human interpretation will improve the accuracy and relevancy of an AI model.
Without human intervention in the design, creativity, leadership and interpretation, AI efforts are likely to fail, and less likely to have the life-changing impact we need them to have. Bring human diversity into how you build AI, who you build it for, why you build it, and when you build it, and you’ll find that the adoption and applicability of your algorithms will improve.
It has been a strange, tragic, and difficult start to the 20s. One that will be remembered alongside other such global pandemics. But this time, whilst we will never forget the pain and suffering, we must also recognise how it has expedited human progression and highlighted human endeavour. None of these three trends I write about changed from my initial list of five, but there’s surely an intensification of action for the decade to come.
What are the other two I hadn’t written about? We may only find out if the world shifts again before the end of 2020.