The Essentialism Data Strategy

How a use-case-driven approach provides more value.


Michael Porter, the famous economist and strategic thought leader, once said: “The essence of strategy is choosing what not to do.” However, we live in a world that offers millions of alternatives, and even if you know that 99% of them will not work, you are still left with a massive number of strategic options. How do you develop a strategy then?

One particularly interesting field at the moment is developing the right data strategy. Here we see a massive number of architecture solutions, data warehouse strategies, AI and machine learning applications, and much more. It is crucial to understand how they work, and it is also part of your job to calculate the risks, as many products do not come cheap.

[Cartoon: © Timo Elliott, https://timoelliott.com/blog/cartoons/more-analytics-cartoons]

Often the main strategy is to centralize all data assets internally. Companies build a central place for all their databases and dump everything into a data lake. While data lakes enable the analysis of large volumes of structured, semi-structured, and unstructured data and might eventually provide value in the long run, there is a high risk that this strategic decision ends up in a massive data dump. Such a data swamp can cost you a fortune, considering human resources alone. The reason this strategy often fails is rooted in the main characteristic of the data lake itself: data lakes allow you to store everything, whether you are actually using it or not. Hence, as Adam Wray, CEO of Basho, told Forbes, “everything is vacuumed up. However, this leads to problems. […] [Data lakes] are evil because they’re unruly, they’re incredibly costly and the extraction of value is infinitesimal compared to the value promised”.

The central question now is how to generate value and make that data usable. In other words, what is essential for the strategy?

What is Essentialism?

Recently, I came across Greg McKeown’s concept of essentialism. In a Tim Ferriss podcast episode, McKeown talks about his course at Stanford’s d.school, which combines this concept with Design Thinking.

According to McKeown, the most fitting definition of essentialism, which he describes in his book, is “less but better”. Hence, it starts with the question: what is really essential to what I want to achieve? In general, it is a disciplined, systematic approach for determining where your highest contribution lies, and then executing on those things that matter. McKeown’s design course on essentialism at Stanford includes many techniques from Design Thinking, such as defining the problem space and developing solutions that seem essential, then testing them.

Starting with use-cases

When it comes to the data strategy of a firm, it makes sense to first identify reasonable use cases for your overall corporate strategy. It is often much leaner to develop good use cases first and only then decide what the architecture should look like. This doesn’t mean you cannot develop a data lake strategy, but your most valuable resource is time, so it is advisable to start early and create meaningful learnings. A mindset that helps with this is, for example, Data Thinking.

Hence, some principles of essentialism can also be applied to Data Thinking and to your data strategy at large. Through a method like Data Thinking workshops, valuable use cases are generated, prioritized (deciding which are essential), and then developed further into prototypes. A few of the principles from Essentialism: The Disciplined Pursuit of Less can be applied directly to this kind of work.

Spend time exploring

There is a common misconception that AI and ML are often plug-and-play solutions. As a foundation for such technologies, a fairly large amount of data is usually necessary. However, most companies that want to start using ML and AI don’t have the data they need, so a strategy for how to collect and store it is needed first. During Data Sprints, we often spend a significant amount of time exploring use cases that make sense for the company and deciding how to actually get the data we need if it is not already there.

Realise you have a choice

Like with everything in life, it is important to realize here that you have a choice. Just because the big players decide to centralize all their databases into a data lake and buy the corresponding tech architecture doesn’t mean that this is the right setup for you. In the end, you always have a choice.

Focus on the vital few

During Data Thinking Workshops we collect many different use cases, and often clients already have a list of their own. How do you determine which ones are vital? An efficient way to identify the use cases you should double down on is to evaluate them with the team based on impact and effort. You can start by placing all the use cases on an impact-effort matrix, with impact on the vertical axis and effort on the horizontal axis.

In the beginning you will want to start with the use cases in the top left corner, as they bring high value at low effort. These are often referred to as “low-hanging fruits”.
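This kind of prioritization is easy to make concrete. The sketch below sorts use cases into the quadrants of an impact-effort matrix; the example use cases, the 1–5 scoring scale, and the threshold of 3 are all illustrative assumptions, not part of the workshop method itself.

```python
# Hypothetical use cases scored by the team on a 1-5 scale (names are illustrative).
use_cases = [
    {"name": "churn prediction",    "impact": 5, "effort": 4},
    {"name": "report automation",   "impact": 4, "effort": 2},
    {"name": "chatbot",             "impact": 2, "effort": 5},
    {"name": "data quality checks", "impact": 3, "effort": 1},
]

def quadrant(uc, threshold=3):
    """Map a use case to a quadrant of the impact-effort matrix."""
    high_impact = uc["impact"] >= threshold
    low_effort = uc["effort"] < threshold
    if high_impact and low_effort:
        return "quick win"   # top left: high impact, low effort - start here
    if high_impact:
        return "big bet"     # high impact, high effort
    if low_effort:
        return "fill-in"     # low impact, low effort
    return "avoid"           # low impact, high effort

# The "low-hanging fruits" to double down on first.
quick_wins = [uc["name"] for uc in use_cases if quadrant(uc) == "quick win"]
print(quick_wins)
```

In a workshop the scores come from team discussion rather than a script, but encoding them like this makes the prioritization explicit and repeatable.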

Repeat the process

You can only get good at something by repeating it over and over again, and Design Thinking and Data Science are iterative disciplines by nature: you need to constantly learn and improve. Dat Tran from idealo wrote a fantastic article about what a Minimum Viable Data Product is, and he touches on this point: “A possible approach to solve this classification problem would be to take a neural network with one hidden layer. We would next train and evaluate the model. Then depending on the results, we might want to keep improving our model. We then would add another hidden layer and then do the same modelling exercise again. Then depending on the results again we might add more and more hidden layers.” This shows the iterative approach: start lean and add complexity later, once you have learned enough to see what really brings you the necessary information depth.
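The loop Tran describes can be sketched in a few lines. This is not his implementation; it is a minimal illustration using scikit-learn on a toy dataset, where the layer width of 8, the accuracy target of 0.95, and the cap of three iterations are all assumptions chosen for the example.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Toy classification problem standing in for a real use case.
X, y = make_moons(n_samples=500, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

layers = (8,)  # start lean: a single hidden layer
for step in range(3):
    model = MLPClassifier(hidden_layer_sizes=layers, max_iter=2000, random_state=0)
    model.fit(X_train, y_train)
    score = model.score(X_test, y_test)
    print(f"hidden layers {layers}: accuracy {score:.2f}")
    if score >= 0.95:       # good enough: stop adding complexity
        break
    layers = layers + (8,)  # otherwise add another hidden layer and retry
```

The point is the stopping rule: complexity is only added when the evaluation shows it is still needed, which is exactly the “less but better” discipline applied to modelling.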

In the end, whatever you do, your data strategy is an inseparable part of your products and services, and it needs constant learning, testing, and improvement just like everything else.

As a final conclusion, I have adapted some of the personal questions you can ask yourself when applying essentialism and combined them with the questions we ask during Data Thinking Workshops and Data Sprints.

  • Ask yourself the three questions: What does our brand/product/company stand for? What are we particularly good at? What meets a significant need for our customers?
  • If you could only do one thing with the data you have at hand, what would you do?
  • Regarding my data strategy: is this really the most important thing I should be doing with my time and resources right now?
  • What is important right now? Can we solve it with the data we have?

Have some thoughts? Please share in the comments.

References:

Kowalski, Kyle (2017). 10 Life Hacks from ‘Essentialism’ (Book Summary). URL: https://www.sloww.co/essentialism-book/

Knowledgent (2014). How to Design a Successful Data Lake. URL: https://knowledgent.com/whitepaper/design-successful-data-lake/

Tran, Dat (2018). What is Minimum Viable (Data) Product? On Medium. URL: https://medium.com/idealo-tech-blog/what-is-minimum-viable-data-product-49269e338d85

Woods, Dan (2016). Why Data Lakes Are Evil. Forbes. URL: https://www.forbes.com/sites/danwoods/2016/08/26/why-data-lakes-are-evil/#b4f61be4f736