Building a successful Data strategy

Building a successful Data strategy — Users and User Types

Stop focusing on the data and start focusing on your users

Mathew Taylor

Published in Eliiza-AI

10 min read · Oct 2, 2020

Data is now at the forefront of corporate strategies, with the importance of data to drive success only growing over the last decade. Companies are talking about ‘data enablement’, ‘building data platforms’, ‘creating data democracies’ and making ‘data-driven decisions’.

However, putting these buzzwords into practice is proving challenging. As reported in the ‘New Vantage Partners, Big Data and AI Executive Survey 2019’, 77% of executives say that business adoption of Big Data/AI initiatives is a major challenge, and only a minority say that they have managed to create a data-driven organisation (31.0%) or forge a data culture (28.3%) [1]. In the same report, 93% of respondents identify people and process issues as the main obstacle [1].

What’s the problem?

Over the past few years, there’s been a shift in the desired outcome of data projects from reactive, problem-focused deliveries (using data to fix issues one at a time), to proactive, user-focused deliveries (putting the data and the tools in the hands of the users so they can solve problems independently). That’s what’s behind the buzzwords ‘data enablement’, ‘data democracies’ and ‘data-driven decisions’.

This is all great, but it means that we need to change the way we provide solutions. As engineers and developers, we typically analyse things from a purely technical or problem-specific viewpoint (a ‘this is the problem we need to solve, here is the solution’ approach) rather than looking at the needs of our users or customers. Whilst software engineering as a whole has moved towards a customer- or user-centric approach over the last decade, data engineering has remained in the dark ages of technical or problem-centred methodology.

This is evident in a lot of the data projects we see. Most data strategy projects fall into one of two scenarios. The first is the problem-focused approach:

We need a data strategy
Find a problem which can be solved with data
Solve the problem
Find another problem which can be solved with data
Solve the problem
Repeat

Whilst this does (eventually) help solve issues within the company, it’s incredibly reactive. Problems are identified, assessed and then solved one at a time. Data access and use are bottlenecked because projects and data engineers focus on a single issue at a time, which in turn leads to siloed solutions. Each solution tends to solve only a particular problem for a particular user (or user group), rather than leading to wider engagement with data by users.

The second is the technical approach. This approach tends to bunch all users into one giant use case and typically leads to a large ‘build-it-and-they-will-come’ data transformation project. Technical advice, articles and best practices are sought out and a cathedral-type solution is architected. This is usually a data lake of some sort, or sometimes an all-encompassing data vault or data warehouse. Whilst this is exciting stuff, it usually ends in a budget-breaking, deadline-smashing technical monolith which only a small subset of users can navigate or use.

What can we do?

We need to shift our mindset and start looking at things from a user-centric viewpoint: focus on the types of problems that our users come across and need to solve, what data and tools they need, and how we can get that data to them. We want to implement a data strategy that allows our users to solve their problems for themselves so that they really can become self-sufficient and data-driven.

Identifying our user groups

The first step on this path is to identify the data users and user groups in our organisation. This is where data projects get interesting: typical software engineering projects touch a limited number of user groups, whereas data projects usually affect a much larger user base.

When we look closely at our users, we realise how unique they are. They all have different use cases and requirements for data, which vary based on their role, experience and knowledge.

Figure: Different groups of data users in a company. Illustration made by the author/eliiza.com.au

As the list above shows, there is a huge range of business users: from the very technical (data scientists and data engineers) to the non-technical (C-level executives); from internal business users with strong domain knowledge to external customers who don’t care how it all works; and from human users to non-human users, such as other software applications and machine learning models.

User Types

Given all these different users, how do we help solve their particular problems without either:

a) creating specific solutions for each user or area (which takes us back to the reactive, single-problem, single-solution approach), or

b) trying to create a one-size-fits-all solution that doesn’t serve all our use cases (which takes us back to the technical approach)?

We can examine our users and group them into three specific user types based on their use of data in their day-to-day work.

Figure: The three data user types (explorers, operators and analysers, readers). Illustration made by the author/eliiza.com.au

Data explorers: These are users whose day-to-day job is to search and wrangle the source or raw data. This group’s needs are fairly simple: their main concern is getting access to the data. Knowing how and where to find the data, and then easily getting their hands on it, is key to their success.

Data operators and analysers: These users are our technical users and domain experts. Analysers usually have strong domain knowledge about the data and use it to join data from multiple sources to answer questions: for example, working out what has gone wrong with a customer’s campaign, or finding new patterns in the data to solve business problems. Operators have strong technical and programming skills and use the data as input to the applications they are building. They might need to do some modelling or manipulation of the data to get what they need, but they aren’t interested in the data itself.

The core needs of these users are clean data and standardisation of the data and related tools. The users in this group don’t want to worry about whether the source data is accurate, whether they need to convert date formats to match, or whether they have to reconcile data between different systems (e.g. different types of user IDs). They also want the data given to them in a standardised format (e.g. Parquet) that they can work with using common tools (e.g. SQL).

Data readers: These users operate in the business domain/layer and use data to drive decisions in their daily work. They want specific answers to specific questions, and they don’t want to hunt for these answers or manipulate data to get the results. Accuracy and timeliness are the key requirements for these users, and they need data in an easily consumable and interpretable form. The requirements of the users in this group are often specific to the individual use case.
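To make the standardisation need of operators and analysers a little more concrete, here is a minimal Python sketch of the kind of clean-up they would rather not do themselves: converting mismatched date formats and reconciling user IDs from different systems. The list of date formats, the system-prefix convention (e.g. `crm-42`) and the eight-digit canonical ID are all assumptions made up for illustration, not something taken from a real system.

```python
from datetime import datetime

# Hypothetical date formats that different source systems might emit.
KNOWN_DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y%m%d")


def to_iso_date(raw: str) -> str:
    """Return the date as YYYY-MM-DD, trying each known format in turn."""
    for fmt in KNOWN_DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # Not this format, try the next one.
    raise ValueError(f"Unrecognised date format: {raw!r}")


def to_canonical_user_id(raw: str) -> str:
    """Reconcile user IDs across systems (assumed 'SYSTEM-number' convention).

    Drops anything before the last hyphen and zero-pads to eight characters.
    """
    number = raw.strip().rsplit("-", 1)[-1]
    return number.zfill(8)
```

If this kind of normalisation happens once, upstream, every operator and analyser gets the same dates and IDs instead of each of them re-implementing the conversion.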

Data maturity

We can also look at our users in terms of data maturity, i.e. how well they understand data and how much we can trust them with it.

Data explorers are experienced data users and have a high level of maturity through their use of and understanding of data and its flaws. They are aware of inherent inaccuracies in the data (e.g. fraud and duplication) and so can be trusted to be sceptical regarding the results they get from queries. Part of a data explorer’s job is to verify that their query is accurate and not to just accept the results.

Data operators and analysers also understand the limitations of the data they are using. They are (to a certain extent) aware of the technical or business domain behind the data and so are used to working to get the results they need. Whilst they don’t want to deal with the inherent inaccuracies, they will test and check their assumptions and the results they get from the data. Due to this awareness, we can place a medium level of trust in these users.

Data readers, however, are not experienced with data. They are typically non-technical or management/business users, so technical or domain-specific knowledge should be hidden away from them. With this group, there is no tolerance for limitations or discrepancies in the data, and the consequences of inaccurate data are high. If members of this group are making decisions based on the data provided, then there is no leeway for incorrect results. Examples would be providing an incorrect bank account balance to a customer, or giving a CEO revenue figures which include fraud.
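As a rough illustration of the revenue example, the sketch below shows the sort of filtering that would need to happen before a figure reaches a data reader: duplicates are counted once and anything flagged as fraud is excluded. The `Transaction` record and its `is_fraud` flag are hypothetical; in practice, deciding what counts as fraud is the hard part and is assumed to have been done upstream.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Transaction:
    """Hypothetical transaction record used only for this illustration."""
    txn_id: str
    amount: float
    is_fraud: bool  # Assumed to be set by an upstream fraud check.


def reportable_revenue(transactions: Iterable[Transaction]) -> float:
    """Sum revenue once per transaction ID, skipping flagged fraud."""
    seen_ids = set()
    total = 0.0
    for txn in transactions:
        if txn.is_fraud or txn.txn_id in seen_ids:
            continue  # Exclude fraud and duplicate records.
        seen_ids.add(txn.txn_id)
        total += txn.amount
    return total
```

A data explorer would expect to do this kind of checking themselves; a data reader should only ever see the result.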

Categorising our users

So far we’ve identified three groups based on their daily uses and requirements of data, but how does this tie back into implementing a successful data strategy?

As discussed earlier, there are two common problems with how data strategies are implemented: they are either too specific to a single user problem, and so bottleneck wider data engagement, or they are too one-size-fits-all, and so don’t fit any individual user’s data requirements.

What we need to do instead is look at the users we identified earlier and see how they fit into our user types. We can then focus resources on creating solutions for these types which are tailored enough to the users to provide the data and tools to fulfil their requirements but are also broad enough that we can solve for groups of users rather than individual use cases.

If we take the list of data users that we identified earlier and apply the user types to categorise them, we can see how they fall under the different types:

Figure: Categorising users into the three user types. Illustration made by the author/eliiza.com.au

The more data-specific technical roles fall under the data explorer type due to their requirement for access to source data and their high level of ‘data maturity’. Under the technical data operators are software engineers and ML engineers, who use data as an input to their applications or ML models. Users with strong domain knowledge, such as account managers and business analysts, fall under the data analysers. Furthest to the right, under data readers, are our non-technical, non-domain decision-makers such as executives, finance, HR and sales.

So by looking at the makeup of users in a company and how they affect or are affected by the data strategy we can then determine what solution(s) we need to implement to support them.
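The categorisation described above can be sketched as a simple lookup in Python. The role names come from the article; placing data engineers under explorers is my reading of ‘data-specific technical roles’, and the `group_by_type` helper and the wording of `CORE_NEEDS` are illustrative rather than a definitive model.

```python
from collections import defaultdict
from enum import Enum


class UserType(Enum):
    EXPLORER = "data explorer"
    OPERATOR = "data operator"
    ANALYSER = "data analyser"
    READER = "data reader"


# Role-to-type mapping as described in the article.
ROLE_TO_TYPE = {
    "data scientist": UserType.EXPLORER,
    "data engineer": UserType.EXPLORER,  # Assumption: a data-specific role.
    "software engineer": UserType.OPERATOR,
    "ml engineer": UserType.OPERATOR,
    "account manager": UserType.ANALYSER,
    "business analyst": UserType.ANALYSER,
    "executive": UserType.READER,
    "finance": UserType.READER,
    "hr": UserType.READER,
    "sales": UserType.READER,
}

# Core needs per type, paraphrased from the user type descriptions.
CORE_NEEDS = {
    UserType.EXPLORER: "easy access to source/raw data",
    UserType.OPERATOR: "clean data in standardised formats and tools",
    UserType.ANALYSER: "clean data in standardised formats and tools",
    UserType.READER: "accurate, timely, easily consumable answers",
}


def group_by_type(roles):
    """Group role names by user type; unknown roles raise KeyError."""
    groups = defaultdict(list)
    for role in roles:
        groups[ROLE_TO_TYPE[role.lower()]].append(role)
    return dict(groups)
```

For example, `group_by_type(["Data scientist", "HR", "Sales"])` puts the data scientist under `UserType.EXPLORER` and HR and sales under `UserType.READER`, which gives a quick view of which user types a given company actually needs to cater for.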

For example, if our strategy is to enable ‘data-driven decisions’, that is, to enable our executives, managers or sales teams to make informed business decisions, then we need to focus on fulfilling the requirements of our data readers. To succeed with this strategy, we need to build solutions which give these users accurate and timely data in a way which is easily consumable (e.g. dashboards and reports). We’ll need to make sure that the quality of the data is high so we don’t give the users false information and, to a certain extent, anticipate what questions will be asked so we can supply the needed information without them having to search for it.

At the other end of the scale, our strategy might be to use data to drive innovation and create new products or systems: for example, being able to identify and target specific customers or users of our product, or building systems to prevent fraud. Here our initial target group would be data explorers. We would invest in data scientists to identify patterns in the data and then harness the power of machine learning to build models which apply those findings. A successful implementation here would involve collecting data from as many sources as possible and making it accessible in its raw state for the data scientists to use.

Depending on the size or complexity of the company, our data strategy might involve several (or all) of the user types, in which case we would need to build different solutions to cater for each of them. Returning to the data readers in the first example: in a company with complex or multiple business domains, business analysts from the data analysers group might be needed to provide the domain and analytical knowledge to join different datasets and deliver the relevant information to the data readers.

Similarly, in the second example, the data scientists would at some point need to get their ML models into production. In a small startup they might handle this themselves; in a larger company there might be a dedicated team of ML engineers. We would then also need to implement a solution which caters to the requirements of our data operators: cleaning and curating data into a standardised format so it’s easily manipulated into the inputs needed to reliably build and train our models.

To sum up

In this article I’ve highlighted the issues with implementing a data strategy using a technical or problem-centric methodology, and how, if we focus our attention on the users and the requirements they have around data, we can start to design solutions which allow them to fulfil the strategic goals of becoming ‘data-enabled’, ‘data-democratised’ and ‘data-driven’.

In my next post, I’ll be showing how we can move from the high-level/non-functional requirements we’ve identified, and drill down into the detailed technical requirements for each of our user types. These technical requirements can then guide the design and technology choices of our solutions.

[1] New Vantage Partners, Big Data and AI Executive Survey 2019, http://newvantage.com/wp-content/uploads/2018/12/Big-Data-Executive-Survey-2019-Findings-Updated-010219-1.pdf

Thanks to Sophie and Emma for all their help and patience!