How to set up a corporate data strategy?

Tobias Bohnhoff
shipzero
Mar 7, 2019

In the context of artificial intelligence, big data and cloud infrastructure, the term “data strategy” is used more and more often. But what exactly does a data strategy comprise, who is responsible for it, and why does a company need one?

What is data?

Data is not the new oil. It has very little in common with finite fossil resources, which will be replaceable in the long term and are produced by a few players but used by almost everyone. But that is only a side remark. It is true, however, that data is becoming an increasingly valuable resource for companies and is the basis for many new business models. Data is a human construct for capturing reality. A distinction is made between ground truth data, i.e. data that directly and completely measures what is happening in reality, and information derived by inference, i.e. logical conclusions drawn from observations or projections.

Specifically, the following types can be distinguished:

  • Continuous variables - measurable data points, such as temperature
  • Categorical variables - discrete classes, such as fixed/broken or cold/warm/hot
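The two types are often related in practice: a continuous measurement can be turned into categories by agreeing on thresholds. The following Python sketch illustrates this; the thresholds `COLD_BELOW` and `HOT_FROM` and the function `categorize_temperature` are hypothetical choices for illustration, not values from any standard.

```python
from typing import Literal

Category = Literal["cold", "warm", "hot"]

# Hypothetical thresholds in degrees Celsius; a real data guideline would define these.
COLD_BELOW = 15.0
HOT_FROM = 25.0


def categorize_temperature(celsius: float) -> Category:
    """Map a continuous temperature reading to a categorical label."""
    if celsius < COLD_BELOW:
        return "cold"
    if celsius < HOT_FROM:
        return "warm"
    return "hot"


readings = [8.5, 19.0, 31.2]  # continuous variable (made-up sample values)
labels = [categorize_temperature(t) for t in readings]
print(labels)  # ['cold', 'warm', 'hot']
```

Where exactly the boundaries lie is a business decision, which is why such thresholds belong in documented guidelines rather than in individual scripts.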

The aim of structured data management, as a basis for working with AI, is to map so-called features (data points describing an event, object or person) to labels, categories or contexts that we assign to these data points. Compared to traditional analytics, this dramatically reduces the time between data analysis and the resulting action. It requires an appropriate infrastructure that provides data ready for reliable process automation, clearly specified goals and guidelines, and suitable algorithms.
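To make the feature-to-label idea more tangible, here is a minimal Python sketch of one simple technique, a nearest-centroid classifier, that learns to assign the labels "fixed" or "broken" from two machine features. The training data, the feature choice and the function names `fit_centroids` and `predict` are all hypothetical; production systems would typically rely on an established ML library and far more data.

```python
from math import dist
from statistics import fmean

# Hypothetical labelled examples: (vibration in mm/s, temperature in °C) -> label
training = [
    ((1.0, 40.0), "fixed"),
    ((1.2, 42.0), "fixed"),
    ((6.5, 70.0), "broken"),
    ((7.0, 75.0), "broken"),
]


def fit_centroids(examples):
    """Average the feature vectors of each label (nearest-centroid model)."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(fmean(column) for column in zip(*vectors))
        for label, vectors in by_label.items()
    }


def predict(centroids, features):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


centroids = fit_centroids(training)
print(predict(centroids, (1.1, 45.0)))  # fixed
print(predict(centroids, (6.0, 68.0)))  # broken
```

Note that the features are not scaled, so temperature dominates the distance. Decisions like this are part of the data preparation a data strategy should make explicit.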

It is important to note that data sets are not infallible. They can be misinterpreted, incompletely captured or statistically biased in their sample. The more people are involved in the data management process, the harder it becomes for the organization to maintain a consistent way of interpreting and managing data in its context. Careful, structured handling of data is therefore critical for deriving meaningful decisions.
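Two of these failure modes, incomplete capture and sampling bias, can be checked with simple statistics. The sketch below assumes hypothetical sensor records and an assumed expected share per site (`expected_share`). It reports the missing-value rate of a field and how far the observed category shares deviate from the expected ones.

```python
from collections import Counter

# Hypothetical sensor records; None marks a value that was not captured.
records = [
    {"site": "north", "temperature": 21.5},
    {"site": "north", "temperature": None},
    {"site": "north", "temperature": 19.0},
    {"site": "south", "temperature": 24.1},
]

# Assumed share of each site in the real population (e.g. number of machines).
expected_share = {"north": 0.5, "south": 0.5}


def missing_rate(rows, field):
    """Fraction of rows in which `field` was not captured."""
    return sum(row[field] is None for row in rows) / len(rows)


def sample_share(rows, field):
    """Observed share of each category value in the sample."""
    counts = Counter(row[field] for row in rows)
    return {value: count / len(rows) for value, count in counts.items()}


def share_deviation(rows, field, expected):
    """Observed minus expected share; large values hint at sampling bias."""
    observed = sample_share(rows, field)
    return {key: observed.get(key, 0.0) - share for key, share in expected.items()}


print(missing_rate(records, "temperature"))  # 0.25
print(share_deviation(records, "site", expected_share))  # {'north': 0.25, 'south': -0.25}
```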

As a company digitizes more of its processes and captures more data, complexity eventually reaches a level that requires elaborate guidelines to ensure responsible data management.

Responsibility

Data is the digital image of an organization, and decisions are made on its basis. If one follows this principle, it becomes clear that a data strategy cannot be initiated by individual departments or by the IT department; it is a central top-management task.

Many companies appoint CDOs (Chief Digital Officers) or CINOs (Chief Innovation Officers) and assign them responsibility for the data strategy. There is no general answer to the question of which constellation works best. However, two things are important:

  1. The responsible C-level person should be able to allocate at least 50% of her capacity to the topic.
  2. The data strategy cannot be developed in an ivory tower or in an innovation silo. It requires close interaction with all other departments, which is why the task is correspondingly time-consuming.

What is a data strategy?

A strategy describes the course of action required to achieve a defined goal. Accordingly, a data strategy has three core components:

  1. Target picture
  2. Principles and guidelines
  3. Status quo and consistency analysis

It is helpful to start with the target picture and work your way from the vision down to concrete milestones and tasks. From a company perspective, there are three overriding areas that intelligent data use can influence: effectiveness, efficiency and compliance.

Depending on where data management is used, these goals can of course take very different forms. In the core business, which is the main source of revenue, effectiveness and compliance matter most: they increase revenue, minimize risk and defend the USP against competitors. In classic corporate functions such as purchasing, HR, controlling and supply chain, the primary goal is to increase efficiency. Finally, when developing new business areas, the focus during the initial traction phase of a new idea or business unit is on effectiveness and growth.

The target function is derived both at the meta level described above and, within the respective business area, as a tangible, measurable economic key figure. This also determines which measures must be taken to achieve the goal. For compliance or efficiency issues, a well-designed data warehouse or data lake architecture is sometimes sufficient to provide appropriate data protection and analytics. If processes are to be automated and intelligent services developed, the data must be prepared accordingly and appropriate interfaces must be available.
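One way to make the target function tangible is to record, per business area, which overriding goal it serves and which measurable key figure tracks it. The Python sketch below is a hypothetical structure (`Goal`, `Target` and all figures are invented for illustration) that also computes progress from a baseline towards the target.

```python
from dataclasses import dataclass
from enum import Enum


class Goal(Enum):
    EFFECTIVENESS = "effectiveness"
    EFFICIENCY = "efficiency"
    COMPLIANCE = "compliance"


@dataclass
class Target:
    business_area: str
    goal: Goal
    kpi: str          # tangible, measurable economic key figure
    baseline: float
    target: float
    current: float

    def progress(self) -> float:
        """Share of the way from baseline to target (0.0 = baseline, 1.0 = reached)."""
        return (self.current - self.baseline) / (self.target - self.baseline)


# Hypothetical examples, one per type of business area named in the text.
targets = [
    Target("core business", Goal.EFFECTIVENESS, "revenue per customer (EUR)", 1200.0, 1500.0, 1350.0),
    Target("purchasing", Goal.EFFICIENCY, "invoice processing time (days)", 10.0, 4.0, 7.0),
    Target("new business unit", Goal.EFFECTIVENESS, "monthly active customers", 0.0, 500.0, 125.0),
]

for t in targets:
    print(f"{t.business_area}: {t.goal.value}, {t.kpi} -> {t.progress():.0%}")
```

Because progress is measured relative to the baseline, the same formula also works for key figures that should decrease, such as processing time.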

Principles and guidelines are crucial for ensuring uniform data handling within the company and, where necessary, with external partners. However, some data points touch on data protection and ethical issues that cannot and should not be settled at the operational level. These should instead follow an internal guideline defined by, or at least aligned with, management.

Questions of data privacy and security are naturally governed largely by local legislation, whereas ethical questions must be considered individually for each company. There is no general solution here.

Structural questions in data handling, however, must be clearly defined. They include how data is collected (its existence, sources, formats, timeliness and availability), rules for dealing with missing, incorrect, anonymised or pseudonymised data, and guidelines for collecting categorical data and labelling data.
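As an illustration of how such rules can be made executable, the sketch below applies a hypothetical guideline to a single record. Records without an identifier are dropped, missing regions get an explicit "unknown" category, and the customer ID is pseudonymised with a keyed hash (HMAC-SHA256 from Python's standard library). The key, field names and rules are assumptions. Keyed hashing yields the same pseudonym for the same ID, so records stay linkable, while mapping IDs to pseudonyms requires the secret key.

```python
import hashlib
import hmac
from typing import Optional

# Hypothetical secret; in practice it would come from a key-management system,
# never from source code.
PSEUDONYM_KEY = b"replace-with-a-managed-secret"


def pseudonymise(identifier: str, key: bytes = PSEUDONYM_KEY) -> str:
    """Keyed hash (HMAC-SHA256): same ID -> same pseudonym; computing it requires the key."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def prepare_record(record: dict) -> Optional[dict]:
    """Apply a hypothetical data-handling guideline to one raw record."""
    if not record.get("customer_id"):
        return None  # rule: records without an identifier are discarded
    return {
        "customer_pseudonym": pseudonymise(record["customer_id"]),
        # rule: missing categorical values get an explicit "unknown" category
        "region": record.get("region") or "unknown",
        "order_value": record["order_value"],
    }


print(prepare_record({"customer_id": "C-1001", "region": None, "order_value": 42.0}))
```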

To communicate this critical point in a uniform, company-wide language, we recommend working with frameworks that define exactly which data is available and in which quality. This step is the so-called data assessment. It provides an overview, makes it considerably easier to identify potential for data-driven processes, and at the same time helps to estimate the effort required.
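A data assessment can start as a simple inventory. The sketch below is a hypothetical minimal version: each `DatasetEntry` records name, owning department, source and format, and computes completeness as the share of non-missing values in a small sample. Real assessments would add further quality dimensions such as timeliness and correctness.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    name: str
    owner: str          # responsible specialist department
    source: str
    data_format: str
    sample: list = field(default_factory=list)  # a few example rows as dicts

    def completeness(self) -> float:
        """Share of non-missing values across all sampled cells (0.0 if no sample)."""
        cells = [value for row in self.sample for value in row.values()]
        if not cells:
            return 0.0
        return sum(value is not None for value in cells) / len(cells)


# Hypothetical inventory entries
inventory = [
    DatasetEntry("orders", "sales", "ERP export", "CSV",
                 [{"id": 1, "value": 99.0}, {"id": 2, "value": None}]),
    DatasetEntry("machine_logs", "production", "sensor gateway", "JSON",
                 [{"id": 1, "temp": 21.0}, {"id": 2, "temp": 22.5}]),
]

# Least complete datasets first, as candidates for quality measures
for entry in sorted(inventory, key=DatasetEntry.completeness):
    print(f"{entry.name:<14}{entry.owner:<12}{entry.data_format:<6}{entry.completeness():.0%}")
```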

One scientifically grounded approach is the “Data Readiness Levels” framework published by Neil D. Lawrence. In slightly modified form, his model can be described as a pyramid; at its top, data quality is sufficient for developing intelligent services based on machine learning.
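Lawrence's paper describes three bands: C (accessibility), B (faithfulness and representation) and A (data in context), with data moving up from C towards A. The sketch below encodes a heavily simplified checklist for each band. The individual checks in `ReadinessChecks` are our own hypothetical reduction, not the full criteria from the paper.

```python
from dataclasses import dataclass


@dataclass
class ReadinessChecks:
    # Band C, accessibility (simplified, hypothetical checks)
    exists: bool
    access_cleared: bool        # access rights and legal constraints clarified
    machine_readable: bool      # can be loaded into analysis software
    # Band B, faithfulness and representation
    missing_values_handled: bool
    encoding_documented: bool   # units, categories and labels documented
    # Band A, data in context
    fits_question: bool         # appropriate for the specific ML use case


def current_band(c: ReadinessChecks) -> str:
    """Data stays in band C until accessible, in B until faithfully represented, then reaches A."""
    if not (c.exists and c.access_cleared and c.machine_readable):
        return "C"
    if not (c.missing_values_handled and c.encoding_documented):
        return "B"
    return "A"


def ready_for_ml(c: ReadinessChecks) -> bool:
    """Top of the pyramid: band A and suitable for the question at hand."""
    return current_band(c) == "A" and c.fits_question


# Hypothetical example: CRM data is accessible, but missing values are untreated.
crm = ReadinessChecks(exists=True, access_cleared=True, machine_readable=True,
                      missing_values_handled=False, encoding_documented=True,
                      fits_question=False)
print(current_band(crm), ready_for_ml(crm))  # B False
```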

To create such an overview for your own company, you first need an initial status quo and consistency analysis. As there is usually no single employee who has an overview of all data sets in an organization, an iterative approach is recommended here as well: the immediately available information is first summarized in a structured manner and then successively detailed. This is necessary so as not to endanger sensitive data sets, and it should be accompanied by educational work that addresses the concerns specialist departments may have about a central staff unit accessing their data sets. The consistency and feasibility of the steps towards a more sophisticated data infrastructure are also important: the ROI of every business intelligence or data infrastructure investment should be clear to the responsible decision-makers.
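The ROI requirement can be made concrete with the simple formula ROI = (benefit - cost) / cost. The sketch ranks hypothetical data initiatives by undiscounted ROI. The initiative names and figures are invented, and a real business case would also discount future benefits and account for risk.

```python
def roi(benefit: float, cost: float) -> float:
    """Simple, undiscounted return on investment: net benefit relative to cost."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return (benefit - cost) / cost


# Hypothetical initiatives: (expected benefit, cost) in EUR over the same period.
initiatives = {
    "BI dashboard": (150_000, 100_000),
    "data lake": (240_000, 200_000),
    "ML forecasting": (90_000, 120_000),
}

ranking = sorted(initiatives, key=lambda name: roi(*initiatives[name]), reverse=True)
for name in ranking:
    print(f"{name}: {roi(*initiatives[name]):.0%}")
```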

Conclusion

In summary, companies need a central data strategy to know which data is available in the company and in what quality. On this basis, processes and measures can be initiated to improve data quality and structural availability, enabling better, value-generating decisions in the long term.

When developing a data strategy, it is important to align its goals with the corporate goals and to secure the full support of top management. This is only possible if responsibility is placed there. A common language and transparency are the most important elements of the process; changes should therefore be implemented step by step, with empathy and participation rather than disempowerment.

Conceptually, we recommend using frameworks such as the Data Readiness Levels to establish clearly defined guidelines that cover not only formal and technical requirements but also legal and ethical questions in dealing with data.

If you have feedback or further questions on the topic of data strategy, please feel free to reach out to us at appanion.com.

Founder at appanion.com. Technology enthusiast and passionate about trends and innovation in artificial intelligence.