As shown in Part I, there is a series of issues related to internal data management policies and approaches. The answers to these problems are not trivial, and we need a framework to approach them.
The Data Stage of Development Structure (DS2) is a maturity model built for this purpose: a roadmap for implementing a revenue-generating and impactful data strategy. It can be used to assess the company's current situation and to understand the steps needed to enhance internal big data capabilities.
The following table provides a four-by-four matrix in which the increasing stages of evolution are labelled Primitive, Bespoke, Factory, and Scientific, while the dimensions along which they are assessed are Culture, Data, Technology, and Talent. The final considerations are drawn in the last row, which concerns the financial impact on the business of a well-set data strategy.
Stage one is about raising awareness: the realization that data science could be relevant to the company's business. In this phase, there is no governance structure in place, no pre-existing technology, and, above all, no organization-wide buy-in. Yet tangible projects still emerge when individuals' enthusiasm for data is channelled into something actionable. The skill set available is still rudimentary, and the actual use of data is quite rough. Data are used only to convey basic information to management, so they do not really have any impact on the business. Being at this stage does not mean being inevitably unsuccessful; it simply shows that project performance and output are highly variable, contingent, and not sustainable.
The second phase is the reinforcing one: it is actually an exploration period. The pilot has proved that big data has value, but new competencies, technologies, and infrastructures are required. Above all, a new data governance is needed, in order to keep track of possible data contagion and of the different actors who enter the data analytics process at different stages. Since management's contribution is still very limited, the potential applications are confined to a single department or a specific function. The methods used, although more advanced than in Phase I, are still highly customized and not replicable.
By contrast, Phase III adopts a more standardized, optimized, and replicable process: access to the data is much broader, the tools are state of the art, and a proper recruitment process has been set up to attract talent and resources. The projects benefit from regular budget allocation, thanks to the high-level commitment of the leadership team.
Phase IV deals with business transformation: every function is now data-driven and led by agile methodologies (i.e., delivering value incrementally instead of at the end of the production cycle), and the full support of executives translates into a series of concrete actions. These may include the creation of a Centre of Excellence (i.e., a unit staffed by top-tier scientists, with the goal of leveraging and fostering research, training, and technology development in the field), a large budget and a high degree of freedom in choosing the scope, or optimized cutting-edge technological and architectural infrastructures. All of these have a real impact on the revenue stream.
Particular attention should be paid to data lakes, repositories that store data in their native formats: they are low-cost storage alternatives that support multiple languages. Highly scalable and centrally stored, they allow the company to switch between different platforms without extra costs, and they lower the likelihood of data loss. Nevertheless, they require metadata management that contextualizes the data, and strict policies have to be established in order to safeguard data quality, analysis, and security. Data have to be correctly stored, studied through the most suitable means, and protected against breaches. An information life cycle has to be established and followed, and it has to take particular care of timely and efficient archiving, data retention, and testing data for the production environment.
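To make the metadata and life-cycle requirements more concrete, here is a minimal Python sketch of what a catalog record and a retention rule could look like. Everything in it is a hypothetical illustration: the `CatalogEntry` fields, the `lifecycle_action` function, the path, and the 90-day and 365-day thresholds are assumptions, not part of the DS2 model or of any specific data lake product.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CatalogEntry:
    """Hypothetical metadata record describing one raw file in the lake."""
    path: str                 # location of the file, kept in its native format
    source_system: str        # where the data came from
    owner: str                # team accountable for quality and access
    created: date             # ingestion date
    archive_after_days: int   # move to cheaper storage after this age
    delete_after_days: int    # end of the retention period


def lifecycle_action(entry: CatalogEntry, today: date) -> str:
    """Return 'keep', 'archive', or 'delete' based on the age of the entry."""
    age = (today - entry.created).days
    if age >= entry.delete_after_days:
        return "delete"
    if age >= entry.archive_after_days:
        return "archive"
    return "keep"


# Illustrative values only.
entry = CatalogEntry(
    path="raw/sensors/2015-10-30.json",
    source_system="plant-sensors",
    owner="operations-analytics",
    created=date(2015, 10, 30),
    archive_after_days=90,
    delete_after_days=365,
)
print(lifecycle_action(entry, date(2015, 11, 30)))  # keep
print(lifecycle_action(entry, date(2016, 3, 1)))    # archive
print(lifecycle_action(entry, date(2016, 11, 1)))   # delete
```

The point of the sketch is that retention decisions are driven by metadata attached to each file rather than by the file content, which is what lets a lake hold data in native formats while still following an information life cycle.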
A final consideration has to be made about the cross-stage dimension of “Culture”. Each stage is associated with a different kind of analytics, as explained in Davenport (2015). Descriptive analytics concerns what happened; predictive analytics is about future scenarios (sometimes augmented by diagnostic analytics, which also investigates the causes of a certain phenomenon); prescriptive analytics suggests recommendations; and, finally, automated analytics takes action automatically based on the results of the analysis.
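As a rough illustration of how the model could be used for a self-assessment, the Python sketch below encodes the four stages, the four dimensions, and the type of analytics associated with each stage. Two things are assumptions of mine rather than part of DS2: the stage-to-analytics mapping is simply read off the order in which the four types are listed above, and the rule that a company's overall stage equals its weakest dimension is only one possible aggregation.

```python
from enum import IntEnum


class Stage(IntEnum):
    PRIMITIVE = 1
    BESPOKE = 2
    FACTORY = 3
    SCIENTIFIC = 4


DIMENSIONS = ("Culture", "Data", "Technology", "Talent")

# Assumed mapping, inferred from the order in which the analytics types are listed.
ANALYTICS = {
    Stage.PRIMITIVE: "descriptive",
    Stage.BESPOKE: "predictive",
    Stage.FACTORY: "prescriptive",
    Stage.SCIENTIFIC: "automated",
}


def overall_stage(scores: dict[str, Stage]) -> Stage:
    """Return the weakest dimension as the overall stage (an assumed rule)."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    return min(scores[d] for d in DIMENSIONS)


# Hypothetical company: strong on data and technology, weaker on people and culture.
company = {
    "Culture": Stage.BESPOKE,
    "Data": Stage.FACTORY,
    "Technology": Stage.FACTORY,
    "Talent": Stage.BESPOKE,
}
stage = overall_stage(company)
print(stage.name, ANALYTICS[stage])  # BESPOKE predictive
```

Taking the minimum is a deliberately conservative choice: it reflects the idea that advanced technology does not move a company forward if culture and talent lag behind. An average or a weighted score would be equally defensible.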
Some of the outcomes presented so far are summarized in the following figure. The chart shows the relation between management support for the analytics function and the complexity and skills required to excel in data-driven businesses. The horizontal axis shows the level of management commitment (high vs. low), while the vertical axis takes into account the feasibility of the project undertaken, where feasibility is here intended as the ratio of the project complexity to the capabilities needed to complete it. The intersection between the feasibility of big data analytics and management involvement divides the matrix into four quadrants, corresponding to the four types of analytics.
Each circle identifies one of the four stages (numbered in sequence, from I-Primitive to IV-Scientific). The size of each circle indicates its impact on the business (i.e., the larger the circle, the higher the ROI). Finally, the second horizontal axis keeps track of the increasing variety of data used in the different stages, meaning structured, semi-structured, or unstructured data (e.g., from IoT devices and sensors). The orange diagonal represents the kind of data used: from closed systems of internal private networks in the bottom left quadrant to market/public and external data in the top right corner.
Once the different possibilities and measurements have been identified, it is also useful to see how a company could transition from one level to the next. The following figure indicates some recommended procedures to foster this transition.
In order to move smoothly from the Primitive stage to the Bespoke one, it is necessary to proceed through experiments run by single individuals, who aim to create proofs of concept or pilots that answer a single small question using internal data. If these questions have a high-value impact on the business, the results are likely to be acknowledged faster. Try to keep the monetary costs as low as possible (using the cloud, open source software, etc.), since the project will already be expensive in terms of time and manual effort. At the company level, the problem of data duplication should be addressed.
The transition from Bespoke to Factory instead demands the creation of standard procedures and golden records, as well as robust project management support. The technologies, tools, and architecture have to be tested and chosen as if they were implemented or developed to stay, so the vision should be medium- to long-term. It is essential to foster the engagement of senior management. At a higher level, new sources and types of data have to be promoted, data gaps have to be addressed, and a strategy for platform resiliency should be developed. In particular, the company has to define the acceptable data loss (Recovery Point Objective) and the expected recovery time for disrupted units (Recovery Time Objective).
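The two resiliency targets mentioned above can be checked with very little arithmetic. The following Python sketch compares a single incident against a Recovery Point Objective and a Recovery Time Objective. The function name `meets_objectives`, the timestamps, and the four-hour and two-hour targets are hypothetical values chosen for illustration only.

```python
from datetime import datetime, timedelta


def meets_objectives(last_backup, outage_start, service_restored, rpo, rto):
    """Compare one incident against the RPO and RTO targets.

    Returns (data_loss_window, downtime, rpo_met, rto_met).
    """
    data_loss_window = outage_start - last_backup  # data written since the last backup is lost
    downtime = service_restored - outage_start     # how long the unit was disrupted
    return data_loss_window, downtime, data_loss_window <= rpo, downtime <= rto


# Illustrative incident: nightly backup at 02:00, outage at 09:30, service back at 11:00.
loss, downtime, rpo_met, rto_met = meets_objectives(
    last_backup=datetime(2016, 1, 10, 2, 0),
    outage_start=datetime(2016, 1, 10, 9, 30),
    service_restored=datetime(2016, 1, 10, 11, 0),
    rpo=timedelta(hours=4),
    rto=timedelta(hours=2),
)
print(loss, rpo_met)      # 7:30:00 False
print(downtime, rto_met)  # 1:30:00 True
```

In this made-up case the recovery was fast enough, but the nightly backup leaves up to a day of data exposed, so meeting a four-hour RPO would require more frequent backups or replication.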
Finally, to become data experts and leaders and shift to the Scientific level, it is indispensable to focus on details, optimize models and datasets, improve the data discovery process, increase data quality and transferability, and identify a blue ocean strategy to pursue. Data security and privacy are essential, and additional transparency on the data approach should be provided to the shareholders. A Centre of Excellence (CoE) and a talent recruitment value chain play a crucial role as well, with the final goal of putting the data science team in charge of driving the business. The CoE is indeed fundamental in order to mitigate the short-term performance goals that managers have, but it has to be reintegrated at some point for the sake of the organization's scalability. At this point it becomes possible to start documenting and keeping track of improvements and ROI.
From the final step on, a process of continuous learning and cutting-edge experimentation is required to maintain leadership and earn respect in the data community. I have also included a suggested timeline for each step: up to six months for assessing the current situation, doing some research, and starting a pilot; up to one year for exploiting a specific project to understand the skills gap, justify a higher budget allocation, and plan the team expansion; two to four years to verify the complete support from every function and level within the firm; and at least five years to achieve a fully operational data-driven business. Of course, the time needed by each company varies due to several factors, so the timeline should be treated as highly customizable.
Davenport, T. H. (2015). “The rise of automated analytics”. The Wall Street Journal, January 14, 2015. Retrieved October 30, 2015 from http://www.tomdavenport.com/wp-content/uploads/The-Rise-of-Automated-Analytics.pdf.
Note: the above is an adapted excerpt from my book “Big Data Analytics: A Management Perspective” (Springer, 2016).