Analytics Headaches? — Introducing The Essential Ten R’s of Actionable Big Data

New Thinking To Secure Your Analytics Foundations

Mark Waller

Published in

AQOIA

5 min read · Oct 18, 2019

Recent reports suggest we are still struggling to capture value from Analytics.

The principal barriers to widespread adoption are people and culture. This is a critical and expensive problem. We need to become much smarter about how we address the People, Data, and Technology continuum as we position ourselves for success in the Society 5.0 era.

Data-Driven Decisions Start with These 4 Questions

Executive Summary To get useful answers from data, we can't just take it at face value. We need to learn how to ask…

hbr.org

Data has become central to how we run our businesses today. In fact, the global market intelligence firm International Data Corporation (IDC) projects spending on data and analytics to reach $274.3 billion by 2022. However, much of that money is not being spent wisely. Gartner analyst Nick Heudecker has estimated that as many as 85% of big data projects fail.

The above article suggests four principal questions should be asked:

How was the data sourced?
How was it analyzed?
What doesn't the data tell us?
How can we use the data to redesign products and business models?

For further reference, I have elaborated on this key point with the 10Rs of Actionable Big Data.

Back in the day, the 5V’s of big data (Volume, Velocity, Variety, Veracity, Value) helped us frame and secure emerging data initiatives. While they are still relevant, our increasing reliance on technology in daily life means the 5V’s alone are insufficient to secure analytical outcomes.

The data that sustains us through Machine-to-Machine interfaces, Augmentation, Artificial Intelligence, and Advanced Analytics scenarios is becoming essential for individuals, enterprises, and society alike.

To keep ahead of trends, we require a new heuristic to assure our data pipelines. At design time, we need to place a new emphasis on quality in the context of fit for purpose data consumption.

Alongside the 5V’s of (big) data, we propose the 10R’s of (Consumable) data: Rich, Relevant, Reliable, Robust, Reconcilable, Ready, Resilient, Repeatable, Riskless, Returns.
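One way to put the list to work at design time is a simple scorecard. The sketch below is illustrative only: the `Scorecard` class, the 0 to 5 rating scale, the threshold, and the pipeline name are assumptions made for the example, not part of the 10R's themselves.

```python
from dataclasses import dataclass, field

# The ten dimensions, exactly as listed in the article.
TEN_RS = (
    "Rich", "Relevant", "Reliable", "Robust", "Reconcilable",
    "Ready", "Resilient", "Repeatable", "Riskless", "Returns",
)


@dataclass
class Scorecard:
    """Design-time review of one data pipeline against the 10R's."""
    pipeline: str
    # R name -> score on a hypothetical 0..5 scale (an assumption of this sketch)
    scores: dict = field(default_factory=dict)

    def rate(self, r: str, score: int) -> None:
        """Record a score for one of the ten dimensions."""
        if r not in TEN_RS:
            raise ValueError(f"unknown R: {r}")
        if not 0 <= score <= 5:
            raise ValueError("score must be between 0 and 5")
        self.scores[r] = score

    def unrated(self) -> list:
        """Dimensions nobody has assessed yet."""
        return [r for r in TEN_RS if r not in self.scores]

    def gaps(self, threshold: int = 3) -> list:
        """Dimensions that are unrated or scored below the threshold."""
        return [r for r in TEN_RS if self.scores.get(r, 0) < threshold]


card = Scorecard("orders_feed")  # hypothetical pipeline name
card.rate("Rich", 4)
card.rate("Ready", 2)
print(card.gaps())
```

Treating unrated dimensions as gaps is a deliberate choice here: a dimension that has not been considered is a risk in its own right.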

R=Rich: The source data for analytics and AI must be Rich. Without an appropriate level of Richness, the best “questions” cannot be asked and the smart “answers” that should follow cannot emerge. Machines cannot operate, automation grinds to a halt, and artificial intelligence is less intelligent than it could be. Humans are not augmented, and decisions remain unsupported by facts.

The driver for Richness is the workflows and pipelines needed to get data into a consumable, fit for purpose state.

R=Relevant: The data must be relevant to the designated purpose. For one element of data, there are infinite permutations and uses. Data relevance ranges from machines acting on a single parameter to layering context and building Richness for complex AI model inputs and analysis.

The driver for Relevance is “in the eye of the beholder”: who or what needs the data, and for what purpose.

R=Reliable: Reliability is dependability and, ultimately, trust that the data is fit for its purpose. If a human cannot rely on or trust a dataset, they will dismiss it. A machine fed the same dataset will yield poor results.

The driver for Reliability is the quality and governance processes that secure a fit for purpose condition in the data across the end-to-end value stream, from source to consumption.

R=Robust: The data and the data pipeline must be Robust so that their quality and utility are sustained, and they remain fit for purpose once set against expectations.

The driver for Robustness is the complexity, tooling, and support necessary to secure a sustainable fit for purpose operation.

R=Reconcilable: Reconciliation of the data to an accepted and known attribution for a given fit for purpose context is the characteristic that supports and enables continuous, speedy, and timely operations and actions. The more significant the impact, and the farther the derivation is from the “source”, the more critical the need for Reconciliation.

The driver for Reconciliation is the level of trust, and the means, needed to secure a fit for purpose outcome.
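As a rough illustration of Reconciliation, a derived dataset can be checked against a control total taken at the source. This is a minimal sketch under stated assumptions: the function name `reconcile`, the tolerance, and the sample figures are all hypothetical.

```python
import math


def reconcile(source_total: float, derived_total: float,
              rel_tol: float = 1e-6) -> bool:
    """Return True when a derived total agrees with the source control total.

    rel_tol is a hypothetical tolerance; the right value depends on how much
    drift is acceptable for the purpose at hand.
    """
    return math.isclose(source_total, derived_total, rel_tol=rel_tol)


# Hypothetical figures: raw order values at source, and the same values
# after aggregation by region further down the pipeline.
source_rows = [120.00, 75.50, 310.25]
derived_by_region = {"north": 195.50, "south": 310.25}

if reconcile(sum(source_rows), sum(derived_by_region.values())):
    print("derived data reconciles to source")
else:
    print("reconciliation break: investigate before consuming")
```

The farther a derivation sits from the source, the more such checkpoints are likely to be worth placing along the pipeline.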

R=Ready: Ready is the characteristic that the data is available and fit for purpose consumption. Ready is a crucially important factor in securing timely hand-offs from source to target in the data value stream. With decision cycles shortening, and data complexity, volumes, interdependency, and reliance all increasing, getting to Ready while factoring in the other R’s and V’s is crucial.

The driver for Ready is the latency and the process required to have the data Ready for fit for purpose consumption.
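Latency is the part of Ready that is easiest to test mechanically. The sketch below assumes a hypothetical freshness rule: data counts as Ready only if it was loaded within an agreed maximum age. The function name, the timestamps, and the thresholds are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_ready(last_loaded_at: datetime, max_age: timedelta,
             now: Optional[datetime] = None) -> bool:
    """Return True when the last load is recent enough for the consumer.

    max_age is the consumer's latency tolerance (an assumption of this
    sketch); now can be passed in to make the check reproducible.
    """
    now = now or datetime.now(timezone.utc)
    return (now - last_loaded_at) <= max_age


# Hypothetical hand-off: data loaded at 08:00, consumer checks at 08:45.
loaded = datetime(2020, 2, 6, 8, 0, tzinfo=timezone.utc)
check_time = datetime(2020, 2, 6, 8, 45, tzinfo=timezone.utc)

hourly_consumer_ok = is_ready(loaded, timedelta(hours=1), now=check_time)
streaming_consumer_ok = is_ready(loaded, timedelta(minutes=30), now=check_time)
print(hourly_consumer_ok, streaming_consumer_ok)
```

The same load can be Ready for one consumer and stale for another, which is why the threshold belongs to the purpose rather than to the data.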

R=Resilient: Resilience is the amount of flexibility and adaptability, the inherent capability, that the data and the data pipeline have to evolve and be modified as fit for purpose needs change.

The driver for Resilience is the amount of volatility and variability impacting the data pipeline from source to target workflows based on changing and evolving fit for purpose measures.

R=Repeatable: Repeatability is the ability to reproduce fit for purpose outcomes systematically, at a predictable and coherent cadence. Repeatable characteristics can range from sub-second streaming and telemetry workflows to complex curated data pipelines that service quarterly and annual reviews.

The driver for Repeatability is a function of all the above inputs.

R=Riskless: Riskless is the characteristic that covers everything to do with the nature, collection, accumulation, storage, combination, dissemination, access, and disposal of data.

How riskless is this data and data pipeline, and how riskless does it need to be? This question views the data value chain from a security and governance perspective: for example security, access, rights, storage properties, threat properties, legal, trade, confidentiality, GDPR, commercial terms, valuation, utility, and cost.

The driver for Riskless is the scaled security, legal, threat, governance, procedural, commercial, and value attributes of, for example, the individual data, data pipelines, and governance policies.

R=Returns: Returns is RoI (Return on Investment), or cost-benefit: how economically and commercially viable your Analytics queries and platform are when taken as a whole.

Returns analysis is particularly pertinent for discussions on moving traditional monolithic Enterprise Data Warehouse solutions to the Cloud or Hybrid Cloud, and on upgrading and extending their capabilities.

The choice between pay to use and pay to provision infrastructure is a crucial question today. Open-source vs. assemble and build vs. monolithic, off the shelf, one-stop-shop offerings from different vendors is also a factor. New multi-cloud use cases, the emergence of disposable BI, service and upkeep, and sunsetting transitions also add a whole new dynamic to the RoI and economics.

The driver for Returns is the nature, scale, and complexity of all the above dimensions for a query, use case, or platform in aggregate, offset against your business drivers. At the atomic query level, it pays to ask: how expensive will it be for us to answer this question?
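The pay to use vs. pay to provision question can be framed as simple break-even arithmetic. The sketch below is a deliberately simplified model: it assumes a flat cost per query and a flat monthly provisioning cost, and every figure is hypothetical rather than drawn from any vendor's pricing.

```python
def pay_per_use_cost(queries_per_month: int, cost_per_query: float) -> float:
    """Monthly bill when every query is charged individually."""
    return queries_per_month * cost_per_query


def break_even_queries(provisioned_monthly_cost: float,
                       cost_per_query: float) -> float:
    """Query volume at which both pricing models cost the same."""
    if cost_per_query <= 0:
        raise ValueError("cost_per_query must be positive")
    return provisioned_monthly_cost / cost_per_query


def cheaper_option(queries_per_month: int, cost_per_query: float,
                   provisioned_monthly_cost: float) -> str:
    """Name the cheaper model for a given monthly query volume."""
    usage = pay_per_use_cost(queries_per_month, cost_per_query)
    return "pay to use" if usage < provisioned_monthly_cost else "pay to provision"


# Hypothetical figures: 0.50 per query vs. 2000 per month provisioned.
print(break_even_queries(2000.0, 0.50))
print(cheaper_option(1000, 0.50, 2000.0))
print(cheaper_option(10000, 0.50, 2000.0))
```

Real Returns analysis would also need to account for the other dimensions named above, such as service, upkeep, and sunsetting costs, which this sketch leaves out.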

The next time you are designing mission-critical data flows, consider the 10Rs of Consumable Data alongside the 5Vs of Big Data. You may save yourself, your data consumers, and data governors a lot of downstream problems.

If you would like to discuss your specific analytics case, please contact mark.waller@aqoiagroup.com or chris.hearnshaw@aqoiagroup.com

Version control 1.3– 6/2/20