4 PREDICTIONS FOR HOW DATA TRANSFORMS EVERYTHING

Scott Gnau
Published in future of data
6 min read, Jun 28, 2016

This is part two of a two-part series from Hadoop Summit.

In a related post, Rob Bearden talks about how data transforms everything and the need for Connected Data Platforms. As a follow-on, here are four predictions for the technologies behind this transformation.

#1 — INTELLIGENT SELF-CONFIGURING NETWORKS WILL ENABLE NEW & FASTER DELIVERY OF DATA AND ANALYTICS ACROSS DATA CLOUDS

We all accept that the number of connected devices and sensors will continue to grow. The volume, variety and complexity of data will continue to explode alongside it. Currently data is doubling every few years. A vast set of public, private and hybrid data clouds is emerging.

The need for machine-to-machine (or peer-to-peer) connectivity in the Internet of Anything (IoAT) is defining two things: the need for ‘an interface of things’ and a type of intelligent environment where devices that understand each other can work together in real time, in the context of a larger need or purpose. It also requires the ability for networks to expand and contract on demand, as well as for messages to be routed and prioritized in real time.
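To make the idea of prioritizing messages concrete, here is a minimal Python sketch built on a priority queue. It is only an illustration under stated assumptions: the `PriorityRouter` class, the message fields, the destination names and the convention that a lower number means more urgent are all hypothetical, not part of any particular network or product.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class Message:
    priority: int    # assumed convention: lower number = more urgent
    seq: int         # tie-breaker so equal priorities stay first-in, first-out
    payload: str = field(compare=False)
    destination: str = field(compare=False)


class PriorityRouter:
    """Hypothetical router that hands out the most urgent message first."""

    def __init__(self):
        self._queue = []
        self._counter = itertools.count()

    def submit(self, priority, destination, payload):
        message = Message(priority, next(self._counter), payload, destination)
        heapq.heappush(self._queue, message)

    def drain(self):
        """Yield (destination, payload) pairs, most urgent first."""
        while self._queue:
            message = heapq.heappop(self._queue)
            yield message.destination, message.payload


router = PriorityRouter()
router.submit(5, "analytics-cluster", "hourly temperature summary")
router.submit(0, "nearby-vehicles", "hard braking ahead")
router.submit(5, "analytics-cluster", "battery level report")
delivered = list(router.drain())
```

In this sketch the safety message is delivered first even though it was submitted second, while the two routine reports keep their original order.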

In this world, silos of data will be replaced by clouds of data. Freed from the need to conform to the rules of data center batch processing, we will see intelligent self-configuring networks that enable meaningful connections between devices and provide the flexibility to deliver data faster, and in new ways, to the right place to be analyzed.

#2 — LINK CUSTOMERS, PRODUCTS AND SUPPLY CHAIN

An integrated model where data and analytics flow seamlessly is key here. Keep in mind a key notion: with the advent of connected devices, our products now create, consume and use data as well, and are a key part of the end-to-end business flow.

This new reality of connected customers, products and supply chains demands real-time machine learning, intra-system collaboration and analytics at the edge. No longer is the world going to accept ‘centralized-only’ monolithic software and silos of data. Most ‘smart’ devices will collaborate to varying degrees and be able to analyze what the others are saying. Real-time machine-learning algorithms within modern distributed data applications will come into play: algorithms that are able to adjudicate ‘peer-to-peer’ decisions in real time.

Data has gravity; in relative terms, it’s still more expensive to move than to store. This will spur the notion of processing analytics out at the edge, where the data was born and exists, in real time, rather than moving everything into the cloud or back to a central location. Plus, we’ll need to keep track of all of these machine-to-machine conversations and what happened as a result, in order to build better and more intelligent models and distributed solutions over time.
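As a rough illustration of processing at the edge, the Python sketch below reduces a batch of raw sensor readings to a compact summary on the device, so that only the summary and any out-of-range values would need to travel to a central location. The function name, the sample readings and the alert threshold are assumptions made up for this example, and the function assumes a non-empty batch.

```python
from statistics import mean


def summarize_at_edge(readings, alert_threshold):
    """Reduce raw readings to a compact summary plus any out-of-range values."""
    alerts = [value for value in readings if value > alert_threshold]
    return {
        "count": len(readings),
        "min": min(readings),
        "max": max(readings),
        "mean": mean(readings),
        "alerts": alerts,
    }


# Hypothetical readings collected locally by one device.
raw_readings = [70.1, 70.4, 69.8, 71.0, 95.2, 70.3]

# Only this small dictionary would be sent upstream, not the raw readings.
summary = summarize_at_edge(raw_readings, alert_threshold=90.0)
```

The design choice here is simply to move the computation to the data: six readings become one record, and the same pattern applies to much larger batches.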

We all hear about self-driving cars and trucks. Just imagine what an impact they will have on lives. Whether it’s better safety, better fuel efficiency, smoother traffic management, lower fuel consumption, or cheaper public transportation and movement of goods, the personal and economic implications are far-reaching.

For autonomous cars not to run into each other they actually have to communicate with each other directly as well as with the cloud. They can’t wait to go back to some place in the cloud, run an algorithm and wait for the result to come back. They have to talk to each other with a level of intelligence, and so peer-to-peer and edge analytics will come into play.

Autonomous cars will only happen as a result of Connected Data Platforms that can remove silos between data-at-rest and data-in-motion, and modern data apps. For example, think of the whole ecosystem of data in support of the concept of driving on a freeway in a self-driving world and how much that requires a constant connection: from the GPS to the road maps or lanes, to the weather reports or the intersection traffic lights. The businesses that support this will also have to transform to connect: from the insurance agencies to the car manufacturers to the

This also implies a new level of machine-learning algorithms that are able to adjudicate peer-to-peer decisions in real time. The cars have to talk to each other securely as well as intelligently, and this can only be facilitated by Connected Data Platforms.

#3 — ANALYTICS IN REAL TIME & AT THE EDGE

I see the mandate for preemptive analytics: from post-event to real-time and pre-event analysis and action.

Businesses are just starting to see the value in their data, and the opportunity cost of not having the right strategy across all types of data-in-motion and data-at-rest.

The unique value creation, however, will come not just from processing and understanding transactions as they happen and then applying models, but from doing so before the consumer, or the sensor, logs in to do something.

I predict we will quickly move from post-event and even real-time analytics to preemptive analytics that can drive transactions instead of just modifying or optimizing them. This will have a transformative impact on the ability of a data-centric business to identify new revenue streams, save costs and improve customer intimacy.

For enterprises to succeed with data, apps and data need to be connected via a platform or framework. Convergence of such divergent requirements into a single proprietary product or tool will not work.

Applications of old were siloed, proprietary, highly structured and in most cases monolithic. They operated only on historical data. Modern data applications are able to access broad amounts of data-at-rest and data-in-motion that can be very loosely correlated in real time and applied to machine learning.
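One simple way to picture data-at-rest and data-in-motion working together is an application that enriches each incoming event with stored context as it arrives. The Python sketch below is a toy under stated assumptions: the `customer_profiles` table, the event fields and the `enrich` function are hypothetical stand-ins, not the API of any real platform.

```python
# Data-at-rest: a hypothetical stored lookup table of customer profiles.
customer_profiles = {
    "c1": {"segment": "frequent buyer"},
    "c2": {"segment": "new customer"},
}


def enrich(events, profiles):
    """Attach the stored segment to each event as it arrives."""
    for event in events:
        profile = profiles.get(event["customer_id"], {})
        yield {**event, "segment": profile.get("segment", "unknown")}


# Data-in-motion: a hypothetical stream of events, simulated with a list.
incoming = [
    {"customer_id": "c2", "action": "viewed product"},
    {"customer_id": "c9", "action": "added to cart"},
]

enriched = list(enrich(iter(incoming), customer_profiles))
```

Because `enrich` is a generator, each event is handled as it arrives rather than after the whole batch is collected; events from customers with no stored profile are labeled "unknown" instead of being dropped.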

The ability to connect data-at-rest and data-in-motion in different data platforms, across multiple cloud providers and in data centers, with true application portability, is the central differentiator between the future of data and the past.

This is the foundation for the modern data application. Modern data applications are highly portable, containerized and connected. They will quickly replace vertically integrated monolithic software.

#4 — DATA IS EVERYONE’S PRODUCT

In this new world, your product is not your only product; your data is also a product. Or rather, the analysis that comes from your data about your company, your customers and your ecosystem. Or even yourself. Software can be replaced; your data is irreplaceable. It has value. Guarding your data will no longer be just about private information but about holding on to that value, as a consumer or as a business.

For the business, the modern balance sheet already carries many intangible values, from the value of the brand to the vacation and sick days employees might take. Consumers have until now largely given away the value of their purchasing power on social media and online commerce, but they will become increasingly savvy about how and whether they do this.

So whether you are a manufacturer moving metal or an individual consumer, I predict your data will become a product with value to buy, sell or lose. There will be new ways, new business models and new companies looking at how to monetize that asset.

We are moving towards an exchange and marketplace for data. Out of this, there will be companies that capture your publicly available data and want to sell it back to you, and companies that act as trusted data brokers and provide all the cloud and peer-to-peer security and privacy tools to protect you and your identity.

Scott Gnau
CTO at Hortonworks enabling the next generation data architecture.