At the core of every tech innovation today, you’ll find data: tons of it. We’ve reached the point where each step of the data lifecycle (its management, its care and feeding, its use and application) is critical.
Our latest #CloudMinds huddle, held during Think 2018 in Las Vegas, evoked an episode of “Law & Order”: its theme, data stewardship and lifecycle management, seemed ripped from the headlines.
If speed and innovation are key to avoiding disruption, businesses need faster access not only to clean, useful data, but also to the tools that help them analyze it and extract insights in an effective, easily repeatable way. When development organizations can take advantage of this lifecycle, innovation becomes easier to realize.
With this as the cold opening, the conversation yielded the following key takeaways:
We’re moving from the discipline of data science to data engineering
Just as the industry moved from computer science into computer engineering, we’re moving from data science to data engineering. The ability to turn on a data flow and keep it running 24/7 at the scale that you need will be one of the most significant challenges, but solving it will make us much more productive.
The biggest obstacles to committing to data science: people, process and culture
Implementing and augmenting technologies to serve the data remains a challenge, as does having sufficient data infrastructure. Another problem revolves around talent: do organizations have access to the top talent they need to reach the point where they realize the ROI of AI and machine learning? People also have the potential to poison data sets and influence machines with biased algorithms.
Data engineering could benefit from becoming more iterative
Agile and other development methodologies are iterative by nature, but we seldom talk about data in the same way. Those responsible for data governance are often tasked with delivering better data and performing better analytics. Better analytics, in turn, yields better data, which yields better analytics, and so on. It’s a continuous cycle: the better the insight you produce, the better the governance you can put around it to produce better data.
Operationalizing data by introducing a DataOps culture could be key to solving some persistent issues
Stakeholders throughout the data lifecycle, including business executives, technical leaders, engineers and analysts, could benefit from the data equivalent of DevOps.
If, as Andrej Karpathy suggests in his recent post “Software 2.0,” we’re moving to a model where data is becoming software, then treating data as ops and emphasizing sensible data governance could not be more urgent.
Open source often lacks the ability to deliver the integrated systems needed to serve the data lifecycle
Integrated systems often require a profit motive to bring disparate systems together. Some open source communities aren’t motivated to collaborate on or build integrated systems because doing so can be expensive.
Is there an opportunity to change the cost dynamic and address this?
What incentives will draw us toward the level of openness and sharing that sparks evolution?
There are challenges and opportunities in open sourcing data
While plenty of open data sets exist, they need to be new, anonymized and of higher value to be most effective. That could help us build better models and find more effective ways to glean insights.
The challenge is that enterprises see data as a valuable commodity. Data that exists behind a firewall is the most valuable of all, and companies would be foolish to give it up.
We need data … but what we really need is relevant data
What you really need is a constant stream of distilled, relevant data. “Democratized” doesn’t adequately describe the clean data sets that organizations need to glean effective insights.
A final question: Does every data lifecycle need to end? That may be where regulations are heading, but is there value in keeping data in perpetuity?
With all of our huddles, the conversation certainly doesn’t stop here. Follow me on Twitter @KevJosephAllen for more conversations with the minds behind these insights.