What Is DataOps?
DataOps is a discipline that has become a necessity in a market where the demand for access to data assets and data products is skyrocketing. The inability of data platform teams and data management platforms to keep pace with the demands placed on them by DevOps-enabled teams led to the development of DataOps.[i]
In a nutshell, DataOps brings together data scientists, analysts, developers, and operations to work on the entire product/service lifecycle, from the design stage to production support.[ii]
DataOps vs. DevOps
However, DataOps isn’t just about taking DevOps principles and applying them to data analytics. It does achieve similar things in that it can significantly improve quality and cycle time, but it isn’t the same thing.[iii]
DevOps relies on automation to accelerate build lifecycles. The goal is to achieve continuous and consistent software integration and delivery by capitalizing on on-demand IT resources and through the automation of code integration, testing, and deployment.
In other words, DevOps brings together development and operations teams and provides them with the tools they need to do a better and more efficient job. The result is a reduced deployment timeframe, faster delivery to market, a reduction in problems, as well as a shorter timeframe required to fix problems.
DevOps has enabled top companies to reduce their release timeframes from months to minutes, or even seconds in some cases. This has offered them an incredible competitive edge, which is highly necessary in today’s fast-paced economy.
Essentially, DevOps is what allows companies like Amazon and Google to release software multiple times per day. Without it, even the attempt would end in disaster.
The goal of DataOps, on the other hand, is to make data analytics more efficient. To do so, DataOps adopts Agile Development principles, thereby improving the efficiency and effectiveness of the data teams and users.
This means that data teams can publish new analytics in shorter increments, referred to as sprints, which significantly reduces wait times. Studies also show that this Agile Development approach leads to software development projects being completed with fewer problems. In the data space, it means companies can respond to customer needs and pain points faster, thereby significantly increasing the speed of value delivery.
However, compared to DevOps, DataOps has an additional component that is continuously in flux. This is the data pipeline, where raw data enters on one side, is then processed, and exits in a different form (reports, views, models, etc.) on the other side. This data pipeline is often referred to as the data Producer/Consumer model.
DataOps has a vital role when it comes to this data flow because it directs, monitors, and manages the data pipeline. Statistical process control (SPC) is one of the more powerful tools used to achieve this.
SPC ensures that statistics stay within their acceptable range, thereby resulting in significant increases in quality, efficiency, and transparency in data analytics.
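As a minimal sketch of how SPC might be applied to a data pipeline, the Python below derives control limits from a baseline of a pipeline metric and flags new observations that fall outside them. The metric (a daily row count), the three-sigma limits, and the function names are illustrative assumptions rather than anything prescribed by the sources cited here.

```python
from statistics import mean, stdev

def control_limits(baseline, sigmas=3.0):
    """Return (lower, upper) control limits from a baseline of in-control values."""
    center = mean(baseline)
    spread = stdev(baseline)  # sample standard deviation of the baseline
    return center - sigmas * spread, center + sigmas * spread

def out_of_control(values, limits):
    """Return the values that fall outside the control limits."""
    lower, upper = limits
    return [v for v in values if v < lower or v > upper]

# Hypothetical daily row counts from a period when the pipeline was healthy.
baseline_row_counts = [1000, 1020, 980, 1010, 990, 1005, 995]
limits = control_limits(baseline_row_counts)

# New loads: the second one is far smaller than the baseline suggests.
new_row_counts = [1008, 400, 992]
flagged = out_of_control(new_row_counts, limits)
print(flagged)
```

In practice a check like this would run at each stage of the pipeline, so a suspicious load is stopped before it reaches reports or models.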
Thus, DataOps combines principles from DevOps, Agile Development, and statistical process control.
What Does DataOps Do?
Data is valuable. It’s more valuable than ever and many organizations are recognizing that it can generate much greater value than they previously thought. It can become a product in itself. However, the data is only as good as an organization’s capacity to efficiently collect, process, and transform it into actionable insights.
The problem is that many organizations aren’t exactly clear on the most efficient approach to data collection and analytics. They often take a seemingly all-encompassing approach based on the principle of “we’ll collect the data and then figure out what to do with it,” that can do more harm than good.
They then have a data team that is supposed to miraculously turn garbage into gold, which generally requires far more effort than necessary and rarely leads to the desired results. Of course, this makes it virtually impossible to deliver actionable insights on a schedule that can keep up with the demands of a DevOps team that’s pushing to get its code to market.
DataOps takes this jumbled mess and turns it into a smooth process where data teams aren’t spending their time trying to fix problems. They aren’t wasting their time trying to turn poor raw data into something usable. Instead, they can focus on what matters, namely providing actionable insights.
DataOps ensures that the raw data coming in is usable and that the results are accurate. It emphasizes the value of people working together, and it keeps the data team at the center of the company’s strategic objectives.[iv] After all, data teams no longer take months to come up with required insights but are just as efficient and effective as DevOps-enabled teams.
The Evolution of DataOps
Lenny Liebmann, Contributing Editor at InformationWeek, was the first to introduce DataOps in “3 Reasons Why DataOps Is Essential for Big Data Success” in June of 2014. Andy Palmer subsequently popularized the term at Tamr.[v]
DataOps saw significant evolution in 2017.[vi] Growing enterprise-level interest in the discipline led to the emergence of a powerful network of vendors developing and marketing a wide range of related products and services.
Any DataOps platform relies on four essential software components, plus a set of supporting functions:
- Data pipeline orchestration: DataOps requires a guided workflow based on graphs that encompasses all the steps related to integration, data access, visualization, and modeling;
- Testing and production quality: DataOps not only tests and monitors the quality of production of all data but also tests any changes in code during the deployment phase;
- Automated deployment: DataOps constantly takes code and configurations from development environments and moves them into production;
- Data science model deployment and sandbox management: DataOps is also responsible for creating reproducible development environments and moving models into production;
- Other functions requiring support: code and artifact storage, parametrization and secure key storage, distributed computing, data virtualization, versioning, and test data management.
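To make the idea of a graph-based workflow more concrete, here is a minimal sketch of pipeline orchestration using the `graphlib` module from the Python standard library (Python 3.9 or later). The step names and the `run_pipeline` helper are hypothetical; a real DataOps platform would add scheduling, retries, and logging on top of something like this.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps it depends on (hypothetical step names).
PIPELINE = {
    "ingest": set(),
    "test_raw_data": {"ingest"},
    "transform": {"test_raw_data"},
    "train_model": {"transform"},
    "build_report": {"transform"},
    "test_outputs": {"train_model", "build_report"},
}

def run_pipeline(graph, steps):
    """Run each step after its dependencies and return the execution order."""
    executed = []
    for name in TopologicalSorter(graph).static_order():
        steps[name]()  # a failing test step raises and stops the run
        executed.append(name)
    return executed

# Placeholder steps that do nothing; real steps would move and check data.
steps = {name: (lambda: None) for name in PIPELINE}
order = run_pipeline(PIPELINE, steps)
print(order)
```

Placing the test steps in the graph itself reflects the second component above: data is checked as part of the workflow, not as an afterthought.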
A large number of products and services came onto the market in 2017 to satisfy the aforementioned need. The number continued to expand significantly in 2018.
Despite gaining popularity, DataOps is still a new concept and widespread adoption has yet to be achieved. The latter is likely hindered in part by the limited frameworks and solutions available, but also by the lack of clear guidelines that should be followed.
Even so, it is the beginning of another evolution in the market as various companies attempt to interpret the concept loosely. Data scientists and IT professionals can still find it challenging to determine where they should begin or how they should define success metrics.
The Role of Security in DataOps
A report from 451 Research shows that global enterprises are turning to DataOps because they can innovate faster, but also because it can help them resolve serious security issues and compliance problems. In fact, 66% of the respondents cited increased security and better compliance as their number one reason for adopting DataOps[vii].
Organizations are under greater scrutiny than before due to all the data breach issues many have experienced. There’s also more pressure from regulatory bodies regarding data privacy. So, companies are turning to DataOps to develop and implement data governance policies that are consistent, but that still allow data to flow rapidly, while being completely secure.
One issue is the greater number of people who require access to data, which led to 68% of respondents stating that securing the data they share with internal and external users is a serious concern.
The data breaches that make the news are generally caused by external threats. However, the reality is that the most significant threats tend to come from internal users. These users rarely intend harm, but negligence often leads to severe issues. Responsibility also falls on the organization when it lacks consistent and uniform security policies and a way to enforce them.
DataOps can provide the homogeneous approach to security required to keep the data safe, regardless of who has access to it, as long as it has the right data platform to work from. This unified approach can function in all areas of the organization, no matter the technology being used.
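One way to picture a uniform policy is a single masking rule set that every data consumer passes through. The sketch below is only an illustration under assumed names: the roles, the fields, and the `apply_policy` function are hypothetical and do not describe any particular DataOps product.

```python
# Hypothetical central policy: the fields each role may see unmasked.
POLICY = {
    "analyst": {"order_id", "amount"},
    "support": {"order_id", "email"},
}

def apply_policy(record, role, policy=POLICY):
    """Return a copy of the record with fields the role may not see masked."""
    allowed = policy.get(role, set())  # unknown roles see nothing unmasked
    return {key: (value if key in allowed else "***")
            for key, value in record.items()}

record = {"order_id": 42, "email": "pat@example.com", "amount": 19.99}
analyst_view = apply_policy(record, "analyst")
contractor_view = apply_policy(record, "contractor")
print(analyst_view)
print(contractor_view)
```

Because the policy lives in one place, internal and external users are handled by the same rules, and a role that is missing from the policy defaults to seeing masked values only.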
The DataOps Manifesto
The organizations and people who support DataOps have created a manifesto that consists of eighteen principles summarizing the best practices, philosophies, goals, mission, and values of those who practice DataOps.
The manifesto places individuals and interactions above process and tools. Its supporters focus on working analytics instead of comprehensive documentation. They advocate for customer collaboration instead of focusing on negotiating contracts. They support experimentation, iteration, and feedback instead of spending an inordinate amount of time on upfront design. They also feel that siloed responsibilities should be eliminated in favor of cross-functional ownership of operations.
The DataOps Manifesto principles are as follows[viii]:
1. Customers must come first in all stages and the highest priority of DataOps is to ensure the customer is satisfied through the quick and continuous delivery of valuable insights.
2. Place value on insights generated, which should be the real metric of the performance of data analytics.
3. Welcome change, including evolving customer needs, and favor face-to-face conversation with customers.
4. Analytics is about teams of people with different roles, skills, titles, and favored tools.
5. Collaborate with customers and operations at all stages, every day, throughout the project.
6. Self-organize, because self-organizing teams produce the best insights, architectures, algorithms, designs, and requirements.
7. Focus on creating sustainable and scalable teams and processes instead of concentrating on heroism.
8. Regular self-reflection to improve operational performance.
9. Analytic teams rely on a variety of tools that generate code and configuration, describing how the data is acted upon to generate insights.
10. Orchestration from start to finish of data, code, tools, environments, and teams is essential to success.
11. Everything must be versioned because reproducible results are a requirement.
12. Minimize experimentation costs for analytic team members by providing disposable environments.
13. Simplicity, also known as doing as little unnecessary work as possible, is essential to success and improves agility.
14. A fundamental DataOps concept is to focus on achieving constant efficiencies in the production of insights.
15. Analytic pipelines must have a foundation that can automatically detect abnormalities and security problems in data, configuration, and code. It should also provide constant feedback so errors can be avoided.
16. Quality, performance, and security measures should be constantly measured to identify any variations.
17. Avoid repeating work previously done to improve efficiencies.
18. Minimize the time and effort required to transform a customer need into insight, transform it into reality, release it as a production process that can be repeated, and then reuse the product.
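Principle 11 can be illustrated with a small sketch: if a run is identified by a fingerprint of its data, code, and configuration, two runs with the same fingerprint should produce the same result. The `run_fingerprint` function and the sample inputs below are hypothetical; teams typically rely on version control and artifact stores for this, and the hash only shows the underlying idea.

```python
import hashlib
import json

def run_fingerprint(data_bytes, code_text, config):
    """Hash data, code, and configuration into one identifier for a run."""
    digest = hashlib.sha256()
    digest.update(hashlib.sha256(data_bytes).digest())
    digest.update(hashlib.sha256(code_text.encode("utf-8")).digest())
    # sort_keys makes the hash independent of dictionary key order
    digest.update(json.dumps(config, sort_keys=True).encode("utf-8"))
    return digest.hexdigest()

data = b"id,amount\n1,19.99\n"
code = "SELECT id, amount FROM orders"
fp_a = run_fingerprint(data, code, {"threshold": 0.5, "region": "eu"})
fp_b = run_fingerprint(data, code, {"region": "eu", "threshold": 0.5})
fp_c = run_fingerprint(data, code, {"threshold": 0.6, "region": "eu"})
print(fp_a == fp_b, fp_a == fp_c)
```

Here the first two runs share a fingerprint because only the key order of the configuration differs, while the third run changes a parameter and therefore gets a different one.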
DataOps: The Future
While it might not yet have achieved widespread adoption, the future is obvious: DataOps is here to stay. Much like DevOps, we’ll see a rise in the value of associated teams and positions[ix].
For example, before Agile Development, release engineers were significantly undervalued, especially when compared to software developers. Now, though, companies that implement DevOps highly value release engineers. Furthermore, a DevOps engineer, as they are now known, is one of the best-paid positions in software engineering.
DevOps engineers are so difficult to find that companies are willing to hire someone even if they don’t have a college degree, as long as they have the right knowledge and experience. This is becoming a major trend.
Something similar is likely to happen with what will probably become known as the position of DataOps engineer. Regardless of the title, data analysts, data engineers, and data scientists will be even more valued with the implementation of a sound DataOps strategy.
However, it might be a while before this happens. DataOps is still a new idea, and though there is much conversation around it, limitations exist that hinder widespread adoption.
These limitations will gradually disappear, of course, as DataOps becomes increasingly popular. It’s likely that, in the near future, we will see more discussion on the principles and guidelines that can lead to successful implementation.
Just as DevOps has evolved to play a vital role in the management of IT infrastructure, so too can DataOps change the way data is made available, shared, and integrated.
As more data is collected and produced every day, an increasing number of enterprises will have little choice but to turn to DataOps so they can manage their data more efficiently and effectively.
[i] Olavsrud, Thor. “What Is DataOps? Collaborative, Cross-Functional Analytics.” CIO, November 21, 2017. https://www.cio.com/article/3237694/what-is-dataops-data-operations-analytics.html.
[ii] Rackspace. What Is DevOps? — In Simple English. Accessed April 18, 2019. https://www.youtube.com/watch?v=5Hd0HUNhdVQ.
[iii] DataKitchen. “DataOps Is NOT Just DevOps for Data.” Data-Ops (blog), November 18, 2018. https://medium.com/data-ops/dataops-is-not-just-devops-for-data-6e03083157b7.
[iv] “DataOps.” In Wikipedia, March 27, 2019. https://en.wikipedia.org/w/index.php?title=DataOps&oldid=889646323.
[v] DataKitchen. “2017: The Year of DataOps.” Data-Ops (blog), December 19, 2017. https://medium.com/data-ops/2017-the-year-of-dataops-b2023c17d2af.
[vi] ODSC - Open Data Science. “DataOps and the DataOps Manifesto.” ODSC - Open Data Science (blog), February 26, 2019. https://medium.com/@ODSC/dataops-and-the-dataops-manifesto-fc6169c02398.
[vii] “Speed and Security Are Main Drivers for Surge in DataOps Adoption.” Delphix. Accessed April 18, 2019. https://www.delphix.com/blog/speed-security-surge-dataops.
[viii] “The DataOps Manifesto.” Accessed April 18, 2019. https://www.dataopsmanifesto.org/.
[ix] DataKitchen. “DataOps Engineer Will Be the Sexiest Job in Analytics.” Data-Ops (blog), May 16, 2017. https://medium.com/data-ops/dataops-engineer-will-be-the-sexiest-job-in-analytics-9c38bf444e5a.