The Best DataOps Articles of Q3 2018

DataOps awareness is hitting new heights with increasing media and analyst coverage. Here is our list of the best DataOps articles for the third quarter of 2018. If you missed them, here are our previous roundup articles from 2017, Q1 2018 and Q2 2018. Please tweet us if we missed your favorite.

Hype Cycle for Data Management, 2018, Gartner, Nick Heudecker, July 31, 2018

This year’s Hype Cycle for Data Management included a new innovation profile for DataOps, a fledgling term that’s just entering the story of some data integration and data pipeline vendors. We’ve defined DataOps as: …a collaborative data management practice focused on improving the communication, integration and automation of data flows between data managers and consumers across an organization. The goal of DataOps is to create predictable delivery and change management of data, data models and related artifacts. DataOps uses technology to automate data delivery with the appropriate levels of security, quality and metadata to improve the use and value of data in a dynamic environment.

All aboard the Hype Cycle! What’s DataOps? Well, it has no standards or frameworks. Got it?, The Register, Richard Speed, September 11, 2018

Gartner has turned its annual Hype Cycling gaze upon data management and found that — shock — expectations of Blockchain remain hopelessly inflated. Arriving in the first part of the Hype Cycle, or “Innovation Trigger” as Gartner branded it, is DataOps. Fresh from inflicting the term DevOps on all and sundry (even Visual Studio greybeards could not escape), analysts have devised a new term for a practice “without any standards or frameworks”, as Gartner research veep Nick Heudecker put it. The practice is geared towards sorting out the flows of data within an organisation to meet the needs of consumers.

DevOps for Data Scientists: Taming the Unicorn, Towards Data Science, Syed Sadat Nazrul, July 1, 2018

When most data scientists start working, they are equipped with all the neat math concepts they learned from school textbooks. However, pretty soon, they realize that the majority of data science work involves getting data into the format needed for the model to use. This is where some basic DevOps knowledge would come in handy.

Data Observability — A Crucial Property in a DataOps World, Eckerson Group, Julian Ereth, July 2, 2018

Data observability is a measure that provides the continuous, holistic view of a data landscape needed for a streamlined DataOps implementation. This article explains observability and how it applies to data.

In classical system theory, observability describes “how well internal states of a system can be inferred from knowledge of its external outputs” [2]. In the IT sector, observability currently gets a lot of attention in the context of distributed and cloud-based systems, where hundreds or even thousands of microservices need to be monitored and orchestrated.

Demystifying Artificial Intelligence in Insurance: The Tools Supporting Data Science and the Rise of DataOps, Celent, Craig Beattie, Nicolas Michellod, and Zao Wu, July 2, 2018

In this report, we discuss the Data Science workflow, from gathering requirements for the activity through to deploying the output and monitoring it. This workflow is amenable to automation, a topic increasingly referred to as DataOps. DataOps is a sibling of DevOps for Data Science, which aims to allow more effective industrialisation of Data Science in practice, through:

  • Improved repeatability of findings
  • Reduced time to identifying actionable insights
  • Decreased time to impact

DataOps Design — Vision comes first, Data second, Benjamin Peterson, July 3, 2018

DataOps comes to us from a background of Agile, Lean and above all DevOps — so it’s no surprise that it embodies a strong focus on governance, automation and collaborative delivery. The formulation of the Kinaesis® DataOps pillars isn’t too different from others, although our interpretation reflects a background in financial sector data, rather than just software. However, I believe there’s an extra pillar in DataOps that’s missing from the usual set.

Time to Value: The Currency of Data Operations, Randy Bean, Forbes CIO Network, Jul 24, 2018

Business executives at mainstream corporations are quickly grasping a central premise that executives of data-driven companies have well understood — the speed with which the enterprise can get value from its data matters. And, it matters a lot. Accelerating that timeframe, or improving on time to value, is or should be a core focus of any Chief Data Officer. As Mark Clare, former Chief Data Officer of HSBC’s and JP Morgan Chase’s retail divisions, expressed it, “This is about speed and cost. Both are key requirements. Speed to market, and speed to value.”

Putting the AI cart before the data horse — “DataOps” as part of a new enterprise stack, Alex Thinath, July 29, 2018

There has been an explosion of interesting enterprise AI applications over the past 3 years, with much of the hype well deserved. It’s clear that executives across a broad spectrum of industries are taking note of the compelling potential applications of AI to tackle anything from process automation, to churn prediction, to next-best-action recommendations for service reps. While I empathize with executives’ interests in pursuing the promise of a competitive-advantage-in-a-black-box, it’s easy to miss a less obvious enabler consistent throughout the companies releasing the newsworthy innovations. In short, Google, Baidu, Microsoft, and kin can pursue these big AI initiatives because they have the clean, unified data to feed them.

Deploy Data-Intensive Applications Faster with DataOps, CA, Yann Guernion, August 3, 2018

DataOps — an abbreviation of data operations — is an agile methodology to develop and deploy data-intensive applications. Largely motivated by the growth of machine learning and data science groups within the enterprise, the practice requires close collaboration between software developers and architects, security and governance professionals, data scientists, data engineers and operations. DataOps aims to promote repeatability, productivity, agility and self-service while achieving continuous data science model deployment.

DataOps: An Interview with Tamr CEO Andy Palmer, Upside, James E. Powell, August 6, 2018

Why DataOps is gaining traction, how it’s evolving, and why it’s ever-more important in large enterprises — an interview with the man who coined the term. In our conversation, we explore how the term came about, why its popularity is growing, how to get started on a DataOps project, and how new technologies are having an impact on DataOps.

MLflow 0.4.2 Released — With Azure Blob Storage, PyTorch and TensorBoard tracking, and H2O Support, Databricks, Aaron Davidson and Denny Lee, August 8, 2018

Databricks was excited to announce MLflow v0.4.0, v0.4.1, and v0.4.2, released with several recently requested features. Azure Blob Storage support makes it easy to run MLflow training jobs on multiple Azure cloud VMs and track results across them. Databricks also added samples that demonstrate advanced tracking, including a PyTorch TensorBoard sample. Thanks to PR 170, MLflow now includes support for H2O model export and serving; check out the h2o_example.ipynb Jupyter notebook.

DataOps Goes Global, DataKitchen, August 14, 2018

Around the world, data professionals are talking about how DataOps can help close the gap between raw data and applications that deliver insights. Below is a sample of conferences, talks and meetups that are discussing DataOps and applying its principles in data-driven industries.

  • Busting Big Data Myths with an Analytics-First Strategy, Booz Allen Hamilton, Kirk Borne

Surviving Your Second Year as CDO, DataKitchen, August 22, 2018

The average tenure of a CDO or CAO is about 2.5 years. In our conversations with data and analytics executives, we find that CDOs and CAOs often fall short of expectations because they fail to add sufficient value in an acceptable time frame. If you are a CDO looking to survive well beyond year two, we recommend avoiding three common traps that we have seen ensnare even the best and brightest.

DataOps: The Challenges of Operating a Machine Learning Model, Altoros, Sophie Turol and Carlo Gutierrez, August 24, 2018

Today, data scientists have a much easier time deploying a machine learning (ML) model through the availability of data and open-source ML frameworks. While it’s a simple matter to write machine learning code to train a basic, non-distributed model with sample at-rest data, the process becomes a lot more complex when scaling up to a production-grade system.

How to Manage a DataOps Team, RTInsights, August 27, 2018 (Original June 29, 2018)

Using a DataOps approach to your big data project — modeled on similar methods used in DevOps teams — could unlock real value for your firm. Big data should bring big changes in how you work as well as the tools you use if you want to take full advantage of emerging technologies and innovative architectures. DataOps — a style of work that extends the flexibility of DevOps to the world of large-scale data and data-intensive applications — can make a big difference. It’s more than just a buzzword. To make DataOps work, you have to know how to organize and manage a DataOps team. Let’s look at what DataOps is, why it’s worth your consideration and how to make the necessary changes in your organizational culture to put this style of work into action.

Prove Your Awesomeness with Data: The CDO DataOps Dashboard, DataKitchen, August 28, 2018

Do you deserve a promotion? You may think to yourself that your work is exceptional. Could you prove it? As a Chief Data Officer (CDO) or Chief Analytics Officer (CAO), you serve as an advocate for the benefits of data-driven decision making. Yet, many CDOs are surprisingly unanalytical about the activities of their own department. Why not use analytics to shine a light on yourself?

DataOps with Christopher Bergh, Software Engineering Daily, August 29, 2018

Every company with a large set of customers has a large set of data — whether that company is 5 years old or 50 years old. That data is valuable whether you are an insurance company, a soft drink manufacturer, or a ridesharing company. All of these large companies know that their data is valuable, but some of them are not sure how to standardize the access patterns of that data, or build a culture around data.

Build Trust Through Test Automation and Monitoring, DataKitchen, September 6, 2018

“Trust takes years to build, seconds to break, and forever to repair.” We recently talked to a data team in a financial services company that lost the trust of their users. For a data-analytics team, this is the nightmare scenario, and it could have been avoided. Accurate data analytics are the product of quality controls and sound processes. But you can’t stop there. It’s not enough for analytics to “be” correct. Accurate analytics that “look wrong” to users still raise credibility questions.

DataOps comes to the cloud, betanews, Ian Barker, September 2018

The movement of data into the cloud creates challenges for enterprises who still rely on traditional data integration software or single-purpose data import tools. DataOps specialist StreamSets is launching new features that help companies efficiently build and continuously operate dataflows that span data centers and the big three cloud platforms — Microsoft Azure, AWS, and Google Cloud Platform.

Disband Your Impact Review Board: Automate Analytics Testing, DataKitchen, September 18, 2018

Some companies take six months to write 20 lines of SQL and move it into production. The last thing that an analytics professional wants to do is introduce a change that breaks the system. Large companies often institute slow, bureaucratic procedures for introducing new analytics in order to reduce fear and uncertainty. There is a lot of documentation, checks and balances, and meetings — lots of meetings.
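The alternative the article argues for is replacing manual review boards with automated tests that run on every deployment. As a minimal hypothetical sketch (the function and field names below are illustrative, not from the article), a pipeline might gate each release of new analytics on data-quality checks like:

```python
# Hypothetical sketch of an automated analytics test: instead of a manual
# impact review, every pipeline run validates the incoming batch and blocks
# deployment if any check fails. Field names are illustrative.

def check_rows(rows):
    """Return a list of error strings; an empty list means the batch passes."""
    errors = []
    if not rows:
        errors.append("no rows loaded")
    for i, row in enumerate(rows):
        if row.get("order_id") is None:
            errors.append(f"row {i}: missing order_id")
        if row.get("amount", 0) < 0:
            errors.append(f"row {i}: negative amount")
    return errors

# Example: a batch with one bad row is flagged before it reaches production.
batch = [{"order_id": 1, "amount": 10.0}, {"order_id": None, "amount": -5.0}]
print(check_rows(batch))
```

Checks like these run in seconds, so they can replace weeks of meetings without sacrificing the safety the review board was meant to provide.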

Is the DataOps workbench a thing?, Information Management, Michele Goetz, September 20, 2018

Data ops, data engineering, data development — oh my! From new roles, teams, skills and processes, the hot topic on everyone’s mind is data ops.

I started to notice the data ops emergence back in 2015 as companies began to look at agile development to spin up new data capabilities rapidly. Later, as data preparation entered the market, ETL developers were gravitating to these tools for quick data loading with transparency into newly formed analytic lakes.

Getting DataOps right is crucial to your late-stage big data projects, O’Reilly, Jesse Anderson, September 24, 2018

Early on in projects, management and developers are responsible for the success of a project. As the project matures, the operations team is jointly responsible for the success. I’ve taught in situations where the operations team members complain that no one wants to do the operational side of things. They’re right. Data science is the sexy thing companies want. The data engineering and operations teams don’t get much love. The organizations don’t realize that data science stands on the shoulders of DataOps and data engineering giants.

DevOps & DataOps: Catalysts for Organizational Transformation, DevOps.com, Vikash Kumar, September 27, 2018

Over the past few decades, use of the internet has increased exponentially. Nowadays, building quality web applications and managing their huge volumes of data effectively is a major concern for any organization. Thus, there has been a continuous search for better methods of software development and data management.

Operation Data — The 18 Key Principles of DataOps, Power Admin, Des Nnochiri, August 21, 2018

The term DataOps was coined back in 2015 but only really became a significant force in professional circles during the latter part of 2017. But what is this latest tour de force in software development methodology?

_________________________________________

Join the DataOps Revolution. Sign the DataOps Manifesto.

Whether referred to as data science, data engineering, data management, big data, business intelligence, or the like, through our work we have come to value in analytics:

  • Individuals and interactions over processes and tools
  • Working analytics over comprehensive documentation
  • Customer collaboration over contract negotiation
  • Experimentation, iteration, and feedback over extensive upfront design
  • Cross-functional ownership of operations over siloed responsibilities

Like this story? Download the 140-page DataOps Cookbook!