DataOps in Veterinary Medicine

Scott Ross · Published in CUC4 · Jun 8, 2020

First, a brief history. In 2015, we began transforming our IT organization from a SysOps approach to DevOps, motivated by the organization's poor performance. We were siloed individual contributors working for various business units in a direct, one-on-one style. We had no elasticity in resources, developers were responsible for applications from development to retirement, and we had no priority-planning process. We weren't really a team.

This lack of central coordination and technical strategy impeded our ability to innovate and deliver solutions quickly, and it hurt the quality of the solutions we did provide. It also created the perfect conditions for developer burnout, which was a major concern coming out of a three-year major system implementation.

Our DevOps evolution produced a set of process improvements and tools that changed us. The end result was faster delivery to production, higher quality, products the entire team could support, and greater end-user satisfaction. And, equally important, the developer experience vastly improved. The outcome also showed in our cultural evolution: we now continually evolve our technology stack, processes, and tools to meet the needs of our organization.

These DevOps practices included CI/CD via Bitbucket Pipelines, ephemeral infrastructure through Docker and Kubernetes, cloud infrastructure on AWS, infrastructure as code through Ansible and Terraform, technical debt management through processes and oversight, and better product and project management techniques. We also extended our application development practices to follow twelve-factor principles.

I am proud of what we accomplished and continue to accomplish as a team. We transformed our technology landscape and provided meaningful wins for our organization.

One last aside before we go further: a simple framework for thinking through change:

  1. Know your audience. Who are they? What motivates them?
  2. Start small, learn from failures.
  3. Stay skeptical and test assumptions.
  4. Create value as early as possible.

Now it's 2020, and we are again entering a new era. I used to call this "smart data," but an engineer on our team made fun of that. He was right. "Big data" isn't quite it either, since our data sets simply don't meet the big-data threshold. The reality is that we are trying to apply those same big data and data science techniques to our situation. The goal is to accomplish some specific but meaningful improvements to our data landscape, focusing on:

  1. Operational data (i.e., supporting data-driven decision making by our leadership)
  2. Research data (i.e., helping researchers unlock the knowledge encapsulated in our health systems)
  3. Patient care data (i.e., helping support staff, faculty, DVMs, and students provide world-class patient care)
  4. Student evaluation data (i.e., supporting student competency assessment)

First, a note on our current data approaches: they are not working. They are identical to our pre-DevOps approach. We are attacking the data problem individually, in a one-off fashion. We do not have a coherent strategy that lets us systematically deliver high-quality solutions, and deliver them fast. And the current approach ignores the developer (and data science) experience.

Some of our philosophy started to change in early 2020, when we started our first Apache Spark project to consume data from a vendor who published data to S3. So began the dipping of our toes into the world of streaming ETLs: building a metadata catalog on our data sets, building star schemas to reflect our project needs, and pushing data as models to various data stores including Elasticsearch, SQL Server, Redshift, and CouchDB.
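To make that concrete, here is a minimal PySpark sketch of the pattern. The bucket paths, column names, and join key are hypothetical placeholders, and it assumes S3 connectivity is already configured for Spark.

```python
# Minimal sketch of the vendor-data flow described above.
# Paths, columns, and the join key are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("vendor-etl").getOrCreate()

# Inhale: read the vendor's published records from S3.
visits = spark.read.json("s3a://vendor-bucket/visits/")

# Contextualize: relate them to an internal dimension table.
patients = spark.read.parquet("s3a://warehouse/dim_patient/")
fact = (
    visits
    .withColumn("visit_date", F.to_date("visit_ts"))
    .join(patients, on="patient_id", how="left")
)

# Publish: land the modeled fact table where downstream
# consumers (BI, search, research) expect to find it.
fact.write.mode("append").partitionBy("visit_date").parquet(
    "s3a://warehouse/fact_visit/"
)
```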

This wrangling of data has been an improvement. It was one of the first places we matured as an organization, and we are on our way. I am not sure we have the right mix of technology and processes yet, but we are learning and experimenting. Big time!

So, how does data get into and move through our organization? (A sketch of these stages as code follows the list.)

  1. Input (inhale data from internal and external sources)
  2. Contextualize (relate it to other data)
  3. Globalize (make it accessible)
  4. Target (security and availability)
  5. Customize (proper context, with individual needs considered)
  6. Support BI (ad hoc queries, reports, dashboards)
  7. Evolve (add or subtract data sources, fast)
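Here is the promised sketch of those stages as composable Python steps. The record shapes and stage bodies are illustrative stand-ins, not our production design.

```python
# Toy sketch of the data lifecycle above; every name here is
# illustrative, not our production design.
from collections.abc import Callable

Record = dict
Step = Callable[[list[Record]], list[Record]]

def inhale(records: list[Record]) -> list[Record]:
    # 1. Input: pull raw rows from internal and external sources.
    return records

def contextualize(records: list[Record]) -> list[Record]:
    # 2. Contextualize: relate each row to other data we hold.
    return [{**r, "species_group": r.get("species", "unknown")} for r in records]

def publish(records: list[Record]) -> list[Record]:
    # 3-6. Globalize, target, customize, support BI: make the data
    # accessible, secured, and shaped for each audience.
    return records

def run_pipeline(records: list[Record], steps: list[Step]) -> list[Record]:
    # 7. Evolve: the pipeline is just a list, so adding or
    # subtracting a source or stage is a one-line change.
    for step in steps:
        records = step(records)
    return records

print(run_pipeline([{"patient_id": 1, "species": "canine"}],
                   [inhale, contextualize, publish]))
```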

We have input covered. We are starting to understand it, and to see where we can be better.

And now we have a lot more work ahead of us to complete the process. That includes developing BI solutions that engage and enable our users, applying big data and data science techniques (NLP, AI, ML, and more, more, more) to our data sets to meet the goals listed above, and staying agile and innovative enough to make big gains for our organization. All of this requires continuously learning and developing big data skills, and then, most importantly, operationalizing them.

DataOps to the rescue? Let's explore that question by setting the stage: a framework for reflecting on our organization's data position.

DataOps

What is DataOps? An approach that seeks to apply the principles of agile software development and DevOps (combining development and operations) to data analytics, breaking down silos and promoting efficient, streamlined data handling across many segments of the organization.

DataOps Principles

1. Continually satisfy your customer:

Our highest priority is to satisfy the customer through the early and continuous delivery of valuable analytic insights from a couple of minutes to weeks.

2. Value working analytics:

We believe the primary measure of data analytics performance is the degree to which insightful analytics are delivered, incorporating accurate data, atop robust frameworks and systems.

3. Embrace change:

We welcome evolving customer needs, and in fact, we embrace them to generate competitive advantage. We believe that the most efficient, effective, and agile method of communication with customers is face-to-face conversation.

4. It’s a team sport:

Analytic teams will always have a variety of roles, skills, favorite tools, and titles. A diversity of backgrounds and opinions increases innovation and productivity.

5. Daily interactions:

Customers, analytic teams, and operations must work together daily throughout the project.

6. Self-organize:

We believe that the best analytic insight, algorithms, architectures, requirements, and designs emerge from self-organizing teams.

7. Reduce heroism:

As the pace and breadth of need for analytic insights ever increases, we believe analytic teams should strive to reduce heroism and create sustainable and scalable data analytic teams and processes.

8. Reflect:

Analytic teams should fine-tune their operational performance by self-reflecting, at regular intervals, on feedback provided by their customers, themselves, and operational statistics.

9. Analytics is code:

Analytic teams use a variety of individual tools to access, integrate, model, and visualize data. Fundamentally, each of these tools generates code and configuration which describes the actions taken upon data to deliver insight.
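One way to picture this: the logic behind a single dashboard tile, captured as reviewable, version-controlled code instead of clicks in a BI tool. This is a hedged sketch; the table, columns, and config values are hypothetical.

```python
# Sketch of "analytics is code": a dashboard tile as a function of
# data plus config, all of it diffable in code review.
# Table and column names are hypothetical.
import sqlite3

# Settings a BI tool might otherwise bury in its UI.
TILE_CONFIG = {"metric": "visits_per_week", "lookback_weeks": 12}

QUERY = """
SELECT strftime('%Y-%W', visit_date) AS week, COUNT(*) AS visits
FROM fact_visit
GROUP BY week
ORDER BY week DESC
LIMIT :lookback_weeks
"""

def visits_per_week(conn: sqlite3.Connection) -> list[tuple]:
    return conn.execute(
        QUERY, {"lookback_weeks": TILE_CONFIG["lookback_weeks"]}
    ).fetchall()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE fact_visit (visit_date TEXT)")
    conn.execute("INSERT INTO fact_visit VALUES ('2020-06-01')")
    print(visits_per_week(conn))
```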

10. Orchestrate:

The beginning-to-end orchestration of data, tools, code, environments, and the analytic team's work is a key driver of analytic success.
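A deliberately tiny illustration of the idea; a real deployment would use a proper scheduler, and the task names below are hypothetical. Tasks declare their dependencies and run in topological order.

```python
# Beginning-to-end orchestration in miniature: declare the DAG,
# then run the tasks in dependency order. Task bodies are stubs.
from graphlib import TopologicalSorter

def extract():   print("pull vendor data from S3")
def transform(): print("build the star schema")
def load():      print("push models to Elasticsearch / Redshift")
def report():    print("refresh BI dashboards")

tasks = {"extract": extract, "transform": transform,
         "load": load, "report": report}

# Edges: task -> the set of tasks it depends on.
dag = {"transform": {"extract"}, "load": {"transform"}, "report": {"load"}}

for name in TopologicalSorter(dag).static_order():
    tasks[name]()
```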

11. Make it reproducible:

Reproducible results are required and therefore we version everything: data, low-level hardware and software configurations, and the code and configuration specific to each tool in the toolchain.
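One lightweight way to move in this direction, sketched here with illustrative paths and config, is to write a manifest for every run that records the code version, input data hash, and environment.

```python
# Sketch of "version everything": a per-run manifest that ties a
# result to exact code, data, and environment. Paths are illustrative.
import hashlib, json, subprocess, sys
from datetime import datetime, timezone

def file_sha256(path: str) -> str:
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

manifest = {
    "run_at": datetime.now(timezone.utc).isoformat(),
    "git_sha": subprocess.check_output(
        ["git", "rev-parse", "HEAD"], text=True).strip(),
    "python": sys.version,
    "input_hash": file_sha256("data/vendor_extract.json"),
    "config": {"lookback_weeks": 12},
}

with open("run_manifest.json", "w") as f:
    json.dump(manifest, f, indent=2)
```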

12. Disposable environments:

We believe it is important to minimize the cost for analytic team members to experiment by giving them easy to create, isolated, safe, and disposable technical environments that reflect their production environment.
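A small sketch of what that can look like with a containerized stack; the image name and test command are hypothetical.

```python
# Spin up a disposable, production-like environment for an
# experiment, run it, and let it vanish. Image name is hypothetical.
import subprocess

subprocess.run(
    ["docker", "run", "--rm",          # --rm: container is disposable
     "-e", "ENV=sandbox",              # isolated, non-production config
     "our-registry/analytics-stack:latest",
     "pytest", "tests/"],
    check=True,
)
```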

13. Simplicity:

We believe that continuous attention to technical excellence and good design enhances agility; likewise simplicity — the art of maximizing the amount of work not done — is essential.

14. Analytics is manufacturing:

Analytic pipelines are analogous to lean manufacturing lines. We believe a fundamental concept of DataOps is a focus on process-thinking aimed at achieving continuous efficiencies in the manufacture of analytic insight.

15. Quality is paramount:

Analytic pipelines should be built with a foundation capable of automated detection of abnormalities (jidoka) and security issues in code, configuration, and data, and should provide continuous feedback to operators for error avoidance (poka yoke).
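As a hedged illustration of a jidoka-style check, the snippet below validates a batch before it moves downstream and stops the line on abnormality; the field names and threshold are hypothetical.

```python
# Stop-the-line data quality check: refuse to pass a bad batch on.
class DataQualityError(Exception):
    pass

def check_batch(rows: list[dict]) -> list[dict]:
    if not rows:
        raise DataQualityError("empty batch: upstream feed may be down")
    missing = sum(1 for r in rows if not r.get("patient_id"))
    if missing / len(rows) > 0.01:  # poka-yoke threshold (hypothetical)
        raise DataQualityError(f"{missing} rows missing patient_id")
    return rows

# A healthy one-row batch passes through unchanged.
print(check_batch([{"patient_id": 42, "weight_kg": 28.5}]))
```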

16. Monitor quality and performance:

Our goal is to have performance, security and quality measures that are monitored continuously to detect unexpected variation and generate operational statistics.
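One common way to operationalize this, sketched with made-up numbers, is statistical process control: flag a metric that drifts outside its historical mean plus or minus three standard deviations.

```python
# Control-chart style monitoring of an operational metric.
# The history values are made up for illustration.
from statistics import mean, stdev

history = [10_210, 10_480, 9_950, 10_330, 10_105, 10_290]  # daily row counts
today = 6_400  # today's load looks suspiciously low

mu, sigma = mean(history), stdev(history)
lower, upper = mu - 3 * sigma, mu + 3 * sigma

if not (lower <= today <= upper):
    print(f"ALERT: {today} rows is outside the control band "
          f"({lower:.0f} to {upper:.0f})")
```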

17. Reuse:

We believe a foundational aspect of analytic insight manufacturing efficiency is to avoid the repetition of previous work by the individual or team.

18. Improve cycle times:

We should strive to minimize the time and effort to turn a customer need into an analytic idea, create it in development, release it as a repeatable production process, and finally refactor and reuse that product.

Moving our organization toward DataOps is not set in stone. Who knows if this is the right philosophy? We have a lot to learn. It's the beginning of a discussion that has been years in the making.
