Versioning Data Science Solutions

#1 in the Evolving Data Science Series

John Aven
Jan 7 · 3 min read

I come from an academic background, so when I talk about how versioning is done within data science, I speak from experience. Over my career, however, I have also honed my skills as a software engineer, architect, and tech innovation leader. That combination gives me a perspective on how things are and have been done, as well as how they can be done much better.

Origins of Versioning in Data Science

Data science inherited its practice of versioning from academia, and this approach is what almost every computational science field has followed for many years. Don’t get me wrong: computer science went this way in the early digital days (and in many cases the practice is still prevalent in academia), but it has since moved on to more advanced practices.

Avoid Faux Versioning

What is this practice? It is the practice of creating ‘versioning’ schemes through ‘smart’ file naming. These schemes generally produce the following kinds of files:

This approach to versioning, which I refer to as faux versioning, is a disease, and it is endemic. But breaking from these practices can and should be done, because modern practices such as model management require it. This can be accomplished with a version control system (VCS), and the industry preference is for a git-based solution.
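To make the idea of faux versioning concrete, here is a minimal sketch that spots version information hidden in file names. The suffix patterns and the sample file names are hypothetical illustrations, not the article's original examples, and the helper `split_faux_version` is invented for this sketch.

```python
import re

# Hypothetical suffixes typical of "smart" file naming (assumed for illustration):
# _v2, _final, _old, _new, _copy, _backup, or a date such as _2019-01-07.
FAUX_VERSION_PATTERN = re.compile(
    r"(?:[_\-. ](?:v\d+|final|old|new|copy|backup|\d{4}-?\d{2}-?\d{2}))+$",
    re.IGNORECASE,
)


def split_faux_version(filename: str) -> tuple[str, str]:
    """Return (clean file name, faux-version suffix); the suffix is '' if none is found."""
    stem, dot, ext = filename.rpartition(".")
    if not dot:
        # No extension: treat the whole name as the stem.
        stem, ext = filename, ""
    match = FAUX_VERSION_PATTERN.search(stem)
    if match is None:
        return filename, ""
    base = stem[: match.start()]
    clean_name = base + dot + ext if dot else base
    return clean_name, match.group(0)


if __name__ == "__main__":
    for name in ["model_v2_final.py", "model_final_2019-01-07.pkl", "analysis.py"]:
        print(name, "->", split_faux_version(name))
```

Under a VCS, each of these would be a single file (for example `model.py`) whose history lives in commits rather than in the name.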

Go with a Git-Based VCS

With a git-based VCS and an appropriate branching strategy, you can immutably track the versions of your ML models and other artifacts in your data science development life cycle. But in data science, unlike in other software engineering disciplines, versioning the code is necessary but not sufficient.
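One way to see why code versioning alone is not sufficient: a trained model is a binary artifact produced by the code, so it helps to record which commit produced it and what its contents were. The sketch below is only an illustration under assumptions; the manifest layout and the function names (`current_git_commit`, `artifact_digest`, `write_manifest`) are hypothetical and are not a Hashmap or git convention.

```python
import hashlib
import json
import subprocess
from pathlib import Path
from typing import Optional


def current_git_commit(repo_dir: str = ".") -> Optional[str]:
    """Return the HEAD commit hash, or None if repo_dir is not a git repo or git is unavailable."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
    except (subprocess.CalledProcessError, OSError):
        return None
    return result.stdout.strip()


def artifact_digest(path: str) -> str:
    """SHA-256 of the artifact's bytes; identical content always yields the same digest."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def write_manifest(artifact_path: str, manifest_path: str, repo_dir: str = ".") -> dict:
    """Write a small JSON manifest tying a model artifact to the code commit that produced it."""
    manifest = {
        "artifact": Path(artifact_path).name,
        "sha256": artifact_digest(artifact_path),
        "code_commit": current_git_commit(repo_dir),
    }
    Path(manifest_path).write_text(json.dumps(manifest, indent=2))
    return manifest
```

Committing the manifest next to the training code keeps the link between code and model in the repository history, while the artifact itself can live in whatever storage suits its size.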

Additional issues must be considered:

While these additional concerns can be managed, a vanilla implementation will not suffice in the longer term and may do more harm than good. Hashmap can help you down this path and put in place a solution that best fits your unique business needs.

This is part of the Evolving Data Science series.



John Aven, Ph.D., is Lead Regional Technical Expert at Hashmap, providing Data, Cloud, IoT, and AI/ML solutions and consulting expertise across industries with a group of innovative technologists and domain experts accelerating high-value business outcomes for our customers. Be sure to connect with John on LinkedIn and reach out for more perspectives and insight into accelerating your data-driven business outcomes.

HashmapInc

Innovative technologists and domain experts helping accelerate the value of Data, Cloud, IIoT/IoT, and AI/ML for the community and our clients by creating smart, flexible and high-value solutions and service offerings that work across industries. http://hashmapinc.com
