
Down with technical debt! Clean Python for data scientists.

Andy Greatorex
Towards Data Science
12 min read · Dec 3, 2019

Data science teams tend to pull in two competing directions. On one side are the data engineers, who value reliable, robust code that carries little technical debt. On the other are the data scientists, who value rapid prototyping of ideas and algorithms in proof-of-concept settings.

More mature data science functions enjoy a fruitful working partnership between the two sides, have sophisticated CI/CD pipelines in place, and have a well-defined segregation of responsibilities. Early-stage teams, by contrast, are often made up largely of inexperienced data scientists. As a result, code quality suffers and technical debt accumulates quickly in the form of glue code, pipeline jungles, dead experimental codepaths, and configuration debt [1].

Can you imagine a life without xkcd?

Recently I wrote a brain dump on why data scientists' code tends to be mediocre. In this post, I hope to shed light on some of the ways that newer data scientists can write cleaner Python code and better structure small-scale projects, with the important side effect of reducing the technical debt you inadvertently impose on yourself and your team.

Neither exhaustive in scope nor rigorous in depth, what follows is intended as a series of shallow introductions to ways you can institute data…


Written by Andy Greatorex

London-based data scientist @Revolut. Formerly in NYC @Barclays. Building stuff for the fun of it.