A minimalist’s guide to clean code

Rasmus Halvgaard
3 min read · May 30, 2022


Most Data Scientists don’t know how to write code that is:

  1. clean
  2. versioned
  3. reusable
  4. maintainable
  5. documented
  6. well-tested
  7. devops pipelined

True.

Because code is just a tool to automate and implement the fun stuff: math and algorithms that do cool sh*t. Add an academic habit of working alone in Jupyter notebooks most of the day, and you really don’t want the extra overhead that 😩🤬😵 list brings.

Apparently, many software engineers can’t write clean code either. At least it doesn’t seem like many people appreciate clean code enough to actually practice it, and long-term quality often gets sacrificed for short-term development speed.

If your code doesn’t spark joy, it’s probably time to clean up.

Through my pursuit of better software engineering skills and discussions with many professional programmers, I have collected my best practical principles to keep in mind when aiming for cleaner code.

1. Less > More

Functions or methods should not be more than 5 lines. And repeat after me: Functions or methods should not be more than 5 lines.

Writing code blocks of at most 5 lines forces you to structure the code in nicely decoupled, readable chunks that are ready for testing.

It will prevent you from doing a million different things inside functions or classes. Don’t write a 500 line function that fetches and manipulates data with large for-loops filled with filters and logic wrapped in if-else’s spanning many screens. One class. One responsibility.
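As a minimal sketch of what that can look like (the function names and the sample data are made up for illustration), a fetch-filter-summarize job can be split into steps of at most 5 lines each:

```python
# Hypothetical example: names and data are illustrative only.
SAMPLE_TRANSACTIONS = [
    {"id": 1, "amount": 120.0},
    {"id": 2, "amount": -15.0},
    {"id": 3, "amount": 80.0},
]


def load_transactions():
    # Stand-in for a real data source such as a database query.
    return list(SAMPLE_TRANSACTIONS)


def is_valid(transaction):
    return transaction["amount"] > 0


def keep_valid(transactions):
    return [t for t in transactions if is_valid(t)]


def total_amount(transactions):
    return sum(t["amount"] for t in transactions)


def summarize_valid_transactions():
    transactions = load_transactions()
    valid_transactions = keep_valid(transactions)
    return total_amount(valid_transactions)
```

Each function does one thing, reads in a few seconds, and can be tested on its own.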

I would rather write much less code covering less functionality if it’s high quality. Even though it takes more time to write it short, it will save time in the end and you will enjoy the result more.

Finding it difficult to discard code you might still use? Git will save it for you, so your files don’t have to stay cluttered.

Here are three more principles.

2. aPPly_aCCurat3-&-m34N!ngFuL_NAM|NG

Code is words. Words tell a story. You are the author. Please make it easy for the reader. Or your future self.

Write out the words. Long names are okay. Abbreviated variable names are a holdover from a time when memory was scarce and tools limited how long a name could be.

tra_ids ➡ transaction_ids

If your function name is too long and contains ‘and’, you probably have too much functionality inside. Apply the max 5 lines principle described above.
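A small sketch of both points, using hypothetical function names: the "before" function abbreviates its argument and hides two jobs behind an ‘and’, while the "after" version writes the names out and gives each job its own function.

```python
# Before (hypothetical): one function, two jobs, abbreviated names.
def dedup_and_sort(tra_ids):
    return sorted(set(tra_ids))


# After: one job per function, names written out.
def remove_duplicates(transaction_ids):
    # dict.fromkeys keeps the first occurrence of each ID, in order.
    return list(dict.fromkeys(transaction_ids))


def sort_ascending(transaction_ids):
    return sorted(transaction_ids)


transaction_ids = [42, 7, 42, 13]
unique_transaction_ids = remove_duplicates(transaction_ids)
sorted_transaction_ids = sort_ascending(unique_transaction_ids)
```

The calling code now reads almost like a sentence, and each step can be reused without the other.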

3. No comment

Avoid comments. Specifically the ones that do not add any value, e.g.

# Print init
init_print()

Whyyyyyyyyyyy…!? 🤫

Tell your story using proper function and variable names. Don’t add noise.

If you find yourself writing longer comments, you probably just need to put the next block of code in its own function with a name inspired by your comment.
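For example, here is a sketch with made-up order data, where the comment in the "before" version turns into the function name in the "after" version:

```python
# Hypothetical example: both versions assume at least one active order.

# Before: a comment explains what the next block does.
def average_order_value_before(orders):
    # Drop cancelled orders
    active_orders = []
    for order in orders:
        if order["status"] != "cancelled":
            active_orders.append(order)
    return sum(o["value"] for o in active_orders) / len(active_orders)


# After: the comment has become a function name.
def drop_cancelled(orders):
    return [order for order in orders if order["status"] != "cancelled"]


def average_order_value(orders):
    active_orders = drop_cancelled(orders)
    return sum(order["value"] for order in active_orders) / len(active_orders)
```

Both versions return the same result, but the second one no longer needs the comment to be understood.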

Assigning boolean expressions to well-named variables is also an expressive way to avoid comments:

data_is_weird = data.importantMetric < 0 and data.obscureVariable == 42
if data_is_weird:
    do_something_with(data)

4. Write one test

You will never feel safe about changing code without tests. And you will waste so many hours on stupid bugs that could have been caught immediately. Yes, ideally you test E-VE-RY-THING. But you are lazy 😒 So aim for effectiveness. Apply the 80/20 principle and write just one test (20% effort) with a lot of coverage (80% of the code tested), i.e. the single most effective test: the one that exercises the most functions.

Once started, you will most likely write more tests. Having at least one paves the way for automatic testing in your devops pipelines as well.
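A minimal sketch of such a test, assuming a small hypothetical pipeline of parse, filter and aggregate steps: a single end-to-end assertion exercises every function in it.

```python
# Hypothetical pipeline: parse -> filter -> aggregate.
def parse_amounts(raw_lines):
    # Skip blank lines, convert the rest to floats.
    return [float(line) for line in raw_lines if line.strip()]


def keep_positive(amounts):
    return [amount for amount in amounts if amount > 0]


def mean(amounts):
    return sum(amounts) / len(amounts)


def mean_positive_amount(raw_lines):
    return mean(keep_positive(parse_amounts(raw_lines)))


def test_mean_positive_amount():
    # One end-to-end test runs all four functions above.
    raw_lines = ["10.0", "", "-5.0", "30.0"]
    assert mean_positive_amount(raw_lines) == 20.0
```

If the file is named something like test_pipeline.py, a test runner such as pytest will pick up the test function by its test_ prefix, which makes it easy to run in a pipeline later.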

There is of course much more to writing high quality clean code — but these principles definitely sent me on the right path 🚀.

Happy coding.

💡 If you are stuck or don’t understand a new codebase, start refactoring the code in your own branch, applying the principles above. It’s a great way to get to know messy code.

TL;DR: Blocks of code should not be more than 5 lines.
At first it sounds like an obvious cliché 🙄 à la “3 surprising tips to improve your life: Eat better. Exercise daily. Sleep more.” But please just try it out 🙏.
