How to Write Tests For Your Data Science Code

Joos Korstanje
Published in Analytics Vidhya
3 min read · Dec 23, 2019

Last week, a coworker called me with an urgent request: there was a bug in a program that had been developed a while ago and hadn’t been looked at since.

Writing automated test cases for your code is a must!

I looked at the code and found the problem. The script imported a function from somewhere else, but the data returned by this function was in a different format than it was before.

The problem: the external function had been updated without running any tests!
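
A bug like this can be caught by a test that pins down the format callers rely on. The sketch below is only an illustration under assumptions: `load_measurements` is a hypothetical stand-in for the external function, and the expected format (a list of dicts with an integer `id` and a float `value`) is invented for the example.

```python
# Hypothetical stand-in for the imported external function.
def load_measurements():
    """Return rows as a list of dicts with 'id' (int) and 'value' (float)."""
    return [{"id": 1, "value": 2.5}, {"id": 2, "value": 4.0}]


def check_output_format(rows):
    """Raise AssertionError if rows do not match the assumed format."""
    assert isinstance(rows, list), "expected a list of rows"
    for row in rows:
        assert set(row) == {"id", "value"}, f"unexpected keys: {sorted(row)}"
        assert isinstance(row["id"], int)
        assert isinstance(row["value"], float)


def test_load_measurements_format():
    # Fails as soon as an update changes the returned format.
    check_output_format(load_measurements())
```

If whoever updates the function runs this test first, the format change shows up before it reaches the scripts that depend on it.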

Not writing test cases will cause frustration for those who maintain your program!

Writing automated test cases for your code is an absolute must for everyone who writes code. But in the data field, there are still too many people who do not follow this best practice.

To those people, I dedicate this article on approaches to testing code.

What should you test?

There is a lot of terminology around testing, and it can be quite confusing when getting started. The most important thing is to understand the different types of tests that exist and why each of them matters:

  • Unit tests: a test for each small piece of code.
    Each small unit of code that you implement should come with a unit test: a test that verifies whether the unit does what you expect. A unit can be a method or a function, a file or a class, depending on how you implement testing.
  • Integration tests: testing whether different parts of the code work together.
    When you have multiple units that depend on each other, an integration test verifies whether they work together correctly.
  • Regression tests: testing whether your change has caused another bug.
    With the previous tests you established that the new code works correctly, but something else may still break once you put it in the codebase. Regression testing catches this, typically by rerunning the existing tests after each change.
  • Functional tests: are the results correct?
    Sometimes a program seems to run perfectly, but the “content” of the result is incorrect. For example, there is not a single bug in your data flow, but an end user calls you and says: sorry, but this is just not the correct answer. Functional tests can help with that.
  • Acceptance tests: does the program meet the specs?
    Finally…
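
To make the first few test types concrete, here is a minimal sketch in Python. The functions `drop_missing`, `mean`, and `clean_mean` are hypothetical examples invented for this illustration, and the `test_` naming follows the convention that a runner such as pytest uses to discover tests.

```python
def drop_missing(values):
    """Unit 1: remove None entries from a list."""
    return [v for v in values if v is not None]


def mean(values):
    """Unit 2: arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def clean_mean(values):
    """The two units combined into one small pipeline."""
    return mean(drop_missing(values))


# Unit tests: each unit is checked in isolation.
def test_drop_missing():
    assert drop_missing([1, None, 3]) == [1, 3]


def test_mean():
    assert mean([2, 4]) == 3


# Integration test: the output of one unit is valid input for the next.
def test_clean_mean_integration():
    assert clean_mean([1, None, 3]) == 2


# Functional test: the result is compared with an answer worked out by hand.
def test_clean_mean_functional():
    temperatures = [20.0, None, 22.0, 24.0]
    assert abs(clean_mean(temperatures) - 22.0) < 1e-9
```

Rerunning this whole set after every change is one simple way to do regression testing: a change that breaks existing behaviour makes one of the existing tests fail.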

Joos Korstanje
Data Scientist, Machine Learning, R, Python, AWS, SQL