At Gousto we work in an Agile environment and collaborate closely with data scientists, so we often prototype algorithms. To productionise a prototype we typically need to add tests and error handling, refactor, and so on.
Here we want to show one approach to the problem of testing untested code.
How do you take a piece of code that data scientists have written and combine it with software engineers’ skills to get a production-ready codebase? It can be daunting at first because so much functionality sits in the main function.
Take this simplified piece of example code here:
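The original listing is not reproduced in this text, so what follows is a minimal sketch reconstructed from the description of the three functions. The column names `order_id`, `site` and `factory`, the shape of the payload, and the choice to return a new dataframe rather than modify the input are all assumptions made for illustration.

```python
import pandas as pd


def convert_column_name(df: pd.DataFrame) -> pd.DataFrame:
    """Rename a dataframe column (assumed here: 'site' becomes 'factory')."""
    return df.rename(columns={"site": "factory"})


def reroute_factories(df: pd.DataFrame) -> pd.DataFrame:
    """Change every order assigned to Factory_3 to New_factory."""
    df = df.copy()  # assumption: leave the caller's dataframe untouched
    df["factory"] = df["factory"].replace("Factory_3", "New_factory")
    return df


def main(payload: dict) -> pd.DataFrame:
    """Turn the dictionary payload into a dataframe and run both steps."""
    df = pd.DataFrame(payload)
    df = convert_column_name(df)
    df = reroute_factories(df)
    return df


if __name__ == "__main__":
    # Hypothetical payload: one list per column.
    example_payload = {
        "order_id": [1, 2, 3],
        "site": ["Factory_1", "Factory_3", "Factory_2"],
    }
    print(main(example_payload))
```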
We have three functions:
1) `main`: expects a dictionary payload, transforms it into a pandas dataframe, then passes it through `convert_column_name` and `reroute_factories`.
2) `convert_column_name`: renames a dataframe column.
3) `reroute_factories`: goes through all the orders and changes any `Factory_3` to `New_factory`.
I’ve kept this example quite simple, but where do you start testing it? Do we begin with the individual functions `convert_column_name` and `reroute_factories`?
One approach is to test the main function first: if we can test it with representative input data, every function it calls gets exercised. With this first test in place we can safely test the other functions, and we can also safely refactor the code. Let’s see what that looks like:
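As a hedged illustration of such a first test, the sketch below feeds a small payload into `main` and compares the whole output against an expected dataframe. It is written in pytest style, which is an assumption about the test runner. The implementation is an assumed one (including the `order_id`, `site` and `factory` column names) and is defined inline only so that the block runs on its own.

```python
import pandas as pd
from pandas.testing import assert_frame_equal


# Assumed implementation under test, defined inline so this block is runnable.
def convert_column_name(df):
    return df.rename(columns={"site": "factory"})


def reroute_factories(df):
    df = df.copy()
    df["factory"] = df["factory"].replace("Factory_3", "New_factory")
    return df


def main(payload):
    return reroute_factories(convert_column_name(pd.DataFrame(payload)))


def test_main_renames_column_and_reroutes_factory_3():
    # Hypothetical input payload covering both a rerouted and untouched factory.
    payload = {
        "order_id": [1, 2, 3],
        "site": ["Factory_1", "Factory_3", "Factory_2"],
    }
    expected = pd.DataFrame(
        {
            "order_id": [1, 2, 3],
            "factory": ["Factory_1", "New_factory", "Factory_2"],
        }
    )
    # Compares column names, values, dtypes and index in one assertion.
    assert_frame_equal(main(payload), expected)
```

Because the assertion covers the full output, this test acts as a safety net: any refactor of the inner functions that changes the end result will fail it.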
At this stage we can start to unit test the smaller functions that `main` calls, knowing the test on `main` will catch any change in overall behaviour.
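A sketch of what those smaller unit tests might look like is given below, again in pytest style and under the same assumptions: the column names `site` and `factory` are hypothetical, and the two functions are defined inline only so the block runs on its own.

```python
import pandas as pd
from pandas.testing import assert_frame_equal


# Assumed implementations under test, defined inline so this block is runnable.
def convert_column_name(df):
    return df.rename(columns={"site": "factory"})


def reroute_factories(df):
    df = df.copy()
    df["factory"] = df["factory"].replace("Factory_3", "New_factory")
    return df


def test_convert_column_name_renames_site_to_factory():
    df = pd.DataFrame({"order_id": [1], "site": ["Factory_1"]})
    result = convert_column_name(df)
    assert list(result.columns) == ["order_id", "factory"]


def test_reroute_factories_only_changes_factory_3():
    df = pd.DataFrame({"factory": ["Factory_1", "Factory_3", "Factory_3"]})
    expected = pd.DataFrame(
        {"factory": ["Factory_1", "New_factory", "New_factory"]}
    )
    assert_frame_equal(reroute_factories(df), expected)


def test_reroute_factories_does_not_modify_its_input():
    df = pd.DataFrame({"factory": ["Factory_3"]})
    reroute_factories(df)
    # The original dataframe still holds the old value.
    assert df["factory"].tolist() == ["Factory_3"]
```

Each test targets one function with the smallest input that shows its behaviour, so a failure points directly at the function that broke.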
Voilà, we have tested our codebase. At this point we can think about other best practices such as error handling and applying SOLID principles. We may have to refactor, but the tests will tell us if we break existing functionality.
For future development we can use TDD (Test-Driven Development), something we strive to use at Gousto.