How to deal with someone else’s bugs? (like a boss)

Joao Sousa
Published in Feedzai Techblog
6 min read · Feb 27, 2019

In today’s reality, we build our projects upon other developers’ work. We do this not only to avoid redundancy, but also in the hope that those developers will do a better job than we would. However, that is not always the case. People make mistakes, no matter how much planning and experience is involved.

But fear not! Finding a bug in a project that our system relies on is not the end of the world.

This article does not focus on fixing bugs, but rather on finding them. In my experience, tackling bugs follows the 80/20 rule: we usually spend about 80% of the time finding a bug and only 20% of the time fixing it.

The following story is a true incident I dealt with at Feedzai a while ago.

One of the key features of the Feedzai platform is OpenML. OpenML is an open-source API that allows users to build custom machine learning bindings to other well-known machine learning platforms such as Data Robot or Scikit Learn. One of the open-source implementations already offered allows users to train H2O models from the Feedzai platform.

In the test suite written for this provider, there was one flaky test. This test predicts different instances in parallel using a Deep Learning model imported from H2O. The implementation of this provider simply adapts the OpenML API to the H2O API, so it did not take us long to conclude that the abnormal behaviour was coming from H2O.

Since H2O is open source on GitHub, the common process is to open a GitHub issue explaining the problem. However, it is good practice to include as much information as you can when reporting a bug, so that the developer can reproduce the problem. I therefore decided to investigate a little further.

I was able to discover from the failing test that, when predicting two instances (A and B, whose expected scores were X and Y, respectively), the test would eventually fail due to instance A being scored with Y (or B with X). This is a classic race condition: when the prediction function is called, it shares state with other threads, so the score of one instance ends up being affected by another instance.

In the data science context, the term predict (or classify) is usually interchangeable with the term score: when we say we score an instance, we actually mean classifying that instance.

The following snippet shows a simplified version of the test, which expresses the race condition by submitting two classification tasks into an ExecutorService:
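(The original test is embedded as a gist in the post; the sketch below only approximates it. The Scorer interface, the instance feature values, and the model loading are placeholders, while the expected class distributions are taken from the failure output shown further down.)

import static org.assertj.core.api.Assertions.assertThat;

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import org.junit.Test;

public class ParallelScoringSketchTest {

    /** Hypothetical wrapper around the imported H2O Deep Learning model. */
    interface Scorer {
        double[] score(Object[] instance);
    }

    // Placeholder: the real test obtains the model through the H2O OpenML provider (see the gist).
    private final Scorer scorer = loadDeepLearningScorer();

    @Test
    public void testParallelScoring() throws Exception {
        final Object[] oneInstance = { /* feature values for instance A */ };
        final Object[] anotherInstance = { /* feature values for instance B */ };
        // Class distributions each instance is known to receive when scored on its own
        // (values taken from the failure output below).
        final double[] expectedOne = {0.9556366865683105, 0.044363313431689616};
        final double[] expectedAnother = {0.954432234028561, 0.04556776597143899};

        final ExecutorService executor = Executors.newFixedThreadPool(2);
        try {
            // Submit the two classification tasks so they run concurrently.
            final Future<double[]> oneScore = executor.submit(() -> scorer.score(oneInstance));
            final Future<double[]> anotherScore = executor.submit(() -> scorer.score(anotherInstance));

            // Each instance must always receive its own expected class distribution.
            assertThat(oneScore.get()).containsExactly(expectedOne);
            assertThat(anotherScore.get()).containsExactly(expectedAnother);
        } finally {
            executor.shutdownNow();
        }
    }

    private static Scorer loadDeepLearningScorer() {
        // Omitted in this sketch: loading the H2O MOJO through the provider.
        throw new UnsupportedOperationException("model loading omitted in this sketch");
    }
}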

The Regression Test for this race condition

Note: this test was created inside H2OModelProviderLoadTest in the openml-java repository. If you want to test it locally, just fork the project and add the test to the class. You can find the gist here.

By running the test repeatedly, we can see that it eventually fails with the following error:

org.assertj.core.api.SoftAssertionError: 
The following assertion failed:
1) expected:<[0.95[56366865683105, 0.044363313431689616]]> but was:<[0.95[4432234028561, 0.04556776597143899]]>
at H2OModelProviderLoadTest.testParallelScoring(H2OModelProviderLoadTest.java:113) expected:<[0.95[56366865683105, 0.044363313431689616]]> but was:<[0.95[4432234028561, 0.04556776597143899]]>

… meaning that oneInstance got the score that was expected for anotherInstance.

The naive solution for this race condition is to synchronize the calls to H2O’s scoring function, as was done here. This is enough to fix the problem, but it still does not tell us what the shared state is. To identify it, we have to dive into H2O’s code.
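For reference, a minimal sketch of what that synchronization can look like, assuming the provider scores through a MOJO model’s score0 call (the class, field, and method names here are illustrative, not the actual openml-java code):

public class SynchronizedScorer {

    // Single lock serialising every call into the non-thread-safe model.
    private final Object scoringLock = new Object();
    private final hex.genmodel.MojoModel mojoModel;

    public SynchronizedScorer(final hex.genmodel.MojoModel mojoModel) {
        this.mojoModel = mojoModel;
    }

    public double[] score(final double[] row, final double[] predictions) {
        // Only one thread at a time may run the H2O scoring call, so the model's
        // internal scratch state cannot be overwritten mid-prediction.
        synchronized (scoringLock) {
            return mojoModel.score0(row, predictions);
        }
    }
}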

When tackling flaky tests, the first step should always be to create a run configuration that repeats the test until it fails, so that the behaviour can be reproduced in a single click.
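In practice that can be an IDE run option (e.g. IntelliJ’s “Repeat: Until Failure” setting for JUnit) or simply a thin loop around the scenario. A rough sketch of the loop variant, where runOnce() stands in for the parallel-scoring scenario above:

@Test
public void repeatUntilFailure() throws Exception {
    // Re-run the racy scenario many times in a single execution; the assertions inside
    // runOnce() trip on the iteration where the race finally manifests.
    for (int run = 0; run < 500; run++) {
        runOnce(); // hypothetical helper wrapping the parallel scoring and assertions
    }
}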

When debugging problems in external code, there are some common approaches:

  • Logging: Limited by the logging effort of the project being debugged. This strategy can also be quite cumbersome.
  • “Blind” debugging: Deep diving into code you don’t know. Easy for small projects, but unrealistic for large code bases such as H2O’s.
  • Injecting code by forking the project: Probably the most efficient of all, but it assumes access to the source code, and complex builds that are hard to iterate on make it unrealistic for large projects.

Since all of these approaches seemed too cumbersome for this investigation, I decided to take an alternative route. My idea was to debug H2O’s code to find the moment when the scoring function picks up the score from the wrong instance. Since this behaviour has a low probability of occurring (the test only fails once in ~200 runs), it would be absurd to debug each run waiting for the one where the concurrency issue manifests itself. Instead, I resorted to conditional breakpoints, so that the JVM would only suspend upon an inconsistent state. Conditional breakpoints extend the power of a normal debug breakpoint: the JVM is only suspended if a given condition is met. This is a very useful feature for exploring non-deterministic behaviour, since instead of suspending the JVM on every iteration, we can suspend it only when the state becomes inconsistent.

An example of the conditional breakpoint in action.

Then it becomes a simple problem of divide and conquer. Place a conditional breakpoint that verifies the expected class distribution for a given instance. If the test fails and the breakpoint is not triggered, the concurrency issue happens after the breakpoint. On the other hand, if the breakpoint is triggered right before the test fails, the state was already invalid before that point, and you have to keep looking at upstream instructions. You can even place multiple conditional breakpoints throughout the execution path to ensure you find the culprit even faster.
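Concretely, the condition attached to such a breakpoint inside H2O’s scoring path might look like the expression below. This is only a sketch: the variable names depend on the exact line where the breakpoint is placed, row[0] == 42.0 is a made-up way of recognising instance A by one of its feature values, and the constant is the score we know that instance should receive.

// Typed into the IDE’s breakpoint “Condition” field; the debugger evaluates it on every hit
// and suspends the JVM only when it returns true, i.e. when the state is already inconsistent.
row[0] == 42.0 && Math.abs(preds[1] - 0.9556366865683105) > 1e-9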

The only disadvantage of conditional breakpoints is that they make the execution considerably slower, which is perfectly acceptable in small tests. So, if you’re planning to use this strategy on a system test (or even an integration test), make sure you remove any redundant computations so that the test can run as fast as possible.

Who was the guilty party then?

… drum roll … One of the performance improvements the H2O developers focused on is array reuse. The idea is that, when a task needs an array at some point, it is faster to allocate a single array and reuse it across all runs of the task than to create a task-confined array each time. One example of this optimization is in DeeplearningMojoModel._numsA and DeeplearningMojoModel._catsA (see the source here), where these arrays store information about the numerical and categorical features, respectively.
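To make the failure mode concrete, here is a deliberately simplified illustration, not H2O’s actual implementation, of how a scratch array that is allocated once and reused (analogous to _numsA/_catsA) lets one thread’s features leak into another thread’s prediction:

class ReusedBufferModel {

    // Reused scratch array: saves an allocation per call, but it is shared state.
    private final double[] nums = new double[2];

    double[] score(final double[] instance) {
        // Thread A copies its features into the shared buffer...
        System.arraycopy(instance, 0, nums, 0, nums.length);
        // ...but thread B may overwrite "nums" before thread A reaches this line,
        // so A's prediction ends up being computed from B's features.
        return predictFrom(nums);
    }

    private double[] predictFrom(final double[] buffer) {
        // Stand-in for the actual forward pass over the buffered features.
        return new double[] {0.5 * buffer[0] + 0.5 * buffer[1]};
    }
}

Synchronizing the scoring calls, as in the workaround above, simply guarantees that this shared buffer is never written by two threads at once.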

Conclusion

Bugs are a natural part of the development process of an application. As such, we should neither fear them nor avoid them, but instead find ways of identifying them so we can fix or work around them. I wrote this blog post to encourage developers to embrace bugs, no matter how hard they are to pin down.

Check back later in the Feedzai TechBlog, as I am planning on writing another post about how to increase performance for conditional breakpoints.
