Embedding Tribuo ML library as a JUnit extension

Uday Chandra
Published in Oracle Developers
3 min read · Jan 31, 2022
Alina Constantin / Better Images of AI / Handmade A.I / CC-BY 4.0

In this thought experiment, we embed Tribuo in a custom JUnit extension to explore whether machine learning (ML) can yield useful quality assurance (QA) insights for a given service or product.

JUnit, the most popular testing framework on the JVM, is modular and extensible. It provides extension points that let you hook into the test lifecycle and add custom features.

Tribuo is an open source machine learning library for Java that provides tools for classification, regression, clustering, anomaly detection, and more.

For the purposes of this post, I will assume that readers are familiar with the JUnit extension model. If you would like to learn more about it and about how to create a custom extension, you can refer to my article on InfoQ. I will also assume that readers are familiar with ML concepts.

Say we have a unit test that validates a hash function returning a Base64-encoded string. It would typically look like this:

public class HashUtilsTest {

    @Test
    public void validHashTest() {
        var valueToHash = "JUnit with Tribuo is fun";
        var actualHash = HashUtils.hash(valueToHash);
        ...
        assertEquals(expectedHash, actualHash);
    }
}

Now let’s assume that the underlying hash function has been changed so that it still does the job but takes longer to run. Since the function still produces the right hash, the unit test continues to pass. The longer runs, however, might go unnoticed.

We can build a custom JUnit extension that measures the total time taken to execute a test class and runs a Tribuo anomaly detection model to flag unusual timings. To do that, the extension appends the timing of each run to a CSV file. For simplicity, every recorded observation is naively labeled as EXPECTED. Java’s built-in file methods can be used to achieve this:

var csvPath = ...
var time = ...
Files.writeString(
    csvPath,
    time + ",EXPECTED\n",
    StandardCharsets.UTF_8,
    StandardOpenOption.CREATE, // create the file on the first run
    StandardOpenOption.APPEND  // afterwards, add one row per run
);
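To make the recording step concrete, here is a minimal, self-contained sketch that uses only the JDK. The class name `TimingRecorder`, its method names, and the three-decimal formatting are hypothetical choices for illustration, not code from the experimental extension. In the real extension the elapsed time would come from JUnit lifecycle callbacks rather than from `Thread.sleep`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Locale;

// Hypothetical helper for illustration; not part of JUnit or Tribuo.
final class TimingRecorder {

    private TimingRecorder() {}

    // Formats one CSV row: duration in seconds, then the naive EXPECTED label.
    static String toCsvRow(long elapsedNanos) {
        double seconds = elapsedNanos / 1_000_000_000.0;
        return String.format(Locale.ROOT, "%.3f", seconds) + ",EXPECTED\n";
    }

    // Appends one row, creating the CSV file if it does not exist yet.
    static void append(Path csvPath, long elapsedNanos) {
        try {
            Files.writeString(
                    csvPath,
                    toCsvRow(elapsedNanos),
                    StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE,
                    StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        Path csvPath = Files.createTempFile("timings", ".csv");
        long start = System.nanoTime();
        Thread.sleep(50); // stand-in for running the test class
        append(csvPath, System.nanoTime() - start);
        System.out.print(Files.readString(csvPath));
    }
}
```

Using `System.nanoTime()` rather than wall-clock time keeps the measurement independent of system clock adjustments.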

Once enough data has been recorded, the extension builds an ML model and uses it to find unusual observations whenever the tests run. Tribuo’s strongly typed classes can be used to achieve this, as shown in the following sample code:

// Configure a one-class SVM with an RBF kernel for anomaly detection.
var oneClass = new SVMAnomalyType(SVMAnomalyType.SVMMode.ONE_CLASS);
var params = new SVMParameters<>(oneClass, KernelType.RBF);
params.setGamma(MODEL_GAMMA);
params.setNu(MODEL_NU);

// Train the model on the timings recorded so far.
var trainer = new LibSVMAnomalyTrainer(params);
var model = trainer.train(trainingDataset);

// Turn the latest run into a Tribuo example and ask the model about it.
var newRow = Map.of(
    "name", clazz.getName(),
    "duration", String.valueOf(durationInSec)
);
var headers = java.util.List.copyOf(newRow.keySet());
var row = new ColumnarIterator.Row(trainingDataset.size(), headers, newRow);
var example = sRowProcessor.generateExample(row, false).get();
var prediction = model.predict(example);

// This is where you would generate an actual report.
System.out.println(example);
System.out.println(prediction);
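The "enough data" check that precedes training could be sketched roughly as follows, using only the JDK. Everything here is an assumption made for illustration: the class name `TrainingGate`, the threshold of 30 rows, and the CSV layout with the duration in the first column. None of it comes from Tribuo or from the experimental extension.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of JUnit or Tribuo.
final class TrainingGate {

    // Assumed minimum number of recorded runs before a model is trained.
    static final int MIN_TRAINING_ROWS = 30;

    private TrainingGate() {}

    // Extracts the duration (assumed to be the first column) from each CSV line.
    static List<Double> parseDurations(List<String> csvLines) {
        List<Double> durations = new ArrayList<>();
        for (String line : csvLines) {
            if (line.isBlank()) {
                continue; // skip empty lines, such as a trailing newline
            }
            String[] cells = line.split(",");
            durations.add(Double.parseDouble(cells[0].trim()));
        }
        return durations;
    }

    // Returns true once the history is large enough to train on.
    static boolean hasEnoughData(List<Double> durations) {
        return durations.size() >= MIN_TRAINING_ROWS;
    }

    public static void main(String[] args) {
        var durations = parseDurations(List.of("0.512,EXPECTED", "0.734,EXPECTED"));
        System.out.println(durations + " enough=" + hasEnoughData(durations));
    }
}
```

Until the gate opens, the extension would simply keep recording timings without producing predictions.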

All we have to do is use this new extension in our test class as shown below:

@AnomalyDetector
public class HashUtilsTest {

    @Test
    public void validHashTest() {
        ...
        assertEquals(expectedHash, actualHash);
    }
}

In the example above, say it usually takes between 0.5 and 1 second to run the tests successfully. After the change to the hash function, however, it takes 3 seconds. This could point to a potential performance regression (an oversimplification, of course). If the trained model is doing its job, this unusual behavior should now be flagged as a problem.
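For intuition about why a 3-second run stands out against a 0.5 to 1 second history, here is a deliberately simple z-score check. This is a plain statistical stand-in, not what Tribuo's one-class SVM computes, and the class name `ZScoreBaseline`, the sample history, and the threshold of 3 standard deviations are all assumptions made for illustration.

```java
import java.util.List;

// Hypothetical baseline for intuition only; not the one-class SVM used above.
final class ZScoreBaseline {

    private ZScoreBaseline() {}

    static double mean(List<Double> values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.size();
    }

    // Sample standard deviation (divides by n - 1).
    static double sampleStdDev(List<Double> values) {
        double m = mean(values);
        double squared = 0.0;
        for (double v : values) {
            squared += (v - m) * (v - m);
        }
        return Math.sqrt(squared / (values.size() - 1));
    }

    // Flags an observation that lies more than `threshold` standard deviations from the mean.
    static boolean isAnomalous(List<Double> history, double observed, double threshold) {
        if (history.size() < 2) {
            throw new IllegalArgumentException("need at least two historical timings");
        }
        double m = mean(history);
        double sd = sampleStdDev(history);
        if (sd == 0.0) {
            return observed != m; // all past runs identical: any deviation is unusual
        }
        return Math.abs(observed - m) / sd > threshold;
    }

    public static void main(String[] args) {
        List<Double> history = List.of(0.5, 0.6, 0.7, 0.8, 0.9, 1.0);
        System.out.println(isAnomalous(history, 0.8, 3.0)); // false
        System.out.println(isAnomalous(history, 3.0, 3.0)); // true
    }
}
```

With this history the mean is 0.75 s and the sample standard deviation is about 0.19 s, so a 3-second run sits roughly 12 standard deviations away. A learned model such as the one-class SVM becomes more interesting once there are several features and the boundary between normal and unusual is less obvious.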

When the timing of a test class run is as expected, we see something like this from the Tribuo model’s prediction:

Prediction(maxLabel=(EXPECTED,...

When the model detects an anomaly, we see something like this:

Prediction(maxLabel=(ANOMALOUS,...

This is just a start. There may be more useful ways to leverage ML within a test framework to automatically analyze tests and provide better QA. Thanks to excellent open source libraries like JUnit and Tribuo, we can easily explore such use cases!

You can check out the experimental code on GitHub.

