A practical evaluation of spectrum-based fault localisation — A retrospective

Published in JSS Editor’s Selection · 7 min read · Dec 2, 2019

The following describes the road to publication (and the impact) of our Journal of Systems and Software (JSS) paper, A practical evaluation of spectrum-based fault localisation, selected as one of the four finalists for the Journal of Systems and Software’s “10-year Most Influential Paper” award.

The Trader Project

I started my PhD at TUDelft in August 2006. The research work was carried out in the context of a rather large project, the Trader project, partially funded by the Dutch Ministry of Economic Affairs. The project was a collaborative effort of industrial and academic partners, involving 9 PhD students and several faculty and industry members:

Industry: NXP (previously Philips Semiconductors), Philips Consumer Electronics, NXP Research (previously Philips Research Laboratories), Philips TASS
Academia: Design Technology Institute (a joint research institute of the Eindhoven University of Technology and the National University of Singapore), Delft University of Technology, University of Twente, University of Leiden, IMEC, and the Embedded Systems Institute.

Paraphrasing the objectives of the project, Trader’s main goal was to develop methods and tools for ensuring the reliability of consumer electronic products. Trader used cases from Philips Semiconductors’ Innovation Center Eindhoven in the area of digital television. The project’s objectives were two-fold:

Ensure reliability by studying, and showing proof of concept of, methods to be applied at design time, test time, and product run-time.
Avoid user frustration by applying user-centric approaches. Ensure reliability not only in a single product, but also in a complete product line.

The line of work I focused on concerned methods and techniques to be applied at design and test time, namely testing and debugging. The focus was initially on self-adaptive systems, but we soon realised that there was no accurate technique for pinpointing the root cause of observed failures. As such, we dedicated most of our time to this difficult problem.

Spectrum-based Fault Localization (SFL)

As my PhD advisor is an expert in model-based diagnosis (Prof. Arjan J.C. van Gemund; he is currently an emeritus professor at TUDelft), our initial idea revolved around applying model-based diagnosis techniques in the resource-constrained, consumer electronics domain.

Soon, we came to the conclusion that a TV software stack was not well suited to model-based diagnosis techniques. Models of the TV software stack were nonexistent, outdated, or unsuitable for model-based diagnosis. Moreover, model-based diagnosis is expensive given the size of the TV software stack (back then, a couple of million lines of code). We needed a lightweight alternative to pinpoint the root cause of observed failures.

Back in 2007 there were a few efforts to use code coverage to help find bugs. Notably, Pinpoint (Recovery-Oriented Computing) and Tarantula had just been published. Other works were emerging, and we decided to create a framework that would encompass all these techniques. This framework allowed us to understand the techniques in detail and also helped us find one that would outperform the state of the art at the time. We decided to call it Spectrum-based Fault Localization (SFL) because it uses abstractions of program traces, called program spectra, that were first described by Harrold et al. to solve the Y2K problem.

Our first attempt to publish this work was at ISSTA’07. The paper, overall, was well received and the feedback was rather positive. Ultimately, however, the paper didn’t make it to the conference. Taking into account the feedback, we improved the paper and re-submitted to TAIC PART’07. This conference was particularly appealing because it provides a stimulating platform to facilitate collaboration between industry and academia on challenging and exciting problems of real-world software testing.

TAIC PART was a success. We got good feedback during the review process as well as during the conference. Moreover, we got an invitation to submit an extension to a special issue of the Journal of Systems and Software.

The question then was: how to extend the current version of the paper in order to have an interesting and impactful follow-up paper? 🤔

We decided to do that in two ways: gain a deep understanding of Spectrum-based Fault Localization and apply it to a real-world scenario.

JSS09: Contributions

Our JSS paper describes a language-agnostic fault localisation technique that uses block hit spectra. The primary contribution of the paper is an empirical comparison of different distance metrics (a distance metric is a measure to quantify the likelihood that a certain part of the program is faulty) that can be used during fault localisation. In particular,

We investigated the use of the Tarantula, Jaccard, and Ochiai (Ochiai is currently still amongst the best performing metrics) distance metrics.
We experimentally evaluated the trade-offs associated with the number and types of tests in the test suite (e.g., the impact associated with adding more passing or failing test cases).
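To make the comparison concrete, here is a minimal sketch of how the three metrics can be computed from block hit spectra. It is an illustration rather than the implementation used in the paper: the function names (`counts`, `rank`, and so on), the toy data, and the choice to score a block as 0 when a denominator is zero are all assumptions. The formulas are the ones commonly given for Tarantula, Jaccard, and Ochiai, written in terms of four counters per block: `a11` (failing runs that executed the block), `a10` (passing runs that executed it), `a01` (failing runs that did not), and `a00` (passing runs that did not).

```python
from math import sqrt


def counts(spectra, errors, block):
    """Return (a11, a10, a01, a00) for one block.

    spectra[i][block] is truthy if run i executed the block;
    errors[i] is True if run i failed.
    """
    a11 = a10 = a01 = a00 = 0
    for row, failed in zip(spectra, errors):
        hit = bool(row[block])
        if hit and failed:
            a11 += 1
        elif hit:
            a10 += 1
        elif failed:
            a01 += 1
        else:
            a00 += 1
    return a11, a10, a01, a00


def _ratio(num, den):
    # Assumption for this sketch: an undefined score is treated as 0.
    return num / den if den else 0.0


def ochiai(a11, a10, a01, a00):
    return _ratio(a11, sqrt((a11 + a01) * (a11 + a10)))


def jaccard(a11, a10, a01, a00):
    return _ratio(a11, a11 + a01 + a10)


def tarantula(a11, a10, a01, a00):
    fail_ratio = _ratio(a11, a11 + a01)  # share of failing runs hitting the block
    pass_ratio = _ratio(a10, a10 + a00)  # share of passing runs hitting the block
    return _ratio(fail_ratio, fail_ratio + pass_ratio)


def rank(spectra, errors, metric):
    """Return (block indices sorted by descending score, raw scores)."""
    n_blocks = len(spectra[0])
    scores = [metric(*counts(spectra, errors, b)) for b in range(n_blocks)]
    order = sorted(range(n_blocks), key=lambda b: scores[b], reverse=True)
    return order, scores


# Toy example: 4 runs over 3 blocks. Block 1 is executed only by failing runs.
spectra = [
    [1, 1, 0],  # run 0 (failed)
    [1, 0, 1],  # run 1 (passed)
    [1, 1, 1],  # run 2 (failed)
    [1, 0, 0],  # run 3 (passed)
]
errors = [True, False, True, False]

if __name__ == "__main__":
    for metric in (tarantula, jaccard, ochiai):
        order, scores = rank(spectra, errors, metric)
        print(metric.__name__, order, [round(s, 3) for s in scores])
```

In this toy data all three metrics place block 1 at the top of the ranking, since it is executed by every failing run and by no passing run. The metrics differ in how they score the remaining blocks, which is where the number and mix of passing and failing tests starts to matter.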

One strength of the paper is that the empirical evaluation incorporates several different case study applications. Another positive aspect of the paper is the inclusion of a real-world program that was under development at NXP.

JSS09: Impact

Showing that the technique was effective and efficient in a resource-constrained, industrial environment helped establish it as a promising one. The paper describes in detail several faulty scenarios with the Philips TV software stack (check the JSS paper for further details).

Summary of applying Spectrum-based Fault Localization to the Philips TV software stack

Several studies complemented the ones we did (e.g., using more distance metrics), but their conclusions did not differ much with respect to which metric performs best. To date, the technique is considered amongst the best performing ones and is cited in many works (cf. SemanticScholar.org, there are 52 highly influenced papers).

Following the suggestion of one of the reviewers during the reviewing period, the future work section included several good ideas for new empirical studies. For instance, we discussed new case study applications written in programming languages other than C (the language of the TV software stack), as well as the practical challenges associated with porting the approach to an object-oriented language like C++ or Java.

There are follow-up works that investigated the usefulness of the technique in other programming languages and domains (e.g., spreadsheets). As an example, we just published work, done in collaboration with Outsystems, on test suite selection; it uses ideas from spectrum-based fault localisation and is written in C#.

Another reviewer strongly recommended that we release all or part of the fault localisation framework. We went even further: instead of just releasing the techniques, we implemented a toolset that offers a visual interface to help understand the diagnostic rankings, as well as state-of-the-art per-test-case instrumentation facilities. We named the toolset GZoltar. To the best of our knowledge, it is being used by several academics and industrialists.

GZoltar vs. Textual ranking: we argue that a visual representation helps in comprehensibility of the results.

GZoltar is an Eclipse plugin that offers SFL techniques for Java. There is also a library providing all the functionality needed to implement Spectrum-based Fault Localization techniques in your own projects. We also released the C version, but are no longer maintaining it.

Many works follow up on ours, and listing them all without forgetting some would be a daunting task. We therefore refrain from doing so, but interested readers can check the list of cited works in the following link. In any case, one can find works using the framework we proposed back in 2009 in techniques for test suite generation, test case selection and prioritisation, test case minimisation, automated debugging (including finding the optimal distance metrics), and software metrics (a metric for diagnosability and testability that carries more information than the infamous code coverage).

Moving forward

We think the missing piece of Spectrum-based Fault Localization is the wide adoption by industry. We’ve tried to commercialise it, and raised a considerable amount of money from a VC (promotional video), but our experience tells us that it will only work as an open source effort. In order to achieve that, we need to create an open-source community to further develop these techniques and encourage professors to include such topics in their course units.

Acknowledgements

I’ve written this post on behalf of the other authors: Peter Zoeteweij (at the time a post-doc working in close collaboration with me); Arjan van Gemund (my PhD advisor; I am forever grateful for his considerable dedication. Thanks!); and Rob Golsteijn (our industrial partner).

We thank the reviewers of ISSTA, TAIC PART and JSS for the valuable feedback. We also thank the Trader project members.


Written by Rui Maranhao