Bug Prediction

Published in Naukri Engineering, Aug 23, 2018

With the growing trend towards ‘shift left’ testing, software testing teams are focusing more on bug prevention than on detection. QA teams today are involved in activities like requirement analysis and writing test cases before development starts (as in TDD), which should ideally help prevent most bugs and hence save time, effort and cost in the development cycle.

A few teams are taking this one step further and making an attempt to predict bugs.

Bug prediction is a process where we try to predict bugs based on historical data of the particular application. The team identifies ‘bug hot spots’ in the code base: sections of code that, when modified, have historically resulted in a lot of bugs. This gives the development team an edge in the following ways:

Developers are more cautious while coding in these files.
Code reviewers know areas where they need to pay more attention.
The development team can identify and prioritize refactoring the code in these areas over others.

These measures will eventually help improve the overall quality of the application.

There are various approaches to predicting bugs. In this article we discuss one that has worked well for us: prediction based on commit history.

In this approach we consider the number of code check-ins made for bug fixes in a particular area of code. The more bug fixes an area has needed, the higher the probability of new bugs arising there.

Bug hot-spots based on commit history:

This approach uses an algorithm that looks at the commit history of an application and, based on the number of bug-fix commits and their recency, identifies bug hot-spots and arranges them in order of decreasing risk.

Categorizing commits as Bug fixes or not.

There are several ways of doing this. The simplest is to ask the dev team to always put #bug in the commit message for a bug fix and #req when committing code for any other requirement.
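As a rough illustration of the tagging approach, here is a minimal Python sketch that separates bug-fix commits from the rest. The sample commit messages and the function name `is_bug_fix` are made up for this example; only the #bug and #req tags come from the convention described above.

```python
import re

# Convention from the text: "#bug" marks a bug fix, "#req" marks anything else.
# The regex requires "#bug" to stand alone, so "#bugfix" or "x#bug" do not match.
BUG_TAG = re.compile(r"(?<!\w)#bug\b", re.IGNORECASE)


def is_bug_fix(commit_message: str) -> bool:
    """Return True when the commit message carries the #bug tag."""
    return BUG_TAG.search(commit_message) is not None


# Hypothetical commit messages, for illustration only.
messages = [
    "#bug fix null pointer in resume parser",
    "#req add salary filter to search page",
    "Handle empty profile photo #BUG",
]

bug_fixes = [m for m in messages if is_bug_fix(m)]
```

Matching is case-insensitive here so that a stray #BUG still counts; a stricter team could drop that flag.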

At our organization we follow a process where Jira IDs are added to the commit messages. The IDs can be picked from these messages, and a simple API request to Jira will tell you how many of them are bugs.
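The Jira-based variant could look something like the sketch below. It assumes issue keys follow the common pattern of an uppercase project key, a hyphen and a number (for example NK-101, a made-up key). The actual Jira request is left out: `is_bug_issue` is a hypothetical callback that you would back with your own Jira lookup, and here it is replaced by a small in-memory dictionary.

```python
import re
from typing import Callable, Iterable, List

# Assumed key format: uppercase project key, hyphen, issue number (e.g. "NK-101").
JIRA_KEY = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def extract_issue_keys(commit_message: str) -> List[str]:
    """Return every Jira-style issue key found in a commit message."""
    return JIRA_KEY.findall(commit_message)


def count_bug_commits(
    messages: Iterable[str], is_bug_issue: Callable[[str], bool]
) -> int:
    """Count commits that reference at least one issue classified as a bug."""
    return sum(
        1
        for message in messages
        if any(is_bug_issue(key) for key in extract_issue_keys(message))
    )


# Stand-in for the Jira lookup: a hypothetical mapping of issue key to issue type.
fake_issue_types = {"NK-101": "Bug", "NK-102": "Story"}


def lookup(key: str) -> bool:
    return fake_issue_types.get(key) == "Bug"


commits = [
    "NK-101 fix crash on empty search",
    "NK-102 add recruiter dashboard",
    "tidy up imports",
]
bug_commit_count = count_bug_commits(commits, lookup)
```

Keeping the lookup behind a callback means the key extraction can be tested without network access, and the real Jira call can be swapped in later.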

The Algo Itself

The algo we use is a Python-based implementation of the bug prediction algorithm proposed by Google. The code can be found here: https://github.com/niedbalski/python-bugspots

Equipped with the data (Git commit history) for a given time frame, the algo generates a ‘risk-score’ for every file in the code base being analyzed. Apart from the total number of bug fixes, the algo also takes recency into account: the weight of a bug fix decays with passing time, meaning a bug fixed 3 days ago carries more weight than one fixed 30 days ago.

The example below describes why we need the decay in weightage:

Let’s assume we have a code base with two files. File-1 has 30 bug commits, but all of them occurred 2 months ago, after which the team refactored the entire file. File-2 has 15 bug commits, all in the past week. Without decay, File-1 would keep scoring higher on the ‘risk-o-meter’, even though File-2 is clearly the riskier file now.
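To make the two-file example concrete, here is a small sketch of a time-decayed score. It assumes the weighting from Google's published description of the algorithm, where each bug-fix commit contributes 1 / (1 + e^(-12t + 12)) and t is the commit time normalized to run from 0 (start of the analysis window) to 1 (now). The linked implementation may differ in its details, and the 90-day window and the exact commit dates below are assumptions chosen to match the example.

```python
import math
from datetime import datetime, timedelta
from typing import Iterable


def weight(fix_time: datetime, window_start: datetime, now: datetime) -> float:
    """Weight of one bug-fix commit; older fixes count for much less."""
    span = (now - window_start).total_seconds()
    t = (fix_time - window_start).total_seconds() / span  # 0 = oldest, 1 = now
    return 1.0 / (1.0 + math.exp(-12.0 * t + 12.0))


def risk_score(
    fix_times: Iterable[datetime], window_start: datetime, now: datetime
) -> float:
    """Sum the decayed weights of all bug-fix commits touching a file."""
    return sum(weight(ts, window_start, now) for ts in fix_times)


# Assumed setup: a 90-day analysis window ending on an arbitrary "today".
now = datetime(2018, 8, 23)
window_start = now - timedelta(days=90)

file_1 = [now - timedelta(days=60)] * 30  # 30 bug fixes, all about 2 months old
file_2 = [now - timedelta(days=3)] * 15   # 15 bug fixes, all within the past week

scores = {
    "File-1": risk_score(file_1, window_start, now),
    "File-2": risk_score(file_2, window_start, now),
}

# Files ordered by decreasing risk.
ranking = sorted(scores, key=scores.get, reverse=True)
```

With these inputs File-2 ranks above File-1 even though it has half as many bug fixes, which is the behaviour the decay is meant to produce. A plain count would rank them the other way round.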

Several challenges can plague an honest attempt at improving quality through bug prediction. The biggest is the need for sincere tagging of commit messages. The algorithm, however brilliant, will fail if the data we feed it is not accurate. However, as we observed, once the advantage of bug prediction becomes evident, developers are quite motivated to put in the extra effort.

Written by Kandeel Chauhan