Better Test Code Coverage Through Data

EG Tech
Expedia Group Technology
Nov 14, 2016

About a month ago our team (the ever-growing, increasingly distributed and diverse iOS team responsible for the Expedia app) decided to try some new things to boost test coverage. We introduced a GitHub build check that measures the code coverage for each change using codecov.io, with an initial per-patch target of 90%. We hadn’t measured coverage on a per-patch basis before, so the check started as advisory only, to see if we could drive team awareness. The data it produces will help us set a hard lower bound on code coverage that, as a team, we’re willing to accept.

Collecting Data

Using the github3.py library I put together a quick and dirty Python script to pull the status checks for all of our pull requests and filter for the codecov/patch check. This gave me 298 PRs since the check was introduced.

[python]
#!/usr/bin/env python

import sys
from github3 import login

githubAccessToken = sys.argv[1]
githubAccount = sys.argv[2]
githubRepository = sys.argv[3]

gh = login(token=githubAccessToken)
repo = gh.repository(githubAccount, githubRepository)

# Walk every pull request and print its codecov/patch status as a CSV row.
for pr in repo.pull_requests(state="all"):
    statuses = repo.statuses(pr.head.sha)
    for status in statuses:
        if status.context == "codecov/patch":
            print("{1.number},{0.state},{0.context},{0.description}".format(status, pr))
            break
[/python]
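Run against a repository with an access token, account, and repository name, the script emits one comma-separated line per pull request (PR number, check state, check context, check description), which can be redirected straight into a .csv file.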

Pretty Pictures

That gave me a CSV file that I imported into a Google spreadsheet. One of the nice features of Google Sheets is that it builds all kinds of charts and graphs automatically from your data. The two interesting charts are as follows:

Histogram of per-patch coverage for iOS pull requests (a large bar at 0%, then a ramp up toward 100%)
Pie chart of PRs meeting 90% patch coverage (59.4% successful, 40.6% failed)

The histogram is particularly interesting because it shows how well we’re actually contributing to our testing efforts. It gives me hope, because we see solid clustering toward the high end of the range.
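The pass/fail split can also be recomputed directly from the CSV without a spreadsheet. Here’s a minimal sketch, assuming the file is named coverage.csv (a placeholder name) and the columns follow the script’s output order (PR number, state, context, description):

[python]
#!/usr/bin/env python

import csv
from collections import Counter

# Tally codecov/patch outcomes (success/failure) across all pull requests.
with open("coverage.csv") as f:
    states = Counter(row[1] for row in csv.reader(f) if len(row) >= 2)

total = sum(states.values())
for state, count in states.most_common():
    print("{}: {} ({:.1f}%)".format(state, count, 100.0 * count / total))
[/python]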

Analysis

From the first graph we see that for 59% of PRs the new changes are 90%+ covered. Let’s denote this as 59%@90+.

If we relax the target from 90+ to 80+, the current data shows a success rate of 65%. Let’s denote this as 65%@80+.

So if we just move the needle for the committers in the 0% case, and assume those pull requests are similar in makeup and distribution to the rest, then 65% of them would land at 80+ and 59% at 90+. That works out to an expected success rate of 76%@80+ and 67%@90+.

After manually sampling some of those 0% PRs, many of them would easily hit 80%+, so our success rate could be as high as 83%@80+.
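These X%@T+ figures (and the options listed below) fall out of a simple threshold sweep once each PR’s patch coverage is known. A minimal sketch, assuming the per-PR percentages have already been extracted into a list (the codecov/patch description text is one possible source), with made-up sample numbers rather than our real data:

[python]
def success_rates(coverages, thresholds=(90, 80, 70, 60, 50, 10)):
    """Return the share of PRs whose patch coverage meets each threshold."""
    total = len(coverages)
    return {t: 100.0 * sum(1 for c in coverages if c >= t) / total
            for t in thresholds}

# Hypothetical per-PR patch coverage percentages, not the real data set.
sample = [0, 0, 45, 72, 88, 91, 95, 100, 100, 100]
for threshold, rate in sorted(success_rates(sample).items(), reverse=True):
    print("{:.0f}%@{}+".format(rate, threshold))
[/python]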

So why all these numbers?

The current check is optional because I feared the disruption to delivery would be too drastic. I also didn’t have the data to predict what the disruption would be. But now I can present options.

90+ check required: 59% of our deliveries would receive no impact.
80+ check required: 65% of our deliveries would receive no impact.
70+ check required: 70% of our deliveries would receive no impact.
60+ check required: 73% of our deliveries would receive no impact.
50+ check required: 77% of our deliveries would receive no impact.

10+ check required: 82% of our deliveries would receive no impact.

This lets us quantify the short-term impact on the team and tune the policy accordingly (we’ll of course be watching closely to measure the long-term effects through other metrics, to validate that code coverage is actually increasing quality).
