Debugging integration tests in Wikimedia’s CI

Kosta Harlan
Down the Rabbit Hole
Mar 27, 2019 · 7 min read

Last week Antoine wrote an excellent post, “Help my CI job fails with exit status -11”, which details his efforts to resolve a segmentation fault error in Wikimedia’s continuous integration stack. While that post is focused mainly on the lower-level software used in our CI (tl;dr, updating the libc6 package sorted things out), this post describes debugging techniques used to diagnose test failures further up the stack, in our PHP code.

Unbreak now!

Last week, a Phabricator task was filed noting that integration tests from the Echo extension were causing all continuous integration builds for MediaWiki core and extensions to fail:

Console output from the failed build. (Public domain)

Pretty quickly the issue was escalated to the highest priority (“Unbreak Now!”) as the failure began to block merges for all extensions and MediaWiki core. As a quick fix, we disabled the test to clear up the CI pipeline, then got to work investigating the root cause.

Reproducing the error

My first priority was trying to reproduce the failure locally, since, as Roan noted:

Debugging this is slow going because it’s only reproducible in Jenkins (not locally), only through Depends-On patches in certain other extensions (not Echo itself), and the job that runs the tests in question takes 15–20 minutes to run.

To help with this, I decided to use the CI test runner Quibble on my machine to see if I could learn anything from observing or modifying its build configuration. The Quibble docs have a helpful section on reproducing a CI build; just make sure you also follow the setup directions carefully.

I created an environment file, echo-build-failure.env with the contents pulled from the Jenkins job parameters:

EXT_DEPENDENCIES=mediawiki/extensions/AbuseFilter\nmediawiki/extensions/AntiSpoof\nmediawiki/extensions/ArticlePlaceholder\nmediawiki/extensions/BetaFeatures\nmediawiki/extensions/Capiunto\nmediawiki/extensions/CentralAuth\nmediawiki/extensions/CheckUser\nmediawiki/extensions/CirrusSearch\nmediawiki/extensions/Cite\nmediawiki/extensions/CodeEditor\nmediawiki/extensions/Echo\nmediawiki/extensions/EducationProgram\nmediawiki/extensions/Elastica\nmediawiki/extensions/EventLogging\nmediawiki/extensions/GeoData\nmediawiki/extensions/JsonConfig\nmediawiki/extensions/LiquidThreads\nmediawiki/extensions/MassMessage\nmediawiki/extensions/MobileApp\nmediawiki/extensions/MobileFrontend\nmediawiki/extensions/PdfHandler\nmediawiki/extensions/PropertySuggester\nmediawiki/extensions/Renameuser\nmediawiki/extensions/Scribunto\nmediawiki/extensions/SiteMatrix\nmediawiki/extensions/SyntaxHighlight_GeSHi\nmediawiki/extensions/TemplateData\nmediawiki/extensions/TimedMediaHandler\nmediawiki/extensions/TitleBlacklist\nmediawiki/extensions/UniversalLanguageSelector\nmediawiki/extensions/UserMerge\nmediawiki/extensions/VisualEditor\nmediawiki/extensions/WikiEditor\nmediawiki/extensions/Wikibase\nmediawiki/extensions/WikibaseLexeme\nmediawiki/extensions/WikibaseMediaInfo\nmediawiki/extensions/WikibaseQualityConstraints\nmediawiki/extensions/WikimediaBadges\nmediawiki/extensions/WikimediaEvents\nmediawiki/extensions/ZeroBanner\nmediawiki/extensions/ZeroPortal\nmediawiki/extensions/cldr
ZUUL_PROJECT=mediawiki/extensions/WikibaseCirrusSearch
ZUUL_REF=refs/zuul/master/Zf4395db964cf4ab69bbb88a7e8286986
ZUUL_BRANCH=master
ZUUL_URL=git://contint2001.wikimedia.org

Then I started Quibble, overriding its default command so that it would just run the PHPUnit test that was failing:

cd ~/quibble-builds \
&& rm -f src/LocalSettings.php \
&& docker run -it --rm \
--env-file ./echo-build-failure.env \
-v ~/quibble-builds/cache:/cache \
-v ~/quibble-builds/log:/workspace/log \
-v ~/quibble-builds/ref:/srv/git:ro \
-v ~/quibble-builds/src:/workspace/src \
docker-registry.wikimedia.org/releng/quibble-stretch-hhvm:latest \
--commands \
"php tests/phpunit/phpunit.php extensions/Echo/tests/phpunit/DiscussionParserTest.php"

Note that the quibble-builds directory is useful to keep around, as it will contain the git repositories you clone (mine contains 2.0 gigabytes of data at the moment). The rm -f src/LocalSettings.php is there because Quibble will refuse to install MediaWiki if a LocalSettings.php file is found.

Also note that the initial run here is incredibly slow: IIRC it took 1–2 hours for Zuul CloneMapper to do its thing and ensure that the 42 repositories that are part of this build were checked out to the correct version. The good news is that after the initial run you can add --skip-zuul and --skip-deps to your command, which brings the setup, test execution, and teardown down to about 3.5 minutes from start to finish:

docker run -it --rm \
--env-file ./echo-build-failure.env \
-v ~/quibble-builds/cache:/cache \
-v ~/quibble-builds/log:/workspace/log \
-v ~/quibble-builds/ref:/srv/git:ro \
-v ~/quibble-builds/src:/workspace/src \
docker-registry.wikimedia.org/releng/quibble-stretch-hhvm:latest \
--commands \
"php tests/phpunit/phpunit.php extensions/Echo/tests/phpunit/DiscussionParserTest.php" \
--skip-zuul --skip-deps

When the test runs, you’ll see that Quibble assembles a LocalSettings.php in ~/quibble-builds/src/ and in that file, it enables the 42 extensions that are part of the build by adding wfLoadExtension() for each extension.

My plan was to disable individual extensions by removing them from LocalSettings.php and see if the build passed. Doing so is a somewhat dicey procedure: you kick off the Quibble build and watch the output until you see that LocalSettings.php has been created; then, in a separate terminal tab, while the DB is being created and extensions are being installed, you quickly open the settings file and delete (or comment out) all of the wfLoadExtension() lines before the php tests/phpunit/phpunit.php command starts. In my experience you have about 20–30 seconds to do this.

(As an aside, it would be really nice if 1) you could override the LocalSettings.php file, and 2) you didn’t have to reinstall the DB each time you wanted to run the phpunit step in Quibble.)
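In the meantime, a rough helper along these lines takes some of the stress out of that 20–30 second window. This is only a sketch: it assumes the paths from the volume mounts above, and that the generated wfLoadExtension() lines start at the beginning of a line, as in the excerpt in the next section.

# Run this in a second terminal while Quibble is setting up.
# Wait for Quibble to write LocalSettings.php...
until [ -f ~/quibble-builds/src/LocalSettings.php ]; do sleep 1; done
# ...then comment out every wfLoadExtension() line before the PHPUnit step starts.
sed -i 's/^wfLoadExtension/# wfLoadExtension/' ~/quibble-builds/src/LocalSettings.php

You can then selectively uncomment the extensions you want to keep enabled for that run.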

Pinpointing the problem

Since we suspected CentralAuth was the culprit, that was the first one I tried. No luck. Then I moved on to disabling a block of the extensions by removing these lines:

wfLoadExtension( 'AbuseFilter' ); 
wfLoadExtension( 'AntiSpoof' );
wfLoadExtension( 'ArticlePlaceholder' );
wfLoadExtension( 'BetaFeatures' );
wfLoadExtension( 'Capiunto' );
wfLoadExtension( 'CentralAuth' );
wfLoadExtension( 'CheckUser' );
require_once "$IP/extensions/CirrusSearch/CirrusSearch.php";
wfLoadExtension( 'Cite' );
wfLoadExtension( 'cldr' );
wfLoadExtension( 'CodeEditor' );
wfLoadExtension( 'EducationProgram' );

Success! I started disabling a few of these at a time and soon found that BetaFeatures was the culprit. Pleased that I had (so I thought) found the source of the problem, I moved over to my local development environment, where I had better tooling available for debugging (namely Xdebug) and wouldn’t have to wait minutes for a MediaWiki environment to be created on each test run. So I enabled BetaFeatures and Echo, ran the test, and… the tests passed. 🤦

OK, back to the drawing board. I had to figure out what was special with the Quibble CI build environment that caused the tests to fail there, but not locally.

I started looking in the BetaFeatures Hooks implementation code. Commenting out its registration in extension.json caused the build in Quibble to pass. From there, I started looking at each individual hook, adding an early return to see if the test passed, and once I found the suspect hook (updateUserCounts()), I saw that the problem resided in this bit of code:

$oldUser = User::newFromName( $user->getName() );
$betaFeatures = $wgBetaFeatures;
Hooks::run( 'GetBetaFeaturePreferences', [ $user, &$betaFeatures ] );
foreach ( $betaFeatures as $name => $option ) {
	$newVal = $user->getOption( $name );
	$oldVal = $oldUser->getOption( $name );
	[...]
}

Adding a return just before $oldUser->getOption( $name ); caused the test to pass… wait, what?! This made no sense to me. While talking with Roan about this, he pointed out that $oldUser->getOption() is only called when an extension that provides a BetaFeature is registered. Ah, this makes a lot more sense now. From here, I could reproduce the failure in my local development environment by enabling Echo, BetaFeatures and VisualEditor (which registers a BetaFeature), and things went much more quickly.
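For reference, the local reproduction boiled down to something like this. It is a sketch rather than an exact recipe: it assumes the three extensions are already cloned under extensions/ in your development wiki and that you run the commands from the MediaWiki root directory.

# Enable the minimal combination that triggers the failure.
cat >> LocalSettings.php <<'PHP'
wfLoadExtension( 'Echo' );
wfLoadExtension( 'BetaFeatures' );
wfLoadExtension( 'VisualEditor' ); # registers a BetaFeature, which is what triggers the getOption() path
PHP
# Create/upgrade any tables the newly enabled extensions need.
php maintenance/update.php --quick
# Run the failing test.
php tests/phpunit/phpunit.php extensions/Echo/tests/phpunit/DiscussionParserTest.php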

Bisect to the rescue

After looking at the git history for Echo and BetaFeatures, I didn’t spot anything suspicious, so my attention went to MediaWiki core. I jumped back in history to commit e66958a2e9, where the build passed! From there it was a matter of using git bisect. If you’re not familiar with it, git bisect simplifies the process of tracking down a regression when you have known “good” and “bad” states of your code. You tell git the commits that correspond to those states, and git then walks you through the repository history efficiently, finding the problematic commit in as few steps as possible. There’s a great tutorial here if you want to learn more.
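In practice the bisect session looked roughly like the sketch below; e66958a2e9 is the known-good commit mentioned above, and the test command is the same one used throughout this post.

# Inside the MediaWiki core checkout.
git bisect start
git bisect bad                # the current HEAD, where the test fails
git bisect good e66958a2e9    # the older commit where the test passed
# git now checks out a commit roughly halfway in between; run the failing test...
php tests/phpunit/phpunit.php extensions/Echo/tests/phpunit/DiscussionParserTest.php
# ...report the result, and repeat until git names the first bad commit.
git bisect good               # or: git bisect bad
git bisect reset              # when you are done, return to where you started

If the check can be scripted, git bisect run will automate the whole loop for you.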

Sure enough, with bisect I found the guilty commit at 03908112 in a seemingly innocuous patch, “rdbms: make Database::query() more readable and consistent”.

Just diff it

The next morning, I started poking through the code affected by the core patch. It was pretty opaque to me, so I thought I’d try another technique to see if I could understand the changes by looking at the outputs. In my development environment’s LocalSettings.php file, I added:

$wgMWLoggerDefaultSpi = [
	'class' => MediaWiki\Logger\ConsoleSpi::class,
];

This sends all of MediaWiki’s log output to the console. Then I did two PHPUnit test runs (php tests/phpunit/phpunit.php extensions/Echo/tests/phpunit --filter testSigningDetection). Test run A resulted in failing tests. For test run B, I added a continue; just above $oldVal = $oldUser->getOption( $name ); in BetaFeaturesHooks::updateUserCounts(). For both runs, I piped the output to a text file, then diffed the two files.
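Concretely, the two runs and the diff looked something like this (the log file names are just placeholders):

# Run A: unmodified code; the test fails.
php tests/phpunit/phpunit.php extensions/Echo/tests/phpunit --filter testSigningDetection > run-a.log 2>&1
# Run B: with the continue; added above the getOption() call, so the test passes.
php tests/phpunit/phpunit.php extensions/Echo/tests/phpunit --filter testSigningDetection > run-b.log 2>&1
# Compare the logged activity of the two runs.
diff run-a.log run-b.log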

Diffing the output is a last resort for me, but in this case it helped illuminate the underlying problem. (Public domain)

From the diff it became clear that the timing of when User::loadFromDatabase executes differs between the two test runs. As a result, in the failing run, User::loadOptions runs before User::saveOptions and so doesn’t find the signature needed for the test to pass.

After stumbling around the core database code for a bit, I saw that the logic for setting $this->lastWriteTime had changed, and that restoring the old logic fixed the problem.

While the underlying cause is complicated, the tl;dr is that a subtle change to the PHP code around database read/writes to temporary tables (our integration tests use temporary tables) ended up causing reads in certain test scenarios to fail. There’s a patch up for review that should sort out the problem.

Conclusions

  • Knowing how to use Quibble to reproduce CI build failures on your machine is key to sorting out these types of problems
  • git bisect shines in helping track down problematic commits
  • Diffing PHPUnit console output, while kind of tedious and a bit of a last resort, is a useful tool for identifying cascading changes caused by similar-but-different code
  • Talking with colleagues when debugging is key! Most of this post is reconstructed from instant message chats I had with my Growth Team colleagues Roan and Stephane in the engineer chatroom. Without that discussion, this would undoubtedly have taken longer to pinpoint

Many thanks to Jess Klein, Roan Kattouw, Antoine Musso, Eric Gardner, Timo Tijhof and Ryan Kaldari for proofreading and feedback, and to Aaron Schulz for fixing the bug :)
