Jenkins build boost

Dec 8th, 2013 By Tomas Bezdek

In the last couple of years agile development has become very popular across the industry. Who wouldn’t like agile? You abandon obsolete and protracted methodologies like waterfall or prototyping and deliver code in short release cycles that are less likely to introduce major bugs. But honestly, when you develop a piece of functionality, do you really want to wait until the end of the sprint to see your code in production? Well, at Gumtree we don’t like waiting. We want to be able to deliver new functionality and a better experience to our customers not within weeks or days, but within hours. And to achieve that, we need a release process that is not only fully automated but also extremely fast.

One of the key problems we were struggling with was the build pipeline in our CI server (Jenkins). The pipeline is configured to build our whole stack of projects and has about 25 jobs, some of them running in parallel. Passing through the pipeline usually took 45 to 60 minutes, and sometimes even longer when multiple pipelines were executed simultaneously. Surprisingly, building most of the pipeline’s jobs locally and sequentially didn’t usually take more than 20 minutes, which gave me nightmares and eventually made me look closer at what happens during the build and what was causing the slowness.

Good logging is essential when you are trying to track down application issues. Jenkins is no exception, and the default installation provides pretty good logging for job executions. Unfortunately the logs don’t contain any timestamps, which makes revealing bottlenecks quite difficult. But there is a solution: the Timestamper plugin, which can prefix each log entry with either the system time or the time elapsed since the job was triggered. Each format is handy in a different situation; I’ve usually used elapsed time to compare builds of different durations and see where the difference comes from.

Internal networks and proxies

With timestamps in place, the first delay was very easy to discover. Simply by following the sequence of timestamps on each line, I noticed a long delay when executing the jaxws-maven-plugin. Our CI infrastructure runs on a private network with no direct access to the Internet, and communication with outside networks is handled by a proxy server. We are obviously no amateurs and Maven was configured to honor the proxy, but we didn’t realize that this plugin executes a separate JVM and doesn’t automatically pass the proxy settings on to it. It took a while to figure out the correct configuration, but I succeeded and Maven no longer timed out for 30 seconds each time the plugin was executed. Later on, the generated classes were moved to a separate project outside the pipeline and build time improved again.
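
For reference, what the forked JVM is missing are the standard Java proxy system properties; it won’t pick them up from the proxy section of Maven’s settings.xml, so they have to reach the fork explicitly through whatever VM-argument option your version of the plugin offers. The sketch below uses hypothetical host names and only shows which properties are involved and a rough way to confirm the timeouts are gone.

    # Standard JVM proxy properties (host and port are hypothetical); these
    # have to end up on the forked JVM's command line, because the fork does
    # not inherit the proxy configured in Maven's settings.xml:
    #   -Dhttp.proxyHost=proxy.internal  -Dhttp.proxyPort=3128
    #   -Dhttps.proxyHost=proxy.internal -Dhttps.proxyPort=3128
    #   -Dhttp.nonProxyHosts="localhost|*.internal"

    # Before/after check: time the affected build and watch the timestamped
    # console log for the repeating ~30 second pauses.
    time mvn clean package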

Artifact publishing and fingerprinting

No matter how useful a feature fingerprinting might be, choose carefully which jobs actually need it, and don’t let those jobs keep too long a build history. Otherwise it can easily happen that many gigabytes of your Jenkins master are occupied by a couple of million small fingerprint files, and when the fingerprint cleanup job is executed, the heavy I/O load makes the master pretty unresponsive. Even if the GUI itself still behaves quite ok, it will certainly slow down slave jobs trying to send files back to the master. In our case, fingerprinting was just a leftover from an obsolete plugin and could be safely uninstalled.

I came across fingerprinting while trying to solve a different issue: copying build artifacts back to the master for archiving sometimes took ages. Fingerprinting was the reason, but after talking to other developers and site operations it turned out that, since we store all our artifacts in Nexus, we don’t really need them on Jenkins, so archiving could be turned off as well.
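
If you suspect the same thing on your master, a quick look at the disk usually settles it. The commands below assume the default JENKINS_HOME layout, where fingerprint records and archived artifacts live.

    # How many fingerprint records is the master carrying, and how big are they?
    find "$JENKINS_HOME/fingerprints" -type f | wc -l
    du -sh "$JENKINS_HOME/fingerprints"

    # Archived artifacts copied back from the slaves sit under each build;
    # list the biggest offenders.
    du -sh "$JENKINS_HOME"/jobs/*/builds/*/archive 2>/dev/null | sort -h | tail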

As I mentioned at the beginning, building the projects locally took much less time than running them in Jenkins. But even after resolving the previous two blockers, Maven still reported per-project build times significantly higher than on my localhost. There were no more obvious delays during the build, so I logged in to a slave, manually checked out the longest-running project and executed the build by hand to see whether the problem was in the job configuration or somewhere else. The pure Maven build took longer than running the same command in my local environment, and I finally realized the problem was not the job setup but the performance of the slave itself. Reducing the number of slaves and assigning the freed resources to the remaining machines made the pipeline perform much better again.
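
A simple way to separate job configuration from machine performance is to run the same build by hand in both places and compare; nothing below is specific to our setup.

    # On the slave, then on a developer machine: same project, same command.
    time mvn clean install -DskipTests

    # While it runs on the slave, check whether the box is starved for
    # CPU, memory or I/O.
    uptime
    vmstat 5 5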

Unfortunately, reducing the number of slaves resulted in jobs having to queue when multiple pipelines are built simultaneously. This still has to be fixed, but since not all jobs require powerful hardware, the solution is easy. We just need to sacrifice one of the big slave machines, turn it into several smaller instances and assign jobs to instances using labels.

After speeding up the Maven build itself, there were still mysterious differences when building the same job multiple times. When Jenkins finishes the Maven build phase, it executes publishers, reporters and notifications. In our case the publishers were the issue.

Besides deploying jar artifacts to Nexus, we also build deb packages and use the SCP plugin to copy them to an apt-like repository, from which they are later used for deploying to the QA and production environments. Each environment used to have a separate repository, so packages had to be copied twice. This was solved by building another pipeline which (in a nutshell) takes care of deploying the modified packages to a pre-release environment, running regression tests and, if everything passes, copying those packages to the production repository. However, this wasn’t the main problem with the repositories, and to make things more interesting, each repository had a different kind of problem.

The first problem was caused by two independent and misconfigured puppet instances. The first, a long-forgotten and locally configured instance, was set to keep the repository as the home directory of user jenkins. The second, newer and remotely configured instance was trying to keep it as the home directory of user repo. Every now and then, when the puppets checked the permissions of the users’ home directories, they discovered that the ownership didn’t match and decided to restore law and order in their jurisdiction. To achieve this, puppet runs the usermod command, which also changes the ownership of every file in the given directory. Unfortunately, with the directory containing ~12k files and being mounted over NFS, this caused incredible I/O load and made uploads of new files utterly slow.
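
To get a feel for why those corrections hurt, it helps to see what a recursive ownership change over NFS actually costs on a directory of that size; the path and user below are hypothetical.

    # Count the files puppet touches on every "correction"...
    find /mnt/apt-repo -type f | wc -l     # roughly 12k files in our case

    # ...and time a recursive chown over NFS, which pays a metadata round
    # trip per file and competes with uploads for the same mount.
    time chown -R repo:repo /mnt/apt-repo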

The second problem, much easier to discover and solve, was caused by a bash script which extracted all the packages, indexed their metadata and created a genuine apt repository that we used to install packages with plain apt-get update/install. Doing this over ~120 GB of packages every time you upload a new one is something you really want to avoid. Luckily we didn’t have to look for a more efficient way of indexing, because we had just replaced the last service using this repository with a more efficient solution, so we could simply remove the script from cron.
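
A flat apt repository of this kind is typically indexed with something like the command below (the path is hypothetical, and this is only a sketch of the idea, not our actual script). It rescans every .deb it finds on each run, which is exactly why rebuilding the index after every upload was so expensive.

    # Rebuild the package index for a flat apt repository
    # (dpkg-scanpackages walks every .deb under the given directory).
    cd /srv/apt-repo
    dpkg-scanpackages . /dev/null | gzip -9 > Packages.gz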

Conclusion

Not only when you’re experiencing issues with your CI environment, but every now and then, take a look at the Build Time Trend of your jobs, and if you see build durations increasing, spend some time investigating why that happened and whether it’s really necessary. Also, when introducing new features to the build, spend a few seconds thinking about the other steps and consider whether they are still needed. Most of the problems mentioned above didn’t cause a significant delay on their own, but given the number of jobs in the pipeline, the overall result was amazing. The pipeline is now usually built in less than 15 minutes, and the only problem when building multiple pipelines simultaneously is a lack of slaves.

The next step in speeding up the pipeline is using the parallel build introduced in Maven 3. Test runs seem very promising: about 7 minutes from pulling the trigger to the last artifact being copied to the repository. Groovy, let’s roll it out, what are we waiting for? Unfortunately we are still using plugins which are not compatible with parallel builds and sometimes make them fail. But we work hard when we don’t play, and hopefully we will be able to fix or replace all the incompatible plugins soon.
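
For reference, the parallel mode is just a command-line switch in Maven 3; the thread counts below are examples, not what we’ll necessarily settle on.

    # Build modules in parallel with a fixed number of threads...
    mvn -T 4 clean install

    # ...or scale with the number of cores on the build slave (1 thread per core).
    mvn -T 1C clean install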

Thank you for reading, I hope it was useful, and if you have any tips on how to increase build speed in Jenkins, please share them in the comments.

Originally published at www.gumtree.com on December 8, 2013.
