Troubleshooting Unexplained Build Failures: the Mysterious Case of ‘Nothing Has Changed’

Lev Lazinskiy
Published in CircleCI · Mar 28, 2017

On the front lines of support, we very often get users asking why their build has suddenly started to fail when nothing has changed. After nearly two years of fielding these sorts of questions, I can now say with confidence that something had changed in every one of these cases.

To be clear, “nothing changed in my code” may well be a valid statement. But the idea that “nothing changed” at all is almost always false. The whole point of introducing continuous integration into your software development workflow is to test that whatever is changing isn’t breaking your application. In this post, we will explore some of the most common things that change without our noticing, and that are easy to overlook when troubleshooting an unexpected build failure.

Change is the only constant. — Unknown

(I recently learned that people have been misattributing this age-old quote to a Greek philosopher named Heraclitus. In other words, since we cannot decide who actually said it first, even the origin of this quote is constantly changing.)

CircleCI Build Image

We make updates to our build image all the time. We do our best to test each update thoroughly against an array of canary projects, and we give our users advance notice of upcoming updates in the announcements section of our community site. However, despite our best efforts, there are times when the changes we make will break your build. There is no great solution to this, because we have been trying to create a one-size-fits-all image in a world where one size can never fit all. The good news: in our upcoming CircleCI 2.0 platform, users will have full control over their build environment. We hope this means that in the future, we will never be able to break your build again.

Unpinned Dependencies

Dependency management is tough. Each language and framework tries to solve the problem in its own way, which leads to inconsistencies between tools. In addition, modern frameworks have dependency chains that are quite large, making it nearly impossible for a developer to keep track of them. Unpinned dependencies are the most common cause of builds failing when “nothing changed”.

If your language of choice uses the DependencyList.lock model, then you are typically in good shape. If it doesn’t (I’m looking at you, pip and npm) or encourages the use of relative versions (such as software>=1.2), then unfortunately it is only a matter of time before your dependencies cause your build to fail. When you run pip install foo today and pip install foo a month from now, you may well get two different versions of foo unless you explicitly say pip install foo==1.2.3. This problem is only compounded when foo has a dependency graph of 10 other Python modules, some of which may also be dependencies of the bar package that you are also installing unpinned. This is called dependency hell, and as a survivor of dependency hell I can tell you that it’s not a great place to be.

There is no simple solution here. Some strategies to deal with this problem are:

Clean up your circle.yml (and any scripts) to make sure that you don’t have any dangling unpinned dependencies.

Double-check your dependencies: do you actually need them? (For example, that node module you installed 9 months ago looked really neat, but you have not yet used it in your code.)

Dedicate some time specifically to pruning and updating your dependencies on a regular basis.
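Part of that cleanup can be automated. The following is a minimal sketch, not an official CircleCI or pip tool, that flags lines in a requirements.txt-style file which are not pinned to an exact version. The function name find_unpinned and the sample package names are made up for illustration, and the check is deliberately simple: it only understands plain name==version lines and will not handle URL or editable requirements correctly.

```python
import re

# A requirement counts as pinned only if it uses an exact "==" version
# with no wildcard, e.g. "foo==1.2.3". Extras such as "foo[bar]" are allowed.
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[^\]]*\])?\s*==\s*[A-Za-z0-9._!+-]+$")


def find_unpinned(requirements_text):
    """Return the requirement lines that are not pinned to an exact version."""
    unpinned = []
    for raw_line in requirements_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        # Skip blank lines and pip options such as "-r other.txt".
        if not line or line.startswith("-"):
            continue
        # Ignore environment markers ("; python_version < '3'") when matching.
        spec = line.split(";", 1)[0].strip()
        if not PINNED.match(spec):
            unpinned.append(line)
    return unpinned


if __name__ == "__main__":
    # Hypothetical requirements, for illustration only.
    sample = """
    foo==1.2.3
    bar
    software>=1.2
    baz==2.*   # a wildcard still floats
    """
    for requirement in find_unpinned(sample):
        print("unpinned:", requirement)
```

Running something like this as an early build step would turn a floating dependency into an explicit failure at the moment it is introduced, rather than a mysterious one a month later.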

Third-Party Services

If your integration tests rely on reaching out to third-party services, keep in mind that these services change all the time. The change can range from a major, breaking API change that you did not hear about because the newsletter somehow ended up in your spam folder, to simple flakiness with some service that day. The best way to keep your builds consistently reproducible is to reduce dependence on third parties. Mocking requests and external systems is a good strategy. If you cannot accomplish your goals with mocking, then you must build some fault tolerance into your test suite.
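As a rough sketch of both ideas, the example below uses Python’s standard unittest.mock to replace a third-party client, and a small retry helper for tests that really do have to cope with a flaky call. Everything here is hypothetical: fetch_status, the /status path, and the retry helper are illustrative names, not part of any real service’s API.

```python
import time
from unittest import mock


def fetch_status(client):
    """Code under test: asks a (hypothetical) third-party service for its status."""
    response = client.get("/status")
    return response["state"]


def retry(func, attempts=3, delay_seconds=0.0, retry_on=(ConnectionError,)):
    """Call func, retrying on the listed exceptions before giving up."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            # Re-raise once the final attempt has failed.
            if attempt == attempts:
                raise
            time.sleep(delay_seconds)


def test_fetch_status_with_mock():
    # The mock stands in for the real client, so no network call is made.
    client = mock.Mock()
    client.get.return_value = {"state": "ok"}
    assert fetch_status(client) == "ok"
    client.get.assert_called_once_with("/status")


def test_retry_survives_one_flaky_call():
    # First call raises, second call succeeds.
    client = mock.Mock()
    client.get.side_effect = [ConnectionError("flaky"), {"state": "ok"}]
    assert retry(lambda: fetch_status(client)) == "ok"
    assert client.get.call_count == 2
```

The mocked test never touches the network, so it cannot fail because of someone else’s outage. The retry helper is the fallback for cases where mocking is not an option; keeping the attempt count small avoids hiding a service that is genuinely down.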

Changing Credentials and Environment Variables

If the Heroku CLI is telling you that your user is no longer authorized to upload a new version of your app, then it is very likely that your user is no longer authorized to upload a new version of your app.

There is not really much else to say on this topic. Error messages very rarely lie to you. The most common cause is that someone on your team changed something and did not tell you about it.

“Ignore this ticket, looks like someone changed the way we deploy last night.”
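One small safeguard, sketched here under assumptions, is to check for the environment variables a deploy step relies on before running it, so that a removed or renamed variable fails with an obvious message. The variable names in REQUIRED_VARIABLES are placeholders to replace with whatever your own deploy actually needs. Note that this only detects missing or empty values; it cannot tell you that a credential has been revoked or rotated.

```python
import os
import sys

# Placeholder names; replace with the variables your deploy step needs.
REQUIRED_VARIABLES = ["HEROKU_API_KEY", "HEROKU_APP_NAME"]


def missing_variables(required, environ=None):
    """Return the required variable names that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in required if not environ.get(name)]


def main():
    """Report missing variables on stderr and return a process exit code."""
    missing = missing_variables(REQUIRED_VARIABLES)
    if missing:
        print("Missing environment variables: " + ", ".join(missing), file=sys.stderr)
        return 1
    return 0
```

A deploy script could call sys.exit(main()) as its first step, which makes the build stop with a readable message instead of failing somewhere deep inside the deploy tooling.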

Working on Teams Can Be Hard

As your product and company grow, it can become more challenging to keep track of all of the moving parts. Remember last week when you did code review on that PR with 120 new files and said “LGTM”? That’s 120 things that just changed, which may not have had tests associated with them. Now, a week later, those bugs are all grown up and they are not happy.

These are often the easiest tickets for our team to solve and the most frustrating ones for the user. My only advice is to be a considerate teammate and communicate breaking changes to people. This is especially true if you work in a distributed team.

Figuring out why your builds are unexpectedly failing can be pretty frustrating. The good news is that the CircleCI support team is here to help when things like this happen. Just remember, the next time that your build spontaneously fails, it is because something has changed.

Originally published at circleci.com on March 28, 2017.
