External Software Dependencies

Alex Meng
Sep 24, 2014

When you build and deploy your software several times a day, bottlenecks and pain points become obvious fairly quickly. In a continuous delivery pipeline, a failed build is a problem that should be taken seriously and addressed immediately. When the failure is caused by a third-party system, the obvious step is to cut out your reliance on that system. One of the easiest dependencies to eliminate is a software package repository like PyPI, RubyGems, or Berkshelf. These systems are often community run and hosted, so the occasional outage is not unheard of. When these systems are offline, your pipeline should be unaffected.

Caching dependencies is a great way to strengthen the resilience of your builds. There are several ways to do this, each with its own trade-off.

Vendoring

Vendoring a package means storing its code in your version control system, so that your software ships with its dependencies included. Many package management tools can install from a local cache, avoiding network calls altogether. If you’ve cached all of your Python packages from PyPI, for example, the pip command line tool supports a --find-links argument:

pip install --no-index --find-links=file:///path/to/cache/ -r requirements.txt
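Populating that cache in the first place can be scripted. As a minimal sketch, assuming your dependencies are pinned in a requirements.txt, pip download fetches each package archive (along with its transitive dependencies) into the vendored directory:

# Download all pinned dependencies into the local cache
# for later offline installs.
pip download -d path/to/cache/ -r requirements.txt

Committing the resulting archives makes the offline install above work on any machine with a checkout.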

One disadvantage of this method is that dependency code has to live in your source control, which can make things like searching your code base a bit messy. Code review also becomes more difficult, since a diff that includes vendored code is much more cumbersome to review.

A great way to get around this is to compress the packages. Keeping all of your third-party packages compressed makes for smaller, easier-to-review diffs. Take a look at this pull request for example:

It is clear that a package is being upgraded, but because GitHub suppresses binary file diffs, all the code within the package is kept outside the scope of the pull request.
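As a sketch of what that looks like locally, upgrading a vendored, compressed package (the package name and versions here are hypothetical) reduces to a two-file binary change:

# git sees only the swapped archives, not thousands of changed lines.
git status --short
D  vendor/requests-2.3.0.tar.gz
A  vendor/requests-2.4.1.tar.gz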

Caching

Another great alternative for managing dependencies is to cache the HTTP requests themselves. This method is a bit more involved, but might be worth the time and cost if your team builds and deploys often. Tools like angry-caching-proxy let you serve third-party libraries from stored copies of previous downloads.
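As a sketch, assuming such a proxy is running at cache.internal:8000 (the hostname and port are hypothetical), pointing pip through it is a one-flag change:

# Route package downloads through the caching proxy so repeat
# builds are served from its stored copies.
pip install --proxy http://cache.internal:8000 -r requirements.txt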

Unfortunately, most package management tools will still need to go online to discover the address of their packages, which means that if the index service is down, your builds will still fail. Using something like squid to cache all requests to entire domains means you can also cache those discovery calls and keep your builds running smoothly. To take full advantage of this approach, all of your systems, including developers’ machines, need to pull dependencies from the cache service.
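A minimal squid sketch along these lines (the domain list, cache size, and object limits are assumptions, not a tuned configuration) might look like:

# /etc/squid/squid.conf (excerpt)
# Cache both package downloads and index discovery calls
# for the repository domains.
acl package_repos dstdomain .pypi.python.org .rubygems.org
http_access allow package_repos
cache_dir ufs /var/spool/squid 10000 16 256
# Package archives are large; raise the per-object cache limit.
maximum_object_size 200 MB
# Hold fetched archives for up to a week before revalidating.
refresh_pattern -i \.(tar\.gz|whl|gem)$ 1440 80% 10080

Note that HTTPS repository traffic can’t be cached this way without additional TLS interception (squid’s ssl_bump), which is its own project to set up.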

Conclusion

A great build system should be idempotent: running the same build twice should produce the same result, regardless of what is happening on the wider internet. Having a build fail because a third-party repository is down can be incredibly frustrating.

This is especially true if that outage completely blocks your deploys. Keeping your deployments and builds resilient makes for a better overall developer experience, and allows your deployments to be truly continuous.
