Trying to find the perfect storage solution for your Maven artifacts…

Why S3 is not the perfect Maven repository

Philipp “xeraa” Krenn
Feb 24, 2015


At ecosio we were looking for a new way to store our private Maven artifacts. And we found the perfect candidate pretty quickly — Amazon’s Simple Storage Service (S3):

  • Zero maintenance
  • Redundant storage
  • 99.99% availability
  • Built-in authentication and authorization
  • Decently priced at $0.03/GB per month, plus a few cents for requests and free data transfer within an AWS region

So 300GB of artifacts, more than enough for our requirements, would cost roughly $9 per month (300GB × $0.03/GB), call it $10 with the request fees. We were sold!

Since Maven doesn’t natively support S3, we needed a wagon for that, but luckily there are multiple ones available. We started off with the Spring implementation and it worked as expected for our first repository — a simple parent POM defining repositories, dependencies, plugins,… to be used in all other projects. However, that was where we ran into the first major issue: While we could successfully upload the parent POM, we could not download it in the other projects. Very strange — how can you write but not read an artifact?
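For reference, wiring in the wagon looks roughly like this (a minimal sketch using Spring’s aws-maven coordinates; the version and bucket name are illustrative):

    <build>
      <extensions>
        <!-- registers the s3:// protocol for this build -->
        <extension>
          <groupId>org.springframework.build</groupId>
          <artifactId>aws-maven</artifactId>
          <version>5.0.0.RELEASE</version>
        </extension>
      </extensions>
    </build>

    <repositories>
      <repository>
        <id>aws-release</id>
        <url>s3://our-maven-bucket/release</url>
      </repository>
    </repositories>

The wagon reads the AWS keys from a matching <server> entry in settings.xml, with the access key as username and the secret key as password.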

As it turned out, the parent POM (and plugins) must be loaded first, and only afterwards do additional wagons become available — even if the wagon is already defined in the POM. Thus we could only access the parent POM and plugins via the default protocols (HTTP, HTTPS, and FILE in Maven 3), but not S3. Stupid Maven bootstrapping…

This left us with three options:

  1. You can add the S3 wagon as a library to your Maven installation (for example by dropping the wagon JAR into $MAVEN_HOME/lib/ext). Unfortunately, this was not an option for us, since we are using Jenkins as a Service and cannot add libraries to the Maven installation at liberty. Furthermore, we would need to do that for every Maven installation building our projects, and it’s a really ugly hack.
  2. While we don’t want to make our artifacts public and S3 doesn’t support any HTTP-based authentication, there’s a workaround: s3auth
    Basically an HTTP proxy for your S3 resources that supports .htaccess. We tried it, but unfortunately it wasn’t up to our requirements: It requires full access to our artifacts, and we have no idea how secure it is or who can actually gain access. It does not support HTTPS, so your credentials and artifacts are all transmitted in the clear. While not the most common threat, cross-build injection (XBI) is a real thing. Which led us to the final option.
  3. Keep the parent POM and plugins accessible via HTTPS and store all other artifacts on S3. Since our hosted Jenkins provides a pretty limited Maven repository, this was an option. It had its warts though: You are distributing your artifacts over multiple services, and you need to override the <distributionManagement> in every child project (the parent goes to the HTTPS-accessible storage, everything else to the S3-backed one; see the sketch after this list), but it worked.
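The override in a child project would look something like this (again a sketch; the ids and bucket name are made up, while the parent POM itself keeps deploying to the HTTPS-accessible repository):

    <distributionManagement>
      <!-- deploy this project's artifacts to S3 instead of the
           HTTPS repository inherited from the parent -->
      <repository>
        <id>s3-release</id>
        <url>s3://our-maven-bucket/release</url>
      </repository>
      <snapshotRepository>
        <id>s3-snapshot</id>
        <url>s3://our-maven-bucket/snapshot</url>
      </snapshotRepository>
    </distributionManagement>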

At least it did at first. While everything was fine on the command line and in Eclipse, it just wouldn’t work reliably in IntelliJ. My colleagues were horrified:

People are forced to work with Eclipse (!) right now

This was the final straw. After spending multiple days trying to debug this issue and to find a reliable solution, we gave up. We really tried to avoid running a service on our own, because in theory S3 is the perfect solution. However, in practice it just is not.

So we are now running Sonatype Nexus behind nginx and it’s working great for us — alternatively Archiva or Artifactory should get the job done as well. And you’ll get some additional benefits over S3:

  • Snapshot management: S3 won’t prune old snapshots automatically; you’ll need to do that by hand every now and then.
  • Proxying third-party dependencies: If you rely on dependencies hosted on crappy servers (hey SourceForge — I can see you are still only half dead), their unavailability might block your builds. Obviously always at the worst possible moment.

It might be a little less elegant, it will require some maintenance, and it is more expensive, but at least it is working as it should!

PS: If you’re using nginx as a reverse proxy and your artifacts get quite big (when building fat JARs, for example), be sure to raise client_max_body_size (it defaults to a mere 1m) to an appropriate value or the upload will fail.
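A minimal sketch of the relevant nginx configuration; the server name and limit are illustrative, and 8081 is the default Nexus port:

    server {
        listen 443 ssl;
        server_name repo.example.com;

        # nginx rejects request bodies larger than 1m by default,
        # which a fat JAR upload will easily exceed
        client_max_body_size 200m;

        location / {
            proxy_pass http://localhost:8081;  # Nexus listening locally
        }
    }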
