Want to take over the Java ecosystem? All you need is a MITM!
Hundreds of incredibly popular and widely deployed Java libraries & JVM compilers are still downloading their dependencies over HTTP with no integrity checking.
What started as a simple vulnerability report for a small project quickly unearthed an industry-wide security vulnerability impacting huge swaths of the Java Virtual Machine (JVM) development ecosystem.
This work builds on the excellent 2014 writeup by Max Veytsman titled: “How to take over the computer of any Java (or Clojure or Scala) Developer”.
Back in 2014, when it was published, the Maven Central Repository, run by Sonatype, didn’t support SSL (HTTPS) for serving JAR files. Thanks to Max’s writeup, Sonatype fixed this within only a few days. I highly recommend that you at minimum skim his writeup before continuing. Even though it’s been 5 years since his writeup was published, his warning still holds true, and now applies to Kotlin and Groovy developers as well.
This time, however, it’s not because of a lack of support for HTTPS by repository hosts; this time it’s because of a widespread single-character typo (http where https was intended) that to this day leaves tens of thousands of open source projects vulnerable.
In his writeup, Max introduced a tool called Dilettante, “a man-in-the-middle proxy that intercepts JARs from [any artifact repository] and injects malicious code into them.” “Proxying HTTP traffic through dilettante will backdoor any JARs downloaded from [the artifact repository].”
Maven Central didn’t support SSL when serving you JARs. Dilettante is a MiTM proxy for exploiting that. …
Dilettante is a simple POC; all it does is cause Java to render a picture of a cat on your screen. But this very simple technique could be used to maliciously compromise huge swaths of the Java ecosystem. The only prerequisite is that the project is downloading its dependencies over HTTP instead of HTTPS.
HTTPS doesn’t just encrypt the traffic between the client and the server; it also provides a cryptographic guarantee that the client is communicating with the server it requested and not a MITM imposter.
How was this discovered?
This research started when I found that in my own build I was using HTTP because of a single artifact.
I figured out the root cause was due to a copy & paste from the Ktor repository. Digging into the history of the Ktor repository I found that until recently Ktor was using HTTP to resolve dependencies. One thing to note is that Ktor is an official JetBrains library. This incident made me curious, so I started looking for it elsewhere.
Long story short: some of the most popular JVM-based projects on GitHub were or still are vulnerable.
Note: Projects marked ‘Done’ as ‘TRUE’ have completely fixed the issue and have either audited or have a CVE number issued for their previous releases.
Here’s a direct link to the Google Sheets for those who are interested.
Insecure Resolution of Dependencies
List of all Open Source projects that were vulnerable to MITM of their dependencies.
In addition to the projects listed above, there are some big communities and organizations that this vulnerability also impacted.
This was the first place I began looking and was unsurprised to immediately start finding this vulnerability in the build infrastructure in almost all of the Minecraft mods I looked at.
Thinking that there was a chance that the Ktor incident wasn’t just a one-off incident, I started looking at the JetBrains GitHub projects.
I found multiple instances across the Kotlin compiler codebase where the build infrastructure and tests were downloading dependencies over HTTP. Not only were the Kotlin compiler’s source dependencies vulnerable, but vulnerable repositories were also used for the Gradle buildscript classpath, leaving open the potential for release artifacts to have been compromised. If this weren’t bad enough, the buildscript classpath was also used to resolve the previous version of the Kotlin compiler, thus leaving the compiler open to the ‘Trusting Trust’ attack (see more on this below).
Not only were IntelliJ and several of the official plugins vulnerable to this, but there were also many cases where the code generators for creating starter projects with IntelliJ generated projects that were vulnerable.
Gradle was an interesting case. As a contributor to Gradle, you would not have been impacted by this vulnerability. However, when Gradle was used to build the Gradle repository on Gradle Inc’s TeamCity CI infrastructure, that infrastructure overrode the defaults to instead use a corporate JFrog Artifactory instance that served artifacts over HTTP. Thankfully, this infrastructure is colocated on the same network as the Gradle JFrog Artifactory server.
That being said, the Gradle corporate JFrog Artifactory server was mirroring other artifact servers over HTTP thus potentially exposing those mirrors to a MITM based cache poisoning attack.
As of writing this, the Elasticsearch repository has 38.6k stars, making it the most starred Java-based project on GitHub. The main Elasticsearch project has had over 1.1k contributors. The test logic in the Elastic build was determined to be vulnerable to this.
I found this vulnerability in several of the Apache Software Foundation projects. The notable ones are listed below.
As of the publication date of this article, the Apache Software Foundation has decided not to issue CVE numbers for the impacted projects even though, in most cases, no audit was performed to determine if these projects were maliciously compromised by this vulnerability.
Apache Groovy, one of the most popular alternative languages for the JVM, was also found to be vulnerable to this. As of writing this, Groovy is the 19th most popular programming language in the world according to the TIOBE index. Similar to Kotlin, the Groovy compiler’s buildscript classpath had dependencies resolved over HTTP. This also left open the potential for release artifacts to have been compromised.
Thankfully, the Groovy compiler is built with a bootstrap compiler written entirely in Java, thus making the potential for the ‘Trusting Trust’ attack very small.
Additionally, the Groovy-Eclipse Plugin was also found to be vulnerable.
Apache Hadoop has over 193 contributors, making it the project with the most contributors. All of those contributors who ran the Hadoop build on their machines could have been compromised by a MITM.
Apache Kafka was originally written by the LinkedIn team to be a fast Event Broker. LinkedIn has used Kafka internally to ingest over 1 trillion messages per day. I found that Kafka’s build system was loading Gradle Plugins over HTTP instead of HTTPS.
Other Apache Projects
The Apache projects that I found to be vulnerable include, but are not limited to, the following: Cassandra, Geode, Storm, Bigtop, Flink, OpenJPA, Royale Compiler & Airavata.
Over 1,000,000 Jenkins users worldwide make Jenkins the most widely-used, open source automation server.
- Jenkins Community Announces Record Growth and Innovation in 2017
Jenkins is used as a self-hosted CI pipeline to automate building and testing software.
Jenkins and many of the official Jenkins plugins ship with dependencies that were downloaded over HTTP.
The first location that I found was in the spring-security-oauth project. The Spring project was the first Maven based project I started looking at, forcing me to establish a whole different search methodology for inspecting Maven POM files with the GitHub search functionality. Once I started looking, I found that this vulnerability existed in many of the other projects under the Spring organization.
The Spring Team responded immediately to the vulnerability and began patching all of their projects. Due to the overwhelming number of Pivotal projects impacted, Pivotal developed a tool to find and replace all uses of HTTP across the repository. That tool can be found here:
spring-io/nohttp on GitHub
This vulnerability also impacted many projects maintained by Red Hat. These projects include but are not limited to Hibernate ORM, RESTEasy and many projects in the WildFly (formerly JBoss) ecosystem.
Similar to Red Hat & the Apache Foundation, this also impacted the Eclipse Foundation projects Vorto, Buildship, Xtext, Orion & BIRT.
This vulnerability additionally impacted a few of Oracle’s open source projects including VisualVM, PGQL, OpenGrok & Helidon.
Testing Libraries and Frameworks
A few very popular JVM testing libraries and frameworks were also vulnerable to this including TestNG, Spock & PowerMock.
Other projects this vulnerability was discovered in include Grails, the Scala module for FasterXML Jackson & Ehcache3. Additionally, I found this vulnerability in the Open Source projects of Netflix, Google, Twitter, the National Security Agency (NSA), Stripe, Gluon (Scene Builder), PortSwigger, Black Duck, Snyk, LinkedIn, and PayPal. Ironically, when this was reported to PayPal’s security team, they closed it as they consider MITM attacks ‘out of scope’ for their HackerOne program.
How Common are MITM Attacks?
My initial research into how common MITM attacks are actually stemmed from my research into maliciously compromising XML parsers that load DTD files over HTTP in order to achieve XXE. More on this topic in a future writeup. What I found was quite alarming.
Internet Service Providers
To my surprise, Internet Service Providers (ISPs) seem to be doing this quite regularly.
- Comcast continues to inject its own code into websites you visit
Bharat Sanchar Nigam Limited (BSNL), an ISP in India, also has a history of injecting ads into its users’ webpages when those pages are loaded over HTTP.
This indicates that the infrastructure already exists and could be re-targeted to impact JAR files.
When malicious actors gain access to a system, they often quickly utilize their new foothold to establish a MITM. Every year, Verizon releases a Data Breach Investigations Report (DBIR), which analyzes the attack vectors that were most commonly exploited that year. This is a quote from one of their reports.
The top three threat action categories were Hacking, Malware, and Social. The most common types of hacking actions used were the use of stolen login credentials, exploiting backdoors, and man-in-the-middle attacks.
To quote from the analysis from this report:
I infer that it’s a secondary action used once somebody has a foothold in the system, but the Dutch High Tech Crime Unit’s data says it’s quite credible for concern. Of the 32 data breaches that made up their statistics, 15 involved MITM actions.
- Answer: Stack Exchange: Are “man in the middle” attacks extremely rare?
MITM attacks should be considered a credible threat in software security.
Over a Public WiFi Connection
Setting aside all of these other attack vectors, any developer working on any of these projects over a public WiFi connection has opened up their computer to the potential for malicious compromise. This attack was demonstrated with the famous Firefox plugin Firesheep.
The potential compromise of a WiFi hotspot can have a significant impact on developers themselves since many developers work over WiFi at coffee shops, at developer conferences, etc. All it would take is a WiFi Pineapple and a dependency that hasn’t already been cached to infect a developer’s machine.
Due to the Snowden Revelations, we now understand the various methodologies used by the US Government’s three-letter agencies to perform MITM attacks against US citizens.
To trick targets into visiting a FoxAcid server, the NSA relies on its secret partnerships with US telecoms companies. As part of the Turmoil system, the NSA places secret servers, codenamed Quantum, at key places on the Internet backbone. This placement ensures that they can react faster than other websites can. By exploiting that speed difference, these servers can impersonate a visited website to the target before the legitimate website can respond, thereby tricking the target’s browser to visit a Foxacid server.
- How the NSA Attacks Tor/Firefox Users With QUANTUM and FOXACID
There has also been a history of internet traffic being routed through foreign countries due to misconfigured BGP routes.
Most Common Repositories Loaded over HTTP
Let’s look at some stats for the most commonly used repositories loaded over HTTP. Please note that these numbers aren’t exact and may be slightly inflated because GitHub’s search functionality is fuzzy in nature.
Maven Central is the most popular artifact server used in the JVM ecosystem and is the default artifact server used by Maven. Maven Central was the first major player in the JVM artifact hosting space.
JCenter is a superset of Maven Central. Developers that publish to JFrog Bintray can request that their artifacts get mirrored here.
JFrog Bintray allows developers to create their own artifact servers for free for open source projects.
Clearly, this is a widespread security vulnerability across both Gradle and Maven projects.
Writing a (Theoretical) Java Library Worm
As a thought experiment, I drafted the idea for abusing this MITM vulnerability to create a Java Library Worm. The results of this thought experiment can be found here.
Let’s write a (theoretical) Java Library Worm
This Article is an addendum to Want to take over the Java ecosystem? All you need is a MITM!
The TL;DR is this:
The consequence of this vulnerability is that a MITM of dependencies used during a release could allow malicious code to compromise the artifacts produced by the build, thus infecting downstream users.
Fixing the Past
For libraries that have already been published, there’s not much that can be done unless these projects’ builds are completely reproducible. For the compilers vulnerable to this (Kotlin), the test libraries that are used to test themselves (Spock and TestNG), and build tools used to build themselves (Gradle), this may be an issue because the chain of trust has been broken. Most compilers are used to compile themselves. For more on this topic see the relatively short paper by Ken Thompson titled “Reflections on Trusting Trust”.
Fixing the Future
I think that build tools like Gradle, Maven, and SBT need to require users to explicitly opt out of using HTTPS to resolve their dependencies. This will force users to make their intention to use an insecure protocol explicit, thus preventing casual typos. I have a proposal open with both Gradle and Maven to implement this functionality. You can find those proposals below. Please go upvote them there!
Automated Auditing of Repositories
The nohttp tool developed by Pivotal looks for all occurrences of HTTP except those that are whitelisted (e.g. XML namespace names). This ensures that HTTP doesn’t cause issues in other places (e.g. Gradle Wrapper location, DTD declarations, etc). It can find HTTP occurrences, replace occurrences, and integrate with a build to ensure that HTTP is not used in the future.
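To make the find-and-replace idea concrete, here is a minimal Java sketch of an allowlist-based rewrite. It is not nohttp’s implementation or API. The class name HttpToHttpsRewriter, the URL pattern, and the two allowlisted prefixes are assumptions chosen for illustration only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: rewrites http:// URLs to https:// unless they are allowlisted. */
final class HttpToHttpsRewriter {

    // Assumed allowlist: prefixes that act as identifiers (XML namespaces), not download locations.
    private static final List<String> ALLOWED_PREFIXES = Arrays.asList(
            "http://maven.apache.org/POM/",
            "http://www.w3.org/");

    // A plain-HTTP URL runs until whitespace, a quote, or an angle bracket.
    private static final Pattern HTTP_URL = Pattern.compile("http://[^\\s\"'<>]+");

    /** Returns the text with every non-allowlisted http:// URL switched to https://. */
    static String rewrite(String text) {
        Matcher m = HTTP_URL.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String url = m.group();
            boolean allowed = ALLOWED_PREFIXES.stream().anyMatch(url::startsWith);
            String replacement = allowed ? url : "https://" + url.substring("http://".length());
            // quoteReplacement keeps '$' and '\' in URLs from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Illustrative POM fragment; repo.example.org is a placeholder host.
        String sample = "<project xmlns=\"http://maven.apache.org/POM/4.0.0\">"
                + "<url>http://repo.example.org/maven2</url></project>";
        System.out.println(rewrite(sample));
    }
}
```

In this sketch the namespace URI is left alone while the repository URL is switched to HTTPS. A blind rewrite like this assumes the host actually serves HTTPS, so each change still deserves a manual check.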
Artifact Hosts Deprecating HTTP January 2020
As the scope of this vulnerability got bigger, I quickly realized that some of the responsibility for this vulnerability fell with the artifact hosts like Maven Central and JCenter. I reached out to the two largest artifact hosts, Sonatype (Maven Central) and JFrog (JCenter), as well as smaller hosts like Pivotal (Spring), The Eclipse Foundation, Jenkins, Red Hat & JetBrains, and asked if they would like to join an initiative to completely block download requests made over HTTP starting January 15th, 2020.
25% of Maven Central downloads are still using HTTP
Soon after this announcement both JFrog and Pivotal informed me that they will be following suit.
Am I vulnerable to this?
Given how many widely used open source projects I found this vulnerability in, I would advise anyone developing software for the JVM to go check their build logic for repositories resolving dependencies over HTTP.
What am I looking for?
For Gradle, you are looking for repository configurations whose URL starts with http:// rather than https://.
Gradle developers may also want to check any init.gradle scripts that are distributed internally at their company but are not normally checked into source control.
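As a rough illustration of what to look for, the sketch below flags quoted http:// URLs in the text of a Gradle script. It is not the snippet embedded in the original post. The class name InsecureGradleRepoScanner, the regular expression, and the repo.example.org host are assumptions made for this example, and a regex over script text will miss URLs that are assembled dynamically.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: flags plain-HTTP URLs in the text of a Gradle build script. */
final class InsecureGradleRepoScanner {

    // Matches a quoted http:// URL, as in: url 'http://...' or url = uri("http://...")
    private static final Pattern HTTP_URL =
            Pattern.compile("[\"'](http://[^\"'\\s]+)[\"']");

    /** Returns every quoted http:// URL found in the script text, in order of appearance. */
    static List<String> findInsecureUrls(String buildScript) {
        List<String> hits = new ArrayList<>();
        Matcher m = HTTP_URL.matcher(buildScript);
        while (m.find()) {
            hits.add(m.group(1));
        }
        return hits;
    }

    public static void main(String[] args) {
        // Illustrative build.gradle content; the host name is a placeholder.
        String sample = String.join("\n",
                "repositories {",
                "    maven { url 'http://repo.example.org/maven2' }    // insecure",
                "    maven { url 'https://repo.example.org/releases' } // fine",
                "}");
        for (String url : findInsecureUrls(sample)) {
            System.out.println("Insecure repository URL: " + url);
        }
    }
}
```

For the sample above, only the first repository is reported. The same check can be pointed at init.gradle scripts, since they use the same repository syntax.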
For Maven-based projects, you are looking for repository configurations whose url element starts with http:// rather than https://.
Maven developers should also check the configuration in their ~/.m2/settings.xml file as that’s normally where credentials for repositories are configured.
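For Maven files, a structural check is possible because the configuration is XML. The sketch below parses a POM or settings.xml string and reports url elements that sit directly under a repository-like element and use plain HTTP. The class name InsecurePomRepoScanner and the choice of parent elements (repository, pluginRepository, snapshotRepository, mirror) are assumptions for illustration, and the sample host is a placeholder.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper: lists plain-HTTP repository URLs in Maven XML. */
final class InsecurePomRepoScanner {

    // Elements whose <url> child names a download location (assumed set for this sketch).
    private static final Set<String> REPO_ELEMENTS = new HashSet<>(Arrays.asList(
            "repository", "pluginRepository", "snapshotRepository", "mirror"));

    static List<String> findInsecureUrls(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Reject DOCTYPE declarations so the scanner never fetches an external DTD.
            // (The string below is a parser feature name, not a URL that gets downloaded.)
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            List<String> hits = new ArrayList<>();
            NodeList urls = doc.getElementsByTagName("url");
            for (int i = 0; i < urls.getLength(); i++) {
                Node url = urls.item(i);
                String value = url.getTextContent().trim();
                // The project homepage <url> has <project> as parent and is skipped.
                if (REPO_ELEMENTS.contains(url.getParentNode().getNodeName())
                        && value.startsWith("http://")) {
                    hits.add(value);
                }
            }
            return hits;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse Maven XML", e);
        }
    }

    public static void main(String[] args) {
        String sample = "<project>"
                + "<url>http://project-homepage.example.org</url>"
                + "<repositories><repository><id>example</id>"
                + "<url>http://repo.example.org/maven2</url>"
                + "</repository></repositories>"
                + "</project>";
        for (String url : findInsecureUrls(sample)) {
            System.out.println("Insecure repository URL: " + url);
        }
    }
}
```

Including mirror in the set is meant to cover settings.xml, where mirrors are declared alongside credentials. Property placeholders and parent POMs are not resolved here, so a clean result from this sketch does not prove a build is safe.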
Additionally, corporate users of JFrog Artifactory or Sonatype’s Nexus should check the configuration of their servers to see if they are mirroring other artifact servers over HTTP.
I’ve reached out to Sonatype and JFrog and asked that in future updates they begin warning their users/admins of insecure configuration.
JFrog has responded that this functionality is now officially part of their roadmap but they don’t have a planned release date yet.
What should I do if I find this vulnerability?
The entire machine (developer machines, build boxes, etc) that executed the potentially malicious jar should be considered potentially compromised. This also means that anything that machine has access to (other projects, credentials, other hosts, etc) should also be considered potentially compromised. Shared artifact caches like the ~/.m2 directories where Gradle and Maven cache artifacts should be considered compromised and should be deleted. To avoid a single vulnerable application exposing every project it builds, it is also best practice to try and isolate your builds.
If you are a maintainer of an open source project that was vulnerable to this, you have a responsibility to your users to either audit previous releases for compromise or file for a CVE number to inform downstream users of the potential for compromise.
Unfortunately, I was only able to contact a small fraction of the projects impacted by this vulnerability. Many of the open source projects you rely upon in your own work may be vulnerable to this. If you are able to do so, consider reaching out to projects you find are impacted to help secure the Java ecosystem for all of us.
If you work on open source software or develop software with any of these build tools commercially, I also highly recommend that you audit your build for the sake of the integrity of the software supply chain.
I want to thank all of the truly awesome & professional security teams and project maintainers at all of the organizations I contacted. There’s absolutely no way that I would have been able to patch all of these locations myself. Several of these teams responded within hours of my report and had begun rolling out fixes throughout their code by the next day. Additionally, I want to thank Snyk for agreeing to be the CNA for all of these CVE reports. I also want to thank Max Veytsman for creating Dilettante. Having a POC made the responsible disclosure process much smoother.
Notes For Further Research
If you happen to be a security researcher and want to find these security vulnerabilities in other Java projects that I may have missed, here are the GitHub search queries I used.
If you do find this issue in a project using these queries, please point them back to this article.