How “The Internet’s Biggest Blind Spot” led to a 15-year-old security vulnerability

Discovering and Disclosing httpoxy

Over the past two weeks, I’ve been coordinating the disclosure of a pretty big and very old security vulnerability. If you’re looking for the technical details, you can head to httpoxy.org, and if you’re looking for a non-technical explanation, you might prefer to read my other Medium story about the issue.

Instead, this is the story of how we discovered it, and my experience with the disclosure process.

Background

Vend is a retail POS, inventory management, ecommerce and customer loyalty system we run as a service. I work on the platform team, helping to ensure our retailers receive a fast, error-free and secure experience; they rely on Vend to keep them selling stuff worry-free, and 24/7.

A pretty big part of the platform team’s remit is to maintain our monitoring and logging systems. Our system for text-based logs, as opposed to events (Kafka) or metrics (statsd), is based on an “ELK” stack: the popular combination of Elasticsearch, Logstash and Kibana.

Discovery

Just over two weeks ago, on an otherwise ordinary Thursday, my colleague Scott Geary was hunting through Kibana dashboards for a clue related to a support ticket we had received.

Instead, Scott found an “interesting” error, and Slacked it to the rest of the team. Here’s what he saw:

GuzzleHttp\Exception\ConnectException: cURL error 7: Failed to connect to proxy: Connection refused (see http://curl.haxx.se/libcurl/c/libcurl-errors.html)

“Hmm, that’s weird,” we thought. We certainly don’t have any microservices called ‘proxy’. And that’s a bare domain name? Huh.

Personally, my mind raced to the Docker Compose environment I had set up a few days earlier. It uses the excellent jwilder/nginx-proxy container to provide an automatic reverse proxy. Some pretty neat magic. “Oh, crap,” I thought, “I’ve somehow managed to deploy this experimental development environment all the way to production!”

A quick grep of the codebase put that notion to bed. Since we had the stack trace, we quickly found a reference to HTTP_PROXY in Guzzle, and it looked like this:

// Use the standard Linux HTTP_PROXY and HTTPS_PROXY if set
if ($proxy = getenv('HTTP_PROXY')) {
    $defaults['proxy']['http'] = $proxy;
}

Cue heads asploding.

Disclosure

At first, we didn’t think this would be anything but a bug in Guzzle. It was serious, it was remotely exploitable, and Guzzle is popular.

For developers who aren’t familiar with PHP (no gloating, please), Guzzle is something of a best-of-breed HTTP client for PHP. It has a nice, “modern PHP” codebase and a solid functional middleware system, which makes it an attractive dependency if you’re writing an SDK or API wrapper. A remotely exploitable vulnerability in Guzzle alone was Bad Enough; we assumed it couldn’t possibly be more widespread than that.

But as we continued to look at the situation, it became apparent that this bug affected a lot more than a single HTTP library, and none of the PHP security advice we could find mentioned HTTP_PROXY specifically. So, we broadened our thinking a bit, and contacted the PHP security mailing list.

PHP’s getenv

I’ll take a brief diversion here to discuss the issue in PHP. Obviously, there is a temptation with a vulnerability like this to blame the library authors who trusted the HTTP_PROXY “environment variable”. I don’t think that’s fair.

A big part of why comes down to the getenv function as implemented in PHP, and in particular how it was documented. There are (or were) a number of problems with the documentation:

  • The function summary is just ‘gets the value of an environment variable’, with no caveats mentioned at all.
  • The function documentation body refers to the CGI RFC only by number and never mentions CGI by name (remember that the body isn’t shown in summarised views, like the one your IDE probably shows you by default).
  • It mentions section 4.1 of the RFC, but without a deep link, because the RFC link goes to faqs.org rather than tools.ietf.org (which allows linking to specific sections).
  • The documentation implies (in a code comment in an example!) that getenv is roughly the same as $_SERVER or other superglobals, but it never explicitly says so. The semantics of $_SERVER are very well understood; PHP programmers know that it can return client-provided values. So, why doesn’t the documentation make the link between the two clearer?

I would expect the getenv documentation to mention “CGI” explicitly, and to say that values returned by getenv are not always “environment variables” set by the programmer, but are often under the control of the end-user via their request headers. The latter could be achieved simply by spelling out the relationship between getenv and the superglobals.
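To make the getenv/CGI link concrete: under CGI, request headers reach your script as meta-variables named by a simple rule in RFC 3875 (section 4.1.18). Upper-case the header name, replace “-” with “_”, and prepend “HTTP_”. The minimal PHP sketch below only simulates that server-side step; the function name and the sample headers are hypothetical, and it does not call getenv itself.

```php
<?php
// Hypothetical helper that models the RFC 3875 (section 4.1.18) rule a CGI
// server applies to each HTTP request header before running your script.
function cgiMetaVariableName(string $headerName): string
{
    // Upper-case the name, turn "-" into "_", and prefix it with "HTTP_".
    return 'HTTP_' . str_replace('-', '_', strtoupper($headerName));
}

// Sample request headers; "attacker.example" is a placeholder host.
$requestHeaders = [
    'User-Agent' => 'curl/7.50.0',
    'Proxy'      => 'attacker.example:8080',
];

// Build the meta-variables the script would see (and getenv() could return).
$metaVariables = [];
foreach ($requestHeaders as $name => $value) {
    $metaVariables[cgiMetaVariableName($name)] = $value;
}

print_r($metaVariables);
```

A client only has to send a “Proxy:” header to end up supplying a value named HTTP_PROXY, the very name the Guzzle snippet reads with getenv.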

Things get worse

Alright, so, at this point, we’ve disclosed to two parties, and we’re feeling a bit bad about the state of security in PHP. It was around this point that we started to find historical mentions of the bug.

We knew all the ingredients of the vulnerability had been around for a long time. As far as we can tell, both CGI and the HTTP_PROXY environment variable reach back twenty-odd years, into the 90s. And so we naturally moved on to the next question: “is this in everything?” GitHub code search was invaluable for providing the answer: no, not everything.

Specifically, that’s when we ran into curl’s mention of the vulnerability in 2001, the Ruby documentation of find_proxy (which mentions the issue), and from there, the original libwww-perl fix. That brought our list of ‘projects that fixed this years ago’ to at least three.

So, it was time to start proving the concept in other places. First up was Go, mainly because it’s of personal interest to me, but also because we use it a lot at Vend. The method was much the same everywhere else we found the issue:

  • See whether the most popular HTTP client library for that language or platform supports HTTP_PROXY without checking whether it’s running under CGI. (Tick for Go’s net/http.)
  • See whether there’s a way to deploy the language under CGI. (Tick for Go’s net/http/cgi.)

And that was really it. No CGI implementation we could find treated HTTP_PROXY in any “special” way; if the above conditions were true, the proof of concept would succeed.
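The CGI check in the first bullet is what separated the vulnerable clients from the ones that had fixed this years ago. Ruby’s find_proxy documentation, for example, says HTTP_PROXY is not used when REQUEST_METHOD is set, and suggests CGI_HTTP_PROXY instead. Here is a minimal PHP sketch of that idea, assuming the environment arrives as an array; proxyFromEnvironment is a hypothetical helper, not part of Guzzle or any other library.

```php
<?php
// Hypothetical helper (not Guzzle's API): choose an HTTP proxy from an
// environment array without trusting HTTP_PROXY under CGI.
function proxyFromEnvironment(array $env): ?string
{
    // RFC 3875 requires CGI servers to set REQUEST_METHOD, so a non-empty
    // value is treated as "this process is handling a CGI request".
    $underCgi = ($env['REQUEST_METHOD'] ?? '') !== '';

    // Outside CGI, HTTP_PROXY really was set by whoever launched the process.
    // Under CGI it may be the client's "Proxy:" header, so fall back to
    // CGI_HTTP_PROXY, a name no request header can map to (header-derived
    // names always start with "HTTP_").
    $name = $underCgi ? 'CGI_HTTP_PROXY' : 'HTTP_PROXY';

    $proxy = $env[$name] ?? '';
    return $proxy !== '' ? $proxy : null;
}

// Usage with hand-built environments ("proxy.internal" is a placeholder).
var_dump(proxyFromEnvironment(['HTTP_PROXY' => 'proxy.internal:3128']));
var_dump(proxyFromEnvironment([
    'REQUEST_METHOD' => 'GET',
    'HTTP_PROXY'     => 'attacker.example:8080',
]));
```

Taking the environment as a parameter keeps the decision testable; a real library would also have to decide how it obtains that environment under each PHP SAPI.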

So, pretty quickly, I had Docker-based proofs of concept for Go’s net/http and Python’s requests, before I had even heard back from the PHP security team. This was quickly becoming a more complex issue to disclose. Over the next couple of hours, two more disclosure emails went out to two more security groups (Python and Go), and then another couple to the security teams of the common CGI (or FastCGI) servers: Apache and nginx.

And then things got better

One of my next points of disclosure was the Red Hat Product Security team. They were extremely helpful: walking me through choosing a suitable timeframe for the embargo, helping with getting CVEs assigned, and even arriving at a provisional CVSS score, so I could easily tell people how bad the vulnerability was before giving them full details.

These are vital steps that help a disclosure run smoothly. They also involve, for good reason, complicated decisions that need justification, so there’s a certain amount of process involved.

By the end of the embargo period, I had sent about 60 full-length emails (in the open-source mailing list “tone” that requires careful preemptive defence against pedantry). The major CDNs were notified. Microsoft had some clarifications and some nice mitigation information to share. And on it went for days.

I’m in NZST, which is UTC+12 for those who don’t deal with us Kiwis regularly. That meant conversations often got started during my early morning, and because I wanted to contribute to the discussion (and try to steer it away from pitfalls), I had a lot of early starts.

Who You Gonna Call?

Naturally, perhaps, as the process continued I found myself wondering whether I was doing a good job. Recent large disclosures offer plenty of examples of what not to do, and plenty of distinct traps you can fall into.

Where is the incident response manual for a vulnerability like this? The simple answer is that there isn’t one, and the less satisfying answer is that there probably couldn’t be one.

Security vulnerability disclosure is hard

httpoxy was a hard case to disclose because of how much affected software we found. But we also had one really good thing going for us: it’s easy to mitigate, even without patched software from upstream.
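The mitigation works because the attack depends on the client-supplied Proxy header becoming HTTP_PROXY: remove that header before it reaches any CGI application and the value never exists. In practice this is done in the web server or load balancer configuration rather than in application code. The PHP sketch below (PHP 7.4+ for the arrow function) only models the filtering step; stripProxyHeader and the sample headers are hypothetical.

```php
<?php
// Hypothetical edge filter: drop the client-supplied "Proxy" header before a
// request reaches any CGI or FastCGI application, so no HTTP_PROXY
// meta-variable can be derived from it.
function stripProxyHeader(array $headers): array
{
    // HTTP header names are case-insensitive, so "proxy" and "PROXY" go too.
    return array_filter(
        $headers,
        fn (string $name): bool => strcasecmp($name, 'Proxy') !== 0,
        ARRAY_FILTER_USE_KEY
    );
}

// Placeholder request headers.
$incoming = [
    'Host'   => 'shop.example',
    'proxy'  => 'attacker.example:8080',
    'Accept' => '*/*',
];

print_r(stripProxyHeader($incoming));
```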

Had that not been the case, things would have been even more complicated. In such situations, you’re much more at the mercy of the developers responsible for the problem: they need to ship a fix before you can responsibly disclose to other affected applications and the like.

So, there are plenty of very hard calls to make. I had it described to me as a triage process, and that metaphor makes a lot of sense. First, do no harm, sure. But, you have to figure out who you can help in a timely manner; you can’t just disclose publicly and go home.

For example, should I report this to CERT myself? Which closed/private mailing lists are still active, trusted, and worth disclosing to? Who has the responsibility of getting CVEs assigned once seven different security teams are coordinating a fix? Do I need to come up with the text of the CVEs myself? What do I do when a vendor asks me for a two week extension of the embargo period? Are people going to yell at me and call it grandstanding if I pick a name for my vulnerability?

I think the complexity of those questions means if we’re looking for a cookie-cutter process to follow, we won’t find it. Doing security disclosure well requires close attention to the details of the issue, and experience to know what can go wrong. Nothing beats the careful eye of someone who has done this before (and especially someone who has done it a dozen times this week). So, thanks again specifically to Kurt Seifried.

For many, it’s just not something they’re familiar with

I have reported a few security issues before, and I had a team of security specialists at Vend helping me out with the answers to a lot of those hard questions. But open source developers are often unpaid, contributing to a community in their spare time. Asking or, worse, expecting a volunteer maintainer to become well-versed in security disclosure norms overnight is begging for trouble. It’s not just unfair and slightly rude; it’s also not going to work well.

I’m well aware of catch-phrases like “security is a process not a feature”, and I’m not advocating that developers abdicate responsibility for releasing secure code and keeping it secure. But the security disclosure process is something else. The number of acronyms alone makes even web development look tame. And we shouldn’t pretend an interest in, say, helping people make HTTP requests translates to anything like an interest in disclosing security bugs.

Security disclosure is also a terrible thing to unleash upon a developer who might be just starting out. “Don’t be a maintainer if you don’t know how to handle security issues” is short-sighted; there’s a categorical difference between knowing how to prevent XSS or SQL injection, and knowing what to do when you’ve found an open-source security vulnerability. One is a basic set of skills I apply every time I’m working on a web app. And the other is something I only do a couple of times a year (and that I don’t expect most people to be doing.)

The open source infrastructure funding issue

There ought to “just” be someone you can call. Until there is, disclosures like the one for httpoxy will remain ad hoc and inconsistent. Some will be handled well, and others won’t.

It’s a hard problem. I’m heartened to see the “open source infrastructure funding” issue being discussed recently (e.g. a great discussion kicked off by Nadia Eghbal). Such discussions often explicitly treat broad cross-project vulnerabilities like httpoxy as a natural outcome of a lack of such funding. I think that’s an interesting idea. You can see how it’s directly applicable when you consider how long this vulnerability was around for, and the way it was fixed in a piecemeal fashion, across two decades, and across a bunch of different open source software.

Why wasn’t it picked up sooner? Why isn’t there someone just watching out for fixes in one language that might affect another? It’s the same issue as “disclosure is hard” behind a different guise: without treating open source infrastructure as an end in itself (and getting someone to fund it well), we’ll continue to be disappointed.

This isn’t quite a tragedy of the commons situation. The commons itself is doing fine; more people than ever are using it, and contributing to it. It’s the lack of institutions that’s holding it back. It’s like we should have funded a fire service by now, but we’re all too busy grazing our flocks on the communal feed to deal with the brush fire in the next field over. And who wants to deal with that anyway? That should be left to the experts. (Although, if we’re going with this metaphor, please, can we skip right past the age of private fire brigades and fire insurance marks?)