Amazon API Gateway: Explaining HTTP Proxy in HTTP API

Jaewoo Ahn
Sep 16, 2021


A year ago, I wrote a post explaining Lambda payload version 2.0 in HTTP API. This long-overdue post explains how HTTP proxy integration behaves differently in REST API and HTTP API, and why it was designed that way. Are they different? Yes, they are.

REST API: HTTP integration vs. HTTP Proxy integration

Let’s revisit REST API first. Do you remember that REST API has both HTTP integration and HTTP Proxy integration? This is what the official documentation says, but is the difference clear to you?

HTTP: This type of integration lets an API expose HTTP endpoints in the backend. With the HTTP integration, also known as the HTTP custom integration, you must configure both the integration request and integration response. You must set up necessary data mappings from the method request to the integration request, and from the integration response to the method response.

HTTP_PROXY: The HTTP proxy integration allows a client to access the backend HTTP endpoints with a streamlined integration setup on single API method. You do not set the integration request or the integration response. API Gateway passes the incoming request from the client to the HTTP endpoint and passes the outgoing response from the HTTP endpoint to the client.

The documentation is trying to say “HTTP proxy is more streamlined, since it requires less configuration”, but it is not clear how the two behave differently at runtime. It also says “you do not set the integration request or the integration response”, but you actually can set the integration request in the HTTP proxy integration.

One difference between them is how many changes you can make to the integration request (the request to your HTTP backend endpoint) and the integration response (the response from your HTTP backend endpoint). With HTTP integration, you can modify the headers, query string, path parameters, and body of the request, and the headers and body of the response. In contrast, with HTTP proxy integration, you can only modify the headers, query string, and path parameters of the request. You cannot change the request body or anything in the response.

In reality, they do more than that under the hood. Some of those behaviors are documented in the known issues, while others are not, but let’s set them aside. Instead, let’s focus on what the ideal behavior of HTTP proxy integration in API Gateway would be and what has changed.

Three common forms of HTTP Intermediaries

RFC7230 defines three common forms of HTTP intermediaries. In short:

  • A “proxy” (a.k.a. “forward proxy”) is a message-forwarding agent that is selected by the client, usually via local configuration rules, to receive requests for some type(s) of absolute URI and attempt to satisfy those requests via translation through the HTTP interface.
  • A “gateway” (a.k.a. “reverse proxy”) is an intermediary that acts as an origin server for the outbound connection but translates received requests and forwards them inbound to another server or servers.
  • A “tunnel” acts as a blind relay between two connections without changing the messages.

Some other API gateway products provide a forward-proxy mode, but it only makes sense when the API gateway runs on-premises. Since Amazon API Gateway does not offer an on-premises option, that doesn’t apply. You don’t want to use Amazon API Gateway as a tunnel either, since it is too expensive for that purpose. So API Gateway should literally work as a “gateway”, and it must/should do what a “gateway” must/should do as defined in the RFC. What must/should it do?

HTTP API was primarily designed to pass requests and responses end to end as transparently as possible while complying with the RFC specification. No more arbitrary pass-through, drop, or remap. So you can say HTTP API’s HTTP proxy integration is a more-transparent HTTP proxy integration. Still, how is it different?

Via header

According to RFC 7230 Section 5.7.1, an HTTP-to-HTTP gateway MUST send a Via header field in each inbound request message and MAY send a Via header field in forwarded response messages. The purpose of the Via header is to reveal the presence of an intermediary: the received-protocol and the received-by.

Let’s imagine the client sends an HTTP/2 request like this.

// Client request
GET /test HTTP/2

Wait, does Amazon API Gateway support HTTP/2? Yes and no. Strictly speaking, API Gateway’s API “endpoint” supports HTTP/2 (meaning you can send an HTTP/2 request to it), while the API Gateway DataPlane does not. The endpoint has to downgrade the request to HTTP/1.1 before passing it to the DataPlane, which can only send HTTP/1.1 requests to the integration.

That’s exactly what the Via header should tell your integration. You can figure out that the request came through AmazonAPIGateway using HTTP/1.1.

// Integration request
GET /test HTTP/1.1
Via: HTTP/1.1 AmazonAPIGateway

Ideally, it would show that the API endpoint received HTTP/2 and that the request was downgraded to HTTP/1.1 via AmazonAPIGateway, but the endpoint does not support the Via header.

// Ideal Integration request
GET /test HTTP/1.1
Via: HTTP/2 abcdef.yourapi.com, HTTP/1.1 AmazonAPIGateway
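
To make the chaining concrete, here is a minimal Python sketch of how each intermediary would append its own entry to Via. The helper name append_via is hypothetical, and this only illustrates the header format used in the examples above; it is not API Gateway's actual code.

```python
from typing import Optional


def append_via(existing_via: Optional[str], received_protocol: str, received_by: str) -> str:
    """Return the Via value after one more intermediary forwards the message.

    Each intermediary appends "<received-protocol> <received-by>" to the
    comma-separated list, so the order reflects the path the message took.
    """
    entry = f"{received_protocol} {received_by}"
    if existing_via:
        return f"{existing_via}, {entry}"
    return entry


# What the integration sees today: only the DataPlane hop is recorded.
today = append_via(None, "HTTP/1.1", "AmazonAPIGateway")

# The ideal case: the endpoint records the HTTP/2 hop before the DataPlane does.
endpoint_hop = append_via(None, "HTTP/2", "abcdef.yourapi.com")
ideal = append_via(endpoint_hop, "HTTP/1.1", "AmazonAPIGateway")

print(today)
print(ideal)
```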

In contrast, the Via header is not mandatory for the response. Most people don’t want to reveal to the client that their API is served by Amazon API Gateway (though “Powered by XXX” was a common sight a decade ago). Thus HTTP API does not add the header to the response.

// Integration response
HTTP/1.1 200 OK

// Client response
HTTP/2 200

Just a reminder: using the Via header for any authorization purpose in your integration is NOT recommended.

User-Agent header

RFC 7231 Section 5.5.3 defines the User-Agent header as containing information about the user agent originating the request. This implies that User-Agent should not be overwritten by an intermediary unless you intentionally want to hide the originator. Even when the client didn’t send a User-Agent header, one should not be added unless you explicitly configure the intermediary to send it.

That sounds natural, right? However, REST API behaves in a funky way here. If you use HTTP integration, the integration request carries the following User-Agent whether or not the original request had a User-Agent header.

User-Agent: AmazonAPIGateway_<YOUR_API_ID>

IMHO, this should have been a Via header instead of an overridden User-Agent header, though you can argue that “HTTP integration” is not a proxy since it doesn’t have “proxy” in its name, which differentiates it from “HTTP proxy integration”. To forward the original User-Agent with HTTP integration, you need to use a parameter mapping that sets the User-Agent header from context.identity.userAgent, or use a request override in the mapping template.

If you use HTTP proxy integration, the integration request carries the original User-Agent header when one is available. However, if the original request doesn’t have the header, API Gateway adds the same header as in HTTP integration.

For both HTTP integration and HTTP proxy integration, when the original request does not have a User-Agent header, there is no way to send an integration request WITHOUT one. Thus there is no way to reflect the original request as is.

In contrast, HTTP API does not perform a magical override or addition, since API Gateway itself does not originate the request. The User-Agent header always reflects the originating request, whether the header is present or not. Of course, you can use a parameter mapping to override it.
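
The three behaviors described in this section can be summarized in a small Python sketch. The function name and the mode labels ("rest-http", "rest-http-proxy", "http-api") are made up for illustration, and the sketch assumes no parameter mapping or request override is configured.

```python
from typing import Optional


def integration_user_agent(mode: str, client_ua: Optional[str], api_id: str) -> Optional[str]:
    """Return the User-Agent the backend would see, or None if no header is sent.

    mode is a hypothetical label, not an API Gateway setting:
      "rest-http"       - REST API, HTTP (custom) integration
      "rest-http-proxy" - REST API, HTTP proxy integration
      "http-api"        - HTTP API, HTTP proxy integration
    """
    gateway_ua = f"AmazonAPIGateway_{api_id}"
    if mode == "rest-http":
        # Always overridden, whatever the client sent.
        return gateway_ua
    if mode == "rest-http-proxy":
        # Original value is kept, but a missing header is filled in.
        return client_ua if client_ua is not None else gateway_ua
    if mode == "http-api":
        # Passed through untouched, including the "no header" case.
        return client_ua
    raise ValueError(f"unknown mode: {mode}")


for mode in ("rest-http", "rest-http-proxy", "http-api"):
    print(mode, integration_user_agent(mode, "curl/7.79.0", "abc123"),
          integration_user_agent(mode, None, "abc123"))
```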

Forwarded header

Before RFC7239 introduced the standard Forwarded header, the non-standard X-Forwarded-* headers (X-Forwarded-For, X-Forwarded-Host, X-Forwarded-Proto, etc.) served as the de facto standard. Those headers are optional, not mandatory. However, they are generally expected when a proxy or a gateway is involved, as they can be used for debugging, statistics, security purposes, etc.

The standard Forwarded header has the benefit of covering several X-Forwarded-* headers at once. For example, these 3 headers:

X-Forwarded-For: 123.34.567.89, 192.0.2.43, [APIGW_IP]
X-Forwarded-Host: apiid.execute-api.us-east-1.amazonaws.com
X-Forwarded-Proto: https

can be represented by a single Forwarded header:

Forwarded: for=123.34.567.89,for=192.0.2.43;by=[APIGW_IP];host=apiid.execute-api.us-east-1.amazonaws.com;proto=https

As the RFC recommends, HTTP API translates incoming X-Forwarded-For headers into a Forwarded header as defined in the RFC, and appends the egress IP as “by”, the endpoint domain name as “host”, and the protocol as “proto”. This eliminates the need to map $context variables into parameters by hand.
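
As a rough illustration of that translation, the following Python sketch builds a Forwarded value in the same shape as the example above. The function names and the sample addresses (taken from the documentation-only IP ranges) are my own assumptions; the real gateway logic may differ in details such as quoting or port handling.

```python
from typing import List


def _node(address: str) -> str:
    """Format an address for a Forwarded pair; RFC 7239 wants IPv6 bracketed and quoted."""
    if ":" in address:
        return f'"[{address}]"'
    return address


def build_forwarded(for_chain: List[str], by_ip: str, host: str, proto: str) -> str:
    """Build a Forwarded header value from a chain of client/proxy addresses.

    Every address becomes its own "for=" element; the gateway's own
    by/host/proto pairs are attached to the last element.
    """
    elements = [f"for={_node(address)}" for address in for_chain]
    own_pairs = f"by={_node(by_ip)};host={host};proto={proto}"
    if elements:
        elements[-1] = f"{elements[-1]};{own_pairs}"
    else:
        elements.append(own_pairs)
    return ",".join(elements)


print(build_forwarded(
    ["198.51.100.7", "192.0.2.43"],  # addresses that arrived via X-Forwarded-For
    "203.0.113.10",                  # hypothetical gateway egress IP
    "apiid.execute-api.us-east-1.amazonaws.com",
    "https",
))
```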

Unfortunately, there is a downside. If there is an intermediary between API Gateway and your backend (e.g. a load balancer) that does not understand the Forwarded header, it will end up adding an X-Forwarded-For (XFF) header. In that case, it is recommended to check both the Forwarded header and the XFF header.

Forwarded: for=123.34.567.89,for=192.0.2.43;by=[APIGW_IP];host=apiid.execute-api.us-east-1.amazonaws.com;proto=https
X-Forwarded-For: [APIGW_IP]

It would be nice if this behavior were configurable (opt in to use XFF instead), but that isn’t available at this time.
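
On the backend side, checking both headers could look like the Python sketch below. It assumes header names have already been lower-cased by your framework and that "for" values contain no commas or semicolons, so it is a simplified reader rather than a full RFC 7239 parser. The function name client_chain is hypothetical.

```python
from typing import Dict, List


def client_chain(headers: Dict[str, str]) -> List[str]:
    """Collect the hop addresses from Forwarded first, then from X-Forwarded-For.

    Forwarded carries the chain recorded up to the gateway; an intermediary
    behind the gateway that only speaks XFF adds the remaining hops there.
    """
    chain: List[str] = []
    for element in headers.get("forwarded", "").split(","):
        for pair in element.split(";"):
            name, _, value = pair.strip().partition("=")
            if name.lower() == "for" and value:
                chain.append(value.strip('"'))
    for hop in headers.get("x-forwarded-for", "").split(","):
        if hop.strip():
            chain.append(hop.strip())
    return chain


print(client_chain({
    "forwarded": "for=198.51.100.7,for=192.0.2.43;by=203.0.113.10;host=api.example.com;proto=https",
    "x-forwarded-for": "203.0.113.10",
}))
```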

Hop-by-Hop headers

Hop-by-hop headers are meaningful only for a single transport-level connection in contrast to end-to-end headers which are transmitted to the ultimate recipient of a request or response.

The obsoleted RFC2616 listed the following hop-by-hop headers:

Connection, Keep-Alive, Proxy-Authenticate, Proxy-Authorization, TE, Trailers, Transfer-Encoding, Upgrade

and the Connection header can list additional headers to be treated as hop-by-hop.

However, RFC7230 removed the explicit list of headers and defers the decision to the Connection header. That means Connection is the only hop-by-hop header by default, and any other header must be listed in the Connection header to be treated as hop-by-hop.

Connection: close, TE, Transfer-Encoding

By definition, hop-by-hop headers should be consumed by the immediate connection only and removed before the request is forwarded downstream. This is a bit complex because the “API endpoint” sits between the client and the API Gateway DataPlane. From the DataPlane’s perspective, the API endpoint is the immediate connection.
The problem is that the API endpoint does not handle hop-by-hop headers correctly. It consumes and removes the Connection header, but it removes neither the headers listed in the Connection header nor the well-known hop-by-hop headers from RFC2616. For this reason, HTTP API consumes and removes the well-known RFC2616 hop-by-hop headers as a last resort.
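
For reference, this is roughly what a well-behaved intermediary would do with hop-by-hop headers, sketched in Python. It combines the RFC7230 rule (drop whatever Connection lists) with the RFC2616 fallback list mentioned above. It is an illustration of the rules under those assumptions, not the actual API Gateway implementation, and strip_hop_by_hop is a hypothetical name.

```python
from typing import Dict

# The fixed list from the obsoleted RFC2616, used here as a fallback.
RFC2616_HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailers", "transfer-encoding", "upgrade",
}


def strip_hop_by_hop(headers: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of headers without hop-by-hop headers.

    Removes the headers named in Connection (the RFC7230 rule) as well as
    the well-known RFC2616 list, comparing names case-insensitively.
    """
    connection = next(
        (value for name, value in headers.items() if name.lower() == "connection"), ""
    )
    listed = {token.strip().lower() for token in connection.split(",") if token.strip()}
    to_drop = RFC2616_HOP_BY_HOP | listed
    return {name: value for name, value in headers.items() if name.lower() not in to_drop}


print(strip_hop_by_hop({
    "Connection": "close, X-Custom-Hop",
    "X-Custom-Hop": "1",
    "TE": "trailers",
    "Accept": "*/*",
}))
```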

RequestId and API Gateway emitted custom headers

Like many AWS services, API Gateway emits its own service-specific headers. A response from REST API should have “x-amzn-requestid”, a UUID-style request id, and “x-amz-apigw-id”, an extended request id. If the response doesn’t contain those headers, the request never reached the API Gateway DataPlane and the response came from somewhere else, such as your proxy or the API endpoint.

HTTP/1.1 200 OK
x-amzn-requestid: eb26c844-eb6b-4321-a1c5-d23350779a57
x-amz-apigw-id: Fw52dGF5vHcFuYg=

The presence of “x-amzn-*” headers in the response discloses that the customer’s API is built on Amazon API Gateway. You may not like having them in the response, but the request id is essential for troubleshooting and operations, for both you and AWS. Okay, agreed, but does it have to be “x-amzn-*”? This is why HTTP API emits a single apigw-requestid header in the response, in the extended request id format.

HTTP/1.1 200 OK
...
apigw-requestid: Fw406iQ_vHcEPIw=

There are a few things to note:

  • For a long time, the x- prefix has been conventionally used for the custom headers. RFC6648 deprecated the ‘x-’ convention for the custom header in the application protocol because of the inconveniences it caused when nonstandard fields became standard.
  • ‘amzn’ has been dropped to make it more generic, not tied to Amazon or AWS.
  • HTTP API uses the apigw- prefix for any custom header emitted by API Gateway. API Gateway reserves the prefix for its own use. Any “apigw-” headers in the incoming request or integration response are ignored and dropped.

The header(s) are not passed to the integration request by default, except when you’re using Lambda proxy integration, which passes the request id as part of requestContext. In many cases you’ll want API Gateway’s request id, since it makes it easier to correlate requests between API Gateway and your backend integration. If you want to pass the request id to the integration for tracing purposes, you must configure an explicit parameter mapping with $context.requestId.
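
From the client's point of view, the header names in this section can be used to tell which flavor of API Gateway answered, or whether the DataPlane was reached at all. The Python sketch below uses hypothetical function names and labels; only the header names and the reserved prefix come from the text above.

```python
from typing import Dict, Optional, Tuple


def gateway_request_id(response_headers: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Return (api_flavor, request_id), or None if no gateway request id is present.

    None suggests the response never passed through the API Gateway DataPlane.
    """
    lowered = {name.lower(): value for name, value in response_headers.items()}
    if "apigw-requestid" in lowered:
        return ("http-api", lowered["apigw-requestid"])
    if "x-amzn-requestid" in lowered:
        return ("rest-api", lowered["x-amzn-requestid"])
    return None


def drop_reserved_headers(headers: Dict[str, str]) -> Dict[str, str]:
    """Mimic the reserved-prefix rule: headers starting with apigw- are dropped."""
    return {name: value for name, value in headers.items()
            if not name.lower().startswith("apigw-")}


print(gateway_request_id({"apigw-requestid": "Fw406iQ_vHcEPIw="}))
print(gateway_request_id({"x-amzn-requestid": "eb26c844-eb6b-4321-a1c5-d23350779a57",
                          "x-amz-apigw-id": "Fw52dGF5vHcFuYg="}))
```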

CORS headers

CORS headers are one example of a possible conflict between integration response headers and headers emitted by API Gateway. What if API Gateway CORS is configured but the integration response also contains CORS headers?

When HTTP API was in beta, this is what the documentation described.

… For a CORS request, API Gateway adds the configured CORS headers to the response from an integration.

Note
If your backend integration explicitly returns CORS headers, those headers take precedence and override headers in the CORS configuration.

When it became GA, there was a breaking change that reversed this.

Note
If you configure CORS for an API, API Gateway ignores CORS headers returned from your backend integration.

The rationale behind this is simple: when you enable CORS in API Gateway, you want API Gateway to control CORS, so its configuration takes precedence over the CORS headers in the integration response regardless of integration type. If you want to honor the CORS headers from the integration, just disable the API Gateway CORS configuration.
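
The GA precedence rule can be pictured with the following Python sketch. It deliberately simplifies: it treats the CORS configuration as a ready-made dict of response headers and ignores origin matching and preflight handling, which the real service also performs. The names merge_cors and CORS_RESPONSE_HEADERS are mine.

```python
from typing import Dict, Optional

# Standard CORS response header names, lower-cased for comparison.
CORS_RESPONSE_HEADERS = {
    "access-control-allow-origin",
    "access-control-allow-methods",
    "access-control-allow-headers",
    "access-control-expose-headers",
    "access-control-max-age",
    "access-control-allow-credentials",
}


def merge_cors(integration_headers: Dict[str, str],
               configured_cors: Optional[Dict[str, str]]) -> Dict[str, str]:
    """Apply the GA rule: configured CORS wins, backend CORS headers are ignored.

    With no CORS configuration (None), the backend's headers pass through as is.
    """
    if configured_cors is None:
        return dict(integration_headers)
    merged = {name: value for name, value in integration_headers.items()
              if name.lower() not in CORS_RESPONSE_HEADERS}
    merged.update(configured_cors)
    return merged


backend = {"Content-Type": "application/json", "Access-Control-Allow-Origin": "*"}
print(merge_cors(backend, {"access-control-allow-origin": "https://example.com"}))
print(merge_cors(backend, None))
```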

Compressed Payload

What if the request payload is compressed (content-encoded)?

REST API has always decompressed the payload before sending it to the integration, so the integration backend always receives the decompressed payload. This sometimes leads to unexpected behavior when you’re close to API Gateway’s 10MB payload limit or Lambda’s 6MB payload limit. If the client sends a 7MB compressed payload, you might think it’s still under API Gateway’s limit. However, it can grow to more than 10MB after decompression, at which point API Gateway checks the payload size again and rejects it. For Lambda, a binary payload needs to be base64 encoded, which bloats the size, so even a compressed 1–2MB binary payload can hit the 6MB limit.
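
The base64 bloat is easy to quantify: every 3 bytes become 4 characters. The Python sketch below works through that arithmetic. It assumes the 6MB limit means 6 * 1024 * 1024 bytes and ignores the JSON envelope around the payload, so treat the numbers as a rough guide only.

```python
import base64

# Assumption: the 6MB Lambda payload limit counted as 6 MiB.
LAMBDA_LIMIT_BYTES = 6 * 1024 * 1024


def base64_size(raw_bytes: int) -> int:
    """Size in bytes of the padded base64 encoding of raw_bytes bytes."""
    return 4 * ((raw_bytes + 2) // 3)


def fits_lambda_limit(decompressed_bytes: int) -> bool:
    """True if a decompressed binary payload still fits after base64 encoding."""
    return base64_size(decompressed_bytes) <= LAMBDA_LIMIT_BYTES


# Sanity check of the formula against the standard library.
assert len(base64.b64encode(bytes(1000))) == base64_size(1000)

MIB = 1024 * 1024
for size in (4 * MIB, 5 * MIB):
    print(size, base64_size(size), fits_lambda_limit(size))
```

So a payload that decompresses to about 5MB already exceeds the limit once encoded, no matter how small it was on the wire.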

Since the payload is always decompressed, the Content-Encoding header must be dropped or set to identity in the integration request headers, but HTTP/Lambda non-proxy integration does not handle this correctly.

In contrast, HTTP API does not decompress the payload at all, so the payload and the Content-Encoding header are passed as is to the integration. HTTP API still has the 10MB limit, but a compressed 10MB payload should also be fine for HTTP proxy integration.
In other words, however, there is no way to decompress the request payload or compress the response payload in HTTP API at this time, so you must handle it within your integration backend.
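
Handling it in the backend can be as simple as the Python sketch below, which decodes the request body based on Content-Encoding. It covers only gzip and deflate and assumes a single encoding per request; the function name decode_body is hypothetical and your framework may already offer an equivalent.

```python
import gzip
import zlib
from typing import Dict


def decode_body(headers: Dict[str, str], body: bytes) -> bytes:
    """Decompress a request body according to its Content-Encoding header."""
    encoding = next(
        (value for name, value in headers.items() if name.lower() == "content-encoding"),
        "identity",
    ).strip().lower()
    if encoding in ("", "identity"):
        return body
    if encoding == "gzip":
        return gzip.decompress(body)
    if encoding == "deflate":
        # HTTP "deflate" is the zlib container format.
        return zlib.decompress(body)
    raise ValueError(f"unsupported Content-Encoding: {encoding}")


compressed = gzip.compress(b'{"hello": "world"}')
print(decode_body({"Content-Encoding": "gzip"}, compressed))
```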

Summary

Here’s a summary of the changes:

  • HTTP API adds/appends Via header
  • HTTP API translates X-Forwarded-* headers into the standard Forwarded header, then appends it to the integration request
  • HTTP API does not touch User-Agent header
  • HTTP API removes Hop-by-Hop headers before forwarding the request to the integration
  • HTTP API reserves ‘apigw-’ prefix for the API Gateway specific custom headers. ‘apigw-*’ headers in the incoming request and integration response would be ignored and dropped
  • HTTP API renamed the request id header to ‘apigw-requestid’
  • HTTP API ignores/drops CORS header in the integration response once API CORS configuration is enabled
  • HTTP API passes the compressed payload and Content-Encoding header as is

p.s. I wrote this to the best of my personal knowledge, but it may contain inaccurate information. If you find any, feel free to let me know!
