Using Containers to Learn Nginx Reverse Proxy

Rosemary Wang
9 min read · Jul 22, 2017


As a beginner to nginx and its reverse proxy capabilities, I did not know where to start or how to make sense of it all. To break it down, I decided to build my own reverse proxy container and explore one of its quirks. I did not actually have network connectivity when testing this (on a plane!) so I had to experiment with it locally with Docker. Happily, it worked — no need for instances spun up in the cloud!

Ironically, I started this blog while flying to Seattle…which was very cloudy. So I guess I did actually spin my instances up in the clouds after all.

What is a reverse proxy?

I refer to Wikipedia on this one:

In computer networks, a reverse proxy is a type of proxy server that retrieves resources on behalf of a client from one or more servers. These resources are then returned to the client as if they originated from the web server itself.

I thought of a reverse proxy like a courier. Couriers (the ones that zip through the city on bikes) retrieve a set of packages and deliver them as quickly and efficiently as possible, as if the sender had actually delivered it themselves.

Why use a reverse proxy when it is cloudy?

By “cloudy”, I mean running a set of applications on a cloud, whether public or private. This question made me think and research a bit. A good article I found from 2012 outlines that a reverse proxy serves as:

  1. Load Balancer
  2. Security for application tiers (the request does not hit the application directly)
  3. Single Point of Authentication, Logging, & Audit
  4. Static Content Server
  5. Cache
  6. Compressor
  7. URL Re-writer

With all of these functions covered, it makes a lot of sense to use a reverse proxy to handle my requests. In the cloud, applications are fairly dynamic and it can be difficult to track where applications are logging, the kind of authentication they are using, and more. A reverse proxy alleviates that burden.

How is a reverse proxy different from service discovery?

I wondered if I was thinking about this wrong. I came to the conclusion that service discovery solves a different problem than a reverse proxy. Recalling my previous experimentation with service discovery, I remembered that service discovery is about proactively registering new services in a cloud environment, enabling services to resolve each other in a very dynamic way. nginx acts on a registry of services rather than being the discovery and registration mechanism itself. Another component has to assume the responsibility of changing the reverse proxy configuration.

How do I configure nginx as a reverse proxy?

nginx has many functions, including serving as an HTTP server. Besides serving requests, nginx can read a configuration file you create to direct where requests should go, thereby acting as a reverse proxy. A simple example would be a test application that runs on port 80 by default. However, I want my users to go to one endpoint for all of my applications. Furthermore, I want to make sure that if one server is down, requests move to the next available server (load balancing). This can be accomplished with the upstream directive.
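
The configuration itself was embedded as a gist in the original post and is not shown here. Below is a minimal sketch of what it looked like, reconstructed from the breakdown that follows; the upstream name test, port 8080, and the upstream address in the log format match the rest of this post, but the exact file is my reconstruction:

```nginx
worker_processes auto;

events {
  worker_connections 1024;
}

http {
  # Custom log format that also records the upstream server address,
  # which is useful for debugging where requests are proxied.
  log_format upstreamlog '$remote_addr - $remote_user [$time_local] '
                         '"$request" $status $upstream_addr '
                         '"$http_referer" "$http_user_agent" "$gzip_ratio"';

  # Group of backend servers; "test" resolves via Docker's DNS.
  upstream test {
    server test:80;
  }

  server {
    listen 8080;
    access_log /var/log/nginx/access.log upstreamlog;

    location /hello/ {
      proxy_pass http://test/;
    }
  }
}
```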

Breaking it down:

  • worker_processes sets how many nginx worker instances to run. To accommodate more load, it is recommended to set it to auto (one per core).
  • worker_connections outlines how many connections each worker can process at one time. Here is a reference that talks about worker_connections in more detail.
  • log_format adds specific fields to the logs. When working with extra directives, it helps to put in custom fields for debugging. In particular, I wanted to output my upstream server address.
  • upstream defines a group of servers that can be referenced by specific directives, including proxy_pass. Each server entry under upstream points at one of the backend servers. Separately, a server block brings up the nginx web server itself and tells it which port to listen on (in my case, 8080).
  • There are also location directives that contain information about how the request should be proxied. Namely, proxy_pass has the protocol and address of where the proxy should go.

Let’s see an example.

For ease of use, I created a Docker image with the nginx reverse proxy configuration outlined above. I called the image reverseproxy.
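
The Dockerfile for the reverseproxy image is not shown in this post; a minimal sketch of what it might look like (the base image tag is an assumption, chosen to match the nginx/1.13.1 version that appears in an error page later):

```dockerfile
# Sketch: official nginx base image plus the reverse proxy
# configuration from above. The tag is an assumption.
FROM nginx:1.13
COPY nginx.conf /etc/nginx/nginx.conf
EXPOSE 8080
```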

I also created a Docker Compose file that spins up reverseproxy and test, my application.
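
The Compose file itself is not embedded in this version of the post; a sketch of what it might contain (the Compose version and image names are assumptions, while the service names match the container names seen in the logs below):

```yaml
# Sketch of the Compose file: only the reverse proxy is exposed
# externally; the test application is reachable only on the
# internal network.
version: "3"
services:
  reverseproxy:
    image: reverseproxy
    ports:
      - "8080:8080"
  test:
    image: test
```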

An example output of my test application is below:

# curl test:80
Hello World!
# curl test:80/another?user=joatmon08
joatmon08 says Hello!
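
The test application's code is not included in the post. A minimal standard-library Python sketch that would produce the output above (the real application may well have been built differently):

```python
# Sketch of the test application, reproducing the curl output above.
# This is an assumption; the original application's code is not shown.
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        parsed = urlparse(self.path)
        if parsed.path == "/":
            body = "Hello World!\n"
        elif parsed.path == "/another":
            user = parse_qs(parsed.query).get("user", ["anonymous"])[0]
            body = f"{user} says Hello!\n"
        else:
            self.send_error(404)
            return
        payload = body.encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):
        pass  # keep the console quiet

def serve(port=80):
    # Inside the container this would be serve(80)
    HTTPServer(("0.0.0.0", port), HelloHandler).serve_forever()
```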

In the Docker Compose file, I am only exposing my reverseproxy for external access on port 8080. When I access the nginx reverse proxy with the /hello/ path from localhost:8080, I get the “Hello World!” served from my test application. If I access my API at /hello/another, it returns my message with a user.

$ curl localhost:8080/hello/
Hello World!
$ curl localhost:8080/hello/another?user=joatmon08
joatmon08 says Hello!

nginx proxies my request from itself to my application. Both configurations return the same output. Now I can add a different application to another nginx location, such as /goodbye.
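
As a sketch, adding a second application under its own location might look like the following (the goodbye service and upstream names are hypothetical):

```nginx
upstream goodbye {
  server goodbye:80;   # hypothetical second application
}

server {
  listen 8080;

  location /hello/ {
    proxy_pass http://test/;
  }

  location /goodbye/ {
    proxy_pass http://goodbye/;
  }
}
```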

When I look at my nginx reverse proxy logs (in this case, via docker logs), I see that my access to the API via curl has been logged. On a server with nginx, you can find this in your nginx access.log.

$ docker logs nginxtest_reverseproxy_1
172.19.0.1 - - [22/Jul/2017:00:14:22 +0000] "GET /hello/ HTTP/1.1" 200 172.19.0.3:80 "-" "curl/7.43.0" "-"
172.19.0.1 - - [22/Jul/2017:00:14:54 +0000] "GET /hello/another?user=joatmon08 HTTP/1.1" 200 172.19.0.3:80 "-" "curl/7.43.0" "-"

It even logs my upstream server, 172.19.0.3:80. test resolves to this IP address and port! The IP address is actually my application’s container (see my previous discussion on container networking for more details).

What happens when I have multiple instances linked to my upstream server URL?

I scaled up my test application instance to two. This means that when I am trying to access http://test, I could go to one of two containers with different IP addresses.

$ docker-compose up -d --scale test=2
nginxtest_reverseproxy_1 is up-to-date
Starting nginxtest_test_1 ... done
Creating nginxtest_test_2 ...
Creating nginxtest_test_2 ... done

Just to check that the internal URL http://test goes to two instances, I run the dig command in a separate container attached to the same network.

$ dig test

; <<>> DiG 9.9.5-3ubuntu0.2-Ubuntu <<>> test
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 64355
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;test. IN A
;; ANSWER SECTION:
test. 600 IN A 172.19.0.3
test. 600 IN A 172.19.0.4
;; Query time: 0 msec
;; SERVER: 127.0.0.11#53(127.0.0.11)
;; WHEN: Fri Jul 21 02:43:21 UTC 2017
;; MSG SIZE rcvd: 62

I have two entries in the answer section. When I try to access my test application again through my reverse proxy, I check if the upstream server resolves to one of those two IP addresses.

$ docker logs nginxtest_reverseproxy_1
172.19.0.1 - - [22/Jul/2017:00:16:49 +0000] "GET /hello/ HTTP/1.1" 200 172.19.0.3:80 "-" "curl/7.43.0" "-"

The answer is yes, it resolves to the same one as before, namely 172.19.0.3.

What happens when one of the application servers fails?

I am curious to know what happens when I remove 172.19.0.3. nginx should route requests to 172.19.0.4, since test should send the request to the server that is still alive. I delete 172.19.0.3, which is nginxtest_test_1. Then I run dig again to confirm that the DNS record for my test application now only resolves to 172.19.0.4.

$ dig test

; <<>> DiG 9.9.5-3ubuntu0.2-Ubuntu <<>> test
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 35920
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;test. IN A
;; ANSWER SECTION:
test. 600 IN A 172.19.0.4
;; Query time: 0 msec
;; SERVER: 127.0.0.11#53(127.0.0.11)
;; WHEN: Sat Jul 22 00:47:28 UTC 2017
;; MSG SIZE rcvd: 42

Let me try to access my reverse proxy again on localhost:8080.

$ curl localhost:8080/hello/
<html>
<head><title>502 Bad Gateway</title></head>
<body bgcolor="white">
<center><h1>502 Bad Gateway</h1></center>
<hr><center>nginx/1.13.1</center>
</body>
</html>

WHAT?! WHAT HAPPENED!? I got a 502 Bad Gateway!

While I can reach 172.19.0.4, nginx can’t reach 172.19.0.4. Or maybe…it is not even trying to access it at all. Maybe if I restart the reverse proxy container, nginx will pick up the remaining IP address.

$ docker restart d2
d2
$ curl localhost:8080/hello/
Hello World!

It resolves to 172.19.0.4 now, which is the IP address of the remaining container.

$ docker logs nginxtest_reverseproxy_1
172.19.0.1 - - [22/Jul/2017:00:18:22 +0000] "GET /hello/ HTTP/1.1" 200 172.19.0.4:80 "-" "curl/7.43.0" "-"

As it turns out, nginx caches the IP address it first resolves to through upstream and does not refresh the cache (at least in the open source version).

What are the consequences if the reverse proxy does not re-resolve to a new IP address?

To be honest, I am not sure the upstream directive is intended for a dynamic DNS record in the first place. In the nginx samples, they use the upstream directive to load balance between a set of IP addresses. Usually, upstream is used for:

  • Weighted load balancing between a group of servers.
  • Failover: if a connection to one server errors out, nginx moves on to the next one. If all of them error out, the connection is closed.
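
These behaviors map onto parameters of the upstream server entries. A sketch with hypothetical addresses:

```nginx
upstream test {
  # Weighted round-robin: the first server receives roughly 3x the
  # traffic of the second (the addresses here are hypothetical).
  server 172.19.0.3:80 weight=3;
  server 172.19.0.4:80 weight=1;
  # Mark this server unavailable for 30s after 2 failed attempts;
  # requests then fail over to the remaining servers.
  server 172.19.0.5:80 max_fails=2 fail_timeout=30s;
}
```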

I probably should use a different nginx configuration that explicitly declares my container IP addresses.

When I tested the above, nginx handled my load balancing for me. I removed the container at 172.19.0.3 and nginx just retried on 172.19.0.4.

$ docker logs nginxtest_reverseproxy_1
172.19.0.1 - - [22/Jul/2017:00:21:49 +0000] "GET /hello/ HTTP/1.1" 200 172.19.0.3:80 "-" "curl/7.43.0" "-"
2017/07/22 00:21:53 [error] 7#7: *8 connect() failed (113: No route to host) while connecting to upstream, client: 172.19.0.1, server: , request: "GET /hello/ HTTP/1.1", upstream: "http://172.19.0.3:80/", host: "localhost:8080"
172.19.0.1 - - [22/Jul/2017:00:21:53 +0000] "GET /hello/ HTTP/1.1" 200 172.19.0.4:80, 172.19.0.4:80 "-" "curl/7.43.0" "-"

If you are using a URL as an upstream server, you probably have a load balancer in front of it anyway. By choosing to use the upstream directive with a URL (or load-balanced DNS record), you run the risk of the nginx reverse proxy not re-resolving the IP address if your load balancer’s IP address changes. Generally, you’ll face the problem above with:

  • Public cloud load balancers
  • Docker embedded DNS server
  • Any other dynamic load balancers

How do I get nginx to re-resolve the IP address when using a dynamic load balancer?

Fortunately, this issue with open source nginx is well described by many bloggers. Below is a quick configuration summarizing what they have recommended:
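
The gist with that configuration is not embedded in this version of the post; here is a sketch of the recommended approach (the /hello/ location and test service match the earlier setup, while the rest is my reconstruction):

```nginx
worker_processes auto;

events {
  worker_connections 1024;
}

http {
  server {
    listen 8080;

    location /hello/ {
      # Docker's embedded DNS server; cached answers expire after 5s
      resolver 127.0.0.11 valid=5s;

      # Storing the endpoint in a variable forces nginx to re-resolve
      # the name at request time instead of caching it at startup
      set $upstream_endpoint http://test:80;

      # proxy_pass with a variable does not trim the location prefix,
      # so rewrite strips /hello/ before the request is proxied
      rewrite ^/hello/(.*)$ /$1 break;
      proxy_pass $upstream_endpoint;
    }
  }
}
```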

Basically, do away with the upstream directive. Use the resolver directive instead (notice that the DNS server it resolves against is the Docker embedded DNS!). I also realized that I had to set the upstream endpoint as a variable, which forces nginx to re-resolve the name whenever the resolver’s cached entry expires (every 5 seconds here).

A very important note: You’ll need to add the rewrite directive in order to pass the right URI. Without it, my URI is not passed correctly and I get a 404 Not Found.

$ curl localhost:8080/hello/
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<title>404 Not Found</title>
<h1>Not Found</h1>
<p>The requested URL was not found on the server. If you entered the URL manually please check your spelling and try again.</p>

nginx needs the trailing “/” on the proxy_pass directive to trim the URI. However, when proxy_pass is set from a variable, nginx ignores that trimming! It is key to include the rewrite directive if you want your URI to go to the right place.

Does the new nginx reverse proxy configuration work?

I wanted to test if it works with the same setup…

  1. Create a reverse proxy container (reverseproxy).
  2. Create my application container (test).
  3. Issue a call through my reverseproxy to my application.
  4. Scale up my application containers to two.
  5. Remove my first application container (nginxtest_test_1).

At the end of this, I get a pretty neat result. After removing my first application container at 172.19.0.3, I tried to issue a call to my application endpoint.

$ curl localhost:8080/hello/
Hello World!
$ curl localhost:8080/hello/another?user=joatmon08
joatmon08 says Hello!

Unlike before, I did not get a 502 Bad Gateway error. Just to check that my nginx reverse proxy resolves appropriately, I look at my nginx logs.

$ docker logs nginxtest_reverseproxy_1
172.19.0.1 - - [22/Jul/2017:00:34:37 +0000] "GET /hello/ HTTP/1.1" 200 172.19.0.3:80 "-" "curl/7.43.0" "-"
172.19.0.1 - - [22/Jul/2017:00:35:02 +0000] "GET /hello/ HTTP/1.1" 200 172.19.0.4:80 "-" "curl/7.43.0" "-"

Notice that I did not need to restart the nginx container. nginx re-resolved the IP address of my application to 172.19.0.4!

In summary…

Trying to test out this quirk of nginx resolvers pushed me to learn more about how nginx behaves and what all the directives mean. Furthermore, using containers to reproduce this behavior made learning it far easier; I found containers to be an excellent learning tool that let me explore and test. It was a great way to practice breaking down a bug into consumable and testable parts, decoupling function from technology and infrastructure.

