CGI, our next jump — rust-httpd

Published in Adventures in Rust

May 2, 2017

If you’ve been following our stories closely, you would know that we had planned to implement the following three features for our little Rust web server.

Serve static files
CGI
Reverse Proxy

With static file serving completed, implementing the CGI (Common Gateway Interface) Protocol was next on our plate.

For those who do not know what CGI is, it’s an age-old protocol for serving dynamic content on the web. Think Apache executing a PHP or a Ruby script and talking back to clients with dynamic responses. CGI is the intermediary protocol between the web server and the application scripts (cgi scripts) aimed at enabling anyone to write dynamic scripts to work with any web server (that implements the protocol).

The big question:

Why would we want to implement the CGI protocol now, when it’s such an old standard?

The answer is simple: every major web server has implemented it :D

With that solid rationale, we plunged in.

The CGI Spec

The CGI protocol is specified in great detail by the IETF RFC 3875. This was the first time we had read an entire RFC and to be fair it was not that long. The precision used and the way they don’t let any sentence or jargon remain unexplained is quite fascinating.

We particularly liked the use of BNF (Backus-Naur Form) to describe the protocol.

A short excerpt from the RFC:

AUTH_TYPE      = "" | auth-scheme
auth-scheme    = "Basic" | "Digest" | extension-auth
extension-auth = token

That’s how the spec defines the structure of one ‘meta-variable’ (we’ll discuss meta-variables a little later in this post).

No ambiguity whatsoever.

After reading the CGI spec, we arrived at a solid understanding of the protocol including details like —

How the server picks which script to execute.
How the HTTP request is made available to the CGI script.
How the script is executed.
How the response from the script is processed.

and much more.

We realised that we would not be able to match the specification word by word. We took a pragmatic approach since this is a learning project and not a server intended for production use.

Armed with this knowledge we took on the implementation side of things.

Flavors of CGI

Before starting to code, we studied how Apache and Nginx do CGI. We learned that the protocol has been extended over the years, resulting in a few flavors of CGI.

The basic CGI protocol as defined by RFC 3875 involves the web server executing the CGI scripts as child processes and making available the request details (requested path, query string, accepted content types, content length etc.) to them through environment variables. This means for every incoming request, there will be a new child cgi-script process spawned.

Then came FastCGI and SCGI, which add a separate CGI server that keeps cgi-script processes alive between requests. No new process has to be spawned per request, which improves performance greatly.

Communication between the CGI server and the web server happens over a Unix domain socket or a TCP socket. FastCGI specifies a binary protocol for this communication and is the fastest of the available flavors. SCGI is similar to FastCGI, but it is easier to implement and a little less performant.

So which flavor do we go with? We opted for the basic, plain CGI protocol because, again, our focus is not on building a production web server but on building a functional one and learning a lot of Rust along the way.

A small detour, worth knowing

Before we could start working on the CGI protocol implementation, we had one piece of the puzzle missing.

Until now we had not read the incoming request in full, and we wanted to.

After glancing through RFC 7230, we learnt the format of an HTTP message:

HTTP-message   = start-line
                 *( header-field CRLF )
                 CRLF
                 [ message-body ]

It has a start-line followed by a list of header fields, each terminated by CRLF (\r\n).
The head of the message ends with an empty line, i.e. a CRLF on its own, so the headers and the body are separated by \r\n\r\n.

A sample HTTP request gives a better picture:

POST /cgi/hello_ruby HTTP/1.1 (Implicit \r\n)
User-Agent: Mozilla/4.0 (compatible; MSIE5.01; Windows NT) (Implicit \r\n)
Host: ******** (Implicit \r\n)
Content-Type: application/x-www-form-urlencoded (Implicit \r\n)
Content-Length: length (Implicit \r\n)
Accept-Language: en-us (Implicit \r\n)
Accept-Encoding: gzip, deflate (Implicit \r\n)
Connection: Keep-Alive (Implicit \r\n)
(Implicit \r\n) ---- END OF HEADER ----
licenseID=string&content=string&/paramsXML=string

And this is the Rust code we used to read the request until the start of the message body —
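A minimal sketch of the idea, for illustration only and not the exact code from the rust-httpd repo. It assumes nothing beyond the standard library; the function name read_head and the 1024-byte chunk size are made up for this example. It keeps reading until it sees the blank line (\r\n\r\n) and also hands back any body bytes that arrived in the same read.

```rust
use std::io::{self, Read};

/// Reads from `stream` until the blank line (`\r\n\r\n`) that ends the header
/// section. Returns (head including the blank line, body bytes already read).
fn read_head<R: Read>(stream: &mut R) -> io::Result<(Vec<u8>, Vec<u8>)> {
    let mut buf = Vec::new();
    let mut chunk = [0u8; 1024];
    loop {
        let n = stream.read(&mut chunk)?;
        if n == 0 {
            return Err(io::Error::new(
                io::ErrorKind::UnexpectedEof,
                "connection closed before end of headers",
            ));
        }
        // The terminator may straddle two reads, so re-scan from three
        // bytes before the newly appended data.
        let start = buf.len().saturating_sub(3);
        buf.extend_from_slice(&chunk[..n]);
        if let Some(pos) = buf[start..].windows(4).position(|w| w == b"\r\n\r\n") {
            let rest = buf.split_off(start + pos + 4);
            return Ok((buf, rest));
        }
    }
}
```

Because the function is generic over Read, it works on a TcpStream as well as on an in-memory std::io::Cursor, which is handy for trying it out.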

If you go through our code, you’ll find that we use the httparse crate for the parsing job.

Breaking down The Meta-Variables

Now that we have the HTTP request in hand and we know how to execute the CGI scripts, we will need to figure out how to pass on the request details to the CGI script.

The CGI spec, as we saw before, makes use of environment variables to pass on various details about the request to the script.

The spec defines a number of meta-variables. We decided to implement the following:

AUTH_TYPE: the authentication scheme taken from the Authorization HTTP header (for example Basic or Digest).
CONTENT_LENGTH: the HTTP Content-Length header.
CONTENT_TYPE: the HTTP Content-Type header.
SERVER_NAME & SERVER_PORT: the domain / IP address and the port to which the client sent the request, e.g. localhost and 8888 in development mode.
REMOTE_ADDR: the IP address of the client sending the request.
REMOTE_HOST: the host name of the client if its IP address can be mapped to one; otherwise the IP address itself.
REQUEST_METHOD: the HTTP method (GET, POST, ...).
SCRIPT_NAME: the name of the CGI script being executed.
PATH_INFO: the part of the request URI that follows the portion locating the CGI script. For example, in the URI http://localhost:8888/cgi/blog/posts/1, PATH_INFO would be posts/1, which identifies an application-specific resource to be interpreted by the individual CGI script.
QUERY_STRING: the query string sent in the request URI.
SERVER_PROTOCOL: the protocol name and version, e.g. HTTP/1.1.
SERVER_SOFTWARE: the name and version of the server software. rust-httpd 0.1 in our case :)
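To make SCRIPT_NAME, PATH_INFO and QUERY_STRING concrete, here is a small sketch of splitting a request target. It is an illustration under assumptions, not the project's code: the CgiTarget struct, the split_target function and the rule "the first path segment after the /cgi/ prefix names the script" are all hypothetical. The sketch keeps the leading slash on PATH_INFO, which is how the grammar in RFC 3875 defines it, so for the example URI it yields /posts/1.

```rust
/// Pieces of a request target that feed SCRIPT_NAME, PATH_INFO and QUERY_STRING.
#[derive(Debug, PartialEq)]
struct CgiTarget<'a> {
    script_name: &'a str,
    path_info: &'a str,
    query_string: &'a str,
}

/// Splits a request target such as `/cgi/blog/posts/1?page=2`.
/// Returns `None` when the target does not name a script below `prefix`.
fn split_target<'a>(target: &'a str, prefix: &str) -> Option<CgiTarget<'a>> {
    // Everything after the first `?` is the query string.
    let (path, query_string) = match target.find('?') {
        Some(i) => (&target[..i], &target[i + 1..]),
        None => (target, ""),
    };
    let rest = path.strip_prefix(prefix)?;
    // Assumption: the first segment after the prefix names the script.
    let script_len = rest.find('/').unwrap_or(rest.len());
    if script_len == 0 {
        return None;
    }
    let split_at = prefix.len() + script_len;
    Some(CgiTarget {
        script_name: &path[..split_at],
        path_info: &path[split_at..],
        query_string,
    })
}
```

Calling split_target("/cgi/blog/posts/1?page=2", "/cgi/") gives a script name of /cgi/blog, a path info of /posts/1 and a query string of page=2.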

We then wrote a function that took in the HTTP request struct and a few other details and returned a vector of tuples, Vec<(&str, &str)>, where the first element of each tuple holds the name of the meta-variable and the second element holds its value.
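A rough sketch of what such a function could look like. The Request struct, its fields and the function name meta_variables are hypothetical stand-ins, not the types from the repo, and the values are owned Strings rather than &str simply to keep lifetimes out of the example.

```rust
use std::net::SocketAddr;

/// Hypothetical, simplified view of an already parsed request.
struct Request<'a> {
    method: &'a str,
    version: &'a str, // e.g. "HTTP/1.1"
    headers: Vec<(&'a str, &'a str)>,
    script_name: &'a str,
    path_info: &'a str,
    query_string: &'a str,
}

impl<'a> Request<'a> {
    /// Case-insensitive header lookup; an absent header yields "".
    fn header(&self, name: &str) -> &'a str {
        self.headers
            .iter()
            .find(|(key, _)| key.eq_ignore_ascii_case(name))
            .map(|(_, value)| *value)
            .unwrap_or("")
    }
}

/// Builds the meta-variables as (name, value) pairs.
fn meta_variables(
    req: &Request<'_>,
    local: SocketAddr,
    peer: SocketAddr,
) -> Vec<(&'static str, String)> {
    // AUTH_TYPE is only the scheme: the first word of the Authorization header.
    let auth_type = req
        .header("Authorization")
        .split_whitespace()
        .next()
        .unwrap_or("");
    vec![
        ("AUTH_TYPE", auth_type.to_string()),
        ("CONTENT_LENGTH", req.header("Content-Length").to_string()),
        ("CONTENT_TYPE", req.header("Content-Type").to_string()),
        ("SERVER_NAME", local.ip().to_string()),
        ("SERVER_PORT", local.port().to_string()),
        ("REMOTE_ADDR", peer.ip().to_string()),
        // No reverse DNS lookup in this sketch, so fall back to the IP address.
        ("REMOTE_HOST", peer.ip().to_string()),
        ("REQUEST_METHOD", req.method.to_string()),
        ("SCRIPT_NAME", req.script_name.to_string()),
        ("PATH_INFO", req.path_info.to_string()),
        ("QUERY_STRING", req.query_string.to_string()),
        ("SERVER_PROTOCOL", req.version.to_string()),
        ("SERVER_SOFTWARE", "rust-httpd 0.1".to_string()),
    ]
}
```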

We now had the meta-variables ready. The next step was to execute the CGI script from the Rust program and pass all of these as environment variables.

TcpListener’s incoming() vs accept()

One of the meta-variables that needed to be sent to the CGI scripts was REMOTE_ADDR, which is the IP address of the client sending the request.

We were using std::net::TcpListener’s incoming() function to accept new connections to our TCP socket. It would be used this way:
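A minimal sketch of that pattern, for illustration rather than the exact code from the repo. The function name serve_with_incoming and the handle callback are assumptions made for this example.

```rust
use std::io;
use std::net::{TcpListener, TcpStream};

/// Accepts connections forever using the iterator returned by `incoming()`.
fn serve_with_incoming(
    listener: &TcpListener,
    mut handle: impl FnMut(TcpStream),
) -> io::Result<()> {
    // `incoming()` never yields `None`; each item is an `io::Result<TcpStream>`,
    // so only the stream itself is available here.
    for stream in listener.incoming() {
        handle(stream?);
    }
    Ok(())
}
```

A caller would bind a listener first, for example with TcpListener::bind("127.0.0.1:8888"), and pass a closure that reads the request and writes the response.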

But the incoming() function did not give us the IP address of the client.

Going back to TcpListener’s docs, we figured out that the alternative accept() method gave us access to a SocketAddr, which holds the IP address of the remote client.

But the accept() method works a bit differently from incoming(): it does not return an Iterator, so it needs to be wrapped in a loop {} to keep accepting connections forever. Here’s what we have now:
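Again as a sketch only (serve_with_accept and the handle callback are names invented for this example), the accept() version could look like this:

```rust
use std::io;
use std::net::{SocketAddr, TcpListener, TcpStream};

/// Accepts connections forever, passing each stream and the peer's address
/// to `handle`.
fn serve_with_accept(
    listener: &TcpListener,
    mut handle: impl FnMut(TcpStream, SocketAddr),
) -> io::Result<()> {
    loop {
        // `accept()` blocks until a client connects, then returns the stream
        // together with the client's socket address (IP and port).
        let (stream, peer) = listener.accept()?;
        handle(stream, peer);
    }
}
```

Inside the callback, peer.ip().to_string() gives the value for REMOTE_ADDR.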

And we now had access to the client’s IP and we were able to set the REMOTE_ADDR variable properly.

std::process::Command

Once we had read the request and prepared the meta-variables from it, all we had to do was find the script to be executed and run it.

From the docs we found out that Rust has a struct, std::process::Command, with a beautiful interface that lets you spawn processes from your Rust program.

It follows the builder pattern and lets you specify so many aspects of child process spawning like the command line arguments, environment variables, stdin, etc. in an expressive manner.

Here is a simple example where we run the ls command —

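use std::process::Command;
fn main() {
    // Run `ls -l -a`, wait for it to exit and capture its output.
    let output = Command::new("ls")
        .arg("-l")
        .arg("-a")
        .output()
        .expect("ls command failed to start");
    println!("{}", String::from_utf8_lossy(&output.stdout));
}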

In our case, we had to set a ton of environment variables (you’d do that using the .env(name, value) function). Since, as per the builder pattern, each of these calls returns a mutable reference to the same Command, we were able to chain them like so:
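A minimal sketch of such a chain, not the exact code from the repo. The function name cgi_command and its parameters are hypothetical; .env() and .envs() are the standard Command methods for setting one or many variables.

```rust
use std::process::Command;

/// Builds (but does not run) the command for a CGI script, exporting every
/// meta-variable as an environment variable of the child process.
fn cgi_command(script_path: &str, meta_variables: &[(&str, &str)]) -> Command {
    let mut command = Command::new(script_path);
    command
        // Single variables can be chained one call at a time...
        .env("SERVER_SOFTWARE", "rust-httpd 0.1")
        // ...and a whole list of (name, value) pairs can be added at once.
        .envs(meta_variables.iter().copied());
    command
}
```

Calling .output() on the returned Command then runs the script and collects its stdout, just like in the ls example above.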

With our background as Ruby developers, Rust’s syntax most of the time doesn’t sit well with us. But the Command API is something we enjoyed working with, I must say.

Alright, all we had to do now was run the scripts, get their output and send it back to the client. With that, our simple CGI server implementation was done.
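To illustrate that last step, here is a rough sketch of turning a script's output into an HTTP response. It is an illustration under assumptions, not the code from the repo: the function names are invented, redirects (the Location header) are ignored, and no Content-Length is added, so the connection has to be closed after the body. What it relies on from RFC 3875 is that a script prints header lines, then a blank line, then the body, and that an optional Status header selects the response status.

```rust
/// Splits raw script output at the first blank line into (header block, body).
/// Scripts may end lines with "\n" or "\r\n", so both are accepted.
fn split_head(output: &[u8]) -> (&[u8], &[u8]) {
    for i in 0..output.len() {
        if output[i..].starts_with(b"\n\n") {
            return (&output[..i], &output[i + 2..]);
        }
        if output[i..].starts_with(b"\n\r\n") {
            return (&output[..i], &output[i + 3..]);
        }
    }
    // No blank line found: treat everything as headers with an empty body.
    (output, &output[output.len()..])
}

/// Turns raw CGI script output into the bytes of an HTTP/1.1 response.
fn cgi_output_to_http(output: &[u8]) -> Vec<u8> {
    let (head, body) = split_head(output);
    let mut status = String::from("200 OK");
    let mut headers = Vec::new();
    for line in String::from_utf8_lossy(head).lines() {
        let line = line.trim_end();
        match line.split_once(':') {
            // A Status header picks the status line; it is not forwarded itself.
            Some((name, value)) if name.eq_ignore_ascii_case("Status") => {
                status = value.trim().to_string();
            }
            Some(_) => headers.push(line.to_string()),
            None => {} // not a header line; skipped in this sketch
        }
    }
    let mut response = format!("HTTP/1.1 {}\r\n", status).into_bytes();
    for header in &headers {
        response.extend_from_slice(header.as_bytes());
        response.extend_from_slice(b"\r\n");
    }
    response.extend_from_slice(b"\r\n");
    response.extend_from_slice(body);
    response
}
```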

We have even included a very silly, non-functional Ruby blog app in our repo for fun.

What’s missing?

Although we now have a functional CGI server in our hands, there are a few things that we still need to take care of.

Protocol-specific headers: The CGI spec talks about setting a variety of protocol-specific meta-variables from request headers (the HTTP Expires, ETag, Origin, etc.). We have not done this yet. We’re planning to dig into what Apache provides to its CGI scripts and try to mimic it (again, if it seems interesting enough for us to do).
Request message body: With the current implementation, if your HTTP POST request has a body, it is not made available to the CGI script on stdin as the spec dictates. This is up next on our TODO list and we’ll get around to it.
Redirect handling: The spec talks a great deal about handling different types of responses and redirect directives. We have not taken a good look at this yet, so it is on our TODO list as well.
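For the request message body item, one possible approach is to pipe the body into the child process. The sketch below is only an assumption about how this could be done with std::process, not something the project implements yet; the function name run_with_body is made up.

```rust
use std::io::{self, Write};
use std::process::{Command, Output, Stdio};

/// Runs `command`, feeding `body` to the child's stdin, and collects its output.
fn run_with_body(mut command: Command, body: &[u8]) -> io::Result<Output> {
    let mut child = command
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;
    // Taking the handle lets it drop right after the write, which closes the
    // pipe so the script sees end-of-input.
    if let Some(mut stdin) = child.stdin.take() {
        // Fine for small bodies; a large body would need to be written from a
        // separate thread to avoid blocking while the child fills its stdout.
        stdin.write_all(body)?;
    }
    child.wait_with_output()
}
```

The Command passed in would be the one already carrying the meta-variables, with CONTENT_LENGTH set to the length of the body.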

Note —

There was yet another lifetime issue that took us a really long time, and help from the Rust community, to get past. We’ll elaborate on that in our next article.

Thanks for reading. We will keep writing about our project and its development. Stay tuned, and do check out our project on GitHub.

P.S
If you’re looking out for my Partner-in-Science, Preethi Kumar is her!