Demystifying Ruby Applications, Ruby Application Servers, and Web Servers
This article is meant to provide new developers with a mental model of web application architecture. Platforms are constantly changing, and it is not enough to be proficient in the technical requirements of a single development platform. Without an understanding of the underlying architecture of web applications, skill in one platform will not transfer to skill in another platform. This is due to a lack of first principles.
To put it another way, proficiency with one web app development platform is not the same as being a proficient web app developer. The latter requires a robust mental model of what web applications do in general, regardless of the particular “platform-of-the-day.” A strong foundation about what it is that web apps accomplish behind the scenes is the difference between an “API user” and a “developer.”
The aim of this article is to introduce a mental model that will bridge the gap between an API user and a developer. It is written for people who have already been introduced to HTTP and web development. The concepts introduced here are transferable to development in many languages, although the language used for examples is Ruby. Acquaintance with Ruby, and with some web app frameworks built with Ruby (such as Sinatra or Rails), will therefore make the read more comfortable, but it is recommended rather than required. Knowledge of HTTP and general web development is necessary, but an introductory level of understanding will suffice.
The final section of this article, titled “Further Reading,” will list many resources that will be of use to the developer looking to learn even more (i.e., every developer should be exploring the resources in this section!).
TCP and Sockets
Ruby’s standard library provides a class called TCPSocket, which lets Rubyists use the operating system’s socket API more easily. The term “API” is short for Application Programming Interface. An API is a defined set of functions and conventions through which one piece of software makes its capabilities available to another. The socket API, therefore, is the interface through which applications (such as those written in Ruby) make use of sockets (more on what sockets are below).
In web app development, the sockets used to communicate over the web are called TCP sockets. TCP stands for “Transmission Control Protocol.” TCP specifies how information should be passed between a sender and a receiver remotely. TCP is one of many protocols in something called the “TCP/IP” stack, otherwise known as the “Internet Protocol Suite.” The IP suite contains a lot of protocols, and these protocols can be divided into the following layers:
- The Network Interface layer (otherwise known as “Link” layer)
- The Internet layer
- The Transport layer
- The Application layer
The transmission control protocol is mapped to the transport layer, and the transport layer specifies how information is transmitted from one remote system to another in a reliable manner. To accomplish this, the location of the remote device must be specified. This location is referred to as an IP address. Further, the location within the device itself through which the destination application is receiving data must be specified. This location is referred to as a port. Applications exist on one machine — one IP address — but they receive information through different ports. Therefore, if a web server needs to communicate with a web browser, the web server ought to send the data to the port through which the web browser is listening (and not the port through which the device’s email client is listening, for example).
To send data over networks, we need to know the end point before the data is ever sent. This requires knowledge of the IP address of the destination, and the port number through which the application receives data. The combination of an IP address and a port number is called a socket (more precisely, a socket address).
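The pairing of IP address and port can be observed with a small sketch that uses only Ruby’s standard socket library. The method name `loopback_endpoints` is made up for this illustration, and the connection never leaves the local machine (it uses the loopback address 127.0.0.1 and whatever free port the operating system hands out).

```ruby
require "socket"

# Opens a throwaway loopback connection and reports both of its endpoints.
# (Hypothetical helper, written for this illustration only.)
def loopback_endpoints
  server = TCPServer.new("127.0.0.1", 0) # port 0: let the OS pick a free port
  port = server.addr[1]

  # The client has to name both parts of the destination: IP address and port.
  client = TCPSocket.new("127.0.0.1", port)
  connection = server.accept

  destination = client.remote_address # where the client's data is going
  source = client.local_address       # where replies will come back to

  {
    listening_port: port,
    destination: "#{destination.ip_address}:#{destination.ip_port}",
    source: "#{source.ip_address}:#{source.ip_port}"
  }
ensure
  [connection, client, server].each { |io| io&.close }
end

endpoints = loopback_endpoints
puts "destination socket: #{endpoints[:destination]}"
puts "source socket:      #{endpoints[:source]}"
```

Note that the client end of the connection has an IP address and port of its own; that is the address the other side uses when it sends data back.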
If you wanted to send HTTP data to a computer, it wouldn’t be enough to send it to the IP address; the computer does not inherently know which application is supposed to receive the data. Therefore, we use sockets, which define the exact destination of the data: both the IP address of the receiving computer and the port on that computer which the receiving application is using. But the sending application doesn’t have a way to find out which port the receiving application (on the destination computer) is listening on. Therefore, it makes use of default ports, which are default only by convention.
Another name for these default ports is “well-known” ports. The standard set of “well-known ports” is simply the set of port numbers that the Internet Assigned Numbers Authority has dedicated to applications or services as conventions. For example, port 80 is conventionally used by HTTP services, but that doesn’t mean that FTP services can’t, technically, use port 80.
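Ruby’s standard `uri` library follows these conventions, which makes for a quick illustration: when a URL names no port, the default for its scheme is filled in, and an explicit port overrides it. The helper name `port_for` and the example URLs are placeholders chosen for this sketch.

```ruby
require "uri"

# Returns the port a client would connect to for the given URL.
# (Hypothetical helper, written for this illustration only.)
def port_for(url)
  URI.parse(url).port
end

puts port_for("http://example.com/")    # no port given: the HTTP convention applies
puts port_for("https://example.com/")   # no port given: the HTTPS convention applies
puts port_for("http://localhost:4567/") # an explicit port overrides the convention
```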
An Iteration of the Request/Response Cycle (Ruby without a Web Server)
The following is a description of what a Ruby application would have to accomplish to respond to client HTTP requests by directly interacting with the
The application would use the socket API of the operating system (via Ruby’s TCPServer class, a subclass of TCPSocket) to open a passive connection, which notifies the operating system that the application is ready to receive connections through a particular port. The application could be assigned to listen on any port number that is not already being used by another service (let’s say that, in this hypothetical application, the application is assigned to port 4567).
Once the application opens a passive connection to listen on, it is ready to receive an HTTP request. Once it receives the request, what it does next is up to the code written in the application. The HTTP request will arrive as one very long string. The entire content of the headers and the body will have to be parsed so that the application will be able to refer to specific headers, the request path, and the HTTP method, etc.
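To give a feel for the parsing work involved, here is a deliberately simplified sketch. The method name `parse_request` and the shape of the returned Hash are choices made for this illustration; a real parser also has to cope with malformed input, repeated headers, chunked bodies, and more.

```ruby
# Splits a raw HTTP/1.1 request string into its parts.
# (Simplified sketch: assumes a well-formed request.)
def parse_request(raw)
  # A blank line separates the request line and headers from the body.
  head, body = raw.split("\r\n\r\n", 2)
  request_line, *header_lines = head.split("\r\n")
  http_method, path, version = request_line.split(" ", 3)

  # Turn "Name: value" lines into a Hash with lowercase keys.
  headers = header_lines.to_h do |line|
    name, value = line.split(":", 2)
    [name.strip.downcase, value.to_s.strip]
  end

  { method: http_method, path: path, version: version, headers: headers, body: body.to_s }
end

raw = "GET /hello?name=world HTTP/1.1\r\nHost: localhost:4567\r\nAccept: text/html\r\n\r\n"
request = parse_request(raw)
puts request[:method]          # the HTTP method
puts request[:path]            # the request path, including the query string
puts request[:headers]["host"] # one specific header
```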
For the code in the app to execute, the client will need to send an HTTP request to the socket on which the application is listening. Remember that a TCP socket is a concatenated IP address and port number. So, the request will have to be directed not only to the host’s (in this case, the developer’s local machine) IP address but also to the port number on the host on which the application is passively listening (port 4567). Since 4567 is not a “well-known” port for HTTP services, this is not very practical for production, but it is possible (and is a common practice in local development). In local development, the developer’s web browser is the client. The host is named localhost, and the port number will be 4567. To send an HTTP request to localhost:4567, the developer just needs to enter localhost:4567 in their web browser.
Once the application has parsed the HTTP request, it will perform whatever actions are specified in the code, and will then have to return an HTTP-compliant response. That means returning the status, the headers, and the body, all formatted properly as strings.
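As a rough sketch of what “formatted properly” means, the method below assembles a status line, header lines, a blank line, and the body into one string. The name `build_response` and the short table of reason phrases are assumptions made for this example, not part of any library.

```ruby
# A few status codes and their conventional reason phrases.
REASON_PHRASES = { 200 => "OK", 404 => "Not Found", 500 => "Internal Server Error" }.freeze

# Formats a status, headers, and body as an HTTP/1.1 response string.
# (Hypothetical helper, written for this illustration only.)
def build_response(status, headers, body)
  # Content-Length tells the client where the body ends.
  headers = { "Content-Length" => body.bytesize.to_s }.merge(headers)
  header_lines = headers.map { |name, value| "#{name}: #{value}\r\n" }.join

  "HTTP/1.1 #{status} #{REASON_PHRASES.fetch(status, "Unknown")}\r\n" \
    "#{header_lines}" \
    "\r\n" \
    "#{body}"
end

puts build_response(200, { "Content-Type" => "text/plain" }, "Hello, world!\n")
```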
Once the application code formats the input that it received from the client, it will use that input in some way to accomplish “server-side” tasks. Once it accomplishes these tasks, it will send a response back to the client through port 4567 and to the particular socket (IP_of_client:port_number) through which the client is receiving information.
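The whole iteration described above can be sketched in a few lines of Ruby. This is an illustration under simplifying assumptions rather than a usable server: the method name `serve_one_request` is made up, request headers are read but ignored, and the demo binds to port 0 (any free port) instead of 4567 so that it cannot clash with something already running. The demo also plays the part of the client itself, so it finishes on its own.

```ruby
require "socket"

# Accepts one connection, reads the request head, and answers it.
# (Hypothetical helper, written for this illustration only.)
def serve_one_request(server)
  connection = server.accept

  # Read the request line, then skip header lines until the blank line.
  request_line = connection.gets("\r\n").to_s
  while (line = connection.gets("\r\n")) && line != "\r\n"
    # Headers are ignored in this sketch.
  end

  # "Server-side" work: here, just describe the request that arrived.
  http_method, path = request_line.split(" ")
  body = "You sent a #{http_method} request for #{path}\n"

  connection.write("HTTP/1.1 200 OK\r\n" \
                   "Content-Type: text/plain\r\n" \
                   "Content-Length: #{body.bytesize}\r\n" \
                   "Connection: close\r\n" \
                   "\r\n" \
                   "#{body}")
ensure
  connection&.close
end

# Demo: the application listens, and this script acts as the client.
server = TCPServer.new("127.0.0.1", 0) # passive open on a free port
port = server.addr[1]
worker = Thread.new { serve_one_request(server) }

client = TCPSocket.new("127.0.0.1", port)
client.write("GET /hello HTTP/1.1\r\nHost: localhost\r\n\r\n")
puts client.read # reads until the application closes the connection
client.close
worker.join
server.close
```

To try this from a browser instead, one could bind the TCPServer to port 4567 and call `serve_one_request` in a loop.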
Introducing a Web Server to Our Ruby Code
There are a few problems with the above approach. First, the application won’t be able to handle multiple requests at once. It will only be able to serve one client at a time. This is especially a problem for slow clients that take a while to send all the data for their request through the socket, or for clients that receive data slowly through the socket. While the data is being sent to the client or being received from the client, the single application instance is occupied. Second, as the HTTP requests grow in complexity, it will take more and more application logic to parse the request successfully.
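One common way to avoid being held up by a slow client is to give every connection its own thread. The sketch below illustrates that idea only; it does not describe how any particular web server is built, and the names `handle` and `serve_concurrently` are made up for the example. In the demo, a “slow” client leaves its request unfinished while a “fast” client is answered anyway.

```ruby
require "socket"

# Answers a single connection once its request head has fully arrived.
def handle(connection)
  while (line = connection.gets("\r\n")) && line != "\r\n"
    # Keep reading until the blank line that ends the request head.
  end
  body = "done\n"
  connection.write("HTTP/1.1 200 OK\r\nContent-Length: #{body.bytesize}\r\n" \
                   "Connection: close\r\n\r\n#{body}")
ensure
  connection.close
end

# Accepts `count` connections and hands each one to its own thread, so a
# client that is slow to send its request does not block the others.
def serve_concurrently(server, count)
  threads = Array.new(count) do
    Thread.new(server.accept) { |connection| handle(connection) }
  end
  threads.each(&:join)
end

server = TCPServer.new("127.0.0.1", 0) # port 0: any free port
port = server.addr[1]
acceptor = Thread.new { serve_concurrently(server, 2) }

slow = TCPSocket.new("127.0.0.1", port)
slow.write("GET /slow HTTP/1.1\r\n") # request head left unfinished

fast = TCPSocket.new("127.0.0.1", port)
fast.write("GET /fast HTTP/1.1\r\n\r\n")
puts fast.read # answered while the slow client is still pending

slow.write("\r\n") # the slow client finally finishes its request
puts slow.read
[slow, fast].each(&:close)
acceptor.join
server.close
```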
The solution is to implement the missing functionality: handling several connections at once and parsing HTTP requests robustly. The question is whether to implement these features within the application code itself or to separate the HTTP-relevant functionality of the application (the features that allow the application to function over the web) from the rest of the application code. It would make more sense to implement the HTTP-relevant features separately so that future applications can reuse this implementation without a lot of effort on the part of the developer.
It would, therefore, be most efficient to develop an application that handles HTTP-relevant features. It would be even better if an application that handles sockets and HTTP relevant features already existed, and application developers only had to worry about developing the business logic of the application itself.
In this case, instead of opening a TCP connection directly between our app and the socket, we could let a pre-built application make use of the socket API of the operating system. This pre-built application would interface with the socket API of the OS, and would, therefore, be receiving HTTP requests from the client. It would pass the HTTP request to our application, would handle multiple requests at once, and would even provide features like killing connections that take too long, or serving static files without calling the application itself, so that the application truly only needed to be dedicated to producing the dynamic content and making server-side changes in response to forwarded HTTP requests.
This pre-built application would take incoming textual information and format it into HTTP text that is easier to work with (instead of one long string). Then, we can connect our Ruby application to THAT application. Client requests would first go through the pre-built “web facing” application, and the developer’s application would sit behind the scenes, no longer worrying about handling simultaneous requests, parsing HTTP requests into easily-usable formats, etc. This pre-built application is called a web server (think Apache or nginx).
So, since we are delegating some of the responsibilities to a web server, we have less work to do ourselves. The web server is responsible for using the operating system’s socket API to receive information in the form of HTTP requests. We then connect the web server to our application code (written in Ruby), and the input to our application will be HTTP-formatted text, and other useful information about the request, from the web server rather than the unparsed text from the TCP socket. Therefore, our app isn’t receiving HTTP requests directly: it is receiving input from the web server.
There’s a problem with this approach, too. Even with the use of a web server, the web server doesn’t have a way to start our application directly. Further, the web server speaks HTTP, and the application doesn’t.
One solution to this might be to simply require that all applications be ready to receive HTTP-textual input directly from a web server: to make web applications speak HTTP.
The problem with this potential solution is that the app alone would have to do a lot more than just parse an HTTP request sent from a web server, execute code based on the information in that request, and then return a properly formatted HTTP response. The communication between the web server and the application would have to be done over a socket, which means the application will have to implement socket communication. If more than one request comes through at the same time, the application would have to start multiple instances of itself. Further, the application would have to monitor for crashes and implement a way to handle those crashes. These are just a few of the many functionalities an application would have to implement.
All of these functionalities are markedly distinct from the business logic an application should contain. These features are general and necessary regardless of the web application. In other words, whatever web application you build, you will need these features (and more) to make it run on the web successfully and securely. It makes sense, therefore, to separate these responsibilities. It makes sense to build something like an HTTP interface that can translate HTTP requests, forwarded from the web server, into sensible arguments that are then passed to the application. This interface would also translate the application’s non-HTTP response into HTTP that is then passed back to the web server and, finally, to the client… There exist many such interfaces, and they are appropriately named application servers.
Putting an Application Server Between the Web Server and Ruby App
The application server is itself an application, and it will execute your application just like an object calls another object’s method in Ruby. This application server contains code that allows it to parse HTTP information into Ruby data structures that make sense (e.g., turning HTTP request headers from text into hashes). It contains code that allows it to interface with a standard web server via socket connections (which utilize the socket API of the operating system). It contains code that captures the return value of your application and transforms the return value into HTTP-formatted text. It does all of this so that your application doesn’t have to.
Application servers provide an HTTP interface for applications and web servers. The application server and web server communicate via a socket connection, either a TCP socket or a Unix socket. A Unix socket replaces the host and port with a file path, which is faster because the overhead of routing over a network is removed, but it only works if the application server and web server are on the same machine. The application server receives HTTP text from the web server, and then the application server’s HTTP parser will format this into Ruby-friendly data structures. When the application server is ready to hand the parsed request to the Ruby app, it will call your Ruby app with a method, passing in the parsed request as an argument or series of arguments. The Ruby app will execute, and the application server will store the Ruby app’s return value in a variable. It will re-format this variable into an HTTP response that the web server will understand, and then it will send that formatted HTTP response back to the web server via a socket connection. From there, the web server sends the HTTP response to the client.
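The difference between the two kinds of socket shows up directly in code: a Unix socket is addressed by a path on the file system instead of a host and port. The sketch below assumes a Unix-like operating system, and both the method name `round_trip_over_unix_socket` and the file name `app.sock` are made up for the illustration.

```ruby
require "socket"
require "tmpdir"

# Sends `message` over a Unix socket and returns what the listener received.
# (Hypothetical helper, written for this illustration only.)
def round_trip_over_unix_socket(message)
  Dir.mktmpdir do |dir|
    path = File.join(dir, "app.sock") # a file path stands in for host:port
    server = UNIXServer.new(path)     # the listening side
    client = UNIXSocket.new(path)     # the connecting side
    connection = server.accept

    client.write(message)
    client.close                      # tells the reader there is no more input
    received = connection.read

    connection.close
    server.close
    received
  end
end

puts round_trip_over_unix_socket("GET / HTTP/1.1\r\n\r\n")
```

Apart from how the endpoint is named, the reading and writing look the same as with a TCP socket, which is why a server can usually be switched from one to the other through configuration.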
This architecture exemplifies the modular, separation-of-concerns based approach to development. With the use of application servers, our applications no longer have to understand HTTP at all: that is the job of the application server. Further, our applications no longer have to return HTTP-compliant responses: that is the job of the application server.
I mentioned above that an application server is simply an application that provides an HTTP interface. An interface has at least two sides. An application server is an interface between a web server and a Ruby application. On the web server side, it is written to handle the type of information web servers send, and to successfully return the type of information web servers understand (HTTP text).
On the application side, it is written to call the application and to pass in certain arguments to the application. Those arguments represent the HTTP request. It is also written to capture the return value of calling that application, formatting that return-value into HTTP text, and sending it back to the web server.
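Both sides of the interface can be sketched in one short method. The calling convention used here is invented purely for illustration: the app responds to a method named `handle`, receives the request as a Hash, and returns the response body as a String. `GreeterApp` and `process` are likewise hypothetical names.

```ruby
# A stand-in application that follows the made-up convention:
# it receives the request as a Hash and returns the body as a String.
class GreeterApp
  def handle(request)
    "Hello from #{request[:path]}\n"
  end
end

# Plays the application server's role for one request.
def process(raw_request, app)
  # Web server side: turn the incoming HTTP text into Ruby data.
  request_line = raw_request.lines("\r\n").first.to_s
  http_method, path = request_line.split(" ")

  # Application side: call the app and capture its return value.
  body = app.handle({ method: http_method, path: path })

  # Web server side again: turn the return value back into HTTP text.
  "HTTP/1.1 200 OK\r\n" \
    "Content-Type: text/plain\r\n" \
    "Content-Length: #{body.bytesize}\r\n" \
    "\r\n" \
    "#{body}"
end

puts process("GET /welcome HTTP/1.1\r\nHost: localhost\r\n\r\n", GreeterApp.new)
```

Notice that `GreeterApp` contains no HTTP handling at all. It only works, though, with a server that happens to call `handle` with exactly this kind of Hash.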
This description raises the question: What arguments does the application server pass to the Ruby application? Without the answer to this question, a developer wouldn’t know how to write their Ruby app so that the Ruby app works with the application server in question. A developer needs to know what method the server calls on the Ruby app, what arguments are passed to the Ruby app, the data type of the arguments, the order in which the arguments are passed in, how the application server locates the Ruby app so that it can call it, etc. A developer needs to know all of this to ensure that the application code responds to the right method, accepts the right arguments, and returns the right values.
Since developers use multiple frameworks to build Ruby web apps, and since developers use multiple application servers to handle these web apps, it is easy to see how the problem of many-servers:many-frameworks could arise. If all the servers use a different interface to call the Ruby app, and if all the frameworks construct Ruby apps that expect to be called with different methods and arguments, managing compatibility issues can become a big problem.
This scenario is the perfect use-case for introducing a protocol or set of conventions. This protocol would specify which method all application servers should use to call an application, and which method all applications should respond to. It would specify which values the application would return, once called, and therefore it would specify which values the application server could expect to be returned from the Ruby app (making it much easier to construct a valid HTTP response out of the values returned by the Ruby app). It would also specify the argument(s) that should be passed into the app when called. In short, it would specify a common language, or interface, that Ruby apps and application servers could use to communicate.
Such a protocol exists, and it is called the “Rack Specification.”
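In outline, the Rack specification says that an application is any Ruby object that responds to `call`, takes a single Hash describing the request (the “environment”), and returns an array of three things: a status, a Hash of headers, and a body that responds to `each`. The sketch below shows that shape without loading the Rack gem. `HelloApp` and `serialize` are made-up names, the environment is heavily abbreviated (a real server supplies many more keys), and `serialize` is only a rough stand-in for what an application server does with the return value.

```ruby
# An application in the shape Rack expects: it responds to `call`,
# takes the environment Hash, and returns [status, headers, body].
class HelloApp
  def call(env)
    body = "You sent #{env["REQUEST_METHOD"]} #{env["PATH_INFO"]}\n"
    [200, { "content-type" => "text/plain" }, [body]]
  end
end

# A few status codes and their conventional reason phrases.
REASONS = { 200 => "OK", 404 => "Not Found" }.freeze

# Roughly what a server does with the three returned values (simplified).
def serialize(status, headers, body)
  text = +"HTTP/1.1 #{status} #{REASONS.fetch(status, "Unknown")}\r\n"
  headers.each { |name, value| text << "#{name}: #{value}\r\n" }
  text << "\r\n"
  body.each { |chunk| text << chunk } # the body only needs to respond to `each`
  text
end

# An abbreviated environment; these two keys are defined by the specification.
env = { "REQUEST_METHOD" => "GET", "PATH_INFO" => "/hello" }
status, headers, body = HelloApp.new.call(env)
puts serialize(status, headers, body)
```

Because every compliant server calls `call(env)` and every compliant application returns this three-element array, any such server can run any such application.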
The above diagram graphically represents the many-frameworks:many-servers problem, and then represents how Rack fixes this problem. The above diagram refers to application servers as “Ruby web servers.” Application servers and Ruby web servers are synonymous.
The details of Rack are not as important for the purposes of this article. The purpose here was to develop a high-level mental model of Ruby web application architecture. If you are interested in learning more about Rack specifically, please read the series of articles I wrote about my experience developing a custom application framework using the Rack specification and the Rack Ruby gem. In that series, I discuss the Rack specification in greater depth and then walk through the development of a framework that can be used to develop web applications. Every step of the way contains snippets of the actual code I used, an explanation of the reasoning process I used to write the code in that particular way, and an explanation of what the code does, line by line.