The Inner Workings of Server-side Swift
Very few developers, perhaps least of all iOS developers, need to write low-level networking code. We rely on native frameworks built into our target platform, such as (NS)URLSession for iOS development, and often third-party frameworks for a higher level of abstraction.
So, as an iOS developer myself, why did I bother trying to understand C-level networking APIs in order to write my own Swift server? The answer is simple: for fun, of course! 🤓
One case where an iOS developer might build a server in Swift is a mock backend for UI testing.
Swift on the server
When Swift was open sourced and became able to run on platforms besides Apple’s, many server-side Swift frameworks followed, such as Perfect, Kitura, Vapor and Smoke.
While it’s great that there are so many options, I always like to have a basic understanding of what’s going on under the hood when I use a third-party framework. So I got digging.
What I arrived at was Frank, a proof-of-concept, standalone Swift server (very!) loosely based on Ruby’s Sinatra framework. I’ve described below the journey I went on to build it.
Consuming remote content
Server-side code exists so that it may be consumed by clients such as apps and web browsers. There wouldn’t be much point in a server without a client, so this is where I started.
The client application needs to consume remote content, such as JSON, HTML, images, or perhaps just mock data. I mention mock data because the only Swift servers I’ve ever actually built are for just that. I see Swift as a great language to use for mock servers, because it means the app code and the mock server code are written in the same language.
So how does an iOS app consume remote content? What is that magic?
- Your app code will likely interface with URLSession directly, or indirectly via a third-party framework. (There are a bunch of options: Alamofire, TABResourceLoader, Hyperspace, Siesta.) URLSession uses C-level APIs to open a socket for communicating over the network.
- The network may be within a single building, span a campus, or extend to every corner of the planet, but in essence it’s the same principle as if it were simulated on a single machine, as in my mock server example. (To connect to Medium, my laptop has communicated with my router, then my ISP, in order to set up the socket-to-socket connection.)
- The server application uses its own C-level APIs to open its own socket and “bind” it to a port number in order to listen for incoming connections.
The communication between the socket on the client and the socket on the remote is conducted using a collection of protocols. Operating systems use TCP/IP to specify how data is batched, transmitted, routed and received over the network (or internet). HTTP sits on top of this. Among other things, it specifies how the data is formatted so that the consuming application knows how to interpret it.
Inspecting the HTTP messaging format
HTTP Request
The request takes the following format:
POST /account/update_password?trace_id=29acf01f100a HTTP/1.1
Host: 127.0.0.1:8080
Content-Type: application/json
User-Agent: curl/7.54.0
Accept: application/json
Authorization: ABE0FD5A3B2C2D4A56FE7B8B9A9CF6543B21AEE8FCA

{
"password": "As if! 😂"
}
Let’s break this down:
- POST is the HTTP method.
- /account/update_password?trace_id=29acf01f100a is the path component of the URL combined with any URL query parameter(s), which in this case is just a trace_id, but we could pass any key/value pairs here.
- HTTP/1.1 is the protocol version number, so the recipient knows how to interpret the rest of the message.
- Host: 127.0.0.1:8080 specifies the target machine and port number.
- Then comes a series of Key: value pairs called headers. These are added by the client at different stages. For example, above I’ve specified values for Content-Type, Authorization and Accept keys. The curl command line application that I used to send the request has added a value for the User-Agent key. The key/values above are just examples. There may be any number of key/value pairs added to a real request.
- A single empty line separates the headers from the body (if any; not all requests include a message body).
- The body is everything below the empty line, to the end of the message. The header Content-Type: application/json tells the receiver to expect JSON in the body.
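To make the breakdown above concrete, here is a minimal sketch of splitting a raw request string into those parts. The ParsedRequest type and the parseRequest function are hypothetical names for illustration, not Frank's actual Request type, and the sketch ignores details such as percent-decoding and repeated headers.

```swift
import Foundation

// Hypothetical type for illustration; not Frank's actual Request type.
struct ParsedRequest {
    let method: String
    let path: String
    let query: [String: String]
    let version: String
    let headers: [String: String]
    let body: String
}

func parseRequest(_ raw: String) -> ParsedRequest? {
    // HTTP separates lines with CRLF; normalise so the sketch also accepts bare LF.
    let text = raw.replacingOccurrences(of: "\r\n", with: "\n")

    // The first empty line splits the head (request line + headers) from the body.
    let head: String
    let body: String
    if let gap = text.range(of: "\n\n") {
        head = String(text[..<gap.lowerBound])
        body = String(text[gap.upperBound...])
    } else {
        head = text
        body = ""
    }

    var lines = head.components(separatedBy: "\n")

    // Request line: METHOD, target (path plus optional query), protocol version.
    let requestLine = lines.removeFirst().split(separator: " ")
    guard requestLine.count == 3 else { return nil }

    let target = requestLine[1].split(separator: "?", maxSplits: 1)
    guard let path = target.first else { return nil }

    // Query parameters are key=value pairs joined by "&" (no percent-decoding here).
    var query: [String: String] = [:]
    if target.count == 2 {
        for pair in target[1].split(separator: "&") {
            let keyValue = pair.split(separator: "=", maxSplits: 1)
            guard let key = keyValue.first else { continue }
            query[String(key)] = keyValue.count == 2 ? String(keyValue[1]) : ""
        }
    }

    // Every remaining line of the head is a "Key: value" header.
    var headers: [String: String] = [:]
    for line in lines {
        let parts = line.split(separator: ":", maxSplits: 1)
        guard parts.count == 2 else { continue }
        headers[String(parts[0])] = parts[1].trimmingCharacters(in: .whitespaces)
    }

    return ParsedRequest(method: String(requestLine[0]),
                         path: String(path),
                         query: query,
                         version: String(requestLine[2]),
                         headers: headers,
                         body: body)
}
```

Splitting each header on the first colon only is what keeps a value such as 127.0.0.1:8080 intact.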
HTTP Response
An HTTP response takes the following format:
HTTP/1.1 400 Bad Request
Content-Type: application/json
Content-Length: 182

{
"error": {
"status": "INVALID",
"message": "The password provided contains invalid characters. You may only use letters, numbers and standard punctuation characters."
}
}
Breaking this down we have:
- HTTP/1.1 confirms the protocol version number.
- 400 Bad Request specifies the status code and the description of that status code. If this had been successful, we would have expected 200 OK.
- Then there is a series of key/value pairs (headers), in this case Content-Type and Content-Length. The content length is important. It means the consuming application knows how much memory to allocate for the body of the message.
- A single empty line separates the headers from the response body.
- The body is everything below the empty line, to the end of the message.
It’s the format of the HTTP response that is important for us to understand in order to build a server, because the server will be responding to client requests.
Responding to HTTP requests
Accepting and responding to HTTP requests is actually a fairly simple process. The hardest part is figuring out how to interact with the C APIs after they’ve been bridged to Swift. In the code that follows, the complex C-wrangling is hidden behind these helper methods.
The steps to run a server are as follows:
1. Create a socket reference
2. Set socket options
3. Bind the socket to an IP address and port number
4. Listen on the socket
5. Wait for a request
6. Parse and route the request
7. Write the response (if any) to the socket
8. Close the request
Steps 5–8 are repeated for the lifetime of the server application (i.e. it’s a loop that runs indefinitely until the application is terminated).
The following code sets up the socket and listens on it:
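A minimal sketch of steps 1–4, assuming macOS (Darwin) and calling the C socket APIs directly rather than through Frank's helper methods. The function name makeListeningSocket, the loopback address and the backlog value are assumptions made for illustration.

```swift
import Darwin

// Hypothetical helper; Frank's own helper methods may look different.
// Returns a listening socket's file descriptor, or nil if any step fails.
func makeListeningSocket(port: UInt16, backlog: Int32 = 16) -> Int32? {
    // 1. Create a TCP socket for IPv4.
    let sock = socket(AF_INET, SOCK_STREAM, 0)
    guard sock >= 0 else { return nil }

    // 2. Set socket options: allow the address to be reused straight after a restart.
    var reuse: Int32 = 1
    let optionSet = setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &reuse,
                               socklen_t(MemoryLayout<Int32>.size))
    guard optionSet == 0 else { close(sock); return nil }

    // 3. Bind the socket to 127.0.0.1 and the given port (network byte order).
    var address = sockaddr_in()
    address.sin_len = UInt8(MemoryLayout<sockaddr_in>.size) // this field exists on Darwin only
    address.sin_family = sa_family_t(AF_INET)
    address.sin_port = port.bigEndian
    address.sin_addr.s_addr = inet_addr("127.0.0.1")
    let bound = withUnsafePointer(to: &address) { pointer in
        // bind expects a generic sockaddr pointer, so reinterpret the sockaddr_in.
        pointer.withMemoryRebound(to: sockaddr.self, capacity: 1) {
            bind(sock, $0, socklen_t(MemoryLayout<sockaddr_in>.size))
        }
    }
    guard bound == 0 else { close(sock); return nil }

    // 4. Listen on the socket; backlog caps the queue of pending connections.
    guard listen(sock, backlog) == 0 else { close(sock); return nil }
    return sock
}
```

Passing port 0 asks the operating system to pick any free port, which is handy when trying the function out.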
Then we need to loop indefinitely while we wait to accept and respond to requests.
It’s by no means trivial to read the contents of the HTTP request and it’s completely up to you how you build the response, depending on what it is that your server is meant to be doing. I’ve simplified this in the code below so as not to detract from the C-level API usage I’m trying to demonstrate. (I’m using my own Request type and additional methods I’ve written, routeRequest, handleError and httpResponse, which are explained later.)
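A minimal sketch of steps 5–8, assuming macOS (Darwin). The names handleConnection and serveForever are hypothetical, and a single respond closure stands in for the Request, routeRequest, handleError and httpResponse pieces, so this is not Frank's actual loop.

```swift
import Darwin

// Handles one client connection: read the request, ask `respond` for the
// full HTTP response text, write it back, then close the connection.
func handleConnection(_ fd: Int32, respond: (String) -> String) {
    defer { close(fd) } // step 8: always close the connection

    // Simplification: one read of up to 4 KB. A real server would keep
    // reading until it has the whole message.
    var buffer = [UInt8](repeating: 0, count: 4096)
    let received = read(fd, &buffer, buffer.count)
    guard received > 0 else { return }
    let request = String(decoding: buffer[0..<received], as: UTF8.self)

    // Step 6: parse and route (delegated to the closure in this sketch).
    let response = Array(respond(request).utf8)

    // Step 7: write may send only part of the data, so loop until done.
    var sent = 0
    while sent < response.count {
        let written = response[sent...].withUnsafeBytes {
            write(fd, $0.baseAddress, $0.count)
        }
        guard written > 0 else { return }
        sent += written
    }
}

// Accepts connections one at a time until the process is terminated.
func serveForever(on sock: Int32, respond: (String) -> String) -> Never {
    while true {
        // Step 5: accept blocks until a client connects.
        let fd = accept(sock, nil, nil)
        guard fd >= 0 else { continue } // sketch only: skip failed accepts
        handleConnection(fd, respond: respond)
    }
}
```

Keeping the per-connection work in its own function means it can be exercised without a network, for example over a socketpair.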
As you can see from the code above, accepting the connection (step 5), writing the response (step 7) and closing the request (step 8) are pretty straightforward. It’s parsing and routing the request (step 6) that is the more difficult part.
The fd variable is a file descriptor. It’s just an integer that the lower-level functions use to reference the data received on the socket. It represents the connection to one specific client. (The sock variable is another file descriptor representing the socket itself.)
I’m passing the file descriptor into the initialiser for my Request type. In the initialiser, I read and parse the data to set httpMethod and path properties, assuming the data is in the HTTP format as expected.
The routeRequest method decides how the request should be processed by the server application, i.e. what is the request and how should we respond. Is it a request for profile information? Search results? Is it a friend request? Is it an update password request? This is where the server executes the business logic that the request is initiating. The server can optionally return a response body string to be sent back to the client that sent the request.
The handleError method transforms an error into a response body string and status code, ready to be written in the response to the client.
The httpResponse method builds an HTTP-formatted response, given the body string and the status code.
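As a rough illustration of those last two methods, here is a minimal sketch. The ServerError type, the status table and the text/plain content type are assumptions made for the example; Frank's real signatures may differ.

```swift
// Hypothetical error type for illustration.
enum ServerError: Error {
    case notFound
    case badRequest(String)
}

// Turns an error into a status code and a response body string.
func handleError(_ error: Error) -> (status: Int, body: String) {
    switch error {
    case ServerError.notFound:
        return (404, "Not Found")
    case ServerError.badRequest(let message):
        return (400, message)
    default:
        return (500, "Internal Server Error")
    }
}

// Builds an HTTP-formatted response from a body string and a status code.
func httpResponse(body: String, status: Int = 200) -> String {
    let reasons = [200: "OK", 400: "Bad Request", 404: "Not Found",
                   500: "Internal Server Error"]
    let reason = reasons[status] ?? "Unknown"
    let head = [
        "HTTP/1.1 \(status) \(reason)",
        "Content-Type: text/plain; charset=utf-8",
        // Content-Length is measured in bytes, hence utf8.count.
        "Content-Length: \(body.utf8.count)"
    ]
    // An empty line (two CRLFs in a row) separates the headers from the body.
    return head.joined(separator: "\r\n") + "\r\n\r\n" + body
}
```

With these in place, a failed route lookup could be answered with httpResponse(body:status:) fed from the tuple that handleError returns.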
And that’s pretty much it! Now we can use it…
Serving content
I said this server was (very!) loosely based on Sinatra, but Frank doesn’t look much like Sinatra yet, does it?
Sinatra endpoints (or routes) are defined, for example, like so:
get '/home' do
  "🏠"
end
So the final pieces of the puzzle are to wrap up all the server code above and put it in a static start method; provide the get method to offer similar syntax to Sinatra; and to define the routeRequest method that I mentioned above:
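A minimal sketch of the routing part only, assuming handlers are closures that return the response body. The Router class is a hypothetical stand-in: it uses instance methods rather than the static start and get methods described above, and it omits the socket code entirely.

```swift
// Hypothetical router for illustration; Frank's real implementation may differ.
final class Router {
    typealias Handler = () -> String

    // Handlers keyed by "METHOD path", e.g. "GET /home".
    private var routes: [String: Handler] = [:]

    // Registers a handler for GET requests to the given path.
    func get(_ path: String, _ handler: @escaping Handler) {
        routes["GET " + path] = handler
    }

    // Returns the response body for a request, or nil if no route matches.
    func routeRequest(method: String, path: String) -> String? {
        return routes[method + " " + path]?()
    }
}

// Trailing-closure syntax gets reasonably close to Sinatra's block syntax.
let frank = Router()
frank.get("/home") { "🏠" }
```

Returning nil for an unknown route leaves it to the caller to decide how to respond, for example with a 404.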
Then I can define my endpoints and launch the server as follows:
Build and run in Xcode to advertise the server on the port specified. Or archive to a command line utility to run from the command line.
The full code is available here: https://github.com/samdods/Frank
Where to go from here?
Frank is not going anywhere. It’s a demo project and I’m obviously not going to compete with the server-side Swift heavyweights. But if I wanted to, I’d begin by adding support for:
- Dynamic paths, e.g. /profile/:id, where id is a variable I can use in the scope of handling the request.
- Query parameters, e.g. /search?query=sausages, where the query parameters are available in the scope of handling the request.
- All HTTP methods, POST, DELETE, HEAD, etc.
- Command line arguments for things like port number and IP address.
- Handling multiple requests at the same time. (Current solution is “blocking” the single thread while waiting for data to be received and processing each request.)
- Running the server on Linux.
In order to support Linux and macOS at the same time, I’d need to wrap a load of my code in compiler directives, like so:
#if os(macOS)
import Darwin  // ... plus any macOS-specific code
#elseif os(Linux) || os(FreeBSD) || os(Android)
import Glibc   // perhaps, plus any platform-specific code
#else
#error("Unsupported operating system")
#endif
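Going back to the first item on that wish list, dynamic paths could be matched segment by segment. This is a minimal sketch under that assumption; the function name match and its parameter labels are hypothetical, and Frank does not currently include anything like it.

```swift
// Matches a path such as "/profile/42" against a pattern such as "/profile/:id".
// Returns the captured variables, or nil when the path does not fit the pattern.
func match(pattern: String, path: String) -> [String: String]? {
    let patternParts = pattern.split(separator: "/")
    let pathParts = path.split(separator: "/")
    guard patternParts.count == pathParts.count else { return nil }

    var parameters: [String: String] = [:]
    for (expected, actual) in zip(patternParts, pathParts) {
        if expected.hasPrefix(":") {
            // A ":name" segment captures whatever appears in that position.
            parameters[String(expected.dropFirst())] = String(actual)
        } else if expected != actual {
            // A literal segment must match exactly.
            return nil
        }
    }
    return parameters
}
```

A router could try each registered pattern in turn and pass the resulting dictionary to the handler.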
What have I learnt from this?
It wouldn’t be much use without some lessons to take away…
- Working with C APIs is a pain, especially if you need to support multiple operating systems. But it’s fun once you crack it!
- Working with a higher-level framework is much nicer. Even SwiftNIO is a nice abstraction (Apple’s framework, which Kitura and Smoke both use under the hood).
- Don’t attempt to bind an IPv4 address (AF_INET) to an IPv6 socket (AF_INET6). It won’t work! 🤬
- Don’t use myString.count for string byte length! Use myString.utf8.count instead. 🤪
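The last lesson is easy to see with the password from the earlier request example. This short sketch compares the two counts; the variable names are just for illustration.

```swift
let body = "As if! 😂"

// count is the number of user-perceived characters.
let characterCount = body.count

// utf8.count is the number of bytes that actually go over the wire,
// which is what a Content-Length header has to report.
let byteCount = body.utf8.count

print("characters: \(characterCount), bytes: \(byteCount)")
// prints "characters: 8, bytes: 11"
```

The emoji is a single character but takes four bytes in UTF-8, so using count would understate the length by three bytes.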
Conclusion
It’s awesome that there are so many open source frameworks allowing us to write Swift code for server-side applications. However, as with any third-party dependency, it’s nice to have at least a rough understanding of how they work internally. When choosing a framework, have a look at the selling points of each one and make your choice based on use case. And bear in mind that if your requirement is basic, then you may not even need a dependency at all.
Unfortunately I can’t tell you which server-side Swift framework will come out on top, but I can tell you it won’t be Frank. 😉