The Inner Workings of Server-side Swift

Sam Dods
Kin + Carta Created
8 min read · Apr 10, 2019

Very few developers, perhaps least of all iOS developers, need to write low-level networking code. We rely on native frameworks built into our target platform, such as (NS)URLSession for iOS development, and often third-party frameworks for a higher level of abstraction.

So, as an iOS developer myself, why did I bother trying to understand C-level networking APIs in order to write my own Swift server? The answer is simple: for fun, of course! 🤓

One reason an iOS developer might build a server in Swift is to provide a mock backend for UI testing.

Swift on the server

With the news that Swift was going to be open sourced and able to run on platforms besides Apple’s came many server-side Swift frameworks, such as Perfect, Kitura, Vapor and Smoke.

While it’s great that there are so many options, I always like to have a basic understanding of what’s going on under the hood when I use a third-party framework. So I got digging.

What I arrived at was Frank, a proof-of-concept standalone Swift server (very!) loosely based on Ruby’s Sinatra framework. I’ve described below the journey I went on to build it.

Consuming remote content

Server-side code exists so that it may be consumed by clients such as apps and web browsers. There wouldn’t be much point in a server without a client, so this is where I started.

The client application needs to consume remote content, such as JSON, HTML, images, or perhaps just mock data. I mention mock data because the only Swift servers I’ve ever actually built are for just that. I see Swift as a great language to use for mock servers, because it means the app code and the mock server code are written in the same language.

So how does an iOS app consume remote content? What is that magic?

The flow of communication between a client app and a remote service
  • Your app code will likely interface with URLSession directly, or indirectly via a third-party framework. (There are a bunch of options: Alamofire, TABResourceLoader, Hyperspace, Siesta.)
  • URLSession uses C-level APIs to open a socket for communicating over the network.
  • The network may be within a single building, span a campus, or extend to every corner of the planet, but in essence it’s the same principle as if it were simulated on a single machine, as in my mock server example. (To connect to Medium, my laptop has communicated with my router, then my ISP, in order to set up the socket-to-socket connection.)
  • The server application uses its own C-level APIs to open its own socket and “bind” it to a port number in order to listen for incoming connections.

The communication between the socket on the client and the socket on the remote is conducted using a collection of protocols. Operating systems use TCP/IP to specify how data is batched, transmitted, routed and received over the network (or internet). HTTP sits on top of this. Among other things, it specifies how the data is formatted so that the consuming application knows how to interpret it.

Inspecting the HTTP messaging format

HTTP Request

The request takes the following format:

POST /account/update_password?trace_id=29acf01f100a HTTP/1.1
Host: 127.0.0.1:8080
Content-Type: application/json
User-Agent: curl/7.54.0
Accept: application/json
Authorization: ABE0FD5A3B2C2D4A56FE7B8B9A9CF6543B21AEE8FCA

{
  "password": "As if! 😂"
}

Let’s break this down:

  • POST is the HTTP method.
  • /account/update_password?trace_id=29acf01f100a is the path component of the URL combined with any URL query parameter(s), which in this case is just a trace_id, but we could pass any key/value pairs here.
  • HTTP/1.1 is the protocol version number, so the recipient knows how to interpret the rest of the message.
  • Host: 127.0.0.1:8080 specifies the target machine and port number.
  • Then comes a series of Key: value pairs called headers. These are added by the client at different stages. For example, above I’ve specified values for Content-Type, Authorization and Accept keys. The curl command line application that I used to send the request has added a value for the User-Agent key. The key/values above are just examples. There may be any number of key/value pairs added to a real request.
  • A single empty line separates the headers from the body (if any; not all requests include a message body).
  • The body is everything below the empty line, until the EOF. The header Content-Type: application/json tells the receiver to expect JSON in the body.
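
The breakdown above can be sketched in Swift as a small parser. This is an illustrative sketch only: ParsedRequest and parseRequest are hypothetical names rather than Frank’s actual types, and it assumes a complete, well-formed message with CRLF line endings.

```swift
import Foundation

/// Hypothetical container for the parts of an HTTP request (not Frank's Request type).
struct ParsedRequest {
    let method: String
    let path: String
    let query: [String: String]
    let version: String
    let headers: [String: String]
    let body: String
}

/// Splits a raw HTTP/1.1 request into its parts, or returns nil if it is malformed.
func parseRequest(_ raw: String) -> ParsedRequest? {
    // The first empty line (CRLF CRLF) separates the headers from the body.
    guard let separator = raw.range(of: "\r\n\r\n") else { return nil }
    let head = String(raw[..<separator.lowerBound])
    let body = String(raw[separator.upperBound...])
    let lines = head.components(separatedBy: "\r\n")

    // Request line: METHOD, target (path plus optional query), protocol version.
    let requestLine = lines[0].split(separator: " ")
    guard requestLine.count == 3 else { return nil }
    let target = requestLine[1].split(separator: "?", maxSplits: 1)
    guard let path = target.first else { return nil }

    // Query parameters are key=value pairs joined by "&".
    var query: [String: String] = [:]
    if target.count == 2 {
        for pair in target[1].split(separator: "&") {
            let keyValue = pair.split(separator: "=", maxSplits: 1)
            guard let key = keyValue.first else { continue }
            query[String(key)] = keyValue.count == 2 ? String(keyValue[1]) : ""
        }
    }

    // Every remaining line is a "Key: value" header. Keys are stored as sent,
    // although header names are case-insensitive in HTTP.
    var headers: [String: String] = [:]
    for line in lines.dropFirst() {
        let keyValue = line.split(separator: ":", maxSplits: 1)
        guard keyValue.count == 2 else { return nil }
        headers[String(keyValue[0])] = keyValue[1].trimmingCharacters(in: .whitespaces)
    }

    return ParsedRequest(method: String(requestLine[0]), path: String(path), query: query,
                         version: String(requestLine[2]), headers: headers, body: body)
}
```

Splitting each header on the first colon only matters for values such as Host: 127.0.0.1:8080, which contain a colon themselves.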

HTTP Response

An HTTP response takes the following format:

HTTP/1.1 400 Bad Request
Content-Type: application/json
Content-Length: 182

{
  "error": {
    "status": "INVALID",
    "message": "The password provided contains invalid characters. You may only use letters, numbers and standard punctuation characters."
  }
}

Breaking this down we have:

  • HTTP/1.1 confirms the protocol version number.
  • 400 Bad Request specifies the status code and the description of that status code. If this had been successful, we would have expected 200 OK.
  • Then there is a series of key/value pairs (headers), in this case Content-Type and Content-Length. The content length is important: it tells the consuming application how many bytes of body to expect, and therefore how much memory to allocate for it.
  • A single empty line separates the headers from the response body.
  • The body is everything below the empty line, until the EOF.

It’s the format of the HTTP response that is important for us to understand in order to build a server, because the server will be responding to client requests.

Responding to HTTP requests

Accepting and responding to HTTP requests is actually a fairly simple process. The hardest part is figuring out how to interact with the C APIs after they’ve been bridged to Swift. In the code that follows, the complex C-wrangling is hidden behind helper methods.

The steps to run a server are as follows:

  1. Create a socket reference
  2. Set socket options
  3. Bind the socket to an IP address and port number
  4. Listen on the socket
  5. Wait for a request
  6. Parse and route the request
  7. Write the response (if any) to the socket
  8. Close the request

Steps 5–8 are repeated for the lifetime of the server application (i.e. it’s a loop that runs indefinitely until the application is terminated).

The following code sets up the socket and listens on it:
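
A minimal sketch of steps 1 to 4, calling the C APIs directly. It is not necessarily how Frank does it: makeListeningSocket and SocketError are hypothetical names, and the sketch assumes IPv4 on the loopback address, on macOS or Linux.

```swift
#if canImport(Darwin)
import Darwin
#elseif canImport(Glibc)
import Glibc
#endif

/// Hypothetical error type: which call failed, plus the errno it left behind.
struct SocketError: Error {
    let call: String
    let code: Int32
}

/// Steps 1 to 4: returns a file descriptor that is ready for accept().
func makeListeningSocket(port: UInt16, backlog: Int32 = 16) throws -> Int32 {
    // 1. Create an IPv4 TCP socket. SOCK_STREAM is an Int32 on Darwin but an enum on Glibc.
    #if canImport(Darwin)
    let streamType = SOCK_STREAM
    #else
    let streamType = Int32(SOCK_STREAM.rawValue)
    #endif
    let sock = socket(AF_INET, streamType, 0)
    guard sock >= 0 else { throw SocketError(call: "socket", code: errno) }

    // Captures errno before close() has a chance to overwrite it.
    func failure(_ call: String) -> SocketError {
        let code = errno
        close(sock)
        return SocketError(call: call, code: code)
    }

    // 2. Set socket options: allow the port to be reused straight after a restart.
    var reuse: Int32 = 1
    guard setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &reuse,
                     socklen_t(MemoryLayout<Int32>.size)) == 0 else {
        throw failure("setsockopt")
    }

    // 3. Bind to 127.0.0.1 and the given port (the port must be in network byte order).
    var address = sockaddr_in()
    address.sin_family = sa_family_t(AF_INET)
    address.sin_port = port.bigEndian
    address.sin_addr.s_addr = inet_addr("127.0.0.1")
    let bindResult = withUnsafePointer(to: &address) { pointer in
        pointer.withMemoryRebound(to: sockaddr.self, capacity: 1) {
            bind(sock, $0, socklen_t(MemoryLayout<sockaddr_in>.size))
        }
    }
    guard bindResult == 0 else { throw failure("bind") }

    // 4. Listen, queueing up to `backlog` pending connections.
    guard listen(sock, backlog) == 0 else { throw failure("listen") }
    return sock
}
```

Passing port 0 asks the operating system to pick any free port, which is handy for a quick experiment.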

Then we need to loop indefinitely while we wait to accept and respond to requests.

It’s by no means trivial to read the contents of the HTTP request and it’s completely up to you how you build the response, depending on what it is that your server is meant to be doing. I’ve simplified this in the code below so as not to detract from the C-level API usage I’m trying to demonstrate. (I’m using my own Request type and additional methods I’ve written, routeRequest, handleError and httpResponse, which are explained later.)
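
A minimal sketch of steps 5 to 8, under simplifying assumptions: the whole request arrives in a single read of at most 4096 bytes, and a hypothetical respond closure stands in for the Request type and the routeRequest, handleError and httpResponse methods. The names handleConnection and serveForever are also illustrative only.

```swift
#if canImport(Darwin)
import Darwin
#elseif canImport(Glibc)
import Glibc
#endif

/// Steps 6 to 8 for one client connection: read, respond, close.
func handleConnection(_ fd: Int32, respond: (String) -> String) {
    defer { close(fd) }  // 8. Always close the connection when done.

    // Read the raw request bytes (a single read; larger requests would need a loop).
    let capacity = 4096
    var buffer = [UInt8](repeating: 0, count: capacity)
    let received = read(fd, &buffer, capacity)
    guard received > 0 else { return }
    let request = String(decoding: buffer[..<received], as: UTF8.self)

    // 6. Parse and route the request (delegated to the caller here).
    let bytes = Array(respond(request).utf8)

    // 7. Write the response; write() may send fewer bytes than asked, so loop.
    var sent = 0
    while sent < bytes.count {
        let written = bytes[sent...].withUnsafeBytes { write(fd, $0.baseAddress, $0.count) }
        guard written > 0 else { return }
        sent += written
    }
}

/// Steps 5 to 8, repeated until the process is terminated.
func serveForever(on sock: Int32, respond: (String) -> String) {
    while true {
        // 5. Block until a client connects; fd identifies that one connection.
        let fd = accept(sock, nil, nil)
        guard fd >= 0 else { continue }
        handleConnection(fd, respond: respond)
    }
}
```

Because each connection is handled to completion before accept is called again, this loop serves one client at a time.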

As you can see from the code above, accepting the connection (step 5), writing the response (step 7) and closing the request (step 8) are pretty straightforward. It’s parsing and routing the request (step 6) which is the more difficult part.

The fd variable is a file descriptor. It’s just an integer that the lower-level functions use to reference the data received on the socket. It represents the connection to one specific client. (The sock variable is another file descriptor representing the socket itself.)

I’m passing the file descriptor into the initialiser for my Request type. In the initialiser, I read and parse the data to set httpMethod and path properties—assuming the data is in the HTTP format as expected.

The routeRequest method decides how the request should be processed by the server application, i.e. what is the request and how should we respond. Is it a request for profile information? Search results? Is it a friend request? Is it an update password request? This is where the server executes the business logic that the request is initiating. The server can optionally return a response body string to be sent back to the client that sent the request.

The handleError method transforms an error into a response body string and status code, so that they can be written into the response to the client.

The httpResponse method builds an HTTP-formatted response, given the body string and the status code.
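
A minimal sketch of what the last two methods could look like. The method names match the ones described above, but the signatures, the RouteError type, the reasonPhrase helper and the plain-text content type are all assumptions made for illustration, not Frank’s actual code.

```swift
/// Hypothetical errors that a route handler might throw.
enum RouteError: Error {
    case notFound
    case invalid(String)
}

/// Turns an error into a response body and an HTTP status code.
func handleError(_ error: Error) -> (body: String, status: Int) {
    guard let routeError = error as? RouteError else {
        return ("Internal Server Error", 500)
    }
    switch routeError {
    case .notFound:
        return ("Not Found", 404)
    case .invalid(let message):
        return (message, 400)
    }
}

/// The description that accompanies a status code on the first line of the response.
func reasonPhrase(for status: Int) -> String {
    switch status {
    case 200: return "OK"
    case 400: return "Bad Request"
    case 404: return "Not Found"
    case 500: return "Internal Server Error"
    default: return "Unknown"
    }
}

/// Builds an HTTP-formatted response: status line, headers, empty line, body.
func httpResponse(body: String, status: Int) -> String {
    let head = [
        "HTTP/1.1 \(status) \(reasonPhrase(for: status))",
        "Content-Type: text/plain; charset=utf-8",
        // Content-Length counts bytes, not characters, hence utf8.count.
        "Content-Length: \(body.utf8.count)",
    ].joined(separator: "\r\n")
    return head + "\r\n\r\n" + body
}
```

For example, httpResponse(body: "🏠", status: 200) reports a Content-Length of 4, because the single emoji character occupies four bytes in UTF-8.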

And that’s pretty much it! Now we can use it…

Serving content

I said this server was (very!) loosely based on Sinatra, but Frank doesn’t look much like Sinatra yet, does it?

Sinatra endpoints (or routes) are defined, for example, like so:

get '/home' do
"🏠"
end

So the final pieces of the puzzle are to wrap up all the server code above and put it in a static start method; provide the get method to offer similar syntax to Sinatra; and to define the routeRequest method that I mentioned above:

Then I can define my endpoints and launch the server as follows:
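
A minimal sketch of the routing part, to give a feel for the Sinatra-like syntax. It deviates from the description above in one respect: to stay self-contained it keeps the routes in an instance of a hypothetical MiniRouter class instead of behind static methods, and it leaves out the socket code, so the start call appears only as a comment.

```swift
/// Hypothetical route table; Frank's real implementation may differ.
final class MiniRouter {
    private var routes: [String: () -> String] = [:]

    /// Registers a handler for GET requests to `path`, Sinatra style.
    func get(_ path: String, _ handler: @escaping () -> String) {
        routes["GET \(path)"] = handler
    }

    /// Returns the response body for a request, or nil when no route matches.
    func routeRequest(method: String, path: String) -> String? {
        return routes["\(method) \(path)"]?()
    }
}

// Defining endpoints with trailing closures reads much like the Ruby version.
let app = MiniRouter()
app.get("/home") { "🏠" }
app.get("/hello") { "Hello, world!" }

// A start method (hypothetical, not shown) would open the socket and call
// routeRequest for each incoming request, e.g. app.start(port: 8080).
print(app.routeRequest(method: "GET", path: "/home") ?? "no route")
```

Keying the table on method plus path means that supporting POST or DELETE later only needs another registration method.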

Build and run in Xcode to advertise the server on the port specified. Or archive to a command line utility to run from the command line.

The full code is available here: https://github.com/samdods/Frank

Where to go from here?

Frank is not going anywhere. It’s a demo project and I’m obviously not going to compete with the server-side Swift heavyweights. But if I wanted to, I’d begin by adding support for:

  • Dynamic paths, e.g. /profile/:id, where id is a variable I can use in the scope of handling the request.
  • Query parameters, e.g. /search?query=sausages, where the query parameters are available in the scope of handling the request.
  • All HTTP methods, POST, DELETE, HEAD, etc.
  • Command line arguments for things like port number and IP address.
  • Handling multiple requests at the same time. (Current solution is “blocking” the single thread while waiting for data to be received and processing each request.)
  • Running the server on Linux.
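
As a taste of the first item in the list, dynamic paths could be matched segment by segment. This is a sketch of one possible approach with a hypothetical matchPath function; it is not something Frank currently does.

```swift
/// Matches a concrete path against a pattern such as "/profile/:id".
/// Returns the captured variables, or nil if the path does not fit the pattern.
func matchPath(pattern: String, path: String) -> [String: String]? {
    let patternParts = pattern.split(separator: "/")
    let pathParts = path.split(separator: "/")
    guard patternParts.count == pathParts.count else { return nil }

    var variables: [String: String] = [:]
    for (expected, actual) in zip(patternParts, pathParts) {
        if expected.hasPrefix(":") {
            // A ":name" segment captures whatever appears in that position.
            variables[String(expected.dropFirst())] = String(actual)
        } else if expected != actual {
            // A literal segment must match exactly.
            return nil
        }
    }
    return variables
}
```

With this, matching "/profile/42" against "/profile/:id" yields a dictionary in which id maps to "42", while "/search" does not match at all.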

In order to support Linux and macOS at the same time, I’d need to wrap a load of my code in compiler directives, like so:

#if os(macOS)
import Darwin
// ...plus any other macOS-specific code
#elseif os(Linux) || os(FreeBSD) || os(Android)
import Glibc
// ...perhaps, plus any other platform-specific code
#else
#error("Unsupported operating system")
#endif

What have I learnt from this?

It wouldn’t be much use without some lessons to take away…

  • Working with C APIs is a pain, especially if you need to support multiple operating systems. But it’s fun once you crack it!
  • Working with a higher-level framework is much nicer. Even SwiftNIO is a nice abstraction (Apple’s framework, which Kitura and Smoke both use under the hood).
  • Don’t attempt to bind an IPv4 address (AF_INET) to an IPv6 socket (AF_INET6). It won’t work! 🤬
  • Don’t use myString.count for string byte length! Use myString.utf8.count instead. 🤪
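
The last point is easy to demonstrate with the password from the earlier request example. The variable name is just for illustration.

```swift
let password = "As if! 😂"

// count is the number of characters a person would see.
print(password.count)       // 8

// utf8.count is the number of bytes that go over the wire,
// which is the number that Content-Length needs.
print(password.utf8.count)  // 11
```

The emoji is a single character but takes four bytes in UTF-8, so a Content-Length based on count would cut the body short.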

Conclusion

It’s awesome that there are so many open source frameworks allowing us to write Swift code for server-side applications. However, like any third-party dependency, it’s nice to have a vague understanding of how they work internally. When choosing a framework, have a look at the selling points of each one and make your choice based on use case. And bear in mind that if your requirement is basic, then you may not even need a dependency at all.

Unfortunately I can’t tell you which server-side Swift framework will come out on top, but I can tell you it won’t be Frank. 😉

If you’re interested in how things work and you want to join a passionate team of Swift enthusiasts, head to our careers page!

Follow me on Twitter for occasional mutterings.

Sam Dods
Kin + Carta Created

Tech Lead and Mobile Evangelist based in Edinburgh, Scotland