Dyno 🦕 : AWS, Swifter

Mar 15, 2019

Last time, we did a lot of setup to get started with Amazon Web Services’ DynamoDB, including using the Swift-Python bridge so we could use the official AWS SDK for Python, boto3, to communicate with DynamoDB.

Pterodactyl. Wikimedia Commons/openclipart

But boto3 has several limitations besides being Python-based (and hence lacking Swift’s type safety): it’s complex and difficult to use, and, most importantly, it is synchronous.

To see the problem, run the code we arrived at last time:

Now try switching off your computer’s WiFi and running it again. What happens? The table.scan() line just hangs for 30 seconds, until you get a nasty-looking exception and your program crashes with an unrecoverable fatal error. That’s really not the behaviour we’d expect from a library call, or from an application that might have intermittent network connectivity, such as a mobile app¹.

This article — a better `boto`

The Dyno library aims to do better than that, and in this article we’ll see how! As before, although the idea is to produce a useful library, I also hope to show techniques you can use in your own code.

We’ll make the boto3 calls asynchronous. This will demonstrate the use of Semaphores, Work Queues, and Work Items.
We’ll have Dyno publish an Observable stream of results, leveraging the new Swift 5 Result type. Here we’ll demonstrate Observables and Reactive programming with complex data streams.
We’ll add some useful, type-safe ways to read and write data from DynamoDB, natively from Swift. This demonstrates some great functional constructs like zip and flatMap on our datatypes.

At the end of the article, we’ll be writing our Dinosaurs (🦕 and 🦖, of course) to our DynamoDB database directly from Swift, then reading them back, all asynchronously and properly taking network delays into account. This will form the beginning of our Dyno library. As before, this library is being developed in the open, so you can see the source code on GitHub (in the swiftify branch).

There’s a lot to do, so let’s get going!

Observable Streams

As I mentioned in the prior article, Observables are a way to represent streams of data. We can hook these up to Reactive components to process our data stream in a highly functional and declarative way. This is an incredibly powerful way of representing data operations. We’ll get into this in a later article, but for now we will look at how our DynamoDB interactions can be represented as Observables.

The key to modeling our data interactions is to note that they all look like this:

Ask DynamoDB to do something (scan the table, update a row, etc.)
Wait for a result (200 rows returned, a successful update)…
…or for an error, e.g. a timeout or a data integrity error.

We model these stages with an observable stream of the DynoActivity data type:
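The exact definition of DynoActivity isn’t reproduced here. The following is a minimal sketch of what a type covering the three stages could look like. The case names `running`, `value` and `failed`, and the `isTerminal` helper, are assumptions for illustration only. Plain arrays stand in for Observable streams, so the snippet runs without RxSwift.

```swift
import Foundation

// Hypothetical sketch: the case names and payloads are assumptions,
// not Dyno's actual DynoActivity definition.
enum DynoActivity<T> {
    case running(String)   // request sent to DynamoDB, e.g. "scan Dinosaurs"
    case value(T)          // result returned, e.g. the scanned rows
    case failed(Error)     // timeout, data integrity error, ...

    /// A stream ends with a value or a failure, never while still running.
    var isTerminal: Bool {
        switch self {
        case .running: return false
        case .value, .failed: return true
        }
    }
}

enum ExampleError: Error { case timeout }

// Plain arrays standing in for two Observable streams of events.
let successfulScan: [DynoActivity<[String]>] = [
    .running("scan Dinosaurs"),
    .value(["🦕", "🦖"]),
]
let timedOutScan: [DynoActivity<[String]>] = [
    .running("scan Dinosaurs"),
    .failed(ExampleError.timeout),
]
```

Because every stream starts with a non-terminal event and ends with exactly one terminal event, a subscriber can show progress on the first and a result or an error on the last.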

As a stream of Observables, this looks like the following (the orange and red marbles represent the Observable events):

One thing we do for now is assume that we get all our data returned in one go, even for large queries (hundreds of rows returned, for instance): we don’t “page” the output. We might change this in the future².
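Not paging the output means that any pages DynamoDB hands back have to be stitched into one result set before it is returned. A DynamoDB Scan response carries a `LastEvaluatedKey` while more data remains, and the next request passes it back as `ExclusiveStartKey`. Here is a minimal sketch of that loop. It assumes a hypothetical `fetchPage` function that serves an in-memory array. It is not Dyno’s `ActionScanAll` code.

```swift
import Foundation

// Hypothetical page shape, loosely modelled on a DynamoDB Scan response:
// the items, plus a continuation key that is nil on the last page.
struct Page {
    let items: [String]
    let lastEvaluatedKey: Int?
}

// Hypothetical stand-in for one Scan request resuming after `exclusiveStartKey`.
// It serves a fixed array, `pageSize` items at a time.
func fetchPage(after exclusiveStartKey: Int?, from all: [String], pageSize: Int = 2) -> Page {
    let start = exclusiveStartKey.map { $0 + 1 } ?? 0
    let end = min(start + pageSize, all.count)
    return Page(items: Array(all[start..<end]),
                lastEvaluatedKey: end < all.count ? end - 1 : nil)
}

// Keep requesting pages until there is no continuation key, concatenating as we go.
func scanAll(_ all: [String]) -> [String] {
    var collected: [String] = []
    var startKey: Int? = nil
    repeat {
        let page = fetchPage(after: startKey, from: all)
        collected += page.items
        startKey = page.lastEvaluatedKey
    } while startKey != nil
    return collected
}
```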

Note also that we want our observable streams to work asynchronously and in a multi-threaded fashion: we can have multiple streams running at the same time, some reading data and some writing.

Why are we not using a Future for this type of async request/response? A Future delivers a single value when it completes, whereas an Observable stream can also emit an event as soon as the request starts. That makes it very easy to handle interaction patterns like “Show a wait icon until the data is returned, or show an error”, which are pretty fundamental for a real-world application.

Reality Check

Before we get to creating our high-level Observables, we need to deal with the realities of interfacing synchronously with a remote database over an unreliable connection, and through a Python interface to boot.

Specifically, we need to make sure that Dyno controls the activity on the AWS connection, rather than leaving it to Boto3’s 30-second, synchronous, program-terminating timeouts. So how do we make Boto3 asynchronous and multi-threaded without having control over the Boto3 code ourselves?

We’re going to make use of DispatchSemaphores, DispatchQueues and DispatchWorkItems.

DispatchWorkItems allow us to wrap up a parcel of work (in this case, the call to Boto3) and send it off to be executed on a DispatchQueue. Importantly, a DispatchWorkItem can also be cancelled, for example after a timeout has been reached. Cancelling prevents an item from starting if it hasn’t run yet. An item that is already running is only marked as cancelled, so in practice we stop waiting for it and carry on. We’re going to use this to abandon a Boto3 call gracefully when it times out, without crashing the whole program.

Our perform function returns a value of the Result type, which is new in Swift 5. A Result is either .success (wrapping a success value) or .failure (wrapping an error). As we’ll see a bit later, we use Result in many places in Dyno to ensure we have a consistent way to report errors.
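As a quick refresher on Swift 5’s Result, here is a small self-contained example. The `LookupError` type and the `diet(of:in:)` function are made up for illustration. The caller has to handle both cases explicitly instead of catching an exception.

```swift
import Foundation

// Hypothetical error type for this illustration.
enum LookupError: Error, Equatable {
    case notFound(String)
}

// Returns .success with the diet, or .failure naming the missing key.
func diet(of name: String, in table: [String: String]) -> Result<String, LookupError> {
    guard let value = table[name] else { return .failure(.notFound(name)) }
    return .success(value)
}

let diets = ["Tyrannosaurus": "carnivore", "Diplodocus": "herbivore"]

// Callers handle both outcomes explicitly.
for name in ["Diplodocus", "Stegosaurus"] {
    switch diet(of: name, in: diets) {
    case .success(let d): print("\(name): \(d)")
    case .failure(let e): print("\(name): error \(e)")
    }
}
```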

Our use of DispatchWorkItem initially looks like this:
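The original listing isn’t reproduced here. This minimal sketch shows the same idea: wrap a blocking call in a DispatchWorkItem, run it on a queue, and wait for it. `blockingBoto3Call()` is a hypothetical stand-in for a synchronous boto3 call such as `table.scan()`. The `Outcome` box is an assumption that lets the work item hand its result back. At this stage there is no timeout, which is exactly the gap the semaphore fills.

```swift
import Foundation

// Hypothetical stand-in for a blocking boto3 call such as table.scan().
func blockingBoto3Call() -> Result<[String], Error> {
    Thread.sleep(forTimeInterval: 0.05)   // simulate network latency
    return .success(["🦕", "🦖"])
}

// Reference box so the work item can hand its result back to the caller.
final class Outcome {
    var value: Result<[String], Error>?
}

let outcome = Outcome()

// Wrap the call up as a parcel of work and send it to a queue.
let workItem = DispatchWorkItem {
    outcome.value = blockingBoto3Call()
}
DispatchQueue.global().async(execute: workItem)

// Block until the item has run. There is no timeout yet: if the network
// is down, this line waits for as long as boto3 does.
workItem.wait()
```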

We can use a DispatchSemaphore to wait for the DispatchWorkItem to complete. Semaphores are a common concept in concurrent programming. A semaphore is essentially a counter (in the simplest case, a flag) that can be signalled and waited on across multiple threads of execution, and used to coordinate access to shared resources. The shared resource in this case is access to the AWS connection via Boto3.

So what we do in the DispatchWorkItem is signal the Semaphore when the Boto3 call completes, whether it succeeds or fails. Meanwhile, the calling thread waits on the Semaphore for that signal, or until a timeout period that we set (by default, 5 seconds) expires. This puts Dyno back in control of the AWS connection: we can start requests and give up on them according to our own timeline, and signal an error (rather than failing the program) if the connection times out.

Adding the semaphores gives code like this (slightly simplified)
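A runnable sketch of the semaphore pattern follows. The names `performWithTimeout`, `ResultBox` and `DynoTimeoutError` are hypothetical, and this is not Dyno’s `perform`. The work item signals on success or failure, and the caller waits at most `timeout` seconds. On a timeout, `cancel()` only marks the item as cancelled, so a call that is already running keeps going in the background while the caller reports an error.

```swift
import Foundation

// Hypothetical error type; Dyno reports timeouts through its own DynoError.
enum DynoTimeoutError: Error {
    case timedOut(seconds: Double)
}

// Reference box so the work item can hand its result back to the waiting caller.
final class ResultBox<T> {
    var value: Result<T, Error>?
}

/// Runs `call` as a DispatchWorkItem on `queue` and waits at most `timeout`
/// seconds for it to finish. A hypothetical helper, not Dyno's actual API.
func performWithTimeout<T>(on queue: DispatchQueue,
                           timeout: Double = 5,
                           _ call: @escaping () -> Result<T, Error>) -> Result<T, Error> {
    let semaphore = DispatchSemaphore(value: 0)
    let box = ResultBox<T>()

    let workItem = DispatchWorkItem {
        box.value = call()
        semaphore.signal()   // signalled whether the call succeeded or failed
    }
    queue.async(execute: workItem)

    if semaphore.wait(timeout: .now() + timeout) == .timedOut {
        workItem.cancel()    // marks the item cancelled; a running call is not interrupted
        return .failure(DynoTimeoutError.timedOut(seconds: timeout))
    }
    // box.value was written before signal(), so reading it here is safe.
    return box.value ?? .failure(DynoTimeoutError.timedOut(seconds: timeout))
}

let connectionQueue = DispatchQueue(label: "dyno.example")

let fast = performWithTimeout(on: connectionQueue, timeout: 1) { () -> Result<Int, Error> in
    .success(200)
}
let slow = performWithTimeout(on: connectionQueue, timeout: 0.1) { () -> Result<Int, Error> in
    Thread.sleep(forTimeInterval: 0.5)   // simulates an unreachable endpoint
    return .success(200)
}
```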

Furthermore, by running the whole DispatchWorkItem/DispatchSemaphore combination asynchronously on the global work queue (DispatchQueue.global().async), we can spawn separate threads of AWS connections, each with its own timeout, running independently of each other…

…almost: there’s one final gotcha to trip us up. Swift’s ultra-safe memory model won’t let us run multiple calls to the same Python object (the boto3 connection) simultaneously, because it can’t prove they won’t interfere with each other. Fortunately, there’s a simple way round this: we run the DispatchWorkItem on our own serial DispatchQueue, rather than spawning a brand new thread each time. That forces serial access to the underlying AWS connection. This is not a bad thing, since requests from a single Dyno instance are then sent one at a time, in the order they were submitted.

In future, if this becomes a performance bottleneck, we could look at pooling threads across multiple boto3 connections.
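To see why one private serial queue is enough, here is a sketch with twenty concurrent callers sharing one non-thread-safe connection. `FakeConnection` and `SerialGateway` are hypothetical stand-ins. For brevity the gateway uses `queue.sync` rather than a DispatchWorkItem plus semaphore, and the timeout handling is left out. The point is that the connection never sees more than one call at a time.

```swift
import Foundation

// Stand-in for the single boto3 connection (hypothetical). Like the real one,
// it must not be called from two threads at once, so it records the peak
// number of overlapping calls it ever saw.
final class FakeConnection {
    private var inFlight = 0
    private(set) var peakInFlight = 0

    func call(_ key: Int) -> Int {
        inFlight += 1
        peakInFlight = max(peakInFlight, inFlight)
        Thread.sleep(forTimeInterval: 0.001)   // pretend to talk to AWS
        inFlight -= 1
        return key * 2
    }
}

// All access to the connection goes through one private serial queue.
final class SerialGateway {
    private let connection = FakeConnection()
    private let queue = DispatchQueue(label: "dyno.connection")   // serial by default

    func perform(_ key: Int) -> Int {
        queue.sync { connection.call(key) }
    }

    var peakInFlight: Int {
        queue.sync { connection.peakInFlight }
    }
}

// Thread-safe collector for the answers coming back on many threads.
final class Collector {
    private var values: [Int] = []
    private let lock = NSLock()
    func append(_ value: Int) { lock.lock(); values.append(value); lock.unlock() }
    var all: [Int] { lock.lock(); defer { lock.unlock() }; return values }
}

let gateway = SerialGateway()
let answers = Collector()
let group = DispatchGroup()

// Twenty callers on the concurrent global queue share one connection.
for key in 1...20 {
    DispatchQueue.global().async(group: group) {
        answers.append(gateway.perform(key))
    }
}
group.wait()
```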

You can see the final code in the perform function in the main Dyno struct.

Back to the functional world

So now that we have our perform function returning an Observable stream, let’s make some calls to DynamoDB!

To do that, we use a protocol which allows us to abstractly represent actions on the database:
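The original protocol isn’t reproduced here. Below is a minimal sketch of the pattern under stated assumptions. The names `DynoAction`, `PutItemAction` and `GetItemAction`, and the use of a dictionary in place of the boto3 table, are all hypothetical. Each action is a value describing what to do, and `perform` carries it out, returning a Result.

```swift
import Foundation

// Hypothetical stand-ins: Dyno's actions talk to boto3, not to a dictionary.
enum DynoError: Error, Equatable {
    case notFound(key: String)
}
typealias DynoResult<T> = Result<T, DynoError>
typealias FakeTable = [String: [String: String]]   // key -> item attributes

// Each database operation is a value describing *what* to do; `perform` does it.
protocol DynoAction {
    associatedtype Output
    func perform(on table: inout FakeTable) -> DynoResult<Output>
}

struct PutItemAction: DynoAction {
    let key: String
    let attributes: [String: String]

    func perform(on table: inout FakeTable) -> DynoResult<Void> {
        table[key] = attributes   // inserts, or overwrites an existing item with this key
        return .success(())
    }
}

struct GetItemAction: DynoAction {
    let key: String

    func perform(on table: inout FakeTable) -> DynoResult<[String: String]> {
        guard let item = table[key] else { return .failure(.notFound(key: key)) }
        return .success(item)
    }
}

var table: FakeTable = [:]
_ = PutItemAction(key: "1", attributes: ["name": "Diplodocus"]).perform(on: &table)
let fetched = GetItemAction(key: "1").perform(on: &table)
let missing = GetItemAction(key: "2").perform(on: &table)
```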

The perform function there is actually the same one called in the DispatchWorkItem code above: this is what ultimately acts on the database.

There are currently three structs that implement this protocol, providing three extensions to Dyno:

ActionGetItem, which retrieves a single item given its key
ActionPutItem, which puts an item with a key in the database (or updates the existing item with that key)
ActionScanAll, which scans all of the rows in the table, applies a filter, and returns the items that match (the “all” in the name means that DynamoDB reads the whole table, even if your filter matches only a few items, or none)

Bringing a Dyno to life

Let’s walk through one Action in a bit more detail. The ActionGetItem.perform implementation is below (slightly simplified, and annotated):

First, notice that the function returns DynoResult: this is simply a typealias for the regular Result type whose failure case is always a DynoError.

Walking through each of the steps:

At Step 1 we use a helper function to call the Python Boto3 library. We don’t call the function get_item directly; instead we get boto3Call to do it, and we pass the arguments ["Key": [keyField: keyValue]] separately. Why do we do this? Routing the call through boto3Call lets us catch any exception thrown by Python and turn it into a .failure value, rather than crashing the program! You can look at boto3Call to see how this is done.
Note that the result of Step 1 is a DynoResult value. If that result was .success, then we want to continue on to manipulate the returned value. We do that via a flatMap at Step 2. The flatMap takes the result of the Boto3 call (named lookup) and checks that it really contains the value we expected.

Note, however, that if the Boto3 call returned .failure, the lookup check isn’t executed at all, and the whole function simply returns that .failure. It’s this consistent handling of return values that makes the Result type so nice to use. Readers of my prior articles may recognise that Result is a monad, with flatMap as its bind.

In Step 3 we call a builder. This is given to us by the user of the library when they call the getItem function. In the Boto3 library, the return values are just dictionaries. That is common in Python, but in Swift we much prefer type safety, so the builder translates a dictionary into a value of type T. Once more, the builder can return a .failure if the dictionary can’t be converted.
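The three steps can be sketched in plain Swift under a few assumptions. Responses are Swift dictionaries rather than PythonObjects. The `fetch` closure is a hypothetical stand-in for `get_item`. The `DynoError` cases are illustrative. `boto3Call` here uses `Result(catching:)` to turn a thrown error into `.failure`, which mirrors the idea but is not Dyno’s implementation.

```swift
import Foundation

// Hypothetical stand-ins: Dyno's real types wrap PythonObjects, not dictionaries.
enum DynoError: Error, Equatable {
    case boto3(String)       // a Python exception, caught instead of crashing
    case missingItem         // response had no "Item" entry
    case badField(String)    // the builder could not read a field
}
typealias DynoResult<T> = Result<T, DynoError>
typealias Response = [String: [String: String]]

struct Boto3Failure: Error { let message: String }

// Step 1 helper: run a throwing call and turn any thrown error into .failure.
func boto3Call(_ call: () throws -> Response) -> DynoResult<Response> {
    Result(catching: call).mapError { error in DynoError.boto3(String(describing: error)) }
}

func getItem<T>(key: String,
                fetch: (String) throws -> Response,
                builder: ([String: String]) -> DynoResult<T>) -> DynoResult<T> {
    boto3Call { try fetch(key) }                                          // Step 1
        .flatMap { (lookup: Response) -> DynoResult<[String: String]> in  // Step 2
            guard let item = lookup["Item"] else { return .failure(.missingItem) }
            return .success(item)
        }
        .flatMap(builder)                                                 // Step 3
}

struct Dinosaur: Equatable { let name: String }

// A builder translates the untyped dictionary into a typed value, or fails.
func dinosaurBuilder(_ item: [String: String]) -> DynoResult<Dinosaur> {
    guard let name = item["name"] else { return .failure(.badField("name")) }
    return .success(Dinosaur(name: name))
}

let found = getItem(key: "1", fetch: { _ in ["Item": ["name": "Diplodocus"]] }, builder: dinosaurBuilder)
let absent = getItem(key: "2", fetch: { _ in [:] }, builder: dinosaurBuilder)
let offline = getItem(key: "1", fetch: { _ in throw Boto3Failure(message: "no network") }, builder: dinosaurBuilder)
```

If Step 1 fails, Steps 2 and 3 never run, and `offline` carries the caught error as a value instead of crashing.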

Builders and an old friend

If you look at the main function you’ll see an example Builder which creates Dinosaur objects:

This is a bit clunky at the moment: we are exposing the PythonObject to our Swift code, and of course the right way to do this in Swift is via Codable objects. We’ll fix this later!

getStr is a helper function that checks whether a given key exists in the dictionary we are given. If it does, getStr returns its value as .success; otherwise it returns a .failure.

But what is zip3? I’ve also added a number of zipX functions to the Dyno library, specifically for the Result type. zip3 is a bit like the regular zip, but it takes four parameters rather than two. The first parameter (with) is a function that is called only if the remaining three parameters are all .success. If any of them is a .failure, Dinosaur.init won’t be called.

This means we can treat Result as an applicative functor. The linked article gives more insight, but zipX offers a straightforward way to use Result as an applicative.
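A self-contained sketch of `getStr`, `zip3` and a Dinosaur builder follows, assuming the item arrives as a Swift `[String: String]` rather than a PythonObject. The three Dinosaur fields are also assumptions. This `zip3` is built from nested flatMap/map calls, so it returns the first failure it meets, checking the parameters in order.

```swift
import Foundation

// Hypothetical stand-ins: Dyno's builder reads a PythonObject, not a Swift dictionary.
enum DynoError: Error, Equatable {
    case missingField(String)
}
typealias DynoResult<T> = Result<T, DynoError>

// .success with the value if `key` exists, otherwise a .failure naming the key.
func getStr(_ dict: [String: String], _ key: String) -> DynoResult<String> {
    guard let value = dict[key] else { return .failure(.missingField(key)) }
    return .success(value)
}

// Calls `f` only if all three results are .success; otherwise returns the
// first .failure it meets, checking a, then b, then c.
func zip3<A, B, C, R, E: Error>(with f: (A, B, C) -> R,
                                _ a: Result<A, E>,
                                _ b: Result<B, E>,
                                _ c: Result<C, E>) -> Result<R, E> {
    a.flatMap { x in b.flatMap { y in c.map { z in f(x, y, z) } } }
}

// Assumed fields; the real Dinosaur type may differ.
struct Dinosaur: Equatable {
    let name: String
    let period: String
    let diet: String
}

func buildDinosaur(_ dict: [String: String]) -> DynoResult<Dinosaur> {
    zip3(with: Dinosaur.init(name:period:diet:),
         getStr(dict, "name"),
         getStr(dict, "period"),
         getStr(dict, "diet"))
}
```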

Jurassic Park 🦕 🦖

Finally, we create a wrapper object, Dyno, which abstracts our database connection. We can then add helper functions like getItem to it to kick off our Actions.

So let’s give it a whirl. Because we return Observables, the code looks a bit more complicated, with merge, subscribe and dispose. Look past those, though, and you’ll see our Dyno library being called with setItem, which writes an item to the Dinosaurs table, followed by scan to retrieve the items written.

Those reactive Observables look complicated, but they actually give us a lot here. We want to write the Dinosaurs in parallel, but we don’t want to read until both writes have succeeded. Observable.merge allows us to run the setItem Observable streams in parallel, and .concat then waits for the merged stream to complete before running the scan.
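RxSwift can’t run in a self-contained snippet here, so this sketch swaps in Grand Central Dispatch to show the same ordering guarantee. A DispatchGroup plays the role of Observable.merge, letting the writes run in parallel. Waiting on the group plays the role of .concat, so the scan starts only after both writes finish. `FakeTable` is a hypothetical in-memory stand-in for the Dinosaurs table. This is an analogy, not Dyno’s code.

```swift
import Foundation

// In-memory, lock-protected stand-in for the Dinosaurs table (hypothetical).
final class FakeTable {
    private var items: [String: String] = [:]
    private let lock = NSLock()

    func setItem(key: String, name: String) {
        Thread.sleep(forTimeInterval: 0.01)   // simulated network latency
        lock.lock(); items[key] = name; lock.unlock()
    }

    func scan() -> [String] {
        lock.lock(); defer { lock.unlock() }
        return Array(items.values)
    }
}

let table = FakeTable()
let writes = DispatchGroup()

// Like Observable.merge: both writes start at once and run in parallel.
for (key, name) in [("1", "🦕 Diplodocus"), ("2", "🦖 Tyrannosaurus")] {
    DispatchQueue.global().async(group: writes) {
        table.setItem(key: key, name: name)
    }
}

// Like .concat: the scan only runs once every write in the group has finished.
writes.wait()
let rows = table.scan()
```

Unlike the Rx version, `writes.wait()` blocks the calling thread. An Observable chain expresses the same dependency without blocking anyone.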

In a future article we’ll see how to plug those Observables into UI components, but for now we’ll just let the log method show what is written and read.

For Next Time

Phew, that was a lot of work! But the library is taking shape. There are a few things I didn’t go into (take a look at the filters for scan, for instance), but we have more to do:

Testing! How on earth do we test a library relying on a remote database?
Codable! Let’s shoo out those last few 🐍…

Until next time!

Update — the next article in this series is now out!

¹This is not a problem with Swift or with our PythonKit connection: the Python library behaves the same way, and there are many online discussions about it!

²In fact, DynamoDB returns large scan results in pages (each response includes a LastEvaluatedKey while more data remains), but Dyno concatenates the pages back together and returns the total result set. You can look at the `ActionScanAll` struct to see how this is done.