Dyno 🦕 : AWS, Swifter
Last time, we did a lot of setup to get started with Amazon Web Services' DynamoDB, including using the Swift-Python bridge so we could use the official AWS interface boto3 to communicate with DynamoDB.
But boto3 has several limitations, aside from being Python-based and hence without Swift type-safety: it's complex and difficult to use, and it has one major problem, which is that it is synchronous.
To see the problem, run the code we arrived at last time:
…now, try switching off your computer's WiFi and run it again. What happens? The `table.scan()` line just hangs there for 30 seconds, until you get a nasty-looking exception and your program crashes (with an unrecoverable fatal error). That's really not the behaviour we'd expect from a library call, or from an application which might have intermittent network connectivity, like a mobile app¹.
This article: a better boto
The Dyno library aims to do better than that, and in this article we'll see how! As before, although the idea is to produce a useful library, I also hope to show techniques you can use in your own code.
- We'll make the `boto3` calls asynchronous. This will demonstrate the use of Semaphores, Work Queues, and Work Items.
- We'll have Dyno publish out an Observable stream of results, leveraging the new Swift 5 `Result` type. Here we'll demonstrate Observables and Reactive programming with complex data streams.
- We'll add some useful, type-safe ways to read and write data from DynamoDB, natively from Swift. This demonstrates some great functional constructs like `zip` and `flatMap` on our datatypes.
At the end of the article, we'll be writing our Dinosaurs (🦕 and 🦖, of course) to our DynamoDB database, directly from Swift, then reading them back, all in an asynchronous fashion and properly taking network delays into account. This will form the beginning of our Dyno library. As before, this library is being developed in the open so you can see the source code on GitHub here (swiftify branch).
There's a lot to do, so let's get going!
Observable Streams
As I mentioned in the prior article, Observables are a way to represent streams of data. We can hook these up to Reactive components in order to be able to process our data stream in a highly functional and declarative way. This is an incredibly powerful way of representing data operations. We'll get into this in a later article, but for now we will look at how our DynamoDB interactions can be represented as Observables.
The key to modeling our data interactions is to note that they all look like this:
- Ask DynamoDB to do something (Scan the table, update a row, etc)
- Wait for a result (200 rows returned, successful update)…
- or for an error, e.g. a timeout or a data integrity error.
We model these stages with an observable stream of the `DynoActivity` data type:
Which will look like this as a stream of Observables (the orange and red marbles represent the Observable events):
One thing we do for now is assume that we get all our data returned in one go, even for large queries (hundreds of rows returned, for instance): we don't "page" the output. We might change this in the future².
You might also notice that we want our observable streams to work asynchronously and in a multi-threaded fashion: we can have multiple streams running at the same time, some reading data and some writing.
Why are we not using a Future for this type of async data request/response? Using an Observable stream makes it very easy to handle interaction patterns like "Show a wait icon until the data is returned, or show an error". This is pretty fundamental for a real-world application.
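The three stages above (ask, wait, then a result or an error) can be sketched as a small Swift enum. This is only an illustration: the case names and the `describe` helper are assumptions made for this example, and the real `DynoActivity` in the library may be shaped differently.

```swift
// Hypothetical sketch: these case names are assumptions, not the Dyno source.
enum DynoActivity<T> {
    case waiting            // request sent, no answer yet
    case success(T)         // DynamoDB answered with a value
    case failure(String)    // e.g. a timeout or a data integrity error
}

// Turns an event into a short label, e.g. for logging or driving a wait icon.
func describe<T>(_ activity: DynoActivity<T>) -> String {
    switch activity {
    case .waiting:
        return "waiting"
    case .success(let value):
        return "success: \(value)"
    case .failure(let message):
        return "failure: \(message)"
    }
}

// A scan that succeeds would emit two events, in this order.
let events: [DynoActivity<Int>] = [.waiting, .success(200)]
let labels = events.map(describe)
print(labels)
```

A subscriber that sees `.waiting` can show a spinner, and then replace it with either the data or an error message when the second event arrives.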
Reality Check
Before we get to creating our high-level Observables, we need to deal with the realities of interfacing synchronously with a remote database over an unreliable connection, and using a Python interface to boot.
Specifically we need to make sure that Dyno is controlling the activity on the AWS connection, and not leaving it to Boto3's 30-second synchronous, program-terminating timeouts. So how do we make Boto3 asynchronous and multi-threaded, without having control over the Boto3 code ourselves?
We're going to make use of `DispatchSemaphore`s, `DispatchQueue`s and `DispatchWorkItem`s.
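`DispatchWorkItem`s allow us to wrap up a parcel of work (in this case, where we `perform` the call to Boto3) and send it off to be executed on a `DispatchQueue`. Importantly, a `DispatchWorkItem` can also be cancelled at any time, for example after a timeout has been reached. (Strictly speaking, cancelling stops an item that has not started yet and flags a running one as cancelled; it does not interrupt a block that is already executing.) We're going to use this, together with our own timeout, so that a Boto3 call which times out is abandoned gracefully without crashing the whole program.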
`perform` returns a value of `Result` type, which is new in Swift 5. `Result`s are either `.success` (with a success value) or `.failure` (with a failure). As we'll see a bit later, we use `Result` in many places in Dyno in order to ensure we have a consistent way to report any errors.
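As a quick refresher on the standard library's `Result`, here is a minimal sketch. The `LookupError` type, the `lookup` function and the sample data are made up for this example and are not part of Dyno.

```swift
// Hypothetical error type, for illustration only.
enum LookupError: Error, Equatable {
    case missing(String)
}

// Returns .success with the value, or .failure naming the key that was missing.
func lookup(_ key: String, in table: [String: Int]) -> Result<Int, LookupError> {
    if let value = table[key] {
        return .success(value)
    }
    return .failure(.missing(key))
}

let teeth = ["Tyrannosaurus": 60]   // made-up sample data
let found = lookup("Tyrannosaurus", in: teeth)
let absent = lookup("Dodo", in: teeth)

// Callers have to deal with both cases explicitly.
switch found {
case .success(let count):
    print("found \(count)")
case .failure(let error):
    print("failed: \(error)")
}
```

Because the failure is an ordinary value rather than a thrown exception, it can be stored, passed between threads and pushed down a stream just like a success.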
Our use of `DispatchWorkItem` initially looks like this:
We can use a `DispatchSemaphore` to wait for the `DispatchWorkItem` to complete. Semaphores are a common concept in asynchronous computing, and are essentially a flag which can be set across multiple threads of execution, and used to coordinate access to shared resources. The shared resource in this case is the access to the AWS connection via Boto3.
So what we do in the `DispatchWorkItem` is to signal the Semaphore when the Boto3 call is completed: either when it succeeds, or when it fails. The Semaphore sits and waits to get the signal, or it waits until a timeout period which we set on the Semaphore (by default, 5 seconds). This has then put Dyno back in control of the AWS connection: we can now start and terminate the connection according to our own timeline, and signal an error, rather than failing the program, if the connection times out.
Adding the semaphores gives code like this (slightly simplified)
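To make the pattern concrete, here is a self-contained sketch of a blocking call wrapped in a `DispatchWorkItem` and guarded by a `DispatchSemaphore` timeout. It is not the Dyno source: `runWithTimeout`, `TimeoutError` and `Slot` are hypothetical names, and a sleeping closure stands in for the Boto3 call.

```swift
import Foundation

// Hypothetical names throughout: this illustrates the pattern, not Dyno's `perform`.
enum TimeoutError: Error, Equatable { case timedOut }

// Lock-protected slot, so the worker and the waiting caller never race on the value.
final class Slot<T>: @unchecked Sendable {
    private let lock = NSLock()
    private var stored: T?
    func set(_ value: T) { lock.lock(); stored = value; lock.unlock() }
    func get() -> T? { lock.lock(); defer { lock.unlock() }; return stored }
}

func runWithTimeout<T>(seconds: Double,
                       on queue: DispatchQueue,
                       _ call: @escaping @Sendable () -> T) -> Result<T, TimeoutError> {
    let slot = Slot<T>()
    let done = DispatchSemaphore(value: 0)
    let work = DispatchWorkItem {
        slot.set(call())    // the slow, blocking call
        done.signal()       // tell the waiter that we have finished
    }
    queue.async(execute: work)

    // Wait for the signal, but only for as long as we are prepared to.
    if done.wait(timeout: .now() + seconds) == .timedOut {
        work.cancel()       // prevents a not-yet-started item from running
        return .failure(.timedOut)
    }
    guard let value = slot.get() else { return .failure(.timedOut) }
    return .success(value)
}

let queue = DispatchQueue(label: "dyno.example")   // hypothetical label
let fast = runWithTimeout(seconds: 1.0, on: queue) { 42 }
let slow = runWithTimeout(seconds: 0.1, on: queue) { () -> Int in
    Thread.sleep(forTimeInterval: 0.5)             // stand-in for a hung network call
    return 7
}
print(fast, slow)
```

The caller always gets a `Result` back within the timeout it chose, whatever the wrapped call is doing. Note that the slow block still runs to completion in the background; only the waiting is abandoned.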
Furthermore, by running the whole `DispatchWorkItem` / `DispatchSemaphore` asynchronously by putting it on the `DispatchQueue.global().async` work queue, we can spawn off separate threads of AWS connections, each with their own timeouts, running independently of each other…
…almost. There's one final gotcha to trip us up: Swift's ultra-safe memory model won't let us run multiple calls to the same Python object (the boto3 connection) simultaneously, because it can't prove they won't interfere with each other. Fortunately, there's a simple way round this: we just run the `DispatchWorkItem` on our own `DispatchQueue`, rather than spawning off a brand new thread each time. That forces serial access to the underlying AWS connection, which actually is not a bad thing as it enforces transaction safety on the database side.
In future if this becomes a performance bottleneck, we could look at thread pooling via multiple boto3 connections.
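The effect of using one private queue can be seen in a few lines. In this sketch the queue label, the `CallLog` class and the operation names are all assumptions made for the example; a short sleep stands in for each blocking Boto3 call.

```swift
import Foundation

// A queue created without `.concurrent` is serial: one block at a time, in FIFO order.
let connectionQueue = DispatchQueue(label: "dyno.connection")   // hypothetical label

// Only ever touched from the serial queue, or after it has drained.
final class CallLog: @unchecked Sendable {
    var entries: [String] = []
}
let callLog = CallLog()

for name in ["put A", "put B", "scan"] {
    connectionQueue.async {
        callLog.entries.append("start \(name)")
        Thread.sleep(forTimeInterval: 0.01)   // stand-in for a blocking Boto3 call
        callLog.entries.append("end \(name)")
    }
}

// An empty `sync` block returns once everything queued before it has run.
connectionQueue.sync {}
print(callLog.entries)
```

Each "start" is immediately followed by its own "end", so no two calls ever overlap on the shared connection. With `DispatchQueue.global()` the entries could interleave.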
You can see the final code in the `perform` function in the main `Dyno` struct.
Back to the functional world
So, now we have our `perform` function which is returning an Observable stream, let's make some calls to DynamoDB!
To do that, we use a protocol which allows us to abstractly represent actions on the database:
The `perform` function there is actually the same one in the `DispatchWorkItem` code above: this is what ultimately gets called to act on the database.
There are currently 3 structs which implement this, and provide 3 extensions to Dyno:
- `ActionGetItem`, which retrieves a single item given its key
- `ActionPutItem`, which puts an item with a key in the database (or updates the existing item with that key)
- `ActionScanAll`, which scans all of the rows in the table, applies a filter, and returns the remainder (the "all" part of the name means that DynamoDB will look at the whole table, even if you specify a filter that returns only a few items, or no items)
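To give a feel for how a protocol can represent database actions abstractly, here is a rough sketch. The protocol name `DynoAction`, the signatures, and the in-memory dictionary standing in for the table are all assumptions for this example; the real Dyno protocol talks to Boto3 and will look different.

```swift
// Hypothetical shapes: names and signatures are assumptions, not Dyno's actual protocol.
struct DynoError: Error, Equatable { let message: String }
typealias DynoResult<T> = Result<T, DynoError>

protocol DynoAction {
    associatedtype T
    // An in-memory dictionary stands in for the remote table.
    func perform(on table: [String: String]) -> DynoResult<T>
}

struct GetItem: DynoAction {
    let key: String
    func perform(on table: [String: String]) -> DynoResult<String> {
        guard let value = table[key] else {
            return .failure(DynoError(message: "no item for key \(key)"))
        }
        return .success(value)
    }
}

struct ScanAll: DynoAction {
    let filter: (String) -> Bool
    func perform(on table: [String: String]) -> DynoResult<[String]> {
        // Looks at every row, then keeps only the ones that pass the filter.
        .success(table.values.filter(filter).sorted())
    }
}

let table = ["1": "Emojisaurus", "2": "Emojiraptor"]   // made-up sample rows
let one = GetItem(key: "1").perform(on: table)
let missing = GetItem(key: "9").perform(on: table)
let all = ScanAll(filter: { $0.hasPrefix("Emoji") }).perform(on: table)
print(one, missing, all)
```

Each action is just a value describing what to do, so the same machinery (queue, timeout, stream of results) can run any of them.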
Bringing a Dyno to life
Let's walk through one `Action` in a bit more detail. The `ActionGetItem.perform` implementation is below (simplified slightly, and annotated):
Firstly, you might notice that the function returns `DynoResult`: this is simply a typealias for the regular `Result` type, but always returning a `DynoError` in the case of failure.
Walking through each of the steps:
- At Step 1 we use a helper function to call the Python Boto3 library. We don't call the function `get_item` directly, but get `boto3Call` to do it, and we pass the arguments `["Key": [keyField: keyValue]]` separately. Why do we do this? In `boto3Call` this allows us to catch any exceptions thrown by Python so we can turn them into `.failure` values, rather than crashing the program! You can look at `boto3Call` to see how this is done.
- Note the result of Step 1 is a `DynoResult` value. If that result was `.success`, then we want to continue on to manipulate the returned value. We do that via a `flatMap` at Step 2. The `flatMap` will take the result of the Boto3 call (named `lookup`) and make sure we did indeed get the right value returned.
  But note, if the Boto3 call returned `.failure`, then the lookup check won't be executed, and we'll just return `.failure` from the whole function. It's this consistency of return processing which makes it really nice to use the `Result` type. Readers of my prior articles may note that `Result` is a Monad.
- In Step 3 we call a `builder`. This is given to us by the user of the library when they call the `getItem` function. In the Boto3 library, the return values are just a dictionary; this is common in Python, but in Swift we much prefer type safety, so the `builder` allows us to translate a dictionary into a value of type `T`. Once more, `builder` can return a `.failure` if the type can't be converted.
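The three steps can be mimicked without Python at all, which makes the `flatMap` chaining easier to see. In this sketch `fakeGetItem` is a stand-in for the Boto3 call, and `DynoError`, `Row`, `getItem` and `nameBuilder` are simplified, hypothetical versions written for this example rather than the library's own definitions.

```swift
// Hypothetical, simplified types: assumptions for this example only.
struct DynoError: Error, Equatable { let message: String }
typealias DynoResult<T> = Result<T, DynoError>
typealias Row = [String: String]

// Step 1 stand-in for the Boto3 call: returns a response dictionary, or a failure.
func fakeGetItem(_ key: String, online: Bool) -> DynoResult<[String: Row]> {
    guard online else { return .failure(DynoError(message: "timed out")) }
    let rows: [String: Row] = ["1": ["name": "Emojisaurus"]]   // made-up data
    if let row = rows[key] { return .success(["Item": row]) }
    return .success([:])    // mimics get_item: no "Item" entry when the key is absent
}

func getItem<T>(_ key: String, online: Bool,
                builder: (Row) -> DynoResult<T>) -> DynoResult<T> {
    fakeGetItem(key, online: online)                 // Step 1: make the call
        .flatMap { lookup -> DynoResult<Row> in      // Step 2: did we get an item back?
            guard let row = lookup["Item"] else {
                return .failure(DynoError(message: "item not found"))
            }
            return .success(row)
        }
        .flatMap(builder)                            // Step 3: dictionary -> T
}

// A builder supplied by the caller; it can fail too.
func nameBuilder(_ row: Row) -> DynoResult<String> {
    guard let name = row["name"] else {
        return .failure(DynoError(message: "no name"))
    }
    return .success(name)
}

let ok = getItem("1", online: true, builder: nameBuilder)
let notFound = getItem("2", online: true, builder: nameBuilder)
let offline = getItem("1", online: false, builder: nameBuilder)
print(ok, notFound, offline)
```

Whichever step fails first, later steps are skipped and that failure is what the caller receives, with no nested `if`/`else` handling needed.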
Builders and an old friend
If you look at the `main` function you'll see an example Builder which creates `Dinosaur` objects:
This is a bit clunky at the moment: we are exposing the `PythonObject` to our Swift code, and of course the right way to do this in Swift is via `Codable` objects. We'll fix this later!
`getStr` is a helper function which checks if a given key does indeed exist in the dictionary we are given, and if so, returns it as `.success`; otherwise it'll return a `.failure`.
But what is `zip3`? I've added a number of `zipX` functions to the Dyno library too, specifically for the `Result` type. `zip3` is a bit like regular `zip`, but taking 4 parameters, rather than 2: the first parameter (the `with`) is a function to call only if the remaining 3 parameters all evaluate to `.success`: if any of them are `.failure` then the `Dinosaur.init` won't be called.
This means we can treat `Result` as an applicative functor: read the linked article for more insight, although `zipX` gives a straightforward way to use `Result` as an applicative.
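Here is one way a `zip3` over `Result` could be written, together with a dictionary-based `getStr`. This is a sketch under stated assumptions: the `Dinosaur` fields, the plain `[String: String]` row (instead of a `PythonObject`) and the function bodies are invented for the example, and the versions in the Dyno library may differ.

```swift
// Hypothetical, simplified types: assumptions for this example only.
struct DynoError: Error, Equatable { let message: String }
typealias DynoResult<T> = Result<T, DynoError>

struct Dinosaur: Equatable {
    let id: String
    let name: String
    let colour: String
}

// Calls `with` only when all three inputs are .success;
// otherwise returns the first failure, reading left to right.
func zip3<A, B, C, R>(with f: (A, B, C) -> R,
                      _ a: DynoResult<A>,
                      _ b: DynoResult<B>,
                      _ c: DynoResult<C>) -> DynoResult<R> {
    a.flatMap { a in b.flatMap { b in c.map { c in f(a, b, c) } } }
}

// Returns the value for `key`, or a failure naming the missing key.
func getStr(_ key: String, _ row: [String: String]) -> DynoResult<String> {
    guard let value = row[key] else {
        return .failure(DynoError(message: "missing \(key)"))
    }
    return .success(value)
}

func buildDinosaur(_ row: [String: String]) -> DynoResult<Dinosaur> {
    zip3(with: Dinosaur.init,
         getStr("id", row), getStr("name", row), getStr("colour", row))
}

let complete = buildDinosaur(["id": "1", "name": "Emojisaurus", "colour": "green"])
let partial = buildDinosaur(["id": "2", "name": "Emojiraptor"])
print(complete, partial)
```

The builder reads as a single declarative line: list the fields you need, and the initialiser only runs when every one of them is present.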
Jurassic Park 🦕 🦖
The final thing we do is to create a wrapper object `Dyno` which we use to abstract our database connection. We can then add helper functions like `getItem` to that, to kick off our `Action`s.
So let's give it a whirl. Because we return `Observable`s, this looks a bit more complicated with `merge` and `subscribe` and `dispose`… but if you look past that you will see our Dyno library being called to `setItem`, which writes an item to the Dinosaurs database, followed by `scan` to retrieve the items written.
Those reactive `Observable`s look complicated, but they actually give us a lot here. We want to write the Dinosaurs in parallel; but we don't want to read until both writes have succeeded. `Observable.merge` allows us to run the `setItem` Observable streams in parallel; and the `.concat` then waits for the merged stream to complete before running the `scan`.
In a future article we'll see how to plug those `Observable`s into UI components; but for now we'll just let the `log` method show what is written and read.
For Next Time
Phew, that was a lot of work! But the library is taking shape. There are a few things I didn't go into: take a look at the filters for `scan`, for instance. But we have more to do:
- Testing! How on earth do we test a library relying on a remote database?
- Codable: let's shoo out those last few 🐍…
Until next time!
Update: the next article in this series is now out!
¹This is not a problem with Swift, or our connection via PythonKit: it's true in the Python library too and there are many online discussions about it!
²In fact Boto3 does automatically page large requests, but Dyno will concatenate the pages back together and return the total resultset. You can look at the `ActionScanAll` struct to see how this is done.