Serverless Stack Experiences

We are in the pre-launch phase of a new product, and that early stage means an opportunity to select technologies without an emphasis on "integrating with what is already there". It is tempting to think of this phase as "laying strong foundations". That is true, but not about the tech itself: "static", "made from concrete", and "building for a generation" are not only misguided but can be destructive goals for tech. Instead of being solid, it needs to be resilient. It needs to be flexible and adaptable, and you need to accept that you are going to get some parts wrong and swap them out for alternate decisions or newly emerging tech. "Laying strong foundations" is more true of the tech values and engineering practices: separation of concerns, the role of boundaries and abstractions, a quality culture around testing, systems thinking for ops, and longer-term decisions such as data lifetime and id space.

But it’s also really fun to set up and work with new systems. The Properly stack is pre-release and below are the tech choices we have made so far on the back end, and some thoughts about how effective the systems are and how well they meet some of the large goals above. There will be more to share as we see how these technologies and decisions respond to going live.

Starting with boundaries:

Swagger for API definitions, specifically SwaggerHub, which has online editing and documentation as well as tools to export the definition into systems (AWS API Gateway) and to export libraries for language bindings. It has been awesome to have a structured definition of the API. A high point has been that AWS has extended the Swagger semantics to describe how API Gateway binds to Lambda, which is a strong "ops as config/code" pattern. One challenge we encountered is that some of the extension fields, such as "body mapping templates", are a bit rough to edit in SwaggerHub: they are a magic JSON string with lots of escape characters needed.

Next steps: so far we have just been periodically extracting the API definition to archive changes in GitHub, and periodically exporting the SDKs to get the Java serialization model classes created. Integrating the definition into source control would give us history and separation of tool from source. More interestingly, I would really like to use the Swagger code generation project to create the models and API SDK during the build process. We will learn a lot about this flow post launch, when API evolution (or versioning) gets much more real.
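
For a sense of what the SDK export gives us, the serialization model classes are plain POJOs roughly along these lines. The Listing name and its fields here are hypothetical placeholders for illustration, not our actual API; real swagger-codegen output also adds builders and equals/hashCode.

```java
import com.fasterxml.jackson.annotation.JsonProperty;

// Roughly the shape of a swagger-codegen generated model class:
// plain fields, Jackson annotations, simple accessors.
public class Listing {

    @JsonProperty("id")
    private String id;

    @JsonProperty("askingPrice")
    private Long askingPrice;

    public String getId() { return id; }
    public void setId(String id) { this.id = id; }

    public Long getAskingPrice() { return askingPrice; }
    public void setAskingPrice(Long askingPrice) { this.askingPrice = askingPrice; }
}
```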

ID Space

I am extremely (perhaps disproportionately?) focused on id space in tech design. In my experience, poor selection has constrained the scalability and design of a lot of projects, and it is one of the least switchable tech choices you will make. (The scalability and availability connection comes from the fact that concepts like auto-increment integers rely on a central source of truth or complicated distributed semaphores; as parallelism grows you end up wanting to move to eventually consistent and ridiculously distributed models as much as possible, and it is nice if your id generation doesn't have to go back to some single DB master.) And if you want to see a team's face turn pale, suggest changing the id space in a five-year-old system.

So my very favourite id is the UUID standard, because you get collision avoidance without coordination and a high degree of confidence you won't have to change your mind. However, I am discomforted by the use of string protocols on the wire and in databases for storing them, because technically the standard string format is ambiguous in allowing [a-f] and [A-F] and other format variants. It doesn't sit well with me. There is also a pragmatic voice in my head that wants URLs that don't wrap twice in an editor, and a design voice that wants the URLs slightly less ugly.

Where does that leave us? For Properly we are going with UUIDv4, encoded with the URL-safe base64 variant, and most of the system then treats the ids as opaque strings. Every platform has a decent base64 encoder, and even if the variant that substitutes URL-safe characters isn't built in, it can be built quickly. I'll get off my soapbox now.
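
As a minimal sketch of that encoding (not necessarily the exact helper in our codebase), the Java standard library covers it:

```java
import java.nio.ByteBuffer;
import java.util.Base64;
import java.util.UUID;

public class Ids {

    // A random UUIDv4 (128 bits), encoded with the URL-safe base64 alphabet
    // and without the trailing "==" padding: a 22-character opaque id.
    public static String newId() {
        UUID uuid = UUID.randomUUID();
        ByteBuffer bytes = ByteBuffer.allocate(16);
        bytes.putLong(uuid.getMostSignificantBits());
        bytes.putLong(uuid.getLeastSignificantBits());
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes.array());
    }
}
```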

Ids are just part of it; there is also the rest of the data:

Data

Data outlasts code. (Hat tip to my colleague Michal Swart for highlighting that.) So data modelling and storage choice is important, and nearly as impactful as id space for future scaling and resilience approaches. We are working in DynamoDB. The server code is in Java, but we are not binding Java objects directly at this point; instead we pass primitive types in maps and lists. This decision was relatively arbitrary and is something we will do a spike on in the future. One concern was losing some of the schema-less flexibility during a time when we expect to make a lot of changes to our data model.
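
For illustration, this is roughly what "primitive types in maps and lists" looks like against the low-level DynamoDB client (AWS SDK for Java v1); the table name, attribute names, and values here are hypothetical:

```java
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;

import java.util.HashMap;
import java.util.Map;

public class PutListingSketch {
    public static void main(String[] args) {
        AmazonDynamoDB client = AmazonDynamoDBClientBuilder.defaultClient();

        // The item is built from maps, lists, and scalars rather than a bound Java class,
        // so new attributes can appear without touching any schema.
        Map<String, AttributeValue> item = new HashMap<>();
        item.put("id", new AttributeValue().withS("8subiH6mR7mbU0M3VYhq2g")); // hypothetical id
        item.put("bedrooms", new AttributeValue().withN("3"));
        item.put("tags", new AttributeValue().withL(
                new AttributeValue().withS("garage"),
                new AttributeValue().withS("corner-lot")));

        Map<String, AttributeValue> address = new HashMap<>();
        address.put("city", new AttributeValue().withS("Waterloo"));
        item.put("address", new AttributeValue().withM(address));

        client.putItem("listings", item); // hypothetical table name
    }
}
```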

The big promise of DynamoDB is resilience and scale, and we haven't tested the limits there, so we can't speak to the upside fully. In terms of challenges, some elements have felt a bit like magic strings and a little fragile to magic incantations: getting the UpdateExpression string correct for an "upsert" style operation, particularly with nested or complicated objects, was rough. And if some of the data you are supplying is null, there seems to be no way in the expression to set it conditionally (the conditions only apply to existing data), so you have to construct the expression string dynamically. We have had to do some tech spikes, as DynamoDB is new technology for our team. It is very expressive and functional, but we have run into the edges of its capability.
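
To make that concrete, here is a minimal sketch of building the UpdateExpression dynamically so that null fields are simply skipped. It assumes the AWS SDK for Java v1, a hypothetical "listings" table, and string-only attributes for brevity:

```java
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.UpdateItemRequest;

import java.util.HashMap;
import java.util.Map;

public class UpsertSketch {

    private final AmazonDynamoDB client = AmazonDynamoDBClientBuilder.defaultClient();

    // Upsert: SET each non-null field. Null fields are left out of the expression
    // entirely, since the expression itself can't skip them conditionally.
    public void upsert(String id, Map<String, String> fields) {
        StringBuilder expression = new StringBuilder();
        Map<String, String> names = new HashMap<>();
        Map<String, AttributeValue> values = new HashMap<>();

        for (Map.Entry<String, String> field : fields.entrySet()) {
            if (field.getValue() == null) {
                continue; // omit nulls rather than trying to express the condition in DynamoDB
            }
            String name = "#" + field.getKey();
            String value = ":" + field.getKey();
            names.put(name, field.getKey());
            values.put(value, new AttributeValue().withS(field.getValue()));
            expression.append(expression.length() == 0 ? "SET " : ", ")
                      .append(name).append(" = ").append(value);
        }
        if (values.isEmpty()) {
            return; // nothing to write
        }

        Map<String, AttributeValue> key = new HashMap<>();
        key.put("id", new AttributeValue().withS(id));

        client.updateItem(new UpdateItemRequest()
                .withTableName("listings") // hypothetical table name
                .withKey(key)
                .withUpdateExpression(expression.toString())
                .withExpressionAttributeNames(names)
                .withExpressionAttributeValues(values));
    }
}
```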

It's the kind of code you want a lot of tests around:

Testing

Unit and automation tests, from the start. We are being selective about the amount and type of tests so they are most useful during a time of rapid change: happy-path and common negative-case integration tests for each API operation, and unit tests around any materially complex unit. So far we have been building with explicitly injected dependencies and avoiding mock tooling.
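
As a sketch of what "explicitly injected dependencies" looks like in practice (the interface and handler names here are made up for illustration):

```java
import java.util.Optional;

// The handler depends on a small interface rather than a concrete DynamoDB class,
// so a test can hand it a tiny fake instead of reaching for a mocking library.
interface ListingStore {
    Optional<String> ownerOf(String listingId);
}

class GetOwnerHandler {
    private final ListingStore store;

    GetOwnerHandler(ListingStore store) { // dependency injected explicitly via the constructor
        this.store = store;
    }

    String handle(String listingId) {
        return store.ownerOf(listingId).orElse("not-found");
    }
}

// In a test, the fake is just a lambda:
//   GetOwnerHandler handler = new GetOwnerHandler(id -> Optional.of("user-123"));
//   assertEquals("user-123", handler.handle("listing-1"));
```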

I have also been very happy with Maven's plugin and package management, and even Maven's opinionated format for directories and build phases; it lets the defaults work on what is normally a complicated part of the system. Even setting up a DynamoDB Local instance (for integration tests against DynamoDB) was relatively painless. Deciding on that structure involved a lot of research and avoiding some defunct approaches, but it is working seamlessly. The biggest gotcha we experienced was needing an AWS access key id and secret (both fake) in the environment for the AWS SDK to be happy connecting to the local instance during a test run in CI, and the error messages about what was happening were a bit unclear.
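
For reference, this is roughly how a test client can be pointed at the local instance; the fake key and secret just need to be present (in our CI they come from the environment), since DynamoDB Local never validates them, and port 8000 is DynamoDB Local's default:

```java
import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.client.builder.AwsClientBuilder;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;

public class LocalDynamoClient {

    // Build a client against DynamoDB Local; credentials must exist but their
    // values are irrelevant to the local instance.
    public static AmazonDynamoDB build() {
        return AmazonDynamoDBClientBuilder.standard()
                .withEndpointConfiguration(
                        new AwsClientBuilder.EndpointConfiguration("http://localhost:8000", "us-east-1"))
                .withCredentials(
                        new AWSStaticCredentialsProvider(new BasicAWSCredentials("fake-id", "fake-secret")))
                .build();
    }
}
```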

Speaking of CI, everything has been set up to run through Travis CI. Travis has just enough AWS integration for us to control automatic deployment to a staging environment, both for the front-end code (new S3 buckets for each build, with a predefined lifecycle rule to expire the build out) and for the back end (automatic deploy of a new Lambda version). Travis has been awesome for us, and linking the workflow right from code through to deployment on AWS has kept cycles tight.

Operations Environment (aka Working Software has to be live)

This is a decision that has some inertia as well, particularly as I mentioned DynamoDB. When you have a lot of code and tooling built up around a provider, and particularly when you have a lot of data in that provider, you are unlikely to want to change. Not quite "lock in", but moves are expensive. However, it is also really expensive to try to maintain generality and build for portable targets. So our path here has been to commit to native AWS tooling and be ready to take the switching cost if we find we need to move. AWS really has a lead in capability, support, and sophistication, so we haven't felt constrained at this point, and we are loving going full serverless.

Our serverless AWS operations stack is:

1. Route53 DNS configured to point to …

2. CloudFront as CDN in front of …

3. S3 buckets that contain our all-front-end-rendered app (yeah, SEO and page load time on an empty cache are things we will work on) and the app gets its data through …

4. API Gateway (authorized with JWT tokens from Cognito — hooray, we don’t manage user secrets) bound to …

5. AWS Lambda methods written in Java (yeah, we might switch to Node or Go; a minimal handler sketch follows this list) which use …

6. DynamoDB for persistence.
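
As a sketch of step 5, a Lambda entry point of the shape API Gateway binds to looks roughly like this. The class name is hypothetical, the input map's shape is whatever the body mapping template produces, and the real handlers delegate to the store and service code described above rather than echoing the input:

```java
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;

import java.util.HashMap;
import java.util.Map;

// Minimal Lambda handler: API Gateway invokes handleRequest with the mapped request body.
public class GetListingHandler implements RequestHandler<Map<String, Object>, Map<String, Object>> {

    @Override
    public Map<String, Object> handleRequest(Map<String, Object> input, Context context) {
        Map<String, Object> response = new HashMap<>();
        response.put("id", input.get("id")); // echoed for illustration only
        response.put("status", "ok");
        return response;
    }
}
```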

We have nicely consistent tagging for the staging stacks so we can separate them clearly from prod and do the promotion process easily. We have separate IAM roles for the various operations but have not yet tightened the security policies as much as we want to in order to guard against inadvertent data loss. Still to come are CloudWatch for monitoring and adding caching when we uncover where we want to divert load or improve responsiveness. I expect that as we get further into the project we will also employ an event system and process data asynchronously with the Lambda functions, but the realtime direct storage fits the current use case.

Conclusion

Very happy with the above tech so far, and feeling well prepared, with the separation of concerns and other engineering practices and considerations that will let us adapt in the future. I will share more about the front-end tech choices, and more again as we go live and the project responds to changing requirements.
