Bringing death certificates to life with agile tools

Published in Innovation and Technology · 15 min read · Mar 28, 2018

The City of Boston’s Digital Team is very excited this week for the launch of our death certificates web app. This project is a collaboration among ourselves, the Enterprise Applications team, and the Registry and Treasury departments. We’ve written a blog post all about the new app and how it makes buying a death certificate from the City more convenient than ever before.

Home page for the new death certificates web app

We’re particularly proud of the software engineering behind this launch. This is the first site that uses our new tech stack and production infrastructure for web applications. These tools have let us build a site that is fast, accessible, and (we hope!) maintainable for years to come.

In this post, we’re going to take you through how we chose to build the death certificates site. We’ll explain how the tools we picked helped us work agilely, with quick iterations leading to a solid first release.

Why we picked death certificates

The Digital Team is a part of the City of Boston’s Department of Innovation and Technology. To quote our mission statement: “We build digital experiences designed around the needs of our constituents. We work to make these tools beautiful, welcoming, and highly useful.”

Our most important responsibility is Boston.gov, the City’s website. Though it’s written by people from all over the City, we handle content editing, maintenance, and development of new content management features. Our team also oversees design and branding for the entire City.

We also build web apps! We have an ongoing project to modernize the services hosted on the legacy cityofboston.gov website, but we also make brand new things, like an upcoming rewrite of the 311 web portal and this death certificates app for the Registry Department.

Before this app, the only options to buy a death certificate from the City were by mail or by coming to City Hall in person. The only online option was a third-party service that partnered with the Commonwealth of Massachusetts. We knew that if we offered death certificates online ourselves, we could make an easier-to-use site and provide them at a lower price.

How do we want to build?

We started working on the death certificates app at about the same time as we started our rewrite of the 311 web portal (coming later this year). We knew that if we built the two apps similarly, we could share best practices between them and make long-term maintenance more consistent. We would also lay a solid groundwork for how to develop more apps in the future. Since we’re only planning to grow our portfolio of apps, we knew that taking advantage of economies of scale would be essential for our small team to keep on top of everything.

There are a huge number of options out there today for building a web app. To narrow down our choices, we set out looking for tools that met these criteria:

A big, active community. If a lot of people are using something then there will be tons of tutorials and places to go to get help, and tons of libraries we could re-use rather than build ourselves.
Generally useful. If we want those economies of scale, what we pick for 311 and death certificates also needs to be adaptable to apps we work on in the future.
Provides guard rails. We want tools that will catch our bugs or keep us from writing them in the first place.
Lets us iterate quickly. We want to be able to make incremental improvements and ship them—either to staging or production—as soon as they’re ready.

That last point, iterating quickly, consistently guided our choices. We like to work in an agile manner, continuously delivering incremental improvements. Though “agile” is often talked about in terms of team process, we’ve found it’s critical that the software libraries, frameworks, and tools that a team uses support and reinforce an agile workflow. If you take a look at the Principles behind the Agile Manifesto, you’ll see that the first three points, around rapid, continuous delivery and handling changing requirements, are directly affected by a team’s development tools.

When we think about rapidly delivering changes, we often think in terms of “cycle time”: the time between starting a piece of work and shipping it out, either to a pre-release staging environment or to production. We want our cycle time to be short; the sooner that a change gets released, the sooner it’s providing value to someone and the sooner we can get the feedback we need to make it even better.

One way to reduce cycle time is to keep each individual change to the site as small as possible. Smaller changes can be written, tested, and reviewed faster, and are less risky to release. If we need to do something big, we’ll still try to break it down into separate pieces that can be shipped one at a time. This is often called working in “small batches.” You’ll see below that we settled on tools that encourage making those small changes.

Tools for a changing UI

React for building our interfaces

Our first choice was a framework for building a user interface. Though the UI for death certificates is fairly modest (search results, details page, shopping cart, checkout flow), the one we were designing for 311 was shaping up to be a lot more complex, with dynamic forms and interactive maps. For a JavaScript-heavy site like that, we knew we wanted React.

React is hugely popular, with good reason. It could handle what we wanted for 311 but still scale down to death certificates’ straightforward UI without being overly complex. We love React’s declarative rendering and especially how it just uses JavaScript (with JSX syntactic sugar) as a template language.

With the help of a CSS-in-JS solution (we’ve used both styled-jsx and Emotion), our UI changes can often be made in a single file, instead of being spread across a separate JavaScript file, template file, and CSS file. Not only does this mean the changes are smaller, it also means there is less code that needs to be kept in sync. That helps reduce bugs.
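
To make the “just JavaScript” point concrete, here is a minimal sketch of what JSX boils down to. The h helper is a hypothetical stand-in for React.createElement so the example runs without React installed, and the SearchResult and SearchResults components and their props are illustrative names, not the app’s actual code.

```javascript
// Hypothetical stand-in for React.createElement: it builds a plain object
// describing an element instead of touching the DOM.
function h(type, props, ...children) {
  return { type, props: props || {}, children };
}

// Styles live next to the markup, in the spirit of CSS-in-JS.
const resultStyle = { fontWeight: "bold" };

function SearchResult({ firstName, lastName, deathYear }) {
  const label = `${firstName} ${lastName} (${deathYear})`;
  // In JSX this line would read: <li style={resultStyle}>{label}</li>
  return h("li", { style: resultStyle }, label);
}

// Templating is ordinary JavaScript: a list is rendered with map().
function SearchResults({ results }) {
  return h("ul", null, ...results.map((r) => SearchResult(r)));
}

const tree = SearchResults({
  results: [{ firstName: "Bruce", lastName: "Banner", deathYear: "2016" }],
});
console.log(JSON.stringify(tree, null, 2));
```

Because markup, styling, and logic sit in one function, a visual tweak to a search result touches only this one place.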

Next.js for setting things up

Using React became much easier by adding in Next.js, a Node library specifically for running React apps. The death certificates site feels so fast because Next.js gives us server-side rendering of the initial page, code splitting for smaller JavaScript files that download faster, and prefetching so that the browser is all ready to render the next page before you even click to it.

You can see this in action by clicking around on the site. Other than when loading search results or actually submitting an order at the end, each page appears instantaneously. In fact, we had to take out the “loading” progress bar from most pages because they appeared so fast that it was just a distracting flash.

Next.js makes an engineer’s day-to-day development faster as well. It enables hot module reloading so that when you change code in a text editor you can see it immediately in your browser. This speeds up both building and debugging.

Next.js is a great example of a tool that gives us guard rails. Though we could have set up the configurations and architectures for all of the above features ourselves, doing so is notoriously tricky. So is keeping them working as the development environment changes. With Next.js, that becomes somebody else’s problem. In agile terms, for us, it’s “maximizing work not done.” We can just focus on our app.

Storybook for getting it right

A Storybook story that shows just our “order details” component, with some sample certificates.

Our third tool for developing UIs rapidly is Storybook. Storybook renders the UI components of our app in isolation. You can browse the Registry app’s Storybook to see what we mean.

With Storybook, we can get pieces of the UI looking and working the way we want them to before plugging them into the rest of the app. We can test them in different situations, such as when errors pop up, without having to reproduce those conditions over and over again. For example, you can see what the certificate page looks like normally, when it’s already in the cart, when it’s pending, or when the certificate ID isn’t found.
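
The idea behind those stories can be sketched in plain JavaScript: each story pins a component to one named state. This is not Storybook’s API. The certificatePage function, its props, and the sample data are all hypothetical, chosen only to mirror the four states listed above.

```javascript
// Hypothetical component: returns the text a certificate page would show.
function certificatePage({ certificate, inCart }) {
  if (!certificate) return "Certificate not found";
  if (certificate.pending) return `${certificate.name}: pending`;
  return inCart
    ? `${certificate.name}: in cart`
    : `${certificate.name}: add to cart`;
}

const sample = { name: "Logan Howlett", pending: false };

// Each "story" is a name plus the props that put the component in that state.
const stories = {
  normal: { certificate: sample, inCart: false },
  "in cart": { certificate: sample, inCart: true },
  pending: { certificate: { ...sample, pending: true }, inCart: false },
  "not found": { certificate: null, inCart: false },
};

// Render every story in isolation, without the rest of the app.
const rendered = {};
for (const [name, props] of Object.entries(stories)) {
  rendered[name] = certificatePage(props);
  console.log(`${name} -> ${rendered[name]}`);
}
```

Hard-to-reach states such as “not found” become as easy to look at as the normal case, since no real search or database lookup is needed to produce them.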

Storybook is even more valuable because we can automatically use it with Percy for visual regression testing. This is a guard rail that tells us if a code change will make things look different in someone’s browser. If we had to configure Percy separately, it would eventually fall out of sync and lose its usefulness. But, since these tools overlap, the more we use Storybook because it makes development easier, the better our Percy coverage becomes. We’re always on the lookout for times when our tools can reinforce each other like this.

Keeping a tight development loop

The concept of cycle time applies not only to a product’s overall process, as we described above, but also to how a developer does their day-to-day work. As we code, we’re changing files in a text editor, going to a browser to make sure everything works, and repeating. With React, Next.js, and Storybook, we have tools that let us make small changes and see them immediately.

With a different set of tools, we might have to wait for a long compilation process, for a slow site to load up, or to click our way back to the part of the app we were working on. Even delays of a minute or two would encourage us to batch up several changes before testing them. Larger batches mean longer cycle times, which means our fixes and features take longer to write.

Continuous delivery to staging

Code just sitting on an engineer’s laptop isn’t doing anyone any good. We need to “deliver [our] working software frequently” to our product managers, designers, and stakeholders in Registry and Treasury. To do that, we need to be able to ship painlessly.

Our first commit to source control initialized the repository with GitHub’s default README. Our second commit was a bare-bones app, based on the structure of the 311 web portal. By our third commit, we were automatically deploying from Travis CI to Heroku.

We like these tools because they’re easy to set up, integrate well with GitHub, and, for our open-source, low-traffic use, free.

From the very beginning of death certificates, any change that was committed went to staging within minutes, without any further action. Our tools supported us when we made small changes. Since there was no cost to deployment, there was no reason not to check things in as soon as they were ready and see them live.

If getting new code out were a long, manual process, as is sometimes the case, we’d probably wait until several changes were ready before pushing one out. That lengthens cycle time: the earliest change is just inventory, not providing any value, until a fixed schedule or critical mass of other changes prompts a new release. With continuous deployment, our product managers and designers can see their feedback implemented by lunch, not batched up with a bunch of other things that go out sometime next week.

Developing with and without data

Fetching with GraphQL

The death certificates app needs to show search results and individual certificate records, and to write orders to a database. To communicate this data between the React app running in the browser and the backend server, we turned to GraphQL.

Since our first experiences with GraphQL, we’ve never wanted to go back to the REST model for client/server communication. REST typically makes you choose between generalized endpoints, which either return too much data (slow) or require multiple network requests to get everything (also slow), and view-specific endpoints, which can be tedious to maintain.

With GraphQL and the graphql-js library, our server code connects to the database and returns data in a general way, independent of any UI designs. Since it’s not coupled to the client’s needs, it can be tested in isolation and is fairly straightforward to maintain.

On the browser side, our app makes GraphQL queries for the data it needs for the particular page the user is on. If we decide to change the UI in a way that displays different data, we just modify the query on the browser side and don’t have to touch the server at all. Here’s the GraphQL we use to do a search, which returns individual certificates (under results) as well as the info needed for pagination:

query SearchDeathCertificates($query: String!, $page: Int!, $startYear: String, $endYear: String) {
  deathCertificates {
    search(query: $query, page: $page, startYear: $startYear, endYear: $endYear) {
      page
      pageSize
      pageCount
      resultCount
      results {
        id
        firstName
        lastName
        deathYear
        deathDate
        pending
        age
        birthDate
      }
    }
  }
}

How to get data is the sole concern of the server. Which data is needed is the sole concern of the client. The lack of coupling between the two speeds us up. Just as React let us make isolated changes to our UI, GraphQL lets us make isolated changes to how we get data. Those isolated changes? They’re small! Small is good.
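
As a rough sketch of the client side, a query like the one above is usually sent as a JSON POST that carries the query text and its variables. The buildSearchRequest helper and the abbreviated query are illustrative assumptions, not the app’s actual client code, and the endpoint path in the usage comment is hypothetical.

```javascript
// Abbreviated version of the search query shown above.
const SEARCH_QUERY = `
  query SearchDeathCertificates($query: String!, $page: Int!) {
    deathCertificates {
      search(query: $query, page: $page) {
        page
        pageCount
        results { id firstName lastName }
      }
    }
  }
`;

// Builds the options for a fetch() call. GraphQL servers conventionally
// accept a POST whose JSON body holds the query text and its variables.
function buildSearchRequest(query, page) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      query: SEARCH_QUERY,
      variables: { query, page },
    }),
  };
}

const request = buildSearchRequest("banner", 1);
console.log(request.body);

// Usage, with a hypothetical endpoint path:
//   const response = await fetch("/graphql", request);
```

Showing a different field in the UI means editing only the query string. The server code stays untouched.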

GraphQL nicely leverages Node’s native concurrency, and Promises especially. For example, when you visit your cart and the client makes a GraphQL query for all the certificates in it, fetching them all in parallel is as straightforward as using Promise.all and map.
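
A minimal sketch of that pattern follows. The lookupCertificate function is a hypothetical stand-in for the real database lookup, and the IDs are made up.

```javascript
// Hypothetical stand-in for a database lookup; resolves after a short delay.
function lookupCertificate(id) {
  return new Promise((resolve) =>
    setTimeout(
      () => resolve({ id, firstName: "Sample", lastName: `Person ${id}` }),
      10
    )
  );
}

// map() starts every lookup right away; Promise.all waits for all of them
// and returns the results in the same order as the input ids.
function lookupCertificates(ids) {
  return Promise.all(ids.map((id) => lookupCertificate(id)));
}

lookupCertificates(["1001", "1002", "1003"]).then((certificates) => {
  console.log(certificates.map((c) => c.lastName));
});
```

Since all the lookups are in flight at once, the total wait is roughly that of the slowest one rather than the sum of all of them.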

Admittedly, a full GraphQL backend is slight overkill just for the Registry app. It provides some benefits over a more ad-hoc solution, but requires a bit of boilerplate and a level of conceptual complexity as well. Nevertheless, GraphQL is invaluable for the 311 web portal. Being consistent between the two reduces the conceptual burden somewhat.

Safely iterating on data with Flow

Early on, we didn’t have any data because we were still setting up our connection to the Registry’s database. We wanted to get started on our UI though, so we introduced fake data (mostly based on comic book characters) to have something to show as we iterated with product and design.

One real danger when working with fake data is that it might be formatted differently than the eventual real data, leading to wasted work and possible bugs adapting to the database once it’s in place. We guarded against that risk in two ways:

First, we had GraphQL as a layer between our UI and the database. We wrote a GraphQL schema that we knew we could eventually fulfill with live data, so all we needed to do was have the GraphQL endpoint return fake data directly. The UI code we wrote against the GraphQL schema wouldn’t have to change once the database was available.
Second, we used Flow to statically type-check our code. We used apollo-codegen to generate types from our GraphQL queries. Flow made sure that every piece of data used by our UI was actually going to be returned by our GraphQL queries. As we iterated and tweaked the schema or the queries, Flow would tell us everywhere that needed updating to match.
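
The first safeguard can be sketched as a resolver that depends only on a data source’s interface, so a fake source can later be swapped for a database-backed one. Everything here is illustrative: the field types, the makeResolvers helper, and the sample records are assumptions. The shape is written as a JSDoc comment so the example runs in plain Node, where the team would have used a Flow type.

```javascript
/**
 * Shape the UI relies on, mirroring fields from the search query.
 * The field types are assumptions made for this sketch.
 * @typedef {{id: string, firstName: string, lastName: string,
 *            deathYear: string, pending: boolean}} DeathCertificate
 */

// Fake data source, used until the real database is reachable.
const fakeCertificates = {
  /** @returns {Promise<DeathCertificate|null>} */
  async get(id) {
    const all = [
      { id: "000001", firstName: "Logan", lastName: "Howlett", deathYear: "2014", pending: false },
      { id: "000002", firstName: "Bruce", lastName: "Banner", deathYear: "2016", pending: true },
    ];
    return all.find((c) => c.id === id) || null;
  },
};

// The resolver only knows the data source's interface, so an object with
// the same get() method backed by a database can replace the fake one
// without any change here or in the UI.
function makeResolvers(dataSource) {
  return {
    certificate: (args) => dataSource.get(args.id),
  };
}

const resolvers = makeResolvers(fakeCertificates);
resolvers.certificate({ id: "000002" }).then((c) => console.log(c));
```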

Once we did have database access, Flow helped us yet again. Based on talking to the Enterprise Applications team and inspecting the responses to our SQL queries, we were able to write Flow types for the SQLServer response structure and death certificates table specifically. With those in place, we could confidently write the code that converted from the database to the GraphQL schema the UI code was already using, knowing that Flow would ensure we were using field names that existed and matched our expectations for their types.
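
That conversion step might look roughly like the sketch below. The column names, date format, and rowToCertificate helper are invented for illustration, since the real table layout is not described here.

```javascript
// Hypothetical database row; the column names and formats are assumptions.
const sampleRow = {
  CertificateID: 1001,
  FirstName: "BRUCE ",
  LastName: "BANNER",
  DateOfDeath: "03/28/2016",
  Pending: 0,
};

// Converts one database row into the shape the GraphQL schema exposes.
// With Flow types on both sides, a misspelled column name or a field of
// the wrong type would be flagged before the code ever ran.
function rowToCertificate(row) {
  return {
    id: String(row.CertificateID),
    firstName: row.FirstName.trim(),
    lastName: row.LastName.trim(),
    deathDate: row.DateOfDeath,
    deathYear: row.DateOfDeath.split("/")[2],
    pending: row.Pending === 1,
  };
}

const certificate = rowToCertificate(sampleRow);
console.log(certificate);
```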

Static type checking is one of the best guard rails we have. It’s useful during early development when everything is changing a lot and it’s easy to forget where you need to update. It pays off again down the road when you need to update things without breaking pieces of the app you’ve completely forgotten about.

We chose Flow early on because we knew it had good support for React, and, since it acted as a linter, we didn’t have to integrate it into our JavaScript bundling (which is handled by Next.js). We may experiment with TypeScript in the future—we already use it for our web components, since it’s built into Stencil—if third-party library or editor support reaches a tipping point in its favor. The next-typescript plugin integrates it into Next.js.

Shipping to production early and often

Getting started on AWS

Up to this point, we had been doing all of our staging deployments to Heroku. We like Heroku because it has tight GitHub integration and takes care of all of the complexities of deployment for us. It worked great right up until the point where we needed to connect to the Registry database, which is safely behind a firewall in a City data center.

We liked having a cloud-based deployment, but exposing a SQLServer cluster to the Internet just so we could access it from Heroku seemed like a bit of a security risk. So, we followed the lead of the Analytics and Data team and got up and running on Amazon Web Services.

Unlike with Heroku, we could set up a VPN tunnel from our AWS VPC’s private subnets back to the City data center. Now our apps could be deployed to the cloud but still securely connect to the Registry database.

Shipping with Docker

Without Heroku, we had to find a new way to continuously deploy our code. We tested out Amazon’s Elastic Beanstalk tool, but the way it put our apps on separate EC2 instances was a waste of resources. By tech industry standards, City web services do not get very much traffic.

Instead, we looked for a way to pack just a handful of instances with as many apps as could fit into memory, which led us to choose Docker containers. Each container is an isolated environment for a single app, so—within the limits of the machine’s resources—they can co-exist without interfering with each other.

To deploy, we have our Travis CI job build a container image, upload it to Amazon’s Elastic Container Registry (the image store for Elastic Container Service), and create a CloudFormation change set to push it out. We then get a message in Slack with the exact command line to paste into a terminal to trigger the release. (For security reasons, the AWS users we use for Travis don’t have permissions to change production directly.)
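
As a sketch of that last step, a CI script could assemble the Slack message like this. The exact command the team posts is not given here, so this assumes the standard AWS CLI call for applying an existing change set. The stack name, change set name, and message wording are made-up examples.

```javascript
// Builds the command that applies a change set CI has already created.
// Both flags are standard AWS CLI options; the values passed in below are
// hypothetical.
function releaseCommand(stackName, changeSetName) {
  return [
    "aws cloudformation execute-change-set",
    `--stack-name ${stackName}`,
    `--change-set-name ${changeSetName}`,
  ].join(" ");
}

// Text for the Slack notification; the wording is illustrative.
function slackMessage(appName, stackName, changeSetName) {
  return (
    `A new ${appName} release is ready. To deploy, run:\n` +
    releaseCommand(stackName, changeSetName)
  );
}

console.log(slackMessage("registry-certs", "apps-prod", "registry-certs-build-123"));
```

Keeping the final step manual means the CI credentials only need permission to create change sets, not to execute them.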

Shippy-Toe, our very own shipit squirrel, lets us know in Slack how to push the new release out.

Amazon’s Application Load Balancers give us zero-downtime deploys. They check to make sure new container instances are up and healthy before draining connections from the old ones. This eliminates any user-visible cost to deployments (no one’s use of the app is interrupted) and de-risks them at the same time (if the new version won’t launch then the old, working one will safely stay in place). Since there’s no downside to making a release, we can do it whenever we want, with as small a batch as we want.

Though we like the functionality that we have built on AWS, we’re not satisfied with what it takes to do it. CloudFormation does not have a very friendly language for defining infrastructure. We had to write a lot of hard-to-understand templates to make this work. We also had to write some deployment scripts ourselves, which means we have to maintain them as well. For example, EC2 + ECS has no built-in feature to prevent auto-scaling instances from shutting down (such as during system updates) while they have containers running on them. We had to adapt code from a blog post to keep from dropping traffic when we need to update our AMIs.

Migrating this to something that’s friendlier (we hear Terraform is good, and the Analytics and Data team is already using it) and more industry-standard (Kubernetes for container orchestration, which is now directly supported by AWS) is something we’re interested in, though we would have to prioritize it against building new apps.

Other tools for going fast

As part of the new branding that launched with Boston.gov, we developed the Fleet pattern library of CSS and HTML components. We use this across all of our websites and apps to keep the UIs looking stylish and consistent.

Prettier is the best. Auto-formatting means that we don’t have to spend any time editing spaces and indents to conform to a style guide. The noticeable productivity benefits of having code just jump into place when you press save have to be experienced to be believed.

For unit testing, we’ve adopted Jest. It can run tests in parallel for speed, and its snapshot feature saves the effort of writing tedious test assertions when what you really want to say is, “this is fine now, just give me a heads-up if it changes.” These days, the code that gets exercised by unit tests doesn’t tend to be affected by differences among our supported browsers, so we don’t miss being able to run them directly in a browser. We much prefer the stability and speed of Jest’s jsdom-based approach.
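
The snapshot idea can be sketched in a few lines. This is not Jest’s implementation or API: the matchesSnapshot function is hypothetical, and it keeps snapshots in memory, whereas Jest writes them to files that are checked in alongside the tests.

```javascript
// In-memory store of recorded snapshots, keyed by test name.
const snapshots = new Map();

// The first call for a name records the value and passes. Later calls pass
// only if the serialized value is unchanged.
function matchesSnapshot(name, value) {
  const serialized = JSON.stringify(value, null, 2);
  if (!snapshots.has(name)) {
    snapshots.set(name, serialized);
    return true;
  }
  return snapshots.get(name) === serialized;
}

const cartSummary = { items: 2, total: "$28.00" };

console.log(matchesSnapshot("cart summary", cartSummary)); // first run records
console.log(matchesSnapshot("cart summary", { items: 2, total: "$28.00" }));
console.log(matchesSnapshot("cart summary", { items: 3, total: "$42.00" }));
```

The test author never writes out the expected value by hand. The recorded output is the expectation, and any later difference is surfaced for review.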

Looking ahead

We’re proud of what we’ve built so far with the death certificates app. If you do happen to need it, we hope that you find it fast and easy to use. We’re looking forward to bringing more Registry services online as well, with marriage and birth certificates on our roadmap.

We’re also excited to keep working in the tech stack described above because of how happy and productive we feel when we can be agile and quickly make new improvements. As we evolve it over time, we’ll balance the value of staying consistent with any gains we can get from a different tool.

Does this sound like a way you’d like to work? Do you want to bring beautiful, welcoming, and highly usable tools to the people of Boston? We’re hiring a full-time software engineer to join the team in City Hall. Apply now!