Upgrading your outreach: Serverless Transit Accessibility with Taui

Trevor Gerhardt
Conveyal
Oct 4, 2018 · 11 min read

This is part three in a series about interactive transit accessibility sites and is aimed more at web developers. We recommend reading parts one and two first.

Taui is Conveyal’s solution for publishing fast interactive transit accessibility websites using high fidelity models that scale to handle any number of users for a fraction of the cost of previous methods. To accomplish this we created a method that allows us to pre-generate all necessary data up front enabling the sites to run without needing an active server for network analysis and routing. Designing these “serverless” websites required many engineering hours, tweaking binary formats, generating many terabytes of data, and crashing our browsers countless times with experimental prototypes. So why go through the trouble?

Sharing personalized transit accessibility models with all interested parties can be difficult and expensive due to the variable loads servers need to handle and the computational power required to generate them. The required capacity of the server also depends on the geographic size of a region and complexity of the transit system. Lifespan and expected max load for publicly shared and tweeted projects all come with variable costs. For a small team, generating and maintaining an increasing number of public facing sites backed by these number crunching servers became an exceedingly difficult task with future costs impossible to estimate. We realized this after the huge success of our first public interactive site.

At Princeton Junction, jobs reachable by public transit within 80 minutes: 3,086,000

In 2014 we worked with the Regional Plan Association, an urban advocacy group focused on the New York metropolitan area, to create our first iteration of these sites to answer the quintessential accessibility question — how does where you live affect the number of jobs you can reach? Through our web application thousands of users were able to see personalized accessibility information that was previously impossible to share. It became one of our most popularly shared projects to date and inspired this CityLab article.

It was also one of our costliest projects.

The single beefy high memory AWS EC2 server required to handle the New York metropolitan area cost over $500 per month! After running for over three years (don’t do the math) the tool is no longer online, but the original code can be found here.

This experience, as well as reactions to subsequent tools built by others (e.g. Bus Connects and the Toronto Transit Explorer), convinced us of the value of interactive sites. As we argued previously, such sites allow users to see impacts in dynamic, personalized, and meaningful ways that static printed reports alone cannot achieve. It is clear that demand will continue to grow for sites that quickly and easily display:

  • Travel time isochrones for specifically geocoded locations
  • Accessibility information to one or more sets of geographic points (or opportunities) like jobs or homes
  • Transit routes to a destination along with the estimated travel time and potential alternative paths

But meeting this growing demand with costly live servers like our initial New York City example would not be practical or sustainable. We needed to develop a consistent way to customize, deploy, and host lower-cost, higher-stability sites for our customers.

What is Taui?

Taui is our open source solution for creating and publishing interactive transit accessibility websites for any scenario created in Conveyal Analysis.

What sets Taui apart is its combination of speed, customizability, ease of deployment, a low running cost, and scalability to handle any number of concurrent users. The main technical innovation areas enabling this are in the custom binary data formats, generation and storage of the data, and the website deployment. These differences are important when comparing it to similar features found in Conveyal Analysis itself.

With each deployment we generate many gigabytes of data up front — hundreds in larger scenarios. Origins and destinations are represented on high resolution regular grids rather than with administrative boundaries or arbitrary polygons like many other transportation analysis systems.

The five distinct datasets we generate are:

  1. Analysis manifest
  2. Transit network metadata
  3. Travel times (per origin)
  4. Transit routes (per origin)
  5. Opportunities

Each is necessary to create the full user experience, but each is also distinct in how it is created and used.

Analysis Manifest

The manifest file describes the parameters used to create the analysis. This includes standard navigation details like walk speed and modes used and also internal parameters like the exact version of our transit analysis engine (R5) and the dimensions and location of the data.

Transit Network Metadata

The transit models we use are generally based on one or more sets of GTFS schedule data which can be quite a large set of CSV files depending on the system. These files contain data that is absolutely necessary to generate transit models. But by the time Taui utilizes the base data we only need to use a much smaller subset of that data:

  • Routes and their colors, shapes, and names
  • Patterns and which route they are in
  • Stops and their locations

We store these by their IDs so that they can be looked up easily later. This format is a simple JSON format and is used in other applications of ours that also utilize the Transitive.js library to display stylized routes.

Travel Times

Travel time isochrones generated in Taui

For each point in our grid (tailored for the geographic area) we find the travel time to every other point in the grid. We store this in a custom binary format with a header indicating the dimensions of the grid. The times are integers in grid order which enables geographically close points, and therefore points likely to have similar or close times, to be consecutive. We then delta encode the values. This results in a lot of “0”s and allows for compression algorithms to compress them even further.
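
To make the encoding concrete, here is a minimal Python sketch of delta coding a small grid of travel times. The header layout (two 32-bit little-endian integers for width and height), the 32-bit cell size, and the use of zlib are assumptions made for illustration, not a description of Taui's actual file format.

```python
import struct
import zlib


def delta_encode(values):
    """Replace each value with its difference from the previous value."""
    deltas = []
    previous = 0
    for value in values:
        deltas.append(value - previous)
        previous = value
    return deltas


def pack_grid(width, height, times):
    """Pack an assumed header (width, height) followed by delta-coded times."""
    if len(times) != width * height:
        raise ValueError("times must contain width * height values")
    deltas = delta_encode(times)
    return struct.pack(f"<2i{len(deltas)}i", width, height, *deltas)


# A tiny 4 x 3 grid of travel times in minutes, in row-major grid order.
width, height = 4, 3
times = [10, 10, 11, 11,
         10, 10, 11, 12,
         11, 11, 12, 12]

packed = pack_grid(width, height, times)
# Neighboring cells have similar times, so most deltas are 0 or +/-1,
# which gives a general-purpose compressor long repetitive runs to work with.
compressed = zlib.compress(packed)
print(delta_encode(times))
```

On a real grid with hundreds of thousands of cells, the long runs of small deltas are what make the compressed files small enough to fetch one origin at a time.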

Each point’s travel time grid is stored separately in a “bucket” on Amazon’s Simple Storage Service (S3) and can therefore be retrieved individually. Knowing where the bucket is and the grid dimensions, it’s “easy” to translate a user-friendly location (like an address) into a point that we can look up in the bucket to retrieve the travel times.

An example conversion from an address to a usable index:

  1. A user enters an address: 1301 U St NW, Washington, DC
  2. Address is geocoded into a geographic point: [longitude, latitude]
  3. Using the grid dimensions found in the manifest file we can bin the geographic point into corresponding grid location: [x, y]
  4. And then retrieve the data directly from the webpage without going through a server with an HTTP GET of the URL: https://{cloudfrontId}.cloudfront.net/{projectId}/{x + y * grid.width}_times.dat
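
The steps above can be sketched in a few lines of Python. This sketch assumes the manifest describes the grid as Web Mercator pixel offsets (`west`, `north`, `width`, `height`, and a `zoom` level); those field names, the example grid values, the approximate coordinates, and the `example-id` and `example-project` placeholders are all hypothetical. Only the `{x + y * grid.width}_times.dat` pattern comes from the text.

```python
import math


def lon_to_pixel(lon, zoom):
    """Web Mercator x pixel for a longitude, using 256-pixel tiles."""
    return (lon + 180.0) / 360.0 * 256 * 2 ** zoom


def lat_to_pixel(lat, zoom):
    """Web Mercator y pixel for a latitude, using 256-pixel tiles."""
    lat_rad = math.radians(lat)
    mercator = math.log(math.tan(lat_rad) + 1 / math.cos(lat_rad))
    return (1 - mercator / math.pi) / 2 * 256 * 2 ** zoom


def grid_index(lon, lat, grid):
    """Bin a geographic point into grid cell (x, y) and its flat index."""
    x = int(lon_to_pixel(lon, grid["zoom"])) - grid["west"]
    y = int(lat_to_pixel(lat, grid["zoom"])) - grid["north"]
    if not (0 <= x < grid["width"] and 0 <= y < grid["height"]):
        raise ValueError("point falls outside the grid")
    return x, y, x + y * grid["width"]


def times_url(cloudfront_id, project_id, index):
    """Build the URL of the travel time file for one origin cell."""
    return f"https://{cloudfront_id}.cloudfront.net/{project_id}/{index}_times.dat"


# Hypothetical grid covering Washington, DC at zoom 9.
grid = {"zoom": 9, "west": 37400, "north": 50050, "width": 200, "height": 200}
# Approximate coordinates standing in for a geocoder's result.
x, y, index = grid_index(-77.0297, 38.9170, grid)
print(times_url("example-id", "example-project", index))
```

Because the index is computed entirely from the manifest and the geocoded point, the browser can request the right file directly, with no lookup service in between.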

We also have the ability to store multiple travel times per point in a grid. For example, this enables storing the 25th, median, and 75th percentile travel times for each point to help characterize the variability created by transit schedules (a topic explained in more detail here). But since our default is to use only the median travel time, we rearranged the encoding and storage order used in a previous version so that each block contains only one percentile, with additional percentiles appended to the end of the file as new blocks.
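
As a rough illustration of that block layout, the following Python sketch writes one delta-coded block per percentile and reads a single block back by its position. The three-integer header (width, height, number of blocks), the 32-bit values, and restarting the delta coding in each block are assumptions for this example rather than the real format.

```python
import struct

# Assumed header: width, height, number of percentile blocks.
HEADER = struct.Struct("<3i")


def write_blocks(width, height, blocks):
    """Write the header, then one delta-coded block per percentile."""
    out = bytearray(HEADER.pack(width, height, len(blocks)))
    for times in blocks:
        previous = 0
        for value in times:
            out += struct.pack("<i", value - previous)
            previous = value
    return bytes(out)


def read_block(data, block_number):
    """Decode a single percentile block without touching the others."""
    width, height, n_blocks = HEADER.unpack_from(data, 0)
    if not 0 <= block_number < n_blocks:
        raise IndexError("no such percentile block")
    cells = width * height
    offset = HEADER.size + block_number * cells * 4
    deltas = struct.unpack_from(f"<{cells}i", data, offset)
    times, running = [], 0
    for delta in deltas:
        running += delta
        times.append(running)
    return times


# The median goes first; extra percentiles are appended as later blocks.
median = [30, 31, 33, 33]
p25 = [25, 26, 28, 29]
p75 = [36, 38, 40, 41]
data = write_blocks(2, 2, [median, p25, p75])
print(read_block(data, 0))
```

With the median in the first block, a client that only needs the default view can stop reading after the header and one block.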

Transit Routes

Routes, stops, shapes, names, and colors

In the process of generating a travel time from an origin we find hundreds of possible paths used to arrive at the destination. We then select a few representative paths near the chosen percentile. These paths are stored using the board stop ID, pattern ID, and alight stop ID used for each segment of the journey. Using those IDs we can look up the stop and pattern information (geographic location, name, route shape) from the GTFS data file. With the full stop and pattern information we can show the route on the map and a route summary in the side panel. This data is also stored in a custom binary format to maximize compression for storage and delivery speed.
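
The ID lookup described here can be sketched in Python as follows. The dictionary shapes, key names, IDs, stop names, and color value are all made up for illustration; the real metadata and path files use their own structure.

```python
# Hypothetical transit network metadata, keyed by ID for direct lookup.
network = {
    "stops": {
        "s1": {"name": "First Stop", "lon": -71.06, "lat": 42.36},
        "s2": {"name": "Second Stop", "lon": -71.09, "lat": 42.32},
    },
    "patterns": {"p1": {"route": "r1"}},
    "routes": {"r1": {"name": "Example Line", "color": "ff8800"}},
}

# One path: each segment stores only a board stop, a pattern, and an alight stop.
path = [{"board_stop": "s1", "pattern": "p1", "alight_stop": "s2"}]


def describe_path(segments, network):
    """Resolve the IDs in each segment into displayable route and stop details."""
    legs = []
    for segment in segments:
        pattern = network["patterns"][segment["pattern"]]
        route = network["routes"][pattern["route"]]
        board = network["stops"][segment["board_stop"]]
        alight = network["stops"][segment["alight_stop"]]
        legs.append({
            "route": route["name"],
            "color": route["color"],
            "from": board["name"],
            "from_location": (board["lon"], board["lat"]),
            "to": alight["name"],
            "to_location": (alight["lon"], alight["lat"]),
        })
    return legs


for leg in describe_path(path, network):
    print(f'{leg["route"]}: {leg["from"]} to {leg["to"]}')
```

Storing only IDs in the per-origin files keeps them small, since names, colors, and shapes are downloaded once with the network metadata.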

Transit paths data can be retrieved the same way as travel times data; just replace the URL in step four with: https://{cloudfrontId}.cloudfront.net/{projectId}/{index}_paths.dat.

Opportunities

Residents in Toronto, ON displayed as opportunities via a dot map in a Taui deployment

Typically we look at job types or population demographic data, like the datasets in the Longitudinal Employer-Household Dynamics Origin-Destination Employment Statistics (LODES) maintained by the US Census Bureau (available here), but anything that is a value tied to a geographic point can potentially be used as an opportunity dataset.

Usually, we’ll upload Shapefiles or CSVs to Conveyal Analysis which converts them automatically into our regular grid format.

Serverless Website

One of the main distinguishing features of Taui is that there is no active server running. Typically a website like this would have one server to deliver the website data (HTML, CSS, and JavaScript) and another to calculate paths and travel times via the transit and street network on demand using OpenTripPlanner or R5. Since we’ve already pre-generated all the paths and travel times, we’ve removed the need for the transit modeling server. Next we can customize the style, text, and features before generating the specific website data and deploying it in a way that attaches it to the transit data.

To do this we include a set of four easily editable YAML configuration files. These files are typical when using our internal tool Mastarm. The env.yml file requires you to set a Mapbox token for geocoding and a Leaflet tile URL for choosing which base map you want to see. The messages.yml file allows customizing all text shown on the screen and is intended to be easily translated into non-English languages. The store.yml file allows customizing which transit datasets you want to use, where you want the default origin point to be, which opportunity datasets you want to use and whether or not they should be displayed as a dot map with Gridualizer, and whether you want to see the action log and transit data customization panel on the screen.

Lastly, the settings.yml file is used for customizing build and deployment data like which S3 bucket to use, where the built JavaScript and CSS files should be put, and which AWS CloudFront distribution the cached files live under.

Output from a deployment in the console. It can also be configured to post to a Slack channel.

Once these files are customized specifically to your deployment and you have an S3 bucket ready for a static site deployment (see Amazon’s own guide here) you can easily run, build, and deploy right from the command line. We have a default HTML file that can be placed alongside the JavaScript and CSS that’s generated from deploy. Lastly we use AWS’s Route53 domain name service to point a named URL (https://taui.conveyal.com) to the CloudFront distribution (https://dtxp4r6unhi7.cloudfront.net).

From URL to Isochrones

Once the transit data has been generated and the website has been deployed, it is ready to be accessed by a modern web browser. An example URL (https://taui.conveyal.com/boston-2018-07/#centerCoordinates=-71.0635057836771%2C42.36132113316221&start=122%20Cambridge%20Street%2C%20Boston%2C%20Massachusetts%2002114%2C%20United%20States&startCoordinate=-71.0635057836771%2C42.36132113316221&end=99%20Townsend%20Street%2C%20Dorchester%2C%20Massachusetts%2002121%2C%20United%20States&endCoordinate=-71.09046936035158%2C42.3179394544685) points to a subfolder of our Taui production site which contains a specialized deployment for Boston. It also carries query-string-style parameters after the # that set a preset origin and destination.
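
To show what the site can read out of such a link, here is a small Python sketch that pulls the deployment folder and the preset origin and destination from the example URL using only the standard library. The function name and the keys of the returned dictionary are illustrative; the parameter names are the ones visible in the URL itself.

```python
from urllib.parse import parse_qs, urlsplit

EXAMPLE_URL = (
    "https://taui.conveyal.com/boston-2018-07/"
    "#centerCoordinates=-71.0635057836771%2C42.36132113316221"
    "&start=122%20Cambridge%20Street%2C%20Boston%2C%20Massachusetts%2002114%2C%20United%20States"
    "&startCoordinate=-71.0635057836771%2C42.36132113316221"
    "&end=99%20Townsend%20Street%2C%20Dorchester%2C%20Massachusetts%2002121%2C%20United%20States"
    "&endCoordinate=-71.09046936035158%2C42.3179394544685"
)


def parse_taui_url(url):
    """Extract the deployment path and preset locations from a Taui link."""
    parts = urlsplit(url)
    # Everything after "#" is formatted like a query string.
    params = parse_qs(parts.fragment)

    def coordinate(name):
        if name not in params:
            return None
        lon, lat = (float(part) for part in params[name][0].split(","))
        return (lon, lat)

    return {
        "deployment": parts.path.strip("/"),
        "start": params.get("start", [None])[0],
        "start_coordinate": coordinate("startCoordinate"),
        "end": params.get("end", [None])[0],
        "end_coordinate": coordinate("endCoordinate"),
    }


parsed = parse_taui_url(EXAMPLE_URL)
print(parsed["deployment"], parsed["start_coordinate"], parsed["end_coordinate"])
```

Because everything after the # stays in the browser and is never sent to the host, the static files can be served unchanged for every origin and destination pair.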

Let’s use this example with two scenarios in Boston to run through each step needed to display isochrones on your screen:

  1. The URL is converted into an IP address via our DNS provider turning taui.conveyal.com/… into 36.86.63.182/…
  2. That IP address points to a CDN which retrieves its data from an AWS S3 bucket containing our static site data.
  3. The path of the URL (boston-2018-07) tells the AWS S3 static site to load the index.html file that is located in our S3 bucket at s3://taui.conveyal.com/boston-2018-07/index.html.
  4. The HTML file is downloaded and parsed by the browser. The browser then downloads the JavaScript and CSS files indicated in the HTML file that are specific to this deployment in the same path.
  5. The CSS file is parsed and its styles are applied while the JavaScript file is parsed and begins loading the application.
  6. Next the main React application component is mounted (or added) to the page. During the mounting process, data that is compiled into the website (like map styles, default location, and site-specific text) is reflected directly in the site.
  7. After the main application is mounted the analysis manifest, opportunity datasets, and transit network metadata load from an S3 bucket where they are stored (which also sits behind a CDN for faster global delivery). Each transit network being analyzed by the instance gets its own manifest and network metadata file.
  8. Next, because the query string of the URL contains a starting address — 122 Cambridge Street, Boston, Massachusetts — the travel times and transit routes for that address are fetched from S3 using the mechanisms explained above by first converting it to geographic coordinates, then a lookup index using the grid dimensions.
  9. The binary travel time data is delta-decoded and parsed into a travel time surface that can generate isochrones (using jsolines) in minute increments up to the maximum travel time specified in the manifest. By default the 60-minute isochrone is generated and displayed on the map: a blue one for late afternoon travel and an orange one for late night travel. This travel time surface is also used to generate the accessibility count: 193,511 homes for afternoon and 168,951 at night.
  10. Next, the binary transit routes are parsed into a format that we can use to look up routes to each possible destination. Because the query string also contains a destination address (99 Townsend Street, Dorchester, Massachusetts), we can geocode it, find its index in the same way as for the origin, and look up the fastest routes used to get to the destination.
  11. The transit route data for each network is correlated with its corresponding transit network information to look up route names (in this case Orange Line), route colors (also orange), and the shape of the route and location of the stops used for displaying on the map.
  12. And voilà! A transit accessibility comparison map created without an active running server.
End result of following the URL
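
Step 9 above, turning delta-coded values back into travel times and then into an accessibility count, can be sketched in Python like this. The sample numbers are invented, and treating a cell as reachable when its time is at or below the cutoff is an assumption; isochrone drawing itself (the part jsolines handles) is not shown.

```python
def delta_decode(deltas):
    """Rebuild absolute travel times from delta-coded values."""
    times = []
    running = 0
    for delta in deltas:
        running += delta
        times.append(running)
    return times


def accessibility(times, opportunities, cutoff_minutes):
    """Sum the opportunities in every cell reachable within the cutoff."""
    return sum(
        count
        for minutes, count in zip(times, opportunities)
        if minutes <= cutoff_minutes
    )


# Invented delta-coded travel times for six grid cells.
deltas = [20, 15, 20, 10, -5, -40]
# Invented opportunity counts (for example, homes) for the same cells.
homes = [100, 250, 40, 500, 10, 75]

times = delta_decode(deltas)
print(times)                            # [20, 35, 55, 65, 60, 20]
print(accessibility(times, homes, 60))  # 475
```

The same decoded surface serves both the isochrone and the count, so changing the cutoff only needs a new comparison, not another download.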

There are many technical aspects that we could dive into individually. Our hope is that this overview has shown the overarching technical capabilities and innovations of Taui and that those translate into improved outreach for our customers.

Are you interested in having your own transit accessibility data published on the web? Check out our website for more details and get in touch.
