Creating a Route Map with Offsets

Using Carto, PostGIS and PostgreSQL

11 min readApr 5, 2017

Several weeks ago, I posted a map of the BART system with trips animating over the AM peak commuting period. As part of this map, the BART routes were shown. This post will walk you through an exercise on how to create the route layer which offsets overlapping segments. There is a lot more to it than meets the eye.

It All Begins With Good Data

As mentioned in the previous post, GTFS data is foundational. If you don’t know much about GTFS, get started here. We will dive right into terms, files, and fields used in the GTFS spec.

Making a Route Map Using Trips

Typically, people think of transit systems having routes. We’ve all seen schematic route maps. However, routes don’t give enough information — there is no time element. Routes change throughout the day.

That’s where trips come in. If you are going to travel on a bus route to work, you actually are taking a specific bus traveling on that route at a particular time. This is called a trip.

So, when making a transit system map of routes, you need to choose a particular time of day to base it on. Typically, weekday peak commuting time are good (7am-9am or 4pm-7pm).

Using Stops To Draw Routes

Not always, but often GTFS datasets come with shapes.txt files. These contain nice route geometries that follow the contours of the roads. We will not be using this, however. The primary issue with making a system-wide transit map is getting lost in overlapping portions of routes. Below, we will build an algorithm for detecting these overlapping portions and offsetting them. This typically doesn’t work well with the shapes.txt file, as we cannot detect overlapping routes well. With stops.txt, however, multiple routes (and thus trips) will often use the same exact stop. Therefore, we can determine overlapping segments that share the same nodes.

Our Toolkit: Carto, PostGIS, and PostgreSQL

Carto is a mapping platform that utilizes a PostgreSQL database to store and retrieve data. PostGIS is integrated into the database and can be used to make spatial queries and to manipulate the data spatially. To do this exercise, you can create a free account with Carto and import the following GTFS files as datasets/tables: trips.txt, stops.txt, and stop_times.txt. It will create a PostgreSQL table for each one. In my examples, I use GTFS files for the BART system (click link about half way down the page) and have added the “bart_” prefix to these for the table names. However, feel free to use your own city’s GTFS dataset. If you do, please share your map in the comments below.

Note: I did these steps using Carto’s map “Editor” (which is being deprecated); however, you can also do it in Carto’s “Builder” experience. When editing a map, click on the layer. While editing the layer, click on the Data tab. On the footer of the panel, switch to the SQL view. Here you can execute the queries I describe below. I should also note that Carto Builder relies on having a cartodb_id index field for certain functionality like infowindows. We can generate that on-the-fly using the window function row_number, described in more detail in this post.

If you are new to SQL or a bit rusty, head on over to codecademy.com to learn it. They have an excellent free and quick course, and we will be using it a lot.

Preparing The Data

In Carto, if the stops table doesn’t automatically georeference the table, you can do it manually by creating the_geom (the field Carto stores the geometry in, in this case latitude and longitude) from stop_lat and stop_lon. The stop_times table does not contain geometry for the stops, but we’ll need it to have the geometry. Let’s run our first SQL statement to join it from the stops table.

Joining Geometry to stop_times Table

Note: If you have a large dataset (e.g., stop_times has >100,000 rows), it might timeout. You can run “SELECT * FROM stop_times WHERE arrival_time > ‘07:00:00’ AND arrival_time < ‘09:00:00'” to reduce it to just a specific time window (e.g., 7am -9am). Click “Create dataset from query” to create a much smaller version of this table.

You will also want to verify that your data types (e.g., string, numeric) got interpreted correctly. For simplicity, I recommend keeping arrival_time and departure_time as strings. Timezone offsets and dates get confusing and you can cast a string to a date type on-the-fly if needed.

BART

So why choose the BART for this exercise? It has routes with many overlapping segments. Drawing the raw routes on the map looks very lame and is a painfully bad representation of the transit system.

The Classic Lame Transit Map

So, let’s get started on how to create a proper map that offsets overlapping segments of the routes.

Step 1: Create a List of Trips (That Well Represent The Routes)

As mentioned above, when making a system-level map of routes, it is important that each route is drawn from a trip that is representative of a common service period (e.g., weekday) and time of day. The service period (called service_id) is found in the trips table. The time of day information is in the fields arrival_time and departure_time found in the stop_times table.

Our first query will be to select a list of the trips (one per route) that we will use to create our route lines with. In the calendar.txt file it shows that a service_id=’WKDY’ represents weekday trips. On the BART, every trip traces the same path for each direction of travel, thus we filter out one of the directions.

NOTE: This is often not the case for bus routes. For routes that loop and overlap themselves, remove the direction_id=0 from the WHERE clause and include direction_id in the GROUP BY clause after route_id.

Ideally, you would carefully choose a trip for each route, one that falls within a peak period. Unfortunately, run times are only in the stop_times table and require some work to fish them out. Since almost all of the trips for the BART follow the same path, we will simply take a random weekday trip from each route. I am use a minimum function on trip_id (a string field, yes a bit crazy) for simplicity, but it is likely that you will want to hand-pick the trip_id’s.

The resulting table returned is as follows.

NOTE: You can use WITH to chain queries like the following. We will be doing lots of chaining. Take note of the comma syntax in my examples. Carto doesn’t like a comma before the final query.

Step 2: Select Each Stop Along Each Trip

With our trip_id’s that represent each route (Step 1), we now select the stops for each trip_id that will represent our route lines after we stitch them together.

NOTE: You might wonder why we continue using trip_id instead of route_id. In this case where we are using only one direction_id you probably could. However, trip_id is more specific, so we will use it until the very end, at which point we’ll join the route_id back.

Our resulting table (top six rows) looks like this.

Now, we can visualize our data on a map! Keep in mind, there are duplicate points at each stop_id where the path of trips overlap. Here I’ve labeled the count of points at each location (using count(*) … GROUP BY the_geom).

NOTE: In order for Carto to render the data on the map, you need to project the_geom into the_geom_webmercator using the following technique.

Including the query from Step 1 and transforming the Step 2 query, we have the following SQL.

Step 3: Turn Points Into Lines

Now that we have the stop_id’s along each trip_id (point geometries), we will stitch them together into line geometries. We will be using the ST_MakeLine function in PostGIS to do this.

Stitching points together can be tricky. We will need to define an order by which to stitch them as well as how to group them into separate geometries (so that we don’t end up with one giant line). Since one of our missions is to offset segments of routes that overlap, we create lines between each stop_id of each trip_id so that we can detect overlapping lines.

Interestingly, we can’t just tell ST_MakeLine to create lines between each stop_id. So, we will duplicate our previous query (called b from Step 2) and offset the stop_sequence so that we can do this.

Note: UNION ALL allows us to stack one table on top of another (as long as they have the same fields). Also, we can define a new field on-the-fly using “calculation AS new_field”.

We will now make lines, ordered by trip_id then by stop_sequence_true (ascending). We will break the lines at each trip_id and then break them further at each stop_sequence. Studying the table above and table below helps to understand what is happening in the next query.

In the table above, you can see that there are two stop_id’s for each line, except at the start and end of each trip_id. These geometries aren’t actually lines. We can filter these out by adding

WHERE array_length(stop_id_array,1)>1

We now have the following chain of queries and the associated map. Again, I’ve labeled the count of lines between each stop location (using count(*) … GROUP BY the_geom).

Step 4: Determining Offsets

Now that we have our lines between each stop_id for each trip_id, we simply offset the lines, right? If only it were that easy. Not all the lines are oriented in the same direction. One trip might have a line from stop A to stop B, while another trip (or even the same trip) might have a line from stop B to A. To figure this out, we calculate the angle of each line using ST_Azimuth. This function calculates the angle from a start point to an end point. We can create a point at the start of each line by using ST_Line_Interpolate_Point(the_geom,0) and a point at the end of each line by using ST_Line_Interpolate_Point(the_geom,1).

Next, we create criteria that defines whether a line is “reversed”. If a direction falls within some 180 degrees it is considered not reversed and if it falls in the other 180 degrees it is reversed. We define this as a Boolean field.

NOTE: This is one of those things you can play with if your offsets aren’t looking good later on.

Now, we need some way of giving each line with the same geometry an ID. This is necessary for determining how far each one should be offset. We do this using the row_number() window function of PostgreSQL. This will create ID’s for each row. To make the sequence (1, 2, 3, 4…) restart for each geometry, we PARTITION BY the_geom and ORDER BY trip_id and reverse.

Great, now we can use offset_id to offset each route by this value, right? Nope. If there are an odd number of lines to offset, it is preferred that one be in the center and the others offset to each side. However, if there are an even number of lines to offset, it is preferred to offset evenly from the center. To do this, we need knowledge of how many lines overlap for each geometry.

Now that we have this count per unique the_geom, we will join it back to query result h. In addition, we will create a new field called offset_dist. This accounts for odd and even overlap_count values.

We are getting close, I can feel it! However, offsetting lines is direction specific. For example, if I have a line pointing north, when I offset it by a positive number it will move east. If I have a line pointing south, when I offset it by a positive number it will move west. So, for lines that are identified as “reversed”, we flip the sign (+/-).

Awesome! Now let’s join back our route_id’s so that we know them by their common names. We use the trips.txt file for this.

Putting all these queries together so far we have the following.

Step 5: Making the Map (using CartoCSS for offsets)

Finally, where it all comes together! Let’s make a transit map with offset routes. There are two ways to do this, offset the lines when it renders on the map or offset the actual geometry. The first is nice because it looks good at any zoom level. The second has benefits too — this will be covered in Step 6.

Now, we’ll get into something called CartoCSS. It was first used by MapBox and later by Carto (formerly called CartoDB, FYI). It looks and works similar to CSS. So far, we’ve kept our line styling properties simple.

The key CartoCSS property that makes the magic happen is line-offset.

line-offset offsets a line parallel to its actual path, defined by a number in pixels. Positive values move the line left, negative values move it right (relative to the directionality of the line).

Remember all those carefully calculated offset_factor values that we made? Now we want to define an offset for each unique offset_factor value. We can define conditional styling by using [condition]{style}. For lines with the condition being true, they will get the style applied to it. For example, we offset lines who have offset_factor=0.5 by 2 pixels by adding [offset_factor=0.5]{ line-offset:2 }. Putting this together we get the following.

And here is the map! If you zoom in, you will notice that the lines stay equally offset based on pixels, not geographic location.

Now, lets add some color. If you open routes.txt, you will see a route_color for each route_id. We could join this field to our resulting table, but unfortunately CartoCSS doesn’t allow you to drive the line-color directly from a field value. So, we’ll hard code the colors using the following technique.

Now our map has OFFSETS and COLORS!

You will probably notice that sometimes the ordering of the offset lines switch along the route. If you play with the directional definition for reverse in query f (first sub-step of Step 4), you can improve this. However, I have yet to find a silver bullet technique for perfecting this. If you do, please share it in the comments below. Let’s change the reverse criteria to be “(direction > 35 AND direction < 225)”.

You may also notice that at bends in the routes, you see gaps or overlaps between lines. There is no easy way around this, as you can’t stitch the lines together unless you are offsetting the actual geometry (which we’ll get to shortly). A good way around this is to simply add another layer for stops based on the stops.txt file which typically covers these flaws. Perhaps this will be in another post.

With CartoCSS, we can apply a technique for giving a line a “case” and “fill”. using a “::text” after the “#layer” tells the renderer to create another instance of the symbol that will be stacked with other instances. With this technique, each line will seem to have a casing around it to help make the line color pop (especially yellow). The following puts it all together.

Since this is a pretty gnarly chain of queries that are currently being executed every time we (or someone else) views this map, it is recommended that you click “Create dataset from query” to save the resulting table before telling the world about your new map.

There you have it folks, a stellar transit map with offset routes that you can easily see the path of each route at any zoom level! Ready to give it a try for your own city?

Step 6: Offsetting the Actual Geometries (Optional)

For those of you wanting to offset the actual geometry off the lines, I’ll show you how. To do this we use the ST_OffsetCurve function in PostGIS

Notice that when you zoom in the gap between the lines widens and when you zoom out the gap between the lines narrows. You can now stitch together the lines along each route and apply some smoothing the curves. I didn’t have great success in this, but please explore and share your findings in the comments below.

Again, it is recommended that you click “Create dataset from query” to save the resulting table before telling the world about your new map.

Now, see how I made an animated transit map using GTFS data.