Getting started with the SharedStreets referencing system: Matching a city’s GIS data
The SharedStreets referencing system creates a shared language for the street so there is a common way to refer to street segments, even when the underlying maps don’t match. This allows cities to easily create a link between GIS datasets so that they can port information across different maps. Map-matching can be done through an API or a command line interface (CLI). We recommend that cities use the CLI, since it can quickly handle large volumes of data.
This post steps through an example of how the referencing system CLI can be used to match maps. Specifically, we’ll focus on the City of Toronto and show how their city centerlines can be linked to two types of speed data — one dataset that was collected by the city’s Bluetooth sensors, and another dataset that was purchased from a third party. The SharedStreets referencing system enables speed data to be linked across these maps, so that cities can compare speeds with other information about infrastructure and safety.
If you don’t have the SharedStreets command line interface running, you can find instructions here. To find out more about the referencing system and how it works, see our website.
Step 1. Prepare the data inputs
First, we locate the data layers that we want to match to SharedStreets and load them into GIS software. In this case, we’re using Toronto’s Bluetooth speed data, third-party speed data, and “one-way streets” layer. The last of these is essentially a street centerline layer with added topological information such as directionality; in this example we refer to it as centerlines to avoid confusion.
The image below shows a zoomed-in view of these three datasets. Close examination shows that the three base maps don’t exactly match one another; there are small gaps between them. That’s common across GIS datasets. It’s especially common when working with external data, but this even happens with a city’s own map layers.
To prepare these layers for matching, we convert each into a GeoJSON file, using the WGS 84 coordinate reference system (EPSG:4326). Any standard GIS application (like QGIS or ArcGIS) can load and export data in the GeoJSON format.
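Before matching, it can help to confirm that an exported file really is in longitude/latitude degrees rather than a projected coordinate system. The sketch below is not part of the SharedStreets tooling; it is a rough sanity check, and the sample features and function names are made up for illustration. A range check like this cannot prove that a file is WGS 84, but it does catch the common case of coordinates exported in meters or feet.

```python
def iter_positions(coords):
    """Yield every [lon, lat, ...] position from a nested GeoJSON coordinate array."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords
    else:
        for part in coords:
            yield from iter_positions(part)


def looks_like_wgs84(feature_collection):
    """Return True if every coordinate falls inside valid lon/lat bounds."""
    for feature in feature_collection["features"]:
        geometry = feature.get("geometry") or {}
        for position in iter_positions(geometry.get("coordinates", [])):
            lon, lat = position[0], position[1]
            if not (-180 <= lon <= 180 and -90 <= lat <= 90):
                return False
    return True


# Hypothetical sample data: one layer in degrees, one in projected meters.
toronto_like = {
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "properties": {},
        "geometry": {"type": "LineString",
                     "coordinates": [[-79.38, 43.65], [-79.37, 43.66]]},
    }],
}
projected = {
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "properties": {},
        "geometry": {"type": "LineString",
                     "coordinates": [[630000.0, 4833000.0], [630100.0, 4833100.0]]},
    }],
}

print(looks_like_wgs84(toronto_like))  # degrees: passes the range check
print(looks_like_wgs84(projected))     # meters: fails the range check
```

To check a real export, load it with `json.load` and pass the resulting dictionary to `looks_like_wgs84`.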
Step 2. Match the data layers using SharedStreets
In the CLI, we use the basic matching command to match the Bluetooth data, adding in the option to snap the street segments to intersections:
$ shst match bluetooth_speeds.geojson --out=output_bluetooth_speeds.geojson --snap-intersections
We do the same for the third-party speed data, again snapping to intersections:
$ shst match thirdparty_speeds.geojson --out=output_thirdparty_speeds.geojson --snap-intersections
In this case, we added an extra flag to each matching command, since the speed data links were digitized according to the direction of travel:
--follow-line-direction
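If you have several layers to match, the full commands (including the directionality flag) can be assembled in a short script. The following is a minimal sketch: `build_match_command` is a hypothetical helper, not part of the CLI, and it uses only the filenames and flags shown above. It prints the commands rather than running them.

```python
import shlex


def build_match_command(input_path, output_path, extra_flags=()):
    """Assemble an `shst match` invocation as an argument list."""
    return ["shst", "match", input_path, f"--out={output_path}", *extra_flags]


# Both speed layers were digitized in the direction of travel.
speed_flags = ["--snap-intersections", "--follow-line-direction"]

commands = [
    build_match_command("bluetooth_speeds.geojson",
                        "output_bluetooth_speeds.geojson", speed_flags),
    build_match_command("thirdparty_speeds.geojson",
                        "output_thirdparty_speeds.geojson", speed_flags),
]

for command in commands:
    # Print a copy-pasteable shell command.
    print(shlex.join(command))
    # To execute instead of printing (requires the CLI to be installed):
    # import subprocess; subprocess.run(command, check=True)
```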
To do this for your own dataset, substitute your filenames (and add the path to the input data directory, if necessary) and run the commands above, with the directionality flag if it’s applicable. You’ll see a progress bar and an indication of how many features were matched, like so:
Matching with directionality
The city centerline data has data fields that indicate directionality (i.e., whether a road is two-way or one-way). For one-way roads, the data also indicates whether traffic flows in the same direction as the street was digitized, or against the direction of digitization. (When streets appear on a map, it isn’t obvious which order the vertices were drawn in, but this matters when we consider routing. Direction-of-digitization offers a consistent way to refer to direction in any use case, unlike cardinal directions, which aren’t clear for certain road geometries.)
The directionality info in Toronto’s centerline data looks like this:
Since we have this data, we’ll use it to aid the matching. When SharedStreets conflates maps, it analyzes intersections and the street segments between them. This gets more complicated when considering multi-level roads, or other cases where streets cross the same location but do not actually intersect. To handle this, the matching uses OSRM under the hood; it can factor in routing rules for cars, bikes, and pedestrians in order to determine which map features are “eligible” for matching, and whether certain streets should connect.
To help with these routing considerations, we add additional flags to the command. These point the matcher to where the directionality property lives within the centerlines dataset, and indicate what the value will be for each type of directionality:
$ shst match toronto_centrelines.geojson --out=output_toronto_centrelines.geojson --direction-field=ONE_WAY_DI --one-way-against-direction-value=-1 --one-way-with-direction-value=1 --two-way-value=0 --snap-intersections
We recommend using these options when matching any street datasets that have directionality properties.
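Before passing these values to the matcher, it is worth checking which values the direction field actually contains. The sketch below counts them and flags anything outside the three values used in the command above. The field name `ONE_WAY_DI` and the values 1, -1, and 0 come from this example; the helper names and the sample features are hypothetical.

```python
from collections import Counter

# Values taken from the match command in this post; adjust for your own data.
EXPECTED_VALUES = {
    1: "one-way, with direction of digitization",
    -1: "one-way, against direction of digitization",
    0: "two-way",
}


def direction_value_counts(feature_collection, field="ONE_WAY_DI"):
    """Count how often each value of the direction field occurs."""
    return Counter(
        (feature.get("properties") or {}).get(field)
        for feature in feature_collection["features"]
    )


def unexpected_values(counts, expected=EXPECTED_VALUES):
    """Return values that have no corresponding flag in the match command."""
    return [value for value in counts if value not in expected]


# Hypothetical sample: the last feature is missing the direction field.
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"ONE_WAY_DI": 1}, "geometry": None},
        {"type": "Feature", "properties": {"ONE_WAY_DI": -1}, "geometry": None},
        {"type": "Feature", "properties": {"ONE_WAY_DI": 0}, "geometry": None},
        {"type": "Feature", "properties": {"ONE_WAY_DI": 0}, "geometry": None},
        {"type": "Feature", "properties": {}, "geometry": None},
    ],
}

counts = direction_value_counts(sample)
print(counts)
print(unexpected_values(counts))
```

Values stored as strings (for example "1" rather than 1) would also show up as unexpected here, which is a useful prompt to check how the field was exported.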
Sometimes datasets are digitized in the direction of travel, but don’t necessarily have a field to tell you that. In those cases, we use the --follow-line-direction flag, as we did in the previous step with the Bluetooth and third-party speed data. There is also a --best-direction flag, which attempts to match a street link in both directions and keeps the one that results in the higher match score (a measure of matching accuracy).
Matching with road classifications
For reasons discussed above, highways are particularly difficult to match. To facilitate better matching, we’ve added another option that allows users to match highways separately from all other road types. To match highways, add the --match-motorway-only flag to your match command. To match all other streets, add the --match-surface-streets-only flag.
We recommend matching highways and other road types separately for best results.
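One way to follow this recommendation is to run the match twice on the same input, once per road class, writing each pass to its own output file. The sketch below only builds and prints the two commands. The helper name and the output naming scheme (`output_<name>_highways.geojson` and `output_<name>_surface.geojson`) are assumptions made for illustration; the flags are the ones described above.

```python
import shlex
from pathlib import Path

# One pass per road class, using the two flags described in this section.
ROAD_CLASS_FLAGS = {
    "highways": "--match-motorway-only",
    "surface": "--match-surface-streets-only",
}


def build_road_class_commands(input_path, extra_flags=()):
    """Return one `shst match` argument list per road class."""
    stem = Path(input_path).stem  # e.g. "toronto_centrelines"
    return [
        ["shst", "match", input_path,
         f"--out=output_{stem}_{label}.geojson", flag, *extra_flags]
        for label, flag in ROAD_CLASS_FLAGS.items()
    ]


passes = build_road_class_commands("toronto_centrelines.geojson",
                                   ["--snap-intersections"])
for command in passes:
    print(shlex.join(command))
```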
Output
The matching generally produces two files: an output file containing features that matched successfully, and an output file containing features that did not match (if applicable). If the input data were not appropriate, then there will be a third output file containing features that were invalid and therefore excluded from matching. In this case, we are only interested in the data that matched successfully.
Step 3. Link the matched output datasets
The matched output can be linked together using GIS or database software. In GIS, we load the matched layers. These are in GeoJSON format, which can be read by any standard GIS application. In QGIS, you can drag and drop the data layer into your project. In ArcGIS, you may need to convert it to a feature class first (you can do this using Geoprocessing → Conversion Tools → JSON → JSON to Feature Class).
The attribute table for each dataset will include the original properties (prefixed with “pp_”) as well as the SharedStreets properties, such as the shstReferenceID (which refers to the street segment in each direction, so that a two-way street has two reference IDs) and the shstGeometryID (which refers to the street segment as a whole, so that a two-way street has one geometry ID).
In this case, we use the shstReferenceID to perform a join, adding the Bluetooth speed data and the third-party speed data back to the centerline data. We’ll now be able to see all of this information together:
Once the data sources are combined, we can compare the different speed data sources or perform additional analysis that considers speed alongside other information, like crash locations, facility types, or other safety-related data.
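The join described in this step can also be done outside of GIS software. The sketch below is one possible approach in plain Python, not the method used to produce the results in this post. It assumes the join key is stored in a property named as written above (check the exact spelling and capitalization in your own output), and the speed field name `pp_speed_kmh`, the reference IDs, and the sample values are all hypothetical. When several speed records share a reference ID, this sketch averages them, which is a design choice you may want to change.

```python
def index_by_reference(feature_collection, key):
    """Group feature properties by their SharedStreets reference ID."""
    index = {}
    for feature in feature_collection["features"]:
        reference = feature["properties"].get(key)
        if reference is not None:
            index.setdefault(reference, []).append(feature["properties"])
    return index


def join_property(centerlines, other, source_field, target_field,
                  key="shstReferenceID"):
    """Copy a (mean) value from `other` onto centerlines sharing a reference ID."""
    other_index = index_by_reference(other, key)
    joined = []
    for feature in centerlines["features"]:
        properties = dict(feature["properties"])
        matches = other_index.get(properties.get(key), [])
        values = [m[source_field] for m in matches
                  if m.get(source_field) is not None]
        # Average when several records match; None when nothing matched.
        properties[target_field] = sum(values) / len(values) if values else None
        joined.append({**feature, "properties": properties})
    return {"type": "FeatureCollection", "features": joined}


# Hypothetical matched outputs (geometry omitted for brevity).
centerlines = {"type": "FeatureCollection", "features": [
    {"type": "Feature", "geometry": None,
     "properties": {"shstReferenceID": "ref-a", "pp_name": "Example St"}},
    {"type": "Feature", "geometry": None,
     "properties": {"shstReferenceID": "ref-b", "pp_name": "Sample Ave"}},
]}
bluetooth = {"type": "FeatureCollection", "features": [
    {"type": "Feature", "geometry": None,
     "properties": {"shstReferenceID": "ref-a", "pp_speed_kmh": 42.0}},
    {"type": "Feature", "geometry": None,
     "properties": {"shstReferenceID": "ref-a", "pp_speed_kmh": 38.0}},
]}

joined = join_property(centerlines, bluetooth, "pp_speed_kmh", "bluetooth_speed_kmh")
for feature in joined["features"]:
    print(feature["properties"])
```

Calling `join_property` a second time with the third-party layer and a different `target_field` would place both speed sources side by side on each centerline segment.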
Summary
With a few commands, the SharedStreets matching tools enable users to port information, like speed data, between datasets, even when the base maps differ. This makes it easier for city staff to perform analysis and get the information they need to plan and manage their streets.
Want to start using the referencing system? You can get started today using our instructions on GitHub.
Have questions or need assistance? Reach out and the SharedStreets team will be happy to try to help you.