Understanding On-demand rides with Tableau buffer calculations
Interactive & geospatially driven dashboards for on-demand mobility operators
Earlier this year, Tableau released a new spatial analysis feature called Buffer. Buffer calculations let us visualize the area within a given distance around a point on a map. They can also be used to join two different data sources, which eliminates a lot of data preprocessing work and shortens the time to insight.
This feature, together with existing Tableau functions such as MAKELINE and MAKEPOINT, helped us capture more distance-based use cases and dig into our riders' usage patterns. In this blog post, we look at a real example where we used buffer calculations and spatial joins to perform proximity analysis. It may help data analysts learn a new way to aggregate data spatially and avoid complex workarounds.
The Challenge
Our on-demand ride-sharing service aims to be integrated into the city's public transportation network. It is a complementary service to existing public transit offerings: on-demand serves transit-desert areas and connects them with public transit stations in an intermodal way. We want to use data to demonstrate that our service is a flexible addition focused on the individual needs of users and that it does not replace traditional train and bus lines. More specifically, we will try to answer the following questions using data:
- Where do pick-ups and drop-offs take place?
- How many of them are within a certain radius of public transportation hubs?
- Do public transportation hubs serve as the first or last leg of riders' journeys?
The Data
Before we move further, let's look at a sample of the datasets at our disposal. The first table contains data about the rides and the second contains data about the public transit hubs. Below, we can see both tables, including the dummy metrics and dimensions we will use in our analysis.
The solution
Now that we have some data to play with, the next step is to use the buffer function as a join calculation. Buffering creates a boundary at a given distance (the radius) around a point using a spatial calculation. This calculation returns a spatial object that can be rendered on a map. It can also be used in a join with a different dataset to return the rows where a point intersects the buffer area.
To define a buffer, we need to provide three pieces of information:
- A spatial point (latitude & longitude)
- Distance (the radius around the point)
- Distance unit ('meters', 'kilometers', 'miles', ...)
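To make the three inputs concrete outside Tableau, here is a minimal Python sketch of a buffer definition. The `Buffer` class, its `radius_meters` method and the `METERS_PER_UNIT` table are hypothetical names chosen for illustration, not part of Tableau; the sketch simply normalizes the radius to meters so that later distance comparisons use a single unit.

```python
from dataclasses import dataclass

# Conversion factors to meters (exact by definition of each unit).
METERS_PER_UNIT = {
    "meters": 1.0,
    "kilometers": 1000.0,
    "miles": 1609.344,
    "feet": 0.3048,
}


@dataclass(frozen=True)
class Buffer:
    """Hypothetical buffer definition: a centre point, a distance and its unit."""
    latitude: float
    longitude: float
    distance: float
    unit: str = "meters"

    def radius_meters(self) -> float:
        # Reject units the sketch does not know instead of guessing a factor.
        if self.unit not in METERS_PER_UNIT:
            raise ValueError(f"unsupported unit: {self.unit!r}")
        return self.distance * METERS_PER_UNIT[self.unit]
```

For example, `Buffer(52.52, 13.405, 0.25, "kilometers").radius_meters()` gives `250.0`, the same radius as the 250-meter buffer used below (the coordinates are made up).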
In our dataset, we have latitude/longitude pairs for each hub. Luckily, Tableau's MAKEPOINT calculation makes it effortless to turn a lat/lon pair into a spatial point. We then use the output to create our buffer as follows:
BUFFER(MAKEPOINT([HUB_LATITUDE], [HUB_LONGITUDE]), 250, 'meters')
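Tableau renders the result of BUFFER as a spatial object on the map. As a rough mental model only (this is not how Tableau implements it), a circular buffer can be approximated by a polygon whose vertices all lie at the given distance from the centre. The sketch below assumes a spherical Earth with a mean radius of 6,371,008.8 m and uses the standard spherical destination-point formula; `buffer_polygon` is a hypothetical helper name.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; spherical-Earth assumption


def buffer_polygon(lat: float, lon: float, radius_m: float, n_vertices: int = 32):
    """Approximate a circular buffer as a list of (lat, lon) vertices in degrees."""
    phi1 = math.radians(lat)
    lam1 = math.radians(lon)
    delta = radius_m / EARTH_RADIUS_M  # angular distance in radians
    vertices = []
    for i in range(n_vertices):
        theta = 2 * math.pi * i / n_vertices  # bearing, clockwise from north
        # Destination point reached from (phi1, lam1) after travelling delta along bearing theta.
        phi2 = math.asin(
            math.sin(phi1) * math.cos(delta)
            + math.cos(phi1) * math.sin(delta) * math.cos(theta)
        )
        lam2 = lam1 + math.atan2(
            math.sin(theta) * math.sin(delta) * math.cos(phi1),
            math.cos(delta) - math.sin(phi1) * math.sin(phi2),
        )
        vertices.append((math.degrees(phi2), math.degrees(lam2)))
    return vertices
```

Calling `buffer_polygon(52.52, 13.405, 250)` returns 32 vertices, the first one 250 m due north of the (made-up) hub. More vertices give a smoother circle at the cost of a larger geometry.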
Next, we join the two datasets using the Intersects join clause, which matches all rows where the records spatially overlap, that is, where a ride's location falls inside a hub's buffer.
As a result, the joined dataset keeps only the rides whose locations lie within 250 meters of a hub and discards the rides that fall outside that boundary. The radius can be increased or decreased dynamically using a parameter (we will come back to this later).
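The effect of the Intersects join can be reproduced in a few lines of Python, which can be handy for sanity-checking the row counts Tableau returns. This is a minimal sketch under stated assumptions: rides and hubs are plain dicts with hypothetical keys (`ride_id`, `hub_id`, `lat`, `lon`), distances use the haversine formula on a spherical Earth rather than Tableau's own geometry engine, and a point on the boundary counts as intersecting. As in a real join, a ride near two hubs produces two rows.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # spherical-Earth assumption


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def buffer_join(rides, hubs, radius_m=250.0):
    """Return (ride_id, hub_id) pairs where the ride point lies inside the hub buffer."""
    matches = []
    for hub in hubs:
        for ride in rides:  # nested loop: fine for a sanity check, O(rides x hubs)
            if haversine_m(ride["lat"], ride["lon"], hub["lat"], hub["lon"]) <= radius_m:
                matches.append((ride["ride_id"], hub["hub_id"]))
    return matches
```

With one hub at (52.52, 13.405) and two made-up rides at 52.521 and 52.53 degrees latitude on the same meridian (about 111 m and 1.1 km away), `buffer_join` keeps only the first ride, `[(1, "H1")]`, until the radius is raised above roughly 1.1 km.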
When rendered, the result looks like the animation to the left. It helps us see how many rides start or end close to a public transportation hub. From there, we can gain further insights into first- and last-mile mobility patterns and into how the on-demand service is used alongside the public transportation network.
A complete example: Proximity Analysis
So far, we have all the basic components we need to analyze the data. We want to make the analysis interactive and build a geospatially driven dashboard that answers our questions. For this, we will use dynamic buffer calculations.
First, we create a parameter with data type Integer, allowable values set to Range, and a current value of 250. We call it Radius because we will pass it to our buffer calculation as the distance argument instead of the hard-coded radius value.
Second, we create another parameter with data type String and allowable values set to List. Our list has two values: Pickup and Drop off. This parameter lets us switch our spatial points dynamically between the rides' start locations and end locations. Then, we need a calculated field that returns the respective lat/lon based on the parameter's current selection. Here is the calculation for latitude:
CASE [Pickup/Drop Off] //our parameter name
WHEN 'Pickup' THEN [PICKUP_LATITUDE]
WHEN 'Drop off' THEN [DELIVERY_LATITUDE]
END
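Outside Tableau, the same switch is just a conditional lookup. In this hypothetical sketch, the mode strings mirror the two parameter values, and the `PICKUP_LONGITUDE`/`DELIVERY_LONGITUDE` column names are assumed by analogy with the latitude fields above. One deliberate difference: a Tableau CASE without an ELSE returns NULL for an unmatched value, whereas the sketch raises an error for an unknown mode.

```python
from __future__ import annotations

# Column names per parameter value; the longitude names are assumed, not from the source data.
LOCATION_COLUMNS = {
    "Pickup": ("PICKUP_LATITUDE", "PICKUP_LONGITUDE"),
    "Drop off": ("DELIVERY_LATITUDE", "DELIVERY_LONGITUDE"),
}


def ride_location(ride: dict, mode: str) -> tuple[float, float]:
    """Return the (lat, lon) of a ride's start or end, depending on the selected mode."""
    if mode not in LOCATION_COLUMNS:
        raise ValueError(f"unknown mode: {mode!r}")
    lat_col, lon_col = LOCATION_COLUMNS[mode]
    return ride[lat_col], ride[lon_col]
```

Keeping the mapping in one dictionary means the latitude and longitude always switch together, just as the two Tableau calculated fields read the same parameter.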
Since we want to know what percentage of completed rides happen within the buffer area, we first calculate the distance between each ride location and the hub location as follows:
DISTANCE(
MAKEPOINT([LAT],[LON]), //start or end location
MAKEPOINT([HUB_LATITUDE],[HUB_LONGITUDE]), //PT Hub location
'meters' //unit
)
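For intuition about the numbers DISTANCE returns, a common approximation is the haversine (great-circle) distance on a spherical Earth. This is an assumption for illustration; Tableau's own computation may differ slightly. With latitudes $\varphi_1, \varphi_2$ and longitudes $\lambda_1, \lambda_2$ in radians and a mean Earth radius $R \approx 6{,}371$ km:

```latex
d = 2R \arcsin\left(\sqrt{\sin^2\left(\frac{\varphi_2 - \varphi_1}{2}\right)
    + \cos\varphi_1 \cos\varphi_2 \sin^2\left(\frac{\lambda_2 - \lambda_1}{2}\right)}\right)
```

As a worked example, two points on the same meridian that are $0.001^\circ$ of latitude apart have $\lambda_2 - \lambda_1 = 0$, so $d = R\,|\varphi_2 - \varphi_1| \approx 6{,}371{,}008.8 \times 0.001 \times \pi / 180 \approx 111.2$ m, which is well inside a 250-meter radius.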
Next, we count the nearby rides, that is, the rides whose distance is less than or equal to our radius:
COUNT(
IF [DISTANCE] <= [Radius] THEN
[RIDE_ID]
END
)
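In Tableau, the IF expression is evaluated per row: rows outside the radius yield NULL, and COUNT ignores NULLs. A minimal Python equivalent, assuming the data is a list of hypothetical `(ride_id, distance_m)` rows with the distance already computed (mirroring the [DISTANCE] field), could look like this:

```python
def count_nearby(rows, radius_m=250, distinct=False):
    """Count rows whose precomputed distance is within radius_m.

    Mirrors COUNT(IF [DISTANCE] <= [Radius] THEN [RIDE_ID] END); with distinct=True
    it behaves like COUNTD instead, so each ride ID is counted at most once.
    """
    nearby = [ride_id for ride_id, distance_m in rows if distance_m <= radius_m]
    return len(set(nearby)) if distinct else len(nearby)
```

If the data holds one row per ride and hub, a ride close to two hubs is counted twice by the plain count; the `distinct` flag (COUNTD in Tableau) avoids that. Changing `radius_m` plays the role of moving the Radius parameter.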
In the worksheet, we can use these calculations to get the percentage of rides near each hub and compare it with the total number of rides. We can also drill down to analyse metrics without any preprocessing of the raw data. For example, we can look at different dimensions such as the day of the week or the hour of the day on the fly.
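Slicing by a dimension such as the hour of the day amounts to computing the nearby share per group. The sketch below is a hypothetical illustration, not the dashboard's actual logic: it assumes one record per ride with made-up keys `hour` and `distance_m` (the distance to the nearest hub).

```python
from collections import defaultdict


def nearby_share_by_hour(rides, radius_m=250):
    """Return {hour: percentage of rides within radius_m of a hub} for each hour present."""
    totals = defaultdict(int)
    nearby = defaultdict(int)
    for ride in rides:
        hour = ride["hour"]
        totals[hour] += 1
        if ride["distance_m"] <= radius_m:
            nearby[hour] += 1
    return {hour: 100.0 * nearby[hour] / totals[hour] for hour in sorted(totals)}
```

For three made-up rides, two at 8:00 (100 m and 600 m from a hub) and one at 17:00 (50 m), the result is `{8: 50.0, 17: 100.0}`. Grouping by day of the week works the same way with a different key.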
Conclusion
Buffer calculations helped us reduce the time required to reach insights. They have also empowered more people in the organisation to dive into the data and answer geospatial questions in a simple and intuitive way. We were able to take a closer look at how our on-demand service fits into the transit system. However, these insights rely on many assumptions. Therefore, we always seek to engage with public transit operators and people in the field to make sure that we get real value from the data. How do you do proximity analysis?