While researching project ideas for the upcoming group assignment, my team ran into the problem of how to measure changes in customer demand for a market or industry analysis. Although sales and market databases are publicly available for many industries, we had trouble finding detailed time series data on customer demand and interest for some sectors, such as car-hailing apps.
After we had no luck finding a suitable dataset for our application, I had the idea to use Google Trends data as a proxy for customer demand. Google Trends is a free online tool that shows how popular a search term has been over recent years. Google does not reveal the absolute number of searches. Instead, it reports the popularity of a search term in relative numbers as the search popularity index (SPI), which ranges from 0 to 100. If you query multiple search terms at the same time, all measurements are scaled relative to the peak of the most popular search term. SPI values can therefore only be compared to each other if they were part of the same Google Trends query.
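To make the scaling rule concrete, here is a minimal sketch of the idea described above. It assumes the index is simply each value divided by the largest value in the query, multiplied by 100. Google does not publish raw counts and its real pipeline involves more than this, so the function name `to_spi` and all numbers below are made up for illustration.

```python
def to_spi(raw_counts_by_term):
    """Scale raw counts so the largest value across ALL terms in the query becomes 100."""
    peak = max(max(series) for series in raw_counts_by_term.values())
    if peak == 0:
        return {term: [0] * len(series) for term, series in raw_counts_by_term.items()}
    return {
        term: [round(100 * value / peak) for value in series]
        for term, series in raw_counts_by_term.items()
    }


# Hypothetical search counts for four periods (invented numbers, not real data).
raw = {
    "santander cycles": [200, 400, 800, 400],
    "santander bikes": [80, 160, 240, 160],
}

# Both terms in one query: everything is scaled against the peak of 800.
joint = to_spi(raw)

# The less popular term queried alone: it is scaled against its own peak of 240.
alone = to_spi({"santander bikes": raw["santander bikes"]})

print(joint["santander bikes"])
print(alone["santander bikes"])
```

The same underlying counts produce different index values depending on which terms share the query, which is why values from separate queries should not be compared directly.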
The problem with using Google Trends data as a proxy for product sales is that it is unclear how strongly Google searches and customer behaviour are related. The idea is to test this relationship on a publicly available dataset from a similar market segment. In this case, I use TfL’s Santander Cycles dataset, which includes every single bicycle hire during 2016 (n > 10 million).
The following graph shows the SPI data for search terms related to London’s bicycle hire scheme from 2010 to 2017. Besides the official name “Santander Cycles”, I also included variations such as “Santander Bikes”. Overall, interest in these search terms is declining. One potential explanation for this trend is that Londoners are slowly getting used to this new mode of transportation and no longer have to actively search for information about it. Besides this long-term trend, we can also see how seasonality affects the SPI: the search terms’ popularity increases substantially during the summer months.
For the next step, I used only the SPI values for 2016, the year covered by the hire data. The hypothesis is that Google Trends data is strongly related to the actual usage of Santander Cycles and is therefore a reasonable proxy for changes in customer demand in this market. The following graph shows both the weekly SPI values for the relevant search terms and the total number of bicycle hires per week.
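Comparing the two series requires the individual hires to be aggregated to the same weekly resolution as the SPI. A minimal sketch of that step follows. It assumes each hire has a start timestamp and that weeks begin on Sunday. Both are assumptions, so check them against the actual TfL export and the dates in the Google Trends download. The function name `weekly_hire_counts` and the sample timestamps are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def weekly_hire_counts(start_times):
    """Count hires per week, keyed by the date of the Sunday that starts the week."""
    counts = Counter()
    for ts in start_times:
        # weekday() gives Monday=0 ... Sunday=6, so step back to the most recent Sunday.
        days_since_sunday = (ts.weekday() + 1) % 7
        week_start = (ts - timedelta(days=days_since_sunday)).date()
        counts[week_start] += 1
    return dict(sorted(counts.items()))


# Hypothetical hire start times (invented, not taken from the TfL dataset).
hires = [
    datetime(2016, 1, 3, 8, 15),   # Sunday
    datetime(2016, 1, 5, 17, 40),  # Tuesday
    datetime(2016, 1, 9, 12, 0),   # Saturday
    datetime(2016, 1, 10, 9, 30),  # Sunday of the following week
]

weekly = weekly_hire_counts(hires)
for week_start, n_hires in weekly.items():
    print(week_start, n_hires)
```

Once both series share the same week keys, they can be joined on that key and passed to the regression.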
The graph indicates that these two variables are in fact correlated. To test this properly, I fitted a simple linear regression with the weekly number of rides as the dependent variable and the SPI as the predictor. The following output shows the results of this analysis:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)   97563.6     9288.3   10.50 4.94e-14 ***
santander      1862.4      155.3   11.99 4.77e-16 ***
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 24590 on 48 degrees of freedom
Multiple R-squared: 0.7498, Adjusted R-squared: 0.7446
F-statistic: 143.8 on 1 and 48 DF, p-value: 4.766e-16
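For readers who want to reproduce this kind of fit, the following is a minimal sketch of a simple least-squares regression in plain Python. It is not the code used for the analysis (the output above has the layout of an R model summary), and the data in the example is synthetic. The function name `simple_ols` is hypothetical.

```python
def simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared is the share of the variance in y that the fitted line explains.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared


# Synthetic example: weekly SPI values and weekly hires in thousands (invented numbers).
spi = [10, 20, 30, 40, 50]
rides = [120, 135, 160, 170, 190]

slope, intercept, r_squared = simple_ols(spi, rides)
print(f"slope={slope:.2f} intercept={intercept:.2f} R^2={r_squared:.4f}")
```

With the real data, `spi` would hold the weekly index values for 2016 and `rides` the matching weekly hire totals.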
In summary, the results of this quick analysis suggest that the SPI data is strongly related to the actual usage of Santander Cycles: with an R² of nearly 0.75, the SPI explains about 75% of the variance in weekly hires. Therefore, it should be acceptable to use Google Trends data as a proxy for demand changes in London’s personal-transportation sector.
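As a worked example of how to read the fitted model, the sketch below plugs the reported coefficients into the regression equation and cross-checks the reported R² against the F-statistic. The constants are copied from the output above. The function name `predicted_weekly_hires` and the SPI values 20 and 60 are chosen only for illustration.

```python
INTERCEPT = 97563.6   # reported estimate for (Intercept)
SLOPE = 1862.4        # reported estimate for the SPI predictor
F_STATISTIC = 143.8   # reported F-statistic on 1 and 48 DF
RESIDUAL_DF = 48      # reported residual degrees of freedom


def predicted_weekly_hires(spi):
    """Expected weekly hires for a given SPI value under the fitted line."""
    return INTERCEPT + SLOPE * spi


low = predicted_weekly_hires(20)
high = predicted_weekly_hires(60)
print(f"SPI 20 -> about {low:,.0f} hires per week")
print(f"SPI 60 -> about {high:,.0f} hires per week")

# With a single predictor, R^2 = F / (F + residual degrees of freedom).
r_squared_from_f = F_STATISTIC / (F_STATISTIC + RESIDUAL_DF)
print(f"R^2 implied by the F-statistic: {r_squared_from_f:.4f}")
```

Each additional SPI point corresponds to roughly 1,860 more hires per week under this model. The reported residual standard error of 24,590 gives a sense of how far individual weeks typically fall from the fitted line.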