Forecasting post-COVID willingness to travel based on online searches
Time goes by, and with the easing of lockdown restrictions people are slowly setting their lives back in motion. As the post-COVID-19 social guidelines take shape, we are getting a clearer picture of the dos and don’ts of our new normal and planning our everyday lives accordingly. But are we ready to plan our next trip as well? In my previous post I presented a potential model aimed at understanding how much tourism demand might change after COVID-19 and how long it will take to rise back to pre-outbreak levels. In more detail, we broke travel anxiety down into a shock component, directly related to the presence of the disease and the impact it could have on travelling, and a fear component, representing the permanent marks that this pandemic will leave on the way we travel and on our perception of safety at different tourist venues.
This article investigates the shock term in more detail by analysing online search trends for the travel topic. We will first look at the evolution of these trends in China using Baidu Index data, comparing them with the evolution of the COVID-19 epidemic. We will then perform the same comparison using Google Trends and COVID-19 figures for the US and the European countries that have registered the highest numbers of infections to date. At the end of the article, we will see a high-level prediction of when we could expect interest in travel to rise back to pre-epidemic levels, based on a forecast of how the epidemic will evolve over the next months.
A measurement for the “willingness to travel”
To better understand people’s readiness to travel, we need a quantitative measure of how much consumers are starting to dream about their next destination again. For a clearer view of what I am talking about, it is useful to recall graphically the different phases of the travel decision-making funnel:
The objective is thus to dive into consumer sentiment in the early stages of the funnel, the ones related to Discovery and Research, looking for a measure that represents the willingness to travel again while mitigating external influences as much as possible. For this reason, not all of the parameters usually employed to monitor tourism demand serve the scope of the current analysis. Let’s list a few of them:
- Scheduled transport: Scheduled transport represents the flights, buses and trains with confirmed operations to date. Even though it is a relevant KPI for understanding companies’ actual activity and how much they expect to operate in the near future, this parameter would not help in understanding consumers’ attitude to travel, since it is heavily affected by current travel bans.
- Travel reservations: This parameter represents the number of bookings and cancellations made by consumers for travel periods in the mid term (for example, from August 2020 on). It is perhaps the most natural one to consider when analysing the intention to travel. However, this KPI could also be strongly influenced by external factors that are not the focus of the present analysis, such as the economic instability many families are facing right now and their caution about spending their savings.
- Online searches: Online searches provide a good representation of the willingness to travel, as they cover an early stage of the travel decision-making funnel, the one most related to dreaming and inspiration. To give a few (dated) statistics from Google’s The 2014 Traveler’s Road to Decision, on average 65% of leisure travellers begin researching online before deciding where or how they want to travel, with search engines representing 61% of the sources of inspiration.
I therefore decided to use this last parameter to measure how much consumers are dreaming about travelling and intending to do so. In more detail, in the following we will consider the search trends for the generic travel topic over the last 90 days in different countries, in order to understand how the trend interacts with the evolution of COVID-19 and with government restrictions.
Data sources and preprocessing
As stated in the previous section, we want to compare the contagion trend with the interest shown online for the travel topic, in order to understand the interplay between these two series. In more detail, this analysis aims to show, via a data-driven approach, that the presence of the virus has a first, direct influence on consumers’ willingness to travel, which is likely to fade as soon as the spread of COVID-19 is controlled and limited.
For this reason, the dimensions we focus on are the daily confirmed COVID-19 cases and the online searches for the travel topic, looking at the countries that have been most affected by the outbreak to date: China, the U.S., Italy, France, Spain, the U.K. and Germany.
- Confirmed daily cases: Confirmed cases have been retrieved from the daily contagion repository made available by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University (JHU). Daily contagion data are heavily affected by the number of swab tests each country actually performs every day, and so might not represent the true presence of the disease. However, these figures are also the ones most widely reported by the media. For this reason we expect this dimension to be one of the most important parameters for consumers’ perception of the presence of COVID-19, and thus the one most likely to drive their decision making.
- Online travel searches: To retrieve searches for keywords related to the travel topic, we will look at the data made available by the most popular search engines in the analysed countries, Baidu and Google. More precisely, we will use the tools these two websites provide, Baidu Index and Google Trends respectively, to download interest data.
The raw data for both dimensions above contain outliers and noise, due for example to sample collection being concentrated on specific dates. To mitigate the effect of these two components, samples lying significantly outside the standard deviation of the data have been discarded by applying a 2σ filter, and the data have been smoothed with a 5-day rolling average to reduce daily noise and bring the trend of each series into focus.
All the analysed components have finally been rescaled between 0 and 1, for two main reasons:
- Comparability between countries: The populations of the countries we focus on differ significantly. So, if we want to compare the outcomes of the analysis across geographic areas, we need to bring the measurements onto a scale that is the same for all the analysed countries.
- Comparability between searches and daily confirmed cases: Rescaling all the analysed quantities between 0 and 1 allows us to coherently compare the trends for indexed data (such as the Baidu and Google ones) and absolute data (such as the confirmed cases) and to check quantitatively the relations between these two series.
Please note that, for the purpose of the present analysis, rescaling the data doesn’t imply any loss of relevant information, since the applied transformations preserve the underlying trends, which are our main focus.
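For readers who want to try the cleaning step themselves, here is a minimal sketch of a 2σ filter followed by a 5-day rolling average. It assumes the filter is applied around the global mean of each series and that discarded samples are simply skipped by the rolling window; the function name `clean_series` and the toy numbers are illustrative, not the actual data or code behind this article.

```python
import pandas as pd


def clean_series(raw: pd.Series, n_sigma: float = 2.0, window: int = 5) -> pd.Series:
    """Mask samples farther than n_sigma standard deviations from the mean, then smooth.

    Assumption: the filter uses the global mean and standard deviation of the series.
    """
    within_band = (raw - raw.mean()).abs() <= n_sigma * raw.std()
    masked = raw.where(within_band)  # outliers become NaN
    # The rolling mean ignores NaN values, so masked samples do not enter the average.
    return masked.rolling(window=window, min_periods=1).mean()


# Toy daily series with one reporting spike (made-up numbers).
dates = pd.date_range("2020-03-01", periods=10, freq="D")
raw = pd.Series([10, 12, 11, 13, 200, 12, 14, 13, 15, 14], index=dates, dtype=float)

smoothed = clean_series(raw)
print(smoothed.round(2))
```

On a series with a strong trend, a global filter like this one could also discard genuine peak values, so a filter on the residual from a rolling mean might be a safer variant.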
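The rescaling step can be sketched as a plain min-max transformation. This is an assumption about what "rescaled between 0 and 1" means in practice; the helper `rescale_01` and the sample values are hypothetical.

```python
import numpy as np


def rescale_01(values):
    """Min-max rescale a series to the [0, 1] interval."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    if hi == lo:
        # A constant series carries no trend; map it to zeros to avoid dividing by zero.
        return np.zeros_like(values)
    return (values - lo) / (hi - lo)


# Made-up numbers: absolute daily cases and an indexed search-interest series.
daily_cases = [0, 500, 2000, 6000, 4000, 1000]
search_index = [100, 90, 60, 35, 40, 55]

cases_scaled = rescale_01(daily_cases)
search_scaled = rescale_01(search_index)
print(cases_scaled)
print(search_scaled)
```

Because the transformation is linear and increasing, the position of peaks and troughs and the ordering of the samples are unchanged, which is why the underlying trends survive.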
COVID-19 cases vs online searches
Now that we have clarified data sources and preprocessing, we can move on to the data analysis to understand the interrelations between COVID-19 and the willingness to travel and how to model them, starting with a visual approach and then moving to a quantitative one.
I first analysed the case of China, since it is the most mature one in terms of COVID-19 evolution and thus the one that gives us the broadest view on the topic. China is the country where COVID-19 took its first steps, registering its first cases in December 2019, as stated in the World Health Organization report, and coming under the media spotlight from early January 2020.
Registered cases in the CSSE JHU database are available from late January, by which time the presence of the disease was well known in China as well as in the West. If we compare travel search trends with the confirmed cases, we see a significant drop during January, when the disease was becoming more severe and media coverage started to increase. In more detail, comparing travel search trends in the last 90 days with the same trends in the previous year, we can see that the year-over-year gap in travel interest widened from an average of -20% in December to -36% in January and -27% in February, the peak months of the epidemic.
At the same time, we can see a recovery of travel interest from March on, when COVID-19 started to be controlled and limited. This renewed interest in travelling among the Chinese population is confirmed by a study conducted at the end of March by the Joint Tourism Big Data Lab of China Tourism Academy and Ctrip, China’s leading online travel agency, which revealed that 40% of the interviewees were already thinking about travelling and paying attention to the different promotions, and an additional 10% were actively booking their next trip.
We could then conclude that, intuitively, consumers’ willingness to travel decreases as the daily confirmed cases increase, but recovers as soon as the disease appears to be under control. To explore this first insight further, let’s now focus on the western countries that have been most affected by the COVID-19 outbreak.
As anticipated, we will analyse the US, Italy, France, Spain, Germany and the U.K. using the search trends for the travel topic made available by Google Trends. Comparing search data with the same period in the previous year, we can see a substantial drop in travel interest from mid-March on, ranging from -20% in Germany to -56% in Italy and -60% in France. If we analyse the evolution of the search trends over time and compare it to the COVID-19 daily confirmed cases, we can see a steady decrease in travel interest as the confirmed cases curve climbs towards its peak. However, even though the spread of COVID-19 hasn’t been fully controlled yet in the analysed countries, we can see a slight recovery of travel interest as the daily confirmed cases curve slows down.
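As a side note, a year-over-year gap like the ones quoted for China and for the western countries can be computed along these lines. The sketch assumes the gap is the relative change between the average interest over a period and the average over the same period one year earlier; the exact definition may differ, and the numbers below are invented for illustration.

```python
def yoy_gap_pct(current, previous):
    """Relative change (in %) of mean interest versus the same period one year earlier."""
    current_mean = sum(current) / len(current)
    previous_mean = sum(previous) / len(previous)
    return (current_mean / previous_mean - 1.0) * 100.0


# Hypothetical weekly search-interest values for the same four weeks in two years.
interest_this_year = [40.0, 42.0, 38.0, 40.0]
interest_last_year = [80.0, 78.0, 82.0, 80.0]

gap = yoy_gap_pct(interest_this_year, interest_last_year)
print(f"Year-over-year gap: {gap:.0f}%")
```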
The interplay between COVID-19 evolution and travel interest can be further explored by looking at online searches as a function of COVID-19 confirmed cases. As shown in the scatter plots below, we can notice that the two quantities are inversely related.
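A quick numerical companion to a scatter plot is the Pearson correlation coefficient between the two rescaled series: a value close to -1 indicates a strong inverse linear relation. The snippet below is only a sketch on made-up values, not on the series analysed in this article.

```python
import numpy as np

# Hypothetical rescaled series for one country (values in [0, 1]).
cases_scaled = np.array([0.0, 0.1, 0.3, 0.6, 1.0, 0.8, 0.5])
search_scaled = np.array([1.0, 0.9, 0.7, 0.4, 0.1, 0.2, 0.4])

# np.corrcoef returns the 2x2 correlation matrix; the off-diagonal entry is Pearson's r.
pearson_r = np.corrcoef(cases_scaled, search_scaled)[0, 1]
print(f"Pearson r = {pearson_r:.2f}")
```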
The data shown above open up the possibility of building a high-level model to estimate travel interest starting from COVID-19 daily confirmed cases. A model relating these two quantities would allow us to understand, once an estimate of daily confirmed cases for the coming months is available, when the shock impact on tourism demand will vanish and consumers’ interest in travelling will rise again to the previous year’s levels. This is what we will try to achieve in the next section.
Forecasting travel interest from daily COVID-19 data
In the following we will develop a data-based model to predict consumers’ willingness to travel as a function of the COVID-19 daily confirmed cases. To do so, we will first build a high-level model to forecast the evolution of COVID-19 over the next weeks, country by country. Then we will develop a model based on historical data to infer the expected interest in travel over the same time frame. Please note that the analysis below doesn’t presume to answer when the COVID-19 outbreak will be contained or when consumers will start travelling again. Rather, its scope is to give a flavour of when we could expect interest in travelling to pick up again in consumers’ minds.
As a first step, we provide a high-level estimate of how the outbreak will evolve over the next weeks. Looking at the data collected to date, we can assume that the daily cases curve resembles an asymmetric normal distribution (i.e. a skewed normal distribution) centred around the peak date. To estimate the most accurate shape for each country, we have fitted the daily confirmed cases curve with a skewed-normal model, obtaining the shapes shown below. This model expects the confirmed cases curve to vanish between mid-May and mid-June, with some delay between the different countries.
Once we have a prediction of the daily confirmed cases trend for the next weeks, we can use it to predict consumers’ willingness to travel. Let’s clarify the details of the data used and of the model developed, in case someone would like to follow the same path to reproduce these results:
- Training sample: All the historical data available to date have been used to train an algorithm that can predict, given the evolution of the outbreak, how the interest in the travel topic will change. In more detail, to get the most out of the data collected to date, which are actually not that many, we do not differentiate between countries, assuming that consumers’ behaviour will be broadly similar across geographic areas. This is certainly a strong assumption but, following a test-and-learn approach, it turned out to give a more robust model than the one trained country by country. One of the most important contributors to the model’s accuracy is the set of China samples, which provide a wider range of dynamics to learn from when predicting the other countries.
- Input variables: Two main variables have been taken as predictors of the willingness to travel: the daily COVID-19 confirmed cases and the overall number of confirmed cases at a specific date. I decided to take both variables into account because they provide important and different information about the evolution of the disease. While daily confirmed cases tell us whether we are at a peak of the contagion or at a lower point, the cumulative number of cases up to a specific date tells us whether we are in a rising phase of the curve or a falling one. In other words, the latter parameter allows the algorithm to differentiate between the so-called pre-peak and post-peak phases.
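A skewed-normal fit of this kind could look like the sketch below. It is an illustration rather than the exact procedure behind the curves in this article: it assumes SciPy’s `skewnorm` density scaled by an amplitude and fitted with `curve_fit`, and it runs on a synthetic curve instead of the JHU data. The parameter names, bounds and starting values are my own choices.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import skewnorm


def skewed_curve(day, amplitude, shape, loc, scale):
    """Daily cases modelled as a scaled skew-normal density over the day index."""
    return amplitude * skewnorm.pdf(day, shape, loc=loc, scale=scale)


# Synthetic "observed" daily cases for 60 days (not real data).
days = np.arange(60, dtype=float)
true_params = (50000.0, 3.0, 15.0, 12.0)
rng = np.random.default_rng(0)
observed = skewed_curve(days, *true_params) + rng.normal(0.0, 20.0, size=days.size)

# Starting point: total cases as amplitude, mild skew, peak day as location.
initial_guess = (observed.sum(), 1.0, days[np.argmax(observed)], 10.0)
bounds = ([0.0, -20.0, 0.0, 1.0], [np.inf, 20.0, 60.0, 60.0])
fitted, _ = curve_fit(skewed_curve, days, observed, p0=initial_guess, bounds=bounds)

# Extrapolate the fitted curve to the following 60 days.
future_days = np.arange(60, 120, dtype=float)
forecast = skewed_curve(future_days, *fitted)
print("fitted parameters:", np.round(fitted, 2))
```

A positive shape parameter gives a curve that rises quickly and decays slowly, which matches the long tail seen after the peak in many countries.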
- Model used: A simple regression model has been used to predict, from the data available, the expected trend of the Google travel searches. I decided to avoid an overly complicated method since, as anticipated, we do not have much data available and, furthermore, the model has just two input variables. At the same time, I decided not to use any autoregressive model, given the scarcity of the data available to date and the absence of any recurring pattern for the moment. As a regression model, I opted for a simple Ridge regression, using 80% of the data for training and 20% for validation. Even though, as you can see in the graphs below, the error on the specific searches is pretty high, the model returns a good approximation of the trend we could expect for the willingness to travel.
You can find the result of the model below. More precisely, the gray line is the result of applying the model to the skewed-normal representation of the COVID-19 evolution provided above, country by country. The blue line represents the online travel search trends collected to date. As we can see, even though the point-wise distance between the two series is not negligible, the model gives relevant insights into how the interest in travel evolves. More precisely, from this analysis we can conclude that we would expect a recovery in the willingness to travel between May and July, with the speed of recovery differing between countries. However, it should also be noted that, for all countries and based on the data collected to date, travel interest is not expected to return to the levels registered before the spread of COVID-19 in the short term, with obvious consequences for tourism demand.
In this analysis we have explored in depth what we introduced in the previous post as the shock term used to model the travel anxiety caused by the COVID-19 outbreak. More precisely, we have provided a potential path to understanding how the presence of the disease could affect the willingness to travel and when we could expect consumers to think about travelling again. That said, questions about how consumers will plan their next trip, when they will travel again, with what budget and where remain unanswered. We will try to tackle these topics in the next post.
I hope you have enjoyed this article. See you at the next read.
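To make the setup above more concrete, here is a minimal sketch of a Ridge regression with the two inputs (rescaled daily and cumulative cases) and an 80/20 split, using scikit-learn. Everything in it is an assumption for illustration: the data are synthetic, the regularisation strength `alpha=1.0` is the library default rather than a tuned value, and the coefficients used to generate the toy interest series are invented.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Synthetic epidemic curve: bell-shaped daily cases and their running total, both in [0, 1].
days = np.arange(120)
daily = np.exp(-0.5 * ((days - 40) / 12.0) ** 2)
cumulative = np.cumsum(daily) / daily.sum()

# Synthetic travel interest: falls with daily cases and stays slightly lower after the peak.
interest = 1.0 - 0.7 * daily - 0.15 * cumulative + rng.normal(0.0, 0.03, size=days.size)

# Two predictors per sample: daily cases and cumulative cases.
X = np.column_stack([daily, cumulative])
X_train, X_val, y_train, y_val = train_test_split(
    X, interest, test_size=0.2, random_state=0
)

model = Ridge(alpha=1.0)
model.fit(X_train, y_train)
val_r2 = model.score(X_val, y_val)  # R^2 on the held-out 20%

# Expected interest late in the outbreak: few daily cases, cumulative count at its maximum.
late_phase = model.predict(np.array([[0.05, 1.0]]))
print(f"validation R^2 = {val_r2:.2f}, late-phase interest = {late_phase[0]:.2f}")
```

Feeding the model the daily and cumulative values derived from a forecast epidemic curve, instead of the single late-phase point used here, would give a projected interest curve over time.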