How many tourists will visit Sri Lanka in 2023?

How to predict and what to expect

Nuwan I. Senaratna
On Economics
6 min read · Jan 20, 2023

On what 36 months can tell us, things that “follow a little later”, two flavours of error, statistical earthquakes, Black July and Black Swans that flap their wings in only one direction, when all bets are off, and my predictions for 2023.

“This year [2023], we anticipate 1.55 million tourist arrivals…”, said Priyantha Fernando, the Sri Lanka Tourism Development Authority’s chairman, according to this EconomyNext article.

Is this reasonable? Unreasonable?

Quite a few of my Twitter followers seemed to find it reasonable, predicting between 1M and 2M tourist arrivals in 2023.

https://twitter.com/nuuuwan/status/1616109926543245317

How might we use statistical prediction to verify this claim (~1.55M arrivals)? And more importantly, how much can we trust our verification, and hence the original claim?

Read on to find out.

What 36 months can tell us

The past and the present are all we have to go on when predicting the future.

So, I built a simple Linear Regression model to predict tourist arrivals in each month based on arrival statistics in the previous 36 months.

For example, the model could predict the arrivals for February 2023, based on the arrival statistics for the 36 months from February 2020 to January 2023.

Technically, we could use more sophisticated techniques and more complex data. But the above was sufficient for this article.
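To make the setup concrete, here is a minimal sketch of such a model, not necessarily the code behind this article. It assumes plain least squares via NumPy, and it runs on a synthetic series (a trend plus a yearly cycle plus noise) rather than real arrival statistics. The names build_lagged and fit_linear are made up for illustration.

```python
import numpy as np

N_LAGS = 36  # number of previous months used as inputs


def build_lagged(series, n_lags=N_LAGS):
    """Return (X, y): each row of X holds the n_lags months preceding y.

    Within a row, values run from oldest to newest, so the last column
    is the month immediately before the one being predicted.
    """
    series = np.asarray(series, dtype=float)
    X = np.array([series[i - n_lags:i] for i in range(n_lags, len(series))])
    y = series[n_lags:]
    return X, y


def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns (weights, intercept)."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[:-1], coef[-1]


# Synthetic stand-in for monthly arrivals (NOT real data).
rng = np.random.default_rng(0)
months = np.arange(240)
arrivals = (
    50_000
    + 200 * months
    + 10_000 * np.sin(2 * np.pi * months / 12)
    + rng.normal(0, 2_000, months.size)
)

X, y = build_lagged(arrivals)
weights, intercept = fit_linear(X, y)
predicted = X @ weights + intercept  # in-sample predictions
```

With 240 months of data and 36 lags, this yields 204 training rows, each pairing one month with the 36 months before it.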

So how well did this simple model do?

At first glance, not too bad at all.

The blue line represents the actual arrivals, and the green line what our model predicted, from 1975 to the present day. The two correspond quite well.

On closer inspection, however, we learn a little more about the model. Let’s zoom in to the last decade or so.

We see that while the green line matches the blue line quite well, it is not an exact match. Also, we see that our prediction (green line) has a delay. Each time the actual arrivals (blue line) hit a peak or a trough, the green line follows a little later.

Things that “follow a little later”

We can understand why this happens when we analyse how the model works. A linear regression tries to represent the “thing” you want to predict (in our case tourist arrivals in some month) using some “other things” (arrivals in the previous 36 months), by representing the former as a weighted sum of the latter.

The weights our model computed are as follows:

The highest weight is for the previous month; hence the “follows it a little later” behaviour. The next highest weight is for the month 12 months ago; modelling some “seasonality”.
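As a toy illustration of the weighted sum, the sketch below uses made-up weights and made-up arrival counts (neither comes from the fitted model) and keeps only three lags for brevity. The largest weight sits on the previous month and the next largest on the month 12 months ago, mirroring the pattern described above.

```python
# Hypothetical weights and arrivals, for illustration only.
weights = {1: 0.6, 2: 0.1, 12: 0.3}              # lag in months -> weight
history = {1: 100_000, 2: 90_000, 12: 120_000}   # lag in months -> arrivals

# The prediction is the sum of each past month's arrivals times its weight.
prediction = sum(weights[lag] * history[lag] for lag in weights)
print(round(prediction))
```

Because most of the weight is on last month's figure, the prediction stays close to it, which is why the predicted line trails the actual line at turning points.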

But how good is our model, exactly?

Two flavours of error

We usually measure the quality of a model using some statistic based on the “error” of the predictions: that is, how much the predictions diverge from the actual values, or how much the green line diverges from the blue. Usually, we summarize these errors in a single metric, like the “Mean Squared Error”.
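For reference, a Mean Squared Error is just the average of the squared differences between actual and predicted values. A small sketch with toy numbers (not arrival data):

```python
import numpy as np


def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean((actual - predicted) ** 2))


# Toy values: errors are -10, 5 and -10, so the squares are 100, 25 and 100.
actual = [100, 120, 90]
predicted = [110, 115, 100]
mse = mean_squared_error(actual, predicted)
```

Squaring means a few large misses dominate the metric, which is one reason a single number can hide what kind of error we are dealing with.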

However, let’s try to understand error more intuitively.

In this next chart, I plot the ratio of Actual arrivals to Predicted arrivals. The Y-axis is logarithmic in powers of two.

If our model were perfect, Actual arrivals and Predicted arrivals would be equal, and our ratio would be exactly one (or two to the power 0 on the logarithmic scale).

However, since our model (like all models) is not perfect, we have some error; i.e. divergence from a ratio of one.
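The ratio on a base-two logarithmic scale can be computed as below. This is a sketch with toy numbers; the helper name log2_ratio is an assumption made for illustration.

```python
import numpy as np


def log2_ratio(actual, predicted):
    """log2(actual / predicted): 0 means a perfect prediction,
    +1 means actual was double the prediction, -1 means it was half."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.log2(actual / predicted)


# Toy values: a perfect prediction, an over-prediction and an under-prediction.
actual = np.array([100_000, 50_000, 200_000])
predicted = np.array([100_000, 100_000, 100_000])
ratios = log2_ratio(actual, predicted)
```

One advantage of this scale is its symmetry: predicting double and predicting half sit the same distance from zero, in opposite directions.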

There are two flavours of errors visible on this chart. The first flavour of error is the small amount of fluctuation visible until around early 2019. This first flavour, we know to expect, because we have seen this type of small fluctuation in the past.

We often call this “noise”.

Statistical Earthquakes

The second flavour of error is what we see after early 2019. After the small fluctuations early in the chart, we see wild fluctuations; as if there is a statistical earthquake. This earthquake was caused by two events often referred to as instances of Black Swans.
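One simple way to separate the two flavours is to flag months where the actual figure is more than double, or less than half, the prediction. The sketch below does this with toy numbers. The threshold of one (in powers of two) is an arbitrary assumption, not something derived from the data.

```python
import numpy as np


def flag_surprises(actual, predicted, threshold=1.0):
    """True where |log2(actual / predicted)| exceeds the threshold."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    log_ratio = np.log2(actual / predicted)
    return np.abs(log_ratio) > threshold


# Toy values: two months of ordinary noise, then two large misses.
actual = [100, 95, 20, 110]
predicted = [98, 100, 105, 30]
flags = flag_surprises(actual, predicted)
```

Note that a flag only tells us, after the fact, that the model was surprised. It says nothing about when the next surprise will arrive.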

A Black Swan is an event that, from the point of view of our model, comes as a complete surprise. It is something that, without the benefit of hindsight, we would not have predicted.

The first Black Swan was the 2019 April Easter Attack. The second was the CoViD-19 Pandemic starting in early 2020. Our model was “surprised” by these because there was nothing in the past data that would indicate these events happening when and as they did.

Black Swans cannot be predicted. By definition.

Black July and other Black Swans

Zooming back out again, we see at least two other Black Swans: Black July in 1983, and the attack on the Bandaranaike International Airport in July 2001. There are also a few others that might qualify as Black Swans, such as late 1988, when tourist arrivals were significantly influenced by the JVP insurrection.

Black Swans that flap their wings in only one direction

All the Black Swans we discussed above are “Negative”; bad things that reduced tourist arrivals. What about positive Black Swans? Situations where we are surprised with more tourist arrivals?

Often there is an asymmetry with Black Swans in particular, and statistics in general. It is easy for a single event to crash the tourist trade in a country or even the world. It is very rare for another single event to bring an instant recovery. Recoveries are often slow and might take many years.

Hence, at least in the tourist industry, there are few (no?) positive Black Swans. In this situation and others, Black Swans can be very unfair. They often only flap their wings in one direction.

When all bets are off

The most important thing we can learn from Black Swans and their innate asymmetry is that statistics works reasonably well only when there are no Black Swans. That is, when the only flavour of error is the small, fluctuating kind: noise.

When there is a Black Swan, and there always can be, statistics stops working. All bets are off. Literally.

My prediction for 2023

Now, if 2023 has no surprises (given the recent past, not a very certain assumption), our model can be expected to do a fairly good job at predicting tourist arrivals.

Here’s what it says:

Let’s add some historical data for more context.

2023 is predicted to look a lot like 2022, only a bit better. We won’t reach the giddy heights of 2018 or even 2019, but, like I said, recovery takes time.

Will we end up with 1.55M visits by the end of the year?

My model predicts something closer to 1M.
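An annual figure is the sum of twelve monthly predictions. One way to get those from a one-month-ahead model is to feed each prediction back in as an input for the next month. This is an assumption made for illustration, since the steps used to go beyond one month ahead are not spelled out here. The sketch uses a made-up two-lag model and toy numbers, with weights ordered from oldest to newest month.

```python
import numpy as np


def forecast_recursive(history, weights, intercept, n_steps):
    """Predict n_steps months ahead, feeding each prediction back in.

    weights are ordered oldest to newest, matching the last
    len(weights) values of the history.
    """
    n_lags = len(weights)
    window = [float(v) for v in history[-n_lags:]]
    forecasts = []
    for _ in range(n_steps):
        next_value = float(np.dot(weights, window[-n_lags:]) + intercept)
        forecasts.append(next_value)
        window.append(next_value)  # the prediction becomes an input
    return forecasts


# Hypothetical two-lag model: each month is the average of the previous two.
history = [100.0, 200.0]
weights = [0.5, 0.5]
forecasts = forecast_recursive(history, weights, intercept=0.0, n_steps=2)
total = sum(forecasts)
```

Errors compound in this approach, because later months are predicted from earlier predictions rather than from actual data, so the December figure deserves less trust than the February one.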

Concluding Thoughts

I used a very simple model in this article. A better model would result in better predictions. Note, however, a better model can only reduce the first flavour of error, i.e., noise. No model, however good, is Black Swan proof.

While we can’t predict Black Swans, we can “expect” them. That is, we can’t know how or when they will strike, but we can accept that they are possible. Statistics is not “Black-Swan-proof”, but we can be.

For example, we can’t predict if there is going to be another CoViD-19 pandemic or a terrorist attack or an insurrection. But we could prepare ourselves for such an event assuming it can happen.

Hope this article was useful!

Here’s to better preparedness and the recovery of our tourist industry!

You can find the code used to generate the models and charts in this article at https://github.com/nuuuwan/tourism_lk.

Nuwan I. Senaratna
On Economics

I am a Computer Scientist and Musician by training. A writer with interests in Philosophy, Economics, Technology, Politics, Business, the Arts and Fiction.