Predicting the Unemployment Rate in the COVID Situation

Souvik Majumder
Published in Analytics Vidhya
5 min read · Aug 25, 2020

The COVID-19 pandemic has left millions of people unemployed across the globe.

This study is about predicting the unemployment rate in the coming year using Machine Learning. It uses the open-source dataset published on the European Union Open Data Portal.

Exploratory Data Analysis

Raw data from EU Open Data Portal

The raw dataset contained more than 2000 records, consisting of data from various countries. However, some columns held several pieces of information merged together, so I first needed to extract separate features from these composite features. Hence, a bit of data formatting was needed.

I took the first column of the dataset, split it on commas (,), and then merged the resulting columns back into the original dataset.
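The split-and-merge step could look roughly like the sketch below. The composite header and the sample values are made-up stand-ins for the raw file, so treat every name here as an assumption rather than the actual schema.

```python
import pandas as pd

# Toy stand-in for the raw file: the first column packs several fields together.
# Header and values are illustrative assumptions, not the real data.
raw = pd.DataFrame({
    "s_adj,age,unit,sex,geo_time": [
        "SA,TOTAL,THS_PER,T,ES",
        "SA,Y25-74,THS_PER,T,US",
    ],
    "2020M06": ["100", "200"],
})

composite = raw.columns[0]

# Split the composite values on commas into separate columns...
parts = raw[composite].str.split(",", expand=True)
# ...and name them after the comma-separated header of that same column.
parts.columns = composite.split(",")

# Merge the new feature columns back, dropping the composite column.
df = pd.concat([parts, raw.drop(columns=[composite])], axis=1)
df = df.rename(columns={"geo_time": "Country_code"})
print(df)
```

Reusing the header text to name the new columns keeps the field names and their values aligned without typing them out by hand.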

For this experiment, I focused on the last 10 years of data for every country. So, I manually selected the columns as shown below.

In the dataframe, we have a column named geo_time (later renamed to Country_code) that holds the various country codes.

So, I put in a little extra effort to gather the list of all the country codes along with their respective country names.

This list was then loaded into a pandas dataframe.

Country Codes along with the Country Names

I later merged this dataframe with the original dataframe.
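A minimal sketch of that merge, assuming both dataframes share a Country_code column. The lookup rows and the value column are illustrative only.

```python
import pandas as pd

# Illustrative slice of the main data (values are made up).
data = pd.DataFrame({
    "Country_code": ["ES", "US", "TR"],
    "2020M06": [1.0, 2.0, 3.0],
})

# Hand-built lookup of country codes to names (a small hypothetical subset).
countries = pd.DataFrame({
    "Country_code": ["ES", "US", "TR"],
    "Country": ["Spain", "United States", "Turkey"],
})

# A left join keeps every data row even if a code has no name in the lookup.
merged = data.merge(countries, on="Country_code", how="left")
print(merged)
```

A left join is a cautious choice here: codes missing from the lookup show up as NaN in Country instead of silently dropping rows.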

Visualizing the Data

For month-wise visualization, it was necessary to convert the dataframe to a pivot dataset (transpose) in order to generate the time-series data.

A little further cleaning of the dataframe was then done by removing values that contained spaces or were non-numeric.
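One way to sketch the reshaping and cleaning is with melt and to_numeric. The month columns and the "bad" cell values below are assumptions chosen to illustrate the idea, not values from the dataset.

```python
import pandas as pd

# Wide layout: one column per month (made-up values, including unusable cells).
wide = pd.DataFrame({
    "Country": ["Spain", "Turkey"],
    "2020M05": ["10", ":"],
    "2020M06": ["11", "7 b"],
})

# Wide -> long: one row per (Country, Date), which is what a time series needs.
long = wide.melt(id_vars="Country", var_name="Date", value_name="Unemployment")

# Anything that does not parse as a number becomes NaN and is then dropped.
long["Unemployment"] = pd.to_numeric(long["Unemployment"], errors="coerce")
long = long.dropna(subset=["Unemployment"]).reset_index(drop=True)
print(long)
```

Coercing to NaN first and dropping afterwards makes it easy to count how many cells were discarded before deciding whether that loss is acceptable.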

So, I split the Date column into Year and Month and then grouped by Year to get the aggregated Unemployment Count.
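As a rough sketch, assuming the Date values look like "2020M06" (year, the letter M, then the month), the split and aggregation could be written as:

```python
import pandas as pd

# Made-up long-format data; the "YYYYMmm" date format is an assumption.
long = pd.DataFrame({
    "Country": ["Spain", "Spain", "Spain"],
    "Date": ["2019M12", "2020M01", "2020M02"],
    "Unemployment": [10.0, 11.0, 12.0],
})

# Split "2020M01" on the letter M into integer Year and Month columns.
long[["Year", "Month"]] = long["Date"].str.split("M", expand=True).astype(int)

# Aggregate the monthly values per country and year.
yearly = long.groupby(["Country", "Year"], as_index=False)["Unemployment"].sum()
print(yearly)
```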

From the graph above, it is evident that the United States saw a significant number of layoffs in the last year, followed by Turkey.

Spain saw its worst slump in 2013, after which employment across its companies recovered considerably.

Since my main aim in this experiment was to focus on the COVID situation, I decided to extract insights only for the period 2019–2020.

So, I applied a filter on the Year column for the values 2019 and 2020, after which I got the graph below.
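The filter itself is a one-liner in pandas. The frame below is a hypothetical example with a Year column already in place.

```python
import pandas as pd

# Hypothetical long-format data with a Year column already extracted.
long = pd.DataFrame({
    "Country": ["Spain", "Spain", "Spain"],
    "Year": [2018, 2019, 2020],
    "Unemployment": [1.0, 2.0, 3.0],
})

# Keep only the rows for the COVID-era window.
covid_period = long[long["Year"].isin([2019, 2020])]
print(covid_period)
```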

Unemployment Rate for 2019–2020

Companies in the United States carried out massive layoffs in April 2020, after which the trend in the country started to decline.

By Age Group

Among those laid off, the most affected age group was 25–74 years old.

Visualizing Last 10 years data

Unemployment rate for last 10 years

The sudden spike highlighted in the graph above occurred during the COVID situation.

I needed this data in order to build an efficient model on top of it.

Modelling the Data

I wanted to make predictions for a particular country, so I picked one at random (let’s say Spain).

Unemployment Trend in Spain in the last 10 years

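The above data is clearly non-stationary. So, I carried out the Augmented Dickey-Fuller (ADF) test to check stationarity through the p-value.

The current p-value, as per the above figure, is greater than 0.05. Since the null hypothesis of the ADF test is that the series has a unit root (i.e., is non-stationary), a p-value above 0.05 means we cannot reject it.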

So, I had to make the data stationary by taking the logarithm of the time series and then differencing it. For reference, you can visit my other article on Time Series Analysis using ARIMA Model.

The process was repeated until the p-value dropped below 0.05.
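The transformation can be sketched in a couple of lines. The toy series below grows by 10% per step, which makes the effect easy to see: after the log and one difference, every value is the same constant.

```python
import numpy as np
import pandas as pd

# Toy series growing 10% per step (illustrative values only).
series = pd.Series([100.0, 110.0, 121.0, 133.1])

# The log turns multiplicative growth into additive growth;
# differencing then removes the resulting linear trend.
log_diff = np.log(series).diff().dropna()

# If one pass is not enough, differencing can be applied again.
log_diff2 = log_diff.diff().dropna()
print(log_diff.tolist())
```

After each pass, the stationarity test would be re-run on the transformed series to decide whether another round is needed.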

I wanted to keep the modelling part as simple as possible. So, I decided to choose the ARIMA model for modelling my time-series data.

Now, for the ARIMA model, we need the p and q values, which were determined using the PACF (Partial Auto-Correlation) and ACF (Auto-Correlation) plots, respectively.

Training the ARIMA Model

I plotted a Kernel Density Estimate (KDE) of the residuals. It is close to a normal distribution, which suggests that the model has captured most of the structure in the series and that its predictions are reasonable.

Evaluating Mean Squared Error
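The mean squared error is the average of the squared differences between actual and predicted values. A small sketch with made-up numbers:

```python
import numpy as np

def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean((actual - predicted) ** 2))

# Made-up values purely to show the calculation.
actual = [10.0, 12.0, 14.0]
predicted = [11.0, 12.0, 12.0]

# Errors are -1, 0 and 2, so the squared errors are 1, 0 and 4.
mse = mean_squared_error(actual, predicted)
print(mse)
```

If the model was fitted on the log of the series, the predictions should be converted back with np.exp before comparing them with the actual values, so that the error is on the original scale.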

Forecasting future Unemployment Count

We would now like to predict the Unemployment Count for Spain for roughly the next two years, i.e., from September 2020 to June 2022.

The above plot tells us that the Unemployment Rate in Spain over the coming two years should remain almost stable, with a slight rise.

Hope you liked the article. Please feel free to put down your comments and suggestions, if any.
