Time-Series Analysis and Forecasting of Foreign Exchange Rate with ARIMA Model

Published in Women in Technology, Apr 21, 2024 (9 min read)

The foreign exchange rate between two currencies shapes the trade relations between the two countries. It affects imports and exports, tourism, and foreign investment, and is tied to domestic factors such as inflation and interest rates. This makes the analysis and forecasting of exchange rates a valuable source of insight for economists and businesses across the globe.

In this article, we will discuss how to develop an ARIMA model to forecast the foreign exchange rates of the European Union Euro (€), United Kingdom Pound Sterling (£), Indian Rupee (₹), Chinese Yuan (CN¥), and Japanese Yen (¥) against the United States Dollar ($) over the next 10 months, using the Python library statsmodels.

Time-Series Analysis

Time-series data is a special type of data whose consecutive data points depend on one another and vary with time. Therefore, unlike the rows of a regression dataset, we cannot shuffle or drop data points, nor impute missing values with conventional measures of central tendency such as the mean or median. In short, time-series data describes how variables change over time, and time-series analysis is about understanding any trend, seasonality, periodicity, or cyclicality in that data. As a result, we use dedicated models to forecast time-series data: ARIMA here, with SARIMA and SARIMAX to follow in the coming articles (stay updated!).
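To see why shuffling is off the table, here is a minimal sketch on a synthetic random walk (an assumption standing in for real data). It compares the lag-1 autocorrelation of the series in its original order with that of the same values after shuffling.

```python
import numpy as np
import pandas as pd

# Synthetic random walk standing in for a real time series (assumption).
rng = np.random.default_rng(0)
series = pd.Series(rng.normal(size=500).cumsum())

# Lag-1 autocorrelation: correlation between each point and its predecessor.
ordered_corr = series.autocorr(lag=1)

# Shuffle the same values so that their time order is lost.
shuffled = pd.Series(rng.permutation(series.to_numpy()))
shuffled_corr = shuffled.autocorr(lag=1)

print(f"lag-1 autocorrelation, original order: {ordered_corr:.3f}")
print(f"lag-1 autocorrelation, shuffled:       {shuffled_corr:.3f}")
```

The values are identical in both cases; only their order differs, and that order is exactly the information a forecasting model relies on.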

Data

We can obtain the data required for this project from the Federal Reserve Economic Data (FRED) website. Besides being an open data source, FRED has an API that can be accessed from Python to pull the latest data directly into the Python IDE (here, Jupyter Notebook) and start working on it. The code snippet below shows how to interact with the FRED API.

!pip install fredapi
from fredapi import Fred

# Get a free API key from the FRED website
fred = Fred(api_key='****************')

# Search for the required data using keywords
fred.search("Exchange monthly india")

# Extract the required data set using the series ID obtained from the search
ind_monthly = fred.get_series(series_id = 'EXINUS')

Now we have the monthly exchange rate of the Indian Rupee (₹) against the US Dollar ($). Similarly, we extract the monthly exchange rates of the other currencies against USD. Here, we will only discuss the analysis, prediction, and forecasting of INR to USD; the same steps can be applied to the other currencies.
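One way to pull all five currencies in a single pass is sketched below. The helper name `fetch_exchange_rates` is hypothetical, and the series IDs other than EXINUS are assumptions that should be confirmed with `fred.search()` before use. Note in particular that the euro and pound series are assumed to be quoted as US dollars per unit of foreign currency, the inverse of the rupee series.

```python
import pandas as pd

# FRED monthly series IDs (assumed; confirm each one with fred.search()).
SERIES_IDS = {
    "INR": "EXINUS",  # Indian rupees per US dollar
    "CNY": "EXCHUS",  # Chinese yuan per US dollar
    "JPY": "EXJPUS",  # Japanese yen per US dollar
    "EUR": "EXUSEU",  # US dollars per euro (inverse orientation)
    "GBP": "EXUSUK",  # US dollars per pound sterling (inverse orientation)
}

def fetch_exchange_rates(client, series_ids=SERIES_IDS):
    """Download each series via client.get_series and align them by date."""
    columns = {
        name: client.get_series(series_id=series_id)
        for name, series_id in series_ids.items()
    }
    return pd.DataFrame(columns)

# Usage with the `fred` client created earlier:
# rates = fetch_exchange_rates(fred)
# rates[["EUR", "GBP"]] = 1 / rates[["EUR", "GBP"]]  # express as currency per USD
```

Passing the client in as an argument keeps the helper independent of how the API key is stored.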

Data Cleaning and Pre-Processing

The first step of any data analysis, whether the data is regression data or time-series data, is to clean and prepare it. The series extracted through the API goes back to the 1970s, which is more than we need. We will convert the series into a single-column data frame, keep only the data from January 2014 onward, make sure the date index is of datetime type, and set the frequency of the time series to MS (month start), so that each observation is stamped on the first day of its month.

import pandas as pd

# Convert the series to a data frame
ind_df = pd.DataFrame({'Exchange_rate': ind_monthly})

# Slice the dataset from January 2014 through November 2023 (inclusive)
ind_df = ind_df.loc['2014-01-01':'2023-11-01']

# Set the date index to `datetime` format
ind_df.index = pd.to_datetime(ind_df.index)

# Set the period of the time-series data
ind_df = ind_df.asfreq('MS')

Once we have obtained the required dataset and defined the data frame's index, we should check for missing values. This is a crucial step in time-series analysis because many of the operations that follow will raise errors on NULL values. Also, since we have fixed a monthly frequency for the data, we cannot simply drop dates from the dataset without breaking that regular spacing. Therefore, any gaps should be filled with one of the data imputation methods (forward fill, backward fill, interpolation, or another) before proceeding. Fortunately, this dataset does not have any missing values.
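For a dataset that does have gaps, the check and the three imputation options could look like the sketch below. The toy values are hypothetical and are chosen only so that the results are easy to verify by eye.

```python
import pandas as pd

# Toy monthly series with March 2020 missing entirely (hypothetical values).
dates = pd.to_datetime(["2020-01-01", "2020-02-01", "2020-04-01", "2020-05-01"])
rates = pd.Series([70.0, 72.0, 76.0, 75.0], index=dates)

# asfreq('MS') inserts the missing month start as a NaN row.
monthly = rates.asfreq("MS")
print("missing values:", monthly.isna().sum())

# Three common ways to fill the gap without dropping any date.
filled = pd.DataFrame({
    "ffill": monthly.ffill(),              # carry the last known value forward
    "bfill": monthly.bfill(),              # pull the next known value backward
    "interpolate": monthly.interpolate(),  # linear interpolation between neighbours
})
print(filled)
```

Setting the frequency first matters: before `asfreq`, the series has no NaN at all, so a plain `isna()` check would miss the absent month.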

For non-time-series data, at this point, we can say the data is ready for splitting into train and test sets for modeling. However, for time-series data, we have to deal with the non-stationarity before attempting any model fitting and prediction. Let’s discuss what stationary data is and how to convert non-stationary data into stationary data.

Stationarizing the Data

A stationary time series is one with the following properties:

A constant mean throughout the period.
A constant standard deviation throughout the period.
No trend or seasonality is observed in the data.

In short, the statistical properties of the data should not depend on time. Let us take a look at the raw exchange rate data and check it against these properties.

(a) The monthly exchange rate ratio from 2014–2023. (b) Visual test for stationarity by plotting the rolling mean and standard deviation. Image by Author
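The rolling statistics behind a plot like (b) can be computed as in the sketch below. The 12-month window and the synthetic upward-drifting series are assumptions for illustration; the article does not state which window was used.

```python
import numpy as np
import pandas as pd

# Synthetic upward-drifting monthly series (assumption, not the FRED data).
rng = np.random.default_rng(1)
index = pd.date_range("2014-01-01", periods=119, freq="MS")
values = 60 + 0.2 * np.arange(119) + rng.normal(scale=0.5, size=119)
rate = pd.Series(values, index=index)

window = 12  # assumed: one year of monthly observations
rolling_mean = rate.rolling(window).mean()
rolling_std = rate.rolling(window).std()

# For a stationary series both quantities would stay roughly flat over time.
print(rolling_mean.dropna().iloc[[0, -1]])
print(rolling_std.dropna().iloc[[0, -1]])
```

Plotting `rate`, `rolling_mean`, and `rolling_std` on the same axes gives the visual test: a rolling mean that climbs steadily is a sign of non-stationarity.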

From the above figures, we can conclude the following:

The mean of the time-series data is not constant.
There seems to be some trend in the data.

We can decompose the data into trend, seasonality, and residuals.

# Import plotting and the seasonal decomposition function
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose

# Decompose into trend, seasonal, and residual components
ind_decomposition = seasonal_decompose(ind_df['Exchange_rate'], model='additive', period=12)
ind_trend = ind_decomposition.trend
ind_seasonal = ind_decomposition.seasonal
ind_residual = ind_decomposition.resid

# Plot the decomposition
plt.figure(figsize = (20,8))
plt.subplot(411)
plt.title("Indian Rupees - Seasonal Decomposition of Exchange Rates with US Dollars")
plt.plot(ind_df, label = "Original")
plt.legend(loc = "upper left")
plt.subplot(412)
plt.plot(ind_trend, label = "Trend")
plt.legend(loc = "upper left")
plt.subplot(413)
plt.plot(ind_seasonal, label = "Seasonality")
plt.legend(loc = "upper left")
plt.subplot(414)
plt.plot(ind_residual, label = "Residual")
plt.legend(loc = "upper left")
plt.tight_layout()
plt.show()

Seasonal Decomposition of the time-series data. Image by Author

The visual impression of non-stationarity should be confirmed with a statistical test. For this, we use the Augmented Dickey-Fuller (ADF) test. It is a hypothesis test with the following setup:

Null Hypothesis: The dataset is non-stationary.

Alternate Hypothesis: The dataset is stationary.

The Null hypothesis is rejected if the p-value is less than 0.05.

# ADF test for stationarity
from statsmodels.tsa.stattools import adfuller

# Perform the Augmented Dickey-Fuller (ADF) test
result = adfuller(ind_df['Exchange_rate'])

# Extract and print the test statistic and p-value
adf_statistic = result[0]
p_value = result[1]
print("ADF Statistic:", adf_statistic)
print("p-value:", p_value)
print("Critical Values:", result[4])

From the ADF test, we obtain a p-value of 0.962. Since this is far above 0.05, we fail to reject the Null hypothesis, which is consistent with the data being non-stationary.

To make the data stationary, we can use either the residuals from the seasonal decomposition or the differences between consecutive data points. In this project, we will use the differenced data. We will take a visual and statistical look at the differenced data to ensure it is stationary.

# Creating the differenced data from the existing column
ind_df['Differenced_Data'] = ind_df['Exchange_rate'].diff()
ind_df = ind_df.dropna() # dropping the first row

(a) The monthly differenced exchange rate ratio from 2014–2023. (b) Visual test for stationarity by plotting the rolling mean and standard deviation with the differenced data. Image by Author

The ADF test of the differenced data gives:
p-value: 5.701871112178754e-17

From the visuals and the ADF test, we can conclude that the differenced exchange rate data is stationary and can be used for time-series modeling. Next, we use the auto-correlation function (ACF) and partial auto-correlation function (PACF) to determine the most suitable lags for the models.

Auto-Correlation and Partial Auto-Correlation Functions

Auto-correlation and partial auto-correlation functions measure how strongly a series is correlated with lagged versions of itself. The auto-correlation at lag k is the correlation between a data point and the point k steps earlier, so it includes any indirect influence carried through the intermediate lags 1 to k-1. In the ACF plot, the blue-shaded band marks the region where a correlation is not statistically distinguishable from zero. The first lag q whose spike falls outside this band is taken as the parameter for the Moving Average (MA) part of the ARMA model.

Partial auto-correlation is the correlation between a data point and its k-lagged version after the influence of the lags in between has been removed. The PACF plot is read the same way: the first lag p whose spike falls outside the blue-shaded band is taken as the parameter for the Auto-regressive (AR) part of the ARMA model.

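import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# ACF plot (drawn on our own axes so that figsize takes effect)
fig, ax = plt.subplots(figsize=(10, 6))
plot_acf(ind_df['Differenced_Data'], ax=ax, label="INR", lags=30)
ax.set_xlabel('Lag')
ax.set_ylabel('Autocorrelation')
ax.legend()
plt.show()

# PACF plot
fig, ax = plt.subplots(figsize=(10, 6))
plot_pacf(ind_df['Differenced_Data'], ax=ax, label="INR", lags=30)
ax.set_xlabel('Lag')
ax.set_ylabel('Partial Autocorrelation')
ax.legend()
plt.show()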

(a) ACF plot — best lag for MA parameter, q = 9 (b) PACF plot — best lag for AR parameter, p = 9. Image by Author

With the stationary data and ARMA parameters obtained, we are all set to start fitting the data with our models! Let us split the data set into train and test sets to get going!

# Splitting the dataset into train and test sets
split_index = int(0.9 * len(ind_df))
ind_train = ind_df['Differenced_Data'].iloc[:split_index]
ind_test = ind_df['Differenced_Data'].iloc[split_index:]

ARIMA Modeling and Forecasting

ARIMA (AutoRegressive Integrated Moving Average) forecasting is a time series forecasting method that combines autoregressive (AR), differencing (I), and moving average (MA) components to model and predict future values of a time series.

Here’s a brief overview of each component:

AutoRegressive (AR) Component: This component models the relationship between an observation and a fixed number of lagged observations. The term “autoregressive” indicates that the model is regressing the variable on its own lagged values.
Integrated (I) Component: This component involves differencing the time series to make it stationary, which removes trend from the data (seasonal patterns need seasonal differencing, which is handled by SARIMA). The differencing order (denoted by “d”) specifies how many times the data is differenced to achieve stationarity.
Moving Average (MA) Component: This component models the relationship between an observation and a residual error term based on a moving average of past errors. The term “moving average” refers to the averaging of past error terms in the model.

The ARIMA model is denoted as ARIMA (p, d, q), where:

p represents the order of the autoregressive component (AR).
d represents the degree of differencing needed to make the series stationary.
q represents the order of the moving average component (MA).

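A non-seasonal ARIMA model captures trend (through differencing) and short-term temporal dependencies; seasonal patterns call for its seasonal extension, SARIMA. Because our series has already been differenced once by hand, we fit ARIMA(p, 0, q) on the differenced data, which corresponds to a first-order differenced model, ARIMA(p, 1, q), on the original exchange rates.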
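The link between the hand-differenced series and the d parameter can be written out in backshift notation. The symbols below (B, φ, θ, c, ε) are standard textbook notation introduced here for illustration and do not appear in the code.

```latex
% Backshift operator: B y_t = y_{t-1}
% General ARIMA(p, d, q):
\left(1 - \sum_{i=1}^{p} \phi_i B^i\right) (1 - B)^d \, y_t
  = c + \left(1 + \sum_{j=1}^{q} \theta_j B^j\right) \varepsilon_t

% With d = 1, define the differenced series y'_t = (1 - B) y_t = y_t - y_{t-1}:
\left(1 - \sum_{i=1}^{p} \phi_i B^i\right) y'_t
  = c + \left(1 + \sum_{j=1}^{q} \theta_j B^j\right) \varepsilon_t
```

The second equation is an ARMA(p, q) model on the differenced series, so both formulations describe the same structure. Software may treat the constant c differently in the two cases, so the fitted numbers need not match exactly.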

Now, let us get started with the ARIMA model fitting.

# Fitting the ARMA model (d = 0 because the data is already differenced)
from statsmodels.tsa.arima.model import ARIMA

ind_model = ARIMA(ind_train, order=(9,0,9))
ind_model_fit = ind_model.fit()

# Predicting the test values 
start = len(ind_train)
end = len(ind_train) + len(ind_test) - 1
ind_pred = ind_model_fit.predict(start, end)

The comparison of test data and predicted data. Image by Author

Performance Metrics for the ARMA Prediction. Image by Author

From Plot: We observe that the model captures the trend very well; however, the predicted values appear to be shifted downward relative to the test data.

From Performance Metrics: The error values are small (RMSE < 1, measured in the units of the differenced series), which suggests that the model captured the pattern in the dataset reasonably well.
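The metric calculations are not shown above. A minimal sketch is given below, assuming MAE and RMSE as the metrics and using hypothetical numbers in place of `ind_test` and `ind_pred`. The "no change" baseline is an added reference point, not something taken from the article.

```python
import numpy as np

def mae(actual, predicted):
    """Mean absolute error."""
    diff = np.asarray(actual) - np.asarray(predicted)
    return float(np.mean(np.abs(diff)))

def rmse(actual, predicted):
    """Root mean squared error."""
    diff = np.asarray(actual) - np.asarray(predicted)
    return float(np.sqrt(np.mean(diff ** 2)))

# Hypothetical monthly changes in the exchange rate and model predictions.
actual = np.array([0.5, -0.2, 0.1, 0.4])
predicted = np.array([0.3, 0.0, 0.1, 0.1])

# Baseline for differenced data: always predict "no change".
naive = np.zeros_like(actual)

print("model MAE :", mae(actual, predicted))
print("model RMSE:", rmse(actual, predicted))
print("naive MAE :", mae(actual, naive))
print("naive RMSE:", rmse(actual, naive))
```

Comparing against a simple baseline helps judge whether an error that looks small in absolute terms is also small relative to the typical month-to-month movement.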

Now that we have fitted the model on the training data, we can forecast the differenced exchange rates for the next 10 months. We used the data up to November 2023 for training and testing the model, so we will forecast the exchange rate from December 2023 to September 2024. We will set the forecast index with the start date as 2023-12-01, frequency as MS, and horizon (periods) as 10.

# Defining the forecast horizon and index
forecast_index = pd.date_range(start = '2023-12-01', periods =10, freq = 'MS')

# Forecasting the differenced exchange rate
ind = ARIMA(ind_df['Differenced_Data'], order = (9,0,9))
ind_model = ind.fit()
ind_forecast = ind_model.forecast(steps=10)
ind_forecast.index = forecast_index  # keep a datetime index so it aligns with the historical data

# Converting the differenced data into actual exchange rate ratios
ind_forecast_rates = ind_df['Exchange_rate'].iloc[-1] + ind_forecast.cumsum()
ind_forecasts = pd.concat([ind_df['Exchange_rate'], ind_forecast_rates], axis = 0)

(a) The forecasted exchange rate as differenced data (b) The forecasted exchange rate as an actual ratio of USD/INR. Image by Author
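The conversion from differenced values back to actual rates relies on the cumulative sum being the inverse of differencing. A small sketch with hypothetical numbers makes this concrete.

```python
import pandas as pd

# Hypothetical exchange-rate levels and their first differences.
levels = pd.Series([80.0, 80.5, 80.2, 81.0, 81.3])
diffs = levels.diff().dropna()

# Undo the differencing: starting level + running total of the changes.
rebuilt = levels.iloc[0] + diffs.cumsum()
print(rebuilt)

# The same idea applied to forecasted changes (hypothetical values):
forecast_diffs = pd.Series([0.1, -0.05, 0.2])
forecast_levels = levels.iloc[-1] + forecast_diffs.cumsum()
print(forecast_levels)
```

The only extra information needed is the last observed level, which is why the forecast code anchors on the final value of the Exchange_rate column.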

The forecast suggests that the value of USD relative to INR remains roughly steady over the 10 months from December 2023. The forecasted values do move up and down from month to month, but they fluctuate around a constant mean.

Thus, we have successfully forecasted the foreign exchange rate using the ARIMA model. I will be publishing part two of this article, where we will work with the SARIMAX model. It is a bit more complicated than ARIMA, with seasonality and exogenous variables coming into play.

Meanwhile, feel free to visit my GitHub page to learn more about this project and connect with me on LinkedIn!