Time-Series Analysis and Forecasting of Foreign Exchange Rate with SARIMAX Model

Rindhuja Treesa Johnson
Women in Technology
9 min read · May 15, 2024

Forecast the future of your foreign investments with factors that might affect the currency exchange rate using the SARIMAX model.

Collage by Author

As promised, I am here with the second part of the Foreign Exchange Rate Time-Series Analysis and Forecasting series. In this article, we will go through SARIMAX modeling, which takes into account factors such as inflation, interest rates, balance sheets, and other financial aspects that could contribute to fluctuations in the foreign exchange market.

SARIMAX

SARIMAX modeling requires exogenous variables that contribute to changes in the endogenous variable. Choosing them requires extensive subject-matter knowledge, so we restricted ourselves to forecasting USD/INR only.

Exogenous Variables

We forecast the foreign exchange rate using SARIMAX with inflation, interest rates, and trade transactions as exogenous variables. These categories are generic, so we used a specific indicator for each of them.

Inflation: For inflation, we used the Consumer Price Index (CPI) as the indicator. We chose the most significant CPI series for the US and for India. In the US, the index for All Urban Consumers is considered to carry the greatest weight. In India, the All Items category of the CPI carries the greatest weight.

Interest Rates: For interest rates, we used the Federal Funds Rate for the US and 10-year Long-Term Government Bond Yields for India as indicators.

Current Account Balance: As the indicator for trade transactions, we used the Balance of Payments in the current account as a percent of GDP for both countries.

We gathered all the datasets from FRED using their Python API. Here are the specific FRED series codes for each category:

Inflation

  1. Consumer Price Index for All Urban Consumers (US): CPIAUCSL
  2. Consumer Price Index for All Items (IND): INDCPIALLMINMEI

Interest Rates

  1. Effective Federal Funds Rate (US): FEDFUNDS
  2. 10-Year Long-Term Government Bond Yields (IND): INDIRLTLT01STM

Trade Transactions

  1. Balance of Payment — Current Account (US) as a % of GDP: USAB6BLTT02STSAQ
  2. Balance of Payment — Current Account (IND) as a % of GDP: INDB6BLTT02STSAQ

Now that we have the exogenous variables and their FRED series codes, we can extract the data as we did in the previous article. I am including the code again for easy access. We extract each dataset, convert it into a data frame, and then slice it from 2014-01-01 to 2023-11-01.

import pandas as pd

# `fred` is the FRED API client object created in Part I
# Extracting the Consumer Price Index for All Urban Consumers (US)
cpi_us_monthly = fred.get_series('CPIAUCSL')
cpi_us_monthly.to_csv('data/cpi_us.csv')

# Converting into a data frame
cpi_us_df = pd.DataFrame({'CPI_US':cpi_us_monthly})

# Slicing the dataset
cpi_us_df = cpi_us_df.loc['2014-01-01':'2023-11-01']

The same steps are followed for all of the exogenous variables above. For the target variable, the INR/USD exchange rate, we extract the monthly data as follows:

# Extracting monthly exchange rate
us_ind_monthly = fred.get_series("EXINUS")
us_ind_monthly.to_csv('data/us_ind_monthly.csv')

# Converting into a data frame
us_ind_df = pd.DataFrame({'exchange_rate':us_ind_monthly})

# Slicing the dataset
us_ind_df = us_ind_df.loc['2014-01-01':'2023-11-01']
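
Since the same extract, convert, and slice steps are repeated for every series, they could be wrapped in a small helper. The sketch below is only an illustration: `load_series` and `fake_fetch` are hypothetical names that do not appear in the original project, and `fake_fetch` stands in for `fred.get_series` so that the example runs without an API key.

```python
import pandas as pd

def load_series(fetch, series_id, column, start="2014-01-01", end="2023-11-01"):
    """Fetch one series, wrap it in a one-column data frame, and slice by date.

    `fetch` is any callable that maps a series ID to a date-indexed
    pandas Series, for example `fred.get_series`.
    """
    series = fetch(series_id)
    frame = pd.DataFrame({column: series})
    return frame.loc[start:end]

# Hypothetical offline stand-in for fred.get_series (monthly dummy values)
def fake_fetch(series_id):
    index = pd.date_range("2013-01-01", "2024-06-01", freq="MS")
    return pd.Series(range(len(index)), index=index, name=series_id, dtype=float)

cpi_us_df = load_series(fake_fetch, "CPIAUCSL", "CPI_US")
print(cpi_us_df.index[0], cpi_us_df.index[-1], len(cpi_us_df))
```

With a real client, the call would become `load_series(fred.get_series, 'CPIAUCSL', 'CPI_US')`, and similarly for the other series codes.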

We now have all the variables required for SARIMAX modeling. However, this is where SARIMAX prediction and forecasting get complicated: to forecast the exchange rate, the exogenous variables used in the model must themselves be forecast in advance! We have data up to 2023-11-01 for all the variables and we plan to forecast the exchange rate for the following 10 months, which means we need forecasted values of all the exogenous variables for that period!

In this work, we used the ARIMA model to forecast the exogenous variables, which are then used to forecast the exchange rate with the SARIMAX model. However, feel free to gather near-term projections of these exogenous variables from other trusted sources, which would reduce the effort and may even improve accuracy!

As we did in the first part of the project, we will use the ARIMA model to predict and forecast the exogenous variables. I wrapped this part of the code in user-defined functions to keep it efficient and organized.

We begin by taking the first difference of each variable:

# function that calculates the first differences
def first_difference(data):
    data_diff = data.diff().dropna()
    return data_diff

# The function is called for each variable
cpi_us_diff = first_difference(cpi_us_df)
cpi_ind_diff = first_difference(cpi_ind_df)
fund_rate_diff = first_difference(fund_rate_df)
int_rate_ind_diff = first_difference(int_rate_ind_df)
bop_us_diff = first_difference(bop_us_mon_df)
bop_ind_diff = first_difference(bop_ind_mon_df)
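
The names `bop_us_mon_df` and `bop_ind_mon_df` suggest that the quarterly Balance of Payments series were converted to monthly frequency before differencing, but that step is not shown in this article. One possible way to do it, shown here purely as an assumption and on made-up numbers, is to upsample with pandas and forward-fill each quarter's value:

```python
import pandas as pd

# Hypothetical quarterly values standing in for a BoP (% of GDP) download
quarterly = pd.Series(
    [-2.1, -2.4, -2.0, -1.8],
    index=pd.date_range("2023-01-01", periods=4, freq="QS"),
    name="BOP_US",
)

# Upsample to month-start frequency; each month inherits its quarter's value
monthly = quarterly.resample("MS").ffill()
bop_us_mon_df = monthly.to_frame()
print(bop_us_mon_df)
```

Note that forward-filling produces flat stretches, so the first difference of such a series is zero in most months. Interpolating between quarters (for example with `.interpolate()` instead of `.ffill()`) is an alternative if a smoother monthly path is preferred.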

The next crucial step in ARIMA modeling is determining the p and q lags associated with the auto-regressive and moving-average parts of the model. We use the Auto-Correlation Function (ACF) to determine q and the Partial Auto-Correlation Function (PACF) to determine p.

import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Draws the ACF and PACF of a series on the same axes
def acf_pacf_plots(data, title):
    fig, ax = plt.subplots()
    plot_acf(data, ax = ax, lags = 20, label = "Auto Correlation")
    plot_pacf(data, ax = ax, lags = 20, label = "Partial Auto Correlation")
    plt.title(title)
    plt.legend()
    plt.show()

acf_pacf_plots(cpi_us_diff, "Consumer Price Index (US)")
acf_pacf_plots(cpi_ind_diff, "Consumer Price Index (IND)")
acf_pacf_plots(fund_rate_diff, "US Federal Fund Rate")
acf_pacf_plots(int_rate_ind_diff, "Long Term Interest Rate (IND)")
acf_pacf_plots(bop_us_diff, "Balance of Payments (US)")
acf_pacf_plots(bop_ind_diff, "Balance of Payments (IND)")
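
To make concrete what these plots display, here is a minimal NumPy sketch of the sample autocorrelation and the approximate 95% band (±1.96/√n) against which the lags are judged. It uses a synthetic MA(1) series rather than the project data, and `sample_acf` is a hypothetical helper, not a statsmodels function.

```python
import numpy as np

def sample_acf(x, nlags):
    """Sample autocorrelation for lags 0..nlags."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array(
        [np.dot(x[: len(x) - k], x[k:]) / denom for k in range(nlags + 1)]
    )

rng = np.random.default_rng(0)

# Synthetic MA(1) series: theory says only lag 1 is non-zero (about 0.49)
eps = rng.normal(size=501)
series = eps[1:] + 0.8 * eps[:-1]

acf = sample_acf(series, nlags=5)
band = 1.96 / np.sqrt(len(series))  # approximate 95% band under white noise
significant = [k for k in range(1, 6) if abs(acf[k]) > band]
print(acf.round(3), round(band, 3), significant)
```

Lags whose ACF bar falls outside the band are candidates for q. The same reading applied to the PACF gives candidates for p.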

The figures below show the ACF and PACF graphs for all the exogenous variables, along with the best p and q parameters chosen from the graphs.

Auto-Correlation and Partial Auto-Correlation function graphs. Images by the Author

Now, we move to the training of the ARIMA model for each of the variables to predict the test dataset, and then forecast the variables for the next 10 months.

from statsmodels.tsa.arima.model import ARIMA

# Splitting the train-test data
def train_test(data):
    split_index = int(0.8*len(data))
    data_train = data.iloc[:split_index]
    data_test = data.iloc[split_index:]
    return data_train, data_test

# Fit the model for the variable and predict
def arima(data_train, data_test, params):
    start = len(data_train)
    end = len(data_train) + len(data_test) - 1
    arima_model = ARIMA(data_train, order = params)
    arima_fit = arima_model.fit()
    arima_predict = arima_fit.predict(start, end)
    arima_predict.index = data_test.index
    arima_pred_df = pd.DataFrame(arima_predict)
    return arima_pred_df

# Plotting the graphs comparing the predicted data and test data
def pred_graphs(data_train, data_test, data_predict, title):
    plt.title(title)
    plt.plot(data_train, label='Train')
    plt.plot(data_test, label='Test')
    plt.plot(data_predict, label='Predictions')
    plt.xlabel('Date')
    plt.legend()
    plt.show()

Prediction versus test data for the exogenous variables
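
The split used in `train_test` is chronological: the first 80% of the rows form the training set and the most recent 20% form the test set, with no shuffling. A small sketch on a toy monthly frame (the data and the `train_fraction` parameter are assumptions for illustration) shows the resulting sizes and the positions that are later passed to `predict`:

```python
import pandas as pd

def train_test(data, train_fraction=0.8):
    split_index = int(train_fraction * len(data))
    return data.iloc[:split_index], data.iloc[split_index:]

# Toy monthly frame with as many rows as a differenced 2014-01 to 2023-11 series
index = pd.date_range("2014-02-01", periods=118, freq="MS")
toy = pd.DataFrame({"value": range(118)}, index=index)

train, test = train_test(toy)

# Positions of the test rows, counted from the start of the training data
start = len(train)
end = len(train) + len(test) - 1
print(len(train), len(test), start, end)
```

Keeping the order intact matters for time series, because the model should only ever be evaluated on dates that come after the ones it was trained on.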

Once we train and test the model, we move to the forecasting phase.

# Function to forecast the variables
def forecast(data, title, params):
    forecast_index = pd.date_range(start = '2023-12-01', periods = 10, freq = 'MS')
    arima_model = ARIMA(data, order = params)
    model_fit = arima_model.fit()
    model_forecast = model_fit.forecast(steps = 10)
    model_forecast.index = forecast_index

    # Plotting the forecasted variables
    plt.plot(data, label='Actual')
    plt.plot(model_forecast, label='ARMA Forecast')
    plt.title(title)
    plt.xlabel('Date')
    plt.legend()
    plt.tight_layout()
    plt.show()
    return model_forecast

We are all set for the SARIMAX modeling and forecasting of the exchange rate! To start, we first merge all the variables into a single data frame and find the first difference in the exchange rate. Then, we plot the Auto-Correlation and Partial Auto-Correlation functions to determine the lags.

X = pd.concat([cpi_us_diff,cpi_ind_diff,fund_rate_diff,int_rate_ind_diff,bop_us_diff,bop_ind_diff], axis = 1)
# pd.concat with axis = 1 already aligns the columns on their shared date index
y = us_ind_df.diff().dropna()

acf_pacf_plots(y,"US Dollar to Indian Rupees Monthly")

split_index = int(0.9*len(X))
X_train = X[:split_index]
y_train = y[:split_index]
X_test = X[split_index:]
y_test = y[split_index:]

Auto-Correlation and Partial Auto-Correlation Function. Image by the Author

Once we have the lag parameters, we can use them to train and test the SARIMAX model and compare the predictions with the test data.

from statsmodels.tsa.statespace.sarimax import SARIMAX

start = len(X_train)
end = len(X_train) + len(X_test) -1

sarimax = SARIMAX(y_train, exog = X_train, order=(9,0,9))
sarimax_fit = sarimax.fit(disp=0)
sarimax_predict = sarimax_fit.predict(start, end, exog = X_test)
sarimax_predict.index = X_test.index

plt.plot(y_train, label='Train')
plt.plot(y_test, label='Test')
plt.plot(sarimax_predict, label='SARIMAX Predictions')
plt.title('USD to INR Diff SARIMAX Prediction')
plt.xlabel('Date')
plt.legend()
plt.show()

perf_metrics(y_test,sarimax_predict,"USD to INR")

(a) Prediction and test data comparison. (b) SARIMAX Forecasting of the Exchange rates. Images by the Author.
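
The `perf_metrics` helper called above comes from Part I and is not reproduced in this article. As a rough, hypothetical stand-in (the name `error_metrics` and the choice of metrics are assumptions, not the author's implementation), common error measures can be computed with NumPy as follows:

```python
import numpy as np

def error_metrics(actual, predicted):
    """Return MAE, MSE, and RMSE between two equally long sequences."""
    actual = np.asarray(actual, dtype=float).ravel()
    predicted = np.asarray(predicted, dtype=float).ravel()
    errors = actual - predicted
    mse = np.mean(errors ** 2)
    return {
        "MAE": np.mean(np.abs(errors)),
        "MSE": mse,
        "RMSE": np.sqrt(mse),
    }

# Toy values, not project results
metrics = error_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
print(metrics)
```

Percentage-based measures such as MAPE are left out on purpose here: the series being predicted is differenced, so its values sit close to zero and percentage errors become unstable.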

And here we are, the final step of the project, forecasting the INR/USD exchange rate for 10 months using the SARIMAX model.

X_forecast = pd.concat([cpi_us_forecast,cpi_ind_forecast,fund_rate_forecast,int_rate_forecast,bop_us_forecast,bop_ind_forecast], axis =1)
X_forecast.index = cpi_us_forecast.index

# The exogenous variables must be passed at fit time as well as at forecast time
sarimax_model = SARIMAX(y, exog = X, order = (9,0,9))
sarimax_model_fit = sarimax_model.fit(disp = 0)
sarimax_forecast = sarimax_model_fit.forecast(steps =10, exog = X_forecast)
sarimax_forecast.index = X_forecast.index

plt.plot(y, label='Historical')
plt.plot(sarimax_forecast, label='SARIMAX Forecast')
plt.title('USD to INR Exchange Rate Diff SARIMAX Forecast')
plt.xlabel('Date')
plt.legend()
plt.show()

The figure above shows the differenced values of the forecast. Therefore, we have to convert them back to actual values by cumulative addition, which is the inverse of the first difference.

xchange_rate = us_ind_df['exchange_rate'].iloc[-1] + sarimax_forecast.cumsum()
sarimax_ind_forecasts = pd.concat([us_ind_df['exchange_rate'], xchange_rate], axis = 0)
sarimax_ind_forecasts.to_csv('data/sarimax_ind_forecasts.csv')

plt.plot(us_ind_df, label='Historical')
plt.plot(xchange_rate, label='SARIMAX Forecast')
plt.title('USD to INR Exchange Rate SARIMAX Forecast')
plt.xlabel('Date')
plt.legend()
plt.show()

The forecasted INR/USD Exchange rates. Image by the Author.
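
The cumulative-sum inversion used above can be checked on a toy series. In this sketch (made-up numbers, not project data), the last three differences play the role of the forecasts, and adding their running total to the last known level reproduces the original levels:

```python
import pandas as pd

levels = pd.Series([82.0, 82.5, 83.1, 82.9, 83.4])
diffs = levels.diff().dropna()

# Treat the last three differences as "forecasts" and rebuild the levels
# starting from the last level known before them
last_known = levels.iloc[1]
rebuilt = last_known + diffs.iloc[1:].cumsum()

print(rebuilt.round(2).tolist())
print(levels.iloc[2:].tolist())
```

Both printed lists contain the same values, which confirms that the last observed level plus the cumulative sum of the differences undoes the first difference.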

We can see that the INR/USD exchange rate rises steadily throughout the forecast period, except for a slight dip in July 2024. We also compared the performance metrics of the three models (ARIMA, SARIMA, and SARIMAX), and the SARIMAX model outperforms the others.

Performance metrics. Image by the Author

Comparing the forecasts of the three models, the ARIMA and SARIMAX models follow a similar trend, whereas the SARIMA model deviates from it.

Forecast comparison among the models

The last part of this project was to develop an interactive Power BI dashboard for each of the currencies. Below is the dashboard for USD to INR conversion rates as forecasted by the three models. We can use different filters to enhance the visual analysis. The figure shows historical data from January 2023 to October 2023 and forecasts up to March 2024.

MS Power BI dashboard illustrating the forecasts done by different models. Image by the Author.

The SARIMAX forecast depends on ARIMA forecasts of the exogenous variables, so errors in those forecasts carry over and can reduce the accuracy of the exchange rate forecast. This is a severe drawback of this approach. Obtaining forecasted data from other financial sources can help minimize the accumulation of errors in the model.

Summary & Conclusion

I would like to summarize the whole project, including the findings from Part I.

  • Among all the currencies, the Chinese Yuan is in a strengthening phase against the US Dollar.
  • The strengthening phase of the Chinese Yuan is a good opportunity to invest in US exporting companies that sell to China, as they can sell their goods at a lower price in China.
  • Chinese buyers can get more goods from US exporters for the same amount of Chinese Yuan during the weak dollar phase.
  • The US can attract more tourists from China during the weak dollar phase, which can be a boon for the US tourism industry.
  • For the other countries and the EU, it’s a strong dollar period and they would gain from exporting goods to the US.
  • A US investor investing in a Chinese company will gain; a US investor investing in India, the UK, or the EU will suffer a loss!

The foreign exchange rate between two countries shapes the trade relations and investment strategies of the traders and investors in these countries.

While a weak dollar boosts investment, exports, and tourism in the US, it makes imports more expensive and affects the domestic population through inflation.

References

https://medium.com/@dagorhan20/usd-try-next-30-days-with-sarimax-a11bbb4a7a00

Thank you! Feel free to visit my GitHub page to learn more about this project and connect with me on LinkedIn!
