An Empirical Approach to Explain Deep Reinforcement Learning in Portfolio Management Task

MG
5 min read · Nov 18, 2021


This blog is a tutorial based on our paper, Explainable Deep Reinforcement Learning for Portfolio Management: An Empirical Approach, presented at the 2nd ACM International Conference on AI in Finance (ICAIF 2021).

The Jupyter notebooks are available on our GitHub and on Google Colab.

Overview

Deep reinforcement learning (DRL) has been widely studied in the portfolio management task. However, it is challenging to understand a DRL-based trading strategy because of the black-box nature of deep neural networks.

We propose an empirical approach to explain the strategies of DRL agents for the portfolio management task. First, we use a linear model in hindsight as the reference model, which finds the best portfolio weights assuming the actual stock returns are known in advance; in particular, we use the coefficients of this linear model in hindsight as the reference feature weights. Second, for DRL agents, we use integrated gradients to define the feature weights, which are the coefficients relating the reward to the features under a linear regression model. Third, we study the prediction power in two cases: single-step prediction and multi-step prediction.

In particular, we quantify the prediction power by calculating the linear correlations between the feature weights of a DRL agent and the reference feature weights, and similarly for the machine learning methods. Finally, we evaluate a portfolio management task on the Dow Jones 30 constituent stocks from 01/01/2009 to 09/01/2021. Our approach empirically reveals that a DRL agent exhibits a stronger multi-step prediction power than the machine learning methods.

Overview of the explanation method

Part 1: Reference Model & Feature Weights

Reference model in hindsight

We use a linear model in hindsight as the reference model. For a linear model in hindsight, a demon would optimize the portfolio using the actual stock returns and the actual sample covariance matrix. It gives the upper-bound performance that any linear predictive model could have achieved.
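
As a rough illustration, the sketch below solves a mean-variance problem when the realized returns and sample covariance are known. The objective, constraints, and risk-aversion parameter here are assumptions for illustration, not the paper's exact formulation:

```python
import numpy as np
import cvxpy as cp

def hindsight_weights(actual_returns, risk_aversion=1.0):
    """Mean-variance portfolio using realized returns and covariance (hindsight).

    actual_returns: (T, N) array of realized daily returns for N stocks.
    """
    mu = actual_returns.mean(axis=0)               # realized mean return per stock
    sigma = np.cov(actual_returns, rowvar=False)   # realized sample covariance
    w = cp.Variable(len(mu))
    objective = cp.Maximize(mu @ w - risk_aversion * cp.quad_form(w, sigma))
    constraints = [cp.sum(w) == 1, w >= 0]         # long-only, fully invested
    cp.Problem(objective, constraints).solve()
    return w.value

# Example with random data standing in for actual stock returns
rng = np.random.default_rng(0)
returns = rng.normal(0.0005, 0.01, size=(250, 5))
print(hindsight_weights(returns))
```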

We use the regression coefficients to define the reference feature weights as follows.

Reference feature weights
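
A minimal sketch of this idea follows; the feature matrix and hindsight portfolio returns below are placeholders, and the exact regression setup in the paper may differ:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Placeholder data: K features per trading step and the hindsight portfolio return q*
rng = np.random.default_rng(1)
features = rng.normal(size=(250, 4))         # e.g., MACD, CCI, RSI, ADX per step
q_star = rng.normal(0.001, 0.01, size=250)   # portfolio return of the model in hindsight

# Regress the hindsight portfolio return on the features;
# the fitted coefficients serve as the reference feature weights.
reg = LinearRegression().fit(features, q_star)
reference_feature_weights = reg.coef_
print(reference_feature_weights)
```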

Part 2: Feature Weights for DRL Agents

Feature weights of a trained DRL agent.

We use integrated gradients to define the feature weights for DRL agents in the portfolio management task.

Integrated gradients
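
Integrated gradients attribute a model's output to its input features by integrating gradients along a straight path from a baseline to the input. Below is a generic sketch with PyTorch; the toy network and the zero baseline are placeholders, not the paper's exact setup:

```python
import torch

def integrated_gradients(model, x, baseline=None, steps=50):
    """Approximate integrated gradients of model(x) w.r.t. x via a Riemann sum."""
    if baseline is None:
        baseline = torch.zeros_like(x)              # common default baseline
    # Points along the straight line from the baseline to the input
    alphas = torch.linspace(0.0, 1.0, steps).view(-1, *([1] * x.dim()))
    path = baseline + alphas * (x - baseline)       # shape: (steps, *x.shape)
    path.requires_grad_(True)
    output = model(path).sum()                      # scalar output to differentiate
    grads = torch.autograd.grad(output, path)[0]
    avg_grad = grads.mean(dim=0)                    # average gradient along the path
    return (x - baseline) * avg_grad                # integrated-gradients attribution

# Toy network standing in for a trained DRL policy/value head
model = torch.nn.Sequential(torch.nn.Linear(4, 16), torch.nn.Tanh(), torch.nn.Linear(16, 1))
state = torch.randn(4)
print(integrated_gradients(model, state))
```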

We use a linear model to find the relationship between features and portfolio return vector q.

Lastly, we define the feature weights of DRL agents in the portfolio management task using integrated gradients and the regression coefficients.

Part 3: Feature Weights for ML Methods

We use conventional machine learning methods for comparison.

First, a machine learning method uses the features as input to predict the stock return vector.

Second, it builds a linear regression model to find the relationship between the portfolio return vector q and the features.

Last, it uses the regression coefficients b to define the feature weights as follows.
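
A minimal sketch of these three steps for one machine learning method; the predictor, the portfolio construction rule, and the data below are placeholders:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(250, 4))                        # features per trading step
stock_returns = rng.normal(0.0005, 0.01, (250, 5))   # returns of 5 stocks

# Step 1: predict the stock return vector from the features
predictor = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, stock_returns)
predicted_returns = predictor.predict(X)

# Placeholder portfolio rule: weights proportional to positive predicted returns
w = np.clip(predicted_returns, 0, None)
w = w / np.clip(w.sum(axis=1, keepdims=True), 1e-8, None)
q = (w * stock_returns).sum(axis=1)                  # realized portfolio return vector q

# Steps 2 and 3: regress q on the features; the coefficients b are the feature weights
b = LinearRegression().fit(X, q).coef_
print(b)
```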

Part 4: Prediction Power

Both the machine learning methods and the DRL agents profit from their prediction power. We quantify the prediction power by calculating the linear correlation 𝜌(·) between the feature weights of a DRL agent and the reference feature weights, and similarly for the machine learning methods. Furthermore, the machine learning methods and the DRL agents differ in how they predict the future: the machine learning methods rely on single-step prediction to find portfolio weights, whereas the DRL agents find portfolio weights with a long-term goal. We therefore compare two cases, single-step prediction and multi-step prediction.
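
A sketch of how such a correlation could be computed per trading step is below; single-step and multi-step differ only in the horizon over which the feature weights are formed, and the arrays here are placeholders:

```python
import numpy as np
from scipy.stats import pearsonr

def prediction_power(agent_feature_weights, reference_feature_weights):
    """Linear correlation rho between an agent's feature weights and the reference ones.

    Both inputs: (T, K) arrays, one row of K feature weights per trading step.
    """
    return np.array([
        pearsonr(a, r)[0]
        for a, r in zip(agent_feature_weights, reference_feature_weights)
    ])

# Placeholder feature weights for 200 trading steps and 4 features
rng = np.random.default_rng(3)
rho = prediction_power(rng.normal(size=(200, 4)), rng.normal(size=(200, 4)))
print(rho.mean())   # mean correlation coefficient over the trading period
```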

Part 5: Experiment

  1. Algorithms:
    1.1 DRL agents: PPO, A2C
    1.2 ML methods: SVM, Decision Tree, Random Forest, Linear Regression
  2. Data: Dow Jones 30 constituent stocks, accessed at 7/1/2020
    2.1 Training: 1/1/2009 to 6/30/2020
    2.2 Trading: 7/1/2020 to 9/1/2021
  3. Features: MACD, CCI, RSI, ADX (see the indicator sketch after this list)
  4. Benchmark: Dow Jones Industrial Average (DJIA)
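
For illustration, two of these technical indicators can be computed directly with pandas. The parameter choices below are standard conventions (and the RSI uses a simple moving-average variant), not necessarily the paper's exact settings; CCI and ADX can be computed analogously or with a technical-analysis library:

```python
import pandas as pd

def macd(close: pd.Series, fast=12, slow=26) -> pd.Series:
    """MACD line: difference of fast and slow exponential moving averages."""
    return close.ewm(span=fast, adjust=False).mean() - close.ewm(span=slow, adjust=False).mean()

def rsi(close: pd.Series, window=14) -> pd.Series:
    """Relative Strength Index from average gains and losses over a window."""
    delta = close.diff()
    gain = delta.clip(lower=0).rolling(window).mean()
    loss = (-delta.clip(upper=0)).rolling(window).mean()
    rs = gain / loss
    return 100 - 100 / (1 + rs)

# Example on a toy price series
prices = pd.Series([100, 101, 102, 101, 103, 105, 104, 106, 107, 106, 108, 110, 109, 111, 112])
print(macd(prices).tail(3))
print(rsi(prices, window=5).tail(3))
```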

Portfolio Performance

Portfolio performance comparison

Prediction Power’s Distribution

Single Step

Single step prediction power’s histogram

Multi Step

Multi step prediction power’s histogram

Statistical Test

Statistical test for mean value
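
The test here compares the mean single-step and multi-step correlation coefficients. One way to run such a comparison is sketched below with a paired t-test; the paper's exact test, data, and significance level are assumptions here:

```python
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(4)
# Placeholder per-step correlation coefficients for one algorithm
rho_single_step = rng.normal(0.02, 0.1, size=200)
rho_multi_step = rng.normal(0.08, 0.1, size=200)

# Paired t-test of whether the multi-step mean exceeds the single-step mean
t_stat, p_value = ttest_rel(rho_multi_step, rho_single_step, alternative="greater")
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```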

Mean Prediction Power & Sharpe Ratio

We compare the prediction power with the Sharpe ratio across all the algorithms.
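
For reference, the annualized Sharpe ratio used in such comparisons is typically computed from daily portfolio returns as sketched below; the zero risk-free rate and 252 trading days are common conventions, not necessarily the paper's exact settings:

```python
import numpy as np

def sharpe_ratio(daily_returns, risk_free_rate=0.0, periods_per_year=252):
    """Annualized Sharpe ratio of a daily return series."""
    excess = np.asarray(daily_returns) - risk_free_rate / periods_per_year
    return np.sqrt(periods_per_year) * excess.mean() / excess.std(ddof=1)

# Example with random daily returns standing in for a strategy's returns
rng = np.random.default_rng(5)
print(sharpe_ratio(rng.normal(0.001, 0.01, size=250)))
```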

We find that:

  1. The DRL agent using PPO has the highest Sharpe ratio (2.11) and the highest average correlation coefficient (multi-step, 0.09) among all the algorithms.
  2. The DRL agents' average correlation coefficients (multi-step) are significantly higher than their average correlation coefficients (single-step).
  3. The machine learning methods' average correlation coefficients (single-step) are significantly higher than their average correlation coefficients (multi-step).
  4. The DRL agents outperform the machine learning methods in multi-step prediction power but fall behind in single-step prediction power.
  5. Overall, a higher mean correlation coefficient (multi-step) indicates a higher Sharpe ratio.
