An Empirical Approach to Explain Deep Reinforcement Learning in Portfolio Management Task

MG
5 min read · Nov 18, 2021


This blog is a tutorial based on our paper, Explainable Deep Reinforcement Learning for Portfolio Management: An Empirical Approach, presented at the 2nd ACM International Conference on AI in Finance (ICAIF 2021).

The Jupyter notebooks are available on our GitHub and on Google Colab.

Overview

Deep reinforcement learning (DRL) has been widely studied in the portfolio management task. However, it is challenging to understand a DRL-based trading strategy because of the black-box nature of deep neural networks.

We propose an empirical approach to explain the strategies of DRL agents for the portfolio management task. First, we use a linear model in hindsight as the reference model, which finds the best portfolio weights assuming the actual stock returns are known in advance; in particular, we use the coefficients of this linear model in hindsight as the reference feature weights. Second, for DRL agents, we use integrated gradients to define the feature weights, which are the coefficients relating the reward to the features under a linear regression model. Third, we study the prediction power in two cases: single-step prediction and multi-step prediction.

In particular, we quantify the prediction power by calculating the linear correlations between the feature weights of a DRL agent and the reference feature weights, and similarly for the machine learning methods. Finally, we evaluate a portfolio management task on the Dow Jones 30 constituent stocks from 01/01/2009 to 09/01/2021. Our approach empirically reveals that a DRL agent exhibits a stronger multi-step prediction power than the machine learning methods.

Overview of the explanation method

Part 1: Reference Model & Feature Weights

Reference model in hindsight

We use a linear model in hindsight as the reference model. For a linear model in hindsight, a demon would optimize the portfolio using the actual stock returns and the actual sample covariance matrix. It gives the upper-bound performance that any linear predictive model could have achieved.
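
As a rough illustration, the sketch below solves a mean-variance problem when the realized returns and sample covariance are known. The objective, constraints, and risk-aversion parameter here are assumptions for illustration, not the paper's exact formulation:

```python
import numpy as np
import cvxpy as cp

def hindsight_weights(actual_returns, risk_aversion=1.0):
    """Mean-variance portfolio using realized returns and covariance (hindsight).

    actual_returns: (T, N) array of realized daily returns for N stocks.
    """
    mu = actual_returns.mean(axis=0)               # realized mean return per stock
    sigma = np.cov(actual_returns, rowvar=False)   # realized sample covariance
    w = cp.Variable(len(mu))
    objective = cp.Maximize(mu @ w - risk_aversion * cp.quad_form(w, sigma))
    constraints = [cp.sum(w) == 1, w >= 0]         # long-only, fully invested
    cp.Problem(objective, constraints).solve()
    return w.value

# Example with random data standing in for actual stock returns
rng = np.random.default_rng(0)
returns = rng.normal(0.0005, 0.01, size=(250, 5))
print(hindsight_weights(returns))
```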

We use the regression coefficients to define the reference feature weights as follows.

Reference feature weights
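
A minimal sketch of this idea follows; the feature matrix and hindsight portfolio returns below are placeholders, and the exact regression setup in the paper may differ:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Placeholder data: K features per trading step and the hindsight portfolio return q*
rng = np.random.default_rng(1)
features = rng.normal(size=(250, 4))         # e.g., MACD, CCI, RSI, ADX per step
q_star = rng.normal(0.001, 0.01, size=250)   # portfolio return of the model in hindsight

# Regress the hindsight portfolio return on the features;
# the fitted coefficients serve as the reference feature weights.
reg = LinearRegression().fit(features, q_star)
reference_feature_weights = reg.coef_
print(reference_feature_weights)
```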

Part 2: Feature Weights for DRL Agents

Feature weights of a trained DRL agent.

We use integrated gradients to define the feature weights for DRL agents in the portfolio management task.

Integrated gradients
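
Integrated gradients attribute a model's output to its input features by integrating gradients along a straight path from a baseline to the input. Below is a generic sketch with PyTorch; the toy network and the zero baseline are placeholders, not the paper's exact setup:

```python
import torch

def integrated_gradients(model, x, baseline=None, steps=50):
    """Approximate integrated gradients of model(x) w.r.t. x via a Riemann sum."""
    if baseline is None:
        baseline = torch.zeros_like(x)              # common default baseline
    # Points along the straight line from the baseline to the input
    alphas = torch.linspace(0.0, 1.0, steps).view(-1, *([1] * x.dim()))
    path = baseline + alphas * (x - baseline)       # shape: (steps, *x.shape)
    path.requires_grad_(True)
    output = model(path).sum()                      # scalar output to differentiate
    grads = torch.autograd.grad(output, path)[0]
    avg_grad = grads.mean(dim=0)                    # average gradient along the path
    return (x - baseline) * avg_grad                # integrated-gradients attribution

# Toy network standing in for a trained DRL policy/value head
model = torch.nn.Sequential(torch.nn.Linear(4, 16), torch.nn.Tanh(), torch.nn.Linear(16, 1))
state = torch.randn(4)
print(integrated_gradients(model, state))
```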

We use a linear model to find the relationship between features and portfolio return vector q.

Lastly, we define the feature weights of DRL agents in the portfolio management task using integrated gradients and the regression coefficients.

Part 3: Feature Weights for ML Methods

We use conventional machine learning methods for comparison.

First, a machine learning method uses the features as input to predict the stock return vector.

Second, it builds a linear regression model to find the relationship between the portfolio return vector q and the features.

Last, it uses the regression coefficients b to define the feature weights as follows.
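
A minimal sketch of these three steps for one machine learning method; the predictor, the portfolio construction rule, and the data below are placeholders:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(250, 4))                        # features per trading step
stock_returns = rng.normal(0.0005, 0.01, (250, 5))   # returns of 5 stocks

# Step 1: predict the stock return vector from the features
predictor = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, stock_returns)
predicted_returns = predictor.predict(X)

# Placeholder portfolio rule: weights proportional to positive predicted returns
w = np.clip(predicted_returns, 0, None)
w = w / np.clip(w.sum(axis=1, keepdims=True), 1e-8, None)
q = (w * stock_returns).sum(axis=1)                  # realized portfolio return vector q

# Steps 2 and 3: regress q on the features; the coefficients b are the feature weights
b = LinearRegression().fit(X, q).coef_
print(b)
```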

Part 4: Prediction Power

Both the machine learning methods and the DRL agents profit from their prediction power. We quantify the prediction power by calculating the linear correlation 𝜌(·) between the feature weights of a DRL agent and the reference feature weights, and similarly for the machine learning methods. Furthermore, the machine learning methods and the DRL agents differ in how they predict the future: the machine learning methods rely on single-step prediction to find portfolio weights, whereas the DRL agents find portfolio weights with a long-term goal. We therefore compare two cases, single-step prediction and multi-step prediction.
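
A sketch of how such a correlation could be computed per trading step is below; single-step and multi-step differ only in the horizon over which the feature weights are formed, and the arrays here are placeholders:

```python
import numpy as np
from scipy.stats import pearsonr

def prediction_power(agent_feature_weights, reference_feature_weights):
    """Linear correlation rho between an agent's feature weights and the reference ones.

    Both inputs: (T, K) arrays, one row of K feature weights per trading step.
    """
    return np.array([
        pearsonr(a, r)[0]
        for a, r in zip(agent_feature_weights, reference_feature_weights)
    ])

# Placeholder feature weights for 200 trading steps and 4 features
rng = np.random.default_rng(3)
rho = prediction_power(rng.normal(size=(200, 4)), rng.normal(size=(200, 4)))
print(rho.mean())   # mean correlation coefficient over the trading period
```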

Part 5: Experiment

  1. Algorithms:
    1.1 DRL agents: PPO, A2C
    1.2 ML methods: SVM, Decision Tree, Random Forest, Linear Regression
  2. Data: Dow Jones 30 constituent stocks, accessed at 7/1/2020
    2.1 Training: 1/1/2009 to 6/30/2020
    2.2 Trading: 7/1/2020 to 9/1/2021
  3. Features: MACD, CCI, RSI, ADX (see the indicator sketch after this list)
  4. Benchmark: Dow Jones Industrial Average (DJIA)
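
For illustration, two of these technical indicators can be computed directly with pandas. The parameter choices below are standard conventions (and the RSI uses a simple moving-average variant), not necessarily the paper's exact settings; CCI and ADX can be computed analogously or with a technical-analysis library:

```python
import pandas as pd

def macd(close: pd.Series, fast=12, slow=26) -> pd.Series:
    """MACD line: difference of fast and slow exponential moving averages."""
    return close.ewm(span=fast, adjust=False).mean() - close.ewm(span=slow, adjust=False).mean()

def rsi(close: pd.Series, window=14) -> pd.Series:
    """Relative Strength Index from average gains and losses over a window."""
    delta = close.diff()
    gain = delta.clip(lower=0).rolling(window).mean()
    loss = (-delta.clip(upper=0)).rolling(window).mean()
    rs = gain / loss
    return 100 - 100 / (1 + rs)

# Example on a toy price series
prices = pd.Series([100, 101, 102, 101, 103, 105, 104, 106, 107, 106, 108, 110, 109, 111, 112])
print(macd(prices).tail(3))
print(rsi(prices, window=5).tail(3))
```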

Portfolio Performance

Portfolio performance comparison

Prediction Power’s Distribution

Single Step

Single step prediction power’s histogram

Multi Step

Multi step prediction power’s histogram

Statistical Test

Statistical test for mean value
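
The test here compares the mean single-step and multi-step correlation coefficients. One way to run such a comparison is sketched below with a paired t-test; the paper's exact test, data, and significance level are assumptions here:

```python
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(4)
# Placeholder per-step correlation coefficients for one algorithm
rho_single_step = rng.normal(0.02, 0.1, size=200)
rho_multi_step = rng.normal(0.08, 0.1, size=200)

# Paired t-test of whether the multi-step mean exceeds the single-step mean
t_stat, p_value = ttest_rel(rho_multi_step, rho_single_step, alternative="greater")
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```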

Mean Prediction Power & Sharpe Ratio

We compare the prediction power with the Sharpe ratio across all the algorithms.
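
For reference, the annualized Sharpe ratio used in such comparisons is typically computed from daily portfolio returns as sketched below; the zero risk-free rate and 252 trading days are common conventions, not necessarily the paper's exact settings:

```python
import numpy as np

def sharpe_ratio(daily_returns, risk_free_rate=0.0, periods_per_year=252):
    """Annualized Sharpe ratio of a daily return series."""
    excess = np.asarray(daily_returns) - risk_free_rate / periods_per_year
    return np.sqrt(periods_per_year) * excess.mean() / excess.std(ddof=1)

# Example with random daily returns standing in for a strategy's returns
rng = np.random.default_rng(5)
print(sharpe_ratio(rng.normal(0.001, 0.01, size=250)))
```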

We find that:

  1. The DRL agent using PPO has the highest Sharpe ratio (2.11) and the highest average correlation coefficient (multi-step, 0.09) among all the algorithms.
  2. The DRL agents' average correlation coefficients (multi-step) are significantly higher than their average correlation coefficients (single-step).
  3. The machine learning methods' average correlation coefficients (single-step) are significantly higher than their average correlation coefficients (multi-step).
  4. The DRL agents outperform the machine learning methods in multi-step prediction power but fall behind in single-step prediction power.
  5. Overall, a higher mean correlation coefficient (multi-step) indicates a higher Sharpe ratio.
