Machine Learning in Asset Management — Trading Strategies

Published in Sov.ai

Feb 7, 2020

Snow, D (2020). Machine Learning in Asset Management — Part 1: Portfolio Construction — Trading Strategies. The Journal of Financial Data Science, Winter 2020, 2 (1) 10–23. (Adapted for Medium)

Sponsored by Sov.ai

Machine learning can help with most portfolio construction tasks like idea generation, alpha factor design, asset allocation, weight optimization, position sizing and the testing of strategies.
This is the first in a series of articles dealing with machine learning in asset management and more narrowly on trading strategies equipped with machine learning technologies.
Each trading strategy can end up using multiple machine learning frameworks. The author highlights nine different trading varieties, each making use of reinforcement, supervised, or unsupervised learning, or a combination of these frameworks.

Historically, algorithmic trading could be more narrowly defined as the automation of sell-side trade execution, but since the introduction of more advanced algorithms the definition has grown to include idea generation, alpha factor design, asset allocation, position sizing and the testing of strategies. Machine learning, from the vantage of a decision-making tool, can help in all of these areas.

Financial machine learning research can loosely be divided into four streams. The first concerns (1) asset price prediction where researchers attempt to predict the future value of securities using a machine learning methodology. The second stream involves the prediction of (2) hard or soft financial events like earnings surprises, regime changes, corporate defaults, and mergers and acquisitions.

The third stream entails the prediction and/or (3) estimation of values that are not directly related to the price of a security, such as future revenue, volatility, firm valuation, credit ratings and factor quantiles. The fourth and last stream comprises the use of machine learning techniques to solve (4) traditional optimization and simulation problems in finance like optimal execution, position sizing, and portfolio optimization.

The first three streams are concerned with the creation of trading strategies, while the last stream is concerned with everything else, like weight optimization, optimal execution, risk management, and capital management.

Exhibit 1: Financial Machine Learning in Portfolio Construction

Exhibit 1 outlines a few different ways in which machine learning can be used in portfolio construction. Portfolio construction can broadly be broken into trading strategies and weight optimization. In the first part of this series we will look at trading strategies and in the second part we will look at weight optimization.

The trading strategy styles in the first three streams of financial machine learning research, Price, Event, and Value, can be split into unique trading themes depending on the data used and the outcome one is trying to predict. Price strategies include Technical, Systematic Global Macro, and Statistical Arbitrage, because of the central role price has to play in the input data and predicted outcomes.

Event strategies include Trend, Soft-Event, and Hard-Event themes, because of the need to predict a change. Value strategies include Risk parity, Factor Investing and Fundamental themes, because these measures estimate intermediary values not directly related to the asset price.

Each trading theme can end up using different machine learning frameworks. For example, Technical and Statistical Arbitrage strategies can use a supervised or reinforcement learning approach or a combination of both, and Factor investing strategies can use a supervised or unsupervised learning approach.

It is necessary to define the difference between the aforementioned themes:

Technical trading is the use of market data and its transformations to predict the future price of an asset.
Trend trading refers to strategies in which one takes a position in the asset only after predicting a change in trend.
Statistical arbitrage seeks mispricing by detecting asset relationships and/or potential anomalies, believing the anomaly will return to normal.
Risk parity strategies diversify across assets according to the volatility they exhibit; when one asset class’s volatility exceeds another’s, the portfolio can be rebalanced either by selecting individual units within each asset class or simply by using leverage.
Event trading involves the prediction of hard or soft financial events like corporate defaults, mergers and acquisitions, and earnings surprises.
Factor investing attempts to buy assets that exhibit a trait historically associated with promising investment returns.
Systematic global macro relies on macroeconomic principles to trade across asset classes and countries.
Fundamental trading relies on the use of accounting, management and sentiment data to predict whether a stock is over or undervalued.
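To make the risk parity idea concrete, here is a minimal sketch of naive risk parity, assuming inverse-volatility weighting as the rule. This is a simplification that ignores correlations, and the function name and simulated two-asset data are hypothetical.

```python
import numpy as np

def inverse_volatility_weights(returns: np.ndarray) -> np.ndarray:
    """Naive risk-parity weights: each asset is weighted by 1 / volatility.

    returns: array of shape (T, N) with periodic returns for N assets.
    Correlations are ignored, so risk contributions are equal only when
    all pairwise correlations are the same.
    """
    vol = returns.std(axis=0, ddof=1)   # per-asset volatility
    inv = 1.0 / vol
    return inv / inv.sum()              # normalise so the weights sum to one

rng = np.random.default_rng(0)
# Hypothetical data: a low-volatility "bond" and a high-volatility "equity" series.
rets = np.column_stack([rng.normal(0, 0.005, 500), rng.normal(0, 0.02, 500)])
w = inverse_volatility_weights(rets)
print(w)  # the low-volatility asset receives roughly four times the weight
```

Leverage, as mentioned above, would then scale this low-risk mix up to a target volatility.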

In this section, I will provide machine learning applications for some of these trading themes.

Once you have trained your machine learning model, you can decide whether to use it as part of a larger ensemble of models that produces an improved final prediction for trading purposes. The ensemble’s predictions can additionally pass through a second supervised machine learning model that decides on the most profitable weighting of the models; this is known as a stacked model.

Once you have devised a few independently stacked trading models, you can pass the proprietary model returns to an unsupervised-learning portfolio-weight-optimization scheme like hierarchical risk parity for the final strategy allocation. It is currently feasible to substitute a large portion of traditional algorithmic trading techniques with their machine learning equivalents.

It’s possible to construct a strategy that ‘learns’ all the way down. For example, although one can create a single reinforcement learning agent that is able to ingest a lot of data and make profitable decisions inside an environment, you are often better off creating simpler agents at the lower level while pyramiding additional decision-making responsibilities upwards.

Lower level agents can look at pricing, fundamental and limited capital market data while a meta-agent can select or combine strategies based on potential regime shifts that happen at the economic level and only make the trading decisions then.

A meta-learner can choose between a few hundred models based on the current macro regime. In the end, all the meta-learners, or, depending on how deep you go, meta-cubed learners, should form part of the overall portfolio. The core idea is to carry portfolio-level statistical arbitrage to the utmost extreme using all the tools available, financial or otherwise.

You needn’t use an additional level of reinforcement or supervised learning algorithms; you can also use unsupervised clustering algorithms. You can discover multiple economic regimes with an unsupervised clustering technique such as k-means, and then use a nearest-neighbour rule to assign the last 30 days to one of those regimes (k-nearest neighbours, KNN, is itself a supervised method rather than a clustering one). Then one can select a strategy by looking at its historical success across all regime types.
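A minimal sketch of the regime-based selection above, under loud assumptions: the only regime feature is the trailing 30-day volatility of a hypothetical market series, regimes are found with a plain k-means (Lloyd's algorithm) implementation, the current window is assigned to the nearest centroid, and a strategy is picked by its average next-day return within that regime. All data is simulated; in practice one would use macroeconomic features.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm. X has shape (n, d); returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)  # (n, k)
        labels = dist.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, labels

def trailing_volatility(returns, window=30):
    """Hypothetical regime feature: row j is the volatility of returns[j:j + window]."""
    return np.array([returns[j:j + window].std()
                     for j in range(len(returns) - window + 1)])[:, None]

def pick_strategy(market_returns, strategy_returns, k=2, window=30):
    feats = trailing_volatility(market_returns, window)
    centroids, labels = kmeans(feats, k)
    # Regime of the last 30 days: nearest centroid (a 1-nearest-neighbour rule).
    current = int(np.linalg.norm(centroids - feats[-1], axis=1).argmin())
    # Window j ends on day j + window - 1, so the day that followed it is j + window.
    next_rets = strategy_returns[window:]
    past_labels = labels[:-1]          # the last window has no following day yet
    mean_in_regime = next_rets[past_labels == current].mean(axis=0)
    return current, int(mean_in_regime.argmax())

rng = np.random.default_rng(1)
regime = np.r_[np.zeros(300), np.ones(300)]                 # simulated: calm, then turbulent
market = rng.normal(0, np.where(regime == 0, 0.005, 0.03))
edge = np.where(regime == 0, 0.001, -0.001)
strategies = np.column_stack([edge, -edge]) + rng.normal(0, 0.0005, (600, 2))
print(pick_strategy(market, strategies))  # (regime label, index of the chosen strategy)
```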

Machine learning is ‘limitless’ in the sense that you can tweak it endlessly until performance converges to some ceiling. Some of these tweaks include different validation methods, hyperparameter selection, up- and down-sampling, outlier removal, data replacement, and so on.

Features can also be transformed in myriad ways; the dimensions of features can be reduced or inflated; variables can be generated through numerous unsupervised methods; variables can also be combined, added, or removed; models can be fed into models; and on top of that it is possible to use all machine learning frameworks, i.e., supervised, reinforcement and unsupervised learning, within a single prediction problem. The only way to know whether any of these adjustments are beneficial is to test them empirically on validation data.

How do we know if any of these adjustments would lead to a better model? Most of the time we can use proxies for potential performance like the Akaike information criterion (AIC), or feature-target correlation. These approaches get us halfway towards a good outcome. The best approach is to re-test the model each time a new adjustment is introduced.
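As an illustration of a proxy such as AIC, the sketch below compares ordinary least squares models using the Gaussian-error form AIC = n ln(RSS/n) + 2k (up to an additive constant), where k is the number of estimated coefficients and a lower value is preferred. The data is simulated and the variable names are hypothetical.

```python
import numpy as np

def ols_aic(X, y):
    """AIC of an OLS fit with Gaussian errors, up to an additive constant: n*ln(RSS/n) + 2k."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return n * np.log(rss / n) + 2 * k

rng = np.random.default_rng(0)
n = 250
signal, noise_feature = rng.normal(size=(2, n))   # hypothetical candidate features
y = 0.5 * signal + rng.normal(size=n)
ones = np.ones(n)

aic_intercept = ols_aic(ones[:, None], y)
aic_signal = ols_aic(np.column_stack([ones, signal]), y)
aic_both = ols_aic(np.column_stack([ones, signal, noise_feature]), y)
print(aic_intercept, aic_signal, aic_both)  # keep a feature only if it lowers the AIC
```

The 2k term is what penalises each extra feature; a useless feature has to reduce the residual sum of squares enough to pay for itself.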

These tests should not be performed on the data that will be used to assess the final performance of the model, i.e., the holdout set; instead, a separate validation set should be specified for this purpose. It is also preferable to change the validation data after each new empirical test to ensure that the adjustments do not overfit the validation set. One such approach is K-fold cross-validation, in which the data available for model development is randomly partitioned into K equal-sized folds, and each fold serves once as the validation set while the model is trained on the remaining K − 1 folds.
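The K-fold split can be sketched in a few lines of NumPy. This is an illustration only: the shuffling ignores time ordering, so for financial time series a split that respects chronology is usually safer.

```python
import numpy as np

def kfold_indices(n, k, seed=0):
    """Yield (train_idx, val_idx) pairs; every observation is validated exactly once."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)   # K (nearly) equal-sized folds
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, val

for train_idx, val_idx in kfold_indices(n=10, k=5):
    print(len(train_idx), sorted(val_idx.tolist()))  # 8 training rows, 2 validation rows
```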

In this article, I will survey a few ways in which machine learning can be used to enhance or even create trading strategies. As stated, we are only limited by our imagination when devising machine learning strategies. In this section, I will highlight nine different trading varieties that make use of a reinforcement, supervised, unsupervised or a combination of learning frameworks.

I named these strategies Tiny RL, Tiny VIX CMF, Agent Strategy, Industry Factor, Global Oil, Earnings Surprise Prediction, Deep Trading, Stacked Trading, and Pairs Trading. I organised them according to machine learning framework and each strategy is titled by name, theme, method, and sub-method.

REINFORCEMENT LEARNING

Reinforcement learning (RL) in finance comprises the use of an agent that learns how to take actions in an environment to maximise some notion of cumulative reward.

We have an agent that exists in a predefined environment. The agent receives the current state S_t as input and is asked to take an action A_t, for which it receives a reward R_{t+1}; this information can be used to identify the next optimal action, A_{t+1}, given the new state S_{t+1}. The final objective function can be the realised/unrealised profit and loss, or even a risk-adjusted performance measure like the Sharpe Ratio.

Tiny RL — Technical/RL/Policy

In this example we will use gradient ascent (equivalently, gradient descent on the negative Sharpe ratio) to maximise a reward function.

The Sharpe ratio will be used as the reward function. The Sharpe ratio measures the risk-adjusted performance of an investment over time. Assuming a risk-free rate of zero, it is simply the mean of the period returns divided by their standard deviation.
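For gradient-based training it is convenient to write the zero-risk-free-rate Sharpe ratio over T periods in terms of the first and second moments of the period returns R_t. The symbols A and B are introduced here for convenience and do not appear in the text:

```latex
S_T = \frac{A}{\sqrt{B - A^{2}}}, \qquad
A = \frac{1}{T}\sum_{t=1}^{T} R_t, \qquad
B = \frac{1}{T}\sum_{t=1}^{T} R_t^{2}
```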

Further, to determine what fraction of the portfolio should be invested in the asset in a long-only strategy, we can specify a position function that generates a value between 0 and 1.

The input vector x_t contains the M most recent percentage price changes, where r_t is the percentage change in the asset price between time t−1 and t, together with the previous position.

This means that at every step the model is fed its last position and a series of historical price changes, which are used to calculate the next position. Once we have a position at each time step, we can calculate the return R at each time step as the previous position multiplied by the latest price change, minus a transaction cost δ charged on the change in position.
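One consistent way to write the input vector, position and per-period return, assuming a logistic (sigmoid) function for the 0-to-1 long-only position and a constant bias term in x_t (both assumptions of this sketch), with r_t = (p_t − p_{t−1}) / p_{t−1}:

```latex
x_t = \left[\,1,\ r_{t-M+1},\ \ldots,\ r_t,\ F_{t-1}\,\right]^{\top}, \qquad
F_t = \sigma\!\left(\theta^{\top} x_t\right) = \frac{1}{1 + e^{-\theta^{\top} x_t}}, \qquad
R_t = F_{t-1}\, r_t - \delta \left| F_t - F_{t-1} \right|
```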

To perform gradient ascent, one must compute the derivative of the Sharpe ratio with respect to the parameters θ, which is obtained by applying the chain rule through the Sharpe ratio, the returns and the positions.
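The following NumPy sketch implements that chain rule and uses it for full-batch gradient ascent. It assumes a sigmoid position F_t = σ(θ·x_t), an input vector x_t made of a bias term, the last M returns and the previous position F_{t−1}, returns R_t = F_{t−1} r_t − δ|F_t − F_{t−1}|, and S = A / sqrt(B − A²) with A and B the mean and mean square of R_t. The function names, learning rate and epoch count are hypothetical choices, not the code in the linked notebook.

```python
import numpy as np

def positions(theta, returns, M):
    """Sigmoid positions F_t in [0, 1] and their total derivatives dF_t/dtheta.

    x_t = [1, r_{t-M+1}, ..., r_t, F_{t-1}], so theta has M + 2 entries."""
    T = len(returns)
    F = np.zeros(T)
    dF = np.zeros((T, M + 2))
    for t in range(M - 1, T):
        x = np.concatenate(([1.0], returns[t - M + 1:t + 1], [F[t - 1]]))
        F[t] = 1.0 / (1.0 + np.exp(-theta @ x))
        # chain rule through the recurrent input F_{t-1}, whose coefficient is theta[-1]
        dF[t] = F[t] * (1.0 - F[t]) * (x + theta[-1] * dF[t - 1])
    return F, dF

def sharpe_and_grad(theta, returns, M, delta):
    F, dF = positions(theta, returns, M)
    t = np.arange(M, len(returns))
    R = F[t - 1] * returns[t] - delta * np.abs(F[t] - F[t - 1])
    n, A, B = len(R), R.mean(), (R ** 2).mean()
    S = A / np.sqrt(B - A ** 2)
    dS_dA = B / (B - A ** 2) ** 1.5
    dS_dB = -A / (2.0 * (B - A ** 2) ** 1.5)
    dS_dR = dS_dA / n + dS_dB * 2.0 * R / n          # dS/dR_t for every t
    sgn = np.sign(F[t] - F[t - 1])
    dR_dF = -delta * sgn                              # dR_t / dF_t
    dR_dFprev = returns[t] + delta * sgn              # dR_t / dF_{t-1}
    dR_dtheta = dR_dF[:, None] * dF[t] + dR_dFprev[:, None] * dF[t - 1]
    return S, dS_dR @ dR_dtheta

def train(returns, M=8, delta=0.0025, lr=0.5, epochs=200, seed=0):
    theta = np.random.default_rng(seed).normal(scale=0.1, size=M + 2)
    for _ in range(epochs):
        _, grad = sharpe_and_grad(theta, returns, M, delta)
        theta = theta + lr * grad                     # ascent: step up the Sharpe ratio
    return theta, sharpe_and_grad(theta, returns, M, delta)[0]

rng = np.random.default_rng(0)
r = rng.normal(0.0005, 0.01, 1000)            # hypothetical percentage price changes
theta, in_sample_sharpe = train(r[:750])      # fit on the first 750 periods only
```

A finite-difference check of `sharpe_and_grad` is a cheap way to confirm the analytic gradient before training; the remaining 250 periods can then serve as an out-of-sample test.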

Code and Resources:

Data, Code

Hat tip, Teddy Koker

Tiny VIX CMF — StatArb/RL/Policy

Futures on the CBOE Volatility Index (VIX) and on the Euro STOXX 50 Volatility Index (VSTOXX) are liquid, and so are exchange-traded notes/exchange-traded funds (ETNs/ETFs) on VIX and VSTOXX. Prior research shows that the futures curves exhibit stationary behaviour with mean reversion toward a contango.

First, one can imitate the futures curves and ETN price histories by building a model, and then use that model to manage the negative roll yield. The Constant Maturity Futures (CMF) price is specified from the listed futures contracts whose maturities bracket the target constant maturity.
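A common way to build a constant-maturity futures price, and the one assumed in this sketch, is linear interpolation in time between the two listed contracts whose expiries T_i and T_{i+1} bracket t + θ: CMF(t, θ) = [(T_{i+1} − t − θ)·F(t, T_i) + (t + θ − T_i)·F(t, T_{i+1})] / (T_{i+1} − T_i). The expiries and prices in the example are hypothetical.

```python
import numpy as np

def constant_maturity_future(t, theta, maturities, prices):
    """Constant-maturity futures price at time t for a fixed horizon theta.

    maturities: sorted expiry times T_1 < T_2 < ... of the listed contracts.
    prices: their futures prices F(t, T_i) observed at time t.
    Linearly interpolates between the two contracts that bracket t + theta.
    """
    target = t + theta
    if not (maturities[0] <= target <= maturities[-1]):
        raise ValueError("t + theta must lie between the first and last listed expiry")
    return float(np.interp(target, maturities, prices))

maturities = [0.05, 0.13, 0.21]   # hypothetical expiries, in years from today
prices = [15.0, 16.5, 17.2]       # hypothetical VIX futures prices
print(constant_maturity_future(0.0, 1 / 12, maturities, prices))  # one-month CMF
```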

One can then go on to define the value of the ETN so that the roll yield is taken into account. I want to focus on maturity and instrument selection, and therefore ignore the roll yield and simply focus on the CMFs. But, if you are interested, the value of the ETN can be obtained as follows.

where r is the interest rate.

Unlike the Tiny RL approach, this strategy performs a numerical analysis before the reinforcement learning step. First, for all seven securities (J), build a matrix of all combinations of 1s and 0s for simulation purposes, where a 1 marks a long position and a 0 a short one.

Then draw from a standard normal distribution to randomly assign weights to each value in the matrix. Create the inverse (complementary) matrix and do the same. Now normalise each matrix so that each row sums to one, which forces neutral portfolios. The next part of the strategy is to run this random weight assignment simulation N times (600 here), depending on your memory capacity, as the whole trading strategy is serialised.

Thus, each iteration (N) produces normally distributed long and short weights (W) that have been calibrated to initial position neutrality (Long Weights = Short Weights); the final result is 15,600 trading strategies.

The next part of this system is to filter out strategies with the following criteria. Select the top X percent of strategies for their highest median cumulative sum over the period. From that selection, select the top Y percent for the lowest standard deviation.

Of that group, select Z percent again for the highest median cumulative sum strategies. X, Y and Z are risk-return parameters that can be adjusted to suit your investment preferences. In this example, they are set at 5%, 40% and 25% respectively. It is possible to efficiently select these parameters by adding them to the reinforcement learning action space.
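The simulation and three-stage filter can be sketched as follows. The assumptions of this sketch: a 1 in the combination matrix marks a long leg and a 0 a short leg, all-long and all-short rows are dropped, the normal draws are made non-negative by taking absolute values, and the ‘median cumulative sum’ is read as the median of each strategy's cumulative return path. N is set far below 600 to keep memory modest, and the asset returns are simulated.

```python
import itertools
import numpy as np

def simulate_weights(J, N, seed=0):
    """N rounds of random long/short weights over every long/short split of J assets."""
    rng = np.random.default_rng(seed)
    # 1 = long leg, 0 = short leg; all-long and all-short rows are dropped (assumption)
    combos = np.array([c for c in itertools.product([0, 1], repeat=J) if 0 < sum(c) < J],
                      dtype=float)
    blocks = []
    for _ in range(N):
        long_w = np.abs(rng.standard_normal(combos.shape)) * combos           # |N(0,1)| draws
        short_w = np.abs(rng.standard_normal(combos.shape)) * (1.0 - combos)  # inverse matrix
        long_w /= long_w.sum(axis=1, keepdims=True)    # long leg sums to one
        short_w /= short_w.sum(axis=1, keepdims=True)  # short leg sums to one
        blocks.append(long_w - short_w)                # net weights sum to zero
    return np.vstack(blocks)

def select_strategies(strategy_returns, x=0.05, y=0.40, z=0.25):
    """Three-stage filter on training data; returns the surviving column indices."""
    def top(idx, score, frac):
        keep = max(1, int(round(frac * len(idx))))
        return idx[np.argsort(score[idx])[::-1][:keep]]

    median_cum = np.median(np.cumsum(strategy_returns, axis=0), axis=0)
    vol = strategy_returns.std(axis=0)
    idx = np.arange(strategy_returns.shape[1])
    idx = top(idx, median_cum, x)   # top X%: highest median cumulative sum
    idx = top(idx, -vol, y)         # then top Y%: lowest standard deviation
    idx = top(idx, median_cum, z)   # then top Z%: highest median cumulative sum again
    return idx

asset_returns = np.random.default_rng(1).normal(0, 0.01, size=(250, 7))  # hypothetical
W = simulate_weights(J=7, N=50)           # the text uses N = 600
strategy_returns = asset_returns @ W.T    # one column per simulated strategy
chosen = select_strategies(strategy_returns)
```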

Of the remaining strategies, iteratively remove highly correlated strategies until only 10 (S) strategies remain. With the remaining 10 strategies, which have all been selected using only training data, use the training data again to formulate a reinforcement learning strategy using a simple MLP neural network with two hidden layers that selects the best strategy for each month by looking at the last 6 months of returns of all the strategies, i.e., 60 features in total.
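The pruning and feature-construction steps can be sketched as below. Greedily dropping the strategy with the highest average absolute correlation is one possible reading of ‘iteratively remove highly correlated strategies’, and the next-month-best-strategy label is a hypothetical supervised target; the text itself trains the MLP selector with reinforcement learning.

```python
import numpy as np

def prune_correlated(strategy_returns, n_keep=10):
    """Greedily drop the strategy with the highest mean absolute correlation to the
    others until n_keep columns remain; returns the surviving column indices."""
    keep = list(range(strategy_returns.shape[1]))
    while len(keep) > n_keep:
        corr = np.abs(np.corrcoef(strategy_returns[:, keep], rowvar=False))
        np.fill_diagonal(corr, 0.0)                 # ignore self-correlation
        keep.pop(int(corr.mean(axis=1).argmax()))
    return keep

def monthly_features(monthly_returns, lookback=6):
    """Row t holds the previous `lookback` months of every kept strategy
    (10 strategies x 6 months = 60 features)."""
    T = monthly_returns.shape[0]
    X = np.array([monthly_returns[t - lookback:t].ravel() for t in range(lookback, T)])
    y = monthly_returns[lookback:].argmax(axis=1)   # hypothetical label: best strategy that month
    return X, y

monthly = np.random.default_rng(0).normal(size=(120, 20))  # hypothetical monthly returns
kept = prune_correlated(monthly, n_keep=10)
X, y = monthly_features(monthly[:, kept])
print(X.shape)  # (114, 60)
```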

Finally, test the results on an out-of-sample test set. Note that in this strategy no hyperparameter selection was done on a development set; as a result, it is expected that the results can be further improved.

Data, Code

Hat tip Andrew Papanicolaou

Agent Strategy — Price/RL/Various Sub-methods

Here, 20+ agents are developed using different algorithms. The first three in the code supplement do not make use of RL; their rules are determined by arbitrary inputs. These are a turtle-trading agent, a moving-average agent, and a signal-rolling agent.
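A minimal sketch of a rule-based moving-average agent of the kind described above, assuming a long-or-flat crossover rule with hypothetical 20- and 50-period windows. It is not the notebook's implementation, and it involves no learning at all.

```python
import numpy as np

def sma(prices, window):
    """Simple moving average; NaN until `window` observations are available."""
    out = np.full(len(prices), np.nan)
    c = np.cumsum(np.insert(prices, 0, 0.0))
    out[window - 1:] = (c[window:] - c[:-window]) / window
    return out

def moving_average_agent(prices, short=20, long=50):
    """Long (1) when the short SMA is above the long SMA, otherwise flat (0).
    Position t uses prices up to and including t only."""
    prices = np.asarray(prices, dtype=float)
    fast, slow = sma(prices, short), sma(prices, long)
    pos = np.zeros(len(prices), dtype=int)
    ready = ~np.isnan(slow)                 # stay flat during the warm-up period
    pos[ready] = fast[ready] > slow[ready]
    return pos

def backtest(prices, pos):
    """Per-period strategy returns: the position held at t earns the return from t to t+1."""
    prices = np.asarray(prices, dtype=float)
    return pos[:-1] * (np.diff(prices) / prices[:-1])

prices = 100 * np.exp(np.cumsum(np.random.default_rng(0).normal(0, 0.01, 500)))  # hypothetical
print(backtest(prices, moving_average_agent(prices)).sum())
```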

The rest of the coding notebook contains progressively more involved reinforcement learning agents. The notebook investigates, among others, policy gradient agents, q-learning agents, actor-critic agents, and some neuro-evolution agents and their variants.

With enough time, all these agents can be initialised, trained and measured for performance. Each agent individually generates a chart that contains some of the performance information as shown in Exhibit 2.

Exhibit 2: Example of a Reinforcement Learning Strategy’s Performance

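In this section we will look at three of the most popular methods, namely Q-learning, Policy Gradient, and Actor-Critic. Some quick mathematical notes: s = states, a = actions, r = rewards. In addition, the action-value function Q gives the expected cumulative reward of taking action a in state s, the state-value function V gives the expected cumulative reward from state s, and the advantage function A = Q − V measures how much better action a is than the policy’s average action in that state.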
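Formally, under a policy π with discount factor γ (γ is not named in the text), the standard definitions are:

```latex
Q^{\pi}(s,a) = \mathbb{E}_{\pi}\Big[\sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \,\Big|\, s_t = s,\ a_t = a\Big], \qquad
V^{\pi}(s) = \mathbb{E}_{\pi}\Big[\sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \,\Big|\, s_t = s\Big], \qquad
A^{\pi}(s,a) = Q^{\pi}(s,a) - V^{\pi}(s)
```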

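Q-learning: an off-policy method that learns the action-value function online while following an exploration policy, e.g., epsilon-greedy. You take an action, observe the reward and the next state, update the estimate towards the reward plus the best estimated value of the next state, and do it all again.

Policy Gradients: here you directly adjust the policy’s parameters to make actions that led to higher rewards more likely.

Actor-Critic is a combination of policy gradient and value-function learning: the actor is the policy, and the critic estimates a value function that is used to evaluate the actor’s actions. In this example, I will focus on the online as opposed to the batch version.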
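To make the Q-learning loop concrete, here is a tabular sketch on a hypothetical two-state ‘momentum’ market, where the state is the direction of the last move and that direction repeats with probability 0.7. The environment, reward scheme and parameter values are all assumptions for illustration; the update line is the standard Q-learning rule.

```python
import numpy as np

def q_learning(steps=50_000, alpha=0.01, gamma=0.5, epsilon=0.1, p_stay=0.7, seed=0):
    """Tabular Q-learning on a toy two-state market.

    State 0 = last move down, state 1 = last move up; the next move repeats the
    last one with probability p_stay. Action 0 = flat, 1 = long.
    Reward = +1 if long and the price rises, -1 if long and it falls, 0 if flat.
    """
    rng = np.random.default_rng(seed)
    Q = np.zeros((2, 2))
    state = 1
    for _ in range(steps):
        # epsilon-greedy exploration policy
        if rng.random() < epsilon:
            action = int(rng.integers(2))
        else:
            action = int(Q[state].argmax())
        next_state = state if rng.random() < p_stay else 1 - state
        reward = (1.0 if next_state == 1 else -1.0) if action == 1 else 0.0
        # off-policy update towards reward + discounted value of the greedy next action
        Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
        state = next_state
    return Q

Q = q_learning()
print(Q)  # rows: state (down, up); columns: action (flat, long)
```

With these settings the learned greedy policy should be long after an up-move and flat after a down-move, which is the optimal policy in this toy market.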

Code (Data Self-Contained)

SUPERVISED LEARNING

Supervised learning (SL) techniques are used to learn the relationship between independent attributes and a designated dependent attribute. An SL model is the mathematical structure describing how to make a prediction y_i given x_i.

Instead of learning from an environment like RL, SL methods learn the relationships in data. All supervised learning tasks are divided into classification or regression tasks. Classification models are used to predict discrete responses (e.g., binary 1, 0; multi-class 1, 2, 3). Regression models are used to predict continuous responses (e.g., 3.5%, 35 times, $35,000). In the examples that follow, we use both classification and regression models.

Industry Factor — Factor/SL/Lasso

In this example, we will look at the use of machine learning tools to analyse industry return predictability based on lagged industry returns across the economy (Rapach, Strauss, Tu, & Zhou, 2019). A strategy that goes long the industries with the highest predicted returns and short those with the lowest earns an alpha of 8%.

In this approach, one has to be careful about multiple testing and post-selection bias. A LASSO regression is eventually used in a machine learning format to weight industry importance, but before that we should first formulate a standard predictive regression framework, in which each industry’s next-period return is regressed on the current returns of all industries.

The LASSO objective 𝜰T adds to that regression’s squared-error loss an L1 penalty on the slope coefficients, weighted by the regularisation parameter ϑi.
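In one standard form (assumed here; the scaling of the squared-error term varies between references), with N industries and r_{i,t} the return of industry i in period t, the predictive regression for industry i and its LASSO objective are:

```latex
r_{i,t+1} = a_i + \sum_{j=1}^{N} b_{i,j}\, r_{j,t} + \varepsilon_{i,t+1}

\Upsilon_T(a_i, b_i) = \frac{1}{2T} \sum_{t=1}^{T} \Big( r_{i,t+1} - a_i - \sum_{j=1}^{N} b_{i,j}\, r_{j,t} \Big)^{2} + \vartheta_i \sum_{j=1}^{N} \lvert b_{i,j} \rvert
```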

The LASSO regression generally performs well in selecting the most relevant predictor variables. Some argue that the LASSO penalty term over-shrinks the coefficients of the selected predictors. In that scenario, one can use the selected predictors and re-estimate the coefficients using OLS.

This sub model — an OLS regression model in this case — can be replaced by any other machine learning regressor. In fact, the main and sub-model can both be machine learning regressors, the first selecting the features and second predicting the response variable based on those features.
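The select-then-refit idea can be sketched as follows, assuming a plain coordinate-descent LASSO on the objective (1/2T)·Σ residual² + ϑ·Σ|b_j| with an unpenalised intercept, followed by OLS on the selected predictors. The simulated ‘industry’ data and the penalty value are hypothetical; in practice ϑ would be tuned, for example on validation data.

```python
import numpy as np

def soft_threshold(x, lam):
    return np.sign(x) * max(abs(x) - lam, 0.0)

def lasso(X, y, lam, n_iter=500, tol=1e-8):
    """Coordinate descent for (1/2n)||y - a - Xb||^2 + lam * ||b||_1 (intercept unpenalised)."""
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, resid = X - x_mean, y - y_mean        # centring removes the intercept from the problem
    col_sq = (Xc ** 2).sum(axis=0) / n
    b = np.zeros(p)
    for _ in range(n_iter):
        b_old = b.copy()
        for j in range(p):
            resid += Xc[:, j] * b[j]          # take feature j's contribution back out
            b[j] = soft_threshold(Xc[:, j] @ resid / n, lam) / col_sq[j]
            resid -= Xc[:, j] * b[j]
        if np.max(np.abs(b - b_old)) < tol:
            break
    return y_mean - x_mean @ b, b

def post_lasso_ols(X, y, lam):
    """Select predictors with the LASSO, then refit an unpenalised OLS on the survivors."""
    _, b = lasso(X, y, lam)
    selected = np.flatnonzero(b)
    design = np.column_stack([np.ones(len(y)), X[:, selected]])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return selected, coef                     # coef[0] is the intercept

rng = np.random.default_rng(0)
X = rng.normal(size=(240, 30))               # hypothetical lagged returns of 30 industries
y = 0.5 * X[:, 2] - 0.5 * X[:, 7] + rng.normal(size=240)
selected, coef = post_lasso_ols(X, y, lam=0.2)
print(selected, coef.round(2))
```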

Data, Code

Global Oil — Systematic Macro/SL/Elastic Net

The premise is that when oil exits a bear market, the currencies of oil-producing nations should also rebound. With this strategy, we will investigate the effect the price of oil has on the Norwegian krone (NOK) and identify whether a profitable trading strategy can be executed. To start, we need a ‘stabiliser currency’ to regress against.

The currency should be unrelated to the currency under investigation. Something like the Japanese yen (JPY) is a good candidate. From here on, one would use the prices of NOK and Brent crude, both measured against JPY, to identify whether the Norwegian currency is under- or overvalued.

I will use an elastic net regression as the machine learning technique. It is a good tool when multicollinearity is an issue. An elastic net is a regularised regression method that combines both L1 (Lasso) and L2 (Ridge) penalties: its estimates minimise the squared-error loss plus a weighted sum of the L1 and L2 norms of the coefficients.
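In the usual (naive) elastic net formulation, with λ1 and λ2 as the penalty weights (symbols not used in the text), the estimate is:

```latex
\hat{\beta} = \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^{2} + \lambda_2 \lVert \beta \rVert_2^{2} + \lambda_1 \lVert \beta \rVert_1
```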

The loss function becomes strongly convex as a result of the quadratic penalty term, which guarantees a unique minimum. Now that the model is in place, one has to set up a pricing signal; a two-sided band at one standard deviation is a common choice in arbitrage. We short if the deviation spikes above the upper threshold and go long below the lower threshold. The stop-loss is set at two standard deviations: at that point the underlying model is more likely to be wrong, so one exits the position.
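The band rule can be sketched as a small state machine on the z-score of the model residual, assumed here to be the actual minus the fitted NOK price, standardised with its training-period mean and standard deviation. Entering at ±1σ and stopping out at ±2σ follow the text; closing the trade when the z-score crosses zero is an added assumption.

```python
import numpy as np

def band_signals(zscores, entry=1.0, stop=2.0):
    """Position path (+1 long NOK, -1 short NOK, 0 flat) from residual z-scores."""
    pos = np.zeros(len(zscores), dtype=int)
    current = 0
    for t, z in enumerate(zscores):
        if current == 0:
            if entry < z < stop:
                current = -1      # NOK rich relative to the model: short
            elif -stop < z < -entry:
                current = 1       # NOK cheap relative to the model: long
        elif abs(z) >= stop:
            current = 0           # stop-loss: the model is likely wrong
        elif (current == -1 and z <= 0) or (current == 1 and z >= 0):
            current = 0           # reverted to fair value (assumed exit rule)
        pos[t] = current
    return pos

print(band_signals([0.0, 1.5, 1.2, 0.5, -0.1, -1.5, -2.5, 0.0]))
```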

Data, Code

Deep Trading — Technical/SL/Various DL

There are 30 different neural network sub-methods investigated here. This includes Vanilla RNN, GRU, LSTM, Attention, DNC, Byte-net, Fairseq, and CNN methods. The mathematics of the different frameworks are vast and would take too much space to include here. I have not turned any of the methods into trading strategies yet.

Here, I am simply predicting the future price of the stock, so the models can easily be transformed into directional trading strategies from this point. You can construct the trading policies by hand or rely on reinforcement learning strategies to ‘develop’ the best trading policies.

Exhibit 4: Architecture of RNN, GRU and LSTM cells.

Exhibit 4 can help us to understand the major differences between the sub-methods. A vanilla recurrent neural network (RNN) passes a weighted sum of the current input (x_t) and the previous output (h_{t-1}) through a tanh activation function.

A Gated Recurrent Unit (GRU) introduces gates, an update gate and a reset gate, that decide how much of the previous output (h_{t-1}) to pass on to the next cell, in an attempt to mitigate the vanishing gradient problem. These gates are simply additional mathematical operations performed on the same inputs.

The Long Short-Term Memory unit (LSTM) uses three gates (input, forget and output) and a separate cell state, one gate more than the GRU. Again, these are additional mathematical operations on the same inputs. Moving from RNN to GRU to LSTM, we are simply introducing more ‘control knobs’ for the flow and mixing of input data to establish the final weights.

The LSTM method is designed to establish weights that maintain information that persists over longer periods of time. The code for these three methods and many others is available in the online supplement.
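The differences between the three cells are easiest to see in a single forward step. The NumPy sketch below uses one common form of the equations (conventions for the GRU update differ between references); the weight names and sizes are hypothetical and nothing is trained.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rnn_step(x, h, p):
    """Vanilla RNN: tanh of a weighted sum of the input and the previous hidden state."""
    return np.tanh(p["W"] @ x + p["U"] @ h + p["b"])

def gru_step(x, h, p):
    """GRU: update gate z and reset gate r control how much of h is kept or overwritten."""
    z = sigmoid(p["Wz"] @ x + p["Uz"] @ h + p["bz"])
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ h + p["br"])
    h_tilde = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h) + p["bh"])
    return (1.0 - z) * h + z * h_tilde

def lstm_step(x, h, c, p):
    """LSTM: input, forget and output gates plus a separate cell state c."""
    i = sigmoid(p["Wi"] @ x + p["Ui"] @ h + p["bi"])
    f = sigmoid(p["Wf"] @ x + p["Uf"] @ h + p["bf"])
    o = sigmoid(p["Wo"] @ x + p["Uo"] @ h + p["bo"])
    g = np.tanh(p["Wg"] @ x + p["Ug"] @ h + p["bg"])
    c_new = f * c + i * g
    return o * np.tanh(c_new), c_new

def init_params(names, n_in, n_hidden, seed=0):
    """For each name k: W{k} (hidden x input), U{k} (hidden x hidden) and b{k}."""
    rng = np.random.default_rng(seed)
    p = {}
    for k in names:
        p["W" + k] = rng.normal(scale=0.1, size=(n_hidden, n_in))
        p["U" + k] = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
        p["b" + k] = np.zeros(n_hidden)
    return p

seq = np.random.default_rng(1).normal(0, 0.01, size=(20, 1))   # hypothetical return sequence
rnn_p, gru_p, lstm_p = init_params([""], 1, 8), init_params("zrh", 1, 8), init_params("ifog", 1, 8)
h_rnn = h_gru = h_lstm = c = np.zeros(8)
for x in seq:
    h_rnn = rnn_step(x, h_rnn, rnn_p)
    h_gru = gru_step(x, h_gru, gru_p)
    h_lstm, c = lstm_step(x, h_lstm, c, lstm_p)
```

The vanilla RNN uses one set of weights (W, U, b), the GRU three and the LSTM four, which is the growing number of ‘control knobs’ described above. A prediction head (for example, a linear layer on the final hidden state) would turn any of these into a price forecaster.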

Code (Data Self-Contained)

Stacked Trading — Technical/SL/Stacked

This is purely experimental; it involves training multiple models (base learners or level-1 models), after which they are weighted using an extreme gradient boosting model (meta-model or level-2 model). In the first stacked model, which I will refer to as EXGBEF, we use autoencoders to create additional features.

In the second model, DFNNARX, autoencoders are used to reduce the dimensions of existing features, and additional economic (130+ time series) and fund variables are added to the stock price variables. Similar to the Deep Trading example, we have price movement predictions, but we have not developed a trading policy yet. Exhibit 5 graphically shows the concept of stacking.

Exhibit 5: Architecture of Stacked Models

The training data X has m observations and n features. There are M different models that are trained on X. Each model provides predictions ŷ for the outcome y, which are then cast into second-level training data X^(l2) of size m × M; the M predictions become the features of this second-level data. A second-level model (or models) can then be trained on this data to produce the final outcomes ŷ-fin, which will be used for predictions. With stacking, it helps to train each level on out-of-sample (out-of-fold) predictions from the level below; otherwise the next-level model will be biased towards whichever previous-level model fits, and possibly overfits, the training data best.
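The out-of-fold idea can be sketched with NumPy alone. In this sketch the base learners are two ridge regressions and a k-nearest-neighbour regressor, and the level-2 model is a plain linear regression standing in for the extreme gradient boosting meta-model used in the text; all of these choices and the data are hypothetical.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge regression with an unpenalised intercept; returns a predictor."""
    Xm, ym = X.mean(axis=0), y.mean()
    Xc = X - Xm
    beta = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ (y - ym))
    return lambda Z: ym + (Z - Xm) @ beta

def knn_fit(X, y, k):
    """k-nearest-neighbour regressor: mean target of the k closest training rows."""
    def predict(Z):
        d = np.linalg.norm(Z[:, None, :] - X[None, :, :], axis=2)
        return y[np.argsort(d, axis=1)[:, :k]].mean(axis=1)
    return predict

def out_of_fold_matrix(X, y, learners, k=5, seed=0):
    """Level-2 data X_l2 (m x M): column j holds learner j's predictions for rows it
    was NOT trained on, so the meta-model cannot reward in-sample overfitting."""
    m = len(y)
    folds = np.array_split(np.random.default_rng(seed).permutation(m), k)
    X_l2 = np.zeros((m, len(learners)))
    for val in folds:
        train = np.setdiff1d(np.arange(m), val)
        for j, fit in enumerate(learners):
            X_l2[val, j] = fit(X[train], y[train])(X[val])
    return X_l2

def stacked_predict(X, y, Z, learners):
    X_l2 = out_of_fold_matrix(X, y, learners)
    w, *_ = np.linalg.lstsq(np.column_stack([np.ones(len(y)), X_l2]), y, rcond=None)
    Z_l2 = np.column_stack([fit(X, y)(Z) for fit in learners])  # base models refit on all of X
    return np.column_stack([np.ones(len(Z)), Z_l2]) @ w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                                   # hypothetical features
y = X @ np.array([0.4, -0.2, 0.0, 0.1, 0.0]) + rng.normal(scale=0.5, size=200)
Z = rng.normal(size=(50, 5))                                    # hypothetical new observations
learners = [lambda A, b: ridge_fit(A, b, 1.0),
            lambda A, b: ridge_fit(A, b, 100.0),
            lambda A, b: knn_fit(A, b, 10)]
X_l2 = out_of_fold_matrix(X, y, learners)                       # m x M = 200 x 3
preds = stacked_predict(X, y, Z, learners)
```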

Code (Data Self-Contained)

SUPERVISED LEARNING VS REINFORCEMENT LEARNING

The general pipeline for supervised machine learning trading involves the acquisition of data, processing of data, prediction, policy development, backtesting, parameter optimization, live paper simulation and finally trading of the strategy.

The basic supervised learning task involves some form of price prediction. This includes regressors that predict the price level and classifiers that predict price direction and magnitude in predefined classifications for future time steps.

Supervised machine learning models, especially neural networks, can keep up with changing market regimes as long as they are able to do online training[1]. The reason supervised learning processes tend to fail is that the iterative steps from ML prediction through to policy development, backtesting and parameter optimization are fragile, slow and prone to error.

A further issue is that the performance simulation turns up too late in the game after much hard work has been done. Also, the policy does not develop ‘intelligently’ with the machine learning model.

The benefit of reinforcement learning algorithms is that the final objective function can be the realised/unrealised profit and loss, but also values like the Sharpe Ratio, maximum drawdown, and value at risk measures.

Reinforcement learning only has four or so steps as opposed to the seven or eight of supervised learning. RL allows for end-to-end optimization on what maximises rewards. The RL algorithm directly learns a policy. RL has to take an action in an interactive environment.

Compared to supervised learning which answers the question, “will the asset increase in price tomorrow?”; reinforcement learning answers the question, “should I buy the asset today?”. The reinforcement learning algorithm is therefore already packaged as a trading strategy.

This does not mean that it is necessarily hard to create a trading strategy out of a supervised learning task, for example, one can simply buy all assets that are predicted to increase in price tomorrow.
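As a minimal sketch of turning supervised forecasts into discrete buy/hold/sell actions, assuming a hypothetical dead-band threshold below which a forecast is treated as noise:

```python
import numpy as np

def predictions_to_actions(pred_returns, threshold=0.0):
    """Map next-day return forecasts to actions: 1 = buy, 0 = hold, -1 = sell."""
    pred = np.asarray(pred_returns, dtype=float)
    return np.where(pred > threshold, 1, np.where(pred < -threshold, -1, 0))

print(predictions_to_actions([0.01, -0.02, 0.0005], threshold=0.001))  # [ 1 -1  0]
```

The hand-written threshold is exactly the kind of policy parameter that must still be tuned and backtested separately in the supervised pipeline.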

Therefore, reinforcement learning automates a larger part of the process. As with supervised strategy development, you still have to ensure that the model works; here, instead of backtesting, you use a simulated environment or paper trading.

Remember that the focus should remain on out-of-sample performance at the end of the day, so be sure to deflate your performance metrics appropriately to control for multiple-testing.

In a nutshell, RL comprises data analysis, agents training in a simulated environment, paper trading, and then finally live trading. In each of the last three steps the agent gets exposed to an environment.

The simplest RL approach is a discrete action space with three actions: buy, hold, and sell. Unlike supervised models, reinforcement models specify an action as opposed to a prediction; however, the decision masks an underlying prediction.

So, if RL provides all these miraculous benefits, why is it barely used in industry? Although RL can lead to a great strategy in fewer steps with less human involvement, it takes longer to train and is very computationally intensive.

RL needs a lot of data, even more so than supervised machine learning. It can also be expensive to test if you can’t reconstruct a good simulated environment.

In finance this is mostly not a big issue, but it does become one when accurate environment feedback is necessary: if the simulated environment won’t cut it, you might have to revert to the real environment, which can become very expensive. Lastly, the bigger the action space, the harder it is to optimise an RL agent[1].

It is likely that supervised learning would still rule the pack in the foreseeable future. Supervised learning is already quite flexible, and we should expect to see a lot of innovations to bring the experience of developing strategies closer to that of reinforcement learning without forsaking the benefits of supervised learning.

For example, researchers in SL have long looked at embedding policy decisions into SL algorithms. Researchers in finance have also written about creating models that predict the best position sizes and entry and exit points (de Prado, 2018). This brings the trading policy and rules closer to the ML model, and closer to a form of automated intelligence.

Let us consider a few more disadvantages of reinforcement learning. First, RL’s convergence to an optimal value is not guaranteed; the famous Bellman update can only guarantee the optimal value if every state is visited an infinite number of times and every action is tried an infinite number of times within each state, so essentially never.

You of course don’t need a truly optimal value; approximate optimality is fine. The big issue is that the sample size needed to obtain a good level of approximate optimality increases with the size of the state and action space. Further, without any assumptions there is no better way than to explore the space randomly, so progress at first is small and slow.

Continuous states and actions are a serious problem; how are we supposed to visit an infinite number of states, an infinite number of times for an infinite number of continuous values with small and slow-time steps?

Some of the best approximations can only be done through the generalised nature of supervised learning. Generalisation can also be adopted in RL using function approximation as opposed to storing infinite values in an infinitely large table.

It is worth noting that this function approximation is still orders of magnitude harder than normal supervised learning problems, because you start the model off with no data, and as you collect data the action values change and the ground-truth labels remain unfixed; a point previously labelled as good might look bad in the longer run.

To get closer to the true function, the agent has to keep exploring. This exploration in uncertain dynamics means that RL is way more sensitive to hyper-parameters and random seeds than SL as it does not train on a fixed data set and is dependent on network output, exploration mechanism, and environment randomness.

Thus, the same run can produce different results. But do notice how great it is that you are never given any samples from the ‘true’ target function, yet you are able to learn by optimising on a goal, that is why RL is so popular.

I simultaneously expect to see a lot of improvement on the RL trading front, so that RL adopts the advantages of SL trading methods while not forgoing its own strengths. Conceptually RL offers a kind of paradigm shift where we are not overtly focused on predictive power, which is an auxiliary task, but rather the optimization of actions which is and has always been the primary goal.

SL and RL algorithms indirectly pick up on well-known trading strategies without having to predefine and identify them. For example, a gradient step that leads the machine agent to buy more of what did best yesterday is indirectly creating a momentum investing strategy. We can expect machine learning to become part of the toolkit of all asset managers in the future.

SUMMARY

Around 40 years ago Richard Dennis and William Eckhardt put systematic trend-following systems on a roll; 15 years later statistical arbitrage made its way onto the scene; 10 years later high-frequency trading started to stick its head out; in the meantime, machine learning tools were introduced to make statistical arbitrage much easier and more accurate.

Machine learning today, among other things, assists investment managers in refining the accuracy of their predictions with supervised learning, improving the quality of their decisions with reinforcement learning, and enhancing their problem-discovery skills with unsupervised learning.

Technological adoption within portfolio management moves fast, and over the decades we have seen technologies come and go. It is likely that this cycle in quantitative finance will persist and that it also applies to machine learning in asset management, with one caveat: machine learning is also practically revolutionary, because instead of just maximising alpha, it also minimises overhead costs.

Machine learning is already having large economic effects on many financial domains and it is poised to grow further. Advanced machine learning models present myriad advantages in flexibility, efficiency, and enhanced prediction quality.

In this article we have paid special attention to how machine learning can be used to improve various types of trading strategies. We started by identifying important components to asset management in the context of machine learning, one of which is portfolio construction, which itself was divided into trading and weight optimization sections.

The trading strategies were classified according to their respective machine learning frameworks, i.e., reinforcement, supervised and unsupervised learning. The article finished with a section explaining the difference between reinforcement learning and supervised learning, both conceptually and in relation to their respective advantages and disadvantages. The next article in this series will be on weight optimization strategies.

References

Britten-Jones, M. (1999). The sampling error in estimates of mean-variance efficient portfolio weights. The Journal of Finance, 54(2), 655–671.

de Prado, M. L. (2018). The 10 reasons most machine learning funds fail. The Journal of Portfolio Management, 44(6), 120–133.

de Prado, M. L. (2016). Building diversified portfolios that outperform out of sample. The Journal of Portfolio Management, 42(4), 59–69.

Rapach, D. E., Strauss, J. K., Tu, J., & Zhou, G. (2019). Industry return predictability: A machine learning approach. The Journal of Financial Data Science, 1(3), 9–28.

Author Derek Snow is a doctoral candidate in Finance at the University of Auckland and was previously a visiting PhD at NYU Tandon and the University of Cambridge.

