# Stochastic Optimal Control and Optimization of Trading Algorithms

### Dynamic Programming Principle and the Hamilton-Jacobi-Bellman (HJB) equation

Let’s assume we have a plane (or a rocket) flying from point A to point B. Because there’s lots of turbulence along the way, it can’t move in a straight line: it’s constantly tossed in random directions. The control system has to adjust the trajectory (the “control policy”) all the time, and since the amount of fuel is limited, this has to be done in an optimal way. The dynamic programming method breaks this decision problem into smaller subproblems. Richard Bellman’s principle of optimality describes how to do this:

> An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision.

Basically, this means that any part of an optimal trajectory is itself an optimal trajectory: if the bold line between C and D weren’t optimal, we could substitute it with some other (dashed) line. That is why such problems are usually solved backwards in time: if we’re at some (random) point C’ near C, we know how to get to C, and so on.

Mathematically, the problem could be formulated like this:

we need to minimize the value function

$$ V\bigl(x(0), 0\bigr) = \min_{u} \left\{ \int_0^T C\bigl[x(t), u(t)\bigr]\,dt + D\bigl[x(T)\bigr] \right\} $$

over the time period [0, T], where C[·] is the scalar cost rate function, D[·] is a function that gives the economic value or utility at the final state, x(t) is the system state vector (x(0) is assumed given), and u(t) for 0 ≤ t ≤ T is the control vector that we are trying to find.

For this simple system (with deterministic dynamics ẋ(t) = F(x(t), u(t))), the Hamilton–Jacobi–Bellman partial differential equation is

$$ \frac{\partial V}{\partial t}(x, t) + \min_{u} \left\{ \nabla V(x, t) \cdot F(x, u) + C(x, u) \right\} = 0, $$

subject to the terminal condition

$$ V(x, T) = D[x]. $$

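To make the backward-in-time idea concrete, here is a minimal numerical sketch of the Bellman recursion behind the HJB equation: discretize time, state and control, impose the terminal condition V(x, T) = D(x), and sweep backwards, applying the Bellman backup at each step. The dynamics F, the costs C and D, and all grid parameters below are illustrative assumptions, not anything from a real control system.

```python
import numpy as np

# Backward induction (discrete-time analogue of the HJB equation).
# All functions and parameters here are illustrative assumptions.

T, n_t = 1.0, 50                      # horizon and number of time steps
dt = T / n_t
xs = np.linspace(-2.0, 2.0, 101)      # discretized state grid
us = np.linspace(-1.0, 1.0, 21)       # discretized control grid

def C(x, u):                          # running cost rate C[x, u]
    return x**2 + 0.1 * u**2

def D(x):                             # terminal cost D[x]
    return x**2

def F(x, u):                          # dynamics dx/dt = F(x, u)
    return u

# Terminal condition: V(x, T) = D(x)
V = D(xs)

# Bellman backup: V(x, t) = min_u { C(x, u) dt + V(x + F(x, u) dt, t + dt) }
for _ in range(n_t):
    x_next = xs[None, :] + F(xs[None, :], us[:, None]) * dt   # (n_u, n_x)
    V_next = np.interp(x_next, xs, V)                         # interpolate V
    Q = C(xs[None, :], us[:, None]) * dt + V_next
    V = Q.min(axis=0)                                         # optimize over u

print(f"V(0, 0) ≈ {V[len(xs) // 2]:.4f}")
```

The continuous-time HJB equation is the limit of this recursion as the time step shrinks; in practice the same backward sweep is the workhorse of grid-based solvers.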
In general, the goal of stochastic control problems is to maximize (or minimize) some expected profit (or cost) function by choosing an optimal strategy which itself affects the dynamics of the underlying stochastic system. Let’s have a look at some classic toy problems:

### The Merton Problem

The agent is trying to maximize the expected utility of future wealth by trading a risky asset and a risk-free bank account. The agent’s actions affect her wealth, but at the same time the random dynamics of the traded asset modulate that wealth in a stochastic manner. More precisely, the agent maximizes the expectation of U(X), where X — the agent’s wealth — is modeled as

$$ dX_t = \bigl(r X_t + (\mu - r)\,\pi_t\bigr)\,dt + \sigma \pi_t\, dW_t, $$

where W is a Brownian motion, also used to model the price of the risky asset:

$$ dS_t = \mu S_t\, dt + \sigma S_t\, dW_t, $$

where π is a self-financing trading strategy (here, the amount of wealth held in the risky asset), μ is the expected compounded rate of growth of the traded asset, σ its volatility, and r is the compounded rate of return of the risk-free bank account.
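For CRRA (power) utility the Merton problem has a well-known closed-form answer: the optimal fraction of wealth held in the risky asset is the constant π* = (μ − r)/(γσ²), where γ is the agent’s risk aversion. The sketch below checks this by Monte Carlo, comparing the expected utility of a few constant-fraction strategies; all parameter values are illustrative assumptions.

```python
import numpy as np

# Merton solution for CRRA utility U(x) = x**(1-gamma) / (1-gamma):
# the optimal fraction of wealth in the risky asset is constant,
# pi* = (mu - r) / (gamma * sigma**2). Parameters are illustrative.

mu, r, sigma, gamma = 0.10, 0.02, 0.30, 2.0
pi_star = (mu - r) / (gamma * sigma**2)

def mean_utility(frac, n_paths=200_000, T=1.0, n_steps=50, seed=0):
    """Monte Carlo estimate of E[U(X_T)] for a constant invested fraction."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = np.ones(n_paths)                      # X_0 = 1
    for _ in range(n_steps):
        dW = rng.standard_normal(n_paths) * np.sqrt(dt)
        # self-financing wealth: dX = X*(r + frac*(mu-r))dt + X*frac*sigma*dW
        x *= 1.0 + (r + frac * (mu - r)) * dt + frac * sigma * dW
    return np.mean(x ** (1 - gamma) / (1 - gamma))

print(f"pi* = {pi_star:.3f}")
for frac in (0.0, pi_star / 2, pi_star, 2 * pi_star):
    print(f"fraction {frac:.3f}: E[U(X_T)] ≈ {mean_utility(frac):.5f}")
```

The Merton fraction should come out on top: investing nothing forgoes the risk premium, while investing too much is punished by the concavity of the utility.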

### The Optimal Liquidation Problem

Suppose that our alpha model signals that it’s profitable to liquidate a large number N of coins at price St, and we wish to do so by the end of the day at time T. Realistically, the market does not have infinite liquidity, so it can’t absorb a large sell order at the best available price, which means we will walk the order book or even move the market and execute the order at a lower price (subject to the market impact denoted h below). Hence, we should spread the order out over time and solve a stochastic control problem. We may also have a sense of urgency, represented by penalising the utility function for holding non-zero inventory throughout the strategy. Let νt denote the rate at which the agent sells her coins at time t. The agent’s value function will look like

$$ H(t, x, S, q) = \sup_{\nu} \mathbb{E}\Bigl[\, X_T + Q_T\bigl(S_T - \alpha Q_T\bigr) - \phi \int_t^T Q_u^2\, du \,\Bigr], $$

where the α-term penalises any inventory left at time T, the φ-term expresses the urgency of the liquidation, dQt = −νt dt is the agent’s inventory, dSt the coin price (as in Merton’s problem above), S′t = St − h(νt) the execution price, and dXt = νt S′t dt the agent’s cash.
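With a linear temporary impact h(ν) = kν, a running inventory penalty, and full liquidation required by T, the optimal schedule has a well-known closed form in the Almgren–Chriss / Cartea–Jaimungal framework: the remaining inventory decays like a hyperbolic sine. A minimal sketch, with purely illustrative parameter values:

```python
import numpy as np

# Closed-form liquidation schedule under linear impact h(nu) = k*nu and a
# running penalty phi * Q_t**2, requiring full liquidation by time T.
# Parameter values below are illustrative assumptions.

N, T = 10_000.0, 1.0          # coins to liquidate, horizon (one trading day)
k, phi = 1e-5, 1e-4           # impact and urgency parameters (assumptions)
gamma = np.sqrt(phi / k)      # decay rate of the optimal schedule

def inventory(t):
    """Optimal remaining inventory Q_t = N * sinh(gamma*(T-t)) / sinh(gamma*T)."""
    return N * np.sinh(gamma * (T - t)) / np.sinh(gamma * T)

ts = np.linspace(0.0, T, 11)
for t, q in zip(ts, inventory(ts)):
    print(f"t = {t:.1f}:  Q_t ≈ {q:8.1f}")
```

As φ → 0 the schedule tends to a straight line (a TWAP), while a larger φ front-loads the selling: urgency buys certainty about the inventory at the cost of extra impact.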

### The Optimal Entry-Exit Problem for Statistical Arbitrage

Suppose we have two co-integrated assets A and B (or, in the trivial case, one asset on two different exchanges) and a long-short portfolio that is a linear combination of these two assets. The optimal strategy should determine when to enter and exit such a portfolio, and we can pose this as an optimal stopping problem. We can model the dynamics of εt, the co-integration factor of these assets, as

$$ d\varepsilon_t = \kappa(\theta - \varepsilon_t)\,dt + \sigma\, dW_t, $$

where W is a standard Brownian motion, κ is the rate of mean reversion, θ is the level that the process mean-reverts to, and σ is the volatility of the process. The agent’s performance criterion, for example for exiting the long position, can be written as

$$ H(\varepsilon) = \sup_{\tau} \mathbb{E}\bigl[\, e^{-\rho \tau}\,(\varepsilon_\tau - c) \,\bigm|\, \varepsilon_t = \varepsilon \,\bigr], $$

where c is the transaction cost for selling the portfolio, ρ represents urgency (usually given by the cost of the margin trade), and E[·] denotes expectation conditional on εt = ε.

The value function seeks the optimal stopping time at which unwinding the position (the long portfolio) maximizes the performance criterion. Similarly, we can write down a performance criterion for entering the long position, and finally, criteria for entering and exiting short positions.
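One intuitive way to study the exit criterion is by Monte Carlo: for this mean-reverting setup the optimal rule takes a threshold form, so we can simulate the OU factor, exit the first time it crosses a level b, and scan over b. Everything below (parameters, starting point, grid of thresholds) is an illustrative assumption.

```python
import numpy as np

# Evaluate the exit criterion E[exp(-rho*tau) * (eps_tau - c)] by simulation,
# for threshold rules "exit the first time eps_t >= b". Parameters are
# illustrative assumptions, not fitted to any real pair of assets.

kappa, theta, sigma = 5.0, 0.0, 0.2    # OU parameters of the factor eps_t
rho, c = 0.05, 0.01                     # urgency (discount) and transaction cost
dt, n_steps, n_paths = 1e-3, 5_000, 10_000

def exit_value(b, eps0=-0.1, seed=0):
    """Monte Carlo estimate of the exit criterion for threshold b."""
    rng = np.random.default_rng(seed)
    eps = np.full(n_paths, eps0)
    payoff = np.zeros(n_paths)
    alive = np.ones(n_paths, dtype=bool)       # paths that have not exited yet
    for i in range(n_steps):
        hit = alive & (eps >= b)               # first crossing of the threshold
        payoff[hit] = np.exp(-rho * i * dt) * (eps[hit] - c)
        alive &= ~hit
        dW = rng.standard_normal(n_paths) * np.sqrt(dt)
        eps += kappa * (theta - eps) * dt + sigma * dW
    # paths that never exit within the simulated horizon contribute zero here
    return payoff.mean()

for b in (0.0, 0.05, 0.10, 0.15, 0.20):
    print(f"threshold b = {b:.2f}: value ≈ {exit_value(b):.4f}")
```

In practice one would solve the associated free-boundary problem for the exact threshold; the scan above only illustrates the trade-off between a larger payoff at exit and a longer, more heavily discounted wait. Note that exiting at the mean level (b = 0) locks in the transaction cost for almost no edge.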

There are, of course, many more optimal stochastic control problems in trading, and almost any execution algorithm can be optimised using similar principles. The performance of two algorithms based on the exact same signals may vary greatly, which is why it is not enough to have just a good “alpha” model that generates accurate predictions.

As a group of “quants” with academic background in Numerical Methods, Computational Mathematics, Game Theory and hands-on experience in High Frequency Trading and Machine Learning, our interest was in exploring opportunities in cryptocurrency markets, with the goal of exploiting various market inefficiencies to generate steady absolute returns (not correlated with market movements) with low volatility, or simply put, steady profit without major drawdowns. For more information please visit http://www.TensorBox.com and if you like what we do you can participate in our Initial Token Offering.
