Sean O'Gorman
Apr 30 · 2 min read

Now that I've had a chance to read your article in a bit more depth, I'll add some input beyond what I posted on Reddit yesterday.

  • Aside from trying other models, I think the easiest way to significantly improve the performance of the agent is to make your observations stationary. The states my agent sees are the differences between the current period and the previous period for all of the data, even the non-price fields. When I give the agent just scaled data, rather than taking a losing trade to get into a better position for future trades, it simply holds on to a losing position because it knows the reward will stay at 0 instead of going negative. I typically use scikit-learn's Normalizer or MaxAbsScaler because they are the quickest.
  • I don’t like your reward function. If the agent builds enough of a profit early on, it will presumably just ride past gains as the price sinks. I’ve found that rewarding only when it makes a trade (positively or negatively) gets the best results. If I try rewarding every bar based on gain or loss, it also tends to gravitate to whatever has worked best, and again will rely on rewards from unrealized profits. What I do is give it a reward at selling that is the difference between the sell value and the cost basis, at which point I record the amount of Bitcoin that I sold. When it goes to buy again, it gets a reward based on the value of the new purchase vs. what the position would have been worth had I never sold. Basically, if I sell 10 coins at $100 and can buy 10% more because the price is now $90, it gets a reward equivalent to shorting at $100 and closing the short with a 10% gain. I think this is important because otherwise, the model’s only motivation to close a long position is to avoid a negative reward.
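The stationarity preprocessing from the first bullet can be sketched briefly. This is a minimal illustration, not the author's actual pipeline; the column layout (price, volume) and values are assumptions:

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler

# Hypothetical per-bar rows: [price, volume]. In practice every field
# the agent sees would be a column here, not just the price fields.
raw = np.array([
    [100.0, 5.0],   # bar t=0
    [102.0, 6.5],   # bar t=1
    [101.0, 4.0],   # bar t=2
])

# First-difference every field so the observations are (closer to)
# stationary: each row becomes "current period minus previous period".
diffs = np.diff(raw, axis=0)

# MaxAbsScaler rescales each column into [-1, 1] by dividing by that
# column's maximum absolute value.
scaler = MaxAbsScaler()
obs = scaler.fit_transform(diffs)
```

Differencing first and scaling second keeps the scale of the features bounded while removing the trend that lets the agent sit on an unrealized position.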

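The trade-only reward scheme from the second bullet could be sketched as two small functions. The function names and the $80 cost basis in the example are my own illustrative choices, not the author's code:

```python
def sell_reward(qty, sell_price, cost_basis):
    """Reward paid only at the sale: realized profit over the
    recorded cost basis. No reward accrues while merely holding."""
    return qty * (sell_price - cost_basis)


def rebuy_reward(last_sell_qty, last_sell_price, buy_price):
    """Reward paid at the next buy: the gain vs. having never sold,
    i.e. a synthetic short opened at the sell price and covered at
    the buy price."""
    return last_sell_qty * (last_sell_price - buy_price)


# Example mirroring the text: sell 10 coins at $100 (assumed cost
# basis of $80), then rebuy after the price drops to $90.
r_sell = sell_reward(10, 100.0, 80.0)     # realized P&L on the sale
r_buy = rebuy_reward(10, 100.0, 90.0)     # like a short from 100 to 90
```

With these numbers the rebuy reward is 10 × ($100 − $90) = $100, i.e. 10% of the $1,000 sale proceeds, matching the "shorting at $100 and closing with a 10% gain" framing.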
The fields that I feed my agent are: date, current price, current volume (in BTC and USD), the fee, theoretical bid and ask prices (with a spread of about 1.2%), the amount of cash on hand, the number of coins, the USD value of those coins, the value of the whole portfolio in USD, the value of the whole portfolio in BTC, total PnL, and the theoretical reward if it were not to sell.
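Assembled as code, that field list might look something like the sketch below. The field names, fee value, and portfolio dictionary layout are assumptions for illustration, not the author's actual implementation:

```python
def make_observation(bar, portfolio, fee=0.0025, spread=0.012):
    """Build one observation from a market bar and portfolio state,
    following the field list above. All names are hypothetical."""
    price = bar["price"]
    half = spread / 2                       # ~1.2% total bid/ask spread
    coin_value = portfolio["coins"] * price
    total_usd = portfolio["cash"] + coin_value
    return {
        "date": bar["date"],
        "price": price,
        "volume_btc": bar["volume_btc"],
        "volume_usd": bar["volume_btc"] * price,
        "fee": fee,
        "bid": price * (1 - half),          # theoretical bid
        "ask": price * (1 + half),          # theoretical ask
        "cash": portfolio["cash"],
        "coins": portfolio["coins"],
        "coin_value_usd": coin_value,
        "portfolio_usd": total_usd,
        "portfolio_btc": total_usd / price,
        "total_pnl": total_usd - portfolio["initial"],
        # Unrealized gain over cost basis, i.e. the reward the agent
        # is forgoing by holding instead of selling.
        "held_reward": coin_value - portfolio["coins"] * portfolio["cost_basis"],
    }


# Illustrative usage with made-up numbers:
sample_bar = {"date": 0, "price": 100.0, "volume_btc": 5.0}
sample_pf = {"cash": 500.0, "coins": 10.0, "initial": 1000.0, "cost_basis": 80.0}
obs = make_observation(sample_bar, sample_pf)
```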

The biggest problem I'm dealing with at the moment is that it works so well (even on unseen data, with all traces of potential lookahead bias removed) that the results are too unrealistic to be of any use.
