Thanks for the prompt reply. I have historical price data for a stock. I would like to model the state as a mix of continuous and discrete values. The continuous values could be indicators (moving averages, etc.); the discrete values could be whether we are currently long, short, or have no position, plus other discrete indicators such as whether we are in an overbought or oversold zone. The available actions are long, short, or do nothing. The reward is the profit/loss when a trade is liquidated, minus the transaction cost; otherwise the reward is zero.
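To make the state description concrete, here is a rough sketch of how I picture the state vector. The function name `build_state`, the 5-bar window, and the 2% overbought/oversold thresholds are placeholders I made up for illustration, not anything fixed in my setup.

```python
import numpy as np

# Hypothetical position labels; "flat" means no open position.
POSITIONS = ("flat", "long", "short")

def build_state(prices, position, window=5, overbought=1.02, oversold=0.98):
    """Return a 1-D state vector mixing continuous and discrete features."""
    prices = np.asarray(prices, dtype=float)
    sma = prices[-window:].mean()          # simple moving average
    ratio = prices[-1] / sma               # continuous: last price relative to SMA
    # Discrete zone flag: +1 overbought, -1 oversold, 0 neutral (assumed thresholds).
    if ratio > overbought:
        zone = 1.0
    elif ratio < oversold:
        zone = -1.0
    else:
        zone = 0.0
    # Discrete position encoded as a one-hot vector over POSITIONS.
    one_hot = [1.0 if position == p else 0.0 for p in POSITIONS]
    return np.array([ratio, zone, *one_hot])

state = build_state([100, 100, 100, 100, 110], "long")
```

The one-hot encoding is just one option for feeding the discrete position to a network; an integer code or an embedding would be alternatives.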

The time step could be one minute: every minute the state variables are calculated and presented to the RL agent, which decides to buy, sell, or hold. If the action closes an existing trade, the profit/loss minus commission is given as the reward for that action; otherwise the reward is zero.
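The step and reward logic I have in mind looks roughly like the sketch below. The class name `TradingEnv` and the action codes are hypothetical. I am also assuming that an opposite action only closes the open position (leaving us flat) rather than reversing it, and that commission is charged once, at liquidation.

```python
HOLD, LONG, SHORT = 0, 1, 2  # hypothetical action codes

class TradingEnv:
    """Minimal per-bar environment: reward is paid only when a trade is closed."""

    def __init__(self, prices, commission=0.5):
        self.prices = list(prices)
        self.commission = commission
        self.reset()

    def reset(self):
        self.t = 0
        self.position = 0        # +1 long, -1 short, 0 flat
        self.entry_price = None

    def step(self, action):
        price = self.prices[self.t]
        reward = 0.0
        want = {HOLD: 0, LONG: 1, SHORT: -1}[action]
        if self.position == 0 and want != 0:
            # Open a new trade; no reward yet.
            self.position, self.entry_price = want, price
        elif self.position != 0 and want == -self.position:
            # Opposite action liquidates: realised P/L minus commission.
            reward = self.position * (price - self.entry_price) - self.commission
            self.position, self.entry_price = 0, None
        # Any other action (hold, or repeating the current side) gives zero reward.
        self.t += 1
        done = self.t >= len(self.prices)
        return reward, done

env = TradingEnv([100, 101, 103, 102])
rewards = [env.step(a)[0] for a in (LONG, HOLD, SHORT, HOLD)]
```

With these assumptions the only non-zero reward in the example is on the third step, when the long opened at 100 is closed at 103.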

How can I use A3C to model this RL scenario? As I understand it, A3C is by far the best RL algorithm available to date.

