Five Ways To Handle Large Action Spaces in Reinforcement Learning
Action spaces, particularly in combinatorial optimization problems, may grow unwieldy in size. This article discusses five strategies to handle them.
Handling large action spaces remains a fairly open problem in Reinforcement Learning. Researchers have made great strides in handling large state spaces, with convolutional networks and transformers being recent high-profile examples. However, there are three so-called curses of dimensionality: state, outcome, and action [1]. The last of these remains rather understudied.
Still, there is a growing body of methods that attempt to handle large action spaces. This article presents five ways to handle such action spaces at scale, focusing in particular on the high-dimensional discrete action spaces often encountered in combinatorial optimization problems.
Refresher: three curses of dimensionality
A quick refresher on the three curses of dimensionality is in order. Assuming we express the problem at hand as a system of Bellman equations, note that there are three sets to evaluate (in practice, in the form of nested loops), each of which may be prohibitively large: