Mike Shi
Mike Shi
Sep 22, 2018 · 1 min read

Thanks for reading! A policy is a function that maps a state to an action. So the input of a policy is a vector contained in the state space, and the output of the policy is an action contained in the action space (in this game it’s a scalar, but can be a vector as well). So a policy is only a single function that can take in any state from state space and output any action in the action space. (Really, technically, state should be replaced with observation, since we can’t always observe the complete state of the game)

Our policy function in this case is composed of a linear operation (dot product of state vector and policy matrix) and a non-linear operation (0 or 1 depending if the dot product is positive/negative). So the policy array/vector/matrix is just part of what makes our policy a policy.

Welcome to a place where words matter. On Medium, smart voices and original ideas take center stage - with no ads in sight. Watch
Follow all the topics you care about, and we’ll deliver the best stories for you to your homepage and inbox. Explore
Get unlimited access to the best stories on Medium — and support writers while you’re at it. Just $5/month. Upgrade

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store