Thanks for reading! A policy is a function that maps a state to an action. So the input of a policy is a vector contained in the state space, and the output of the policy is an action contained in the action space (in this game it’s a scalar, but can be a vector as well). So a policy is only a single function that can take in any state from state space and output any action in the action space. (Really, technically, state should be replaced with observation, since we can’t always observe the complete state of the game)
Our policy function in this case is composed of a linear operation (dot product of state vector and policy matrix) and a non-linear operation (0 or 1 depending if the dot product is positive/negative). So the policy array/vector/matrix is just part of what makes our policy a policy.