See more
Key-Value(-Predict) attention
As a result, the alphas — the weights of hidden states when computing a context vector — show how important a given annotation is in deciding the next state and generating the output word. These are the attention scores.
The weight of each annotation is computed by an alignment model which scores how well the inputs and the output match. An alignment model is a feedforward neural network, for instance. In general, it can be any other model as well.