Sep 6, 2018
That’s a cool application of the attention mechanism! One question regarding your code. What are the axes referring to in

out = Lambda(lambda x: K.batch_dot(x[0], x[1], axes=[4,3]), output_shape=(l, nv, dv))([att, v])

given that counting of axes starts at 0 (for the batch dimension) and

print K.int_shape(att)  # (None, 36, 8, 8)
print K.int_shape(v)    # (None, 36, 8, 24)

Thanks for sharing!
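To make the question concrete, here is a minimal sketch of how I’m counting the axes, assuming Keras 2.x with the TensorFlow 1.x backend and dimensions l=36, nv=8, dk=8, dv=24 read off from the printed shapes (those names are just my assumption for illustration):

from keras import backend as K

l, nv, dk, dv = 36, 8, 8, 24  # assumed from the int_shape printouts above

att_p = K.placeholder(shape=(None, l, nv, dk))  # (None, 36, 8, 8)
v_p = K.placeholder(shape=(None, l, nv, dv))    # (None, 36, 8, 24)

# Counting from 0 (axis 0 = batch), each tensor only has axes 0, 1, 2 and 3,
# which is what makes axes=[4,3] in the batch_dot call unclear to me.
for name, t in [('att', att_p), ('v', v_p)]:
    print(name, K.int_shape(t), 'valid axis indices:', list(range(K.ndim(t))))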