Sorry for the late answer. If I am not mistaken, this is how the Bidirectional layer from Keras works (https://github.com/keras-team/keras/blob/master/keras/layers/wrappers.py#L332). It manages both layers, the forward and the backward one, at the same time, so you do not have to build or pass the reversed input yourself.
It is equivalent to using two similar LSTM layers, one of them with the go_backwards parameter set to True (i.e. the backward layer), and then combining their outputs the way you want.
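To make the comparison concrete, here is a minimal sketch of both options side by side. The sizes (`timesteps`, `features`, `units`) are illustrative assumptions, not values from my script, and I am assuming the `tensorflow.keras` import path. The two options have the same structure, but their weights are initialized independently, so the numerical outputs will differ.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Illustrative sizes (assumptions, not taken from the original script).
timesteps, features, units = 10, 8, 16

inputs = keras.Input(shape=(timesteps, features))

# Option 1: the wrapper builds and runs both directions itself.
wrapped = layers.Bidirectional(layers.LSTM(units))(inputs)

# Option 2: two separate LSTMs; the second one reads the sequence in reverse.
forward = layers.LSTM(units)(inputs)
backward = layers.LSTM(units, go_backwards=True)(inputs)
manual = layers.Concatenate()([forward, backward])

# Both outputs carry 2 * units features per sample.
print(wrapped.shape, manual.shape)
```

One caveat if you use `return_sequences=True`: the wrapper flips the backward output along the time axis so it lines up with the forward one before merging, whereas a plain LSTM with `go_backwards=True` returns its sequence in reversed order, so you would have to flip it yourself.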
In my script, I did not set any merge mode: the outputs are concatenated and returned as a list (before a Dense layer).
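For reference, here is a small sketch of how the merge mode changes what the wrapper returns. As far as I know, `Bidirectional` uses `merge_mode="concat"` when nothing is specified, while `merge_mode=None` hands back the forward and backward outputs separately. The sizes below are illustrative assumptions and this is not the code from my script.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Illustrative sizes (assumptions).
inputs = keras.Input(shape=(10, 8))

# No merge_mode given: forward and backward outputs are concatenated
# into a single tensor with 2 * 16 features.
merged = layers.Bidirectional(layers.LSTM(16))(inputs)

# merge_mode=None: the two outputs come back as a pair of tensors,
# each with 16 features.
separate = layers.Bidirectional(layers.LSTM(16), merge_mode=None)(inputs)

# The pair has to be combined before it can feed a single Dense layer.
joined = layers.Concatenate()(list(separate))

print(merged.shape, len(separate), joined.shape)
```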
Maybe the confusion comes from the fact that, in this case, it is not a "real" time-series problem. It is handled as a sequence classification problem (classification of sequences of words, where the classes are the words of the vocabulary).