In general, I don’t want to participate in public holy wars, so I will make just two points.
Dzmitry Bahdanau

The AAAI paper that proposed Chinese poetry evaluation included a human evaluation component in addition to the BLEU scores. It is also rather modest in its claims and, in my opinion, a good read.

