More than a Million Pro-Repeal Net Neutrality Comments were Likely Faked
Jeff Kao

Adding to Runze Wang’s comment about other document/sentence similarity algorithms: I agree with Jeff Kao that Doc2vec is not a good option, as it does not even produce a deterministic result when inferring a vector for a sentence. In my experience, the variance of the output vectors for the same sentence was too high.
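One way to put a number on that variance is to infer the same sentence several times and average the pairwise cosine similarities of the results. The sketch below assumes nothing about the model: `infer` is any zero-argument callable returning a vector. `deterministic_infer` and `noisy_infer` are hypothetical stand-ins. With gensim it might be something like `lambda: model.infer_vector(tokens)`, though how much that output varies depends on the library version and settings.

```python
import math
import random
from itertools import combinations
from typing import Callable, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def inference_stability(infer: Callable[[], Sequence[float]], runs: int = 10) -> float:
    """Mean pairwise cosine similarity across repeated inferences of one sentence.

    1.0 means every run returned the same direction; lower values mean
    more run-to-run variance.
    """
    vectors = [infer() for _ in range(runs)]
    sims = [cosine(a, b) for a, b in combinations(vectors, 2)]
    return sum(sims) / len(sims)


# Hypothetical stand-ins, for illustration only.
rng = random.Random(0)
base = [0.5, -0.2, 0.8, 0.1]


def deterministic_infer() -> List[float]:
    # Same vector on every call (e.g. a plain lookup-and-average model).
    return list(base)


def noisy_infer() -> List[float]:
    # Mimics an inferrer that starts from a random vector and runs a few
    # gradient steps, so each call lands somewhere slightly different.
    return [x + rng.gauss(0.0, 0.3) for x in base]


print(inference_stability(deterministic_infer))  # 1.0 up to float rounding
print(inference_stability(noisy_infer))          # noticeably below 1.0
```

A score well below 1.0 for a single sentence means similarity rankings built on those vectors can change from run to run.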
A recently published paper from EPFL shows that their new sent2vec algorithm outperforms competing methods in both accuracy and CPU time. It also uses a form of word-vector averaging, but because it is so fast, it can also add n-gram vectors to the sentence average, which lets it capture more semantics.
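The averaging idea can be sketched in a few lines: the sentence vector is the mean of the vectors for its words plus its contiguous n-grams. This is a minimal illustration, not the sent2vec implementation. The embedding table `EMBEDDINGS`, its values, and the `_`-joined n-gram keys are hypothetical; a real model learns these vectors during training. Because inference is only lookup and averaging, the same sentence always yields the same vector, unlike the Doc2vec behaviour described above.

```python
from typing import Dict, List

# Hypothetical toy table: unigrams plus bigrams joined with "_".
EMBEDDINGS: Dict[str, List[float]] = {
    "net": [0.9, 0.1, 0.0],
    "neutrality": [0.8, 0.3, 0.1],
    "repeal": [0.1, 0.9, 0.2],
    "net_neutrality": [1.0, 0.0, 0.0],
    "repeal_net": [0.2, 0.8, 0.1],
}


def units(tokens: List[str], n: int = 2) -> List[str]:
    """Return the unigrams followed by all contiguous n-grams of length 2..n."""
    out = list(tokens)
    for size in range(2, n + 1):
        for i in range(len(tokens) - size + 1):
            out.append("_".join(tokens[i:i + size]))
    return out


def sentence_vector(tokens: List[str], table: Dict[str, List[float]], n: int = 2) -> List[float]:
    """Average the vectors of every known unigram and n-gram in the sentence."""
    known = [table[u] for u in units(tokens, n) if u in table]
    if not known:
        raise ValueError("no known unigrams or n-grams in sentence")
    dim = len(known[0])
    return [sum(vec[d] for vec in known) / len(known) for d in range(dim)]


sentence = ["repeal", "net", "neutrality"]
print(sentence_vector(sentence, EMBEDDINGS))       # approximately [0.6, 0.42, 0.08]
print(sentence_vector(sentence, EMBEDDINGS, n=1))  # unigrams only, for comparison
```

Setting `n=1` reduces this to plain word-vector averaging; the bigram vectors are what add some word-order information on top.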

Response by Ruben Wolff.
