MIT paper in machine learning for drug discovery at ICML 2018: very incomplete

ICML is approaching. This conference is a kind of World Cup for machine learning research. PR departments in industry and academia are cheerleading their teams, like football fans.

However, there is a difference between football and machine learning. In football, it’s hard to trick the referee into allowing a goal (unless you are Maradona):

Maradona scoring a goal with his hand

In machine learning, it’s easier to cheat. Referees make mistakes during the peer-review process. There is more technical fog in machine learning than in football.

Take, for example, this MIT paper appearing at ICML 2018 (it was already posted on arXiv last February).

Update: one of the MIT authors replied; see the comments.

The MIT press release is here:

It’s about generating new molecules for drug discovery. I have already covered this topic here and here, and you can also check out this nice YouTube video by Siraj Raval here.

The MIT press release by Rob Matheson says:

In the first test, the researchers’ model generated 100 percent chemically valid molecules from a sample distribution, compared to SMILES models that generated 43 percent valid molecules from the same distribution.

The ICML paper abstract, by Wengong Jin, Regina Barzilay and Tommi Jaakkola, says:

Across these tasks, our model outperforms previous state-of-the-art baselines by a significant margin.

However, Rob Matheson does not mention that, without their ‘validity check’ trick, the MIT team gets only 93.5 percent valid molecules (page 7 of the ICML version). More importantly, the ICML paper does not cite other relevant studies. For example, as early as April 2017, an AstraZeneca paper reached 94 percent validity using a reinforcement learning method on SMILES (see page 6 here).

Moreover, in November 2017, using the same family of algorithms as this MIT paper (autoencoders), AstraZeneca had already reached 78.3 percent validity (page 8 here), much better than the so-called ‘state-of-the-art baseline’ of 43 percent.

On top of that, the MIT Public Relations department bragged about this dubious ICML paper, just as Maradona bragged after his dubious goal. So maybe MIT should be called the Maradona Institute of Technology.