MIT paper in machine learning for drug discovery at ICML 2018: very incomplete
ICML is approaching. This conference is a kind of World Cup for machine learning research. PR departments in industry and academia are cheerleading their teams, like football fans.
However, there is a difference between football and machine learning. In football, it’s hard to trick the referee into awarding a goal (unless you are Maradona).
In machine learning, it’s easier to cheat. Referees make mistakes during the peer-review process. There is more technical fog in machine learning than in football.
Update: one of the MIT authors replied; see the comments.
The MIT press release is here:
MIT researchers have developed a model that uses machine learning to find lead molecules with desired properties and…news.mit.edu
The MIT press release by Rob Matheson says:
In the first test, the researchers’ model generated 100 percent chemically valid molecules from a sample distribution, compared to SMILES models that generated 43 percent valid molecules from the same distribution.
The ICML paper abstract, by Wengong Jin, Regina Barzilay and Tommi Jaakkola, says:
Across these tasks, our model outperforms previous state-of-the-art baselines by a significant margin.
However, Rob Matheson does not mention that, without their ‘validity check’ trick, the MIT team gets only 93.5 percent valid molecules (page 7 of the ICML version). More importantly, the ICML paper does not cite other relevant studies. For example, as early as April 2017, a paper by AstraZeneca reached 94 percent validity using a reinforcement learning method on SMILES (check page 6 here).
Moreover, in November 2017, using the same family of algorithms as this MIT paper (autoencoders), AstraZeneca already reached 78.3 percent validity (page 8 here), much better than the so-called ‘state-of-the-art baseline’ of 43 percent.
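To make the comparison explicit, here is a minimal Python sketch that tabulates the validity figures quoted above and shows each one's margin over the 43 percent baseline. The numbers are the ones cited in this post; the labels and the function name `margins_over_baseline` are my own shorthand, not anything taken from the papers.

```python
# Validity figures (percent of generated molecules that are chemically valid),
# as quoted in this post. The labels are shorthand chosen for illustration.
reported_validity = {
    "MIT, with validity check": 100.0,
    "MIT, without validity check": 93.5,
    "AstraZeneca, April 2017 (reinforcement learning on SMILES)": 94.0,
    "AstraZeneca, November 2017 (autoencoder)": 78.3,
    "SMILES baseline cited in the press release": 43.0,
}

baseline = reported_validity["SMILES baseline cited in the press release"]


def margins_over_baseline(figures, baseline_value):
    """Return (label, percent, points above baseline) tuples, best first."""
    rows = [
        (label, pct, round(pct - baseline_value, 1))
        for label, pct in figures.items()
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


for label, pct, margin in margins_over_baseline(reported_validity, baseline):
    print(f"{label}: {pct:.1f}% valid ({margin:+.1f} points vs. the 43% baseline)")
```

Sorted this way, the April 2017 AstraZeneca result sits just above the MIT figure without the validity check, and both earlier AstraZeneca results are far above the 43 percent baseline.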
On top of that, the MIT public relations department bragged about this dubious ICML paper, in the same way that Maradona bragged after his dubious goal. So maybe MIT should be called the Maradona Institute of Technology.