Do NLP Models Cheat at Math Word Problems? Microsoft Research Says Even SOTA Models Rely on Shallow Heuristics
“Yoshua recently turned 57. He is three years younger than Yann. How old is Yann?” Solving such a math word problem (MWP) requires understanding a short natural language narrative describing a state of the world and then reasoning out the underlying answer. A child could likely figure this one out (Yann is 57 + 3 = 60), and recent natural language processing (NLP) models have also achieved reasonably high accuracy on MWPs.
A Microsoft Research team recently took a closer look at just how NLP models do this, with surprising results. Their study provides “concrete evidence” that existing MWP solvers tend to rely on shallow heuristics to achieve their high performance, and calls into question these models’ ability to robustly solve even the simplest of MWPs.
MWP tasks are challenging because the machine must extract the relevant quantities from a natural language narrative and then perform mathematical operations or reasoning over them to reach the solution. MWPs come in many varieties; the simplest are “one-unknown” problems that require only the basic arithmetic operators (+, −, ∗, /), as in the sketch below.
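To make this concrete, here is a minimal Python sketch of how a one-unknown arithmetic MWP is typically represented in benchmark datasets: the problem text paired with a single equation over the quantities mentioned in it. The field names and structure are illustrative assumptions, not drawn from any particular dataset.

```python
# A toy one-unknown MWP: the narrative, the quantities it mentions,
# and the single equation (over basic operators) that yields the answer.
# Schema is hypothetical, for illustration only.
mwp = {
    "question": (
        "Yoshua recently turned 57. He is three years younger than Yann. "
        "How old is Yann?"
    ),
    "numbers": [57, 3],     # quantities extracted from the text
    "equation": "57 + 3",   # one unknown, operators limited to +, -, *, /
    "answer": 60,
}

# Evaluating the equation is trivial; the hard part a solver is judged on
# is producing the right equation from the narrative.
predicted = eval(mwp["equation"])  # eval is acceptable for this toy example
assert predicted == mwp["answer"]
print(f"Predicted answer: {predicted}")
```

The split between “produce the equation” and “evaluate it” is exactly where shallow heuristics can creep in: a model that pattern-matches on surface cues can often emit the correct equation without genuinely understanding the narrative.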