How good is AI at Quantitative Reasoning?
Artificial Intelligence (AI) has made remarkable progress over the last couple of decades. Artificial neural networks have been highly successful across domains such as computer vision, natural language processing, and game playing with reinforcement learning. Still, quantitative reasoning remains a challenging problem for AI: it is not just about imitating patterns in the training data, but requires an understanding of the underlying rules and theorems. 🤔
Various language models, such as GPT-3, PaLM, and BLOOM, are trained on large datasets covering many kinds of tasks. They achieve state-of-the-art performance on many of them, but they struggle when it comes to quantitative reasoning. Single-step reasoning tasks are still relatively easy compared to multi-step tasks, where the intermediate steps also play an important role.
Taking a step forward in this direction, Google published a model called Minerva in the paper “Solving Quantitative Reasoning Problems with Language Models”. Let’s look at some of the examples mentioned in the paper. 🧐
Here the question is given in text form, and the model is required to understand it and answer accordingly. As we can see from both examples, the model not only provides the correct answers but also the correct steps to reach them.
The model answers mathematical questions using a mix of natural language and mathematical notation. Minerva combines several techniques, including few-shot prompting, chain-of-thought (scratchpad) prompting, and majority voting, to achieve state-of-the-art performance on STEM reasoning tasks. Minerva builds on the Pathways Language Model (PaLM), further trained on a 118 GB dataset of scientific papers from the arXiv preprint server and web pages containing mathematical expressions in LaTeX, MathJax, or other mathematical typesetting formats. Standard text-cleaning procedures often remove symbols and formatting that are essential to the semantic meaning of mathematical expressions. By preserving this information in the training data, the model learns to interpret standard mathematical notation correctly.
Wow, so now AI can learn math, and children have a model to do their math homework 😅. Really? There are also various limitations. Let’s look at an example.
Here we can see the model simply removed the square root 🙄 and provided the answer. The model knows how to make its life easy 😬, but the teacher will give a big 0 for that answer. It raises the question: is the model really learning math, or, since it is trained on a huge amount of scientific data, is it simply memorizing proofs and solution steps for standard problems? You can explore more examples here. The paper also provides an analysis of the overlap between problems in the test and training sets. It reports a small overlap, and after modifying such problem statements, the model is still able to provide the correct answers. It also reports a low false-positive rate across the different test sets. When compared with other state-of-the-art results, Minerva wins by a large margin across fields.
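A simple way to check for train/test contamination, loosely in the spirit of the paper’s overlap analysis (the paper’s exact method differs), is to measure how many word n-grams of a test problem also appear in a training document. The function names and example strings below are illustrative assumptions.

```python
def ngrams(text: str, n: int = 8) -> set[tuple[str, ...]]:
    # Lowercased word n-grams of the text.
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap_fraction(test_problem: str, train_doc: str, n: int = 8) -> float:
    # Fraction of the test problem's n-grams that also appear in the
    # training document; 1.0 means the problem is fully contained.
    test_grams = ngrams(test_problem, n)
    if not test_grams:
        return 0.0
    return len(test_grams & ngrams(train_doc, n)) / len(test_grams)

train = "Solve for x: x + 2 = 5. Subtracting 2 from both sides gives x = 3."
test = "Solve for x: x + 2 = 5."
print(overlap_fraction(test, train, n=4))  # → 1.0
```

A high overlap fraction flags a test problem as potentially memorized rather than solved, which is why the paper rewrites such problems and re-checks whether the model still answers correctly.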
This is definitely a big leap forward in the domain of quantitative reasoning, but looking at some of the examples, it feels we are still far from a model that can build new theorems, prove new concepts, or solve open math problems. I am really excited to see further progress in this direction.