ChatGPT and Math

Dhaval Parmar
5 min read · Feb 3, 2023


ChatGPT has one of the fastest-growing user bases of any application. People are trying all kinds of experiments with it: generating emails, poems, code, and whatnot. Just after it was released, I experimented with the tool and wrote about my experience in my blog here. Since then, I have been trying something new with it almost every day. There are many viral examples of ChatGPT failing at math. On 30th Jan 2023, OpenAI released the latest version with a note that the model's mathematical capabilities have been improved. So I tried some general examples, shown below.

Here is a problem of finding the average speed. As we can see, it understands the problem and, using the correct formulas, gives the correct answer. 🎉
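The exact numbers are only in the screenshot, so the figures below are made up, but this is the calculation pattern it got right: average speed is total distance divided by total time, not the mean of the two speeds.

```python
# Hypothetical two-leg trip (the numbers from the original screenshot are not
# reproduced here): 120 km at 60 km/h, then 120 km at 40 km/h.
d1, v1 = 120, 60   # km, km/h
d2, v2 = 120, 40   # km, km/h

total_distance = d1 + d2
total_time = d1 / v1 + d2 / v2           # 2 h + 3 h = 5 h

average_speed = total_distance / total_time
print(average_speed)                     # 48.0 km/h, not (60 + 40) / 2 = 50
```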

I tried differentiation on a fairly complex function. It is able to identify the rule that needs to be applied, and it differentiates the polynomial and trigonometric parts correctly. I was fairly impressed by this, but at the last step it made a small mistake: x³*(1/x) should simplify to x².
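The function itself is only visible in the screenshot, but the x³*(1/x) slip suggests an x³·ln(x) term was involved (that is where a 1/x factor appears via the product rule). A quick sympy check of that assumed term:

```python
import sympy as sp

x = sp.symbols('x')

# Hypothetical term consistent with the x**3 * (1/x) slip mentioned above
f = x**3 * sp.log(x)

derivative = sp.diff(f, x)
print(derivative)   # 3*x**2*log(x) + x**2  -> the x**3*(1/x) piece simplifies to x**2
```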

The tool is also aware of various probability distribution functions. It is able to pick the correct distribution for the problem, but it makes mistakes in the calculation: it reports 0.4⁵ = 0.010 (it is actually 0.01024), and the subsequent multiplication is also wrong. It is also able to derive the probability of tails from the probability of heads, but it makes similar kinds of mistakes again.
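The exact wording of the problem is only in the screenshot, but the arithmetic slip is easy to verify; assuming the term in question was a binomial-style probability with p = 0.4:

```python
from math import comb

p = 0.4
print(p ** 5)                      # 0.01024, not 0.010

# Hypothetical binomial check (the actual problem is only in the screenshot):
# probability of exactly 5 successes in 5 trials with success probability 0.4
n, k = 5, 5
print(comb(n, k) * p**k * (1 - p)**(n - k))   # 0.01024
```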

So far so good. I am quite sure it can solve many other kinds of math problems as well. I have also reviewed other AI models on mathematical problems, and the most important thing for complex problems or proofs is the chain of steps. So I also tried some problems on which Google's Minerva model does really well, as shown in this blog.

And it is done 😂. It is not able to find the proper formula or realize that the problem already contains all the information needed. Minerva did well on this one.

Now I asked it to prove an inequality that requires a chain of proper steps. It started well but got lost between steps. Did you notice? (ab)² suddenly became ab², a -a²b² term was randomly added on the left side in the second-to-last step, and I am not sure how the last step is valid. Also, claiming that just because a is not equal to b, (a-ab)(a+b) can't be negative. Really!!! And I am not sure how any of this proves the given statement 😂.
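The inequality itself is only in the screenshot, so purely as an illustration of the kind of clean step chain that is needed, here is a standard example (for a ≠ b, a² + b² > 2ab, because (a − b)² > 0), checked with sympy:

```python
import sympy as sp

a, b = sp.symbols('a b', real=True)

# Standard example of a step-by-step inequality proof
# (the exact inequality from the screenshot is not reproduced here):
# for a != b,  a**2 + b**2 > 2*a*b,  because (a - b)**2 > 0.
difference = sp.expand((a - b)**2)        # a**2 - 2*a*b + b**2
print(difference)

# Sanity-check the chain numerically at a point with a != b
print(difference.subs({a: 3, b: 1}))      # 4 > 0, so a**2 + b**2 > 2*a*b here
```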

Here is an example on which Minerva also failed. Again, it started well, but suddenly 81 became 162 (by "expansion" 😅). Then n disappeared due to some magic trick. After all that, are you still expecting the right answer???

I also asked a different version of the car problem, and it messed up. The relative speed is correct, but 100/60 = 1 😵 (it should be about 1.67), and I am not sure where the final answer came from.
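Assuming the 100 and 60 in its working were a distance gap and a relative speed (the full problem is only in the screenshot), the arithmetic should have gone like this:

```python
# Hypothetical numbers matching the 100/60 step in ChatGPT's working:
# a 100 km gap closed at a relative speed of 60 km/h
distance_gap = 100          # km (assumed)
relative_speed = 60         # km/h (assumed)

time_to_meet = distance_gap / relative_speed
print(round(time_to_meet, 2))   # 1.67 hours, i.e. 1 hour 40 minutes -- not 1
```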

It also seems like ChatGPT does well with trigonometric functions, differentiation, and integration, but it cannot be trusted with the actual numerical values of those functions. cos(3*pi/4) = 2 ???
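Since cosine never leaves the range [−1, 1], an answer of 2 is impossible on its face; the actual value is −√2/2:

```python
import math

value = math.cos(3 * math.pi / 4)
print(value)                 # -0.7071..., i.e. -sqrt(2)/2 -- nowhere near 2
print(-math.sqrt(2) / 2)     # -0.7071067811865476
```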

I have seen ChatGPT do really amazing things across any number of tasks, but when it comes to math, it is still struggling even after the update. This is not that surprising: many language models that do well on other tasks have failed when it comes to reasoning or math problems. It will be exciting to see what future updates of ChatGPT bring.

Update 13/02/2023

After getting many wrong answers to math problems, I wanted to try prompt engineering to steer ChatGPT toward correct answers. Surprisingly, just regenerating the response worked for many of them. I am not sure whether the model is continuously being updated based on feedback, but it is doing really well now.

Even though this version of the car problem is more difficult, the model is able to understand it and solve it step by step. As you can see, the input text is almost the same as before.

Now the model is able to prove the statement with proper steps, without introducing random terms.

The model is also no longer applying any magic tricks and is able to find the value of n correctly.

Unfortunately, it still failed to find the right formula for variance and provided the wrong answer for the given problem.
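For reference, the formula it should have used is the standard one, Var(X) = E[(X − μ)²]; with some made-up data (the actual problem is only in the screenshot):

```python
import statistics

# Hypothetical data (the actual problem from the screenshot is not reproduced here)
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(data) / len(data)
# Population variance: average of squared deviations from the mean
variance = sum((x - mean) ** 2 for x in data) / len(data)
print(variance)                          # 4.0

# Cross-check against the standard library
print(statistics.pvariance(data))        # 4.0
```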

Purely out of interest, I tried a physics problem on parabolic (projectile) trajectories, and it was able to find the right formula and answer.
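The original problem is only in the screenshot, but assuming a standard projectile-motion setup, these are the kinds of formulas it had to recall (the launch parameters below are made up):

```python
import math

# Hypothetical launch parameters (the original problem is in the screenshot)
v0 = 20.0                      # initial speed, m/s (assumed)
theta = math.radians(45)       # launch angle (assumed)
g = 9.81                       # m/s^2

flight_range = v0**2 * math.sin(2 * theta) / g        # horizontal range on flat ground
max_height = (v0 * math.sin(theta))**2 / (2 * g)
print(round(flight_range, 2), round(max_height, 2))   # ~40.77 m, ~10.19 m
```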

I strongly feel something has been updated, but even before this I had seen answers change a lot when regenerating responses with the same input; the response you see is based on an output ranking produced by a separate AI model. Personally, I feel this makes the evaluation and deployment of such models in production really challenging. When we talk about MLOps, reproducibility is an important part of it: we need to be confident that the model in production will produce the same answer as it did in the research or development environment. Here, due to the randomness of the model, that is hard to guarantee.

We can adjust the temperature parameter in the OpenAI playground to make the output more deterministic, but with ChatGPT we do not have that option yet. Such models are directly useful for applications like search and customer service, where there is a human in the loop who can ask a clearer question if they do not receive a proper answer. To scale to other applications, there is a strong need to make the output deterministic. Let me know your thoughts; I am excited to hear from you.
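For completeness, here is what that looks like through the API rather than the ChatGPT UI: a minimal sketch using the legacy openai Python client (the one current at the time of writing), with temperature set to 0 so repeated calls with the same prompt are as deterministic as possible. The model name and prompt are just examples, and even temperature 0 does not fully guarantee identical outputs.

```python
import openai  # legacy pre-1.0 client, current at the time of writing

openai.api_key = "YOUR_API_KEY"  # placeholder

# temperature=0 makes sampling (nearly) greedy, so repeated calls with the
# same prompt are far more likely to return the same answer.
response = openai.Completion.create(
    model="text-davinci-003",        # example model name
    prompt="What is the average speed if I drive 120 km in 2 hours and then 120 km in 3 hours?",
    temperature=0,
    max_tokens=100,
)
print(response["choices"][0]["text"].strip())
```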
