Gemini Pro 1.0 Bard vs. ChatGPT 4.0

Are the benchmarks really true?

Alex Agboola
8 min read · Dec 24, 2023

This month, Google released Gemini 1.0. Google claims that Gemini is multimodal, meaning it combines many data types, such as image, text, speech, and numerical data, with multiple algorithms to produce better results.

While I was looking at the benchmarks of Gemini Ultra:

Gemini Ultra vs ChatGPT 4

I couldn’t help but wonder: is Gemini really that good compared to ChatGPT, or are the benchmarks not telling the full story?

I am comparing ChatGPT 4.0 with Bard’s new model, Gemini Pro, across five key topics to see the truth and why.

1. Natural Language Understanding and Responses

One of Gemini’s perks is how good it is at understanding hard riddles and tricky questions. Google even made a whole video on it (which many people suspect was faked):

Google Gemini “hands on”

First, I asked Bard a riddle:

Three playing cards in a row. Can you name them with these clues? There is a two to the right of a king. A diamond will be found to the left of a spade. An ace is to the left of a heart. A heart is to the left of a spade. Now, identify all three cards.

I got the riddle from the blog “10 Hardest Riddles To Solve.”

I also asked ChatGPT 4.0 the same question:


The answer was the Ace of Diamonds, King of Hearts, and Two of Spades.
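To double-check the logic, here is a quick brute-force script of my own (a sketch, not either chatbot’s output) that tries every left-to-right ordering of the three ranks and three suits against the clues:

from itertools import permutations, product

RANKS = ("ace", "king", "two")
SUITS = ("diamonds", "hearts", "spades")

# Positions run 0 (left) to 2 (right); each clue constrains an ordering.
for ranks, suits in product(permutations(RANKS), permutations(SUITS)):
    if (ranks.index("two") > ranks.index("king")                 # a two right of a king
            and suits.index("diamonds") < suits.index("spades")  # a diamond left of a spade
            and ranks.index("ace") < suits.index("hearts")       # an ace left of a heart
            and suits.index("hearts") < suits.index("spades")):  # a heart left of a spade
        print([f"{r} of {s}" for r, s in zip(ranks, suits)])

It prints exactly one arrangement, ['ace of diamonds', 'king of hearts', 'two of spades'], so the riddle really does have a unique answer.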

Looks like Bard won this one.

But I have one more tricky question for the chatbots to solve.

The second riddle (from the same blog) was,

“A group of campers have been on vacation so long, that they’ve forgotten the day of the week. The following conversation ensues. Darryl: What’s the day? I don’t think it is Thursday, Friday or Saturday. Tracy: Well that doesn’t narrow it down much. Yesterday was Sunday. Melissa: Yesterday wasn’t Sunday, tomorrow is Sunday. Ben: The day after tomorrow is Saturday. Adrienne: The day before yesterday was Thursday. Susie: Tomorrow is Saturday. David: I know that the day after tomorrow is not Friday. If only one person’s statement is true, what day of the week is it?”

To my surprise, both AI chatbots got this wrong!

ChatGPT said:

ChatGPT’s answer was Friday

Bard said:

Bard’s answer was Tuesday

The answer was Wednesday.
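Wednesday is the only day on which exactly one of the seven statements comes out true. A small script (again my own sketch, not either bot’s work) can verify that by checking every day of the week:

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

def day(today, offset):
    # The day of the week `offset` days away from `today`.
    return DAYS[(DAYS.index(today) + offset) % 7]

for today in DAYS:
    statements = [
        today not in ("Thursday", "Friday", "Saturday"),           # Darryl
        day(today, -1) == "Sunday",                                # Tracy
        day(today, -1) != "Sunday" and day(today, 1) == "Sunday",  # Melissa
        day(today, 2) == "Saturday",                               # Ben
        day(today, -2) == "Thursday",                              # Adrienne
        day(today, 1) == "Saturday",                               # Susie
        day(today, 2) != "Friday",                                 # David
    ]
    if sum(statements) == 1:
        print(today)  # prints: Wednesday

On Wednesday only Darryl’s statement holds; every other day makes at least two of the statements true.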

Overall, it looks like Bard won this battle. Across the two questions, I noticed ChatGPT breaking the riddles down so I could understand them. One feature that saved Bard is that it creates multiple drafts and picks out the best one. On the second riddle all of the drafts were wrong, but on the first riddle only one draft had the correct answer, and that draft happened to be the one Bard chose.

2. Data Processing and Integration Capabilities

This basically means collecting, understanding, and using information from many different sources.

Let’s just say I want to write an essay on Python, the programming language. I can’t find any good sources, so I turn to AI.

Now, both chatbots disclaim that they “may display inaccurate info, including about people, so double-check its responses” (Bard) or “can make mistakes. Consider checking important information” (ChatGPT). But one of Gemini’s strengths is researching and displaying information from Google, so I had to put it to the test.

Don’t worry though, I am a Python expert, so I won’t be fooled by AI.

I told both AI bots,

I need to write an essay on Python, but I can’t find any good sources. Can you please provide information on the syntax of Python, an example project, and why you should code in Python?

Bard responded:

This is the 3rd draft

I have tested the code Bard provided, and it indeed works.

ChatGPT responded:

This was the first try

I tested the code in multiple Python code editors and replaced the placeholder website with my portfolio, and it did not work.

The error was, in fact:

Traceback (most recent call last):
File "./prog.py", line 2, in <module>
ModuleNotFoundError: No module named 'bs4'

That’s right: the example fails on its very first import. ChatGPT never mentioned that bs4 comes from the third-party beautifulsoup4 package, which has to be installed before the code will run. The last thing I need is a broken example feeding false info into my informative essay.
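To be fair, that traceback just means the bs4 module was never installed in the environment. Assuming ChatGPT’s example used requests with BeautifulSoup (the usual pattern for a scraping demo like this; I am reconstructing, not quoting its code), the fix looks like this:

# bs4 comes from the third-party "beautifulsoup4" package.
# Install it first:  pip install beautifulsoup4 requests
import requests
from bs4 import BeautifulSoup

# Minimal scraping sketch; the URL is a placeholder, not the one
# from the original response.
response = requests.get("https://example.com")
soup = BeautifulSoup(response.text, "html.parser")
print(soup.title.string)  # prints the page's <title> text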

ChatGPT was more ambitious with its example, and it did provide more information, but Bard was more accurate.

Bard even has a feature where you can double-check its results using Google. So that’s what I did here:

Green = Google found it and it is accurate
Orange = Google can’t find it / it may be an opinion

The two orange sentences seem true for the most part. Explicit (straightforward) variable declaration may be a plus for most people, but not for all. The same goes for the second orange sentence: a large, active, and supportive community might be true of GitHub, for example, but on other platforms that might not be as true.

I will still hand it to Bard. I feel like, compared to ChatGPT, Bard provides easier, simpler bullet-point information I could use for my essay.

3. Creativity and Content Generation

Another important topic. Although creativity is not included in the benchmarks, Gemini should have plenty of it, as shown in its demo:

Gemini’s demo video

ChatGPT is also known for its creativity, which makes its writing easy for AI detectors to spot.

I asked both of them:

Tell me a joke about AI that a 4-year-old would understand, only 7 words, and only letters A through I.

This prompt not only has a topic I would say isn’t the easiest, it also has constraints: at most seven words, and only the letters A through I.

Bard joked,

First Draft

ChatGPT cracked,

This is the first result

Both of these jokes are terrible, but I think Bard made the better joke; it just didn’t listen to the constraint I gave it. “Said” starts with an “S”, and I said only letters A through I.

ChatGPT’s joke was worse, but it satisfied the constraints in all three tries I did.
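To make “satisfied the constraints” less of a judgment call, here is a tiny checker of my own that tests a joke against both rules, treating “only 7 words” as a maximum:

ALLOWED = set("abcdefghi")

def satisfies_constraints(joke, max_words=7):
    # At most `max_words` words, and every letter (ignoring case and
    # punctuation) must fall in the range A through I.
    words = joke.split()
    letters = {ch for ch in joke.lower() if ch.isalpha()}
    return len(words) <= max_words and letters <= ALLOWED

print(satisfies_constraints("Said a bee: ha ha ha!"))  # False; "Said" uses an S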

It is pretty hard to say who was more creative based on these kinds of responses, so I gave them one more prompt:

Display the most creative HTML code you can do.*

*This is not the exact prompt I gave

Bard coded:

Bard’s Website (There was even a trail of various colored circles from my mouse!)
Bard’s code

ChatGPT programmed:

ChatGPT’s Website
ChatGPT’s code

It is another clear win: Bard takes this round. I do have to give ChatGPT credit for making its website more polished and professional than Bard’s “testing my code” project, but Bard created animations and used non-default fonts; they look bad, but it’s the thought that counts.

4. Learning and Adaptability

How quickly both chatbots can adapt to different user preferences.

Let’s just say I need help on a project, but a normal bot won’t do. I want a personal assistant to guide me through my project, so I created this prompt:

I want you to be my personal code assistant. I am working on a project, and I need you to 1) help me fix my bugs, 2) suggest more features/ content and show me how to append them into my code, and 3) give me overall outstanding feedback and tips, like a real expert was right next to me.

ChatGPT 4 already has a feature where you can “create your own GPT” with prompts like these, but since we are testing ChatGPT’s own ability, we won’t be needing the GPT builder.

Bard wrote:

My AI Code Assistant!

ChatGPT typed:

My Second Code Assistant!

ChatGPT got to the results I wanted faster, but its answers were vague. Bard took three prompts to get to what I wanted, but its results were more tailored to my needs. Still, Bard steered away a bit on the third result, so I hand this one to ChatGPT. Yes, ChatGPT won this round.

5. Handling of Complex Scenarios and Problem Solving

Similar to topic 1, handling complex scenarios and problem solving means solving hard problems, but in this case, math.

One of Gemini’s advertised strengths is its problem-solving skill compared to ChatGPT 4. So, let’s put it to the test.

I found a hard math problem online to solve,

There are many trios of integers (x,y,z) that satisfy x²+y²=z². These are known as the Pythagorean Triples, like (3,4,5) and (5,12,13). Now, do any trios (x,y,z) satisfy x³+y³=z³?

and Bard answered:

Answered: NO

ChatGPT replied:

Answered: NO

Both bots are correct! This is the n = 3 case of Fermat’s Last Theorem, which Euler proved: no positive integers satisfy x³ + y³ = z³.
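You cannot prove a theorem by brute force, but a quick search (my own sketch) at least shows there are no small counterexamples:

# Look for positive-integer solutions to x**3 + y**3 == z**3.
LIMIT = 200
cubes = {n ** 3: n for n in range(1, 2 * LIMIT + 1)}

solutions = [(x, y, cubes[x ** 3 + y ** 3])
             for x in range(1, LIMIT + 1)
             for y in range(x, LIMIT + 1)
             if x ** 3 + y ** 3 in cubes]
print(solutions)  # prints [], no solutions, just as the theorem guarantees

With that settled, let’s try another question: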

SAT test question from the article “The 15 Hardest SAT Math Questions Ever”

Bard guessed:

k = -3

ChatGPT showed:

k < 2 and k > -1

ChatGPT tried to play it safe by responding with an inequality, but it still got the question wrong. Bard won once more. Bard not only solved the problem more efficiently, it also searched the internet to back up its results. You could say that’s cheating, but ChatGPT 4 can do that too.

The Truth

Although Gemini Ultra hasn’t officially been released yet, we have seen Gemini 1.0 already beating ChatGPT 4.0. And yet Gemini 1.0 does not have the multimodal features Google has been droning about in its trailers and articles; ChatGPT actually has those!

Conclusion

Gemini Ultra has not been released yet, so we can only hope for the AI to have the features Google has worked on.

AI isn’t perfect, as we have seen many times. So, seeing major improvements like these is a great head start toward what chatbots and AI have to offer.


Alex Agboola

I am a young full-stack developer with over four years of experience. I have a great interest in AI. I also like to write about what I think is right.