Gemini Pro vs. GPT-3.5: Another Evaluation, Another Conclusion

Changing hyperparameters is all you need

Benjamin Marie
3 min read · Dec 28, 2023

When Google announced Gemini, they presented an evaluation showing that Gemini Pro, the best version of Gemini currently available through Google’s API, significantly outperforms GPT-3.5 on some benchmarks.

We don’t know much about this evaluation. Many of its settings, such as the prompts and the decoding hyperparameters, have not been disclosed, yet we know that they have a huge influence on the final results.
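
To give a feel for why decoding hyperparameters matter, here is a minimal sketch of one of them, the sampling temperature. The logits are made-up values for three hypothetical candidate tokens, and nothing here reflects the settings used in Google's evaluation, which are not disclosed.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three candidate next tokens (illustrative values only).
logits = [2.0, 1.0, 0.5]

for temperature in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

At a low temperature almost all of the probability mass goes to the top-scoring token, while a higher temperature spreads it across the alternatives. The same model can therefore produce quite different answers, and different benchmark scores, depending on this single setting.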

To better understand how Gemini compares with GPT models, CMU’s NeuLab ran a new evaluation on a much larger number of tasks:

  • Knowledge-based question answering (MMLU)
  • Reasoning (BIG-Bench Hard)
  • Math (GSM8K, SVAMP, ASDiv, MAWPS)
  • Code generation (HumanEval, ODEX)
  • Translation (FLORES)
  • Web Instruction Following (WebArena)

They ran Gemini Pro, GPT-3.5 Turbo, GPT-4 Turbo, and Mixtral on these benchmarks using the same prompts and the same decoding hyperparameters for all of them.
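
The idea of holding prompts and decoding settings fixed across models can be sketched as follows. This is only an illustration under stated assumptions, not NeuLab's actual code: `DecodingConfig`, `PROMPT_TEMPLATE`, `evaluate`, the default values, and the two stub models are all hypothetical, and real API calls are replaced by plain Python functions so that the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass(frozen=True)
class DecodingConfig:
    # Hypothetical defaults; the values used in the actual study are not assumed here.
    temperature: float = 0.0
    top_p: float = 1.0
    max_tokens: int = 256

# A "model" is any callable that maps (prompt, config) to generated text.
ModelFn = Callable[[str, DecodingConfig], str]

# One prompt template shared by every model (hypothetical wording).
PROMPT_TEMPLATE = "Question: {question}\nAnswer:"

def evaluate(models: Dict[str, ModelFn],
             examples: List[dict],
             config: DecodingConfig) -> Dict[str, float]:
    """Score every model by exact match, with identical prompts and decoding settings."""
    scores = {}
    for name, generate in models.items():
        correct = 0
        for example in examples:
            prompt = PROMPT_TEMPLATE.format(question=example["question"])
            output = generate(prompt, config)
            correct += int(output.strip() == example["answer"])
        scores[name] = correct / len(examples)
    return scores

# Stub models standing in for real API clients.
def stub_a(prompt: str, config: DecodingConfig) -> str:
    return "4"

def stub_b(prompt: str, config: DecodingConfig) -> str:
    return "5"

examples = [
    {"question": "2 + 2 = ?", "answer": "4"},
    {"question": "2 + 3 = ?", "answer": "5"},
]

scores = evaluate({"stub_a": stub_a, "stub_b": stub_b}, examples, DecodingConfig())
print(scores)
```

Because the template and the `DecodingConfig` instance are built once and passed to every model, any difference in the scores comes from the models themselves rather than from how each one was prompted or sampled.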

Benjamin Marie

Ph.D., research scientist in NLP/AI. Medium "Top writer" in AI and Technology. Exclusive articles and all my AI notebooks on https://kaitchup.substack.com/