Gemini Pro vs. GPT-3.5: Another Evaluation, Another Conclusion
Changing hyperparameters is all you need
When Google announced Gemini, they presented an evaluation showing that Gemini Pro, the best version of Gemini currently available through Google’s API, significantly outperforms GPT-3.5 on some benchmarks.
We don’t know much about this evaluation. Many parameters, such as the prompts and the decoding hyperparameters, were not disclosed, yet we know they have a huge influence on the final results.
To better understand how Gemini compares to GPT models, CMU’s NeuLab performed a new evaluation on a much larger set of tasks:
- Knowledge-based question answering (MMLU)
- Reasoning (BIG-Bench Hard)
- Math (GSM8K, SVAMP, ASDiv, MAWPS)
- Code generation (HumanEval, ODEX)
- Translation (FLORES)
- Web instruction following (WebArena)
They ran Gemini Pro, GPT-3.5 Turbo, GPT-4 Turbo, and Mixtral on these benchmarks using the same prompts and the same decoding hyperparameters for all of them.
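Holding the decoding hyperparameters constant across providers can be as simple as defining one shared configuration and mapping it onto each API’s request format. The sketch below illustrates the idea in Python; the function names, config values, and structure are illustrative assumptions, not the actual NeuLab evaluation harness.

```python
# A minimal sketch of controlling decoding hyperparameters across APIs.
# The config values and helper functions are illustrative assumptions,
# not the actual harness used in the NeuLab evaluation.

# One shared decoding configuration, reused for every model under test.
DECODING_CONFIG = {"temperature": 0.0, "top_p": 1.0, "max_output_tokens": 1024}


def openai_request(model: str, prompt: str) -> dict:
    """Build keyword arguments for an OpenAI chat completion call."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": DECODING_CONFIG["temperature"],
        "top_p": DECODING_CONFIG["top_p"],
        "max_tokens": DECODING_CONFIG["max_output_tokens"],
    }


def gemini_request(prompt: str) -> dict:
    """Build a request dict with a generation_config for the Gemini API."""
    return {
        "contents": prompt,
        "generation_config": {
            "temperature": DECODING_CONFIG["temperature"],
            "top_p": DECODING_CONFIG["top_p"],
            "max_output_tokens": DECODING_CONFIG["max_output_tokens"],
        },
    }


# Both requests now carry identical decoding settings, so any score
# difference on a benchmark cannot be blamed on sampling parameters.
oa = openai_request("gpt-3.5-turbo", "What is 2+2?")
gm = gemini_request("What is 2+2?")
assert oa["temperature"] == gm["generation_config"]["temperature"]
assert oa["top_p"] == gm["generation_config"]["top_p"]
```

Pinning the temperature to 0 (greedy decoding) is a common choice for benchmark reproducibility, since it removes sampling randomness from run to run.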