Battle of the TOP — Llama 3, Claude 3, GPT-4 Omni, Gemini 1.5 Pro/Flash, and more
Now that the release cycle seems to have calmed down, at least for a moment, let's compare the latest top models in terms of multimodality, capabilities, context length, performance, and price.
MULTIMODALITY
Image input is common in commercial models today: all of these have it except Llama 3 (Meta has promised a multimodal release later this year).
Both versions of Gemini 1.5 and GPT-4 Omni stand out for being able to process audio and video (sort of: video is handled as sampled frames).
Right now only GPT-4 Omni supports all of these modalities for both input and output, although audio and video are not yet available in its API (OpenAI has promised them later this year).
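As a minimal sketch of what image input looks like in practice, here is a request payload in the OpenAI chat-completions format. The model name and image URL are illustrative; actually sending it requires the `openai` client and an API key.

```python
# Sketch: building a multimodal (text + image) request in the OpenAI
# chat-completions format. The URL and model name are placeholders.
def build_image_request(prompt: str, image_url: str, model: str = "gpt-4o") -> dict:
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }

payload = build_image_request("What is in this picture?", "https://example.com/cat.png")
```

Gemini and Claude accept images through analogous (but differently shaped) request bodies in their own SDKs.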
CONTEXT LENGTH
Gemini 1.5 currently has the largest context window, 2M tokens in beta ahead of a full release, followed by Claude 3 with 1M (available to select customers; 200K in the general API). GPT-4 stays at 128K, and Llama 3 increased to 8K.
Effective usage is another story: the paper "RULER: What's the Real Context Size of Your Long-Context Language Models?" scores how much of that context is actually usable near the limit. Interestingly, while Llama 3 has a smaller window, it makes very efficient use of it.
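To make the window sizes concrete, here is a rough sketch of checking whether a prompt fits a given model's context. The window sizes are the ones cited above, and the 4-characters-per-token estimate is a crude heuristic, not a real tokenizer.

```python
# Advertised context windows (tokens) for the models discussed above.
CONTEXT_WINDOWS = {
    "gemini-1.5-pro": 2_000_000,  # 2M in beta
    "claude-3": 1_000_000,        # 1M for select customers; 200K generally
    "gpt-4": 128_000,
    "llama-3": 8_000,
}

def fits_in_context(text: str, model: str) -> bool:
    """Crude check: ~4 characters per token is a common rule of thumb."""
    estimated_tokens = len(text) / 4
    return estimated_tokens <= CONTEXT_WINDOWS[model]

# A ~60K-character prompt is roughly 15K tokens: too big for Llama 3's 8K window.
print(fits_in_context("hello " * 10_000, model="llama-3"))  # False
```

In production you would count tokens with the provider's actual tokenizer (e.g. tiktoken for OpenAI models) rather than this character heuristic.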
BENCHMARKS
TEXT
Looking at the scores, every model lands at roughly the same level. That makes GPT-4o and Gemini 1.5 Flash quite impressive: they respond very fast even with all their added capabilities, while others perform just as well but at the cost of higher latency.
Llama 3 is also very impressive considering how small it is compared to the others while scoring on par with them.
VISION
Again, every model falls within roughly the same range.
PRICE
Here are the prices of these models. Since GPT-4 Omni's differentiating capabilities are not yet available, today it is effectively a faster version of GPT-4 Turbo, with Gemini 1.5 and Claude 3 as alternatives for vision and all of them as alternatives for text.
GPT-4 Omni is still quite expensive right now, surpassed only by Claude 3 Opus.
Finally, Llama 3, Claude 3 Haiku, and Gemini 1.5 Flash offer the best performance per cost, since they can handle most simple and intermediate tasks.
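To compare prices on equal footing, it helps to compute the cost of a concrete request. The figures below are the launch-time prices as I recall them (USD per 1M tokens); always verify against each provider's current pricing page before relying on them.

```python
# Sketch: per-request cost comparison. Prices are launch-time figures
# as I recall them and may be outdated; check the official pricing pages.
PRICES = {  # model: (input, output) in USD per 1M tokens
    "gpt-4o": (5.00, 15.00),
    "claude-3-opus": (15.00, 75.00),
    "claude-3-haiku": (0.25, 1.25),
    "gemini-1.5-flash": (0.35, 1.05),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one request with the given token counts."""
    inp, out = PRICES[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000

# Example: a 10K-token prompt with a 1K-token answer.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 10_000, 1_000):.4f}")
```

Even with approximate prices, the spread is striking: at these figures the same request costs dozens of times more on Claude 3 Opus than on Claude 3 Haiku.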
I hope this compilation of data helps you choose the best performance/cost model for your solution.
I also intend to do a benchmark comparison of recent open-source models in the future.
Note: I did not benchmark audio and video because we do not yet have access to GPT-4 Omni's technical report; once it is released, I will analyze them.