The Groundbreaking Arrival of Claude 3

Everything you need to know about Anthropic’s new Claude 3 models

Chandler K
Published in The AI Archives
5 min read · Mar 12, 2024


March 4th marked a major milestone in the AI world with the arrival of Claude 3. This new family of models is likely to shake up the AI ecosystem and could put a new company at the forefront of AI.

The release caused a stir in the industry when Anthropic (the creator of the Claude models) published performance metrics comparing Claude 3 with other state-of-the-art products like OpenAI’s GPT-4. According to Anthropic’s blog post, Claude 3’s Opus model outperforms GPT-4 on every benchmark in the published comparison.

This article will explore the following:

  • What is Claude?
  • How is it different from older models?
  • Why does this new model matter?
  • Real-world use cases

This chart shows the results of 10 industry-standard AI benchmark tests in which Claude 3’s Opus model outperforms every other generative AI model it was compared against.

What is Claude 3?

In the simplest terms, Claude 3 is a “family” of Large Language Models (LLMs) designed to answer users’ queries in a natural, conversational manner. It is an alternative to ChatGPT and the GPT-4 model. Claude 3 is the latest iteration of Anthropic’s models, building on and improving the older Claude 2. Three versions of Claude 3 have been announced: Haiku, Sonnet, and Opus. Each version is progressively more advanced and powerful than the last, and each differs in price, speed, and “intelligence”, so users can choose the model that best fits their use case.

Anthropic has stated that both Sonnet and Opus are currently available worldwide through its API and claude.ai (its counterpart to ChatGPT). While no release date has been published, Haiku is expected to become available in the next few weeks. Claude 3 is primed to redefine how we engage with AI by offering enhanced practical applications for everyday tasks while setting new standards in innovation and adaptability.

“Opus, our most intelligent model, outperforms its peers on most of the common evaluation benchmarks for AI systems, including undergraduate level expert knowledge (MMLU), graduate level expert reasoning (GPQA), basic mathematics (GSM8K), and more. It exhibits near-human levels of comprehension and fluency on complex tasks, leading the frontier of general intelligence.” — Anthropic’s Claude 3 announcement

Claude 3 Improvements

As previously mentioned, Claude 3 boasts numerous improvements over Claude 2. We’ll break this down by version:

Haiku:

  • Speed: According to Anthropic, Haiku can read a dense research paper (~10k tokens) with charts and graphs in under three seconds, making it one of the fastest models currently available for its “intelligence” level.

Sonnet:

  • Speed: Most responses will be 2x faster than Claude 2.

Opus:

  • Speed: Opus runs at speeds similar to Claude 2. It is not faster, but it has not slowed down either, even though its reasoning (“intelligence”) has improved considerably.
  • Accuracy: The model is now 2x more accurate (than Claude 2.1) when answering complex, open-ended questions. It also shows a noticeable drop in incorrect answers (hallucinations).
  • Recall ability: Achieved over 99% accuracy when asked to recall information from across its 200k-token context window.

General Improvements:

  • Increased contextual understanding: All Claude 3 models are better at interpreting a user’s prompt and can more accurately determine whether it goes against Anthropic’s guardrails, which means fewer unnecessary refusals.
  • Context window: All three models launch with a context window of 200k tokens but can accept inputs of over 1 million tokens, which Anthropic may make available to select customers.
  • Ethical responsibility: Providing accurate and verifiable information is the goal of all LLMs, and Claude 3 is no different. Steps have been taken to reduce misinformation, election interference, and more.
  • Multi-step directions: All models have improved abilities to complete complex multi-step prompts.
  • Output formats: The models are more reliable at producing structured output in formats like JSON.

This graph shows the breakdown of responses when the model is given a complex question that has a factual answer. The drop in both “Incorrect” and “Unsure” responses is roughly 20%.
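
For readers building on the API, the point about more reliable JSON output is easiest to appreciate with a small validation step. The following is a minimal sketch, not Anthropic’s code: it assumes you already have the model’s reply as a plain string, and the function name `parse_model_json` and the example keys are hypothetical. Even with a more reliable model, checking the output before using it is a sensible habit.

```python
import json

# Built at runtime so this example does not contain a literal code fence.
FENCE = "`" * 3


def parse_model_json(raw: str, required_keys: set[str]) -> dict:
    """Parse a model reply that should be a JSON object and check its keys.

    Hypothetical helper for illustration; it does not call any API.
    Raises ValueError if the reply is not valid JSON, is not an object,
    or lacks one of the required keys.
    """
    text = raw.strip()

    # Replies are sometimes wrapped in a Markdown code fence; unwrap it.
    if text.startswith(FENCE):
        lines = text.splitlines()[1:]  # drop the opening fence line
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]  # drop the closing fence line
        text = "\n".join(lines)

    # json.JSONDecodeError is a subclass of ValueError.
    data = json.loads(text)

    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")

    missing = required_keys - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data


if __name__ == "__main__":
    reply = '{"city": "Lisbon", "days": 3}'  # stand-in for a model reply
    print(parse_model_json(reply, {"city", "days"}))
```

If the reply fails validation, a common pattern is to send the error message back to the model and ask it to try again, though how many retries make sense depends on the application.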

“All Claude 3 models show increased capabilities in analysis and forecasting, nuanced content creation, code generation, and conversing in non-English languages like Spanish, Japanese, and French.” — Anthropic’s Claude 3 blog post

The Influence of Claude 3

Why does Claude 3 matter? These advanced models hold remarkable potential for everyday people, whether they use Anthropic’s products directly or rely on services that integrate with its API. The impressive performance of Opus suggests that the challenges and pitfalls of current AI models are being not only acknowledged but substantially reduced. In particular, a near 20% increase in math problem-solving (when compared to GPT-4) means we will likely see new applications of LLMs in the coming months. Claude 3’s ability to process and understand other forms of data, like photos, charts, graphs, and technical diagrams, also surpasses GPT-4. This will allow users to interact with their own files and receive meaningful responses that are specific to them. Overall, Claude 3 is yet another step toward human-level understanding and responses in conversation with LLMs.

The Impact of Claude 3: Real-world Use Cases

Like other Generative AI models, Claude 3 can be utilized in a wide range of fields. From customer service to education to personal assistants, LLMs have become commonplace in 2024. Claude 2 is already being used by thousands of companies worldwide and we can expect most of them to start using the Claude 3 models to ensure the best quality. Below are a few examples from Anthropic’s blog and elsewhere:

  • Keymate.AI: Uses Claude 2 to read large (50+ page) PDFs and lets users interact with these uploaded documents. They have integrated Claude 2 with OpenAI’s GPT-4 to create a product that draws on the best models in AI.
  • LexisNexis Legal & Professional: A “leading provider of information and analytics” across 150 countries.
  • Lonely Planet: This travel guide company uses Claude 2 to ensure customers can find and experience unforgettable vacations.

Claude 3 can also be used in education. Whether it is helping with activity and curriculum planning or acting as a personal tutor, the increased intelligence, speed, and context window should lead to better responses. Claude 3’s potential to expand access to AI tools holds significant promise for individuals and small businesses, empowering them to use powerful solutions that were previously out of reach.

Chandler K
The AI Archives

Harvard, UPenn, prev NASA, writing about AI, game development, and more.