AI: Meta gears up with Llama 3.1 405b. RTZ #427

Michael Parekh
Jul 25, 2024


… the biggest open source LLM AI model to date

Meta shifted gears again today, releasing its industry-leading, open-source Llama 3 model in its largest size yet: the much-desired 405-billion-parameter version.

It's a release that's been long anticipated, and it opens up large-scale AI applications for users, businesses, and enterprises small and large. As Meta itself puts it: “Introducing Llama 3.1: Our most capable [LLM AI] model to date”:

“Llama 3.1 405B is the first openly available model that rivals the top AI models when it comes to state-of-the-art capabilities in general knowledge, steerability, math, tool use, and multilingual translation. With the release of the 405B model, we’re poised to supercharge innovation — with unprecedented opportunities for growth and exploration. We believe the latest generation of Llama will ignite new applications and modeling paradigms, including synthetic data generation to enable the improvement and training of smaller models, as well as model distillation — a capability that has never been achieved at this scale in open source.”

They’re also upgrading the smaller Llama 3 models:

“As part of this latest release, we’re introducing upgraded versions of the 8B and 70B models. These are multilingual and have a significantly longer context length of 128K, state-of-the-art tool use, and overall stronger reasoning capabilities. This enables our latest models to support advanced use cases, such as long-form text summarization, multilingual conversational agents, and coding assistants.”

And they tweaked the open-source terms of service:

“We’ve also made changes to our license, allowing developers to use the outputs from Llama models — including the 405B — to improve other models. True to our commitment to open source, starting today, we’re making these models available to the community for download on llama.meta.com and Hugging Face and available for immediate development on our broad ecosystem of partner platforms.”

All this is important since increasingly these LLM AI models are being used to create ‘synthetic data’ for other models and applications.
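The pattern is simple in outline: a large “teacher” model generates completions, and those (prompt, completion) pairs become training data for a smaller model. Here is a minimal sketch of that pattern; `teacher_generate` is a hypothetical stand-in for a real call to a large Llama model via whatever inference API you use, not an actual Meta or Llama function.

```python
# Sketch of the synthetic-data workflow the new license explicitly allows:
# a large "teacher" model labels prompts, and the outputs become training
# data for a smaller model. `teacher_generate` is a stub standing in for a
# real call to a large model (e.g. Llama 3.1 405B via an inference API).

def teacher_generate(prompt: str) -> str:
    # Stub: a real implementation would query the large model here.
    return f"Answer to: {prompt}"

def build_synthetic_dataset(prompts: list[str]) -> list[dict]:
    """Pair each prompt with a teacher-generated completion."""
    return [{"prompt": p, "completion": teacher_generate(p)} for p in prompts]

dataset = build_synthetic_dataset(["What is 2 + 2?", "Define entropy."])
print(dataset[0])
```

The resulting pairs can then be used to fine-tune or distill a smaller model, such as the upgraded 8B variant.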

The Meta announcement then goes into a range of benchmark performance evaluations that compare Llama 3.1 405B against OpenAI’s GPT-4 in its various forms and Anthropic’s Claude 3.5 Sonnet, which I’ve discussed, as well as LLM AI models from Google, Mistral, Nvidia and others.

“For this release, we evaluated performance on over 150 benchmark datasets that span a wide range of languages. In addition, we performed extensive human evaluations that compare Llama 3.1 with competing models in real-world scenarios. Our experimental evaluation suggests that our flagship model is competitive with leading foundation models across a range of tasks, including GPT-4, GPT-4o, and Claude 3.5 Sonnet. Additionally, our smaller models are competitive with closed and open models that have a similar number of parameters.”

An important element in the larger sizes of these LLM AI models is the number of ‘tokens’ they’re trained on, signifying the data inputs that go into the models as they’re built from the ground up:

“As our largest model yet, training Llama 3.1 405B on over 15 trillion tokens was a major challenge. To enable training runs at this scale and achieve the results we have in a reasonable amount of time, we significantly optimized our full training stack and pushed our model training to over 16 thousand H100 GPUs, making the 405B the first Llama model trained at this scale.”
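For a sense of scale, a common back-of-envelope estimate for dense-transformer training puts total compute at roughly 6 × parameters × tokens. Applying it to the figures Meta cites gives an illustrative estimate only (the per-GPU sustained throughput below is my assumption, not a Meta-reported number):

```python
# Back-of-envelope training-compute estimate for Llama 3.1 405B, using the
# standard dense-transformer approximation: FLOPs ~= 6 * params * tokens.

params = 405e9   # 405 billion parameters (from Meta's announcement)
tokens = 15e12   # "over 15 trillion tokens" (from Meta's announcement)

total_flops = 6 * params * tokens
print(f"~{total_flops:.2e} FLOPs")  # on the order of 10^25 FLOPs

# Rough wall-clock estimate on 16,000 H100s, assuming ~400 TFLOP/s
# sustained per GPU (an assumption, not a reported figure):
sustained_per_gpu = 400e12
gpus = 16_000
days = total_flops / (sustained_per_gpu * gpus) / 86_400
print(f"~{days:.0f} days of training")
```

Even with generous utilization assumptions, the arithmetic lands at a roughly two-month run, which is consistent with the “major challenge” framing above.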

There’s a lot more detail in this release that is worth reviewing. The Verge has a comparison of this model against its competitors worth reading, along with some notable consumer-facing ‘Meta AI’ applications built on these models that may appeal to Meta’s billions of Instagram/Facebook/WhatsApp users:

“A new “Imagine Me” feature in Meta AI scans your face through your phone’s camera to then let you insert your likeness into images it generates. By capturing your likeness this way and not through the photos in your profile, Meta is hopefully avoiding the creation of a deepfake machine. The company sees demand for people wanting to create more kinds of AI media and share it to their feeds, even if that means blurring the line between what is discernibly real and not.”

Bloomberg goes on to describe Meta Founder/CEO Mark Zuckerberg’s “Aims to rival OpenAI, Google with new Llama AI Model”:

“The Meta CEO defends both his open source strategy and a massive investment in artificial intelligence.”

“The new model released Tuesday, called Llama 3.1, took several months to train and hundreds of millions of dollars of computing power. The company said it represents a major update from Llama 3, which came out in April.”

“I think the most important product for an AI assistant is going to be how smart it is,” Zuckerberg said during an interview on the Bloomberg Originals series The Circuit with Emily Chang. “The Llama models that we’re building are some of the most advanced in the world.” Meta is already working on Llama 4, Zuckerberg added.”

And the company seems to continue to invest in the next generations of these models, along with the AI GPU chips, data center and power infrastructure needed, with the attendant focus on security required:

“Meta’s investments in AI have been steep. Zuckerberg said that Meta’s Llama 3 models cost “hundreds of millions of dollars” in computing power to train, but that he expects future models will cost even more. “Going forward it’s going to be billions and many billions of dollars of compute” power, he said. Meta in 2023 tried to rein in some of its spending on futuristic technologies and management layers, cutting thousands of jobs in what Zuckerberg dubbed the “year of efficiency.” But Zuckerberg is still willing to spend on the AI arms race.”

“I think that there’s a meaningful chance that a lot of the companies are over-building now, and that you’ll look back and you’re like, ‘oh, we maybe all spent some number of billions of dollars more than we had to,’” Zuckerberg said. “On the flip side, I actually think all the companies that are investing are making a rational decision, because the downside of being behind is that you’re out of position for like the most important technology for the next 10 to 15 years.”

“After all the investment, Meta makes the technology behind Llama available for the public to use for free, so long as they adhere to the company’s “acceptable use policy.” Zuckerberg hopes the open-access strategy will help make the company’s work the foundation of other successful startups and products, giving Meta greater sway in how the industry moves forward.”

Also worth reading in full is Mark Zuckerberg’s “Open Source AI is the Path Forward”. In particular, the argument that the US can better compete with China with open-source AI is one that hits home, in both ‘Big and Small AI’. Especially since, as I’ve written often in these pages, the US/China ‘Threading the Needle’ tug-of-war is the pivotal headwind for this AI Tech Wave.

As I’ve recounted before, Meta’s focus on open-source LLM AI models in this AI Tech Wave is industry-leading. Along with the related AI GPU infrastructure, Meta continues to execute on a long-term, differentiated strategy to ‘throw sand in the gears’ of its peers and competitors.

And for now, the company continues at an industry pace-setting clip. Stay tuned.

(NOTE: The discussions here are for information purposes only, and not meant as investment advice at any time. Thanks for joining us here)

(You can also subscribe to my Newsletter “AI: A Reset to Zero” for free on Substack for more content like this.)


Michael Parekh

Investor in tech-driven business resets. Founded Goldman Sachs Internet Research franchise in 1994. https://twitter.com/MParekh