Apple’s Pivot To Small AI

Andrew Zuo
7 min read · May 6, 2024


I wrote a piece a while ago titled Have Large Language Models Gotten Too Big? mostly in response to Facebook’s Llama 3 model. I pointed out that it was really interesting that Facebook chose to stick with 8 billion and 70 billion parameter models because that’s a lot smaller than the flagship models from Anthropic, Google, and OpenAI:

And that would create an interesting implication. There are only 8 billion and 70 billion parameter versions of these things. There’s no trillion parameter version. Is Facebook saying that they don’t need a trillion-parameter version? Or said another way, have large language models gotten too big?

And then I proceeded to do some back-of-the-napkin math. These models only stay relevant for a year or so and take millions of dollars to train.

So if a model only stays relevant for about a year, how many requests a day do people have to make for it to recoup its cost? Over 2.7 million. Yeah, that's pretty unlikely, especially with people using cheaper models like Gemini 1.0 Pro, GPT-3.5 Turbo, and Claude 3 Haiku. I know that's what I'm doing. And those cheaper models, having fewer parameters, are also significantly faster.
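
For a rough sketch of how that figure falls out, assume a training cost of around $100 million and a net margin of about $0.10 per request; both numbers are picked purely for illustration, not hard data. The arithmetic then looks something like this:

# Back-of-the-napkin sketch of the "2.7 million requests a day" figure.
# The training cost and per-request margin are illustrative assumptions.

TRAINING_COST_USD = 100_000_000   # assumed cost to train a frontier-scale model
MARGIN_PER_REQUEST_USD = 0.10     # assumed net revenue per request after serving costs
RELEVANCE_WINDOW_DAYS = 365       # the model stays competitive for roughly a year

requests_to_break_even = TRAINING_COST_USD / MARGIN_PER_REQUEST_USD
requests_per_day = requests_to_break_even / RELEVANCE_WINDOW_DAYS

print(f"Requests needed to break even: {requests_to_break_even:,.0f}")
print(f"Requests per day over one year: {requests_per_day:,.0f}")
# => roughly 2.7 million requests a day

Tweak either assumption and the daily number moves, but it stays in the millions unless the margin per request is far higher than what the cheap models are charging.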

I almost wanted to call these large trillion-plus-parameter models a dick-measuring contest, because it is very unlikely that they will ever recoup their initial investment.
