Consumer AI is Ripe for Centralization
One of the greatest ironies of the last few months was discovering that decentralized finance (DeFi) was, in fact, centralized.
DeFi's hostility towards the Cathedral was never entirely grounded in the technology itself. Although blockchain-backed financial assets don't need any trusted centre of control to work, our desire for convenience and The Way We Do Things™ almost guarantee that such institutions will emerge and justify their existence in the face of contrary ideals.
This is not the same for Consumer AI, a space practically invented by OpenAI's productizing of their generative models. The attraction of these new intelligent tools, the genuine desire to leverage their capabilities, and the fear they've inspired have come together to create an actual market that has a chance of changing the way we live and work. With this kind of attention comes the debate about where power should lie: with the core or at the edge?
Here, the current state of the technology practically demands that the power lie with the core.
ML is Expensive
Machine learning models are easily amongst the heaviest components of any software architecture. Even the simplest, low-level models that perform specific functions like sentiment analysis and facial recognition carry a hefty payload when deployed. Creating such models in the first place presents a barrier of its own because of how much data they need to learn their functions.
The models that have all the attention (hah!) today, Large Language Models and Diffusion Models, require even more processing power to train to the point of being useful, and they're prohibitively resource-hungry at inference time. Google's PaLM maxes out at 540 billion parameters, a little over three times as massive as the largest GPT-3 (175 billion).
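To put "prohibitively resource-hungry" into perspective, here is a back-of-envelope sketch of the memory needed just to hold model weights at inference time. The arithmetic is my own, not figures from any paper, and it assumes 16-bit weights for the large hosted models (quantization changes the picture, as the last line shows):

```ts
// Rough weight-only memory footprint; ignores activations, KV caches, and serving overhead.
const GB = 1e9;
const footprintGB = (params: number, bytesPerParam: number) => (params * bytesPerParam) / GB;

console.log(footprintGB(175e9, 2));  // GPT-3, 175B params at fp16      -> ~350 GB
console.log(footprintGB(540e9, 2));  // PaLM, 540B params at fp16       -> ~1080 GB
console.log(footprintGB(7e9, 0.5));  // a 7B model quantized to 4 bits  -> ~3.5 GB
```

The first two numbers are why inference stays in someone else's data centre; the last one hints at why the edge got interesting.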
In spite of recent gains in efficiency, and proof that fine-tuning can teach a lightweight model new tricks, the fact remains that such lightweight models still fall short of the state of the art hosted by the folks at OpenAI and their well-endowed competition.
What's more, LLMs get out of date quickly when it comes to knowing about the current state of the world: expectations of their capabilities are high, but current technology decisions leave them falling just short. An evergreen LLM must be aware of changes to external information in real time and incorporate that new information into itself. It must be connected, consuming new data as it's produced. Just like we are.
I’m convinced the pressures of the market will get them there, and the economic realities of maintaining such a beast will increase the advantage the core has over the edge.
The Edge Hit Back
Days after Facebook graced us with that LLaMA leak, Inference at the Edge became a possibility for anyone with only an M1 MacBook; no need to sport a beefy third-party GPU or have 120 GB of RAM "just lying around". Even I benefitted from these gains, and I've been playing with otherwise-expensive models ever since. And thanks to projects like Transformers.js, you can even run GPT-2 in your browser!
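Here's roughly what that looks like: a minimal sketch assuming the @xenova/transformers package and its pipeline API, with a placeholder prompt and illustrative options.

```ts
// Runs client-side: the model weights are fetched once, then inference happens in the browser.
import { pipeline } from '@xenova/transformers';

// 'Xenova/gpt2' is a browser-friendly export of GPT-2 on the Hugging Face Hub.
const generator = await pipeline('text-generation', 'Xenova/gpt2');

// The result mirrors the Python pipeline's shape: an array of { generated_text } objects.
const output = await generator('The edge hit back because', { max_new_tokens: 30 });
console.log(output[0].generated_text);
```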
For all the gains they've made, these consumer-friendly, generally intelligent models are far from impressive. For anyone not interested in the models for their own sake (which means most of us, who care more for the high-quality output we were promised), they're simply not worth it. Far better to make an API call to OpenAI than to run your own hefty models on-device.
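That centralized path is, today, a single HTTP request. A minimal sketch against OpenAI's chat completions endpoint; the model name and prompt are illustrative, and the API key comes from your own environment.

```ts
// Pay-as-you-infer: no weights to download and no GPU to own, just a request and a bill.
const response = await fetch('https://api.openai.com/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
  },
  body: JSON.stringify({
    model: 'gpt-3.5-turbo',
    messages: [{ role: 'user', content: 'Summarise why on-device inference is still hard.' }],
  }),
});

const data = await response.json();
console.log(data.choices[0].message.content);
```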
And when you consider that ChatGPT has switched to evergreen versioning (as of 19th March, the listed version was "March 14"), it's not hard to imagine that someone has figured out the business value (and competitive edge) that continually updating a model provides. This puts additional pressure on the edge to compete. I don't see it keeping up.
Room for Good-Enough AI
But does Inference at the Edge need to be globally aware? I'd argue it doesn't. There is a place for good-enough AI, just as there is a place for highly connected artificial superintelligences.
Current tech depends on ingesting volumes of data in order to perform meaningfully. Even with fine-tuning, there's still a not-insignificant amount of data a model has to consume to become good at a specific task or information domain. And even with in-context learning at inference time, a capable model cannot know what it has not been told. The same applies to humans, for what it's worth.
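That limitation cuts both ways: if a model cannot know what it has not been told, you can tell it at inference time. Here's a minimal sketch of naive in-context learning, where the private or up-to-date facts travel inside the prompt itself (the facts and question below are invented for illustration):

```ts
// The blunt version of in-context learning: the model only "knows" what we paste into the prompt.
const privateFacts = [
  'Invoice #4412 was paid on 1st March.',          // hypothetical on-device data
  'The customer prefers email over phone calls.',
];

const question = 'When was invoice #4412 paid, and how should we follow up?';

const messages = [
  { role: 'system', content: `Answer using only these facts:\n${privateFacts.join('\n')}` },
  { role: 'user', content: question },
];

// `messages` can go to a hosted API or a local model; either way, the knowledge lives in the prompt, not the weights.
```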
There are use-cases where good-enough AI will excel: those where the confidentiality of the necessary data is more of a priority than how cutting-edge the tech is. And perhaps that's where it will shine.
On-device ML has been a thing for a couple of years now. Citing privacy, Google and Apple have done a lot to bring it to their mobile platforms, and the tooling they ship makes integrating these models almost trivial. We're yet to see similar gains with our most impressive models, but that could be just around the corner.
A New Hope
What's more important as this field gathers pace is the work being done to make these generally intelligent models more efficient. The progress LLaMA exhibited was dramatic, and Stanford Alpaca sent shockwaves through the community because of what its authors achieved with what little they had.
Even OpenAI admits in its papers that the resources needed to build more capable foundation models are a concern, and that better zero-shot learning is an ideal to be pursued in further research.
I suspect more work will be done in corporate and academic research on reducing the resource requirements to train and use the most capable models. The open-source community will spur this on so that actual consumers will be able to use them in private.
Until this becomes a reality, calling a pay-as-you-infer API will be the go-to way of consuming AI, in spite of the risks.
If you enjoyed this, let me know.