Is “the bigger the better” true for LLMs? How small can they get?

Sensory's Desk
Sensory Perspectives on AI
3 min read · Jul 2, 2024

With the introduction of Apple Intelligence and on-device ChatGPT, there are growing conversations about small language models that run on the edge. I’ve seen them referred to as SLMs, which I find somewhat humorous, as SLM used to mean Statistical Language Model.

That’s right, language models have been around for a long time. Sensory has been building them since the early 2000s! What changed wasn’t so much the formula as the scale: models went deeper and wider, with larger data sets, more memory, and more processing power.

And size does make a BIG difference. Today’s LLMs run to trillions of parameters, trained on tens or hundreds of terabytes of data, and their conversational abilities are amazing. But they don’t run on device, making them less private, more costly, and impractical without a strong data connection.

That general-purpose breadth, with its conversational memory, dialog, and wide expertise, cannot be replicated on device, where, practically speaking, you need orders-of-magnitude reductions in size. However, it is possible to run a very sophisticated domain-dependent system in as little as 35 MB, including wake word, speech-to-text, and NLU.
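To make that concrete, here is a minimal sketch of how such an on-device pipeline fits together. The class names, methods, and canned outputs are all hypothetical placeholders for illustration; this is not Sensory’s actual SDK.

```python
# Minimal sketch of an on-device voice-assistant pipeline.
# Every class and method here is a hypothetical stand-in for a real
# edge component; none of this is Sensory's actual SDK.

class WakeWord:
    """Tiny always-on detector (on the order of 20 KB to 1 MB)."""
    def detect(self, frame: str) -> bool:
        return frame == "hey device"  # stand-in for a real detector

class SpeechToText:
    """End-to-end acoustic model (around 15 MB)."""
    def transcribe(self, frame: str) -> str:
        return "turn on the kitchen light"  # stand-in transcription

class NLU:
    """Domain-specific intent/entity model (around 15 MB)."""
    def parse(self, text: str):
        return "turn_on_light", {"room": "kitchen"}  # stand-in parse

def run(frames):
    wake, stt, nlu = WakeWord(), SpeechToText(), NLU()
    for frame in frames:
        # The tiny wake word gates the larger models, so the STT and
        # NLU only run after the user addresses the device.
        if not wake.detect(frame):
            continue
        text = stt.transcribe(frame)
        intent, entities = nlu.parse(text)
        print(f"intent={intent} entities={entities}")

run(["background noise", "hey device"])
```

The design point is the gating: the always-on footprint is only the tiny wake word model, while the larger recognizer and NLU wake up on demand.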

Many people don’t realize this is possible and are amazed when they see Sensory’s demonstrations of what can be done with minimal memory on an old Android phone.

Let’s take a look at what works well on such a small platform and where sacrifices need to be made (creating our challenges for tomorrow!).

The Great Small Stuff

- Wake word. Sensory’s wake word technology is the best in the world, is available in any language, is supported by all the most popular platforms (OSes and chips/DSPs), and ranges in size from 20 KB to 1 MB.

- Speech-to-Text. Sensory’s end-to-end acoustic models can run in as little as 15 MB of memory. Domain accuracy can be improved with small language models that require only a modest amount of additional memory.

- NLU and Actions. Intents and entities can be pulled out using a tiny domain-specific LLM that’s as small as 15 MB. We can even accept multiple intents in a single phrase. A simple lookup table that assigns results to actions adds very little additional memory; a sketch of such a table follows this list.

- No Hallucinations! One of the nice things about going small with an edge voice assistant is that responses are premeditated, so hallucinations can be avoided entirely.
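To show what the lookup table in the “NLU and Actions” item might look like, here is a minimal sketch: a dictionary maps each in-domain intent to a handler that performs the action and returns a premeditated response. The intent labels, entity fields, and handler names are illustrative assumptions, not Sensory’s API.

```python
# Hedged sketch of an intent-to-action lookup table. Intent labels,
# entity fields, and handlers are illustrative, not Sensory's API.

def set_temperature(entities):
    return f"Setting the temperature to {entities['value']} degrees."

def turn_on_light(entities):
    return f"Turning on the light in {entities.get('room', 'this room')}."

# Each in-domain intent maps to a handler that performs the action
# and returns a premeditated response string, so the assistant never
# generates free-form text and cannot hallucinate.
ACTION_TABLE = {
    "set_temperature": set_temperature,
    "turn_on_light": turn_on_light,
}

def handle(intent: str, entities: dict) -> str:
    handler = ACTION_TABLE.get(intent)
    if handler is None:
        return "Sorry, I can't help with that."  # out-of-domain reply
    return handler(entities)

print(handle("set_temperature", {"value": 72}))
# -> Setting the temperature to 72 degrees.
```

Because every possible response is written ahead of time, the memory cost of the table is essentially just the strings and the dictionary itself.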

As an example, in this video Sensory uses a 16 MB acoustic model, a 2 MB language model, and a 15 MB NLU model, with a wake word under 1 MB. The total memory is about 34 MB, yet the system can recognize an unlimited number of words and sentences, with intents and entities restricted only by the domain of use.

The Challenges of Being Small

- Broad expertise. The domain-specific approach above works great within its domain of expertise, but if someone speaks out of domain, responding intelligently and informatively requires a different model. That could be a cloud-based ChatGPT/Gemini/Llama-type solution or a smaller on-device model that still requires many hundreds of MB of memory (a simple routing sketch follows this list).

- Dialog and memory. The smaller domain-specific models can handle simple responses such as confirmations, but they cannot carry on a dialog that refers back to much earlier sentences.
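One common way to soften the broad-expertise limitation, as mentioned in the first item above, is confidence-based escalation: answer in-domain requests locally and hand everything else to a larger model. The local NLU interface, the 0.8 threshold, and the stand-in models below are assumptions for illustration, not a real API.

```python
# Hedged sketch of confidence-based routing between a tiny on-device
# model and a large fallback model. The interfaces and the 0.8
# threshold are illustrative assumptions, not a real API.

def route(text, local_nlu, big_model, threshold=0.8):
    intent, entities, confidence = local_nlu(text)
    if confidence >= threshold:
        # In-domain: handle locally with a premeditated response.
        return f"local: {intent} {entities}"
    # Out of domain: fall back to a large model (cloud-based, or an
    # on-device model that still needs many hundreds of MB).
    return f"fallback: {big_model(text)}"

# Toy stand-ins for demonstration only:
def local_nlu(text):
    if "light" in text:
        return "turn_on_light", {"room": "kitchen"}, 0.95
    return None, {}, 0.1

big_model = lambda text: "a general-purpose answer"

print(route("turn on the kitchen light", local_nlu, big_model))
print(route("who wrote Hamlet?", local_nlu, big_model))
```

The trade-off is latency and privacy on the fallback path: only out-of-domain requests pay the cost of the big model.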

Giant models and tiny models each have their own advantages. If you are thinking of building an edge-based voice assistant, or just need a tiny, accurate technology like a wake word or speech-to-text, then think Sensory!
