Web Browsers as a New Home for Language Models?

Andrii Chumak
GoodData Developers
5 min read · May 6, 2024

There has been a lot of buzz around small language models (SLMs) lately. Microsoft has just released Phi-3, while Apple has published its small model and a training framework on Hugging Face, both boasting very high scores given the model size. Certainly, SLMs do not have the same reasoning skills as large language models (LLMs), but many applications do not require advanced AI reasoning. Given that SLMs can run on off-the-shelf hardware, you can see why many companies would opt for these to maintain better control over the model and the data it accesses. Heck, Apple claims their models can run on a phone, which is reportedly what the next version of Siri will utilize under the hood.

But how small can you go before a model loses its usefulness? Can it be small enough to run in a web browser?

As we push forward with AI adoption at GoodData, we uncover more and more use cases where language models can help. One application would be to build a UX Toolbox — a set of AI-enabled utilities that our frontend engineers can use ad hoc. If there is a search field on the page, it might as well be a semantic search. If you have a text area, why not autocomplete the text as the user types? You get the idea.

This is a perfect opportunity to test how useful a tiny model running in a web browser can be. You can try it out on your own with this demo I’ve built, or you can extend it with more models and use cases by forking my PoC code on GitHub.

Use cases

In the PoC mentioned above, I’m testing two use cases: semantic search and typeahead.

I’m using metadata from GoodData’s demo workspace for the semantic search dataset. All metadata objects (like metrics, visualizations, etc.) are embedded as separate documents and cached. Then, whenever a user enters a search term, the term is embedded, and the similarity score for each metadata object is computed. The results are listed in a table with the most relevant items.
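The ranking step described above boils down to a cosine similarity between the query embedding and cached document embeddings. Here is a minimal sketch in TypeScript; the document shape is illustrative, and the commented-out embedding call assumes a Transformers.js feature-extraction pipeline with the `Xenova/all-MiniLM-L6-v2` model, which is not necessarily what the PoC uses.

```typescript
// A cached metadata object with its pre-computed embedding.
interface Doc {
  id: string;
  embedding: number[]; // computed once at load time and cached
}

// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank cached documents against a query embedding, best match first.
function rankDocuments(queryEmbedding: number[], docs: Doc[]): Doc[] {
  return [...docs].sort(
    (x, y) =>
      cosineSimilarity(queryEmbedding, y.embedding) -
      cosineSimilarity(queryEmbedding, x.embedding),
  );
}

// The embedding itself would look roughly like this (not executed here):
//   const embed = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
//   const out = await embed(searchTerm, { pooling: 'mean', normalize: true });
//   const queryEmbedding = Array.from(out.data);
```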

Semantic search

For typeahead, I’m passing the user’s prompt to the model to suggest how the sentence should be completed. This is a very naïve approach with no model fine-tuning, so I’m not expecting high-quality suggestions, but it should still give me a good idea of how feasible such a solution would be.
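A naïve version of this flow can be sketched as a small wrapper that trims the model’s raw output down to a short continuation. The generator function is injected so the sketch stays runtime-agnostic; the commented Transformers.js call and model name are assumptions, not the PoC’s actual setup.

```typescript
// A generator takes the user's partial sentence and returns
// prompt + model continuation, as generative pipelines typically do.
type GenerateFn = (prompt: string) => Promise<string>;

// Return only the newly generated continuation, cut at the first
// sentence boundary so suggestions stay short.
async function suggestCompletion(
  prompt: string,
  generate: GenerateFn,
): Promise<string> {
  const full = await generate(prompt);
  const continuation = full.startsWith(prompt) ? full.slice(prompt.length) : full;
  const match = continuation.match(/^[^.?!]*[.?!]?/);
  return (match ? match[0] : continuation).trimEnd();
}

// With Transformers.js, `generate` might be backed by something like:
//   const generator = await pipeline('text-generation', 'Xenova/distilgpt2');
//   const out = await generator(prompt, { max_new_tokens: 10 });
//   return out[0].generated_text;
```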

Typeahead

Feasibility

When talking about feasibility, there are two areas to consider.

First, the model should have a small enough footprint to be loaded into the client machine’s memory and cached for subsequent use. It’s not uncommon to see a modern progressive web app that consumes 300–400 MB. Some can even eat up a whole GB (I’m looking at you, Jira). So, unless AI is the web app’s main and only feature, a reasonable size budget for the model is under 100 MB. Luckily, plenty of such micro models are available on Hugging Face for us to try.

The models I’m using in the PoC (quantized versions):

Second, the model inference must be fast enough. There is little point in an on-device model if it computes slower than a network request to a hosted model would take. We are again in luck here. Both Hugging Face and TensorFlow have WebAssembly libraries. There are even tools capable of running models with WebGPU, a new Chrome API currently available in beta. Depending on the model and use case, a single computation takes from roughly 20 ms for semantic search up to 1.5 s for typeahead. Both numbers are much faster than a larger hosted model’s execution time plus network overhead.
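Choosing between the two runtimes usually comes down to simple feature detection. A minimal sketch, assuming all we care about is whether the browser exposes `navigator.gpu` (the WebGPU entry point); the backend labels are illustrative, not a specific library’s API:

```typescript
// Prefer WebGPU when the browser exposes it, otherwise fall back to
// WebAssembly. Passing the navigator-like object in keeps this testable.
type Backend = 'webgpu' | 'wasm';

function pickBackend(nav: { gpu?: unknown }): Backend {
  return nav.gpu !== undefined ? 'webgpu' : 'wasm';
}

// In a browser: const backend = pickBackend(navigator);
```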

Inference quality

Is it any good? You might have noticed from the screenshots above that it’s not. At least not without fine-tuning.

Semantic search does not look all that bad. I do believe that with fine-tuning it can be useful, especially if combined with a non-AI full-text search: traditional search handles exact and partial matches, while semantic search covers synonyms. However, unlike LLMs, SLMs struggle with typos and translations, and that limitation can’t be resolved through fine-tuning.
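One way to combine the two approaches can be sketched as blending a crude lexical score with the semantic similarity. The weighting and scoring rules below are arbitrary illustrative choices, not something taken from the PoC:

```typescript
// A document with a semantic score already computed (e.g. cosine
// similarity against the query embedding, assumed to be in [0, 1]).
interface ScoredDoc {
  title: string;
  semanticScore: number;
}

// Blend a crude lexical signal with the semantic score.
// alpha = 0.5 weights the two signals equally; this is a placeholder.
function hybridScore(query: string, doc: ScoredDoc, alpha = 0.5): number {
  const q = query.toLowerCase();
  const t = doc.title.toLowerCase();
  // 1 for an exact match, 0.5 for a substring match, 0 otherwise.
  const lexical = t === q ? 1 : t.includes(q) ? 0.5 : 0;
  return alpha * lexical + (1 - alpha) * doc.semanticScore;
}
```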

The situation is worse with typeahead. It would take a lot of effort to make it work reliably, and even then, there is a high chance of landing in the “uncanny valley”, where suggestions seem to make sense at first but turn out to be invalid given the workspace context. If I had to make this work, I’d try the following:

  • Find a small model with good “core” English and fine-tune it on data analytics questions. Make sure it only suggests a few words at a time and not a complete sentence.
  • Use that model as a base and further fine-tune it with synthetic data based on a specific workspace to take into account specific facts, metrics, and attributes available.
  • Calculate a confidence score and only show the suggestion if the score is high enough.
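The confidence gate in the last point could be approximated by averaging the per-token log-probabilities of a suggestion, a common proxy for model confidence. A minimal sketch, where the threshold is an arbitrary placeholder that would need tuning on real data:

```typescript
// Show a suggestion only if the mean per-token log-probability
// clears a threshold. Log-probs are <= 0; closer to 0 means more
// confident. The -1.5 default is purely illustrative.
function shouldShowSuggestion(
  tokenLogProbs: number[],
  threshold = -1.5,
): boolean {
  if (tokenLogProbs.length === 0) return false;
  const avg = tokenLogProbs.reduce((sum, lp) => sum + lp, 0) / tokenLogProbs.length;
  return avg >= threshold;
}
```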

Overall, a larger and more capable model would probably fit better for this task, even if it would be somewhat slower.

Conclusions

At GoodData, we are planning to host a mid-sized SLM or connect to an LLM, backed by a vector database, to cover various use cases, including semantic search and typeahead. Despite this, I am still captivated by the idea of running a tiny model directly in the user’s browser. Its suitability depends on the specific use case, and I can imagine many scenarios where it would be sufficient. At the very least, we know it is technically feasible.

With the advancements in edge AI, who knows? Perhaps in a few years, we’ll have a new, nice, shiny WebLLM API as a standard part of browser APIs! One can dream…

Want to Learn More?

If you’d like to discuss any of our ideas regarding AI (or anything else), feel free to join our Slack community!

Or, if you’d like to try our new experimental features (AI, Machine Learning, and more), feel free to sign up for our Labs Environment.
