Local LLM in the Browser Powered by Ollama (Part 2)

Andrew Nguonly
4 min read · Jan 12, 2024
ChatGPT 4 prompt: “Create an image of Lumos. Lumos can parse any webpage with custom parsing logic. Lumos can also be configured with custom content chunking for RAG document indexing. Lumos is a reference to a spell cast by a wizard to illuminate a dark area. The image should be generated in the style of Salvador Dali.”

This post is a quick follow-up to my previous article: Local LLM in the Browser Powered by Ollama. It’s highly recommended to read the previous article before proceeding to this one.

The Return of Lumos! 🪄

I previously described two shortfalls that made Lumos inconsistent and inefficient. First, because of Lumos’s simplistic content parsing logic, the application performed well on basic webpages (i.e. mostly static HTML) but poorly on complex ones (e.g. pages containing navigation, ads, etc.). This made the experience of using Lumos across a variety of different websites quite unsatisfactory. Second, the implementation used static content chunking parameters (chunkSize and chunkOverlap), which made content indexing slightly inefficient and likely suboptimal.

Since the launch of Lumos, a new update has been made to address these two issues specifically. The new approach for custom content parsing resolves both previously stated challenges and paves the way for potential new functionality. The design of the update strikes a balance between ease of use, ease of development, and exceptional functionality, while also keeping the size of the Chrome Extension package relatively small.

Custom Content Parsing 🎨

Too much time was spent trying to bypass Chrome’s Manifest V3 security features. I was convinced there was a way to download and execute remote JavaScript (i.e. eval()), but rest assured, there is no way. After noodling on an old idea for too long, I finally arrived at two insights that helped me move past my original thinking.

  1. Parsing the content of a website does not need to be perfect because LLMs are especially good at ignoring irrelevant and extraneous content.
  2. Most web pages have only a handful of relevant “blocks” or pieces of content.

These two points change the goal from parsing all content on a page perfectly to parsing some content on a page imperfectly. The difference is subtle, but it has an enormous impact on the technical implementation.

Query Selector

Instead of writing custom functions to parse every website, the new approach leverages two core Web APIs for content search: querySelector() and querySelectorAll(). A user simply configures the desired selector queries (see contentConfig.ts) to select specific portions of content from a webpage. The implementation automatically executes the queries and passes the returned content to the Chrome extension’s background script for RAG document indexing. Selector queries are specified for each domain so that every website has unique content parsing logic.

Example configuration:

export const contentConfig: ContentConfig = {
  "default": {
    chunkSize: 500,
    chunkOverlap: 100,
    selectors: [
      "body",
    ],
    selectorsAll: [],
  },
  "blogwebsite.com": {
    chunkSize: 500,
    chunkOverlap: 100,
    selectors: [
      "article",
    ],
    selectorsAll: [],
  },
  "forum.com": {
    chunkSize: 100,
    chunkOverlap: 0,
    selectors: [],
    selectorsAll: [
      "comment",
    ],
  },
  ...
}
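
To make the mechanics concrete, here is a minimal sketch of how a content script might apply this configuration: look up the entry for the current domain, run the selector queries, and hand the extracted text to the background script. The helper names and message shape below are illustrative assumptions, not the actual Lumos source.

// Sketch only: apply the per-domain selector config and send the parsed
// content to the background script for RAG document indexing.
import { ContentConfig, contentConfig } from "./contentConfig";

const getConfig = (config: ContentConfig, domain: string) =>
  config[domain] ?? config["default"];

const extractContent = (): string => {
  const { selectors, selectorsAll } = getConfig(
    contentConfig,
    window.location.hostname,
  );

  const parts: string[] = [];

  // selectors: take the first matching element for each query
  for (const query of selectors) {
    const element = document.querySelector(query);
    if (element?.textContent) parts.push(element.textContent);
  }

  // selectorsAll: take every matching element for each query
  for (const query of selectorsAll) {
    document.querySelectorAll(query).forEach((element) => {
      if (element.textContent) parts.push(element.textContent);
    });
  }

  return parts.join("\n");
};

// hand the parsed content to the background script (hypothetical message type)
chrome.runtime.sendMessage({ type: "INDEX_CONTENT", content: extractContent() });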

Selector queries can be precise. For example, a query containing a leading hash (#) indicates that an element with id equal to the value following the hash should be returned.

"businesspage.com": {
chunkSize: 100,
chunkOverlap: 0,
selectors: [
"#location-and-hours", // return element with id="location-and-hours"
],
selectorsAll: [],
},

The query selector APIs support other complex queries that give the user extraordinary flexibility. There is still a burden to understand how the query selector APIs work, but the fine-grained control results in much better RAG indexing performance, which in turn results in better responses from the LLM. Now, inline JavaScript code can be excluded from document indexing, and peripheral “recommended content” no longer interrupts core information on the page.
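
For instance, combinators, class selectors, and :not() pseudo-classes can all be used to home in on relevant content while skipping ads and scripts. The domain and class names in this fragment are hypothetical, just to illustrate the idea:

"newssite.com": {
  chunkSize: 500,
  chunkOverlap: 100,
  selectors: [],
  selectorsAll: [
    "article.story-body > p",          // only paragraphs inside the story body
    "main section:not(.ad-container)", // skip sections flagged as ads
  ],
},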

In the future, the implementation should be extended to support URL patterns instead of just top-level domains. Additionally, the values in contentConfig.ts should be moved to an external app setting to reduce the size of the extension package and preserve the maintainability of the codebase.

Custom Chunking 🍪

All website content is different, which means every website has its own optimal set of RAG chunking parameters. The preceding code examples show how chunkSize and chunkOverlap can be configured for each domain.

For a website containing many back-and-forth messages from users in a thread, it may be preferable to have a small chunk size to encapsulate each message as an individual document. But for a blogging website with long-form content, it may be better to have large chunk sizes with some amount of overlap to ensure that concepts and ideas are never split across document chunks.

The content parsing configuration is extended to include RAG chunking configuration. In the future, other RAG parameters (e.g. search type, retrieved document count) may be added to the configuration, which will allow the user to further optimize the RAG document indexing and retrieval process.
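
The per-domain parameters can be fed straight into a text splitter during indexing. Here is a rough sketch using LangChain’s RecursiveCharacterTextSplitter (an assumption about the splitter; the actual Lumos indexing code may differ), reusing the hypothetical getConfig helper from the earlier sketch:

// Sketch only: split parsed page content into documents using the
// per-domain chunking parameters before embedding and vector store indexing.
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { contentConfig } from "./contentConfig";

const getConfig = (domain: string) =>
  contentConfig[domain] ?? contentConfig["default"];

const splitContent = async (domain: string, content: string) => {
  const { chunkSize, chunkOverlap } = getConfig(domain);
  const splitter = new RecursiveCharacterTextSplitter({ chunkSize, chunkOverlap });
  // returns Document[] ready for embedding and retrieval
  return splitter.createDocuments([content]);
};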

Let the Games Begin! 🏅

Fork the repo and try out your own selector queries! Everyone can have unique content parsers for the websites they visit. Feel free to contribute a parser for a well-known website, and if you need help forming a selector query, don’t hesitate to reach out!

So what’s next for Lumos? There are still many improvements and changes left to be made. The entire RAG workflow can be refactored and optimized, and a new UI can be built for the content parser and Ollama settings. Token streaming is still on the roadmap!

Lastly, I’d like to give a shoutout to my colleague Tony who helped me get out of my own way while thinking through some of these changes. Thanks. 💯

Lumos!

References

  1. Lumos (GitHub)
  2. Local LLM in the Browser Powered by Ollama (Part 1)
  3. Let’s Normalize Online, In-Memory RAG! (Part 3)
  4. Supercharging If-Statements With Prompt Classification Using Ollama and LangChain (Part 4)
  5. Bolstering LangChain’s MemoryVectorStore With Keyword Search (Part 5)
  6. A Guide to Gotchas with LangChain Document Loaders in a Chrome Extension (Part 6)
