Indexing and searching of email inside the web browser

Runbox
Jun 3, 2019

How we used Xapian, WebAssembly, and HTML5 Canvas to create an instant email experience

Email is a challenging medium to deal with computationally for several reasons: spam, security, and myriad combinations of character encodings, to name a few. Not least, the sheer volume and ever-changing nature of the typical email corpus make it difficult to process.

Email users receive an increasing number of messages of all sorts, and processing email has become a resource-intensive task for both computers and people. Studies have shown that while email is central to most people’s professional and personal lives, it’s also among their most tedious and time-consuming tasks.

Because email is so essential to our workflow, any delay in the response from the email client, app, or browser quickly becomes a nuisance and makes the task more tedious.

Difficulty finding just the message you need in a sea of other email quickly turns into frustration, and makes it more likely that you will multitask, postpone the task, or abandon it altogether.

Any improvement in email listing, searching, and rendering performance can translate into a more efficient and streamlined work day where more gets done in the same amount of time.

We therefore decided to try to remove all such delays by moving the indexing and searching of email from the server to the client, and to offer an instantaneous email experience.

Enter Xapian, WebAssembly, and HTML5 Canvas

To accomplish this we used Xapian, an open source search engine library written in C++. By compiling Xapian with the Emscripten compiler we were able to build the library for WebAssembly, which is increasingly supported in modern browsers.

The library is available on GitHub and enables a fully featured search index in your browser. This is demonstrated in Runbox 7, where an entire email corpus can be searched without interacting with a server.

By targeting WebAssembly we are also able to reuse the same open source search engine code on the server side with Node.js. It also means that we don’t need to create separate builds for different operating systems. Furthermore, since the code runs inside a JavaScript sandbox, we benefit from the security features that come with it.

We believe this is both safer and more portable than native builds, without losing much in performance. If anything, we’ve seen a gain in development productivity and performance through the tight integration with the JavaScript runtime that comes with WebAssembly. Compared with traditional scripting-language bindings to native libraries, we have found WebAssembly to be the better approach for our purposes.
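To give a feel for what that integration looks like from the JavaScript side, here is a minimal sketch in TypeScript. It is not the Xapian build: the module below is a tiny hand-assembled WebAssembly binary, chosen for illustration, that exports a single `add` function. The point is only that an exported WebAssembly function is called like any other function.

```typescript
// A hand-assembled WebAssembly module exporting add(a: i32, b: i32): i32.
// This stands in for a real compiled library and is purely illustrative.
const wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, // magic number "\0asm"
  0x01, 0x00, 0x00, 0x00, // binary format version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type section: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00, // function section: one function using type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export section: "add" -> function 0
  0x0a, 0x09, 0x01, 0x07, 0x00, // code section: one body, no locals
  0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b, // local.get 0, local.get 1, i32.add, end
]);

// Synchronous compilation is acceptable for a module this small.
const wasmModule = new WebAssembly.Module(wasmBytes);
const wasmInstance = new WebAssembly.Instance(wasmModule);

// Exports are untyped at this level, so we assert the signature we expect.
const add = wasmInstance.exports.add as (a: number, b: number) => number;

console.log(add(2, 40)); // 42
```

For a library-sized module you would normally compile asynchronously, for example with `WebAssembly.instantiate`, so that the page stays responsive while the module loads.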

Storing the search index in the browser would of course not work for a search engine covering the entire World Wide Web, but an email account needs only a limited amount of storage. The search index can therefore fit in a browser storage mechanism such as IndexedDB, and even in memory while in use. Since the typical index size is 5–10% of the email corpus size, the index for 1 GB of email rarely exceeds 100 MB, which is manageable on most devices.
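The arithmetic behind that estimate can be sketched as follows. The function name and the use of decimal units (1 GB = 1000 MB) are assumptions made for this illustration; the 5% and 10% ratios are the ones quoted above.

```typescript
// Rough index size estimate based on the 5-10% ratio quoted in the text.
// Decimal units are an assumption of this sketch.
const MB = 1000 * 1000;
const GB = 1000 * MB;

interface IndexSizeEstimate {
  lowBytes: number;
  highBytes: number;
}

// estimateIndexSize is a hypothetical helper, not part of any library.
function estimateIndexSize(
  corpusBytes: number,
  lowRatio = 0.05,
  highRatio = 0.1,
): IndexSizeEstimate {
  if (corpusBytes < 0) {
    throw new RangeError("corpusBytes must be non-negative");
  }
  return {
    lowBytes: Math.round(corpusBytes * lowRatio),
    highBytes: Math.round(corpusBytes * highRatio),
  };
}

const estimate = estimateIndexSize(1 * GB);
console.log(`${estimate.lowBytes / MB} MB to ${estimate.highBytes / MB} MB`);
// prints "50 MB to 100 MB"
```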

In our Runbox 7 app the performance of the WebAssembly Xapian port is matched by the message listing, which is rendered with HTML5 Canvas and enables handling of large tables and quick re-rendering. Regular HTML tables would not be suitable because browsers struggle to render tables containing just a few thousand rows, let alone tens or hundreds of thousands, or even millions, of rows. The Canvas element is contained in an Angular 2+ user interface written in HTML/TypeScript with Angular Material UI elements, and this entire codebase is available on GitHub as well.
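One common reason a canvas-based list scales this way is that only the rows currently in view need to be drawn, no matter how many rows exist. The sketch below illustrates that general technique; it is not taken from the Runbox 7 codebase, and `visibleRows`, `drawRows`, and `TextSurface` are hypothetical names. `TextSurface` is a minimal stand-in for the `fillText` method of a 2D canvas context, so the example runs without a browser.

```typescript
interface RowWindow {
  first: number; // index of the first visible row
  last: number; // index one past the last visible row
}

// Work out which rows intersect the viewport for a given scroll position.
function visibleRows(
  scrollTop: number,
  viewportHeight: number,
  rowHeight: number,
  totalRows: number,
): RowWindow {
  if (rowHeight <= 0) throw new RangeError("rowHeight must be positive");
  const first = Math.min(totalRows, Math.max(0, Math.floor(scrollTop / rowHeight)));
  const end = Math.min(totalRows, Math.ceil((scrollTop + viewportHeight) / rowHeight));
  return { first, last: Math.max(first, end) };
}

// Minimal subset of a 2D canvas context used by this sketch.
interface TextSurface {
  fillText(text: string, x: number, y: number): void;
}

// Draw only the rows inside the window and return how many were drawn.
function drawRows(
  surface: TextSurface,
  subjectAt: (index: number) => string,
  rows: RowWindow,
  scrollTop: number,
  rowHeight: number,
): number {
  for (let i = rows.first; i < rows.last; i++) {
    // Position each row relative to the scrolled viewport.
    const baseline = i * rowHeight - scrollTop + rowHeight * 0.7;
    surface.fillText(subjectAt(i), 8, baseline);
  }
  return rows.last - rows.first;
}

// Usage: a million-row list scrolled to row 100 draws only 25 rows.
const drawn: string[] = [];
const fakeSurface: TextSurface = { fillText: (text) => { drawn.push(text); } };
const rowsInView = visibleRows(2400, 600, 24, 1000000);
drawRows(fakeSurface, (i) => `Message ${i}`, rowsInView, 2400, 24);
console.log(rowsInView.first, rowsInView.last, drawn.length); // 100 125 25
```

The cost of a repaint then depends on the viewport height rather than on the size of the mailbox.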

Benefits: Speed, incremental search and privacy

There are several benefits to having a search index in the browser rather than on the server. First of all there’s a significant gain in speed, since queries and results no longer need a round trip between the two and searches therefore incur no network latency.

Complete search results can be returned instantly as you type (incremental search), and more features can be offered when it comes to sorting and counting the number of hits. Because search results are displayed instantly you become more efficient, since you can adjust your search query interactively.
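The idea of incremental search against a local index can be sketched with a toy in-memory inverted index. This is a deliberately simplified stand-in and not how Xapian works internally; `TinyIndex`, `tokenize`, and the sample messages are invented for the illustration. Each query word is treated as a prefix, so results narrow with every keystroke and no request leaves the device.

```typescript
interface Message {
  id: number;
  subject: string;
}

// Lowercase the text and split it into alphanumeric terms.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// A toy inverted index: term -> set of message ids containing it.
class TinyIndex {
  private readonly postings = new Map<string, Set<number>>();

  add(message: Message): void {
    for (const term of tokenize(message.subject)) {
      const ids = this.postings.get(term) ?? new Set<number>();
      ids.add(message.id);
      this.postings.set(term, ids);
    }
  }

  // Ids of messages containing any term that starts with the prefix.
  private idsWithPrefix(prefix: string): Set<number> {
    const matches = new Set<number>();
    this.postings.forEach((ids, term) => {
      if (term.startsWith(prefix)) ids.forEach((id) => matches.add(id));
    });
    return matches;
  }

  // Every query word must match (as a prefix) somewhere in the message.
  search(query: string): number[] {
    const words = tokenize(query);
    if (words.length === 0) return [];
    const perWord = words.map((word) => this.idsWithPrefix(word));
    const hits = perWord.reduce(
      (acc, ids) => new Set(Array.from(acc).filter((id) => ids.has(id))),
    );
    return Array.from(hits).sort((a, b) => a - b);
  }
}

const index = new TinyIndex();
index.add({ id: 1, subject: "Invoice for March" });
index.add({ id: 2, subject: "Meeting invitation" });
index.add({ id: 3, subject: "March newsletter" });
index.add({ id: 4, subject: "Invoice reminder" });

// Simulate typing a query one step at a time.
for (const typed of ["in", "invo", "invo ma"]) {
  console.log(JSON.stringify(typed), index.search(typed));
}
```

The linear scan over all terms keeps the sketch short; a real search engine stores its terms in a structure that makes prefix lookups much cheaper.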

Another benefit is that no server is able to monitor what you are searching for in your local index or which email message you happen to be viewing, which contributes to protecting your privacy. You can even search your content while offline, and display any content that’s cached locally in the browser.

By also using an operating system that offers encrypted file systems on your devices you ensure that your locally stored data is secured. When not using your own personal device you can enable private browsing mode so that no data remains on the device. Or you may instead choose to use the server’s search capabilities, since the same WebAssembly code runs on the server and can provide the same APIs there as in the browser.

And if you have coding skills, having full access to the search index and the libraries to interact with it opens up possibilities for custom processing of the index. In an email scenario this could be anything from smart searches to monitoring and alerts.

Conclusion

Modern web applications increasingly use the browser’s local storage for caching, storing settings and content, and more. Utilizing the browser’s storage and the local device’s processing power can provide greatly improved search performance and even offline capabilities.

Especially in the context of email we can demonstrate substantial benefits to searching and rendering performance. By bridging the gap between email clients like Thunderbird and regular server-based webmail services we combine the best of both worlds into an instantaneous, efficient, and flexible browser-based solution.

Because the index is stored and queried on the client, the user’s privacy is protected to a greater extent, and the data is available for any kind of inspection, analysis, or processing the user desires.

Runbox

Providing fast, secure, and privacy protected email services from Norway. See https://blog.runbox.com and https://status.runbox.com for updates.