How to add full text search to your website

Sam Dutton
Feb 9, 2018 · 16 min read

Many types of website need efficient, accurate search.

This article explains server-side and client-side alternatives, and shows how to implement search that works offline.

tl;dr

There are lots of ways to do search:

All the search engines, databases and managed services discussed in this article have integrations across multiple platforms, frameworks and languages — not just for the web. Whatever your target platforms, there are several key considerations when choosing a solution:

This is just an overview of some of the issues. Pros and cons for each option are explored in more detail below.

What is ‘full text search’?

For a small amount of simple textual data it’s possible to provide basic search functionality via simple string matching. For example, using JavaScript you could store product data for a small online shop as an array of objects in a JSON file, then fetch the file and iterate over each object to find matches.
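As a rough illustration of that kind of string matching, here is a minimal sketch. The product data and the simpleSearch name are made up for this example; in practice the array would come from a fetched JSON file.

```javascript
// Hypothetical product data; in practice this would be fetched from a JSON file.
const products = [
  { id: 1, title: 'Red cotton shirt', description: 'Short-sleeved summer shirt' },
  { id: 2, title: 'Blue denim jacket', description: 'Classic jacket with red lining' },
  { id: 3, title: 'Green wool scarf', description: 'Warm winter scarf' }
];

// Return every product whose title or description contains the query,
// ignoring case. This scans all of the data on every query.
function simpleSearch(items, query) {
  const needle = query.trim().toLowerCase();
  if (needle === '') {
    return [];
  }
  return items.filter((item) =>
    `${item.title} ${item.description}`.toLowerCase().includes(needle)
  );
}

console.log(simpleSearch(products, 'red').map((p) => p.id));
console.log(simpleSearch(products, 'shirts').map((p) => p.id));
```

The second query finds nothing, because ‘shirts’ is not a literal substring of any product text. That is the kind of gap full text search is meant to close.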

This simplistic approach can be better than nothing, but successful search needs more flexible functionality to find relevant results:

In practice some of these features may not work as well as expected, especially in multilingual implementations or for structured, shorter-length data. Some search engine developers have made the case for preferring alternatives.

High quality search implementations provide additional features on the input side:

For a global audience, all this functionality must work across different languages, character sets, text directionality, and geographical locations, and potentially handle linguistic and cultural differences. Stemming for Japanese, for example, is very different from stemming for English. Search engines and search services all provide different approaches to internationalisation and localisation.
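One small piece of this is normalising text so that accented and unaccented forms match. The sketch below uses the standard String.prototype.normalize method; the foldText name is hypothetical, and whether accent folding is appropriate at all depends on the language (in some languages an accented letter is a distinct letter), so treat this as an illustration rather than a recommendation.

```javascript
// Fold case and strip combining marks so that 'Café' and 'cafe' compare equal.
// Assumption: accent-insensitive matching is acceptable for the target language.
function foldText(text, locale = 'en') {
  return text
    .normalize('NFD')          // split letters from their combining marks
    .replace(/\p{M}/gu, '')    // remove the combining marks
    .toLocaleLowerCase(locale);
}

// True if the folded text contains the folded query.
function foldedIncludes(text, query) {
  return foldText(text).includes(foldText(query));
}

console.log(foldText('Crème Brûlée'));
console.log(foldedIncludes('Café au lait', 'cafe'));
```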

What is a search index?

It’s possible to search a small amount of data simply by scanning all of the data for every query. As the quantity of data increases, this becomes slow and inefficient.

In its simplest form a search indexer gets around this problem by analysing a data set and building an index of search terms (words or phrases) and their location within the data — a bit like an index at the back of a book. The search implementation can then look up the query in the index rather than scanning all of the data. Indexers can also implement features such as stop-word handling and stemming.
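To make the idea concrete, here is a minimal sketch of an inverted index in plain JavaScript. It is not how Lunr or any other library works internally: the stop-word list, the deliberately naive stemmer and the sample documents are all assumptions made for illustration, and the index records only which documents contain a term, not where in the document the term occurs.

```javascript
// Hypothetical stop-word list, for illustration only.
const STOP_WORDS = new Set(['a', 'an', 'and', 'in', 'of', 'the', 'to']);

// Very naive stemmer: strips a trailing 'ing' or 's'.
// Real indexers use proper algorithms such as the Porter stemmer.
function stem(word) {
  return word.length > 3 ? word.replace(/(ing|s)$/, '') : word;
}

// Lowercase, split on non-alphanumerics, drop stop words, then stem.
function tokenize(text) {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((word) => word && !STOP_WORDS.has(word))
    .map(stem);
}

// Map each term to the set of ids of the documents that contain it.
function buildIndex(docs) {
  const index = new Map();
  for (const doc of docs) {
    for (const term of tokenize(doc.text)) {
      if (!index.has(term)) {
        index.set(term, new Set());
      }
      index.get(term).add(doc.id);
    }
  }
  return index;
}

// Look up each query term and return ids of documents containing all of them.
function search(index, query) {
  const terms = tokenize(query);
  if (terms.length === 0) {
    return [];
  }
  const postings = terms.map((term) => index.get(term) || new Set());
  return [...postings[0]].filter((id) => postings.every((set) => set.has(id)));
}

const docs = [
  { id: 'a', text: 'The walking boots of the season' },
  { id: 'b', text: 'Shirts and boots' },
  { id: 'c', text: 'A shirt to walk in' }
];
const index = buildIndex(docs);

console.log(search(index, 'walks'));
console.log(search(index, 'boots shirt'));
```

Each query is now a handful of map lookups rather than a scan of every document, and stemming means ‘walks’ finds both ‘walking’ and ‘walk’.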

You can view an example here of a simple index built for this demo using the Lunr JavaScript library:

What is a document?

Confusingly, the word ‘document’ is used with two different meanings in relation to search engines:

Providing high quality search results for a set of binary files can be much more complex than searching structured textual data. Imagine a video archive catalogue consisting of millions of legacy files with multiple different binary formats and a variety of content structures — how can you provide consistent and accurate search across the entire document set?

What else do you need to think about?

FT.com’s recent search engine update needed expert tweaking to work well. Their blog post about the project describes how the implementation initially tended to return results that made sense but were not really relevant, such as a plethora of articles that mention Trump but are not ‘about’ him. They also had to ensure that page ranking preferred recent news stories rather than always returning what might otherwise seem to be ‘most relevant’.
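The FT post doesn’t need to be reproduced here, but the general idea of favouring recent stories can be sketched as a relevance score multiplied by a recency decay. This is not FT.com’s method: the function names, the exponential decay and the halfLifeDays parameter are all assumptions chosen for illustration.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Combine a text relevance score with an exponential recency decay.
// halfLifeDays is a hypothetical tuning parameter: a story that is
// halfLifeDays old keeps half of its relevance score.
function recencyWeightedScore(relevance, publishedAt, now, halfLifeDays = 30) {
  const ageDays = Math.max(0, (now - publishedAt) / DAY_MS);
  return relevance * Math.pow(0.5, ageDays / halfLifeDays);
}

// Return a new array of results with a combined score, highest first.
function rankByRelevanceAndRecency(results, now = Date.now(), halfLifeDays = 30) {
  return results
    .map((result) => ({
      ...result,
      score: recencyWeightedScore(result.relevance, result.publishedAt, now, halfLifeDays)
    }))
    .sort((a, b) => b.score - a.score);
}

// Hypothetical results: timestamps are milliseconds since the epoch.
const now = Date.UTC(2018, 1, 9);
const results = [
  { id: 'old-analysis', relevance: 0.9, publishedAt: Date.UTC(2017, 1, 9) },
  { id: 'recent-news', relevance: 0.6, publishedAt: Date.UTC(2018, 1, 8) }
];

console.log(rankByRelevanceAndRecency(results, now).map((r) => r.id));
```

With a 30 day half-life, the year-old article drops below yesterday’s story even though its raw relevance is higher. Choosing the decay is exactly the kind of tweaking that needs testing against real queries.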


So… What are the options?

Search engine

Run your own search engine on a server. The two most popular are Elasticsearch and Solr, both open source.

Pro

Con

Managed search service

Use a commercial service such as Algolia or Amazon CloudSearch, or a platform such as Firebase or Cloudant that integrates with third-party search services (Firebase uses Algolia).

Pro

Con

Database with built-in search

NoSQL databases including MongoDB support full text search. CouchDB can implement search using couchdb-lucene, or you can use pre-built alternatives such as Couchbase.

Full text search is also supported by open source relational databases such as MySQL and PostgreSQL as well as many commercial alternatives.

Pro

Con

Google Site Search and Google Custom Search Engine

Google Site Search is deprecated, but Google Custom Search Engine (CSE) is still available. The differences between the two are explained here.

You can try CSE with the example here, which searches products from the Polymer Shop project.

If you don’t want ads, and you’re happy to pay (or the free quota is enough), the CSE API might work for you.

Pro

Con

Client-side search

The Cache and Service Worker APIs enable websites to work offline and build resilience to variable connectivity. Local caching combined with client-side search can enable a number of use cases. For example:

Client-side search can be particularly compelling for a relatively small set of data that doesn’t change much. For example, the demo here searches Shakespeare’s plays and poems:

Client-side JavaScript full text search libraries include Lunr and Elasticlunr.

You provide a set of ‘documents’ in JSON format, such as a product list, then create an index. Here’s how to do that with the Elasticlunr Node module:

To initiate search on the client, you first need to fetch the index data and load it:

To enable offline search, the index file can be stored by the client using the Cache API. Alternatively, you could fetch document data and build the index on the client, then serialise and store that locally.
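The build, serialise and restore cycle can be sketched without any library. This is not the Lunr or Elasticlunr API: the function names and sample documents are hypothetical, and the index is a bare term-to-ids map with none of the features a real library provides. The JSON string it produces is the kind of value you would store locally, for example as a response body in the Cache API.

```javascript
// Build a Map from each term to the ids of the documents that contain it.
function buildIndex(docs) {
  const index = new Map();
  for (const doc of docs) {
    // Use a Set so each document is recorded once per term.
    const terms = new Set(doc.text.toLowerCase().match(/[a-z0-9]+/g) || []);
    for (const term of terms) {
      if (!index.has(term)) {
        index.set(term, []);
      }
      index.get(term).push(doc.id);
    }
  }
  return index;
}

// A Map is not directly JSON-serialisable, so store it as [term, ids] pairs.
function serialiseIndex(index) {
  return JSON.stringify([...index]);
}

// Rebuild the Map from the stored JSON string.
function restoreIndex(json) {
  return new Map(JSON.parse(json));
}

// Return the ids of documents containing the term, or an empty array.
function lookup(index, term) {
  return index.get(term.toLowerCase()) || [];
}

// Hypothetical documents, for illustration only.
const docs = [
  { id: 1, text: 'Red cotton shirt' },
  { id: 2, text: 'Blue cotton scarf' }
];

const stored = serialiseIndex(buildIndex(docs)); // string to keep on the client
const restored = restoreIndex(stored);           // e.g. after an offline reload

console.log(lookup(restored, 'cotton'));
console.log(lookup(restored, 'Shirt'));
```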

And finally:

WebSQL enabled fast text matching (demo) and full text search (demo).

However, the WebSQL standard has been discontinued and only ever had partial browser support.

Full text search in WebSQL is now being removed.

Pro

Con

Client-side search with automated replication

JavaScript libraries such as PouchDB and SyncedDB do much the same job as the client-side libraries described above, but they also offer the ability to automatically synchronise data on the client with a back-end database, optionally in both directions.

You can try an offline-enabled PouchDB demo here.

Pro

Con


What about UX and UI?

Query input

People have come to expect a high standard of design for search query input, particularly on shopping sites. Functionality such as synonym matching and autosuggest is now the norm.
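As a very small sketch of those two features, the example below maps a typed synonym to a preferred term and then suggests entries by prefix. The synonym table, the vocabulary and the suggest function are all hypothetical; search engines and managed services provide far more capable versions of both.

```javascript
// Hypothetical synonym table and vocabulary, for illustration only.
const SYNONYMS = new Map([
  ['sneakers', 'trainers'],
  ['pants', 'trousers']
]);
const VOCABULARY = ['trainers', 'trousers', 'track pants', 'trench coat', 'scarf'];

// Suggest up to `limit` vocabulary entries that start with the typed text,
// after mapping a known synonym to the catalogue's preferred term.
function suggest(input, limit = 5) {
  const typed = input.trim().toLowerCase();
  if (typed === '') {
    return [];
  }
  const prefix = SYNONYMS.get(typed) || typed;
  return VOCABULARY
    .filter((entry) => entry.startsWith(prefix))
    .sort()
    .slice(0, limit);
}

console.log(suggest('tr'));
console.log(suggest('sneakers'));
```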

For example, Asos does a great job of highlighting matches and suggesting other categories and brands:

This article has other great examples of high quality search, and provides sensible guidelines for search input design.

Make sure to understand the different types of search required by your users. For example, an online store needs to be flexible about the way people want to find what they’re looking for:

Search results

Search result content and presentation are critical:

Screwfix provides checkout options right on the search results page, and automatically transforms a query (in this case bosch drill) into a filtered set of results, which each include review ratings and a sensible level of product detail:

By contrast, Made keeps results clean and uncluttered, which suits the brand:

ft.com orders news stories by date, and suggests sensible options for refining results:

Getty Images focuses on… images! Filtering options are provided based on available metadata, along with layout and display options:

Cross-functional considerations

Search doesn’t live in a vacuum. Successful implementations need good communication between different stakeholders:

Searching audio and video

If you have video or audio content, you can enable users to find it by searching metadata such as titles and descriptions.

There are two problems with this approach:

If your audio or video has captions, subtitles or other kinds of ‘timed metadata’, search and navigation can be much more granular.

For example, the demo here searches Google developer video captions, and enables navigation to specific points within videos. The content in the demo is hosted on YouTube, which uses the SRT format for captions. You can see this in action if you view the transcript for a manually captioned YouTube video such as the one here:

Websites can use subtitles and captions with a track element (demo here) using the WebVTT format, which is very similar to SRT:
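To show how caption text can drive search and navigation, here is a minimal sketch that parses cue timings from WebVTT text and finds the cues matching a query. The caption content and function names are made up for this example, and the parser only handles simple cues: it ignores cue settings, NOTE blocks and styling.

```javascript
// Hypothetical WebVTT content, for illustration only.
const vtt = `WEBVTT

00:00:01.000 --> 00:00:04.000
Welcome to the session on service workers.

00:00:04.500 --> 00:00:09.000
Caching enables offline search.
`;

// Convert an 'hh:mm:ss.mmm' or 'mm:ss.mmm' timestamp to seconds.
function toSeconds(timestamp) {
  return timestamp
    .split(':')
    .reduce((total, part) => total * 60 + parseFloat(part), 0);
}

// Parse each blank-line-separated block that contains a timing line.
function parseCues(vttText) {
  const cues = [];
  for (const block of vttText.replace(/\r\n?/g, '\n').split(/\n{2,}/)) {
    const lines = block.split('\n').filter((line) => line !== '');
    const timingIndex = lines.findIndex((line) => line.includes('-->'));
    if (timingIndex === -1) {
      continue; // the WEBVTT header or another non-cue block
    }
    const [start, rest] = lines[timingIndex].split('-->');
    cues.push({
      start: toSeconds(start.trim()),
      end: toSeconds(rest.trim().split(/\s+/)[0]),
      text: lines.slice(timingIndex + 1).join(' ')
    });
  }
  return cues;
}

// Return the cues whose text contains the query, ignoring case.
function searchCues(cues, query) {
  const needle = query.toLowerCase();
  return cues.filter((cue) => cue.text.toLowerCase().includes(needle));
}

const matches = searchCues(parseCues(vtt), 'offline');
console.log(matches);
```

In a page, each match’s start value could be assigned to a video element’s currentTime to jump to that point in the video.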

Testing

Whatever you do, test changes and keep an eye on analytics and search logging. Build discount usability testing into your workflow.

Make it easy for non-techies to monitor and understand search statistics:
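One way to do this is to summarise raw search logs into a few numbers that anyone can read. The sketch below assumes a hypothetical log format in which each entry records a query and the number of results returned; the field names and the summariseSearchLog function are not taken from any particular analytics tool.

```javascript
// Hypothetical log entries: each records a query and how many results it returned.
const searchLog = [
  { query: 'bosch drill', resultCount: 42 },
  { query: 'Bosch Drill', resultCount: 42 },
  { query: 'cordless sander', resultCount: 7 },
  { query: 'drlil', resultCount: 0 },
  { query: 'drlil', resultCount: 0 }
];

// Count how often each (case-folded) query was made, and which queries
// returned nothing. Both lists are sorted with the most frequent first.
function summariseSearchLog(entries) {
  const counts = new Map();
  const zeroResults = new Map();
  for (const { query, resultCount } of entries) {
    const key = query.trim().toLowerCase();
    counts.set(key, (counts.get(key) || 0) + 1);
    if (resultCount === 0) {
      zeroResults.set(key, (zeroResults.get(key) || 0) + 1);
    }
  }
  const byCountDesc = (a, b) => b[1] - a[1];
  return {
    totalSearches: entries.length,
    topQueries: [...counts].sort(byCountDesc),
    zeroResultQueries: [...zeroResults].sort(byCountDesc)
  };
}

const summary = summariseSearchLog(searchLog);
console.log(summary);
```

Queries that repeatedly return nothing, such as the misspelling in this sample, are often the quickest wins: they point to missing synonyms, spelling tolerance or content gaps.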


Find out more

Installing and using search engines and managed services

The two articles below explain in more detail how to set up and use several of the most popular search engines and managed services:

Articles, books and podcasts


Thank you to all the people who helped with this article.

Dev Channel

Developers Channel - the thoughts, opinions and musings from members of the Chrome team.

Sam Dutton

Written by

I am a Developer Advocate for Google Chrome. I maintain simpl.info: simplest possible examples of HTML, CSS and JavaScript. South Australian, living in London.
