Fuzzy offline-first search with leven-match

Espen Klem
norch
Published in
3 min readAug 3, 2020

Fuzzy search for offline-first search solutions is here with leven-match, which is just a little code on top of leven — a Levenshtein distance calculation library. Leven-match takes a query array and returns all words from a word corpus array that are within a given Levenshtein distance.

<script src="leven-match.js"></script>

<script>
const query = ['qvery', 'words', 'levensthein']
const index = ['return', 'all', 'word', 'matches', 'between', 'two', 'arrays', 'within', 'given', 'levenshtein', 'distance', 'intended', 'use', 'is', 'to', 'words', 'in', 'a', 'query', 'that', 'has', 'an', 'index', 'good', 'for', 'autocomplete', 'type', 'functionality,', 'and', 'some', 'cases', 'also', 'searching']

lvm.levenMatch(query, index, {distance: 2})
// returns:
// [ [ 'query' ], [ 'word', 'words' ], [ 'levenshtein' ] ]
</script>

Use cases in a search engine

Autosuggest

Before you do an actual search query, a search engine often have some sort of auto-complete / auto-suggest. Leven-match can help you find the right query before you use it, when combined with a matcher and an array of all the different words used in the index.

Regular search query

When doing an actual search query, you could easily expand your search to all words within given Levenshtein distance. This is easiest set up with an OR-search, but with some more code could also work well with AND-search.

Hit highlighting in your search results

When a search result is returned for a give query it would be good usability to show why each result was returned. This is done with some sort of hit highlighting. Search-index doesn’t have an built in hit-highlighter, so you need to match your query with the words in each search result, and also expand it with some Levenshtein distance matching.

daq-proc query processing and hit highlighting demo, showing both matching with near similar words and hit highlighting.

Further development

To be more helpful and accurate, leven-match needs the ability to set some threshold for when to do matching. Or maybe a grading of distance connected to how many characters the query word has. Like 0 distance for 2 and less character words, 1 distance for 3–5 character words and 2 distance for 6 character words and up?

Part of document- and query processing

Leven-match is part of daq-proc — the document and query processing library that is a collection of different processors created to help with standard search engine-similar tasks that are not covered by the search engine library search-index itself.

And ultimately this will help to create a search engine service so that everyone can have their small, free search engines with whatever content they would like.

--

--

Espen Klem
norch
Editor for

Designing - Creating - Dismantling - Socialising - Nerding. Interaction Designer at Knowit. Tinkering with search when I can.