Dev Notes: Fuzzy Search on PostgreSQL

Ben Orozco
May 6, 2016 · 3 min read

My previous experience with search in Rails was 5+ years ago, then I used Sphinx as a Full-Text external search engine, which seems to have gone out of fashion.

At Nimbo X we use pg_search rails gem for searching purposes, with Full-Text already enabled in PostgreSQL itself, which has worked for us so far.

Nevertheless, with the increase of usage among practicians, along with our recent introduction of Medispan, support tickets related to mismatched search queries have become a cause of pain for the team. We knew our users deserve better.


First we needed to understand the current landscape on psql search. Luckily PgSearch documentation pointed us in the right direction: along with full-text search, other algorithms are supported through PostgreSQL contrib packages: fuzzystrmatch (Levenshtein, Double Metaphone) & pg_trgm (Trigrams).


Levenshtein (a.k.a. match difference)

Levenshtein distance is a string metric for measuring the difference between two sequences. Informally, the Levenshtein distance between two words is the minimum number of single-character edits (i.e. insertions, deletions or substitutions) required to change one word into the other.

Soundex (a.k.a. match soundalikes)

Soundex is a phonetic algorithm for indexing names by sound, as pronounced in English. The goal is for homophones to be encoded to the same representation so that they can be matched despite minor differences in spelling.

Double Metaphone (a.k.a. better soundex)

Metaphone is a phonetic algorithm… It fundamentally improves on the Soundex algorithm… which does a better job of matching words and names which sound similar.

Trigrams (match misspellings)

Trigrams are a special case of the n-gram, where n is 3. They are often used in natural language processing for doing statistical analysis of texts.



PgSearch makes fuzzystrmatch straightforward to implement:

$ rails g pg_search:migration:dmetaphone
$ rake db:migrate
class Word < ActiveRecord::Base
include PgSearch
pg_search_scope :that_sounds_like,
:against => :spelling,
:using => :dmetaphone

four = Word.create! :spelling => 'four'
far = Word.create! :spelling => 'far'
fur = Word.create! :spelling => 'fur'
five = Word.create! :spelling => 'five'

Word.that_sounds_like("fir") # => [four, far, fur]


On the other side, pg_trgm queries are more expensive computationally. Creating a separate GIST (Generalized Index Search Tree) can be a palliative by increasing performance, although creating the index itself is expensive as well, and might involve some downtime. PgSearch makes pg_trgm a breeze to implement too:

class Website < ActiveRecord::Base
include PgSearch
pg_search_scope :kinda_spelled_like,
:against => :name,
:using => :trigram

yahooo = Website.create! :name => "Yahooo!"
yohoo = Website.create! :name => "Yohoo!"
gogle = Website.create! :name => "Gogle"
facebook = Website.create! :name => "Facebook"

Website.kinda_spelled_like("Yahoo!") # => [yahooo, yohoo]

Rails Migrations

It is nice to have rails migrations that activate the contrib packages in Postgres:



Fuzzy Search



The Backlog by Ecaresoft

Sharing our journey: from software development to company…

The Backlog by Ecaresoft

Sharing our journey: from software development to company culture and productivity hacks.

Ben Orozco

Written by

Healthtech Hacker — Full Stack Dev — Open Source & Crypto Enthusiast — CTO 🌳 @HealthTreeNet — Previously @ecaresoft & @Nimbo_X

The Backlog by Ecaresoft

Sharing our journey: from software development to company culture and productivity hacks.