Nevertheless, with the increase of usage among practicians, along with our recent introduction of Medispan, support tickets related to mismatched search queries have become a cause of pain for the team. We knew our users deserve better.
First we needed to understand the current landscape on psql search. Luckily PgSearch documentation pointed us in the right direction: along with full-text search, other algorithms are supported through PostgreSQL contrib packages: fuzzystrmatch (Levenshtein, Double Metaphone) & pg_trgm (Trigrams).
Levenshtein (a.k.a. match difference)
Levenshtein distance is a string metric for measuring the difference between two sequences. Informally, the Levenshtein distance between two words is the minimum number of single-character edits (i.e. insertions, deletions or substitutions) required to change one word into the other.
Soundex (a.k.a. match soundalikes)
Soundex is a phonetic algorithm for indexing names by sound, as pronounced in English. The goal is for homophones to be encoded to the same representation so that they can be matched despite minor differences in spelling.
Double Metaphone (a.k.a. better soundex)
Metaphone is a phonetic algorithm… It fundamentally improves on the Soundex algorithm… which does a better job of matching words and names which sound similar.
Trigrams (match misspellings)
Trigrams are a special case of the n-gram, where n is 3. They are often used in natural language processing for doing statistical analysis of texts.
PgSearch makes fuzzystrmatch straightforward to implement:
$ rails g pg_search:migration:dmetaphone
$ rake db:migrateclass Word < ActiveRecord::Base
:against => :spelling,
:using => :dmetaphone
four = Word.create! :spelling => 'four'
far = Word.create! :spelling => 'far'
fur = Word.create! :spelling => 'fur'
five = Word.create! :spelling => 'five'
Word.that_sounds_like("fir") # => [four, far, fur]
On the other side, pg_trgm queries are more expensive computationally. Creating a separate GIST (Generalized Index Search Tree) can be a palliative by increasing performance, although creating the index itself is expensive as well, and might involve some downtime. PgSearch makes pg_trgm a breeze to implement too:
class Website < ActiveRecord::Base
:against => :name,
:using => :trigram
yahooo = Website.create! :name => "Yahooo!"
yohoo = Website.create! :name => "Yohoo!"
gogle = Website.create! :name => "Gogle"
facebook = Website.create! :name => "Facebook"
Website.kinda_spelled_like("Yahoo!") # => [yahooo, yohoo]
It is nice to have rails migrations that activate the contrib packages in Postgres: