Full Text Search with Postgres and Rails

Skylar S
SkyTech
Published in
4 min readFeb 22, 2023

The “TLDR”:

  • A basic full text search implementation in Postgres looks like this:
    tsvector_column @@ plainto_tsquery(user_input)
  • The SQL to generate the tsvector column from a table field ( and continuously update) is
ALTER TABLE mytable
ADD COLUMN searchable tsvector GENERATED ALWAYS AS (
to_tsvector('english', coalesce(column, '')), 'A') ||
) STORED;
  • You should add a GIN index to this column as well
  • Alternatively, you can use the pg_trgm extension for “fuzzy” search

Why full text search?

The simplest way of implementing search in a relational database is using a LIKE% query. However, using this query simply returns matching all result containing the targeted phrase without any ranking or indication of relevance. In contrast, depending on how it’s implemented, Full Text Search allows for

  • Match multiword phrases even if the words aren’t in the same order or contain filler words. For example, searching for “how to train your dragon” using full text search might return “5 ways to train a dragon”, even though though the phrase “how to train your” isn’t in the result.
  • Normalizing words so that conjugation and suffixes don’t limit the results. For example “training your dragon” could still return the result “train your dragon”
  • Removing “stop words”, which are common words like “the”, “of”,
    “for” and “to” so they aren’t part of the requirements to create a match.
  • Allowing users the option to search for either exact or inexact phrases by using quotes, or using OR statements.

Implementation

To demo this functionality using Postgres, I built a search function for good read quotes and Star Trek quotes with Postgres database using Ruby On Rails

The search function has two “stages” — parsing the user input into as tsvectorcontaining the words to search against, and comparing that tsvector against the an index of other tsvectorscontaining the normalized quotes from the database. The to_tsvectorfunciton is responsible for removing stop words and trimming verb endings such as “ing” (depending on the dictionary used).

Note: you can also use the pg_trgm extension to perform “similarity” or fuzzy search, which is better at catching mispellings of words. This isn’t covered by this article, but you can read about it here

To get started, let’s create the “dialogues” table, each “dialogue” consisting of a line, character, series and episode.

class CreateDialogues < ActiveRecord::Migration[7.0]
def change
create_table :dialogues do |t|
t.string :line
t.string :episode
t.string :series
t.string :character

t.timestamps
end
end

end

Next, let’s add the column we will search against.

def up
execute <<-SQL
ALTER TABLE dialogues
ADD COLUMN searchable tsvector GENERATED ALWAYS AS (
setweight(to_tsvector('english', coalesce(line, '')), 'A') ||
setweight(to_tsvector('english', coalesce(character,'')), 'B')
) STORED;
SQL
add_index :dialogues, :searchable, using: :gin
end

Lets break this down. I am creating a tsvector field through a subquery. GENERATED ALWAYS means that this field will automatically update every time the fields used in query change, in this case line or character . I am using the “setweight” function to indicate that I am matching primarily against the characters lines, but we also want to match against the characters names. That way, I can search for the name of the character and see all their lines, as well as searching dialogue, in a single query.

Take note that the SQL || operator represents concatenation and not “OR” as in some other programming languages.

Next, I add an GIN index to this newly created field. This allows quicker searching.

Finally, lets handle the query:

class Dialogue < ApplicationRecord
conn = ActiveRecord::Base.connection
scope :search, -> (query){ select("dialogues.*, ts_rank_cd(dialogues.searchable, websearch_to_tsquery('english',
#{conn.quote(query)})) as rank").where("dialogues.searchable @@ websearch_to_tsquery('english', ?)", query).order('rank DESC')}

end

Above, I use websearch_to_tsqueryto convert the users words into a query. Normally, ts_query requires a specific syntax. “ fat & cat” to search for documents containing both fat and cat, and ‘|’ for either fat and cat. websearch_to_tsquery however is able to parse queries using english such as “fat OR cat”. Read more about parsing queries here:

For ranking the search results, I applied the ts_rank_cd function, which is a ranking function which takes into account the postion of the words in the phrase. I chose this function in order to bias search results towards similar “phrases” over ranking solely by the number of matching words.

This is all I needed to implement the search functionality! With the code above implemented, I could call Dialogue.search("Picard") to get all of that characters lines, or I could call Dialogue.search("Transporter Accident") to get all of the lines containing the words “transporter” and “accident”, and so on.

Below are some screen shots of the search in action.

--

--