Full Text Search in MySQL

Explore MySQL’s powerful text search capabilities

Auriga Aristo
Indonesian Developer
10 min readJun 29, 2024

--

Photo by Nathana Rebouças on Unsplash

Have you ever wondered how a search engine on a website does? I am not talking about search engines like Google or Bing, but the ones inside Medium or Amazon websites. Those search engines only search in their tables, not by crawling through many websites.

For searching text, we might use these MySQL queries:

SELECT id, title, content FROM articles WHERE title LIKE '%MySQL%';
# or
SELECT id, title, content FROM articles WHERE title RLIKE 'MySQL';

While the LIKE query is suitable for simple pattern matching, and the RLIKE can do complex pattern matching using regular expressions, they have some disadvantages:

  1. Full Table Scans: While practical for simple and complex pattern matching, the LIKE and RLIKE queries require full table scans. Unfortunately, this process is slow for large datasets, potentially leading to significant delays in search results.
  2. CPU-Intensive: The complexity of regular expression matching and searching patterns using a wildcard can be more CPU-intensive. This technical demand can further degrade performance, highlighting the need for a more efficient solution.
  3. Not Scalable: As the datasets grow, the performance of LIKE and RLIKE queries deteriorates significantly. They do not scale well for large volumes of text data.
  4. No Relevance Ranking: These queries do not provide any mechanism for ranking results by relevance. They return all matches without indicating which one is more relevant to the search terms.
  5. Index Limitations: The LIKE can use indexes if the pattern starts with a non-wildcard character; its effectiveness is limited compared to full-text indexes. Moreover, regular expressions do not benefit from indexing.

To address these problems, MySQL has a powerful Full-Text Search (FTS) feature that allows you to search large amounts of text data quickly and efficiently. Unlike the basic search methods like LIKE or RLIKE, which can be slow and resource-intensive for big datasets, FTS uses unique indexes to speed up searches and rank results by relevance.

It supports advanced search capabilities such as natural language queries, Boolean searches (with AND, OR, NOT), and query expansion to find related terms. FTS is an excellent choice for applications like search engines, where fast and accurate text searching is required.

To use the FTS feature, we must create the full-text index in the columns that we want to search:

CREATE FULLTEXT INDEX ft_index ON articles(title, content);
# or
CREATE TABLE articles (
id INT AUTO_INCREMENT PRIMARY KEY,
title VARCHAR(255),
content TEXT,
FULLTEXT (title, content)
);

The syntax to perform a full-text search is using the MATCH() function together with the AGAINST() function. MATCH() specifies the columns to be searched, and AGAINST() specifies the search string.

SELECT id, title, content, 
MATCH (title, content) AGAINST ('MySQL search') AS relevance
FROM articles
WHERE MATCH (title, content) AGAINST ('MySQL search');

Inside Full-Text Search, different search modes can be used with AGAINST():

  • Natural Language Mode: This is the default mode. It interprets the search string as a phrase in the natural human language.
  • Boolean Mode: This mode allows you to use operators like +, -, *, and > < to refine search.
  • Query Expansion Mode: This mode tries to find additional relevant rows. It performs an initial search and then re-searches using words from the most relevant rows.

Natural Language Mode

Natural language in the context of Full-Text Search (FTS) in MySQL refers to a search mode that interprets the search string as a natural human language phrase, which means that the search is conducted in a way that tries to mimic how humans naturally search for information, considering the context and relevance of words rather than just performing a direct keyword match.

Here’s a breakdown of how the natural language mode works:

  1. Relevance Ranking: It calculates each document’s relevance based on the frequency of the search terms. Documents that contain the search terms more frequently are considered more relevant and are ranked higher in the search results.
  2. Stopwords: Common words (like the, and, is, etc.) are ignored in searches because they appear so frequently that they do not help distinguish between documents.
  3. Word Stemming: This mode does not use word stemming (reducing words to their base or root form). Therefore, the search term must match the form of the words in the documents.
  4. Noise Words: Short words (typically less than three characters) are ignored, as they are usually impractical for searches.

When you use natural language mode, you don’t need to specify it explicitly because it’s the default mode. However, you can specify it explicitly using IN NATURAL LANGUAGE MODE for clarity.

SELECT id, title, content,
MATCH (title, content) AGAINST ('MySQL features') AS relevance
FROM articles
WHERE MATCH (title, content) AGAINST ('MySQL features');
Natural Language Mode Result

This query will return articles discussing MySQL features, ordered by their relevance to the search terms “MySQL” and “features.”

Using natural language mode, you can leverage MySQL’s full-text search capabilities to find the most relevant articles based on how humans search for information.

Boolean Mode

Boolean mode in MySQL FTS allows for more complex and powerful searches using Boolean operators. This mode lets you include or exclude specific terms, apply weights to terms, and handle more sophisticated queries than the natural language mode.

To use the boolean mode, you need to specify IN BOOLEAN MODE in the AGAINST() function.

SELECT id, title, content,
MATCH (title, content) AGAINST ('MySQL features' IN BOOLEAN MODE) AS relevance
FROM articles
WHERE MATCH (title, content) AGAINST ('MySQL features' IN BOOLEAN MODE);

In boolean mode, you can use a variety of operators to refine your search:

  • + (mandatory inclusion): The term must be present in the result.
  • - (mandatory exclusion): The term must not be present in the result.
  • * (wildcard): Used for partial matches.
  • > < (increase/decrease relevance): Adjusts the importance of a term.
  • () (grouping): Groups terms into subexpressions.
  • “” (phrase search): Searches for the exact phrase within the quotes.

Here are some examples:

Basic boolean search

SELECT id, title, content,
MATCH (title, content) AGAINST ('+MySQL +features' IN BOOLEAN MODE) AS relevance
FROM articles
WHERE MATCH (title, content) AGAINST ('+MySQL +features' IN BOOLEAN MODE);
Mandatory Inclusion Search Result

This query searches for articles that must include “MySQL” and “features.”

Excluding Terms

SELECT id, title, content,
MATCH (title, content) AGAINST ('MySQL -features' IN BOOLEAN MODE) AS relevance
FROM articles
WHERE MATCH (title, content) AGAINST ('MySQL -features' IN BOOLEAN MODE);
Mandatory Exclusion Search Result

This query searches for articles that contain “MySQL” but must not contain “features.”

Phrase Search

SELECT id, title, content,
MATCH (title, content) AGAINST ('"full text search"' IN BOOLEAN MODE) AS relevance
FROM articles
WHERE MATCH (title, content) AGAINST ('"full text search"' IN BOOLEAN MODE);
Phrase Search Result

This query searches for the exact phrase “full text search.”

Using Wildcards

SELECT id, title, content,
MATCH (title, content) AGAINST ('databas*' IN BOOLEAN MODE) AS relevance
FROM articles
WHERE MATCH (title, content) AGAINST ('databas*' IN BOOLEAN MODE);
Wildcard Search Result

This query searches for any word that starts with “databas”, such as “database” or “databases”.

Adjusting Relevance

SELECT id, title, content,
MATCH (title, content) AGAINST ('>MySQL <features >performance' IN BOOLEAN MODE) AS relevance
FROM articles
WHERE MATCH (title, content) AGAINST ('>MySQL <features >performance' IN BOOLEAN MODE);
Increasing and Decreasing Relevance Search Result

This query increases the relevance of “MySQL” and “performance” but decreases the relevance of “features.”

Combining Operators

SELECT id, title, content,
MATCH (title, content) AGAINST ('+(MySQL database) +(features -SQL)' IN BOOLEAN MODE) AS relevance
FROM articles
WHERE MATCH (title, content) AGAINST ('+(MySQL database) +(features -SQL)' IN BOOLEAN MODE);
Combining Operators Search Result

This query searches for articles that must include either “MySQL” or “database” and must consist of “features” but not “SQL.”

Boolean mode provides flexibility for more complex searches, making it a powerful tool for fine-tuning search results based on specific requirements.

Query Expansion Mode

Query expansion mode in MySQL FTS is a feature that enhances search results by expanding the search query with additional relevant terms. This mode performs an initial search and then uses the results of that search to find additional terms pertinent to the initial query. These additional terms are then used for a second search to broaden the results.

To use query expansion mode, you include WITH QUERY EXPANSION in the AGAINST() function.

SELECT id, title, content,
MATCH (title, content) AGAINST ('search terms' WITH QUERY EXPANSION) AS relevance
FROM articles
WHERE MATCH (title, content) AGAINST ('search terms' WITH QUERY EXPANSION);

In this mode, the initial search retrieves documents based on the original search terms, and the second search uses additional terms found in those documents. This mode helps to find a broader set of relevant documents, which can be helpful when the initial query terms are too narrow. The results are still ranked by relevance but include documents that might not have been found with the original terms alone.

While this mode can help find more relevant documents, it might retrieve less relevant documents due to the inclusion of additional terms. Also, the two-stage search process could make it more resource-intensive.

Natural Language Mode vs Query Expansion Mode

The main difference between the Natural Language Mode and the Query Expansion Mode lies in how they handle the search terms and expand the search results.

Natural Language Mode searches for articles containing “MySQL” and “features,” then ranks the results based on relevance using these exact terms. Only documents containing “MySQL” and “features” will be considered and ranked by their relevance to these terms.

SELECT id, title, content,
MATCH (title, content) AGAINST ('MySQL features') AS relevance
FROM articles
WHERE MATCH (title, content) AGAINST ('MySQL features');
Natural Language Mode Result

However, the Query Expansion Mode initially searches for articles containing “MySQL” and “features.” Then, it identifies the other terms from the most relevant articles found in the initial search (e.g., “database” and “indexing”). Last, the system will search again using “MySQL,” “features,” and the newly identified terms. This mode returns a broader set of documents relevant to “MySQL features.”, potentially providing a more comprehensive set of results.

SELECT id, title, content,
MATCH (title, content) AGAINST ('MySQL features' WITH QUERY EXPANSION) AS relevance
FROM articles
WHERE MATCH (title, content) AGAINST ('MySQL features' WITH QUERY EXPANSION);
Query Expansion Mode Result

Query Expansion Mode can be beneficial when users need to learn all the relevant terms in their search or when looking for related concepts and a broader set of results. Natural Language Mode is simpler and more direct, suitable for straightforward searches where the provided terms are expected to yield sufficient results.

Things to know

Supported Database

Full-text indexes are only supported in InnoDB tables starting from MySQL 5.6 and have traditionally been supported in MyISAM tables.

Stopwords

Stopwords are common words ignored in full-text searches because they appear so frequently that they don’t help narrow down the search results. MySQL uses a built-in list of stopwords by default, but you can customize this list if needed.

SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD;

You can define your list of stopwords using the ft_stopword_file system variable.

SET GLOBAL innodb_ft_server_stopword_table = 'mydatabase.my_stopwords';

Configurable Word Length

MySQL ignores words shorter or longer than a specific length in full-text searches. In MySQL, the minimum word length by default is 3 characters, while the maximum is 84, but you can change this setting.

# MyISAM
show variables like 'ft_min_word_len'; # 4
show variables like 'ft_max_word_len'; # 84

# InnoDB
show variables like 'innodb_ft_min_token_size'; # 3
show variables like 'innodb_ft_max_token_size'; # 84

Relevance Score

Relevance scores are calculated based on several factors, including term frequency (how often a term appears in a document) and inverse document frequency (how common the term is across all documents).

Combined Mode Searches

You can combine boolean mode searches with natural language mode to refine results further. For example, you can use boolean operators within a natural language search.

Choosing between LIKE, RLIKE, and Full-Text Search depends on your needs and dataset size. LIKE is suitable for simple, small-scale searches but is inefficient for large datasets due to full table scans. RLIKE offers more complex pattern matching with regular expressions but is also resource-heavy. However, using indexing and relevance ranking, Full Text Search is the best choice for large datasets and complex search requirements.

Within FTS, use natural mode for straightforward relevance-based searches, boolean mode for precise control with operators like AND, OR, and NOT, and query expansion mode to broaden results by including related terms identified from initial search results.

Implementing FTS in MySQL significantly enhances the performance and relevance of text searches in large databases. By understanding and utilizing features like natural language mode, boolean mode, and query expansion, you can effectively tailor search functionalities to meet your application’s needs.

Ultimately, the choice depends on your specific requirements and search complexity. Explore these options to determine the best approach for delivering quick and accurate search results, improving user experience and data retrieval efficiency.

--

--

Auriga Aristo
Indonesian Developer

4+ years in Backend Developer | PHP, Java/Kotlin, MySQL, Golang | New story every week