Full-text search with MySQL and Elasticsearch

Azraar Azward
7 min readDec 4, 2022

--

For modern web applications, search functionality is a core component that enables users to quickly find the information they need. The purpose of this article is to illustrate the use of this combination of a relational database system, such as MySQL, and a dedicated search engine, such as Elasticsearch.

There are essentially two ways to search for records in MySQL. A LIKE query is one and the most straightforward option, but a full-text search is available for more demanding applications.

LIKE query

An easy way to implement a search function is to apply something like a LIKE query. Modern relational databases support this query scheme, which queries records based on the occurrence of a specific character string in the database. The % character can be used as a placeholder in a query, which stands for any character string. Adding case-insensitivity to the query can improve this solution. It depends on the database, but in MySQL, you can use the LIKE solve keyword. As an example, if you want to find a blog article containing the string “technology”, you could execute the following database query:

SELECT * FROM posts WHERE body LIKE ‘%technology%’

The search term “technology” is a dynamic search term and contains the user’s input under the hood.

This method is very easy to implement and is often sufficient for simple applications, but also for the first version of a search function. A disadvantage of the LIKE query is that it can be very slow, especially with a large number of data records, since all existing data records have to be checked individually for matches with the search query (full table scan). Another drawback of this method is that it only looks for exact matches and if the user has any typos in the search term.

Full-text search with MySQL

Full-text search is a technique that allows you to search for records that may not perfectly match your search criteria. In order for the index to be considered a full-text index, the index must be of the FULLTEXT type. A FULLTEXT index is a specific type of index that finds keywords in the text instead of comparing values with index values. Although FULLTEXT searching is different from other types of matching, do note that you can have a BTREE index and a FULLTEXT index on the same column at the same time — they will not conflict because they are suited for different purposes.

SELECT * FROM posts WHERE MATCH(body) AGAINST(“technology” IN NATURAL LANGUAGE MODE);

When MATCH() is used together with a WHERE clause, the rows are automatically sorted by the highest relevance first. To search for an exact string, enclose it with double quotes.

Limitations

  • Full-text searches are not supported on partitioned tables.
  • All columns in a FULLTEXT index must use the same character set and collation.
  • Full-text search operations do not treat the % string as a wildcard.

The full list of MySQL’s full-text restrictions can be seen at the MySQL’s documentation page.

Elasticsearch

Elasticsearch is an open-source search server based on the well-established search engine Apache Lucene. Elasticsearch runs in its own process and keeps a copy of the data records to be searched so that it can create optimized indices, which are necessary for efficient searches. With Elasticsearch it is possible to implement full-text searches like with MySQL; however, Elasticsearch’s true strengths come to the fore when you take advantage of its advanced features. These include aggregations, advanced text analysis, and much more. Unfortunately, outsourcing the search logic to a separate subsystem also has disadvantages. Above all, the synchronization of the data sets with an existing (usually relational ) database is a challenge that has to be mastered in the course of the development process.

Elasticsearch benefits

The most obvious advantage of Elasticsearch is the speed at which search queries are processed. As a rule, the distributed architecture of Elasticsearch even allows indexes of several gigabytes to be searched efficiently. However, Elasticsearch also offers other advantages. In particular, it offers some features that can only be implemented very poorly with MySQL and with unnecessary effort. In many applications, the results of a search query are spread across multiple pages, also known as pagination. This means that you do not have to wait for all data records to be returned and you can access the results more quickly. In MySQL, this can be implemented with the LIMIT and OFFSET commands, but performance suffers with large data sets. The pagination of results lists of practically any length is efficiently implemented in Elasticsearch so that the results can be output in partial lists of any size without additional effort. Especially for text attributes with short values ​​or a fixed list of possible characteristics, it makes sense to offer users an auto-completion of the possible values ​​before the actual search query is carried out. In many cases, this means that incorrect search queries can be corrected by the user and thus lead him to his goal more quickly. Elasticsearch offers a dedicated API to generate suggested values ​​for search queries.

Elasticsearch also offers a flexible solution for implementing a faceted search. It is thus possible to offer the user several filters, which he can then combine as desired to restrict the results. The well-thought-out configuration of the indices pays off here so that all conceivable data types can be queried individually. For some applications, it is necessary to reverse the order of indexing and search. In this way, notifications can be generated when new documents matching a search query have been added to the index. Classically, every change in the index requires searching through all data records with all search queries. However, percolation in Elasticsearch makes it possible to save search queries and then efficiently apply documents to those saved queries. The result is a list of queries that would result in the data record. Other Elasticsearch features, to name a few, are synonym detection, multi — index search, HTML/XML document search, geospatial search, positive/negative boosting, custom analyzers/tokenizers, failover, sharding, and replication. You can get an overview of all the features on the official website.
https://www.elastic.co/enterprise-search?elektra=home&storm=river1

Integration with an existing system

To take advantage of Elasticsearch, the data to be searched must be provided as JSON via Elasticsearch’s REST API. First of all, it is necessary to extend the existing application in such a way that the data records to be searched can be serialized in JSON format. Hierarchically organized substructures in the database can be stored together in an index, similar to document-based databases. However, in order to be able to model arbitrary relationships between objects, it may be necessary to create several indexes.

Synchronization with the database

The separation of the database and search server results in the necessity of synchronizing these two systems. If a record changes, is added or is deleted, then the corresponding entry in Elasticsearch must also be adjusted accordingly. This can be difficult to get a handle on, especially if you have included related records in the search index. The naive solution to this problem is to periodically re-import the search index. For reasons of time, however, this is only practical for a relatively small amount of data.

To remain performant, it is, therefore, necessary to include mechanisms in the application that connects the database and search server that pass on the corresponding information to Elasticsearch in the event of changes in the data. This increases the complexity of the problem, similar to cache invalidation. The invalidation mechanisms must be designed individually for each application and are therefore associated with a great deal of manual effort and are therefore more prone to errors. It is therefore advisable to re-import the entire index at regular intervals, eg monthly. Alternatively, other approaches such as event sourcing can also be used for synchronization. Due to the synchronization problem, some operators have switched to using Elasticsearch directly as the primary database. However, this is only practical if the data sets can be well-mapped in a document-based model. For a data model with multiple relationships between objects, a hybrid architecture with a relational database is probably still the better choice.

Configure the indexes and import records

When importing data, the so-called mapping, i.e. the configuration of the indices, is of particular interest, whereby between dynamic and static mapping is to be distinguished. With dynamic mapping, Elasticsearch is left to decide independently which data types should be used for which fields, and how they should be indexed, based on the first imported data records. To be able to fully use the possibilities of Elasticsearch, however, it is advisable to create a static mapping in which the data types and index options are specified for each attribute individually. For example, it is possible to apply various transformations such as lowercase, stemming, and transliteration to character strings. In addition, one or more indexes can be created for each attribute. This is especially useful if you want to search for an attribute in different ways. Here you can, for example, create an index for searching for exact hits, and configure one for fuzzy search and one for partial string (n-gram) search. This enables extremely finely tuned search queries later on.

Conclusion

Using MySQL, you will still index and search for your data. With ElasticSearch, you have a lot of flexibility in what you index as a unit. You can take all the content reviews and tags for an article and put it in Elasticsearch as a single item. You will also probably find that Elasticsearch will provide better performance and overall results than you would with MySQL. You also have more flexibility with things like synonyms and weighting. But it does mean you have another stack to maintain and you have to manage indexing and updating of content. This will depend on the size of your data and the importance of research as a functionality.

References

https://logz.io/blog/10-elasticsearch-concepts/
https://elastic-builder.js.org/
https://elasticsearch-cheatsheet.jolicode.com/

--

--

Azraar Azward

An eager Software Engineer always loving to reach the impossible.