Originally published at https://www.kevinpeters.net.
How to filter with any characters or wildcards in PostgreSQL
Table of contents
- Basic filtering in PostgreSQL or SQL
%wildcard operator in a nutshell
_wildcard operator in a nutshell
- Search with dynamic cases or full-text search
Basic filtering in PostgreSQL or SQL
PostgreSQL wildcards are an important tool for searching in SQL databases. Most of the time text in PostgreSQL databases are saved in columns as
VARCHAR type. The general recommendation is to use
TEXT though. It is no problem though if something like the
VARCHAR type is used. All the solutions work the same here in PostgreSQL.
Searching for an exact text in a column is quite easy with a statement like:
From a table with a lot of entries of fictional person data, it will filter all rows which have the column
example_column with the value
After executing the query it would filter out all non-highlighted rows above, resulting in the following table:
Exactly what we wanted to achieve. But for more advanced use cases like filtering all names which start with
Mel, we would need to use wildcards.
% wildcard operator in a nutshell
In our previous dataset, we filtered all values which start with
Mel in the exact casing. Now we filter all rows which have the value like Mel out of our original dataset. This can be achieved with a wildcard character. In this case, here it is not important how many characters are wildcarded. This basically means it could be anything like
Mel. For getting this to work, PostgreSQL offers the character
%. With this, we can build a query in theory which can match all of the words above. There is one nuance though, instead of using the
WHERE ... = '...' filter, we need to use a statement called
LIKE. The statement will look like
SELECT * FROM ... WHERE ... LIKE '...'.
To get these results the query could look like:
And it would result in the following table as the SQL query result.
% sign in the query acts as a match for any character. But not even just a match. It is a match for zero to infinite characters. That is why this operator is matching
Melissa. A lot of use cases result out of this and that is why this is the most used wildcard in SQL or PostgreSQL overall.
_ wildcard operator in a nutshell
In the last search, we have defined infinite wildcards with the
% which match strings with zero to an infinite amount of characters and other filters. Sometimes we have use-cases to just wildcard a specific character in some text. For example, there could be types with or without a single number attached. These types could be
SALE but also
SALE8. In this example, our matches look really similar, but if we know that there is just one number in the matching string, we could use the wildcard operator
_ to match exactly the cases we want. For our sale types, we could use
In our example table, we could have different results based on this. To make a good search though, we could find all the names in the table which have three characters but must start with
M. In the example table, there are three entries that will match the condition.
The highlighted results in this table should be queried for a PostgreSQL wildcard. The query is looking quite easy again.
An important point here is that the query is using two PostgreSQL wildcard characters which is a simple underscore
_. The reason for this is that one underscore
_ will replace one character in the queried value.
After executing the query with the PostgreSQL wildcard, it will output the following result:
This result is not a surprise and resulted in our expected output. It includes the rows for the
Mel, where for
Mel there exist two rows.
Search with dynamic cases or full-text search
Another problem most people are facing is to search in different cases with PostgreSQL wildcards. This functionality in most programming languages is called
includes prototype function is existing for arrays. This can be used to check if an array includes a specific value. This works relatively well for primitive data types.
The above script will output
true to the console since the list includes the number
includes functionality can also be used for texts like:
This script also outputs
true to the console. A problem here is that names like
Melissa would not be matched with the same script but just exchanging the name because the functionality is case-sensitive.
A simple trick is existing in almost all programming languages to make the incoming value, which is the name, in this case, lowercase. Basically it will transform all letters of a word to the small version of the letter.
Melissa would get to
includes functionality is not existing in PostgreSQL or SQL in general. But for these cases, the PostgreSQL wildcards exist. As in the section earlier mentioned, there is an operator
% that matches zero to an infinite amount of random characters in a word. We can use this operator to match our desired rows.
All of the highlighted rows have
mel of the value of the
example_column included. A simple approach here is to write a query which includes both like:
This is quite tedious, if there are more cases then just the first letter changing. The more dynamic approach to this would be to convert the
example_column value to lower-case to work in a simpler way with it. To enable this feature and adjust the query, we just have to use the
LOWER functionality of PostgreSQL. The functionality should be applied to the
example_column column. The SQL query for this scenario would look similar to this:
It would result in the exact same table output as before but the query looks far more condense and understandable.
Another approach would be to use the
ILIKE keyword. This keyword is supported in a large amount of PostgreSQL versions. It is not in the SQL standard which means that it might be not transferrable to other SQL dialects. Our SQL query from before would look like the following snippet:
This looks a bit cleaner, but it is good to mention that this is a bit slower than the
LOWER conversion which we have done before. But it is a good alternative for case-insensitive SQL queries.
Another alternative would be to use a Regular expression (also called Regex).
This could be realized with an SQL query like the following.
This query is even slower, but I would not recommend using Regular Expressions in these cases because they are hard to understand for a lot of people, even though the expressions are much more flexible. Sometimes you need to have the flexibilities and then regular expressions are exactly what you want to use.
All of these approaches here work in simple solutions for full-text searches. But when performance is important for your project it might be worth it to look into
to_tsquery and proper indexing of your PostgreSQL tables. You can find more information about the text search in tables in the PostgreSQL documentation here: https://www.postgresql.org/docs/current/textsearch-tables.html.
PostgreSQL wildcards are an amazing tool to extend queries. Queries are adjusted to include a wide range of dynamic filtering which is important for systems with big data. Most of the time this data is unstructured on the value-base. A good example is the sale types mentioned in an earlier example which could be
SALE2. For some sub-systems, this difference might be important but for some systems, it is not important. Also, the data should rather be structured to include a
detailed_... column for these cases to achieve SQL normalization. With the wildcards, we can search for these "unnormalized" values in an easy way.
Originally published at https://www.kevinpeters.net.