Regular Expressions
An Intro to Regex in Ruby
It’s Monday morning. You’ve just made a nice, hot cup of coffee that you’re contentedly sipping as you plan your day. Feeling whimsical, you lean in close to a colleague and whisper ‘regular expressionsssss’ around mouthfuls of crumbly biscuit. You’re likely to elicit one of the following responses:
“What’s a regular expression?”
“This is highly inappropriate behaviour. Also - eww.”
response.gsub!(/\b#{EXTREMELY_BAD_WORDS}\b/i, 'Sunshine and Lollipops :D')
If you’re in the “What’s a regular expression?” camp, then this introductory blog is for you.
Regular expressions (or ‘regex’) are a powerful tool for searching and matching specific patterns in strings. These patterns can consist of any combination of characters, including letters, numbers, white space and special characters. As a result, regex has numerous use cases, from pulling data, to extracting email addresses, to checking the strength of passwords.
But there’s a complication… regular expressions are HARD. The syntax is cryptic, extremely dense, and subject to different rules and dialects that vary according to the language you’re using. Luckily, regex is fairly straightforward at an introductory level. Don’t believe me? Let’s have a look.
Literal Characters
/a/
This matches the string “a”, as well as any string containing an “a”. This works for all characters, with the exception of ‘special characters’, which have a specific meaning to the regex parser.
Special characters are given as [ ^, $, ?, ., |, /, \, [, ], {, }, (, ), +, * ], and can be searched as themselves in regex by prefixing them with a backslash, as follows:
/\?/
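To see both ideas in action, here is a small sketch you could paste into irb. The sample strings are made up for illustration, and Regexp.escape is an extra not covered above: it is a core Ruby method that adds the backslashes for you.

```ruby
# =~ returns the index of the first match, or nil when there is none.
literal_hits = ["a", "banana", "xyz"].map { |s| s =~ /a/ }
# "a" matches at index 0, "banana" at index 1, "xyz" not at all.

# match? simply answers true or false.
plain_question = "Really?".match?(/\?/)  # the escaped ? is treated as a literal
no_question    = "Really".match?(/\?/)

# Regexp.escape backslash-escapes any special characters in a string.
escaped = Regexp.escape("1+1=2?")

p literal_hits
p plain_question
p no_question
puts escaped
```

The escaped string is handy when the text you want to search for comes from somewhere else, such as user input.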
The Wildcard .(dot)
/.string/
Occasionally, you might wish to match any character at some point in your pattern. For example, you might want to find all the occurrences of “tickle” and “pickle” in your text.
For reasons best known to you, and best left that way.
Regardless, this is where the wildcard .(dot) comes in handy, which will match any character, with the exception of newline. In our example, our regular expression would be formatted as such:
/.ickle/
This would not only match “tickle” and “pickle”, but “&ickle”, and “%ickle” too. This is where one must exercise caution — using the wildcard can result in more matches than required. Character Classes has entered the chat…
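Here is a quick sketch of the wildcard being a bit too welcoming. The word list is invented for the example, and the /m flag on the last line is an aside not covered above: in Ruby it lets the dot match newlines as well.

```ruby
words = ["tickle", "pickle", "%ickle", "ickle", "\nickle"]

# Keep only the strings where some character is followed by "ickle".
dot_matches = words.select { |w| w.match?(/.ickle/) }

# "ickle" has nothing in front of it, and "\nickle" only has a newline,
# which the dot refuses to match by default.
p dot_matches

# With the m flag, the dot matches a newline as well.
multiline_hit = "\nickle".match?(/.ickle/m)
p multiline_hit
```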
Character Classes
/[chars]/
A character class is a list of characters contained within the regular expression in square brackets. With our ‘tickle/pickle’ example, we can apply it as such:
/[tp]ickle/
This expression will match “t” or “p” followed by “ickle”, allowing you to avoid unwanted matches like “%ickle”.
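As a small sketch (again with a made-up word list), swapping the dot for a character class trims the results down to the two words we actually care about:

```ruby
words = ["tickle", "pickle", "%ickle", "sickle"]

# Only "t" or "p" is allowed directly before "ickle".
class_matches = words.select { |w| w.match?(/[tp]ickle/) }

p class_matches
```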
A character class can hold a longer list, such as [aeiou], or a range of characters, such as [1-9], or even:
/[a-zA-Z0-9]/
This will match any single lowercase or uppercase letter, or any single digit. As it happens, this is quite a common request, so we have shorthand!
\w ---> is equivalent to [0-9a-zA-Z_]
\d ---> is equivalent to [0-9]
\s ---> matches white space (inc. tabs, regular space, newline)
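A short sketch of the ranges and shorthands at work. The sample strings are invented, and scan and split are standard String methods that are not covered above; note that \w also picks up the underscore, which the plain [a-zA-Z0-9] range does not.

```ruby
sample = "Room 42, floor_3"

# scan collects every match into an array.
digits = sample.scan(/\d/)      # each single digit
words  = sample.scan(/\w+/)     # runs of word characters (+ means "one or more")
pieces = sample.split(/\s/)     # split wherever white space appears

# The explicit range skips the underscore, while \w keeps it.
range_hits = "a1_".scan(/[a-zA-Z0-9]/)
word_hits  = "a1_".scan(/\w/)

p digits
p words
p pieces
p range_hits
p word_hits
```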
This is merely an introduction to the basics of regular expressions in Ruby, so we’ll leave it here for now. But, those of you who would like to explore this topic further might appreciate the following link :)
Have fun!