Regular Expressions for Regular People
What started as a Flatiron exercise in creating custom validations ended in confusion when the solution included this:
CLICKBAIT_PATTERNS = [
  /Won't Believe/i,
  /Secret/i,
  /Top [0-9]*/i,
  /Guess/i
]

def is_clickbait?
  if CLICKBAIT_PATTERNS.none? { |pat| pat.match title }
    errors.add(:title, "must be clickbait")
  end
end
This innocuous block of code bewildered me because I had yet to venture into regular expressions (admittedly, I didn’t even know at the time that it was a regular expression, even if only in part). After just a little bit of research I learned that this is about as simple as it gets for regexes, but my interest was piqued and so I investigated a little further. Here’s a little overview that will, hopefully, dispel some of the confusion and/or anxiety around regex.
What is a Regex?
Simply put, a regular expression is a pattern that can be used to identify, verify, or alter text that matches that pattern. A common example of a regular expression is email validation, but since the (now-embarrassingly) simple expression above was used as a custom validation, let’s use that to demonstrate what I mean. But first, a few notes…
How it Works
A regular expression searches for a specific pattern within a string and identifies the index where that pattern occurs. In its most basic form, we have a string, a regex operator (=~), and a pattern. The operator compares the string on its left to the pattern on its right, and if the pattern isn’t found in the string, it returns nil.
/Delimiters/
Forward slashes (/) are delimiters that mark the boundaries of a pattern. (Already our validation code above is starting to make sense!) Below is an example that uses the delimiters and the regex operator:
"Meet Mouse, my cat!" =~ /cat/
=> 15
Wait, why is it returning 15? Don’t worry, that’s a good thing. If the pattern is found in the string, then the expression will return the index of the string where the pattern starts. And if you counted, ‘cat’ does indeed start on the sixteenth character of the string, but since the index starts at 0 instead of 1, it returns 15.
Match( )
Okay, useful as it is, if we don’t want the return value to be a number, we can use the match( ) method, which will return the pattern found:
"Meet Mouse, my cat!".match(/cat/)
=> #<MatchData "cat">
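Here is a small sketch of what can be done with that return value (the variable names are just mine, for illustration). It assumes nothing beyond plain Ruby: match gives back a MatchData object when the pattern is found and nil when it isn’t, and indexing the MatchData with [0] pulls out the matched text as an ordinary string.

```ruby
sentence = "Meet Mouse, my cat!"

# match returns a MatchData object when the pattern is found...
found = sentence.match(/cat/)
matched_text = found[0]        # the matched text as a plain String
match_start  = found.begin(0)  # index where the match starts, same number =~ gives

# ...and nil when the pattern is not found
missing = sentence.match(/dog/)

puts matched_text     # cat
puts match_start      # 15
puts missing.inspect  # nil
```

Because a failed match is nil, it is worth checking the result before calling [0] on it.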
Flags
Regular expressions include a variety of flags that alter how the expression behaves. For example, regular expressions are case sensitive by default, but the ‘i’ flag makes the match case insensitive.
"I want my pizza!" =~ /Pizza/
=> nil
However…
"I want my pizza!" =~ /Pizza/i
=> 10
Interesting, right? Well, there are several other such flags to help us out, but to include them here would substantially broaden the scope of this blog, so we can save those for another time.
The + and * Quantifiers
Finally, we have quantifiers. A quantifier specifies how many times the preceding character may (or may not) appear. For instance:
"adorable" =~ /ad+/
=> 0
"so adddddddddorable" =~ /ad+/
=> 3
The + metacharacter signifies that the preceding character must appear at least once, so /ad+/ matches whether there is one ‘d’ after the ‘a’ or nine ‘d’s following it. Meanwhile, * means 0 or more: the preceding character might appear, or it might not, but either way the pattern will match.
"so adddddddddorable" =~ /ac*d/
=> 3
So even if we include a random letter in our pattern, ‘c’ in this case, our pattern will still match because the * makes that particular character optional.
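To see the difference between the two quantifiers side by side, here is a quick sketch using the same string (the variable names are only illustrative). Swapping * for + after the ‘c’ makes that letter mandatory, so the match fails.

```ruby
text = "so adddddddddorable"

plus_index = text =~ /ac+d/  # + requires at least one 'c' between 'a' and 'd'
star_index = text =~ /ac*d/  # * is satisfied by zero 'c's

# match shows how much text a quantifier consumed: 'a' plus every 'd' after it
consumed = text.match(/ad+/)[0]

puts plus_index.inspect  # nil
puts star_index          # 3
puts consumed
```

Notice that the quantifier grabs as many repetitions as it can, which is why the last line prints the ‘a’ together with the whole run of ‘d’s.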
Now What?
With this little bit of information, we can refer back to the initial problem (the custom validation method) and understand the process:
1. An array of regex patterns is assigned to the constant CLICKBAIT_PATTERNS;
2a. The none? method in #is_clickbait? iterates through the regex patterns, evaluating each one against title to find a match;
2b. The third regular expression in the array, /Top [0-9]*/i, looks for the word “Top” followed by a space and then any number of digits (including none at all): Top 10 or Top 100000. (And don’t forget that, thanks to the ‘i’ flag, our title can even use a lowercase ‘top’.);
3. If the none? method returns true, no pattern matched, which means the title is missing its clickbait appeal and an error is added: “must be clickbait”.
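The process above can be tried outside of a model with a rough, plain-Ruby sketch. The clickbait? helper below is hypothetical (it is not part of the original exercise) and it uses any? with match? instead of none? with errors.add, so that it simply answers true or false for a given title.

```ruby
CLICKBAIT_PATTERNS = [
  /Won't Believe/i,
  /Secret/i,
  /Top [0-9]*/i,
  /Guess/i
].freeze

# Hypothetical helper: true when at least one pattern matches the title
def clickbait?(title)
  CLICKBAIT_PATTERNS.any? { |pat| pat.match?(title) }
end

puts clickbait?("You Won't Believe These 10 Facts")  # true
puts clickbait?("top 10 cat names")                   # true, thanks to the i flag
puts clickbait?("A Quiet Afternoon")                  # false
puts clickbait?("Stop worrying")                      # true: "top " with zero digits
```

The last title is a reminder of what * really means: since zero digits are allowed, the third pattern is satisfied by any “top ” followed by a space, even inside “Stop ”.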
What Else?
Now we see how simple it can be to create a regular expression. Admittedly, tailoring this introduction to a single, simple expression fails to capture the breadth of regex as well as its utility. Be that as it may, having merely skimmed the surface of regular expressions, I look forward not only to learning more about regex, but also to finding better, more relevant examples that explore the range of its capabilities.