Regex: Code Language

Introduction

Valerie W. McCarthy
3 min read · Dec 9, 2016

Ruby has the reputation of being a “plain English” coding language where the syntax supposedly flows like a natural language. Many Ruby symbols and methods are so intuitive you can almost guess them: if you can’t remember which method counts the characters in a string or the elements in an array, .size and .length both get the job done. "Valerie McCarthy".downcase returns the string in all lower case: “valerie mccarthy”. But the ease of “plain English” Ruby broke down quickly when I started diving into regex, short for regular expression.

A regex is a collection of symbols and syntax that creates a special sequence of characters, or pattern. This pattern is used to match or find other strings or sets of strings. Useful Ruby methods like .sub and .gsub often depend on regex to work efficiently. And even though the origins of regex lie in complicated math theory, regex was no doubt created to simplify our lives. But it doesn’t take long to fall into what feels like a black hole when trying to build and decipher complex regex statements.

Basic Regex Nomenclature:

In Ruby, a regex is typically created by writing a pattern between slash characters: /pattern/. The pattern is built from a basic “alphabet” as follows:

Basic regex alphabet.

These foundational elements and syntax are combined to create powerful matching capabilities. These patterns can be relatively straightforward or frustratingly complex.
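To make the idea of an “alphabet” concrete, here is a minimal sketch of a few commonly used building blocks, each tried against a sample string. It is not a reproduction of the table above; the method name building_block_examples and the sample strings are made up for illustration.

```ruby
# Each entry pairs a small pattern with a sample string it should match.
# The method name and the sample strings are invented for illustration.
def building_block_examples
  [
    [/c.t/,         "cat",   ". matches any single character (except a newline)"],
    [/\Aru/,        "ruby",  "\\A anchors the match to the start of the string"],
    [/\A\d+\z/,     "2016",  "\\d is a digit, + means one or more, \\z is the end"],
    [/a\sb/,        "a b",   "\\s is a whitespace character"],
    [/colou?r/,     "color", "? makes the preceding element optional"],
    [/\Ab{2,3}\z/,  "bbb",   "{2,3} means two to three repetitions"]
  ]
end

# Print each pattern, its sample, and whether the sample matches.
building_block_examples.each do |pattern, sample, note|
  puts "#{pattern.inspect} on #{sample.inspect}: #{pattern.match?(sample)} (#{note})"
end
```

Every line should report true. Changing a sample (for example "2016" to "20a16") is a quick way to see what each symbol contributes.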

Simpler Examples:

Match one of two characters in a given position, such as “ruby” or “rube”: /rub[ye]/

Match exactly 3 digits: /\d{3}/
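As a rough sketch, the two patterns above can be tried out like this. The helper names ruby_or_rube? and first_three_digits are hypothetical, chosen only for this example.

```ruby
# Returns true when the text contains "ruby" or "rube".
# (Hypothetical helper name, used only for this illustration.)
def ruby_or_rube?(text)
  text.match?(/rub[ye]/)
end

# Returns the first run of three consecutive digits, or nil if there is none.
# (Hypothetical helper name, used only for this illustration.)
def first_three_digits(text)
  text[/\d{3}/]
end

puts ruby_or_rube?("ruby")                       # true
puts ruby_or_rube?("rube")                       # true
puts ruby_or_rube?("rubs")                       # false
puts first_three_digits("call 555 now").inspect  # "555"
puts first_three_digits("12345").inspect         # "123"
puts first_three_digits("12").inspect            # nil
```

Note that without anchors /\d{3}/ happily matches the first three digits of a longer number; to insist on exactly three digits and nothing else, something like /\A\d{3}\z/ would be needed.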

Delete Ruby-style comments:
phone = "2004-959-559 #This is Phone Number"
phone = phone.sub(/\s*#.*$/, "")
=> find the character “#” (along with any white space in front of it) and select any number of any character until the end of the line; replace it with an empty string
puts "Phone Num : #{phone}"
=> Phone Num : 2004-959-559

Remove anything other than digits:
phone = "2004-959-559"
phone = phone.gsub(/\D/, "")
=> find every non-digit character and replace it with an empty string
puts "Phone Num : #{phone}"
=> Phone Num : 2004959559
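One thing to watch with the bang variants, .sub! and .gsub!: they change the string in place and return nil when nothing was replaced, so assigning their result back to the variable can wipe it out. Here is a minimal sketch of the two cleanups as small methods built on the non-bang versions; the names strip_comment and digits_only are hypothetical.

```ruby
# Removes a trailing "# comment" (and the white space before it) from a line.
# (Hypothetical helper name, used only for this illustration.)
def strip_comment(line)
  line.sub(/\s*#.*$/, "")
end

# Keeps only the digits of a string.
# (Hypothetical helper name, used only for this illustration.)
def digits_only(text)
  text.gsub(/\D/, "")
end

puts strip_comment("2004-959-559 #This is Phone Number")  # 2004-959-559
puts digits_only("2004-959-559")                          # 2004959559

# The non-bang version always returns a string, even when nothing changes...
puts digits_only("2004959559")                            # 2004959559

# ...while the bang version returns nil when there was nothing to replace.
already_clean = "2004959559".dup
puts already_clean.gsub!(/\D/, "").inspect                # nil
```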

Getting More Complicated:

But it doesn’t take long to come across a much more complicated regex. Let’s take a look at the following regex examples:

Match ruby&rails or Ruby&Rails: /([Rr])uby&\1ails/
If you do not look closely, you might think this is one of the simpler regex examples. But it only works because of the \1 in the middle. This \1 is called a backreference and matches whatever the first group, ([Rr]), matched, so the capitalization of the two words has to agree.
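A quick way to see the backreference at work is to try mixed capitalization. This is a small sketch; the method name same_case_pair? is hypothetical.

```ruby
# True when the text contains ruby&rails or Ruby&Rails,
# with both words starting in the same case.
# (Hypothetical helper name, used only for this illustration.)
def same_case_pair?(text)
  text.match?(/([Rr])uby&\1ails/)
end

puts same_case_pair?("Ruby&Rails")  # true
puts same_case_pair?("ruby&rails")  # true
puts same_case_pair?("Ruby&rails")  # false: \1 must repeat the captured "R"
puts same_case_pair?("ruby&Rails")  # false: \1 must repeat the captured "r"
```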

Add commas to a large integer:
number.to_s.reverse.gsub(/(\d{3})(?=\d)/, '\\1,').reverse
Yes, there are many use cases for this pattern, but it is far from obvious. Let’s walk through the parts:
.to_s.reverse.gsub: convert the number to a string and reverse it so that we can count every three digits from the end of the number, then apply the .gsub method to search and replace
(\d{3})(?=\d): find 3 consecutive digits, and then look ahead to check that the next character is a digit (without including it in the match)
\\1,: a backreference to the first group (the backslash is doubled to escape it inside the string), so the three digits are kept and a comma is added after them
.reverse: reverse the string to put the number back in the right order.
Voila!
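Wrapped in a method, the same one-liner can be tried on a few values. This is only a sketch: the method name add_commas is hypothetical, and it assumes the input is an integer.

```ruby
# Inserts a comma after every third digit, counting from the right.
# (Hypothetical helper name; assumes an integer argument.)
def add_commas(number)
  number.to_s              # 1234567 -> "1234567"
        .reverse           # "7654321"
        .gsub(/(\d{3})(?=\d)/, '\\1,')  # "765,432,1"
        .reverse           # "1,234,567"
end

puts add_commas(999)      # 999
puts add_commas(1000)     # 1,000
puts add_commas(100000)   # 100,000
puts add_commas(1234567)  # 1,234,567
```

The lookahead is what keeps a comma from appearing at the front of numbers like 100000: after the second group of three there is no digit left, so no comma is added. A number with a decimal part would get commas in the decimals too, so this approach is best kept to integers.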

Match/validate a phone number:
1?[\s-]?\(?(\d{3})\)?[\s-]?\d{3}[\s-]?\d{4}
1?[\s-]?: make the leading digit “1” optional, followed by an optional white space character or "-"
\(?(\d{3})\)?: match any three digits (the area code), optionally wrapped in parentheses
[\s-]?: match an optional white space character or "-"
\d{3}[\s-]?\d{4}: match three digits, an optional white space character or "-", followed by 4 more digits

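To use the phone pattern for validation, it helps to anchor it so that the whole string has to match. The sketch below adds \A and \z around the pattern; those anchors and the method names phone_pattern, valid_phone? and area_code are my own additions for illustration.

```ruby
# The phone pattern from above, wrapped in \A ... \z so the entire string
# must be a phone number. (The anchors are an addition for this sketch.)
def phone_pattern
  /\A1?[\s-]?\(?(\d{3})\)?[\s-]?\d{3}[\s-]?\d{4}\z/
end

# True when the whole string looks like a phone number.
def valid_phone?(text)
  phone_pattern.match?(text)
end

# Returns the captured area code (group 1), or nil for an invalid number.
def area_code(text)
  match = phone_pattern.match(text)
  match && match[1]
end

puts valid_phone?("1-800-555-1234")      # true
puts valid_phone?("(415) 555-2671")      # true
puts valid_phone?("4155552671")          # true
puts valid_phone?("555-1234")            # false: too few digits
puts area_code("(415) 555-2671")         # 415
puts area_code("555-1234").inspect       # nil
```

The pattern is forgiving rather than strict: because the opening and closing parentheses are each optional on their own, a string like "(415 555-2671" is also accepted.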
Good resources:

As you can see, it doesn’t take long to get deep with regex. But there are many resources out there to help you wade through the morass of regex. A few that I found to be particularly helpful are here:

https://www.tutorialspoint.com/ruby/ruby_regular_expressions.htm: really great resource. It has a comprehensive library of syntax, most of which is paired with helpful examples.

http://www.regextester.com/: Regex test editor. Not only can you type in your regex and see which strings it matches, but you can also hover over subsections of your regex to understand exactly what each one is doing.

https://regexone.com/lesson/introduction_abcs (thanks JJ!): Regex interactive tutorial. If you have an hour to spend on regex, this is great. It starts simple and builds on your knowledge. The practice problems at the end are real world and help to bring home regex syntax and principles.

https://ruby-doc.org/core-2.1.1/Regexp.html: The source documentation for Ruby, but this is not super straightforward (or helpful) for a novice.

Have fun!
