Regex in Ruby — The Very Basics

Melinda Diaz
Published in The Startup
Sep 17, 2020

If you’re new to Ruby (or any programming language), you may have come across strange bits of code like this when searching Stack Overflow for answers to life’s greatest questions, such as how to count the number of sentences in a string:

string.strip.split(/\w[?!.]/)

You may be wondering — what is this code within a code?

It’s called Regex — short for Regular Expressions! It’s not unique to Ruby (many languages have some form of regex), but we’ll focus on Ruby’s version here.

Regex code is used for specifying a certain search pattern of characters to be matched in a string (like finding words that end with -ing, or places in a string where a space follows a punctuation mark). Once we use this search pattern, we can pull out those matches or manipulate them in some way.

A real-world use for regex would be validating that an email address is entered correctly: it has a string of characters before an @ sign, and another string before a .com (or some variation). There are countless other ways you may find it useful in your own code!
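To make the email idea concrete, here is a minimal sketch. The constant EMAIL_SKETCH and the method looks_like_email? are made-up names for illustration, and the pattern is deliberately loose: it only checks the rough shape described above and is not a complete email validator.

```ruby
# Hypothetical, deliberately loose pattern: some characters that are not
# spaces or @ signs, an @ sign, more such characters, a dot, then letters
# running to the end of the string.
EMAIL_SKETCH = /\A[^@\s]+@[^@\s]+\.[a-z]+\z/i

# Returns true when the text has the rough shape of an email address.
def looks_like_email?(text)
  EMAIL_SKETCH.match?(text)
end

puts looks_like_email?("ada@example.com")   # true
puts looks_like_email?("ada.example.com")   # false (no @ sign)
puts looks_like_email?("ada@example")       # false (no dot after the @)
```

Every piece of that pattern is covered in the sections that follow.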

Regex Decoded

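First things first, regex code goes between two forward slashes to differentiate it from the rest of your code. Keep in mind that a space typed between the slashes is part of the pattern, so it is left out of the examples here:

/fancy code stuff/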
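As a quick sketch of what those slashes produce (the variable name pattern is just for illustration), Ruby turns a slash-delimited literal into a Regexp object, which you can test against a string with =~:

```ruby
# A regex literal is an ordinary Ruby object of class Regexp.
pattern = /ain/
p pattern.class        # Regexp

# =~ returns the index where the first match starts, or nil if there is none.
p "Rain" =~ pattern    # 1
p "Snow" =~ pattern    # nil
```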

Now, the interesting part — there are a few major types of regular expressions that can go inside these slashes:

  1. Anchors
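^     Start of a line
$     End of a line
\A    Start of a string
\z    End of a string
\b    Any word boundary (the start or the end of a word)

Some regex cheat sheets also list \< and \> for the start and end of a word. Ruby does not support those as anchors, so use \b for both.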

Anchors tell the search where to start or stop. For example:

/\Afancy code stuff/

says “match ___ (whatever you put next in the regex) only at the very start of the string.”

*Notice the clever way \A (the first letter of the alphabet) denotes the start of a string, and \z (the last letter) denotes the end of a string*
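The difference between the line anchors and the string anchors only shows up when a string has more than one line. A small sketch, using a made-up two-line string:

```ruby
text = "cat\ndog"

# ^ matches at the start of any line, so "dog" on the second line is found.
p text.match?(/^dog/)     # true
# \A matches only at the very start of the whole string.
p text.match?(/\Adog/)    # false
p text.match?(/\Acat/)    # true

# $ matches at the end of any line.
p text.match?(/cat$/)     # true
# \z matches only at the very end of the whole string.
p text.match?(/cat\z/)    # false
p text.match?(/dog\z/)    # true
```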

2. Groups and ranges

Groups and ranges can include numbers or letters, and tell the search what characters you’re looking to match:

[abc]         Any single character: a, b, or c
[^abc]        Any single character except a, b, or c
[a-x]         Any lowercase letter from a to x
[A-T]         Any uppercase letter from A to T
[a-zA-Z]      Any letter, lowercase or uppercase
[0-7]         Any digit from 0 through 7
(a|b)         Either a or b

So if we had:

/\b[a-m]/

…the search is saying: “go to a word boundary, and match a lowercase letter from a to m there”. In practice, that finds words that start with one of those letters.
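Here is a small sketch of groups and ranges in action. The sample strings are made up, and \b (the word boundary anchor) stands in for "the start of a word":

```ruby
words = "apple zebra Mango kiwi"

# \b[a-m] : a word boundary, then one lowercase letter from a to m.
# \w* picks up the rest of the word so that scan returns whole words.
p words.scan(/\b[a-m]\w*/)     # ["apple", "kiwi"]

# [^0-9] : any single character that is not a digit.
p "a1b2".scan(/[^0-9]/)        # ["a", "b"]

# (a|e) : either a or e at that position.
p "gray".match?(/gr(a|e)y/)    # true
p "groy".match?(/gr(a|e)y/)    # false
```

"Mango" is skipped because the range a-m only covers lowercase letters.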

3. Character Classes

Character classes work much like groups and ranges, but each one is shorthand for an entire class of characters:

.        Any single character (except a newline, by default)
\s       Any whitespace character
\w       Any word character (a letter, digit, or underscore)
\W       Any non-word character
\d       Any digit
\D       Any non-digit

…so if you wanted to find any digit followed by a white space:

/\d\s/

Character classes can also be combined with literal characters. For instance, say you wanted to match words that end with “ring”. Using /[ring]/ would not work: square brackets match just one character, so it would find any single r, i, n, or g. Instead, we put the letters directly inside the forward slashes, and add a plus sign after \w to mean “one or more word characters”:

/\w+ring/

…this example matches one or more word characters followed by ring (so bring and string both match, but ring and ringing do not!).
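A short sketch to check those claims, with made-up sample strings. Note that the patterns are written without spaces inside the slashes, since a space there would have to match a literal space:

```ruby
# \d\s : a digit immediately followed by a whitespace character.
p "Room 7 is open".scan(/\d\s/)    # ["7 "]

# \w+ring : one or more word characters, then the literal letters "ring".
words = %w[bring string ring ringing]
ring_matches = words.select { |word| word.match?(/\w+ring/) }
p ring_matches                     # ["bring", "string"]

# [ring] is different: it matches one character at a time.
p "grin".scan(/[ring]/)            # ["g", "r", "i", "n"]
```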

4. Quantifiers

Quantifiers denote how many instances of something you want to match:

a?          Zero or one of a
d*          Zero or more of d
m+          One or more of m
!{3}        Exactly 3 of !
5{3,}       3 or more of 5
p{3,6}      Between 3 and 6 of p

Quantifiers work for any characters, so if you saw this:

/[!.]{3}/

…it’s searching for three characters in a row, each of which is an exclamation point or a period, like !!!, ..., or even !.!
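A few quantifiers tried out on made-up strings, as a rough sketch:

```ruby
# [!.]{3} : exactly three characters in a row, each either ! or .
p "Wow!!! Really... No!.!".scan(/[!.]{3}/)   # ["!!!", "...", "!.!"]

# u? : zero or one u, so both spellings match.
p "color colour".scan(/colou?r/)             # ["color", "colour"]

# \d{2,3} between word boundaries: whole numbers that are 2 or 3 digits long.
p "1 22 333 4444".scan(/\b\d{2,3}\b/)        # ["22", "333"]

# a{3,} : three or more of a.
p "aa".match?(/a{3,}/)                       # false
p "aaaa".match?(/a{3,}/)                     # true
```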

5. Pattern modifiers

Pattern modifiers are unique in that they go outside of the forward slashes, right after the closing one, and apply to the entire expression:

/i    Case insensitive
/x    Ignore whitespace (and allow comments) inside the pattern

…so if we wanted to find any string that starts with the letters k-s, either lower- or uppercase:

/\A[k-s]/i
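Both modifiers can be sketched like this. The variable names and sample strings are only for illustration:

```ruby
# /i : the range k-s now matches uppercase K-S as well.
starts_k_to_s = /\A[k-s]/i
p starts_k_to_s.match?("Ruby")     # true
p starts_k_to_s.match?("python")   # true
p starts_k_to_s.match?("Java")     # false (j comes before k)

# /x : whitespace and comments inside the pattern are ignored,
# which lets a pattern be spread out and annotated.
digit_then_space = /
  \d   # one digit
  \s   # one whitespace character
/x
p "Room 7 is open".match?(digit_then_space)   # true
```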

Using Regex in code

Now that you know how to interpret regex, you’re ready to actually use it in your code! These are a few basic methods that work well with regex. To use them, place the regex inside parentheses after the method.

  1. .scan

.scan will return an array of all the items in your string that match the given regex. So:

"Rain water washing down the drain".scan(/\w+ain/)

…would return the array

 #=> ["Rain", "drain"]
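Since .scan always hands back an array, the usual array methods work on the result. A small sketch, reusing the sentence above (the variable names are just for illustration):

```ruby
sentence = "Rain water washing down the drain"

matches = sentence.scan(/\w+ain/)
p matches           # ["Rain", "drain"]
p matches.length    # 2

# When nothing matches, scan returns an empty array rather than nil.
p sentence.scan(/\d/)   # []
```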

2. .match

.match is nearly identical to .scan, but only returns the first instance that it matches:

"Rain water washing down the drain".match(/\w+ain/)

…would return

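#<MatchData "Rain">

*Note that .match returns a MatchData object, not just the string, and returns nil when nothing matches. Because nil is falsy, .match is often used in a condition to quickly test whether a string contains a specified pattern. The related .match? method does the same test and returns a plain true or false.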
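A brief sketch of working with what .match returns, again using the sentence above (the variable names are just for illustration):

```ruby
sentence = "Rain water washing down the drain"

result = sentence.match(/\w+ain/)
p result[0]          # "Rain" (the matched text)
p result.begin(0)    # 0 (the index where the match starts)

# No match gives nil, which is falsy in a condition.
p sentence.match(/\d/)    # nil

# match? skips the MatchData object and returns true or false.
p sentence.match?(/\w+ain/)   # true
p sentence.match?(/\d/)       # false
```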

3. .grep

.grep is similar to .scan, but it works on a collection instead of a single string: called on an array, it returns the elements that match the given regex.

If you had an array of strings that listed pets and their species, and wanted to return those whose names were exactly 4 letters:

pets = ["Barkley the dog", "Spot the dog", "Whiskers the cat", "Kiwi the bird"]
pets.grep(/^\w{4}\s/)

…would look at the start of each string (^), followed by any 4 word characters (\w{4}), then a whitespace character (\s). The returned array would be:

#=> ["Spot the dog", "Kiwi the bird"]
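Here is the same example as a runnable sketch, plus two close relatives: .grep_v, which keeps the elements that do not match, and .grep with a block, which transforms each matching element. The variable names are only for illustration.

```ruby
pets = ["Barkley the dog", "Spot the dog", "Whiskers the cat", "Kiwi the bird"]

# Keep the strings that start with exactly 4 word characters and a space.
four_letter = pets.grep(/^\w{4}\s/)
p four_letter    # ["Spot the dog", "Kiwi the bird"]

# grep_v is the inverse: keep the strings that do NOT match.
others = pets.grep_v(/^\w{4}\s/)
p others         # ["Barkley the dog", "Whiskers the cat"]

# With a block, grep returns the block's result for each matching string.
dog_names = pets.grep(/dog/) { |pet| pet.split.first }
p dog_names      # ["Barkley", "Spot"]
```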

There are a lot of very complex-looking regex snippets you’ll come across when searching for ways to simplify your code, but knowing the basics will make this foreign-looking (Ruby) language much more straightforward!

A great place to test out your own regex is https://regex101.com/ . Once you’re feeling more adventurous, you can head over to https://alf.nu/RegexGolf and try to pass the challenges in as small an expression as possible.

Additional resources:

https://catarak.github.io/blog/2014/10/13/ruby-regular-expressions/

https://ruby-doc.org/stdlib-2.6.1/libdoc/strscan/rdoc/StringScanner.html
