Regular Expressions: A Parable

Skip the story, get to the goods

The Hay Farm

A hay farmer has a haystack. His competitor, trying to sabotage his business, has come in the night to hide needles in his product. The hay farmer hires you to extract the needles and save his business.

Lucky for you the haystacks are made of strings, so you write a little program to help you out.

The sabotaged haystack.

There might be a needle in this haystack. Needle A. But you aren’t sure, so you reach into your bag of programming tricks to help you find out.

You pull out another needle. Needle B. Needle B is also a string of characters, and you’re going to use it as bait to draw Needle A out of hiding.

haystack1 = "There's a needle in here somewhere!"
needle_B = "needle"
haystack1.include?(needle_B) # => true

You congratulate yourself on your detective work. You found the needle. Now you try extracting it from the haystack:

haystack1 = "There's a needle in here somewhere!"
needle_B = "needle"
haystack1.delete(needle_B) # => "Thr's a  i hr somwhr!"

Well, you got rid of the needle, but you also destroyed half the haystack in the process! That’s because .delete, like .count, treats its argument as a set of characters and acts on every occurrence of any of them, which makes both methods useless for our quest.
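
If you want to see the difference for yourself, here is a small sketch. The constant and method names are made up for this example; only count, scan, and sub are real String methods. It compares counting the needle's characters with counting (and pulling) the needle as a whole substring.

```ruby
# Hypothetical names, for illustration only.
HAYSTACK_1 = "There's a needle in here somewhere!"
NEEDLE = "needle"

# String#count treats its argument as a set of characters (n, e, d, l)
# and counts every occurrence of any of them.
def count_needle_characters(haystack, needle)
  haystack.count(needle)
end

# String#scan with a string argument finds whole, non-overlapping
# occurrences of that substring.
def count_whole_needles(haystack, needle)
  haystack.scan(needle).length
end

# String#sub replaces only the first whole occurrence of the substring.
def pull_first_needle(haystack, needle)
  haystack.sub(needle, "")
end

puts count_needle_characters(HAYSTACK_1, NEEDLE) # => 14
puts count_whole_needles(HAYSTACK_1, NEEDLE)     # => 1
puts pull_first_needle(HAYSTACK_1, NEEDLE)       # => "There's a  in here somewhere!"
```

Fourteen "needle characters" but only one actual needle, which is why the haystack took so much damage.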

A questionably successful needle extraction.

You move on to another haystack, more intact, but with more hidden needles. You ditch .delete and try substituting instead. But what if you want to find the needles and NOT the things that merely look like needles?

haystack2 = "Needles needle the needled, needlessly."
needle_B = "needle"
haystack2.gsub(needle_B, "") #=> "Needles  the d, ssly."

A not-at-all successful needle extraction.

Not only did the first needle escape, you destroyed even more of this haystack than the first! Your employer is probably going to fire you unless you get your act together. You reach the third haystack, which not only has needles but pins too!

You reach back in your bag and pull out a strange object. It’s a magnet with some cryptic symbols on it. You hold it up to the third haystack:

A mysterious magnet comes to the rescue.

haystack3 = "Needles needle the needled, needlessly while pins are pining for pints."
magnet = /needles?\b|pins?\b/i
haystack3.gsub(magnet, "*") # => "* * the needled, needlessly while * are pining for pints."

The strange magnet not only extracted all our needles and pins, it left the innocent imposters alone! Instead of relying on strings, it used a pattern to find the needles.

Happy regex magnet, happy haystack.

Using the regex magnet made your last extraction painless, AND you left the haystack entirely intact. You decipher the code as follows:


  • / : The start of a regex
  • needle : The literal word we want to match
  • s? : A ‘?’ makes the character before it optional, so this matches “needle” with or without a trailing “s”
  • \b : A word boundary, so the match is skipped when the word continues into a longer one (as in “needled” or “needlessly”)
  • | : The “or” operator: find needles OR pins
  • pins?\b : The same pattern again, this time for “pin”
  • / : The end of a regex
  • i : Ignore case, so it matches whether or not the letters are capitalized
Regular expressions go beyond strings by describing a pattern of characters.
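
One way to keep that decoding attached to the magnet itself is Ruby's extended mode (the x flag), which ignores whitespace and allows comments inside the pattern. The sketch below is one possible way to write it, not the only one; the constant and method names are invented for this example.

```ruby
# The same magnet, spelled out piece by piece in extended (x) mode.
MAGNET = /
  needles?\b   # "needle" or "needles", ending at a word boundary
  |            # or
  pins?\b      # "pin" or "pins", ending at a word boundary
/xi

HAYSTACK_3 = "Needles needle the needled, needlessly while pins are pining for pints."

# String#scan returns every match, so you can see what the magnet picked up
# before replacing anything.
def found_by_magnet(haystack)
  haystack.scan(MAGNET)
end

p found_by_magnet(HAYSTACK_3) # => ["Needles", "needle", "pins"]
```

Scanning first and substituting second is a handy habit: you get to inspect the catch before you change the haystack.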

The farmer’s crop (mostly) salvaged, you investigate the magnet further. There’s a decoder ring attached. You inspect.

Ruby’s Magical Regex Decoder Ring

Basic Regex Decoder Ring

Or, 10 friendly regex codes you can start using today!

1. A literal string matches exactly these letters, in this order, and is case sensitive.

string_1 = "Hittin' the hay!"
regex_1 = /hay/
string_1.gsub(regex_1, "*") #=> "Hittin' the *!"

2. Brackets match any single character listed inside them, so every “h”, “a”, or “y” is replaced.

string_1 = "Hittin' the hay!"
regex_2 = /[hay]/
string_1.gsub(regex_2, "*") #=> "Hittin' t*e ***!"

3. Adding a caret (^) as the first character inside the brackets negates them: the pattern matches every character EXCEPT the ones listed.

string_1 = "Hittin' the hay!"
regex_3 = /[^hay]/
string_1.gsub(regex_3, "*") #=> "*********h**hay*"
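
Brackets also accept ranges, which saves you from listing every character by hand. Here is a short sketch of a range and its negation; the constant and method names are made up for the example.

```ruby
HAY_STRING = "Hittin' the hay!"

# [a-z] is a range: any one lowercase letter from a to z.
def mask_lowercase(text)
  text.gsub(/[a-z]/, "*")
end

# [^a-zA-Z] is the negated version: any one character that is not a letter.
def mask_non_letters(text)
  text.gsub(/[^a-zA-Z]/, "*")
end

puts mask_lowercase(HAY_STRING)   # => "H*****' *** ***!"
puts mask_non_letters(HAY_STRING) # => "Hittin**the*hay*"
```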

4. Adding a caret outside the brackets anchors the pattern to the beginning of a line, so it only matches there.

string_1 = "Hittin' the hay!"
regex_4 = /^Hit/
string_1.gsub(regex_4, "*") #=> "*tin' the hay!"

regex_5 = /^hay/
string_1.gsub(regex_5, "*") #=> "Hittin' the hay!" (no match, so the string comes back unchanged)
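
What comes back when nothing matches depends on which method you call, and it is worth checking for yourself. The sketch below uses hypothetical helper names; gsub, gsub!, and match? are the real String methods being compared.

```ruby
UNTOUCHED_HAY = "Hittin' the hay!"

# gsub always returns a string; with no match it is an unchanged copy.
def replace_copy(text, pattern)
  text.gsub(pattern, "*")
end

# gsub! modifies its receiver and returns nil when nothing was replaced.
# It is called on a duplicate here so the caller's string is left alone.
def replace_in_place(text, pattern)
  text.dup.gsub!(pattern, "*")
end

# match? answers the yes/no question without building a new string.
def found?(text, pattern)
  text.match?(pattern)
end

p replace_copy(UNTOUCHED_HAY, /^hay/)     # => "Hittin' the hay!"
p replace_in_place(UNTOUCHED_HAY, /^hay/) # => nil
p found?(UNTOUCHED_HAY, /^hay/)           # => false
```

If all you need to know is whether the needle is there, match? is usually the clearest choice.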

5. Adding a $ at the end anchors the pattern to the end of a line, so it only matches there.

string_1 = "Hittin' the hay!"
regex_6 = /hay!$/
string_1.gsub(regex_6, "*") #=> "Hittin' the *"

regex_7 = /the$/
string_1.gsub(regex_7, "*") #=> "Hittin' the hay!" (no match, so the string comes back unchanged)
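
In Ruby, ^ and $ work per line, not per string, which only shows up once your haystack has a newline in it. The sketch below (with a made-up two-line haystack and hypothetical method names) compares them with \A and \z, which anchor to the very start and very end of the whole string.

```ruby
TWO_LINES = "hay for sale\nhay for sale"

# ^ anchors at the start of every line, so both lines match.
def mask_line_starts(text)
  text.gsub(/^hay/, "*")
end

# \A anchors at the start of the whole string, so only the first line matches.
def mask_string_start(text)
  text.gsub(/\Ahay/, "*")
end

# $ anchors at the end of every line, so both lines match.
def mask_line_ends(text)
  text.gsub(/sale$/, "*")
end

# \z anchors at the very end of the string, so only the last line matches.
def mask_string_end(text)
  text.gsub(/sale\z/, "*")
end

p mask_line_starts(TWO_LINES)  # => "* for sale\n* for sale"
p mask_string_start(TWO_LINES) # => "* for sale\nhay for sale"
p mask_line_ends(TWO_LINES)    # => "hay for *\nhay for *"
p mask_string_end(TWO_LINES)   # => "hay for sale\nhay for *"
```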

6. A period matches any single character (except a newline). So /h./ finds an “h” plus whatever character follows it.

string_1 = "Hittin' the hay!"
regex_8 = /h./
string_1.gsub(regex_8, "*") #=> "Hittin' t* *y!"

7. A plus sign matches the preceding character one or more times in a row. Notice that the two adjacent “t”s were replaced by a single star in the following example:

string_1 = "Hittin' the hay!"
regex_9 = /t+/
string_1.gsub(regex_9, "*") #=> "Hi*in' *he hay!"
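
To make the effect of the plus sign easier to see, here is a sketch that runs the pattern with and without it. The names are invented for the example.

```ruby
PLUS_HAY = "Hittin' the hay!"

# t+ matches a whole run of t's at once, so "tt" becomes a single star.
def mask_t_runs(text)
  text.gsub(/t+/, "*")
end

# t matches exactly one t per match, so "tt" becomes two stars.
def mask_each_t(text)
  text.gsub(/t/, "*")
end

# scan shows what each match actually contained.
def t_runs(text)
  text.scan(/t+/)
end

puts mask_t_runs(PLUS_HAY) # => "Hi*in' *he hay!"
puts mask_each_t(PLUS_HAY) # => "Hi**in' *he hay!"
p t_runs(PLUS_HAY)         # => ["tt", "t"]
```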

8. The pipe (|) operator means “or” and works like “||” in conditionals: the pattern matches if either side does. The “i” flag makes the expression case insensitive.

string_1 = "Hittin' the hay!"
regex_10 = /hit|hay/i
string_1.gsub(regex_10, "*") #=> "*tin' the *!"
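
The pipe splits the entire pattern unless you fence it in with parentheses. The sketch below goes one step beyond the list above: it assumes a slightly longer, made-up haystack and uses a non-capturing group, (?: ), to limit how far the “or” reaches. All names are hypothetical.

```ruby
OKAY_HAY = "Hittin' the hay, okay!"

# With a group, the h is shared: "h" followed by either "it" or "ay".
GROUPED = /h(?:it|ay)/i

# Without a group, | splits the whole pattern: either "hit" or "ay".
UNGROUPED = /hit|ay/i

def mask(text, pattern)
  text.gsub(pattern, "*")
end

puts mask(OKAY_HAY, GROUPED)   # => "*tin' the *, okay!"
puts mask(OKAY_HAY, UNGROUPED) # => "*tin' the h*, ok*!"
```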

9. \s matches any whitespace character and \d matches any digit 0-9. Combined here, they match a whitespace character followed by a digit, and both characters are replaced.

string_1 = "Hittin' the hay4u 2day!"
regex_11 = /\s\d/
string_1.gsub(regex_11, "*") #=> "Hittin' the hay4u*day!"
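
If you want to keep the space and swap out only the digit, one option is a lookbehind, which checks what comes before the match without including it. This is a sketch of that idea rather than something from the decoder ring above; the names are invented.

```ruby
DIGIT_HAY = "Hittin' the hay4u 2day!"

# \s\d consumes the whitespace and the digit, so both are replaced.
def mask_space_and_digit(text)
  text.gsub(/\s\d/, "*")
end

# (?<=\s)\d uses a lookbehind: the whitespace must be there, but it is not
# part of the match, so only the digit is replaced.
def mask_digit_after_space(text)
  text.gsub(/(?<=\s)\d/, "*")
end

puts mask_space_and_digit(DIGIT_HAY)   # => "Hittin' the hay4u*day!"
puts mask_digit_after_space(DIGIT_HAY) # => "Hittin' the hay4u *day!"
```

Either way, the “4” in “hay4u” is left alone because no whitespace comes right before it.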

10. A \w matches any single word character, the same as [a-zA-Z0-9_].

string_1 = "Hittin' the hay!"
regex_12 = /\w/
string_1.gsub(regex_12, "*") #=> "******' *** ***!"
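
Pair \w with the plus sign from code 7 and you can grab whole words instead of single characters. A minimal sketch, with made-up names:

```ruby
WORD_HAY = "Hittin' the hay!"

# \w+ matches a run of word characters, which is roughly one word per match.
def words_in(text)
  text.scan(/\w+/)
end

# Replacing each run swaps out whole words and leaves punctuation and spaces.
def mask_words(text)
  text.gsub(/\w+/, "*")
end

p words_in(WORD_HAY)      # => ["Hittin", "the", "hay"]
puts mask_words(WORD_HAY) # => "*' * *!"
```

The apostrophe is not a word character, so “Hittin’” loses its tail here. That is something to keep in mind before counting words this way.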

Regex is super helpful for reformatting, splitting, counting, replacing, etc. The links below are great resources to play around with regex and learn the tricks that will be most valuable to your code.

  • Rubular: Ruby-specific regex tester.
  • Regex 101: Teaches you about what each expression is doing in real time while you’re typing it out.
  • RexEgg: A comprehensive website on all things regex. Bonus: a t-rex is the mascot.
  • Ruby Regex Tutorial: Bland but well laid-out tutorial on regex.