Regular Expressions

John Chriest
May 9, 2018

I remember staring at the hieroglyphic-esque chicken scratch called regex (regular expressions) and thinking something like, “Oh man… I’m gonna ignore this and hope I won’t have to mess with it anytime soon.” It seems like something you may never have to use, but learning regex and using it will save you a lot of time and difficulty… that is, after the difficulty of learning this language within a language.

First, a little about regex. A regex is a pattern used to find occurrences of characters in a string. Patterns can match letters, numbers, punctuation, whitespace, and so on. The syntax that points to these specific characters is what looks so difficult.

/(?<=<b>)\w+(?=<\/b>)/
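For the curious, here is a minimal Ruby sketch of what that pattern does, assuming a small hypothetical HTML snippet (the variable names are just for illustration). (?<=<b>) is a lookbehind and (?=<\/b>) is a lookahead, so the pattern matches a run of word characters sitting between <b> and </b> without including the tags themselves.

```ruby
# Lookbehind (?<=<b>) and lookahead (?=<\/b>) require the word to sit
# between <b> and </b>, but neither tag becomes part of the match.
BOLD_WORD = /(?<=<b>)\w+(?=<\/b>)/

html = "This is <b>important</b> text" # hypothetical sample input
match = html.match(BOLD_WORD)

puts match[0] # prints important
```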

In Ruby, a regex is delimited by forward slashes / / or by %r{ }, and the pattern goes in between.
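A quick sketch showing that both literal styles build the same kind of Regexp object (the sample strings are made up for illustration):

```ruby
# Two ways to write the same pattern literal.
slashes = /world/
percent_r = %r{world}

puts slashes == percent_r       # prints true: same source, same options
puts "hello world" =~ slashes   # prints 6: the index where the match starts

# %r{} is handy when the pattern itself contains forward slashes,
# because they don't need to be escaped with a backslash.
closing_tag = %r{</b>}
puts "<b>hi</b>" =~ closing_tag # prints 5
```

The =~ operator returns the starting index of the first match, or nil when there is none.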

We can also define character classes, delimited by brackets [ ], which match any single character from the set inside them. AND we can create capture groups, delimited by parentheses ( ), which remember the part of the string they matched.
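A minimal sketch of both ideas, using made-up sample strings: a character class picks out one character at a time, while a capture group saves a piece of the match for later.

```ruby
# A character class matches ONE character from the set in the brackets.
vowel = /[aeiou]/
digit = /[0-9]/

p "regex".scan(vowel)        # prints ["e", "e"]: each vowel found separately
puts "May 9, 2018"[digit]    # prints 9: only the first digit found

# A capture group remembers the part of the match inside the parentheses.
year = "May 9, 2018".match(/([0-9]+)$/)
puts year[1]                 # prints 2018
```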

Let’s start easy and figure out how to match a single character.
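Here is a minimal sketch of that first search, assuming the string "hello world" (the same string that shows up in the capture-group output later on):

```ruby
text = "hello world"

# [a-z] matches exactly one lowercase letter; match returns the first hit.
first_letter = text.match(/[a-z]/)

p first_letter       # prints #<MatchData "h">
puts first_letter[0] # prints h
```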

We used the brackets [ ] to specify the range of characters we were looking for: [a-z] matches any one lowercase letter. The search stops at the first character that matches and returns a MatchData instance for “h”. “Cool… so how do we match more than that?” Slow down a bit! I’m getting there!
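A sketch of the expanded search, again assuming the string "hello world"; the only change is the + after the character class:

```ruby
text = "hello world"

# [a-z]+ means "one or more lowercase letters in a row", so the match
# keeps going until it reaches a character outside the class (the space).
first_word = text.match(/[a-z]+/)

p first_word # prints #<MatchData "hello">
```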

So now we have expanded our search past the first character we come across. It should be noted that the first pattern, /[a-z]/, matches ‘a’ or ‘b’ or ‘c’ or ‘d’… and so on, which is why only one character is matched. The second pattern, /[a-z]+/, does the same, BUT after it finds the first match it keeps going (thanks to that little + sign, which means “one or more”) and looks for a second letter, and a third, and continues until there are no more lowercase letters to match. It stops when it hits the whitespace. In a sense, the search changes from “one letter” to “as many letters in a row as possible.”

What if we wanted to catch the next word too?

Well, we can do that by continuing the pattern.
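A minimal sketch of the continued pattern on the same assumed "hello world" string:

```ruby
text = "hello world"

# \s matches one whitespace character, so this pattern reads:
# letters, then a space, then more letters.
two_words = text.match(/[a-z]+\s[a-z]+/)

p two_words # prints #<MatchData "hello world">
```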

Here we used the first part of the pattern, [a-z]+, to match all of the letters we came across first. Where the search stopped at the whitespace last time, this time we added \s to the pattern to match it. We then continued our pattern with another [a-z]+, not forgetting the + sign at the end to make sure we matched all of the letters we could.

This is fine and all, but I would rather capture just the words. One more little trick to get you started: create multiple capture groups that you can use later.
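A sketch of the same pattern with each word wrapped in parentheses, still assuming "hello world":

```ruby
text = "hello world"

# Each pair of parentheses is a numbered capture group (1, 2, ...).
words = text.match(/([a-z]+)\s([a-z]+)/)

p words # prints #<MatchData "hello world" 1:"hello" 2:"world">
```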

This time we used ( ) to create capture group 1, ([a-z]+), and capture group 2, ([a-z]+), separated by \s to match the whitespace. The two separate capture groups show up in this part of the output: #<MatchData "hello world" 1:"hello" 2:"world">. Doesn’t this look like an instance? Well, it is: an instance of the MatchData class. We can save this instance to a variable and use the data that we just captured.
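A minimal sketch of saving the MatchData to a variable and reading the captures back out; the word swap at the end is just a made-up use of the data:

```ruby
# Save the MatchData instance to a variable...
match = "hello world".match(/([a-z]+)\s([a-z]+)/)

# ...then read the captured pieces back out by index.
puts match[0]    # prints hello world (index 0 is the whole match)
puts match[1]    # prints hello
puts match[2]    # prints world
p match.captures # prints ["hello", "world"]

# Example use: swap the two words.
puts "#{match[2]} #{match[1]}" # prints world hello
```

Keep in mind that match returns nil when nothing matches, so real code should check for that before indexing into the result.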

That wasn’t as hard as you thought, now was it? You just crawled a bit, then took your first step. There is plenty more that this post didn’t get into, but mess around a bit on your own, and someday you’ll be able to understand even the most cryptic-looking patterns.

Conclusion

Just like anything else in life, coding included, it takes baby steps, and falling and getting back up many, many times, before you get to run. Not that running means you don’t fall anymore. Actually, when you’re running, you fall from greater heights and at faster speeds, so it may hurt even more. So learn to fall now, so you can roll and keep on moving gracefully.

Sources:

https://ruby-doc.org/core-2.2.0/Regexp.html

https://www.regular-expressions.info/charclass.html
