Making Regex as Painless as Possible

An Alien language

Rob Egeland
Jul 14, 2022

If I told you to picture an alien language in your mind, what do you think would come up? Some alien-looking symbols, maybe a futuristic hieroglyph. Whatever came to mind, I bet Regex isn’t far off.

But once you get the hang of it, it becomes a very powerful tool you can wield. You can tell your friends and family you learned an alien language!

So what is it?

Regex is short for regular expression, and it is a language for writing patterns that match pieces of text or, in the case of us programmers… data! The critical phrase in that sentence is that it’s a language, like JavaScript, and it can be mastered, saving you many hours in your work. The real power of regex is how expressive it is: when you use it correctly, you can build even very complex patterns for whatever application you need.

Break it up!

When I started diving into regex it was very overwhelming, trying to go character by character, figuring out what each one meant and how they were related to each other. There’s a better way. All of the characters and symbols can be broken up into three groups: Characters, Groupings, and Quantifiers. Below I will go through each of these groups, explain them, and show you some common characters/symbols in each.

Characters

These are the meat and potatoes of regex, the symbols that tell your expression exactly what to look for. They all have special meanings but are pretty easy to figure out once you get the hang of it. I’ll show you some examples.

The lowercase \d character means any digit! So a pattern such as \d\d\d will match any three digits in a row, e.g. 123, 375, 344. Another example is the \w character.

This is any word character (a letter, a digit, or an underscore), so a pattern such as \w\w\w\w will match any four letters or numbers in a row, e.g. tree, race, 2532. There are also special characters in regex.
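To try out the two shorthand characters described above, here is a minimal sketch using Python's built-in re module. The patterns \d\d\d and \w\w\w\w and the sample strings are assumptions picked for illustration, not the original examples.

```python
import re

# Assumed pattern: three \d tokens in a row, i.e. any three consecutive digits.
three_digits = re.compile(r"\d\d\d")

# Assumed pattern: four \w tokens in a row. Note that \w also accepts "_".
four_word_chars = re.compile(r"\w\w\w\w")

# Hypothetical sample strings; fullmatch() requires the whole string to fit.
digit_results = {s: bool(three_digits.fullmatch(s)) for s in ["123", "375", "12a"]}
word_results = {s: bool(four_word_chars.fullmatch(s)) for s in ["tree", "2532", "a_b1", "ab!"]}

print(digit_results)  # "12a" fails because "a" is not a digit
print(word_results)   # "ab!" fails because it is too short and "!" is not a word character
```

Using fullmatch() here keeps the check strict; search() would instead look for the pattern anywhere inside a longer string.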

Two of the special characters, ^ and $, are called anchors. They mark the start and the end of the text.

A pattern such as ^if will match any text that starts with if. If the ^ character matches the start of the text, you can probably guess what the $ anchor does. If you guessed that it matches anything that ends with that text, you would be correct! There are a few other special characters that we will talk about a little later on, as they are quantifiers.
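Here is a small sketch of the two anchors in Python's re module. The patterns ^if and if$ and the sample sentences are assumptions made up for this illustration.

```python
import re

# ^ pins the match to the start of the text, $ pins it to the end.
starts_with_if = re.compile(r"^if")
ends_with_if = re.compile(r"if$")

# Hypothetical sample sentences.
start_hit = starts_with_if.search("if it rains, stay in")   # starts with "if"
start_miss = starts_with_if.search("what if")               # "if" is not at the start
end_hit = ends_with_if.search("what if")                    # ends with "if"
end_miss = ends_with_if.search("if it rains, stay in")      # "if" is not at the end

print(bool(start_hit), bool(start_miss), bool(end_hit), bool(end_miss))
```

Keep in mind that ^if only checks the first two characters, so a string like "iffy" would match as well.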

Groupings

Groupings in Regex boil down to three main components: character classes, written with [], capture groups, written with (), and the or operator, written with |. Character classes use brackets, and they tell regex to match exactly one of the several characters you give it. An example will explain this concept better.

Let’s say you were trying to match both hay, like horses eat, and hey, a greeting you might say to someone. Instead of inputting both words to match, you can use a character class such as h[ae]y, and anything that starts with h and ends with y with an a or e in the middle will match! You can also add a hyphen to input a range like 0-9 or A-E; we will see ranges again when we touch on quantifiers.
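The hay/hey idea can be sketched like this in Python's re module. The exact pattern h[ae]y, the range [A-E], and the sample words are assumptions chosen for illustration.

```python
import re

# [ae] matches exactly one character, which must be either "a" or "e".
hay_or_hey = re.compile(r"h[ae]y")

# Hypothetical test words; fullmatch() requires the whole word to fit.
class_results = {w: bool(hay_or_hey.fullmatch(w)) for w in ["hay", "hey", "hoy", "haey"]}

# A hyphen inside the brackets gives a range: any one letter from A through E.
letters_a_to_e = re.findall(r"[A-E]", "ABCXYZE")

print(class_results)
print(letters_a_to_e)
```

Notice that "haey" does not match: the class stands for one character, not for "a and e together".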

Building off of this are capture groups, which use parentheses. Capture groups look similar to character classes, but instead of matching one out of several characters they treat all of them as a single unit. So, for example, if I were to put cat into a capture group, I would write (cat).

This is a single group with the individual letters “c”, “a”, and “t” in it, in that order. So the word cat is the only text that will match this capture group. In other words, whatever plain text you put into a capture group will be searched for exactly.
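As a rough sketch in Python's re module, assuming the pattern (cat) and two made-up sentences, a capture group both matches the letters as one unit and lets you read back what it matched:

```python
import re

# The parentheses treat "c", "a", "t" as one unit, in that order.
cat_group = re.compile(r"(cat)")

# Hypothetical sample sentences.
hit = cat_group.search("my cat naps")   # contains "cat"
miss = cat_group.search("my act")       # same letters, wrong order

# group(1) returns the text matched by the first capture group.
captured = hit.group(1) if hit else None

print(captured, miss)
```

The same letters in a different order ("act") do not match, which is the key difference from a character class.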

If you have some experience with programming, the or operator should look familiar to you, and good news: it works much the same as it does in programming languages. The or operator will look for whatever is on the left side of the operator or whatever is on the right.

For example, cat|dog matches either cat or dog. This is similar to a character class, the main difference being that each side can be a group of characters instead of just one!
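A quick sketch of the or operator in Python's re module follows. The patterns cat|dog and (cat|dog)s and the sample sentence are assumptions for illustration only.

```python
import re

# Match whatever is on the left of | or whatever is on the right.
cat_or_dog = re.compile(r"cat|dog")

# Hypothetical sample sentence.
found = cat_or_dog.findall("a cat, a dog and a bird")

# Parentheses limit how far the | reaches, so the "s" applies to both options.
plural_match = re.fullmatch(r"(cat|dog)s", "dogs") is not None

print(found)
print(plural_match)
```

Wrapping the alternatives in parentheses is a common way to combine the or operator with the capture groups covered above.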

Quantifiers

Quantifiers are like an add-on to your search, and they can be used with everything we have just learned. They are tacked on just after whatever character or group you want to repeat, and they say how many times it should appear.

So a pattern such as def{2} will match text that has the characters de followed by 2 f’s. There are three symbols you can use if you don’t want to specify an exact number each time: * (zero or more), + (one or more), and ? (zero or one).
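Here is a minimal sketch of these quantifiers in Python's re module. It assumes the pattern def{2} for the "de followed by 2 f’s" case, and it assumes the three symbols meant are the standard shorthand quantifiers *, + and ?. The helper fits() is a hypothetical name introduced just for this example.

```python
import re

def fits(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# {2} applies only to the "f" right before it: "de" followed by exactly two f's.
exact = {t: fits(r"def{2}", t) for t in ["def", "deff", "defff"]}

samples = ["df", "def", "deeef"]
star = {t: fits(r"de*f", t) for t in samples}      # * : zero or more "e"
plus = {t: fits(r"de+f", t) for t in samples}      # + : one or more "e"
optional = {t: fits(r"de?f", t) for t in samples}  # ? : zero or one "e"

print(exact)
print(star)
print(plus)
print(optional)
```

In every case the quantifier only affects the single character (or group) directly in front of it.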

If you need a specific amount, you can use curly braces, as in f{2}. When using curly braces you can also give a range, like we could with character classes, but we use a comma instead of a hyphen: f{2,4} matches between two and four f’s.
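The comma range can be sketched like this in Python's re module, assuming a pattern of \d{2,4} (two to four digits) and a few made-up inputs:

```python
import re

# {2,4} means "at least two and at most four" of the preceding token.
two_to_four_digits = re.compile(r"\d{2,4}")

# Hypothetical inputs; fullmatch() requires the whole string to fit.
range_results = {
    s: bool(two_to_four_digits.fullmatch(s))
    for s in ["1", "12", "1234", "12345"]
}

print(range_results)
```

One digit is too few and five digits is too many, so only the middle two inputs pass.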

These can be used with capture groups as well!
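To see what a quantifier on a capture group changes, here is a small sketch in Python's re module. The patterns (ha){2,3} and ha{2,3} and the sample strings are assumptions picked to show the contrast.

```python
import re

# With parentheses, the quantifier repeats the whole unit "ha".
grouped = re.compile(r"(ha){2,3}")

# Without parentheses, the quantifier repeats only the "a".
ungrouped = re.compile(r"ha{2,3}")

samples = ["ha", "haha", "hahaha", "haa"]
grouped_results = {s: bool(grouped.fullmatch(s)) for s in samples}
ungrouped_results = {s: bool(ungrouped.fullmatch(s)) for s in samples}

print(grouped_results)
print(ungrouped_results)
```

The grouped pattern accepts "haha" and "hahaha", while the ungrouped one only accepts "haa" from this list.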

To wrap up

I hope you found this a bit useful, as I know when I tried to learn Regex for the first time it went in one ear and right out the other. But once you have a grasp of the topics and play around a little, you will see how much potential it can have in your programs. Now, I really just skimmed the surface of what regex has to offer, so if you would like to dig in deeper, there is a great free resource called how to become a regular expression power user by Blair Williams that is much more in-depth!
