A practical beginner’s guide to RegEx (Regular Expressions)

Jen Weber
Published in Frontend Weekly
5 min read · Feb 4, 2019

Want to skip the jargon and giant list of foundational concepts? This is just enough to get you through whatever RegEx brought you here.

My regular expression. Photo by Bernard Hermant on Unsplash

What is RegEx?

It’s a handy way to search through strings to find what you need. You can search for patterns of numbers, letters, punctuation, and even whitespace. It looks ugly when you write it, but it’s really powerful. When you’re learning, it’s a good idea to practice using an online Regex Tester, so you can see in real time what is matching.

The examples in this article focus on the match method in JavaScript, but many other programming languages have RegEx support. To see a real-world application of RegEx, see this CodePen where we convert code syntaxes.

Matching a word

Let’s say you need to find out if a string has the word “Ember” in it. We put our search term inside the / /. Whatever is inside the slashes becomes part of a Regular Expression, aka Regex.

const str = "Ember is a front-end JavaScript framework."
str.match(/Ember/g)
// returns ["Ember"]

The g that comes after the last slash is an option that means “show me all the matches, not just the first one.” It’s short for global.
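To see what the g option changes, here is a small sketch that runs the same search with and without it. The sample string and variable names are made up for illustration. Without g, match returns only the first match, plus a few extra details such as the position where it was found.

```javascript
// Hypothetical sample string, chosen so the search term appears twice.
const flagDemo = "Ember docs, Ember guides"

// Without g: only the first match, plus details like where it was found.
const firstOnly = flagDemo.match(/Ember/)
console.log(firstOnly[0])     // "Ember"
console.log(firstOnly.index)  // 0
console.log(firstOnly.length) // 1

// With g: every match, returned as a plain array of strings.
const allMatches = flagDemo.match(/Ember/g)
console.log(allMatches)       // ["Ember", "Ember"]
```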

Matching multiple words

To match multiple words, just write exactly what you want to be matched, including the spaces and the capitalization.

const str = "Ember is a front-end JavaScript framework."
str.match(/JavaScript framework/g)
// returns ["JavaScript framework"]

Matching two different words

RegEx can do logic. For example, let’s say we want to match React or Ember in a string. We separate the terms with a | which is how you indicate “or”.

const str = "React is a front-end JavaScript library."
str.match(/Ember|React/g)
// returns ["React"]

Since we use the g or global option, we’ll get all matches back:

const str = "React and Ember use JavaScript."
str.match(/Ember|React/g)
// returns ["React", "Ember"]

Ignoring case

Sometimes, you don’t care if your word is uppercase or lowercase. You can use the i option, short for case-insensitive. It goes with the other options, right next to the g.

const str = "Ember is a front-end JavaScript framework."
str.match(/javascript/gi)
// returns ["JavaScript"]
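Here is a short sketch of what happens when the i option is left off. The variable names are only for illustration. The thing to watch for is that match returns null, not an empty array, when nothing matches.

```javascript
const caseDemo = "Ember is a front-end JavaScript framework."

// Without i, the lowercase pattern finds nothing, and match returns null.
const strict = caseDemo.match(/javascript/g)
console.log(strict) // null

// With i, case is ignored and the original spelling comes back.
const relaxed = caseDemo.match(/javascript/gi)
console.log(relaxed) // ["JavaScript"]

// Because of that null, falling back to an empty array is a common guard.
const safe = caseDemo.match(/javascript/g) || []
console.log(safe.length) // 0
```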

Matching specific punctuation

Some punctuation has special meaning in RegEx. It can get confusing if you are searching for things like question marks, periods, and parentheses. For example, a period means “match any character.” The easiest way to get around this is to “escape” the character. A \ is a way to say, “Hey RegEx, treat the very next character as a normal thing, not part of your syntax.”

const str = "Ember is a front-end JavaScript framework."
str.match(/framework\./gi)
// returns ["framework."]
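To show why the escape matters, this sketch runs the same search with and without the backslash. The sample string is invented so that it contains one near-miss.

```javascript
// Hypothetical sample string with one near-miss and one real target.
const dotDemo = "Two frameworks, one framework."

// Unescaped, the period is a wildcard, so the "s" in "frameworks" counts too.
const unescaped = dotDemo.match(/framework./g)
console.log(unescaped) // ["frameworks", "framework."]

// Escaped, only a literal period matches.
const escaped = dotDemo.match(/framework\./g)
console.log(escaped) // ["framework."]
```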

Matching a single, unknown digit, like “6”

[0-9] will match a single digit, any digit from 0 to 9.

const str = "Ember has a 6-week release cycle."
str.match(/[0-9]/gi)
// returns ["6"]

Since we have g for global that finds all matches, if we have a multi-digit number, it will be split into single digits:

const str = "2019 will be a good year"
str.match(/[0-9]/gi)
// returns ["2", "0", "1", "9"]

Matching multiple, unknown numbers, like “2019”

[0-9]+ will match a number that has one or more digits. The plus means “match one or more of this kind of character, in a row.”

const str = "2019 will be a good year"
str.match(/[0-9]+/gi)
// returns ["2019"]
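Two details are easy to miss here, so this sketch spells them out with a made-up sample string: anything that is not a digit (like a decimal point) ends the run, and the results are strings, not numbers.

```javascript
// Hypothetical sample string containing a decimal and a year.
const numberDemo = "Version 3.7 shipped in 2019"

// The period is not a digit, so "3.7" comes back as two separate matches.
const digitRuns = numberDemo.match(/[0-9]+/g)
console.log(digitRuns) // ["3", "7", "2019"]

// Matches are always strings; convert them if you need real numbers.
const asNumbers = digitRuns.map(Number)
console.log(asNumbers) // [3, 7, 2019]
```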

Matching a single, unknown letter

[a-z] will match a single letter, any letter. (On its own it only covers lowercase letters; the i option takes care of uppercase.) Let’s say you are looking for any letter that comes before o:

const str = "Let's build something"
str.match(/[a-z]o/gi)
// returns ["so"]

Matching multiple, unknown letters

[a-z]+ will match one or more letters in a row.

const str = "Feb 14th"
str.match(/[a-z]+/gi)
// returns ["Feb", "th"]

Matching whitespace

Let’s say you need to get only a word that has whitespace before it. The \s below means “match a whitespace character.” [a-z]+ means match one or more letters.

const str = "web developer"
str.match(/\s[a-z]+/gi)
// returns [" developer"]

Notice that it includes the space. Want to get rid of that space and only match developer? Your Google Search terms are “lookahead” and “lookbehind”.
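If you’d like a head start on that search, here is a minimal sketch of a lookbehind, plus a lower-tech alternative. One assumption to flag: lookbehind was added to JavaScript in ES2018, so whether it works depends on how recent your browser or Node version is.

```javascript
const spaceDemo = "web developer"

// (?<=\s) is a lookbehind: "there must be whitespace just before this spot."
// The whitespace is checked but not included in the match.
const withLookbehind = spaceDemo.match(/(?<=\s)[a-z]+/gi)
console.log(withLookbehind) // ["developer"]

// A lower-tech alternative: keep the original pattern and trim each result.
const trimmed = spaceDemo.match(/\s[a-z]+/gi).map((word) => word.trim())
console.log(trimmed) // ["developer"]
```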

Matching any one character

A . in RegEx means “match any one character” (anything except a line break). It’s a single wildcard.

const str = "web developer"
str.match(/w.b/gi)
// returns ["web"]

Matching multiple of any character

A .* means match any quantity (even zero) of any character type: punctuation, spaces, letters, numbers… Here, we’re matching anything that comes between the letters d and r.

const str = "web developer"
str.match(/d.*r/gi)
// returns ["developer"]
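One surprise with .* is that it grabs as much as it can. This sketch uses a made-up string with two candidate words to show the difference, and adds a ? after the * to make it stop at the first r it can. That variant is usually called “lazy” matching.

```javascript
// Hypothetical sample string with two words that start with d and end with r.
const greedyDemo = "web developer designer"

// .* is greedy: it runs from the first d all the way to the LAST r.
const greedy = greedyDemo.match(/d.*r/gi)
console.log(greedy) // ["developer designer"]

// .*? is lazy: it stops at the first r that completes a match.
const lazy = greedyDemo.match(/d.*?r/gi)
console.log(lazy) // ["developer", "designer"]
```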

Layering the logic

Earlier, we covered the way to do “this or that” in RegEx using |. It’s common to want to combine logic, to say things like “I want all the pens and markers that are blue or red.” You can do this by grouping the OR statements in parentheses, (). I want a (blue or red) (marker or pen).

const str = "I have a blue pen and a red marker"
str.match(/(blue|red) (marker|pen)/gi)
// returns ["blue pen", "red marker"]
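To see what the parentheses are doing, compare the grouped pattern with the same words and no grouping. Without parentheses, each | splits the whole pattern, so it reads as “blue” or “red marker” or “pen”.

```javascript
const groupDemo = "I have a blue pen and a red marker"

// No parentheses: three separate alternatives, "blue", "red marker", "pen".
const ungrouped = groupDemo.match(/blue|red marker|pen/gi)
console.log(ungrouped) // ["blue", "pen", "red marker"]

// Parentheses: (blue or red), then a space, then (marker or pen).
const grouped = groupDemo.match(/(blue|red) (marker|pen)/gi)
console.log(grouped) // ["blue pen", "red marker"]
```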

Tips for reading other people’s RegEx

Part of the problem with RegEx is that it’s frequently a big old mess to look at. Our eyes are really bad at figuring out where different parts of the statement begin and end. My best advice to you is, if you are trying to read someone else’s RegEx, break it into multiple lines. It won’t run in your coding environment, but you can read it, and when you have figured it out, you can add a note for your coworkers or future self.

Here’s an example that’s gross:

const str = "EmberConf is March 18th-20th, 2019."
str.match(/\s[0-9]+[a-z]+-[0-9]+[a-z]+/gi)

Here’s how to break it down:

\s      a single whitespace character
[0-9]+  one or more digits
[a-z]+  one or more letters
-       a dash
[0-9]+  one or more digits
[a-z]+  one or more letters
gi      show all matches, and ignore case

It’s a lot easier to tell that this should match the “ 18th-20th” in March 18th-20th, including the space in front.
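If you want a broken-down version that does run, one option is to build the pattern from commented strings. This is only a sketch of that idea, with made-up variable names. Note that backslashes have to be doubled inside strings.

```javascript
// Each piece of the pattern is a string with its own comment.
// Backslashes are doubled because these are strings, not regex literals.
const patternParts = [
  "\\s",    // a single whitespace character
  "[0-9]+", // one or more digits
  "[a-z]+", // one or more letters
  "-",      // a dash
  "[0-9]+", // one or more digits
  "[a-z]+", // one or more letters
]

// Join the pieces and add the same g and i options as before.
const dateRange = new RegExp(patternParts.join(""), "gi")

const confDemo = "EmberConf is March 18th-20th, 2019."
const confMatches = confDemo.match(dateRange)
console.log(confMatches) // [" 18th-20th"]
```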

Tips for writing RegEx that other people can read

The most helpful thing you can do is make a note that shows examples of things that do and do not match.

// input "I am going on vacation May 1st-12th", output [" 1st-12th"] (note the leading space).
// Will not match "May 1-12"
str.match(/\s[0-9]+[a-z]+-[0-9]+[a-z]+/gi)
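Taking that one step further, you can wrap the pattern in a small function and keep the examples right next to it. This is only a sketch: findDayRanges is a hypothetical name, and the fallback to an empty array is an assumption about what callers would want when nothing matches.

```javascript
// Hypothetical helper name, used only for this illustration.
function findDayRanges(text) {
  // Matches things like " 1st-12th" (the leading space is part of the match).
  // Does not match plain numbers like "1-12".
  return text.match(/\s[0-9]+[a-z]+-[0-9]+[a-z]+/gi) || []
}

console.log(findDayRanges("I am going on vacation May 1st-12th")) // [" 1st-12th"]
console.log(findDayRanges("May 1-12")) // []
```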

Lastly, it’s better to make things readable rather than very clever. Don’t use wildcards you don’t need. Avoid fancy things unless you really, really need them. For most developers, RegEx is a thing they use infrequently, and it is a frustrating experience to have to do tutorials just to understand one line of code.

Continued learning

Good luck on your quest!

Jen Weber
Frontend Weekly

Huge fan of Ember.js, open source, HTML, and accessible, inclusive web apps. www.jenweber.me