Here’s what you need to know about Regular Expressions

Jennifer Olibie
9 min read · May 21, 2018

I feel like I’m the only one who still likes RegEx…maybe you would too.

I hear a lot of people complaining about how difficult and annoying RegEx can be. I've seen a lot of negative statements about it. Yeesh.

The first time I used RegEx, I was amazed. In just a few minutes, I was done with what I wanted to do. Simple as that. You can imagine my shock when I browsed online and saw so much hate for my new love. Well, maybe I just haven’t worked with it for long enough.

Let’s talk about this Regular Expression(RegEx)

A regular expression is an expression/sequence of characters used to describe a pattern in a string.

It allows you to define the pattern/arrangement of characters you are looking for in a string, e.g. an email pattern.

Thus, if I am looking for all the emails in a string, I can simply ‘regex’ the string and it gives me the emails.
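As a quick taste of what that looks like, here is a minimal sketch. The email pattern is a deliberately simplified assumption (real addresses can be more varied), and the sample text and variable names are made up for illustration.

```javascript
// Assumption: a simplified email pattern, for illustration only.
const emailPattern = /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g;

// Hypothetical sample text.
const message = 'Write to ada@example.com or grace.hopper@navy.example.org today.';

// With the g flag, match() returns every match as an array of strings.
const foundEmails = message.match(emailPattern);
console.log(foundEmails); // ['ada@example.com', 'grace.hopper@navy.example.org']
```

The pieces used here (character classes, quantifiers, groups and flags) are explained in the sections that follow.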

How does one create these expressions?

Regular Expressions can be constructed in two ways:

  1. Literally: Formally known as using a regular expression literal. The pattern is enclosed between forward slashes. This is cool and makes work faster when the pattern is not changing.
let pattern = /pattern/;

2. Constructing: Otherwise known as calling the constructor function of the RegExp object. This is good for when the pattern is changing or you are getting the pattern from another source.

let pattern = new RegExp('pattern');
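Here is a small sketch of the two forms side by side. The variable names and the sample word are hypothetical. One thing to watch with the constructor: the pattern is a string, so a backslash has to be escaped itself.

```javascript
// Literal form: the pattern is fixed when the script is written.
const literalPattern = /love/;

// Constructor form: the pattern can be assembled at runtime.
const word = 'love'; // hypothetical value, e.g. coming from user input
const builtPattern = new RegExp(word);

console.log(literalPattern.test('I love you')); // true
console.log(builtPattern.test('I love you'));   // true

// Inside a string, the backslash itself must be escaped.
const digitsLiteral = /\d+/;
const digitsBuilt = new RegExp('\\d+');
console.log(digitsLiteral.source === digitsBuilt.source); // true
```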

What makes up this pattern?

The RegEx patterns are made of the normal characters you know. Nothing scary or out of the ordinary, they just have special meanings here. Let’s look at some of them..

  1. Letters and numbers: Of course, the pattern can contain letters of the alphabet and numbers; after all, strings are mostly made of letters and numbers. These characters can be themselves or mean something else depending on what you want. For direct matching, let the characters be themselves. They are non-special characters. Eg, the pattern /love/ matches only when it sees the characters ‘love’ together as in ‘I love you’. It doesn’t match in ‘I lo ve you’ because there’s a space character in-between ‘o’ and ‘v’.

Most times, we need something more than a direct match, or we don’t know exactly what we’re searching for. That is when special characters come in. Some are:

2. Escaped characters: A backslash(\) is generally an escape character which means that, when it precedes a character, it gives the character a different meaning.

A backslash before(preceding) a non-special character makes it a special character. E.g

The letter 'w' in itself finds w but \w finds all word characters i.e. a-z, A-Z, 0-9 and underscore(_).

A backslash before(preceding) a special character makes it non-special. E.g

* by itself means 0 or more of the preceding character whereas \* finds * itself.
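A short sketch of both directions of escaping, using a made-up sample string:

```javascript
const sample = 'w_1 * 2'; // hypothetical sample string

// Plain w matches only the letter w.
const plainW = sample.match(/w/g);
console.log(plainW); // ['w']

// \w matches every word character.
const wordChars = sample.match(/\w/g);
console.log(wordChars); // ['w', '_', '1', '2']

// \* matches a literal asterisk instead of acting as a quantifier.
const stars = sample.match(/\*/g);
console.log(stars); // ['*']
```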

3. Anchors: Anchors are characters used to specify the position in which a match must occur. When you specify the anchors, the regex looks for a match only in the specified position. Examples include:

i. ^ : This specifies that the match must occur at the beginning of the input or line.

The regex, /^c/ matches the c in children and not the c in can or cook in the string: 'children can cook'

ii. $: This specifies that the match must occur at the end of the input or line. It can match just before a newline character if the multi-line flag is set.

The regex, /k$/ matches the k in sick and not in kids in the string: 'kids get sick'

iii. \b : This specifies that the match must occur on a word boundary ie beginning or ending of a word where \B is the opposite.

The regex, /\bb/ matches the first b in babies and not the second b, because only the first b sits at the start of the word.

For more, visit this.
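The anchors above can be tried out with a few lines like these. It's only a sketch; the two-line string used for the multiline case is an assumption added for illustration.

```javascript
// ^ only matches at the start: index 0, not the c in "can" or "cook".
const startMatch = 'children can cook'.match(/^c/);
console.log(startMatch.index); // 0

// $ only matches at the end: the k in "sick", not the one in "kids".
const endMatch = 'kids get sick'.match(/k$/);
console.log(endMatch.index); // 12

// \b is a word boundary, \B is everywhere that is NOT a word boundary.
const wordStart = 'babies'.replace(/\bb/g, 'B');
const insideWord = 'babies'.replace(/\Bb/g, 'B');
console.log(wordStart);  // 'Babies'
console.log(insideWord); // 'baBies'

// With the m flag, ^ also matches at the start of every line.
const perLine = 'cat\ncow'.match(/^c/gm).length;
const wholeInput = 'cat\ncow'.match(/^c/g).length;
console.log(perLine, wholeInput); // 2 1
```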

3. Character Classes : A character class defines a set of characters, of which, any of them can occur in a string and be matched. They include:

i. []: Called positive character groups. Any character inside the square bracket is matched.

/[da]/ matches a and d in abcd but doesn't match b and c.

ii. [^]: Called negative character groups. Any character inside the square bracket is not matched. Eg

/[^da]/ matches b and c in abcd but doesn't match a and d.

iii. dot(.): This matches any single character, except line terminators such as the newline character. Eg

/d./ matches d and any other character after it like da, do, d_, etc

iv. \w: Known as the word character. This matches any one word character i.e. a-z, A-Z, 0-9 and underscore(_). Can also be written as [A-Za-z0-9_].

v. \W : Non-word character. It is the opposite of \w. It matches any non-word character. It can also be written as [^A-Za-z0-9_].

vi. \d: Known as digit character. Matches any digit from 0-9. Also written as [0-9].

vii. \D: Known as non-digit character. Matches non-digits. Also written as [^0-9].
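To see the character classes in action, here is a small sketch. The sample strings are made up; the g flag (covered later) is used so that every match is listed.

```javascript
// Positive and negative character groups.
const inGroup = 'abcd'.match(/[da]/g);
const notInGroup = 'abcd'.match(/[^da]/g);
console.log(inGroup);    // ['a', 'd']
console.log(notInGroup); // ['b', 'c']

// The dot stands for any single character (except line terminators).
const dAndNext = 'da do d_'.match(/d./g);
console.log(dAndNext); // ['da', 'do', 'd_']

// Shorthand classes on a hypothetical sample string.
const mixed = 'a1_b-2';
const wordOnes = mixed.match(/\w/g);
const nonWordOnes = mixed.match(/\W/g);
const digitOnes = mixed.match(/\d/g);
const nonDigitOnes = mixed.match(/\D/g);
console.log(wordOnes);     // ['a', '1', '_', 'b', '2']
console.log(nonWordOnes);  // ['-']
console.log(digitOnes);    // ['1', '2']
console.log(nonDigitOnes); // ['a', '_', 'b', '-']
```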

4. Groups: This enables you to combine some parts of a regular expression and work on them as a single entity. Sub-expressions are usually grouped using parentheses (). Grouping sub-expressions can enable you to:

i. Capture the expression: Also known as capturing parentheses. This essentially tells the regex to remember the captured expression so you can refer to it later. Eg:

Using /(abc).\1/ on the string 'this string captures abcdabc', the abc is captured and remembered, and the \1 specifies that the remembered text must appear again at that point. The . in-between means any character, in this case 'd'.

Capturing parentheses are numbered sequentially, i.e. the first capture is 1, the second is 2 and so on, and the computer remembers them like that. Eg

/(hi)(hello)\2\1/ will match 'hihellohellohi', because \2 repeats 'hello' and \1 repeats 'hi'.

ii. Non-capturing parenthesis: To group elements but not capture them, use /(?:pattern)/. The ?: tells the computer that it should just group but not remember the expression.

iii. Lookaheads: I’ll explain this using examples.

/a(?=b)/ means match a only if it is followed by b. Thus, it matches the a in 'abcd' but not the a in 'acfd'.

/a(?!b)/ means match a only if it is not followed by b. Thus, it matches the a in 'acfd' but not the a in 'abcd'.

Visit here for more of these.
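The grouping ideas above can be checked with a sketch like this one. The non-capturing example string ('ababc') is an assumption added for illustration; the others reuse the strings from the text.

```javascript
// Capturing group plus backreference: \1 must repeat what (abc) captured.
const backref = /(abc).\1/.exec('this string captures abcdabc');
console.log(backref[0]); // 'abcdabc' (the whole match)
console.log(backref[1]); // 'abc' (the first capture)

// Groups are numbered from left to right.
const numbered = /(hi)(hello)\2\1/;
console.log(numbered.test('hihellohellohi')); // true
console.log(numbered.test('hellohi'));        // false

// Non-capturing group: (?:ab) is grouped for + but not remembered,
// so the only capture is (c).
const nonCapturing = /(?:ab)+(c)/.exec('ababc');
console.log(nonCapturing[0], nonCapturing[1]); // 'ababc' 'c'

// Lookaheads check what follows without including it in the match.
const followedByB = 'abcd'.match(/a(?=b)/);
const notFollowedByB = 'abcd'.match(/a(?!b)/);
console.log(followedByB[0]);  // 'a'
console.log(notFollowedByB);  // null
```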

5. Quantifiers: This specifies the number of the characters or group that has to be present before a match is made. They include:

i. *: This specifies that it should match 0 or more of the preceding character. Thus:

/ca*/ matches c in cloth, ca in catch and caa in caald. It essentially means match c and any number of a's following it.

ii. +: This specifies that it should match 1 or more of the preceding character. Thus:

/ca+/ doesn't match anything in cloth, but matches ca in catch and caa in caald. It essentially means match c and 1 or more of the a's following it.

iii. ? : This specifies that it should match 0 or 1 of the preceding character. Thus:

/ca?/ matches c in cloth and ca in catch, but matches only ca (not caa) in caald. It essentially means match c and 0 or 1 of the a's following it.

iv. {n} : n is an integer that specifies the exact number of times the preceding expression is to be matched. {n,m} specifies that it should match the preceding expression from n to m times. If m is omitted ({n,}), it should match n or more times.

/c{2}/ - matches c that occurs exactly 2 times
/c{2,}/ - matches c that occurs 2 or more times
/c{2,5}/ - matches c that occurs at least 2 times but not more than 5 times.

v. Greediness and Laziness: Quantifiers are greedy by default, in that they match as many characters as possible. To make one lazy, append ? to it, which tells it to match as few characters as it can. Eg

using the string '<html>Hi</html>'
The regex /<.+>/ matches the whole string <html>Hi</html> whereas
The regex /<.+?>/ matches just <html> and </html>.
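Here is a sketch that runs the quantifier examples. The helper firstMatch is a hypothetical convenience function, not a built-in, and the strings used for the {n} forms are made up.

```javascript
// Hypothetical helper: returns the first matched text, or null.
const firstMatch = (regex, text) => {
  const result = text.match(regex);
  return result ? result[0] : null;
};

// * : zero or more a's after c
console.log(firstMatch(/ca*/, 'cloth'), firstMatch(/ca*/, 'caald')); // 'c' 'caa'
// + : one or more a's after c
console.log(firstMatch(/ca+/, 'cloth'), firstMatch(/ca+/, 'caald')); // null 'caa'
// ? : zero or one a after c
console.log(firstMatch(/ca?/, 'cloth'), firstMatch(/ca?/, 'caald')); // 'c' 'ca'

// {n}, {n,} and {n,m}
console.log(firstMatch(/c{2}/, 'accept'));     // 'cc'
console.log(firstMatch(/c{2,}/, 'acccept'));   // 'ccc'
console.log(firstMatch(/c{2,5}/, 'cccccccc')); // 'ccccc' (stops at 5)

// Greedy vs lazy on the string from the text.
const html = '<html>Hi</html>';
const greedy = html.match(/<.+>/g);
const lazy = html.match(/<.+?>/g);
console.log(greedy); // ['<html>Hi</html>']
console.log(lazy);   // ['<html>', '</html>']
```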

6. Alternation: This enables conditional matching in regex.

i. or (|): This allows the regex to match any of the specified expression. Eg

/bl(ac|ak)s/ matches blacs and blaks in a string

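ii. Conditional Matching with an Expression: This enables matching one of two patterns depending on whether an initial pattern is matched or not. Note that JavaScript's regex engine does not support this syntax; it is available in some other engines such as PCRE and .NET. Eg

(?(?=abc)abcd|0) matches abcd in 'abcdef0' but matches 0 in 'fgh0'.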

Visit here for more explanation.
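A small sketch of alternation in JavaScript. The sample string is made up. The second half simply shows that JavaScript rejects the conditional syntax, since that feature belongs to other regex engines.

```javascript
// or (|): either "ac" or "ak" may sit between "bl" and "s".
const alternatives = 'blacs blaks blaxs'.match(/bl(ac|ak)s/g);
console.log(alternatives); // ['blacs', 'blaks']

// Conditional syntax is not part of JavaScript regex,
// so building such a pattern throws a SyntaxError.
let conditionalError = null;
try {
  new RegExp('(?(abc)d|0)');
} catch (err) {
  conditionalError = err;
}
console.log(conditionalError instanceof SyntaxError); // true
```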

7. Flags: Flags are used to specify how a regular expression is interpreted by the computer. The flags are appended at the end of a regular expression as follows ‘/pattern/flag’. We have:

i. global flag(g): It prevents the search from stopping at the first match, so the subsequent matches in the string are found as well.

ii. ignore case flag(i): It makes the search case insensitive.

iii. multiline flag(m): This enables searches to be done line by line. It makes the start(^) and end($) anchors match the start and end of each line instead of only the start and end of the whole string.

iv. unicode flag(u): When set, this makes escapes stricter, so unrecognised escapes cause errors. It also enables extended Unicode escapes. That is:

Without the flag, things like \u{1234} can technically still occur in patterns, but they won’t be interpreted as Unicode code point escapes. /\u{1234}/ is equivalent to /u{1234}/, which matches 1234 consecutive u symbols rather than the symbol with code point U+1234.
Setting the u flag allows the expression /\u{1234}/u to match the symbol with code point U+1234.

v. sticky flag(y): This lets you specify the exact index (0-indexed) at which a match must start. With the y flag, the regex matches only at the index given by its lastIndex property. Eg

var str = 'chidera'
var regex = /hi/y;
regex.lastIndex = 3;
str.match(regex); // this returns null because the search starts from position 3.
regex.lastIndex = 1;
str.match(regex); // this matches because it starts from position 1.
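The g, i, m and u flags can be compared with a sketch like the one below. The sample strings and the code point U+1F600 are made-up choices for illustration.

```javascript
const pets = 'Cat cat CAT'; // hypothetical sample string

// No flags: only the first (case-sensitive) match is reported.
const firstOnly = pets.match(/cat/);
console.log(firstOnly.index); // 4

// g: all matches; i: ignore case.
const allLower = pets.match(/cat/g);
const anyCase = pets.match(/cat/gi);
console.log(allLower); // ['cat']
console.log(anyCase);  // ['Cat', 'cat', 'CAT']

// m: ^ and $ work per line instead of on the whole string.
const perLineWords = 'one\ntwo'.match(/^\w+$/gm);
const wholeStringWords = 'one\ntwo'.match(/^\w+$/g);
console.log(perLineWords);     // ['one', 'two']
console.log(wholeStringWords); // null

// u: \u{...} becomes a code point escape.
const withU = /^\u{1F600}$/u.test('\u{1F600}');
// Without u, \u{3} just means the letter u repeated 3 times.
const withoutU = /^\u{3}$/.test('uuu');
console.log(withU, withoutU); // true true
```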

Wow.. That feels like a lot to take in. Break Timeee:D

So, we know about Regex and all. How exactly do we apply it?

Basically, from all the above, regexes are used to find patterns in strings. What then? You can choose to

  1. Just check if the pattern exists in a string and/or return the position.
  2. Find and bring out the matched patterns.
  3. Find and replace the matched patterns.
  4. Validate, i.e. check if an input matches the required pattern.

In JavaScript, you can use regex in the following methods :

  1. exec : It searches a string for a match and returns an array of the match or null if no result is found. This is called on the regex and the string passed in as argument.
syntax: regex.exec(string)

2. match : Same as exec, except that this function is called on the string and the regex is passed in as a parameter. With the g flag set, it returns all the matches instead of just the first one.

syntax: string.match(regex);

3. test: It tests for a match in a string and returns true or false depending on whether a match is found or not. It is a good fit when you only need a yes/no answer, and it is usually at least as fast as exec/match.

syntax: regex.test(string);

4. search : It tests for a match in a string and returns the index of the first match, or -1 if there is no match.

syntax: string.search(regex);

5. replace: It searches for a match and replaces the match with a specified replacement.

syntax: string.replace(regex, replacement);

6. split: It searches for matches in a string and breaks the string into an array of substrings at each match.

let str = 'Icneedctoceat'
let words = str.split('c'); //outputs ['I', 'need', 'to', 'eat']
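To tie the six methods together, here is a sketch that runs each one on the same made-up sentence. The names sentence and numberRe are hypothetical.

```javascript
const sentence = 'I need 2 eggs and 12 apples'; // hypothetical input
const numberRe = /\d+/;

// exec and match (without g) both return the first match with its index.
const execResult = numberRe.exec(sentence);
const matchResult = sentence.match(numberRe);
console.log(execResult[0], execResult.index); // '2' 7
console.log(matchResult[0]);                  // '2'

// match with the g flag returns every match.
const allNumbers = sentence.match(/\d+/g);
console.log(allNumbers); // ['2', '12']

// test gives a boolean, search gives an index (or -1).
const hasNumber = numberRe.test(sentence);
const position = sentence.search(numberRe);
const missing = sentence.search(/z/);
console.log(hasNumber, position, missing); // true 7 -1

// replace swaps each match for the replacement.
const masked = sentence.replace(/\d+/g, '#');
console.log(masked); // 'I need # eggs and # apples'

// split also accepts a regex as the separator.
const letters = 'a1b22c'.split(/\d+/);
console.log(letters); // ['a', 'b', 'c']
```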

That’s it for now.

Work your way into regex. It’ll be fun.

Remember:

Believe!!

For a more extensive read, visit this.

I’d recommend www.regexr.com for testing out and this for visualizing the expression.

Feel free to correct any errors though :D
