Simplifying Regular Expressions in JavaScript

gbols
Published in The Andela Way
6 min read · Sep 13, 2019

Have you ever had to validate an input field against a specific pattern and found yourself writing a bunch of if statements to solve the problem? If so, you will have noticed that this quickly becomes laborious to write and somewhat inefficient.

Consider the following example, which validates a phone number entered by the user in an input field.
Below are two examples of a valid phone number.

07036005659 or +234 70 3600 5659 (spaces shown only for readability).

From the given input we can deduce that:

  • A valid phone number starts with either a +234 or 0.
  • There are 10 digits after the initial match.
  • A valid phone number cannot contain letters or special characters.

Let’s first try the laborious way of validating this input, which comes in as a string.
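As a rough illustration of the if-statement approach, here is a minimal sketch of what such a validator might look like. It is not the original function: the body and the boolean return value are assumptions made only to show how much manual checking the three rules require.

```javascript
// Hypothetical sketch of an if-statement based validator (an assumption, not the original code).
function validatePhoneNumber(input) {
  let rest;

  // Rule 1: the number must start with +234 or 0.
  if (input.startsWith("+234")) {
    rest = input.slice(4);
  } else if (input.startsWith("0")) {
    rest = input.slice(1);
  } else {
    return false;
  }

  // Rule 2: exactly 10 characters must follow the prefix.
  if (rest.length !== 10) {
    return false;
  }

  // Rule 3: every remaining character must be a digit.
  for (const ch of rest) {
    if (ch < "0" || ch > "9") {
      return false;
    }
  }
  return true;
}

console.log(validatePhoneNumber("07036005659"));    // true
console.log(validatePhoneNumber("+2347036005659")); // true
console.log(validatePhoneNumber("0703600565a"));    // false
```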

NB: The function above could be simplified further, but it would still be inadequate compared to using regular expressions.

We call this function with an input to test whether or not it is a valid phone number.

const isValid = validatePhoneNumber("+2347036005659");
console.log(isValid) // valid phone number

Or

const isValid = validatePhoneNumber("07036005659");
console.log(isValid) // valid phone number

Luckily for us, JavaScript has a built-in object that can help with pattern matching of any type. It is called a regular expression, also known as regex or regexp. It is sometimes viewed as something mystical or of the dark arts, so don’t feel too bad if you aren’t comfortable writing your own yet. Hopefully, by the end of this article your appetite will have been whetted enough to get you started on understanding and writing regular expressions, rather than the usual copy and paste from StackOverflow. 😏

There are 2 primary ways to define a regular expression object in JavaScript.

  • The regular expression literal: /expression/flags;
  • The constructor function: new RegExp('expression', 'flags');

It is best to use the regular expression literal when the pattern is constant, since it is compiled when the script is loaded, which can improve performance. When the pattern is only known at runtime, for example when it comes from user input, the constructor function should be used.
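To make the two forms concrete, here is a small sketch that builds the same pattern both ways. The variable names and the userInput value are made up for illustration.

```javascript
// Literal: the pattern is fixed in the source code.
const literal = /^\d+$/;

// Constructor: the pattern is built from a string at runtime.
// Backslashes must be doubled inside a string literal.
const constructed = new RegExp("^\\d+$");

console.log(literal.test("12345"));     // true
console.log(constructed.test("12345")); // true

// Hypothetical user-supplied search term, matched case-insensitively with the "i" flag.
const userInput = "andela";
const fromUser = new RegExp(userInput, "i");
console.log(fromUser.test("The Andela Way")); // true
```

Note that text coming from a user may itself contain special characters, so in real code it usually needs escaping before it is passed to the constructor.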

Let’s solve our phone number problem using the regular expression approach.

First, our phone number must start with a 0 or +234.

let pattern = /0/;
const input1 = "07036005659";
const input2 = "+2347036005659";

Several methods work with a regex pattern: test and exec on the regular expression object, and match, matchAll, search, replace and split on strings.

In this tutorial, we will be making use of both the match method of the String class and the test method of the regular expression object.

Let’s run the match method against our input and our pattern.
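The difference between the two methods can be sketched as follows: test answers yes or no, while match returns details about what was found. The variable names here are only for illustration.

```javascript
const zero = /0/;

// RegExp.prototype.test returns a boolean.
console.log(zero.test("07036005659")); // true
console.log(zero.test("+234"));        // false

// String.prototype.match returns an array describing the match, or null.
const found = "07036005659".match(zero);
console.log(found[0], found.index);    // 0 0
console.log("+234".match(zero));       // null
```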

input1.match(pattern);
returns ["0", index: 0, input: "07036005659", groups: undefined]
input2.match(pattern);
returns ["0", index: 5, input: "+2347036005659", groups: undefined]

We have a match in both cases because our regular expression finds an occurrence of 0 anywhere in the string: in input1 it was found at index 0, while in input2 it was found at index 5.

What we need is a way to make sure that it searches for a 0 only at the start of the string.

Let’s update our pattern variable to ensure that it only matches a number that starts with 0.

The caret ^, when used outside of a character class, denotes that the expression that follows must appear at the start of the input.

Here is how it looks in code.

pattern = /^0/;
Running the match method against our inputs yields:
input1.match(pattern);
returns ["0", index: 0, input: "07036005659", groups: undefined]
input2.match(pattern);
returns null

Now we are able to successfully match a number that starts with 0, but a valid number input can either start with a 0 or +234.

The | (OR) character matches either the expression before it or the expression after it.

Let’s update our regular expression to reflect this change.

pattern = /^0|+234/;
VM35303:1 Uncaught SyntaxError: Invalid regular expression: /^0|+234/: Nothing to repeat
    at <anonymous>:1:1

On trying to save our regexp we get an error because + is a special character: it is a quantifier that means one or more occurrences of the preceding item. Here it comes right after the | character, so there is nothing for it to repeat. To match a special character literally, escape it with the \ character.
Updating our regex looks like this.

pattern = /^0|\+234/;
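The two roles of + can be seen side by side in the short sketch below. The sample strings are made up, and the constructor form is used for the invalid pattern only so that the error can be caught at runtime instead of stopping the script.

```javascript
// Unescaped, + is a quantifier: one or more of the preceding item.
console.log(/a+/.test("caaat"));             // true

// Escaped, \+ matches a literal plus sign.
console.log(/\+234/.test("+2347036005659")); // true
console.log(/\+234/.test("2347036005659"));  // false

// Building the invalid pattern with the constructor throws a SyntaxError we can catch.
let errorName = "";
try {
  new RegExp("^0|+234");
} catch (err) {
  errorName = err.name;
}
console.log(errorName); // SyntaxError
```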

Testing our inputs against our updated pattern returns:

input1.match(pattern);
["0", index: 0, input: "07036005659", groups: undefined]
input2.match(pattern);
["+234", index: 0, input: "+2347036005659", groups: undefined]

Now we have successfully been able to match the beginning of a phone number that starts with either a 0 or a +234.

There are still 10 digits to match after the initial match.

One way to match a digit is to list every digit we expect inside a character class, that is [0123456789]; a shorter way is to use a range of digits, [0-9].

Let’s update our pattern to reflect this new change we would like to make.

pattern = /^0|\+234[0-9]/;
input2.match(pattern);
["+2347", index: 0, input: "+2347036005659", groups: undefined]
input1.match(pattern);
["0", index: 0, input: "07036005659", groups: undefined]

From the results returned above, we can deduce the following.

For input1, only the leading 0 is returned. This is because | has the lowest precedence, so the pattern reads as either ^0 or \+234[0-9], and input1 is satisfied by ^0 alone. We need to tell it to match one of the prefixes and then the rest of the expression. We do this by wrapping the alternatives of the | character in parentheses.

Hence our pattern now looks like this.

pattern = /(^0|\+234)[0-9]/;

For input2, +2347 is returned as the match; that is, only a single “7” is appended to the prefix rather than the 10 digits that follow it. This is because a character class without a quantifier matches exactly one character, so we have to tell it that we want exactly 10 digits after the initial match.

We do this by adding {n} after the expression we want to match n times.

Hence our pattern now looks like this:

pattern = /(^0|\+234)[0-9]{10}/;
input1.match(pattern);
(2) ["07036005659", "0", index: 0, input: "07036005659", groups: undefined]
input2.match(pattern);
(2) ["+2347036005659", "+234", index: 0, input: "+2347036005659", groups: undefined]
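The last few steps can be replayed in one place. The sketch below reuses the article's two inputs and prints only the matched text (element 0 of the result) for each version of the pattern; the final line with the + quantifier is an extra illustration, not part of the article's pattern.

```javascript
const input1 = "07036005659";
const input2 = "+2347036005659";

// Alternation binds loosely: this reads as (^0) OR (\+234[0-9]).
console.log(input1.match(/^0|\+234[0-9]/)[0]);       // 0

// Grouping the alternatives lets [0-9] apply after either prefix.
console.log(input1.match(/(^0|\+234)[0-9]/)[0]);     // 07

// A character class without a quantifier matches exactly one character.
console.log(input2.match(/(^0|\+234)[0-9]/)[0]);     // +2347

// {10} asks for exactly ten repetitions of the class.
console.log(input2.match(/(^0|\+234)[0-9]{10}/)[0]); // +2347036005659

// A quantifier such as + is greedy: it takes as many digits as it can.
console.log(input2.match(/(^0|\+234)[0-9]+/)[0]);    // +2347036005659
```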

Hooray, it looks like we can finally match a phone number, but there is one more thing to take care of.

Let's test our pattern against inputs that are more than 10 digits after the initial match.

const input3 = "+234703600565910";
const input4 = "0703600565911";
input3.match(pattern);
(2) ["+2347036005659", "+234", index: 0, input: "+234703600565910", groups: undefined]
input4.match(pattern);
(2) ["07036005659", "0", index: 0, input: "0703600565911", groups: undefined]

We can see that it still matches even when there are more than 10 digits after the initial match: it returns the prefix and the first 10 digits and ignores the extra characters.

We need to modify our pattern so that any input with more than 10 digits after the initial match is considered invalid.
We use the special character $ at the end of the pattern to denote that the input must end right after the 10 digits.

When used at the end of a pattern, the $ character signifies that the match must end at the end of the string. We also move the ^ outside the parentheses so that it applies to both prefixes; inside the group it only anchored the 0 alternative.

Our new pattern:

pattern = /^(0|\+234)\d{10}$/;

Testing this new pattern against all 4 inputs yields.

input1.match(pattern);
(2) ["07036005659", "0", index: 0, input: "07036005659", groups: undefined]
input2.match(pattern);
(2) ["+2347036005659", "+234", index: 0, input: "+2347036005659", groups: undefined]
input3.match(pattern);
null
input4.match(pattern);
null
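To see what the two anchors buy us, the sketch below compares the earlier unanchored pattern with the final one. The names loose and strict and the prefixed input are made up for this illustration.

```javascript
const loose = /(^0|\+234)[0-9]{10}/; // ^ applies only to the 0 alternative, no $
const strict = /^(0|\+234)\d{10}$/;  // anchored at both ends

const prefixed = "abc+2347036005659"; // hypothetical input with junk before the number
const tooLong = "0703600565911";      // input4 from the article

console.log(loose.test(prefixed));  // true
console.log(strict.test(prefixed)); // false
console.log(loose.test(tooLong));   // true
console.log(strict.test(tooLong));  // false
console.log(strict.test("07036005659")); // true
```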

And so we have a valid pattern that matches a valid phone number and returns null otherwise.
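Since a validator only needs a yes or no answer, the test method mentioned earlier is a natural fit. The following is one possible way to wrap the final pattern; the names PHONE_PATTERN and isValidPhoneNumber are hypothetical.

```javascript
// The final pattern from the article, stored once and reused.
const PHONE_PATTERN = /^(0|\+234)\d{10}$/;

// Hypothetical helper: returns true only when the whole input is a valid phone number.
function isValidPhoneNumber(input) {
  return PHONE_PATTERN.test(input);
}

console.log(isValidPhoneNumber("07036005659"));      // true
console.log(isValidPhoneNumber("+2347036005659"));   // true
console.log(isValidPhoneNumber("+234703600565910")); // false
console.log(isValidPhoneNumber("0703600565a"));      // false
```

Compared with a chain of if statements, all three rules now live in a single line.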

\d is shorthand for [0-9].
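A quick way to convince yourself of the equivalence is to run both forms against the same inputs. The sample strings and the three-digit patterns below are arbitrary.

```javascript
const withClass = /^[0-9]{3}$/;
const withShorthand = /^\d{3}$/;

// Both patterns should agree on every sample.
const samples = ["123", "12a", "1234"];
const results = samples.map((s) => [s, withClass.test(s), withShorthand.test(s)]);

for (const [sample, classResult, shorthandResult] of results) {
  console.log(sample, classResult, shorthandResult);
}
// 123 true true
// 12a false false
// 1234 false false
```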
