Hello! In this article I will give an introduction to regular expressions. Regular expressions are quite a big subject and I won’t be able to cover everything here, but I will cover enough to give you fundamental knowledge of the topic. For the rest, I will provide links to resources with broader coverage.
Now, what are regular expressions?
Regular expressions, sometimes shortened to regexp or regex, are in the most general sense a set of characters that make up a search pattern. They are implemented in almost all serious programming languages and they are a very powerful tool. At first glance they might look a little frightening, but if you break one up it isn’t that hard to wrap your mind around.
Implementing regular expressions
matching a series of digits:
var regex = /[0-9]+/;
matching cat, rat or hat:
var regex = /(c|r|h)at/;
matching an absolute URL:
var regex = /https?:\/\/(www\.)?\w+\./;
matching top, pop and prop:
var regex = /(t|p)r?op/;
series of digits example:
In this example we declare a range and also use one of the repetition operators. Square brackets let us do a few things, one of which is defining a range with the dash character. The range is ordered by the characters’ Unicode positions. In this case we declare a range from 0 to 9, and since the digits sit right next to each other in Unicode, that range covers all the digits. The plus sign you see at the end is a repetition operator. It means that the element before it can be repeated one or more times. So this regular expression means: match any series of one or more digits.
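To see the digit pattern in action, here is a minimal sketch in JavaScript. The variable names and sample strings are made up for illustration; only the pattern itself comes from the article.

```javascript
// One or more digits in a row.
const digits = /[0-9]+/;

// test() reports whether the pattern matches anywhere in the string.
const hasDigits = digits.test("order 66");   // true
const noDigits = digits.test("no numbers");  // false

// match() returns the first run of digits it finds.
const firstRun = "call 555 or 911".match(digits)[0]; // "555"

console.log(hasDigits, noDigits, firstRun);
```

Because the plus sign is greedy, the match takes the whole run "555" rather than stopping after the first digit.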
cat, rat or hat example:
The first thing that appears in this example is an opening parenthesis. This means we are declaring a group, which can be a very useful tool. With a group we can do things like apply operators to a set of characters, as in var regex = /bo(hoo)+/; which indicates that we want to allow ‘hoo’ to be repeated. In this case our group contains c, r and h, separated by the pipe character, which means or. The group is then followed by ‘at’. This regular expression means c or r or h followed by at. This makes cat, rat and hat all evaluate to true when tested.
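The two group patterns from this example can be tried out like this. It is only a sketch: the word list and the variable names are invented for the demonstration.

```javascript
// Group with alternation: c, r or h, followed by "at".
const animal = /(c|r|h)at/;

const words = ["cat", "rat", "hat", "bat"];
// Keep only the words the pattern matches.
const matched = words.filter((word) => animal.test(word)); // ["cat", "rat", "hat"]

// A repetition operator applied to a whole group: "hoo" one or more times.
const sob = /bo(hoo)+/;
const sobMatch = "bohoohoo".match(sob)[0]; // "bohoohoo"
const noSob = sob.test("bo");              // false, at least one "hoo" is required

console.log(matched, sobMatch, noSob);
```

Note that the pattern is not tied to the start or end of the string, so a word like "chat" would also pass the test because it contains "hat".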
absolute URL example:
This is probably the most complicated one, so we’ll step through it slowly. The first part of this regular expression is ‘http’, followed by an s, followed by a question mark. In the realm of regular expressions the question mark means that the previous character or group is optional (it can appear 0 or 1 times). So that means it can be http or https. This is then followed by a colon and two pairs of a backslash and a forward slash. The backslash is used when we want to escape the character following it (remove its possible special meaning). Here we escape both forward slashes because an unescaped forward slash marks the start or end of a regular expression. At this point our regular expression resembles the start of an http(s) URL, which is what we want.
This is then followed by a group containing ‘www\.’. We have to escape the dot because in regular expressions it means any character except a newline. Following the group is a question mark (which we covered earlier), making it optional. We do this because many URLs leave out ‘www’.
And then we see an escaped w character? But w doesn’t have a special meaning. Well no, but the backslash is used both to escape characters and for shorthands. Shorthands are commonly used patterns represented by a backslash followed by the character that names the shorthand. In the first example, instead of [0-9]+ we could have used \d+, with d standing for digit. In this example we use the shorthand \w+, meaning any word character (a letter, a digit or an underscore), repeated one or more times. After the shorthand, all that is left of the regular expression is an escaped dot character.
This regular expression will match anything beginning with http:// or https://, followed by an optional www. and then one or more word characters followed by a dot character. This would allow an infinitely long URL, which obviously isn’t optimal, but I have chosen to keep it like this for the sake of simplicity.
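Here is a small sketch that runs the URL pattern against a few strings. In the literal, each \/ is an escaped forward slash and \. is an escaped dot. The sample addresses and variable names are assumptions chosen for the demonstration.

```javascript
// http or https, "://", an optional "www.", one or more word characters, then a dot.
const url = /https?:\/\/(www\.)?\w+\./;

const samples = [
  "http://example.com",      // matches: plain http, no www
  "https://www.example.com", // matches: https with www
  "ftp://example.com",       // no match: does not contain http
  "https://example",         // no match: no dot after the word characters
];
const results = samples.map((sample) => url.test(sample)); // [true, true, false, false]

// The match ends at the first dot after the word characters.
const matchedPart = "see https://www.example.com/page".match(url)[0]; // "https://www.example."

// In JavaScript \d is shorthand for [0-9], so both patterns find the same text.
const sameDigits = "year 2015".match(/\d+/)[0] === "year 2015".match(/[0-9]+/)[0]; // true

console.log(results, matchedPart, sameDigits);
```

The second result set shows the simplification the article mentions: the pattern only checks the beginning of a URL and says nothing about what follows the dot.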
top, pop and prop example:
With the knowledge you have gained from the previous examples, this one shouldn’t be too hard to digest. Here we start off by defining a group that allows for t or p. This is then followed by an optional r, and finally ends with op. This allows for both top and pop. Since we have allowed an optional r, prop also evaluates to true.
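A quick way to check this reading of the pattern is to filter a list of words with it. The word list below is made up for illustration and includes two words the article does not mention.

```javascript
// t or p, an optional r, then "op".
const pattern = /(t|p)r?op/;

const candidates = ["top", "pop", "prop", "trop", "mop"];
// Keep only the words the pattern matches.
const accepted = candidates.filter((word) => pattern.test(word));
// accepted is ["top", "pop", "prop", "trop"]

console.log(accepted);
```

The optional r applies after either letter in the group, so "trop" is accepted as well, while "mop" is rejected because it starts with neither t nor p.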
Regular expressions: a powerful weapon with a sometimes wonky handle. If you learn to use them, they will serve you well. A wide array of things can be done with them: writing parsers, stripping comments from files, matching URLs, etc. I hope this helped or started your journey into the realm of regular expressions. Some helpful resources are linked below. Have a good rest of your day, wherever you are in the world.
The Wikipedia article covers everything you’ll need to know about regular expressions.
Originally published at my deprecated maximilianlloyd.wordpress.com on July 23, 2015.