Regular Expression Training For Beginners

Kumar Gaurav
Jan 28, 2019

Hi! This is a short introduction to regular expressions (regex) that covers the basics. We will go through the common regex patterns one by one and try to understand each of them with examples.

1. (dot)

Pattern: (.)
Meaning: It matches any single character (by default, any character except a newline).
Example:
Let’s say we have the pattern "i.j". Since the dot stands for any single character, this pattern matches an i, followed by exactly one character of any kind, followed by a j. For example, it’ll match iAj, iaj, ibj, i.j, etc.
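The dot example can be tried out with Python’s re module. This is a minimal sketch: the sample strings are only illustrative, and re.fullmatch is used here (my choice, not something the pattern requires) so that the whole string has to match.

```python
import re

pattern = re.compile(r"i.j")

# fullmatch succeeds only if the entire string matches the pattern
candidates = ["iAj", "iaj", "i.j", "ij", "iabj", "i\nj"]
dot_results = {s: bool(pattern.fullmatch(s)) for s in candidates}

for s, ok in dot_results.items():
    print(repr(s), ok)
```

The last candidate fails because, by default, the dot does not match a newline.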

2. (Question Mark)

Pattern: (?)
Meaning: 0 or 1 occurrence of the preceding expression.
Example:
Let’s say we have the pattern "a?b". Here a is the preceding expression, so the pattern matches zero or one a followed by b. For this example there are only two possibilities: b and ab.
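A quick way to check the two possibilities is sketched below, again assuming Python’s re.fullmatch and a few illustrative strings.

```python
import re

pattern = re.compile(r"a?b")

# "a?" allows the a to be absent or present once, never twice
candidates = ["b", "ab", "aab", "a"]
optional_results = {s: bool(pattern.fullmatch(s)) for s in candidates}

print(optional_results)
```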

3. (Asterisk)

Pattern: (*)
Meaning: 0 or more occurrences of the preceding expression.
Example:
Let’s say we have the pattern "ab*cd". Here b is the preceding expression, so the pattern matches an a, then zero or more b’s, then cd. For this example we can have many possibilities, like acd, abcd, abbcd, abbbcd, etc.
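The asterisk example can be sketched the same way in Python. The sample strings are illustrative, and re.fullmatch is used so the whole string must match.

```python
import re

pattern = re.compile(r"ab*cd")

# "b*" accepts no b at all as well as any number of b's
candidates = ["acd", "abcd", "abbbcd", "abd"]
star_results = {s: bool(pattern.fullmatch(s)) for s in candidates}

print(star_results)
```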

4. (Plus)

Pattern: (+)
Meaning: 1 or more occurrences of the preceding expression.
Example:
Let’s say we have the pattern "ab+cd". Here b is the preceding expression, so the pattern matches an a, then one or more b’s, then cd. For this example we can have many possibilities, like abcd, abbcd, abbbcd, abbbbcd, etc. Unlike the asterisk, plus does not accept acd, because at least one b is required.
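The difference between plus and asterisk shows up on a string with no b at all. A small sketch, with illustrative strings and Python’s re.fullmatch:

```python
import re

plus_pattern = re.compile(r"ab+cd")
star_pattern = re.compile(r"ab*cd")

candidates = ["acd", "abcd", "abbcd"]
plus_results = {s: bool(plus_pattern.fullmatch(s)) for s in candidates}

# Only the asterisk version accepts the string without any b
star_accepts_acd = bool(star_pattern.fullmatch("acd"))

print(plus_results, star_accepts_acd)
```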

5. (Box Brackets)

Pattern: ([ ])
Meaning: a set or range of characters. (It matches exactly one character from the set.)
Example:
To understand this pattern, we should take a look at a couple of examples:

  1. Let’s say we have the pattern i[a]j. In this case it is clear that the set contains just a single a, so there is only one possibility, which is iaj.
  2. Take the pattern i[aA]j. The set is a or A, hence the possibilities are: iaj and iAj.
  3. If we have the pattern i[a-z]j, here a-z is the range of lowercase English letters from a to z, so the possibilities are: iaj, ibj, icj, …, izj.
  4. Now if we have the pattern i[a-zA-Z]j, this will allow all uppercase and lowercase English letters, so the possibilities are: iaj, iAj, …, izj, iZj.
  5. Similarly we can include numeric digits too, something like this: i[a-zA-Z0-9]j, and the possibilities will be: iaj, iAj, i0j, …, izj, iZj, i9j.
  6. Now let’s combine one more pattern with this range pattern and see what happens: i[aA]*j. Here we have an asterisk after the range. As we know, the asterisk means 0 or more of the preceding expression, so we can have 0 or more characters, each of which is a or A. So the possibilities are: ij, iaj, iAj, iaaj, iAAj, iAaj, etc.
  7. We will try one more combination and see how it works. Let’s say we have the pattern i[aA]+j. As we know, plus means 1 or more of the preceding expression, so we can have 1 or more characters, each of which is a or A. So the possibilities are: iaj, iAj, iAAj, iaaj, iaAj, iAaj, etc.
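The bracket examples above can be checked with a small helper. This is only a sketch: the helper name matches is made up for illustration, and it relies on Python’s re.fullmatch so that the whole string has to match.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# A set matches exactly one character, so "iaaj" needs a quantifier
print(matches(r"i[a]j", "iaj"), matches(r"i[a]j", "iaaj"))

# Letters and digits are allowed, other symbols are not
print(matches(r"i[a-zA-Z0-9]j", "i0j"), matches(r"i[a-zA-Z0-9]j", "i$j"))

# "*" allows an empty middle part, "+" does not
print(matches(r"i[aA]*j", "ij"), matches(r"i[aA]+j", "ij"))

# Mixed a/A runs are fine for both quantifiers
print(matches(r"i[aA]*j", "iAaj"), matches(r"i[aA]+j", "iaAj"))
```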

6. (Box Bracket with Caret inside)

Pattern: ([^ ])
Meaning: negated set or range. (It matches exactly one character that is not in the set.)
Example:
To understand this pattern, we will look at a couple of examples.

  1. Let’s say we have the pattern i[^a]j. It accepts any single character between i and j except a. So the possibilities are: ibj, ikj, etc.
  2. Similarly, if we have i[^aA]j, it’ll accept any character except a and A, so the possibilities are: iBj, ibj, iZj, etc.
  3. Now the pattern i[^a-z]j accepts any character between i and j except the lowercase English letters, i.e. a, b, c, d, …, z. So the possibilities are i9j, iAj, etc.
  4. And if we have the pattern i[^a-zA-Z0-9]j, it accepts only a character that is not alphanumeric, so one possibility is i$j.
  5. What do you think of this pattern: i[a-z^A-Z]j ?
    If you think it’ll not accept A-Z but will accept a-z, then My Dear Friend! You’re wrong, Totally!
    Here the caret loses its special meaning and acts as an ordinary character, because it is not the first character inside the brackets. So the possibilities will be: iaj, izj, i^j, iAj, iZj, etc.
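The position of the caret inside the brackets can be checked with a short sketch. The helper name matches is hypothetical, and re.fullmatch is used so the whole string must match.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# Caret first: lowercase letters are excluded, everything else is allowed
negated = r"i[^a-z]j"
print(matches(negated, "i9j"), matches(negated, "iAj"), matches(negated, "iaj"))

# Caret in the middle: it is just one more allowed character
literal_caret = r"i[a-z^A-Z]j"
print(matches(literal_caret, "i^j"), matches(literal_caret, "iAj"),
      matches(literal_caret, "i9j"))
```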

7. (Curly Brackets)

Pattern: ({ })
Meaning: minimum and maximum number of occurrences of the preceding expression.
Example:
Let’s see a few examples to understand this pattern:

  1. Let’s say we have a{1,2}b. It means a must appear at least 1 time and at most 2 times. It’s pretty straightforward, isn’t it? So there are only two possibilities here, which are ab and aab.
  2. What if we have a{3,}b? It means the number of a’s can be 3 or more, no less. So the possibilities are aaab, aaaab, etc.
  3. And if we have a{,4}b, then there is no minimum and the maximum is 4 a’s. So the possibilities are: b, ab, aab, aaab, aaaab. (This is how Python reads it; some other regex flavors do not support leaving out the minimum.)
  4. The last example we will see is a{7}b, which means there should be exactly 7 a’s. So there is only one possibility for that: aaaaaaab.
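The four curly-bracket forms can be compared side by side. This is a sketch using Python’s re.fullmatch; the helper name matches is made up for illustration, and the behaviour of {,4} shown here is Python’s.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# {1,2}: one or two a's
print(matches(r"a{1,2}b", "ab"), matches(r"a{1,2}b", "aaab"))

# {3,}: three or more a's
print(matches(r"a{3,}b", "aab"), matches(r"a{3,}b", "aaaaab"))

# {,4}: in Python the missing minimum means zero
print(matches(r"a{,4}b", "b"), matches(r"a{,4}b", "aaaaab"))

# {7}: exactly seven a's
print(matches(r"a{7}b", "a" * 7 + "b"), matches(r"a{7}b", "a" * 6 + "b"))
```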

8. (Round Brackets)

Pattern: ( )
Meaning: grouping
Example:
As the meaning says, round brackets group part of a pattern so that it is treated as a single unit. For example:

  1. (ab)cd only accepts a string which starts with one set of ab and ends with cd. So there is only one possibility, which is abcd.
  2. In the previous example, we can see that grouping won’t do much alone. Let’s see what it does when we combine it with another pattern. If we have (ab)+cd, it accepts any string which has 1 or more sets of ab and ends with cd, so the possibilities will be: abcd, ababcd, abababcd, etc.

Similarly, we can combine groups with the other patterns.
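To see what grouping changes, the sketch below compares (ab)+cd with the ungrouped ab+cd on a few illustrative strings, using Python’s re.fullmatch.

```python
import re

grouped = re.compile(r"(ab)+cd")   # + repeats the whole group "ab"
ungrouped = re.compile(r"ab+cd")   # + repeats only the "b"

samples = ["abcd", "ababcd", "abbcd"]
grouped_results = {s: bool(grouped.fullmatch(s)) for s in samples}
ungrouped_results = {s: bool(ungrouped.fullmatch(s)) for s in samples}

print(grouped_results)
print(ungrouped_results)
```

With the brackets, ababcd is accepted and abbcd is not; without them it is the other way around.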

9. (Caret or Dollar sign)

Pattern: ^ or $
Meaning: ^ -> at the beginning of the string, $ -> at the end of the string
Example:
To understand these two patterns, let’s take a string:

text = "Medium Medium Medium Medium"

  1. Now if we do re.findall("Medium", text), it’ll return a list of 4 matches, since the string contains "Medium" 4 times.
  2. If we do re.findall("^Medium", text), it’ll check whether the given string has the word "Medium" at the start. Since it has, the list contains 1 match.
  3. When we do re.findall("Medium$", text), it’ll check whether the given string has the word "Medium" at the end. Since it has, the list contains 1 match.
  4. Now if we do re.findall("^Medium$", text), it’ll check whether the string has the word "Medium" at the start and at the end with nothing in between. So the only string that can match is "Medium" itself, and for the string above it’ll return an empty list.

Note: The code above uses Python’s re module.
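The four calls can be run together as a small script. This is a sketch with my own variable names; the point to notice is that re.findall returns a list of matches, so the counts come from len().

```python
import re

text = "Medium Medium Medium Medium"

all_hits = re.findall("Medium", text)      # every occurrence
start_hits = re.findall("^Medium", text)   # only at the start of the string
end_hits = re.findall("Medium$", text)     # only at the end of the string
whole_hits = re.findall("^Medium$", text)  # the string must be exactly "Medium"

print(len(all_hits), len(start_hits), len(end_hits), len(whole_hits))
```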

10. (Miscellaneous)

Pattern: \n or \s or \S or \d or \D or \w or \W
Meaning:

  1. \n -> new line
  2. \s -> whitespace character (space, tab, new line, etc.)
  3. \S -> non-whitespace character
  4. \d -> digit
  5. \w -> word character (letter, digit or underscore)
  6. \W -> non-word character
  7. \D -> non-digit
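These shorthand patterns are easiest to understand on a small sample. The string below is made up for illustration, and the sketch uses Python’s re.findall.

```python
import re

sample = "Room 42\nis open"

digits = re.findall(r"\d", sample)            # each single digit
words = re.findall(r"\w+", sample)            # runs of word characters
spaces = re.findall(r"\s", sample)            # each whitespace character
non_space_runs = re.findall(r"\S+", sample)   # runs of non-whitespace
newlines = re.findall(r"\n", sample)          # only the new line
non_digit_count = len(re.findall(r"\D", sample))

print(digits, words, spaces, newlines, non_digit_count)
```

Note that the new line is also counted as whitespace by \s.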

Kumar Gaurav

Android developer, otaku, and better known on the internet by my alias “TheLittleNaruto”.