Regular Expression

Regular expressions, or “regex”, are used widely in the programming world, and most programming languages support them. A regular expression is used to search for a specific pattern in a string. There are many built-in methods that search for one string inside another, but they only handle straightforward, literal matches. With a regular expression we can search for any kind of pattern, simple or complex.
Today I am going to explain how regular expressions work and how to use them in different programming languages such as JavaScript, PHP, Python and Java. Along with this, I will also share some of the most commonly used regular expressions with explanations and examples.
Syntax
/pattern/modifier
Modifier
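Modifiers change how the pattern is matched, for example to perform case-insensitive or global searches.
i : case-insensitive match
g : global match (find all matches instead of stopping after the first one)
m : multi-line match (^ and $ also match at the start and end of each line)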
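Python has no /pattern/modifier literal, so the three modifiers above map onto flags and functions of the re module. Here is a minimal sketch of that mapping; the sample text is made up for illustration.

```python
import re

text = "Nazish\nnazish\nNAZISH"  # made-up sample input, three lines

# i: re.IGNORECASE makes the search case-insensitive; re.search stops at the first match
first = re.search(r"nazish", text, re.IGNORECASE)

# g: Python has no global flag; re.findall returns every match instead
all_matches = re.findall(r"nazish", text, re.IGNORECASE)

# m: with re.MULTILINE, ^ matches at the start of every line, not only the string
line_starts = re.findall(r"^n", text, re.MULTILINE)
string_start = re.findall(r"^n", text)  # without the flag only position 0 is tried

print(first.group(0))  # Nazish
print(all_matches)     # ['Nazish', 'nazish', 'NAZISH']
print(line_starts)     # ['n']
print(string_start)    # []
```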
Quantifiers
If n is a character or group, then
n+ : matches one or more occurrences of n
n* : matches zero or more occurrences of n
n? : matches zero or one occurrence of n
n{X} : matches exactly X occurrences of n in a row
n{X,Y} : matches from X to Y occurrences of n in a row
n{X,} : matches at least X occurrences of n in a row
^n : matches n at the start of the string (^ is an anchor rather than a quantifier)
n$ : matches n at the end of the string ($ is an anchor as well)
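To see each quantifier in isolation, it helps to test whether a whole string fits a pattern. The sketch below uses Python's re.fullmatch for that; the helper name fits is my own and only exists for this illustration.

```python
import re

def fits(pattern, s):
    """Return True when the whole string s fits the pattern (hypothetical helper)."""
    return re.fullmatch(pattern, s) is not None

print(fits(r"a+", "aaa"))      # True: one or more a
print(fits(r"a+", ""))         # False: + needs at least one a
print(fits(r"a*", ""))         # True: * also allows zero
print(fits(r"ab?", "a"))       # True: the b is optional
print(fits(r"a{3}", "aaaa"))   # False: exactly three are required
print(fits(r"a{2,3}", "aaa"))  # True: two to three
print(fits(r"a{2,}", "a"))     # False: at least two are required

# ^ and $ tie a match to the start or end of the string
starts_with_he = re.search(r"^He", "Hello") is not None
ends_with_lo = re.search(r"lo$", "Hello") is not None
print(starts_with_he, ends_with_lo)  # True True
```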
Metacharacters
. : Find any single character, except a newline or other line terminator
\w : Find a word character (a letter, a digit or an underscore)
\W : Find a non-word character
\d : Find a digit character
\D : Find a non-digit character
\s : Find a whitespace character (space, tab, newline and similar)
\S : Find a non-whitespace character
\0 : Find a null character
\n : Find a newline character
\t : Find a tab character
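A quick way to get a feel for the metacharacters above is to list everything each one finds in a short string. This is a small sketch in Python; the sample string is made up.

```python
import re

sample = "Room 42,\tfloor_3\n"  # made-up input containing a space, a tab and a newline

digits = re.findall(r"\d", sample)
words = re.findall(r"\w+", sample)
spaces = re.findall(r"\s", sample)
non_words = re.findall(r"\W", sample)
any_chars = re.findall(r".", sample)  # . skips the trailing newline

print(digits)          # ['4', '2', '3']
print(words)           # ['Room', '42', 'floor_3']
print(spaces)          # [' ', '\t', '\n']
print(non_words)       # [' ', ',', '\t', '\n']
print(len(any_chars))  # 16 (the string has 17 characters)
```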
Brackets
[abc] : Find any one character listed inside the brackets
[^abc] : Find any one character not listed inside the brackets
[0-9] : Find any digit from 0 to 9 (same as \d)
(a|b) : Find any of the alternatives specified (either a or b)
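The bracket forms above can be tried out the same way. A minimal sketch in Python, using a made-up input string:

```python
import re

word = "cab42"  # made-up sample input

in_set = re.findall(r"[abc]", word)       # characters listed in the brackets
not_in_set = re.findall(r"[^abc]", word)  # everything else
digits = re.findall(r"[0-9]", word)       # a range inside brackets
either = re.findall(r"(a|b)", word)       # alternation: a or b

print(in_set)      # ['c', 'a', 'b']
print(not_in_set)  # ['4', '2']
print(digits)      # ['4', '2']
print(either)      # ['a', 'b']
```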
Regular Expression in different languages
Note - I have used nazish as pattern and i as modifier
JavaScript
var pattern = /nazish/i;
var str = "Hello nazish";
str.search(pattern);
// Returns the index of the first match in str, i.e. 6 (indexing starts at 0)
// Returns -1 if there is no match
PHP
$pattern = "/nazish/i";
$str = "Hello nazish";
preg_match($pattern, $str);
// Returns 1 if the pattern matches the string, 0 if it does not, and false on error
Python
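import re
pattern = "nazish"  # Python patterns have no /.../ delimiters
text = "Hello nazish"
m = re.search(pattern, text, re.IGNORECASE)  # re.IGNORECASE plays the role of the i modifier
m.group(0)
# Returns the matched string, i.e. "nazish"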
Java
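import java.util.regex.*;
// Java patterns have no /.../ delimiters; the i modifier becomes a flag
String pattern = "nazish";
String str = "nazish";
Pattern p = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(str);
boolean b = m.matches();
// b is true only if the whole string matches the pattern; use m.find() to search inside a longer string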
Example
- Pattern for Name
/[a-zA-Z ]{3,}/
Matches any string that contains a run of at least 3 letters or blank spaces.
Matched strings "and", "andrew", "Nazish Fraz"
Unmatched strings "I", "an", "$@8"
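The matched and unmatched strings for the name pattern can be checked with a few lines of Python. This is only a sketch; the function name looks_like_name is made up for illustration. Because the pattern has no ^ or $, I use search, which also accepts longer strings that merely contain three letters in a row.

```python
import re

NAME_RE = re.compile(r"[a-zA-Z ]{3,}")

def looks_like_name(s):
    # search() finds a run of 3+ letters or spaces anywhere in the string
    return NAME_RE.search(s) is not None

for s in ["and", "andrew", "Nazish Fraz"]:
    print(s, looks_like_name(s))   # all True

for s in ["I", "an", "$@8"]:
    print(s, looks_like_name(s))   # all False

# Unanchored caveat: the run "abc" is enough for a match
print(looks_like_name("abc123"))   # True
```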
- Pattern for UserName
/^[a-z0-9_-]{4,20}$/
^ and $ anchor the pattern to the start and end of the string, so every character must be one of those inside the square brackets.
The pattern says a username may contain lowercase letters, digits, underscore or minus, and its length can be 4 to 20.
Matched strings "nfraz007", "nazish_1234"
Unmatched strings "NFRAZ007", "123"
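Here is a small Python sketch of the username check. The function name is_valid_username is hypothetical. I use fullmatch so that nothing, not even a trailing newline, can slip past the $ anchor.

```python
import re

USERNAME_RE = re.compile(r"^[a-z0-9_-]{4,20}$")

def is_valid_username(s):
    # fullmatch requires the entire string to fit the pattern
    return USERNAME_RE.fullmatch(s) is not None

print(is_valid_username("nfraz007"))     # True
print(is_valid_username("nazish_1234"))  # True
print(is_valid_username("NFRAZ007"))     # False: uppercase is not in the class
print(is_valid_username("123"))          # False: shorter than 4
print(is_valid_username("n.fraz"))       # False: dot is not in the class
```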
- Pattern for mobile number (India)
/(\+91|0)?[789][\d]{9}/
(\+91|0)? the prefix +91 or 0 can come 0 or 1 times
[789] the first digit can only be 7, 8 or 9
[\d]{9} followed by exactly 9 more digits
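The three parts of the mobile number pattern can be exercised as below. This is a sketch with made-up phone numbers and a hypothetical helper name. The pattern has no ^ or $, so checking the whole string with fullmatch is my own choice; a plain search would also find a number embedded in longer text.

```python
import re

MOBILE_RE = re.compile(r"(\+91|0)?[789][\d]{9}")

def is_mobile(s):
    # fullmatch: the string must be a number and nothing else
    return MOBILE_RE.fullmatch(s) is not None

print(is_mobile("+919876543210"))  # True: +91 prefix
print(is_mobile("09876543210"))    # True: 0 prefix
print(is_mobile("9876543210"))     # True: no prefix
print(is_mobile("1234567890"))     # False: first digit is not 7, 8 or 9
print(is_mobile("98765"))          # False: too short

# search() picks the number out of surrounding text
found = MOBILE_RE.search("call 9876543210 now")
print(found.group(0))              # 9876543210
```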
- Pattern for Password
/^[a-z0-9_-]{4,12}$/
A password can contain a-z, 0-9, underscore or minus, and its length can be 4 to 12.
- Pattern for hex value
/^#?([a-f0-9]{6}|[a-f0-9]{3})$/
^#? starts with #, which can come 0 or 1 times
([a-f0-9]{6}|[a-f0-9]{3})$ ends with either of the two alternatives
[a-f0-9]{6} : alternative 1 : 6 characters from a-f, 0-9
[a-f0-9]{3} : alternative 2 : 3 characters from a-f, 0-9
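A short Python sketch of the hex value check follows. The function name and the optional ignore_case parameter are my own additions: as written, the pattern only accepts lowercase a-f, so uppercase values need the equivalent of the i modifier.

```python
import re

HEX_PATTERN = r"^#?([a-f0-9]{6}|[a-f0-9]{3})$"

def is_hex_value(s, ignore_case=False):
    # ignore_case is a hypothetical switch standing in for the i modifier
    flags = re.IGNORECASE if ignore_case else 0
    return re.fullmatch(HEX_PATTERN, s, flags) is not None

print(is_hex_value("#1a2b3c"))                 # True: 6 characters
print(is_hex_value("fff"))                     # True: 3 characters, # is optional
print(is_hex_value("#ffff"))                   # False: 4 characters
print(is_hex_value("#FFF"))                    # False: uppercase without the flag
print(is_hex_value("#FFF", ignore_case=True))  # True
```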
- Pattern for Email
/^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/
^([a-z0-9_\.-]+) starts with characters from a-z, 0-9, underscore, dot (escaped with a backslash) or minus, which can come 1 or more times
followed by @
([\da-z\.-]+) any digit, a-z, dot or minus, which can come 1 or more times
followed by a dot (escaped with a backslash)
([a-z\.]{2,6})$ ends with a-z or dot, with a length of 2 to 6
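The three parenthesised parts of the email pattern are capture groups, so a match can also be split into its pieces. The sketch below shows that in Python; the function name split_email and the sample addresses are made up.

```python
import re

EMAIL_RE = re.compile(r"^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$")

def split_email(address):
    """Return the three captured groups, or None when there is no match."""
    m = EMAIL_RE.match(address)
    return m.groups() if m else None

print(split_email("nazish.fraz@example.com"))  # ('nazish.fraz', 'example', 'com')
print(split_email("not-an-email"))             # None: no @ in the string
print(split_email("USER@EXAMPLE.COM"))         # None: the pattern is lowercase only
```

This pattern is a simple check and will reject some addresses that are valid in practice, for example ones with uppercase letters unless the i modifier is added.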
- Pattern for URL
/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/
^(https?:\/\/)? starts with http (s can come 0 or 1 times), then :, then // (each slash is escaped with a backslash). This whole group can come 0 or 1 times
([\da-z\.-]+) digit, a-z, dot or minus, which can come 1 or more times
followed by a dot (escaped with a backslash)
([a-z\.]{2,6}) any a-z or dot, with a length of 2 to 6
([\/\w \.-]*)* then slash, word character, space, dot or minus can come 0 or more times, and the whole group can come 0 or more times
\/?$ ends with /, which can come 0 or 1 times
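The URL pattern can be tried in Python as below. This is a sketch with made-up URLs and a hypothetical function name. One caution from my side: the nested quantifier in ([\/\w \.-]*)* can make a regex engine backtrack heavily on long strings that almost match, so I keep the inputs short here.

```python
import re

URL_RE = re.compile(
    r"^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$"
)

def looks_like_url(s):
    return URL_RE.match(s) is not None

print(looks_like_url("https://example.com/path/to/page"))  # True
print(looks_like_url("example.com"))                       # True: scheme is optional
print(looks_like_url("https://example.com/"))              # True: trailing slash
print(looks_like_url("ftp://example.com"))                 # False: only http or https
print(looks_like_url("http://"))                           # False: no host
```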
- Pattern for Vehicle Number
/^[a-zA-Z]{2}[0-9]{2}[a-zA-Z0-9]{1,2}[0-9]{4}$/
^[a-zA-Z]{2} starts with 2 letters, a-z or A-Z
[0-9]{2} followed by 2 digits
[a-zA-Z0-9]{1,2} followed by 1 or 2 alphanumeric characters
[0-9]{4}$ then ends with 4 digits
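The four parts of the vehicle number pattern can be checked together like this. It is a minimal Python sketch; the registration numbers are made up and the function name is hypothetical.

```python
import re

VEHICLE_RE = re.compile(r"^[a-zA-Z]{2}[0-9]{2}[a-zA-Z0-9]{1,2}[0-9]{4}$")

def is_vehicle_number(s):
    return VEHICLE_RE.fullmatch(s) is not None

print(is_vehicle_number("MH12AB1234"))  # True: 2 letters, 2 digits, 2 alphanumerics, 4 digits
print(is_vehicle_number("DL01C1234"))   # True: the middle part may be 1 character
print(is_vehicle_number("MH1234"))      # False: too short for the last 4 digits
print(is_vehicle_number("1234AB5678"))  # False: must start with 2 letters
```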
- Pattern for Aadhaar Number
/^\d{4}\s?\d{4}\s?\d{4}$/
It contains 4 digits, then a blank space which can come 0 or 1 times, and then the same again until there are 12 digits in total.
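The description above (4 digits, an optional blank space, repeated three times) can be checked with the Python sketch below. The pattern written here, with \s? for the optional space and no square brackets around the whole expression, is my reading of that description; the numbers are made up and the function name is hypothetical.

```python
import re

# Assumed form of the pattern: three blocks of 4 digits, optionally separated by whitespace
AADHAAR_RE = re.compile(r"^\d{4}\s?\d{4}\s?\d{4}$")

def is_aadhaar_format(s):
    return AADHAAR_RE.fullmatch(s) is not None

print(is_aadhaar_format("1234 5678 9012"))  # True: with spaces
print(is_aadhaar_format("123456789012"))    # True: spaces are optional
print(is_aadhaar_format("1234-5678-9012"))  # False: minus is not allowed
print(is_aadhaar_format("1234 5678 901"))   # False: last block has 3 digits
```

This only checks the format of the number, not whether it is a real one.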
- Pattern for PAN Number
/[a-z]{5}[0-9]{4}[a-z]{1}/
[a-z]{5} starts with 5 letters
[0-9]{4} then 4 digits
[a-z]{1} then 1 letter
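Here is a Python sketch of the PAN check. Two things are assumptions on my part: I pass re.IGNORECASE (the equivalent of the i modifier) because PAN numbers are usually written in uppercase, and I use fullmatch because the pattern itself has no ^ or $. The sample values and the function name are made up.

```python
import re

PAN_PATTERN = r"[a-z]{5}[0-9]{4}[a-z]{1}"

def is_pan_format(s):
    # re.IGNORECASE is an assumption so that uppercase input is accepted
    return re.fullmatch(PAN_PATTERN, s, re.IGNORECASE) is not None

print(is_pan_format("ABCDE1234F"))  # True
print(is_pan_format("abcde1234f"))  # True
print(is_pan_format("ABCD1234F"))   # False: only 4 leading letters
print(is_pan_format("ABCDE12345"))  # False: last character must be a letter

# Without the flag, the lowercase-only pattern rejects uppercase input
strict = re.fullmatch(PAN_PATTERN, "ABCDE1234F")
print(strict)                       # None
```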
Practice
It is always a good idea to practice all the examples yourself. For this I know a pretty good website on which you can practice these examples.
You can download a regex.txt file which I have created for practice. Check out my GitHub repository.