4 Regular Expressions You Should Know

Regular expressions (regex or regexp) are extremely useful for extracting information from text by searching for one or more matches of a specific search pattern (i.e. a specific sequence of ASCII or Unicode characters).
Fields of application range from validation to parsing and replacing strings, and include translating data to other formats and web scraping.
One of the most interesting features is that once you’ve learned the syntax, you can use this tool in (almost) all programming languages (JavaScript, Java, VB, C#, C/C++, Python, Perl, Ruby, Delphi, R, Tcl, and many others), with only slight differences in how the engines support the most advanced features and syntax versions.
Why should we use Regex?

Let’s start with some basic topics.
Basic topics
Anchors — ^ and $
^The         matches any string that starts with The
end$         matches a string that ends with end
^The end$    exact string match (starts and ends with The end)
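To make the anchors concrete, here is a minimal sketch using Python's `re` module. The sample strings are made up for illustration and are not from the original examples.

```python
import re

# Illustrative sample strings (an assumption, not from the article).
samples = ["The end", "The beginning", "This is the end", "At The end"]

# ^The : the string must start with "The"
starts_with_the = [s for s in samples if re.search(r"^The", s)]

# end$ : the string must end with "end"
ends_with_end = [s for s in samples if re.search(r"end$", s)]

# ^The end$ : the whole string must be exactly "The end"
exact = [s for s in samples if re.search(r"^The end$", s)]

print(starts_with_the)  # ['The end', 'The beginning']
print(ends_with_end)    # ['The end', 'This is the end', 'At The end']
print(exact)            # ['The end']
```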
Quantifiers — * + ? and {}
Example: abc ab acb aob a2b a42_c abccc
abc*         matches a string that has ab followed by zero or more c
abc+         matches a string that has ab followed by one or more c
abc?         matches a string that has ab followed by zero or one c
abc{2}       matches a string that has ab followed by 2 c
abc{2,}      matches a string that has ab followed by 2 or more c
abc{2,5}     matches a string that has ab followed by 2 up to 5 c
a(bc)*       matches a string that has a followed by zero or more copies of the sequence bc
a(bc){2,5}   matches a string that has a followed by 2 up to 5 copies of the sequence bc
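As a rough check of the quantifiers, the sketch below runs some of them against the example string in Python. The helper `all_matches` is a hypothetical name introduced here; it returns the full text of every match so that the grouped patterns are reported the same way as the ungrouped ones.

```python
import re

sample = "abc ab acb aob a2b a42_c abccc"

def all_matches(pattern, text):
    """Return the full text of every non-overlapping match (hypothetical helper)."""
    return [m.group(0) for m in re.finditer(pattern, text)]

zero_or_more = all_matches(r"abc*", sample)      # ab + zero or more c
one_or_more = all_matches(r"abc+", sample)       # ab + one or more c
zero_or_one = all_matches(r"abc?", sample)       # ab + at most one c
exactly_two = all_matches(r"abc{2}", sample)     # ab + exactly two c
two_or_more = all_matches(r"abc{2,}", sample)    # ab + two or more c

# Quantifiers can also apply to a whole group such as (bc).
group_repeat = all_matches(r"a(bc){2,5}", "abc abcbc abcbcbc")

print(zero_or_more)  # ['abc', 'ab', 'abccc']
print(one_or_more)   # ['abc', 'abccc']
print(zero_or_one)   # ['abc', 'ab', 'abc']
print(exactly_two)   # ['abcc']
print(two_or_more)   # ['abccc']
print(group_repeat)  # ['abcbc', 'abcbcbc']
```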
Bracket expressions — []
Example: abc ab acb aob a2b a42_c abccc
[abc]        matches a string that has either a or b or c (the same as a|b|c)
[a-c]        same as previous
[0-9]%       matches a string that has a character from 0 to 9 before a % sign
[^a-zA-Z]    matches a string that has a character that is not a letter from a to z or from A to Z. In this case the ^ is used as negation of the expression
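The bracket expressions above can be tried out with a short Python sketch. The percentage sentence is an invented test string, used only because the example line contains no % sign.

```python
import re

sample = "abc ab acb aob a2b a42_c abccc"

# [abc], [a-c] and a|b|c all pick out the same single characters.
set_form = re.findall(r"[abc]", sample)
range_form = re.findall(r"[a-c]", sample)
alternation_form = re.findall(r"a|b|c", sample)

# [0-9]% : one digit immediately followed by a percent sign.
percent_text = "10% of 7 is 0.7, 5% less"  # invented sample text
digit_before_percent = re.findall(r"[0-9]%", percent_text)

# [^a-zA-Z] : inside brackets, a leading ^ negates the set.
non_letters = re.findall(r"[^a-zA-Z]", "a42_c")

print(set_form == range_form == alternation_form)  # True
print(digit_before_percent)                        # ['0%', '5%']
print(non_letters)                                 # ['4', '2', '_']
```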
Character classes — \d \w \s and .
Example: abc ab acb aob a2b a42_c abccc
\d           matches a single character that is a digit
\w           matches a word character (alphanumeric character plus underscore)
\s           matches a whitespace character (includes tabs and line breaks)
.            matches any character (in most engines, any character except a line break by default)
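A small Python sketch of these shorthand classes follows. The test string is an assumption chosen to contain a tab and a line break. Note that in Python 3 `\d` and `\w` also match non-ASCII digits and letters unless the `re.ASCII` flag is passed, and the dot only matches a line break when `re.DOTALL` is set.

```python
import re

text = "a42_c\tok\n"  # invented sample with a tab and a trailing newline

digits = re.findall(r"\d", text)      # digit characters
word_chars = re.findall(r"\w", text)  # letters, digits and underscore
whitespace = re.findall(r"\s", text)  # tab and newline here

# By default the dot skips line breaks; re.DOTALL makes it match them too.
dot_default = re.findall(r".", "a\nb")
dot_all = re.findall(r".", "a\nb", flags=re.DOTALL)

print(digits)       # ['4', '2']
print(word_chars)   # ['a', '4', '2', '_', 'c', 'o', 'k']
print(whitespace)   # ['\t', '\n']
print(dot_default)  # ['a', 'b']
print(dot_all)      # ['a', '\n', 'b']
```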
Greedy and Lazy match
<.+?>        matches any character one or more times included inside < and >, expanding as needed
<[^<>]+>     matches any character except < or > one or more times included inside < and >
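To see the difference in practice, the sketch below compares the greedy form `<.+>` (not listed above, added here only for contrast) with the two patterns from the list. The HTML snippet is an invented example.

```python
import re

html = "<div>simple div</div>"  # invented sample markup

# Greedy: .+ grabs as much as possible, so one match spans both tags.
greedy = re.findall(r"<.+>", html)

# Lazy: .+? grabs as little as possible, so each tag is matched on its own.
lazy = re.findall(r"<.+?>", html)

# Negated set: forbidding < and > inside the tag gives the same result
# here without relying on lazy matching.
negated = re.findall(r"<[^<>]+>", html)

print(greedy)   # ['<div>simple div</div>']
print(lazy)     # ['<div>', '</div>']
print(negated)  # ['<div>', '</div>']
```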
Now let’s look at each of the four patterns and its explanation.
1. Matching a Username

Pattern:
/^[a-z0-9_-]{3,16}$/
Description:
We begin by telling the parser to find the beginning of the string (^), followed by any lowercase letter (a-z), number (0-9), an underscore, or a hyphen. Next, {3,16} makes sure that there are at least 3 of those characters, but no more than 16. Finally, we want the end of the string ($).
String that matches:
my-us3r_n4m3
String that doesn’t match:
th1s1s-wayt00_l0ngt0beausername (too long)
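One possible way to apply the username pattern in Python is sketched below. The function name `is_valid_username` is hypothetical. The sketch uses `fullmatch` because in Python `$` also matches just before a trailing newline, so `match` alone would accept a name ending in a line break.

```python
import re

# The pattern from the article, without the surrounding / delimiters.
USERNAME_RE = re.compile(r"^[a-z0-9_-]{3,16}$")

def is_valid_username(candidate):
    """Hypothetical helper: True if the whole string fits the pattern."""
    return USERNAME_RE.fullmatch(candidate) is not None

print(is_valid_username("my-us3r_n4m3"))                     # True
print(is_valid_username("th1s1s-wayt00_l0ngt0beausername"))  # False (31 characters)
print(is_valid_username("ab"))                               # False (too short)
print(is_valid_username("MyUser"))                           # False (uppercase letters)
```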
2. Matching a Password

Pattern:
/^[a-z0-9_-]{6,18}$/
Description:
Matching a password is very similar to matching a username. The only difference is that instead of 3 to 16 letters, numbers, underscores, or hyphens, we want 6 to 18 of them ({6,18}).
String that matches:
myp4ssw0rd
String that doesn’t match:
mypa$$w0rd (contains a dollar sign)
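When a password is rejected it can help to say why. The sketch below is one assumed way to do that in Python: the hypothetical helper `password_problems` checks the same two rules the pattern encodes (allowed characters and a length of 6 to 18) separately, and returns an empty list exactly when the pattern would match.

```python
import re

# The pattern from the article, without the surrounding / delimiters.
PASSWORD_RE = re.compile(r"^[a-z0-9_-]{6,18}$")

def password_problems(candidate):
    """Hypothetical helper: list the reasons a candidate fails the pattern."""
    problems = []
    if not 6 <= len(candidate) <= 18:
        problems.append("length must be 6 to 18 characters")
    # Collect every character outside the allowed set, without duplicates.
    bad = sorted(set(re.findall(r"[^a-z0-9_-]", candidate)))
    if bad:
        problems.append("disallowed characters: " + "".join(bad))
    return problems

print(password_problems("myp4ssw0rd"))  # []
print(password_problems("mypa$$w0rd"))  # ['disallowed characters: $']
print(password_problems("abc"))         # ['length must be 6 to 18 characters']
```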
3. Matching a Slug

Pattern:
/^[a-z0-9-]+$/
Description:
You will be using this regex if you ever have to work with mod_rewrite and pretty URLs. We begin by telling the parser to find the beginning of the string (^), followed by one or more (the plus sign) lowercase letters, numbers, or hyphens. Finally, we want the end of the string ($).
String that matches:
my-title-here
String that doesn’t match:
my_title_here (contains underscores)
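A minimal Python sketch of the slug check follows; `is_slug` is a hypothetical name. The last call shows that the pattern is permissive: it only restricts which characters appear, so a string made of hyphens alone still passes.

```python
import re

# The pattern from the article, without the surrounding / delimiters.
SLUG_RE = re.compile(r"^[a-z0-9-]+$")

def is_slug(candidate):
    """Hypothetical helper: True if the whole string fits the slug pattern."""
    return SLUG_RE.fullmatch(candidate) is not None

print(is_slug("my-title-here"))  # True
print(is_slug("my_title_here"))  # False (underscores)
print(is_slug("My-Title"))       # False (uppercase letters)
print(is_slug("---"))            # True (only the character set is checked)
```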
4. Matching an Email

Pattern:
/^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/
Description:
We begin by telling the parser to find the beginning of the string (^). Inside the first group, we match one or more lowercase letters, numbers, underscores, dots, or hyphens. I have escaped the dot because a non-escaped dot means any character. Directly after that, there must be an at sign. Next is the domain name, which must be one or more digits, lowercase letters, dots, or hyphens (underscores are not allowed here). Then another (escaped) dot, with the extension being two to six letters or dots. I have 2 to 6 because of the country-specific TLDs (.ny.us or .co.uk). Finally, we want the end of the string ($).
String that matches:
john@doe.com
String that doesn’t match:
john@doe.something (TLD is too long)
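The three groups in the email pattern can be inspected with a short Python sketch. `split_email` is a hypothetical helper, and the `co.uk` address is an invented example. Because the domain group is greedy, it keeps as much as it can and leaves only the last label for the extension.

```python
import re

# The pattern from the article, without the surrounding / delimiters.
EMAIL_RE = re.compile(r"^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$")

def split_email(candidate):
    """Hypothetical helper: return (user, domain, extension) or None."""
    match = EMAIL_RE.fullmatch(candidate)
    return match.groups() if match else None

print(split_email("john@doe.com"))        # ('john', 'doe', 'com')
print(split_email("john@doe.something"))  # None (extension is 9 letters)
print(split_email("john@example.co.uk"))  # ('john', 'example.co', 'uk')
print(split_email("John@doe.com"))        # None (uppercase letter)
```

The pattern only accepts lowercase input; compiling it with `re.IGNORECASE` would be one way to accept mixed-case addresses.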
