Regex tutorial — A quick cheatsheet by examples

Jonny Fox
Jun 23, 2017 · 6 min read

Basic topics

Anchors — ^ and $

^The        matches any string that starts with The
end$        matches a string that ends with end
^The end$   exact string match (starts and ends with The end)
roar        matches any string that has the text roar in it
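To try the anchors outside an online tester, here is a minimal sketch using Python's standard re module. Python is an assumption here, since the cheatsheet itself is not tied to a language, and the sample strings are made up. re.search reports whether the pattern occurs anywhere in the text, so ^ and $ are what pin it to the start or the end.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """True if `pattern` is found anywhere in `text` (re.search semantics)."""
    return re.search(pattern, text) is not None

print(matches(r"^The", "The end"))         # True: starts with "The"
print(matches(r"^The", "At The end"))      # False: "The" is not at the start
print(matches(r"end$", "The end"))         # True: ends with "end"
print(matches(r"^The end$", "The end"))    # True: the whole string is "The end"
print(matches(r"^The end$", "The end!"))   # False: "!" comes before the end of the string
print(matches(r"roar", "uproarious"))      # True: "roar" appears inside the word
```

One Python-specific detail: $ also matches just before a trailing newline, so ^The end$ accepts "The end\n" as well, while re.fullmatch(r"The end", "The end\n") does not.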

Quantifiers — * + ? and {}

abc*        matches a string that has ab followed by zero or more c
abc+        matches a string that has ab followed by one or more c
abc?        matches a string that has ab followed by zero or one c
abc{2}      matches a string that has ab followed by 2 c
abc{2,}     matches a string that has ab followed by 2 or more c
abc{2,5}    matches a string that has ab followed by 2 up to 5 c
a(bc)*      matches a string that has a followed by zero or more copies of the sequence bc
a(bc){2,5}  matches a string that has a followed by 2 up to 5 copies of the sequence bc
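A quick way to check the repetition counts, sketched with Python's re module (an assumed choice of language; the sample strings are arbitrary). The sketch uses re.fullmatch so the whole sample must match: with re.search, abc{2} would also be found inside "abccc", because that string contains "abcc".

```python
import re

# Each pattern maps to sample strings it should match completely.
should_match = {
    r"abc*": ["ab", "abc", "abccc"],
    r"abc+": ["abc", "abccc"],
    r"abc?": ["ab", "abc"],
    r"abc{2}": ["abcc"],
    r"abc{2,}": ["abcc", "abcccc"],
    r"abc{2,5}": ["abcc", "abccccc"],
    r"a(bc)*": ["a", "abcbc"],
    r"a(bc){2,5}": ["abcbc", "abcbcbcbcbc"],
}

for pattern, samples in should_match.items():
    for sample in samples:
        ok = re.fullmatch(pattern, sample) is not None
        print(f"{pattern:<12} {sample:<12} {ok}")

# Counter-examples: too few or too many repetitions
print(re.fullmatch(r"abc+", "ab"))         # None: + needs at least one c
print(re.fullmatch(r"abc{2}", "abccc"))    # None: exactly two c are allowed
print(re.search(r"abc{2}", "abccc"))       # a match object for "abcc"
```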

OR operator — | or []

a(b|c)     matches a string that has a followed by b or c (and captures b or c)
a[bc]      same as previous, but without capturing b or c
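The difference between the two forms shows up in what the match object stores. A minimal sketch with Python's re module (the language and the input string are arbitrary choices for illustration):

```python
import re

# a(b|c): the alternation sits inside a capturing group
m1 = re.search(r"a(b|c)", "xac")
print(m1.group(0), m1.group(1))   # ac c  (group 1 captured "c")

# a[bc]: same match, but there is no group to capture anything
m2 = re.search(r"a[bc]", "xac")
print(m2.group(0), m2.groups())   # ac ()
```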

Character classes — \d \w \s and .

\d         matches a single character that is a digit
\w         matches a word character (alphanumeric character plus underscore)
\s         matches a whitespace character (includes tabs and line breaks)
.          matches any character except a line break, unless the s (dotall) flag is enabled
\D         matches a single non-digit character
\$\d       matches a string that has a $ before one digit (the backslash escapes $, so it is matched literally instead of acting as an anchor)
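A small sketch of the character classes using Python's re module on made-up input. Two Python-specific details worth knowing: with str patterns, \d and \w are Unicode-aware by default (so they also accept non-ASCII digits and letters), and the dotall behavior is switched on with re.DOTALL.

```python
import re

text = "Order #42 costs $7 in total"   # arbitrary sample text

print(re.findall(r"\d", text))                  # ['4', '2', '7']
print(re.findall(r"\$\d", text))                # ['$7']: \$ is a literal dollar sign
print(re.findall(r"\s", "a b\tc\nd"))           # [' ', '\t', '\n']
print(re.findall(r"\w+", "snake_case 42!"))     # ['snake_case', '42']
print(re.findall(r"\D", "a1b2"))                # ['a', 'b']

# By default . skips line breaks; re.DOTALL lets it match them too
print(re.findall(r"a.b", "a\nb axb"))             # ['axb']
print(re.findall(r"a.b", "a\nb axb", re.DOTALL))  # ['a\nb', 'axb']
```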

Flags

So far we have learned how to construct a regex, but we have skipped a fundamental concept: flags.

  • m (multi-line): when enabled, ^ and $ match the start and end of each line instead of the start and end of the whole string
  • i (insensitive): makes the whole expression case-insensitive (for instance /aBc/i would match AbC)
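In Python's re module, flags are passed as arguments (or written inline, such as (?m) or (?i), at the very start of the pattern) rather than after a closing slash; re.MULTILINE corresponds to m and re.IGNORECASE to i. A minimal sketch with an arbitrary two-line string:

```python
import re

text = "first line\nsecond line"

# Without MULTILINE, ^ only matches at the very start of the string
print(re.findall(r"^\w+", text))                 # ['first']
# With MULTILINE (the m flag), ^ also matches right after each newline
print(re.findall(r"^\w+", text, re.MULTILINE))   # ['first', 'second']
# The same flag written inline at the start of the pattern
print(re.findall(r"(?m)^\w+", text))             # ['first', 'second']

# IGNORECASE plays the role of /aBc/i
print(re.search(r"aBc", "AbC", re.IGNORECASE) is not None)  # True
```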

Intermediate topics

Grouping and capturing — ()

a(bc)           parentheses create a capturing group with value bc
a(?:bc)*        using ?: we disable the capturing group
a(?<foo>bc)     using ?<foo> we give the group a name
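Here is how the three kinds of groups look from Python (an assumed language choice; the inputs are arbitrary). Note one syntax difference that belongs to Python's re module, not to the cheatsheet: re has traditionally required named groups to be spelled (?P<foo>...) rather than (?<foo>...).

```python
import re

# Capturing group: the text matched by (bc) is stored as group 1
m = re.search(r"a(bc)", "xabcx")
print(m.group(1))                      # bc

# Non-capturing group: (?:bc)* still repeats, but nothing is stored
m = re.search(r"a(?:bc)*", "abcbc")
print(m.group(0), m.groups())          # abcbc ()

# Named group, in Python's (?P<name>...) spelling
m = re.search(r"a(?P<foo>bc)", "abc")
print(m.group("foo"), m.groupdict())   # bc {'foo': 'bc'}
```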

Bracket expressions — []

[abc]            matches a string that has either an a or a b or a c (the same as a|b|c)
[a-c]            same as previous
[a-fA-F0-9]      a string that represents a single hexadecimal digit, case insensitively
[0-9]%           a string that has a character from 0 to 9 before a % sign
[^a-zA-Z]        a string that has a character that is not a letter from a to z or from A to Z. In this case the ^ is used as negation of the expression
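A minimal Python sketch of the bracket expressions on arbitrary sample text. re.findall returns every non-overlapping match, which makes single-character classes easy to inspect.

```python
import re

print(re.findall(r"[abc]", "cab driver"))          # ['c', 'a', 'b']
print(re.findall(r"[a-c]", "cab driver"))          # ['c', 'a', 'b']

# Check single characters against the hexadecimal class
hex_flags = [re.fullmatch(r"[a-fA-F0-9]", ch) is not None for ch in "9fGz"]
print(hex_flags)                                   # [True, True, False, False]

print(re.findall(r"[0-9]%", "50% off, then 5 %"))  # ['0%']: the space breaks "5 %"
print(re.findall(r"[^a-zA-Z]", "ab1-C"))           # ['1', '-']
```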

Greedy and Lazy match

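The quantifiers (*, + and {}) are greedy operators, so they expand the match as far as they can through the provided text. For example, <.+> applied to "This is a <div> simple div</div> test" matches the whole "<div> simple div</div>" instead of just "<div>". Adding a ? after a quantifier makes it lazy, so it matches as few characters as possible: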

<.+?>            matches any character one or more times included inside < and >, expanding as needed
<[^<>]+>         matches any character except < or > one or more times included inside < and > (a stricter alternative that avoids . altogether)
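On a small HTML-like sample string, the greedy, lazy and negated-class versions compare like this in Python. This is a sketch for illustration only; for real HTML, a proper parser is the safer tool.

```python
import re

html = "This is a <div> simple div</div> test"

# Greedy: .+ runs to the last > in the string
print(re.findall(r"<.+>", html))      # ['<div> simple div</div>']
# Lazy: .+? stops at the first > it can
print(re.findall(r"<.+?>", html))     # ['<div>', '</div>']
# Negated class: [^<>]+ can never run past a closing >
print(re.findall(r"<[^<>]+>", html))  # ['<div>', '</div>']
```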

Advanced topics

Boundaries — \b and \B

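\babc\b          performs a "whole words only" search
\Babc\B          matches only if the pattern is fully surrounded by word characters

\b is an anchor, like ^ and $: it matches a position (a word boundary) that has a word character (\w) on one side and a non-word character, or the start or end of the string, on the other. \B matches every position that is not a word boundary.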
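A quick Python check of where each boundary pattern matches in a made-up string. The patterns are raw strings on purpose: in an ordinary Python string literal, \b is the backspace character, not a word boundary.

```python
import re

text = "abc xabcx abcd"   # "abc" starts at indexes 0, 5 and 10

whole = [m.start() for m in re.finditer(r"\babc\b", text)]
inner = [m.start() for m in re.finditer(r"\Babc\B", text)]

print(whole)  # [0]: the only stand-alone word "abc"
print(inner)  # [5]: the only "abc" with word characters on both sides
```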

Back-references — \1

([abc])\1              using \1 it matches the same text that was matched by the first capturing group
([abc])([de])\2\1      we can use \2 (\3, \4, etc.) to identify the same text that was matched by the second (third, fourth, etc.) capturing group
(?<foo>[abc])\k<foo>   we give the group the name foo and reference it later with \k<foo>. The result is the same as the first regex
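Back-references work the same way in Python, with one spelling difference in its re module: a named back-reference is written (?P=foo) instead of \k<foo>. A minimal sketch with arbitrary inputs; the helper name found is hypothetical.

```python
import re

def found(pattern: str, text: str) -> list[str]:
    """Every full match of `pattern` in `text`, left to right (hypothetical helper)."""
    return [m.group(0) for m in re.finditer(pattern, text)]

print(found(r"([abc])\1", "aa ab cc"))                # ['aa', 'cc']
print(found(r"([abc])([de])\2\1", "adda beeb adeb"))  # ['adda', 'beeb']

# Named group plus named back-reference, in Python's spelling
print(found(r"(?P<foo>[abc])(?P=foo)", "aa ab cc"))   # ['aa', 'cc']
```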

Look-ahead and Look-behind — (?=) and (?<=)

d(?=r)       matches a d only if it is followed by r, but r will not be part of the overall regex match
(?<=r)d      matches a d only if it is preceded by an r, but r will not be part of the overall regex match
d(?!r)       matches a d only if it is not followed by r
(?<!r)d      matches a d only if it is not preceded by an r
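A small Python sketch listing the positions of each d that the four look-around patterns accept, on an arbitrary sample string. Python's re module requires look-behind patterns to have a fixed width, which (?<=r) and (?<!r) satisfy.

```python
import re

text = "dr da rd"   # d at indexes 0, 3 and 7

def positions(pattern: str) -> list[int]:
    """Start index of every match of `pattern` in `text`."""
    return [m.start() for m in re.finditer(pattern, text)]

print(positions(r"d(?=r)"))   # [0]: only the first d is followed by r
print(positions(r"(?<=r)d"))  # [7]: only the last d is preceded by r
print(positions(r"d(?!r)"))   # [3, 7]
print(positions(r"(?<!r)d"))  # [0, 3]

# The look-around consumes nothing: the overall match is just "d"
print(re.search(r"d(?=r)", text).group(0))  # d
```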

Summary

As you’ve seen, regex has many fields of application, and I’m sure you have recognized at least one of these tasks from your developer career. Here is a quick list:

  • data scraping (especially web scraping: find all pages that contain a certain set of words, possibly in a specific order)
  • data wrangling (transform data from “raw” to another format)
  • string parsing (for example, catch all URL GET parameters or capture the text inside a set of parentheses)
  • string replacement (for example, during a coding session in a common IDE, translating a Java or C# class into the corresponding JSON object: replace “;” with “,”, make it lowercase, remove type declarations, etc.)
  • syntax highlighting, file renaming, packet sniffing and many other applications involving strings (where data need not be textual)
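As a concrete case of the string-parsing item, here is a rough Python sketch that pulls GET parameters out of a URL. The URL is hypothetical, and the pattern does not decode percent-escapes or handle repeated keys; in real code, the standard urllib.parse.parse_qs function is the more robust choice.

```python
import re

url = "https://example.com/search?q=regex&page=2&lang=en"  # hypothetical URL

# [?&] starts a parameter, ([^=&#]+) captures the name, ([^&#]*) captures the value
params = dict(re.findall(r"[?&]([^=&#]+)=([^&#]*)", url))
print(params)  # {'q': 'regex', 'page': '2', 'lang': 'en'}
```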

Factory Mind

Factory Mind is a young and dynamic cooperative consisting…

Written by

Jonny Fox

CTO@Factory Mind

Factory Mind

Factory Mind is a young and dynamic cooperative consisting of a team of passionate developers, with a kick for computer science, technology and innovation. The main goal of the cooperative is striving to become a center of excellence and a reference point for software development
