Photo by rawpixel on Unsplash

Regex — What it is. How to learn it.

A Hands-On Primer for Regular Expressions

Decision-First AI
3 min read · Dec 14, 2018

Concepts in Data Science and Analytics often receive unhelpful names. Regular Expressions takes the crown. The name is so meaningless that shortening it to regex is actually an improvement. Go figure.

So what exactly is regex?

Well, that was mostly unhelpful. By the way, that summary appears well below the fold in the Wikipedia article. So we aren’t getting much help there.


To be honest, there is a bit of a pattern when it comes to Regular Expressions. I suppose that is only fitting.

Regex is a shorthand code for finding patterns in strings and text documents. It is supported by various database technologies, programming languages, and some search functions. It comes in a few flavors (much like SQL), but it is fairly straightforward and easy to learn.

It is effectively a language that lives at the core of various crawlers, text miners, and other bot-type engines that allow the computer to produce structure from unstructured data. This is done by finding and isolating patterns. For example, maybe you would like to scan thousands of HTML documents and extract all the email addresses. You might use this regex pattern:

\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b

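Easy, right? Really, it is not that hard. The shorthand is divided into concepts like groups () and character ranges []. There are expressions and flags. This particular example sets two word boundaries \b, one at the beginning and one at the end. Between them, it requires the “word” to start with one or more + characters from the first range [] (letters, digits, and a few symbols such as . _ % + -), followed by an @, then one or more + characters from a second range [] (letters, digits, dots, and hyphens), then a literal dot \. and a final run of at least two {2,} letters. Because the ranges list only uppercase A-Z, the pattern needs a case-insensitive flag to match lowercase addresses. Once you understand the structure, it gets a lot easier.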
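To make the email example concrete, here is a minimal sketch in Python, assuming the standard re module as the regex flavor. The function name extract_emails and the sample HTML snippet are made up for illustration and are not part of the original article. The pattern itself is the one shown above, compiled with a case-insensitive flag.

```python
import re

# The pattern from the article. Its ranges list only uppercase A-Z, so the
# IGNORECASE flag is needed to match lowercase addresses as well.
EMAIL_PATTERN = re.compile(
    r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b",
    re.IGNORECASE,
)


def extract_emails(text):
    """Return every substring of text that matches EMAIL_PATTERN, in order."""
    return EMAIL_PATTERN.findall(text)


# Hypothetical HTML snippet, invented for this illustration.
sample_html = """
<p>Contact <a href="mailto:jane.doe@example.com">Jane</a> or
write to support@mail.example.org. Not an address: user@localhost</p>
"""

print(extract_emails(sample_html))
# → ['jane.doe@example.com', 'support@mail.example.org']
```

Note that user@localhost is skipped because the pattern insists on a literal dot followed by at least two letters. This pattern is a practical approximation, not a full validator for every legal email address.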

How best to learn that structure?

There are any number of free sites. Be careful: the multiple flavors could trip you up. Look for a resource that is tied to the language you intend to use regex in. That said, this article by Factory Mind, available through Medium, is one of the most popular:

Regular-Expressions.info is also a solid resource. It also offers tutorials.

https://regexr.com/

There are also plenty of YouTube videos available. The Net Ninja has a rather nice series you can start here. But after a few videos, a tutorial, and a roll through the cheat sheet, you need to get your hands dirty! For that, consider regexr.com.

RegExr is a totally free online tool that lets you sit and crank your way through endless amounts of regex trial and error. You can upload your own text to search, and the built-in help, formatting, and hovers are great for newcomers.
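The same kind of trial and error can also be run locally. The sketch below assumes Python's re module. The helper try_pattern, the sample strings, and the variant of the email pattern with a group () around the domain are all hypothetical, chosen only to show how a group captures part of a match.

```python
import re

# Hypothetical pattern for illustration: the group () captures the domain
# so it can be pulled out separately from the full address.
pattern = re.compile(r"[A-Z0-9._%+-]+@([A-Z0-9.-]+\.[A-Z]{2,})", re.IGNORECASE)


def try_pattern(compiled, samples):
    """Map each sample to group 1 of its first match, or None if no match."""
    results = {}
    for sample in samples:
        match = compiled.search(sample)
        results[sample] = match.group(1) if match else None
    return results


# Made-up test strings: two addresses and one string with no @ sign.
samples = ["jane.doe@example.com", "no-at-sign.example.com", "bob@mail.example.org"]
for sample, domain in try_pattern(pattern, samples).items():
    print(f"{sample!r} -> {domain}")
```

Tweak the pattern or the samples and rerun to see which strings stop matching. That is the same loop an online tester gives you, only without the hover help.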

Other tools are out there. EditPad Pro offers a free trial download, but unless you plan to be offline, it seems like more of a hassle.

So there you have it. Regex quickly explained. A set of links to sites and videos to get you started. And a pair of tools to allow you to get hands-on in a hurry. We hope this helps. If so, please throw some claps our way and bookmark the page. We intend to provide more hands-on primers soon. So stay tuned and thanks for reading!


Decision-First AI

FKA Corsair's Publishing - Articles that engage, educate, and entertain through analogies, analytics, and … occasionally, pirates!