Performance of Regular Expressions

Maciek Rząsa
Oct 30, 2018 · 12 min read
Image for post
Image for post
Photo by Bob Newman on Unsplash

Motivation: counting figures in a document

Basics of quantifier performance

Greedy repetition

Image for post
Image for post
Greedy quantifier takes as much as possible and then backtracks (red return symbols mark backtracking). Try in the online debugger
Image for post
Image for post
Greedy quantifier may take too much. Try in the online debugger
Image for post
Image for post
Limiting scope of the greedy quantifier solves the problem with too much text being matched. It also decreases backtracking. Try in the online debugger

Lazy repetition

Image for post
Image for post
Lazy quantifier takes as little as possible and tries to match next pattern element. Try in the online debugger
Image for post
Image for post
Lazy quantifier works correctly even without limiting its scope. Try in the online debugger

Possessive

Image for post
Image for post
The possessive quantifier takes as much as possible, just as the greedy one, but it does not backtrack. Try in the online debugger
Image for post
Image for post
Greedy quantifier with overlapping scopes of adjacent subpatterns backtracks a lot before it fails. Try in the online debugger
Image for post
Image for post
The possessive quantifier takes less steps before it fails, because it does not backtrack. Try in the online debugger

Catastrophic backtracking

Image for post
Image for post
Catastrophic backtracking is a situation when a number of steps needed to fail or match grows exponentially with input length. See all the steps in online debugger
/(a+)+b/ =~ 'aaaaaaaaaaaaaaaaaaaaaaaaaa1b'

Benchmark results explained

Image for post
Image for post
The regex engine tries all possible substring combination before it fails. See all the steps in the online debugger
Image for post
Image for post
Unrolling the loop limits backtracking to one pass — the regex engine tries to match the optional part of the regex in all possible positions, but it does not consider all substrings of the test string. See all the steps in online debugger

Other examples

Notable things to remember


TextMaster Engineering

TextMaster is the world’s first global translation solution…

Thanks to Pierre-Louis Gottfrois

Maciek Rząsa

Written by

Thoughts and questions of curious developer with passion for reading.

TextMaster Engineering

TextMaster is the world’s first global translation solution that is available entirely online. This publication is a humble description of heroic technical efforts aiming to provide our clients the best translations in the world. 🤓

Maciek Rząsa

Written by

Thoughts and questions of curious developer with passion for reading.

TextMaster Engineering

TextMaster is the world’s first global translation solution that is available entirely online. This publication is a humble description of heroic technical efforts aiming to provide our clients the best translations in the world. 🤓

Welcome to a place where words matter. On Medium, smart voices and original ideas take center stage - with no ads in sight. Watch

Follow all the topics you care about, and we’ll deliver the best stories for you to your homepage and inbox. Explore

Get unlimited access to the best stories on Medium — and support writers while you’re at it. Just $5/month. Upgrade

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store