How a RegEx can bring your Node.js service down
Regular Expressions (RegEx) are common among software engineers and in DevOps and IT roles, where they are used to specify a pattern that matches specific strings in a text.
Often, programmers will use RegEx to validate that an input received from a user conforms to an expected condition. For example:
Testing that a user’s provided e-mail address is valid:
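A minimal sketch of such a check might look like the following. The pattern and the `isValidEmail` name are assumptions for illustration; the pattern is deliberately simplistic and nowhere near a full e-mail grammar:

```javascript
// Assumed, deliberately simple pattern: a local part, "@", then two or more
// dot-separated domain labels. Each character class excludes the separators
// around it, so there is only one way to split any given input.
const emailPattern = /^[^\s@]+@[^\s@.]+(\.[^\s@.]+)+$/;

// Hypothetical helper: returns true when the input looks like an e-mail address.
function isValidEmail(input) {
  return typeof input === 'string' && emailPattern.test(input);
}

console.log(isValidEmail('britney@example.com')); // true
console.log(isValidEmail('not-an-email'));        // false
```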
What does it have to do with Node.js?
The risk inherent in using Regular Expressions is the computational resources required to parse text and match a given pattern.
A flawed Regular Expression pattern can be attacked: a crafted user input can force the match to burn an excessive number of CPU cycles. In Node.js that work runs on the single thread that drives the event loop, so nothing else gets served until the match finishes.
Let me show you why RegEx is a naughty word in our office
Say you’re building a music app and you want to validate song titles.
We need to match words, numbers, and spaces.
So you give the regex a few tries and come up with the following:
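The exact pattern isn’t reproduced here, so take the following as a hypothetical stand-in of the kind this story is about. The pattern and the `isValidSongTitle` name are assumptions for illustration:

```javascript
// Hypothetical song-title pattern: one or more "words" (letters, digits,
// underscores), each optionally followed by a whitespace character.
// Note the quantifier inside a repeated group: \w+ sits inside ( ... )+.
const songTitlePattern = /^(\w+\s?)+$/;

// Hypothetical helper used to validate user-submitted titles.
function isValidSongTitle(title) {
  return songTitlePattern.test(title);
}

console.log(isValidSongTitle('Toxic'));               // true
console.log(isValidSongTitle('Oops I Did It Again')); // true
```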
Maybe it’s not the perfect regex (hint: it isn’t).
But hey, it works.
I tested a few song titles and yeah, ready to push to production, woohoo! 🎉
Until a Britney Spears fan plays a joke on your app and enters the following song title as input:
Even if you have no clue what that is, sure sounds scary. And it’s in red too!
Curious to see what it means when you have this little RegEx gem in your Node.js code?
A relatively small input string was able to block the Node.js event loop for about 6 seconds, during which time it consumed 99% of the CPU.
Not exactly what you want to do on a single-threaded web application server.
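If you want to watch this happen on your own machine, here is a rough sketch. The pattern, the input shape, and the helper names are all assumptions for illustration, not the exact ones behind the numbers above, and I’m assuming a Node.js version where `performance` is available as a global:

```javascript
// Hypothetical vulnerable pattern: \w+ nested inside a repeated group.
const songTitlePattern = /^(\w+\s?)+$/;

// Time a single match attempt, in milliseconds.
function timeMatch(input) {
  const start = performance.now();
  const matched = songTitlePattern.test(input);
  return { matched, ms: performance.now() - start };
}

// Assumed attack shape: a run of word characters followed by one character
// the pattern can never match. Before giving up, the engine tries every way
// of splitting the run between the inner and the outer quantifier.
function maliciousTitle(length) {
  return 'a'.repeat(length) + '!';
}

// Lengths are kept small on purpose so the demo finishes quickly.
for (const length of [16, 18, 20, 22]) {
  const { matched, ms } = timeMatch(maliciousTitle(length));
  console.log(`length ${length}: matched=${matched}, took ${ms.toFixed(1)} ms`);
}
```

Each extra character roughly doubles the number of splits the engine has to try, so the timings should climb quickly. A handful of additional characters is enough to go from milliseconds to seconds, and the event loop is stuck the whole time.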
Tip: try that RegEx pattern on regex101.com and use their regex debugger to see what’s going on.
My number one rule is to avoid writing RegEx on your own. Here are the alternatives I suggest.
Use a third party
Most of the time, if you need something common, it is better to rely on a third-party library that has a million eyes looking at it and improving both its performance and its security than on three colleagues code reviewing your version.
In a library such as validator.js you’ll find all the common patterns: IP addresses, e-mails, phone numbers, etc.
Even validator.js has had its own ReDoS vulnerabilities reported, but you are better off with it, backed by a good community of maintainers and security researchers, than rolling your own.
Lint your RegEx before using them
Of course, you might end up needing to write your own RegEx pattern for something unique to your use case.
If that’s the case, consider using safe-regex, a package that helps you identify potentially bad regular expressions.
safe-regex: detect possibly catastrophic, exponential-time regular expressions (github.com/davisjam/safe-regex)
safe-regex is a quick go-to, but it isn’t perfect, so if you’re able to integrate Jamie’s tool you’re better off with it.
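As a rough sketch of what a lint step could look like, the helper below takes the checker as a parameter so it runs without any dependency installed. The `findUnsafePatterns` and `appPatterns` names are hypothetical; the only thing assumed about safe-regex is that its export is a function that takes a RegExp and returns `true` when it finds nothing suspicious:

```javascript
// `isSafe` is expected to behave like the function exported by safe-regex:
// given a RegExp, return true when the pattern looks safe.
// Returns the names of the patterns that were flagged.
function findUnsafePatterns(patterns, isSafe) {
  return Object.entries(patterns)
    .filter(([, pattern]) => !isSafe(pattern))
    .map(([name]) => name);
}

// Hypothetical patterns an app might ship with.
const appPatterns = {
  songTitle: /^(\w+\s?)+$/, // quantifier nested inside a repeated group
  year: /^\d{4}$/,
};

// In a real project (assuming `npm install safe-regex`), wire it up like this:
//   const safe = require('safe-regex');
//   const flagged = findUnsafePatterns(appPatterns, safe);
//   if (flagged.length > 0) {
//     console.error(`Potentially unsafe RegEx: ${flagged.join(', ')}`);
//     process.exit(1);
//   }
```

Run as part of your tests or CI, a check like this should flag a pattern such as the hypothetical `songTitle` one before it ever reaches production.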
You followed so far? Britney Approves!
If you’re interested in strengthening your Node.js security skills and avoiding Node.js pitfalls in production, I invite you to grab a copy of the book I wrote:
Node.js Secure Code… (leanpub.com): hands-on and abundant with source code, a practical guide to securing Node.js web applications.
Some of the topics from the book were presented live at 2017’s JSHeroes conference:
And finally, you can find a gist of security practices I helped contribute to in the popular Node.js Best Practices GitHub repo:
nodebestpractices - The largest Node.JS best practices list. Curated from the top ranked articles and always updated (github.com)
Thanks for reading, and stay secure!