Python DoS Prevention: The ReDOS Attack
What is a “ReDoS” Attack, and how can you make sure your code is safe?
What is DoS?
I’ve covered this in a few earlier posts, but DoS stands for Denial-of-Service. Denial-of-Service is a type of cyber attack technique where the attacker attempts to disrupt the availability of a service, application, or company. DoS attacks generally exist in one of two broad categories, Denial-of-Service (DoS) and Distributed Denial-of-Service (DDoS). Both have the same general intent in mind, but they take very different forms. Within the DoS there are a Network based attacks and Application based attacks.
Application layer attacks, also sometime called Layer 7 attacks, involve putting operational strain on the software serving the requests in such a way that it cannot handle additional requests — this is what we’ll be looking at with the ReDoS attack.
What is Regex?
Before we can look at ReDoS, we need to understand what Regex is and why it’s used. Regex stands for Regular Expression. Regular Expressions were first discussed as a concept in the early 50’s by mathematicians and those interested in theoretical computer science. Regex didn’t rise to mainstream use until the late 60’s when it was used to accomplish two primary use cases; pattern matching in text editors and lexical analysis in compilers. I won’t go too deep into how regex works as there are numerous other resources available on the topic.
At a very (very) high level, most programming languages’ implementation of regex build what are know as Nondeterministic Finite Automaton (NFA). These state machines are passed input and they test each possible path until a match is found or all paths are exhausted. All of the intermediate work is discarded after each check is done. While there are alternative implementations, NFA is the implementation type that is susceptible to ReDoS because testing solutions can be exponential.
Today regex is incredibly widely used for pattern matching in everything from browsers, proxies, firewalls, security scanning tools, web servers, and discretely within applications. Suffice to say, the risk of a denial of service at any or all of these steps is a major risk.
ReDos, also know as Evil Regex, is a regex that given a very specific input can result in exponential processing time. This exponential processing time will cause excessive CPU usage and may block processing of other requests.
ReDoS became infamous in 2016 when a misconfigured regular expression accidentally took down Stackoverflow.
Outage Postmortem - July 20, 2016
On July 20, 2016 we experienced a 34 minute outage starting at 14:44 UTC. It took 10 minutes to identify the cause, 14…
So what does Evil Regex look like? It’s actually a lot simpler than you might expect.
As an example how much things slow down, here’s a quick example script using the python standard library module, re that I ran on my laptop.
Real World Examples
Because theoretical examples are never the most satisfying, I tried to find some public code vulnerable to ReDoS. Github’s search is a bit limiting in this regard so I did some google dorking. I had some luck with this type of search
"*)*" site:github.com filetype:py. I won’t share the actual sources below, but I have notified the repository owners.
There are several things you can do to protect yourself from ReDoS attacks.
- Look at safer alternative libraries such as Facebook’s pyre2, which is a python wrapper around Google’s C++ Regex Library, re2.
- Always double check all regex you add to your application and never blindly trust regex patterns you find online.
- Utilize SAST and fuzzing tools to test your own code, and check out Ochrona to make sure your dependencies are not vulnerable to ReDoS attacks.
- If possible, limit the length of your input to avoid longer than necessary strings.