Mind your REGEX or it can put your program into an infinite loop
If your project uses or implements regular expressions, you need to check them for a weakness that might allow an attacker to stop your program from working. Regular expressions, also known as regex, allow programmers to parse or replace text with a common notation. If your project takes in input, there is a good chance that you will eventually need to use this tool.
In 2003, a presentation described a weakness in the way some regular expression engines are implemented. While a select community knew about this weakness for a long time, it was popularized in 2021 under the name Regexploit or ReDoS. With a specifically crafted input, an attacker could cause some implementations to run a CPU at 100% for years. As an example, this weakness led to a vulnerability reported in February 2020 (CVE-2020-5243) where parsing HTTP(S) User-Agent strings could shut down a web server.
The denial of service results when the regex is looking to match repeated characters and tries multiple ways of matching them. The engine first tries the longest string that would match and then repeatedly tries shorter strings until it succeeds or fails, a process called backtracking. A recent CWE, CWE-1333: Inefficient Regular Expression Complexity, provides more details on the issue, which can also be demonstrated in a useful tool called regex101. The tool shows how removing underscores (_) in the input reduces the number of processing steps.
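To make the backtracking cost concrete, here is a minimal Python sketch. The pattern `^(a+)+$` and the helper `time_match` are illustrative choices, not taken from CWE-1333 or from any product mentioned here. On a backtracking engine such as the one in Python's standard `re` module, the time to reject the input tends to roughly double with each extra character, so the input sizes are kept small on purpose.

```python
import re
import time

# Nested quantifiers: the inner `a+` and the outer `(...)+` can split a run
# of "a" characters in exponentially many ways.
EVIL = re.compile(r"^(a+)+$")


def time_match(n: int) -> float:
    """Return the seconds spent failing to match n 'a's followed by '!'."""
    text = "a" * n + "!"  # the trailing "!" forces the overall match to fail
    start = time.perf_counter()
    result = EVIL.match(text)
    elapsed = time.perf_counter() - start
    assert result is None  # every way of splitting the run is tried and fails
    return elapsed


if __name__ == "__main__":
    # Small sizes only: larger values of n can take minutes or much longer.
    for n in (10, 14, 18, 20):
        print(f"n={n:2d}  {time_match(n):.4f}s")
```

Rewriting the pattern as `^a+$` accepts the same strings without the nested repetition, so the engine has only one way to consume the run of characters.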
What Can You and Your Team Do?
1. Don’t use regex engines that rely on backtracking. If your programming language uses a backtracking engine by default and you can’t limit your regexes to those that avoid backtracking, look for a library that does not use this method (e.g., the re2 libraries). These could be slower and might not support every feature, such as backreferences.
2. If you must use regex with backtracking:
a. Limit the exposure of untrusted input into your regexes.
b. Break up your regex into separate lines to minimize backtracking.
c. See if you can mark sensitive parts of your regex to not use backtracking.
d. Provide functionality that would normally rely on backtracking outside of regex (e.g., to remove whitespace, use strip and trim functions). Test your regexes with the Regexploit tool.
e. Limit input length (with some expressions being O(n³) or O(n⁴), the amount of work that can be requested for a small input string can be rather large).
f. Set up a system to limit the time your regexes can run and report when they cause a catastrophic failure so you can test and fix these.
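Several of the steps above can be combined in one small wrapper. The following is a minimal Python sketch under stated assumptions: the names `MAX_INPUT_CHARS`, `SLOW_MATCH_SECONDS`, `USERNAME`, and `is_valid_username` are hypothetical, and the thresholds are placeholders to tune per application. It caps input length (item e), trims whitespace without a regex (item d), uses a pattern with no nested quantifiers, and reports slow matches (part of item f).

```python
import logging
import re
import time

MAX_INPUT_CHARS = 256      # hypothetical cap; tune per input field (item e)
SLOW_MATCH_SECONDS = 0.05  # hypothetical reporting threshold (item f)

# Unambiguous pattern: no nested quantifiers, so there is only one way to
# match a run of allowed characters.
USERNAME = re.compile(r"[A-Za-z0-9_]+")

log = logging.getLogger("regex_guard")


def is_valid_username(raw: str) -> bool:
    """Validate untrusted input with a length cap and slow-match reporting."""
    if len(raw) > MAX_INPUT_CHARS:
        return False  # reject before the regex engine ever sees the input
    text = raw.strip()  # item d: trim whitespace with a string method
    start = time.perf_counter()
    matched = USERNAME.fullmatch(text) is not None
    elapsed = time.perf_counter() - start
    if elapsed > SLOW_MATCH_SECONDS:
        # This reports after the fact; it does not interrupt a running match.
        log.warning("slow regex: %.3fs on %d chars", elapsed, len(text))
    return matched
```

Note that this only measures and reports. Python's standard `re` module does not take a timeout argument, so a hard time limit would need something else, such as running the match in a separate process that can be terminated, or an engine that supports timeouts.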
There are several real-world examples of products that have fallen victim to this weakness. Due to the lack of capture groups, the popular network security monitor Zeek uses backtracking. Thus, users might create a regex filter that could lead to a denial of service. Historically, the popular command-line tools ‘grep’ and ‘awk’ had this issue and were replaced in 2004–2005 with safer versions. CAPEC-492: Regular Expression Exponential Blowup describes the attack pattern that an adversary follows to exploit CWE-1333: Inefficient Regular Expression Complexity. If you don’t want your software to be shut down at a critical time, understanding and avoiding this weakness is essential.