gotchajavascriptCriticalpending
Regular Expression Performance Gotchas - Catastrophic Backtracking
Viewed 0 times
ReDoScatastrophic backtrackingregex performanceregular expressiondenial of service
Problem
Certain regular expressions cause exponential backtracking on specific inputs, freezing the application (ReDoS - Regular Expression Denial of Service).
Solution
Identify and fix dangerous regex patterns:
Rules to avoid ReDoS:
// DANGEROUS: nested quantifiers cause catastrophic backtracking
// These can hang for minutes on crafted input
const bad1 = /^(a+)+$/; // Nested quantifiers
const bad2 = /^(a|a)+$/; // Alternation with overlap
const bad3 = /^(.*a){10}$/; // Greedy with backtracking
const bad4 = /^([a-zA-Z]+)*$/; // Nested quantifiers
// SAFE alternatives
const good1 = /^a+$/; // No nesting
const good4 = /^[a-zA-Z]+$/; // No nesting
// Testing for vulnerability
// If this takes > 1s, the regex is vulnerable:
console.time('regex');
'aaaaaaaaaaaaaaaaaaaaaaaaaab'.match(/^(a+)+$/);
console.timeEnd('regex'); // Can take 30+ seconds!
// Prevention strategies:
// 1. Use atomic groups / possessive quantifiers (if supported)
// 2. Set timeout on regex matching
// 3. Limit input length BEFORE applying regex
// 4. Use linear-time regex engines (RE2, rust regex)# Python: use google-re2 for untrusted input
import re2 # pip install google-re2
pattern = re2.compile(r'^(a+)+
Rules to avoid ReDoS:
- Never nest quantifiers:
(a+)+, (a), (a+)* - Avoid overlapping alternation:
(a|ab)+ - Limit input length before matching
- Use possessive quantifiers or atomic groups when available
) # RE2 rejects dangerous patterns
# Or set a timeout
import signal
def timeout_handler(signum, frame):
raise TimeoutError('Regex timed out')
signal.signal(signal.SIGALRM, timeout_handler)
signal.alarm(1) # 1 second timeout
try:
re.match(pattern, user_input)
finally:
signal.alarm(0)Rules to avoid ReDoS:
- Never nest quantifiers:
(a+)+,(a),(a+)* - Avoid overlapping alternation:
(a|ab)+ - Limit input length before matching
- Use possessive quantifiers or atomic groups when available
Why
NFA-based regex engines (most languages) try all possible matching paths. Nested quantifiers create exponentially many paths. Crafted input can make a simple regex take hours.
Gotchas
- ReDoS is a real security vulnerability - attackers can crash servers with crafted input
- Even seemingly simple patterns like email validation can be vulnerable
Context
Writing safe regular expressions for user input
Revisions (0)
No revisions yet.