A robust, extensible PCRE regex parser for PHP that transforms complex regex patterns into traversable Abstract Syntax Trees (AST), enabling static analysis, validation, pattern explanation, and safe optimization.
This library is in experimental/alpha status. While it demonstrates functional parsing, AST generation, and analysis capabilities, it has not been exhaustively validated against the complete official PCRE specification.
Current Status:
- ✅ Core parsing and AST generation validated
- ✅ ReDoS detection working (false positives fixed)
- ✅ Behavioral compliance testing against PHP's PCRE engine
- ✅ 27/27 validation tests + 19/19 behavioral tests (128 assertions)
- ✅ Comprehensive testing completed: 140 tests, 284 assertions, 100% pass rate
- ✅ Integration testing validated: Symfony, Rector, PHPStan
Recommendation: Usable in production as long as you accept the experimental status described above. Well suited to development, testing, and integration into existing projects.
- Full PCRE Parsing: Accurately parses the vast majority of PCRE syntax, including groups (capturing, non-capturing, named, branch reset), lookarounds, subroutines, conditionals, quantifiers (greedy, lazy, possessive), Unicode properties, and more.
- Advanced Validation: Goes beyond simple syntax checks. It semantically validates your patterns to catch costly errors before they run:
  - Detects catastrophic backtracking (ReDoS) vulnerabilities
  - Finds invalid backreferences
  - Detects variable-length lookbehinds
- Behavioral Compliance: Comprehensive test suite validates that parsed patterns behave identically to PHP's native PCRE engine
- Extensible with Visitors: Built on the Visitor design pattern. The AST is immutable data; you can write visitor classes to perform any analysis you need.
- Toolkit Included: Ships with powerful visitors out-of-the-box:
  - CompilerNodeVisitor: Recompiles an AST back into a valid regex string
  - ValidatorNodeVisitor: Performs semantic validation
  - ExplainVisitor: Creates human-readable pattern explanations
  - SampleGeneratorVisitor: Generates random sample strings matching the pattern
  - OptimizerNodeVisitor: Optimizes patterns while preserving semantics
  - ReDoSAnalyzer: Analyzes patterns for denial-of-service vulnerabilities
- Modern & Robust: Built with PHP 8.4+, strictly typed, and heavily tested
- Framework Integration: Optional integration with Symfony, Rector, and PHPStan
Install the library via Composer:
```bash
composer require yoeunes/regex-parser
```
Requirements:
- PHP 8.4 or higher
- ext-mbstring (for Unicode support)
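If you are unsure whether an environment meets these requirements, a quick check like the one below can help. It is a minimal sketch that uses only PHP built-ins; the helper name checkRequirements() is hypothetical and not part of the library.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of RegexParser): lists unmet requirements.
 *
 * @return list<string>
 */
function checkRequirements(): array
{
    $problems = [];

    // The library targets PHP 8.4 or higher
    if (version_compare(PHP_VERSION, '8.4.0', '<')) {
        $problems[] = 'PHP 8.4+ is required, found ' . PHP_VERSION;
    }

    // ext-mbstring is needed for Unicode support
    if (!extension_loaded('mbstring')) {
        $problems[] = 'The mbstring extension is not loaded';
    }

    return $problems;
}

$problems = checkRequirements();
echo $problems === [] ? "All requirements met\n" : implode("\n", $problems) . "\n";
```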
```php
<?php

require __DIR__ . '/vendor/autoload.php';

use RegexParser\Regex;

// Parse and explain a regex pattern
$pattern = '/(?<email>[\w.-]+@[\w.-]+\.\w+)/i';
$regex = Regex::create();

// Get a human-readable explanation
echo $regex->explain($pattern);

// Validate for errors and vulnerabilities
$result = $regex->validate($pattern);
if (!$result->isValid) {
    echo "Error: {$result->error}\n";
}

// Generate a sample string that matches
$sample = $regex->generate($pattern);
echo "Sample: $sample\n"; // a random string matching the pattern

// Analyze for ReDoS vulnerabilities
$analysis = $regex->analyzeReDoS($pattern);
echo "Safety: {$analysis->severity->value}\n"; // "safe"
```
The Regex class provides a simple façade for common operations; Regex::create() returns a ready-to-use instance.
Parse a regex string to get the root RegexNode of its AST.
```php
use RegexParser\Exception\ParserException;
use RegexParser\Regex;

try {
    $ast = Regex::create()->parse('/^Hello (?<name>\w+)!$/i');
    // $ast is now a RegexParser\Node\RegexNode object
    echo $ast->flags; // "i"
} catch (ParserException $e) {
    echo 'Error parsing regex: ' . $e->getMessage();
}
```
Check a regex for syntax errors, semantic errors, and ReDoS vulnerabilities.
```php
use RegexParser\Regex;

$regex = Regex::create();

// Detect a ReDoS vulnerability
$result = $regex->validate('/(a+)*b/');
if (!$result->isValid) {
    echo $result->error;
    // Output: Potential catastrophic backtracking: nested quantifiers detected.
}

// Detect an invalid lookbehind
$result = $regex->validate('/(?<!a*b)/');
if (!$result->isValid) {
    echo $result->error;
    // Output: Variable-length quantifiers (*) are not allowed in lookbehinds.
}
```
Generate a human-readable explanation of a complex pattern.
```php
use RegexParser\Regex;

$explanation = Regex::create()->explain('/(foo|bar){1,2}?/s');
echo $explanation;
```
Output:
```text
Regex matches (with flags: s):
Start Quantified Group (between 1 and 2 times (as few as possible)):
Start Capturing Group:
EITHER:
Literal: 'foo'
OR:
Literal: 'bar'
End Group
End Quantified Group
```
Generate a random string that will successfully match a pattern.
```php
use RegexParser\Regex;

$sample = Regex::create()->generate('/[a-f0-9]{4}-[a-f0-9]{4}/');
echo $sample;
// Possible output: c4e1-9b2a
```
Optimize a regex pattern while preserving its behavior.
```php
use RegexParser\Regex;

$optimized = Regex::create()->optimize('/(?:a|b|c)/');
echo $optimized;
// Output: /[abc]/ (a single character class instead of an alternation)
```
The true power of this library comes from traversing the AST to build your own tools. Implement NodeVisitorInterface to create a custom visitor that analyzes, manipulates, or extracts information from the tree.
```php
use RegexParser\NodeVisitor\DumperNodeVisitor;
use RegexParser\Regex;

$ast = Regex::create()->parse('/^(?<id>\d+)/');
$dumper = new DumperNodeVisitor();
echo $ast->accept($dumper);
```
Output (the Abstract Syntax Tree):
```text
Regex(delimiter: /, flags: )
Sequence:
Anchor(^)
Group(type: named name: id flags: )
Sequence:
Quantifier(quant: +, type: greedy)
CharType('\d')
```
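The double dispatch behind accept() can be illustrated without the library itself. The sketch below uses a tiny toy AST; ToyNode, ToyLiteral, ToySequence, ToyQuantifier, QuantifierCounter and ToyCompiler are all hypothetical names invented for illustration, not RegexParser classes. It shows how each analysis lives in its own visitor while the nodes stay immutable data.

```php
<?php

declare(strict_types=1);

// Hypothetical toy AST: these are NOT RegexParser's node classes. They only
// illustrate the double dispatch behind accept() and the visitor methods.
interface ToyVisitor
{
    public function visitLiteral(ToyLiteral $node): mixed;
    public function visitSequence(ToySequence $node): mixed;
    public function visitQuantifier(ToyQuantifier $node): mixed;
}

interface ToyNode
{
    public function accept(ToyVisitor $visitor): mixed;
}

// Nodes are immutable data; accept() just hands control to the matching visitor method.
final class ToyLiteral implements ToyNode
{
    public function __construct(public readonly string $value) {}
    public function accept(ToyVisitor $visitor): mixed { return $visitor->visitLiteral($this); }
}

final class ToySequence implements ToyNode
{
    /** @param list<ToyNode> $children */
    public function __construct(public readonly array $children) {}
    public function accept(ToyVisitor $visitor): mixed { return $visitor->visitSequence($this); }
}

final class ToyQuantifier implements ToyNode
{
    public function __construct(public readonly ToyNode $child, public readonly string $quantifier) {}
    public function accept(ToyVisitor $visitor): mixed { return $visitor->visitQuantifier($this); }
}

// Analysis 1: count the quantifiers in the tree.
final class QuantifierCounter implements ToyVisitor
{
    public function visitLiteral(ToyLiteral $node): mixed
    {
        return 0;
    }

    public function visitSequence(ToySequence $node): mixed
    {
        $total = 0;
        foreach ($node->children as $child) {
            $total += $child->accept($this);
        }
        return $total;
    }

    public function visitQuantifier(ToyQuantifier $node): mixed
    {
        return 1 + $node->child->accept($this);
    }
}

// Analysis 2: compile the tree back into a pattern body. This only handles
// single-character literals, since a quantified multi-character literal needs a group.
final class ToyCompiler implements ToyVisitor
{
    public function visitLiteral(ToyLiteral $node): mixed
    {
        return preg_quote($node->value, '/');
    }

    public function visitSequence(ToySequence $node): mixed
    {
        return implode('', array_map(fn (ToyNode $child): mixed => $child->accept($this), $node->children));
    }

    public function visitQuantifier(ToyQuantifier $node): mixed
    {
        return $node->child->accept($this) . $node->quantifier;
    }
}

// Roughly the shape of /ab+c*/
$ast = new ToySequence([
    new ToyLiteral('a'),
    new ToyQuantifier(new ToyLiteral('b'), '+'),
    new ToyQuantifier(new ToyLiteral('c'), '*'),
]);

echo $ast->accept(new QuantifierCounter()), "\n";       // 2
echo '/' . $ast->accept(new ToyCompiler()) . '/', "\n"; // /ab+c*/
```

Adding a new analysis means writing a new visitor class; none of the node classes change.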
Extract fixed strings that must appear in any match for fast-path optimization:
```php
use RegexParser\Regex;

$regex = Regex::create();
$pattern = '/user_(\d+)@example\.com/';

// Simple prefix/suffix extraction
$literals = $regex->extractLiterals($pattern);
$prefix = $literals->getLongestPrefix(); // "user_"
$suffix = $literals->getLongestSuffix(); // "@example.com"

// Fast-path check before running the (more expensive) regex
$subject = 'user_42@example.com';
if (!str_contains($subject, $prefix)) {
    // A literal every match needs is missing, so the regex cannot match: skip preg_match() entirely
    $matched = false;
} else {
    $matched = preg_match($pattern, $subject) === 1;
}
```
Use Cases:
- Up to 10-20x faster string matching when combined with a strpos() pre-check
- Database query optimization (check the prefix before running a LIKE query)
- Log parsing and filtering
- URL routing and validation
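The log-filtering use case can be sketched in plain PHP. This assumes the required literal ('error: ') has already been obtained, for example from extractLiterals(); here it is hardcoded, and the helper filterLines() is hypothetical. The pre-check is safe because a line that lacks a literal every match must contain can never match the pattern.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: keeps only lines matching $pattern, using a cheap
 * str_contains() pre-check on a literal that every match must contain.
 *
 * @param list<string> $lines
 * @return list<string>
 */
function filterLines(array $lines, string $pattern, string $requiredLiteral): array
{
    $kept = [];
    foreach ($lines as $line) {
        // Cheap rejection: without the literal, the regex cannot match
        if (!str_contains($line, $requiredLiteral)) {
            continue;
        }
        // Only lines that pass the pre-check pay for the regex
        if (preg_match($pattern, $line) === 1) {
            $kept[] = $line;
        }
    }
    return $kept;
}

$lines = [
    'info: service started',
    'error: disk full',
    'debug: cache warmed',
    'error: timeout after 30s',
];

$errors = filterLines($lines, '/error: .*/', 'error: ');
print_r($errors);
```

An empty literal always passes str_contains(), so for patterns without any required literal the helper simply degrades to a plain regex scan.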
Detect Regular Expression Denial of Service vulnerabilities with detailed severity scoring:
```php
use RegexParser\Regex;
use RegexParser\ReDoSSeverity;

$regex = Regex::create();
$analysis = $regex->analyzeReDoS('/(a+)+b/');

echo "Severity: {$analysis->severity->value}\n"; // "critical"
echo "Score: {$analysis->score}\n"; // 10 (0-10 scale)
echo 'Safe: ' . ($analysis->isSafe() ? 'Yes' : 'NO!') . "\n"; // NO!

foreach ($analysis->recommendations as $recommendation) {
    echo "⚠️ $recommendation\n";
}
```
Severity Levels:
| Level | Description | Example | Time Complexity |
|---|---|---|---|
| SAFE | No ReDoS risk | `/^abc$/` | O(n) |
| LOW | Nested bounded quantifiers | `/(a{1,5}){1,5}/` | O(n²) with low constant |
| MEDIUM | Single unbounded quantifier | `/a+/` | O(n²) |
| HIGH | Nested unbounded quantifiers | `/(a+)+/` | O(2ⁿ) |
| CRITICAL | Definite catastrophic backtracking | `/(a*)*b/` or `/(a\|a)*/` | O(2ⁿ) worst case |
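The gap between the low and high rows can be observed with PHP's own PCRE functions, independently of this library. The sketch below lowers pcre.backtrack_limit and disables the JIT (both real php.ini settings) so that the nested quantifier in /(a+)+$/ exhausts the limit quickly while /a+$/ does not. The helper tryMatch() is hypothetical, and the chosen limit of 10000 is an assumption: exact step counts depend on the PCRE2 version.

```php
<?php

declare(strict_types=1);

// Make the demonstration deterministic: interpreter instead of JIT, small step budget
ini_set('pcre.jit', '0');
ini_set('pcre.backtrack_limit', '10000');

// 20 'a' characters followed by a character that forces every attempt to fail
$subject = str_repeat('a', 20) . '!';

/** Hypothetical helper: runs preg_match() and reports the result or the PCRE error. */
function tryMatch(string $pattern, string $subject): string
{
    $result = preg_match($pattern, $subject);
    return $result === false ? 'error: ' . preg_last_error_msg() : "result: $result";
}

echo '/a+$/    ', tryMatch('/a+$/', $subject), "\n";    // single quantifier: fails normally
echo '/(a+)+$/ ', tryMatch('/(a+)+$/', $subject), "\n"; // nested quantifiers: hits the limit
```

When the limit is exhausted, preg_match() returns false rather than 0, so production code that ignores false may silently treat a ReDoS-prone pattern as "no match".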
RegexParser can be integrated into Symfony applications for regex validation in forms, routing, and more.
1. Install the library:
```bash
composer require yoeunes/regex-parser
```
2. Create a custom Symfony validator:
```php
<?php
// src/Validator/Constraints/ValidRegex.php
namespace App\Validator\Constraints;

use Symfony\Component\Validator\Constraint;

#[\Attribute]
class ValidRegex extends Constraint
{
    public string $message = 'The regex pattern "{{ pattern }}" is invalid: {{ error }}';
}
```
```php
<?php
// src/Validator/Constraints/ValidRegexValidator.php
namespace App\Validator\Constraints;

use RegexParser\Regex;
use Symfony\Component\Validator\Constraint;
use Symfony\Component\Validator\ConstraintValidator;
use Symfony\Component\Validator\Exception\UnexpectedTypeException;
use Symfony\Component\Validator\Exception\UnexpectedValueException;

class ValidRegexValidator extends ConstraintValidator
{
    public function validate(mixed $value, Constraint $constraint): void
    {
        if (!$constraint instanceof ValidRegex) {
            throw new UnexpectedTypeException($constraint, ValidRegex::class);
        }

        // Empty values are left to NotBlank/NotNull constraints
        if (null === $value || '' === $value) {
            return;
        }

        // Regex::validate() expects a string pattern
        if (!\is_string($value)) {
            throw new UnexpectedValueException($value, 'string');
        }

        $result = Regex::create()->validate($value);

        if (!$result->isValid) {
            $this->context->buildViolation($constraint->message)
                ->setParameter('{{ pattern }}', $value)
                ->setParameter('{{ error }}', $result->error)
                ->addViolation();
        }
    }
}
```
3. Use in your forms:
```php
use App\Validator\Constraints\ValidRegex;
use Symfony\Component\Form\AbstractType;
use Symfony\Component\Form\Extension\Core\Type\TextType;
use Symfony\Component\Form\FormBuilderInterface;

class RegexPatternType extends AbstractType
{
    public function buildForm(FormBuilderInterface $builder, array $options): void
    {
        $builder->add('pattern', TextType::class, [
            'label' => 'Regex Pattern',
            'constraints' => [
                new ValidRegex(),
            ],
        ]);
    }
}
```
RegexParser includes a Rector rule for automatically optimizing regex patterns in your codebase.
1. Install Rector:
```bash
composer require --dev rector/rector
```
2. Configure Rector (rector.php):
```php
<?php

use Rector\Config\RectorConfig;
use RegexParser\Rector\RegexOptimizationRector;

return RectorConfig::configure()
    ->withPaths([
        __DIR__ . '/src',
    ])
    ->withRules([
        RegexOptimizationRector::class,
    ]);
```
3. Run Rector:
```bash
vendor/bin/rector process --dry-run
```
Example transformation:
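```php
// Before
preg_match('/(?:a|b|c)/', $string);

// After (optimized by Rector)
preg_match('/[abc]/', $string);
```
Only alternations of single characters can be collapsed this way: /[foo|bar|baz]/ would match any one character from f, o, |, b, a, r, z, so it is not equivalent to /(?:foo|bar|baz)/.

Current Status: ✅ Rector integration validated: 61/61 files processed successfully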
RegexParser includes a PHPStan extension stub for future static analysis of regex patterns.
1. Install PHPStan:
```bash
composer require --dev phpstan/phpstan
```
2. Enable the extension (phpstan.neon):
```neon
includes:
    - vendor/yoeunes/regex-parser/extension.neon

parameters:
    level: max
    paths:
        - src
```
3. Run PHPStan:
```bash
vendor/bin/phpstan analyze
```
Current Status: ✅ PHPStan runs successfully on the library source code (0 errors at max level)
Note: Custom validation rules for preg_* functions are planned for future releases.
```bash
# Run the full test suite
./vendor/bin/phpunit

# Run specific test suites
./vendor/bin/phpunit tests/Unit
./vendor/bin/phpunit tests/Integration

# Run behavioral compliance tests
./vendor/bin/phpunit tests/Integration/BehavioralComplianceTest.php
```
Run the comprehensive validation script to verify library functionality:
```bash
php validate_library.php
```
Expected output:
```text
Test 1: Sample Generation 4/4 PASSED ✅
Test 2: ReDoS Detection 4/4 PASSED ✅
Test 3: PCRE Feature Coverage 12/12 PASSED ✅
Test 4: Round-trip Validation 4/4 PASSED ✅
Test 5: Invalid Pattern Detection 3/3 PASSED ✅
OVERALL: 27/27 tests passed (100%)
```
Behavioral Compliance Tests: 19/19 tests, 128 assertions - ALL PASS ✅
A web demo is available to interactively test the library:
```bash
php server.php
```
Then open your browser at http://localhost:5000.
The library includes a command-line tool for quick regex testing:
```bash
php bin/regex-parser '/your_regex_here/flags'
```
Example:
```bash
php bin/regex-parser '/(?<email>[\w.-]+@[\w.-]+\.\w+)/i'
```
Contributions are welcome! Please read our CONTRIBUTING.md for details on:
- Code of conduct
- Development setup
- How to submit pull requests
- Coding standards and style guide
- Testing requirements
- Clone the repository:
```bash
git clone https://github.com/yoeunes/regex-parser.git
cd regex-parser
```
- Install dependencies:
```bash
composer install
```
- Run tests:
```bash
./vendor/bin/phpunit
```
- Run quality tools (each line runs in a subshell, so you return to the project root afterwards):
```bash
# PHPStan
(cd tools/phpstan && composer install && php vendor/bin/phpstan analyze)

# Rector
(cd tools/rector && composer install && php vendor/bin/rector process --dry-run)

# PHP CS Fixer
(cd tools/php-cs-fixer && composer install && php vendor/bin/php-cs-fixer fix --dry-run)
```
Issue: "Class 'RegexParser\Regex' not found"
- Solution: Run composer install to ensure autoloading is configured
- Verify: Run composer dump-autoload to regenerate the autoloader
Issue: "Memory limit exceeded" when using PHPStan
- Solution: Increase the memory limit:
```bash
php -d memory_limit=512M vendor/bin/phpstan analyze
```
Issue: Pattern fails to parse
- Check the pattern uses valid PCRE syntax
- Review error message for specific issue location
- See VALIDATION_REPORT.md for known limitations
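To tell a genuine syntax error apart from a limitation of this library, it can help to ask PHP's own PCRE engine whether it accepts the pattern. The helper pcreCompileError() below is hypothetical; it relies only on preg_match() returning false for a pattern that fails to compile, and it captures the compile warning PHP emits in that case with a temporary error handler.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: returns null if PCRE compiles $pattern,
 * otherwise the compile error reported by PHP.
 */
function pcreCompileError(string $pattern): ?string
{
    $warning = null;
    // Capture the "preg_match(): Compilation failed: ..." warning instead of printing it
    set_error_handler(static function (int $errno, string $message) use (&$warning): bool {
        $warning = $message;
        return true;
    });

    try {
        $result = preg_match($pattern, '');
    } finally {
        restore_error_handler();
    }

    return $result === false ? ($warning ?? preg_last_error_msg()) : null;
}

var_dump(pcreCompileError('/(?<name>\w+)/')); // NULL: PCRE accepts it
var_dump(pcreCompileError('/(unclosed/'));    // string: PCRE's own error message
```

Passing this check only means PCRE can compile the pattern; the library's semantic checks (ReDoS, lookbehind lengths, backreferences) go further.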
Issue: ReDoS false positives
- Update to latest version (false positives fixed in recent releases)
- Safe patterns like /a+b/ should now be correctly identified as safe
Issue: Backreferences not compiling correctly
- Update to latest version (backreference compilation fixed)
- The pattern /(a)\1/ should now round-trip correctly
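When checking a round-trip or an optimization by hand, one way to gain confidence (not proof) is to run both patterns over a set of subjects and compare the results and captures. The helper sameBehavior() is hypothetical and uses only preg_match(); the subject list is an assumption you should adapt to your own patterns. It also shows why an alternation of words cannot be collapsed into a character class.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: true if both patterns give identical preg_match()
 * results and captures for every subject. Agreement on samples is
 * evidence of equivalence, not a proof.
 *
 * @param list<string> $subjects
 */
function sameBehavior(string $a, string $b, array $subjects): bool
{
    foreach ($subjects as $subject) {
        $resultA = preg_match($a, $subject, $matchesA);
        $resultB = preg_match($b, $subject, $matchesB);
        if ($resultA !== $resultB || $matchesA !== $matchesB) {
            return false;
        }
    }
    return true;
}

$subjects = ['', 'a', 'b', 'c', 'd', 'aa', 'f', 'foo', 'bar', 'xyz'];

var_dump(sameBehavior('/(a)\1/', '/(a)\1/', $subjects));                   // round-trip: identical
var_dump(sameBehavior('/(?:a|b|c)/', '/[abc]/', $subjects));               // valid optimization
var_dump(sameBehavior('/(?:foo|bar|baz)/', '/[foo|bar|baz]/', $subjects)); // not equivalent ('f' differs)
```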
- Check VALIDATION_REPORT.md for known issues
- Open an issue on GitHub
- Describe your pattern, expected behavior, and actual behavior
- Include code samples and error messages
Literal extraction provides significant performance improvements for patterns with fixed prefixes/suffixes:
| Pattern | Subject | Without Optimization | With Optimization | Speedup |
|---|---|---|---|---|
| `/user_\d+/` | "admin_123" | 1.2 μs | 0.1 μs | 12x faster |
| `/error: .*/` | "info: msg" | 2.5 μs | 0.2 μs | 12.5x faster |
| `/\d{3}-\d{2}-\d{4}/` | "abc-def-ghij" | 3.1 μs | 0.15 μs | 20x faster |
Benchmarks run on PHP 8.4 with OPcache enabled
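To get comparable numbers on your own hardware, a rough micro-benchmark can be written with hrtime(). This is a sketch, not the harness used for the table above: the helper averageMicros() is hypothetical, the iteration count is arbitrary, results vary with the machine and PCRE JIT settings, and the prefix is hardcoded rather than taken from extractLiterals().

```php
<?php

declare(strict_types=1);

/** Hypothetical helper: average time per call in microseconds. */
function averageMicros(callable $fn, int $iterations = 100_000): float
{
    $start = hrtime(true); // nanoseconds
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return (hrtime(true) - $start) / $iterations / 1_000;
}

$pattern = '/user_\d+/';
$prefix = 'user_'; // assumed to come from literal extraction
$subject = 'admin_123';

$regexOnly = static fn (): bool => preg_match($pattern, $subject) === 1;
$withPrefixCheck = static fn (): bool => str_contains($subject, $prefix) && preg_match($pattern, $subject) === 1;

// Both strategies must agree before their timings are worth comparing
if ($regexOnly() !== $withPrefixCheck()) {
    throw new LogicException('The pre-check changed the result');
}

printf("regex only:        %.3f μs\n", averageMicros($regexOnly));
printf("with prefix check: %.3f μs\n", averageMicros($withPrefixCheck));
```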
This project is licensed under the MIT License. See the LICENSE file for details.
Built with modern PHP practices, inspired by the need for better regex tooling and static analysis capabilities.
Key Technologies:
- PHP 8.4+ with strict types
- Visitor pattern for extensible AST traversal
- Comprehensive PHPUnit test suite
- Modern development tooling (PHPStan, Rector, PHP-CS-Fixer)
- VALIDATION_REPORT.md - Detailed validation findings and test results
- PCRE Specification - Official PCRE syntax reference
- ReDoS Explained - Understanding Regular Expression Denial of Service