Summary
Unicode property escapes (\p{L}, \p{N}, \P{L}, etc.) are not supported. Patterns using them are currently rejected or silently mis-matched.
Examples
- \p{L}: any Unicode letter
- \p{N}: any Unicode number
- \p{Lu}: uppercase letter
- \P{L}: negated, any non-letter
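For reference, java.util.regex already implements these four escapes, so it could serve as a behavioural oracle when testing the new support. This assumes Java's semantics are an acceptable target; PCRE and Java can differ on some property names and on the Unicode version they track. The class name PropertyEscapeOracle is made up for this sketch.

```java
import java.util.regex.Pattern;

public class PropertyEscapeOracle {
    // True if the entire input matches the regex under java.util.regex.
    static boolean matches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("\\p{L}", "\u00E9"));  // true: e with acute is Ll
        System.out.println(matches("\\p{L}", "\u0436"));  // true: Cyrillic small zhe is Ll
        System.out.println(matches("\\p{N}", "\u0663"));  // true: Arabic-Indic digit three is Nd
        System.out.println(matches("\\p{Lu}", "\u00C4")); // true: A with diaeresis is Lu
        System.out.println(matches("\\p{Lu}", "a"));      // false: lowercase
        System.out.println(matches("\\P{L}", "7"));       // true: a digit is not a letter
        System.out.println(matches("\\P{L}", "x"));       // false: x is a letter
    }
}
```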
Impact
- 66 PCRE tests are currently filtered out entirely because they use unsupported features, including Unicode properties
- Property escapes are common in real-world patterns for internationalized text
Implementation Notes
- Difficulty: Medium-High (large Unicode category tables required)
- Files: RegexParser.java (parse \p{...}), a new UnicodePropertyCharClass AST node, ThompsonBuilder.java, charset integration
- Unicode data can be derived from the JDK's Character class to avoid external dependencies
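A minimal sketch of the Character-based approach from the last note. The class name echoes the proposed UnicodePropertyCharClass node, but its fields and methods (parse, matches, toRanges) are hypothetical, and only the L and N general categories and their subcategories are covered. parse() takes the whole escape text rather than hooking into RegexParser, and toRanges() shows one possible way to turn the predicate into code point ranges for charset integration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; not the project's actual API. */
public final class UnicodePropertyCharClass {
    // Bit masks over Character.getType() values (Unicode general categories).
    private static final int LU = 1 << Character.UPPERCASE_LETTER;
    private static final int LL = 1 << Character.LOWERCASE_LETTER;
    private static final int LT = 1 << Character.TITLECASE_LETTER;
    private static final int LM = 1 << Character.MODIFIER_LETTER;
    private static final int LO = 1 << Character.OTHER_LETTER;
    private static final int ND = 1 << Character.DECIMAL_DIGIT_NUMBER;
    private static final int NL = 1 << Character.LETTER_NUMBER;
    private static final int NO = 1 << Character.OTHER_NUMBER;

    // Property name -> category mask (assumed subset: L*, N* only).
    private static final Map<String, Integer> CATEGORIES = Map.of(
            "L", LU | LL | LT | LM | LO,
            "Lu", LU, "Ll", LL, "Lt", LT, "Lm", LM, "Lo", LO,
            "N", ND | NL | NO,
            "Nd", ND, "Nl", NL, "No", NO);

    private final int mask;
    private final boolean negated;

    private UnicodePropertyCharClass(int mask, boolean negated) {
        this.mask = mask;
        this.negated = negated;
    }

    /** Parses a full escape such as "\\p{Lu}" or "\\P{L}". */
    static UnicodePropertyCharClass parse(String escape) {
        if (escape.length() < 5 || escape.charAt(0) != '\\'
                || (escape.charAt(1) != 'p' && escape.charAt(1) != 'P')
                || escape.charAt(2) != '{' || !escape.endsWith("}")) {
            throw new IllegalArgumentException("not a property escape: " + escape);
        }
        String name = escape.substring(3, escape.length() - 1);
        Integer m = CATEGORIES.get(name);
        if (m == null) {
            throw new IllegalArgumentException("unsupported property: " + name);
        }
        // Uppercase \P negates the class.
        return new UnicodePropertyCharClass(m, escape.charAt(1) == 'P');
    }

    /** True if the code point is in the class (after applying negation). */
    boolean matches(int codePoint) {
        boolean inCategory = ((1 << Character.getType(codePoint)) & mask) != 0;
        return inCategory != negated;
    }

    /** Collapses all matching code points into inclusive [lo, hi] ranges. */
    List<int[]> toRanges() {
        List<int[]> ranges = new ArrayList<>();
        int start = -1;
        for (int cp = 0; cp <= Character.MAX_CODE_POINT; cp++) {
            if (matches(cp)) {
                if (start < 0) start = cp;
            } else if (start >= 0) {
                ranges.add(new int[] {start, cp - 1});
                start = -1;
            }
        }
        if (start >= 0) ranges.add(new int[] {start, Character.MAX_CODE_POINT});
        return ranges;
    }
}
```

Scanning every code point per property keeps the sketch simple; a real implementation would probably cache the ranges per property name. Because the data comes from Character.getType, the Unicode version supported follows whichever JDK runs the engine.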