Optimize PipeTable parsing: O(n²) → O(n) for 3.7x–85x speedup, enables 10K+ row tables #922
Pipe tables were creating deeply nested tree structures where each pipe delimiter contained all subsequent content as children, causing O(n²) traversal complexity for n cells.

This change restructures the parser to use a flat sibling-based structure, treating tables as matrices rather than nested trees.

Key changes:

- Set IsClosed=true on PipeTableDelimiterInline to prevent nesting
- Add PromoteNestedPipesToRootLevel() to flatten pipes nested in emphasis
- Update cell boundary detection to use sibling traversal
- Move EmphasisInlineParser before PipeTableParser in processing order
- Fix EmphasisInlineParser to continue past IsClosed delimiters
- Add ContainsParentOrSiblingOfType<T>() helper for flat structure detection

Performance improvements (measured on typical markdown content):

| Rows  | Before     | After     | Speedup |
|-------|------------|-----------|---------|
| 100   | 542 μs     | 150 μs    | 3.6x    |
| 500   | 23,018 μs  | 763 μs    | 30x     |
| 1000  | 89,418 μs  | 1,596 μs  | 56x     |
| 1500  | 201,593 μs | 2,740 μs  | 74x     |
| 5000  | CRASH      | 10,588 μs | ∞       |
| 10000 | CRASH      | 18,551 μs | ∞       |

Tables with 5000+ rows previously crashed due to stack overflow from recursive depth. They now parse successfully with linear time complexity.
Thank you! Yep, this code was bad when I implemented it in the first place; I was not very inspired. 😅 I assume you have been using a coding agent. Would you mind sharing which one, and with which level of thinking?

Aye! Claude Code.

Anecdotally, this is the first time I had to get firm with it, stating "THERE IS A WAY". It kept giving up on emphasis inlined in table cells with pipes that are part of the emphasis. In the end the trick was ensuring it registers itself after the emphasis parser.

Hi @xoofx @MihaZupan, is it possible to throw a release out with this change? Eager to get this updated in my app :)
Yep, sorry, it is out in |
Includes xoofx/markdig#922

## Benchmark: Best-of-3 (16,949 files)

| Repository | Files | 0.44.0 | 0.45.0 | Delta |
|---|---|---|---|---|
| logstash-docs-md | 3,054 | 1.85 ms/file | 1.89 ms/file | +0.04 (~same) |
| integration-docs | 645 | 6.21 ms/file | 4.91 ms/file | **-1.30 (21% faster)** |
| docs-content | 5,058 | 0.56 ms/file | 0.56 ms/file | 0.00 (same) |
| detection-rules | 1,728 | 1.74 ms/file | 1.63 ms/file | **-0.11 (6% faster)** |
| beats | 1,813 | 1.31 ms/file | 1.26 ms/file | -0.05 (~same) |
| elasticsearch | 2,460 | 0.85 ms/file | 0.87 ms/file | +0.02 (~same) |
| **Total** | **16,949** | **26.75s (1.58 ms/file)** | **25.40s (1.50 ms/file)** | **-1.35s (5% faster)** |

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
This PR fundamentally rearchitects the `PipeTableParser` to use a flat sibling structure instead of a deeply nested tree, reducing time complexity from O(n²) to O(n) for large tables.

The Problem
How the Old Parser Worked
The original parser allowed pipe delimiters to nest content as children. For a simple table like:
The inline tree structure was deeply nested:
Depth = O(n) where n = number of cells
Why This Was Problematic
O(n²) Cell Boundary Detection: To find cell boundaries, the parser walked up the parent chain from each delimiter. With n delimiters nested n-deep, this required O(n²) operations.
Stack Overflow on Large Tables: .NET's default stack depth limit caused tables with 1000+ rows to crash with `DepthLimitExceededException`.
Quadratic Time Scaling: Parse time grew far faster than the row count, for example from 542 μs at 100 rows to 89,418 μs at 1000 rows.
Large Tables Simply Failed: 5000+ row tables couldn't be parsed at all.
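The quadratic cost in the first point can be checked with a small counting model. This is a Python sketch rather than Markdig code: `count_parent_hops` and the `parent` array are hypothetical, and it assumes that delimiter k sits at depth k and that every boundary lookup climbs all the way to the root.

```python
def count_parent_hops(num_delimiters: int) -> int:
    """Count parent-chain hops when every delimiter walks up to the root.

    Assumption for illustration: delimiter k is nested inside delimiter k - 1,
    so the chain is as deep as the number of delimiters.
    """
    # parent[k] is the index of the delimiter containing delimiter k; -1 is the root.
    parent = [k - 1 for k in range(num_delimiters)]
    hops = 0
    for k in range(num_delimiters):
        node = k
        while node != -1:  # climb from this delimiter to the root
            node = parent[node]
            hops += 1
    return hops


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, count_parent_hops(n))
```

Under these assumptions the total is n(n + 1) / 2 hops, so ten times as many delimiters costs roughly a hundred times as many hops.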
The Solution
Flat Sibling Structure
By setting `IsClosed = true` on `PipeTableDelimiterInline`, subsequent content becomes siblings rather than children:

Now produces a flat structure:
Depth = O(1) constant
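The effect of the closed flag on tree depth can be modelled in a few lines. The sketch below is Python, not the parser's C#, and every name in it (`Container`, `deepest_open_container`, `parse_row`, `max_depth`) is hypothetical. It assumes new inlines are attached to the deepest container that is still open, which is only a simplified stand-in for how the real inline processor behaves.

```python
class Container:
    """Hypothetical container inline; is_closed mimics the IsClosed flag."""

    def __init__(self, label, is_closed=False):
        self.label = label
        self.is_closed = is_closed
        self.children = []


def deepest_open_container(root):
    # Follow the last child downward while it is a container that still accepts children.
    current = root
    while (current.children
           and isinstance(current.children[-1], Container)
           and not current.children[-1].is_closed):
        current = current.children[-1]
    return current


def parse_row(cells, close_pipes):
    """Append each cell text followed by a pipe delimiter container."""
    root = Container("root")
    for cell in cells:
        deepest_open_container(root).children.append(cell)
        deepest_open_container(root).children.append(
            Container("|", is_closed=close_pipes))
    return root


def max_depth(root):
    # Iterative walk so the sketch does not depend on the recursion limit.
    deepest, stack = 0, [(root, 0)]
    while stack:
        node, depth = stack.pop()
        deepest = max(deepest, depth)
        if isinstance(node, Container):
            stack.extend((child, depth + 1) for child in node.children)
    return deepest


if __name__ == "__main__":
    cells = [f"c{i}" for i in range(50)]
    print("open pipes:  ", max_depth(parse_row(cells, close_pipes=False)))
    print("closed pipes:", max_depth(parse_row(cells, close_pipes=True)))
```

With open delimiters the depth in this model equals the number of cells; with closed delimiters every node stays a direct child of the root.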
Cell Boundary Detection
Finding cell content is now a simple sibling walk:
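A sibling walk of this kind might look like the following Python sketch. The `Inline` class, its `next_sibling` link, and the `link` and `split_cells` helpers are hypothetical stand-ins, not Markdig's types. The point is only that each node is visited once, so the work per row is linear in the number of inlines.

```python
from typing import List, Optional


class Inline:
    """Hypothetical inline node with a next-sibling link."""

    def __init__(self, text: str, is_pipe: bool = False):
        self.text = text
        self.is_pipe = is_pipe
        self.next_sibling: Optional["Inline"] = None


def link(nodes: List[Inline]) -> Optional[Inline]:
    """Chain the nodes as siblings and return the first one."""
    for left, right in zip(nodes, nodes[1:]):
        left.next_sibling = right
    return nodes[0] if nodes else None


def split_cells(first: Optional[Inline]) -> List[str]:
    """Walk the siblings once, cutting a new cell at every pipe delimiter."""
    cells, current = [], []
    node = first
    while node is not None:
        if node.is_pipe:
            cells.append("".join(current).strip())
            current = []
        else:
            current.append(node.text)
        node = node.next_sibling
    if current:  # content after the last pipe forms the final cell
        cells.append("".join(current).strip())
    return cells


if __name__ == "__main__":
    row = link([Inline("a "), Inline("|", True), Inline(" b"),
                Inline("*c*"), Inline("|", True), Inline(" d")])
    print(split_cells(row))
```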
Handling Nested Pipes
Pipes can still end up nested inside unmatched emphasis:
The `PromoteNestedPipesToRootLevel` method detects and promotes these:

Benchmarks
Baseline Results (Before)
❌ = Failed with depth limit exceeded
Current Results (After)
Performance Improvement
Memory Improvement
Scaling Verification (Linear)
Time per row is nearly constant, confirming O(n) complexity.
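As a quick arithmetic check, the per-row cost can be recomputed from the before/after timings quoted in the commit message. The Python sketch below uses only those figures; the `per_row` and `spread` helpers are hypothetical names introduced for the illustration.

```python
# Timings in microseconds, copied from the commit message table.
before_us = {100: 542, 500: 23_018, 1000: 89_418, 1500: 201_593}
after_us = {100: 150, 500: 763, 1000: 1_596, 1500: 2_740,
            5000: 10_588, 10000: 18_551}


def per_row(timings):
    """Microseconds spent per table row for each measured size."""
    return {rows: total / rows for rows, total in timings.items()}


def spread(timings):
    """Ratio between the most and least expensive per-row cost."""
    costs = per_row(timings).values()
    return max(costs) / min(costs)


if __name__ == "__main__":
    for rows, cost in per_row(after_us).items():
        print(f"{rows:>6} rows: {cost:.2f} us/row")
    print(f"after spread:  {spread(after_us):.2f}x")
    print(f"before spread: {spread(before_us):.2f}x")
```

On these figures the per-row cost after the change stays between about 1.5 and 2.1 μs across a 100x range of table sizes, while the old parser's per-row cost grows by more than 20x between 100 and 1500 rows.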
Breaking Changes
None. The output AST is identical; only the internal parsing strategy changed.
Test Results
All 3,595 existing tests pass.