Optimize PipeTable parsing: O(n²) → O(n) for 3.7x–85x speedup, enables 10K+ row tables by Mpdreamz · Pull Request #922 · xoofx/markdig

Mpdreamz · 2026-01-30T15:42:55Z

Summary

This PR fundamentally rearchitects the PipeTableParser to use a flat sibling structure instead of a deeply nested tree, reducing time complexity from O(n²) to O(n) for large tables.

The Problem

How the Old Parser Worked

The original parser allowed pipe delimiters to nest content as children. For a simple table like:

| a | b |
| c | d |

The inline tree structure was deeply nested:

PipeDelimiter [|]
└── "a"
    └── PipeDelimiter [|]
        └── "b"
            └── LineBreak [\n]
                └── PipeDelimiter [|]
                    └── "c"
                        └── PipeDelimiter [|]
                            └── "d"
                                └── LineBreak [\n]

Depth = O(n) where n = number of cells
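A small Python model of the old shape makes the depth claim concrete. It follows the tree drawn above literally: nothing is ever closed, so each new inline becomes a child of the inline added just before it. `Node`, `tokenize`, `parse_nested` and `depth` are hypothetical names for this sketch. They are not Markdig APIs.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for an inline node (not Markdig's Inline type)."""
    text: str
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)


def tokenize(rows: List[str]) -> List[str]:
    """Split rows like '| a | b |' into pipe, cell-text and newline tokens."""
    tokens: List[str] = []
    for row in rows:
        for i, part in enumerate(row.split("|")):
            if i > 0:
                tokens.append("|")
            if part.strip():
                tokens.append(part.strip())
        tokens.append("\n")
    return tokens


def parse_nested(tokens: List[str]) -> Node:
    """Old behaviour as drawn above: nothing is ever closed, so each new
    inline becomes a child of the inline added just before it."""
    root = Node("root")
    current = root
    for token in tokens:
        node = Node(token, parent=current)
        current.children.append(node)
        current = node
    return root


def depth(root: Node) -> int:
    """Maximum nesting depth below the root (iterative, so no recursion limit)."""
    deepest, stack = 0, [(root, 0)]
    while stack:
        node, level = stack.pop()
        deepest = max(deepest, level)
        stack.extend((child, level + 1) for child in node.children)
    return deepest


print(depth(parse_nested(tokenize(["| a | b |", "| c | d |"]))))  # 12
```

In this model every row adds a fixed number of levels, so a table with 100 times as many rows is 100 times as deep.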

Why This Was Problematic

O(n²) Cell Boundary Detection: To find cell boundaries, the parser walked up the parent chain from each delimiter. With the k-th delimiter sitting about k levels deep, n delimiters cost roughly 1 + 2 + ... + n = n(n+1)/2 steps, i.e. O(n²).
Stack Overflow on Large Tables: The nesting depth grew with table size until it exceeded .NET's default stack depth limit, crashing the parse with DepthLimitExceededException. In the benchmarks below, 1500-row tables still parsed, but 5000-row tables did not.
Quadratic Time Scaling (a purely quadratic cost would predict 25x, 4x and 2.25x for these steps):
- 100→500 rows (5x): 42x slower (not 5x)
- 500→1000 rows (2x): 3.9x slower (not 2x)
- 1000→1500 rows (1.5x): 2.3x slower (not 1.5x)
Large Tables Simply Failed: 5000+ row tables couldn't be parsed at all.
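The quadratic cost comes from summing the walk lengths. If delimiter k sits k levels deep and each boundary search walks its whole parent chain, n delimiters cost 1 + 2 + ... + n steps. The sketch below compares that idealized model with the old parser's mean times from the baseline benchmark table. It is a hypothetical worst-case model that counts pipes only (5 columns means 6 pipes per row), not a measurement of Markdig itself.

```python
def parent_walk_steps(n_delimiters: int) -> int:
    """Worst-case steps if delimiter k (1-based) sits k levels deep and the
    boundary search walks its whole parent chain: 1 + 2 + ... + n."""
    return n_delimiters * (n_delimiters + 1) // 2


def delimiters_in_table(rows: int, cols: int = 5) -> int:
    """A row like '| c1 | c2 | c3 | c4 | c5 |' has cols + 1 pipes."""
    return rows * (cols + 1)


# Mean times of the old parser, copied from the baseline benchmark table (µs).
measured_us = {100: 542.0, 500: 23_018.4, 1000: 89_418.0, 1500: 201_593.3}

sizes = sorted(measured_us)
for small, large in zip(sizes, sizes[1:]):
    model = (parent_walk_steps(delimiters_in_table(large))
             / parent_walk_steps(delimiters_in_table(small)))
    measured = measured_us[large] / measured_us[small]
    print(f"{small}->{large} rows: model {model:.2f}x, measured {measured:.1f}x")
```

The model's ratios land close to the squares of the size ratios (about 25x, 4x and 2.25x). The measured ratios track them from 500 rows upward.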

The Solution

Flat Sibling Structure

By setting IsClosed = true on PipeTableDelimiterInline, subsequent content becomes siblings rather than children:

| a | b |
| c | d |

Now produces a flat structure:

[|] ← [a] ← [|] ← [b] ← [|] ← [\n] ← [|] ← [c] ← [|] ← [d] ← [|] ← [\n]
 ↑────↑─────↑─────↑─────↑──────↑──────↑─────↑─────↑─────↑─────↑──────↑
                    All siblings at root level

Depth = O(1) constant
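The effect of closing the delimiter can be modelled with a simple insertion rule. A new inline is appended to the deepest still-open node reachable through last children. This is a minimal sketch with several simplifying assumptions: cell text is always a leaf, `is_closed` plays the role of `IsClosed`, and the names are hypothetical rather than Markdig's own. Toggling `delimiters_closed` switches between the two structures.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Inline:
    """Hypothetical inline node; not Markdig's Inline class."""
    text: str
    is_closed: bool
    children: List["Inline"] = field(default_factory=list)


def append_inline(root: Inline, node: Inline) -> None:
    """Append to the deepest still-open node reachable via last children."""
    container = root
    while container.children and not container.children[-1].is_closed:
        container = container.children[-1]
    container.children.append(node)


def build(tokens: List[str], delimiters_closed: bool) -> Inline:
    root = Inline("root", is_closed=False)
    for token in tokens:
        is_delimiter = token in ("|", "\n")
        # Text is a leaf; delimiters stay open containers unless marked closed.
        closed = delimiters_closed if is_delimiter else True
        append_inline(root, Inline(token, is_closed=closed))
    return root


def depth(root: Inline) -> int:
    deepest, stack = 0, [(root, 0)]
    while stack:
        node, level = stack.pop()
        deepest = max(deepest, level)
        stack.extend((child, level + 1) for child in node.children)
    return deepest


tokens = ["|", "a", "|", "b", "|", "\n", "|", "c", "|", "d", "|", "\n"]
print(depth(build(tokens, delimiters_closed=False)))  # 8: one level per delimiter
print(depth(build(tokens, delimiters_closed=True)))   # 1: all root-level siblings
```

With closed delimiters the descent loop stops immediately, so each append is O(1) and the whole table is built in O(n).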

Cell Boundary Detection

Finding cell content is now a simple sibling walk:

For cell "b" in `| a | b |`:

    [|]  [a]  [|]  [b]  [|]  [\n]
               ↑    ↑    ↑
             start  │   current delimiter
                   cell content
                   
Walk backward from the current [|] until reaching the previous [|] or [\n]; the inlines passed along the way are the cell's content.
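The backward walk can be sketched over a flat list of siblings. This is a hypothetical Python model that represents inlines as strings. It assumes every row starts and ends with a pipe, as in the example; rows without a leading pipe are not handled. `cell_before` and `table_cells` are illustrative names, not Markdig methods.

```python
from typing import List

PIPE, NEWLINE = "|", "\n"


def cell_before(siblings: List[str], delimiter_index: int) -> List[str]:
    """Walk backward from the pipe at `delimiter_index` and collect the
    inlines up to the previous pipe or line break (the cell's content)."""
    if siblings[delimiter_index] != PIPE:
        raise ValueError("expected a pipe delimiter")
    content: List[str] = []
    i = delimiter_index - 1
    while i >= 0 and siblings[i] not in (PIPE, NEWLINE):
        content.append(siblings[i])
        i -= 1
    content.reverse()  # the walk went right-to-left
    return content


def table_cells(siblings: List[str]) -> List[List[List[str]]]:
    """Group cells into rows. A pipe that starts a row opens the first cell;
    every other pipe closes the cell in front of it."""
    rows: List[List[List[str]]] = []
    row: List[List[str]] = []
    for i, node in enumerate(siblings):
        if node == NEWLINE:
            rows.append(row)
            row = []
        elif node == PIPE and i > 0 and siblings[i - 1] != NEWLINE:
            row.append(cell_before(siblings, i))
    return rows


flat = ["|", "a", "|", "b", "|", "\n", "|", "c", "|", "d", "|", "\n"]
print(cell_before(flat, 4))  # ['b']
print(table_cells(flat))     # [[['a'], ['b']], [['c'], ['d']]]
```

Each backward walk touches only the inlines of its own cell, so extracting every cell visits each sibling a constant number of times. That keeps the whole pass O(n).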

Handling Nested Pipes

Pipes can still end up nested inside unmatched emphasis:

*a | b*|

The PromoteNestedPipesToRootLevel method detects and promotes these:

Before: EmphasisDelimiter { "a", Pipe, "b" }
After:  EmphasisDelimiter { "a" } ← Pipe ← Container { "b" }
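Here is a hypothetical Python sketch of the promotion step. It assumes the emphasis delimiter's children form a flat list and that every pipe found there should be lifted to root level, with the content after each pipe wrapped in a new container. `promote_nested_pipes` is an illustrative stand-in for `PromoteNestedPipesToRootLevel`, not its actual implementation, and it only looks one level deep.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    kind: str  # "pipe", "emphasis", "container" or "text" in this sketch
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def promote_nested_pipes(root_children: List[Node]) -> List[Node]:
    """Return a new root-level sibling list in which pipes nested directly
    inside an emphasis delimiter are lifted to the root level."""
    result: List[Node] = []
    for node in root_children:
        if node.kind != "emphasis" or not any(c.kind == "pipe" for c in node.children):
            result.append(node)
            continue
        # Split the emphasis children at each nested pipe.
        segments: List[List[Node]] = [[]]
        pipes: List[Node] = []
        for child in node.children:
            if child.kind == "pipe":
                pipes.append(child)
                segments.append([])
            else:
                segments[-1].append(child)
        # The emphasis keeps the content before the first pipe...
        result.append(Node("emphasis", node.text, segments[0]))
        # ...and each promoted pipe is followed by a container for the rest.
        for pipe, segment in zip(pipes, segments[1:]):
            result.append(pipe)
            result.append(Node("container", children=segment))
    return result


root = [Node("emphasis", "*", [Node("text", "a"), Node("pipe", "|"), Node("text", "b")])]
flat = promote_nested_pipes(root)
print([n.kind for n in flat])              # ['emphasis', 'pipe', 'container']
print([c.text for c in flat[0].children])  # ['a']
print([c.text for c in flat[2].children])  # ['b']
```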

Benchmarks

Baseline Results (Before)

Method	Mean	Error	StdDev	Gen0	Gen1	Allocated
'PipeTable 100 rows x 5 cols'	542.0 µs	2.25 µs	1.88 µs	2.9297	0.9766	367.38 KB
'PipeTable 500 rows x 5 cols'	23,018.4 µs	150.30 µs	133.24 µs	-	-	1818.08 KB
'PipeTable 1000 rows x 5 cols'	89,418.0 µs	507.04 µs	474.28 µs	-	-	3702.70 KB
'PipeTable 1500 rows x 5 cols'	201,593.3 µs	2,133.24 µs	1,995.44 µs	-	-	5660.16 KB
'PipeTable 5000 rows x 5 cols'	❌	--	--	--	--	--
'PipeTable 10000 rows x 5 cols'	❌	--	--	--	--	--

❌ = Failed (depth limit exceeded)

Current Results (After)

Method	Mean	Error	StdDev	Gen0	Gen1	Gen2	Allocated
'PipeTable 100 rows x 5 cols'	147.2 µs	1.75 µs	1.46 µs	2.9297	0.7324	0.4883	360.54 KB
'PipeTable 500 rows x 5 cols'	743.3 µs	7.30 µs	6.10 µs	13.6719	5.8594	5.8594	1772.96 KB
'PipeTable 1000 rows x 5 cols'	1,530.0 µs	28.71 µs	29.48 µs	25.3906	11.7188	11.7188	3547.08 KB
'PipeTable 1500 rows x 5 cols'	2,360.1 µs	43.73 µs	117.48 µs	39.0625	19.5313	19.5313	5377.33 KB
'PipeTable 5000 rows x 5 cols'	8,044.9 µs	39.83 µs	33.26 µs	78.1250	46.8750	46.8750	18121.73 KB
'PipeTable 10000 rows x 5 cols'	16,383.8 µs	124.95 µs	116.88 µs	125.0000	93.7500	93.7500	36538.63 KB

Performance Improvement

Rows	Before	After	Speedup
100	542 µs	147 µs	3.7x
500	23,018 µs	743 µs	31x
1000	89,418 µs	1,530 µs	58x
1500	201,593 µs	2,360 µs	85x
5000	❌ crashed	8,045 µs	✅ works
10000	❌ crashed	16,384 µs	✅ works
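The Speedup column is the ratio of the two Mean columns. A quick Python check uses the unrounded means from the Before and After tables. The 5000- and 10000-row cases are left out because the old parser produced no time for them.

```python
# Mean times (µs) copied from the Before and After benchmark tables.
before_us = {100: 542.0, 500: 23_018.4, 1000: 89_418.0, 1500: 201_593.3}
after_us = {100: 147.2, 500: 743.3, 1000: 1_530.0, 1500: 2_360.1}

speedups = {rows: before_us[rows] / after_us[rows] for rows in before_us}
for rows, speedup in speedups.items():
    print(f"{rows:>5} rows: {speedup:.1f}x faster")
```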

Memory Improvement

Rows	Before	After	Reduction
100	367.38 KB	360.54 KB	1.9%
500	1818.08 KB	1772.96 KB	2.5%
1000	3702.70 KB	3547.08 KB	4.2%
1500	5660.16 KB	5377.33 KB	5.0%

Scaling Verification (Linear)

Rows	Time	Time/Row	Scaling
1000	1,530 µs	1.53 µs	-
5000 (5x)	8,045 µs	1.61 µs	✅ ~5x
10000 (10x)	16,384 µs	1.64 µs	✅ ~10x

Time per row stays nearly constant (1.53 to 1.64 µs), consistent with O(n) complexity.
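A complementary check assumes a simple power-law model, time ≈ c · rows^k, and fits k by least squares on the logarithms of the benchmark means. With only four and six data points this is a rough fit, not a proof of complexity. Still, the old parser comes out near k = 2 and the new one near k = 1.

```python
import math
from typing import Dict

# Mean times copied from the benchmark tables (µs), keyed by row count.
before_us = {100: 542.0, 500: 23_018.4, 1000: 89_418.0, 1500: 201_593.3}
after_us = {100: 147.2, 500: 743.3, 1000: 1_530.0, 1500: 2_360.1,
            5000: 8_044.9, 10000: 16_383.8}


def scaling_exponent(times_us: Dict[int, float]) -> float:
    """Least-squares slope k of log(time) against log(rows), i.e. time ~ rows**k."""
    xs = [math.log(rows) for rows in times_us]
    ys = [math.log(t) for t in times_us.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


print(f"before: k = {scaling_exponent(before_us):.2f}")  # roughly 2 (quadratic)
print(f"after:  k = {scaling_exponent(after_us):.2f}")   # roughly 1 (linear)
```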

Breaking Changes

None. The output AST is identical; only the internal parsing strategy changed.

Test Results

All 3,595 existing tests pass.

Pipe tables were creating deeply nested tree structures where each pipe delimiter contained all subsequent content as children, causing O(n²) traversal complexity for n cells. This change restructures the parser to use a flat sibling-based structure, treating tables as matrices rather than nested trees.

Key changes:
- Set IsClosed=true on PipeTableDelimiterInline to prevent nesting
- Add PromoteNestedPipesToRootLevel() to flatten pipes nested in emphasis
- Update cell boundary detection to use sibling traversal
- Move EmphasisInlineParser before PipeTableParser in processing order
- Fix EmphasisInlineParser to continue past IsClosed delimiters
- Add ContainsParentOrSiblingOfType<T>() helper for flat structure detection

Performance improvements (measured on typical markdown content):

Rows	Before	After	Speedup
100	542 µs	150 µs	3.6x
500	23,018 µs	763 µs	30x
1000	89,418 µs	1,596 µs	56x
1500	201,593 µs	2,740 µs	74x
5000	CRASH	10,588 µs	∞
10000	CRASH	18,551 µs	∞

Tables with 5000+ rows previously crashed due to stack overflow from recursive depth. They now parse successfully with linear time complexity.
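The commit message above does not show how `ContainsParentOrSiblingOfType<T>()` works, so the following only guesses at its intent. It is written in Python with hypothetical names. The idea is that once pipes are siblings instead of parents, a check that only walks up the parent chain no longer notices that an inline sits in a table row, so the sketch also inspects the other children at each level. The exact semantics are an assumption.

```python
from typing import List, Optional, Type


class Node:
    """Hypothetical inline node with a parent pointer and ordered children."""

    def __init__(self) -> None:
        self.parent: Optional["Node"] = None
        self.children: List["Node"] = []

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


class Text(Node):
    pass


class Pipe(Node):
    pass


class Emphasis(Node):
    pass


def has_parent_of_type(node: Node, kind: Type[Node]) -> bool:
    """Parent-only check: enough for the old nested tree."""
    current = node.parent
    while current is not None:
        if isinstance(current, kind):
            return True
        current = current.parent
    return False


def has_parent_or_sibling_of_type(node: Node, kind: Type[Node]) -> bool:
    """Assumed flat-structure check: at each level, look at the parent and
    at the other children of that parent."""
    current: Optional[Node] = node
    while current is not None and current.parent is not None:
        parent = current.parent
        if isinstance(parent, kind):
            return True
        if any(isinstance(s, kind) for s in parent.children if s is not current):
            return True
        current = parent
    return False


# Flat row "| *x* |": the pipes are siblings of the emphasis, not its parents.
row = Node()
row.add(Pipe())
emphasis = row.add(Emphasis())
text = emphasis.add(Text())
row.add(Pipe())

print(has_parent_of_type(text, Pipe))             # False
print(has_parent_or_sibling_of_type(text, Pipe))  # True
```

Scanning every sibling keeps the sketch short, but it costs O(siblings) per call. A real implementation would need to stay cheaper than that to keep table parsing linear.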

xoofx · 2026-01-30T16:42:54Z

Thank you! Yep, this code was bad when I implemented it in the first place, I was not very inspired. 😅

I assume you have been using a coding agent; would you mind sharing which one, and with which level of thinking?

Mpdreamz · 2026-01-30T17:54:11Z

Aye! Claude Code with Claude Opus 4.5.

Anecdotally, this is the first time I had to get firm with it, stating "THERE IS A WAY". It kept giving up on emphasis inside table cells where the pipes fall within the emphasis.

In the end, the trick was ensuring the pipe table parser registers itself after the emphasis parser.

Mpdreamz · 2026-02-10T06:04:36Z

Hi @xoofx @MihaZupan, is it possible to throw a release out with this change?

Eager to get this updated in my app :)

xoofx · 2026-02-10T06:23:03Z

> Hi @xoofx @MihaZupan, is it possible to throw a release out with this change?
>
> Eager to get this updated in my app :)

Yep, sorry, it is out in 0.45.0 ☺️

Includes xoofx/markdig#922

Benchmark: best of 3 (16,949 files)

Repository	Files	0.44.0	0.45.0	Delta
logstash-docs-md	3,054	1.85 ms/file	1.89 ms/file	+0.04 (~same)
integration-docs	645	6.21 ms/file	4.91 ms/file	-1.30 (21% faster)
docs-content	5,058	0.56 ms/file	0.56 ms/file	0.00 (same)
detection-rules	1,728	1.74 ms/file	1.63 ms/file	-0.11 (6% faster)
beats	1,813	1.31 ms/file	1.26 ms/file	-0.05 (~same)
elasticsearch	2,460	0.85 ms/file	0.87 ms/file	+0.02 (~same)
Total	16,949	26.75 s (1.58 ms/file)	25.40 s (1.50 ms/file)	-1.35 s (5% faster)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>


Mpdreamz added 2 commits on January 30, 2026 16:33:
- remove baseline results file (6c35d8a)
- Do not use System.Index and fix nullability checks for older platforms (cc7e38f)

xoofx added the enhancement label on Jan 30, 2026.

xoofx merged commit d47fbc7 into xoofx:master on Jan 30, 2026 (3 checks passed).

MihaZupan mentioned this pull request on Jan 30, 2026: Improve performance for large tables #180 (Closed)

Mpdreamz mentioned this pull request on Feb 11, 2026: Update Markdig 0.44.0 to 0.45.0 elastic/docs-builder#2686 (Merged)

xoofx mentioned this pull request on Feb 28, 2026: Empty pipe sequences incorrectly parsed as valid tables #927 (Closed)

xoofx merged 3 commits into xoofx:master from Mpdreamz:fix/large-table-grids
