Commit ba31be1
perf(pdf-server): lazy form extraction via range transport + incremental viewer scans (#639)
* perf(pdf-server): lazy form extraction via range transport + incremental viewer scans
Server: display_pdf now opens the document via PDFDataRangeTransport
(disableAutoFetch) and only runs the per-page form/annotation walk when
getFieldObjects() is non-empty, so form-free PDFs are probed with ~10-25%
of bytes instead of a full download. The unused viewFieldInfo Map is
removed.
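The server-side gating described above can be sketched roughly as follows. This is an illustration only: `ProbeDoc` is a structural stand-in for the few pdf.js document members used here, and `probeFormsSketch` and `walkPage` are hypothetical names, not the code in this commit.

```typescript
// Structural stand-in for the parts of a pdf.js document this sketch touches.
interface ProbeDoc {
  numPages: number;
  getFieldObjects(): Promise<Record<string, unknown[]> | null>;
}

// Walk pages only when the document reports at least one form field.
// `walkPage` is a hypothetical per-page extractor supplied by the caller.
async function probeFormsSketch<T>(
  doc: ProbeDoc,
  walkPage: (pageNumber: number) => Promise<T>,
): Promise<T[]> {
  const fields = await doc.getFieldObjects();
  if (!fields || Object.keys(fields).length === 0) {
    return []; // form-free PDF: no page objects are requested at all
  }
  const perPage: T[] = [];
  for (let p = 1; p <= doc.numPages; p++) {
    perPage.push(await walkPage(p));
  }
  return perPage;
}
```

With a range transport and auto-fetch disabled, skipping the page walk is what keeps the byte count low, since each page visit would otherwise pull that page's objects.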
Viewer: getDocument sets disableAutoFetch/disableStream; baseline
annotation scan and field-name mapping run lazily per rendered page
instead of walking every page after load, so first paint no longer
schedules a full-file pull.
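A minimal sketch of the lazy per-page scan idea, assuming a synchronous `scanPage` callback for brevity (the real annotation scan is asynchronous). `LazyPageScanner` and its members are hypothetical names used only for illustration.

```typescript
// Runs a per-page scan the first time a page is rendered and caches the result,
// instead of walking every page right after the document loads.
class LazyPageScanner<T> {
  private readonly results = new Map<number, T>();
  private readonly totalPages: number;
  private readonly scanPage: (pageNumber: number) => T;

  constructor(totalPages: number, scanPage: (pageNumber: number) => T) {
    this.totalPages = totalPages;
    this.scanPage = scanPage;
  }

  // Call from the render path; each page is scanned at most once.
  onPageRendered(pageNumber: number): T {
    if (this.results.has(pageNumber)) {
      return this.results.get(pageNumber) as T;
    }
    const result = this.scanPage(pageNumber);
    this.results.set(pageNumber, result);
    return result;
  }

  // True once every page has been visited, i.e. the baseline is complete.
  get complete(): boolean {
    return this.results.size >= this.totalPages;
  }
}
```

The `complete` flag matters to any consumer that needs the full baseline, because until it is true the cached results cover only the pages the user has actually seen.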
E2E: new range-counting HTTPS fixture (W-9 for forms, generated
text+image PDF for no-forms) with stallAfterBytes control, and four
regression tests asserting form fields are returned, <30% served on
no-forms display_pdf, first page renders while later ranges are stalled,
and overlap stays bounded.
* fix(pdf-server): handle separated field/widget trees in extractFormSchema
pdfjs getFieldObjects() returns the full field-tree array. For PDFs with
a separated structure (pdf-lib, some authoring tools) the typed widget
sits at fields[1+] behind a typeless container at fields[0]; the previous
code only inspected fields[0] and skipped them all. Pick the first entry
with a non-empty type instead.
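The selection rule can be illustrated with a small sketch. `FieldObjectLike` models only the two properties read here, and `pickTypedField` is a hypothetical helper name, not the function in this commit.

```typescript
// Minimal shape of one entry in the array pdf.js reports for a field name.
// Only the properties this sketch reads are modelled (an assumption).
interface FieldObjectLike {
  type?: string;
  name?: string;
}

// Return the first entry that carries a non-empty type, or undefined when
// every entry is a typeless container.
function pickTypedField(entries: FieldObjectLike[]): FieldObjectLike | undefined {
  return entries.find((e) => typeof e.type === "string" && e.type.length > 0);
}

// Separated structure: typeless container first, typed widget second.
const separated: FieldObjectLike[] = [{ name: "email" }, { name: "email", type: "text" }];
// Merged structure: the first entry is already typed.
const merged: FieldObjectLike[] = [{ name: "agree", type: "checkbox" }];
```

Inspecting only `separated[0]` would find no type and skip the field, which is the failure mode this fix describes.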
Makes the e2e forms.pdf fixture fully generated (no checked-in third-party
asset on the hot path); fw9.pdf stays as a unit-test fixture for the
hierarchical/XFA case.
* fix(pdf-server): address range-transport hangs and lazy-scan tombstone loss
PdfCacheRangeTransport:
- abort() is a no-op stub on PDFDataRangeTransport (it's the hook pdfjs
  calls *on* the transport, not an upstream error channel). Expose a
  .failed promise that rejects on the first fetch error and race every
  pdfjs await against it in display_pdf, so transient network errors
  surface into the existing catch instead of hanging the tool call.
- pdfjs coalesces adjacent missing chunks into one unbounded
  requestDataRange; readPdfRange clamps each call to MAX_CHUNK_BYTES.
  Loop and deliver in slices so every requested chunk is marked loaded.
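The "race every await against a failure promise" pattern from the first bullet might look like the sketch below. `FailureSignal` is a hypothetical stand-in for the transport's `.failed` promise, and the body of `orFail` is a guess at the general shape rather than the commit's code.

```typescript
// Tracks the first upstream failure and exposes it as a promise that can
// only ever reject. Hypothetical helper, not the commit's transport class.
class FailureSignal {
  readonly failed: Promise<never>;
  private rejectFailed!: (err: Error) => void;

  constructor() {
    this.failed = new Promise<never>((_, reject) => {
      this.rejectFailed = reject;
    });
    // Attach a handler so a failure is not reported as an unhandled
    // rejection when nothing happens to be racing against it.
    this.failed.catch(() => {});
  }

  // Record a fetch error; only the first call has any effect.
  fail(err: Error): void {
    this.rejectFailed(err);
  }
}

// Settle with `work`, unless the signal rejects first.
function orFail<T>(work: Promise<T>, signal: FailureSignal): Promise<T> {
  return Promise.race([work, signal.failed]);
}
```

A caller would wrap each awaited document operation in `orFail(...)` so that a fetch error rejects the await and reaches the surrounding `catch`, rather than leaving it pending forever.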
Viewer (mcp-app.ts):
- The lazy per-page baseline scan left pdfBaselineAnnotations incomplete,
  so persistAnnotations and getAnnotatedPdfBytes silently dropped
  restoredRemovedIds tombstones for unvisited pages. Union those ids into
  the computed diff and removedRefs.
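A simplified sketch of the tombstone union, assuming annotations are identified by string ids. `removedIdsToPersist` and its parameters are hypothetical names; the real diff carries more than a list of removed ids.

```typescript
// Compute which annotation ids should be persisted as removed.
// baselineIds: native annotation ids found on pages scanned so far.
// currentIds: ids still present in the viewer.
// restoredRemovedIds: tombstones loaded from a previous session.
function removedIdsToPersist(
  baselineIds: ReadonlySet<string>,
  currentIds: ReadonlySet<string>,
  restoredRemovedIds: ReadonlySet<string>,
): string[] {
  const removed = new Set<string>();
  // Removals observable from the (possibly partial) baseline.
  baselineIds.forEach((id) => {
    if (!currentIds.has(id)) removed.add(id);
  });
  // Tombstones for pages not scanned yet would otherwise be lost,
  // because their ids never made it into the baseline.
  restoredRemovedIds.forEach((id) => removed.add(id));
  return Array.from(removed).sort();
}
```

Without the second loop, a removal made on page 2 in an earlier session disappears as soon as the user saves while only page 1 has been scanned.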
Test fixture: release stalled handlers before resetStats/close so a
failing stalled test doesn't hang afterAll; fail fast if started with
NODE_ENV=production. NODE_TLS_REJECT_UNAUTHORIZED scope documented (full
per-process scoping needs a validateUrl localhost allow, tracked
separately).
* test(pdf-server): bridge coverage gaps and switch fixture to gated loopback HTTP
PdfCacheRangeTransport.deliver(): pdf.js's reader is keyed by the
original begin and removed after one delivery, so accumulate slices and
call onDataRange once with the full buffer (the previous multi-call
approach threw inside pdfjs). Covered by a new integration test that
drives getDocument()/getPage(1) on a >1MB PDF through a clamping
in-memory readPdfRange.
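The accumulate-then-deliver-once approach can be sketched as below. `RangeReader` is a hypothetical stand-in for a clamping reader such as readPdfRange, and `readFullRange` is an illustrative name; the single hand-off to pdf.js is left as a comment because the transport itself is not modelled here.

```typescript
// Hypothetical reader that may return fewer bytes than requested, for
// example because each call is clamped to a maximum chunk size.
type RangeReader = (begin: number, end: number) => Promise<Uint8Array>;

// Read [begin, end) in as many clamped slices as needed and return one buffer.
async function readFullRange(
  read: RangeReader,
  begin: number,
  end: number,
): Promise<Uint8Array> {
  const out = new Uint8Array(end - begin);
  let offset = begin;
  while (offset < end) {
    const slice = await read(offset, end);
    if (slice.length === 0) {
      throw new Error(`reader returned no data at offset ${offset}`);
    }
    // Never copy past the requested end, even if the reader over-delivers.
    const usable = slice.subarray(0, end - offset);
    out.set(usable, offset - begin);
    offset += usable.length;
  }
  return out;
}

// A transport would then hand the whole buffer to pdf.js in a single call,
// e.g. transport.onDataRange(begin, buffer), rather than once per slice.
```

Delivering once matches the constraint described above: the reader for a request is keyed by the original `begin` and is gone after the first delivery.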
validateUrl: allow http://127.0.0.1|localhost|[::1] only when
PDF_SERVER_ALLOW_LOOPBACK_HTTP is set, so a remote deploy can't be made
to probe its own ports. Covered by env-on/off unit tests.
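A rough sketch of the gating rule, under the assumption that https URLs are otherwise accepted. `isAllowedUrl` is a hypothetical name, and the flag is passed in as a boolean here; a caller would derive it from whether PDF_SERVER_ALLOW_LOOPBACK_HTTP is set.

```typescript
// Hostnames treated as loopback. The bracketed form is what the WHATWG URL
// parser reports for IPv6 literals; the bare form is included defensively.
const LOOPBACK_HOSTS = new Set(["127.0.0.1", "localhost", "[::1]", "::1"]);

// Allow https everywhere (an assumption of this sketch) and plain http only
// for loopback hosts, and only when the caller has opted in.
function isAllowedUrl(raw: string, allowLoopbackHttp: boolean): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return false; // not a parseable absolute URL
  }
  if (url.protocol === "https:") return true;
  if (url.protocol === "http:") {
    return allowLoopbackHttp && LOOPBACK_HOSTS.has(url.hostname);
  }
  return false;
}
```

Keeping the flag off by default means a remotely deployed server rejects `http://127.0.0.1:...` URLs, so it cannot be asked to fetch from its own local ports.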
Fixture switched to plain HTTP (no openssl, no
NODE_TLS_REJECT_UNAUTHORIZED). Adds /error.pdf (500s after 50KB) and two
e2e tests: page 2 renders after stall release (>512KB object path), and
display_pdf returns gracefully on mid-load 500. Existing <30% test now
samples stats before the iframe loads.
New unit tests: display_pdf returns (not hangs) on mid-load fetch
failure via in-memory MCP client; computeDiff/serializeDiff contract
tests pinning the restoredRemovedIds tombstone-preservation behaviour.
* test(pdf-server): e2e for restoredRemovedIds tombstone preservation through viewer
Adds /with-native-annot.pdf fixture (2 pages, native /Text annot on
page 2) and a Playwright test that drives delete on page 2 → iframe
reload → interact add_annotations on page 1 (persist with page 2
unscanned) → assert localStorage diff still contains the removed id →
page 2 still shows it gone. Covers the mcp-app.ts glue the unit tests
can't reach.
* refactor(pdf-server): simplify per round-2 review and fix false-pass test
server.ts: extract probeFormFields() so display_pdf calls one helper
instead of inlining 40 lines of transport/orFail plumbing; make
extractFormSchema's fieldObjects param required (the optional branch
only existed for a test helper).
server.test.ts: the >1MB integration test now forces the image XObject
fetch via getOperatorList() and asserts max requestDataRange span >
MAX_CHUNK_BYTES, so it can't go vacuous. Dedupe makeRandomJpeg by
importing from the fixture.
mcp-app.ts: restore per-annotation try/catch isolation in the lazy
baseline scan (a throw was skipping AnnotationLayer.render); only
carry forward restoredRemovedIds while the baseline scan is incomplete
so a stale id can't pin dirty=true once every page is scanned.
tests: bump fixture JPEG to 1.1MB; consolidate 7 e2e tests to 4 (merge
overlap into the byte-budget test, merge stall+page-2, drop the
error-e2e duplicate of the unit integration test); delete
run-fixture.mjs; drop /error.pdf and ERROR_AFTER_BYTES; rename
tombstone test's describe and clarify page-2 covers the viewer
transport.
* fix(pdf-server): preserve partial probeFormFields results when extractFormFieldInfo throws
* test(pdf-server): fix tombstone e2e — deleted native annotations become a cleared card, not removed from DOM
* test(pdf-server): mark tombstone e2e as fixme — needs basic-host iframe-reload replay support
* test(pdf-server): link tombstone fixme to tracking issue #642
1 parent 30a78b6 commit ba31be1
8 files changed
Lines changed: 1133 additions & 106 deletions
File tree
- examples/pdf-server
- src
- tests
- e2e
- helpers
- assets