Convert PDFs to clean Markdown, chunk into logical sections, and extract embedded tables to CSV.
- setup: provision the local extractor venv and verify system tools (pdftotext, ocrmypdf, tesseract).
- pdf-to-markdown: convert a single PDF to Markdown, picking marker / docling / pymupdf4llm based on layout complexity.
- ocr-scanned-pdf: run ocrmypdf to add a text layer to scanned/image PDFs. Auto-invoked when needed.
- chunk-markdown: split a long .md into logical chapters/sections with a TOON manifest.
- extract-tables: pull tables from a PDF (camelot/tabula) into CSV files with a TOON index.
- doc-to-everything: end-to-end orchestrator: PDF → Markdown → chunks → tables in a self-contained workspace.
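As a rough illustration of what chunk-markdown does, the sketch below splits a Markdown string on top-level headings and names each piece in the `NN-slug.md` style shown in the workspace layout. It is not the plugin's actual logic: `split_by_h1`, `slugify`, and the choice to split only on `# ` headings are assumptions made for this example, and it does not write a TOON manifest.

```python
import re
from typing import List, Tuple

# Assumption: a "logical section" starts at a top-level ATX heading ("# Title").
HEADING = re.compile(r"^#\s+(.*\S)\s*$")
FENCE = "`" * 3  # code-fence marker, built this way to keep the example easy to embed


def slugify(title: str) -> str:
    """Lowercase the title and collapse non-alphanumerics into single hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug or "section"


def split_by_h1(markdown: str) -> List[Tuple[str, str]]:
    """Return (filename, body) pairs, one per top-level heading.

    Text before the first heading becomes a "frontmatter" chunk.
    Lines inside fenced code blocks are never treated as headings.
    """
    sections: List[Tuple[str, List[str]]] = []
    title, lines = "frontmatter", []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        match = None if in_fence else HEADING.match(line)
        if match:
            # Close the previous section, skipping an all-blank frontmatter.
            if any(l.strip() for l in lines):
                sections.append((title, lines))
            title, lines = match.group(1), [line]
        else:
            lines.append(line)
    if any(l.strip() for l in lines):
        sections.append((title, lines))
    return [
        (f"{i:02d}-{slugify(t)}.md", "\n".join(body).strip() + "\n")
        for i, (t, body) in enumerate(sections)
    ]
```

Calling `split_by_h1(open("full.md").read())` would give a list of filename and body pairs that could then be written under `chunks/`. A real chunker would likely also consider heading depth and section length.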
Running doc-to-everything on book.pdf produces:
book/
  source.pdf
  full.md
  assets/
  chunks/
    index.toon
    00-frontmatter.md
    01-introduction.md
    ...
  tables/
    index.toon
    01-p12-revenue.csv
    ...
  manifest.toon
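To consume a workspace from a script, something like the following sketch could list the chunk and table files and report any top-level artifacts that are absent. It relies only on the file names shown in the layout above; `summarize_workspace` is a hypothetical helper, not part of the plugin, and it does not attempt to parse the `.toon` files.

```python
from pathlib import Path
from typing import Dict, List

# Files the layout above shows at fixed locations inside the workspace.
EXPECTED = [
    "source.pdf",
    "full.md",
    "manifest.toon",
    "chunks/index.toon",
    "tables/index.toon",
]


def _names(directory: Path, pattern: str) -> List[str]:
    """Sorted file names matching pattern, or an empty list if the directory is absent."""
    if not directory.is_dir():
        return []
    return sorted(p.name for p in directory.glob(pattern))


def summarize_workspace(root: Path) -> Dict[str, List[str]]:
    """Summarize a workspace directory produced for one PDF."""
    return {
        "chunks": _names(root / "chunks", "*.md"),
        "tables": _names(root / "tables", "*.csv"),
        "missing": [rel for rel in EXPECTED if not (root / rel).exists()],
    }
```

For example, `summarize_workspace(Path("book"))` returns the chunk names in numeric-prefix order, which matches reading order as long as the two-digit prefixes are kept.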
claude plugins install document-to-markdown@danielrosehill
System: pdftotext (poppler-utils), ocrmypdf, tesseract-ocr.
Python (managed via uv venv under $CLAUDE_USER_DATA/document-to-markdown/venv/): marker-pdf, docling, pymupdf4llm, camelot-py[cv], tabula-py, pandas. Run the setup skill on first use.
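Before running the setup skill, it can help to confirm the system tools are on `PATH`. The sketch below does that with the standard library's `shutil.which`; it is an independent check, not the setup skill's own code, and `find_tools` and `missing_tools` are names chosen for this example. It assumes the executables are called `pdftotext`, `ocrmypdf`, and `tesseract`, as listed for the setup skill.

```python
import shutil
from typing import Dict, Iterable, List, Optional

# Executable names as listed for the setup skill (assumed to match the
# binaries installed by poppler-utils, ocrmypdf, and tesseract-ocr).
REQUIRED_TOOLS = ("pdftotext", "ocrmypdf", "tesseract")


def find_tools(names: Iterable[str] = REQUIRED_TOOLS) -> Dict[str, Optional[str]]:
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


def missing_tools(names: Iterable[str] = REQUIRED_TOOLS) -> List[str]:
    """Return the tool names that could not be found on PATH."""
    return [name for name, path in find_tools(names).items() if path is None]


if __name__ == "__main__":
    absent = missing_tools()
    if absent:
        print("Missing system tools:", ", ".join(absent))
    else:
        print("All system tools found.")
```

This only checks that the executables exist; it does not verify versions or that tesseract language data is installed.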