Document-To-Markdown

Convert PDFs to clean Markdown, chunk into logical sections, and extract embedded tables to CSV.

Skills

setup — provision the local extractor venv and verify system tools (pdftotext, ocrmypdf, tesseract).
pdf-to-markdown — convert a single PDF to Markdown, picking marker / docling / pymupdf4llm based on layout complexity.
ocr-scanned-pdf — run ocrmypdf to add a text layer to scanned/image PDFs. Auto-invoked when needed.
chunk-markdown — split a long .md into logical chapters/sections with a TOON manifest.
extract-tables — pull tables from a PDF (camelot/tabula) into CSV files with a TOON index.
doc-to-everything — end-to-end orchestrator: PDF → Markdown → chunks → tables in a self-contained workspace.
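
The README does not say how ocr-scanned-pdf decides that OCR is "needed". As a minimal sketch of one plausible check (an assumption, not the plugin's documented logic), the text layer can be probed with pdftotext and OCR run only when almost nothing comes back. The function names, the `min_chars` threshold, and the `.ocr.pdf` output name are all hypothetical.

```python
import subprocess
from pathlib import Path


def looks_scanned(extracted_text: str, min_chars: int = 50) -> bool:
    """Return True when the extracted text is too short to be a real text layer.

    min_chars is an arbitrary illustrative threshold, not a value from the plugin.
    """
    return len(extracted_text.strip()) < min_chars


def extract_text(pdf: Path) -> str:
    """Return the PDF's existing text layer using the pdftotext CLI."""
    # "pdftotext <file> -" writes the extracted text to stdout.
    result = subprocess.run(
        ["pdftotext", str(pdf), "-"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def ensure_text_layer(pdf: Path) -> Path:
    """Return a PDF with a text layer, running ocrmypdf only if it seems needed."""
    if not looks_scanned(extract_text(pdf)):
        return pdf
    ocr_pdf = pdf.with_name(pdf.stem + ".ocr.pdf")  # hypothetical output name
    subprocess.run(["ocrmypdf", str(pdf), str(ocr_pdf)], check=True)
    return ocr_pdf
```

Keeping the threshold test in its own pure function makes it easy to tune without having pdftotext or ocrmypdf installed.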

Output layout

Running doc-to-everything on book.pdf produces:

book/
  source.pdf
  full.md
  assets/
  chunks/
    index.toon
    00-frontmatter.md
    01-introduction.md
    ...
  tables/
    index.toon
    01-p12-revenue.csv
    ...
  manifest.toon
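
Because chunk and table files carry numeric prefixes, a workspace like the one above can be inspected with a plain directory listing. The following is a small sketch that assumes only the folder names shown in the tree; `summarize_workspace` is a hypothetical helper, and it deliberately does not parse the .toon files, whose schema is not described here.

```python
from pathlib import Path


def summarize_workspace(root: Path) -> dict[str, list[str]]:
    """List chunk and table files in a doc-to-everything workspace.

    Sorting by file name relies on the zero-padded numeric prefixes
    (00-, 01-, ...) to give reading order.
    """
    chunks = sorted(p.name for p in (root / "chunks").glob("*.md"))
    tables = sorted(p.name for p in (root / "tables").glob("*.csv"))
    return {"chunks": chunks, "tables": tables}


if __name__ == "__main__":
    summary = summarize_workspace(Path("book"))
    for kind, names in summary.items():
        print(f"{kind}: {len(names)} file(s)")
        for name in names:
            print(f"  {name}")
```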

Installation

claude plugins install document-to-markdown@danielrosehill

Dependencies

System: pdftotext (poppler-utils), ocrmypdf, tesseract-ocr.

Python (managed via a uv venv under $CLAUDE_USER_DATA/document-to-markdown/venv/): marker-pdf, docling, pymupdf4llm, camelot-py[cv], tabula-py, pandas.

Run the setup skill on first use.
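
For a quick manual check of the dependencies listed above, something like the following can be run inside the venv. It is only a sketch of what a verification step might look like, not the setup skill's actual implementation. `missing_dependencies` is a hypothetical helper, and the import names (for example `marker` for marker-pdf, `camelot` for camelot-py, `tabula` for tabula-py) are the modules those packages are generally imported as.

```python
import shutil
from importlib.util import find_spec

# Executables the README lists as system dependencies.
SYSTEM_TOOLS = ["pdftotext", "ocrmypdf", "tesseract"]

# Import names assumed for the listed Python packages.
PYTHON_MODULES = ["marker", "docling", "pymupdf4llm", "camelot", "tabula", "pandas"]


def missing_dependencies(tools=SYSTEM_TOOLS, modules=PYTHON_MODULES):
    """Return (missing_tools, missing_modules) for the current environment."""
    # shutil.which returns None when the executable is not on PATH.
    missing_tools = [t for t in tools if shutil.which(t) is None]
    # find_spec returns None when a top-level module cannot be located.
    missing_modules = [m for m in modules if find_spec(m) is None]
    return missing_tools, missing_modules


if __name__ == "__main__":
    tools, modules = missing_dependencies()
    print("missing system tools:", ", ".join(tools) or "none")
    print("missing Python modules:", ", ".join(modules) or "none")
```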
