[Upload] Robust, resumable, multi-worker upload_folder#4200

Draft
Wauplin wants to merge 2 commits into main from upload-folder-v2
Wauplin commented May 6, 2026

Try it out

pip install "huggingface_hub @ git+https://github.com/huggingface/huggingface_hub.git@upload-folder-v2"

Then use upload_folder / hf upload as usual; the new pipeline kicks in automatically.
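For the Python side, a minimal sketch of the same call through the public API might look like this. The wrapper name `upload_tmpdata` and the repo id in the comment are placeholders for illustration, not part of this PR; `HfApi.upload_folder` is the existing entry point, and running it needs network access and a token.

```python
def upload_tmpdata(repo_id: str, folder_path: str = "tmpdata/"):
    """Upload a local folder with the regular public API (hypothetical wrapper)."""
    # Imported lazily so this snippet can be loaded even where
    # huggingface_hub is not installed.
    from huggingface_hub import HfApi

    api = HfApi()
    # Same call as before this PR; no new argument is assumed here.
    return api.upload_folder(repo_id=repo_id, folder_path=folder_path)


# Example call (placeholder repo id):
# upload_tmpdata("your-username/test-upload-v2")
```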

✗ hf upload test-upload-v2 tmpdata/
Found 11100 files to upload
Reading cached metadata: 100%|█████████████████████████████| 11100/11100 [00:03<00:00, 3583.66it/s]
Pipeline: 0 to hash, 10200 to check mode, 117 to preupload, 658 to commit, 125 already done
  Preparing   ██████████░░░░░░░░░░  5,900 / 11,100
  Uploading   ░░░░░░░░░░░░░░░░░░░░  2 / 603 files  317MB · 3.68MB/s
  Committing  ███░░░░░░░░░░░░░░░░░  1,925 / 11,100  8 commits
Screencast.from.06-05-2026.18.35.15.webm

Summary

  • New _upload_folder_v2 module with a 4-stage pipeline (hash → mode → preupload → commit) that replaces the single-commit upload_folder when hf_xet is available.
  • Resumable: per-file metadata is persisted after each step. On restart, already-completed work is skipped.
  • Multi-commit: large folders are committed in adaptive batches (20→1000 files) instead of one giant commit.
  • Parallel: hashing and xet uploads run in a ThreadPoolExecutor.
  • Live 3-line progress display (Preparing, Uploading, Committing) when attached to a TTY.
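To illustrate the "resumable" and "parallel" points above, here is a minimal sketch of a hash stage that persists per-file metadata and skips completed work on restart. It is not the code from `_upload_folder_v2`: the JSON state file, the function names, and the use of SHA-256 are all assumptions made for the example.

```python
import hashlib
import json
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def load_state(state_file: Path) -> dict:
    """Return the persisted per-file metadata, or an empty dict on first run."""
    if state_file.exists():
        return json.loads(state_file.read_text())
    return {}


def save_state(state_file: Path, state: dict) -> None:
    """Write to a temp file, then rename, so an interrupt cannot truncate the state."""
    tmp = state_file.with_suffix(".tmp")
    tmp.write_text(json.dumps(state))
    tmp.replace(state_file)


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks to keep memory use flat."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def run_hash_stage(folder: Path, state_file: Path, max_workers: int = 4) -> int:
    """Hash files not yet recorded in the state file; return how many were hashed.

    The state file must live outside `folder`, otherwise it would be picked up too.
    """
    state = load_state(state_file)
    files = sorted(p for p in folder.rglob("*") if p.is_file())
    pending = [p for p in files if p.relative_to(folder).as_posix() not in state]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so zip pairs them correctly.
        for path, digest in zip(pending, pool.map(sha256_of, pending)):
            key = path.relative_to(folder).as_posix()
            state[key] = {"sha256": digest, "stage": "hashed"}
            save_state(state_file, state)  # persist after each file
    return len(pending)
```

Rewriting the whole state file after every file keeps the sketch short but scales poorly; a real implementation would more likely use one metadata file per uploaded file or an append-only log.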

Benefits over the legacy single-commit path:

  • Handles very large folders (10k+ files, multi-GB) that would time out or run out of memory (OOM) with a single commit.
  • Progress is never lost: you can interrupt and resume at any point.
  • Xet uploads go through a single aggregated progress bar instead of noisy per-file bars.
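The adaptive batching mentioned in the summary (20→1000 files) could be sketched as follows. Only the two bounds come from this PR; the doubling-on-success and halving-on-failure rules, and the `commit_in_batches` helper itself, are assumptions chosen for illustration.

```python
from typing import Callable, List, Sequence

MIN_BATCH = 20    # lower bound quoted in the PR summary
MAX_BATCH = 1000  # upper bound quoted in the PR summary


def commit_in_batches(
    files: Sequence[str],
    commit: Callable[[Sequence[str]], None],
) -> List[int]:
    """Commit files in adaptive batches; return the size of each successful batch."""
    sizes: List[int] = []
    batch_size = MIN_BATCH
    start = 0
    while start < len(files):
        batch = files[start:start + batch_size]
        try:
            commit(batch)
        except Exception:
            if batch_size == MIN_BATCH:
                raise  # already at the floor: surface the error to the caller
            batch_size = max(MIN_BATCH, batch_size // 2)  # assumed back-off rule
            continue  # retry the same files with a smaller batch
        sizes.append(len(batch))
        start += len(batch)
        batch_size = min(MAX_BATCH, batch_size * 2)  # assumed growth rule
    return sizes
```

With these assumed rules and a `commit` that always succeeds, 11,100 files would go out as batches of 20, 40, 80, 160, 320, 640, then nine batches of 1000 and a final batch of 840.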

Drawbacks / rough edges:

  • Requires hf_xet; without it, upload_folder falls back to the legacy path.
  • Multi-commit means the repo history shows multiple commits instead of one atomic commit (inherent to the approach).
  • When uploading to a new PR, the PR is created upfront and the commits are then pushed to it.
  • This is a first pass: expect rough edges on error recovery, edge cases with .gitignore, and non-TTY output.

When hf_xet is not installed, upload_folder behaves exactly as before (single-commit path).
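As a rough illustration of this fallback, the choice between the two paths can be modeled as a check on whether `hf_xet` is importable. The helper below and its return labels are hypothetical; the actual detection logic inside huggingface_hub may differ.

```python
import importlib.util


def pick_upload_path(module_name: str = "hf_xet") -> str:
    """Return which path would be taken, based only on whether the module is importable."""
    # find_spec returns None for a top-level module that cannot be found.
    if importlib.util.find_spec(module_name) is not None:
        return "v2-pipeline"
    return "legacy-single-commit"
```

Calling `pick_upload_path()` in a given environment is a quick way to tell which behavior to expect before starting a large upload.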


Wauplin and others added 2 commits May 6, 2026 18:30
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
bot-ci-comment Bot commented May 6, 2026

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.