[Download] Harden resume validation #4143
anujbharambe wants to merge 1 commit into huggingface:main
This commit hardens the download resumption logic to prevent data corruption and improve reliability:
- Implements symmetric validation for ETag and expected_size in resume metadata.
- Fixes unreachable 416 (Range Not Satisfiable) handler by checking status before raising.
- Ensures etag/Range parameters are correctly forwarded during connection-error retries.
- Fixes Content-Range validation to account for initial Range offsets and enforces strict validation.
- Always persists resume metadata sidecars from the first download attempt.
- Ensures metadata sidecars are cleared when force_download=True, even if the incomplete file is missing.
- Removes redundant ETag formatting logic as normalization is handled upstream.
- Removes error-prone sentinel values ('none') in favor of unambiguous empty strings.
- Adds comprehensive tests for Range-based resumes and metadata integrity.
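
The Content-Range check listed above can be illustrated with a small sketch. This is not the code from this PR: the helper name `content_range_matches`, its signature, and the choice to reject a missing header are assumptions made for illustration.

```python
import re
from typing import Optional

# Hypothetical helper for illustration; not part of huggingface_hub.
_CONTENT_RANGE_RE = re.compile(r"^bytes (\d+)-(\d+)/(\d+|\*)$")


def content_range_matches(
    header: Optional[str],
    resume_from: int,
    expected_size: Optional[int] = None,
) -> bool:
    """Return True only if a 206 response starts exactly at `resume_from`."""
    if header is None:
        # Strict mode (assumed): a 206 without Content-Range is rejected.
        return False
    match = _CONTENT_RANGE_RE.match(header.strip())
    if match is None:
        return False
    start, end, total = match.groups()
    # The first byte sent must be the first byte we are missing.
    if int(start) != resume_from or int(end) < int(start):
        return False
    # If the server reports a total size, it must agree with what we expect.
    if total != "*" and expected_size is not None and int(total) != expected_size:
        return False
    return True
```

If the check fails, appending the body to the partial file would risk corruption, so the safe fallback is to discard the partial file and download from byte 0.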
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 4 total unresolved issues (including 3 from previous reviews).
Reviewed by Cursor Bugbot for commit 299fda4.
Hey @kripper @Wauplin - this PR is ready for review. All Cursor Bugbot comments have been addressed and resolved. The remaining open Bugbot findings are low-severity style observations (dead code branch, duplicate regex), not correctness issues. Summary of what this does: Closes #4060

Summary
- […] .incomplete files to detect mismatches and restart safely.
- […] If-Range and Content-Range before appending data.
- […] http_get behavior.

Details
This change hardens resume behavior for large downloads by:
- […] Content-Range.

This addresses unreliable resume cases and prevents silent corruption due to appending the wrong byte ranges.
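
For readers unfamiliar with conditional range requests, here is a minimal sketch of how a resume request and its response handling might look. The function names `build_resume_headers` and `resume_action` and the action labels are hypothetical and not taken from this PR. The sketch assumes the ETag is already in the quoted, strong form the server expects.

```python
from typing import Dict, Optional


def build_resume_headers(resume_size: int, etag: Optional[str]) -> Dict[str, str]:
    """Build request headers for resuming at byte `resume_size` (hypothetical helper)."""
    headers: Dict[str, str] = {}
    if resume_size > 0:
        headers["Range"] = f"bytes={resume_size}-"
        if etag:
            # If-Range asks the server to honor Range only if the file is
            # unchanged; otherwise it replies 200 with the full body.
            headers["If-Range"] = etag
    return headers


def resume_action(status_code: int, resume_size: int) -> str:
    """Map a response status to what should happen to the partial file."""
    if status_code == 416:
        # Range Not Satisfiable: the partial file is unusable, start over.
        return "restart"
    if status_code == 206 and resume_size > 0:
        # Partial content: append, after validating Content-Range.
        return "append"
    if status_code == 200:
        # Full body returned: write from byte 0, discarding partial data.
        return "overwrite"
    return "error"
```

Treating a `200` on a resume attempt as "overwrite" rather than "append" is the key point: the body then contains the whole file, not the missing tail.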
Reviewers
@kripper
Closes #4060
Note
Medium Risk
Changes core download/resume behavior by introducing new validation paths (ETag/Content-Range checks) and restart fallbacks, which could affect large-file downloads and retry semantics if edge cases are mishandled.
Overview
Hardens resumable downloads to prevent silent corruption.
http_get now supports an optional etag, sends it via If-Range on resume, handles 416 Range Not Satisfiable by restarting cleanly, and validates 206 responses by checking Content-Range matches the requested resume position (otherwise restart from scratch).

Persists and validates resume state for .incomplete files. _download_to_tmp_and_move now writes a sidecar .metadata file (etag/expected size/normalized URL) and deletes/restarts when metadata or size is inconsistent, cleaning up the metadata file on completion or when force_download=True. Tests add coverage for metadata mismatch resets and update http_get retry mocks to include status_code/Content-Range behavior.

Reviewed by Cursor Bugbot for commit 299fda4. Bugbot is set up for automated code reviews on this repo.
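
The sidecar idea can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: the JSON format, the field names, and the helpers `metadata_path`, `write_resume_metadata`, `can_resume`, and `reset_resume_state` are all hypothetical.

```python
import json
from pathlib import Path
from typing import Optional


def metadata_path(incomplete_path: Path) -> Path:
    # Assumed naming: "<file>.incomplete" -> "<file>.incomplete.metadata".
    return incomplete_path.with_name(incomplete_path.name + ".metadata")


def _payload(etag: Optional[str], expected_size: Optional[int], url: str) -> dict:
    # Empty string instead of a sentinel such as "none" for a missing ETag.
    return {"etag": etag or "", "expected_size": expected_size, "url": url}


def write_resume_metadata(
    incomplete_path: Path, etag: Optional[str], expected_size: Optional[int], url: str
) -> None:
    metadata_path(incomplete_path).write_text(json.dumps(_payload(etag, expected_size, url)))


def can_resume(
    incomplete_path: Path, etag: Optional[str], expected_size: Optional[int], url: str
) -> bool:
    """Resume only if the partial file and its sidecar both match the request."""
    meta_file = metadata_path(incomplete_path)
    if not incomplete_path.exists() or not meta_file.exists():
        return False
    try:
        stored = json.loads(meta_file.read_text())
    except (OSError, ValueError):
        return False  # unreadable or corrupt sidecar: do not trust the partial file
    if stored != _payload(etag, expected_size, url):
        return False
    # A partial file larger than the expected size cannot be valid.
    if expected_size is not None and incomplete_path.stat().st_size > expected_size:
        return False
    return True


def reset_resume_state(incomplete_path: Path) -> None:
    """Delete the partial file and its sidecar, tolerating either being absent."""
    for path in (incomplete_path, metadata_path(incomplete_path)):
        path.unlink(missing_ok=True)  # requires Python 3.8+
```

A caller would run `can_resume(...)` before sending a Range request and call `reset_resume_state(...)` on any mismatch or when a forced download is requested, so a stale sidecar never outlives its partial file.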