from_pretrained orchestration + distributed save/load #45409
Merged
3outeille merged 12 commits into moe-sequence-parallel on Apr 14, 2026
Conversation
- Add gather_full_state_dict() for DTensor→full tensor saving (see the sketch below)
- Add convert_strided_to_shard() / restore_strided_from_shard() for DCP
- Add _redistribute_dtensor() helper
- Full distributed_config integration in from_pretrained/save_pretrained
- Rename apply_fsdp2 → apply_fully_shard_data_parallel
- save_optimizer() / load_optimizer() in distributed/utils
- Trainer integration with distributed_config
- Updated FSDP and TP tests for new orchestration API
- DTensor shard-on-read test updates
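The DTensor→full tensor step is easy to illustrate in isolation. Below is a minimal sketch of what a gather_full_state_dict()-style helper can look like, written against PyTorch's public torch.distributed.tensor API; the function name, the rank-0-only retention, and the CPU offload are assumptions taken from the description above, not code copied from this PR.

```python
import torch
import torch.distributed as dist
from torch.distributed.tensor import DTensor  # public module since PyTorch 2.4


def gather_full_state_dict_sketch(model: torch.nn.Module) -> dict:
    """Materialize DTensor-sharded parameters as full tensors, kept on rank 0 only."""
    full_sd = {}
    for name, value in model.state_dict().items():
        if isinstance(value, DTensor):
            # full_tensor() is a collective: every rank must call it, even
            # though only rank 0 keeps the result below.
            value = value.full_tensor()
        if dist.get_rank() == 0:
            # Offload to CPU so the gathered copies do not pile up on GPU.
            full_sd[name] = value.detach().cpu()
    return full_sd
```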
3outeille commented on Apr 14, 2026
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Force-pushed e04c7d9 → 24ca327
Force-pushed 815b5b2 → 7361deb
Force-pushed 24ca327 → 7f297e0
# Conflicts:
#   src/transformers/distributed/utils.py
Force-pushed 7361deb → 1ecc329
3outeille added a commit that referenced this pull request on Apr 14, 2026
* MoE expert parallelism + sequence parallelism
  - Add PackedColwiseParallel for fused gate_up_proj weights
  - Add MoEExpertsParallel with per-expert DTensor sharding
  - Add PrepareModuleInputOutput for SP allgather/split hooks
  - Add _AllReduceBackward for MoE routing weight gradients
  - Extend TPStyle with moe_experts, packed_colwise, activation, module kinds
  - _StridedShard handling in core_model_loading for interleaved weights
  - MoE model configs: mixtral, deepseek_v3, qwen3 with SP plans
  - DTensor rotary_pos_emb guard for mixtral
* Fix ruff linting and formatting
* Fix ruff formatting in core_model_loading.py
* Restore _IdentityOp accidentally removed in 25a1f48
  The _IdentityOp class (added by PR #44983) was accidentally deleted during the MoE expert parallelism work. It is needed by finegrained_fp8.py and metal_quantization.py as a pass-through reverse_op for dequantize operations.
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Backport new TP/FSDP API + fix DTensor imports in Copied-from models
* from_pretrained orchestration + distributed save/load (#45409)
  * from_pretrained orchestration + save/load
    - Add gather_full_state_dict() for DTensor→full tensor saving
    - Add convert_strided_to_shard() / restore_strided_from_shard() for DCP
    - Add _redistribute_dtensor() helper
    - Full distributed_config integration in from_pretrained/save_pretrained
    - Rename apply_fsdp2 → apply_fully_shard_data_parallel
    - save_optimizer() / load_optimizer() in distributed/utils
    - Trainer integration with distributed_config
    - Updated FSDP and TP tests for new orchestration API
    - DTensor shard-on-read test updates
  * revert distributed utils
  * eaaea
  * all tests for core modeling are passing
  * populate import from init for tp
  * ruff
  * ruff
  ---------
  Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
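One piece of the commit above that is easy to show in standalone form is the apply_fsdp2 → apply_fully_shard_data_parallel rename, which tracks PyTorch's composable fully_shard (FSDP2) API. The sketch below shows the usual shape of such a wrapper; the helper name, the model.model.layers path, and the per-block sharding policy are assumptions for illustration, not this PR's implementation.

```python
import torch.nn as nn
from torch.distributed.device_mesh import DeviceMesh
from torch.distributed.fsdp import fully_shard  # public import path since PyTorch 2.6


def apply_fully_shard_data_parallel_sketch(model: nn.Module, mesh: DeviceMesh) -> nn.Module:
    # Shard each decoder block individually so its parameters are all-gathered
    # just in time for forward/backward and resharded afterwards.
    for layer in model.model.layers:  # assumed HF-style decoder layout
        fully_shard(layer, mesh=mesh)
    # A final call covers the remaining top-level parameters (embeddings, lm_head, ...).
    fully_shard(model, mesh=mesh)
    return model
```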
Contributor
View the CircleCI Test Summary for this PR: https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=45409&sha=39bea2
Summary
- distributed_config integration in from_pretrained(): mesh creation, apply TP + FSDP, attach model.device_mesh
- gather_full_state_dict() for streaming DTensor→full tensor saving (rank 0 only)
- convert_strided_to_shard() / restore_strided_from_shard() for DCP compatibility with _StridedShard
- save_optimizer() / load_optimizer() in distributed/utils.py
- apply_fsdp2 → apply_fully_shard_data_parallel
- Trainer integration with distributed_config

Part of the distributed training API chain: #44989
Chain: main ← #44989 ← #44083 ← #44974 ← #45028 ← #45408 ← this PR
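As a usage-level illustration of the orchestration described in the summary: the distributed_config kwarg and model.device_mesh come from this PR's summary, while the DistributedConfig import path and its constructor arguments are assumptions and may differ from the merged API.

```python
from transformers import AutoModelForCausalLM
from transformers.distributed import DistributedConfig  # import path assumed

# Constructor arguments are placeholders; per the summary, the config drives mesh
# creation, TP + FSDP application, and attaching model.device_mesh.
dist_config = DistributedConfig()

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mixtral-8x7B-v0.1",
    distributed_config=dist_config,  # kwarg described in this PR
)
print(model.device_mesh)  # attached by the from_pretrained orchestration

# save_pretrained routes through gather_full_state_dict(), so DTensor shards are
# materialized as full tensors (rank 0 only) before the checkpoint is written.
model.save_pretrained("./checkpoint")
```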
Review question
Does from_pretrained wire things up in the right order? Is the save/load round-trip correct?
Test plan
- from_pretrained with distributed_config
- gather_full_state_dict() roundtrip verification
- save_optimizer() / load_optimizer() roundtrip (see the sketch below)
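For the optimizer round-trip item above, here is a minimal sketch of how save_optimizer()/load_optimizer()-style helpers are typically built on torch.distributed.checkpoint (DCP). The helper names are this PR's; the bodies below are illustrative, not its implementation.

```python
import torch.distributed.checkpoint as dcp
from torch.distributed.checkpoint.state_dict import (
    get_optimizer_state_dict,
    set_optimizer_state_dict,
)


def save_optimizer_sketch(model, optimizer, path):
    # DCP-compatible optimizer state: sharded wherever the model is sharded.
    osd = get_optimizer_state_dict(model, optimizer)
    dcp.save({"optimizer": osd}, checkpoint_id=path)


def load_optimizer_sketch(model, optimizer, path):
    # Load shards in place into a template state dict, then push it back
    # into the live optimizer.
    osd = get_optimizer_state_dict(model, optimizer)
    dcp.load({"optimizer": osd}, checkpoint_id=path)
    set_optimizer_state_dict(model, optimizer, optim_state_dict=osd)
```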