
Add Qwen3.5 model support (27B dense and 35B-A3B MoE)#1641

Merged
zhuzilin merged 1 commit into main from feature/qwen3_5 on Feb 28, 2026
Conversation


zhuzilin (Contributor) commented on Feb 28, 2026

Note: you need to manually upgrade transformers to 0.5.2 for Qwen3.5 support.

- New model plugin: slime_plugins/models/qwen3_5.py
  - Qwen3_5GatedDeltaNet with separate QKV/Z projections, conv1d, and a flat QKV split
  - get_qwen3_5_spec, which swaps standard attention for linear attention per layer_types (see the dispatch sketch after this list)
- New weight bridge: slime_plugins/mbridge/qwen3_5.py
  - Handles the VLM weight prefix (model.language_model.layers)
  - Fused expert weight format for MoE (3D tensors -> per-expert slices; see the unpacking sketch below)
  - MTP layer support with the individual expert format
- New HF converter: slime/backends/megatron_utils/megatron_to_hf/qwen3_5.py
  - TEGroupedMLP per-expert weight{i} -> HF fused expert format (see the export sketch below)
  - Proper gate/up split for swiglu experts
- Fix sglang_rollout.py: skip the processor for text-only VLM models (see the fallback sketch below)
- Model configs and run scripts for both 27B and 35B-A3B
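
For readers unfamiliar with the hybrid layout, here is a minimal sketch of the per-layer dispatch idea behind get_qwen3_5_spec: each decoder layer gets either a Gated DeltaNet (linear attention) block or a standard softmax-attention block, chosen from the HF config's layer_types list. The builder functions and dict "specs" below are stand-ins for illustration only, not the actual slime_plugins API.

```python
def build_standard_attention_layer(hidden_size: int) -> dict:
    # Stand-in for a softmax-attention layer spec (hypothetical, not Megatron's ModuleSpec).
    return {"attention": "softmax", "hidden_size": hidden_size}


def build_linear_attention_layer(hidden_size: int) -> dict:
    # Stand-in for a Gated DeltaNet layer: separate QKV/Z projections plus a conv1d.
    return {"attention": "gated_delta_net", "hidden_size": hidden_size}


def get_layer_specs(layer_types: list[str], hidden_size: int) -> list[dict]:
    """One spec per decoder layer, chosen from the config's layer_types."""
    return [
        build_linear_attention_layer(hidden_size)
        if layer_type == "linear_attention"
        else build_standard_attention_layer(hidden_size)
        for layer_type in layer_types
    ]


# Example: a hypothetical 4-layer config that mixes both attention kinds.
specs = get_layer_specs(
    ["linear_attention", "linear_attention", "full_attention", "linear_attention"],
    hidden_size=4096,
)
```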
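A minimal sketch of the fused-expert handling on the bridge side: a checkpoint that stores MoE expert weights as one 3D tensor is sliced into per-expert 2D weights, and the VLM-style model.language_model.layers prefix is mapped back to the plain decoder prefix. Shapes and the exact key mapping are assumptions, not the real qwen3_5 bridge code.

```python
import torch


def split_fused_experts(fused: torch.Tensor) -> list[torch.Tensor]:
    """[num_experts, out_features, in_features] -> list of per-expert [out, in] weights."""
    return [fused[i].contiguous() for i in range(fused.shape[0])]


def strip_vlm_prefix(name: str) -> str:
    # The VLM-style checkpoint prefixes decoder weights with
    # "model.language_model.layers"; map them to the plain "model.layers" form.
    return name.replace("model.language_model.layers", "model.layers")


fused = torch.randn(8, 1024, 2048)        # 8 experts stored as one fused tensor
per_expert = split_fused_experts(fused)   # 8 tensors, each [1024, 2048]
print(strip_vlm_prefix("model.language_model.layers.0.self_attn.q_proj.weight"))
```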
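And a sketch of the reverse, export-to-HF direction: per-expert weight{i} tensors from a grouped MLP are stacked back into one fused 3D tensor, and the swiglu fc1 weight is split into its gate and up halves. Again, shapes and stacking order are illustrative assumptions rather than the converter's exact layout.

```python
import torch


def stack_experts(per_expert: list[torch.Tensor]) -> torch.Tensor:
    """List of per-expert [out, in] weights -> fused [num_experts, out, in] tensor."""
    return torch.stack(per_expert, dim=0)


def split_gate_up(fc1_weight: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    """A swiglu fc1 weight holds gate and up stacked along the output dim; split them."""
    gate, up = torch.chunk(fc1_weight, 2, dim=0)
    return gate, up


experts = [torch.randn(2048, 1024) for _ in range(8)]   # hypothetical weight{i} tensors
fused = stack_experts(experts)                           # [8, 2048, 1024]
gate, up = split_gate_up(fused[0])                       # each [1024, 1024]
```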
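The sglang_rollout.py fix amounts to a fallback of this shape: if the checkpoint is a text-only variant of a VLM architecture, skip the processor and load only the tokenizer. The helper name and the is_text_only flag are hypothetical; only the skip-the-processor behavior comes from this PR.

```python
from transformers import AutoProcessor, AutoTokenizer


def load_tokenizer_or_processor(model_path: str, is_text_only: bool):
    # Hypothetical helper: only the "skip the processor for text-only models"
    # branch reflects the fix described above.
    if is_text_only:
        # Text-only checkpoint of a VLM architecture: there is no image
        # processor to load, so fall back to the plain tokenizer.
        return AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
    return AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
```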

Tested: both models were verified end-to-end with training (the parallel layouts are summarized in the sketch below).

- 27B: TP=1 SGLang (8 engines), TP=2/PP=2/CP=2 Megatron, logprob_diff=0.017
- 35B-A3B: TP=2 SGLang (4 engines), EP=8 Megatron, logprob_diff=0.012
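
For reference, the GPU arithmetic implied by the verification runs above, as a small hedged sketch (the real run scripts set these values through Megatron/SGLang launch flags, which are not reproduced here):

```python
# Parallel layouts from the verification runs (values taken from the PR text).
dense_27b_megatron = {"tp": 2, "pp": 2, "cp": 2}   # 2 * 2 * 2 = 8 GPUs per replica
moe_35b_a3b_megatron = {"ep": 8}                   # experts sharded across 8 ranks


def gpus_per_replica(layout: dict) -> int:
    """Product of all parallelism degrees in a layout."""
    n = 1
    for degree in layout.values():
        n *= degree
    return n


assert gpus_per_replica(dense_27b_megatron) == 8
assert gpus_per_replica(moe_35b_a3b_megatron) == 8
```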


Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@zhuzilin zhuzilin merged commit 55828f3 into main Feb 28, 2026
2 checks passed
@zhuzilin zhuzilin deleted the feature/qwen3_5 branch February 28, 2026 06:36