
Rework Megatron to Support Adding New Models #674

Open

FurtherAI wants to merge 200 commits into main from austin/megatron_models

Conversation

@FurtherAI
Collaborator

New/General Model Support

This PR turns the Megatron/vLLM/Qwen model-support work into a production path rather than a pile of model-specific patches.

Motivation / Outcome

  • Decouple ART from vLLM dependencies. ART no longer installs or imports vllm in the main environment. vLLM now lives in a separate art-vllm-runtime package/venv with its own lockfile, so ART can pin its own Torch/Flash/TE stack independently.
  • Keep dedicated and shared-GPU serving working after separation. ART launches the external vLLM runtime, uses stock OpenAI/vLLM endpoints where possible, and keeps only small ART runtime routes for sleep/wake and served-model-name control. Shared-GPU mode still uses vLLM sleep mode.
  • Support efficient trainer-to-vLLM weight sync without importing vLLM. The trainer side now has an ART-owned NCCL weight-transfer subset and merged-weight export path, so Megatron can stream merged weights to the isolated runtime without depending on vLLM internals (see the weight-push sketch after this list).
  • Introduce a real Megatron model-support framework. Model-specific behavior moved behind registry/handler boundaries: target modules, dense vs MoE topology, dependency floors, native vLLM LoRA status, provider patching, LoRA wrapping/export, and architecture discovery now live in one extensible system (see the registry sketch after this list).
  • Add validated Qwen model families. Registers Qwen3 dense/MoE and Qwen3.5/Qwen3.6 dense/MoE support, including Qwen3.5/3.6 text-only Megatron runtime, GatedDeltaNet layers, packed mRoPE position handling, dense-vs-MoE topology selection, and explicit MTP disablement for ART training.
  • Fold most of PR Transformers 5 and Qwen 3.5/3.6 official support #667 into the new design. This includes Qwen3.5/Qwen3.6 support, chat-template compatibility, the official vLLM upgrade direction, and native vLLM LoRA serving, but implemented through the handler/registry/runtime-isolation architecture instead of scattered patches. Some features, such as chat-template kwargs, are not included.
  • Make LoRA disk format and serving paths explicit. Canonical Megatron LoRA checkpoints on disk are vLLM-compatible; Megatron loads through handler codecs, vLLM can serve native LoRA for validated handlers, and merged serving remains available for models that need it.
  • Ensure GDN packed training correctness and performance. Adds shared-prefix GDN execution support, uses Megatron/TE modules for linear/norm/LoRA behavior, preserves sequence-parallel semantics, and adds compile workarounds only through model-handler policy.
  • Harden process lifecycle. Adds managed child-process cleanup and backend/service teardown so vLLM and Megatron subprocesses do not survive parent death or failed runs (see the cleanup sketch after this list).
  • Add packaging and release support for the split runtime. The package build now creates and bundles the art-vllm-runtime wheel plus its pyproject.toml, uv.lock, and manifest into the root openpipe-art wheel/sdist without adding vllm to the root package metadata. Release/package workflows use the new build script and validate that published ART artifacts contain the runtime bundle while keeping vLLM install-time resolution isolated to the managed runtime environment. Packaging/release has been tested locally as far as possible, but still needs validation against the real GitHub release process and an actual installation.
  • Reorganize Megatron code and tests. Megatron runtime, training, weights, model support, GDN, kernels, and runtime-isolation code are grouped into clearer submodules. New integration tests live under tests/integration/megatron/....
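
A minimal sketch of the trainer-side weight push described above, assuming a torch.distributed NCCL group that bridges trainer rank 0 and the vLLM runtime process. `init_weight_sync_group` and `push_merged_weights` are illustrative names, not the shipped ART API.

```python
from typing import Iterable

import torch
import torch.distributed as dist


def init_weight_sync_group(master_addr: str, master_port: int, rank: int, world_size: int) -> None:
    """Join a small NCCL group shared by the trainer and the isolated vLLM runtime."""
    dist.init_process_group(
        backend="nccl",
        init_method=f"tcp://{master_addr}:{master_port}",
        rank=rank,
        world_size=world_size,
    )


def push_merged_weights(merged_named_weights: Iterable[tuple[str, torch.Tensor]]) -> None:
    """Stream merged (LoRA-folded) weights tensor-by-tensor to the runtime.

    The receiving side performs matching broadcasts and loads each tensor
    into the serving engine, so the trainer never imports vLLM.
    """
    for name, tensor in merged_named_weights:
        # Tensor metadata (name/shape/dtype) would travel over a control
        # route such as HTTP; only the payload goes over NCCL.
        dist.broadcast(tensor.contiguous().cuda(), src=0)
```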
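
A minimal sketch of the handler/registry boundary, assuming a flat mapping from architecture names to per-model policy objects; the field names here are assumptions for illustration, not the actual handler interface.

```python
from dataclasses import dataclass, field

_HANDLERS: dict[str, "ModelHandler"] = {}


@dataclass(frozen=True)
class ModelHandler:
    architecture: str                  # HF architecture name used for discovery
    lora_target_modules: list[str]     # modules LoRA adapters attach to
    is_moe: bool = False               # dense vs MoE topology selection
    native_vllm_lora: bool = False     # whether vLLM can serve this LoRA natively
    dependency_floors: dict[str, str] = field(default_factory=dict)


def register(handler: ModelHandler) -> ModelHandler:
    """Add a model family to the support registry."""
    _HANDLERS[handler.architecture] = handler
    return handler


def resolve(architecture: str) -> ModelHandler:
    """Look up the handler during architecture discovery."""
    if architecture not in _HANDLERS:
        raise ValueError(f"No Megatron model support registered for {architecture!r}")
    return _HANDLERS[architecture]


# Example registration (the target-module list is illustrative):
register(ModelHandler(
    architecture="Qwen3MoeForCausalLM",
    lora_target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    is_moe=True,
    native_vllm_lora=True,
))
```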
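
A sketch of the managed child-process cleanup, assuming POSIX sessions so a whole subprocess tree can be signalled at once; `spawn_managed` is a hypothetical helper, not the ART one.

```python
import atexit
import os
import signal
import subprocess

_children: list[subprocess.Popen] = []


def spawn_managed(cmd: list[str]) -> subprocess.Popen:
    """Start a child in its own session so its process group can be killed as a unit."""
    proc = subprocess.Popen(cmd, start_new_session=True)
    _children.append(proc)
    return proc


@atexit.register
def _reap_children() -> None:
    """Terminate surviving vLLM/Megatron subprocess trees when the parent exits."""
    for proc in _children:
        if proc.poll() is None:  # still running
            try:
                # pgid == pid after start_new_session, so this signals the tree
                os.killpg(proc.pid, signal.SIGTERM)
            except ProcessLookupError:
                pass  # already gone
```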

Validation

  • Added a model-support workflow with stages for dependency resolution, architecture discovery, HF parity, LoRA coverage, merged vLLM serving, correctness/sensitivity, chat-template rollout, packed position ids, native vLLM LoRA, and yes/no trainability. These stages are the evidence that a new model has been implemented correctly.
  • Runtime-isolation tests verify that importing ART does not require vLLM or trigger vLLM import side effects, and that the runtime project/env boundary holds (see the test sketch after this list).
  • Representative workflow artifacts cover Qwen3 MoE, Qwen3 dense, Qwen3.5/3.6 MoE, and Qwen3.5/3.6 dense paths, including trainability gates.
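
A sketch of the import-isolation check, assuming pytest; the real tests live under tests/integration/megatron/... and may be structured differently.

```python
import subprocess
import sys


def test_art_import_does_not_pull_in_vllm():
    """Importing art in a fresh interpreter must not load (or require) vllm."""
    code = "import sys, art; assert 'vllm' not in sys.modules"
    subprocess.run([sys.executable, "-c", code], check=True)
```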

Future Validation

  • The main gaps are training benchmarks and measurements of train-inference mismatch, which would confirm that all trained weights are actually used and compute the same results in vLLM (i.e., are sliced into the right places).
