
Commit 0c51734

Merge branch 'main' into add-neuron-backend

2 parents 3367409 + 5d207e7

77 files changed: 9114 additions & 806 deletions


`.ai/AGENTS.md`: 5 additions & 49 deletions

````diff
@@ -24,54 +24,10 @@ Strive to write code as simple and explicit as possible.
 
 ### Models
 - All layer calls should be visible directly in `forward` — avoid helper functions that hide `nn.Module` calls.
-- Try to not introduce graph breaks as much as possible for better compatibility with `torch.compile`. For example, DO NOT arbitrarily insert operations from NumPy in the forward implementations.
-- Attention must follow the diffusers pattern: both the `Attention` class and its processor are defined in the model file. The processor's `__call__` handles the actual compute and must use `dispatch_attention_fn` rather than calling `F.scaled_dot_product_attention` directly. The attention class inherits `AttentionModuleMixin` and declares `_default_processor_cls` and `_available_processors`.
+- Avoid graph breaks for `torch.compile` compatibility — do not insert NumPy operations in forward implementations or any other patterns that can break `torch.compile` with `fullgraph=True`.
+- See the **model-integration** skill for the attention pattern, pipeline rules, test setup instructions, and other important details.
 
-```python
-# transformer_mymodel.py
+## Skills
 
-class MyModelAttnProcessor:
-    _attention_backend = None
-    _parallel_config = None
-
-    def __call__(self, attn, hidden_states, attention_mask=None, ...):
-        query = attn.to_q(hidden_states)
-        key = attn.to_k(hidden_states)
-        value = attn.to_v(hidden_states)
-        # reshape, apply rope, etc.
-        hidden_states = dispatch_attention_fn(
-            query, key, value,
-            attn_mask=attention_mask,
-            backend=self._attention_backend,
-            parallel_config=self._parallel_config,
-        )
-        hidden_states = hidden_states.flatten(2, 3)
-        return attn.to_out[0](hidden_states)
-
-
-class MyModelAttention(nn.Module, AttentionModuleMixin):
-    _default_processor_cls = MyModelAttnProcessor
-    _available_processors = [MyModelAttnProcessor]
-
-    def __init__(self, query_dim, heads=8, dim_head=64, ...):
-        super().__init__()
-        self.to_q = nn.Linear(query_dim, heads * dim_head, bias=False)
-        self.to_k = nn.Linear(query_dim, heads * dim_head, bias=False)
-        self.to_v = nn.Linear(query_dim, heads * dim_head, bias=False)
-        self.to_out = nn.ModuleList([nn.Linear(heads * dim_head, query_dim), nn.Dropout(0.0)])
-        self.set_processor(MyModelAttnProcessor())
-
-    def forward(self, hidden_states, attention_mask=None, **kwargs):
-        return self.processor(self, hidden_states, attention_mask, **kwargs)
-```
-
-Consult the implementations in `src/diffusers/models/transformers/` if you need further references.
-
-### Pipeline
-- All pipelines must inherit from `DiffusionPipeline`. Consult implementations in `src/diffusers/pipelines` in case you need references.
-- DO NOT use an existing pipeline class (e.g., `FluxPipeline`) to override another pipeline (e.g., `FluxImg2ImgPipeline`) which will be a part of the core codebase (`src`).
-
-### Tests
-- Slow tests gated with `@slow` and `RUN_SLOW=1`
-- All model-level tests must use the `BaseModelTesterConfig`, `ModelTesterMixin`, `MemoryTesterMixin`, `AttentionTesterMixin`, `LoraTesterMixin`, and `TrainingTesterMixin` classes initially to write the tests. Any additional tests should be added after discussions with the maintainers. Use `tests/models/transformers/test_models_transformer_flux.py` as a reference.
+Task-specific guides live in `.ai/skills/` and are loaded on demand by AI agents.
+Available skills: **model-integration** (adding/converting pipelines), **parity-testing** (debugging numerical parity).
````
New file (skill: `integrating-models`): 167 additions & 0 deletions
---
name: integrating-models
description: >
  Use when adding a new model or pipeline to diffusers, setting up file
  structure for a new model, converting a pipeline to modular format, or
  converting weights for a new version of an already-supported model.
---

## Goal

Integrate a new model into diffusers end-to-end. The overall flow:

1. **Gather info** — ask the user for the reference repo, setup guide, a runnable inference script, and other objectives such as standard vs modular.
2. **Confirm the plan** — once you have everything, tell the user exactly what you'll do, e.g. "I'll integrate model X with pipeline Y into diffusers based on your script. I'll run parity tests (model-level and pipeline-level) using the `parity-testing` skill to verify numerical correctness against the reference."
3. **Implement** — write the diffusers code (model, pipeline, scheduler if needed), convert weights, register in `__init__.py`.
4. **Parity test** — use the `parity-testing` skill to verify component and e2e parity against the reference implementation.
5. **Deliver a unit test** — provide a self-contained test script that runs the diffusers implementation, checks numerical output (`np.allclose`), and saves an image/video for visual verification (a sketch follows below). This is what the user runs to confirm everything works.

Work one workflow at a time — get it to full parity before moving on.
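
As a reference for step 5, here is a minimal sketch of such a delivery script. The model id, prompt, and expected slice are placeholders to be filled in from the converted checkpoint and a trusted reference run:

```python
# Minimal sketch of the step-5 delivery script; "org/my-model", the prompt,
# and `expected` are placeholders, not real values.
import numpy as np
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("org/my-model", torch_dtype=torch.bfloat16).to("cuda")
image = pipe(
    "a photo of an astronaut riding a horse",
    num_inference_steps=30,
    generator=torch.Generator("cuda").manual_seed(0),  # fixed seed for reproducibility
).images[0]
image.save("check.png")  # for visual verification

# numerical check against a slice captured from the reference implementation
actual = np.asarray(image, dtype=np.float32)[:2, :2, 0] / 255.0
expected = np.array([[0.4321, 0.4387], [0.4402, 0.4399]])  # placeholder values
assert np.allclose(actual, expected, atol=1e-2), f"parity drift: {actual}"
print("parity check passed")
```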

## Setup — gather before starting

Before writing any code, gather info in this order:

1. **Reference repo** — ask for the GitHub link. If they've already set it up locally, ask for the path. Otherwise, ask what setup steps are needed (install deps, download checkpoints, set env vars, etc.) and run through them before proceeding.
2. **Inference script** — ask for a runnable end-to-end script for a basic workflow first (e.g. T2V). Then ask what other workflows they want to support (I2V, V2V, etc.) and agree on the full implementation order together.
3. **Standard vs modular** — standard pipelines, modular, or both?

Use `AskUserQuestion` with structured choices for step 3 when the options are known.

## Standard Pipeline Integration

### File structure for a new model

```
src/diffusers/
  models/transformers/transformer_<model>.py  # The core model
  schedulers/scheduling_<model>.py            # If model needs a custom scheduler
  pipelines/<model>/
    __init__.py
    pipeline_<model>.py                       # Main pipeline
    pipeline_<model>_<variant>.py             # Variant pipelines (e.g. pyramid, distilled)
    pipeline_output.py                        # Output dataclass
  loaders/lora_pipeline.py                    # LoRA mixin (add to existing file)

tests/
  models/transformers/test_models_transformer_<model>.py
  pipelines/<model>/test_<model>.py
  lora/test_lora_layers_<model>.py

docs/source/en/api/
  pipelines/<model>.md
  models/<model>_transformer3d.md             # or appropriate name
```

### Integration checklist

- [ ] Implement transformer model with `from_pretrained` support
- [ ] Implement or reuse scheduler
- [ ] Implement pipeline(s) with `__call__` method
- [ ] Add LoRA support if applicable
- [ ] Register all classes in `__init__.py` files (lazy imports; see the sketch after this list)
- [ ] Write unit tests (model, pipeline, LoRA)
- [ ] Write docs
- [ ] Run `make style` and `make quality`
- [ ] Test parity with reference implementation (see `parity-testing` skill)
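
For the lazy-imports item above, a minimal sketch of a new pipeline package's `__init__.py`, following the `_LazyModule` pattern used across diffusers (module and class names are placeholders):

```python
# src/diffusers/pipelines/<model>/__init__.py, minimal sketch; module and
# class names are placeholders. Real diffusers __init__.py files also wrap
# this in optional-dependency checks.
from typing import TYPE_CHECKING

from ...utils import _LazyModule

_import_structure = {
    "pipeline_mymodel": ["MyModelPipeline"],
    "pipeline_output": ["MyModelPipelineOutput"],
}

if TYPE_CHECKING:
    # real imports only for type checkers and IDEs
    from .pipeline_mymodel import MyModelPipeline
    from .pipeline_output import MyModelPipelineOutput
else:
    import sys

    # defer all imports until an attribute is actually accessed
    sys.modules[__name__] = _LazyModule(__name__, globals()["__file__"], _import_structure, module_spec=__spec__)
```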

### Attention pattern

Attention must follow the diffusers pattern: both the `Attention` class and its processor are defined in the model file. The processor's `__call__` handles the actual compute and must use `dispatch_attention_fn` rather than calling `F.scaled_dot_product_attention` directly. The attention class inherits `AttentionModuleMixin` and declares `_default_processor_cls` and `_available_processors`.

```python
# transformer_mymodel.py

class MyModelAttnProcessor:
    _attention_backend = None
    _parallel_config = None

    def __call__(self, attn, hidden_states, attention_mask=None, ...):
        query = attn.to_q(hidden_states)
        key = attn.to_k(hidden_states)
        value = attn.to_v(hidden_states)
        # reshape, apply rope, etc.
        hidden_states = dispatch_attention_fn(
            query, key, value,
            attn_mask=attention_mask,
            backend=self._attention_backend,
            parallel_config=self._parallel_config,
        )
        hidden_states = hidden_states.flatten(2, 3)
        return attn.to_out[0](hidden_states)


class MyModelAttention(nn.Module, AttentionModuleMixin):
    _default_processor_cls = MyModelAttnProcessor
    _available_processors = [MyModelAttnProcessor]

    def __init__(self, query_dim, heads=8, dim_head=64, ...):
        super().__init__()
        self.to_q = nn.Linear(query_dim, heads * dim_head, bias=False)
        self.to_k = nn.Linear(query_dim, heads * dim_head, bias=False)
        self.to_v = nn.Linear(query_dim, heads * dim_head, bias=False)
        self.to_out = nn.ModuleList([nn.Linear(heads * dim_head, query_dim), nn.Dropout(0.0)])
        self.set_processor(MyModelAttnProcessor())

    def forward(self, hidden_states, attention_mask=None, **kwargs):
        return self.processor(self, hidden_states, attention_mask, **kwargs)
```

Consult the implementations in `src/diffusers/models/transformers/` if you need further references.

### Implementation rules

1. **Don't combine structural changes with behavioral changes.** Restructuring code to fit diffusers APIs (ModelMixin, ConfigMixin, etc.) is unavoidable. But don't also "improve" the algorithm, refactor computation order, or rename internal variables for aesthetics. Keep numerical logic as close to the reference as possible, even if it looks unclean. For standard → modular, this is stricter: copy loop logic verbatim and only restructure into blocks. Clean up in a separate commit after parity is confirmed.
2. **Pipelines must inherit from `DiffusionPipeline`.** Consult the implementations in `src/diffusers/pipelines` if you need references.
3. **Don't subclass an existing pipeline for a variant.** DO NOT use an existing pipeline class (e.g., `FluxPipeline`) as the base for another pipeline (e.g., `FluxImg2ImgPipeline`) that will be part of the core codebase (`src`).

### Test setup

- Gate slow tests with `@slow` and `RUN_SLOW=1`.
- All model-level tests must initially be written with the `BaseModelTesterConfig`, `ModelTesterMixin`, `MemoryTesterMixin`, `AttentionTesterMixin`, `LoraTesterMixin`, and `TrainingTesterMixin` classes; add further tests only after discussion with the maintainers. Use `tests/models/transformers/test_models_transformer_flux.py` as a reference (a skeleton follows below).
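
A skeleton under these rules might look like the sketch below; the import paths and mixin hooks are assumptions, so mirror the Flux test file for the real structure, including how `BaseModelTesterConfig` is wired in:

```python
# Illustrative skeleton only; the import path below is an assumption.
# Copy the real structure (config wiring, dummy inputs) from
# tests/models/transformers/test_models_transformer_flux.py.
import unittest

from diffusers import MyModelTransformer  # placeholder class name

from ..test_modeling_common import (  # assumed location of the mixins
    AttentionTesterMixin,
    LoraTesterMixin,
    MemoryTesterMixin,
    ModelTesterMixin,
    TrainingTesterMixin,
)


class MyModelTransformerTests(
    ModelTesterMixin,
    MemoryTesterMixin,
    AttentionTesterMixin,
    LoraTesterMixin,
    TrainingTesterMixin,
    unittest.TestCase,
):
    model_class = MyModelTransformer  # the mixins derive their tests from this
```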

### Common diffusers conventions

- Pipelines inherit from `DiffusionPipeline`
- Models use `ModelMixin` with `register_to_config` for config serialization
- Schedulers use `SchedulerMixin` with `ConfigMixin`
- Use `@torch.no_grad()` on pipeline `__call__`
- Support `output_type="latent"` for skipping VAE decode
- Support `generator` parameter for reproducibility
- Use `self.progress_bar(timesteps)` for progress tracking
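
A condensed sketch of how these conventions typically fit together in a pipeline `__call__`; `MyModelPipelineOutput`, `_encode_prompt`, the latent shape, and the single-pass denoising loop are placeholders rather than a complete pipeline:

```python
from diffusers.utils.torch_utils import randn_tensor


@torch.no_grad()  # no gradients during inference
def __call__(self, prompt, num_inference_steps=50, generator=None, output_type="pil", return_dict=True):
    prompt_embeds = self._encode_prompt(prompt)  # placeholder helper
    self.scheduler.set_timesteps(num_inference_steps, device=self.device)
    timesteps = self.scheduler.timesteps

    # `generator` makes the initial noise reproducible
    shape = (1, self.transformer.config.in_channels, 64, 64)  # placeholder shape
    latents = randn_tensor(shape, generator=generator, device=self.device, dtype=prompt_embeds.dtype)

    for t in self.progress_bar(timesteps):  # progress-tracking convention
        noise_pred = self.transformer(latents, encoder_hidden_states=prompt_embeds, timestep=t, return_dict=False)[0]
        latents = self.scheduler.step(noise_pred, t, latents, generator=generator).prev_sample

    if output_type == "latent":
        image = latents  # skip VAE decode
    else:
        image = self.vae.decode(latents / self.vae.config.scaling_factor, return_dict=False)[0]
        image = self.image_processor.postprocess(image, output_type=output_type)

    if not return_dict:
        return (image,)
    return MyModelPipelineOutput(images=image)  # placeholder output dataclass
```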

## Gotchas

1. **Forgetting `__init__.py` lazy imports.** Every new class must be registered in the appropriate `__init__.py` with lazy imports. Missing this causes an `ImportError` that only shows up when users try `from diffusers import YourNewClass`.

2. **Using `einops` or other non-PyTorch deps.** Reference implementations often use `einops.rearrange`. Always rewrite with native PyTorch (`reshape`, `permute`, `unflatten`; see the sketch after this list). Don't add the dependency. If a dependency is truly unavoidable, guard its import: `if is_my_dependency_available(): import my_dependency`.

3. **Missing `make fix-copies` after `# Copied from`.** If you add `# Copied from` annotations, you must run `make fix-copies` to propagate them. CI will fail otherwise.

4. **Wrong `_supports_cache_class` / `_no_split_modules`.** These class attributes control KV cache and device placement. Copy from a similar model and verify -- wrong values cause silent correctness bugs or OOM errors.

5. **Missing `@torch.no_grad()` on pipeline `__call__`.** Forgetting this causes GPU OOM from gradient accumulation during inference.

6. **Config serialization gaps.** Every `__init__` parameter in a `ModelMixin` subclass must be captured by `register_to_config`. If you add a new param but forget to register it, `from_pretrained` will silently use the default instead of the saved value.

7. **Forgetting to update `_import_structure` and `_lazy_modules`.** The top-level `src/diffusers/__init__.py` has both -- missing either one causes partial import failures.

8. **Hardcoded dtype in model forward.** Don't hardcode `torch.float32` or `torch.bfloat16` in the model's forward pass. Use the dtype of the input tensors or `self.dtype` so the model works with any precision (see the sketch after this list).
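
Before/after sketches for gotchas 2 and 8, with made-up shapes and names:

```python
import torch

b, h, w, c = 2, 8, 8, 16
x = torch.randn(b, h * w, c)

# gotcha 2: the reference may do
#   x = einops.rearrange(x, "b (h w) c -> b c h w", h=h)
# the native PyTorch equivalent:
x = x.unflatten(1, (h, w)).permute(0, 3, 1, 2).contiguous()  # (b, c, h, w)

# gotcha 8: don't hardcode a dtype in forward; follow the inputs instead
hidden_states = torch.randn(b, c, dtype=torch.bfloat16)
scale = torch.ones(c)                  # e.g. a float32 buffer
scale = scale.to(hidden_states.dtype)  # works in fp16/bf16/fp32 alike
hidden_states = hidden_states * scale
```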

---

## Modular Pipeline Conversion

See [modular-conversion.md](modular-conversion.md) for the full guide on converting standard pipelines to modular format, including block types, build order, guider abstraction, and conversion checklist.

---

## Weight Conversion Tips

<!-- TODO: Add concrete examples as we encounter them. Common patterns to watch for:
- Fused QKV weights that need splitting into separate Q, K, V
- Scale/shift ordering differences (reference stores [shift, scale], diffusers expects [scale, shift])
- Weight transpositions (linear stored as transposed conv, or vice versa)
- Interleaved head dimensions that need reshaping
- Bias terms absorbed into different layers
Add each with a before/after code snippet showing the conversion. -->
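
As a starting point for the first pattern above, a hypothetical fused-QKV split; the checkpoint key names are placeholders that must be read off the actual weights:

```python
# Hypothetical conversion snippet: the reference stores one fused (3*dim, dim)
# QKV projection per block, while diffusers expects separate to_q/to_k/to_v.
# All key names below are placeholders.
def split_fused_qkv(old_state_dict: dict, new_state_dict: dict, block_idx: int) -> None:
    w = old_state_dict.pop(f"blocks.{block_idx}.attn.qkv.weight")  # (3 * dim, dim)
    q, k, v = w.chunk(3, dim=0)  # split along the output dimension
    prefix = f"transformer_blocks.{block_idx}.attn"
    new_state_dict[f"{prefix}.to_q.weight"] = q
    new_state_dict[f"{prefix}.to_k.weight"] = k
    new_state_dict[f"{prefix}.to_v.weight"] = v
```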
New file: `modular-conversion.md` — 152 additions & 0 deletions
# Modular Pipeline Conversion Reference

## When to use

Modular pipelines break a monolithic `__call__` into composable blocks. Convert when:

- The model supports multiple workflows (T2V, I2V, V2V, etc.)
- Users need to swap guidance strategies (CFG, CFG-Zero*, PAG)
- You want to share blocks across pipeline variants

## File structure

```
src/diffusers/modular_pipelines/<model>/
  __init__.py                # Lazy imports
  modular_pipeline.py        # Pipeline class (tiny, mostly config)
  encoders.py                # Text encoder + image/video VAE encoder blocks
  before_denoise.py          # Pre-denoise setup blocks
  denoise.py                 # The denoising loop blocks
  decoders.py                # VAE decode block
  modular_blocks_<model>.py  # Block assembly (AutoBlocks)
```

## Block types decision tree

```
Is this a single operation?
  YES -> ModularPipelineBlocks (leaf block)

Does it run multiple blocks in sequence?
  YES -> SequentialPipelineBlocks
  Does it iterate (e.g. chunk loop)?
    YES -> LoopSequentialPipelineBlocks

Does it choose ONE block based on which input is present?
  Is the selection 1:1 with trigger inputs?
    YES -> AutoPipelineBlocks (simple trigger mapping)
    NO  -> ConditionalPipelineBlocks (custom select_block method)
```
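
For orientation, a minimal leaf-block sketch. The hook names (`inputs`, `intermediate_outputs`, `get_block_state`/`set_block_state`) follow the pattern of existing modular pipelines; treat them as assumptions and copy from a shipped block:

```python
# Minimal leaf-block sketch; hook names assumed from existing modular
# pipelines -- verify against a shipped block before relying on them.
class MyDecodeStep(ModularPipelineBlocks):
    model_name = "mymodel"  # placeholder

    @property
    def inputs(self):
        return [InputParam.template("latents")]

    @property
    def intermediate_outputs(self):
        return [OutputParam.template("images")]

    def __call__(self, components, state):
        block_state = self.get_block_state(state)
        latents = block_state.latents / components.vae.config.scaling_factor
        block_state.images = components.vae.decode(latents, return_dict=False)[0]
        self.set_block_state(state, block_state)
        return components, state
```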

## Build order (easiest first)

1. `decoders.py` -- Takes latents, runs VAE decode, returns images/videos
2. `encoders.py` -- Takes prompt, returns prompt_embeds. Add image/video VAE encoder if needed
3. `before_denoise.py` -- Timesteps, latent prep, noise setup. Each logical operation = one block
4. `denoise.py` -- The hardest. Convert guidance to the guider abstraction

## Key pattern: Guider abstraction

The original pipeline has guidance baked in:

```python
for i, t in enumerate(timesteps):
    noise_pred = self.transformer(latents, prompt_embeds, ...)
    if self.do_classifier_free_guidance:
        noise_uncond = self.transformer(latents, negative_prompt_embeds, ...)
        noise_pred = noise_uncond + scale * (noise_pred - noise_uncond)
    latents = self.scheduler.step(noise_pred, t, latents).prev_sample
```

The modular pipeline separates concerns:

```python
guider_inputs = {
    "encoder_hidden_states": (prompt_embeds, negative_prompt_embeds),
}

for i, t in enumerate(timesteps):
    components.guider.set_state(step=i, num_inference_steps=num_steps, timestep=t)
    guider_state = components.guider.prepare_inputs(guider_inputs)

    for batch in guider_state:
        components.guider.prepare_models(components.transformer)
        cond_kwargs = {k: getattr(batch, k) for k in guider_inputs}
        context_name = getattr(batch, components.guider._identifier_key)
        with components.transformer.cache_context(context_name):
            batch.noise_pred = components.transformer(
                hidden_states=latents, timestep=timestep,
                return_dict=False, **cond_kwargs, **shared_kwargs,
            )[0]
        components.guider.cleanup_models(components.transformer)

    noise_pred = components.guider(guider_state)[0]
    latents = components.scheduler.step(noise_pred, t, latents, generator=generator)[0]
```

## Key pattern: Chunk loops for video models

Use `LoopSequentialPipelineBlocks` for the outer loop:

```python
class ChunkDenoiseStep(LoopSequentialPipelineBlocks):
    block_classes = [PrepareChunkStep, NoiseGenStep, DenoiseInnerStep, UpdateStep]
```

Note: blocks inside `LoopSequentialPipelineBlocks` receive `(components, block_state, k)`, where `k` is the loop iteration index (a sketch follows below).
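
A sketch of that signature in a loop sub-block; the chunk-slicing body is a placeholder:

```python
# Loop sub-block sketch: unlike ordinary blocks, these receive the loop
# index k as a third argument. The slicing logic below is a placeholder.
class PrepareChunkStep(ModularPipelineBlocks):
    def __call__(self, components, block_state, k):
        start = k * block_state.chunk_size
        block_state.chunk_latents = block_state.latents[:, :, start : start + block_state.chunk_size]
        return components, block_state
```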

## Key pattern: Workflow selection

```python
class AutoDenoise(ConditionalPipelineBlocks):
    block_classes = [V2VDenoiseStep, I2VDenoiseStep, T2VDenoiseStep]
    block_trigger_inputs = ["video_latents", "image_latents"]
    default_block_name = "text2video"
```

## Standard InputParam/OutputParam templates

```python
# Inputs
InputParam.template("prompt")               # str, required
InputParam.template("negative_prompt")      # str, optional
InputParam.template("image")                # PIL.Image, optional
InputParam.template("generator")            # torch.Generator, optional
InputParam.template("num_inference_steps")  # int, default=50
InputParam.template("latents")              # torch.Tensor, optional

# Outputs
OutputParam.template("prompt_embeds")
OutputParam.template("negative_prompt_embeds")
OutputParam.template("image_latents")
OutputParam.template("latents")
OutputParam.template("videos")
OutputParam.template("images")
```

## ComponentSpec patterns

```python
# Heavy models - loaded from pretrained
ComponentSpec("transformer", YourTransformerModel)
ComponentSpec("vae", AutoencoderKL)

# Lightweight objects - created inline from config
ComponentSpec(
    "guider",
    ClassifierFreeGuidance,
    config=FrozenDict({"guidance_scale": 7.5}),
    default_creation_method="from_config",
)
```

## Conversion checklist

- [ ] Read original pipeline's `__call__` end-to-end, map stages
- [ ] Write test scripts (reference + target) with identical seeds
- [ ] Create file structure under `modular_pipelines/<model>/`
- [ ] Write decoder block (simplest)
- [ ] Write encoder blocks (text, image, video)
- [ ] Write before_denoise blocks (timesteps, latent prep, noise)
- [ ] Write denoise block with guider abstraction (hardest)
- [ ] Create pipeline class with `default_blocks_name`
- [ ] Assemble blocks in `modular_blocks_<model>.py`
- [ ] Wire up `__init__.py` with lazy imports
- [ ] Run `make style` and `make quality`
- [ ] Test all workflows for parity with reference