fix(rocm): apply RoPE for embedding models without KV cache#973
Conversation
Collaborator

AI Code Review - PR #973
Status: LGTM — Summary: P0/0 · P1/0 · P2/1 · P3/0. LGTM, ready to CI.
Non-blocking suggestions: one P2.
Checklist violations (2 fail / 97 total): General Principles Checklist and RTP-LLM Checklist.
When kv_cache is None (embedding models), AiterPrefillImplAsm and AiterPrefillImplNonAsm skipped RoPE and went straight to _forward_varlen. This caused severe output divergence for models that require positional encoding (e.g. tbstars with rope_theta=10000).

Also update _forward_varlen to handle the (q, k_padded, v_padded) tuple returned by the RoPE C++ op via unpad_kv_vectorized.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
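The bug and fix described above can be illustrated with a minimal, self-contained sketch. Note that `rope_rotate` and `prefill_forward` below are hypothetical simplified stand-ins, not the actual RTP-LLM implementation — the real code dispatches to a C++ RoPE op and aiter attention kernels — but they show the control-flow change: rotary embeddings must be applied even when there is no KV cache to write into.

```python
import torch

def rope_rotate(x, positions, rope_theta=10000.0):
    # Minimal half-split rotary position embedding.
    # x: [tokens, heads, head_dim]; positions: [tokens].
    half = x.shape[-1] // 2
    inv_freq = 1.0 / (rope_theta ** (torch.arange(half, dtype=torch.float32) / half))
    angles = positions.to(torch.float32)[:, None] * inv_freq[None, :]  # [tokens, half]
    cos = angles.cos()[:, None, :]  # broadcast over the heads dimension
    sin = angles.sin()[:, None, :]
    x1, x2 = x[..., :half], x[..., half:]
    # Rotate each (x1, x2) pair by its position-dependent angle.
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

def prefill_forward(q, k, v, positions, kv_cache=None):
    # The fix: apply RoPE unconditionally. Before the fix, the
    # kv_cache-is-None branch (embedding models) jumped straight to
    # varlen attention with un-rotated q/k, which diverges badly for
    # models that depend on positional encoding.
    q = rope_rotate(q, positions)
    k = rope_rotate(k, positions)
    # ... a varlen attention kernel would consume (q, k, v) here ...
    return q, k, v
```

Since RoPE is a pure rotation, it leaves position 0 unchanged and preserves vector norms — a cheap sanity check for any reimplementation.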
Force-pushed from 91cd1f9 to 049caba.
Collaborator

AI Code Review - PR #973
Status: LGTM — Summary: P0/0 · P1/0 · P2/0 · P3/0. LGTM, ready to CI.
Checklist ✅ (81 items passed).
Collaborator

Internal source has been updated, please review the changes!
LLLLKKKK approved these changes on May 7, 2026.
Collaborator

Internal source has been updated, please review the changes!