
fix - make prepare_cg_spec_decode_kernel easy to use and understand #954

Open

zerozw wants to merge 1 commit into main from fix/fix_trtllm

Conversation

@zerozw
Collaborator

@zerozw zerozw commented Apr 29, 2026

No description provided.

Copilot AI review requested due to automatic review settings April 29, 2026 09:52
@zerozw zerozw requested a review from LLLLKKKK as a code owner April 29, 2026 09:52
@LLLLKKKK
Collaborator

AI Code Review - PR #954

Status: LGTM

Summary: P0/0 · P1/0 · P2/0 · P3/1

lgtm ready to ci

Non-blocking Suggestions

P3

  • The test name padding_rows_stay_zero is slightly at odds with the actual semantics @ rtp_llm/models_py/modules/factory/attention/cuda_impl/test/trtllm_gen_test.py:412
    • Suggestion: rename it to test_mode1_padding_rows_become_zero, or add a comment before the assertion explaining that padding rows are explicitly written as 0 (prefix + q_len = 0 + 0).
Review Checklist: 9 pass / 0 fail

General Principles Checklist

Passed

  • [PASS] 6.1 Software Engineering: the change sticks to a single responsibility (the q_len load semantics of the spec-decode kernel) with no unrelated refactoring mixed in
  • [PASS] 6.1 Architecture: the caller contract stays compatible, and the behavior change at the boundary is covered explicitly
  • [PASS] 6.1 Tests: the new logic has targeted unit tests covering the key boundary (padding rows)
  • [PASS] 6.1 Quality: the logic change is not mixed with unrelated formatting, and the docstring is updated in sync with the code semantics

RTP-LLM Checklist

Passed

  • [PASS] B Correctness And Logic: the kernel input loads handle the batch dimension and padding/masking correctly
  • [PASS] D Performance: the change on the performance hot path (CUDA graph prepare) is reasonable
  • [PASS] H Tests And CI: existing and newly added tests are updated together with the kernel behavior change

Python Static-First Checklist

Passed

  • [PASS] P.A Static Correctness: no unused imports, undefined variables, or type inconsistencies
  • [PASS] P.E Tests: test cases are independent, repeatable, and free of external state

Strengths

  • Precisely identifies the root cause: on CUDA-graph padding rows, q_len was broadcast from the first request's q_len, producing a wrong seq_lens_out; the fix switches to a per-row tl.load with mask and other=0, keeping the change minimal (see the sketch after this list).
  • The docstring is updated in sync, stating clearly that valid spec requests share a q_len but padding rows must be read independently; the "why" is explained well, preventing a later accidental rollback.
  • The new test_mode1_padding_rows_stay_zero directly covers the padding-row scenario, and the existing mode1 cases change q_len_tensor from a 1-element to an N-element tensor, matching the new kernel's per-row load contract and avoiding masked loads that read past the end of the tensor.
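
A minimal sketch of the load-semantics change described above, assuming a kernel signature along the lines of prepare_cg_spec_decode_kernel(prefix_ptr, q_len_ptr, seq_lens_out_ptr, N, BLOCK_N); the exact parameters and stores are not shown in this conversation:

    import triton
    import triton.language as tl

    @triton.jit
    def prepare_cg_spec_decode_kernel(prefix_ptr, q_len_ptr, seq_lens_out_ptr,
                                      N, BLOCK_N: tl.constexpr):
        offsets_n = tl.arange(0, BLOCK_N)
        mask_n = offsets_n < N
        # Old behavior (per the review): a scalar load broadcast q_len[0] to every
        # row, so CUDA-graph padding rows (prefix = 0) wrongly became q_len[0]:
        #   q_len = tl.load(q_len_ptr)
        # Fixed behavior: per-row masked loads; padding/out-of-range rows read 0.
        q_len = tl.load(q_len_ptr + offsets_n, mask=mask_n, other=0)
        prefix = tl.load(prefix_ptr + offsets_n, mask=mask_n, other=0)
        tl.store(seq_lens_out_ptr + offsets_n, prefix + q_len, mask=mask_n)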

@LLLLKKKK LLLLKKKK enabled auto-merge (rebase) April 29, 2026 10:02
Contributor

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@wht21
Collaborator

wht21 commented Apr 30, 2026

internal source has been updated, please review the changes!

1 similar comment

@LLLLKKKK
Collaborator

AI Code Review - PR #954

Status: LGTM

Summary: P0/0 · P1/0 · P2/0 · P3/0

lgtm ready to ci

Review Checklist: 9 pass / 0 fail

General Principles Checklist

Passed

  • [PASS] 6.1 Software Engineering: the change is focused, contains no unrelated refactoring, and the docstring is updated in sync with the behavior
  • [PASS] 6.1 Architecture: state and invariants remain valid on the padding path
  • [PASS] 6.1 Tests: a targeted regression case is added for the fixed behavior and covers the boundary
  • [PASS] 6.1 Quality: no hot-path debug logging, no mega-PR, the commit stays focused

RTP-LLM Checklist

Passed

  • [PASS] B Correctness & Logic (kernel/cache): the Triton kernel load/store masks and pointer offsets are correct
  • [PASS] D Performance & CUDA Graph: the CUDA-graph preparation path behaves consistently across capture and replay
  • [PASS] H Tests & CI: the added/modified tests run under the existing PrepareCudaGraphKernelTest framework

Python Static-First Checklist

Passed

  • [PASS] P.A Static-first checks: types and tensor construction are correct, no unused variables
  • [PASS] P.B Correctness in Python tests: the assertions cover the fix target and do not depend on external state

Strengths

  • A small, precise fix for a real defect in the CUDA-graph spec-decode preparation kernel: previously tl.load(q_len_ptr) read only q_len[0] and broadcast it to every row, so padding rows (prefix=0) were incorrectly written as q_len[0]. After switching to a masked load over offsets_n, padding rows stay 0.
  • The docstring now explains why a per-row read is needed (CUDA-graph padding rows are zeroed separately), so later readers understand the constraint instead of treating it as unnecessary generalization.
  • The new test_mode1_padding_rows_stay_zero locks in the regression path with prefix=[100,200,0,0] / q_len=[5,5,0,0] (see the sketch after this list); at the same time, test_mode1_base / test_mode1_large_batch change q_len from a single-element tensor to torch.full((N,), ...), consistent with the new kernel read semantics.
  • The caller FlashInferTRTLLMSpecDecodeImpl.prepare_cuda_graph passes attn_inputs.input_lengths_d (sized by batch), so all N masked reads stay in bounds and no new out-of-bounds risk is introduced.
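
A tiny numeric illustration of the padding-row semantics, using the values cited above (hypothetical host-side arithmetic, not the actual test code):

    import torch

    # Two valid spec-decode requests followed by two CUDA-graph padding rows.
    prefix = torch.tensor([100, 200, 0, 0], dtype=torch.int32)
    q_len = torch.tensor([5, 5, 0, 0], dtype=torch.int32)

    # Per-row semantics after the fix: padding rows add 0 + 0 and stay zero.
    assert (prefix + q_len).tolist() == [105, 205, 0, 0]

    # Old broadcast semantics (reading only q_len[0]) would instead yield
    # [105, 205, 5, 5], corrupting the padding rows.
    assert (prefix + q_len[0]).tolist() == [105, 205, 5, 5]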

Copilot AI review requested due to automatic review settings May 6, 2026 02:09
@zerozw zerozw force-pushed the fix/fix_trtllm branch from d3e5d05 to 6830966 Compare May 6, 2026 02:09
@LLLLKKKK
Collaborator

LLLLKKKK commented May 6, 2026

AI Code Review - PR #954

Status: LGTM

Summary: P0/0 · P1/0 · P2/0 · P3/0

lgtm ready to ci

Checklist ✅ (56 items passed)

Strengths

  • Reading q_len per row fixes the semantic bug where, during CUDA Graph spec-decode preparation, padding rows incorrectly inherited the first request's q_len; the prefix/q_len loads both gain other=0, aligned with the mask
  • The new test_mode1_padding_rows_stay_zero directly covers the regression path where padding rows stay 0, with a precise assertion ([105,205,0,0])
  • The existing test_mode1_base / test_mode1_large_batch now use torch.full((N,), q_len), consistent with the new kernel contract and avoiding the leftover 1-element tensor misuse (see the sketch after this list)
  • The kernel docstring states the motivation for zeroing padding rows separately, helping future maintainers understand why q_len_ptr[0] can no longer be read
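
A sketch of the test-input change described above (tensor names and values are illustrative, not copied from the test file):

    import torch

    N = 4            # batch size in the hypothetical mode1 test
    q_len_value = 5

    # Old input: a single-element tensor, which only works if the kernel
    # broadcasts q_len[0] to every row.
    q_len_tensor_old = torch.tensor([q_len_value], dtype=torch.int32)

    # New input: one entry per row, matching the kernel's per-row masked load
    # and avoiding masked reads past the end of a 1-element tensor.
    q_len_tensor_new = torch.full((N,), q_len_value, dtype=torch.int32)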

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.



Comment on lines +164 to +165

Old code:

    mask_n = offsets_n < N
    q_len = tl.load(q_len_ptr)
    prefix = tl.load(prefix_ptr + offsets_n, mask=mask_n)

New code:

    mask_n = offsets_n < N
    q_len = tl.load(q_len_ptr + offsets_n, mask=mask_n, other=0)
    prefix = tl.load(prefix_ptr + offsets_n, mask=mask_n, other=0)