Skip to content

Fix PS 1.4 instruction-size table and bound the bytecode walk#137

Open
BinqAdams wants to merge 1 commit intoNVIDIAGameWorks:mainfrom
BinqAdams:fix/ps14-instruction-size
Open

Fix PS 1.4 instruction-size table and bound the bytecode walk#137
BinqAdams wants to merge 1 commit intoNVIDIAGameWorks:mainfrom
BinqAdams:fix/ps14-instruction-size

Conversation

@BinqAdams
Copy link
Copy Markdown
Contributor

Summary

Fix a heap-corruption bug in CommonShader::getShaderByteSize triggered by any D3D9 application that creates Pixel Shader 1.4 shaders, and add a defense-in-depth bound on the bytecode walk.

Motivation

PS 1.4 reuses opcodes 0x40 and 0x42 with explicit dst+src token pairs (3 dwords each) where PS 1.0–1.3 had implicit src derived from the dst register index (2 dwords each):

Opcode Old table PS 1.0–1.3 PS 1.4
0x40 return 2 D3DSIO_TEXCOORD (2 dw) D3DSIO_TEXCRD (3 dw)
0x41 return 2 D3DSIO_TEXKILL (2 dw) D3DSIO_TEXKILL (2 dw)
0x42 return 2 D3DSIO_TEX (2 dw) D3DSIO_TEXLD (3 dw)

The static SM 1.x table in CommonShader::getShaderInstructionSize (bridge/src/client/d3d9_commonshader.h) was authored for PS 1.0–1.3 / VS 1.x and gives the wrong size for PS 1.4. The parser advances by 2 instead of 3, lands mid-instruction, treats a register-arg token as the next opcode, misaligns indefinitely, and walks past the actual D3DSIO_END token off the end of the bytecode buffer. Without PageHeap the over-read silently corrupts the heap freelist and surfaces as a delayed RtlpAllocateHeap AV; with PageHeap (gflags /full) it AVs immediately on the guard page.

What Changed

Two changes in bridge/src/client/d3d9_commonshader.h:

  1. PS 1.4 special case in getShaderInstructionSize. For m_majorVersion == 1 && m_minorVersion == 4, opcodes 0x40 and 0x42 return 3 instead of 2. Opcode 0x41 (TEXKILL) is unchanged at 2 dwords across PS versions; 0x430x4F are deprecated/reserved in PS 1.4 and shouldn't appear. Other opcodes fall through to the existing PS 1.0–1.3 / VS table.
  2. Defense-in-depth bound in getShaderByteSize. 16K-dword (64 KB) cap on the walk. Real D3D9 shaders are at most a few KB; if the walk exceeds the cap without finding D3DSIO_END the bytecode is malformed or the instruction-size table doesn't recognize a token in this shader variant. Logs via bridge_util::Logger::err and returns the cap-bounded size so the caller's memcpy stays bounded — the server will reject the shader, but the client survives instead of walking off the end of the heap.

Net diff: bridge/src/client/d3d9_commonshader.h only, +59 / -3.

Testing

Reproducer: any D3D9 application that creates a PS 1.4 shader containing D3DSIO_TEXCRD (0x40) or D3DSIO_TEXLD (0x42). With the old table, CreatePixelShader corrupts heap metadata adjacent to the bytecode buffer. Under PageHeap (gflags /full), the over-read AVs immediately on the guard page following the buffer; without PageHeap, the corruption surfaces delayed in RtlpAllocateHeap after thousands of unrelated allocations.

With this fix the PS 1.4 walk terminates correctly at D3DSIO_END and the buffer-bounded memcpy no longer over-reads. The 16K-dword cap activates on malformed bytecode and on shader variants the table doesn't yet handle: in those cases the log line names the failing function and the server-side compilation fails cleanly.

PS 1.4 reuses opcodes 0x40 and 0x42 with explicit dst+src token pairs
(3 dwords each) where PS 1.0-1.3 had implicit src derived from the dst
register index (2 dwords each):

  Opcode  Old table   PS 1.0-1.3              PS 1.4
  ------  ---------   ----------              ------
  0x40    return 2    D3DSIO_TEXCOORD (2 dw)  D3DSIO_TEXCRD (3 dw)
  0x41    return 2    D3DSIO_TEXKILL  (2 dw)  D3DSIO_TEXKILL (2 dw)
  0x42    return 2    D3DSIO_TEX      (2 dw)  D3DSIO_TEXLD  (3 dw)

The static SM 1.x table in CommonShader::getShaderInstructionSize was
authored for PS 1.0-1.3 / VS 1.x and gives the wrong size for PS 1.4.
The parser advances by 2 instead of 3, lands mid-instruction, treats a
register-arg token as the next opcode, misaligns indefinitely, and
walks past the actual D3DSIO_END token off the end of the bytecode
buffer. Without PageHeap the over-read silently corrupts the heap
freelist and surfaces as a delayed RtlpAllocateHeap AV; with PageHeap
it AVs immediately on the guard page.

Two changes in d3d9_commonshader.h:

1. PS 1.4 special case in getShaderInstructionSize. For
   m_majorVersion == 1 && m_minorVersion == 4, opcodes 0x40 and 0x42
   return 3 instead of 2. Opcode 0x41 (TEXKILL) is unchanged at 2
   dwords across PS versions; 0x43-0x4F are deprecated/reserved in
   PS 1.4 and shouldn't appear. Other opcodes fall through to the
   existing PS 1.0-1.3 / VS table.

2. Defense-in-depth bound in getShaderByteSize. 16K-dword (64 KB) cap
   on the walk. Real D3D9 shaders are at most a few KB; if the walk
   exceeds the cap without finding D3DSIO_END the bytecode is
   malformed or the instruction-size table doesn't recognize a token
   in this shader variant. Logs via bridge_util::Logger::err and
   returns the cap-bounded size so the caller's memcpy stays bounded
   - the server will reject the shader, but the client survives
   instead of walking off the end of the heap.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant