Uninitialized memory read in `depth_init_MathAPI` from pair_pos buffer off-by-one

# Uninitialized memory read in `depth_init_MathAPI` from pair_pos buffer off-by-one

## Environment

- cuJSON commit: `2ac7d3dcd7ad1ff64ebdb14022bf94c59b3b4953` (branch `master`)
- OS: Ubuntu 22.04.5 LTS
- GPU: NVIDIA A40 (Ampere, `sm_86`)
- NVIDIA driver: 590.48.01
- CUDA toolkit: 13.1 (`nvcc` V13.1.115)

## Component

cuJSON — Standard JSON parser, `parse_standard_json.cu` (`parse_standard_json()` → `Parser()` → `depth_init_MathAPI()`, consumed at `parse_standard_json.cu:1393`)

## Severity

High (CWE-908, Use of Uninitialized Resource) — silent data corruption on every parse, with potential out-of-bounds access.

## Description

In `parse_standard_json`, the result buffer `res_buff` is partitioned into `structural` and `pair_pos` arrays. The `pair_pos` pointer is set to `res_buff + result_size + 1`, but the `cudaMemcpy` that populates pair-position data writes starting at `res_buff + result_size + 2`. This off-by-one gap leaves `pair_pos[0]` (i.e. `res_buff[result_size + 1]`) as uninitialized pinned host memory. The uninitialized value propagates into the `depth_init_MathAPI` kernel (line 1393) and into the downstream CUB radix sort, corrupting the JSON parse tree for every input.

The bug is deterministic and input-independent: any valid JSON that produces at least one structural element triggers it (confirmed with `{"key":123}` and `[1,2,3]`).

## Root cause

The function allocates one pinned buffer and partitions it into two contiguous arrays:

```
res_buff:
  [0 .. result_size-1]              -> structural (result_size int32_t)
  [result_size+1 .. 2*result_size]  -> pair_pos   (result_size int32_t)
```

```cpp
parsed_tree.structural = res_buff;                  // index 0
parsed_tree.pair_pos   = res_buff + result_size + 1; // index result_size+1
```

The second `cudaMemcpy` writes to `res_buff + result_size + 2`, not `+ 1`, leaving a 1-element gap:

```cpp
cudaMallocHost(&res_buff, (2 * result_size + 2) * sizeof(int32_t));

cudaMemcpy(res_buff, dev_structural, result_size * sizeof(int32_t), cudaMemcpyDeviceToHost);

// note +2, not +1
cudaMemcpy(res_buff + result_size + 2, dev_pair_pos, result_size * sizeof(int32_t), cudaMemcpyDeviceToHost);

parsed_tree.pair_pos = res_buff + result_size + 1;  // index result_size+1 is NEVER written
```

So `pair_pos[0]` reads uninitialized pinned host memory from `cudaMallocHost`. `depth_init_MathAPI` then uses the pair positions to compute tree depth, operating on garbage; the corrupted pair data also flows into CUB's radix sort (`block_load.cuh:130`), confirming propagation through multiple kernels.

## Impact

| Category | Assessment |
|---|---|
| Data Corruption | High — uninitialized `pair_pos[0]` feeds `depth_init_MathAPI` and CUB radix sort, silently corrupting parse-tree depth/pair-matching for every document. |
| Information Leak | Medium — uninitialized pinned host memory may hold residual data; if `pair_pos` values are exposed via API/serialized output, stale contents could leak. |
| Denial of Service | Medium — a garbage `pair_pos[0]` used as an index in downstream kernels could cause an out-of-bounds access, GPU fault, or hang. |

## Reproduction

```bash
CUDA_HOME=${CUDA_HOME:-/usr/local/cuda}
nvcc -std=c++17 -arch=native -O2 -g -lineinfo \
  -I<cujson-src> -I${CUDA_HOME}/include \
  llm_harness_parse_standard_json.cu harness_afl_main.cpp \
  --compiler-bindir g++-13 -L${CUDA_HOME}/lib64 -lcudart \
  -o /tmp/cujson_parse_standard_json_initcheck.bin

compute-sanitizer --tool=initcheck --report-api-errors=no --error-exitcode 99 \
  /tmp/cujson_parse_standard_json_initcheck.bin trigger_input.json
```

Input is any valid JSON with structural tokens (e.g. `{"key":123}`).

## Sanitizer evidence

```
========= Uninitialized __global__ memory read of size 4 bytes
=========   at depth_init_MathAPI(unsigned int *, unsigned int *, int, int)+0xa0 in parse_standard_json.cu:1393
=========   by thread (0,0,0) in block (0,0,0)
=========   Host Frame: Parser(...) 
=========   Host Frame: parse_standard_json(cuJSONInput)
=========   Host Frame: main
=========
========= Uninitialized __global__ memory read of size 1 bytes
=========   at cub::...::LoadDirectBlocked<...>+0x380 in block_load.cuh:130
=========   by thread (0,0,0) in block (0,0,0)
=========   Device Frame: cub::...::DeviceRadixSortSingleTileKernel<...>+0xb0 in radix_sort.cuh:401
```

## Suggested fix

The root cause is the off-by-one mismatch between the `pair_pos` pointer (`res_buff + result_size + 1`) and the second `cudaMemcpy` destination (`res_buff + result_size + 2`). Any of the following resolves it.

Option A — align the pointer to the copy destination:

```diff
- parsed_tree.pair_pos = res_buff + result_size + 1;
+ parsed_tree.pair_pos = res_buff + result_size + 2;
```

Option B — align the copy destination to the pointer:

```diff
- cudaMemcpy(res_buff + result_size + 2, dev_pair_pos, result_size * sizeof(int32_t), ...);
+ cudaMemcpy(res_buff + result_size + 1, dev_pair_pos, result_size * sizeof(int32_t), ...);
```

Option C (preferred long term) — remove the unused gap and tighten the allocation:

```diff
- cudaMallocHost(&res_buff, (2 * result_size + 2) * sizeof(int32_t));
+ cudaMallocHost(&res_buff, (2 * result_size) * sizeof(int32_t));
- parsed_tree.pair_pos = res_buff + result_size + 1;
+ parsed_tree.pair_pos = res_buff + result_size;
- cudaMemcpy(res_buff + result_size + 2, dev_pair_pos, ...);
+ cudaMemcpy(res_buff + result_size, dev_pair_pos, result_size * sizeof(int32_t), ...);
```

Defensive hardening: zero-initialize the buffer after allocation (`memset(res_buff, 0, total_size * sizeof(int32_t));`) so any residual gap cannot leak.

## Attachments

The following files from this finding are attached:

- `llm_harness_parse_standard_json.cu`
- `harness_afl_main.cpp`
- `trigger_input.json`
- `initcheck.log`

[Archive.zip](https://github.com/user-attachments/files/28676191/Archive.zip)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uninitialized memory read in `depth_init_MathAPI` from pair_pos buffer off-by-one #3

Uninitialized memory read in `depth_init_MathAPI` from pair_pos buffer off-by-one

Environment

Component

Severity

Description

Root cause

Impact

Reproduction

Sanitizer evidence

Suggested fix

Attachments

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Category	Assessment
Data Corruption	High — uninitialized `pair_pos[0]` feeds `depth_init_MathAPI` and CUB radix sort, silently corrupting parse-tree depth/pair-matching for every document.
Information Leak	Medium — uninitialized pinned host memory may hold residual data; if `pair_pos` values are exposed via API/serialized output, stale contents could leak.
Denial of Service	Medium — a garbage `pair_pos[0]` used as an index in downstream kernels could cause an out-of-bounds access, GPU fault, or hang.

Uninitialized memory read in depth_init_MathAPI from pair_pos buffer off-by-one #3

Description

Uninitialized memory read in depth_init_MathAPI from pair_pos buffer off-by-one

Environment

Component

Severity

Description

Root cause

Impact

Reproduction

Sanitizer evidence

Suggested fix

Attachments

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions

Uninitialized memory read in `depth_init_MathAPI` from pair_pos buffer off-by-one #3

Uninitialized memory read in `depth_init_MathAPI` from pair_pos buffer off-by-one