Uninitialized memory read in depth_init_MathAPI from pair_pos buffer off-by-one
Environment
- cuJSON commit:
2ac7d3dcd7ad1ff64ebdb14022bf94c59b3b4953 (branch master)
- OS: Ubuntu 22.04.5 LTS
- GPU: NVIDIA A40 (Ampere,
sm_86)
- NVIDIA driver: 590.48.01
- CUDA toolkit: 13.1 (
nvcc V13.1.115)
Component
cuJSON — Standard JSON parser, parse_standard_json.cu (parse_standard_json() → Parser() → depth_init_MathAPI(), consumed at parse_standard_json.cu:1393)
Severity
High (CWE-908, Use of Uninitialized Resource) — silent data corruption on every parse, with potential out-of-bounds access.
Description
In parse_standard_json, the result buffer res_buff is partitioned into structural and pair_pos arrays. The pair_pos pointer is set to res_buff + result_size + 1, but the cudaMemcpy that populates pair-position data writes starting at res_buff + result_size + 2. This off-by-one gap leaves pair_pos[0] (i.e. res_buff[result_size + 1]) as uninitialized pinned host memory. The uninitialized value propagates into the depth_init_MathAPI kernel (line 1393) and into the downstream CUB radix sort, corrupting the JSON parse tree for every input.
The bug is deterministic and input-independent: any valid JSON that produces at least one structural element triggers it (confirmed with {"key":123} and [1,2,3]).
Root cause
The function allocates one pinned buffer and partitions it into two contiguous arrays:
res_buff:
[0 .. result_size-1] -> structural (result_size int32_t)
[result_size+1 .. 2*result_size] -> pair_pos (result_size int32_t)
parsed_tree.structural = res_buff; // index 0
parsed_tree.pair_pos = res_buff + result_size + 1; // index result_size+1
The second cudaMemcpy writes to res_buff + result_size + 2, not + 1, leaving a 1-element gap:
cudaMallocHost(&res_buff, (2 * result_size + 2) * sizeof(int32_t));
cudaMemcpy(res_buff, dev_structural, result_size * sizeof(int32_t), cudaMemcpyDeviceToHost);
// note +2, not +1
cudaMemcpy(res_buff + result_size + 2, dev_pair_pos, result_size * sizeof(int32_t), cudaMemcpyDeviceToHost);
parsed_tree.pair_pos = res_buff + result_size + 1; // index result_size+1 is NEVER written
So pair_pos[0] reads uninitialized pinned host memory from cudaMallocHost. depth_init_MathAPI then uses the pair positions to compute tree depth, operating on garbage; the corrupted pair data also flows into CUB's radix sort (block_load.cuh:130), confirming propagation through multiple kernels.
Impact
| Category |
Assessment |
| Data Corruption |
High — uninitialized pair_pos[0] feeds depth_init_MathAPI and CUB radix sort, silently corrupting parse-tree depth/pair-matching for every document. |
| Information Leak |
Medium — uninitialized pinned host memory may hold residual data; if pair_pos values are exposed via API/serialized output, stale contents could leak. |
| Denial of Service |
Medium — a garbage pair_pos[0] used as an index in downstream kernels could cause an out-of-bounds access, GPU fault, or hang. |
Reproduction
CUDA_HOME=${CUDA_HOME:-/usr/local/cuda}
nvcc -std=c++17 -arch=native -O2 -g -lineinfo \
-I<cujson-src> -I${CUDA_HOME}/include \
llm_harness_parse_standard_json.cu harness_afl_main.cpp \
--compiler-bindir g++-13 -L${CUDA_HOME}/lib64 -lcudart \
-o /tmp/cujson_parse_standard_json_initcheck.bin
compute-sanitizer --tool=initcheck --report-api-errors=no --error-exitcode 99 \
/tmp/cujson_parse_standard_json_initcheck.bin trigger_input.json
Input is any valid JSON with structural tokens (e.g. {"key":123}).
Sanitizer evidence
========= Uninitialized __global__ memory read of size 4 bytes
========= at depth_init_MathAPI(unsigned int *, unsigned int *, int, int)+0xa0 in parse_standard_json.cu:1393
========= by thread (0,0,0) in block (0,0,0)
========= Host Frame: Parser(...)
========= Host Frame: parse_standard_json(cuJSONInput)
========= Host Frame: main
=========
========= Uninitialized __global__ memory read of size 1 bytes
========= at cub::...::LoadDirectBlocked<...>+0x380 in block_load.cuh:130
========= by thread (0,0,0) in block (0,0,0)
========= Device Frame: cub::...::DeviceRadixSortSingleTileKernel<...>+0xb0 in radix_sort.cuh:401
Suggested fix
The root cause is the off-by-one mismatch between the pair_pos pointer (res_buff + result_size + 1) and the second cudaMemcpy destination (res_buff + result_size + 2). Any of the following resolves it.
Option A — align the pointer to the copy destination:
- parsed_tree.pair_pos = res_buff + result_size + 1;
+ parsed_tree.pair_pos = res_buff + result_size + 2;
Option B — align the copy destination to the pointer:
- cudaMemcpy(res_buff + result_size + 2, dev_pair_pos, result_size * sizeof(int32_t), ...);
+ cudaMemcpy(res_buff + result_size + 1, dev_pair_pos, result_size * sizeof(int32_t), ...);
Option C (preferred long term) — remove the unused gap and tighten the allocation:
- cudaMallocHost(&res_buff, (2 * result_size + 2) * sizeof(int32_t));
+ cudaMallocHost(&res_buff, (2 * result_size) * sizeof(int32_t));
- parsed_tree.pair_pos = res_buff + result_size + 1;
+ parsed_tree.pair_pos = res_buff + result_size;
- cudaMemcpy(res_buff + result_size + 2, dev_pair_pos, ...);
+ cudaMemcpy(res_buff + result_size, dev_pair_pos, result_size * sizeof(int32_t), ...);
Defensive hardening: zero-initialize the buffer after allocation (memset(res_buff, 0, total_size * sizeof(int32_t));) so any residual gap cannot leak.
Attachments
The following files from this finding are attached:
llm_harness_parse_standard_json.cu
harness_afl_main.cpp
trigger_input.json
initcheck.log
Archive.zip
Uninitialized memory read in
depth_init_MathAPIfrom pair_pos buffer off-by-oneEnvironment
2ac7d3dcd7ad1ff64ebdb14022bf94c59b3b4953(branchmaster)sm_86)nvccV13.1.115)Component
cuJSON — Standard JSON parser,
parse_standard_json.cu(parse_standard_json()→Parser()→depth_init_MathAPI(), consumed atparse_standard_json.cu:1393)Severity
High (CWE-908, Use of Uninitialized Resource) — silent data corruption on every parse, with potential out-of-bounds access.
Description
In
parse_standard_json, the result bufferres_buffis partitioned intostructuralandpair_posarrays. Thepair_pospointer is set tores_buff + result_size + 1, but thecudaMemcpythat populates pair-position data writes starting atres_buff + result_size + 2. This off-by-one gap leavespair_pos[0](i.e.res_buff[result_size + 1]) as uninitialized pinned host memory. The uninitialized value propagates into thedepth_init_MathAPIkernel (line 1393) and into the downstream CUB radix sort, corrupting the JSON parse tree for every input.The bug is deterministic and input-independent: any valid JSON that produces at least one structural element triggers it (confirmed with
{"key":123}and[1,2,3]).Root cause
The function allocates one pinned buffer and partitions it into two contiguous arrays:
The second
cudaMemcpywrites tores_buff + result_size + 2, not+ 1, leaving a 1-element gap:So
pair_pos[0]reads uninitialized pinned host memory fromcudaMallocHost.depth_init_MathAPIthen uses the pair positions to compute tree depth, operating on garbage; the corrupted pair data also flows into CUB's radix sort (block_load.cuh:130), confirming propagation through multiple kernels.Impact
pair_pos[0]feedsdepth_init_MathAPIand CUB radix sort, silently corrupting parse-tree depth/pair-matching for every document.pair_posvalues are exposed via API/serialized output, stale contents could leak.pair_pos[0]used as an index in downstream kernels could cause an out-of-bounds access, GPU fault, or hang.Reproduction
Input is any valid JSON with structural tokens (e.g.
{"key":123}).Sanitizer evidence
Suggested fix
The root cause is the off-by-one mismatch between the
pair_pospointer (res_buff + result_size + 1) and the secondcudaMemcpydestination (res_buff + result_size + 2). Any of the following resolves it.Option A — align the pointer to the copy destination:
Option B — align the copy destination to the pointer:
Option C (preferred long term) — remove the unused gap and tighten the allocation:
Defensive hardening: zero-initialize the buffer after allocation (
memset(res_buff, 0, total_size * sizeof(int32_t));) so any residual gap cannot leak.Attachments
The following files from this finding are attached:
llm_harness_parse_standard_json.cuharness_afl_main.cpptrigger_input.jsoninitcheck.logArchive.zip