Data race on shared_error in validate_expand_MathAPI_new2 / validate_expand
Environment
- cuJSON commit:
2ac7d3dcd7ad1ff64ebdb14022bf94c59b3b4953 (branch master)
- OS: Ubuntu 22.04.5 LTS
- GPU: NVIDIA A40 (Ampere,
sm_86)
- NVIDIA driver: 590.48.01
- CUDA toolkit: 13.1 (
nvcc V13.1.115)
Component
cuJSON — parse_standard_json.cu, parse_json_lines.cu
Severity
Medium (CWE-362, Concurrent Execution Using Shared Resource with Improper Synchronization) — correctness issue with potential security implications.
Description
The bracket-validation kernels validate_expand_MathAPI_new2 (parse_standard_json.cu) and validate_expand (parse_json_lines.cu) contain a shared-memory data race on the shared_error flag. The code uses a non-atomic shared_error |= 1 instead of atomicOr(&shared_error, 1u), creating a read/write race between threads in the same block.
The bug is deterministic: any JSON/JSONL input with bracket structures triggers it, and deeper nesting increases the number of concurrent threads and the probability of an observable effect.
Root cause
Both kernels use a shared-memory error flag:
__shared__ uint32_t shared_error;
// ...
shared_error |= 1; // non-atomic read-modify-write
Multiple threads in the same warp/block execute shared_error |= 1 concurrently. |= is a non-atomic read-modify-write (read, OR, write back). For two concurrent threads:
- Thread A reads
shared_error = 0
- Thread B reads
shared_error = 0
- Thread A writes
shared_error = 1
- Thread B writes
shared_error = 1 (lost update)
For |= 1 the lost update is benign (same value), but it is still undefined behavior in the CUDA memory model. More importantly, the race also exists between the error-setting writes and subsequent reads of shared_error (read at offset +0x260 vs write at +0x3d0), so a thread can observe stale state and miss an error condition. The earlier kernel variant validate_expand_MathAPI_new correctly used atomicOr(&shared_error, 1).
Affected locations: validate_expand_MathAPI_new2 (parse_standard_json.cu, ~lines 1491–1542) and validate_expand (parse_json_lines.cu, ~line 1009).
Impact
- Correctness: a thread can read
shared_error before another thread's write is visible, potentially missing a bracket-pairing error and allowing malformed JSON to pass validation.
- Security: if
pairError is not set when it should be, the parser continues with incorrectly paired brackets. Downstream iterators that trust pair_pos validity can then produce incorrect results or trigger out-of-bounds accesses.
Reproduction
Build the harness and run racecheck:
CUDA_HOME=${CUDA_HOME:-/usr/local/cuda}
nvcc -std=c++17 -arch=native -O2 -g -lineinfo \
-I<cujson-src> -I${CUDA_HOME}/include \
llm_harness_parse_standard_json.cu harness_afl_main.cpp \
--compiler-bindir g++-13 -L${CUDA_HOME}/lib64 -lcudart \
-o /tmp/cujson_validate_expand_racecheck.bin
compute-sanitizer --tool=racecheck --report-api-errors=no --error-exitcode 99 \
/tmp/cujson_validate_expand_racecheck.bin trigger_input.bin
trigger_input.bin is 201 bytes: 100 nested [ brackets, then 1, then 100 ] brackets ([[[...[1]...]]]).
Expected: no race conditions. Actual: a racecheck hazard on shared_error.
Sanitizer evidence
parse_standard_json (racecheck):
Warning: Race between Read at validate_expand_MathAPI_new2+0x260
and Write at validate_expand_MathAPI_new2+0x3d0 [4 hazards]
and Write at validate_expand_MathAPI_new2+0x820 [4 hazards]
Error: Race between Write at validate_expand_MathAPI_new2+0x3d0
and Read at validate_expand_MathAPI_new2+0x770 [4 hazards]
RACECHECK SUMMARY: 2 hazards (1 error, 1 warning)
parse_json_lines (racecheck):
Error: Race between Read at validate_expand+0x260
and Write at validate_expand+0x3d0 [12 hazards]
RACECHECK SUMMARY: 1 hazard (1 error, 0 warnings)
Cross-tool notes: iGUARD does not flag this (it targets global-memory cross-SM races, not shared-memory intra-block RMW); memcheck is N/A (race, not OOB); synccheck reports 0 errors.
Suggested fix
Replace the non-atomic updates with atomicOr at all sites that set shared_error:
// validate_expand_MathAPI_new2 (parse_standard_json.cu, ~5 locations):
atomicOr(&shared_error, 1u);
// validate_expand (parse_json_lines.cu, ~1 location):
atomicOr(&shared_error, 1u);
Attachments
The following files from this finding are attached:
llm_harness_parse_standard_json.cu
harness_afl_main.cpp
trigger_input.bin
racecheck.log
Archive.zip
Data race on
shared_errorinvalidate_expand_MathAPI_new2/validate_expandEnvironment
2ac7d3dcd7ad1ff64ebdb14022bf94c59b3b4953(branchmaster)sm_86)nvccV13.1.115)Component
cuJSON —
parse_standard_json.cu,parse_json_lines.cuSeverity
Medium (CWE-362, Concurrent Execution Using Shared Resource with Improper Synchronization) — correctness issue with potential security implications.
Description
The bracket-validation kernels
validate_expand_MathAPI_new2(parse_standard_json.cu) andvalidate_expand(parse_json_lines.cu) contain a shared-memory data race on theshared_errorflag. The code uses a non-atomicshared_error |= 1instead ofatomicOr(&shared_error, 1u), creating a read/write race between threads in the same block.The bug is deterministic: any JSON/JSONL input with bracket structures triggers it, and deeper nesting increases the number of concurrent threads and the probability of an observable effect.
Root cause
Both kernels use a shared-memory error flag:
Multiple threads in the same warp/block execute
shared_error |= 1concurrently.|=is a non-atomic read-modify-write (read, OR, write back). For two concurrent threads:shared_error= 0shared_error= 0shared_error= 1shared_error= 1 (lost update)For
|= 1the lost update is benign (same value), but it is still undefined behavior in the CUDA memory model. More importantly, the race also exists between the error-setting writes and subsequent reads ofshared_error(read at offset +0x260 vs write at +0x3d0), so a thread can observe stale state and miss an error condition. The earlier kernel variantvalidate_expand_MathAPI_newcorrectly usedatomicOr(&shared_error, 1).Affected locations:
validate_expand_MathAPI_new2(parse_standard_json.cu, ~lines 1491–1542) andvalidate_expand(parse_json_lines.cu, ~line 1009).Impact
shared_errorbefore another thread's write is visible, potentially missing a bracket-pairing error and allowing malformed JSON to pass validation.pairErroris not set when it should be, the parser continues with incorrectly paired brackets. Downstream iterators that trustpair_posvalidity can then produce incorrect results or trigger out-of-bounds accesses.Reproduction
Build the harness and run racecheck:
trigger_input.binis 201 bytes: 100 nested[brackets, then1, then 100]brackets ([[[...[1]...]]]).Expected: no race conditions. Actual: a racecheck hazard on
shared_error.Sanitizer evidence
parse_standard_json (racecheck):
parse_json_lines (racecheck):
Cross-tool notes: iGUARD does not flag this (it targets global-memory cross-SM races, not shared-memory intra-block RMW); memcheck is N/A (race, not OOB); synccheck reports 0 errors.
Suggested fix
Replace the non-atomic updates with
atomicOrat all sites that setshared_error:Attachments
The following files from this finding are attached:
llm_harness_parse_standard_json.cuharness_afl_main.cpptrigger_input.binracecheck.logArchive.zip