Data race on `shared_error` in `validate_expand_MathAPI_new2` / `validate_expand`

# Data race on `shared_error` in `validate_expand_MathAPI_new2` / `validate_expand`

## Environment

- cuJSON commit: `2ac7d3dcd7ad1ff64ebdb14022bf94c59b3b4953` (branch `master`)
- OS: Ubuntu 22.04.5 LTS
- GPU: NVIDIA A40 (Ampere, `sm_86`)
- NVIDIA driver: 590.48.01
- CUDA toolkit: 13.1 (`nvcc` V13.1.115)

## Component

cuJSON — `parse_standard_json.cu`, `parse_json_lines.cu`

## Severity

Medium (CWE-362, Concurrent Execution Using Shared Resource with Improper Synchronization) — correctness issue with potential security implications.

## Description

The bracket-validation kernels `validate_expand_MathAPI_new2` (parse_standard_json.cu) and `validate_expand` (parse_json_lines.cu) contain a shared-memory data race on the `shared_error` flag. The code uses a non-atomic `shared_error |= 1` instead of `atomicOr(&shared_error, 1u)`, creating a read/write race between threads in the same block.

The bug is deterministic: any JSON/JSONL input with bracket structures triggers it, and deeper nesting increases the number of concurrent threads and the probability of an observable effect.

## Root cause

Both kernels use a shared-memory error flag:

```cpp
__shared__ uint32_t shared_error;
// ...
shared_error |= 1; // non-atomic read-modify-write
```

Multiple threads in the same warp/block execute `shared_error |= 1` concurrently. `|=` is a non-atomic read-modify-write (read, OR, write back). For two concurrent threads:

1. Thread A reads `shared_error` = 0
2. Thread B reads `shared_error` = 0
3. Thread A writes `shared_error` = 1
4. Thread B writes `shared_error` = 1 (lost update)

For `|= 1` the lost update is benign (same value), but it is still undefined behavior in the CUDA memory model. More importantly, the race also exists between the error-setting writes and subsequent reads of `shared_error` (read at offset +0x260 vs write at +0x3d0), so a thread can observe stale state and miss an error condition. The earlier kernel variant `validate_expand_MathAPI_new` correctly used `atomicOr(&shared_error, 1)`.

Affected locations: `validate_expand_MathAPI_new2` (parse_standard_json.cu, ~lines 1491–1542) and `validate_expand` (parse_json_lines.cu, ~line 1009).

## Impact

- Correctness: a thread can read `shared_error` before another thread's write is visible, potentially missing a bracket-pairing error and allowing malformed JSON to pass validation.
- Security: if `pairError` is not set when it should be, the parser continues with incorrectly paired brackets. Downstream iterators that trust `pair_pos` validity can then produce incorrect results or trigger out-of-bounds accesses.

## Reproduction

Build the harness and run racecheck:

```bash
CUDA_HOME=${CUDA_HOME:-/usr/local/cuda}
nvcc -std=c++17 -arch=native -O2 -g -lineinfo \
  -I<cujson-src> -I${CUDA_HOME}/include \
  llm_harness_parse_standard_json.cu harness_afl_main.cpp \
  --compiler-bindir g++-13 -L${CUDA_HOME}/lib64 -lcudart \
  -o /tmp/cujson_validate_expand_racecheck.bin

compute-sanitizer --tool=racecheck --report-api-errors=no --error-exitcode 99 \
  /tmp/cujson_validate_expand_racecheck.bin trigger_input.bin
```

`trigger_input.bin` is 201 bytes: 100 nested `[` brackets, then `1`, then 100 `]` brackets (`[[[...[1]...]]]`).

Expected: no race conditions. Actual: a racecheck hazard on `shared_error`.

## Sanitizer evidence

parse_standard_json (racecheck):

```
Warning: Race between Read at validate_expand_MathAPI_new2+0x260
  and Write at validate_expand_MathAPI_new2+0x3d0 [4 hazards]
  and Write at validate_expand_MathAPI_new2+0x820 [4 hazards]

Error: Race between Write at validate_expand_MathAPI_new2+0x3d0
  and Read at validate_expand_MathAPI_new2+0x770 [4 hazards]

RACECHECK SUMMARY: 2 hazards (1 error, 1 warning)
```

parse_json_lines (racecheck):

```
Error: Race between Read at validate_expand+0x260
  and Write at validate_expand+0x3d0 [12 hazards]

RACECHECK SUMMARY: 1 hazard (1 error, 0 warnings)
```

Cross-tool notes: iGUARD does not flag this (it targets global-memory cross-SM races, not shared-memory intra-block RMW); memcheck is N/A (race, not OOB); synccheck reports 0 errors.

## Suggested fix

Replace the non-atomic updates with `atomicOr` at all sites that set `shared_error`:

```cpp
// validate_expand_MathAPI_new2 (parse_standard_json.cu, ~5 locations):
atomicOr(&shared_error, 1u);

// validate_expand (parse_json_lines.cu, ~1 location):
atomicOr(&shared_error, 1u);
```

## Attachments

The following files from this finding are attached:

- `llm_harness_parse_standard_json.cu`
- `harness_afl_main.cpp`
- `trigger_input.bin`
- `racecheck.log`

[Archive.zip](https://github.com/user-attachments/files/28676147/Archive.zip)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Data race on `shared_error` in `validate_expand_MathAPI_new2` / `validate_expand` #1

Data race on `shared_error` in `validate_expand_MathAPI_new2` / `validate_expand`

Environment

Component

Severity

Description

Root cause

Impact

Reproduction

Sanitizer evidence

Suggested fix

Attachments

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Data race on shared_error in validate_expand_MathAPI_new2 / validate_expand #1

Description

Data race on shared_error in validate_expand_MathAPI_new2 / validate_expand

Environment

Component

Severity

Description

Root cause

Impact

Reproduction

Sanitizer evidence

Suggested fix

Attachments

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions

Data race on `shared_error` in `validate_expand_MathAPI_new2` / `validate_expand` #1

Data race on `shared_error` in `validate_expand_MathAPI_new2` / `validate_expand`