NULL pointer dereference in parse_json_lines (CWE-476)
Environment
- cuJSON commit: 2ac7d3dcd7ad1ff64ebdb14022bf94c59b3b4953 (branch master)
- OS: Ubuntu 22.04.5 LTS
- GPU: NVIDIA A40 (Ampere, sm_86)
- NVIDIA driver: 590.48.01
- CUDA toolkit: 13.1 (nvcc V13.1.115)
Component
cuJSON — JSON Lines parser, parse_json_lines(cuJSONInput) (parse_json_lines.h / parse_json_lines.cu)
Severity
Medium (CWE-476, NULL Pointer Dereference) — Denial of Service.
Description
parse_json_lines() dereferences input.data without checking that it is non-NULL when input.size > 0. Passing a cuJSONInput with data = nullptr and size = 100 causes an immediate SIGSEGV (a read from address 0x000000000000) that crashes the host process. The same class of bug likely also affects the input.chunks and input.chunkSizes array pointers when chunk-based input is used.
Root cause
parse_json_lines uses input.size to control loop bounds and dereferences input.data (or derived chunk pointers) without first verifying the data pointer is non-NULL:
cuJSONResult parse_json_lines(cuJSONInput input) {
    cuJSONResult result;
    memset(&result, 0, sizeof(result));
    // Checks input.size > 0 but NOT input.data != nullptr
    if (input.size > 0) {
        // Chunk processing derives pointers from input.data, e.g.:
        //   chunks[i] = input.data + offset;
        // Later dereferences without a null check:
        //   volatile uint8_t val = chunks[i][0];  // CRASH if input.data == NULL
    }
    return result;
}
The parser treats input.size > 0 as proof that input.data is valid. The CPU then attempts a read at address 0x0 (the unmapped zero page), which raises SIGSEGV. The register dump is consistent with this: rax = 0x0 (the NULL data pointer) and r12 = 0x64 (decimal 100, the input.size value).
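The check has to be made on the base pointer, before any chunk pointer is derived from it. With a NULL base, chunks[i] = input.data + offset yields small non-NULL values such as 0x10 for i > 0 (still inside the unmapped zero page), so a per-chunk == nullptr test on derived pointers would not catch them; in C++, adding a non-zero offset to a null pointer is also undefined behavior. The sketch below models the pattern without CUDA. InputModel, Status, and split_into_chunks are hypothetical names, not cuJSON's API, and the real chunking logic may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for cuJSONInput: only the two fields named in the report.
struct InputModel {
    const uint8_t* data;
    size_t size;
};

enum class Status { Ok, InvalidInput };

// Hypothetical chunking step modeled on "chunks[i] = input.data + offset".
// The base pointer is validated before any pointer is derived from it.
Status split_into_chunks(const InputModel& in, size_t chunk_size,
                         std::vector<const uint8_t*>& chunks) {
    chunks.clear();
    if (chunk_size == 0) return Status::InvalidInput;
    if (in.size > 0 && in.data == nullptr) return Status::InvalidInput;  // the missing check
    for (size_t offset = 0; offset < in.size; offset += chunk_size) {
        chunks.push_back(in.data + offset);  // safe: data is non-NULL whenever size > 0
    }
    return Status::Ok;  // size == 0 produces no chunks, even with a NULL base
}
```

With data = nullptr and size = 100 this returns InvalidInput instead of producing the pointers 0x0, 0x10, 0x20, ... that the vulnerable path would later dereference.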
Impact
| Category | Assessment |
|---|---|
| Denial of Service | Confirmed. Any caller passing data=NULL, size>0 triggers an immediate SIGSEGV, crashing the host process. |
| Memory Corruption | Not applicable. The read targets address 0x0, so the process terminates instead of silently corrupting memory. |
| Remote Code Execution | Not directly exploitable; modern Linux does not let unprivileged processes map the zero page (vm.mmap_min_addr). |
| Availability | High. Repeated exploitation crashes the service each time. |
Attack scenario: in a GPU-accelerated data-ingestion pipeline, an adversary supplies a malformed/empty dataset that yields cuJSONInput.data == NULL while size stays non-zero (e.g. due to metadata/header parsing). The parse_json_lines call crashes the whole process, including co-located GPU workloads sharing the address space.
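Until the library validates its input, a caller in the scenario above can refuse the bad combination before it reaches the parser. A minimal C++17 sketch, assuming the size declared by the untrusted metadata and the bytes actually received are available separately; make_input and InputModel are hypothetical names, and this is defense in depth rather than a substitute for the library-side fix.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for cuJSONInput (data and size only).
struct InputModel {
    const uint8_t* data;
    size_t size;
};

// Hypothetical caller-side guard: `declared_size` comes from untrusted metadata,
// `payload` holds the bytes actually received. Returns std::nullopt instead of
// forwarding a non-zero size that has no backing buffer.
std::optional<InputModel> make_input(const std::vector<uint8_t>& payload,
                                     size_t declared_size) {
    if (declared_size > payload.size()) return std::nullopt;  // metadata claims more than we have
    if (declared_size == 0) return InputModel{nullptr, 0};    // empty dataset: nothing to parse
    return InputModel{payload.data(), declared_size};         // non-NULL: payload is non-empty here
}
```

Comparing against payload.size() covers both the NULL-with-size case (an empty payload) and a declared size larger than the buffer.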
Reproduction
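The harness constructs the invalid input struct directly (minimal form):
// cuJSON headers as included by llm_harness_parse_json_lines.cu
int main() {
    cuJSONInput input = {};
    input.data = nullptr;
    input.size = 100;
    parse_json_lines(input);  // SEGV: read from address 0x0
    return 0;
}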
Build and run with ASan:
CUDA_HOME=${CUDA_HOME:-/usr/local/cuda}
nvcc -std=c++17 -arch=native -O0 -g \
-Xcompiler -fsanitize=address,-fno-omit-frame-pointer \
-I<cujson-src> -I${CUDA_HOME}/include \
llm_harness_parse_json_lines.cu \
--compiler-bindir g++-13 -L${CUDA_HOME}/lib64 -lcudart \
-Xlinker -fsanitize=address \
-o /tmp/cujson_parse_json_lines_asan.bin
ASAN_OPTIONS=protect_shadow_gap=0:detect_leaks=0:halt_on_error=1 \
/tmp/cujson_parse_json_lines_asan.bin
This finding is struct-field driven, so no external crash input is required; the harness constructs the invalid cuJSONInput internally.
Sanitizer evidence
AddressSanitizer:DEADLYSIGNAL
=================================================================
==1849784==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x55a5bcad7825 bp 0x7ffc687a95d0 sp 0x7ffc687a94f0 T0)
==1849784==The signal is caused by a READ memory access.
==1849784==Hint: address points to the zero page.
#0 0x55a5bcad7825 in parse_json_lines(cuJSONInput)
#1 0x55a5bcad7a6a in main
#2 0x7f948e48bd8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
==1849784==Register values:
rax = 0x0000000000000000 ... r11 = 0x0000000000000064
r12 = 0x0000000000000064 r13 = 0x0000000000000000 ...
SUMMARY: AddressSanitizer: SEGV in parse_json_lines(cuJSONInput)
==1849784==ABORTING
Suggested fix
Validate input pointers at the entry point of parse_json_lines before any data access:
 cuJSONResult parse_json_lines(cuJSONInput input) {
     cuJSONResult result;
     memset(&result, 0, sizeof(result));
+    // Reject NULL data with non-zero size
+    if (input.size > 0 && input.data == nullptr) {
+        result.error = CUJSON_ERROR_INVALID_INPUT;
+        result.errorMessage = "input.data is NULL with non-zero input.size";
+        return result;
+    }
+
+    // If chunk-based input is used, validate chunk arrays
+    if (input.numChunks > 0) {
+        if (input.chunks == nullptr || input.chunkSizes == nullptr) {
+            result.error = CUJSON_ERROR_INVALID_INPUT;
+            result.errorMessage = "input.chunks or input.chunkSizes is NULL with non-zero numChunks";
+            return result;
+        }
+        for (size_t i = 0; i < input.numChunks; i++) {
+            if (input.chunks[i] == nullptr && input.chunkSizes[i] > 0) {
+                result.error = CUJSON_ERROR_INVALID_INPUT;
+                result.errorMessage = "input.chunks[i] is NULL with non-zero chunkSizes[i]";
+                return result;
+            }
+        }
+    }
+
+    // Zero-size input is a valid no-op
+    if (input.size == 0) {
+        result.error = CUJSON_SUCCESS;
+        return result;
+    }
+
     // ... existing processing code ...
 }
Additional hardening: document the contract (input.data must be non-NULL when input.size > 0), add a debug assertion, and apply the same NULL-check pattern to all public API entry points accepting cuJSONInput.
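The hardening advice can be sketched as one shared validator called by every public entry point, with the debug assertion placed on the internal path that relies on the contract. All names here (InputModel, validate_input, count_lines, parse_entry) are hypothetical, count_lines is only a stand-in for the real processing, and the struct merely mirrors the fields used in the suggested fix, which may not match cuJSON's actual layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of the fields referenced in the suggested fix.
struct InputModel {
    const uint8_t* data;
    size_t size;
    const uint8_t* const* chunks;
    const size_t* chunkSizes;
    size_t numChunks;
};

// Shared validator: returns nullptr if the input satisfies the contract,
// otherwise a static error message.
const char* validate_input(const InputModel& in) {
    if (in.size > 0 && in.data == nullptr)
        return "input.data is NULL with non-zero input.size";
    if (in.numChunks > 0) {
        if (in.chunks == nullptr || in.chunkSizes == nullptr)
            return "input.chunks or input.chunkSizes is NULL with non-zero numChunks";
        for (size_t i = 0; i < in.numChunks; ++i)
            if (in.chunks[i] == nullptr && in.chunkSizes[i] > 0)
                return "input.chunks[i] is NULL with non-zero chunkSizes[i]";
    }
    return nullptr;
}

// Internal routine (stand-in for real processing): only reachable after
// validation, so the contract is asserted in debug builds, not re-checked.
static size_t count_lines(const InputModel& in) {
    assert(in.size == 0 || in.data != nullptr);
    size_t n = 0;
    for (size_t i = 0; i < in.size; ++i)
        n += (in.data[i] == '\n');
    return n;
}

// Hypothetical public entry point: rejects bad input at runtime, even with NDEBUG.
long parse_entry(const InputModel& in, const char** error) {
    const char* err = validate_input(in);
    if (error != nullptr) *error = err;
    if (err != nullptr) return -1;
    return static_cast<long>(count_lines(in));
}
```

The runtime check stays at the public boundary because assert() compiles away under NDEBUG; the assertion alone would not prevent the crash in a release build.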
Attachments
The following files from this finding are attached:
llm_harness_parse_json_lines.cu
harness_afl_main.cpp
asan.log
Archive.zip