NULL pointer dereference in parse_json_lines (CWE-476) #2

@TarekIbnZiad

Environment

  • cuJSON commit: 2ac7d3dcd7ad1ff64ebdb14022bf94c59b3b4953 (branch master)
  • OS: Ubuntu 22.04.5 LTS
  • GPU: NVIDIA A40 (Ampere, sm_86)
  • NVIDIA driver: 590.48.01
  • CUDA toolkit: 13.1 (nvcc V13.1.115)

Component

cuJSON — JSON Lines parser, parse_json_lines(cuJSONInput) (parse_json_lines.h / parse_json_lines.cu)

Severity

Medium (CWE-476, NULL Pointer Dereference) — Denial of Service.

Description

parse_json_lines() dereferences input.data without validating it is non-NULL when input.size > 0. Passing a cuJSONInput with data = nullptr and size = 100 causes an immediate SIGSEGV (read from address 0x000000000000), crashing the host process. The same class of bug likely applies to the input.chunks and input.chunkSizes array pointers when chunk-based input is used.

Root cause

parse_json_lines uses input.size to control loop bounds and dereferences input.data (or derived chunk pointers) without first verifying the data pointer is non-NULL:

cuJSONResult parse_json_lines(cuJSONInput input) {
  cuJSONResult result;
  memset(&result, 0, sizeof(result));

  // Checks input.size > 0 but NOT input.data != nullptr
  if (input.size > 0) {
    // Chunk processing derives pointers from input.data, e.g.:
    //   chunks[i] = input.data + offset;
    // Later dereferences without a null check:
    //   volatile uint8_t val = chunks[i][0]; // CRASH if input.data == NULL
  }
  return result;
}

The parser trusts input.size > 0 as an indicator that input.data is valid. The CPU attempts a read at address 0x0 (the zero page, unmapped), producing the SIGSEGV. The register dump confirms rax = 0x0 (NULL data pointer) and r12 = 0x64 (decimal 100, the input.size value).

Impact

Category and assessment:

  • Denial of Service: Confirmed. Any caller passing data=NULL, size>0 triggers an immediate SIGSEGV, crashing the host process.
  • Memory Corruption: Not applicable. The access is a read from address 0x0, which terminates the process rather than silently corrupting memory.
  • Remote Code Execution: Not directly exploitable. Unprivileged processes cannot map the zero page on modern Linux while vm.mmap_min_addr is non-zero (the usual default).
  • Availability: High. Each repeated trigger crashes the service again.

Attack scenario: in a GPU-accelerated data-ingestion pipeline, an adversary supplies a malformed or empty dataset that leaves cuJSONInput.data == NULL while size stays non-zero (e.g. because size is taken from metadata or a header). The parse_json_lines call then crashes the whole process, including any other GPU workloads running in the same address space.

Reproduction

The harness constructs the invalid input struct directly:

cuJSONInput input = {};
input.data = nullptr;
input.size = 100;
parse_json_lines(input); // SEGV

Build and run with ASan:

CUDA_HOME=${CUDA_HOME:-/usr/local/cuda}
nvcc -std=c++17 -arch=native -O0 -g \
  -Xcompiler -fsanitize=address,-fno-omit-frame-pointer \
  -I<cujson-src> -I${CUDA_HOME}/include \
  llm_harness_parse_json_lines.cu \
  --compiler-bindir g++-13 -L${CUDA_HOME}/lib64 -lcudart \
  -Xlinker -fsanitize=address \
  -o /tmp/cujson_parse_json_lines_asan.bin

ASAN_OPTIONS=protect_shadow_gap=0:detect_leaks=0:halt_on_error=1 \
  /tmp/cujson_parse_json_lines_asan.bin

This finding is struct-field driven, so no external crash input is required; the harness constructs the invalid cuJSONInput internally.
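
For a regression test that should not take down the test runner, the crash can be observed from a forked child instead. The following is a minimal POSIX-only sketch, not the attached harness: InputSketch, unguarded_first_byte, guarded_first_byte, and survives_in_child are hypothetical names, and the two *_first_byte functions merely stand in for the unpatched and patched parser.

```cpp
#include <cstddef>
#include <cstdint>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical stand-in for cuJSONInput; only the two fields used in the
// reproduction are modeled, and the real struct may differ.
struct InputSketch {
    const std::uint8_t* data = nullptr;
    std::size_t size = 0;
};

// Stand-in for the unpatched parser: trusts size > 0 and reads the first byte.
int unguarded_first_byte(const InputSketch& in) {
    if (in.size == 0) return -1;
    const volatile std::uint8_t* p = in.data;  // volatile keeps the load in place
    return p[0];                               // faults when in.data is NULL
}

// Stand-in for the patched parser: rejects NULL data before any access.
int guarded_first_byte(const InputSketch& in) {
    if (in.size == 0 || in.data == nullptr) return -1;
    return in.data[0];
}

// Runs fn(in) in a forked child and reports whether the child exited
// normally with status 0. A SIGSEGV (or a sanitizer abort) yields false.
bool survives_in_child(int (*fn)(const InputSketch&), const InputSketch& in) {
    pid_t pid = fork();
    if (pid < 0) return false;  // fork failed; treat as "not verified"
    if (pid == 0) {
        fn(in);
        _exit(0);               // reached only if fn did not crash
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return false;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

With data = nullptr and size = 100, survives_in_child is expected to return false for the unguarded stand-in and true for the guarded one; the same wrapper could be pointed at the real parse_json_lines through a small adapter.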

Sanitizer evidence

AddressSanitizer:DEADLYSIGNAL
=================================================================
==1849784==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x55a5bcad7825 bp 0x7ffc687a95d0 sp 0x7ffc687a94f0 T0)
==1849784==The signal is caused by a READ memory access.
==1849784==Hint: address points to the zero page.
  #0 0x55a5bcad7825 in parse_json_lines(cuJSONInput)
  #1 0x55a5bcad7a6a in main
  #2 0x7f948e48bd8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58

==1849784==Register values:
rax = 0x0000000000000000  ...  r11 = 0x0000000000000064
r12 = 0x0000000000000064  r13 = 0x0000000000000000  ...
SUMMARY: AddressSanitizer: SEGV in parse_json_lines(cuJSONInput)
==1849784==ABORTING

Suggested fix

Validate input pointers at the entry point of parse_json_lines before any data access:

 cuJSONResult parse_json_lines(cuJSONInput input) {
   cuJSONResult result;
   memset(&result, 0, sizeof(result));

+  // Reject NULL data with non-zero size
+  if (input.size > 0 && input.data == nullptr) {
+    result.error = CUJSON_ERROR_INVALID_INPUT;
+    result.errorMessage = "input.data is NULL with non-zero input.size";
+    return result;
+  }
+
+  // If chunk-based input is used, validate chunk arrays
+  if (input.numChunks > 0) {
+    if (input.chunks == nullptr || input.chunkSizes == nullptr) {
+      result.error = CUJSON_ERROR_INVALID_INPUT;
+      result.errorMessage = "input.chunks or input.chunkSizes is NULL with non-zero numChunks";
+      return result;
+    }
+    for (size_t i = 0; i < input.numChunks; i++) {
+      if (input.chunks[i] == nullptr && input.chunkSizes[i] > 0) {
+        result.error = CUJSON_ERROR_INVALID_INPUT;
+        result.errorMessage = "input.chunks[i] is NULL with non-zero chunkSizes[i]";
+        return result;
+      }
+    }
+  }
+
+  // Zero-size input is a valid no-op
+  if (input.size == 0) {
+    result.error = CUJSON_SUCCESS;
+    return result;
+  }
+
   // ... existing processing code ...
 }

Additional hardening: document the contract (input.data must be non-NULL when input.size > 0), add a debug assertion, and apply the same NULL-check pattern to all public API entry points accepting cuJSONInput.
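
To apply the same check at every public entry point, the validation could be factored into one helper. This is only a sketch under assumptions: InputSketch mirrors the field names used in this report (data, size, chunks, chunkSizes, numChunks) but is not the library's definition, and InputStatus and validate_input are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for cuJSONInput; field names follow this report,
// the actual types in cuJSON may differ.
struct InputSketch {
    const std::uint8_t*  data       = nullptr;
    std::size_t          size       = 0;
    const std::uint8_t** chunks     = nullptr;
    const std::size_t*   chunkSizes = nullptr;
    std::size_t          numChunks  = 0;
};

// Hypothetical status codes; a real entry point would map any non-Ok value
// to its own error code.
enum class InputStatus { Ok, NullData, NullChunkArray, NullChunk };

// Returns Ok only if every pointer paired with a non-zero length is non-NULL.
inline InputStatus validate_input(const InputSketch& in) {
    if (in.size > 0 && in.data == nullptr) {
        return InputStatus::NullData;
    }
    if (in.numChunks > 0) {
        if (in.chunks == nullptr || in.chunkSizes == nullptr) {
            return InputStatus::NullChunkArray;
        }
        for (std::size_t i = 0; i < in.numChunks; ++i) {
            if (in.chunks[i] == nullptr && in.chunkSizes[i] > 0) {
                return InputStatus::NullChunk;
            }
        }
    }
    return InputStatus::Ok;  // zero-size input is also accepted here
}
```

Each entry point accepting cuJSONInput could call such a helper first and return CUJSON_ERROR_INVALID_INPUT on any result other than Ok, which keeps the contract in one place instead of repeating the checks per function.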

Attachments

The following files from this finding are attached:

  • llm_harness_parse_json_lines.cu
  • harness_afl_main.cpp
  • asan.log

Archive.zip
