feat(db): add NeuG graph database as optional backend with native Cypher support #698

Open
BingqingLyu wants to merge 7 commits into colbymchenry:main from BingqingLyu:feat/neug-backend

Conversation

@BingqingLyu

Summary

  • Optional NeuG graph database backend (codegraph init --backend neug) as a drop-in alternative to SQLite
  • All existing CLI/MCP functionality works unchanged on both backends
  • New codegraph cypher <query> CLI command and codegraph_cypher MCP tool expose native Cypher queries (NeuG only)
  • 67 integration tests covering every QueryBuilder method and Cypher query patterns
  • Uses published @graphscope-neug/neug package (platform binaries: macOS ARM64, Linux x64)

Motivation

Addresses #681.

CodeGraph currently stores its code knowledge graph in SQLite. While SQLite is proven and portable, using a relational database to simulate graph operations has inherent limitations:

  1. Multi-hop query performance: Graph traversals (callers, callees, impact) require one round of SQL queries per hop plus an application-level BFS, with B-tree index scans on every hop.
  2. No graph query language: Structural questions like "all paths from A to B" or "all classes implementing interface X with their methods" are not straightforward in SQL. They require recursive CTEs or multiple queries with application-level assembly.
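To make the first limitation concrete, here is a minimal sketch of the per-hop pattern described above. It is not CodeGraph's actual traversal code: the `Edge` type, `queryCallers`, and `transitiveCallers` are hypothetical names, and an in-memory filter stands in for each SQL round trip so the example runs on its own.

```typescript
// Hypothetical edge record; CodeGraph's real schema may differ.
type Edge = { source: string; target: string };

// Stand-in for ONE SQL round trip, roughly:
//   SELECT source FROM edges WHERE target IN (...)
// It filters an in-memory array so the sketch is self-contained.
function queryCallers(edges: Edge[], targets: string[]): string[] {
  const wanted = new Set(targets);
  return edges.filter((e) => wanted.has(e.target)).map((e) => e.source);
}

// Application-level BFS over callers: one "query" per hop.
function transitiveCallers(
  edges: Edge[],
  start: string,
  maxDepth: number
): { nodes: string[]; rounds: number } {
  const seen = new Set<string>([start]);
  let frontier: string[] = [start];
  let rounds = 0;
  while (frontier.length > 0 && rounds < maxDepth) {
    rounds += 1; // each loop iteration models one database round trip
    const next: string[] = [];
    for (const caller of queryCallers(edges, frontier)) {
      if (!seen.has(caller)) {
        seen.add(caller);
        next.push(caller);
      }
    }
    frontier = next;
  }
  seen.delete(start); // report only the callers, not the start node
  return { nodes: Array.from(seen), rounds };
}

// A call chain d -> c -> b -> a.
const sampleEdges: Edge[] = [
  { source: "b", target: "a" },
  { source: "c", target: "b" },
  { source: "d", target: "c" },
];
const bfsResult = transitiveCallers(sampleEdges, "a", 10);
```

With a graph query language, the same question could in principle be a single variable-length pattern such as `MATCH (caller)-[:CALLS*1..10]->(f) RETURN DISTINCT caller`. The `CALLS` relationship type is an assumption for illustration, not taken from this PR.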

NeuG addresses both with four key advantages:

  • High-performance graph storage — CSR-optimized adjacency traversal. Built on GraphScope Flex, which set the world record on the LDBC SNB Interactive benchmark (80,000+ QPS with declarative Cypher).
  • Industry-standard Cypher — Full Cypher query language support for declarative graph pattern matching, multi-hop traversals, aggregation, and more. Exposed via codegraph cypher CLI, codegraph_cypher MCP tool, and executeCypher() API.
  • Lightweight & embeddable — No external server process. Supports incremental updates, fitting CodeGraph's local-first architecture.
  • Extensible via native C++ extensions — Graph algorithms (Connected Components, PageRank, Louvain, etc.) are planned for upcoming releases, enabling advanced code analysis like community detection and influence ranking.

Test plan

  • 67 NeuG integration tests pass (all QueryBuilder methods + Cypher query verification)
  • All CLI commands verified: callers, callees, impact, cypher, status, index, sync

We're the NeuG team and happy to own this integration end-to-end — implementation, tests, and ongoing maintenance. Happy to discuss the approach.

BingqingLyu and others added 7 commits June 5, 2026 14:08
NeuG (embedded graph DB with Cypher) can now be used instead of SQLite:
  codegraph init --backend neug

Implements NeuGQueryBuilder with the same public API as QueryBuilder
(duck typing), so GraphTraverser, MCP tools, and CLI work unchanged.
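The duck-typing approach can be illustrated with TypeScript's structural typing. This is only a sketch: the two method names appear in this PR, but their signatures, the `GraphQueries` interface, and both stand-in classes are assumptions made for the example.

```typescript
// Hypothetical subset of the shared surface. Method names come from the PR;
// the signatures and return types are assumed for illustration.
interface GraphQueries {
  getAllNodeNames(): string[];
  findNodesByExactName(name: string): string[];
}

// Stand-in for the SQLite-backed builder (not the real QueryBuilder).
class SqliteLikeBuilder {
  private names = ["parse", "render"];
  getAllNodeNames(): string[] {
    return this.names.slice();
  }
  findNodesByExactName(name: string): string[] {
    return this.names.filter((n) => n === name);
  }
}

// Stand-in for the NeuG-backed builder. It does not extend or implement
// anything; it only has to expose methods of the same shape.
class NeugLikeBuilder {
  private names = new Set(["parse", "render"]);
  getAllNodeNames(): string[] {
    return Array.from(this.names);
  }
  findNodesByExactName(name: string): string[] {
    return this.names.has(name) ? [name] : [];
  }
}

// A caller written against the shape works with either backend unchanged.
function countMatches(db: GraphQueries, name: string): number {
  return db.findNodesByExactName(name).length;
}

const sqliteCount = countMatches(new SqliteLikeBuilder(), "parse");
const neugCount = countMatches(new NeugLikeBuilder(), "parse");
```

Neither class declares `implements GraphQueries`, yet both are accepted, which is what lets callers stay unchanged when the backend is swapped.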

Key details:
- MERGE-based upserts preserve edges (no DETACH DELETE+CREATE)
- Literal interpolation for CONTAINS and IN (NeuG 0.1.2 $param limitation)
- Standalone test suite (npm run test:neug) — vitest excluded due to
  glog double-init abort in worker threads (see neug-segv-repro.js)
- 32 integration tests against real NeuG binary, all passing
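Interpolating literals instead of binding parameters means the values must be escaped by hand. A minimal sketch of such escaping follows. The helper names `cypherString` and `cypherList` are hypothetical, and the sketch assumes backslash escapes inside single-quoted strings as in openCypher; whether NeuG 0.1.2 accepts exactly these escapes is not confirmed by this PR.

```typescript
// Quote a value as a single-quoted Cypher string literal.
// Assumes openCypher-style backslash escapes (an assumption for NeuG).
function cypherString(value: string): string {
  const escaped = value
    .replace(/\\/g, "\\\\") // escape backslashes first
    .replace(/'/g, "\\'"); // then escape single quotes
  return "'" + escaped + "'";
}

// Render a list literal for use with IN.
function cypherList(values: string[]): string {
  return "[" + values.map(cypherString).join(", ") + "]";
}

// Example queries; the `name` property is assumed for illustration.
const containsQuery =
  "MATCH (n) WHERE n.name CONTAINS " + cypherString("o'b") + " RETURN n";
const inQuery =
  "MATCH (n) WHERE n.name IN " + cypherList(["a", "b"]) + " RETURN n";
```

Escaping backslashes before quotes matters: doing it in the other order would double the backslash that was just added in front of each quote.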

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Implement 6 missing QueryBuilder methods (getNodeAndEdgeCount,
getDominantFile, getTopRouteFile, getRoutingManifest,
findNodesByExactName, findNodesByNameSubstring) so all existing
CLI/MCP features work on the NeuG backend. Add executeCypher()
for raw Cypher queries and expose it via `codegraph cypher <query>`.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Cover batch operations (insertNodes, insertEdges, updateNode), node
query methods (getNodesByName, getNodesByQualifiedNameExact,
getNodesByLowerName, getAllNodes, getAllNodeNames), file operations
(getStaleFiles), unresolved reference lifecycle (deleteUnresolvedByNode,
getUnresolvedByName, getUnresolvedReferencesBatch,
getUnresolvedReferencesByFiles, deleteResolvedReferences,
deleteSpecificResolvedReferences), status/routing methods
(getDominantFile, getTopRouteFile, getRoutingManifest), and graph
traversal (getCallees, getImpactRadius). 61 tests total.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Update README with architecture diagram, CLI reference, and a new
Graph Database Backend section covering NeuG advantages over SQLite
for graph operations. Remove SQLite hard-coding from MCP server
instructions. Add design doc recording motivation, architecture,
and implementation details.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Verify that all 3 Cypher examples from the upstream issue draft actually
run on NeuG 0.1.2: variable-length paths, multi-hop pattern matching,
and aggregation queries. Total test count: 61 → 67.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…e-neug/neug

Replace `neug: file:neug-nodejs-0.1.2-osx_arm64.tgz` with the published
`@graphscope-neug/neug@^0.1.2` package. Update all import references
from `'neug'` to `'@graphscope-neug/neug'` in source and tests. Also
fix NeuG link in design doc, refine README architecture diagram and
platform support text.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Expose NeuG Cypher query capability through the MCP server so agents
can execute arbitrary graph pattern matching. Returns tabular results,
errors gracefully on SQLite backend.
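The behavior described here (tabular output, a clean error on SQLite) could look roughly like the sketch below. It is not the PR's implementation: the `Backend` type, `formatTable`, and `runCypherTool` are hypothetical, and `executeCypher` is modeled as a synchronous call returning plain row objects, which may not match the real API.

```typescript
type Row = Record<string, unknown>;

// Hypothetical backend handle; the real executeCypher() signature may differ.
type Backend =
  | { kind: "sqlite" }
  | { kind: "neug"; executeCypher(query: string): Row[] };

// Render rows as a simple pipe-separated table, using the first row's keys
// as column headers.
function formatTable(rows: Row[]): string {
  if (rows.length === 0) return "(no rows)";
  const columns = Object.keys(rows[0]);
  const header = columns.join(" | ");
  const lines = rows.map((r) => columns.map((c) => String(r[c])).join(" | "));
  return [header].concat(lines).join("\n");
}

// Return a result object instead of throwing, so the caller can always reply.
function runCypherTool(
  backend: Backend,
  query: string
): { ok: boolean; text: string } {
  if (backend.kind !== "neug") {
    return { ok: false, text: "Cypher queries require the NeuG backend." };
  }
  try {
    return { ok: true, text: formatTable(backend.executeCypher(query)) };
  } catch (err) {
    return { ok: false, text: "Cypher query failed: " + String(err) };
  }
}

// Fake NeuG backend that returns one fixed row, for demonstration only.
const fakeNeug: Backend = {
  kind: "neug",
  executeCypher: () => [{ name: "a", calls: 2 }],
};
const okResult = runCypherTool(fakeNeug, "MATCH (n) RETURN n");
const sqliteResult = runCypherTool({ kind: "sqlite" }, "MATCH (n) RETURN n");
```

Returning `{ ok, text }` rather than throwing keeps the failure path identical for an unsupported backend and a failed query, which is one way to get the graceful behavior mentioned above.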

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@treytracedit-lab

Fix

