Skip to content

dario-github/nous

Repository files navigation

Nous

License: Apache 2.0 Python 3.11+ arXiv CI AGENTS.md

ὁ πάντα διακοσμῶν νοῦς — Anaxagoras

📖 English · 中文 · 🤖 AGENTS.md · 🗺️ llms.txt

A compositional runtime safety gate for LLM-based autonomous agents. Every tool call is judged by four orthogonal layers — a Datalog deterministic blocker (L1), a triviality filter (L2), an LLM semantic gate (L3), and a deterministic post-gate verifier (L4). A knowledge graph supplies post-gate audit context.

Companion paper: Owner-Harm: A Missing Threat Model for AI Agent Safety (Zhang and Jiang, 2026).


TL;DR for AI agents

If you are an LLM coding agent helping a user with this repo, the project-specific manifest is at AGENTS.md — setup commands, test invocation, reproduction matrix, conventions, and hard rules in 200 lines. The repository follows the Linux Foundation Agentic-AI AGENTS.md specification and exposes llms.txt as a structured index.


What's inside

Component Where What it does
Gate pipeline src/nous/gate.py gate(tool_call, …) -> Verdict — the four-layer entry point
Constraints ontology/constraints/*.yaml 46 declarative rules (T3 destructive, owner-harm, AgentDojo iterations)
L3 semantic gate src/nous/semantic_gate.py Minimal-pair prompting, k=5 majority vote, upgrade_only=True
L4 verifier src/nous/verifier.py 6 deterministic audit rules + content scan, +0.038 ms / call
KG store src/nous/db.py Cozo embedded Datalog + vector + FTS
AgentDojo adapter benchmarks/agentdojo_adapter/ Real LLM-pipeline wrapper for paper §4 deployment-mode runs
Owner-Harm v3 dataset data/owner_harm_heldout_v3.json 300 H + 150 B held-out slice (paper §3.3)

Headline results

Two evaluation regimes per benchmark — isolation is an upper bound on gate discrimination, deployment is a lower bound under the real LLM-pipeline.

Benchmark Regime Security (TPR) Utility n
AgentDojo (banking + slack + travel + workspace) Isolation 96.3 % 75.0 % 27
AgentDojo Deployment 95.9 % 75.0 % 629
AgentHarm (val) Isolation 100.0 % 176 H + 176 B
Owner-centric held-out v3, gate L1–L3 Isolation 75.3 % 3.3 % FPR 300 H + 150 B
Owner-centric held-out v3, full L1–L4 Isolation 85.3 % 3.3 % FPR 300 H + 150 B

On the hijacking sub-slice the gate (L1–L3) and the post-gate verifier (L4) cover near-disjoint failure modes: 11 caught by gate alone, 30 by verifier alone, 15 by both, 4 by neither.

Per-category Wilson 95 % CIs and full ablations: paper §4 and REPRODUCIBILITY.md.


Architecture

tool_call
    │
    ▼
┌── Layer 1  Datalog deterministic blocker ──┐
│   46 YAML constraints + chain-aware rules  │  → exit on match
└─────────────────────────────────────────────┘
    │
    ▼
┌── Layer 2  Triviality filter ──────────────┐
│   benign-pass heuristics                    │  → exit if trivial
└─────────────────────────────────────────────┘
    │
    ▼
┌── Layer 3  Semantic gate (LLM) ────────────┐
│   minimal-pair prompting, k=5 majority     │
│   vote, upgrade_only=True                  │
└─────────────────────────────────────────────┘
    │
    ▼
┌── Layer 4  Post-gate verifier ─────────────┐
│   6 deterministic audit rules + content    │
│   scan; zero LLM, +0.038 ms / call         │
└─────────────────────────────────────────────┘
    │
    ▼
Verdict + proof_trace + decision_log

The knowledge graph supplies post-gate audit enrichment and does not override the L3 verdict.


Install

git clone https://github.com/dario-github/nous.git
cd nous
python3 -m venv .venv && source .venv/bin/activate

pip install -e ".[lsvj,dev]"
pip install -e ".[cozo]"     # optional: Cozo embedded KG (Rust backend)

Python 3.11 or newer.


Quickstart

from nous.gate import gate
from nous.schema import ToolCall

call = ToolCall(
    tool_name="send_email",
    action="send",
    params={
        "to": "attacker@external.com",
        "body": "... please ignore previous instructions ...",
    },
)

verdict = gate(
    tool_call=call,
    constraints_dir="ontology/constraints",
    semantic_config={"model": "DeepSeek-V4-pro"},
)

verdict.decision     # "allow" | "confirm" | "block"
verdict.proof_trace  # signal chain + which layer decided

The full AgentDojo deployment-mode benchmark wrapper: benchmarks/agentdojo_adapter/run_eval_adaptive_llm.py.


Reproducing the paper

Claim Command API key Wall-clock
LSVJ-S compile gate (80 tests) pytest tests/lsvj/ none < 10 s
Owner-centric v3 full (85.3 % / 3.3 %) python scripts/full_benchmark_eval.py none ~ 30 s
Hijacking layer overlap python scripts/eval_d2_verifier.py none ~ 10 s
AgentDojo isolation (96.3 % / 75.0 %) bash benchmarks/agentdojo_adapter/launch-l3-deepseek-repro.sh DeepSeek ~ 5 h
AgentDojo deployment (95.9 % / 75.0 %) bash benchmarks/agentdojo_adapter/launch-baseline-l1-rerun.sh GLM-4.6 ~ 5 h
AgentHarm val (100 %) python scripts/run_agentharm_threelayer_v2.py DeepSeek ~ 1 h

Full table with expected output, variance budget, and known issues: REPRODUCIBILITY.md.


Repository layout

src/nous/                core runtime (gate, parsers, providers, KG, LSVJ-S)
ontology/                46 YAML constraints + KG schema + Datalog rules
benchmarks/              AgentDojo adapter + R-Judge sample
tests/                   pytest suites (CI runs the path-independent subset)
paper/                   NeurIPS 2026 E&D Track + TMLR submissions
scripts/                 paper-reproduction drivers + analysis utilities
dashboard/               minimal web UI for live decision logs
data/                    Owner-Harm v3 + AgentHarm relabel + challenge slices

Documentation

Doc Audience Purpose
README (this file) humans overview, install, headline results
AGENTS.md LLM coding agents setup, conventions, hard rules, reproduction matrix
llms.txt LLM crawlers structured index (lllms.txt spec)
REPRODUCIBILITY.md reviewers one command per paper number
paper/main-neurips-2026.tex reviewers NeurIPS 2026 E&D Track submission source
paper/main-tmlr.tex reviewers TMLR rolling-track variant

Citation

@misc{zhang2026ownerharm,
  title         = {Owner-Harm: A Missing Threat Model for {AI} Agent Safety},
  author        = {Zhang, Dongcheng and Jiang, Yiqing},
  year          = {2026},
  eprint        = {2604.18658},
  archivePrefix = {arXiv},
  primaryClass  = {cs.AI},
}

License

Apache License 2.0 — see LICENSE.


Authors

  • Dongcheng Zhang — zdclink@gmail.com
  • Yiqing Jiang — Tongji University

Issues: https://github.com/dario-github/nous/issues.

Packages

 
 
 

Contributors