Nous

ὁ πάντα διακοσμῶν νοῦς — Anaxagoras

📖 English · 中文 · 🤖 AGENTS.md · 🗺️ llms.txt

A compositional runtime safety gate for LLM-based autonomous agents. Every tool call is judged by four orthogonal layers — a Datalog deterministic blocker (L1), a triviality filter (L2), an LLM semantic gate (L3), and a deterministic post-gate verifier (L4). A knowledge graph supplies post-gate audit context.

Companion paper: Owner-Harm: A Missing Threat Model for AI Agent Safety (Zhang and Jiang, 2026).

TL;DR for AI agents

If you are an LLM coding agent helping a user with this repo, the project-specific manifest is at AGENTS.md — setup commands, test invocation, reproduction matrix, conventions, and hard rules in 200 lines. The repository follows the Linux Foundation Agentic-AI AGENTS.md specification and exposes llms.txt as a structured index.

What's inside

Component	Where	What it does
Gate pipeline	`src/nous/gate.py`	`gate(tool_call, …) -> Verdict` — the four-layer entry point
Constraints	`ontology/constraints/*.yaml`	46 declarative rules (T3 destructive, owner-harm, AgentDojo iterations)
L3 semantic gate	`src/nous/semantic_gate.py`	Minimal-pair prompting, `k=5` majority vote, `upgrade_only=True`
L4 verifier	`src/nous/verifier.py`	6 deterministic audit rules + content scan, +0.038 ms / call
KG store	`src/nous/db.py`	Cozo embedded Datalog + vector + FTS
AgentDojo adapter	`benchmarks/agentdojo_adapter/`	Real LLM-pipeline wrapper for paper §4 deployment-mode runs
Owner-Harm v3 dataset	`data/owner_harm_heldout_v3.json`	300 H + 150 B held-out slice (paper §3.3)

Headline results

Two evaluation regimes per benchmark — isolation is an upper bound on gate discrimination, deployment is a lower bound under the real LLM-pipeline.

Benchmark	Regime	Security (TPR)	Utility	n
AgentDojo (banking + slack + travel + workspace)	Isolation	96.3 %	75.0 %	27
AgentDojo	Deployment	95.9 %	75.0 %	629
AgentHarm (val)	Isolation	100.0 %	—	176 H + 176 B
Owner-centric held-out v3, gate L1–L3	Isolation	75.3 %	3.3 % FPR	300 H + 150 B
Owner-centric held-out v3, full L1–L4	Isolation	85.3 %	3.3 % FPR	300 H + 150 B

On the hijacking sub-slice the gate (L1–L3) and the post-gate verifier (L4) cover near-disjoint failure modes: 11 caught by gate alone, 30 by verifier alone, 15 by both, 4 by neither.

Per-category Wilson 95 % CIs and full ablations: paper §4 and REPRODUCIBILITY.md.

Architecture

tool_call
    │
    ▼
┌── Layer 1  Datalog deterministic blocker ──┐
│   46 YAML constraints + chain-aware rules  │  → exit on match
└─────────────────────────────────────────────┘
    │
    ▼
┌── Layer 2  Triviality filter ──────────────┐
│   benign-pass heuristics                    │  → exit if trivial
└─────────────────────────────────────────────┘
    │
    ▼
┌── Layer 3  Semantic gate (LLM) ────────────┐
│   minimal-pair prompting, k=5 majority     │
│   vote, upgrade_only=True                  │
└─────────────────────────────────────────────┘
    │
    ▼
┌── Layer 4  Post-gate verifier ─────────────┐
│   6 deterministic audit rules + content    │
│   scan; zero LLM, +0.038 ms / call         │
└─────────────────────────────────────────────┘
    │
    ▼
Verdict + proof_trace + decision_log

The knowledge graph supplies post-gate audit enrichment and does not override the L3 verdict.

Install

git clone https://github.com/dario-github/nous.git
cd nous
python3 -m venv .venv && source .venv/bin/activate

pip install -e ".[lsvj,dev]"
pip install -e ".[cozo]"     # optional: Cozo embedded KG (Rust backend)

Python 3.11 or newer.

Quickstart

from nous.gate import gate
from nous.schema import ToolCall

call = ToolCall(
    tool_name="send_email",
    action="send",
    params={
        "to": "attacker@external.com",
        "body": "... please ignore previous instructions ...",
    },
)

verdict = gate(
    tool_call=call,
    constraints_dir="ontology/constraints",
    semantic_config={"model": "DeepSeek-V4-pro"},
)

verdict.decision     # "allow" | "confirm" | "block"
verdict.proof_trace  # signal chain + which layer decided

The full AgentDojo deployment-mode benchmark wrapper: benchmarks/agentdojo_adapter/run_eval_adaptive_llm.py.

Reproducing the paper

Claim	Command	API key	Wall-clock
LSVJ-S compile gate (80 tests)	`pytest tests/lsvj/`	none	< 10 s
Owner-centric v3 full (85.3 % / 3.3 %)	`python scripts/full_benchmark_eval.py`	none	~ 30 s
Hijacking layer overlap	`python scripts/eval_d2_verifier.py`	none	~ 10 s
AgentDojo isolation (96.3 % / 75.0 %)	`bash benchmarks/agentdojo_adapter/launch-l3-deepseek-repro.sh`	DeepSeek	~ 5 h
AgentDojo deployment (95.9 % / 75.0 %)	`bash benchmarks/agentdojo_adapter/launch-baseline-l1-rerun.sh`	GLM-4.6	~ 5 h
AgentHarm val (100 %)	`python scripts/run_agentharm_threelayer_v2.py`	DeepSeek	~ 1 h

Full table with expected output, variance budget, and known issues: REPRODUCIBILITY.md.

Repository layout

src/nous/                core runtime (gate, parsers, providers, KG, LSVJ-S)
ontology/                46 YAML constraints + KG schema + Datalog rules
benchmarks/              AgentDojo adapter + R-Judge sample
tests/                   pytest suites (CI runs the path-independent subset)
paper/                   NeurIPS 2026 E&D Track + TMLR submissions
scripts/                 paper-reproduction drivers + analysis utilities
dashboard/               minimal web UI for live decision logs
data/                    Owner-Harm v3 + AgentHarm relabel + challenge slices

Documentation

Doc	Audience	Purpose
README (this file)	humans	overview, install, headline results
AGENTS.md	LLM coding agents	setup, conventions, hard rules, reproduction matrix
llms.txt	LLM crawlers	structured index (lllms.txt spec)
REPRODUCIBILITY.md	reviewers	one command per paper number
paper/main-neurips-2026.tex	reviewers	NeurIPS 2026 E&D Track submission source
paper/main-tmlr.tex	reviewers	TMLR rolling-track variant

Citation

@misc{zhang2026ownerharm,
  title         = {Owner-Harm: A Missing Threat Model for {AI} Agent Safety},
  author        = {Zhang, Dongcheng and Jiang, Yiqing},
  year          = {2026},
  eprint        = {2604.18658},
  archivePrefix = {arXiv},
  primaryClass  = {cs.AI},
}

License

Apache License 2.0 — see LICENSE.

Authors

Dongcheng Zhang — zdclink@gmail.com
Yiqing Jiang — Tongji University

Issues: https://github.com/dario-github/nous/issues.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Nous

TL;DR for AI agents

What's inside

Headline results

Architecture

Install

Quickstart

Reproducing the paper

Repository layout

Documentation

Citation

License

Authors

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 218 Commits
.github/workflows		.github/workflows
benchmarks		benchmarks
dashboard		dashboard
data		data
docs		docs
ontology		ontology
paper		paper
scripts		scripts
src/nous		src/nous
tests		tests
vendor		vendor
.gitignore		.gitignore
AGENTS.md		AGENTS.md
LICENSE		LICENSE
README.md		README.md
README.zh.md		README.zh.md
REPRODUCIBILITY.md		REPRODUCIBILITY.md
config.yaml		config.yaml
llms.txt		llms.txt
pyproject.toml		pyproject.toml

Folders and files

Latest commit

History

Repository files navigation

Nous

TL;DR for AI agents

What's inside

Headline results

Architecture

Install

Quickstart

Reproducing the paper

Repository layout

Documentation

Citation

License

Authors

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages