feat: #201 Config data class — Phases 1-5 (substrate + taxonomy memory list + lazy detail + codex hardening) by hanwencheng · Pull Request #205 · litentry/agentKeys

hanwencheng · 2026-06-05T13:43:06Z

Resolves #201 (the config-driven memory list). Phases 1–5 land here, plus the codex adversarial-review hardening. Phase 0 (the cap layer) landed in #200.

What landed

Phase 1 — infra (operator runs on AWS; idempotent mirrors of the memory scripts)

scripts/provision-config-bucket.sh, scripts/provision-config-role.sh, scripts/apply-config-bucket-policy.sh — own $CONFIG_BUCKET + agentkeys-config-role, config/ prefix, split-statement v3 bucket policy (s3:prefix=bots/${PrincipalTag}/config/*), OIDC agentkeys_actor_omni PrincipalTag (reused).
CONFIG_BUCKET / CONFIG_ROLE_ARN + WORKER_CONFIG_HOST / AGENTKEYS_WORKER_CONFIG_URL in scripts/operator-workstation.env.
Wired into scripts/setup-cloud.sh step 13 (per-data-class provisioning).

Phase 2 — config worker (master-only)

New agentkeys-worker-config crate — mirror of agentkeys-worker-memory: config/ S3 prefix, $CONFIG_BUCKET, AGENTKEYS_CONFIG_KEK_HEX, DataClass::Config, port 9096. /v1/config/{put,get,teardown}.
Full scripts/setup-broker-host.sh wiring: build/install lists, worker-config.env (KEK auto-gen + preserve), systemd unit, nginx vhost, stop/backup lists, firewall + certbot domain list, post-install summary. Registered in the workspace Cargo.toml.

Phase 3 — isolation tests (test-discipline rule, all four layers)

harness/v2-stage3-demo.sh steps 19–21: step 19 config creds write their own config/ prefix (200) but are AccessDenied at the memory + vault buckets (and memory creds → config bucket AccessDenied) — per-data-class layers 3+4; steps 20–21 the cap data-class-mismatch (config cap → memory + cred workers; memory + cred cap → config worker → cap_data_class_mismatch). All master-self → run on the operator (no sandbox defer); skip cleanly until the config infra is provisioned + the broker redeployed. Cleanup shifted 19 → 22; STEP_TOTAL=22.

Source-of-truth updates (landed with the code)

docs/arch.md: §5 canonical names, §17.2 role list, §17.3 (Planned → Landed substrate), §17.5 cap-binding table + four-layer table + Config endpoints, storage diagram.
CLAUDE.md: per-data-class four-layer table (layers 3+4), six cap endpoints, "the third data class — config — has landed" note (generalized for a fourth).
docs/operator-runbook-harness.md + harness/CLAUDE.md (keep-docs-in-sync rule), plan doc phase status.

Verification

agentkeys-worker-config: dev + release build + unit tests green (S3-key prefix isolation tests).
cargo check --workspace clean (all 17 crates, incl. the broker with the #200 config routes).
All bash scripts syntax-clean (bash -n); config scripts structurally identical to the memory originals.

Phases 4–5 + codex hardening (landed on this branch)

Phase 4 — daemon (ui_bridge.rs): reads/writes the memory-types taxonomy via the Config data class (--config-url / --config-role-arn); GET /v1/master/memory → categories from the taxonomy (no decrypt, cache fallback); new lazy GET /v1/master/memory/entry?ns=&key=; plant writes per-namespace JSON arrays. CLI hook memory-inject renders the array (single-body still injects). Harness memory-plant-demo.sh + web-parity-demo.sh write/pass the new shape.

Phase 5 — frontend (apps/parent-control): lists categories, decrypts a namespace's entries on demand, plant re-fetches categories.

Codex adversarial-review hardening:

Finding 1 (data loss) — the plant is now a read-modify-write merge under a plant_lock: it reads the durable memory:<ns> blob first, merges (durable entries never dropped), and aborts rather than overwrites on a transient read error — closing the restart-stale-cache / concurrent-plant last-writer-wins window.
Finding 2 (silent failure) — the memory + config workers return 404 (not 502) on NoSuchKey, so the daemon distinguishes "never written" from a real failure; GET /v1/master/memory 502s on a configured-but-broken Config instead of masking it as empty; plant returns an explicit taxonomy_status.
5 new daemon unit tests (merge preserve/dedup/same-key, list-502-on-partial-config, taxonomy_status).

Deploy note: touching the workers (the 404 behavior) requires a bash scripts/setup-broker-host.sh --ref main redeploy.

Remaining operator one-shots (runtime, not code)

Phase 1 on AWS — bash scripts/setup-cloud.sh (prod) / --ci (test): provisions the config bucket + IAM role + bucket policy. Idempotent; one command.
Phase 2 broker redeploy — bash scripts/setup-broker-host.sh --ref main: brings the config worker live + the worker-404 behavior.
Phase 3 green run — stage-3 steps 19–21 skip (config-role-missing / scope-not-set) until 1+2 are deployed in the test env, then run green. CI now tolerates the config skip instead of crashing on the unbound env var.

Refs: #178 (classifier-service), #191 (W3 master-self memory), #200 (Phase 0 cap layer).

🤖 Generated with Claude Code

…r + isolation tests (Phases 1-3)

Stand up the DataClass::Config substrate end-to-end (Phases 1-3 of the config-driven memory-list plan; Phase 0 cap layer landed in #200). The visible daemon/frontend behavior (Phases 4-5) is a follow-up, gated on the operator deploying this (per the issue's dependency chain 4 -> 0,1,2).

Phase 1: infra (idempotent mirrors of the memory scripts):
- scripts/provision-config-bucket.sh, provision-config-role.sh, apply-config-bucket-policy.sh (config/ prefix, own bucket + role per arch.md §17.2; split-statement v3 bucket policy)
- CONFIG_BUCKET / CONFIG_ROLE_ARN + config worker host/URL in operator-workstation.env; wired into setup-cloud.sh step 13

Phase 2: config worker (master-only):
- new agentkeys-worker-config crate (mirror of agentkeys-worker-memory; config/ S3 prefix, $CONFIG_BUCKET, AGENTKEYS_CONFIG_KEK_HEX, DataClass::Config, :9096)
- full setup-broker-host.sh wiring (build/install/env/systemd/nginx/firewall/certbot/post-install summary)

Phase 3: isolation tests (test-discipline rule):
- harness/v2-stage3-demo.sh steps 19-21: config layer-3/4 (own-prefix write OK + cross-bucket AccessDenied) + cap data-class-mismatch (config<->memory, config<->cred). All master-self -> run on the operator, no sandbox defer; skip cleanly until the operator provisions config infra + redeploys the broker.

Source-of-truth updates: arch.md (§5 canonical names, §17.2/.3/.5, four-layer table, storage diagram), CLAUDE.md (per-data-class table + six cap endpoints + 'third data class landed'), operator-runbook-harness.md, harness/CLAUDE.md, plan doc.

Verified: config worker dev+release build + unit tests green; cargo check --workspace clean (all 17 crates); all bash scripts syntax-clean.

…reachable post_cross_class folded curl's stderr into the returned code via 2>&1, so an UNDEPLOYED worker (e.g. config.litentry.org before the broker redeploy) yielded rc="curl: (35) SSL_ERROR_SYSCALL...\n000000" instead of a clean "000". That no longer matched master_cross_class_rejection's 000|502|503|504) case and fell through to die — turning the intended graceful prereq_missing (config-worker-unreachable) at stage-3 step 21 into a hard failure. Send curl's transport error to a side file so rc is just the 3-digit %{http_code} (000 on transport failure), and surface that error as the body for diagnostics. Also hardens steps 14-15 (same helper) — clean rc + diagnostic body. Verified: repro against the unreachable config.litentry.org returns clean 000 -> prereq_missing fires; bash -n clean.
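The fix described in this commit can be sketched as follows. This is a minimal illustration, not the harness's actual helper: `probe_status` and `classify_status` are hypothetical names, and the 10-second timeout is an assumption. It shows curl's stderr going to a side file so that the captured value is only the 3-digit `%{http_code}`.

```bash
#!/usr/bin/env bash
# Sketch only: probe_status and classify_status are hypothetical names,
# not the harness's post_cross_class / master_cross_class_rejection.

# Print the 3-digit HTTP status for URL. The response body (or, on a
# transport failure, curl's own error text) is written to BODY_OUT.
probe_status() {
  local url="$1" body_out="$2" err_file rc
  err_file="$(mktemp)"
  # stderr goes to a side file, so it can never be folded into rc.
  # curl already prints 000 for %{http_code} when no response arrived,
  # so the fallback is '|| true' (an '|| echo 000' would yield "000000").
  rc="$(curl -sS -o "$body_out" -w '%{http_code}' --max-time 10 "$url" 2>"$err_file")" || true
  if [ "$rc" = "000" ]; then
    cat "$err_file" > "$body_out"   # surface the transport error as the body
  fi
  rm -f "$err_file"
  printf '%s\n' "$rc"
}

# Map a status code onto the two outcomes the commit message describes.
classify_status() {
  case "$1" in
    000|502|503|504) echo "prereq_missing" ;;  # worker not deployed or unreachable
    *)               echo "reachable" ;;
  esac
}
```

A caller would capture the code with something like `rc="$(probe_status "$url" "$body_file")"` and branch on `classify_status "$rc"`, reading `$body_file` only for diagnostics.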

…litentry.org A record)

The config worker host was added to operator-workstation.env but NOT to the two DNS provisioning paths nor the worker health-check, so config.litentry.org never got an A record → unreachable (the stage-3 step-21 SSL_ERROR_SYSCALL). Add WORKER_CONFIG_HOST everywhere the four original workers are enumerated:
- scripts/setup-cloud.sh do_step_6 (the PRIMARY DNS path; its own change-batch, not a delegate): + config A record + env validation (8 A records / 14 UPSERTs).
- scripts/dns-upsert-workers.sh (the standalone re-UPSERT path): + config in the sanity loop, change-batch, plan printout, DoH verify loop, and certbot next-steps.
- scripts/verify-workers.sh: + config:/healthz ("ok":true), All 5 workers green.
- operator-workstation.env: comment now says five workers incl. config.

Verified: bash -n clean on all three; setup-cloud change-batch builds 14 records; dns-upsert change-batch valid JSON.

…workers.sh (single source of truth)

Wire the config-worker setup fully into the idempotent orchestrator so nobody runs DNS by hand, and kill the dual-maintenance drift that left config.litentry.org without an A record (two hardcoded worker lists: setup-cloud step 6 + dns-upsert).
- dns-upsert-workers.sh: new --no-verify (UPSERT then exit, skipping the INSYNC/DoH wait + operator next-steps printout) for orchestrator use.
- setup-cloud.sh step 6: keep DKIM/MX/TXT + broker/signer/mcp inline (9 records); DELEGATE the 5 service-worker A records (audit/email/cred/memory/config) to dns-upsert-workers.sh --eip $EIP --no-verify (honors --dry-run + the same ENV_FILE so the prod/test split carries through). One source of truth → a new worker can never again be added to one list but not the other.
- The 3 config provision scripts were already delegated in step 13 (no change).
- cloud-bootstrap.md: config.litentry.org added to the certbot recipe (+ explicit one-shot form), the --config-host flag, the DNS A-record list, the worker-subdomain table, the per-worker env-file glob, the build/nginx/test-subdomain references.

Verified: bash -n clean on all three; setup-cloud inline batch builds 9 records; dns-upsert --no-verify parses + early-exits; cloud-bootstrap certbot loop includes CONFIG_HOST.

…re-deploys + branch switches)

The broker host redeploys often and switches branches via --ref. cargo already caches deps in $REPO_ROOT/target (we never clean on the happy path), but a git checkout -f rewrites changed files' mtimes → cargo re-fingerprints + rebuilds them, and a cold/wiped target/ recompiles the whole aws-sdk/tokio tree.

Add sccache: a CONTENT-addressed compiler cache keyed on each crate's actual inputs (not mtime/branch/target state), persisted in $SCCACHE_DIR independent of target/. Identical inputs hit the cache regardless of branch or a cold target/.
- setup_build_cache(): installs sccache (prebuilt musl binary, arch-detected → cargo install fallback → skip), exports RUSTC_WRAPPER + SCCACHE_DIR, starts the server. Best-effort + idempotent + NON-FATAL (deploy proceeds with plain cargo if install fails). Opt out: AGENTKEYS_NO_SCCACHE=1; pin: SCCACHE_VERSION=vX.Y.Z.
- Prints 'sccache stats' after the worker build as visible proof (re-deploys = mostly cache hits).
- cloud-bootstrap.md documents the cache + the opt-out.

Verified: bash -n clean. Note: this does NOT change what gets built; my earlier #201 commits were all shell/docs (zero Rust), so a re-run that only pulls them recompiles nothing.
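A best-effort, non-fatal cache setup of the kind described here might look like the sketch below. It is an assumption-laden outline rather than the script's real setup_build_cache(): the install step (prebuilt binary, cargo install fallback) is omitted entirely, the default cache directory is a guess, and the status messages are invented for illustration.

```bash
#!/usr/bin/env bash
# Sketch of a non-fatal sccache setup. The install logic of the real
# setup_build_cache() is omitted; messages and the default dir are assumptions.
setup_build_cache() {
  # Explicit opt-out, as named in the commit message.
  if [ "${AGENTKEYS_NO_SCCACHE:-0}" = "1" ]; then
    echo "sccache: opted out"
    return 0
  fi
  local bin
  # Not installed: keep going with plain cargo instead of failing the deploy.
  if ! bin="$(command -v sccache)"; then
    echo "sccache: not installed, continuing with plain cargo"
    return 0
  fi
  export RUSTC_WRAPPER="$bin"
  export SCCACHE_DIR="${SCCACHE_DIR:-$HOME/.cache/sccache}"  # assumed default
  sccache --start-server >/dev/null 2>&1 || true  # an already-running server is fine
  echo "sccache: enabled (cache in $SCCACHE_DIR)"
}
```

After the build, `sccache --show-stats` prints hit and miss counts, which is one way to confirm that a re-deploy was served mostly from the cache.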

… a VPN'd laptop) + ACME pre-check

Operator hit a certbot 'unauthorized … 404' on config.litentry.org because certbot --webroot was run on a local box (behind a VPN): the challenge file landed there, but Let's Encrypt validates against the hostname's PUBLIC IP = the broker, which had no such file. The nginx 1.28.3 (VPN proxy) vs 1.24.0 (broker) version split in the 404 pages was the tell.

Fold-back to §5b so the next operator can't repeat it:
- Loud '⚠️ run EVERY command ON THE BROKER HOST' callout explaining the --webroot-writes-local vs CA-validates-public-IP mechanism + the WARP/Zscaler interception trap (laptop curl of <host> hits the VPN's nginx, not the broker).
- A cheap local ACME pre-check (nginx reload + probe file + curl localhost with Host header) BEFORE the certbot loop. A freshly-added worker (config) needs a reload; 'nginx -T' showing the vhost does NOT mean the running process loaded it.
- New troubleshooting entry for the exact 'unauthorized … 404' error covering both causes (wrong host; vhost not reloaded).

Docs only; fences balanced.

… (not 'first associated EIP') Root cause of the config (and all-worker) cert failures: dns-upsert-workers.sh derived the EIP via `describe-addresses | first`, which can't distinguish the PROD broker EIP from the TEST broker EIP when both are allocated. It silently grabbed the test EIP (3.214.219.209) and pointed all 5 worker A records at the test broker, while broker/signer stayed on prod (54.164.117.252). Let's Encrypt then validated config.litentry.org against the test box (404). Derive the workers' EIP from BROKER_HOST's OWN Route 53 A record instead — the workers co-locate with the broker, so their records MUST mirror it. This is env-aware (BROKER_HOST is broker.${ZONE} for prod vs test-broker.${ZONE} for test) and authoritative. Add a co-location guard that warns when the chosen/passed EIP disagrees with the broker's A record (catches a prod/test mixup early). cloud-bootstrap.md §5b gains a troubleshooting entry for 'worker cert fails but broker works' with a DoH cross-check loop. Verified live (--dry-run against the real zone): derives 54.164.117.252 and sets all 5 worker records to it; bash -n clean.

…t), matching setup-cloud step 4

Prod and the CI/test broker are SEPARATE machines with SEPARATE EIPs. The previous fix derived from broker.${ZONE}'s A record (works for prod, but chicken-egg on a fresh test box + a different mechanism than the bootstrap). Switch to the SAME tag-based, TEST_MODE-aware derivation setup-cloud.sh step 4 uses (one source of truth):

  prod → describe-addresses --filters Name=tag:Name,Values=agentkeys-broker-eip
  test → ...Values=agentkeys-broker-eip-test (--test, or a *test* ENV_FILE)

- New --test flag + auto-detect from a *test* ENV_FILE (switches to operator-workstation.test.env), mirroring setup-cloud.
- Keep the broker-A-record co-location cross-check as a warn-only guard.

Verified live (--dry-run): prod → 54.164.117.252 (tag agentkeys-broker-eip); --test → 3.214.219.209 (tag agentkeys-broker-eip-test). bash -n clean.
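The tag-based derivation can be sketched roughly as below. The function names and the mode argument are hypothetical; only the two tag values and the `Name=tag:Name` filter come from the commit message. The `--query`/`--output` arguments are standard AWS CLI options chosen here for illustration, not necessarily what the script uses.

```bash
#!/usr/bin/env bash
# Sketch: pick the broker EIP by its Name tag instead of "first match".
# eip_tag_for_mode / derive_broker_eip are hypothetical names.

# Map the environment mode onto the EIP Name tag.
eip_tag_for_mode() {
  case "${1:-prod}" in
    prod)    echo "agentkeys-broker-eip" ;;
    test|ci) echo "agentkeys-broker-eip-test" ;;
    *)       echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

# Look up the public IP of the EIP carrying that tag.
derive_broker_eip() {
  local tag
  tag="$(eip_tag_for_mode "${1:-prod}")" || return 1
  aws ec2 describe-addresses \
    --filters "Name=tag:Name,Values=${tag}" \
    --query 'Addresses[0].PublicIp' --output text
}
```

Because the tag filter narrows the result to one environment's address, a second allocated EIP in the same account cannot be picked up by accident.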

…test = separate EIPs) Two broker EC2 instances exist with separate EIPs, distinguished by the EIP Name tag (agentkeys-broker-eip vs agentkeys-broker-eip-test). 'describe-addresses first-match' silently picks the wrong one — it pointed all 5 worker A records at the test broker while broker/signer were on prod (multi-round LE 404s). New AWS- gotchas subsection: never first-match; derive by the env-aware tag (setup-cloud step 4 / dns-upsert-workers.sh), curl ifconfig.me on the host, DoH-cross-check workers == broker for DNS.

…real slow-rebuild cause)

setup-broker-host.sh deleted /root/.cargo + /root/.rustup at the END of every run (~1.5GB reclaim). So every re-deploy re-downloaded the WHOLE rustup toolchain + all 372 crate sources: minutes of pure waste (target/ persists in the repo dir, which is why the compile itself was only ~50s, but the toolchain+registry did not).
- KEEP the toolchain by default; gate the delete behind a new --reclaim-toolchain flag (pass it on a final/one-shot deploy to free the disk).
- Pre-source $HOME/.cargo/env in the build-prereqs step so a kept toolchain is on PATH on a non-login sudo shell; otherwise `have rustup` is false and it reinstalls anyway even with /root/.cargo present.
- Header usage + post-run NOTE updated to reflect keep-by-default.

Combined with the sccache change (86d18be), re-deploys now skip toolchain DL + crate-registry DL + most recompilation. bash -n clean.

…nv flag

Per the deploy-script governance: there are exactly THREE idempotent deployment orchestrators (setup-cloud.sh / setup-broker-host.sh / setup-heima.sh); every other mutation is wired into one of them. Codify it in CLAUDE.md + standardise the environment flag.
- Add --ci (canonical CI-env flag; --test retained as alias) to all 3 entry points + dns-upsert-workers.sh. Plain run = local/prod; --ci = CI (selects the agentkeys-broker-eip-test EIP, -test IAM/buckets, *.test.env).
- CLAUDE.md: new 'Three idempotent deployment entry points' section (ownership table, flag convention, HARD wire-in rule, exempt list). Verified mcp-host is already wired into setup-broker-host (#152 re-converge); setup-dev-env is a dev-workstation bootstrap (exempt, not a deploy).

Verified: bash -n clean; --ci --dry-run derives the test EIP (3.214.219.209).

… ENV_FILE passthrough + cloud-bootstrap --ci

Two real test-mode bugs + doc drift, found while fitting the scripts to cloud-bootstrap.md:
- operator-workstation.test.env was MISSING the entire config (#201) data class (CONFIG_ROLE_ARN / CONFIG_BUCKET / WORKER_CONFIG_HOST / *_URL), so setup-cloud.sh --ci / setup-broker-host.sh --ci would die on the WORKER_CONFIG_HOST validation. Added the -test trio (agentkeys-config-role-test, agentkeys-config-test-<acct>, config-test.litentry.org).
- setup-cloud.sh step 13 called provision/apply-*.sh WITHOUT ENV_FILE; each re-sources operator-workstation.env (prod) and overwrites inherited CONFIG_BUCKET, so --ci would silently provision PROD buckets. Now passes ENV_FILE through (DRY loop) → -test buckets.
- cloud-bootstrap.md: --test → --ci (alias noted) in quick-start; added config bucket to 'what --ci derives'; corrected the stale 'toolchain deleted each run' note to the new keep-by-default + --reclaim-toolchain behavior; called out prod vs CI = separate EIPs.

Verified: test env config trio resolves; setup-cloud bash -n clean.
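The ENV_FILE passthrough bug can be reproduced in miniature. Everything in this sketch is a stand-in: the bucket names, the temp directory, and the `provision-helper.sh` script are hypothetical, and the helper only imitates the "re-source the env file, defaulting to prod" behavior the commit message attributes to the provision/apply scripts.

```bash
#!/usr/bin/env bash
set -eu
unset ENV_FILE   # start from a clean slate for the demonstration

workdir="$(mktemp -d)"
# Hypothetical env files with placeholder bucket names.
printf 'CONFIG_BUCKET=example-config-prod\n' > "$workdir/operator-workstation.env"
printf 'CONFIG_BUCKET=example-config-test\n' > "$workdir/operator-workstation.test.env"

# Stand-in for a provision helper: it re-sources ENV_FILE and falls back to
# the prod env file, overwriting any CONFIG_BUCKET it inherited.
cat > "$workdir/provision-helper.sh" <<'EOF'
#!/usr/bin/env bash
set -eu
here="$(cd "$(dirname "$0")" && pwd)"
. "${ENV_FILE:-$here/operator-workstation.env}"
echo "would provision: $CONFIG_BUCKET"
EOF

# The orchestrator has already resolved the test bucket in its own shell...
export CONFIG_BUCKET=example-config-test

# ...but without ENV_FILE the helper re-sources prod and targets the prod bucket.
without_env="$(bash "$workdir/provision-helper.sh")"
# Passing ENV_FILE through keeps the helper on the test resources.
with_env="$(ENV_FILE="$workdir/operator-workstation.test.env" bash "$workdir/provision-helper.sh")"

echo "$without_env"
echo "$with_env"
rm -rf "$workdir"
```

The first call reports the prod bucket even though the caller exported the test one, which is the silent prod-provisioning failure mode; the second call reports the test bucket.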

…ata classes This session's two test-mode bugs were systemic, not one-offs — fold them into the #90 isolation section's data-class checklist so the next data-class-adder can't repeat: 1. a new data class MUST be added to BOTH operator-workstation.env AND .test.env (.test.env is not auto-derived; a prod-only key breaks the whole --ci path). 2. setup-cloud.sh delegation MUST pass ENV_FILE to provision/apply helpers (they re-source prod env + overwrite inherited $BUCKET, so --ci would hit prod buckets). Includes the verify step (setup-cloud.sh --ci --dry-run must name -test resources).

…tring, not %s args) The Totals summary printed literal \033[1;32m… because the C_* color vars (literal "\033[…" strings) were passed as printf %s ARGS — printf only interprets \033 in the FORMAT string, not in args. Moved the colors into the format string, matching the ${C_*}-in-format pattern used everywhere else. TTY-gated defs unchanged, so non-TTY/CI runs stay plain. Verified via cat -v (^[ = real ESC); bash -n clean.
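The printf behavior behind this fix is easy to see in isolation. In the sketch below, the variable names follow the `C_*` pattern mentioned above, but the function names and the message text are made up for illustration.

```bash
#!/usr/bin/env bash
# Color variables hold LITERAL backslash sequences, as in the commit message.
C_GREEN='\033[1;32m'
C_RESET='\033[0m'

# Broken: the escapes are passed as %s arguments, so printf copies them verbatim.
totals_broken() {
  printf '%sTotals: %s%s\n' "$C_GREEN" "$1" "$C_RESET"
}

# Fixed: the escapes sit in the format string, where printf turns \033 into ESC.
totals_fixed() {
  printf "${C_GREEN}Totals: %s${C_RESET}\n" "$1"
}
```

Piping either function through `cat -v` makes the difference visible: the fixed variant shows `^[` where a real ESC byte was emitted, while the broken one shows the backslash sequence as plain text.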

…lazy detail + codex hardening

Phase 4 (daemon): read/write the memory-types taxonomy via the Config data class (--config-url/--config-role-arn); GET /v1/master/memory returns categories from the taxonomy (no decrypt, cache fallback); new lazy GET /v1/master/memory/entry?ns=&key=; plant writes per-namespace JSON arrays. CLI hook memory-inject renders the array (single-body still injects). harness memory-plant-demo + web-parity write/pass the new shape.

Phase 5 (frontend): apps/parent-control lists categories, decrypts a namespace's entries on demand; plant re-fetches categories.

Codex adversarial-review hardening:
- finding 1 (data loss): plant is now a read-modify-write merge under a plant_lock (durable blob preserved; abort-on-read-error, never overwrite).
- finding 2 (silent failure): memory+config workers return 404 on NoSuchKey; list 502s on a configured-but-broken Config; plant returns taxonomy_status.

Workers changed → requires a setup-broker-host.sh redeploy for the 404 behavior.

…config-role-missing

harness-e2e crashed at stage-3 step 19 with `CONFIG_ROLE_ARN: unbound variable`: the CI env-materializer (harness-ci.yml) never emitted the config data-class keys Phase 3 added to the stage-3 demo, and the demo runs under `set -u`.
- harness-ci.yml: materialize CONFIG_BUCKET / CONFIG_ROLE_ARN / AGENTKEYS_WORKER_CONFIG_URL (derived -test values, no new secret); allow the config-role-missing skip (operator one-shot, like scope-not-set) so step 19 skips cleanly until the test config bucket/role are provisioned. Steps 20-21 (config cap-mismatch) still run against the deployed config worker.
- v2-stage3-demo.sh: default the config vars to empty after sourcing the env file → degrade via prereq_missing instead of an unbound-variable abort.
- CLAUDE.md: fold the materializer into the env-file discipline (3rd place a new data class's keys must land).
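Defaulting the config variables so that `set -u` no longer aborts can be sketched as follows. The gate function and its output strings are hypothetical; only the three variable names and the config-role-missing skip reason appear in the commit message.

```bash
#!/usr/bin/env bash
set -u

# The real demo sources its env file at this point (omitted in this sketch).

# Default each config key to empty, so a key the env file never defined
# degrades to a skip instead of an "unbound variable" abort.
: "${CONFIG_BUCKET:=}"
: "${CONFIG_ROLE_ARN:=}"
: "${AGENTKEYS_WORKER_CONFIG_URL:=}"

# Hypothetical gate: name the skip reason when the role is not provisioned.
config_step_gate() {
  if [ -z "$CONFIG_ROLE_ARN" ]; then
    echo "prereq_missing: config-role-missing"
  else
    echo "run"
  fi
}
```

The `: "${VAR:=}"` form assigns only when the variable is unset or empty, so values that the env file did provide are left untouched.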

…e wiki

Add the product/onboarding view of the classifier design (#178) on top of the landed Config substrate (#201): the two config-init entry points (default preset + NL->COMPILE), connect-time classifier auto-distribution of cred + memory scopes (one pattern, two axes), the four security invariants, and the resolved decisions tracked in #207 (telemetry split to #208).
- docs/plan/web-flow/onboarding-classifier-distribution.md (new spec)
- docs/wiki/policy-scope-namespace.md (new terminology reference, lint-clean)
- docs/arch.md section 5 canonical-names row (policy/scope/namespace/category/service)
- docs/plan/classifier-service.md cross-links

stage-3 step 21 hits the config worker HTTPS endpoint (config-test.<zone>), whose cert can't issue until the config-test DNS record is provisioned by the operator one-shot (setup-cloud.sh --ci) — the SAME one-shot already tolerated via config-role-missing. Add config-worker-unreachable to the stage-3 allow-skip so CI skips step 21 cleanly until the test config infra exists; step 20 + the agentkeys-worker-config unit tests still cover the config cap-data-class-mismatch. harness/CLAUDE.md already documents steps 19-21 as 'skip until config infra is provisioned/deployed'. Drop the allowance once config-test is provisioned.

… combine, not just resolve

#205 (issue #201) landed a THIRD data class (Config): /v1/cap/config-{store,fetch} + an agentkeys-worker-config worker + a hand-rolled daemon config/per-ns-memory chain. #204 (#203) made agentkeys-backend-client the ONE owner of the broker/worker protocol. Rather than let the two coexist as parallel hand-rolled vs crate-owned chains, this merge folds #205's new surface INTO the #203 single-owner model.

Conflicts resolved (2 files):
- ui_bridge.rs: adopt #205's per-namespace storage model wholesale (memory_put_ns_real / memory_get_ns_real / RMW-under-plant-lock / real_config_ctx); my per-entry memory_put_real + real_memory_client are SUPERSEDED, dropped. Kept my route consts (MASTER_MEMORY_{,PLANT_}ROUTE) + the plant-contract unit test, and #205's new /v1/master/memory/entry route. Swapped #205's inline 0x-normalize in the shared resolve_session_coords for the crate's normalize_omni_0x.
- memory-plant-demo.sh: keep #205's per-ns JSON-array blob + my @backend-fixture annotation.

Combine (#203 applied to #205's surface):
- crate: CapMintOp gains ConfigStore/ConfigFetch (6 cap endpoints now); add ConfigPutBody/ConfigGetBody + fixtures (regenerated, now 6).
- daemon mint_master_cap → BackendClient::cap_mint (the cap-mint body, the #200 drift locus, is now the crate's BrokerCapRequest for memory AND config; one function covers all 4 routes). Worker put/get bodies (memory + config) build from the crate's MemoryPutBody/MemoryGetBody/ConfigPutBody/ConfigGetBody types; the raw POST stays in the daemon to reuse the once-minted STS creds across namespaces. Re-added agentkeys-provisioner to the daemon (still used for that STS mint).
- gate: config_put/config_get fixtures are pass-1-annotatable but EXCLUDED from pass-2 auto-detect (key-set-identical to cred bodies → would false-positive); documented in the gate + the fixtures README. #205's bash bodies (4-key ttl-omitted cap + ambiguous cred/config worker bodies) don't trip pass-2.
- docs: arch.md tree gains agentkeys-worker-config + updated backend-client note; root CLAUDE.md #203 rule updated for the 6 endpoints + config body types.

Verified: cargo build + clippy -D warnings + cargo test --workspace all clean (0 failures; plant-contract + config frozen tests pass); backend + web-api drift gates + fixture --check pass under LC_ALL=C.UTF-8; bash -n clean on all touched scripts.

…mo structure #204 (#203 backend-client refactor) merged after #205 and restructured harness-e2e from three v2-stage{1,2,3}-demo.sh steps into one v2-demo.sh --ci orchestrator — which already carried forward the config-worker-unreachable allow-skip. Re-apply the codex-review guard (PR #210) onto that new structure: a self-dissolving Guard step after the v2-demo run that warns while config-test is unprovisioned (#209) and FAILS once it becomes reachable, so the step-21 isolation skip can't silently persist. Resolves the #210<->main conflict; the --allow-skip invocation is unchanged from main.

…mo structure (#210) #204 (#203 backend-client refactor) merged after #205 and restructured harness-e2e from three v2-stage{1,2,3}-demo.sh steps into one v2-demo.sh --ci orchestrator — which already carried forward the config-worker-unreachable allow-skip. Re-apply the codex-review guard (PR #210) onto that new structure: a self-dissolving Guard step after the v2-demo run that warns while config-test is unprovisioned (#209) and FAILS once it becomes reachable, so the step-21 isolation skip can't silently persist. Resolves the #210<->main conflict; the --allow-skip invocation is unchanged from main.

…6/7/8) (#212)

* feat: #207 onboarding + classifier auto-distribution (items 1A/2/3/5/6/7/8)

Productionizes #207 onboarding + classifier-driven auto-distribution on the #205 Config substrate. All items except 1B (NL→COMPILE UI).
- 1A: ~10 bundled presets + GET/POST /v1/master/config/{presets,init} author config/memory-taxonomy.enc (master-self read-modify-write MERGE; a later plant never clobbers it); onboarding gains a 'set up categories' step.
- 2/3/6: agentkeys-worker-classify (COMPUTE gate, COMPILE+TAG, no S3), CapOp::Classify + /v1/cap/classify (data-class-bound), agentkeys-catalog (entity→category + per-category sensitivity floor + signed vendor overlays bounded by the floor). Deploy-wired: setup-broker-host.sh, dns-upsert-workers.sh, env files (prod/test/CI).
- 5/7/8: daemon classify bridge (--classify-url → cap-gated worker TAG, local catalog tier-0 fallback), /v1/master/classify/{tag,propose}, /v1/actors/:id/scope/grant; AutoDistributePanel (propose→confirm, sensitivity-tiered: safe auto / sensitive K11).
- stage-3 demo step 22: classifier-worker isolation negatives (op + data-class mismatch), skip-until-deployed.

Docs synced (arch.md, CLAUDE.md, harness/CLAUDE.md, runbook). Determinism guardrail held: catalog hashmap lookups, no LLM on the gate hot path. Connect-time auto-distribution at agent pairing tracked in #211.

Tests: daemon 99, catalog 7, worker-classify 6, broker 36; frontend tsc + next build green.

* style: cargo fmt the #207 crates (catalog, worker-classify, ui_bridge)

Pure rustfmt: whitespace only, no logic change. Satisfies the CI cargo-fmt gate.

* fix: #207 CI step-22 skip + init graceful-degrade + general (non-memory-only) onboarding narrative

Three fixes:
1. CI (harness step 22): curl '... || echo 000' DOUBLED %{http_code} to '000000' on the undeployed classify worker (TLS/connect failure), missing the 'case 000|502|503|504)' skip → spurious die → phase-3 FAIL. Mirror the config helper: send curl stderr to a side file, use '|| true' (curl already prints %{http_code}=000). Now skips cleanly via classify-worker-unavailable.
2. init 502 (user-reported): a degraded config worker (S3 GetObject 502) made reconcile_taxonomy's read-before-write abort init with a cryptic 502. Now init DEGRADES: it authors into the in-memory mirror with a loud 'cached-degraded: <reason>' status (never silently 'ok') instead of hard-failing; the master-memory list shows the in-memory authored taxonomy on a durable error when local authored data exists (else still 502; #201 finding 2 preserved for the empty case).
3. Narrative: the init page said 'memory categories', but the taxonomy is general (memory + credentials + future data classes). Reworded the onboarding setup step, the memory-page setup, and the user manual to 'category taxonomy: what an agent can access (memory, credentials, …)'; surfaced the degraded status in the UI.

daemon 100 tests · frontend tsc+build · fmt+clippy clean · harness bash -n ok.

* fix: #207 revert the init degrade: real durable data or fail loud (no in-memory compromise)

The previous 'cached-degraded' fallback was wrong: it masked a broken Config store behind an in-memory stand-in, violating the all-real-data principle (#201 finding-2). Reverted to real-data-only:
- init (configured path): a config worker failure (unreachable / S3 error) is now a HARD 502 with an actionable message, with NO in-memory fallback. We author real durable data or fail loud so the operator fixes the Config data class. The ONLY in-memory path left is an explicit dev daemon started WITHOUT --config-url ('cached', labelled 'dev only, not durable').
- list (resolve_categories): a configured-but-broken Config 502s again (finding-2), never masked behind in-memory data.
- config worker: s3_get/s3_put now surface the REAL S3 error (AccessDenied / NoSuchBucket / region) via ProvideErrorMetadata instead of a generic 'service error', so the broken store is diagnosable.
- frontend + user manual: dropped the 'degraded/saved-locally' wording; a config-worker failure shows the real error, and the dev-only in-memory path is clearly labelled.

daemon 100 + worker-config tests · frontend tsc+build · fmt+clippy clean.

* feat: #207 onboarding init flow: progress ceremony, jump-to-app + sticky toast, idempotent re-onboard

Three onboarding flow fixes (init is multi-second: cap-mint → STS → config worker → S3):
1. PROGRESS BAR: the setup step now runs a CeremonyRunner (Read profile → Compile taxonomy → Encrypt+store to Config → Index+audit), the real init fires as the slow step's awaited action, so the bar reflects the true duration (no more frozen 'authoring…' button).
2. JUMP TO APP: on success it goes straight to the main page (no dead 'Enter agentKeys' button). A STICKY toast (no auto-dismiss, with an × dismiss) carries the next step: 'N categories authored · Next: connect an agent (Pairing tab)'.
3. IDEMPOTENT RE-ONBOARD: on entering setup, probe listMemoryCategories: if a taxonomy already exists, skip straight in (never re-author / re-prompt). The daemon init is already data-idempotent (reconcile_taxonomy MERGES, never clobbers) and writes ONLY config/memory-taxonomy.enc, never the memory:<ns> plant blobs, so planted data is never deleted. New test init_preserves_a_pre_existing_planted_namespace proves it.

daemon 101 tests · frontend tsc+build · fmt+clippy clean.

* fix(harness): #207 no leaked demo memory + shrink phase-6 parity to a thin wiring smoke

1. (memory hygiene) Onboarding never plants memory (init authors the TAXONOMY only; dev_seed has no caller; --ui-bridge-seed-* seeds the session, not memory). The 'already planted' was the harness leaking durable S3 memory:
- memory-plant-demo.sh: plant into DEDICATED demo-* namespaces (never the real travel/personal/family) + an EXIT-trap cleanup that deletes exactly those blobs on success OR failure (KEEP_DEMO_MEMORY=1 to keep). The real prepared archive is user-only (the web button) and never auto-planted.
- web-parity-demo.sh: EXIT-trap now also deletes the dedicated webparity probe (was only cleaned on the success path via step 4). CI already materializes MEMORY_BUCKET + has a belt-and-braces prefix wipe.
2. (shrink #3) web-parity phase 6: 4 steps → 3. Dropped the redundant parity-artifact step (canonical-key HEAD + manual delete): the body shape is compile/fixture-gated (check-web-api-drift.sh), the S3 key is deterministic + worker-unit-tested, and cleanup is the EXIT trap. The runtime check is now just the plant→200 wiring smoke (harness/CLAUDE.md 'parity checks evolve down a ladder').

Docs synced: harness/CLAUDE.md inventory + operator-runbook-harness.md. bash -n clean.

* feat: #207 credentials as a first-class data class in the app (same abstraction as memory)

Credentials now mirror memory end-to-end: list-then-categorize over the master's own real vault, plus a vault-a-credential write.
- cred worker: new POST /v1/cred/list (master-only: operator==actor, so a single-service cap can't enumerate the vault) lists the actor's stored service ids from S3. + service_from_key parsing test.
- daemon: GET /v1/master/credentials (cred worker list → categorize each via the catalog, the parallel to GET /v1/master/memory's category list) + POST /v1/master/credentials/store (mint master-self cred-store cap → STS → cred worker, the parallel to the memory plant). real_cred_ctx reads AGENTKEYS_WORKER_CRED_URL + VAULT_ROLE_ARN. Unconfigured → empty (honest dev); configured-but-broken → 502 (real data or fail loud, no in-memory stand-in). Wired cred-store/cred-fetch into mint_master_cap's CapMintOp match (#204 owner).
- frontend: a Credentials page (credentials.tsx) grouped by category with sensitivity chips + a 'vault a credential' form, a nav item, and client methods.

Tests: daemon 102 + worker service_from_key + list-unconfigured-empty; frontend tsc + next build; clippy -D warnings + fmt clean. user-manual documents it.

hanwencheng added 16 commits June 5, 2026 21:42

hanwencheng changed the title ~~feat(worker-config): #201 Config data-class substrate — infra + worker + isolation tests (Phases 1-3)~~ feat: #201 Config data class — Phases 1-5 (substrate + taxonomy memory list + lazy detail + codex hardening) Jun 6, 2026

hanwencheng added 2 commits June 6, 2026 11:55

hanwencheng merged commit 765af13 into main Jun 6, 2026
9 checks passed

hanwencheng mentioned this pull request Jun 6, 2026

Provision config-test (DNS + cert) so stage-3 step 21 is a live gate again (drop config-worker-unreachable allow-skip) #209

Open

3 tasks

hanwencheng mentioned this pull request Jun 6, 2026

ci(harness): guard the stage-3 step-21 config-worker skip against silent permanence (#209) #210

Merged

hanwencheng mentioned this pull request Jun 6, 2026

feat: #207 onboarding + classifier auto-distribution (items 1A/2/3/5/6/7/8) #212

Merged

This was referenced Jun 6, 2026

feat(audit): real CBOR + EVM-calldata decode for the web audit view (#153) #194

Merged

Config data class + lazy, config-driven memory list (Phases 1–5) #201

Closed

hanwencheng merged 18 commits into main from claude/cranky-dijkstra-e21e21
