feat: #201 Config data class - Phases 1-5 (substrate + taxonomy memory list + lazy detail + codex hardening) #205
Merged
…r + isolation tests (Phases 1-3)
Stand up the DataClass::Config substrate end-to-end (Phases 1-3 of the config-driven memory-list plan; Phase 0 cap layer landed in #200). The visible daemon/frontend behavior (Phases 4-5) is a follow-up, gated on the operator deploying this (per the issue's dependency chain 4 -> 0,1,2).
Phase 1 - infra (idempotent mirrors of the memory scripts):
- scripts/provision-config-bucket.sh, provision-config-role.sh, apply-config-bucket-policy.sh (config/ prefix, own bucket + role per arch.md §17.2; split-statement v3 bucket policy)
- CONFIG_BUCKET / CONFIG_ROLE_ARN + config worker host/URL in operator-workstation.env; wired into setup-cloud.sh step 13
Phase 2 - config worker (master-only):
- new agentkeys-worker-config crate (mirror of agentkeys-worker-memory; config/ S3 prefix, $CONFIG_BUCKET, AGENTKEYS_CONFIG_KEK_HEX, DataClass::Config, :9096)
- full setup-broker-host.sh wiring (build/install/env/systemd/nginx/firewall/certbot/post-install summary)
Phase 3 - isolation tests (test-discipline rule):
- harness/v2-stage3-demo.sh steps 19-21: config layer-3/4 (own-prefix write OK + cross-bucket AccessDenied) + cap data-class-mismatch (config<->memory, config<->cred). All master-self -> run on the operator, no sandbox defer; skip cleanly until the operator provisions config infra + redeploys the broker.
Source-of-truth updates: arch.md (§5 canonical names, §17.2/.3/.5, four-layer table, storage diagram), CLAUDE.md (per-data-class table + six cap endpoints + 'third data class landed'), operator-runbook-harness.md, harness/CLAUDE.md, plan doc.
Verified: config worker dev+release build + unit tests green; cargo check --workspace clean (all 17 crates); all bash scripts syntax-clean.
…reachable
post_cross_class folded curl's stderr into the returned code via 2>&1, so an
UNDEPLOYED worker (e.g. config.litentry.org before the broker redeploy) yielded
rc="curl: (35) SSL_ERROR_SYSCALL...\n000000" instead of a clean "000". That no
longer matched master_cross_class_rejection's 000|502|503|504) case and fell
through to die — turning the intended graceful prereq_missing
(config-worker-unreachable) at stage-3 step 21 into a hard failure.
Send curl's transport error to a side file so rc is just the 3-digit %{http_code}
(000 on transport failure), and surface that error as the body for diagnostics.
Also hardens steps 14-15 (same helper) — clean rc + diagnostic body.
Verified: repro against the unreachable config.litentry.org returns clean 000 ->
prereq_missing fires; bash -n clean.
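The fix above can be sketched roughly as follows. This is a minimal illustration, not the harness's actual helper: the function names `probe_status` and `classify_status` are hypothetical, and only the curl flags and the `000|502|503|504)` case pattern are taken from the commit message.

```bash
#!/usr/bin/env bash
# Hypothetical helper names; an illustration of the pattern, not the real code.

# Print only the 3-digit status on stdout and stash curl's transport error
# in a side file, so the caller can surface it as a diagnostic body.
probe_status() {
  local url="$1" err_file="$2"
  # -s silences the progress meter, -S keeps real errors on stderr,
  # -o /dev/null drops the response body, -w prints the status code.
  # curl prints 000 itself on a transport failure, so "|| true" only
  # swallows the non-zero exit status.
  curl -sS -o /dev/null -w '%{http_code}' --max-time 5 "$url" 2>"$err_file" || true
}

# Map a captured status to the outcome the caller should take.
classify_status() {
  case "$1" in
    000|502|503|504) echo "prereq_missing" ;;
    [0-9][0-9][0-9]) echo "reachable" ;;
    *) echo "die" ;;
  esac
}

classify_status "000"
# A status polluted by folded-in stderr no longer matches the skip case.
classify_status "$(printf 'curl: (35) SSL_ERROR_SYSCALL\n000000')"
```

The last call reproduces the failure mode: once stderr is mixed into the captured value, it falls through to the `die` branch instead of the graceful skip.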
…litentry.org A record)
The config worker host was added to operator-workstation.env but NOT to the two
DNS provisioning paths nor the worker health-check, so config.litentry.org never
got an A record → unreachable (the stage-3 step-21 SSL_ERROR_SYSCALL).
Add WORKER_CONFIG_HOST everywhere the four original workers are enumerated:
- scripts/setup-cloud.sh do_step_6 — the PRIMARY DNS path (its own change-batch,
not a delegate): + config A record + env validation (8 A records / 14 UPSERTs).
- scripts/dns-upsert-workers.sh — the standalone re-UPSERT path: + config in the
sanity loop, change-batch, plan printout, DoH verify loop, and certbot next-steps.
- scripts/verify-workers.sh — + config:/healthz ("ok":true), All 5 workers green.
- operator-workstation.env — comment now says five workers incl. config.
Verified: bash -n clean on all three; setup-cloud change-batch builds 14 records;
dns-upsert change-batch valid JSON.
…workers.sh (single source of truth)
Wire the config-worker setup fully into the idempotent orchestrator so nobody runs DNS by hand, and kill the dual-maintenance drift that left config.litentry.org without an A record (two hardcoded worker lists: setup-cloud step 6 + dns-upsert).
- dns-upsert-workers.sh: new --no-verify (UPSERT then exit, skipping the INSYNC/DoH wait + operator next-steps printout) for orchestrator use.
- setup-cloud.sh step 6: keep DKIM/MX/TXT + broker/signer/mcp inline (9 records); DELEGATE the 5 service-worker A records (audit/email/cred/memory/config) to dns-upsert-workers.sh --eip $EIP --no-verify (honors --dry-run + the same ENV_FILE so the prod/test split carries through). One source of truth → a new worker can never again be added to one list but not the other.
- The 3 config provision scripts were already delegated in step 13 (no change).
- cloud-bootstrap.md: config.litentry.org added to the certbot recipe (+ explicit one-shot form), the --config-host flag, the DNS A-record list, the worker-subdomain table, the per-worker env-file glob, the build/nginx/test-subdomain references.
Verified: bash -n clean on all three; setup-cloud inline batch builds 9 records; dns-upsert --no-verify parses + early-exits; cloud-bootstrap certbot loop includes CONFIG_HOST.
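To illustrate the single-source-of-truth idea, here is a minimal sketch in which one worker list drives the whole Route 53 change batch. The variable and function names, the zone, and the IP (a documentation-range address) are placeholders, not values from dns-upsert-workers.sh.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: one worker list feeds every consumer.
WORKERS="audit email cred memory config"   # the five service workers
ZONE="example.org"                          # placeholder zone
EIP="203.0.113.10"                          # documentation-range IP

# Emit a Route 53 change batch with one A-record UPSERT per worker.
build_change_batch() {
  local sep="" w
  printf '{"Changes":['
  for w in $WORKERS; do
    printf '%s{"Action":"UPSERT","ResourceRecordSet":{"Name":"%s.%s","Type":"A","TTL":300,"ResourceRecords":[{"Value":"%s"}]}}' \
      "$sep" "$w" "$ZONE" "$EIP"
    sep=","
  done
  printf ']}\n'
}

build_change_batch
```

Adding a worker then means editing `WORKERS` once. The output would typically be written to a file and passed to `aws route53 change-resource-record-sets --change-batch file://...`, though that step is left out so the sketch runs offline.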
…re-deploys + branch switches)
The broker host redeploys often and switches branches via --ref. cargo already caches deps in $REPO_ROOT/target (we never clean on the happy path), but a git checkout -f rewrites changed files' mtimes → cargo re-fingerprints + rebuilds them, and a cold/wiped target/ recompiles the whole aws-sdk/tokio tree.
Add sccache: a CONTENT-addressed compiler cache keyed on each crate's actual inputs (not mtime/branch/target state), persisted in $SCCACHE_DIR independent of target/. Identical inputs hit the cache regardless of branch or a cold target/.
- setup_build_cache(): installs sccache (prebuilt musl binary, arch-detected → cargo install fallback → skip), exports RUSTC_WRAPPER + SCCACHE_DIR, starts the server. Best-effort + idempotent + NON-FATAL (deploy proceeds with plain cargo if install fails). Opt out: AGENTKEYS_NO_SCCACHE=1; pin: SCCACHE_VERSION=vX.Y.Z.
- Prints 'sccache stats' after the worker build as visible proof (re-deploys = mostly cache hits).
- cloud-bootstrap.md documents the cache + the opt-out.
Verified: bash -n clean.
Note: this does NOT change what gets built; my earlier #201 commits were all shell/docs (zero Rust), so a re-run that only pulls them recompiles nothing.
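A reduced sketch of the best-effort, non-fatal shape described for setup_build_cache(). It deliberately omits the install step and assumes sccache is either already on PATH or absent; the default cache path under `$HOME/.cache` is an assumption, not taken from the script.

```bash
#!/usr/bin/env bash
# Sketch only: the real function also installs sccache; this version just
# wires it up when present and never fails the deploy.
setup_build_cache() {
  # Opt-out switch named in the commit message.
  if [ "${AGENTKEYS_NO_SCCACHE:-0}" = "1" ]; then
    echo "sccache disabled via AGENTKEYS_NO_SCCACHE=1" >&2
    return 0
  fi
  # Non-fatal: fall back to plain cargo when sccache is unavailable.
  if ! command -v sccache >/dev/null 2>&1; then
    echo "sccache not found; building with plain cargo" >&2
    return 0
  fi
  export RUSTC_WRAPPER="sccache"
  # The cache lives outside target/, so a wiped target/ can still hit it.
  export SCCACHE_DIR="${SCCACHE_DIR:-$HOME/.cache/sccache}"
  mkdir -p "$SCCACHE_DIR"
  sccache --start-server >/dev/null 2>&1 || true   # may already be running
  return 0
}
```

After a build, `sccache --show-stats` reports hit and miss counts, which is one way to confirm that a re-deploy was served mostly from cache.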
… a VPN'd laptop) + ACME pre-check
Operator hit a certbot 'unauthorized … 404' on config.litentry.org because certbot --webroot was run on a local box (behind a VPN): the challenge file landed there, but Let's Encrypt validates against the hostname's PUBLIC IP = the broker, which had no such file. The nginx 1.28.3 (VPN proxy) vs 1.24.0 (broker) version split in the 404 pages was the tell.
Fold-back to §5b so the next operator can't repeat it:
- Loud '⚠️ run EVERY command ON THE BROKER HOST' callout explaining the --webroot-writes-local vs CA-validates-public-IP mechanism + the WARP/Zscaler interception trap (laptop curl of <host> hits the VPN's nginx, not the broker).
- A cheap local ACME pre-check (nginx reload + probe file + curl localhost with Host header) BEFORE the certbot loop. A freshly-added worker (config) needs a reload; 'nginx -T' showing the vhost does NOT mean the running process loaded it.
- New troubleshooting entry for the exact 'unauthorized … 404' error covering both causes (wrong host; vhost not reloaded).
Docs only; fences balanced.
… (not 'first associated EIP')
Root cause of the config (and all-worker) cert failures: dns-upsert-workers.sh
derived the EIP via `describe-addresses | first`, which can't distinguish the
PROD broker EIP from the TEST broker EIP when both are allocated. It silently
grabbed the test EIP (3.214.219.209) and pointed all 5 worker A records at the
test broker, while broker/signer stayed on prod (54.164.117.252). Let's Encrypt
then validated config.litentry.org against the test box (404).
Derive the workers' EIP from BROKER_HOST's OWN Route 53 A record instead — the
workers co-locate with the broker, so their records MUST mirror it. This is
env-aware (BROKER_HOST is broker.${ZONE} for prod vs test-broker.${ZONE} for test)
and authoritative. Add a co-location guard that warns when the chosen/passed EIP
disagrees with the broker's A record (catches a prod/test mixup early).
cloud-bootstrap.md §5b gains a troubleshooting entry for 'worker cert fails but
broker works' with a DoH cross-check loop.
Verified live (--dry-run against the real zone): derives 54.164.117.252 and sets
all 5 worker records to it; bash -n clean.
…t), matching setup-cloud step 4
Prod and the CI/test broker are SEPARATE machines with SEPARATE EIPs. The previous
fix derived from broker.${ZONE}'s A record (works for prod, but chicken-egg on a
fresh test box + a different mechanism than the bootstrap). Switch to the SAME
tag-based, TEST_MODE-aware derivation setup-cloud.sh step 4 uses — one source of
truth:
prod → describe-addresses --filters Name=tag:Name,Values=agentkeys-broker-eip
test → ...Values=agentkeys-broker-eip-test (--test, or a *test* ENV_FILE)
- New --test flag + auto-detect from a *test* ENV_FILE (switches to
operator-workstation.test.env), mirroring setup-cloud.
- Keep the broker-A-record co-location cross-check as a warn-only guard.
Verified live (--dry-run): prod → 54.164.117.252 (tag agentkeys-broker-eip);
--test → 3.214.219.209 (tag agentkeys-broker-eip-test). bash -n clean.
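The tag-based derivation can be sketched as below. The function names are hypothetical; the two tag values and the auto-detect from a `*test*` ENV_FILE come from the commit message, and `derive_eip` needs the AWS CLI plus credentials, so it is defined but not called here.

```bash
#!/usr/bin/env bash
# Sketch under assumptions: helper names are illustrative only.

# Choose the EIP Name tag from the mode flag or the env-file name.
eip_tag_for() {
  local test_mode="$1" env_file="$2"
  case "$env_file" in
    *test*) test_mode=1 ;;   # a *test* ENV_FILE implies test mode
  esac
  if [ "$test_mode" = "1" ]; then
    echo "agentkeys-broker-eip-test"
  else
    echo "agentkeys-broker-eip"
  fi
}

# Resolve the tagged address instead of taking the first allocated EIP.
derive_eip() {
  aws ec2 describe-addresses \
    --filters "Name=tag:Name,Values=$(eip_tag_for "$1" "$2")" \
    --query 'Addresses[0].PublicIp' --output text
}

eip_tag_for 0 "scripts/operator-workstation.env"
eip_tag_for 0 "scripts/operator-workstation.test.env"
```

Filtering by tag is what removes the ambiguity: with both EIPs allocated, an unfiltered lookup has no way to tell prod from test.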
…test = separate EIPs)
Two broker EC2 instances exist with separate EIPs, distinguished by the EIP Name tag (agentkeys-broker-eip vs agentkeys-broker-eip-test). 'describe-addresses first-match' silently picks the wrong one: it pointed all 5 worker A records at the test broker while broker/signer were on prod (multi-round LE 404s).
New AWS-gotchas subsection: never first-match; derive by the env-aware tag (setup-cloud step 4 / dns-upsert-workers.sh), curl ifconfig.me on the host, DoH-cross-check workers == broker for DNS.
…real slow-rebuild cause)
setup-broker-host.sh deleted /root/.cargo + /root/.rustup at the END of every run (~1.5GB reclaim). So every re-deploy re-downloaded the WHOLE rustup toolchain + all 372 crate sources: minutes of pure waste (target/ persists in the repo dir, which is why the compile itself was only ~50s, but the toolchain+registry did not).
- KEEP the toolchain by default; gate the delete behind a new --reclaim-toolchain flag (pass it on a final/one-shot deploy to free the disk).
- Pre-source $HOME/.cargo/env in the build-prereqs step so a kept toolchain is on PATH on a non-login sudo shell. Otherwise `have rustup` is false and it reinstalls anyway even with /root/.cargo present.
- Header usage + post-run NOTE updated to reflect keep-by-default.
Combined with the sccache change (86d18be), re-deploys now skip toolchain DL + crate-registry DL + most recompilation. bash -n clean.
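The two behaviours in this commit (pre-sourcing the cargo env, and deleting the toolchain only on request) might look like the sketch below. Both function names are hypothetical; only `$HOME/.cargo/env` and the `--reclaim-toolchain` flag come from the commit message.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the keep-by-default toolchain handling.

# Put a previously installed toolchain on PATH in a non-login shell,
# where rustup's profile hooks have not run.
load_kept_toolchain() {
  if [ -f "$HOME/.cargo/env" ]; then
    . "$HOME/.cargo/env"
  fi
}

# Delete the toolchain only when the caller explicitly asks for it.
maybe_reclaim_toolchain() {
  local flag="${1:-}"
  if [ "$flag" = "--reclaim-toolchain" ]; then
    # ${HOME:?} aborts instead of expanding to "/" if HOME is unset.
    rm -rf "${HOME:?}/.cargo" "${HOME:?}/.rustup"
    echo "reclaimed"
  else
    echo "kept"
  fi
}
```

Calling `load_kept_toolchain` before any "is rustup installed?" check is the important ordering: without it the check runs against a PATH that cannot see the kept toolchain.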
…nv flag
Per the deploy-script governance: there are exactly THREE idempotent deployment orchestrators (setup-cloud.sh / setup-broker-host.sh / setup-heima.sh); every other mutation is wired into one of them. Codify it in CLAUDE.md + standardise the environment flag.
- Add --ci (canonical CI-env flag; --test retained as alias) to all 3 entry points + dns-upsert-workers.sh. Plain run = local/prod; --ci = CI (selects the agentkeys-broker-eip-test EIP, -test IAM/buckets, *.test.env).
- CLAUDE.md: new 'Three idempotent deployment entry points' section (ownership table, flag convention, HARD wire-in rule, exempt list). Verified mcp-host is already wired into setup-broker-host (#152 re-converge); setup-dev-env is a dev-workstation bootstrap (exempt, not a deploy).
Verified: bash -n clean; --ci --dry-run derives the test EIP (3.214.219.209).
… ENV_FILE passthrough + cloud-bootstrap --ci
Two real test-mode bugs + doc drift, found while fitting the scripts to cloud-bootstrap.md:
- operator-workstation.test.env was MISSING the entire config (#201) data class (CONFIG_ROLE_ARN / CONFIG_BUCKET / WORKER_CONFIG_HOST / *_URL), so setup-cloud.sh --ci / setup-broker-host.sh --ci would die on the WORKER_CONFIG_HOST validation. Added the -test trio (agentkeys-config-role-test, agentkeys-config-test-<acct>, config-test.litentry.org).
- setup-cloud.sh step 13 called provision/apply-*.sh WITHOUT ENV_FILE; each re-sources operator-workstation.env (prod) and overwrites inherited CONFIG_BUCKET, so --ci would silently provision PROD buckets. Now passes ENV_FILE through (DRY loop) → -test buckets.
- cloud-bootstrap.md: --test → --ci (alias noted) in quick-start; added config bucket to 'what --ci derives'; corrected the stale 'toolchain deleted each run' note to the new keep-by-default + --reclaim-toolchain behavior; called out prod vs CI = separate EIPs.
Verified: test env config trio resolves; setup-cloud bash -n clean.
…ata classes
This session's two test-mode bugs were systemic, not one-offs. Fold them into the #90 isolation section's data-class checklist so the next data-class-adder can't repeat them:
1. a new data class MUST be added to BOTH operator-workstation.env AND .test.env (.test.env is not auto-derived; a prod-only key breaks the whole --ci path).
2. setup-cloud.sh delegation MUST pass ENV_FILE to provision/apply helpers (they re-source prod env + overwrite inherited $BUCKET, so --ci would hit prod buckets).
Includes the verify step (setup-cloud.sh --ci --dry-run must name -test resources).
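Rule 2 is easy to reproduce in isolation. The demo below is self-contained and uses placeholder file names and bucket values in a temp directory; `provision.sh` stands in for a provision/apply helper that re-sources an env file, defaulting to prod.

```bash
#!/usr/bin/env bash
# Self-contained demo; nothing here touches the repository's real env files.
demo_dir="$(mktemp -d)"
echo 'CONFIG_BUCKET=config-prod' > "$demo_dir/prod.env"
echo 'CONFIG_BUCKET=config-test' > "$demo_dir/test.env"

# Stand-in helper: re-sources an env file (prod by default), which
# overwrites any CONFIG_BUCKET inherited from the caller.
cat > "$demo_dir/provision.sh" <<'EOF'
#!/usr/bin/env bash
. "${ENV_FILE:-$(dirname "$0")/prod.env}"
echo "provisioning $CONFIG_BUCKET"
EOF

# Bug: the inherited value is clobbered by the helper's default env file.
without_passthrough() {
  ( unset ENV_FILE; CONFIG_BUCKET=config-test bash "$demo_dir/provision.sh" )
}

# Fix: hand the helper the same ENV_FILE the orchestrator is using.
with_passthrough() {
  ENV_FILE="$demo_dir/test.env" bash "$demo_dir/provision.sh"
}

without_passthrough   # prints: provisioning config-prod
with_passthrough      # prints: provisioning config-test
```

Exporting the right variable is not enough when the callee re-sources its own defaults; the env-file path itself has to travel with the call.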
…tring, not %s args)
The Totals summary printed literal \033[1;32m… because the C_* color vars (literal
"\033[…" strings) were passed as printf %s ARGS — printf only interprets \033 in
the FORMAT string, not in args. Moved the colors into the format string, matching
the ${C_*}-in-format pattern used everywhere else. TTY-gated defs unchanged, so
non-TTY/CI runs stay plain. Verified via cat -v (^[ = real ESC); bash -n clean.
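A small reproduction of the printf behaviour described above. The variable names mirror the `C_*` convention from the commit, but the two functions are illustrative only.

```bash
#!/usr/bin/env bash
C_GREEN='\033[1;32m'   # literal backslash sequence, as in the commit
C_RESET='\033[0m'

# Bug: %s prints its argument verbatim, so the backslash sequence stays text.
as_arg()    { printf '%s%s%s\n' "$C_GREEN" "ok" "$C_RESET"; }

# Fix: escapes are interpreted when they sit in the format string itself.
in_format() { printf "${C_GREEN}%s${C_RESET}\n" "ok"; }

as_arg | cat -v      # prints: \033[1;32mok\033[0m
in_format | cat -v   # prints: ^[[1;32mok^[[0m
```

`cat -v` makes the difference visible without a terminal: `^[` is a real ESC byte, while `\033` is four ordinary characters.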
…lazy detail + codex hardening
Phase 4 (daemon): read/write the memory-types taxonomy via the Config data class (--config-url/--config-role-arn); GET /v1/master/memory returns categories from the taxonomy (no decrypt, cache fallback); new lazy GET /v1/master/memory/entry?ns=&key=; plant writes per-namespace JSON arrays. CLI hook memory-inject renders the array (single-body still injects). harness memory-plant-demo + web-parity write/pass the new shape.
Phase 5 (frontend): apps/parent-control lists categories, decrypts a namespace's entries on demand; plant re-fetches categories.
Codex adversarial-review hardening:
- finding 1 (data loss): plant is now a read-modify-write merge under a plant_lock (durable blob preserved; abort-on-read-error, never overwrite).
- finding 2 (silent failure): memory+config workers return 404 on NoSuchKey; list 502s on a configured-but-broken Config; plant returns taxonomy_status.
Workers changed → requires a setup-broker-host.sh redeploy for the 404 behavior.
…config-role-missing
harness-e2e crashed at stage-3 step 19 with `CONFIG_ROLE_ARN: unbound variable`: the CI env-materializer (harness-ci.yml) never emitted the config data-class keys Phase 3 added to the stage-3 demo, and the demo runs under `set -u`.
- harness-ci.yml: materialize CONFIG_BUCKET / CONFIG_ROLE_ARN / AGENTKEYS_WORKER_CONFIG_URL (derived -test values, no new secret); allow the config-role-missing skip (operator one-shot, like scope-not-set) so step 19 skips cleanly until the test config bucket/role are provisioned. Steps 20-21 (config cap-mismatch) still run against the deployed config worker.
- v2-stage3-demo.sh: default the config vars to empty after sourcing the env file → degrade via prereq_missing instead of an unbound-variable abort.
- CLAUDE.md: fold the materializer into the env-file discipline (3rd place a new data class's keys must land).
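The default-to-empty fix relies on a standard shell idiom, sketched here. `config_prereq` is a hypothetical stand-in for the demo's prereq check; the variable names and the `config-role-missing` reason come from the commit message.

```bash
#!/usr/bin/env bash
set -u   # same strictness the demo runs under

# Default the config keys to empty so an env file that lacks them does not
# abort the script with "unbound variable".
: "${CONFIG_ROLE_ARN:=}"
: "${CONFIG_BUCKET:=}"

# Hypothetical skip check: degrade to a named skip instead of crashing.
config_prereq() {
  if [ -z "$CONFIG_ROLE_ARN" ] || [ -z "$CONFIG_BUCKET" ]; then
    echo "skip: config-role-missing"
  else
    echo "run"
  fi
}

config_prereq
```

`: "${VAR:=}"` assigns only when the variable is unset or empty, so values that the env file did provide are left alone.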
…e wiki
Add the product/onboarding view of the classifier design (#178) on top of the landed Config substrate (#201): the two config-init entry points (default preset + NL->COMPILE), connect-time classifier auto-distribution of cred + memory scopes (one pattern, two axes), the four security invariants, and the resolved decisions tracked in #207 (telemetry split to #208).
- docs/plan/web-flow/onboarding-classifier-distribution.md (new spec)
- docs/wiki/policy-scope-namespace.md (new terminology reference, lint-clean)
- docs/arch.md section 5 canonical-names row (policy/scope/namespace/category/service)
- docs/plan/classifier-service.md cross-links
stage-3 step 21 hits the config worker HTTPS endpoint (config-test.<zone>), whose cert can't issue until the config-test DNS record is provisioned by the operator one-shot (setup-cloud.sh --ci) — the SAME one-shot already tolerated via config-role-missing. Add config-worker-unreachable to the stage-3 allow-skip so CI skips step 21 cleanly until the test config infra exists; step 20 + the agentkeys-worker-config unit tests still cover the config cap-data-class-mismatch. harness/CLAUDE.md already documents steps 19-21 as 'skip until config infra is provisioned/deployed'. Drop the allowance once config-test is provisioned.
3 tasks
hanwencheng
added a commit
that referenced
this pull request
Jun 6, 2026
… combine, not just resolve
#205 (issue #201) landed a THIRD data class (Config): /v1/cap/config-{store,fetch} + an agentkeys-worker-config worker + a hand-rolled daemon config/per-ns-memory chain. #204 (#203) made agentkeys-backend-client the ONE owner of the broker/worker protocol. Rather than let the two coexist as parallel hand-rolled vs crate-owned chains, this merge folds #205's new surface INTO the #203 single-owner model.
Conflicts resolved (2 files):
- ui_bridge.rs: adopt #205's per-namespace storage model wholesale (memory_put_ns_real / memory_get_ns_real / RMW-under-plant-lock / real_config_ctx); my per-entry memory_put_real + real_memory_client are SUPERSEDED, dropped. Kept my route consts (MASTER_MEMORY_{,PLANT_}ROUTE) + the plant-contract unit test, and #205's new /v1/master/memory/entry route. Swapped #205's inline 0x-normalize in the shared resolve_session_coords for the crate's normalize_omni_0x.
- memory-plant-demo.sh: keep #205's per-ns JSON-array blob + my @backend-fixture annotation.
Combine (#203 applied to #205's surface):
- crate: CapMintOp gains ConfigStore/ConfigFetch (6 cap endpoints now); add ConfigPutBody/ConfigGetBody + fixtures (regenerated, now 6).
- daemon mint_master_cap → BackendClient::cap_mint (the cap-mint body, the #200 drift locus, is now the crate's BrokerCapRequest for memory AND config; one function covers all 4 routes). Worker put/get bodies (memory + config) build from the crate's MemoryPutBody/MemoryGetBody/ConfigPutBody/ConfigGetBody types; the raw POST stays in the daemon to reuse the once-minted STS creds across namespaces. Re-added agentkeys-provisioner to the daemon (still used for that STS mint).
- gate: config_put/config_get fixtures are pass-1-annotatable but EXCLUDED from pass-2 auto-detect (key-set-identical to cred bodies → would false-positive); documented in the gate + the fixtures README. #205's bash bodies (4-key ttl-omitted cap + ambiguous cred/config worker bodies) don't trip pass-2.
- docs: arch.md tree gains agentkeys-worker-config + updated backend-client note; root CLAUDE.md #203 rule updated for the 6 endpoints + config body types.
Verified: cargo build + clippy -D warnings + cargo test --workspace all clean (0 failures; plant-contract + config frozen tests pass); backend + web-api drift gates + fixture --check pass under LC_ALL=C.UTF-8; bash -n clean on all touched scripts.
hanwencheng
added a commit
that referenced
this pull request
Jun 6, 2026
…mo structure
#204 (#203 backend-client refactor) merged after #205 and restructured harness-e2e from three v2-stage{1,2,3}-demo.sh steps into one v2-demo.sh --ci orchestrator, which already carried forward the config-worker-unreachable allow-skip. Re-apply the codex-review guard (PR #210) onto that new structure: a self-dissolving Guard step after the v2-demo run that warns while config-test is unprovisioned (#209) and FAILS once it becomes reachable, so the step-21 isolation skip can't silently persist.
Resolves the #210<->main conflict; the --allow-skip invocation is unchanged from main.
hanwencheng
added a commit
that referenced
this pull request
Jun 6, 2026
…mo structure (#210)
#204 (#203 backend-client refactor) merged after #205 and restructured harness-e2e from three v2-stage{1,2,3}-demo.sh steps into one v2-demo.sh --ci orchestrator, which already carried forward the config-worker-unreachable allow-skip. Re-apply the codex-review guard (PR #210) onto that new structure: a self-dissolving Guard step after the v2-demo run that warns while config-test is unprovisioned (#209) and FAILS once it becomes reachable, so the step-21 isolation skip can't silently persist.
Resolves the #210<->main conflict; the --allow-skip invocation is unchanged from main.
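The "self-dissolving" guard amounts to inverting the usual health check: an unreachable worker is tolerated, while a reachable one is treated as a failure because the allow-skip has outlived its reason. A minimal sketch, assuming the guard keys off the same transport/gateway statuses the harness skips on (the function name and messages are hypothetical):

```bash
#!/usr/bin/env bash
# Hypothetical guard: takes the HTTP status observed for the config-test
# worker and decides whether the allow-skip is still justified.
guard_allow_skip() {
  case "$1" in
    000|502|503|504)
      echo "warn: config-test still unreachable; allow-skip remains valid"
      return 0 ;;
    *)
      echo "fail: config-test is reachable; drop the allow-skip"
      return 1 ;;
  esac
}

guard_allow_skip 000
```

Once the test infra is provisioned the guard starts failing CI, which forces removal of the allowance rather than letting the skipped isolation step go unnoticed.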
hanwencheng
added a commit
that referenced
this pull request
Jun 6, 2026
…6/7/8) (#212)
* feat: #207 onboarding + classifier auto-distribution (items 1A/2/3/5/6/7/8)
Productionizes #207 onboarding + classifier-driven auto-distribution on the #205 Config substrate. All items except 1B (NL→COMPILE UI).
- 1A: ~10 bundled presets + GET/POST /v1/master/config/{presets,init} author config/memory-taxonomy.enc (master-self read-modify-write MERGE; a later plant never clobbers it); onboarding gains a 'set up categories' step.
- 2/3/6: agentkeys-worker-classify (COMPUTE gate, COMPILE+TAG, no S3), CapOp::Classify + /v1/cap/classify (data-class-bound), agentkeys-catalog (entity→category + per-category sensitivity floor + signed vendor overlays bounded by the floor). Deploy-wired: setup-broker-host.sh, dns-upsert-workers.sh, env files (prod/test/CI).
- 5/7/8: daemon classify bridge (--classify-url → cap-gated worker TAG, local catalog tier-0 fallback), /v1/master/classify/{tag,propose}, /v1/actors/:id/scope/grant; AutoDistributePanel (propose→confirm, sensitivity-tiered: safe auto / sensitive K11).
- stage-3 demo step 22: classifier-worker isolation negatives (op + data-class mismatch), skip-until-deployed.
Docs synced (arch.md, CLAUDE.md, harness/CLAUDE.md, runbook). Determinism guardrail held: catalog hashmap lookups, no LLM on the gate hot path. Connect-time auto-distribution at agent pairing tracked in #211.
Tests: daemon 99, catalog 7, worker-classify 6, broker 36; frontend tsc + next build green.
* style: cargo fmt the #207 crates (catalog, worker-classify, ui_bridge)
Pure rustfmt: whitespace only, no logic change. Satisfies the CI cargo-fmt gate.
* fix: #207 CI step-22 skip + init graceful-degrade + general (non-memory-only) onboarding narrative
Three fixes:
1. CI (harness step 22): curl '... || echo 000' DOUBLED %{http_code} to '000000' on the undeployed classify worker (TLS/connect failure), missing the 'case 000|502|503|504)' skip → spurious die → phase-3 FAIL. Mirror the config helper: send curl stderr to a side file, use '|| true' (curl already prints %{http_code}=000). Now skips cleanly via classify-worker-unavailable.
2. init 502 (user-reported): a degraded config worker (S3 GetObject 502) made reconcile_taxonomy's read-before-write abort init with a cryptic 502. Now init DEGRADES: it authors into the in-memory mirror with a loud 'cached-degraded: <reason>' status (never silently 'ok') instead of hard-failing; the master-memory list shows the in-memory authored taxonomy on a durable error when local authored data exists (else still 502; #201 finding 2 preserved for the empty case).
3. Narrative: the init page said 'memory categories', but the taxonomy is general (memory + credentials + future data classes). Reworded the onboarding setup step, the memory-page setup, and the user manual to 'category taxonomy: what an agent can access (memory, credentials, …)'; surfaced the degraded status in the UI.
daemon 100 tests · frontend tsc+build · fmt+clippy clean · harness bash -n ok.
* fix: #207 revert the init degrade: real durable data or fail loud (no in-memory compromise)
The previous 'cached-degraded' fallback was wrong: it masked a broken Config store behind an in-memory stand-in, violating the all-real-data principle (#201 finding-2). Reverted to real-data-only:
- init (configured path): a config worker failure (unreachable / S3 error) is now a HARD 502 with an actionable message, with NO in-memory fallback. We author real durable data or fail loud so the operator fixes the Config data class. The ONLY in-memory path left is an explicit dev daemon started WITHOUT --config-url ('cached', labelled 'dev only, not durable').
- list (resolve_categories): a configured-but-broken Config 502s again (finding-2), never masked behind in-memory data.
- config worker: s3_get/s3_put now surface the REAL S3 error (AccessDenied / NoSuchBucket / region) via ProvideErrorMetadata instead of a generic 'service error', so the broken store is diagnosable.
- frontend + user manual: dropped the 'degraded/saved-locally' wording; a config-worker failure shows the real error, and the dev-only in-memory path is clearly labelled.
daemon 100 + worker-config tests · frontend tsc+build · fmt+clippy clean.
* feat: #207 onboarding init flow: progress ceremony, jump-to-app + sticky toast, idempotent re-onboard
Three onboarding flow fixes (init is multi-second: cap-mint → STS → config worker → S3):
1. PROGRESS BAR: the setup step now runs a CeremonyRunner (Read profile → Compile taxonomy → Encrypt+store to Config → Index+audit), the real init fires as the slow step's awaited action, so the bar reflects the true duration (no more frozen 'authoring…' button).
2. JUMP TO APP: on success it goes straight to the main page (no dead 'Enter agentKeys' button). A STICKY toast (no auto-dismiss, with an × dismiss) carries the next step: 'N categories authored · Next: connect an agent (Pairing tab)'.
3. IDEMPOTENT RE-ONBOARD: on entering setup, probe listMemoryCategories: if a taxonomy already exists, skip straight in (never re-author / re-prompt). The daemon init is already data-idempotent (reconcile_taxonomy MERGES, never clobbers) and writes ONLY config/memory-taxonomy.enc, never the memory:<ns> plant blobs, so planted data is never deleted. New test init_preserves_a_pre_existing_planted_namespace proves it.
daemon 101 tests · frontend tsc+build · fmt+clippy clean.
* fix(harness): #207 no leaked demo memory + shrink phase-6 parity to a thin wiring smoke
1. (memory hygiene) Onboarding never plants memory (init authors the TAXONOMY only; dev_seed has no caller; --ui-bridge-seed-* seeds the session, not memory). The 'already planted' was the harness leaking durable S3 memory:
- memory-plant-demo.sh: plant into DEDICATED demo-* namespaces (never the real travel/personal/family) + an EXIT-trap cleanup that deletes exactly those blobs on success OR failure (KEEP_DEMO_MEMORY=1 to keep). The real prepared archive is user-only (the web button) and never auto-planted.
- web-parity-demo.sh: EXIT-trap now also deletes the dedicated webparity probe (was only cleaned on the success path via step 4). CI already materializes MEMORY_BUCKET + has a belt-and-braces prefix wipe.
2. (shrink #3) web-parity phase 6: 4 steps → 3. Dropped the redundant parity-artifact step (canonical-key HEAD + manual delete): the body shape is compile/fixture-gated (check-web-api-drift.sh), the S3 key is deterministic + worker-unit-tested, and cleanup is the EXIT trap. The runtime check is now just the plant→200 wiring smoke (harness/CLAUDE.md 'parity checks evolve down a ladder').
Docs synced: harness/CLAUDE.md inventory + operator-runbook-harness.md. bash -n clean.
* feat: #207 credentials as a first-class data class in the app (same abstraction as memory)
Credentials now mirror memory end-to-end: list-then-categorize over the master's own real vault, plus a vault-a-credential write.
- cred worker: new POST /v1/cred/list (master-only; operator==actor, so a single-service cap can't enumerate the vault) lists the actor's stored service ids from S3. + service_from_key parsing test.
- daemon: GET /v1/master/credentials (cred worker list → categorize each via the catalog, the parallel to GET /v1/master/memory's category list) + POST /v1/master/credentials/store (mint master-self cred-store cap → STS → cred worker, the parallel to the memory plant). real_cred_ctx reads AGENTKEYS_WORKER_CRED_URL + VAULT_ROLE_ARN. Unconfigured → empty (honest dev); configured-but-broken → 502 (real data or fail loud, no in-memory stand-in). Wired cred-store/cred-fetch into mint_master_cap's CapMintOp match (#204 owner).
- frontend: a Credentials page (credentials.tsx) grouped by category with sensitivity chips + a 'vault a credential' form, a nav item, and client methods.
Tests: daemon 102 + worker service_from_key + list-unconfigured-empty; frontend tsc + next build; clippy -D warnings + fmt clean. user-manual documents it.
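The step-22 status doubling fixed in this squash (fix 1 of the CI step-22 commit) can be reproduced without a network. In the sketch below `fake_curl` is a stand-in, not real curl: it mimics `curl -w '%{http_code}'` against an undeployed host by printing 000 and exiting non-zero.

```bash
#!/usr/bin/env bash
# Stand-in for curl against an undeployed worker: prints 000 and exits
# non-zero (35 here, matching the TLS error code quoted earlier).
fake_curl() { printf '000'; return 35; }

# Bug: the fallback echo appends a second 000 to the one already printed.
doubled() { fake_curl || echo 000; }

# Fix: keep the printed status and only neutralise the exit code.
clean() { fake_curl || true; }

echo "doubled: $(doubled)"   # prints: doubled: 000000
echo "clean:   $(clean)"     # prints: clean:   000
```

Because the status was already on stdout before the exit code was evaluated, the `|| echo 000` fallback could only ever add to it, never replace it.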
This was referenced Jun 6, 2026
Resolves #201 (the config-driven memory list). Phases 1–5 land here, plus the codex adversarial-review hardening. Phase 0 (the cap layer) landed in #200.
What landed
Phase 1 - infra (operator runs on AWS; idempotent mirrors of the memory scripts)
- scripts/provision-config-bucket.sh, scripts/provision-config-role.sh, scripts/apply-config-bucket-policy.sh: own $CONFIG_BUCKET + agentkeys-config-role, config/ prefix, split-statement v3 bucket policy (s3:prefix = bots/${PrincipalTag}/config/*), OIDC agentkeys_actor_omni PrincipalTag (reused).
- CONFIG_BUCKET / CONFIG_ROLE_ARN + WORKER_CONFIG_HOST / AGENTKEYS_WORKER_CONFIG_URL in scripts/operator-workstation.env.
- scripts/setup-cloud.sh step 13 (per-data-class provisioning).
Phase 2 - config worker (master-only)
- agentkeys-worker-config crate, a mirror of agentkeys-worker-memory: config/ S3 prefix, $CONFIG_BUCKET, AGENTKEYS_CONFIG_KEK_HEX, DataClass::Config, port 9096.
- /v1/config/{put,get,teardown}.
- scripts/setup-broker-host.sh wiring: build/install lists, worker-config.env (KEK auto-gen + preserve), systemd unit, nginx vhost, stop/backup lists, firewall + certbot domain list, post-install summary. Registered in the workspace Cargo.toml.
Phase 3 - isolation tests (test-discipline rule, all four layers)
- harness/v2-stage3-demo.sh steps 19–21: in step 19, config creds write their own config/ prefix (200) but are AccessDenied at the memory + vault buckets (and memory creds → config bucket AccessDenied), covering per-data-class layers 3+4; steps 20–21 cover the cap data-class-mismatch (config cap → memory + cred workers; memory + cred cap → config worker → cap_data_class_mismatch). All master-self → run on the operator (no sandbox defer); skip cleanly until the config infra is provisioned + the broker redeployed. Cleanup shifted 19 → 22; STEP_TOTAL=22.
Source-of-truth updates (landed with the code)
- docs/arch.md: §5 canonical names, §17.2 role list, §17.3 (Planned → Landed substrate), §17.5 cap-binding table + four-layer table + Config endpoints, storage diagram.
- CLAUDE.md: per-data-class four-layer table (layers 3+4), six cap endpoints, "the third data class (config) has landed" note (generalized for a fourth).
- docs/operator-runbook-harness.md + harness/CLAUDE.md (keep-docs-in-sync rule), plan doc phase status.
Verification
- agentkeys-worker-config: dev + release build + unit tests green (S3-key prefix isolation tests).
- cargo check --workspace clean (all 17 crates, incl. the broker with the #200 config routes).
- All bash scripts syntax-clean (bash -n); config scripts structurally identical to the memory originals.
Phases 4–5 + codex hardening (landed on this branch)
Phase 4 (daemon, ui_bridge.rs): reads/writes the memory-types taxonomy via the Config data class (--config-url / --config-role-arn); GET /v1/master/memory → categories from the taxonomy (no decrypt, cache fallback); new lazy GET /v1/master/memory/entry?ns=&key=; plant writes per-namespace JSON arrays. CLI hook memory-inject renders the array (single-body still injects). Harness memory-plant-demo.sh + web-parity-demo.sh write/pass the new shape.
Phase 5 (frontend, apps/parent-control): lists categories, decrypts a namespace's entries on demand, plant re-fetches categories.
Codex adversarial-review hardening:
- Finding 1 (data loss): plant is now a read-modify-write merge under a plant_lock: it reads the durable memory:<ns> blob first, merges (durable entries never dropped), and aborts rather than overwrites on a transient read error, closing the restart-stale-cache / concurrent-plant last-writer-wins window.
- Finding 2 (silent failure): the memory + config workers return 404 on NoSuchKey, so the daemon distinguishes "never written" from a real failure; GET /v1/master/memory 502s on a configured-but-broken Config instead of masking it as empty; plant returns an explicit taxonomy_status.
Deploy note: touching the workers (the 404 behavior) requires a bash scripts/setup-broker-host.sh --ref main redeploy.
Remaining operator one-shots (runtime, not code)
1. bash scripts/setup-cloud.sh (prod) / --ci (test): provisions the config bucket + IAM role + bucket policy. Idempotent; one command.
2. bash scripts/setup-broker-host.sh --ref main: brings the config worker live + the worker-404 behavior.
3. The harness config steps skip (config-role-missing / scope-not-set) until 1+2 are deployed in the test env, then run green. CI now tolerates the config skip instead of crashing on the unbound env var.
Refs: #178 (classifier-service), #191 (W3 master-self memory), #200 (Phase 0 cap layer).
🤖 Generated with Claude Code