Skip to content

ci(harness): guard the stage-3 step-21 config-worker skip against silent permanence (#209)#210

Merged
hanwencheng merged 1 commit into
mainfrom
claude/stage3-step21-skip-review
Jun 6, 2026
Merged

ci(harness): guard the stage-3 step-21 config-worker skip against silent permanence (#209)#210
hanwencheng merged 1 commit into
mainfrom
claude/stage3-step21-skip-review

Conversation

@hanwencheng
Copy link
Copy Markdown
Member

Question

Should harness-CI stage-3 step 21 (the config worker's live cap_data_class_mismatch rejection) be allow-skipped via config-worker-unreachable (added in #205)? Asked for a /codex:adversarial-review second opinion, challenging both the skip and the un-skip.

Codex verdict — needs-attention, cuts both ways

  • [high] The bare un-skip is wrong. Removing config-worker-unreachable doesn't restore the proof — it just hard-fails main CI on a known-unprovisioned config-test DNS/TLS dependency (Provision config-test (DNS + cert) so stage-3 step 21 is a live gate again (drop config-worker-unreachable allow-skip) #209), reddening every run without fixing anything.
  • [medium] But the skip is a real (narrow) blind spot. Step 21 is the only live POST to ${AGENTKEYS_WORKER_CONFIG_URL}/v1/config/put with a cross-class cap. Step 20 (memory/cred workers reject a config cap) + agentkeys-worker-config unit tests verify the handler logic, but cannot catch deployed route/nginx/TLS/BROKER_CAP_PUBKEY/middleware drift — which only step 21 catches. So the skip must not become silently permanent.

Codex recommendation: "keep config-worker-unreachable allowlisted temporarily with an explicit #209 TODO/expiry and a separate guard that fails once the hostname is reachable… emit a distinct CI annotation, link #209, and remove the allowance immediately after config-test is provisioned and step 21 passes live."

Resolution (this PR)

Keep the quarantined skip (CI stays green — the host genuinely isn't up), but make the quarantine honest, not silent via a new Guard step that probes config-test.<zone>/healthz:

config-test state Guard outcome
unreachable (today, #209 open) ::warning:: every run (visible, links #209) → exit 0, CI green
reachable (after #209) ::error::exit 1, forcing removal of the now-unnecessary allowance so step 21 runs as a live gate
allowance already removed ::notice:: → exit 0 (guard inert — self-dissolving)

The --allow-skip line itself is unchanged vs main; the diff is the Guard step + an expanded comment. The guard's three branches are unit-simulated (see the commit). This PR's own harness-CI is green (config-test unreachable → warning), but now the gap is loud + the tolerance can't outlive #209.

Closes the loop

🤖 Generated with Claude Code

…mo structure

#204 (#203 backend-client refactor) merged after #205 and restructured harness-e2e from three v2-stage{1,2,3}-demo.sh steps into one v2-demo.sh --ci orchestrator — which already carried forward the config-worker-unreachable allow-skip. Re-apply the codex-review guard (PR #210) onto that new structure: a self-dissolving Guard step after the v2-demo run that warns while config-test is unprovisioned (#209) and FAILS once it becomes reachable, so the step-21 isolation skip can't silently persist. Resolves the #210<->main conflict; the --allow-skip invocation is unchanged from main.
@hanwencheng hanwencheng force-pushed the claude/stage3-step21-skip-review branch from 29b9837 to a931f47 Compare June 6, 2026 06:09
@hanwencheng
Copy link
Copy Markdown
Member Author

Rebased onto current main (9ff3f8a) to resolve the conflict with #204 (which merged after #205 and restructured harness-e2e from three v2-stage{1,2,3}-demo.sh steps into one v2-demo.sh --ci orchestrator). #204 already carried the config-worker-unreachable allow-skip forward, so this PR's net change is just the self-dissolving Guard step (re-anchored after the v2-demo step). The --allow-skip invocation is unchanged from main; diff is +41/-1, harness-ci.yml only.

@hanwencheng hanwencheng merged commit 4c71db0 into main Jun 6, 2026
5 checks passed
hanwencheng added a commit that referenced this pull request Jun 6, 2026
Conflicts resolved:
- ui_bridge.rs route block: kept #204's MASTER_MEMORY_PLANT_ROUTE constant +
  my new /v1/master/config/{presets,init} + /v1/master/classify/{tag,propose} routes.
  My inline classify/config cap-mint paths coexist with #204's agentkeys-backend-client
  (the #201 config/memory plant paths #204 deliberately kept inline — same pattern).
- harness-ci.yml: kept #210's v2-demo.sh --ci structure + the #209 step-21 quarantine
  guard; added classify-not-configured,classify-worker-unavailable to --allow-skip.
- Cargo.lock: took main's + cargo regenerated to include agentkeys-catalog +
  agentkeys-worker-classify alongside agentkeys-backend-client.

Post-merge green: workspace build + clippy clean; daemon 100 + broker 36 + catalog/
worker tests; frontend tsc + next build; both #203/#204 drift checks pass.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant