[aw-failures] Token-budget exhaustion (25M effective-tokens cap) recurring across 6+ scheduled workflows — 2026-05-29 02:00–07:32 UTC

### Problem statement

Between 2026-05-29 02:00 UTC and 07:32 UTC, **at least 7 scheduled agentic workflow runs across 6 distinct workflows** failed with the same Copilot CLI error:

```
CAPIError: 429 Maximum effective tokens exceeded (25011516.00 / 25000000)
```

This is a material expansion of the P1 "token-budget loop" cluster first surfaced in the parent report [#35484](https://github.com/github/gh-aw/issues/35484), which observed 2 affected workflows. The number of affected workflows has tripled (2 → 6) in 24 hours, and token-cap exhaustion is now the dominant failure mode in the 6h window.
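
To triage these runs from logs, the error line can be parsed into its two numbers. The following is a minimal sketch, assuming the message format stays exactly as quoted in this issue; `parse_cap_error` is a hypothetical helper name, not part of the harness.

```python
import re
from typing import Optional, Tuple

# Matches only the message text quoted in this issue; other CAPI error
# formats are not covered by this pattern.
_CAP_ERROR = re.compile(
    r"429 Maximum effective tokens exceeded "
    r"\((?P<used>\d+(?:\.\d+)?) / (?P<cap>\d+(?:\.\d+)?)\)"
)


def parse_cap_error(line: str) -> Optional[Tuple[float, float]]:
    """Return (used, cap) if the line carries the effective-token 429, else None."""
    match = _CAP_ERROR.search(line)
    if match is None:
        return None
    return float(match.group("used")), float(match.group("cap"))


used, cap = parse_cap_error(
    "CAPIError: 429 Maximum effective tokens exceeded (25011516.00 / 25000000)"
)
overshoot = used - cap  # tokens consumed beyond the cap
```

For the quoted run the overshoot is about 11.5K tokens, which suggests the cap is checked per request rather than predicted ahead of time.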

### Affected workflows / runs (6h window)

| Workflow | Run ID | Time (UTC) | Symptom |
|---|---|---|---|
| PR Sous Chef | [§26623257736](https://github.com/github/gh-aw/actions/runs/26623257736) | 07:02 | 4 retries of CAPI 429 → action timed out @ 25m |
| PR Sous Chef | [§26620005382](https://github.com/github/gh-aw/actions/runs/26620005382) | 05:31 | `effective_tokens_rate_limit_error` set |
| Safe Output Health Monitor | [§26620239212](https://github.com/github/gh-aw/actions/runs/26620239212) | 05:38 | `effective_tokens_rate_limit_error` set |
| Step Name Alignment | [§26619561645](https://github.com/github/gh-aw/actions/runs/26619561645) | 05:18 | `effective_tokens_rate_limit_error` set (also tracked in [#35644](https://github.com/github/gh-aw/issues/35644)) |
| Copilot CLI Deep Research Agent | [§26619051030](https://github.com/github/gh-aw/actions/runs/26619051030) | 05:01 | `effective_tokens_rate_limit_error` set + 10 KB body limit hit on `create_discussion` |
| Go Logger Enhancement | [§26618323959](https://github.com/github/gh-aw/actions/runs/26618323959) | 04:39 | `effective_tokens_rate_limit_error` set |
| Daily Firewall Logs Collector and Reporter | [§26615980789](https://github.com/github/gh-aw/actions/runs/26615980789) | 03:22 | `effective_tokens_rate_limit_error` set |

### Probable root cause

The Copilot CLI harness retries the agent up to 4 times with `--continue` after partial failures. When the prior turn already accumulated ≥20 M effective tokens (large MCP tool descriptions + workflow body + tool output history), each retry re-sends the full conversation and crosses the 25 M cap on the next request. The job then either:

1. Loops through 4 retries each consuming ~1–2 minutes of 429 backoff (94 s total wait), then times out at the 25-minute step limit, or
2. Exits non-zero on attempt 1 and the conclusion step marks the run as failure.

Contributing factors:
- MCP tool list payload is large (full descriptions of `audit`, `audit-diff`, `logs`, `compile`, `codemod`, etc. each ~1 KB).
- Some workflow prompts (e.g. PR Sous Chef triaging 7 PRs) accumulate `gh pr view` JSON outputs across iterations.
- Cache hits are high in absolute tokens (3.4 M cached on the PR Sous Chef failure) but cached input still counts against the effective-tokens cap.

### Proposed remediation

1. **Cap per-workflow turn count** — set explicit `max-turns` on the affected scheduled workflows (suggested 30 for triage workflows, 60 for investigative). Today none of these workflows declare `max-turns`.
2. **Reduce MCP tool surface area per workflow** — most failing workflows have access to the full `agenticworkflows` tool catalog when they only need `logs` + `audit`. Use `allow-tool` lists to keep MCP description payload small.
3. **Trim conversation between retries** — when the harness detects 429 effective-tokens on attempt N, it should pass `--no-resume` (or compact) instead of `--continue` for attempt N+1, so the retry starts from a smaller context window.
4. **Pre-emptive guard** — emit a workflow warning when cumulative tokens cross 20 M (80 % of cap) so the agent can self-truncate with `noop` before failing on the next request.
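
Items 3 and 4 can be sketched as two small decisions. This is an illustration only: the function names and the `"fresh"` / `"continue"` mode labels are hypothetical, and how the harness would actually compact or restart a conversation is not specified in this report.

```python
CAP = 25_000_000                    # effective-token cap quoted in the 429 message
WARN_THRESHOLD = CAP * 80 // 100    # 20M, the 80 % threshold from item 4


def should_warn(cumulative_tokens: float) -> bool:
    """True once cumulative effective tokens reach the warning threshold."""
    return cumulative_tokens >= WARN_THRESHOLD


def next_attempt_mode(last_error: str) -> str:
    """Choose how the next attempt starts (hypothetical mode names).

    "fresh"    -> start from a compacted or empty context
    "continue" -> resume the prior conversation
    """
    if "Maximum effective tokens exceeded" in last_error:
        # Resuming would re-send the same oversized context and hit the cap again.
        return "fresh"
    return "continue"
```

The point of the split is that only the token-cap error changes the retry strategy; other partial failures can still resume as they do today.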

### Success criteria / verification

- Over a 24h window after rollout, agentic workflow runs failing with `effective_tokens_rate_limit_error` drop below **5 %** of completed runs (current rate: ~7 of ~25 scheduled completions in the 6h sample = ~28 %).
- No single workflow contributes >1 token-cap failure in any 6h window.
- PR Sous Chef, Safe Output Health Monitor, and Copilot CLI Deep Research Agent run to completion in ≥ 80 % of scheduled invocations.
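
The first two criteria can be checked mechanically from run records. The sketch below assumes a hypothetical record shape (`workflow`, `token_cap_failure`); the 18 non-failing completions are placeholders, since their workflow names are not listed in this report.

```python
from collections import Counter
from typing import Dict, List


def token_cap_failure_rate(runs: List[Dict]) -> float:
    """Fraction of completed runs that failed on the effective-token cap."""
    if not runs:
        return 0.0
    return sum(1 for run in runs if run["token_cap_failure"]) / len(runs)


def workflows_over_limit(runs: List[Dict], limit: int = 1) -> List[str]:
    """Workflows with more than `limit` token-cap failures in the window."""
    counts = Counter(run["workflow"] for run in runs if run["token_cap_failure"])
    return sorted(name for name, count in counts.items() if count > limit)


# Baseline from the 6h sample: 7 token-cap failures out of ~25 completions.
failed = [
    "PR Sous Chef", "PR Sous Chef", "Safe Output Health Monitor",
    "Step Name Alignment", "Copilot CLI Deep Research Agent",
    "Go Logger Enhancement", "Daily Firewall Logs Collector and Reporter",
]
baseline = [{"workflow": name, "token_cap_failure": True} for name in failed]
# Placeholder records for the remaining completions (names are not in the report).
baseline += [{"workflow": f"placeholder-{i}", "token_cap_failure": False} for i in range(18)]

rate = token_cap_failure_rate(baseline)      # 7 / 25
offenders = workflows_over_limit(baseline)   # workflows with >1 failure
meets_rate_target = rate < 0.05
```

On the baseline sample this yields a 28 % failure rate with PR Sous Chef as the only workflow over the per-window limit, so both criteria currently fail.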

### Related issues

- Parent: [#35484](https://github.com/github/gh-aw/issues/35484)
- Related single-workflow tracker: [#35644](https://github.com/github/gh-aw/issues/35644) (Step Name Alignment 80 % failure rate)
- Related but distinct cause (not token-budget): [#35441](https://github.com/github/gh-aw/issues/35441) (Daily Hippo Learn cache-memory git pack corruption — recurred at 07:37 today, [§26624675283](https://github.com/github/gh-aw/actions/runs/26624675283), confirming that tracker is still live).

_Filed by [aw] Failure Investigator (6h) [§26625870457](https://github.com/github/gh-aw/actions/runs/26625870457)._







> Generated by [🔍 [aw] Failure Investigator (6h)](https://github.com/github/gh-aw/actions/runs/26625870457) · opus47 11.4M · [◷](https://github.com/search?q=repo%3Agithub%2Fgh-aw+is%3Aissue+%22gh-aw-workflow-call-id%3A+github%2Fgh-aw%2Faw-failure-investigator%22&type=issues)
> - [x] expires on Jun 5, 2026, 8:18 AM UTC






---

### Recurrence confirmed — 2026-05-30 17:43 UTC (still active)

The 25M effective-tokens cap fired again ~33h after this issue was opened, confirming the token-budget exhaustion pattern is **still active and not yet remediated**.

| Workflow | Run | Time (UTC) | Symptom |
|---|---|---|---|
| Linter Miner | [§26690626184](https://github.com/github/gh-aw/actions/runs/26690626184) | 2026-05-30 17:43 | `agent` → `Execute GitHub Copilot CLI` failed; `effective_tokens_rate_limit_error` set |

<details>
<summary>Exact 429 signature</summary>

```
429 Maximum effective tokens exceeded (25132364.10 / 25000000).
```

Run profile: 21.5m, 59 turns, 25.13M effective tokens — same single-run-crosses-the-cap shape described above (no `--continue` retry needed; one long run exceeded 25M on its own).

</details>

Keeping this issue **open**. (Investigated by the [aw] Failure Investigator in the 6h window ending 2026-05-30 ~19:10 UTC.)

> Generated by [🔍 [aw] Failure Investigator (6h)](https://github.com/github/gh-aw/actions/runs/26692427111) · opus48 4.1M · [◷](https://github.com/search?q=repo%3Agithub%2Fgh-aw+is%3Aissue+%22gh-aw-workflow-call-id%3A+github%2Fgh-aw%2Faw-failure-investigator%22&type=issues)



---

### Recurrence — 2026-06-01 13:25 UTC (PR Sous Chef)

This failure mode recurred 3 days after the original report. Exactly **one** new failed run appears in the 2026-06-01 08:45–14:45 UTC window, and it matches this cluster.

| Workflow | Run | Time (UTC) | Effective tokens | Proximate failure |
|---|---|---|---|---|
| PR Sous Chef | [§26757738602](https://github.com/github/gh-aw/actions/runs/26757738602) | 13:25 | 23,683,854 (94.7% of 25M cap) | `Execute GitHub Copilot CLI` timed out @ 25m |

#### Key deltas vs. original report

1. **No CAPI 429 this time.** Effective tokens peaked at 23.68M — just under the 25M cap — so the proximate failure was the **25-minute step timeout**, not the `effective_tokens` rate-limit error. The agent was still serially iterating the PR queue (PR 36222 → 36225 → 36230) when wall-clock expired. Token pressure and the 25m timeout are two faces of the same root cause: serial whole-queue processing.
2. **High run-to-run variance.** A sibling PR Sous Chef run [§26757759212](https://github.com/github/gh-aw/actions/runs/26757759212), triggered 23s later (13:25:30), **succeeded** with only 10,567,434 effective tokens; the failed run consumed ~2.24× as many over the same PR queue. This indicates the failure is load/variance-dependent, not a hard regression.
3. **Two near-simultaneous PR Sous Chef runs** started within 23 seconds (13:25:07 and 13:25:30). Possible duplicate scheduling/trigger; low confidence, flagged for follow-up — concurrent runs over the same PR queue compound token/time pressure.

<details><summary>Evidence — agent step termination (run 26757738602)</summary>

```
2026-06-01T13:54:05.5700647Z ##[error]The action 'Execute GitHub Copilot CLI' has timed out after 25 minutes.
```

agent_usage.json:
```json
{"input_tokens":3594933,"output_tokens":88094,"effective_tokens":23683854,"primary_model":"gpt-5.4-mini-2026-03-17"}
```

Job outcomes: `agent` → **failure** (27.4m); `detection` and `safe_outputs` → skipped; `agent_output.json` was empty (0 safe items emitted before the timeout).

</details>
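
The utilization figures quoted for this run follow directly from `agent_usage.json`. A minimal sketch, assuming the file keeps the field names shown in the evidence block; `cap_utilization` is a hypothetical helper.

```python
import json

CAP = 25_000_000  # effective-token cap from the 429 message


def cap_utilization(usage_json: str, cap: int = CAP) -> dict:
    """Summarize how close a run came to the effective-token cap."""
    usage = json.loads(usage_json)
    effective = usage["effective_tokens"]
    return {
        "effective_tokens": effective,
        "percent_of_cap": round(effective / cap * 100, 1),
        "headroom": cap - effective,
    }


failed_run = cap_utilization(
    '{"input_tokens":3594933,"output_tokens":88094,'
    '"effective_tokens":23683854,"primary_model":"gpt-5.4-mini-2026-03-17"}'
)
# Ratio of the failed run to the sibling run that succeeded.
variance_ratio = round(failed_run["effective_tokens"] / 10_567_434, 2)
```

The failed run sits at 94.7 % of the cap with roughly 1.3M tokens of headroom, and used about 2.24 times the tokens of the sibling run.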

#### Confidence & unknowns

- **High confidence**: this run belongs to the #35661 cluster (PR Sous Chef, ~25M effective-token ceiling, 25m step timeout on serial PR-queue processing).
- **Unknown / low confidence**: whether the duplicate near-simultaneous scheduling (two runs 23s apart) is systemic or a one-off.
- **No other failures** in the 2026-06-01 08:45–14:45 UTC window: 28 other runs were 19 success / 8 in-progress / 1 queued. The workflows tracked by #35780 (squid/claude startup) and #35984 (Contribution Check safe_outputs) **did not execute** in this window, so neither could be confirmed fixed or stale from fresh evidence — both remain open.

**Recommendation (unchanged, reinforced):** bound PR Sous Chef's per-run work — process the PR queue in capped batches and/or lower the per-run effective-token budget so a single scheduled run cannot approach the 25M cap or the 25m step timeout.
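
Capped batching could look roughly like the sketch below. It is illustrative only: `capped_batches` and the batch size are assumptions, and how leftover PRs are handed to the next scheduled run is not defined in this report.

```python
from typing import Iterator, List, Sequence


def capped_batches(pr_numbers: Sequence[int], batch_size: int) -> Iterator[List[int]]:
    """Yield the PR queue in consecutive batches of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(pr_numbers), batch_size):
        yield list(pr_numbers[start:start + batch_size])


# PR numbers taken from the failed run's log; a run would handle only the
# first batch and leave the rest for a later scheduled invocation.
queue = [36222, 36225, 36230]
batches = list(capped_batches(queue, batch_size=2))
this_run, deferred = batches[0], batches[1:]
```

Bounding the batch bounds both wall-clock time and accumulated context, which addresses the step timeout and the token cap together.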

**References:** [§26757738602](https://github.com/github/gh-aw/actions/runs/26757738602), [§26757759212](https://github.com/github/gh-aw/actions/runs/26757759212)

> Generated by [🔍 [aw] Failure Investigator (6h)](https://github.com/github/gh-aw/actions/runs/26762018833) · opus48 1.5M · [◷](https://github.com/search?q=repo%3Agithub%2Fgh-aw+is%3Aissue+%22gh-aw-workflow-call-id%3A+github%2Fgh-aw%2Faw-failure-investigator%22&type=issues)



---

### 2026-06-02 re-investigation (6h window 08:10–14:10 UTC) — STILL RECURRING, now cross-engine

**Keep this open and prioritize the effective-token rail fix — the 25M cap is still the dominant failure mode 4 days after this issue was filed, and it now breaks the `claude` engine too, not just `copilot`.** Of 6 agentic failures in the last 6h, 4 are effective-token over-consumption.

#### Fresh affected runs

| Workflow | Run | Engine | Turns | Effective tokens | Symptom |
|---|---|---|---|---|---|
| daily-experiment-report | [§26810371152](https://github.com/github/gh-aw/actions/runs/26810371152) | copilot | 42 | **25,191,391** | `CAPIError: 429 Maximum effective tokens exceeded (25191390.60 / 25000000)` → hard-rail, not retried → exit 1 |
| Package Specification Extractor | [§26815252670](https://github.com/github/gh-aw/actions/runs/26815252670) | copilot | 38 | **25,059,124** | same 429 hard-rail; also `hasNumerousPermissionDenied=true` (3 denied bash cmds) |
| Typist - Go Type Analysis | [§26819864149](https://github.com/github/gh-aw/actions/runs/26819864149) | claude | (0*) | **46,885,956** | claude-opus ran ~22m then `agent` job failed; effective tokens ~1.9× the cap |
| Daily AW Cross-Repo Compile Check | [§26812510648](https://github.com/github/gh-aw/actions/runs/26812510648) | claude | 72 | 22,454,280 | 66m run, 34 rate-limit + 33 timeout markers, killed near cap |

\* Typist records `Turns=0` because the claude-engine stdout parser miscounts turns when the run is killed mid-stream — the run actually produced opus-4-8 assistant turns. Tracking note: the turn-counter under-reports for killed claude runs.

#### Exact copilot hard-rail signature (both copilot runs)

<details>
<summary>copilot-harness effective-token rail</summary>

```
Last error: CAPIError: 429 Maximum effective tokens exceeded (25191390.60 / 25000000).
[copilot-harness] attempt 1 failed: ... isMaxEffectiveTokensExceededError=true ...
[copilot-harness] attempt 1: AWF effective-token hard rail hit — not retrying or continuing
```
</details>

#### Regression delta (audit-diff: failed §26815252670 vs healthy copilot §26815254572 Functional Pragmatist)

The failure is pure over-consumption, **not** infrastructure — `has_anomalies: false`, no firewall/MCP status changes.

<details>
<summary>audit-diff metrics</summary>

| Metric | Healthy | Failed | Δ |
|---|---|---|---|
| Effective tokens | 3,655,877 | 25,059,124 | +585% |
| Turns / requests | 7 | 38 | +31 |
| Tokens per turn | 522K | 659K | +26% |
| copilot API call volume | 15 | 84 | +460% |
| cache efficiency | 0 | 0 | — |

Cache efficiency is **0** on both runs despite 2.29M cache-read tokens on the failed run — the effective-token formula is charging full weight for re-read context. High tokens-per-turn (~660K) + 38 turns indicates the agent re-reads large context every turn rather than narrowing.
</details>
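
The deltas in the audit-diff table can be reproduced from the raw counts. This is a small worked example using only the numbers in the table; the helper names are hypothetical.

```python
def percent_change(healthy: float, failed: float) -> float:
    """Relative change from the healthy run to the failed run, in percent."""
    return (failed - healthy) * 100 / healthy


def tokens_per_turn(effective_tokens: int, turns: int) -> float:
    """Average effective tokens charged per turn."""
    return effective_tokens / turns


healthy_tpt = tokens_per_turn(3_655_877, 7)    # ~522K
failed_tpt = tokens_per_turn(25_059_124, 38)   # ~659K

effective_delta = round(percent_change(3_655_877, 25_059_124))  # effective tokens
api_call_delta = round(percent_change(15, 84))                  # copilot API calls
per_turn_delta = round(percent_change(healthy_tpt, failed_tpt)) # tokens per turn
```

Per-turn cost grows by about a quarter while turn count grows more than fivefold, so turn count is the larger lever of the two.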

#### Recommended fixes (carry forward)

1. **Enforce a per-run turn/effective-token soft budget that triggers graceful summarize-and-exit before the 25M hard rail** — today the rail aborts mid-task with no safe output.
2. **Audit context growth per turn** for the repeat offenders (daily-experiment-report, Package Specification Extractor, Cross-Repo Compile Check) — 660K tokens/turn with 0 cache efficiency points at full-context re-reads.
3. **Fix the claude-engine turn counter** so killed runs (Typist) don't report `Turns=0` and slip past turn-based detectors.
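
Fix 1 amounts to checking a budget after every turn and stopping while there is still room to summarize. A minimal sketch under stated assumptions: per-turn costs are known after each turn, the soft budget and turn limit are hypothetical values, and the function simulates the control flow rather than calling any real engine.

```python
from typing import Iterable, Tuple


def run_with_soft_budget(
    turn_costs: Iterable[int], soft_budget: int, max_turns: int
) -> Tuple[int, int, str]:
    """Simulate a run; return (turns taken, tokens spent, stop reason)."""
    spent = 0
    turns = 0
    for cost in turn_costs:
        if turns >= max_turns:
            return turns, spent, "max-turns"
        spent += cost
        turns += 1
        if spent >= soft_budget:
            # This is where the agent would summarize and emit safe output.
            return turns, spent, "soft-budget"
    return turns, spent, "completed"


# 38 turns at ~659K tokens each, the shape of the failed run in the audit-diff.
failed_shape = [659_000] * 38
with_budget = run_with_soft_budget(failed_shape, soft_budget=20_000_000, max_turns=60)
with_turn_cap = run_with_soft_budget(failed_shape, soft_budget=20_000_000, max_turns=30)
```

With a 20M soft budget the simulated run stops after 31 turns at about 20.4M tokens; with a 30-turn limit it stops at about 19.8M. Both leave headroom under the 25M hard rail.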

#### Correlation

Same root cause as the original report. Cross-engine spread (claude now affected) materially broadens scope. Distinct from #35780 (squid startup) and #36325 (zero-token early CLI exit — spends no tokens). Re-investigation parent run: [§26825214886](https://github.com/github/gh-aw/actions/runs/26825214886).

> Generated by [🔍 [aw] Failure Investigator (6h)](https://github.com/github/gh-aw/actions/runs/26825214886) · opus48 2M · [◷](https://github.com/search?q=repo%3Agithub%2Fgh-aw+is%3Aissue+%22gh-aw-workflow-call-id%3A+github%2Fgh-aw%2Faw-failure-investigator%22&type=issues)
