[aw-failures] [aw-failure-investigator] Failure Report 2026-05-06 ~19:12 UTC

### Overview

Failure investigation run covering the last 6 hours (roughly 13:00–19:12 UTC on 2026-05-06). Of **50 runs** analyzed (15 successful, 30 skipped, 3 failed, 2 in-progress), **3 distinct failures** were found across 2 root-cause clusters. Both clusters trace to infrastructure-level problems rather than workflow logic bugs.

> ⚠️ Note: `gh` CLI was unavailable for querying existing issues (403 on internal GitHub). Existing-issue correlation could not be performed; sub-issues may overlap with prior tracking.

---

### Failure Clusters

| # | Workflow | Run | Time (UTC) | Root Cause | Priority |
|---|----------|-----|-----------|------------|----------|
| 1 | Documentation Unbloat | [§25449195821](https://github.com/github/gh-aw/actions/runs/25449195821) | 16:56 | API rate limit at guard-policy check | P1 |
| 2 | Contribution Check | [§25450254117](https://github.com/github/gh-aw/actions/runs/25450254117) | 17:18 | API rate limit in safe_outputs job | P1 |
| 3 | Daily Safe Output Integrator | [§25455592425](https://github.com/github/gh-aw/actions/runs/25455592425) | 19:10 | Detection model emitted shell command instead of THREAT_DETECTION_RESULT JSON | P1 |

---

### Evidence

#### Cluster A: GitHub Installation API Rate Limit (~17:00 UTC)

<details>
<summary>Documentation Unbloat — guard policy failure</summary>

Error from agent job:
```
##[error]Failed to determine automatic guard policy: API rate limit exceeded for installation.
Request ID: 3C00:3C2A48:290908C:9DF389F:69...
```
The workflow failed before the agent could activate — the guard policy step hit the GitHub API installation rate limit.

</details>

<details>
<summary>Contribution Check — safe_outputs rate limit burst</summary>

The agent ran 18 turns successfully and produced `add_comment` safe outputs. However, in the `safe_outputs` job, 3 consecutive `add_comment` calls and 1 `create_issue` call all failed with:
```
##[error]Failed to add comment: API rate limit exceeded for installation.
Request ID: 0804:2874A1:2A87C35:A2F6043:69FB78FF
##[error]Failed to add comment: API rate limit exceeded for installation.
Request ID: 0800:27151A:11BC8E0:43D4955:69FB7900
```
After retries (~2 min), the safe_outputs recovered:
```
✓ Comment created successfully with ID: 4390448744
✓ Issue #30477 closed successfully
```
However the safe_outputs job exit code was non-zero due to the initial failures, causing the run to be marked `failure`. The rate limiting was transient and self-resolved.

</details>

#### Cluster B: Detection Model Misbehavior (~19:10 UTC)

<details>
<summary>Daily Safe Output Integrator — detection stage output</summary>

The detection log (`detection.log`, 29 bytes, 2 lines) contained only:
```
sudo: awf: command not found
```
The expected output format is:
```
THREAT_DETECTION_RESULT:{"prompt_injection":bool,"secret_leak":bool,"malicious_patch":bool,"reasons":[...]}
```
The detection parser correctly flagged this as `parse_error` and set `GH_AW_DETECTION_REASON: parse_error`. The agent itself ran successfully (10 turns, called `safeoutputs.noop`) and the noop was processed, but the `detection` job failure caused the overall run conclusion to be `failure`.

The detection model appears to have attempted to execute a shell command (`sudo awf`) rather than analyze the output and emit the expected JSON result format. This may indicate:
- The detection model was confused by workflow output content
- A prompt injection artifact in the agent output
- A detection model prompt or configuration regression

</details>

---

### Existing Issue Correlation

Unable to query open `agentic-workflows` issues — `gh` CLI returned HTTP 403 during this run. Sub-issues below may overlap with existing tracking.

---

### Proposed Fix Roadmap

| Priority | Issue | Action |
|----------|-------|--------|
| P1 | API rate limit causes transient workflow failures | Add retry-with-backoff to guard policy check; safe_outputs already retries but partial failures propagate. Consider graceful degradation. |
| P1 | Detection model emits shell commands instead of JSON | Investigate detection model prompt; add output validation before log parse; treat `parse_error` as warning (not failure) if agent succeeded. |

---

### Sub-Issues Created

- #30674 — Detection model produces shell commands instead of THREAT_DETECTION_RESULT JSON

**References:**
- [§25449195821](https://github.com/github/gh-aw/actions/runs/25449195821) — Documentation Unbloat (rate limit)
- [§25450254117](https://github.com/github/gh-aw/actions/runs/25450254117) — Contribution Check (rate limit)
- [§25455592425](https://github.com/github/gh-aw/actions/runs/25455592425) — Daily Safe Output Integrator (detection parse error)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[aw-failures] [aw-failure-investigator] Failure Report 2026-05-06 ~19:12 UTC #30673

Overview

Failure Clusters

Evidence

Cluster A: GitHub Installation API Rate Limit (~17:00 UTC)

Cluster B: Detection Model Misbehavior (~19:10 UTC)

Existing Issue Correlation

Proposed Fix Roadmap

Sub-Issues Created

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

#	Workflow	Run	Time (UTC)	Root Cause	Priority
1	Documentation Unbloat	§25449195821	16:56	API rate limit at guard-policy check	P1
2	Contribution Check	§25450254117	17:18	API rate limit in safe_outputs job	P1
3	Daily Safe Output Integrator	§25455592425	19:10	Detection model emitted shell command instead of THREAT_DETECTION_RESULT JSON	P1

Priority	Issue	Action
P1	API rate limit causes transient workflow failures	Add retry-with-backoff to guard policy check; safe_outputs already retries but partial failures propagate. Consider graceful degradation.
P1	Detection model emits shell commands instead of JSON	Investigate detection model prompt; add output validation before log parse; treat `parse_error` as warning (not failure) if agent succeeded.

[aw-failures] [aw-failure-investigator] Failure Report 2026-05-06 ~19:12 UTC #30673

Description

Overview

Failure Clusters

Evidence

Cluster A: GitHub Installation API Rate Limit (~17:00 UTC)

Cluster B: Detection Model Misbehavior (~19:10 UTC)

Existing Issue Correlation

Proposed Fix Roadmap

Sub-Issues Created

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions