Overview
Failure investigation run covering the last 6 hours (roughly 13:00–19:12 UTC on 2026-05-06). Of 50 runs analyzed (15 successful, 30 skipped, 3 failed, 2 in-progress), 3 distinct failures were found across 2 root-cause clusters. Both clusters trace to infrastructure-level problems rather than workflow logic bugs.
⚠️ Note: gh CLI was unavailable for querying existing issues (403 on internal GitHub). Existing-issue correlation could not be performed; sub-issues may overlap with prior tracking.
Failure Clusters
| # |
Workflow |
Run |
Time (UTC) |
Root Cause |
Priority |
| 1 |
Documentation Unbloat |
§25449195821 |
16:56 |
API rate limit at guard-policy check |
P1 |
| 2 |
Contribution Check |
§25450254117 |
17:18 |
API rate limit in safe_outputs job |
P1 |
| 3 |
Daily Safe Output Integrator |
§25455592425 |
19:10 |
Detection model emitted shell command instead of THREAT_DETECTION_RESULT JSON |
P1 |
Evidence
Cluster A: GitHub Installation API Rate Limit (~17:00 UTC)
Documentation Unbloat — guard policy failure
Error from agent job:
##[error]Failed to determine automatic guard policy: API rate limit exceeded for installation.
Request ID: 3C00:3C2A48:290908C:9DF389F:69...
The workflow failed before the agent could activate — the guard policy step hit the GitHub API installation rate limit.
Contribution Check — safe_outputs rate limit burst
The agent ran 18 turns successfully and produced add_comment safe outputs. However, in the safe_outputs job, 3 consecutive add_comment calls and 1 create_issue call all failed with:
##[error]Failed to add comment: API rate limit exceeded for installation.
Request ID: 0804:2874A1:2A87C35:A2F6043:69FB78FF
##[error]Failed to add comment: API rate limit exceeded for installation.
Request ID: 0800:27151A:11BC8E0:43D4955:69FB7900
After retries (~2 min), the safe_outputs recovered:
✓ Comment created successfully with ID: 4390448744
✓ Issue #30477 closed successfully
However the safe_outputs job exit code was non-zero due to the initial failures, causing the run to be marked failure. The rate limiting was transient and self-resolved.
Cluster B: Detection Model Misbehavior (~19:10 UTC)
Daily Safe Output Integrator — detection stage output
The detection log (detection.log, 29 bytes, 2 lines) contained only:
sudo: awf: command not found
The expected output format is:
THREAT_DETECTION_RESULT:{"prompt_injection":bool,"secret_leak":bool,"malicious_patch":bool,"reasons":[...]}
The detection parser correctly flagged this as parse_error and set GH_AW_DETECTION_REASON: parse_error. The agent itself ran successfully (10 turns, called safeoutputs.noop) and the noop was processed, but the detection job failure caused the overall run conclusion to be failure.
The detection model appears to have attempted to execute a shell command (sudo awf) rather than analyze the output and emit the expected JSON result format. This may indicate:
- The detection model was confused by workflow output content
- A prompt injection artifact in the agent output
- A detection model prompt or configuration regression
Existing Issue Correlation
Unable to query open agentic-workflows issues — gh CLI returned HTTP 403 during this run. Sub-issues below may overlap with existing tracking.
Proposed Fix Roadmap
| Priority |
Issue |
Action |
| P1 |
API rate limit causes transient workflow failures |
Add retry-with-backoff to guard policy check; safe_outputs already retries but partial failures propagate. Consider graceful degradation. |
| P1 |
Detection model emits shell commands instead of JSON |
Investigate detection model prompt; add output validation before log parse; treat parse_error as warning (not failure) if agent succeeded. |
Sub-Issues Created
References:
Overview
Failure investigation run covering the last 6 hours (roughly 13:00–19:12 UTC on 2026-05-06). Of 50 runs analyzed (15 successful, 30 skipped, 3 failed, 2 in-progress), 3 distinct failures were found across 2 root-cause clusters. Both clusters trace to infrastructure-level problems rather than workflow logic bugs.
Failure Clusters
Evidence
Cluster A: GitHub Installation API Rate Limit (~17:00 UTC)
Documentation Unbloat — guard policy failure
Error from agent job:
The workflow failed before the agent could activate — the guard policy step hit the GitHub API installation rate limit.
Contribution Check — safe_outputs rate limit burst
The agent ran 18 turns successfully and produced
add_commentsafe outputs. However, in thesafe_outputsjob, 3 consecutiveadd_commentcalls and 1create_issuecall all failed with:After retries (~2 min), the safe_outputs recovered:
However the safe_outputs job exit code was non-zero due to the initial failures, causing the run to be marked
failure. The rate limiting was transient and self-resolved.Cluster B: Detection Model Misbehavior (~19:10 UTC)
Daily Safe Output Integrator — detection stage output
The detection log (
detection.log, 29 bytes, 2 lines) contained only:The expected output format is:
The detection parser correctly flagged this as
parse_errorand setGH_AW_DETECTION_REASON: parse_error. The agent itself ran successfully (10 turns, calledsafeoutputs.noop) and the noop was processed, but thedetectionjob failure caused the overall run conclusion to befailure.The detection model appears to have attempted to execute a shell command (
sudo awf) rather than analyze the output and emit the expected JSON result format. This may indicate:Existing Issue Correlation
Unable to query open
agentic-workflowsissues —ghCLI returned HTTP 403 during this run. Sub-issues below may overlap with existing tracking.Proposed Fix Roadmap
parse_erroras warning (not failure) if agent succeeded.Sub-Issues Created
References: