[aw-failures] [aw-failure-investigator] Failure Report 2026-05-06 ~19:12 UTC #30673

@github-actions

Overview

Failure investigation run covering the last 6 hours (roughly 13:00–19:12 UTC on 2026-05-06). Of 50 runs analyzed (15 successful, 30 skipped, 3 failed, 2 in-progress), 3 distinct failures were found across 2 root-cause clusters. Both clusters trace to infrastructure-level problems rather than workflow logic bugs.

⚠️ Note: gh CLI was unavailable for querying existing issues (403 on internal GitHub). Existing-issue correlation could not be performed; sub-issues may overlap with prior tracking.


Failure Clusters

| # | Workflow | Run | Time (UTC) | Root Cause | Priority |
|---|----------|-----|------------|------------|----------|
| 1 | Documentation Unbloat | §25449195821 | 16:56 | API rate limit at guard-policy check | P1 |
| 2 | Contribution Check | §25450254117 | 17:18 | API rate limit in safe_outputs job | P1 |
| 3 | Daily Safe Output Integrator | §25455592425 | 19:10 | Detection model emitted shell command instead of THREAT_DETECTION_RESULT JSON | P1 |

Evidence

Cluster A: GitHub Installation API Rate Limit (~17:00 UTC)

Documentation Unbloat — guard policy failure

Error from agent job:

##[error]Failed to determine automatic guard policy: API rate limit exceeded for installation.
Request ID: 3C00:3C2A48:290908C:9DF389F:69...

The workflow failed before the agent could activate — the guard policy step hit the GitHub API installation rate limit.

Contribution Check — safe_outputs rate limit burst

The agent ran 18 turns successfully and produced add_comment safe outputs. However, in the safe_outputs job, 3 consecutive add_comment calls and 1 create_issue call all failed with:

##[error]Failed to add comment: API rate limit exceeded for installation.
Request ID: 0804:2874A1:2A87C35:A2F6043:69FB78FF
##[error]Failed to add comment: API rate limit exceeded for installation.
Request ID: 0800:27151A:11BC8E0:43D4955:69FB7900

After retries (~2 min), the safe_outputs job recovered:

✓ Comment created successfully with ID: 4390448744
✓ Issue #30477 closed successfully

However, the safe_outputs job still exited with a non-zero code because of the initial failures, so the run was marked as failure. The rate limiting was transient and resolved on its own.
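The behavior described above (early rate-limit errors deciding the job result even though a later retry succeeded) can be contrasted with a retry helper that reports only the final outcome. The following is a minimal sketch, not the workflow's actual code: `RateLimitError`, `call_with_backoff`, and all parameter names and defaults are hypothetical and chosen for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for an 'API rate limit exceeded' response."""


def call_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying on RateLimitError with exponential backoff.

    Only the final outcome is reported: earlier rate-limit errors are
    swallowed if a later attempt succeeds.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except RateLimitError:
            if attempt == max_attempts:
                raise  # retries exhausted: surface the failure
            # Delay doubles on each attempt, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay)
    raise AssertionError("unreachable")
```

With this shape, a call that fails twice and then succeeds returns normally, so the caller has no failed attempts to fold into its exit code. The `sleep` parameter exists only so the delays can be observed or skipped in tests.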

Cluster B: Detection Model Misbehavior (~19:10 UTC)

Daily Safe Output Integrator — detection stage output

The detection log (detection.log, 29 bytes, 2 lines) contained only:

sudo: awf: command not found

The expected output format is:

THREAT_DETECTION_RESULT:{"prompt_injection":bool,"secret_leak":bool,"malicious_patch":bool,"reasons":[...]}

The detection parser correctly flagged this as parse_error and set GH_AW_DETECTION_REASON: parse_error. The agent itself ran successfully (10 turns, called safeoutputs.noop) and the noop was processed, but the detection job failure caused the overall run conclusion to be failure.
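To make the parse_error outcome concrete, here is a minimal sketch of a parser for the expected line format. It is an assumption-laden illustration, not the real detection parser: the function name `parse_detection_log` is hypothetical, and the field types (three booleans and a list) are inferred only from the format line quoted above.

```python
import json
from typing import Optional

PREFIX = "THREAT_DETECTION_RESULT:"
BOOL_FIELDS = ("prompt_injection", "secret_leak", "malicious_patch")


def parse_detection_log(text: str) -> Optional[dict]:
    """Return the parsed result dict, or None if no valid result line exists.

    A None return corresponds to what the report calls parse_error.
    """
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith(PREFIX):
            continue  # ignore any line that is not a result line
        try:
            result = json.loads(line[len(PREFIX):])
        except json.JSONDecodeError:
            return None
        if not isinstance(result, dict):
            return None
        # Each threat flag must be present and must be a real boolean.
        if not all(isinstance(result.get(k), bool) for k in BOOL_FIELDS):
            return None
        if not isinstance(result.get("reasons"), list):
            return None
        return result
    return None
```

Fed the log content from this run (`sudo: awf: command not found`), the sketch finds no line with the expected prefix and returns `None`.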

The detection model appears to have attempted to execute a shell command (sudo awf) rather than analyze the output and emit the expected JSON result format. This may indicate:

  • The detection model was confused by workflow output content
  • A prompt injection artifact in the agent output
  • A detection model prompt or configuration regression

Existing Issue Correlation

Unable to query open agentic-workflows issues — gh CLI returned HTTP 403 during this run. Sub-issues below may overlap with existing tracking.


Proposed Fix Roadmap

| Priority | Issue | Action |
|----------|-------|--------|
| P1 | API rate limit causes transient workflow failures | Add retry-with-backoff to guard policy check; safe_outputs already retries but partial failures propagate. Consider graceful degradation. |
| P1 | Detection model emits shell commands instead of JSON | Investigate detection model prompt; add output validation before log parse; treat parse_error as warning (not failure) if agent succeeded. |
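The last action in the second roadmap item (treating parse_error as a warning when the agent succeeded) could be expressed as a small decision function. This is only a sketch of one possible policy: `run_conclusion`, its parameters, and the "warning" conclusion value are hypothetical, and the only reason string taken from this report is `parse_error`.

```python
from typing import Optional


def run_conclusion(agent_succeeded: bool, detection_reason: Optional[str]) -> str:
    """Map agent and detection outcomes to an overall run conclusion.

    detection_reason is None when detection produced a valid, clean result;
    otherwise it carries a reason string such as "parse_error".
    """
    if not agent_succeeded:
        return "failure"
    if detection_reason is None:
        return "success"
    if detection_reason == "parse_error":
        # Detection produced no verdict at all: flag it, but do not fail.
        return "warning"
    # Any other detection reason is treated as a real finding.
    return "failure"
```

For the Daily Safe Output Integrator run, `run_conclusion(True, "parse_error")` would yield `"warning"` under this policy. One design caveat: a parse error means no threat verdict exists, so a real implementation might apply the downgrade only when the pending safe outputs are harmless, as with the noop in this run.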

Sub-Issues Created

References:
