Skip to content

flake: TestRetryWithInterval #1553

@flake-investigator

Description

@flake-investigator

CI Run Link: https://github.com/coder/coder/actions/runs/26586696037
Failing Job: https://github.com/coder/coder/actions/runs/26586696037/job/78334548165 (test-go-pg-17, run_attempt 1)
Failure time: 2026-05-28T16:15:42Z

Commit Info:

  • SHA: 094fe971ad9226582efbef39fdd4ea5636af3ceb
  • Author: DevCats
  • Commit title: "chore(aibridge): add AWS PRM user-agent attribution for Bedrock calls (#25221)"
  • Commit: coder/coder@094fe97

Failure evidence:

=== FAIL: cli TestRetryWithInterval/Stops_NonRetryableError (unknown)
=== FAIL: cli TestRetryWithInterval/Stops_ContextCanceled (unknown)
=== FAIL: cli TestRetryWithInterval/Succeeds_FirstTry (unknown)
=== FAIL: cli TestRetryWithInterval/Succeeds_AfterTransientFailures (unknown)
=== FAIL: cli TestRetryWithInterval/Stops_MaxAttemptsExhausted (unknown)
=== FAIL: cli TestRetryWithInterval (unknown)

2026-05-28 16:12:26.227 [warn] transient error, retrying  error="lookup example.com: no such host"  attempt=1
sloghuman: failed to write entry: io: read/write on closed pipe

Error analysis:

  • gotestsum reported all TestRetryWithInterval subtests as "unknown" while the test logs show retryWithInterval emitting transient DNS errors and slogtest reporting write failures to a closed pipe.
  • This suggests a flaky interaction with the slogtest logger or stdio capture during these retry tests (no panic or crash observed).

Root cause classification:

  • A. Flaky Test (logging/stdio capture during retryWithInterval subtests)

Data race / panic / OOM checks:

  • No WARNING: DATA RACE, panic:, or OOM indicators found in the retrieved logs.

Precise assignment analysis:

  • Intended blame command: git blame -L 520,595 cli/ssh_internal_test.go (TestRetryWithInterval).
  • Recent file history: git log --oneline -10 --follow cli/ssh_internal_test.go -> 1d0653cd (Ehab Younes) "fix(cli): retry dial timeouts in SSH connection setup".
  • Assigning to the most recent substantive modifier of the retryWithInterval test area.

Related issues search (coder/internal):

  • "TestRetryWithInterval"
  • "RetryWithInterval"
  • "lookup example.com"
  • "transient error, retrying"
    No matches found.

Reproduction (best effort):

go test ./cli -run TestRetryWithInterval -count=50

Metadata

Metadata

Assignees

Labels

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions