feat(acp): emit agent_error session/update when LLM call fails after retries #26352
Open
truenorth-lj wants to merge 3 commits into
…or propagation

Phase 1B of the LLM error propagation refactor. This is the TS mirror of the Python contract merged in tn-mono PR anomalyco#721 (commit 884bcf71).

Why a new file instead of an ambient .d.ts augmentation: the SDK's SessionUpdate is a closed `type` alias (discriminated union), not an `interface`. TypeScript declaration merging only works on interfaces, so we cannot extend SessionUpdate via a `.d.ts` patch. The clean TS-native approach is a local extended type that consumers opt into.

Adds:
- LLMErrorType: closed string union of 6 categories (budget, rate_limit, provider_unavailable, context_overflow, auth, unknown)
- LLMErrorPayload: snake_case wire shape mirroring Python LLMErrorPayload; retryable is on-the-wire explicit even though derivable from type
- AgentErrorUpdate: { sessionUpdate: "agent_error"; error; stopReason? }
- SessionUpdateWithAgentError: SessionUpdate | AgentErrorUpdate (Phase 4 emit site at session/processor.ts halt() will use this)
- isRetriable(type): TS mirror of the Python is_retriable() classifier
- isAgentErrorUpdate(value): type guard for narrowing unknown frames

12 unit tests cover: classifier per type, vocabulary stability, type-guard positive/negative paths, JSON round-trip, compile-time discriminated union narrowing.

Spec: specs/20260508-llm-error-propagation/spec.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
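For readers without the diff open, the shapes listed in this commit message might look roughly like the sketch below. Only the type and function names come from the commit message. The `SessionUpdate` stand-in, the choice of which categories count as retriable, and the `summarize` helper are assumptions made for illustration, and the remaining `LLMErrorPayload` fields are omitted.

```typescript
// Simplified stand-in for the SDK's closed SessionUpdate union (assumption;
// the real union has many more members).
type SessionUpdate = { sessionUpdate: "agent_message_chunk"; content: unknown }

// Closed vocabulary of six categories, as listed in the commit message.
type LLMErrorType =
  | "budget"
  | "rate_limit"
  | "provider_unavailable"
  | "context_overflow"
  | "auth"
  | "unknown"

// snake_case wire shape; other fields are omitted in this sketch.
interface LLMErrorPayload {
  type: LLMErrorType
  retryable: boolean
}

interface AgentErrorUpdate {
  sessionUpdate: "agent_error"
  error: LLMErrorPayload
  stopReason?: string
}

// Local extended union that consumers opt into.
type SessionUpdateWithAgentError = SessionUpdate | AgentErrorUpdate

// ASSUMPTION: which categories are retriable is not stated in the commit
// message; this split is a guess for illustration only.
function isRetriable(type: LLMErrorType): boolean {
  return type === "rate_limit" || type === "provider_unavailable"
}

// Type guard for narrowing an unknown frame (for example, parsed JSON).
function isAgentErrorUpdate(value: unknown): value is AgentErrorUpdate {
  if (typeof value !== "object" || value === null) return false
  const frame = value as { sessionUpdate?: unknown; error?: unknown }
  return (
    frame.sessionUpdate === "agent_error" &&
    typeof frame.error === "object" &&
    frame.error !== null
  )
}

// Hypothetical helper showing compile-time narrowing on the discriminant.
function summarize(update: SessionUpdateWithAgentError): string {
  switch (update.sessionUpdate) {
    case "agent_error":
      return `error:${update.error.type}`
    case "agent_message_chunk":
      return "chunk"
  }
}
```

Because `sessionUpdate` is a literal discriminant on both members, the `switch` narrows `update` without casts, which is the property the local union is meant to preserve.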
…retries

When the retry policy in session/retry.ts is exhausted, halt() updates internal state (ctx.assistantMessage.error, Bus.Session.Event.Error, EventV2.SessionEvent.Step.Failed.Sync) but never emits any ACP frame to the connected client. From an ACP client's perspective the turn is silently stuck (no stopReason, no error notification, no final session/update), and the only signal is whatever timeout the client imposes locally.

Add a session.error case in acp/agent.ts handleEvent() that translates the SDK error variant into an LLMErrorPayload and emits the agent_error session/update kind with stopReason: "error". The payload prefers headers set by an upstream classifying proxy (x-llm-error-type / x-llm-error-retryable / x-llm-error-reset-at / retry-after) over status-code heuristics.

ContextOverflowError is intentionally NOT emitted: halt() routes that variant into in-process compaction; the turn continues on a smaller context window rather than ending in an error state.

Tests cover the SDK→LLMErrorPayload mapping for every error variant (APIError with classification headers, APIError status-code fallback, ContextOverflowError, ProviderAuthError, explicit retryable header override) and the integration path that pushes a session.error event through the agent's event subscription and asserts the resulting agent_error session/update.

Builds on anomalyco#26306 (which adds the agent_error SessionUpdate kind + LLMErrorPayload type definitions). Closes anomalyco#26350.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
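The header-first mapping described in this commit could be sketched as follows. This is not the code from the PR: `SdkErrorLike`, `toLLMErrorPayload`, the `message` field, the default retryable split, and the choice to check the error name before headers are all assumptions for illustration. The header names, the payload field names, and the status-code fallback follow the PR text.

```typescript
type LLMErrorType =
  | "budget" | "rate_limit" | "provider_unavailable"
  | "context_overflow" | "auth" | "unknown"

const LLM_ERROR_TYPES: readonly LLMErrorType[] = [
  "budget", "rate_limit", "provider_unavailable",
  "context_overflow", "auth", "unknown",
]

interface LLMErrorPayload {
  type: LLMErrorType
  retryable: boolean
  message: string // assumed field, not named in the PR text
  retry_after_seconds?: number
  reset_at_epoch_ms?: number
}

// Hypothetical, simplified view of an SDK error variant.
interface SdkErrorLike {
  name: string
  message: string
  statusCode?: number
  responseHeaders?: Record<string, string>
}

// ASSUMPTION: the type-derived default for `retryable` is a guess.
function defaultRetryable(type: LLMErrorType): boolean {
  return type === "rate_limit" || type === "provider_unavailable"
}

// Parse a numeric header value; non-numeric or empty values are ignored.
function parseNumber(raw: string | undefined): number | undefined {
  if (raw === undefined || raw.trim() === "") return undefined
  const n = Number(raw)
  return Number.isFinite(n) ? n : undefined
}

function typeFromStatus(status: number | undefined): LLMErrorType {
  if (status === 401) return "auth"
  if (status !== undefined && status >= 500 && status <= 599) return "provider_unavailable"
  return "unknown"
}

function toLLMErrorPayload(err: SdkErrorLike): LLMErrorPayload {
  // HTTP header names are case-insensitive, so normalize the keys first.
  const headers: Record<string, string> = {}
  for (const [key, value] of Object.entries(err.responseHeaders ?? {})) {
    headers[key.toLowerCase()] = value
  }

  // Variants mapped by name, then classification header, then status code.
  const byName: LLMErrorType | undefined =
    err.name === "ProviderAuthError" ? "auth"
    : err.name === "ContextOverflowError" ? "context_overflow"
    : undefined
  const byHeader = LLM_ERROR_TYPES.find((t) => t === headers["x-llm-error-type"])
  const type = byName ?? byHeader ?? typeFromStatus(err.statusCode)

  // An explicit header overrides the type-derived default.
  const override = headers["x-llm-error-retryable"]
  const retryable =
    override === "true" ? true
    : override === "false" ? false
    : defaultRetryable(type)

  const payload: LLMErrorPayload = { type, retryable, message: err.message }
  const retryAfter = parseNumber(headers["retry-after"])
  if (retryAfter !== undefined) payload.retry_after_seconds = retryAfter
  const resetAt = parseNumber(headers["x-llm-error-reset-at"])
  if (resetAt !== undefined) payload.reset_at_epoch_ms = resetAt
  return payload
}
```

Under the fallback as stated, a bare `429` without classification headers maps to `unknown`; a more specific category only appears when the upstream proxy sets `x-llm-error-type`.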
Issue for this PR
Closes #24494
Closes #28453

Stacks on #26306, which adds the `agent_error` `SessionUpdate` kind and the `LLMErrorPayload` shape this PR uses. Diffs from that PR are included here until it merges.

Type of change
What does this PR do?
When the retry policy in `session/retry.ts` is exhausted, `session/processor.ts halt()` records the failure internally but does not emit an ACP frame to the connected client. From an ACP client perspective, the turn can look silently stuck: no final error notification and no `stopReason`.

This PR adds a `case "session.error":` branch in `packages/opencode/src/acp/agent.ts` that translates the SDK error variant into an `LLMErrorPayload` and emits the new `agent_error` `session/update` kind with `stopReason: "error"`.

The payload extraction prefers classification headers when present:
- `x-llm-error-type`: `budget` | `rate_limit` | `provider_unavailable` | `context_overflow` | `auth` | `unknown`
- `x-llm-error-retryable`: `"true"` / `"false"` (overrides type-derived `retryable`)
- `x-llm-error-reset-at`: epoch ms, populates `reset_at_epoch_ms`
- `retry-after`: seconds, populates `retry_after_seconds`

It falls back to status-code heuristics when headers are absent (`401` -> `auth`, `5xx` -> `provider_unavailable`, else -> `unknown`). `ProviderAuthError` and `ContextOverflowError` SDK variants are mapped by name.

Two `session.error` variants are intentionally not emitted as `agent_error`:
- `ContextOverflowError`: compaction handles this path and the turn can continue on a smaller context window.
- `MessageAbortedError`: this is the normal user-stop path triggered by `session/cancel`; the prompt RPC reports cancellation via `stopReason: "cancelled"`, so emitting `agent_error` would make one user action look like both an error and a cancellation.

The existing event subscription already receives `session.error` events through the SDK event stream. No bus, SDK, or event-stream wiring changes are required.

How did you verify your code works?
`packages/opencode/test/acp/halt-emits-agent-error.test.ts` covers:
- `APIError` classification headers -> typed payload fields preserved
- status-code fallback for `503` and `401`
- `ContextOverflowError` and `ProviderAuthError` mapping
- `session.error` emits exactly one `agent_error` update for a classified API error
- `ContextOverflowError` emits no `agent_error`
- `MessageAbortedError` emits no `agent_error`

Screenshots / recordings
N/A - backend / ACP wire change. Client rendering of the `agent_error` frame is downstream consumer work.

Checklist
`bun run typecheck` clean