Address review feedback on #2896 by trungutt · Pull Request #2921 · docker/docker-agent

trungutt · 2026-05-29T07:40:37Z

Why

#2896 introduced the in-process unmanaged OAuth drive-flow and the out-of-band callback route. The PR landed before review feedback (from the bot and from @dgageot) could be folded in. This PR addresses each point.

Changes

1. Docs — `cagent/` → `docker-agent/` (per @dgageot)

docs/features/remote-mcp/index.md still referenced the legacy meta-key names (cagent/type, cagent/server_url, cagent/authorize_url, cagent/state) while the runtime emits docker-agent/*. Embedders following the docs would have looked up the wrong keys. Renamed in-place. Also renamed "legacy" → "client-driven" in the same file for the mode that does not set --mcp-oauth-redirect-uri, matching the PR-body terminology.

2. Drop the tautological state check on the deeplink-callback path (bot, MEDIUM)

On the direct-callback path, handleUnmanagedOAuthFlow was synthesising the payload consumeUnmanagedElicitationReply expects:

content = map[string]any{"code": cb.Code, "state": expectedState}

Then consumeUnmanagedElicitationReply did:

if subtle.ConstantTimeCompare([]byte(state), []byte(expectedState)) != 1 { ... }

Always true by construction — the guard was dead code on this path and gave a false sense of security: any reviewer would assume the state arriving over HTTP was re-validated here, when it was not.

Fix: extract a exchangeAuthorizationCode helper that performs only the /token exchange. The deeplink path calls it directly with cb.Code (state was already validated by the pending-OAuth registry lookup — only states the runtime generated and is currently awaiting are present, unknown ones return 404). The elicitation-reply path keeps its state check, because there state actually arrives from the client and is independent of expectedState.

3. Bound the OAuth flow's wait with a deadline (bot, MEDIUM)

handleUnmanagedOAuthFlow selects across:

Case	When it fires (without this PR)
`ctx.Done()`	Never — `ctx` is the `WithoutCancel`-detached ctx from `clientConnector.Connect`
`userCancelCh`	Only on explicit parent-ctx cancellation by the host
`callbackCh`	Only when a deeplink hits `/api/mcp-oauth/callback`
`elicCh`	Only when `requestElicitation` returns

If the MCP client disconnects silently (TCP RST, idle timeout, process kill) and requestElicitation does not return promptly, the OAuth flow blocks forever. The per-session streaming lock at the SessionManager level is held until the next process restart; every subsequent message from that session returns 409 Conflict / ErrSessionBusy.

Fix: scope the elicitation goroutine to an elicCtx with a 10-minute deadline (unmanagedOAuthWaitTimeout). When the deadline fires, requestElicitation returns a wrapped context.DeadlineExceeded, the elicCh case picks it up, the OAuth flow returns, and the lock releases. An explicit <-elicCtx.Done() case in the outer select serves as a defensive fallback in case requestElicitation ever ignores its ctx. New test TestUnmanagedOAuthFlow_DriveFlow_TimesOutWhenNoReplyArrives exercises this with a 200ms override.

10 minutes is generous enough for a user to click through an IdP consent screen and any IdP-side prompts (MFA, recovery codes, …); small enough that a dead session can't strand the lock. Open to making it configurable via RuntimeConfig if needed.

4. Document the threat model on `POST /api/mcp-oauth/callback` (bot, MEDIUM)

The route inherits the same auth posture as the rest of the API: when --auth-token is unset, it's open. The registry's lookup-by-state IS meaningful authorization for the legitimate request (only states the runtime generated and is awaiting are accepted), but state values do appear in transit:

in the elicitation Meta on the session SSE stream (visible to the connected client)
in the authorize URL the user opens (visible to the user's browser and the authorization server)
in debug logs (when --debug is on)

An attacker who observes a live state via any of these channels could POST here with an attacker-controlled code; the resulting token would be bound to the attacker's account. Setting --auth-token blocks this regardless of state leakage.

The handler's doc-comment now spells out the threat model and recommends --auth-token for deployments that listen on a network-reachable interface. No code change was made: this route is no more exposed than any other API endpoint under the same group, and operators already make the --auth-token choice once for the whole API.

- Fix stale docs that still reference cagent/* meta keys; rename to docker-agent/* to match what the runtime emits. Also rename 'legacy' to 'client-driven' in docs for the mode that does not set --mcp-oauth-redirect-uri. - Drop the tautological state check on the deeplink-callback path in handleUnmanagedOAuthFlow. Synthesizing {code, state=expectedState} for consumeUnmanagedElicitationReply meant the ConstantTimeCompare guard was always true and could mislead future readers. The pending-OAuth registry lookup IS the state validation for that path; state validation now lives only in the elicitation-reply branch where state actually arrives from the client. Extract a small exchangeAuthorizationCode helper that both paths share for the actual /token exchange. - Bound the OAuth flow's wait with a 10 minute deadline. Without it, a silent MCP-client disconnect (TCP RST, idle timeout, process kill) leaves requestElicitation blocked forever and the per-session streaming lock held indefinitely, returning 409 / ErrSessionBusy on every subsequent message until a process restart. The deadline applies via context.WithTimeout on elicCtx; an explicit case in the outer select also fires as a defensive fallback if requestElicitation ever ignores its ctx. New test TestUnmanagedOAuthFlow_DriveFlow_TimesOutWhenNoReplyArrives exercises the path with a 200ms override. - Document the threat model on /api/mcp-oauth/callback in the handler doc-comment: the registry is the per-request authorization, but if a state value leaks (debug logs, compromised host) an attacker could submit a code; --auth-token blocks this regardless of state leakage and SHOULD be set when the agent listens on a network-reachable interface. Signed-off-by: Trung Nguyen <trung.nguyen@docker.com>

docker-agent

Assessment: 🟢 APPROVE

Reviewed the OAuth refactoring changes in this PR. The diff is clean — no bugs found in the added code.

What was checked:

exchangeAuthorizationCode extraction: Correctly splits the token-exchange step from state validation. The elicitation path retains its subtle.ConstantTimeCompare state check; the deeplink-callback path legitimately skips it (state was already validated by the pending-OAuth registry lookup). No tautology remains, and no validation gap is introduced.
Timeout / deadline logic: context.WithTimeout(ctx, unmanagedOAuthWaitTimeout) correctly bounds the wait. The defer elicCancel() ensures the goroutine's context is cancelled on all return paths. The elicitation goroutine writes to a buffered channel (capacity 1) so it cannot block after handleUnmanagedOAuthFlow returns — no goroutine leak.
Defensive <-elicCtx.Done() select case: Provides a correct safety net for implementations of requestElicitation that do not honour context cancellation. In practice the elicCh case fires first with a wrapped DeadlineExceeded; the defensive case only activates if the implementation ignores its ctx.
nil token after select: Both callbackCh and elicCh cases only leave token == nil when consumeErr != nil, so the early return on consumeErr != nil is always taken before token would be dereferenced.
resourceMetadata removal: The param was previously unused (_ = resourceMetadata). The resourceIndicator string (the actual token-exchange parameter derived from it) is still threaded through correctly to exchangeAuthorizationCode.
Test: TestUnmanagedOAuthFlow_DriveFlow_TimesOutWhenNoReplyArrives correctly exercises the deadline path. The global unmanagedOAuthWaitTimeout override is safe because the test does not call t.Parallel().
Docs: cagent/* → docker-agent/* renames are consistent throughout the updated table and prose.

docker-agent

Assessment: 🟢 APPROVE

The refactor is structurally sound. The exchangeAuthorizationCode extraction is correct — the elicitation path retains its ConstantTimeCompare state validation, while the callback path legitimately skips it because the pending-OAuth registry lookup (unknown states return 404) is the equivalent check. The buffered elicCh prevents goroutine leaks in all select arms. The elicCtx.Done() timeout guard correctly races with elicCh and both outcomes produce well-handled errors. No high or medium severity issues found — two minor defensive notes left inline.

docker-agent · 2026-05-29T08:14:10Z

-	)
-	if err != nil {
-		return err
+	if consumeErr != nil {


[LOW] Defensive nil-token guard missing after exchangeAuthorizationCode

After both select arms (callbackCh and elicCh) complete, the post-select block only checks consumeErr != nil before dereferencing token (e.g. token.ClientID = clientID a few lines below). If exchangeAuthorizationCode were ever to return (nil, nil) — a theoretical edge case if exchangeCodeForToken's contract ever changes — execution would reach token.ClientID = ... with a nil pointer and panic.

A one-line defensive check after the error guard would eliminate this latent risk:

if consumeErr != nil { return consumeErr } if token == nil { return errors.New("internal: exchangeAuthorizationCode returned nil token without error") }

This is low risk given exchangeCodeForToken's current implementation always returns an error on nil token, but the guard is cheap and makes the invariant explicit.

docker-agent · 2026-05-29T08:14:10Z

+// would return 409 / ErrSessionBusy until a process restart.
+func TestUnmanagedOAuthFlow_DriveFlow_TimesOutWhenNoReplyArrives(t *testing.T) {
+	original := unmanagedOAuthWaitTimeout
+	unmanagedOAuthWaitTimeout = 200 * time.Millisecond


[LOW] Package-level test variable mutation is fragile if tests are ever parallelized

unmanagedOAuthWaitTimeout is a package-level var overridden directly in the test body:

unmanagedOAuthWaitTimeout = 200 * time.Millisecond t.Cleanup(func() { unmanagedOAuthWaitTimeout = original })

This is safe today because the test does not call t.Parallel(). However, if a future contributor adds t.Parallel() to this test or any sibling test that reads unmanagedOAuthWaitTimeout, a data race is introduced and the cleanup's restore ordering becomes non-deterministic.

Consider a concurrency-safe alternative — for example, an oauthWaitTimeout field on oauthTransport (defaulting to the package constant) so each test instance sets its own value without touching shared state.

… (refs #2921)

…ect-uri flag (refs docker#2896, docker#2921)

…tion (refs docker#2921)

… (refs #2921)

docker-agent reviewed May 29, 2026

View reviewed changes

aheritier added area/api For features/issues/fixes related to the usage of the cagent API area/mcp MCP protocol, MCP tool servers, integration area/security Authentication, authorization, secrets, vulnerabilities kind/fix PR fixes a bug (maps to fix: commit prefix) labels May 29, 2026

trungutt marked this pull request as ready for review May 29, 2026 08:09

trungutt requested a review from a team as a code owner May 29, 2026 08:09

docker-agent reviewed May 29, 2026

View reviewed changes

dgageot approved these changes May 29, 2026

View reviewed changes

dgageot merged commit 7797298 into docker:main May 29, 2026
11 checks passed

aheritier mentioned this pull request May 29, 2026

docs: sync /docs with changes merged 2026-05-28 – 2026-05-29 #2929

Merged

aheritier added a commit that referenced this pull request May 29, 2026

docs: document --auth-token flag and add OAuth callback security note…

f380ed0

… (refs #2921)

aheritier mentioned this pull request May 29, 2026

docs: document --auth-token flag, OAuth callback security note, and TUI notification UX #2932

Merged

aheritier added a commit that referenced this pull request May 29, 2026

docs: document --auth-token flag and add OAuth callback security note…

4f7bc5b

… (refs #2921)

aheritier added a commit that referenced this pull request May 29, 2026

docs: document --auth-token flag and add OAuth callback security note…

24f07b3

… (refs #2921)

aheritier added a commit that referenced this pull request May 29, 2026

docs: document --auth-token flag and add OAuth callback security note…

b2c1310

… (refs #2921)

pull Bot pushed a commit to TheTechOddBug/cagent that referenced this pull request May 29, 2026

docs: document /api/mcp-oauth/callback endpoint and --mcp-oauth-redir…

a2132cb

…ect-uri flag (refs docker#2896, docker#2921)

pull Bot pushed a commit to TheTechOddBug/cagent that referenced this pull request May 29, 2026

docs: clarify docker-agent/authorize_url prose in unmanaged OAuth sec…

42265c7

…tion (refs docker#2921)

aheritier added a commit that referenced this pull request May 29, 2026

docs: document --auth-token flag and add OAuth callback security note…

4fb7d79

… (refs #2921)

aheritier mentioned this pull request May 30, 2026

docs: document --theme flag for docker agent run #2936

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Address review feedback on #2896#2921

Address review feedback on #2896#2921
dgageot merged 1 commit into
docker:mainfrom
trungutt:address-oauth-review-feedback

trungutt commented May 29, 2026 •

edited

Loading

Uh oh!

docker-agent left a comment

Uh oh!

docker-agent left a comment

Uh oh!

docker-agent May 29, 2026

Uh oh!

docker-agent May 29, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

trungutt commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Why

Changes

1. Docs — cagent/* → docker-agent/* (per @dgageot)

2. Drop the tautological state check on the deeplink-callback path (bot, MEDIUM)

3. Bound the OAuth flow's wait with a deadline (bot, MEDIUM)

4. Document the threat model on POST /api/mcp-oauth/callback (bot, MEDIUM)

Uh oh!

docker-agent left a comment

Choose a reason for hiding this comment

Assessment: 🟢 APPROVE

Uh oh!

docker-agent left a comment

Choose a reason for hiding this comment

Assessment: 🟢 APPROVE

Uh oh!

docker-agent May 29, 2026

Choose a reason for hiding this comment

Uh oh!

docker-agent May 29, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

trungutt commented May 29, 2026 •

edited

Loading

1. Docs — `cagent/` → `docker-agent/` (per @dgageot)

4. Document the threat model on `POST /api/mcp-oauth/callback` (bot, MEDIUM)