feat: optional session store (resumabillity support)#775
Merged
alexhancock merged 9 commits intoApr 21, 2026
Conversation
rockdaboot
reviewed
Mar 30, 2026
e1f4999 to
192faa2
Compare
192faa2 to
2361eca
Compare
Contributor
Author
|
@alexhancock any feedback on this PR? Would really like to progress this forward. |
alexhancock
reviewed
Apr 21, 2026
Contributor
There was a problem hiding this comment.
In theory this will only be needed until June/July when modelcontextprotocol/modelcontextprotocol#2575 is in the official spec, but the implementation looks good and will likely need to live longer than SEP-2575 is included official and used!
alexhancock
approved these changes
Apr 21, 2026
Merged
dax
added a commit
to universal-inbox/universal-inbox
that referenced
this pull request
May 13, 2026
## Summary Production runs two replicas of the API behind a load balancer. With rmcp's per-process `LocalSessionManager`, a request landing on a different pod than the one that handled `initialize` was rejected with 401, which Claude Code interpreted as token expiry — causing a refresh loop that eventually surfaced as *"the universal-inbox-alan MCP server token has expired and requires re-authorization"*. This PR bumps `rmcp` to 1.6 and wires its new `SessionStore` trait through a vendored patch of `rmcp-actix-web` 0.12.3 (upstream re-implements its own handlers instead of delegating to rmcp's tower service, so the trait alone is not enough). A new `RedisSessionStore` persists each session's `initialize` parameters; when a follow-up request hits a pod that doesn't know the session, the patched transport loads from Redis and replays the handshake transparently. The missing-session response also flips from 401 to 404 per the MCP spec, so clients re-initialize cleanly instead of looping on token refresh. ## Changes - **`rmcp` 1.2 → 1.6** in `api/Cargo.toml` to pick up the `SessionStore` trait (`PR modelcontextprotocol/rust-sdk#775`). - **Vendored patch of `rmcp-actix-web` 0.12.3** under `api/vendor/rmcp-actix-web/`, redirected via `[patch.crates-io]` in the workspace `Cargo.toml`. Patch wires `SessionStore` into `handle_get`/`handle_post`/`handle_delete`: replays the `initialize` handshake on any pod that doesn't know the session, persists state on a fresh `initialize`, deletes on session close, and returns 404 (not 401) for unknown/expired sessions. - **`RedisSessionStore`** (`api/src/mcp/session_store.rs`) — `SessionStore` impl using the existing `Cache::connection_manager`, JSON-serialized `SessionState`, namespace `universal-inbox:mcp:session:`, configurable TTL via `set_ex`. - **Wiring** in `api/src/mcp/mod.rs` (`build_http_service` now takes `Arc<dyn SessionStore>`) and `api/src/lib.rs` (constructs `RedisSessionStore` from the existing cache + config). - **Config** `McpSessionStoreSettings { ttl_seconds }` (always enabled; default 86400s). - **Unit tests** in `api/tests/api/test_mcp_session_store.rs` covering round-trip, missing-key returns `None`, delete, and TTL. ## Test plan - [x] `just check` — clippy + compile clean against the patched crate. - [x] `just test session_store` — 4/4 new unit tests pass. - [x] `just test test_mcp` — 18/18 existing MCP/OAuth2 tests pass (no regressions on the patched paths). - [x] Manual two-process smoke test (Caddy round-robining `/api/*` between two API instances sharing the same Postgres + Redis): `initialize` on pod A, `tools/list` with the same `Mcp-Session-Id` lands on pod B, returns 200 — the patched transport restores from Redis. - [ ] Prod canary on Alan: set `UNIVERSAL_INBOX__APPLICATION__MCP_SESSION_STORE__TTL_SECONDS` (defaulted to 86400) and confirm `qovery log` shows `MCP auth accepted` for every request, `session restored from external store` exactly once per pod-bounce of the same session id, and zero `Unauthorized: Session not found` lines. ## Notes - The vendored fork is a near-verbatim copy of upstream rmcp 1.6's `tower::try_restore_from_store` adapted for actix-web. The actix version uses the existing `on_request` hook to populate the synthesized `initialize`/`initialized` extensions, since actix middleware already populates the request's extensions (this is how our auth claim reaches the MCP service today). - `SessionRestoreMarker` is `#[non_exhaustive]` with no public constructor, so it can't be inserted from outside the rmcp crate. Our `UniversalInboxMcpServer` doesn't observe the marker, so omitting it is harmless. - Out of scope: cross-instance SSE event replay (upstream rust-sdk#330). Universal Inbox tools are pure RPC — request/response only — so we don't need an event store. <!-- devin-review-badge-begin --> --- <a href="https://app.devin.ai/review/universal-inbox/universal-inbox/pull/163" target="_blank"> <picture> <source media="(prefers-color-scheme: dark)" srcset="https://static.devin.ai/assets/gh-open-in-devin-review-dark.svg?v=1"> <img src="https://static.devin.ai/assets/gh-open-in-devin-review-light.svg?v=1" alt="Open in Devin Review"> </picture> </a> <!-- devin-review-badge-end --> <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit ## Release Notes * **New Features** * Added Redis-backed MCP session persistence with configurable TTL, enabling sessions to persist across restarts and server instances. * Implemented cross-instance session restoration—sessions can now be seamlessly recovered and restored on different pods with automatic handshake replay. * Updated session management infrastructure with improved HTTP routing for API and OAuth discovery endpoints. * **Dependencies** * Updated RMCP library to version 1.6 for enhanced session store integration. [](https://app.coderabbit.ai/change-stack/universal-inbox/universal-inbox/pull/163) <!-- end of auto-generated comment: release notes by coderabbit.ai -->
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Motivation and Context
In horizontally-scaled deployments or after restarting a server, MCP clients use
Mcp-Session-Idto resume sessions. When routed to a different server instance or after a restart, the session is unknown and clients get a 404, forcing re-initialization.This PR adds an optional SessionStore trait that persists InitializeRequestParams after a successful handshake. When a request arrives at an instance without a matching in-memory session, the store is consulted and the session is transparently restored by replaying initialize.
The
SessionManagertrait gains arestore_sessionmethod with a default no-op implementation. Custom session manager implementations can override it to integrate with their own logic if needed. The built-inLocalSessionManagerhas an implementation that re-creates the in-memory session worker.Additionally to provide indication to
ServerHandlerimplementation that a call to initialize and on_initialized is as a result of a restore, a markerSessionRestoreMarkeris added to the context.extensions so implementors can act appropriately when a session is restored.The feature is opt-in. Configurations without a session_store configured are unaffected.
Note: for full resumability and multi-server (behind a load balancer) support there is need to also implement an Event store so events aren't lost. This is being discussed at: #330
How Has This Been Tested?
Added a new integration test suite at:
test_streamable_http_session_store.rswith three test scenarios.Breaking Changes
None. Existing logic remains if session_store is not configured (defaults to None).
Types of changes
Checklist
Additional context
Related to #330