
fix: chunk large Docker container lists to prevent WebSocket message loss#304

Open
jaydeep-pipaliya wants to merge 2 commits into fosrl:main from jaydeep-pipaliya:fix/chunk-docker-container-messages

Conversation

jaydeep-pipaliya commented Apr 9, 2026

What does this PR do?

Fixes fosrl/pangolin#2117 — Docker Container View not displaying when >20 containers are running. Companion server-side PR: fosrl/pangolin#2817.

Problem

gorilla/websocket's WriteJSON serializes the full container list into a single WebSocket text frame. With 55+ containers (each carrying 1-5KB of JSON metadata), the resulting frame can reach 55-275KB. Intermediary proxies (Traefik, nginx, Cloudflare tunnels) can silently drop or truncate frames that large, so the Pangolin server never receives the container data.

Solution

Added sendContainerList() that automatically chunks large container lists:

// ≤15 containers: single message (backward compatible, zero behavior change)
{"containers": [...]}

// >15 containers: chunked with batch metadata
{"containers": [...], "chunkIndex": 0, "totalChunks": 4, "batchId": "a1b2c3d4"}
{"containers": [...], "chunkIndex": 1, "totalChunks": 4, "batchId": "a1b2c3d4"}
{"containers": [...], "chunkIndex": 2, "totalChunks": 4, "batchId": "a1b2c3d4"}
{"containers": [...], "chunkIndex": 3, "totalChunks": 4, "batchId": "a1b2c3d4"}
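The chunking above can be sketched as follows. This is a minimal, self-contained Go sketch, not the PR's actual code: `ContainerInfo`, `buildContainerMessages`, and `newBatchID` are hypothetical stand-ins for the agent's real types and for the existing `generateChainId` helper.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

const chunkSize = 15 // containers per WebSocket message

// ContainerInfo is a stand-in for the agent's real container struct.
type ContainerInfo struct {
	ID   string `json:"id"`
	Name string `json:"name"`
}

// containerMessage mirrors the JSON shapes shown above: the chunk
// fields are omitted entirely for small, single-message lists.
type containerMessage struct {
	Containers  []ContainerInfo `json:"containers"`
	ChunkIndex  *int            `json:"chunkIndex,omitempty"`
	TotalChunks *int            `json:"totalChunks,omitempty"`
	BatchID     string          `json:"batchId,omitempty"`
}

// newBatchID stands in for the existing generateChainId helper.
func newBatchID() string {
	b := make([]byte, 4)
	rand.Read(b)
	return hex.EncodeToString(b)
}

// buildContainerMessages returns one plain message for small lists
// (zero behavior change), or a chunked batch sharing one batchId.
func buildContainerMessages(containers []ContainerInfo) []containerMessage {
	if len(containers) <= chunkSize {
		return []containerMessage{{Containers: containers}}
	}
	total := (len(containers) + chunkSize - 1) / chunkSize
	batch := newBatchID()
	msgs := make([]containerMessage, 0, total)
	for i := 0; i < total; i++ {
		start := i * chunkSize
		end := start + chunkSize
		if end > len(containers) {
			end = len(containers)
		}
		idx, tot := i, total // fresh variables so the pointers are per-chunk
		msgs = append(msgs, containerMessage{
			Containers:  containers[start:end],
			ChunkIndex:  &idx,
			TotalChunks: &tot,
			BatchID:     batch,
		})
	}
	return msgs
}

func main() {
	msgs := buildContainerMessages(make([]ContainerInfo, 40))
	fmt.Println(len(msgs)) // 40 containers -> chunks of 15/15/10, prints 3
}
```

Each message in a batch would then go out through the existing WebSocket send path; because every batch carries its own ID, a concurrent second send cannot corrupt the first.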

Why batchId?

Two concurrent container sends can happen when a manual fetch request and a Docker event fire at the same time. Without batchId, their chunks would interleave and corrupt the accumulated data on the server. Each batch gets a unique ID (reusing the existing generateChainId helper) so the server can track and supersede batches correctly.

Why chunk size of 15?

Each container with full metadata (labels, ports, networks) serializes to roughly 1-5KB, so 15 containers ≈ 15-75KB per WebSocket frame — typically under common proxy buffer defaults (Traefik and nginx both default to 64KB). The threshold was chosen to balance message count against frame size.

Changes

main.go:

  • Added sendContainerList() — chunks large lists, passes small lists through unchanged
  • Updated both call sites: initial fetch handler + Docker event monitor callback
  • Reuses existing generateChainId() for batch IDs — no new dependencies

Companion PR

Pangolin server: fosrl/pangolin#2817

  • Chunk reassembly with batchId tracking, input validation, typed accumulator, 120s TTL on partial chunks

Testing

  • go build passes
  • ≤15 containers: identical to current behavior (single message, no metadata)
  • >15 containers: splits into chunks of 15 with chunkIndex/totalChunks/batchId
  • Concurrent sends: each gets unique batchId, server handles superseding

fix: chunk large Docker container lists to prevent WebSocket message loss

When more than ~20 Docker containers are running, the WebSocket message
containing the full container list can be too large and get dropped by
intermediary proxies (e.g., Traefik), causing the Docker Container View
in Pangolin to show nothing.

Split container lists larger than 15 items into multiple chunked
messages with chunkIndex/totalChunks metadata. The Pangolin server
reassembles chunks before processing.

Small lists (<=15) are sent in a single message for backward
compatibility with older Pangolin versions.

Fixes fosrl/pangolin#2117

Each chunked batch now includes a unique batchId (reusing generateChainId) so the server can distinguish interleaved sends — e.g., a manual fetch and a Docker event firing at the same time.
