capture: add multi-display observability and adaptive OCR budgets #34

@haasonsaas

Summary

Improve capture quality and cost control with stable display identifiers, per-display telemetry, and adaptive OCR budgets driven by backlog and duplicate/drop behavior.

Why

agentd currently captures only the first display and uses a single global capture cadence. That is fine for v0, but dogfooding will quickly run into multi-monitor setups, retina/non-retina differences, display changes, high OCR cost, and noisy windows. We need observability before blindly increasing capture volume.

SOTA notes

  • ScreenCaptureKit samples emphasize explicit content filters, configurable stream output, and bounded queue depth.
  • Apple's sample notes that queue depth affects memory use and should stay bounded; agentd already keeps queueDepth = 5, which is good.
  • Screenpipe's active backlog includes per-monitor observability and stable display IDs, which maps directly to this product surface.
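
To make "bounded queue depth" and per-display drop counts concrete, here is a minimal, language-neutral sketch in Python. It is an app-side analogue only: it does not show how ScreenCaptureKit implements queueDepth, and it is not agentd's code. `BoundedFrameQueue`, its fields, and the drop-oldest policy are all assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class BoundedFrameQueue:
    """Hypothetical per-display queue that drops the oldest frame when full."""

    display_id: str
    depth: int = 5  # mirrors the queueDepth = 5 mentioned above
    captured: int = 0
    dropped_backpressure: int = 0
    _frames: deque = field(default_factory=deque)

    def push(self, frame: bytes) -> None:
        """Accept a frame; evict the oldest one if the queue is already full."""
        self.captured += 1
        if len(self._frames) >= self.depth:
            self._frames.popleft()
            self.dropped_backpressure += 1
        self._frames.append(frame)

    def pop(self):
        """Return the oldest queued frame, or None if the queue is empty."""
        return self._frames.popleft() if self._frames else None

    def snapshot(self) -> dict:
        """Counters a heartbeat or diagnostics report could include."""
        return {
            "display_id": self.display_id,
            "captured": self.captured,
            "dropped_backpressure": self.dropped_backpressure,
            "queued": len(self._frames),
        }
```

Memory stays bounded by `depth` regardless of how slowly OCR drains the queue, and every eviction is counted so the cost of backpressure is visible per display instead of silent.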

Acceptance

  • Record stable display metadata in frame/batch metadata where possible: display id, size, scale, active/main display state.
  • Heartbeat or diagnostics report includes per-display capture counts, drops, OCR latency estimate, and backpressure drops.
  • Handle display attach/detach without requiring app restart.
  • Add config/policy for selected displays or capture-all-displays, with conservative defaults.
  • Add adaptive OCR/capture budget behavior when backlog, OCR latency, or droppedBackpressure exceed thresholds.
  • Tests cover adaptive budget state transitions and metadata encoding; manual smoke covers multi-display behavior.
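
One possible shape for the adaptive budget state transitions named above, written as a small Python sketch so that the transitions are easy to unit test. Everything here is an assumption for illustration: the three levels, the threshold values, the `recover_after_healthy_windows` hysteresis, and the names `AdaptiveBudget`, `Thresholds`, and `DisplaySample` do not come from agentd.

```python
from dataclasses import dataclass
from enum import Enum


class BudgetLevel(Enum):
    NORMAL = "normal"    # full capture/OCR cadence
    REDUCED = "reduced"  # e.g. lower cadence or skip near-duplicates
    MINIMAL = "minimal"  # e.g. main display only, sparse OCR


# Degradation order, from least to most restrictive.
_ORDER = [BudgetLevel.NORMAL, BudgetLevel.REDUCED, BudgetLevel.MINIMAL]


@dataclass(frozen=True)
class Thresholds:
    # Placeholder values; real limits would come from config/policy.
    max_backlog: int = 20
    max_ocr_latency_ms: float = 800.0
    max_backpressure_drops: int = 3
    recover_after_healthy_windows: int = 3


@dataclass
class DisplaySample:
    """Measurements for one reporting window on one display."""

    backlog: int
    ocr_latency_ms: float
    dropped_backpressure: int


class AdaptiveBudget:
    """Steps down one level per unhealthy window, up after N healthy ones."""

    def __init__(self, thresholds: Thresholds = Thresholds()) -> None:
        self.thresholds = thresholds
        self.level = BudgetLevel.NORMAL
        self._healthy_windows = 0

    def observe(self, sample: DisplaySample) -> BudgetLevel:
        t = self.thresholds
        over = (
            sample.backlog > t.max_backlog
            or sample.ocr_latency_ms > t.max_ocr_latency_ms
            or sample.dropped_backpressure > t.max_backpressure_drops
        )
        idx = _ORDER.index(self.level)
        if over:
            # Any exceeded threshold degrades by one step and resets recovery.
            self._healthy_windows = 0
            self.level = _ORDER[min(idx + 1, len(_ORDER) - 1)]
        else:
            self._healthy_windows += 1
            if idx > 0 and self._healthy_windows >= t.recover_after_healthy_windows:
                # Recover one step at a time to avoid oscillating.
                self.level = _ORDER[idx - 1]
                self._healthy_windows = 0
        return self.level
```

Degrading immediately but recovering only after several healthy windows is a deliberate asymmetry in this sketch: it favors cost control over capture volume, in line with the conservative defaults requested above. Keeping one `AdaptiveBudget` instance per display would let a noisy monitor be throttled without affecting the others.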
