feat(otel): advanced observability phase 2 - data layer and privacy hardening #3797

Open
jaegeral wants to merge 16 commits into google:master from jaegeral:otel-advanced-observability

jaegeral (Collaborator) commented May 5, 2026

This Pull Request implements Phase 2 of the OpenTelemetry instrumentation for Timesketch. This phase focuses on extending distributed tracing into the data layer (OpenSearch and PostgreSQL) and on privacy hardening of the emitted telemetry.

Data Layer Visibility

  • OpenSearch: Implemented manual instrumentation for search() and count() methods. This captures targeted indices and internal processing latency (took_ms) without exposing the query content.

  • SQLAlchemy (PostgreSQL): Integrated automatic tracing for all database operations, providing visibility into connection health and statement execution time.

  • ACL Layer: Wrapped has_permission() checks in logical spans. This groups the many individual database metadata lookups (often 40+ per request) into a single, understandable context in the trace waterfall.

  • Content Exclusion: Raw search query strings and DSL structures are now completely excluded from span attributes. This ensures that no sensitive case data (PII, hostnames, identifiers) ever leaves the application via telemetry.

  • Sensitive Data Scrubber: A global SpanProcessor automatically redacts credentials (passwords, tokens, sessions) from any span attribute across the entire system.

  • Analyst Attribution: Authenticated user.id and user.name are captured on all API spans. This allows correlating performance bottlenecks or errors back to a specific investigator without seeing their search terms.
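
The privacy rules above can be sketched in plain Python. This is only an illustration under stated assumptions, not the code in this PR: the helper names, the attribute keys `opensearch.indices` and `opensearch.took_ms`, and the key pattern are all hypothetical. It shows span attributes built from the targeted indices, the `took` value of an OpenSearch response, and the user identity, with the query body never passed in, followed by a key-based redaction pass of the kind a scrubbing SpanProcessor might apply.

```python
import re
from typing import Any, Dict, List, Mapping

# Hypothetical pattern: the PR only states that passwords, tokens and
# sessions are redacted; the exact key list here is an assumption.
SENSITIVE_KEY = re.compile(
    r"(password|passwd|secret|token|session|authorization|cookie)",
    re.IGNORECASE,
)
REDACTED = "[REDACTED]"


def build_search_attributes(
    indices: List[str],
    response: Mapping[str, Any],
    user_id: int,
    user_name: str,
) -> Dict[str, Any]:
    """Build span attributes for a search call.

    The query string / DSL is deliberately not a parameter, so it cannot
    end up in telemetry by accident.
    """
    return {
        "opensearch.indices": ",".join(indices),     # hypothetical key name
        "opensearch.took_ms": response.get("took"),  # OpenSearch reports "took" in ms
        "user.id": user_id,
        "user.name": user_name,
    }


def scrub_attributes(attributes: Mapping[str, Any]) -> Dict[str, Any]:
    """Return a copy with the value of every sensitive-looking key redacted."""
    return {
        key: (REDACTED if SENSITIVE_KEY.search(key) else value)
        for key, value in attributes.items()
    }
```

Matching on attribute keys rather than values keeps the pass cheap, but it will miss a credential stored under an innocuous key, so it complements content exclusion rather than replacing it.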

Developer Experience & Resilience

  • Safe Telemetry Calls: Implemented a @safe_telemetry_call decorator that ensures telemetry logic (like JSON serialization) is "best-effort" and never crashes the primary business logic.
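
A minimal sketch of what a best-effort decorator like this could look like, assuming it swallows any exception, logs it, and returns None. The body below is an illustration, not the implementation in this PR, and `serialize_for_span` is a hypothetical helper.

```python
import functools
import json
import logging
from typing import Any, Callable, Optional, TypeVar

logger = logging.getLogger(__name__)
T = TypeVar("T")


def safe_telemetry_call(func: Callable[..., T]) -> Callable[..., Optional[T]]:
    """Run a telemetry helper as best-effort: log failures, never raise."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Optional[T]:
        try:
            return func(*args, **kwargs)
        except Exception:  # telemetry must not break business logic
            logger.debug(
                "Telemetry call %s failed; continuing", func.__name__, exc_info=True
            )
            return None

    return wrapper


@safe_telemetry_call
def serialize_for_span(payload: Any) -> str:
    """Hypothetical helper: serialize a payload for use as a span attribute."""
    return json.dumps(payload)
```

With this sketch, `serialize_for_span({"bad": {1, 2}})` returns None instead of raising the TypeError that `json.dumps` throws for a set, so the calling request handler continues unaffected.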

Infrastructure & Documentation

  • Configuration: Relocated otel-collector-config.yaml to the central data/ directory for better deployment readiness.
  • Guides: Updated docs/OpenTelemetry.md with current architecture and configuration details.

jaegeral (Collaborator, Author) commented May 5, 2026

/gemini review

jaegeral requested a review from rgayon May 5, 2026 14:07
jaegeral self-assigned this May 5, 2026
jaegeral marked this pull request as ready for review May 5, 2026 14:07

- CHOKIDAR_USEPOLLING=true
- PROMETHEUS_MULTIPROC_DIR=/tmp/
- ENABLE_STRUCTURED_LOGGING=true
# - ENABLE_STRUCTURED_LOGGING=true

from typing import Generator, List, Dict, Optional, Any, Union

from dateutil import parser, relativedelta
from flask import abort

Just changing the order of imports to make lint happy.

jaegeral changed the title from "feat: Telemetry Privacy & Redaction and add a span for acl checks" to "feat(otel): advanced observability phase 2 - data layer and privacy hardening" May 6, 2026
jaegeral (Collaborator, Author) commented May 6, 2026

Pylint flagged timesketch/models/acl.py:324:4: W9016: "permission, user" missing in parameter type documentation (missing-type-doc)
