Day-2 operations framework: TTL reaper, monitoring skill, and runbook generation #23

@arnaudlh

Description

The manifesto describes AI in operations: "incident detection, root cause analysis, and remediation suggestions." Git-ape currently stops at post-deployment health checks. This issue establishes Day-2 operational foundations.

Related: #18 (drift detection — another Day-2 capability)

Scope

  1. TTL Reaper workflow: git-ape-ttl-reaper.yml (or a gh-aw agentic workflow) that checks each deployment's TTL (set in metadata.json) and auto-destroys expired resources after sending a notification.
  2. Monitoring setup: during deployment, auto-configure Azure Monitor alerts for key metrics (availability, errors, latency).
  3. Post-deploy monitoring skill: /azure-monitor-checker, which queries Azure Monitor for resource health status and enables @git-ape status <deployment-id>.
  4. Runbook generation: auto-generate operational runbooks from the deployment architecture (what to check, how to restart, escalation paths).
  5. Azure SRE Agent compatibility: ensure deployment artifacts (architecture diagrams, runbooks) are consumable by Azure SRE Agent.
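
To make the TTL reaper in item 1 more concrete, here is a minimal sketch of the decision it could make for each deployment. The schema of metadata.json is not defined in this issue, so the field names `created_at`, `ttl_hours`, and `notified_at`, the 24-hour grace period, and the function name `reaper_action` are all assumptions for illustration only.

```python
from datetime import datetime, timedelta, timezone

# Assumed metadata.json fields (hypothetical; the issue does not define the schema):
#   created_at  - ISO 8601 timestamp with an explicit UTC offset
#   ttl_hours   - lifetime of the deployment in hours
#   notified_at - optional timestamp of the expiry notification
GRACE_PERIOD = timedelta(hours=24)  # assumed delay between notification and destroy


def reaper_action(metadata: dict, now: datetime) -> str:
    """Return 'keep', 'notify', 'wait', or 'destroy' for one deployment."""
    ttl_hours = metadata.get("ttl_hours")
    if ttl_hours is None:
        return "keep"  # no TTL set: never reap

    created = datetime.fromisoformat(metadata["created_at"])
    expires = created + timedelta(hours=ttl_hours)
    if now < expires:
        return "keep"  # TTL has not elapsed yet

    notified_at = metadata.get("notified_at")
    if notified_at is None:
        return "notify"  # expired, but nobody has been warned yet

    if now - datetime.fromisoformat(notified_at) < GRACE_PERIOD:
        return "wait"  # warned, still inside the grace period

    return "destroy"  # expired, warned, and the grace period is over


if __name__ == "__main__":
    example = {"created_at": "2025-01-09T00:00:00+00:00", "ttl_hours": 24}
    print(reaper_action(example, datetime.now(timezone.utc)))
```

Separating "notify" from "destroy" keeps the "after notification" requirement explicit: the workflow destroys resources only once a notification timestamp has been recorded and the grace period has passed.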

Acceptance Criteria

  • TTL reaper workflow auto-destroys expired deployments.
  • Deployments include Azure Monitor alert configurations.
  • @git-ape status <deployment-id> shows resource health.
  • Operational runbooks generated after deployment.
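
As a rough illustration of the `@git-ape status <deployment-id>` criterion, the sketch below parses the command from a comment and rolls per-resource health up into one overall state. The state names follow the availability states reported by Azure Resource Health. The deployment-id format, the function names, and the "worst state wins" rule are assumptions. Fetching the states from Azure is deliberately left out.

```python
import re
from typing import Dict, Optional

# Availability states reported by Azure Resource Health, ranked by severity.
SEVERITY = {"Available": 0, "Unknown": 1, "Degraded": 2, "Unavailable": 3}

# Assumed command shape: "@git-ape status <deployment-id>", id without spaces.
COMMAND = re.compile(r"@git-ape\s+status\s+(\S+)")


def parse_status_command(comment: str) -> Optional[str]:
    """Extract the deployment id from a comment, or None if it has no command."""
    match = COMMAND.search(comment)
    return match.group(1) if match else None


def overall_health(states: Dict[str, str]) -> str:
    """Roll per-resource states up to the worst one (assumed policy)."""
    if not states:
        return "Unknown"  # nothing reported: health cannot be asserted
    # Treat any unrecognized state as Unknown rather than guessing.
    normalized = [s if s in SEVERITY else "Unknown" for s in states.values()]
    return max(normalized, key=SEVERITY.__getitem__)


if __name__ == "__main__":
    deployment_id = parse_status_command("@git-ape status deploy-123")
    print(deployment_id, overall_health({"app": "Available", "db": "Degraded"}))
```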

Metadata

Labels: enhancement (New feature or request)