Compose Workflow

Reusable GitHub Actions workflows for Docker Compose deployments across multiple repositories. Provides centralized lint and deploy automation with self-hosted-runner-based deployment to the host on which compose stacks live.

Architecture

As of 2026-05-02, deployment uses self-hosted GitHub Actions runners that live on the deployment hosts themselves; there is no SSH-from-CI path. The runner runs as a deploy user, which belongs to the docker group and the admin's group, and pulls jobs from GitHub. This eliminates the SSH-key/Tailscale/sudo-rule class of issues that plagued the prior design.

Three caller repos use this workflow:

  • docker-piwine: runner label piwine
  • docker-piwine-office: runner label piwine-office
  • docker-zendc: runner label zendc

Key Features

  • 🔒 Security first: input validation, GitGuardian secret scanning, 1Password integration
  • 🏠 Self-hosted deploy: no inbound network access required; runner pulls jobs from GitHub
  • 🔄 Automatic rollback: git reset --hard <previous_sha> + redeploy on deploy or health failure
  • 🔍 Failure diagnostics: on stack failure, dumps docker compose ps -a, healthcheck history (.State.Health.Log), and scoped service logs
  • 📊 Discord notifications: pipeline-status icon line with deploy/health/rollback states, commit link, user mention on failure
  • 🚦 Critical stack detection: auto-detects from com.compose.tier: infrastructure labels
  • 🔐 Multi-registry auth: single 1P round trip + docker/login-action per registry (ghcr.io, docker.io, registry.gitlab.com, custom GitLab)
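The critical stack detection listed above can be pictured with a small shell helper. This is a minimal sketch, not the workflow's actual implementation: the function name `is_critical_stack` is hypothetical, and matching the label as text is an assumption made for brevity (parsing the output of `docker compose config` would be more robust).

```shell
# Hypothetical helper: succeed when a stack's compose.yaml carries the
# com.compose.tier: infrastructure label.
# Accepts both the map form (key: value) and the list form (- key=value).
# Lines starting with "#" do not match, so commented-out labels are ignored.
is_critical_stack() {
  local compose_file="$1/compose.yaml"
  [ -f "$compose_file" ] || return 1
  grep -Eq "^[[:space:]]*(-[[:space:]]*)?[\"']?com\.compose\.tier[\"']?[:=][[:space:]]*[\"']?infrastructure" "$compose_file"
}
```

Called as `is_critical_stack stack1`, it returns success only for stacks that declare the label, which a deploy script could use to order or gate critical stacks.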

Available Workflows

compose-lint.yml: validation

Parallel GitGuardian + yamllint + docker compose config validation. Runs on GitHub-hosted runners (no host access needed).

jobs:
  lint:
    uses: owine/compose-workflow/.github/workflows/compose-lint.yml@main
    secrets: inherit
    with:
      stacks: '["stack1", "stack2", "stack3"]'
      webhook-url: "op://Docker/discord-github-notifications/<env>_webhook_url"
      repo-name: "my-docker-repo"
      target-repository: ${{ github.repository }}
      target-ref: ${{ github.sha }}
      discord-user-id: "op://Docker/discord-github-notifications/user_id"
      # plus event-context inputs (see compose-lint.yml for the full list)

deploy.yml: self-hosted deploy

5-job pipeline: prepare → deploy → health-check → rollback → notify. Runs on [self-hosted, <runner-label>].

on:
  workflow_run:
    workflows: ["Lint Docker Compose"]
    types: [completed]
    branches: [main]
  workflow_dispatch:
    inputs:
      force-deploy:
        type: boolean
        default: false

concurrency:
  group: deploy-<repo-label>
  cancel-in-progress: false

jobs:
  deploy:
    if: ${{ github.event.workflow_run.conclusion == 'success' || github.event_name == 'workflow_dispatch' }}
    uses: owine/compose-workflow/.github/workflows/deploy.yml@<sha>
    secrets: inherit
    with:
      runner-label: piwine                  # or piwine-office, zendc
      live-repo-path: /opt/compose
      repo-name: "docker-piwine"
      webhook-url: "op://Docker/discord-github-notifications/piwine_webhook_url"
      discord-user-id: "op://Docker/discord-github-notifications/user_id"
      target-ref: ${{ github.event.workflow_run.head_sha || github.sha }}
      has-dockge: true                      # false for zendc
      force-deploy: ${{ inputs.force-deploy || false }}

Optional inputs: live-dockge-path (when has-dockge: true), auto-detect-critical (default true), critical-services (manual override), image-pull-timeout, service-startup-timeout, failed-container-log-lines.
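The control flow between the deploy, health-check, rollback, and notify jobs can be sketched in shell. This is an illustration under stated assumptions: `run_pipeline` and the status strings are hypothetical, the prepare job is left out for brevity, and the real dependencies are expressed as job conditions in deploy.yml rather than as a script.

```shell
# Hypothetical stand-in for the job graph. Each argument is the name of a
# command or function supplied by the caller; none come from deploy.yml.
run_pipeline() {
  local deploy_cmd="$1" health_cmd="$2" rollback_cmd="$3" notify_cmd="$4"
  local status="success"

  # The health check only runs when the deploy itself succeeded.
  if ! "$deploy_cmd"; then
    status="deploy_failed"
  elif ! "$health_cmd"; then
    status="health_failed"
  fi

  # Rollback runs only after a deploy or health failure.
  if [ "$status" != "success" ]; then
    if "$rollback_cmd"; then
      status="$status,rolled_back"
    else
      status="$status,rollback_failed"
    fi
  fi

  # Notification always runs and receives the final status.
  "$notify_cmd" "$status"
  [ "$status" = "success" ]
}
```

The point of the shape is that notification is unconditional while rollback is conditional, so a failed deploy still produces a status message.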

Required Configuration

Repository structure (caller repo)

├── .yamllint                     # yamllint configuration
├── compose.env                   # env file with op:// references
├── .github/
│   ├── actionlint.yaml           # declares the runner-label
│   └── workflows/
│       ├── lint.yml              # calls compose-lint.yml
│       └── deploy.yml            # calls compose-workflow's deploy.yml
├── stack1/compose.yaml
├── stack2/compose.yaml
└── ...

actionlint.yaml must declare the runner label or PRs fail with "label X is unknown":

self-hosted-runner:
  labels:
    - piwine     # whichever label this repo's deploy.yml passes as runner-label

Required secrets

Calling repos need exactly one secret:

  • OP_SERVICE_ACCOUNT_TOKEN: 1Password service account token. Used by both lint (GitGuardian API key) and deploy (env-file resolution + multi-registry credentials + Discord webhook).

The previously-required SSH_USER / SSH_HOST secrets were deleted from all three caller repos on 2026-05-03; only OP_SERVICE_ACCOUNT_TOKEN remains.

1Password references

op://Docker/discord-github-notifications/<env>_webhook_url
op://Docker/discord-github-notifications/user_id
op://Docker/ghcr-pat/{username,pat}
op://Docker/docker-hub/{username,token}
op://Docker/gitlab-registry/{username,token}
op://Docker/gitlab-container-zenterprise/{username,token}
op://Docker/gitguardian/api_key

Self-hosted runner host requirements

The runner host needs docker, jq, timeout (coreutils), gh, and op (1Password CLI) on the deploy user's PATH, plus a registered runner systemd service running as deploy. The full host-prep playbook is in docs/superpowers/runbooks/self-hosted-runner-migration.md. It covers the path-ownership pattern (admin owns tree, deploy in admin's group via safe.directory), setgid + group-write, and the mandatory umask 002 for both admin and runner users.
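A quick preflight check for these requirements might look like the following. It is a sketch only: the function names are hypothetical, and the runbook remains the authoritative procedure. Run it as the deploy user so that PATH and umask reflect what the runner sees.

```shell
# Print each required tool that is not on PATH, one per line.
# Returns non-zero if anything is missing. With no arguments, the list
# defaults to the tools named in the host requirements.
missing_tools() {
  local tool missing=0
  [ "$#" -gt 0 ] || set -- docker jq timeout gh op
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
      missing=1
    fi
  done
  return "$missing"
}

# Succeed when the current umask is 002 (some shells print it as 0002).
umask_is_group_writable() {
  case "$(umask)" in
    002|0002) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `missing_tools || echo "install the tools listed above"` gives a one-line readiness check before registering the runner.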

Healthcheck requirements for --wait

deploy.yml invokes docker compose up --wait, which only verifies services that have healthchecks defined. Services without healthchecks start but don't gate the deploy. See CLAUDE.md for healthcheck patterns.

One-shot containers (e.g. migration sidecars gated via service_completed_successfully) end up exited with code 0; the health-check job recognizes this as success.
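The rule that a cleanly exited one-shot container counts as success can be written as a small decision function. This is a sketch of one plausible rule, not the health-check job's code: `container_verdict` is a hypothetical name, and the three arguments are assumed to have been read beforehand from `docker inspect` (`.State.Status`, `.State.Health.Status` or `none` when no healthcheck is defined, and `.State.ExitCode`).

```shell
# Decide whether one container counts as healthy for the deploy.
#   $1 = container status, $2 = health status or "none", $3 = exit code
# Prints ok, pending, or failed.
container_verdict() {
  local status="$1" health="$2" exit_code="$3"
  case "$status" in
    running)
      case "$health" in
        healthy|none) echo ok ;;      # no healthcheck: running is all we know
        starting)     echo pending ;; # healthcheck has not settled yet
        *)            echo failed ;;  # unhealthy or unexpected value
      esac
      ;;
    exited)
      # One-shot containers are fine if they finished with exit code 0.
      if [ "$exit_code" -eq 0 ]; then echo ok; else echo failed; fi
      ;;
    *)
      echo failed                     # restarting, dead, paused, ...
      ;;
  esac
}
```

Treating `exited` with code 0 as `ok` is what lets migration sidecars pass without a healthcheck of their own.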

Disabling stacks and services

Two mechanisms cover different scopes. Both rely on docker compose up --wait --remove-orphans (which deploy.yml already uses on every stack invocation) to tear down whatever is no longer present.

Disabling a whole stack: .disabled marker file

Touch <stack>/.disabled alongside the stack's compose.yaml to disable it:

touch silo/.disabled
git add silo/.disabled
git commit -m "disable silo"
git push

On the next deploy:

  • The Discover stacks step excludes the directory from active stacks and emits it on the disabled_stacks output.
  • The change-detection script reclassifies the stack as removed (effectively-present-in-CURRENT, not effectively-present-in-TARGET).
  • The Teardown removed stacks step runs docker compose down against the still-present compose.yaml in the live tree.
  • Discord embed and PR comment surface a 🛑 **Disabled stacks:** line alongside the existing 🗑️ **Removed stacks:** line.

The stack directory and its compose.yaml stay in the repo; only the running containers go away. Re-enable with git rm <stack>/.disabled; the next deploy classifies it as a new stack and runs up --wait.
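The split between active and disabled stacks described above can be sketched as a directory scan. The function name `discover_stacks` and its output format are assumptions for illustration; the real Discover stacks step emits workflow outputs instead of text lines.

```shell
# List stacks under a repo root, one per line, prefixed with their state.
# A stack is any direct child directory that contains compose.yaml;
# a .disabled marker next to it moves the stack to the disabled list.
discover_stacks() {
  local root="$1" dir name
  for dir in "$root"/*/; do
    [ -f "${dir}compose.yaml" ] || continue   # not a stack directory
    name="$(basename "$dir")"
    if [ -e "${dir}.disabled" ]; then
      printf 'disabled %s\n' "$name"
    else
      printf 'active %s\n' "$name"
    fi
  done
}
```

Running `discover_stacks /opt/compose` on a tree where only silo carries the marker would print silo as disabled and every other stack as active.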

Lint coverage is unaffected: caller-repo lint workflows do not filter .disabled, so YAML and compose config validation continue to run on disabled stacks. This keeps the file from rotting silently between disable and re-enable.

Detection rules in detail. A stack is "effectively present" at a given SHA iff compose.yaml exists and .disabled does not. The change-detection script applies an effectively-present-in-CURRENT guard on removed-stack detectors and an effectively-present-in-TARGET guard on new-stack detectors. This correctly handles edge cases:

| Transition | Action |
|---|---|
| Enabled → disabled | docker compose down (teardown path) |
| Disabled → enabled | docker compose up --wait (new-stack path) |
| Stay disabled | no-op |
| Born disabled (added with .disabled already present) | no-op |
| Delete while disabled | no-op (already torn down at disable time) |
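The transition rules can be checked with a short shell sketch. It makes one simplifying assumption: CURRENT and TARGET are passed as plain directories holding the stack at each state, whereas the change-detection script compares two SHAs. The function names are hypothetical.

```shell
# A stack is effectively present in a snapshot when compose.yaml exists
# and .disabled does not.
effectively_present() {
  [ -f "$1/compose.yaml" ] && [ ! -e "$1/.disabled" ]
}

# Print the action for one stack, given its CURRENT and TARGET directories.
classify_stack() {
  local current="$1" target="$2"
  if effectively_present "$current" && ! effectively_present "$target"; then
    echo teardown   # enabled -> disabled or deleted: docker compose down
  elif ! effectively_present "$current" && effectively_present "$target"; then
    echo new        # disabled or absent -> enabled: docker compose up --wait
  elif effectively_present "$target"; then
    echo existing   # present on both sides: regular update path
  else
    echo noop       # stay disabled, born disabled, delete while disabled
  fi
}
```

Because both guards use the same predicate, the three no-op rows fall out of the final branch without special cases.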

Design spec: docs/superpowers/specs/2026-05-26-disabled-stacks-design.md.

Disabling individual services: comment them out

Compose itself has no first-class "disable one service" mechanism, but up --wait --remove-orphans treats any service no longer present in the compose model as an orphan and removes it on the next deploy. The simplest way to remove a service: comment out its block in compose.yaml.

services:
  primary:
    image: ghcr.io/example/app:1.2.3
    # ...

  # postgres:
  #   image: postgres:16
  #   restart: unless-stopped
  #   # ... rest of service definition

On the next deploy:

  • up --wait --remove-orphans brings down any containers belonging to the commented-out service.
  • The stack itself stays "existing" (its directory and compose.yaml are still present), so it isn't reclassified.

Re-enable by uncommenting and pushing; up --wait recreates the service.

Why not Compose profiles? profiles: keeps the service definition active in YAML, so Renovate continues opening image-bump PRs for a service nobody is running. Commenting out freezes the service at its current image and stops the bump churn. Use profiles only when you expect a short-term disable and want the image to stay current.

Caveats of commenting:

  • Larger diffs (the entire service block goes from active to commented). This is the accepted tradeoff for simplicity.
  • The commented YAML is no longer parsed, so if the surrounding compose structure drifts (new networks, renamed volumes), re-enabling may require a touch-up.
  • Service-specific named volumes/networks declared at the top level are not removed by --remove-orphans (it only removes containers). Manual cleanup with docker volume rm / docker network rm if you want the storage gone.

Testing and Development

# Lint workflow files
actionlint .github/workflows/compose-lint.yml \
           .github/workflows/deploy.yml \
           .github/workflows/workflow-lint.yml
yamllint --strict .github/workflows/*.yml

# Lint deployment scripts
shellcheck scripts/deployment/*.sh scripts/linting/*.sh

# Local testing utilities
./scripts/testing/test-workflow.sh
./scripts/testing/validate-compose.sh

Security

Input validation

  • Stack names validated against ^[a-zA-Z0-9._-]+$ before any docker compose invocation
  • Target refs validated as 40-char hex SHAs
  • Webhook URLs validated as 1Password references
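The three checks above can be expressed as portable shell predicates. This is a sketch under assumptions: the function names are hypothetical, accepting uppercase hex in SHAs is a choice made here, and the `op://` prefix test is the simplest reading of "validated as 1Password references"; the workflow's own checks may be stricter.

```shell
# Stack names: non-empty, only letters, digits, dot, underscore, hyphen
# (the same character set as ^[a-zA-Z0-9._-]+$).
valid_stack_name() {
  case "$1" in
    ''|*[!a-zA-Z0-9._-]*) return 1 ;;
  esac
}

# Target refs: exactly 40 hexadecimal characters.
valid_target_ref() {
  [ "${#1}" -eq 40 ] || return 1
  case "$1" in
    *[!0-9a-fA-F]*) return 1 ;;
  esac
}

# Webhook URLs: must be a 1Password reference, never a literal URL.
valid_webhook_ref() {
  case "$1" in
    op://?*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Rejecting anything outside the allowed character set before it reaches a `docker compose` command line is what keeps a crafted stack name such as `a;rm -rf /` from being interpreted by the shell.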

Secret management

  • All secrets stored in 1Password (no plaintext in repos or workflow files)
  • op run --env-file=… resolves references at deploy time
  • 1password/load-secrets-action for individual values (registry creds, Discord webhook)
  • Multi-registry creds cached in the runner host's ~/.docker/config.json via docker/login-action with logout: false

Network model

  • No inbound network access to deployment hosts is required for CI; runners on the host pull jobs from GitHub
  • Outbound: runner → GitHub Actions API, image registries, 1Password, Discord webhook
  • No Tailscale dependency (the prior SSH-based design needed it; the self-hosted approach makes it unnecessary)

Troubleshooting

| Symptom | First thing to check |
|---|---|
| Runner shows offline in GitHub | sudo systemctl status actions.runner.<...>.service on the host |
| git reset --hard permission denied | umask 002 missing in admin user's .zshrc/.bashrc (see runbook) |
| Stack fails with no log lines | Check the failure diagnostic dump in the run: compose ps -a + inspect Health.Log + scoped compose logs should be there |
| Discord embed wrong color | Verify the case statement in notify job's status step still maps healthy → success and failed → failure |
| GitGuardian failure | Verify OP_SERVICE_ACCOUNT_TOKEN and that the 1P service account has access to the GitGuardian API key |

For self-hosted runner setup or migration of a new host, see the runbook at docs/superpowers/runbooks/self-hosted-runner-migration.md.

Version management

  • Latest: @main for newest features
  • Pinned: full 40-char SHA on the uses: line; Renovate auto-bumps this on the caller side

Contributing

  1. Run actionlint, yamllint --strict, and shellcheck on changed files
  2. Update CLAUDE.md and README.md when behavior changes
  3. For breaking changes (new required input on a reusable workflow), bump every caller's SHA pin in the same push session. Renovate eventually does this, but leaves a window of broken deploys in between.

License

Private repository, internal use only.
