Reusable GitHub Actions workflows for Docker Compose deployments across multiple repositories. Provides centralized lint and deploy automation with self-hosted-runner-based deployment to the host on which compose stacks live.
As of 2026-05-02, deployment uses self-hosted GitHub Actions runners that live on the deployment hosts themselves; there is no SSH-from-CI path. The runner runs as a deploy user, in the docker group and the admin's group, and pulls jobs from GitHub. This eliminates the SSH-key/Tailscale/sudo-rule class of issues that plagued the prior design.
Three caller repos use this workflow:
- `docker-piwine`: runner label `piwine`
- `docker-piwine-office`: runner label `piwine-office`
- `docker-zendc`: runner label `zendc`
- Security first: input validation, GitGuardian secret scanning, 1Password integration
- Self-hosted deploy: no inbound network access required; runner pulls jobs from GitHub
- Automatic rollback: `git reset --hard <previous_sha>` + redeploy on deploy or health failure
- Failure diagnostics: on stack failure, dumps `docker compose ps -a`, healthcheck history (`.State.Health.Log`), and scoped service logs
- Discord notifications: pipeline-status icon line with deploy/health/rollback states, commit link, user mention on failure
- Critical stack detection: auto-detects from `com.compose.tier: infrastructure` labels
- Multi-registry auth: single 1P round trip + `docker/login-action` per registry (ghcr.io, docker.io, registry.gitlab.com, custom GitLab)
Parallel GitGuardian + yamllint + docker compose config validation. Runs on GitHub-hosted runners (no host access needed).
jobs:
  lint:
    uses: owine/compose-workflow/.github/workflows/compose-lint.yml@main
    secrets: inherit
    with:
      stacks: '["stack1", "stack2", "stack3"]'
      webhook-url: "op://Docker/discord-github-notifications/<env>_webhook_url"
      repo-name: "my-docker-repo"
      target-repository: ${{ github.repository }}
      target-ref: ${{ github.sha }}
      discord-user-id: "op://Docker/discord-github-notifications/user_id"
      # plus event-context inputs (see compose-lint.yml for the full list)

5-job pipeline: prepare → deploy → health-check → rollback → notify. Runs on [self-hosted, <runner-label>].
on:
  workflow_run:
    workflows: ["Lint Docker Compose"]
    types: [completed]
    branches: [main]
  workflow_dispatch:
    inputs:
      force-deploy:
        type: boolean
        default: false

concurrency:
  group: deploy-<repo-label>
  cancel-in-progress: false

jobs:
  deploy:
    if: ${{ github.event.workflow_run.conclusion == 'success' || github.event_name == 'workflow_dispatch' }}
    uses: owine/compose-workflow/.github/workflows/deploy.yml@<sha>
    secrets: inherit
    with:
      runner-label: piwine # or piwine-office, zendc
      live-repo-path: /opt/compose
      repo-name: "docker-piwine"
      webhook-url: "op://Docker/discord-github-notifications/piwine_webhook_url"
      discord-user-id: "op://Docker/discord-github-notifications/user_id"
      target-ref: ${{ github.event.workflow_run.head_sha || github.sha }}
      has-dockge: true # false for zendc
      force-deploy: ${{ inputs.force-deploy || false }}

Optional inputs: live-dockge-path (when has-dockge: true), auto-detect-critical (default true), critical-services (manual override), image-pull-timeout, service-startup-timeout, failed-container-log-lines.
├── .yamllint                  # yamllint configuration
├── compose.env                # env file with op:// references
├── .github/
│   ├── actionlint.yaml        # declares the runner-label
│   └── workflows/
│       ├── lint.yml           # calls compose-lint.yml
│       └── deploy.yml         # calls compose-workflow's deploy.yml
├── stack1/compose.yaml
├── stack2/compose.yaml
└── ...
actionlint.yaml must declare the runner label or PRs fail with "label X is unknown":
self-hosted-runner:
  labels:
    - piwine # whichever label this repo's deploy.yml passes as runner-label

Calling repos need exactly one secret:
- `OP_SERVICE_ACCOUNT_TOKEN`: 1Password service account token. Used by both lint (GitGuardian API key) and deploy (env-file resolution + multi-registry credentials + Discord webhook).
The previously-required SSH_USER / SSH_HOST secrets were deleted from all three caller repos on 2026-05-03; only OP_SERVICE_ACCOUNT_TOKEN remains.
op://Docker/discord-github-notifications/<env>_webhook_url
op://Docker/discord-github-notifications/user_id
op://Docker/ghcr-pat/{username,pat}
op://Docker/docker-hub/{username,token}
op://Docker/gitlab-registry/{username,token}
op://Docker/gitlab-container-zenterprise/{username,token}
op://Docker/gitguardian/api_key
The runner host needs docker, jq, timeout (coreutils), gh, and op (1Password CLI) on the deploy user's PATH, plus a registered runner systemd service running as deploy. The full host-prep playbook is in docs/superpowers/runbooks/self-hosted-runner-migration.md, which covers the path-ownership pattern (admin owns tree, deploy in admin's group via safe.directory), setgid + group-write, and the mandatory umask 002 for both admin and runner users.
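A quick preflight for the tool requirement above might look like the following. This is a minimal sketch, not part of the runbook: the function name `missing_tools` is hypothetical, and it only checks that each binary resolves on PATH, not versions, group membership, or the umask.

```sh
#!/bin/sh
# Print each required tool that is NOT on PATH, one per line.
# With no arguments, checks the tools this README lists for the runner host.
# (missing_tools is a hypothetical helper name, not part of compose-workflow.)
missing_tools() {
  if [ "$#" -eq 0 ]; then
    set -- docker jq timeout gh op
  fi
  for tool in "$@"; do
    # command -v is the POSIX way to test whether a command resolves.
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}
```

Run as the deploy user, an empty result from `missing_tools` means every listed tool resolved; any printed name still needs to be installed or added to that user's PATH.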
deploy.yml invokes docker compose up --wait, which only verifies services that have healthchecks defined. Services without healthchecks start but don't gate the deploy. See CLAUDE.md for healthcheck patterns.
One-shot containers (e.g. migration sidecars gated via service_completed_successfully) end up exited with code 0; the health-check job recognizes this as success.
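The acceptance rule described above can be sketched as a small predicate. This is an illustration under assumptions, not the health-check job's actual code: `container_ok` is a hypothetical name, and the three arguments are assumed to come from something like `docker inspect` (`.State.Status`, `.State.ExitCode`, and `.State.Health.Status`, which is empty when no healthcheck is defined). The sketch does not distinguish one-shot containers from long-running ones that happened to exit cleanly.

```sh
#!/bin/sh
# container_ok STATE EXIT_CODE HEALTH
#   STATE      container status, e.g. running or exited
#   EXIT_CODE  numeric exit code
#   HEALTH     healthy | unhealthy | starting | "" (no healthcheck defined)
# Returns 0 when the container counts as successful, 1 otherwise.
# (Hypothetical helper; argument sources are an assumption.)
container_ok() {
  state=$1
  exit_code=$2
  health=$3
  case $state in
    running)
      # Without a healthcheck, "running" is all that can be verified.
      [ -z "$health" ] || [ "$health" = "healthy" ]
      ;;
    exited)
      # One-shot containers are expected to finish with exit code 0.
      [ "$exit_code" -eq 0 ]
      ;;
    *)
      return 1
      ;;
  esac
}
```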
Two mechanisms cover different scopes. Both rely on docker compose up --wait --remove-orphans (which deploy.yml already uses on every stack invocation) to tear down whatever is no longer present.
Touch <stack>/.disabled alongside the stack's compose.yaml to disable it:
touch silo/.disabled
git add silo/.disabled
git commit -m "disable silo"
git push

On the next deploy:
- The `Discover stacks` step excludes the directory from active stacks and emits it on the `disabled_stacks` output.
- The change-detection script reclassifies the stack as removed (effectively-present-in-CURRENT, not effectively-present-in-TARGET).
- The `Teardown removed stacks` step runs `docker compose down` against the still-present `compose.yaml` in the live tree.
- Discord embed and PR comment surface a `**Disabled stacks:**` line alongside the existing `**Removed stacks:**` line.
The stack directory and its compose.yaml stay in the repo; only the running containers go away. Re-enable with git rm <stack>/.disabled; the next deploy classifies it as a new stack and runs up --wait.
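The discovery behaviour described above can be sketched against a checked-out tree. This is a hedged illustration: `discover_stacks` is a hypothetical function, the plain-text output format is invented for the example, and the real `Discover stacks` step may emit its outputs differently.

```sh
#!/bin/sh
# discover_stacks [ROOT]
# Treats every direct subdirectory of ROOT that contains compose.yaml as a
# stack; a .disabled marker beside it moves the stack to the disabled list.
# (Hypothetical sketch; output format is illustrative only.)
discover_stacks() {
  root=${1:-.}
  active=""
  disabled=""
  for file in "$root"/*/compose.yaml; do
    [ -f "$file" ] || continue        # glob matched nothing
    dir=${file%/compose.yaml}
    name=${dir##*/}
    if [ -e "$dir/.disabled" ]; then
      disabled="$disabled $name"
    else
      active="$active $name"
    fi
  done
  printf 'active:%s\n' "$active"
  printf 'disabled:%s\n' "$disabled"
}
```

Directories without a compose.yaml are ignored entirely, so a stray `.disabled` file in a non-stack directory has no effect in this sketch.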
Lint coverage is unaffected: caller-repo lint workflows do not filter .disabled, so YAML and compose config validation continue to run on disabled stacks. This keeps the file from rotting silently between disable and re-enable.
Detection rules in detail. A stack is "effectively present" at a given SHA iff compose.yaml exists and .disabled does not. The change-detection script applies an effectively-present-in-CURRENT guard on removed-stack detectors and an effectively-present-in-TARGET guard on new-stack detectors. This correctly handles edge cases:
| Transition | Action |
|---|---|
| Enabled → disabled | docker compose down (teardown path) |
| Disabled → enabled | docker compose up --wait (new-stack path) |
| Stay disabled | no-op |
| Born disabled (added with .disabled already present) | no-op |
| Delete while disabled | no-op (already torn down at disable time) |
Design spec: docs/superpowers/specs/2026-05-26-disabled-stacks-design.md.
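The detection rules above reduce to a two-by-two decision on "effectively present" at the CURRENT and TARGET SHAs. The sketch below shows only that decision; the function names are hypothetical, and the yes/no flags are assumed to be probed elsewhere (for example with `git cat-file -e "<sha>:<stack>/compose.yaml"`, which exits zero when the path exists at that commit).

```sh
#!/bin/sh
# effectively_present HAS_COMPOSE HAS_DISABLED   (each "yes" or "no")
# True iff compose.yaml exists and .disabled does not.
effectively_present() {
  [ "$1" = "yes" ] && [ "$2" = "no" ]
}

# classify_transition CUR_COMPOSE CUR_DISABLED TGT_COMPOSE TGT_DISABLED
# Prints one of: teardown, new-stack, existing, no-op.
# (Hypothetical helper; labels are illustrative, not the script's own.)
classify_transition() {
  if effectively_present "$1" "$2"; then cur=yes; else cur=no; fi
  if effectively_present "$3" "$4"; then tgt=yes; else tgt=no; fi
  case "$cur,$tgt" in
    yes,no)  echo "teardown" ;;   # docker compose down
    no,yes)  echo "new-stack" ;;  # docker compose up --wait
    yes,yes) echo "existing" ;;   # normal change detection applies
    no,no)   echo "no-op" ;;      # stay disabled, born disabled, deleted while disabled
  esac
}
```

For example, `classify_transition yes no yes yes` (enabled, then `.disabled` added) prints `teardown`, while `classify_transition no no yes yes` (born disabled) prints `no-op`.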
Compose itself has no first-class "disable one service" mechanism, but up --wait --remove-orphans treats any service no longer present in the compose model as an orphan and removes it on the next deploy. The simplest way to remove a service: comment out its block in compose.yaml.
services:
  primary:
    image: ghcr.io/example/app:1.2.3
    # ...
  # postgres:
  #   image: postgres:16
  #   restart: unless-stopped
  #   # ... rest of service definition

On the next deploy:
- `up --wait --remove-orphans` brings down any containers belonging to the commented-out service.
- The stack itself stays "existing" (its directory and `compose.yaml` are still present), so it isn't reclassified.
Re-enable by uncommenting and pushing; up --wait recreates the service.
Why not Compose profiles? profiles: keeps the service definition active in YAML, so Renovate continues opening image-bump PRs for a service nobody is running. Commenting out freezes the service at its current image and stops the bump churn. Use profiles only when you expect a short-term disable and want the image to stay current.
Caveats of commenting:
- Larger diffs (the entire service block goes from active to commented). You've already accepted this as the tradeoff for simplicity.
- The commented YAML is no longer parsed, so if the surrounding compose structure drifts (new networks, renamed volumes), re-enabling may require a touch-up.
- Service-specific named volumes/networks declared at the top level are not removed by `--remove-orphans` (it only removes containers). Clean up manually with `docker volume rm` / `docker network rm` if you want the storage gone.
# Lint workflow files
actionlint .github/workflows/compose-lint.yml \
.github/workflows/deploy.yml \
.github/workflows/workflow-lint.yml
yamllint --strict .github/workflows/*.yml
# Lint deployment scripts
shellcheck scripts/deployment/*.sh scripts/linting/*.sh
# Local testing utilities
./scripts/testing/test-workflow.sh
./scripts/testing/validate-compose.sh

- Stack names validated against `^[a-zA-Z0-9._-]+$` before any `docker compose` invocation
- Target refs validated as 40-char hex SHAs
- Webhook URLs validated as 1Password references
- All secrets stored in 1Password (no plaintext in repos or workflow files)
- `op run --env-file=…` resolves references at deploy time
- `1password/load-secrets-action` for individual values (registry creds, Discord webhook)
- Multi-registry creds cached in the runner host's `~/.docker/config.json` via `docker/login-action` with `logout: false`
- No inbound network access to deployment hosts is required for CI; runners on the host pull jobs from GitHub
- Outbound: runner → GitHub Actions API, image registries, 1Password, Discord webhook
- No Tailscale dependency (the prior SSH-based design needed it; the self-hosted approach makes it unnecessary)
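The input validation rules listed earlier (stack-name pattern, 40-char SHA, 1Password reference) can be sketched in POSIX shell as follows. The function names are hypothetical, and two details are assumptions: the SHA check accepts lowercase hex only, and the reference check only requires an `op://` prefix, whereas the real workflow may be stricter.

```sh
#!/bin/sh
# Stack name: non-empty and made only of [a-zA-Z0-9._-], which is what
# the regex ^[a-zA-Z0-9._-]+$ accepts. A case pattern avoids per-line
# matching surprises with embedded newlines.
is_valid_stack_name() {
  case $1 in
    '' | *[!a-zA-Z0-9._-]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Target ref: exactly 40 hex characters (lowercase assumed here).
is_full_sha() {
  [ "${#1}" -eq 40 ] || return 1
  case $1 in
    *[!0-9a-f]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Webhook URL input: must be a 1Password reference, not a literal URL.
# (Prefix check only; an assumption about what "validated" means.)
is_op_reference() {
  case $1 in
    op://?*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Rejecting anything outside the allowed character set before it reaches a `docker compose` command line is what keeps inputs such as `../etc` or names containing spaces from being interpreted as paths or extra arguments.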
| Symptom | First thing to check |
|---|---|
| Runner shows offline in GitHub | `sudo systemctl status actions.runner.<...>.service` on the host |
| `git reset --hard` permission denied | `umask 002` missing in admin user's `.zshrc`/`.bashrc`; see runbook |
| Stack fails with no log lines | Check the failure diagnostic dump in the run: `compose ps -a` + inspect `Health.Log` + scoped `compose logs` should be there |
| Discord embed wrong color | Verify the case statement in notify job's status step still maps healthy → success and failed → failure |
| GitGuardian failure | Verify OP_SERVICE_ACCOUNT_TOKEN and that the 1P service account has access to the GitGuardian API key |
For self-hosted runner setup or migration of a new host, see the runbook.
- Latest: `@main` for newest features
- Pinned: full 40-char SHA on the `uses:` line; Renovate auto-bumps this on the caller side
- Run `actionlint`, `yamllint --strict`, and `shellcheck` on changed files
- Update CLAUDE.md and README.md when behavior changes
- For breaking changes (new required input on a reusable workflow), bump every caller's SHA pin in the same push session; Renovate eventually does this, but with a window of broken deploys in between
Private repository, internal use only.