Skip to content

Latest commit

 

History

History
165 lines (120 loc) · 6.25 KB

File metadata and controls

165 lines (120 loc) · 6.25 KB

CI/CD Pipeline Guide

Overview

GSD 2 uses a three-stage promotion pipeline that automatically moves merged PRs through Dev → Test → Prod environments using npm dist-tags.

PR merged to main
        │
        ▼
   ┌─────────┐    ci.yml passes (build, test, typecheck)
   │   DEV   │    → publishes gsd-pi@<version>-dev.<sha> with @dev tag
   └────┬────┘
        ▼ (automatic if green)
   ┌─────────┐    CLI smoke tests + LLM fixture replay
   │  TEST   │    → promotes to @next tag
   └────┬────┘    → pushes Docker image as :next
        ▼ (manual approval required)
   ┌─────────┐    optional real-LLM integration tests
   │  PROD   │    → promotes to @latest tag
   └─────────┘    → creates GitHub Release

For Contributors: Testing Your PR Before It Ships

Install the Dev Build

Every merged PR is immediately installable:

# Latest dev build (bleeding edge, every merged PR)
npx gsd-pi@dev

# Test candidate (passed smoke + fixture tests)
npx gsd-pi@next

# Stable production release
npx gsd-pi@latest    # or just: npx gsd-pi

Using Docker

# Test candidate
docker run --rm -v $(pwd):/workspace ghcr.io/gsd-build/gsd-pi:next --version

# Stable
docker run --rm -v $(pwd):/workspace ghcr.io/gsd-build/gsd-pi:latest --version

Checking if a Fix Landed

  1. Find the PR's merge commit SHA (first 7 chars)
  2. Check if it's in @dev: npm view gsd-pi@dev version
    • If the version ends in -dev.<your-sha>, your PR is in dev
  3. Check if it promoted to @next: npm view gsd-pi@next version
  4. Check if it's in production: npm view gsd-pi@latest version

For Maintainers

Pipeline Workflows

Workflow File Trigger Purpose
CI ci.yml PR + push to main Build, test, typecheck — gate for all promotions
Release Pipeline pipeline.yml After CI succeeds on main Three-stage promotion
Native Binaries build-native.yml v* tags Cross-compile platform binaries
Dev Cleanup cleanup-dev-versions.yml Weekly (Monday 06:00 UTC) Unpublish -dev. versions older than 30 days

Gating Tests

The pipeline only triggers after ci.yml passes. Key gating tests include:

  • Unit tests (npm run test:unit) — includes auto-session-encapsulation.test.ts which enforces that all auto-mode state is encapsulated in AutoSession, plus dispatch loop regression tests that exercise the full deriveState → resolveDispatch → idempotency chain without an LLM. Any PR adding module-level mutable state to auto.ts will fail CI and block the pipeline.
  • Integration tests (npm run test:integration)
  • Extension typecheck (npm run typecheck:extensions)
  • Package validation (npm run validate-pack)
  • Smoke tests (npm run test:smoke) — run post-build in the pipeline against the local binary and again against the globally-installed @dev package
  • Fixture tests (npm run test:fixtures) — replay recorded LLM conversations without hitting real APIs
  • Live regression tests (npm run test:live-regression) — run against the installed binary in the Test stage to catch runtime regressions before promotion to @next

Approving a Prod Release

  1. A version reaches the Test stage automatically
  2. In GitHub Actions, the prod-release job will show "Waiting for review"
  3. Click Review deployments → select prodApprove
  4. The version is promoted to @latest and a GitHub Release is created

To enable live LLM tests during Prod promotion:

  • Set the RUN_LIVE_TESTS environment variable to true on the prod environment

Rolling Back a Release

If a broken version reaches production:

# Roll back npm
npm dist-tag add gsd-pi@<previous-good-version> latest

# Roll back Docker
docker pull ghcr.io/gsd-build/gsd-pi:<previous-good-version>
docker tag ghcr.io/gsd-build/gsd-pi:<previous-good-version> ghcr.io/gsd-build/gsd-pi:latest
docker push ghcr.io/gsd-build/gsd-pi:latest

For @dev or @next rollbacks, the next successful merge will overwrite the tag automatically.

GitHub Configuration Required

Setting Value
Environment: dev No protection rules
Environment: test No protection rules
Environment: prod Required reviewers: maintainers
Secret: NPM_TOKEN All environments
Secret: ANTHROPIC_API_KEY Prod environment only
Secret: OPENAI_API_KEY Prod environment only
Variable: RUN_LIVE_TESTS false (set to true to enable live LLM tests)
GHCR Enabled for the gsd-build org

Docker Images

Image Base Purpose Tags
ghcr.io/gsd-build/gsd-ci-builder node:24-bookworm CI build environment with Rust toolchain :latest, :<date>
ghcr.io/gsd-build/gsd-pi node:24-slim User-facing runtime :latest, :next, :v<version>

The CI builder image is rebuilt automatically when the Dockerfile changes. It eliminates ~3-5 min of toolchain setup per CI run.

LLM Fixture Tests

The fixture system records and replays LLM conversations without hitting real APIs (zero cost).

Running Fixture Tests

npm run test:fixtures

Recording New Fixtures

# Set your API key, then record
GSD_FIXTURE_MODE=record GSD_FIXTURE_DIR=./tests/fixtures/recordings \
  node --experimental-strip-types tests/fixtures/record.ts

Fixtures are JSON files in tests/fixtures/recordings/. Each one captures a conversation's request/response pairs and replays them by turn index.

When to Re-Record

Re-record fixtures when:

  • Provider wire format changes (e.g., new field in Anthropic response)
  • Tool definitions change (affects request shape)
  • System prompt changes (may cause turn count mismatch)

Version Strategy

Tag Published Format Who uses it
@dev Every merged PR 2.27.0-dev.a3f2c1b Developers verifying fixes
@next Auto-promoted from dev Same version Early adopters, beta testers
@latest Manually approved Same version Production users

Old -dev. versions are cleaned up weekly (30-day retention).