
chore(ci): shard release build into per-target matrix #5206

Closed
avallete wants to merge 14 commits into develop from chore/shard-ci-release-build-process


@avallete avallete commented May 7, 2026

Summary

Splits the single serial build job in release-shared.yml into an 8-way matrix on large-linux-x86 (one shard per published platform package), and broadens the path filter on smoke-test-pr.yml so PRs targeting develop run the full build + smoke validation whenever they touch anything that affects the build phase or runtime behaviour.

Why

  • The release build job runs all 8 bun --compile targets, the 6 Go cross-compiles (legacy), and the 6 nfpm packages serially on a single runner. Wall-clock time is bounded by the sum of every target.
  • The PR-time path filter on smoke-test-pr.yml was scoped to apps/cli/** + release-shared.yml. A regression in apps/cli-go/**, packages/cli-*/**, root workspace files, or the shared setup action would not trigger a PR build, so build breakages could land on develop undetected.

What changes

Sharded build matrix in release-shared.yml

  • 8 parallel shards on large-linux-x86, one per platform package: cli-darwin-arm64, cli-darwin-x64, cli-linux-{arm64,x64}, cli-linux-{arm64,x64}-musl, cli-windows-{arm64,x64}.
  • Each shard uploads its own slice as cli-build-shard-${shell}-${version}-${target}.
  • No merge job, no re-upload of bulk bytes. smoke-test / publish / publish-homebrew / publish-scoop switch from name: to pattern: cli-build-shard-${shell}-${version}-* + merge-multiple: true. Artifact bytes hit storage exactly once.
  • Setup Go and Pre-download Go modules are gated on inputs.shell == 'legacy'; Install nfpm is gated on startsWith(matrix.target, 'cli-linux-'), so darwin/windows shards skip irrelevant setup.
  • New build_only input (default false, so existing callers are unaffected). When true, smoke-test is skipped via if: and publish / publish-homebrew / publish-scoop auto-skip via their needs: chain. Useful for ad-hoc workflow_dispatch runs that just want to validate the build matrix.
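A minimal sketch of the naming and gating rules in the list above, written as plain TypeScript rather than workflow YAML. The function names (`shardArtifactName`, `downloadPattern`, `setupSteps`) are hypothetical and only model the description; the target list is the brace shorthand expanded.

```typescript
// Hypothetical model of the shard naming and setup gating described above.
type Shell = "legacy" | "next";

// The eight platform packages, with the brace shorthand expanded.
const TARGETS = [
  "cli-darwin-arm64",
  "cli-darwin-x64",
  "cli-linux-arm64",
  "cli-linux-x64",
  "cli-linux-arm64-musl",
  "cli-linux-x64-musl",
  "cli-windows-arm64",
  "cli-windows-x64",
] as const;
type Target = (typeof TARGETS)[number];

// Name under which a single shard uploads its slice.
function shardArtifactName(shell: Shell, version: string, target: Target): string {
  return `cli-build-shard-${shell}-${version}-${target}`;
}

// Pattern the consumer jobs download with (together with merge-multiple: true).
function downloadPattern(shell: Shell, version: string): string {
  return `cli-build-shard-${shell}-${version}-*`;
}

// Setup steps a given shard would run under the gating rules.
function setupSteps(shell: Shell, target: Target): string[] {
  const steps: string[] = [];
  if (shell === "legacy") steps.push("Setup Go", "Pre-download Go modules");
  if (target.startsWith("cli-linux-")) steps.push("Install nfpm");
  return steps;
}
```

Under these rules a `next`-shell darwin or windows shard runs none of the three gated steps, while a `legacy` Linux shard runs all of them.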

apps/cli/scripts/build.ts refactor

  • New --target <pkg-name> flag builds only one shard's outputs (bun binary + Go binary for legacy + archive for standard targets + deb/rpm for glibc Linux + apk for musl).
  • Without --target, the script's end-to-end behaviour is preserved for local dev.
  • Each musl shard now cross-compiles its own Go binary directly (CGO_ENABLED=0 produces output identical to the glibc build), so the cross-shard copyFile from glibc is gone.
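The per-target selection can be sketched roughly as follows. This is not the actual build.ts: `parseTarget` and `outputsFor` are hypothetical helpers, and the sketch assumes that "standard targets" means the non-musl ones, which the description does not spell out. `parseArgs` comes from `node:util`.

```typescript
import { parseArgs } from "node:util";

type Output = "bun-binary" | "go-binary" | "archive" | "deb" | "rpm" | "apk";

// Read the optional --target flag; undefined means "build every target".
function parseTarget(argv: string[]): string | undefined {
  const { values } = parseArgs({
    args: argv,
    options: {
      target: { type: "string" },
      version: { type: "string" },
    },
  });
  return values.target;
}

// Outputs one shard would produce, following the bullet above.
function outputsFor(target: string, shell: "legacy" | "next"): Output[] {
  const isLinux = target.startsWith("cli-linux-");
  const isMusl = target.endsWith("-musl");
  const outputs: Output[] = ["bun-binary"];
  if (shell === "legacy") outputs.push("go-binary");
  // Assumption: "standard targets" are the non-musl targets.
  if (!isMusl) outputs.push("archive");
  if (isLinux && !isMusl) outputs.push("deb", "rpm");
  if (isMusl) outputs.push("apk");
  return outputs;
}
```

For example, `outputsFor("cli-linux-x64", "legacy")` would list the bun binary, the Go binary, the archive, and the deb and rpm packages.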

apps/cli/scripts/checksums.ts

New self-contained script that walks dist/supabase_${version}_*.{tar.gz,zip,deb,rpm,apk} and writes dist/checksums.txt. Called by build.ts in all-targets mode and by the three consumer CI jobs (publish, publish-homebrew, publish-scoop) right after their pattern-download.
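A rough sketch of what such a script could look like, not the actual checksums.ts. The hash algorithm and line format are not stated in this description, so the sketch assumes SHA-256 and the common `<digest>  <filename>` layout; `writeChecksums` and the demo file names are hypothetical.

```typescript
import { createHash } from "node:crypto";
import { mkdtempSync, readdirSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

const EXTENSIONS = [".tar.gz", ".zip", ".deb", ".rpm", ".apk"];

// Hash every supabase_<version>_* artifact in distDir into checksums.txt.
// Returns the list of files that were hashed, in sorted order.
function writeChecksums(distDir: string, version: string): string[] {
  const prefix = `supabase_${version}_`;
  const files = readdirSync(distDir)
    .filter((f) => f.startsWith(prefix) && EXTENSIONS.some((ext) => f.endsWith(ext)))
    .sort();
  const lines = files.map((f) => {
    // Assumption: SHA-256, one "<digest>  <filename>" line per artifact.
    const digest = createHash("sha256").update(readFileSync(join(distDir, f))).digest("hex");
    return `${digest}  ${f}\n`;
  });
  writeFileSync(join(distDir, "checksums.txt"), lines.join(""));
  return files;
}

// Small demo against a temporary directory with made-up file names.
const demoDir = mkdtempSync(join(tmpdir(), "dist-"));
writeFileSync(join(demoDir, "supabase_1.0.0_linux_amd64.tar.gz"), "archive bytes");
writeFileSync(join(demoDir, "notes.txt"), "not an artifact, ignored");
const hashed = writeChecksums(demoDir, "1.0.0");
const checksumsText = readFileSync(join(demoDir, "checksums.txt"), "utf8");
console.log(hashed, checksumsText);
```

Because the function only depends on the contents of `dist/`, the same call works after a local all-targets build and after a pattern-download that merged every shard into one directory.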

Broader path filter on smoke-test-pr.yml

Replaces the old apps/cli/** + workflow paths with an explicit list of every path that can affect the build phase or how the built artifacts behave at runtime:

  • apps/cli/scripts/{build,checksums,sync-versions}.ts
  • apps/cli/src/**, apps/cli/package.json
  • apps/cli-go/**
  • packages/cli-*/**
  • package.json, pnpm-lock.yaml, pnpm-workspace.yaml
  • .github/actions/setup/**
  • .github/workflows/{release-shared,smoke-test-pr}.yml

Now any PR to develop that could break the release pipeline runs the full build + smoke matrix before merge.
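To illustrate the effect of the new filter, here is a simplified matcher over the list above with the brace shorthand expanded. It only approximates GitHub's filter semantics (`*` stays within one path segment, `**` crosses segments); `globToRegExp` and `triggersSmokeTest` are hypothetical helpers, not part of the workflow.

```typescript
// Path filter from the list above, with brace shorthand expanded.
const PATH_FILTER = [
  "apps/cli/scripts/build.ts",
  "apps/cli/scripts/checksums.ts",
  "apps/cli/scripts/sync-versions.ts",
  "apps/cli/src/**",
  "apps/cli/package.json",
  "apps/cli-go/**",
  "packages/cli-*/**",
  "package.json",
  "pnpm-lock.yaml",
  "pnpm-workspace.yaml",
  ".github/actions/setup/**",
  ".github/workflows/release-shared.yml",
  ".github/workflows/smoke-test-pr.yml",
];

// Simplified glob: "**" matches across "/", "*" matches within one segment.
function globToRegExp(glob: string): RegExp {
  let re = "";
  for (let i = 0; i < glob.length; i++) {
    const c = glob.charAt(i);
    if (c === "*") {
      if (glob.charAt(i + 1) === "*") {
        re += ".*";
        i++; // consume the second "*"
      } else {
        re += "[^/]*";
      }
    } else if (/[.+?^${}()|[\]\\]/.test(c)) {
      re += "\\" + c; // escape regex metacharacters
    } else {
      re += c;
    }
  }
  return new RegExp("^" + re + "$");
}

const FILTER_REGEXPS = PATH_FILTER.map(globToRegExp);

// True when at least one changed file matches the filter.
function triggersSmokeTest(changedFiles: string[]): boolean {
  return changedFiles.some((file) => FILTER_REGEXPS.some((re) => re.test(file)));
}
```

With this list, a change under `apps/cli-go/` triggers the workflow, while a file such as `apps/cli/README.md`, which the old `apps/cli/**` filter would have matched, no longer does.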

Compatibility

  • release-shared.yml's public input contract stays backward-compatible: the new build_only input has default: false.
  • Local pnpm exec bun apps/cli/scripts/build.ts --version <v> still produces every binary, archive, package, and checksums.txt in dist/.
  • The published GitHub Release contents are byte-identical (same file list, same checksum format).

Test plan

  • On this PR, confirm smoke-test-pr.yml runs against the new sharded build matrix and all 8 shards are green.
  • On the next release-from-develop, confirm the matrix-fanout build, the pattern-download in smoke-test / publish / brew / scoop, and the inline Generate checksums step all behave as expected.
  • Verify the published dist/ artifact set on GitHub Releases matches the current release's file list (12 archives + packages + checksums.txt).

Replace the serial build job in release-shared.yml with an 8-way matrix
on large-linux-x86, one shard per published platform package. Each shard
uploads only its own slice; smoke-test, publish, and brew/scoop consumers
pattern-download every shard via `merge-multiple: true` and regenerate
dist/checksums.txt locally. No merge job, no re-upload of bulk bytes.

Refactor apps/cli/scripts/build.ts to accept --target <pkg-name>: when
set, only that shard's bun binary, Go binary (legacy), archive, and
Linux package(s) are built (deb/rpm for glibc, apk for musl). Each musl
shard now cross-compiles its own Go binary directly because
CGO_ENABLED=0 makes the output identical to glibc, removing the
cross-shard copyFile. Without --target, end-to-end behaviour is preserved
for local dev.

Add apps/cli/scripts/checksums.ts that hashes every
supabase_${version}_*.{tar.gz,zip,deb,rpm,apk} found in dist/ into
dist/checksums.txt. Called by build.ts in all-targets mode and by the
three CI jobs that need the file (publish, publish-homebrew,
publish-scoop) after their pattern-download.

Add a `build_only` input on release-shared.yml that gates smoke-test
(publish/brew/scoop auto-skip via their needs: chain). Wire up a new
build-pr.yml that runs the build matrix on PRs touching build-relevant
files, across both legacy and next shells, so build breakages surface
before merge without waiting for the full smoke-test suite.
@avallete avallete requested a review from a team as a code owner May 7, 2026 12:46
avallete added 2 commits May 7, 2026 15:02
Drop .github/workflows/build-pr.yml and instead expand smoke-test-pr.yml's
path filter to cover everything that affects the build phase or runtime
behaviour: apps/cli/scripts/{build,checksums,sync-versions}.ts,
apps/cli/src/**, apps/cli/package.json, apps/cli-go/**, packages/cli-*/**,
root workspace files (package.json, pnpm-lock.yaml, pnpm-workspace.yaml),
and .github/actions/setup/**. Previously the filter only watched
apps/cli/** and release-shared.yml, so changes to those other paths could
land on develop without any PR-time build + smoke validation.

Keep the build_only input on release-shared.yml as a generic capability
for any future caller (or ad-hoc workflow_dispatch) that wants to skip
the smoke-test matrix and only exercise the build shards.
Introduce a new step in the release-shared.yml workflow to generate checksums for the built artifacts using a dedicated script. This ensures that the integrity of the published packages can be verified post-build.
avallete and others added 11 commits May 7, 2026 15:49
Add retry mechanism for transient Docker failures during smoke tests. Introduce a set of transient exit codes and refactor the Docker test execution to handle retries, improving reliability on shared CI runners. The changes ensure that flaky image pulls do not cause test failures, providing clearer output on success or failure.
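The retry idea from this commit can be sketched as below. The commit message does not list the exit codes treated as transient, so the set here is an assumption (125 is the code `docker run` returns when the Docker daemon itself fails); `runWithRetry` and `RunResult` are hypothetical names, and the runner is injected so the sketch does not need Docker.

```typescript
// Assumption: which exit codes count as transient is not stated in the PR.
const TRANSIENT_EXIT_CODES = new Set<number>([125]);

type RunResult = { exitCode: number };

// Re-run the command while it fails with a transient exit code,
// up to maxAttempts total attempts. Any other result is returned as-is.
function runWithRetry(run: () => RunResult, maxAttempts = 3): RunResult {
  let result = run();
  for (
    let attempt = 2;
    attempt <= maxAttempts && TRANSIENT_EXIT_CODES.has(result.exitCode);
    attempt++
  ) {
    console.log(
      `transient exit code ${result.exitCode}, retrying (attempt ${attempt}/${maxAttempts})`,
    );
    result = run();
  }
  return result;
}
```

A genuine test failure (for example exit code 1) is returned after the first attempt, so only failures classified as transient cost extra runs.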
Change the runner for the build job in release-shared.yml from large-linux-x86 to blacksmith-8vcpu-ubuntu-2404, optimizing the CI environment for better performance and resource allocation.
Update the runner for the build job in release-shared.yml from blacksmith-8vcpu-ubuntu-2404 to larger-runner-4cpu, aiming to optimize resource allocation and improve CI performance.
Change the runner for the build job in release-shared.yml from larger-runner-4cpu to ubuntu-latest, ensuring compatibility with the latest CI environment and potentially improving build performance.
Refactor the build job in release-shared.yml to utilize a matrix strategy for runners, allowing parallel execution across multiple target platforms. Each target is now assigned to a specific runner architecture, improving build efficiency and reducing queue times.
…ct handling

Update the release-shared.yml workflow to consolidate the build job into a single high-vCPU runner, improving build efficiency by reducing setup overhead. Modify artifact upload and download steps to handle all targets collectively, streamlining the process. Adjust descriptions for clarity and ensure consistency across the workflow.
Change the runner for the build job in release-shared.yml from blacksmith-32vcpu-ubuntu-2404 to blacksmith-32vcpu-ubuntu-2404-arm, ensuring compatibility with ARM architecture and potentially enhancing build performance for target platforms.
…ld jobs

Update the release-shared.yml workflow to introduce separate build jobs for x86 and ARM architectures, utilizing a matrix strategy for parallel execution. Modify artifact handling to ensure correct uploads and downloads per target, and enhance clarity in job descriptions. Add checksum generation steps to verify artifact integrity post-build.
…ase workflow

Update the release-shared.yml workflow to allow building for multiple targets in both x86 and ARM architectures. Modify the build script to accept multiple target arguments, enhancing flexibility and clarity. Adjust artifact handling to ensure correct uploads for each target, improving the overall build process.
… release workflow

Refactor the release-shared.yml workflow to merge x86 and ARM build jobs into a single job that builds for all targets. Update artifact upload and download steps to handle all targets collectively, improving clarity and efficiency in the build process. Adjust verification steps to include all target architectures.

avallete commented May 7, 2026

Dropping the PR, as none of it really improves the overall CI time.

@avallete avallete closed this May 7, 2026