chore(ci): unify install + GitHub counts into one refresh-counts workflow #604

Merged
amavashev merged 2 commits into main from chore/unify-counts-refresh-workflow
May 9, 2026

Conversation

@amavashev
Contributor

Summary

Combines the two scheduled count-refresh workflows into one. The two were racing on installs-cache.json and producing duplicate or mutually-conflicting PRs almost daily.

The conflict

update-installs.yml and update-github-counts.yml ran 30 min apart on overlapping fields:

| Field | update-installs.yml | update-github-counts.yml |
| --- | --- | --- |
| npm, pypi, crates | ✓ writes | |
| clones*, releases*, ghPackages | | ✓ writes |
| total | ✓ writes | ✓ writes |
| fetchedAt | ✓ writes | ✓ writes |

Both computed total from different inputs (each saw its own fresh data + the cached version of the other half's fields), so the second-of-the-day PR always conflicted with the first on total whenever the first hadn't merged yet — which was almost always, since PR merge cycles take 1–2 hours.
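
To make the conflict concrete, here is a minimal JavaScript sketch of two writers branching from the same cached file. The numbers are invented, the field names `clones` and `releases` stand in for the `clones*`/`releases*` families, and the plain-sum `total` is an assumption; the real formula lives in installs.data.ts and is not shown in this PR.

```javascript
// Hypothetical cache contents on main; all values are made up for illustration.
const base = { npm: 100, pypi: 50, crates: 10, clones: 30, releases: 5, ghPackages: 2, total: 197 };

// Assumed formula: a plain sum of the per-source fields.
const sum = (c) => c.npm + c.pypi + c.crates + c.clones + c.releases + c.ghPackages;

// Each old workflow overlaid its own fresh fields on the cached file and
// then recomputed total from that mixed view.
function run(cache, fresh) {
  const next = { ...cache, ...fresh };
  return { ...next, total: sum(next) };
}

const registryPr = run(base, { npm: 120 });  // fresh registry, cached GH side
const githubPr = run(base, { clones: 40 });  // fresh GH side, cached registry

// Both PRs rewrite the same `total` line starting from 197, with different values.
console.log(registryPr.total, githubPr.total);

// The value neither PR computes: both fresh halves combined.
const combined = sum({ ...base, npm: 120, clones: 40 });
console.log(combined);
```

Under these assumptions the two PRs disagree on `total` with each other and with the combined value, which is why the second PR conflicts whenever the first is still open.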

Evidence in PR history

2026-05-08:
  PR 582 (registry)               merged 10:49
  PR 588 (GH-side, MANUAL backfill) merged 12:49  ← maintainer fixed by hand
  PR 590 (GH-side, automated)       CLOSED 12:57  ← superseded by PR 588

2026-05-09:
  PR 601 (registry)         opened 07:56, merged 10:09
  PR 602 (GH-side)          opened 08:14, merged 10:14  ← took 2h
  PR 603 (GH-side, again)   opened 10:23, merged 10:24  ← same-day duplicate

Two days running, the second PR ended either closed-and-replaced or as a duplicate re-run.

What changed

New: .github/workflows/refresh-counts.yml (single workflow, single cron at 06:00 UTC, single PR per day).

Removed: update-installs.yml and update-github-counts.yml.

Beyond merging the YAML, this also fixes the dual-writer problem at the data-design level:

  • Step 1 (bash, registry counters) writes ONLY the per-source HWMs (npm, pypi, crates). Intentionally does NOT touch total or fetchedAt. The schema-regression guard from update-installs.yml is preserved.
  • Step 2 (Node, GH-side counters) reads the cache that step 1 just wrote — sees fresh registry HWMs alongside its own fresh clones/releases/ghPackages — and writes the canonical total + fetchedAt using the same formula as installs.data.ts. Single canonical writer for the displayed total.
  • Step 3 opens one PR with both halves of the update.
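
The split of responsibilities between steps 1 and 2 can be sketched roughly as below. This is an illustration, not the workflow's code: step 1 is actually bash, the function names are hypothetical, the GH-side fields are shown as a plain overwrite, and the sum formula is assumed rather than copied from installs.data.ts.

```javascript
// Step 1 (registry side): raise the per-source high-water marks only.
// Deliberately leaves total and fetchedAt untouched.
function applyRegistryCounts(cache, fetched) {
  const next = { ...cache };
  for (const key of ["npm", "pypi", "crates"]) {
    next[key] = Math.max(cache[key] ?? 0, fetched[key] ?? 0);
  }
  return next;
}

// Step 2 (GH side): reads what step 1 wrote, overlays fresh GH-side counters,
// and is the only place that writes total and fetchedAt.
function applyGithubCounts(cache, fetched, now) {
  const next = { ...cache, ...fetched };
  // Assumed formula: a plain sum of the per-source fields.
  next.total = next.npm + next.pypi + next.crates + next.clones + next.releases + next.ghPackages;
  next.fetchedAt = now.toISOString();
  return next;
}

// Hypothetical cached state and fetched values.
const cached = {
  npm: 100, pypi: 50, crates: 10, clones: 30, releases: 5, ghPackages: 2,
  total: 197, fetchedAt: "2026-05-08T06:00:00.000Z",
};
const afterStep1 = applyRegistryCounts(cached, { npm: 120, pypi: 50, crates: 9 });
const afterStep2 = applyGithubCounts(
  afterStep1,
  { clones: 40, releases: 5, ghPackages: 2 },
  new Date("2026-05-09T06:00:00Z"),
);

console.log(afterStep1.total, afterStep2.total, afterStep2.fetchedAt);
```

After step 1 the stale `total` is still in the file; only step 2 replaces it, from a view that already contains the fresh registry values.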

The Node script (scripts/update-github-counts.mjs) is unchanged — it already reads the cache before computing total, so it picks up step 1's writes naturally.

Failure modes

| Scenario | Old (split) | New (unified) |
| --- | --- | --- |
| GH API down, npm/PyPI/crates up | Registry PR merges; GH-side PR errors. Stale clones, fresh registry. | Whole refresh aborts at step 2. Stale everything for the day. |
| Conflicting day-2 PRs | Daily, requires manual intervention | Cannot happen: single PR |
| One source rate-limited | Two PRs, inconsistent state | Whole refresh fails atomically |

The new design trades partial-update resilience for atomic-update consistency. Given that the GH API failure mode is rare and recoverable on the next day, while the conflict mode was happening daily and required manual intervention, this is the right trade.

Test plan

  • YAML parsed cleanly via yaml package — 5 steps, single cron 0 6 * * *, job name refresh
  • Grep confirmed no other repo references to the two removed workflow filenames
  • Bash logic in step 1 is the existing update-installs.yml script with only the total/fetchedAt writes removed (verified line-by-line)
  • Step 2 is unchanged — same node scripts/update-github-counts.mjs invocation as the old workflow
  • First scheduled run after merge produces a single PR with both halves of the update (will verify on next 06:00 UTC tick)
  • Manual workflow_dispatch smoke test post-merge (recommended before relying on the cron)
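
The structural check in the first bullet could look something like the sketch below. It assumes the workflow file has already been parsed into a plain object (for example by the `yaml` package named above); `checkWorkflow` and the placeholder document are hypothetical, and the step contents are stubs.

```javascript
// Checks a parsed workflow document against the shape stated in the test plan.
// `doc` is assumed to be the plain object produced by a YAML parser.
function checkWorkflow(doc) {
  const problems = [];

  const schedule = doc?.on?.schedule ?? [];
  if (schedule.length !== 1 || schedule[0].cron !== "0 6 * * *") {
    problems.push("expected a single cron of 0 6 * * *");
  }

  if (!doc?.jobs?.refresh) {
    problems.push("expected a job named refresh");
  }

  const steps = doc?.jobs?.refresh?.steps ?? [];
  if (steps.length !== 5) {
    problems.push(`expected 5 steps, found ${steps.length}`);
  }

  return problems;
}

// Hypothetical stand-in for the parsed refresh-counts.yml; steps are empty stubs.
const parsed = {
  on: { schedule: [{ cron: "0 6 * * *" }] },
  jobs: { refresh: { steps: [{}, {}, {}, {}, {}] } },
};

// An empty array means the document matches the expected shape.
console.log(checkWorkflow(parsed));
```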

Tokens

Same as before:

  • ORG_TRAFFIC_TOKEN (PAT) for gh pr create and /traffic/clones access; falls back to GITHUB_TOKEN for the GH API if absent.
  • git push falls back to the persisted GITHUB_TOKEN (contents: write from the job's permissions block) — same pattern as both removed workflows.
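
For the API side, the fallback in the first bullet amounts to something like this sketch. `pickApiToken` is a hypothetical helper, not a function from the repo; the only thing taken from the PR is the order of preference.

```javascript
// Picks the token for GitHub API calls: the PAT when present, otherwise GITHUB_TOKEN.
// `||` rather than `??` so that an empty string also falls back, since an unset
// secret reaches a GitHub Actions step as an empty value.
function pickApiToken(env) {
  const token = env.ORG_TRAFFIC_TOKEN || env.GITHUB_TOKEN;
  if (!token) {
    throw new Error("no GitHub token available");
  }
  return token;
}

// Placeholder values, not real tokens.
console.log(pickApiToken({ ORG_TRAFFIC_TOKEN: "", GITHUB_TOKEN: "default-token" }));
```

In a Node script this would typically be called as `pickApiToken(process.env)`.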

Out of scope

The adoption-snapshot.yml and installs-cache-schema.yml workflows touch related concerns but are not in scope here. They run on different cadences and don't conflict with the two consolidated workflows.

amavashev added 2 commits May 9, 2026 06:34
…counts.yml

The two scheduled workflows that refresh installs-cache.json — registry
counts (npm/PyPI/crates) and GitHub-side counts (clones/releases/ghPackages)
— were running on staggered crons (06:00 and 06:30 UTC) but BOTH wrote
the same file and BOTH wrote the same two fields (`total` and
`fetchedAt`) from different inputs. Whenever the first workflow's PR
hadn't merged before the second ran (typical: PR merge cycle is 1-2h),
the second workflow opened a PR that conflicted on those shared fields.

Evidence in PR history:
  2026-05-08:
    PR 582 (registry)         merged 10:49
    PR 588 (GH-side, MANUAL backfill) merged 12:49
    PR 590 (GH-side, automated)       CLOSED 12:57 — superseded by PR 588
  2026-05-09:
    PR 601 (registry)         opened 07:56, merged 10:09
    PR 602 (GH-side)          opened 08:14, merged 10:14 (took 2h)
    PR 603 (GH-side, again)   opened 10:23, merged 10:24 — same-day duplicate

Two days running, the second-of-two PR ended either closed-and-replaced
or as a duplicate re-run.

Fix: combine both into a single workflow (.github/workflows/refresh-counts.yml)
that runs once daily, shares one checkout, and produces one PR.

Beyond merging the YAML, this also fixes the dual-writer problem at
the data-design level. Step 1 (bash, registry counters) writes ONLY
the per-source HWMs (npm/pypi/crates) and intentionally does NOT touch
`total` or `fetchedAt`. Step 2 (Node, GH-side counters) reads the cache
that step 1 just wrote — seeing fresh registry HWMs alongside its own
fresh GH-side data — and writes the canonical `total` + `fetchedAt`
based on the installs.data.ts formula. Single canonical writer for the
displayed total.

Removed:
  .github/workflows/update-installs.yml
  .github/workflows/update-github-counts.yml

Added:
  .github/workflows/refresh-counts.yml

The schema-regression guard from update-installs.yml is preserved in
step 1. The Node script (scripts/update-github-counts.mjs) is unchanged
— it already reads the cache before computing total, so it picks up
step 1's writes naturally.

YAML parsed cleanly via `yaml` package; no other repo references to
the two removed workflow filenames.
The original update-installs.yml ran without `set -e` on purpose. Single-
package curl failures fall through (curl exits non-zero, jq emits
nothing, the `$(())` capture treats empty as 0, NPM_TOTAL just doesn't
grow that iteration) and the per-source HWM logic ensures we never
regress the cached value. With `pipefail` enabled, one transient
registry blip would abort the whole daily refresh.

Step 3 (PR-open) keeps `set -euo pipefail` — that's a sequence of
discrete commands where any failure should bubble up.
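
The fall-through behaviour described for step 1 can be restated as a small JavaScript sketch. The step itself is bash; this only models the logic. A failed per-package fetch is represented as an empty string, and the package counts and cached value are invented.

```javascript
// Sums per-package download counts. A failed fetch is modelled as an empty
// string, mirroring jq emitting nothing, and contributes 0 to the sum.
function sumDownloads(results) {
  let total = 0;
  for (const raw of results) {
    total += Number(raw) || 0; // empty or non-numeric input counts as 0
  }
  return total;
}

// High-water mark: never write a value lower than the cached one.
const highWaterMark = (cached, fetched) => Math.max(cached, fetched);

const cachedNpm = 1000; // hypothetical cached value
const goodDay = sumDownloads(["700", "400"]); // both packages answered
const blipDay = sumDownloads(["700", ""]);    // second package failed

console.log(highWaterMark(cachedNpm, goodDay));
console.log(highWaterMark(cachedNpm, blipDay));
```

On the day with a failed fetch the partial sum falls below the cached value, so the high-water mark keeps the cached number instead of regressing or aborting.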
@amavashev amavashev enabled auto-merge May 9, 2026 10:42
@amavashev amavashev merged commit c7b1dc5 into main May 9, 2026
5 checks passed
@amavashev amavashev deleted the chore/unify-counts-refresh-workflow branch May 9, 2026 10:42