chore(installs): per-package HWMs for npm / PyPI / crates counters #607
Merged
Aggregate-only HWMs masked legitimate growth in one package whenever
another package's API call failed on the same run. Verified live on
2026-05-09 — two consecutive runs of the unified refresh-counts.yml
both saw pypistats.org return data for one package but error on the
other (1620 then 447 instead of the expected 2067 = 1620 + 447). The
HWM logic correctly held the cached aggregate at 2067 in both runs,
but had `runcycles` legitimately grown to 1700 alongside the
`runcycles-openai-agents` failure, the +80 growth would have been
silently masked.
Per-package HWMs preserve each package's high-water mark independently.
A transient failure on one package no longer affects another.
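The arithmetic of the hypothetical masking case can be sketched as follows. This is an illustration only: the `hwm` helper and variable names are made up for this example and are not the repository's code; the numbers are the ones quoted above.

```typescript
// Hypothetical helper: a classic high-water mark never lets a value regress.
const hwm = (cached: number, fresh: number): number => Math.max(cached, fresh);

type Counts = Record<string, number>;
type Fetched = Record<string, number | null>; // null = API call failed

// Cached state before the run (numbers from the 2026-05-09 description).
const cachedByPackage: Counts = { "runcycles": 1620, "runcycles-openai-agents": 447 };
const cachedAggregate = 2067;

// Hypothetical run: runcycles grew to 1700, the other package's call failed.
const fetched: Fetched = { "runcycles": 1700, "runcycles-openai-agents": null };

// Aggregate-only: a failed call contributes nothing to the fresh sum,
// so the growth hides behind the cached aggregate.
const freshSum = Object.values(fetched).reduce<number>((s, v) => s + (v ?? 0), 0);
const aggregateOnly = hwm(cachedAggregate, freshSum);

// Per-package: each package keeps its own mark; a failure falls back to
// that package's cached value only.
const perPackage: Counts = {};
for (const [pkg, value] of Object.entries(fetched)) {
  const prev = cachedByPackage[pkg] ?? 0;
  perPackage[pkg] = value === null ? prev : hwm(prev, value);
}
const perPackageTotal = Object.values(perPackage).reduce((s, v) => s + v, 0);

console.log({ freshSum, aggregateOnly, perPackageTotal });
```

With aggregate-only logic the run reports 2067; with per-package marks it reports 2147, so the +80 is visible immediately.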
Cache schema additions (additive, no migration):
npmByPackage: Record<string, number>
pypiByPackage: Record<string, number>
cratesByPackage: Record<string, number>
Aggregate fields (`npm`, `pypi`, `crates`) are now derived from the
per-package map: `max(sum-of-per-package, cached-aggregate)`. The
`max(..., cached-aggregate)` is a defensive backstop covering the
cold-start migration window — until the per-package maps are fully
populated, it preserves the legacy aggregate value.
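A minimal sketch of the derivation, assuming a helper named `deriveAggregate` (a name invented here; the actual code may structure this differently):

```typescript
type Counts = Record<string, number>;

// Hypothetical helper: derive an aggregate from a per-package map, using the
// cached aggregate as a floor (the backstop described above).
function deriveAggregate(byPackage: Counts, cachedAggregate: number): number {
  const sum = Object.values(byPackage).reduce((s, v) => s + v, 0);
  return Math.max(sum, cachedAggregate);
}

// Cold start: the legacy cache has an aggregate but no per-package data yet.
const coldStart = deriveAggregate({}, 2067);

// Partially populated: only one package has been seen so far.
const partial = deriveAggregate({ runcycles: 1620 }, 2067);

// Fully populated and grown: the per-package sum takes over.
const populated = deriveAggregate(
  { runcycles: 1700, "runcycles-openai-agents": 447 },
  2067,
);

console.log(coldStart, partial, populated);
```

The first two calls return the cached 2067; the third returns 2147 because the per-package sum now exceeds the floor.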
Two writers updated to mirror each other:
- scripts/update-registry-counts.mjs (new): replaces the inline
  bash + curl + jq in refresh-counts.yml step 1. Daily workflow
  refresh; idempotent; null-vs-0 distinction so a successful
  `0 downloads` (brand-new package) is not conflated with an API
  failure.
- .vitepress/theme/installs.data.ts: build-time loader. Fetch
  functions return Record<pkg, number | null> instead of an
  aggregate; load() applies hwmPerPackage; per-package maps are
  written into the cache.
- .vitepress/theme/__tests__/installs.test.ts: 7 new tests:
  cold start, all-succeed, one-fails (the exact 2026-05-09
  scenario), all-fail, lower-than-cached, null-vs-0,
  package-removed-from-list.
- .github/workflows/refresh-counts.yml: step 1 collapses from
  ~80 lines of inline bash to one line: `node scripts/...`.
  Schema-regression guard moves into the script.
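The null-vs-0 distinction can be illustrated with a small sketch. Everything here is an assumption made for the example: `collectCounts`, `mergeWithCache`, `fakeApi`, and the package names `brand-new-pkg` and `flaky-pkg` are hypothetical, and the real fetchers talk to registry APIs rather than a synchronous stub. Keeping cached entries for packages that were not fetched is also just one possible choice.

```typescript
type Fetched = Record<string, number | null>;

// Hypothetical: `lookup` stands in for a registry API call and throws on failure.
function collectCounts(packages: string[], lookup: (pkg: string) => number): Fetched {
  const out: Fetched = {};
  for (const pkg of packages) {
    try {
      out[pkg] = lookup(pkg); // a successful 0 stays 0
    } catch {
      out[pkg] = null; // a failure is recorded as null, never as 0
    }
  }
  return out;
}

// null keeps whatever is cached; a number (including 0) goes through the HWM.
function mergeWithCache(cached: Record<string, number>, fetched: Fetched): Record<string, number> {
  const merged: Record<string, number> = { ...cached };
  for (const [pkg, value] of Object.entries(fetched)) {
    if (value === null) continue;
    merged[pkg] = Math.max(merged[pkg] ?? 0, value);
  }
  return merged;
}

// Stub API: one package fails, one is brand new with zero downloads.
const fakeApi = (pkg: string): number => {
  if (pkg === "flaky-pkg") throw new Error("HTTP 503");
  return pkg === "brand-new-pkg" ? 0 : 1620;
};

const fetched = collectCounts(["runcycles", "brand-new-pkg", "flaky-pkg"], fakeApi);
const merged = mergeWithCache({ "flaky-pkg": 447 }, fetched);
console.log(fetched, merged);
```

The brand-new package is recorded with an explicit 0, while the failed package keeps its cached 447.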
scripts/update-github-counts.mjs (step 2 of the workflow) is unchanged.
It reads cache.npm/pypi/crates (aggregates) for the total formula —
those still exist and have the same semantics. The new per-package
maps are passively preserved through its JSON round-trip.
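Why the round-trip is passive can be seen in a short sketch. It assumes the script parses the whole cache object and serializes that same object back; the `AggregateView` interface and the simple sum are placeholders, since the real total formula is not shown here.

```typescript
// Hypothetical view of the cache: only the aggregates are typed, everything
// else is carried along as unknown keys.
interface AggregateView {
  npm: number;
  pypi: number;
  crates: number;
  [key: string]: unknown;
}

// Sample cache contents; field names come from the schema above.
const cacheJson = JSON.stringify({
  npm: 4639,
  pypi: 2067,
  crates: 67,
  pypiByPackage: { runcycles: 1620, "runcycles-openai-agents": 447 },
});

// A reader that only knows about the aggregates...
const cache = JSON.parse(cacheJson) as AggregateView;
const registryTotal = cache.npm + cache.pypi + cache.crates; // placeholder formula

// ...and writes the parsed object back leaves the unknown keys untouched.
const rewritten = JSON.stringify(cache, null, 2);
const roundTripped = JSON.parse(rewritten);

console.log(registryTotal, roundTripped.pypiByPackage);
```

If a script instead rebuilt the cache object from a fixed list of fields, the per-package maps would be dropped, so the preservation depends on writing back the parsed object as a whole.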
Verified:
- vitest 90/90 (was 83; added 7 per-package HWM tests)
- vitepress build clean; loader log shows aggregate HWM safety net
working correctly during cold-start migration:
[installs] pypi=0(hwm:2067) ← per-package agg=0 (both APIs failed
locally), aggregate HWM preserved cached 2067 ✓
- scripts/update-registry-counts.mjs ran locally against the live
cache and populated per-package maps with sums matching cached
aggregates (npm 4639, pypi 2067, crates 67 — exact match).
- YAML re-parses; refresh-counts.yml structure unchanged (5 steps).
Out of scope: an edge case where the build computes a fresher
per-package map but no aggregate grew, in which case the aggregate-
based "should we write" check skips the write. The daily workflow
catches up on the next run, so the data isn't lost — just delayed
by up to a day.
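For illustration, the skipped write and one possible tightening could look like the sketch below. The `Cache` shape is reduced to the PyPI fields, and `aggregateGrew` and `anyPackageGrew` are hypothetical names; the loader's actual check is not reproduced here.

```typescript
interface Cache {
  pypi: number;
  pypiByPackage?: Record<string, number>;
}

// Aggregate-based check as described above: write only if the aggregate grew.
function aggregateGrew(prev: Cache, next: Cache): boolean {
  return next.pypi > prev.pypi;
}

// Hypothetical tightened check: also write when any per-package value is new
// or higher than what the cache holds.
function anyPackageGrew(prev: Cache, next: Cache): boolean {
  const before = prev.pypiByPackage ?? {};
  return Object.entries(next.pypiByPackage ?? {}).some(
    ([pkg, value]) => value > (before[pkg] ?? -1), // -1 so a new key with 0 counts
  );
}

// Cold-start build: one package fetched, aggregate held at 2067 by the backstop.
const prev: Cache = { pypi: 2067 };
const next: Cache = { pypi: 2067, pypiByPackage: { runcycles: 1620 } };

const writesToday = aggregateGrew(prev, next);
const writesIfTightened = aggregateGrew(prev, next) || anyPackageGrew(prev, next);
console.log({ writesToday, writesIfTightened });
```

In this scenario the aggregate-only check skips the write even though the per-package map is fresher, while the combined check would persist it.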
Summary
Replaces aggregate-only HWMs on the registry counters with per-package HWMs, addressing the failure mode caught yesterday (2026-05-09) where one package's API call failing on the same run as another package's legit growth would silently mask that growth.
The failure mode (verified live)
Two consecutive runs of `refresh-counts.yml` on 2026-05-09 saw pypistats.org return data for one of the two PyPI packages but error on the other:
- Run 1: `runcycles` succeeded (1620); `runcycles-openai-agents` errored.
- Run 2: `runcycles-openai-agents` succeeded (447); `runcycles` errored.
Aggregate HWM logic correctly held the cached `2067` in both runs (so the displayed count never regressed). But if `runcycles` had legitimately grown from 1620 → 1700 alongside the `runcycles-openai-agents` API failure, the true total of 1700 + 447 = 2147 would have been masked by the aggregate HWM: the run would have observed only 1700, and 1700 < cached 2067 → kept 2067, so the +80 growth is lost until both packages succeed on the same run.
Per-package HWMs preserve each package's HWM independently. A failure on one package can no longer mask growth in another.
What changed
Cache schema (additive, no migration tooling needed)
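Three per-package maps are added alongside the existing aggregates: `npmByPackage`, `pypiByPackage`, and `cratesByPackage`, each a `Record<string, number>`.
Aggregate fields (`npm`, `pypi`, `crates`) are now derived from the per-package map: `max(sum-of-per-package, cached-aggregate)`.
The `max(..., cached-aggregate)` is a defensive backstop for cold-start: until the per-package maps are populated, it preserves the legacy aggregate value. Once all packages have been seen successfully at least once, the per-package map sum becomes the source of truth.
Two writers, mirrored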
- `scripts/update-registry-counts.mjs` (new). Replaces the ~80 lines of inline bash + curl + jq in `refresh-counts.yml` step 1. Daily workflow refresh. Returns `null` from a fetcher on API failure (distinct from a successful `0` for a brand-new package). Schema-regression guard moves into the script.
- `.vitepress/theme/installs.data.ts` (build-time loader). Fetch functions now return `Record<pkg, number | null>` instead of an aggregate. `load()` applies `hwmPerPackage()`; per-package maps are written into the cache.
- `.vitepress/theme/__tests__/installs.test.ts`: 7 new tests covering cold start, all-succeed, one-fails (the exact 2026-05-09 scenario), all-fail, lower-than-cached, null-vs-0 distinction, package-removed-from-list.
- `.github/workflows/refresh-counts.yml` step 1 collapses to one line: `node scripts/update-registry-counts.mjs`. The ~80 lines of inline bash are gone.
- `scripts/update-github-counts.mjs` (step 2) is unchanged. It uses the aggregate `cache.npm/pypi/crates` for the displayed-total formula, with the same semantics as before. Per-package maps are passively preserved through its JSON round-trip.
Test plan
- `scripts/update-registry-counts.mjs` ran locally against the live cache and populated per-package maps with sums matching cached aggregates (npm 4639, pypi 2067, crates 67: exact match).
- `workflow_dispatch`, confirm:
Edge case (out of scope)
The build's `installs.data.ts` only writes the cache when an aggregate field grew. There's a corner case where the build computes a fresher per-package map but no aggregate changed (e.g., one package grew by N, another dropped by exactly N): the per-package map would not be persisted by that build run. The daily workflow catches up on the next run, so the data isn't lost; it can be delayed by up to a day. Could be tightened by extending the write-condition check to per-package fields, but the workflow's daily refresh makes this low-priority.
Why per-package HWMs in two places (and not extracted into a module)
`installs.data.ts` is a VitePress data loader (`.ts`, transpiled at build time); the workflow script is a standalone Node ESM (`.mjs`). Sharing a module across these would require either bundling overhead or duplicating type definitions; following the codebase's existing convention (the workflow's bash already mirrored `installs.data.ts`'s logic), I kept them as two parallel implementations with the `hwmPerPackage` function defined identically in both. The test file copies the same function for unit testing, the same pattern as the existing `highWaterMark` and `accumulateClones` test helpers.