feat(compaction): layered pressure architecture + reserve-aware budget alignment #558
Conversation
Pull request overview
This PR updates the compaction system to (1) decouple the “sweep stop point” from the existing compaction trigger threshold, and (2) align all threshold percentages to an effective prompt budget by subtracting any runtime-provided reserve token buffer.
Changes:
- Add `sweepTargetThreshold` config (default `0.50`) and plumb it through the engine into sweep compaction so sweeps can create multi-turn headroom.
- Make `resolveTokenBudget` reserve-aware by subtracting `runtimeContext.reserveTokens` (or legacy `reserveTokensFloor`) from the resolved budget.
- Add/adjust tests, plugin schema/manifest text, README documentation, and a changeset entry.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `src/compaction.ts` | Adds optional `targetRatio` to full sweeps and uses it for sweep loop stopping conditions. |
| `src/engine.ts` | Applies reserve token subtraction during token budget resolution; passes `sweepTargetThreshold` to sweep calls. |
| `src/db/config.ts` | Introduces `sweepTargetThreshold` on `LcmConfig` with env override and clamping. |
| `openclaw.plugin.json` | Adds schema + UI hints for `sweepTargetThreshold` and clarifies `contextThreshold` help text. |
| `test/sweep-target-and-reserve-aware-budget.test.ts` | New tests covering `sweepTargetThreshold` resolution and reserve-aware budget behavior. |
| `test/config.test.ts` | Asserts the new default config value. |
| `README.md` | Documents compaction pressure "bands", sweep target decoupling, and reserve-aware budgeting. |
| `.changeset/sweep-target-and-reserve-aware-budget.md` | Patch changeset describing the new behaviors. |
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 7 comments.
Force-pushed c34206a → a57779a.
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.
Force-pushed a57779a → 5c6c30c.
Reworked this PR significantly after design feedback and test scenarios on 5.5 and 5.4 agents. The new version (5c6c30c) ships the full layered four-band architecture, balancing quality, latency, and overflow prevention in both high-tool-use and normal claw setups.

The cache-invalidation insight: each dispatch's prefix-cache cost is fixed regardless of pass count (all passes invalidate the same prefix from the oldest modification point forward). So 3 passes per dispatch cost the same cache-wise as 1 but reduce 3× as many tokens. That is why higher pressure → more passes per dispatch is the right shape.

Companion to PR #557, whose description has been rewritten to reflect the new architecture. The README's compaction-pressure-architecture section has been rewritten with the layered diagram, cache-efficiency table, and a scenario walkthrough. Tests: 846 passing (26 new, covering the full pressure-tier matrix including boundary cases). Force-pushed; I will reply individually to any new review comments.

This PR also adopts OpenClaw's new system-wide overflow logic, which sets a default reserve of 20k tokens for output.
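The trade-off above can be made concrete with a toy calculation. This is illustrative arithmetic only: `TOKENS_PER_PASS` and the `cost` helper are made-up names and numbers, not anything in the plugin.

```typescript
// Illustrative arithmetic only: TOKENS_PER_PASS is a made-up constant, not a
// measured value. Each dispatch breaks the prefix cache exactly once at the
// oldest-modification point, no matter how many passes run inside it.
interface DispatchPlan {
  dispatches: number;        // separate compaction dispatches (cache breaks)
  passesPerDispatch: number; // compaction passes inside each dispatch
}

const TOKENS_PER_PASS = 8_000; // hypothetical reduction per pass

function cost(plan: DispatchPlan): { tokensReduced: number; cacheInvalidations: number } {
  return {
    tokensReduced: plan.dispatches * plan.passesPerDispatch * TOKENS_PER_PASS,
    cacheInvalidations: plan.dispatches, // one prefix break per dispatch
  };
}

// Same 24k-token reduction either way, but 1 vs 3 cache invalidations:
cost({ dispatches: 1, passesPerDispatch: 3 }); // { tokensReduced: 24000, cacheInvalidations: 1 }
cost({ dispatches: 3, passesPerDispatch: 1 }); // { tokensReduced: 24000, cacheInvalidations: 3 }
```

"Fire more often" multiplies the `cacheInvalidations` column while multi-pass leaves it at 1, which is the whole argument for the tier ladder.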
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.
Force-pushed fa3f425 → e7edf81.
Spawned 4 hardcore adversarial bug-hunters in parallel to find any remaining bugs. They returned 30+ findings; everything real was addressed (~12 stale or uncertain findings filtered out). New commit:

P0:

P1:

P2:

Process:

Force-pushed (
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
Force-pushed 037579a → 4a5955f.
Round 3 adversarial sweep complete — applied 4 of 6 findings (commit

Verified ✓ (all 12 round-2 fixes applied correctly): maintain reserve-bypass, assemble reserve-bypass, optional fields, safe defaults, `applyReserveTokens` warn, schema bounds, `contextThreshold` clamp, changeset minor, defensive sort, `log.warn` assertion, `Math.floor` test, README example. All confirmed by the adversarial agent.

Round 3 fixes applied:

Plus inline bot-review fixes from the previous push:

Skipped (deliberate):

850 tests passing (15+ new across rounds 2 and 3). CI green.
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
src/engine.ts:6211
- In the `afterTurn` fallback path, `fallbackBudget` may be reserve-adjusted (e.g., the 128k default minus `runtimeContext.reserveTokens`), but the warning log still says it is using the raw default `128000`. This makes logs inaccurate when reserve alignment is active. Consider logging the effective budget actually used (or both raw and adjusted values) so operators can reconcile thresholds/pressure with observed behavior.
```ts
// When neither params.tokenBudget nor runtimeContext.tokenBudget is
// supplied, resolveTokenBudget returns undefined and we fall back to the
// default. Apply reserve adjustment to the fallback too so percentages
// compute against the EFFECTIVE budget — matches the maintain() pattern
// and prevents reserve-aware alignment from being silently bypassed when
// a host calls afterTurn with reserveTokens but no tokenBudget.
const fallbackBudget = this.applyReserveTokens(
  DEFAULT_AFTER_TURN_TOKEN_BUDGET,
  asRecord(params.runtimeContext) ?? {},
);
const tokenBudget = this.applyAssemblyBudgetCap(resolvedTokenBudget ?? fallbackBudget);
if (resolvedTokenBudget === undefined) {
  this.deps.log.warn(
    `[lcm] afterTurn: tokenBudget not provided; using default ${DEFAULT_AFTER_TURN_TOKEN_BUDGET}`,
  );
}
```
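One way to address the reviewer's point is to include the reserve-adjusted value in the message. A minimal standalone sketch, assuming the message format and a `fallbackWarning` helper that do not exist in the codebase:

```typescript
// Sketch of the suggested fix: report the effective (reserve-adjusted)
// fallback alongside the raw default so operators can reconcile logs with
// observed behavior. The helper name and message format are assumptions.
const DEFAULT_AFTER_TURN_TOKEN_BUDGET = 128_000;

function fallbackWarning(effectiveFallback: number): string {
  return (
    `[lcm] afterTurn: tokenBudget not provided; using default ` +
    `${DEFAULT_AFTER_TURN_TOKEN_BUDGET}` +
    (effectiveFallback !== DEFAULT_AFTER_TURN_TOKEN_BUDGET
      ? ` (effective after reserve: ${effectiveFallback})`
      : "")
  );
}

fallbackWarning(108_000);
// "[lcm] afterTurn: tokenBudget not provided; using default 128000 (effective after reserve: 108000)"
```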
…t alignment

Layered four-band compaction pressure architecture:

```
effective prompt budget = tokenBudget − reserveTokens

 0%  60%/trigger  70%/tier-1  80%/tier-2  91%/sweep  100%/overflow
├────┼───────────┼───────────┼───────────┼───────────────────┤
│ low│ normal    │ tier-1    │ tier-2    │ SWEEP             │
│    │ 1 pass/   │ 2 passes/ │ 3 passes/ │ unlimited passes  │
│    │ dispatch  │ dispatch  │ dispatch  │ target 50%        │
│    │ exit @60% │ exit @60% │ exit @60% │ exit @ 50% budget │
└────┴───────────┴───────────┴───────────┴───────────────────┘
```

Three new capabilities:

1. `sweepTriggerThreshold` (default 0.91) — separate from `contextThreshold`, controls when dispatched compaction switches into deep SWEEP mode. Below this, dispatched compaction targets `contextThreshold` (gentle, doesn't overshoot). At/above, it runs unlimited passes targeting `sweepTargetThreshold`.
2. `pressureTiers` (default `[{ratio:0.70,maxPasses:2},{ratio:0.80,maxPasses:3}]`) — pressure-tiered pass-cap ladder for dispatched compaction below sweep mode. Multi-pass amortizes prefix-cache invalidation: every pass in a single dispatch invalidates the SAME cache prefix, so 3 passes/dispatch cost the same cache-wise as 1 but reduce 3× as many tokens. That's why higher pressure → more passes per dispatch is the right shape, rather than "fire more often" (which would multiply cache invalidations).
3. `sweepTargetThreshold` (default 0.50) — fraction of the token budget that SWEEP targets when it fires. Decouples the sweep stopping point from `contextThreshold`. With default sweep target 0.50 and trigger 0.91, when sweep fires it creates ~40% headroom (~5+ turns of runway).

Reserve-aware budget alignment: LCM now reads `runtimeContext.reserveTokens` (or the legacy `reserveTokensFloor` key) and subtracts it from the resolved `tokenBudget` before computing percentages. Every threshold computes against the EFFECTIVE prompt budget — the same number the runtime actually overflows at — instead of the raw context window. Plugins/runtimes that don't pass a reserve get legacy behavior unchanged.

Behavior change: `contextThreshold` default lowered from 0.75 → 0.60. The lower trigger gives the cache-aware deferral system more room to operate (defer when cache hot, fire when cold) and feeds the new pressure-tier ladder cleanly. Operators wanting the legacy 0.75 trigger can set it explicitly.

New env overrides: `LCM_SWEEP_TARGET_THRESHOLD`, `LCM_SWEEP_TRIGGER_THRESHOLD`, `LCM_PRESSURE_TIERS` (JSON array). New manifest entries + uiHints + configSchema for all three new fields.

Tests: 846 passing (820 baseline + 26 new in `test/sweep-target-and-reserve-aware-budget.test.ts`, covering config resolution, budget alignment, and the `resolvePressureDispatchPolicy` tier ladder).

README: new "Compaction pressure architecture" section with layered ASCII diagram, pressure-band table, cache-invalidation efficiency math, and a scenario walkthrough showing 6 emergency truncations → 0.

Recommended companion: PR Martian-Engineering#557. That PR's `criticalBudgetPressureRatio` default 0.70 lines up with this PR's tier-1 ratio, so dispatched work fires reliably the moment the system enters tier-1 instead of being cache-throttled.
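The reserve subtraction described above can be sketched in a few lines. This is a hedged restatement, not the plugin's actual code: the function name `effectiveBudget` is an assumption, and only the two documented keys (`reserveTokens`, legacy `reserveTokensFloor`) are handled.

```typescript
// Minimal sketch of reserve-aware alignment (function name is an assumption;
// exact signatures in src/engine.ts may differ).
function effectiveBudget(
  tokenBudget: number,
  runtimeContext: Record<string, unknown> = {},
): number {
  const reserve =
    typeof runtimeContext.reserveTokens === "number"
      ? runtimeContext.reserveTokens
      : typeof runtimeContext.reserveTokensFloor === "number"
        ? runtimeContext.reserveTokensFloor // legacy key
        : 0;
  return Math.max(0, tokenBudget - reserve);
}

// 258k context window with a 20k output reserve → thresholds compute
// against 238k, the number the runtime actually overflows at.
effectiveBudget(258_000, { reserveTokens: 20_000 }); // 238000
effectiveBudget(258_000);                            // 258000 — no reserve, legacy behavior unchanged
```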
Force-pushed 4a5955f → e86c96b.
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
src/engine.ts:6211
- In `afterTurn`, when `tokenBudget` is missing you log that you're using the default `128000`, but `fallbackBudget` may have been reserve-adjusted via `applyReserveTokens(...)`. This can make logs misleading for operators (they'll see 128k even if a smaller effective budget was actually used). Log the effective fallback/token budget being applied (and optionally the reserve) instead of the raw default constant.
```ts
const tokenBudget = this.applyAssemblyBudgetCap(resolvedTokenBudget ?? fallbackBudget);
if (resolvedTokenBudget === undefined) {
  this.deps.log.warn(
    `[lcm] afterTurn: tokenBudget not provided; using default ${DEFAULT_AFTER_TURN_TOKEN_BUDGET}`,
  );
}
```
```ts
// documented "empty/malformed → defaults" semantics in the resolver.
const tiers =
  Array.isArray(this.config.pressureTiers) && this.config.pressureTiers.length > 0
    ? [...this.config.pressureTiers].sort((a, b) => a.ratio - b.ratio)
```

```ts
 * the sweep loop continues until `currentTokens <= targetRatio * tokenBudget`,
 * decoupling the sweep stopping point from `contextThreshold`. When omitted,
 * the legacy behavior applies (sweep exits at `contextThreshold`).
 *
 * The trigger condition is still evaluated against `contextThreshold` —
 * `targetRatio` only affects when the loop STOPS, not whether it starts.
```

```ts
const targetThreshold =
  typeof input.targetRatio === "number"
    && Number.isFinite(input.targetRatio)
    && input.targetRatio >= 0
    && input.targetRatio <= 1
    ? Math.min(triggerThreshold, Math.floor(input.targetRatio * tokenBudget))
    : triggerThreshold;
```
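The clamp in that last snippet can be exercised standalone. A hedged restatement: the name `resolveSweepStop` and the concrete numbers (budget 238k, trigger at 60%) are assumptions for illustration, not the actual API.

```typescript
// Standalone restatement of the targetRatio clamp: a valid ratio in [0, 1]
// sets the sweep stop point (never above the trigger threshold); anything
// else falls back to the legacy exit at contextThreshold.
function resolveSweepStop(
  targetRatio: number | undefined,
  triggerThreshold: number, // absolute token count (contextThreshold × budget)
  tokenBudget: number,
): number {
  if (
    typeof targetRatio === "number" &&
    Number.isFinite(targetRatio) &&
    targetRatio >= 0 &&
    targetRatio <= 1
  ) {
    return Math.min(triggerThreshold, Math.floor(targetRatio * tokenBudget));
  }
  return triggerThreshold; // legacy: sweep exits at contextThreshold
}

resolveSweepStop(0.5, 142_800, 238_000);       // 119000 — sweep stops at 50% of budget
resolveSweepStop(undefined, 142_800, 238_000); // 142800 — legacy exit
resolveSweepStop(1.5, 142_800, 238_000);       // 142800 — out-of-range ratio ignored
```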
Summary
This PR ships a layered four-band compaction pressure architecture, plus reserve-aware budget alignment so percentages mean what they say.
What changed
Three new capabilities plus one behavior change:

1. `sweepTriggerThreshold` (default `0.91`) — separate from `contextThreshold`; controls when dispatched compaction switches into deep SWEEP mode. Below this, dispatched work targets `contextThreshold` (gentle). At/above, it runs unlimited passes targeting `sweepTargetThreshold`.
2. `pressureTiers` (default `[{ratio:0.70,maxPasses:2},{ratio:0.80,maxPasses:3}]`) — pressure-tiered pass-cap ladder. Multi-pass amortizes prefix-cache invalidation: every pass in a single dispatch invalidates the SAME cache prefix, so 3 passes/dispatch cost the same cache-wise as 1 but reduce 3× as many tokens.
3. `sweepTargetThreshold` (default `0.50`) — fraction of the token budget that SWEEP targets when it fires. Decouples the sweep stopping point from `contextThreshold`. With sweep target 0.50 and trigger 0.91, when sweep fires it creates ~40% headroom (~5+ turns of runway).
4. Reserve-aware budget alignment — LCM reads `runtimeContext.reserveTokens` (or legacy `reserveTokensFloor`) and subtracts it from the resolved `tokenBudget` before computing percentages.
5. Behavior change: `contextThreshold` default lowered from `0.75` → `0.60`. The lower trigger gives the cache-aware deferral system more room to operate and feeds the new pressure-tier ladder cleanly.

Why this layering — the cache invalidation insight
When LCM compacts the oldest chunk, the prefix cache breaks at the modification point and everything from there to the end of the prompt must re-tokenize on the next turn. Doing 1 pass vs 3 passes vs 6 passes invalidates the SAME prefix — more passes just produce more reduction off that one cache break.
Multi-pass per dispatch is the right shape at higher pressure, not "fire more often" (which would multiply cache invalidations).
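The band-to-policy mapping above can be sketched as a small pure function. This is a hedged sketch using this PR's defaults; the actual `resolvePressureDispatchPolicy` in `src/engine.ts` may differ in shape, and `resolvePolicy` here is an illustrative name.

```typescript
// Tier ladder sketch: given pressure (current tokens / effective budget),
// pick the sweep target and per-dispatch pass cap. Defaults from this PR.
interface PressureTier { ratio: number; maxPasses: number; }
interface DispatchPolicy { targetRatio: number; maxPasses?: number; }

const DEFAULT_TIERS: PressureTier[] = [
  { ratio: 0.70, maxPasses: 2 },
  { ratio: 0.80, maxPasses: 3 },
];

function resolvePolicy(pressureRatio: number | undefined): DispatchPolicy {
  const contextThreshold = 0.60;
  const sweepTrigger = 0.91;
  const sweepTarget = 0.50;
  // No pressure signal, or at/above the sweep trigger → sweep semantics:
  // target 50% with no pass cap.
  if (pressureRatio === undefined || pressureRatio >= sweepTrigger) {
    return { targetRatio: sweepTarget }; // unlimited passes
  }
  // Highest tier (sorted ascending) whose ratio the current pressure meets.
  const tier = DEFAULT_TIERS.filter((t) => pressureRatio >= t.ratio).pop();
  return { targetRatio: contextThreshold, maxPasses: tier?.maxPasses ?? 1 };
}

resolvePolicy(0.65); // { targetRatio: 0.6, maxPasses: 1 } — normal band
resolvePolicy(0.75); // { targetRatio: 0.6, maxPasses: 2 } — tier-1
resolvePolicy(0.95); // { targetRatio: 0.5 }               — deep sweep
```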
Code surface
- `src/db/config.ts` — adds `sweepTriggerThreshold` and `pressureTiers` (with sorted-ascending validator). Lowers the `contextThreshold` default 0.75 → 0.60.
- `src/compaction.ts` — `compactFullSweep` accepts a new `maxPasses` cap (shared across leaf + condensed phases).
- `src/engine.ts` — `resolvePressureDispatchPolicy` helper picks `targetRatio` + `maxPasses` from current pressure; wired into the `executeCompactionCore` sweep path. Reserve-aware `applyReserveTokens` helper subtracts the reserve from the resolved budget.
- `openclaw.plugin.json` — `sweepTriggerThreshold` and `pressureTiers`.
- `README.md`
- `test/sweep-target-and-reserve-aware-budget.test.ts` — `resolvePressureDispatchPolicy` tier ladder (including boundary cases at exactly the tier-1 / tier-2 / sweep ratios).

Backward compatibility
All additions default to gracefully extended behavior:
- `pressureTiers` unset/empty/malformed: defaults to the canonical ladder (no behavior break)
- `sweepTriggerThreshold` unset: defaults to 0.91 (so the deep sweep target only fires at high pressure, not on every threshold-mode dispatch)
- `runtimeContext.reserveTokens` absent: LCM uses the raw budget unchanged (legacy plugins/runtimes)
- The `contextThreshold` default change is the one operator-visible behavior shift — operators wanting legacy 0.75 set it explicitly

`resolvePressureDispatchPolicy` falls back to sweep semantics (target = 0.50, no pass cap) when the current token count is unavailable, preserving this PR's prior behavior for any caller that doesn't supply a pressure signal.

Recommended companion: PR #557
PR #557's `criticalBudgetPressureRatio` default `0.70` lines up exactly with this PR's tier-1 ratio. Without #557, tier-1 dispatches at 70% would still be cache-throttled for up to 5 minutes per dispatch, defeating the point of having tiers. With #557, dispatched work fires reliably the moment the system enters tier-1.

Test plan
- `pnpm exec vitest run` — 846 tests passing (820 baseline + 26 new)
- `pnpm build` — clean
- Updated `test/config.test.ts` defaults assertions to match the new architecture

Scenario walkthrough — real session data
Real Eva session on gpt-5.5 (258K context, 20K reserve = 238K effective budget) before any patches: 6 emergency truncations in 18 hrs.
After both this PR + PR #557 with default config:
Result: 0 emergency truncations instead of 6 in the same window.