Skip to content

feat(builtins): add truncate builtin for sandbox-mediated file truncation#219

Draft
julesmcrt wants to merge 5 commits intojules.macret/host-remediation/file-redirectionfrom
jules.macret/host-remediation/truncate
Draft

feat(builtins): add truncate builtin for sandbox-mediated file truncation#219
julesmcrt wants to merge 5 commits intojules.macret/host-remediation/file-redirectionfrom
jules.macret/host-remediation/truncate

Conversation

@julesmcrt
Copy link
Copy Markdown
Collaborator

Summary

Adds truncate as the first writing builtin in rshell. Routes all file mutations through a new Sandbox.Truncate method that mirrors the established write policy from PR #206 / #209: resolution stays inside a single os.Root, the cross-root symlink fallback is disabled (TOCTOU-unsafe for writes), and non-regular targets (FIFO/socket/device) are rejected before the open syscall to avoid blocking on FIFO opens.

This unblocks step 7 of the host disk-space remediation demo: truncate -s 0 <file> is the "nuclear option" to reclaim space immediately when logrotate cannot run, while preserving the inode so any process holding the file open keeps writing harmlessly.

Stack: Based on jules.macret/host-remediation/file-redirection (#209). This PR is demo-only and is not intended to merge to main.

Implemented flags

  • -s SIZE, --size=SIZE — set or adjust the file size, with the full GNU suffix grammar:
    • bare single-letter binary: K/k/KiB = 1024, M/m/MiB, G/g/GiB, T/t/TiB (case-insensitive on the bare letter, matching GNU)
    • decimal: KB/MB/GB/TB = 1000-based (case-sensitive on the leading letter)
  • -c, --no-create — silently skip missing files instead of creating them.
  • -h, --help — usage to stdout, exit 0.
  • Multiple file operands are processed in order; one failure does not abort, but exit 1 is returned at the end if any operand failed.

Out of scope (deferred)

  • -r REF / --reference=FILE — set size from a reference file.
  • -o / --io-blocks — treat SIZE as a block count.
  • Relative-size modifiers in -s (+N, -N, <N, >N, /N, %N). All rejected with a clear hint that the modifiers are intentionally unsupported, rather than silently accepted in a way that would surprise users.

These are rejected as unknown flags so truncate --reference=foo bar produces exit 1, not silent success.

Sandbox plumbing

  • New `Sandbox.Truncate(path, cwd, size, create)` in `allowedpaths/sandbox.go`. Resolves through `s.resolve` → `os.Root.OpenFile` (no cross-root symlink fallback). Rejects non-regular targets via `Stat` before opening, mirroring `interp.Runner.rejectNonRegularRedirectTarget`.
  • New `CallContext.Truncate` capability in `builtins/builtins.go`, wired into both `CallContext` constructions in `interp/runner_exec.go`.
  • Returns the raw error (no `PortablePathError` wrap) so the truncate builtin can use `errors.Is(err, os.ErrNotExist)` to silently skip missing files under `-c`.

Tests

  • Scenario tests (`tests/scenarios/cmd/truncate/`): basic, multi-file, no-create, errors, hardening, help — 19 YAMLs.
  • Unit tests (`builtins/truncate/truncate_test.go`): `parseSize` table tests covering every suffix, every relative-size modifier, every overflow boundary including the exact T-suffix ceiling.
  • Integration tests (`builtins/tests/truncate/truncate_test.go` + `hardening_test.go`): end-to-end through the runner, partial-failure exit propagation, sandbox boundary, special-character filenames, large sparse sizes, capability-not-available.
  • Sandbox unit tests (`allowedpaths/sandbox_test.go` + `sandbox_unix_test.go`): every code path of `Sandbox.Truncate`, including the FIFO non-blocking regression test.
  • Pentest tests (`interp/builtin_truncate_pentest_test.go`): integer edges (MaxInt64 boundary, overflow neighbours), path traversal, symlink escape, flag injection, 200-file fd-leak guard, massive sparse sizes, empty/whitespace flag values.
  • Fuzz tests (`builtins/tests/truncate/truncate_fuzz_test.go`): `FuzzTruncateSize`, `FuzzTruncateFlags`, `FuzzTruncateFilenames`, all wired into `.github/workflows/fuzz.yml`.

Verification

  • `make fmt` clean.
  • `go test ./... -timeout 180s` passes (~30 packages).
  • Manual smoke: `go run ./cmd/rshell --allow-all-commands --allowed-paths /tmp -c 'echo hi > /tmp/t && truncate -s 0 /tmp/t && wc -c /tmp/t'` prints `0 /tmp/t`.
  • `help` lists `truncate shrink or extend file size`.

Test plan

  • CI green (lint, scenario tests, fuzz seed, GNU bash-comparison)
  • Manual verification of the demo: log file truncated to 0 bytes while a tail process holds it open, and the tail keeps reading new bytes
  • Confirm sandbox blocks `truncate -s 0 /etc/hosts` outside the allowlist

@julesmcrt
Copy link
Copy Markdown
Collaborator Author

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 59927ba265

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread allowedpaths/sandbox.go Outdated
julesmcrt added a commit that referenced this pull request May 4, 2026
Codex review on PR #219 flagged a real race: between the pre-open Stat
guard and the OpenFile call, a regular file can be swapped for a FIFO,
and the subsequent O_WRONLY open will block waiting for a reader. The
in-builtin ctx.Err() loop cannot interrupt that wait.

Replace the racy pre-Stat with an atomic open-and-fstat sequence:

  - O_NONBLOCK on the open makes O_WRONLY on a reader-less FIFO return
    ENXIO immediately. It is benign on regular files and a no-op on
    Windows where syscall.O_NONBLOCK == 0.
  - fstat on the resulting fd verifies the file is regular before any
    ftruncate runs. Even a regular file swapped for a FIFO with a reader
    is rejected at the type check; the size change never reaches the
    kernel.

Add a regression test that opens the read end of a FIFO before calling
Truncate, exercising the post-fd fstat path that the no-reader test
cannot reach.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@julesmcrt
Copy link
Copy Markdown
Collaborator Author

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3fdb6db3e9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread tests/scenarios/cmd/truncate/hardening/unknown_flag.yaml
Comment thread builtins/truncate/truncate.go
julesmcrt added a commit that referenced this pull request May 4, 2026
…nown_flag

Codex review on PR #219 found two issues:

1. tests/scenarios/cmd/truncate/hardening/unknown_flag.yaml lacked
   skip_assert_against_bash. GNU truncate accepts --reference; rshell
   intentionally rejects it as a deferred flag, so the bash-comparison
   suite would diverge.

2. parseSize rejected the lowercase-leading multi-letter suffixes
   (1kB, 1kiB, 1mB, 1miB, ...) that GNU truncate accepts. The actual
   GNU rule is "leading letter case-insensitive, trailing B / iB
   case-sensitive", not "bare single-letter case-insensitive,
   multi-letter case-sensitive on the leading letter".

Updated the suffix map, doc comments, unit tests, integration tests,
fuzz seeds, and SHELL_FEATURES.md. Verified byte-for-byte agreement
with `gtruncate` across all 27 leading-letter / trailing-form
combinations (six accepted plus the rejected variants 1KIB, 1Kib,
1kb).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@julesmcrt
Copy link
Copy Markdown
Collaborator Author

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7ac8d01aed

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread builtins/truncate/truncate.go
Comment thread tests/scenarios/cmd/truncate/basic/extend.yaml Outdated
julesmcrt added a commit that referenced this pull request May 5, 2026
…to |+

Codex review on PR #219 found two issues:

1. Suffix table stopped at T/t. GNU truncate accepts P (PiB) and E (EiB)
   plus their PB/EB and PiB/EiB forms. GNU also lists Z/Y/R/Q in --help,
   but on 64-bit-uintmax_t systems (the standard platform) GNU rejects
   them with "Value too large to be stored in data type" — which matches
   our int64 ceiling, so rejecting them is not a real divergence.

   Empirically (verified locally with gtruncate from coreutils):
   - K/M/G/T: leading letter case-insensitive (1k = 1K = 1024).
   - P/E:     leading letter UPPERCASE-ONLY (1p, 1pB, 1piB rejected).
   - The trailing "B" / "iB" is always case-sensitive in every form.
   - Z/Y/R/Q: rejected.

   Encode the asymmetry exactly. All 18 spot-checked combinations
   (P/PB/PiB/p/pB/piB/E/EB/EiB/e/eB/eiB/Z/Y/R/Q/7E/8E) now agree with
   gtruncate byte-for-byte.

2. New scenario YAMLs used quoted-empty scalars (`stderr: ""`) instead
   of the |+ block scalars AGENTS.md mandates. Convert all of them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@julesmcrt
Copy link
Copy Markdown
Collaborator Author

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 70019d398d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread allowedpaths/sandbox.go Outdated
julesmcrt added a commit that referenced this pull request May 5, 2026
Codex review on PR #219 noted that hard-coding 0644 makes Sandbox.Truncate
more restrictive than GNU truncate / bash redirects under permissive
umasks: with `umask 000`, GNU `truncate -s 0 f` produces 0666, but
rshell produced 0644.

POSIX `open(2)` applies `mode & ~umask`, so passing 0666 lets the
process umask alone decide the final mode. Verified locally that this
now matches `gtruncate` byte-for-byte across umask 022/000/077/027:

  umask 022 -> 0644  (was 0644 — unchanged)
  umask 000 -> 0666  (was 0644)
  umask 077 -> 0600  (was 0600 — unchanged)
  umask 027 -> 0640  (was 0640 — unchanged)

Add TestSandboxTruncateMethodCreatesHonourUmask to lock this behaviour
in. Update the existing TestHardenCreatePreservesMode to lock umask to
022 (the typical operator environment) so its mode assertion stays
deterministic, and split the umask helpers into Unix/Windows test files
to keep syscall.Umask off the Windows build.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@julesmcrt
Copy link
Copy Markdown
Collaborator Author

@codex review

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Hooray!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

julesmcrt and others added 5 commits May 7, 2026 14:17
…tion

Implements `truncate -s SIZE [-c] FILE...` as the first writing builtin.
Targets the host disk-space remediation flow where `truncate -s 0 <file>` is
the demo's "nuclear option" for reclaiming space when logrotate cannot run.

All file mutations route through a new `Sandbox.Truncate` method that mirrors
the existing `Sandbox.Open` write policy: paths are resolved within a single
`os.Root`, the cross-root symlink fallback is disabled (TOCTOU-unsafe for
writes), and non-regular targets (FIFO/socket/device) are rejected before
the open syscall to avoid blocking on FIFO opens.

Supports `-s` with the full GNU suffix grammar (binary K/k/KiB, decimal KB,
and analogous M/G/T forms) plus `-c`/`--no-create` and `--help`. Deferred:
`-r`/`--reference`, `-o`/`--io-blocks`, and the GNU relative-size operators
(`+N`/`-N`/`<N`/`>N`/`/N`/`%N`), all rejected with clear errors.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codex review on PR #219 flagged a real race: between the pre-open Stat
guard and the OpenFile call, a regular file can be swapped for a FIFO,
and the subsequent O_WRONLY open will block waiting for a reader. The
in-builtin ctx.Err() loop cannot interrupt that wait.

Replace the racy pre-Stat with an atomic open-and-fstat sequence:

  - O_NONBLOCK on the open makes O_WRONLY on a reader-less FIFO return
    ENXIO immediately. It is benign on regular files and a no-op on
    Windows where syscall.O_NONBLOCK == 0.
  - fstat on the resulting fd verifies the file is regular before any
    ftruncate runs. Even a regular file swapped for a FIFO with a reader
    is rejected at the type check; the size change never reaches the
    kernel.

Add a regression test that opens the read end of a FIFO before calling
Truncate, exercising the post-fd fstat path that the no-reader test
cannot reach.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nown_flag

Codex review on PR #219 found two issues:

1. tests/scenarios/cmd/truncate/hardening/unknown_flag.yaml lacked
   skip_assert_against_bash. GNU truncate accepts --reference; rshell
   intentionally rejects it as a deferred flag, so the bash-comparison
   suite would diverge.

2. parseSize rejected the lowercase-leading multi-letter suffixes
   (1kB, 1kiB, 1mB, 1miB, ...) that GNU truncate accepts. The actual
   GNU rule is "leading letter case-insensitive, trailing B / iB
   case-sensitive", not "bare single-letter case-insensitive,
   multi-letter case-sensitive on the leading letter".

Updated the suffix map, doc comments, unit tests, integration tests,
fuzz seeds, and SHELL_FEATURES.md. Verified byte-for-byte agreement
with `gtruncate` across all 27 leading-letter / trailing-form
combinations (six accepted plus the rejected variants 1KIB, 1Kib,
1kb).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…to |+

Codex review on PR #219 found two issues:

1. Suffix table stopped at T/t. GNU truncate accepts P (PiB) and E (EiB)
   plus their PB/EB and PiB/EiB forms. GNU also lists Z/Y/R/Q in --help,
   but on 64-bit-uintmax_t systems (the standard platform) GNU rejects
   them with "Value too large to be stored in data type" — which matches
   our int64 ceiling, so rejecting them is not a real divergence.

   Empirically (verified locally with gtruncate from coreutils):
   - K/M/G/T: leading letter case-insensitive (1k = 1K = 1024).
   - P/E:     leading letter UPPERCASE-ONLY (1p, 1pB, 1piB rejected).
   - The trailing "B" / "iB" is always case-sensitive in every form.
   - Z/Y/R/Q: rejected.

   Encode the asymmetry exactly. All 18 spot-checked combinations
   (P/PB/PiB/p/pB/piB/E/EB/EiB/e/eB/eiB/Z/Y/R/Q/7E/8E) now agree with
   gtruncate byte-for-byte.

2. New scenario YAMLs used quoted-empty scalars (`stderr: ""`) instead
   of the |+ block scalars AGENTS.md mandates. Convert all of them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codex review on PR #219 noted that hard-coding 0644 makes Sandbox.Truncate
more restrictive than GNU truncate / bash redirects under permissive
umasks: with `umask 000`, GNU `truncate -s 0 f` produces 0666, but
rshell produced 0644.

POSIX `open(2)` applies `mode & ~umask`, so passing 0666 lets the
process umask alone decide the final mode. Verified locally that this
now matches `gtruncate` byte-for-byte across umask 022/000/077/027:

  umask 022 -> 0644  (was 0644 — unchanged)
  umask 000 -> 0666  (was 0644)
  umask 077 -> 0600  (was 0600 — unchanged)
  umask 027 -> 0640  (was 0640 — unchanged)

Add TestSandboxTruncateMethodCreatesHonourUmask to lock this behaviour
in. Update the existing TestHardenCreatePreservesMode to lock umask to
022 (the typical operator environment) so its mode assertion stays
deterministic, and split the umask helpers into Unix/Windows test files
to keep syscall.Umask off the Windows build.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@julesmcrt julesmcrt force-pushed the jules.macret/host-remediation/truncate branch from 2acb2e8 to 8cbe726 Compare May 7, 2026 12:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant