feat: TUI feature parity — blueprint create, scenario create, benchmark run ops by jason-rl · Pull Request #233 · runloopai/rl-cli

jason-rl · 2026-05-02T01:04:45Z

Description

Adds TUI and CLI features to bring rl-cli closer to parity with the runloop-fe frontend, plus unifies CLI list subcommand defaults.

Type of Change

New feature (non-breaking change which adds functionality)

Changes Made

- **TUI Axon Events Screen**: View axon events with pagination, relative timestamps (via `formatTimeAgo`, consistent with other TUI list tables), and origin enum display.
- **TUI Axon SQL Workbench**: Interactive SQL query interface for axons with result display.
- **Blueprint create/duplicate TUI**: Full form with name, source type (dockerfile/base), setup commands, architecture, resources, and metadata. Duplicating from the detail screen (`u`) auto-populates all fields from the base blueprint, with the name suffixed "-copy".
- **Blueprint public/custom tab toggle**: `Tab` key toggles between public and custom blueprints with a visual tab bar, matching the benchmark/agent list pattern. Defaults to the Custom tab.
- **Scenario create TUI**: Multi-step form supporting all 6 scorer types (command, bash script, python script, test-based, AST grep, custom). Includes inline scorer sub-editor with edit/delete, weight validation, environment source selection, metadata, required env vars/secrets, and validation type.
- **Benchmark run cancel/complete**: TUI operations (`x` to cancel, `f` to complete) on running benchmark runs with confirmation prompts. CLI commands `rli benchmark-run cancel <id>` and `rli benchmark-run complete <id>`.
- **View benchmark runs from benchmark detail**: New operation (`r`) navigates to a filtered run list for the selected benchmark.
- **CLI scenario create**: `rli scenario create` with `--scoring-command` (simple) or `--scoring-file` (full JSON) plus all environment, metadata, and validation options.
- **Unified CLI list `--limit` defaults**: All 12 CLI list subcommands now default to `--limit 20` with a consistent `-l` shorthand and "Max results" description. Previously axon/agent/scenario defaulted to 0 (unlimited) and benchmark-job had no `--limit` option.
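
The unified `--limit` default in the last item above could be centralized in one shared parser. The following is a minimal sketch, not the code from this PR: `DEFAULT_LIST_LIMIT` and `parseLimit` are hypothetical names, and rejecting zero or non-numeric input is an assumption (the description does not say how `0` is treated after the change).

```typescript
// Hypothetical shared default for every `list` subcommand (the PR states 20).
const DEFAULT_LIST_LIMIT = 20;

// Hypothetical helper: turn the raw `--limit` / `-l` string into a number.
// Assumption: only positive integers are accepted; anything else is an error.
function parseLimit(raw: string | undefined): number {
  if (raw === undefined) {
    // Flag omitted: every list command falls back to the same default.
    return DEFAULT_LIST_LIMIT;
  }
  const trimmed = raw.trim();
  if (!/^\d+$/.test(trimmed) || Number(trimmed) === 0) {
    throw new Error(`--limit must be a positive integer, got "${raw}"`);
  }
  return Number(trimmed);
}
```

Routing every list subcommand through one helper like this keeps the default, the shorthand, and the error message from drifting apart again.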

Keyboard Shortcuts

| Screen | Key | Action |
|--------|-----|--------|
| Axon Detail | `e` | View Events |
| Axon Detail | `s` | SQL Workbench |
| Blueprint List | `c` | Create Blueprint |
| Blueprint List | `Tab` | Toggle Public/Custom |
| Blueprint Detail | `u` | Duplicate Blueprint |
| Benchmark Detail | `r` | View Benchmark Runs |
| Benchmark Run Detail | `x` | Cancel Run (when running) |
| Benchmark Run Detail | `f` | Complete Run (when running) |

Skipped

Repo connect create — the @runloop/api-client SDK v1.16.0 has no repo connect API (the frontend uses a separate TRPC layer).

Testing

I have tested locally
I have added/updated tests
All existing tests pass

Checklist

My code follows the code style of this project
I have performed a self-review of my own code
My changes generate no new warnings
New and existing unit tests pass locally with my changes
Any dependent changes have been merged and published

Test plan

🤖 Generated with Claude Code

The TUI screens for viewing axon events and SQL workbench are being moved to PR #233 (TUI feature parity) to better separate concerns. This PR retains the CLI `axon events` command and service layer.

Also fixes the origin field displaying numeric values (e.g., 2 instead of AGENT_EVENT) by adding a mapping from SQLite enum indices to string enum names in axonService.ts.

Files moved to #233:
- AxonEventsScreen.tsx
- AxonSqlScreen.tsx
- Router/navigation integrations

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
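
The origin fix described in this commit message amounts to a lookup from a stored integer to an enum name. A minimal sketch follows, assuming a plain lookup table; `ORIGIN_NAMES` and `originName` are hypothetical names, and the only index taken from the text is `2` for `AGENT_EVENT` (the index shown for `SYSTEM_EVENT` is an assumption).

```typescript
// Hypothetical index-to-name table. Only `2 -> AGENT_EVENT` is stated in the
// PR text; the index used for SYSTEM_EVENT here is an assumption.
const ORIGIN_NAMES: Record<number, string> = {
  1: "SYSTEM_EVENT", // assumed index
  2: "AGENT_EVENT", // from the PR description
};

// Hypothetical helper: map a raw origin value read from SQLite to its enum
// name, falling back to the raw value so unknown indices stay visible.
function originName(raw: number | string): string {
  const index = typeof raw === "number" ? raw : Number(raw);
  return ORIGIN_NAMES[index] ?? String(raw);
}
```

Falling back to the raw value rather than throwing means a newly added enum member degrades to the old numeric display instead of breaking the events table.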

## Summary

- **Object create form**: New TUI screen accessible via `c` shortcut from object list. Supports optional file path upload or displays pre-signed S3 URL with clipboard copy. For tar/tgz content types, a multi-item list manager allows adding multiple file paths; for other types, a single text input is shown. Switching content type syncs between the two inputs bidirectionally.
- **Benchmark tab toggle**: Changed public/custom toggle from `t` key to `Tab` key with visual tab bar matching the agent list pattern.
- **Blueprint tab toggle**: Added public/custom toggle with `Tab` key and visual tab bar matching the benchmark/agent list pattern.
- **Benchmark pagination fix**: All benchmark list screens (Benchmark Definitions, Benchmark Runs, Scenario Runs, Benchmark Jobs) now correctly display total count from the API instead of current page item count.
- **Full Details viewport fix**: Fixed off-by-one in `DetailedInfoView` overhead calculation that caused the top border to be clipped off screen when content filled the viewport exactly.
- **CLI: `blueprint create --base`**: Duplicate a blueprint by passing `--base <id|name>` to `blueprint create`. The source blueprint's parameters are used as defaults; any other flag overrides. `--name` defaults to `{base}-copy` when `--base` is used.
- **CLI: `axon events`**: New subcommand to list events for an axon. **Fixed origin field to display enum names (SYSTEM_EVENT, AGENT_EVENT, etc.) instead of numeric values.**
- **CLI: `scenario list`**: New subcommand to list scenario runs with pagination and optional benchmark run ID filter.
- **Metadata support**: `--metadata key=value` added to `object upload` and `benchmark-job run` CLI commands. TUI create screens for objects and benchmark jobs include a metadata key-value editor matching the devbox create pattern.
- **Tar archive: duplicate path rejection**: `createTarBuffer` now rejects duplicate input paths (after `path.resolve`) instead of producing an invalid tar archive.
- **Tar archive: symlink support**: Replaced `nanotar` with `tar-stream`. Symlinks are now stored as proper tar symlink entries (type `2`) with their target in `linkname`, instead of being rejected. This applies to both CLI and TUI upload flows.
- **TUI tar single-file fix**: When content type is tar/tgz and a single file (not directory) is specified, the TUI now uploads it as-is instead of re-archiving it. Matches the CLI behavior.
- **TUI `[d]` keybind consistency**: Changed all `[d] Remove` hints in devbox create (gateway configs, MCP configs, agent mounts, object mounts) to `[d] Delete` to match the rest of the codebase.
- **Benchmark job metadata navigation**: Fixed up/down arrow keys moving the main form cursor while editing metadata key-value pairs. Added `isActive: !inMetadataSection` guard to the main form input handler.
- **Benchmark list `[c]` keybind fix**: Changed "Create Job" shortcut from `c` to `s` to avoid conflicting with the `[c]` = "Copy ID" convention used in detail screens. The popup already used `s` for this action.
- **CLI output consistency**: Standardized all CLI list/get/info commands to default to JSON output and removed chalk-colored text tables from `axon events`, `axon list`, `scenario list`, `scenario info`, `benchmark-job list`, `benchmark-job summary`, and `agent show`. Text mode now uses the shared `output()` utility's uncolored key-value format.

**Note**: TUI axon events and SQL workbench screens have been moved to PR #233.
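
The `--metadata key=value` option in the summary above implies a small parsing step before the pairs reach the API. A minimal sketch under stated assumptions: `parseMetadata` is a hypothetical name, and the choices to split at the first `=`, reject an empty key, and let a repeated key keep its last value are assumptions, not behavior confirmed by this PR.

```typescript
// Hypothetical parser for repeated `--metadata key=value` flags.
// Assumptions: the split happens at the first "=", so values may contain "=";
// an empty key or a missing "=" is rejected; a repeated key keeps the last value.
function parseMetadata(pairs: string[]): Record<string, string> {
  const metadata: Record<string, string> = {};
  for (const pair of pairs) {
    const eq = pair.indexOf("=");
    if (eq <= 0) {
      // eq === -1: no "=" at all; eq === 0: the key would be empty.
      throw new Error(`Invalid metadata "${pair}", expected key=value`);
    }
    metadata[pair.slice(0, eq)] = pair.slice(eq + 1);
  }
  return metadata;
}
```

For example, `parseMetadata(["env=prod", "query=a=b"])` yields `{ env: "prod", query: "a=b" }` in this sketch.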
## Test plan

### Automated tests (155+ new tests)

| Test file | Tests | Coverage |
|-----------|-------|----------|
| `services/benchmarkJobService.test.ts` | 27 | `buildCloneParams` (all source/secret variants), `createBenchmarkJob` (validation, config mapping), `listBenchmarkJobs`, `getBenchmarkJob`, `getBenchmarkRun`, `listBenchmarkRunScenarioRuns` |
| `services/axonService.test.ts` | 15 | `listActiveAxons` (smart search, pagination), `getAxon`, `listAxonEvents` (hasMore detection, row mapping, origin enum mapping), `executeAxonSql` |
| `services/benchmarkService.test.ts` | 16 | `listScenarioRuns` (both code paths), `listPublicBenchmarks`, `createBenchmarkRun`, `listBenchmarkRuns`, `getBenchmarkRun`, `getScenarioRun`, `listBenchmarks`, `getBenchmark` |
| `services/objectServiceApi.test.ts` | 6 | `createObject`, `completeObject`, `uploadToPresignedUrl` |
| `services/objectService.test.ts` | +5 | `buildObjectDetailFields` edge cases (hours format, missing size, public field) |
| `commands/object/upload.test.ts` | +3 | Symlink entries stored with correct type/linkname, symlinks inside directory trees, duplicate path rejection |
| `commands/axon/events.test.ts` | 5 | Output format, limit defaults, error handling |
| `commands/blueprint/create.test.ts` | 26 | Normal create (name+options, missing name error), --base (ID/name lookup, exact match, fallback, not found, default name, custom name, all source params copied, override per flag, preserves non-overridden params), output format, error handling |
| `commands/scenario/list.test.ts` | 8 | Pagination, sorting, output formats, filter, error handling |
| `components/allocateSectionLines.test.ts` | 10 | Zero/partial/full budget, single-field priority, null filtering, multi-section distribution |

### Manual tests

- [x] TUI: Objects list → press `c` → verify create form renders with Name, Content Type, File Path fields
- [x] TUI: Create object with non-tar content type → verify single text input for file path
- [x] TUI: Create object → switch content type to tar/tgz → verify file path field becomes multi-item list manager
- [x] TUI: Type a path in single input, switch to tar → verify path appears as 1st entry in list
- [x] TUI: Add multiple paths in tar mode, switch to text → verify 1st entry populates single input; switch back to tar → verify entries >=2 are preserved
- [x] TUI: Create object without file path → verify pre-signed URL shown, `c` copies to clipboard
- [x] TUI: Create object with file path → verify upload + completion flow
- [x] TUI: Create object with multiple paths and tar/tgz content type → verify archive upload
- [x] TUI: Object create form → add metadata key-value pairs → verify passed to API
- [x] TUI: Benchmark job create → add metadata key-value pairs → verify passed to API
- [x] TUI: Benchmark Definitions → verify `Tab` key toggles between Public/Custom with visual tab bar
- [x] TUI: Benchmark Definitions → verify old `t` key no longer toggles
- [x] TUI: Benchmark Definitions list → verify total count matches API total, not page item count
- [x] TUI: Benchmark Runs list → verify total count displays correctly
- [x] TUI: Scenario Runs list → verify total count displays correctly
- [x] TUI: Benchmark Jobs list → verify total count displays correctly
- [x] TUI: Any detail screen → press `i` for Full Details → verify top border is visible (not clipped)
- [x] TUI: Blueprint detail → verify "Duplicate Blueprint" operation is NOT shown
- [x] TUI: Blueprint list → verify `Tab` key toggles between Public/Custom with visual tab bar
- [x] TUI: Object create → select tar/tgz, specify a single .tar file → uploads as-is (not re-archived)
- [x] TUI: Object create → select tar/tgz, specify a directory → creates an archive
- [x] TUI: Devbox create → verify gateway/MCP/mount sections show `[d] Delete` consistently
- [x] TUI: Benchmark job create → enter metadata section → verify up/down arrows only navigate metadata items
- [x] TUI: Benchmark list → verify `s` creates a job, `c` no longer triggers create
- [x] CLI: `rli blueprint create --base <id>` → verify creates copy with {base}-copy name
- [x] CLI: `rli blueprint create --base <id> --name custom` → verify custom name
- [x] CLI: `rli blueprint create --base <id> --resources LARGE` → verify override applied
- [x] CLI: `rli blueprint create --base <id> --metadata env=prod` → verify metadata override
- [x] CLI: `rli blueprint create --help` → verify --base flag listed, --name not marked required
- [ ] CLI: `rli object upload <file> --metadata key=value` → verify metadata on created object
- [ ] CLI: `rli benchmark-job run --agent ... --benchmark ... --metadata key=value` → verify metadata
- [ ] CLI: `rli axon events <id>` → verify JSON output by default, origin shows enum names
- [ ] CLI: `rli axon events <id> -o text` → verify key-value output
- [ ] CLI: `rli scenario list` → verify JSON output by default
- [ ] CLI: `rli scenario list -o text` → verify key-value output
- [ ] CLI: `rli benchmark-job list` → verify JSON output by default
- [ ] CLI: `rli benchmark-job summary <id>` → verify JSON output by default
- [ ] CLI: `rli scenario info <id>` → verify JSON output by default
- [ ] CLI: `rli agent show <id>` → verify JSON output by default
- [ ] CLI: Upload with duplicate paths → verify error message instead of invalid tar
- [ ] CLI: Upload with symlink path → verify symlink stored correctly in tar archive
- [x] Type check: `npx tsc --noEmit` passes cleanly

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

…rk run cancel/complete

Add createBlueprint(), createScenario(), cancelBenchmarkRun(), completeBenchmarkRun() service functions, and benchmarkId filter support for listBenchmarkRuns().

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Wire up new screen names and route params for blueprint create/duplicate and scenario create flows. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…omplete

- scenario create: supports --scoring-command (simple) and --scoring-file (JSON) with all environment, metadata, and validation options
- benchmark-run cancel <id>: cancels a running benchmark run
- benchmark-run complete <id>: completes/finalizes a running benchmark run

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- BlueprintCreatePage form with name, source type (dockerfile/base), setup commands, architecture, resources, and metadata fields
- When duplicating (baseBlueprintId provided), all fields auto-populate from the base blueprint with name suffixed "-copy"
- Blueprint detail screen gets "Duplicate Blueprint" operation (shortcut u)
- Blueprint list screen gets "Create" shortcut (c)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
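
The duplicate flow described in this commit (copy every field, suffix the name with "-copy") can be sketched as one pure function. This is an illustration only: `BlueprintFormValues` and `duplicateFormValues` are hypothetical names, and the field list is an assumption modeled on the form fields named above rather than the real SDK types.

```typescript
// Hypothetical shape of the form fields; the real types live in the repo/SDK.
interface BlueprintFormValues {
  name: string;
  dockerfile?: string;
  setupCommands: string[];
  architecture?: string;
  resources?: string;
  metadata: Record<string, string>;
}

// Hypothetical helper: build the initial form values for a duplicate.
// Every field is copied from the base blueprint; only the name changes.
function duplicateFormValues(base: BlueprintFormValues): BlueprintFormValues {
  return {
    ...base,
    name: `${base.name}-copy`,
    // Copy containers so edits in the form do not mutate the base object.
    setupCommands: [...base.setupCommands],
    metadata: { ...base.metadata },
  };
}
```

Copying the array and the metadata object keeps later edits in the form from leaking back into the blueprint that was used as the source.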

When a benchmark run is in "running" state, shows cancel (x) and complete (f) operations with confirmation prompts before executing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
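
The state gating described in this commit can be expressed as a function from run state to the operations offered. A minimal sketch: `RunOperation` and `runOperations` are hypothetical names, and comparing against the literal string "running" is an assumption about how the state is represented.

```typescript
// Hypothetical operation descriptor; keys match the shortcuts named above.
interface RunOperation {
  key: "x" | "f";
  label: string;
  needsConfirmation: boolean;
}

// Hypothetical helper: return the operations to show for a run in `state`.
// Assumption: the state is compared against the literal string "running".
function runOperations(state: string): RunOperation[] {
  if (state !== "running") {
    // Runs in any other state offer neither cancel nor complete.
    return [];
  }
  return [
    { key: "x", label: "Cancel Run", needsConfirmation: true },
    { key: "f", label: "Complete Run", needsConfirmation: true },
  ];
}
```

Deriving the list from state, instead of hiding keys ad hoc in the input handler, keeps the displayed hints and the accepted shortcuts in sync.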

- Benchmark detail screen gets "View Benchmark Runs" operation (shortcut r) that navigates to the run list filtered by benchmark ID
- BenchmarkRunListScreen accepts benchmarkId prop to filter results and updates breadcrumb to show benchmark context

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Multi-step form supporting all scorer types: command, bash script, python script, test-based, AST grep, and custom. Includes inline scorer sub-editor, weight validation, environment source selection, metadata, required env vars/secrets, and validation type options. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Add error state input handling (retry/cancel) in BlueprintCreatePage and ScenarioCreateScreen (was unreachable due to early return)
- Implement scorer edit (e=N) and delete (d=N) handlers in scenario create form to match displayed hints
- Deduplicate scorer save logic into shared helper
- Show operationError in BenchmarkRunDetailScreen when cancel/complete fails
- Validate scorerTimeout is numeric in both TUI and CLI
- Add JSON parse error handling for custom_scorer params
- Remove dead code branch in scenario create CLI command

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
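
Two of the fixes above (numeric `scorerTimeout` validation and JSON parse error handling for custom scorer params) follow the same pattern: validate user text and return an error message instead of throwing. A minimal sketch, assuming hypothetical names `Parsed`, `parseScorerTimeout`, and `parseCustomScorerParams`; requiring a non-negative whole number and a JSON object are assumptions, not rules stated in the commit.

```typescript
// Result type used by both hypothetical validators below.
type Parsed<T> = { ok: true; value: T } | { ok: false; error: string };

// Hypothetical check that a scorer timeout typed by the user is numeric.
// Assumption: the timeout is a non-negative whole number.
function parseScorerTimeout(raw: string): Parsed<number> {
  const trimmed = raw.trim();
  if (!/^\d+$/.test(trimmed)) {
    return { ok: false, error: `Scorer timeout must be a number, got "${raw}"` };
  }
  return { ok: true, value: Number(trimmed) };
}

// Hypothetical parser for custom scorer params entered as JSON text.
// Assumption: params must be a JSON object (not an array or primitive).
function parseCustomScorerParams(raw: string): Parsed<Record<string, unknown>> {
  try {
    const value: unknown = JSON.parse(raw);
    if (typeof value !== "object" || value === null || Array.isArray(value)) {
      return { ok: false, error: "Scorer params must be a JSON object" };
    }
    return { ok: true, value: value as Record<string, unknown> };
  } catch (err) {
    // JSON.parse throws a SyntaxError on malformed input; surface its message.
    const message = err instanceof Error ? err.message : String(err);
    return { ok: false, error: `Invalid JSON in scorer params: ${message}` };
  }
}
```

Returning a result object lets the TUI show the message inline in the form while the CLI prints it and exits, without duplicating the validation itself.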

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Adds two new TUI screens accessible from the axon detail screen:
- Axon Events: Paginated table of events with navigation and refresh
- SQL Workbench: Interactive SQL query interface with dynamic result tables

The origin field now displays enum names (SYSTEM_EVENT, AGENT_EVENT, etc.) instead of numeric values thanks to the mapping added in axonService.ts.

Keyboard shortcuts from axon detail:
- [e] View Events
- [s] SQL Workbench

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

jason-rl force-pushed the jason/tui-feature-parity branch from 7c3e460 to 34faee9 on May 5, 2026 19:52

jason-rl force-pushed the jason/tui-object-create-blueprint-dup-axon-events branch from 2860697 to 9ff1d63 on May 5, 2026 19:52

jason-rl force-pushed the jason/tui-feature-parity branch 2 times, most recently from 2ff0050 to 98a60c8 on May 5, 2026 22:01

jason-rl force-pushed the jason/tui-object-create-blueprint-dup-axon-events branch from d480a63 to 357b52a on May 5, 2026 22:14

jason-rl force-pushed the jason/tui-feature-parity branch from 98a60c8 to 2678e87 on May 5, 2026 22:14

jason-rl force-pushed the jason/tui-feature-parity branch from 2678e87 to 86c227d on May 6, 2026 18:48

jason-rl mentioned this pull request May 6, 2026

feat: add TUI features and fix benchmark pagination total count #230

Merged

42 tasks

jason-rl force-pushed the jason/tui-object-create-blueprint-dup-axon-events branch 2 times, most recently from a9e6a98 to fbc586a on May 11, 2026 19:51

Base automatically changed from jason/tui-object-create-blueprint-dup-axon-events to main May 11, 2026 19:51

jason-rl and others added 9 commits May 11, 2026 13:15

feat: add blueprint-create and scenario-create screen routes

0270327

Wire up new screen names and route params for blueprint create/duplicate and scenario create flows. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

feat: add cancel and complete operations to benchmark run detail screen

2cae870

When a benchmark run is in "running" state, shows cancel (x) and complete (f) operations with confirmation prompts before executing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

docs: update README command structure for new CLI commands

e1fb57b

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

jason-rl force-pushed the jason/tui-feature-parity branch from 86c227d to bf4d9e8 on May 11, 2026 20:48

jason-rl force-pushed the jason/tui-feature-parity branch from bf4d9e8 to f79b0f5 on May 11, 2026 23:07

jason-rl added 2 commits May 11, 2026 16:24

feat: add public/custom tab toggle for blueprints list

a9275a9

fix: unify CLI list subcommand default --limit to 20

aca548c

jason-rl force-pushed the jason/tui-feature-parity branch from f79b0f5 to aca548c on May 11, 2026 23:32

jason-rl wants to merge 12 commits into main from jason/tui-feature-parity
