
feat: TUI feature parity — blueprint create, scenario create, benchmark run ops #233

Draft
jason-rl wants to merge 12 commits into main from jason/tui-feature-parity


Conversation


@jason-rl jason-rl commented May 2, 2026

Description

Adds TUI and CLI features that bring rl-cli closer to parity with the runloop-fe frontend, and unifies the defaults of the CLI list subcommands.

Type of Change

  • New feature (non-breaking change which adds functionality)

Changes Made

  • TUI Axon Events Screen: View axon events with pagination, relative timestamps (via formatTimeAgo, consistent with other TUI list tables), and origin enum display.
  • TUI Axon SQL Workbench: Interactive SQL query interface for axons with result display.
  • Blueprint create/duplicate TUI: Full form with name, source type (dockerfile/base), setup commands, architecture, resources, and metadata. Duplicating from detail screen (u) auto-populates all fields from the base blueprint with name suffixed "-copy".
  • Blueprint public/custom tab toggle: Tab key toggles between public and custom blueprints with visual tab bar, matching the benchmark/agent list pattern. Defaults to Custom tab.
  • Scenario create TUI: Multi-step form supporting all 6 scorer types (command, bash script, python script, test-based, AST grep, custom). Includes inline scorer sub-editor with edit/delete, weight validation, environment source selection, metadata, required env vars/secrets, and validation type.
  • Benchmark run cancel/complete: TUI operations (x to cancel, f to complete) on running benchmark runs with confirmation prompts. CLI commands rli benchmark-run cancel <id> and rli benchmark-run complete <id>.
  • View benchmark runs from benchmark detail: New operation (r) navigates to filtered run list for the selected benchmark.
  • CLI scenario create: rli scenario create with --scoring-command (simple) or --scoring-file (full JSON) plus all environment, metadata, and validation options.
  • Unified CLI list --limit defaults: All 12 CLI list subcommands now default to --limit 20 with consistent -l shorthand and "Max results" description. Previously axon/agent/scenario defaulted to 0 (unlimited) and benchmark-job had no --limit option.
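The relative timestamps mentioned for the events table can be pictured with a small helper. This is only a sketch: `formatTimeAgoSketch` is a hypothetical stand-in, and the real `formatTimeAgo` in rl-cli may use different thresholds, units, or wording.

```typescript
// Hypothetical stand-in for the TUI's formatTimeAgo helper.
// Thresholds and wording are assumptions, not the project's actual behavior.
function formatTimeAgoSketch(timestampMs: number, nowMs: number = Date.now()): string {
  // Clamp at zero so timestamps slightly in the future do not go negative.
  const seconds = Math.max(0, Math.floor((nowMs - timestampMs) / 1000));
  if (seconds < 60) return "just now";

  // Units from smallest to largest, with the number of seconds each spans.
  const units: Array<[string, number]> = [
    ["minute", 60],
    ["hour", 3600],
    ["day", 86400],
  ];

  // Keep the largest unit that fits at least once.
  let label = "minute";
  let size = 60;
  for (const [name, span] of units) {
    if (seconds >= span) {
      label = name;
      size = span;
    }
  }

  const count = Math.floor(seconds / size);
  return `${count} ${label}${count === 1 ? "" : "s"} ago`;
}
```

Passing `nowMs` explicitly keeps the helper deterministic in tests, for example `formatTimeAgoSketch(0, 2 * 3600 * 1000)` for a two-hour-old event.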

Keyboard Shortcuts

| Screen | Key | Action |
|--------|-----|--------|
| Axon Detail | e | View Events |
| Axon Detail | s | SQL Workbench |
| Blueprint List | c | Create Blueprint |
| Blueprint List | Tab | Toggle Public/Custom |
| Blueprint Detail | u | Duplicate Blueprint |
| Benchmark Detail | r | View Benchmark Runs |
| Benchmark Run Detail | x | Cancel Run (when running) |
| Benchmark Run Detail | f | Complete Run (when running) |
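
The "when running" condition on the last two shortcuts could be modeled roughly as below. This is an illustrative sketch, not the screen's actual code: the `RunStatus` values other than `"running"`, the `Operation` shape, and both function names are assumptions.

```typescript
// Assumed status values; only "running" is named in this PR.
type RunStatus = "running" | "completed" | "canceled";

// Hypothetical description of a key-bound operation on the detail screen.
interface Operation {
  key: string;
  label: string;
  needsConfirmation: boolean;
}

// Cancel and complete are offered only while the run is still running.
function benchmarkRunOperations(status: RunStatus): Operation[] {
  if (status !== "running") return [];
  return [
    { key: "x", label: "Cancel Run", needsConfirmation: true },
    { key: "f", label: "Complete Run", needsConfirmation: true },
  ];
}

// Resolve a key press against the operations available in the current state.
function findOperation(status: RunStatus, pressedKey: string): Operation | undefined {
  return benchmarkRunOperations(status).find((op) => op.key === pressedKey);
}
```

Deriving the shortcut list from the run status means a non-running run neither displays the operations nor reacts to `x` or `f`.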

Skipped

  • Repo connect create: skipped because the @runloop/api-client SDK v1.16.0 has no repo connect API (the frontend uses a separate tRPC layer).

Testing

  • I have tested locally
  • I have added/updated tests
  • All existing tests pass

Checklist

  • My code follows the code style of this project
  • I have performed a self-review of my own code
  • My changes generate no new warnings
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published

Test plan

  • Axon detail → e → events screen renders with proper origin enum names (SYSTEM_EVENT, AGENT_EVENT, etc.)
  • Axon detail → e → events table TIME column shows relative timestamps (e.g. "2 hours ago"), not absolute dates
  • Axon detail → s → SQL workbench renders, can execute queries
  • Blueprint list → opens on Custom tab by default (not Public)
  • Blueprint list → c → create form renders → fill fields → create succeeds → navigates to detail
  • Blueprint list → Tab → toggles between Custom and Public with visual tab bar
  • Blueprint detail → u → form pre-populated with base blueprint data, name has "-copy" suffix
  • Benchmark run detail (running state) → x → confirmation → cancel succeeds
  • Benchmark run detail (running state) → f → confirmation → complete succeeds
  • Benchmark run detail (non-running) → cancel/complete operations not shown
  • Benchmark detail → r → filtered run list for that benchmark
  • Scenario create → fill all fields including scorer → create succeeds
  • rli scenario create --name test --problem-statement "Fix bug" --scoring-command "pytest"
  • rli benchmark-run cancel <id> / rli benchmark-run complete <id>
  • rli agent list → returns 20 results by default (was unlimited)
  • rli axon list → returns 20 results by default (was unlimited)
  • rli scenario list → returns 20 results by default (was unlimited)
  • rli benchmark-job list → returns 20 results by default (new --limit option)
  • rli devbox list -l 5 → returns 5 results (shorthand works)
  • npx tsc --noEmit passes
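
The `--limit` checks above amount to one shared default plus input validation. A minimal sketch of that idea follows; `resolveListLimit` and `DEFAULT_LIST_LIMIT` are hypothetical names, and rejecting `0` or negative values is an assumption (this PR does not say whether `0` still means unlimited).

```typescript
// Shared default described in this PR: every list subcommand returns 20 results.
const DEFAULT_LIST_LIMIT = 20;

// Turn the raw value of --limit / -l into a positive integer.
// Falls back to the shared default when the flag is absent.
function resolveListLimit(raw: string | undefined): number {
  if (raw === undefined) return DEFAULT_LIST_LIMIT;
  const parsed = Number(raw);
  // Assumption: non-integers, zero, and negatives are rejected.
  if (!Number.isInteger(parsed) || parsed <= 0) {
    throw new Error(`--limit must be a positive integer, got "${raw}"`);
  }
  return parsed;
}
```

With this shape, `rli devbox list -l 5` would map to `resolveListLimit("5")`, and a bare `rli agent list` to `resolveListLimit(undefined)`.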

🤖 Generated with Claude Code

@jason-rl jason-rl force-pushed the jason/tui-feature-parity branch from 7c3e460 to 34faee9 Compare May 5, 2026 19:52
@jason-rl jason-rl force-pushed the jason/tui-object-create-blueprint-dup-axon-events branch from 2860697 to 9ff1d63 Compare May 5, 2026 19:52
@jason-rl jason-rl force-pushed the jason/tui-feature-parity branch 2 times, most recently from 2ff0050 to 98a60c8 Compare May 5, 2026 22:01
@jason-rl jason-rl force-pushed the jason/tui-object-create-blueprint-dup-axon-events branch from d480a63 to 357b52a Compare May 5, 2026 22:14
@jason-rl jason-rl force-pushed the jason/tui-feature-parity branch from 98a60c8 to 2678e87 Compare May 5, 2026 22:14
jason-rl added a commit that referenced this pull request May 6, 2026
The TUI screens for viewing axon events and SQL workbench are being
moved to PR #233 (TUI feature parity) to better separate concerns.
This PR retains the CLI `axon events` command and service layer.

Also fixes the origin field displaying numeric values (e.g., 2 instead
of AGENT_EVENT) by adding a mapping from SQLite enum indices to string
enum names in axonService.ts.
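
A mapping of this kind might look like the sketch below. Only the pairing of `2` with `AGENT_EVENT` comes from the message above; the index used for `SYSTEM_EVENT`, the fallback format, and the names `ORIGIN_NAMES` and `originName` are assumptions, not the contents of axonService.ts.

```typescript
// Index-to-name table. Only 2 -> AGENT_EVENT is stated in this PR;
// the SYSTEM_EVENT index is a placeholder assumption.
const ORIGIN_NAMES: Record<number, string> = {
  1: "SYSTEM_EVENT", // assumed index
  2: "AGENT_EVENT",
};

// Convert a raw origin value into a display name.
function originName(value: number | string): string {
  // Values that already arrive as names are passed through unchanged.
  if (typeof value === "string") return value;
  const name = ORIGIN_NAMES[value];
  // Unknown indices stay visible instead of being silently dropped.
  return name !== undefined ? name : `UNKNOWN(${value})`;
}
```

Keeping an explicit fallback means a new enum member added on the server shows up as `UNKNOWN(n)` rather than as a bare number.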

Files moved to #233:
- AxonEventsScreen.tsx
- AxonSqlScreen.tsx
- Router/navigation integrations

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@jason-rl jason-rl force-pushed the jason/tui-feature-parity branch from 2678e87 to 86c227d Compare May 6, 2026 18:48
@jason-rl jason-rl force-pushed the jason/tui-object-create-blueprint-dup-axon-events branch 2 times, most recently from a9e6a98 to fbc586a Compare May 11, 2026 19:51
jason-rl added a commit that referenced this pull request May 11, 2026
## Summary
- **Object create form**: New TUI screen accessible via `c` shortcut
from object list. Supports optional file path upload or displays
pre-signed S3 URL with clipboard copy. For tar/tgz content types, a
multi-item list manager allows adding multiple file paths; for other
types, a single text input is shown. Switching content type syncs
between the two inputs bidirectionally.
- **Benchmark tab toggle**: Changed public/custom toggle from `t` key to
`Tab` key with visual tab bar matching the agent list pattern.
- **Blueprint tab toggle**: Added public/custom toggle with `Tab` key
and visual tab bar matching the benchmark/agent list pattern.
- **Benchmark pagination fix**: All benchmark list screens (Benchmark
Definitions, Benchmark Runs, Scenario Runs, Benchmark Jobs) now
correctly display total count from the API instead of current page item
count.
- **Full Details viewport fix**: Fixed off-by-one in `DetailedInfoView`
overhead calculation that caused the top border to be clipped off screen
when content filled the viewport exactly.
- **CLI: `blueprint create --base`**: Duplicate a blueprint by passing
`--base <id|name>` to `blueprint create`. The source blueprint's
parameters are used as defaults; any other flag overrides. `--name`
defaults to `{base}-copy` when `--base` is used.
- **CLI: `axon events`**: New subcommand to list events for an axon.
**Fixed origin field to display enum names (SYSTEM_EVENT, AGENT_EVENT,
etc.) instead of numeric values.**
- **CLI: `scenario list`**: New subcommand to list scenario runs with
pagination and optional benchmark run ID filter.
- **Metadata support**: `--metadata key=value` added to `object upload`
and `benchmark-job run` CLI commands. TUI create screens for objects and
benchmark jobs include a metadata key-value editor matching the devbox
create pattern.
- **Tar archive: duplicate path rejection**: `createTarBuffer` now
rejects duplicate input paths (after `path.resolve`) instead of
producing an invalid tar archive.
- **Tar archive: symlink support**: Replaced `nanotar` with
`tar-stream`. Symlinks are now stored as proper tar symlink entries
(type `2`) with their target in `linkname`, instead of being rejected.
This applies to both CLI and TUI upload flows.
- **TUI tar single-file fix**: When content type is tar/tgz and a single
file (not directory) is specified, the TUI now uploads it as-is instead
of re-archiving it. Matches the CLI behavior.
- **TUI `[d]` keybind consistency**: Changed all `[d] Remove` hints in
devbox create (gateway configs, MCP configs, agent mounts, object
mounts) to `[d] Delete` to match the rest of the codebase.
- **Benchmark job metadata navigation**: Fixed up/down arrow keys moving
the main form cursor while editing metadata key-value pairs. Added
`isActive: !inMetadataSection` guard to the main form input handler.
- **Benchmark list `[c]` keybind fix**: Changed "Create Job" shortcut
from `c` to `s` to avoid conflicting with the `[c]` = "Copy ID"
convention used in detail screens. The popup already used `s` for this
action.
- **CLI output consistency**: Standardized all CLI list/get/info
commands to default to JSON output and removed chalk-colored text tables
from `axon events`, `axon list`, `scenario list`, `scenario info`,
`benchmark-job list`, `benchmark-job summary`, and `agent show`. Text
mode now uses the shared `output()` utility's uncolored key-value
format.
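
As a rough illustration of the `--metadata key=value` flag described above, the repeated flag values can be folded into a plain object. `parseMetadataFlags` is a hypothetical helper, and splitting on the first `=` and rejecting empty keys are assumptions about the desired behavior.

```typescript
// Fold repeated --metadata key=value arguments into one object.
function parseMetadataFlags(pairs: string[]): Record<string, string> {
  const metadata: Record<string, string> = {};
  for (const pair of pairs) {
    // Split on the first "=" only, so values may themselves contain "=".
    const eq = pair.indexOf("=");
    if (eq <= 0) {
      throw new Error(`Invalid metadata "${pair}", expected key=value`);
    }
    // A later flag with the same key overwrites the earlier one.
    metadata[pair.slice(0, eq)] = pair.slice(eq + 1);
  }
  return metadata;
}
```

For example, `--metadata env=prod --metadata url=a=b` would be passed in as `["env=prod", "url=a=b"]`.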

**Note**: TUI axon events and SQL workbench screens have been moved to
PR #233.

## Test plan

### Automated tests (155+ new tests)

| Test file | Tests | Coverage |
|-----------|-------|---------|
| `services/benchmarkJobService.test.ts` | 27 | `buildCloneParams` (all source/secret variants), `createBenchmarkJob` (validation, config mapping), `listBenchmarkJobs`, `getBenchmarkJob`, `getBenchmarkRun`, `listBenchmarkRunScenarioRuns` |
| `services/axonService.test.ts` | 15 | `listActiveAxons` (smart search, pagination), `getAxon`, `listAxonEvents` (hasMore detection, row mapping, origin enum mapping), `executeAxonSql` |
| `services/benchmarkService.test.ts` | 16 | `listScenarioRuns` (both code paths), `listPublicBenchmarks`, `createBenchmarkRun`, `listBenchmarkRuns`, `getBenchmarkRun`, `getScenarioRun`, `listBenchmarks`, `getBenchmark` |
| `services/objectServiceApi.test.ts` | 6 | `createObject`, `completeObject`, `uploadToPresignedUrl` |
| `services/objectService.test.ts` | +5 | `buildObjectDetailFields` edge cases (hours format, missing size, public field) |
| `commands/object/upload.test.ts` | +3 | Symlink entries stored with correct type/linkname, symlinks inside directory trees, duplicate path rejection |
| `commands/axon/events.test.ts` | 5 | Output format, limit defaults, error handling |
| `commands/blueprint/create.test.ts` | 26 | Normal create (name+options, missing name error), --base (ID/name lookup, exact match, fallback, not found, default name, custom name, all source params copied, override per flag, preserves non-overridden params), output format, error handling |
| `commands/scenario/list.test.ts` | 8 | Pagination, sorting, output formats, filter, error handling |
| `components/allocateSectionLines.test.ts` | 10 | Zero/partial/full budget, single-field priority, null filtering, multi-section distribution |
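
The `--base` cases covered by `commands/blueprint/create.test.ts` (default name, custom name, override per flag, preserved parameters) follow a defaults-then-overrides pattern. The sketch below shows that pattern under stated assumptions: `BlueprintParams` is a reduced, hypothetical shape, and `mergeWithBase` is not the command's actual function.

```typescript
// Reduced, hypothetical parameter shape; the real blueprint has more fields.
interface BlueprintParams {
  name?: string;
  dockerfile?: string;
  resources?: string;
  metadata?: Record<string, string>;
}

// Use the base blueprint's values as defaults; any flag the user passed wins.
function mergeWithBase(
  base: BlueprintParams & { name: string },
  flags: BlueprintParams,
): BlueprintParams {
  return {
    // --name defaults to "{base}-copy" when not given explicitly.
    name: flags.name ?? `${base.name}-copy`,
    dockerfile: flags.dockerfile ?? base.dockerfile,
    resources: flags.resources ?? base.resources,
    metadata: flags.metadata ?? base.metadata,
  };
}
```

In this sketch a `--metadata` flag replaces the base metadata wholesale; whether the real command merges or replaces is not stated here.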

### Manual tests
- [x] TUI: Objects list → press `c` → verify create form renders with
Name, Content Type, File Path fields
- [x] TUI: Create object with non-tar content type → verify single text
input for file path
- [x] TUI: Create object → switch content type to tar/tgz → verify file
path field becomes multi-item list manager
- [x] TUI: Type a path in single input, switch to tar → verify path
appears as 1st entry in list
- [x] TUI: Add multiple paths in tar mode, switch to text → verify 1st
entry populates single input; switch back to tar → verify entries >=2
are preserved
- [x] TUI: Create object without file path → verify pre-signed URL
shown, `c` copies to clipboard
- [x] TUI: Create object with file path → verify upload + completion
flow
- [x] TUI: Create object with multiple paths and tar/tgz content type →
verify archive upload
- [x] TUI: Object create form → add metadata key-value pairs → verify
passed to API
- [x] TUI: Benchmark job create → add metadata key-value pairs → verify
passed to API
- [x] TUI: Benchmark Definitions → verify `Tab` key toggles between
Public/Custom with visual tab bar
- [x] TUI: Benchmark Definitions → verify old `t` key no longer toggles
- [x] TUI: Benchmark Definitions list → verify total count matches API
total, not page item count
- [x] TUI: Benchmark Runs list → verify total count displays correctly
- [x] TUI: Scenario Runs list → verify total count displays correctly
- [x] TUI: Benchmark Jobs list → verify total count displays correctly
- [x] TUI: Any detail screen → press `i` for Full Details → verify top
border is visible (not clipped)
- [x] TUI: Blueprint detail → verify "Duplicate Blueprint" operation is
NOT shown
- [x] TUI: Blueprint list → verify `Tab` key toggles between
Public/Custom with visual tab bar
- [x] TUI: Object create → select tar/tgz, specify a single .tar file →
uploads as-is (not re-archived)
- [x] TUI: Object create → select tar/tgz, specify a directory → creates
an archive
- [x] TUI: Devbox create → verify gateway/MCP/mount sections show `[d]
Delete` consistently
- [x] TUI: Benchmark job create → enter metadata section → verify
up/down arrows only navigate metadata items
- [x] TUI: Benchmark list → verify `s` creates a job, `c` no longer
triggers create
- [x] CLI: `rli blueprint create --base <id>` → verify creates copy with
{base}-copy name
- [x] CLI: `rli blueprint create --base <id> --name custom` → verify
custom name
- [x] CLI: `rli blueprint create --base <id> --resources LARGE` → verify
override applied
- [x] CLI: `rli blueprint create --base <id> --metadata env=prod` →
verify metadata override
- [x] CLI: `rli blueprint create --help` → verify --base flag listed,
--name not marked required
- [ ] CLI: `rli object upload <file> --metadata key=value` → verify
metadata on created object
- [ ] CLI: `rli benchmark-job run --agent ... --benchmark ... --metadata
key=value` → verify metadata
- [ ] CLI: `rli axon events <id>` → verify JSON output by default,
origin shows enum names
- [ ] CLI: `rli axon events <id> -o text` → verify key-value output
- [ ] CLI: `rli scenario list` → verify JSON output by default
- [ ] CLI: `rli scenario list -o text` → verify key-value output
- [ ] CLI: `rli benchmark-job list` → verify JSON output by default
- [ ] CLI: `rli benchmark-job summary <id>` → verify JSON output by
default
- [ ] CLI: `rli scenario info <id>` → verify JSON output by default
- [ ] CLI: `rli agent show <id>` → verify JSON output by default
- [ ] CLI: Upload with duplicate paths → verify error message instead of
invalid tar
- [ ] CLI: Upload with symlink path → verify symlink stored correctly in
tar archive
- [x] Type check: `npx tsc --noEmit` passes cleanly

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Base automatically changed from jason/tui-object-create-blueprint-dup-axon-events to main May 11, 2026 19:51
jason-rl and others added 9 commits May 11, 2026 13:15
…rk run cancel/complete

Add createBlueprint(), createScenario(), cancelBenchmarkRun(), completeBenchmarkRun()
service functions, and benchmarkId filter support for listBenchmarkRuns().

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Wire up new screen names and route params for blueprint create/duplicate
and scenario create flows.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…omplete

- scenario create: supports --scoring-command (simple) and --scoring-file (JSON)
  with all environment, metadata, and validation options
- benchmark-run cancel <id>: cancels a running benchmark run
- benchmark-run complete <id>: completes/finalizes a running benchmark run

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- BlueprintCreatePage form with name, source type (dockerfile/base), setup
  commands, architecture, resources, and metadata fields
- When duplicating (baseBlueprintId provided), all fields auto-populate from
  the base blueprint with name suffixed "-copy"
- Blueprint detail screen gets "Duplicate Blueprint" operation (shortcut u)
- Blueprint list screen gets "Create" shortcut (c)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When a benchmark run is in "running" state, shows cancel (x) and
complete (f) operations with confirmation prompts before executing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Benchmark detail screen gets "View Benchmark Runs" operation (shortcut r)
  that navigates to the run list filtered by benchmark ID
- BenchmarkRunListScreen accepts benchmarkId prop to filter results and
  updates breadcrumb to show benchmark context

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Multi-step form supporting all scorer types: command, bash script,
python script, test-based, AST grep, and custom. Includes inline scorer
sub-editor, weight validation, environment source selection, metadata,
required env vars/secrets, and validation type options.
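
One plausible form of the weight validation mentioned above is sketched here. The rule that weights must each lie in (0, 1] and sum to 1 is an assumption, as are the `ScorerDraft` shape and the `validateScorerWeights` name; the form's real rule may differ.

```typescript
// Hypothetical in-progress scorer entry from the form.
interface ScorerDraft {
  name: string;
  weight: number;
}

// Returns an error message, or null when the weights look valid.
// Assumed rule: each weight is in (0, 1] and all weights sum to 1.
function validateScorerWeights(scorers: ScorerDraft[], epsilon: number = 1e-6): string | null {
  if (scorers.length === 0) return "At least one scorer is required";
  for (const scorer of scorers) {
    // The negated form also rejects NaN.
    if (!(scorer.weight > 0 && scorer.weight <= 1)) {
      return `Scorer "${scorer.name}" needs a weight in (0, 1]`;
    }
  }
  const total = scorers.reduce((sum, scorer) => sum + scorer.weight, 0);
  // Compare with a tolerance because decimal weights rarely sum exactly.
  if (Math.abs(total - 1) > epsilon) {
    return `Weights sum to ${total}, expected 1`;
  }
  return null;
}
```

The tolerance matters in practice: `0.3 + 0.7` does not equal `1` exactly in floating point, yet should pass.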

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add error state input handling (retry/cancel) in BlueprintCreatePage
  and ScenarioCreateScreen (was unreachable due to early return)
- Implement scorer edit (e=N) and delete (d=N) handlers in scenario
  create form to match displayed hints
- Deduplicate scorer save logic into shared helper
- Show operationError in BenchmarkRunDetailScreen when cancel/complete fails
- Validate scorerTimeout is numeric in both TUI and CLI
- Add JSON parse error handling for custom_scorer params
- Remove dead code branch in scenario create CLI command
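
The two input checks in this list (numeric `scorerTimeout`, JSON parse handling for `custom_scorer` params) can be sketched as small parsers that return a result instead of throwing. The `Parsed` type, both function names, and the requirement that params be a JSON object are assumptions for illustration.

```typescript
// Result type so form code can show an error instead of crashing.
type Parsed<T> = { ok: true; value: T } | { ok: false; error: string };

// Accept only whole, non-negative numbers for the scorer timeout.
function parseScorerTimeout(raw: string): Parsed<number> {
  const trimmed = raw.trim();
  if (!/^\d+$/.test(trimmed)) {
    return { ok: false, error: `Timeout must be a whole number, got "${raw}"` };
  }
  return { ok: true, value: Number(trimmed) };
}

// Parse custom scorer params, turning JSON syntax errors into messages.
function parseCustomScorerParams(raw: string): Parsed<Record<string, unknown>> {
  try {
    const value: unknown = JSON.parse(raw);
    // Assumption: params must be a JSON object, not an array or primitive.
    if (typeof value !== "object" || value === null || Array.isArray(value)) {
      return { ok: false, error: "Scorer params must be a JSON object" };
    }
    return { ok: true, value: value as Record<string, unknown> };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { ok: false, error: `Invalid JSON: ${message}` };
  }
}
```

Returning a tagged result lets the TUI form and the CLI command share the same validation while reporting errors in their own way.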

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@jason-rl jason-rl force-pushed the jason/tui-feature-parity branch from 86c227d to bf4d9e8 Compare May 11, 2026 20:48
Adds two new TUI screens accessible from the axon detail screen:

- Axon Events: Paginated table of events with navigation and refresh
- SQL Workbench: Interactive SQL query interface with dynamic result tables

The origin field now displays enum names (SYSTEM_EVENT, AGENT_EVENT, etc.)
instead of numeric values thanks to the mapping added in axonService.ts.

Keyboard shortcuts from axon detail:
- [e] View Events
- [s] SQL Workbench

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@jason-rl jason-rl force-pushed the jason/tui-feature-parity branch from bf4d9e8 to f79b0f5 Compare May 11, 2026 23:07
@jason-rl jason-rl force-pushed the jason/tui-feature-parity branch from f79b0f5 to aca548c Compare May 11, 2026 23:32
