feat(rollout): decouple policy deployment from data recording with new lerobot-rollout CLI#3413
feat(rollout): decouple policy deployment from data recording with new lerobot-rollout CLI#3413imstevenpmwork merged 24 commits intomainfrom
lerobot-rollout CLI#3413Conversation
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
|
@claude Review this PR |
|
Claude finished @imstevenpmwork's task in 4m 1s —— View job PR Review:
|
…move rename_map config
|
@claude Review this PR |
|
Claude finished @imstevenpmwork's task in 3m 57s —— View job PR Review:
|
| Priority | Issue | Location |
|---|---|---|
| Must-fix | DAgger correction frames dropped by record_stride |
dagger.py:449, 624 |
| Must-fix | Highlight duplicate frame in ring buffer after save | highlight.py:170-189 |
| Should-fix | Doc/validation mismatch on rollout_ prefix |
inference.mdx:62 / context.py:349 |
| Should-discuss | Global default constant changes (50/100 MB) | datasets/utils.py:74-75 |
The remaining items (dead code, logging inconsistency, missing tests) are non-blocking but worth tracking.
|
Claude's important review comments were addressed in commit: 4499519 |
…olicy frequency instead of interpolator rate Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>
Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>
c59c737 to
96471b5
Compare
Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>
also reducing default length to 5 minutes
|
Tested this PR with: Both trained on this dataset: omx_multicubes_merged Checked:
|
|
@claude Review this PR |
|
Claude finished @imstevenpmwork's task in 3m 53s —— View job PR Review:
|
| Issue | Status |
|---|---|
DatasetRecordConfig.__post_init__ timestamp breaks resume |
Fixed (stamp_repo_id() method) |
| DAgger dataset validation missing | Fixed (added to needs_dataset) |
| Partial action dict silently drops keys | Fixed (now raises ValueError at core.py:299-300) |
| Highlight duplicate frame in ring buffer | Fixed (continue at highlight.py:175) |
Doc/validation mismatch on rollout_ prefix |
Fixed (docs now use rollout_ prefixed names) |
logging.warning vs logger.warning inconsistency |
Fixed (all strategy files now use logger) |
DAgger correction frames dropped by record_stride |
Acknowledged — author decided to keep current behavior |
| Global default constants changed (50/100 MB) | Still present — see below |
Dead code _teleop_smooth_move_to |
Still present — minor, tracked by TODO |
| PEFT test removed with no rollout equivalent | Still missing |
Bugs & Correctness Issues
1. estimate_max_episode_seconds includes channel dimension in pixel count
src/lerobot/rollout/strategies/core.py:240-244
# (H, W, C)
# We want to extract the spatial dimensions.
if len(shape) == 3:
pixels = shape[0] * shape[1] * shape[2] # includes CThe comment says "extract the spatial dimensions" but the code includes the channel dimension. Bits-per-pixel (conservative_bpp = 0.1) is a per-spatial-pixel metric. Including channels (C=3) inflates the bitrate estimate by 3x, which makes the computed episode duration 3x shorter than intended. Episodes will rotate ~3x more frequently than the target_video_file_size_mb warrants.
Fix: use pixels = shape[0] * shape[1] for spatial-only dimensions.
2. Record script eval_ prefix check doesn't handle namespaced repo IDs
src/lerobot/scripts/lerobot_record.py:392
if cfg.dataset.repo_id.startswith("eval_"):If repo_id = "user/eval_data", this check passes because the string starts with "user/", not "eval_". The equivalent check in rollout at context.py:353-354 correctly extracts the name part: repo_name = cfg.dataset.repo_id.split("/", 1)[-1]. The record check should be consistent:
repo_name = cfg.dataset.repo_id.split("/", 1)[-1]
if repo_name.startswith("eval_"):Design Concerns
3. Global default constants halved — affects all existing users
src/lerobot/datasets/utils.py:74-75
DEFAULT_DATA_FILE_SIZE_IN_MB = 50 # was 100
DEFAULT_VIDEO_FILE_SIZE_IN_MB = 100 # was 200These constants are used throughout the dataset system (not just rollout). Halving them changes episode rotation and file chunking behavior for existing pipelines, training scripts, and any code that depends on these defaults. Combined with issue 1 (3x shorter episodes from channel inclusion), sentry/dagger continuous recording may produce very small episodes. This change should be discussed separately or at minimum called out in the PR description.
4. v is float identity check in feature filtering
src/lerobot/rollout/context.py:286
if isinstance(v, tuple) or (v is float and k.endswith(".pos"))v is float checks identity — whether v is literally the float type object. This works because the feature system uses float as a sentinel, but it's fragile and surprising. v == float is semantically clearer and less likely to confuse future developers.
5. DAgger record_stride on correction frames (known)
src/lerobot/rollout/strategies/dagger.py:462, 641 — When interpolation_multiplier > 1, human correction frames are subsampled. The author acknowledged this and chose to keep it. Just noting it's still present for future reference — a comment in the code explaining this intentional behavior would help.
Minor Issues
6. Dead code: _teleop_smooth_move_to
src/lerobot/rollout/strategies/dagger.py:182-203 — Defined but never called; all call sites are commented out with TODOs. Low priority, tracked by the TODO.
7. estimate_max_episode_seconds test uses (C, H, W) shape
tests/test_rollout.py:263
features = {"cam": {"dtype": "video", "shape": (3, 480, 640)}}The function's comment says shapes are (H, W, C), but the test uses (3, 480, 640) which is (C, H, W). Since the function currently multiplies all three dimensions regardless of order, this doesn't affect the result — but it indicates either the comment or the test has the wrong shape convention.
What Looks Good
- Clean architectural split —
lerobot-recordis now teleop-only,lerobot-rollouthandles all policy deployment. The boundary is well-defined. - Thread-safe robot wrapper (
robot_wrapper.py) — Lock-protected I/O with read-only property proxying. Correct for RTC concurrent access. - DAgger state machine (
dagger.py:88-164) — Transition validation table with thread-safe consume/request is robust and well-tested. - Inference engine ABC (
base.py) — Minimal contract with sensible no-op defaults fornotify_observation,pause,resume. Makes new backends easy to add. stamp_repo_id()fix — Clean separation: explicit method call at create time, skipped on resume. Good fix from prior review.DatasetRecordConfigextraction — Shared between record and rollout, eliminates config duplication.- Action interpolator extraction — Clean decoupling from RTC module to
utils/with backward-compat re-export. - RTC error handling (
rtc.py) — Consecutive error counting, graceful global shutdown propagation, compile warmup management. Production-quality. - Background push pattern — Consistent across sentry, highlight, dagger: single-worker
ThreadPoolExecutorwith_episode_lockserialization. - Observation processing throttling (
core.py:72-93) —_process_observation_and_notifyonly runs the expensive pipeline on policy ticks, caching results for interpolated ticks. - Config validation (
configs.py:224-319) — Comprehensive fail-fast checks including streaming enforcement per strategy, DAggernum_episodesresolution, device propagation. - Examples —
so100_to_so100_EE/rollout.pyandphone_to_so100/rollout.pyare clean demonstrations of the programmatic API with custom kinematics processors. - Docs —
inference.mdxis thorough with clear examples for every strategy and backend combination.
Summary
This is a well-designed, significant architectural improvement. The separation of concerns is clean, the code is consistently documented, and hardware testing confirms no regressions.
| Priority | Issue | Location |
|---|---|---|
| Should-fix | Episode duration 3x shorter due to channel dim in pixel count | core.py:243 |
| Should-fix | eval_ prefix check bypassed for namespaced repo IDs |
lerobot_record.py:392 |
| Worth discussing | Global default constants halved (50/100 MB) | datasets/utils.py:74-75 |
| Minor | v is float identity check fragile |
context.py:286 |
| Minor | Test shape convention mismatch | test_rollout.py:263 |
The remaining items are non-blocking. Overall this is in good shape for merge after addressing issues 1-2.
Resolve conflicts: - groot_n1.py: take upstream's plain class annotations (the @DataClass decorator was removed upstream, so field() defaults no longer apply). - lerobot_record.py: keep HEAD's full file. Upstream's rollout-decoupling refactor (huggingface#3413) removes policy support from lerobot_record entirely, but HEAD has multiple features built on policy-in-record (action interpolation, torque visualizer with policy mode, friction comp, record_effort). The new lerobot-rollout system from upstream is still added as a parallel path; existing lerobot_record features are preserved. - uv.lock: regenerated.
Summary / Motivation
lerobot-record) and policy deployment (lerobot-rollout). Previously,lerobot-recordhandled teleoperation, policy inference, RTC async inference, and DAgger-style human-in-the-loop corrections in a single script with growing conditional complexity. This PR extracts all policy-related execution into a newlerobot.rolloutmodule with a dedicated CLI, leavinglerobot-recordas a focused teleoperation-only tool.RolloutContextbundle. This makes each deployment mode independently testable and extensible without touching the others.Related issues
What changed
src/lerobot/rollout/module — Complete policy deployment engine with:BaseStrategy(autonomous, no recording),SentryStrategy(continuous recording + auto-upload),HighlightStrategy(ring buffer + keystroke save),DAggerStrategy(human-in-the-loop RaC with keyboard/pedal input)SyncInferenceEngine(inline) andRTCInferenceEngine(async background thread with action queue)RolloutContextdependency injection bundle with sub-contexts (runtime, hardware, policy, processor, dataset)ThreadSafeRobotwrapper for lock-protected concurrent hardware accessRolloutRingBuffermemory-bounded circular buffer for highlight captureChoiceRegistry(--strategy.type=...,--inference.type=...)lerobot-rolloutCLI (src/lerobot/scripts/lerobot_rollout.py) registered inpyproject.tomllerobot-record— Stripped all policy/inference code; teleop is now required.DatasetRecordConfig→src/lerobot/configs/dataset.py(shared between record and rollout)ActionInterpolator→src/lerobot/utils/action_interpolator.py(decoupled from RTC policy module)start_pedal_listener→src/lerobot/utils/pedal.py(evdev foot pedal support with graceful degradation)policies/rtc/action_queue.py— added missing lock guards onqsize(),empty(),get_action_index()examples/hil/hil_data_collection.py(1184 lines),examples/hil/hil_utils.py,examples/rtc/eval_with_real_robot.py(673 lines) — replaced bylerobot-rolloutrollout.pyand updatedevaluate.pyfor lekiwi, phone-to-so100, and so100-EE setupsdocs/source/inference.mdxcovering all strategies and examples; updatedhil_data_collection.mdx,il_robots.mdx,rtc.mdxHow was this tested (or how to run locally)
tests/test_rollout.py— covers config validation, ring buffer, thread-safe robot, strategy/inference factory dispatch, DAgger state machine transitions, and context dataclass fields. Run withpytest -q tests/test_rollout.py -svvlerobot-rollout --strategy.type=baselerobot-rollout --strategy.type=sentrylerobot-rollout --strategy.type=daggerwith keyboardlerobot-rollout --strategy.type=highlightwith keyboardlerobot-recordpure teleoperation — no regressionsmainand in this branch. Results were successful as the behavior was the same between these two.lerobot-rollout. The robot folded the t-shirt successfully, proving the compatibility with old working models:Checklist (required before merge)
pre-commit run -a)pytest)Reviewer notes
policies/rtcas least as possible. We can go back to improving it in the near-future.tests/strategy repository-wide in the future.