Feat: onnx export cli #3521
Open
tsuu-abj wants to merge 4 commits into huggingface:main from
Conversation
Introduces `lerobot/export/` as a new top-level module and the `lerobot-export` CLI entry point for exporting trained LeRobot policies to ONNX and TensorRT for edge/embedded deployment. Core components:

- `export/core.py`: `ExportConfig` dataclass, main `export()` orchestrator
- `export/onnx_export.py`: `torch.export` (dynamo) + `torch.onnx` fallback paths
- `export/tensorrt_export.py`: TensorRT engine compilation
- `export/normalization.py`: norm/denorm ONNX nodes via the public processor API
- `export/sample_inputs.py`: sample-input construction from processor metadata
- `export/validation.py`: round-trip numerical correctness checks

Adds `onnxruntime`, `onnx`, and `tensorrt` as optional extras in `pyproject.toml`.
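For orientation, here is a minimal sketch of what the `ExportConfig`/`export()` pair could look like. Every field name below is an assumption pieced together from the CLI flags discussed in this PR, not the actual definition in `export/core.py`:

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass
class ExportConfig:
    # All field names are illustrative guesses, not the PR's actual dataclass.
    policy_path: str                  # Hub repo id or local checkpoint directory
    output_dir: Path = Path("exported")
    format: str = "onnx"              # "onnx" (default) or "tensorrt"
    exporter: str = "auto"            # "dynamo", "legacy", or "auto"
    opset: int = 18                   # opset used in the PR's own validation runs
    fold_normalization: bool = False  # bake norm stats into the graph as constants
```

A call would then look roughly like `export(ExportConfig(policy_path="..."))`, mirroring the `export()` orchestrator named above.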
Moves ONNX wrapper logic out of a monolithic `export/wrappers.py` into per-policy export modules (`policies/act/export_act.py`, `policies/diffusion/export_diffusion.py`). The core dispatch layer auto-discovers `make_<type>_export_wrapper` by name, so supporting a new policy does not require editing the central export module.
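The by-name dispatch can be as small as an `importlib` lookup. A sketch under the naming convention above; the module path and error wording are assumptions, not the PR's actual dispatch code:

```python
import importlib

def find_export_wrapper_factory(policy_type: str):
    """Locate make_<type>_export_wrapper in policies/<type>/export_<type>.py."""
    module_name = f"lerobot.policies.{policy_type}.export_{policy_type}"
    try:
        module = importlib.import_module(module_name)
    except ModuleNotFoundError:
        raise NotImplementedError(
            f"No export module for policy type {policy_type!r} (expected {module_name})"
        )
    factory = getattr(module, f"make_{policy_type}_export_wrapper", None)
    if factory is None:
        raise NotImplementedError(
            f"{module_name} does not define make_{policy_type}_export_wrapper"
        )
    return factory
```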
Extracts two reusable adapter primitives into `export/adapters/`:

- `DictBatchAdapter`: converts dict-of-tensors policy I/O to positional args
- `IterativeDenoisingAdapter`: wraps diffusion-style policies for single-pass export

Adds VQ-BeT export support and extends `validation.py` with `--num-validation-trials` for statistical correctness checks against real Hub checkpoints.
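To make Pattern A concrete, a rough sketch of the dict-to-positional idea; the real `DictBatchAdapter` may handle multiple outputs and key validation differently:

```python
import torch
from torch import nn

class DictBatchAdapter(nn.Module):
    """ONNX export wants positional tensor args, while LeRobot policies take a
    dict batch; pin a key order and rebuild the dict inside forward()."""

    def __init__(self, policy: nn.Module, input_keys: list[str]):
        super().__init__()
        self.policy = policy
        self.input_keys = input_keys  # fixed order, e.g. ["observation.state", ...]

    def forward(self, *tensors: torch.Tensor) -> torch.Tensor:
        batch = dict(zip(self.input_keys, tensors))
        return self.policy(batch)
```

Pattern B (`IterativeDenoisingAdapter`) would additionally own the denoising loop so the exported graph is a single forward pass rather than a Python-side iteration.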
Replaces `ExportConfig.diffusion_mode` with a free-form `policy_options: dict[str, str]` so future policy types can pass export parameters without modifying the CLI dataclass. Adds `make_<type>_export_artifacts` auto-discovery for per-policy auxiliary ONNX files. Fixes the `onnx.checker.check_model` call to pass a path string for models larger than 2 GiB.
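`parse_policy_options` below is a hypothetical helper (not in the PR) showing how repeated `key=value` arguments could populate that dict; the checker call illustrates the documented `onnx` behavior that motivates the fix:

```python
import onnx

def parse_policy_options(pairs: list[str]) -> dict[str, str]:
    """Hypothetical: fold repeated key=value CLI args into policy_options."""
    options: dict[str, str] = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {pair!r}")
        options[key] = value
    return options

def check_exported_model(onnx_path: str) -> None:
    # onnx.checker.check_model accepts a loaded ModelProto or a path; models
    # over 2 GiB (external weights) must be checked via the path form.
    onnx.checker.check_model(onnx_path)
```

For example, `parse_policy_options(["ddim_steps=10"])` yields `{"ddim_steps": "10"}` (the `ddim_steps` key is made up for illustration).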
hey, just flagging that VLA ONNX/TRT export for pi0, pi0.5, smolvla, and gr00t is already working in reflex-vla (github.com/FastCrest/reflex-vla) if it's a useful reference for this PR. the VLA export is the hard part. smolvla in particular took a bunch of patches to get right (broken KV cache wiring, wrong sinusoidal embedding, 5D vs 4D image dims from the AutoProcessor). we also bake the denoising loop unrolled into the monolithic ONNX so the TRT engine is a single graph. might save some time vs figuring it out from scratch.
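For readers unfamiliar with the unrolling trick the comment describes, a toy sketch follows; the module, names, and update rule are illustrative only, not code from reflex-vla or this PR:

```python
import torch
from torch import nn

class UnrolledDenoiser(nn.Module):
    """Iterating over a plain Python int repeats the loop body num_steps times
    in the traced graph, so the whole schedule lands in one ONNX/TRT graph."""

    def __init__(self, eps_model: nn.Module, num_steps: int = 10):
        super().__init__()
        self.eps_model = eps_model
        self.num_steps = num_steps

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        for step in range(self.num_steps):  # unrolled at export time
            t = torch.full((x.shape[0],), step, dtype=torch.long, device=x.device)
            # Placeholder update; a real DDIM step also rescales by the
            # noise-schedule coefficients.
            x = x - self.eps_model(x, t, cond)
        return x
```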
Title
feat(export): add `lerobot-export` CLI for ONNX/TensorRT edge deployment

Summary / Motivation
Adds a `lerobot-export` CLI that exports trained LeRobot policies to ONNX (and optionally TensorRT) for edge deployment on devices such as Jetson Orin Nano/NX.
Users running trained policies on Jetson-class devices need an optimized
inference format; ONNX + TensorRT FP16 provides significant speedup over
PyTorch on SM_87 GPUs without requiring the Python training stack at inference
time.
Related issues
What changed
- New CLI `lerobot-export` (entry point `lerobot.scripts.lerobot_export`) supporting `--format=onnx` (default) and `--format=tensorrt`.
- New `lerobot.export` package with:
  - `ExportSpec` dataclass + `register_export_wrapper` plugin registry with auto-discovery by naming convention (`policies/<type>/export_<type>.py`).
  - `DictBatchAdapter` (Pattern A) and `IterativeDenoisingAdapter` ABC (Pattern B) in `export/adapters/`.
  - Exporter modes `dynamo`, `legacy`, and `auto` (tries dynamo, falls back to legacy).
  - `validate_onnx()`: post-export parity check (max_abs_error, cos_sim, `torch.allclose`). Supports `--validation-trials=N` for random-input trials; see the parity-check sketch after this list.
  - `export_to_tensorrt()` via `trtexec` subprocess with engine cache and hardware guards (FP8/INT8 on SM_87 raises; FP16 on SM<80 warns).
  - `save_normalization_stats()` + `NormalizedWrapper` (opt-in `--fold-normalization` bakes stats as ONNX constants).
- Per-policy export modules:
  - `policies/act/export_act.py`: ACT via `DictBatchAdapter`
  - `policies/diffusion/export_diffusion.py`: UNet-only and full DDIM loop (`--diffusion-mode=ddim-N`)
  - `policies/vqbet/export_vqbet.py`: VQ-BeT via `DictBatchAdapter`
- Unsupported policy types raise `NotImplementedError` with a concrete extension guide instead of silently producing wrong output.
- New optional extra `export` (`onnx`, `onnxruntime`, `onnxscript`) in `pyproject.toml`; `_onnx_available`/`_onnxruntime_available` added to `import_utils.py`.
- No breaking changes to existing APIs.
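As a rough picture of what the parity check computes, the sketch below compares the PyTorch wrapper against the exported graph with ONNX Runtime; the function name, signature, and metrics layout are illustrative, while the real logic lives in `export/validation.py`:

```python
import numpy as np
import onnxruntime as ort
import torch

def parity_check(wrapper: torch.nn.Module, onnx_path: str,
                 sample: tuple[torch.Tensor, ...]) -> tuple[float, float]:
    """Run identical inputs through PyTorch and ONNX Runtime, then compare."""
    with torch.no_grad():
        ref = wrapper(*sample).cpu().numpy()

    session = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])
    feeds = {inp.name: t.cpu().numpy() for inp, t in zip(session.get_inputs(), sample)}
    out = session.run(None, feeds)[0]

    max_abs_error = float(np.max(np.abs(ref - out)))
    cos_sim = float(
        np.dot(ref.ravel(), out.ravel())
        / (np.linalg.norm(ref.ravel()) * np.linalg.norm(out.ravel()))
    )
    return max_abs_error, cos_sim
```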
How was this tested (or how to run locally)
Unit tests (31 passed, 1 skipped on macOS/MPS):
End-to-end validation against real Hub checkpoints (macOS/CPU, opset 18, `--validation-trials=5`):

TensorRT engine builds require CUDA + `trtexec`; the hardware-guard error paths are covered by `tests/test_export.py::test_export_to_tensorrt_raises_without_cuda`.

Checklist (required before merge)
- Lint/format checks pass (`pre-commit run -a`)
- Tests pass (`pytest`)

Reviewer notes
- `--exporter=dynamo` produces a fixed `batch_size=1` ONNX because `ACT.forward` allocates `torch.zeros([batch_size, latent_dim])` inside the model, causing `torch.export` to specialize on the concrete value. Fixing this requires modifying `ACT.forward` itself and is out of scope for this PR.
- `export/adapters/` contains the two reusable primitives intended to keep future per-policy adapters thin. Feedback on the API surface is welcome.
- TensorRT engine builds were not exercised in CI (no CUDA on the development host); a GPU CI job is a follow-up.
- Some Hub checkpoints (e.g. `lerobot/act_aloha_sim_insertion_human`) do not ship `policy_preprocessor.json`, so `normalization_stats.json` is written as `{}` with a warning; this is expected and logged clearly.
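For checkpoints that do ship stats, the opt-in `--fold-normalization` path described under "What changed" amounts to something like the sketch below; the class name follows the PR's `NormalizedWrapper`, but the body (single state input, mean/std buffers) is an assumption:

```python
import torch
from torch import nn

class NormalizedWrapper(nn.Module):
    """Registering stats as buffers turns them into ONNX initializers
    (constants), so no separate normalization_stats.json is needed at runtime."""

    def __init__(self, policy: nn.Module, mean: torch.Tensor, std: torch.Tensor):
        super().__init__()
        self.policy = policy
        self.register_buffer("mean", mean)
        self.register_buffer("std", std)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        normalized = (state - self.mean) / self.std
        return self.policy(normalized)
```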