
Incorporating SIGReg #3486

Open
dennisant wants to merge 49 commits into huggingface:main from varungiridhar:sigreg

Conversation

@dennisant

  • Added SIGReg (Sketch Isotropic Gaussian Regularizer) to the ACT+AWM policy's world model latent space
  • Encourages isotropic Gaussian structure via variance and covariance regularization, improving world-model prediction quality for planning
  • Controlled via use_sigreg and sigreg_weight config flags (disabled by default)
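The variance and covariance terms described above can be sketched as follows. This is a minimal illustration of the idea (VICReg-style regularization toward an isotropic Gaussian latent), not the actual SIGReg implementation in this PR; the function name and hyperparameters are hypothetical.

```python
import torch

def var_cov_reg(z: torch.Tensor, eps: float = 1e-4) -> torch.Tensor:
    """Sketch: z is a (batch, dim) tensor of latent embeddings."""
    z = z - z.mean(dim=0)
    # Variance term: push each latent dimension's std toward 1 (isotropy).
    std = torch.sqrt(z.var(dim=0) + eps)
    variance_loss = torch.relu(1.0 - std).mean()
    # Covariance term: penalize off-diagonal covariance (decorrelation).
    n = z.shape[0]
    cov = (z.T @ z) / (n - 1)
    off_diag = cov - torch.diag(torch.diag(cov))
    covariance_loss = (off_diag ** 2).sum() / z.shape[1]
    return variance_loss + covariance_loss
```

In a policy's training objective, a term like `sigreg_weight * var_cov_reg(z_pred)` would be added only when the `use_sigreg` flag is set.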

varungiridhar and others added 30 commits January 22, 2026 13:47
act_simple: remove sine pos embed option, make learned 2D embeddings the only mode

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add AGENTS.md for AI coding agent guidance
…huggingface#3102)

Add a `cudnn_deterministic` flag to `TrainPipelineConfig` (default: False)
that sets `torch.backends.cudnn.deterministic = True` and disables benchmark
mode, eliminating CUDA floating-point non-determinism at the cost of ~10-20%
training speed. When False (default) the existing benchmark=True behaviour
is preserved.
* add basic awm

* bugfix AR policy

* add testing for tokenizer

* add wandb log analysis help for claude

* add label smoothing

* add cosine learning schedule

* add basic world model

* add WM image decoder for debugging

Adds a lightweight (~160K param) convolutional image decoder driven by the
detached world model latent z_pred. Reconstructed images are logged alongside
ground-truth next-state frames locally (wm_viz/step_*.png) and to wandb at
eval_freq frequency.

- WMImageDecoder: Linear → reshape → 5× stride-2 ConvTranspose2d → Tanh
- decoder_loss (MSE, detached from main model) added to training objective
- AWMPolicy.visualize() for on-demand GT/decoded pair generation
- WandBLogger.log_images() for wandb image panel logging
- _log_wm_visualizations() helper in training loop, fires at eval_freq
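The Linear → reshape → 5× stride-2 ConvTranspose2d → Tanh shape of the decoder can be sketched like this. Channel widths and the 96×96 output are illustrative assumptions, not the actual WMImageDecoder definition.

```python
import torch
from torch import nn

class WMImageDecoderSketch(nn.Module):
    """Toy stand-in: linear projection, reshape to a 3x3 feature map,
    five stride-2 transposed convolutions (3 -> 6 -> 12 -> 24 -> 48 -> 96),
    Tanh output in [-1, 1]."""

    def __init__(self, latent_dim: int = 64):
        super().__init__()
        self.fc = nn.Linear(latent_dim, 32 * 3 * 3)
        chans = [32, 24, 16, 12, 8]
        layers = []
        for c_in, c_out in zip(chans, chans[1:] + [3]):
            layers.append(nn.ConvTranspose2d(c_in, c_out, kernel_size=4, stride=2, padding=1))
            layers.append(nn.ReLU() if c_out != 3 else nn.Tanh())
        self.deconv = nn.Sequential(*layers)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        x = self.fc(z).view(-1, 32, 3, 3)
        return self.deconv(x)  # (B, 3, 96, 96)
```

Driving such a decoder from the detached `z_pred` keeps its MSE loss from influencing the world model itself, as the commit describes.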

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* add determinism settings for cuda

* working latent state decoder

* world model bugfixes

* remove option to predict encoder output as world model prediction

* world model take vision features as input

* EMA, cosine learning rate, and normalized MSE + variance reg loss.

* Logging + extra training schedule parameters

* Flow action world models (policy is flow)

* fawm added to policies in lerobot

* Add SigLIP decoder policy

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add ACT simple with AWM head policy

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Register act_simple_with_awm_head and fawm policies in factory

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add goal state generation script and fix vis_dir mkdir

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add print_task_details utility script

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Signed-off-by: Varun Giridhar <32874672+varungiridhar@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: varungiridhar <varungiridhar21@gmail.com>
Co-authored-by: Varun Giridhar <32874672+varungiridhar@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… gcp

- Add GCPPlanner: Monte Carlo score-function gradient estimator that
  iteratively refines action trajectories via gradient descent on the
  latent cosine-similarity cost
- Antithetic perturbation pairs for variance reduction
- Early stopping when max abs action change < convergence_tol
- New PlanningConfig fields: lr, lr_decay, convergence_tol, antithetic
- Register "gcp" in make_planner(); "mppi" unchanged
- Rename MCGradPlanner → GCPPlanner, "mcgrad" → "gcp" throughout

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Test-time latent planning (MPPI + GCP) for ACT+AWM policy
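The score-function estimator with antithetic pairs and early stopping described above can be sketched as below. This is a generic illustration under assumed hyperparameters, not the GCPPlanner code; `cost_fn` stands in for the latent cosine-similarity cost.

```python
import torch

def score_function_grad(actions, cost_fn, n_pairs=8, sigma=0.1):
    """Monte Carlo (REINFORCE-style) estimate of d E[cost(a + eps)] / d a."""
    eps = torch.randn(n_pairs, *actions.shape) * sigma
    eps = torch.cat([eps, -eps], dim=0)           # antithetic pairs for variance reduction
    costs = torch.stack([cost_fn(actions + e) for e in eps])
    costs = costs - costs.mean()                  # baseline further reduces variance
    return (costs.view(-1, *[1] * actions.dim()) * eps).mean(dim=0) / sigma ** 2

def planner_optimize(actions, cost_fn, lr=0.05, iters=50, convergence_tol=1e-4):
    """Iteratively refine the action trajectory by gradient descent on the cost."""
    for _ in range(iters):
        step = lr * score_function_grad(actions, cost_fn)
        actions = actions - step
        if step.abs().max() < convergence_tol:    # early stopping on small updates
            break
    return actions
```

On a simple quadratic cost this estimator behaves like noisy gradient descent, which is the behaviour the planner relies on.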
…coder

GCPPlanner now differentiates the cosine-similarity cost directly
w.r.t. the action sequence via torch.autograd.grad through the WM
decoder, rather than estimating gradients via the score-function trick.

Key changes:
- GCPPlanner.optimize: use torch.enable_grad() + autograd.grad instead
  of antithetic perturbation sampling; raise RuntimeError with clear
  diagnostics if cost has no grad_fn or grad is None (indicates wiring
  bug rather than silently returning unoptimized actions)
- MPPIPlanner.optimize: owns its own torch.no_grad() context (moved
  from @torch.no_grad on _plan_action_chunk) for cleaner context
  ownership per planner
- _plan_action_chunk: remove @torch.no_grad() decorator; each planner
  now manages its own gradient context
- lerobot_eval.py: switch select_action call from torch.inference_mode()
  to torch.no_grad() so GCPPlanner's torch.enable_grad() block can
  override it (inference_mode tensors are permanently non-differentiable
  and cannot be overridden by enable_grad)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
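The `inference_mode` vs `no_grad` distinction this commit hinges on is easy to demonstrate in isolation. The snippet below is a standalone illustration, not code from the PR:

```python
import torch

x = torch.ones(3, requires_grad=True)

# Inside no_grad, an inner enable_grad block re-enables graph construction...
with torch.no_grad():
    with torch.enable_grad():
        y = (x * 2).sum()       # y.requires_grad is True

# ...but inside inference_mode, enable_grad has no effect: results are
# inference tensors and are permanently non-differentiable.
with torch.inference_mode():
    with torch.enable_grad():
        w = (x * 2).sum()       # w.requires_grad is False
```

This is why the eval script must wrap `select_action` in `torch.no_grad()` rather than `torch.inference_mode()` when a gradient-based planner runs inside it.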
Algorithm identifier changes: "gcp" → "gbp", GCPPlanner → GBPPlanner,
and all associated docstring/comment references.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
feat(planning): latent-space test-time planning for ACT+AWM (MPPI & GBP)
…ogging

Major refactor of self-improvement utilities:

- Pretrain replay mixing: both finetune() and finetune_wm() mix online data
  with pretrain replay from lerobot/pusht at configurable ratios.
  finetune: 50% pretrain / 50% success, finetune_wm: 60% pretrain / 40% online.

- finetune_wm(): new WM-only finetuning function that freezes all params
  except WM decoder internals. Crucially freezes wm_cross_attn_proj so the
  WM decoder sees the same key-value space as during pretraining.

- finetune(): end-to-end training on success data + pretrain replay.
  Returns (checkpoint_path, new_global_step) for continuous step tracking.

- Comprehensive wandb logging: core loss curves, validation metrics on
  pretrain held-out set, representation health (norms, stds, effective rank).

- TrajectoryBuffer: added "failure_only" mode for WM training on
  suboptimal trajectories.

- Playground: scaffolding for eval → finetune → finetune_wm pipeline
  with wandb integration and configurable hyperparameters.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
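The fixed per-batch mixing ratios described above (e.g. 60% pretrain / 40% online for `finetune_wm`) amount to sampling a fixed number of items from each source per batch. A minimal sketch, with an illustrative function name:

```python
import random

def mixed_batch(pretrain, online, batch_size=8, pretrain_ratio=0.6):
    """Draw a fixed fraction of each batch from the pretrain replay pool
    and the remainder from online data, then shuffle the batch."""
    n_pre = int(batch_size * pretrain_ratio)
    batch = random.sample(pretrain, n_pre) + random.sample(online, batch_size - n_pre)
    random.shuffle(batch)
    return batch
```

Keeping the split fixed per batch (rather than sampling from a concatenated pool) guarantees every gradient step sees both distributions.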
- Use video_backend='pyav' for pretrain replay dataset (torchcodec
  FFmpeg libs not available on all nodes)
- Only include observation features that exist in the dataset
  (lerobot/pusht has no observation.environment_state)
- Set num_workers=0 for pretrain DataLoaders
- Use compute_inference.sh for eval jobs in smoke test

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- lerobot_train.py: cudnn_deterministic=True now sets all 4 flags:
  CUBLAS_WORKSPACE_CONFIG, cudnn.benchmark=False, allow_tf32=False,
  use_deterministic_algorithms(True)
- lerobot_eval.py: eval always runs in deterministic mode

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
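The four determinism settings listed above correspond to real PyTorch knobs; a sketch of applying them together (the wrapper function is illustrative, and the flags must be set before any CUDA work begins):

```python
import os
import torch

def enable_full_determinism(seed: int = 0) -> None:
    """Apply the determinism flags named in the commit above (sketch)."""
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"  # required by some cuBLAS ops
    torch.manual_seed(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False      # benchmark may pick non-deterministic kernels
    torch.backends.cuda.matmul.allow_tf32 = False
    torch.backends.cudnn.allow_tf32 = False
    torch.use_deterministic_algorithms(True)    # error out on non-deterministic ops
```

The ~10-20% slowdown mentioned earlier comes mostly from disabling benchmark mode and TF32 matmuls.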
Clean up main branch by deleting experimental policy directories
that are no longer needed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add mixing parameter ("naive" vs "ratio") to finetune/finetune_wm:
  naive concatenates online+pretrain data and samples uniformly,
  ratio keeps the fixed per-batch split.
- Add load_optimizer toggle to resume Adam momentum/variance from
  checkpoint, with name-based param matching across different
  param group layouts (pretrain 2-group vs finetune 1-group).
- Replace evaluate_final SLURM submission with inline GPU eval.
- Remove unused collect_eval_results and subprocess import.
- Fix checkpoint save layout: pretrained_model/ + training_state/
  as siblings, matching the pretrain checkpoint structure.
- Add output_dir parameter to prevent nested save paths across
  iterations.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
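The name-based optimizer-state matching mentioned above (carrying Adam moments across different param-group layouts) can be sketched as follows. This relies on the optimizer `state_dict` convention of indexing parameters in group order; the function is illustrative, not the PR's implementation.

```python
import torch
from torch import nn

def remap_adam_state(old_opt_state, old_named_params, new_opt, new_named_params):
    """Carry Adam exp_avg / exp_avg_sq buffers over by parameter *name*,
    so they survive a change in param-group layout (sketch)."""
    # state_dict indexes params 0..n-1 in flattened group order
    old_ids = [pid for g in old_opt_state["param_groups"] for pid in g["params"]]
    id_to_name = dict(zip(old_ids, [n for n, _ in old_named_params]))
    name_to_state = {id_to_name[i]: s for i, s in old_opt_state["state"].items()}
    # install matched buffers into the new optimizer, keyed by the live tensors
    for name, p in new_named_params:
        if name in name_to_state:
            new_opt.state[p] = name_to_state[name]
```

The sketch assumes `old_named_params` is ordered consistently with the old optimizer's flattened param groups.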
- Add EVAL_N_EPISODES, EVAL_USE_PLANNING, EVAL_PLANNING_ALGORITHM,
  EVAL_PLANNING_OVERRIDES config knobs to playground for full eval
  flexibility (BC-only, GBP, MPPI with custom params).
- Add planning_overrides parameter to eval_and_collect and
  evaluate_final, applied via setattr on PlanningConfig.
- Guard finetune on FINETUNE_STEPS > 0 and finetune_wm on
  FINETUNE_WM_STEPS > 0 so setting either to 0 cleanly skips it.
- Default FINETUNE_WM_STEPS to 0 (skip WM finetune by default).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
These modules were deleted in 8b0be16 but the imports were left
behind, causing ModuleNotFoundError on any policy load.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Completes cleanup started in 8b0be16 — all imports, config branches,
policy class lookups, and processor branches for the deleted modules
are now removed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
varungiridhar and others added 19 commits April 1, 2026 18:09
…mpat fields

- Remove all act_awm, awm, fawm imports and branches from factory.py
- Restore multi_stage, gripper_closed_threshold, gripper_closed_steps
  fields in PlanningConfig for checkpoint compatibility

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add image_resize, wm_visual_pool, wm_pool_size, log_wm_action_sensitivity
to ACTSimpleWithAWMHeadConfig so draccus can deserialize older checkpoints.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…tune

Adds bc_mask_mode param to finetune() with three modes:
- "none" (default): full BC+WM loss on all samples
- "failure": zero BC loss on failure episodes, full on successes
- "all": zero BC loss everywhere (WM-only through e2e pipeline)

Also adds FINETUNE_ONLINE_MODE to control which buffer episodes
enter e2e finetuning (success_only/all/failure_only).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
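The three `bc_mask_mode` behaviours reduce to multiplying the per-sample BC loss by a mask while the WM loss always applies. A hedged sketch (function and argument names are illustrative, not the actual signatures):

```python
import torch

def combined_loss(bc_loss_per_sample, wm_loss_per_sample, is_success, mode="none"):
    """Sketch of the three bc_mask_mode variants described above."""
    if mode == "none":
        mask = torch.ones_like(bc_loss_per_sample)           # full BC loss everywhere
    elif mode == "failure":
        mask = is_success.float()                            # zero BC loss on failures
    elif mode == "all":
        mask = torch.zeros_like(bc_loss_per_sample)          # WM-only training
    else:
        raise ValueError(f"unknown bc_mask_mode: {mode}")
    return (mask * bc_loss_per_sample).mean() + wm_loss_per_sample.mean()
```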
- Delete self_improvement_playground.py (old orchestrator with inline
  training loop) — replaced by self_improvement.py which delegates to
  lerobot-train --resume.
- Delete self_improvement_utils.py (1,348-line training/data library) —
  replaced by self_improvement_data.py + _FinetuneDataset in
  lerobot_train.py.
- Untrack prompt_run_and_eval.md (kept on disk, added to .gitignore).
- Also gitignore results_self_improvement_deterministic.tsv.

The 3-line bc_loss_mask check in modeling_act_simple_with_awm_head.py
is intentionally kept — the new pipeline reuses it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rain/eval

Refactors the self-improvement pipeline to reuse existing training and
evaluation infrastructure instead of maintaining a separate training loop.

New files:
- self_improvement.py: orchestrator that collects on-policy data via
  eval_policy, packages it as a LeRobotDataset, merges with pretrain
  data, and calls lerobot-train --resume for continued training.
- self_improvement_data.py: data packaging utilities — converts eval
  rollouts to LeRobotDataset, creates merged (pretrain + online)
  datasets on disk with bc_loss_mask support.

Changes to existing files:
- configs/train.py: add override_lr field for post-resume LR override.
- lerobot_train.py: apply override_lr after loading optimizer state.
- wandb_utils.py: gracefully handle missing wandb run on resume with
  new output_dir (self-improvement finetuning creates a new run).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Don't crash on import when sys.argv[1] is missing (allows importing
  eval_and_collect/run_finetune from other scripts).
- Add test_self_improvement_v2.py: full pipeline test (collect → package
  → merge → finetune via --resume). All tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Replace 7-min on-disk merge with instant _FinetuneDataset wrapper in
  lerobot_train.py. Uses --online_dataset_root to concatenate pretrain +
  online datasets at load time (zero data copying).
- _FinetuneDataset handles mismatched keys (next.reward etc.) and
  injects bc_loss_mask=1.0 for pretrain samples when needed.
- Add online_dataset_root field to TrainPipelineConfig.
- Remove dead merge code from self_improvement_data.py.
- Add EVAL_AVG_MAX_REWARD / EVAL_EP_S output lines for TSV logging.
- Update orchestrator docstring and run_finetune signature.
- Update test to use instant concat path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
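The instant-concat mechanism above (index into either dataset at load time, drop mismatched keys, inject `bc_loss_mask=1.0` for pretrain samples) can be illustrated with a toy wrapper. This is a sketch of the mechanism only, not the actual `_FinetuneDataset`:

```python
class ConcatFinetuneDataset:
    """Toy wrapper: concatenates two map-style datasets at load time
    with zero data copying, harmonizing their keys."""

    def __init__(self, pretrain, online):
        self.pretrain, self.online = pretrain, online
        # keep only keys both datasets share (e.g. drop next.reward),
        # plus the injected bc_loss_mask
        self.keys = set(pretrain[0]) & set(online[0]) | {"bc_loss_mask"}

    def __len__(self):
        return len(self.pretrain) + len(self.online)

    def __getitem__(self, i):
        if i < len(self.pretrain):
            item = dict(self.pretrain[i])
            item.setdefault("bc_loss_mask", 1.0)   # pretrain samples always train BC
        else:
            item = dict(self.online[i - len(self.pretrain)])
        return {k: v for k, v in item.items() if k in self.keys}
```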
…er fix)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…eckpoint)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Without this, LambdaLR.step() immediately overwrites the manual LR
override on the very first training step, making override_lr a no-op.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
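The failure mode this commit fixes is easy to reproduce: `LambdaLR.step()` recomputes each group's lr from `base_lrs`, so writing to `param_groups[0]["lr"]` alone is a no-op. A standalone sketch with a toy model (the `override_lr` helper is illustrative):

```python
import torch
from torch import nn

model = nn.Linear(2, 2)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
sched = torch.optim.lr_scheduler.LambdaLR(opt, lambda step: 1.0)

opt.param_groups[0]["lr"] = 5e-4   # naive override...
sched.step()                        # ...immediately overwritten back to 1e-3

def override_lr(optimizer, scheduler, new_lr):
    """Rebase the scheduler so the override survives subsequent step() calls."""
    scheduler.base_lrs = [new_lr] * len(scheduler.base_lrs)
    for g in optimizer.param_groups:
        g["lr"] = new_lr
```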
… runs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…d train integration

Adds a complete self-improvement pipeline that collects on-policy trajectories,
packages them into LeRobotDatasets, and finetunes via in-process train() calls.

Key changes:
- self_improvement.py: CLI-configured orchestrator (collect → package → train → eval)
  with multi-iteration support, data accumulation, dual BC/planning eval, and
  deterministic execution
- self_improvement_data.py: converts eval rollouts into LeRobotDatasets with
  optional bc_loss_mask for failure masking
- lerobot_train.py: _FinetuneDataset for instant pretrain+online concatenation,
  dataset= parameter for caller-provided datasets, LR override post-resume,
  parameter freezing via trainable_param_keywords, FinetuneDataset-aware sampler
- configs/train.py: override_lr, trainable_param_keywords, online_dataset_root
  fields; validate() supports in-process callers pre-setting checkpoint_path
- pyproject.toml: lerobot-self-improve entry point

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…r WM representations

SIGReg (Sketch Isotropic Gaussian Regularizer) encourages latent embeddings
to be approximately Gaussian-distributed via random projections and the
Epps-Pulley statistic, replacing the hard L2 norm constraint with a soft
regularization loss. Based on LeWorldModel (arXiv:2603.19312).

Made-with: Cursor
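The random-projection + Epps-Pulley idea above can be sketched roughly: project embeddings onto random unit directions and penalize the discrepancy between the empirical characteristic function of each 1-D projection and that of a standard Gaussian. The discretization of the Epps-Pulley integral below is a crude illustration under assumed evaluation points and weights, not the PR's implementation:

```python
import torch

def sigreg_sketch(z: torch.Tensor, n_proj: int = 16) -> torch.Tensor:
    """Rough sketch of a sketched-Gaussianity penalty on (n, d) embeddings."""
    n, d = z.shape
    dirs = torch.randn(d, n_proj)
    dirs = dirs / dirs.norm(dim=0, keepdim=True)   # random unit directions
    p = z @ dirs                                    # (n, n_proj) 1-D projections
    t = torch.linspace(-3.0, 3.0, 17)               # characteristic-fn eval points
    tp = p.unsqueeze(-1) * t                        # (n, n_proj, T)
    ecf_re, ecf_im = torch.cos(tp).mean(0), torch.sin(tp).mean(0)
    gauss_cf = torch.exp(-0.5 * t ** 2)             # CF of N(0, 1), real-valued
    sq_err = (ecf_re - gauss_cf) ** 2 + ecf_im ** 2
    weight = torch.exp(-0.5 * t ** 2)               # Gaussian weighting of the integrand
    return (sq_err * weight).sum(dim=-1).mean()
```

As a soft loss, this lets latent norms and shapes drift toward Gaussianity during training instead of clamping them with a hard L2 constraint.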
Integrates FastWAM (Wan2.2-TI2V-5B based VLA) into LeRobot as a
plug-and-play policy for LIBERO evaluation.

Key design decisions:
- __init__ always loads VAE + T5 text encoder from Wan2.2 pretrained;
  fine-tuned DiT weights come from model.safetensors via from_pretrained
- CausalConv3d.forward uses F.conv3d directly to avoid dispatch to
  slow_conv3d_forward (CPU-only kernel) on modern GPUs (H100/A100)
- FastWAMPolicy.to() syncs model.device (plain Python attr, not updated
  by nn.Module.to()) to keep VAE/text-encoder on the right device
- MIN_MAX normalization for state+action; IDENTITY for images (VAE
  normalises to [-1,1] internally)
- LiberoProcessorStep flips images 180° (LIBERO raw frames are upside-down)
- FastWAMGripperRemapStep: 1-2x then sign() maps checkpoint gripper
  convention to LIBERO convention
- libero_10 max_steps bumped from 520 → 700 to match original eval config

Evaluated at 80% success (1 episode/task) on libero_10 with
num_inference_steps=10 on H100.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
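The `.to()` pitfall the third bullet describes is general: `nn.Module.to()` moves parameters and buffers but never touches a plain Python attribute like `self.device`. A toy stand-in (not the FastWAM class) showing the sync pattern:

```python
import torch
from torch import nn

class FakePolicy(nn.Module):
    """Toy policy whose frozen sub-components are placed via self.device."""

    def __init__(self):
        super().__init__()
        self.net = nn.Linear(2, 2)
        self.device = torch.device("cpu")   # plain attr, ignored by nn.Module.to()

    def to(self, *args, **kwargs):
        out = super().to(*args, **kwargs)
        # re-derive the device from an actual parameter after the move
        out.device = next(out.parameters()).device
        return out
```

Without the override, moving the module to `cuda` would leave `self.device` stale and the VAE/text-encoder on the wrong device.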
Move FastWAM checkpoints from fastwam/ to awm/ shared directory:
  hf_checkpoint_minmax  → awm/fastwam_checkpoint
  wan22_weights         → awm/fastwam_wan22_weights

Hardcode defaults so users need zero env setup:
- modeling_fastwam.py: os.environ.setdefault("DIFFSYNTH_MODEL_BASE_PATH", ...)
  so Wan2.2 weights are found automatically on import
- configuration_fastwam.py: FASTWAM_CHECKPOINT_PATH and
  FASTWAM_WAN22_WEIGHTS_PATH constants pointing to awm/ locations

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add processor_fastwam.py and fix forward/backward training path.

processor_fastwam.py:
- make_fastwam_pre_post_processors: standard pre/post pipeline with
  MIN_MAX normalization for state+action, IDENTITY for images (VAE
  handles [-1,1] internally), device placement

modeling_fastwam.py:
- forward: unpack (loss, loss_dict) tuple from training_loss correctly
- forward: unsqueeze proprio (B,D) → (B,1,D) as build_inputs expects 3D
- _prepare_video_for_training: handle 4D images (n_obs_steps=1) by
  tiling single frame to T=5 (minimum valid T%4==1, T>1)

configuration_fastwam.py:
- observation_delta_indices: return list(range(n_obs_steps)) when
  n_obs_steps > 1 so datasets return multi-frame video tensors
- __post_init__: validate T%4==1 and chunk_size%(T-1)==0 for n_obs_steps>1

Verified: forward+backward with B=2, 4D images, loss=0.78,
1651 params with nonzero gradients (VAE/text-encoder frozen).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Integrates FastWAM policy into the lerobot pipeline alongside existing
ACT+AWM policy. Resolves trivial .gitignore conflict by keeping entries
from both branches.

Made-with: Cursor
…olicy

Adds variance and covariance regularization to the AWM latent space to
encourage isotropic Gaussian structure, improving world-model prediction
quality for planning.

Made-with: Cursor
@github-actions bot added labels on Apr 29, 2026: documentation (improvements or fixes to the project's docs), policies (items related to robot policies), tests, configuration, processor, evaluation.
3 participants