
Incorporating SIGReg #3486

Open
dennisant wants to merge 49 commits into huggingface:main from varungiridhar:sigreg

Conversation

@dennisant

  • Added SIGReg (Sketch Isotropic Gaussian Regularizer) to the ACT+AWM policy's world model latent space
  • Encourages isotropic Gaussian structure via variance and covariance regularization, improving world-model prediction quality for planning
  • Controlled via use_sigreg and sigreg_weight config flags (disabled by default)
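The variance and covariance terms described above can be sketched as follows. This is a minimal illustration of the idea (VICReg-style regularization toward an isotropic Gaussian latent), not the actual SIGReg implementation in this PR; the function name and hyperparameters are hypothetical.

```python
import torch

def var_cov_reg(z: torch.Tensor, eps: float = 1e-4) -> torch.Tensor:
    """Sketch: z is a (batch, dim) tensor of latent embeddings."""
    z = z - z.mean(dim=0)
    # Variance term: push each latent dimension's std toward 1 (isotropy).
    std = torch.sqrt(z.var(dim=0) + eps)
    variance_loss = torch.relu(1.0 - std).mean()
    # Covariance term: penalize off-diagonal covariance (decorrelation).
    n = z.shape[0]
    cov = (z.T @ z) / (n - 1)
    off_diag = cov - torch.diag(torch.diag(cov))
    covariance_loss = (off_diag ** 2).sum() / z.shape[1]
    return variance_loss + covariance_loss
```

In a policy's training objective, a term like `sigreg_weight * var_cov_reg(z_pred)` would be added only when the `use_sigreg` flag is set.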

varungiridhar and others added 30 commits January 22, 2026 13:47
act_simple: remove sine pos embed option, make learned 2D embeddings the only mode

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add AGENTS.md for AI coding agent guidance
…huggingface#3102)

Add a `cudnn_deterministic` flag to `TrainPipelineConfig` (default: False)
that sets `torch.backends.cudnn.deterministic = True` and disables benchmark
mode, eliminating CUDA floating-point non-determinism at the cost of ~10-20%
training speed. When False (default) the existing benchmark=True behaviour
is preserved.
* add basic awm

* bugfix AR policy

* add testing for tokenizer

* add wandb log analysis help for claude

* add label smoothing

* add cosine learning schedule

* add basic world model

* add WM image decoder for debugging

Adds a lightweight (~160K param) convolutional image decoder driven by the
detached world model latent z_pred. Reconstructed images are logged alongside
ground-truth next-state frames locally (wm_viz/step_*.png) and to wandb at
eval_freq frequency.

- WMImageDecoder: Linear → reshape → 5× stride-2 ConvTranspose2d → Tanh
- decoder_loss (MSE, detached from main model) added to training objective
- AWMPolicy.visualize() for on-demand GT/decoded pair generation
- WandBLogger.log_images() for wandb image panel logging
- _log_wm_visualizations() helper in training loop, fires at eval_freq
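The Linear → reshape → 5× stride-2 ConvTranspose2d → Tanh shape of the decoder can be sketched like this. Channel widths and the 96×96 output are illustrative assumptions, not the actual WMImageDecoder definition.

```python
import torch
from torch import nn

class WMImageDecoderSketch(nn.Module):
    """Toy stand-in: linear projection, reshape to a 3x3 feature map,
    five stride-2 transposed convolutions (3 -> 6 -> 12 -> 24 -> 48 -> 96),
    Tanh output in [-1, 1]."""

    def __init__(self, latent_dim: int = 64):
        super().__init__()
        self.fc = nn.Linear(latent_dim, 32 * 3 * 3)
        chans = [32, 24, 16, 12, 8]
        layers = []
        for c_in, c_out in zip(chans, chans[1:] + [3]):
            layers.append(nn.ConvTranspose2d(c_in, c_out, kernel_size=4, stride=2, padding=1))
            layers.append(nn.ReLU() if c_out != 3 else nn.Tanh())
        self.deconv = nn.Sequential(*layers)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        x = self.fc(z).view(-1, 32, 3, 3)
        return self.deconv(x)  # (B, 3, 96, 96)
```

Driving such a decoder from the detached `z_pred` keeps its MSE loss from influencing the world model itself, as the commit describes.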

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* add determinism settings for cuda

* working latent state decoder

* world model bugfixes

* remove option to predict encoder output as world model prediction

* world model take vision features as input

* EMA, cosine learning rate, and normalized MSE + variance reg loss.

* Logging + extra training schedule parameters

* Flow action world models (policy is flow)

* fawm added to policies in lerobot

* Add SigLIP decoder policy

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add ACT simple with AWM head policy

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Register act_simple_with_awm_head and fawm policies in factory

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add goal state generation script and fix vis_dir mkdir

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add print_task_details utility script

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Signed-off-by: Varun Giridhar <32874672+varungiridhar@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: varungiridhar <varungiridhar21@gmail.com>
Co-authored-by: Varun Giridhar <32874672+varungiridhar@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… gcp

- Add GCPPlanner: Monte Carlo score-function gradient estimator that
  iteratively refines action trajectories via gradient descent on the
  latent cosine-similarity cost
- Antithetic perturbation pairs for variance reduction
- Early stopping when max abs action change < convergence_tol
- New PlanningConfig fields: lr, lr_decay, convergence_tol, antithetic
- Register "gcp" in make_planner(); "mppi" unchanged
- Rename MCGradPlanner → GCPPlanner, "mcgrad" → "gcp" throughout

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Test-time latent planning (MPPI + GCP) for ACT+AWM policy
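The score-function estimator with antithetic pairs and early stopping described above can be sketched as below. This is a generic illustration under assumed hyperparameters, not the GCPPlanner code; `cost_fn` stands in for the latent cosine-similarity cost.

```python
import torch

def score_function_grad(actions, cost_fn, n_pairs=8, sigma=0.1):
    """Monte Carlo (REINFORCE-style) estimate of d E[cost(a + eps)] / d a."""
    eps = torch.randn(n_pairs, *actions.shape) * sigma
    eps = torch.cat([eps, -eps], dim=0)           # antithetic pairs for variance reduction
    costs = torch.stack([cost_fn(actions + e) for e in eps])
    costs = costs - costs.mean()                  # baseline further reduces variance
    return (costs.view(-1, *[1] * actions.dim()) * eps).mean(dim=0) / sigma ** 2

def planner_optimize(actions, cost_fn, lr=0.05, iters=50, convergence_tol=1e-4):
    """Iteratively refine the action trajectory by gradient descent on the cost."""
    for _ in range(iters):
        step = lr * score_function_grad(actions, cost_fn)
        actions = actions - step
        if step.abs().max() < convergence_tol:    # early stopping on small updates
            break
    return actions
```

On a simple quadratic cost this estimator behaves like noisy gradient descent, which is the behaviour the planner relies on.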
…coder

GCPPlanner now differentiates the cosine-similarity cost directly
w.r.t. the action sequence via torch.autograd.grad through the WM
decoder, rather than estimating gradients via the score-function trick.

Key changes:
- GCPPlanner.optimize: use torch.enable_grad() + autograd.grad instead
  of antithetic perturbation sampling; raise RuntimeError with clear
  diagnostics if cost has no grad_fn or grad is None (indicates wiring
  bug rather than silently returning unoptimized actions)
- MPPIPlanner.optimize: owns its own torch.no_grad() context (moved
  from @torch.no_grad on _plan_action_chunk) for cleaner context
  ownership per planner
- _plan_action_chunk: remove @torch.no_grad() decorator; each planner
  now manages its own gradient context
- lerobot_eval.py: switch select_action call from torch.inference_mode()
  to torch.no_grad() so GCPPlanner's torch.enable_grad() block can
  override it (inference_mode tensors are permanently non-differentiable
  and cannot be overridden by enable_grad)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
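The `inference_mode` vs `no_grad` distinction this commit hinges on is easy to demonstrate in isolation. The snippet below is a standalone illustration, not code from the PR:

```python
import torch

x = torch.ones(3, requires_grad=True)

# Inside no_grad, an inner enable_grad block re-enables graph construction...
with torch.no_grad():
    with torch.enable_grad():
        y = (x * 2).sum()       # y.requires_grad is True

# ...but inside inference_mode, enable_grad has no effect: results are
# inference tensors and are permanently non-differentiable.
with torch.inference_mode():
    with torch.enable_grad():
        w = (x * 2).sum()       # w.requires_grad is False
```

This is why the eval script must wrap `select_action` in `torch.no_grad()` rather than `torch.inference_mode()` when a gradient-based planner runs inside it.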
Algorithm identifier changes: "gcp" → "gbp", GCPPlanner → GBPPlanner,
and all associated docstring/comment references.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
feat(planning): latent-space test-time planning for ACT+AWM (MPPI & GBP)
…ogging

Major refactor of self-improvement utilities:

- Pretrain replay mixing: both finetune() and finetune_wm() mix online data
  with pretrain replay from lerobot/pusht at configurable ratios.
  finetune: 50% pretrain / 50% success, finetune_wm: 60% pretrain / 40% online.

- finetune_wm(): new WM-only finetuning function that freezes all params
  except WM decoder internals. Crucially freezes wm_cross_attn_proj so the
  WM decoder sees the same key-value space as during pretraining.

- finetune(): end-to-end training on success data + pretrain replay.
  Returns (checkpoint_path, new_global_step) for continuous step tracking.

- Comprehensive wandb logging: core loss curves, validation metrics on
  pretrain held-out set, representation health (norms, stds, effective rank).

- TrajectoryBuffer: added "failure_only" mode for WM training on
  suboptimal trajectories.

- Playground: scaffolding for eval → finetune → finetune_wm pipeline
  with wandb integration and configurable hyperparameters.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
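The fixed per-batch mixing ratios described above (e.g. 60% pretrain / 40% online for `finetune_wm`) amount to sampling a fixed number of items from each source per batch. A minimal sketch, with an illustrative function name:

```python
import random

def mixed_batch(pretrain, online, batch_size=8, pretrain_ratio=0.6):
    """Draw a fixed fraction of each batch from the pretrain replay pool
    and the remainder from online data, then shuffle the batch."""
    n_pre = int(batch_size * pretrain_ratio)
    batch = random.sample(pretrain, n_pre) + random.sample(online, batch_size - n_pre)
    random.shuffle(batch)
    return batch
```

Keeping the split fixed per batch (rather than sampling from a concatenated pool) guarantees every gradient step sees both distributions.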
- Use video_backend='pyav' for pretrain replay dataset (torchcodec
  FFmpeg libs not available on all nodes)
- Only include observation features that exist in the dataset
  (lerobot/pusht has no observation.environment_state)
- Set num_workers=0 for pretrain DataLoaders
- Use compute_inference.sh for eval jobs in smoke test

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- lerobot_train.py: cudnn_deterministic=True now sets all 4 flags:
  CUBLAS_WORKSPACE_CONFIG, cudnn.benchmark=False, allow_tf32=False,
  use_deterministic_algorithms(True)
- lerobot_eval.py: eval always runs in deterministic mode

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
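The four determinism settings listed above correspond to real PyTorch knobs; a sketch of applying them together (the wrapper function is illustrative, and the flags must be set before any CUDA work begins):

```python
import os
import torch

def enable_full_determinism(seed: int = 0) -> None:
    """Apply the determinism flags named in the commit above (sketch)."""
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"  # required by some cuBLAS ops
    torch.manual_seed(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False      # benchmark may pick non-deterministic kernels
    torch.backends.cuda.matmul.allow_tf32 = False
    torch.backends.cudnn.allow_tf32 = False
    torch.use_deterministic_algorithms(True)    # error out on non-deterministic ops
```

The ~10-20% slowdown mentioned earlier comes mostly from disabling benchmark mode and TF32 matmuls.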
Clean up main branch by deleting experimental policy directories
that are no longer needed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add mixing parameter ("naive" vs "ratio") to finetune/finetune_wm:
  naive concatenates online+pretrain data and samples uniformly,
  ratio keeps the fixed per-batch split.
- Add load_optimizer toggle to resume Adam momentum/variance from
  checkpoint, with name-based param matching across different
  param group layouts (pretrain 2-group vs finetune 1-group).
- Replace evaluate_final SLURM submission with inline GPU eval.
- Remove unused collect_eval_results and subprocess import.
- Fix checkpoint save layout: pretrained_model/ + training_state/
  as siblings, matching the pretrain checkpoint structure.
- Add output_dir parameter to prevent nested save paths across
  iterations.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
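The name-based optimizer-state matching mentioned above (carrying Adam moments across different param-group layouts) can be sketched as follows. This relies on the optimizer `state_dict` convention of indexing parameters in group order; the function is illustrative, not the PR's implementation.

```python
import torch
from torch import nn

def remap_adam_state(old_opt_state, old_named_params, new_opt, new_named_params):
    """Carry Adam exp_avg / exp_avg_sq buffers over by parameter *name*,
    so they survive a change in param-group layout (sketch)."""
    # state_dict indexes params 0..n-1 in flattened group order
    old_ids = [pid for g in old_opt_state["param_groups"] for pid in g["params"]]
    id_to_name = dict(zip(old_ids, [n for n, _ in old_named_params]))
    name_to_state = {id_to_name[i]: s for i, s in old_opt_state["state"].items()}
    # install matched buffers into the new optimizer, keyed by the live tensors
    for name, p in new_named_params:
        if name in name_to_state:
            new_opt.state[p] = name_to_state[name]
```

The sketch assumes `old_named_params` is ordered consistently with the old optimizer's flattened param groups.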
- Add EVAL_N_EPISODES, EVAL_USE_PLANNING, EVAL_PLANNING_ALGORITHM,
  EVAL_PLANNING_OVERRIDES config knobs to playground for full eval
  flexibility (BC-only, GBP, MPPI with custom params).
- Add planning_overrides parameter to eval_and_collect and
  evaluate_final, applied via setattr on PlanningConfig.
- Guard finetune on FINETUNE_STEPS > 0 and finetune_wm on
  FINETUNE_WM_STEPS > 0 so setting either to 0 cleanly skips it.
- Default FINETUNE_WM_STEPS to 0 (skip WM finetune by default).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
These modules were deleted in 8b0be16 but the imports were left
behind, causing ModuleNotFoundError on any policy load.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Completes cleanup started in 8b0be16 — all imports, config branches,
policy class lookups, and processor branches for the deleted modules
are now removed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
varungiridhar and others added 19 commits April 1, 2026 18:09
…mpat fields

- Remove all act_awm, awm, fawm imports and branches from factory.py
- Restore multi_stage, gripper_closed_threshold, gripper_closed_steps
  fields in PlanningConfig for checkpoint compatibility

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add image_resize, wm_visual_pool, wm_pool_size, log_wm_action_sensitivity
to ACTSimpleWithAWMHeadConfig so draccus can deserialize older checkpoints.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…tune

Adds bc_mask_mode param to finetune() with three modes:
- "none" (default): full BC+WM loss on all samples
- "failure": zero BC loss on failure episodes, full on successes
- "all": zero BC loss everywhere (WM-only through e2e pipeline)

Also adds FINETUNE_ONLINE_MODE to control which buffer episodes
enter e2e finetuning (success_only/all/failure_only).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
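The three `bc_mask_mode` behaviours reduce to multiplying the per-sample BC loss by a mask while the WM loss always applies. A hedged sketch (function and argument names are illustrative, not the actual signatures):

```python
import torch

def combined_loss(bc_loss_per_sample, wm_loss_per_sample, is_success, mode="none"):
    """Sketch of the three bc_mask_mode variants described above."""
    if mode == "none":
        mask = torch.ones_like(bc_loss_per_sample)           # full BC loss everywhere
    elif mode == "failure":
        mask = is_success.float()                            # zero BC loss on failures
    elif mode == "all":
        mask = torch.zeros_like(bc_loss_per_sample)          # WM-only training
    else:
        raise ValueError(f"unknown bc_mask_mode: {mode}")
    return (mask * bc_loss_per_sample).mean() + wm_loss_per_sample.mean()
```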
- Delete self_improvement_playground.py (old orchestrator with inline
  training loop) — replaced by self_improvement.py which delegates to
  lerobot-train --resume.
- Delete self_improvement_utils.py (1,348-line training/data library) —
  replaced by self_improvement_data.py + _FinetuneDataset in
  lerobot_train.py.
- Untrack prompt_run_and_eval.md (kept on disk, added to .gitignore).
- Also gitignore results_self_improvement_deterministic.tsv.

The 3-line bc_loss_mask check in modeling_act_simple_with_awm_head.py
is intentionally kept — the new pipeline reuses it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rain/eval

Refactors the self-improvement pipeline to reuse existing training and
evaluation infrastructure instead of maintaining a separate training loop.

New files:
- self_improvement.py: orchestrator that collects on-policy data via
  eval_policy, packages it as a LeRobotDataset, merges with pretrain
  data, and calls lerobot-train --resume for continued training.
- self_improvement_data.py: data packaging utilities — converts eval
  rollouts to LeRobotDataset, creates merged (pretrain + online)
  datasets on disk with bc_loss_mask support.

Changes to existing files:
- configs/train.py: add override_lr field for post-resume LR override.
- lerobot_train.py: apply override_lr after loading optimizer state.
- wandb_utils.py: gracefully handle missing wandb run on resume with
  new output_dir (self-improvement finetuning creates a new run).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Don't crash on import when sys.argv[1] is missing (allows importing
  eval_and_collect/run_finetune from other scripts).
- Add test_self_improvement_v2.py: full pipeline test (collect → package
  → merge → finetune via --resume). All tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Replace 7-min on-disk merge with instant _FinetuneDataset wrapper in
  lerobot_train.py. Uses --online_dataset_root to concatenate pretrain +
  online datasets at load time (zero data copying).
- _FinetuneDataset handles mismatched keys (next.reward etc.) and
  injects bc_loss_mask=1.0 for pretrain samples when needed.
- Add online_dataset_root field to TrainPipelineConfig.
- Remove dead merge code from self_improvement_data.py.
- Add EVAL_AVG_MAX_REWARD / EVAL_EP_S output lines for TSV logging.
- Update orchestrator docstring and run_finetune signature.
- Update test to use instant concat path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
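The instant-concat mechanism above (index into either dataset at load time, drop mismatched keys, inject `bc_loss_mask=1.0` for pretrain samples) can be illustrated with a toy wrapper. This is a sketch of the mechanism only, not the actual `_FinetuneDataset`:

```python
class ConcatFinetuneDataset:
    """Toy wrapper: concatenates two map-style datasets at load time
    with zero data copying, harmonizing their keys."""

    def __init__(self, pretrain, online):
        self.pretrain, self.online = pretrain, online
        # keep only keys both datasets share (e.g. drop next.reward),
        # plus the injected bc_loss_mask
        self.keys = set(pretrain[0]) & set(online[0]) | {"bc_loss_mask"}

    def __len__(self):
        return len(self.pretrain) + len(self.online)

    def __getitem__(self, i):
        if i < len(self.pretrain):
            item = dict(self.pretrain[i])
            item.setdefault("bc_loss_mask", 1.0)   # pretrain samples always train BC
        else:
            item = dict(self.online[i - len(self.pretrain)])
        return {k: v for k, v in item.items() if k in self.keys}
```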
…er fix)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…eckpoint)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Without this, LambdaLR.step() immediately overwrites the manual LR
override on the very first training step, making override_lr a no-op.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
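The failure mode this commit fixes is easy to reproduce: `LambdaLR.step()` recomputes each group's lr from `base_lrs`, so writing to `param_groups[0]["lr"]` alone is a no-op. A standalone sketch with a toy model (the `override_lr` helper is illustrative):

```python
import torch
from torch import nn

model = nn.Linear(2, 2)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
sched = torch.optim.lr_scheduler.LambdaLR(opt, lambda step: 1.0)

opt.param_groups[0]["lr"] = 5e-4   # naive override...
sched.step()                        # ...immediately overwritten back to 1e-3

def override_lr(optimizer, scheduler, new_lr):
    """Rebase the scheduler so the override survives subsequent step() calls."""
    scheduler.base_lrs = [new_lr] * len(scheduler.base_lrs)
    for g in optimizer.param_groups:
        g["lr"] = new_lr
```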
… runs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…d train integration

Adds a complete self-improvement pipeline that collects on-policy trajectories,
packages them into LeRobotDatasets, and finetunes via in-process train() calls.

Key changes:
- self_improvement.py: CLI-configured orchestrator (collect → package → train → eval)
  with multi-iteration support, data accumulation, dual BC/planning eval, and
  deterministic execution
- self_improvement_data.py: converts eval rollouts into LeRobotDatasets with
  optional bc_loss_mask for failure masking
- lerobot_train.py: _FinetuneDataset for instant pretrain+online concatenation,
  dataset= parameter for caller-provided datasets, LR override post-resume,
  parameter freezing via trainable_param_keywords, FinetuneDataset-aware sampler
- configs/train.py: override_lr, trainable_param_keywords, online_dataset_root
  fields; validate() supports in-process callers pre-setting checkpoint_path
- pyproject.toml: lerobot-self-improve entry point

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…r WM representations

SIGReg (Sketch Isotropic Gaussian Regularizer) encourages latent embeddings
to be approximately Gaussian-distributed via random projections and the
Epps-Pulley statistic, replacing the hard L2 norm constraint with a soft
regularization loss. Based on LeWorldModel (arXiv:2603.19312).

Made-with: Cursor
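The random-projection + Epps-Pulley idea above can be sketched roughly: project embeddings onto random unit directions and penalize the discrepancy between the empirical characteristic function of each 1-D projection and that of a standard Gaussian. The discretization of the Epps-Pulley integral below is a crude illustration under assumed evaluation points and weights, not the PR's implementation:

```python
import torch

def sigreg_sketch(z: torch.Tensor, n_proj: int = 16) -> torch.Tensor:
    """Rough sketch of a sketched-Gaussianity penalty on (n, d) embeddings."""
    n, d = z.shape
    dirs = torch.randn(d, n_proj)
    dirs = dirs / dirs.norm(dim=0, keepdim=True)   # random unit directions
    p = z @ dirs                                    # (n, n_proj) 1-D projections
    t = torch.linspace(-3.0, 3.0, 17)               # characteristic-fn eval points
    tp = p.unsqueeze(-1) * t                        # (n, n_proj, T)
    ecf_re, ecf_im = torch.cos(tp).mean(0), torch.sin(tp).mean(0)
    gauss_cf = torch.exp(-0.5 * t ** 2)             # CF of N(0, 1), real-valued
    sq_err = (ecf_re - gauss_cf) ** 2 + ecf_im ** 2
    weight = torch.exp(-0.5 * t ** 2)               # Gaussian weighting of the integrand
    return (sq_err * weight).sum(dim=-1).mean()
```

As a soft loss, this lets latent norms and shapes drift toward Gaussianity during training instead of clamping them with a hard L2 constraint.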
Integrates FastWAM (Wan2.2-TI2V-5B based VLA) into LeRobot as a
plug-and-play policy for LIBERO evaluation.

Key design decisions:
- __init__ always loads VAE + T5 text encoder from Wan2.2 pretrained;
  fine-tuned DiT weights come from model.safetensors via from_pretrained
- CausalConv3d.forward uses F.conv3d directly to avoid dispatch to
  slow_conv3d_forward (CPU-only kernel) on modern GPUs (H100/A100)
- FastWAMPolicy.to() syncs model.device (plain Python attr, not updated
  by nn.Module.to()) to keep VAE/text-encoder on the right device
- MIN_MAX normalization for state+action; IDENTITY for images (VAE
  normalises to [-1,1] internally)
- LiberoProcessorStep flips images 180° (LIBERO raw frames are upside-down)
- FastWAMGripperRemapStep: 1-2x then sign() maps checkpoint gripper
  convention to LIBERO convention
- libero_10 max_steps bumped from 520 → 700 to match original eval config

Evaluated at 80% success (1 episode/task) on libero_10 with
num_inference_steps=10 on H100.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
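The `.to()` pitfall the third bullet describes is general: `nn.Module.to()` moves parameters and buffers but never touches a plain Python attribute like `self.device`. A toy stand-in (not the FastWAM class) showing the sync pattern:

```python
import torch
from torch import nn

class FakePolicy(nn.Module):
    """Toy policy whose frozen sub-components are placed via self.device."""

    def __init__(self):
        super().__init__()
        self.net = nn.Linear(2, 2)
        self.device = torch.device("cpu")   # plain attr, ignored by nn.Module.to()

    def to(self, *args, **kwargs):
        out = super().to(*args, **kwargs)
        # re-derive the device from an actual parameter after the move
        out.device = next(out.parameters()).device
        return out
```

Without the override, moving the module to `cuda` would leave `self.device` stale and the VAE/text-encoder on the wrong device.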
Move FastWAM checkpoints from fastwam/ to awm/ shared directory:
  hf_checkpoint_minmax  → awm/fastwam_checkpoint
  wan22_weights         → awm/fastwam_wan22_weights

Hardcode defaults so users need zero env setup:
- modeling_fastwam.py: os.environ.setdefault("DIFFSYNTH_MODEL_BASE_PATH", ...)
  so Wan2.2 weights are found automatically on import
- configuration_fastwam.py: FASTWAM_CHECKPOINT_PATH and
  FASTWAM_WAN22_WEIGHTS_PATH constants pointing to awm/ locations

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add processor_fastwam.py and fix forward/backward training path.

processor_fastwam.py:
- make_fastwam_pre_post_processors: standard pre/post pipeline with
  MIN_MAX normalization for state+action, IDENTITY for images (VAE
  handles [-1,1] internally), device placement

modeling_fastwam.py:
- forward: unpack (loss, loss_dict) tuple from training_loss correctly
- forward: unsqueeze proprio (B,D) → (B,1,D) as build_inputs expects 3D
- _prepare_video_for_training: handle 4D images (n_obs_steps=1) by
  tiling single frame to T=5 (minimum valid T%4==1, T>1)

configuration_fastwam.py:
- observation_delta_indices: return list(range(n_obs_steps)) when
  n_obs_steps > 1 so datasets return multi-frame video tensors
- __post_init__: validate T%4==1 and chunk_size%(T-1)==0 for n_obs_steps>1

Verified: forward+backward with B=2, 4D images, loss=0.78,
1651 params with nonzero gradients (VAE/text-encoder frozen).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Integrates FastWAM policy into the lerobot pipeline alongside existing
ACT+AWM policy. Resolves trivial .gitignore conflict by keeping entries
from both branches.

Made-with: Cursor
…olicy

Adds variance and covariance regularization to the AWM latent space to
encourage isotropic Gaussian structure, improving world-model prediction
quality for planning.

Made-with: Cursor
@github-actions bot added labels on Apr 29, 2026: documentation (improvements or fixes to the project's docs), policies (items related to robot policies), tests, configuration, processor, evaluation.
3 participants