Incorporating SIGReg #3486
Open
dennisant wants to merge 49 commits into huggingface:main from
Conversation
…re state alignment
Simplified ACT
…the only mode Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
act_simple: remove sine pos embed option, make learned 2D embeddings …
Add AGENTS.md for AI coding agent guidance
…huggingface#3102)

Add a `cudnn_deterministic` flag to `TrainPipelineConfig` (default: False) that sets `torch.backends.cudnn.deterministic = True` and disables benchmark mode, eliminating CUDA floating-point non-determinism at the cost of ~10-20% training speed. When False (the default), the existing benchmark=True behaviour is preserved.
* add basic awm
* bugfix AR policy
* add testing for tokenizer
* add wandb log analysis help for claude
* add label smoothing
* add cosine learning schedule
* add basic world model
* add WM image decoder for debugging

  Adds a lightweight (~160K param) convolutional image decoder driven by the detached world model latent z_pred. Reconstructed images are logged alongside ground-truth next-state frames locally (wm_viz/step_*.png) and to wandb at eval_freq frequency.

  - WMImageDecoder: Linear → reshape → 5× stride-2 ConvTranspose2d → Tanh
  - decoder_loss (MSE, detached from main model) added to training objective
  - AWMPolicy.visualize() for on-demand GT/decoded pair generation
  - WandBLogger.log_images() for wandb image panel logging
  - _log_wm_visualizations() helper in training loop, fires at eval_freq

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* add determinism settings for cuda
* working latent state decoder
* world model bugfixes
* remove option to predict encoder output as world model prediction
* world model takes vision features as input
* EMA, cosine learning rate, and normalized MSE + variance reg loss
* Logging + extra training schedule parameters
* Flow action world models (policy is flow)
* fawm added to policies in lerobot
* Add SigLIP decoder policy (Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>)
* Add ACT simple with AWM head policy (Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>)
* Register act_simple_with_awm_head and fawm policies in factory (Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>)
* Add goal state generation script and fix vis_dir mkdir (Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>)
* Add print_task_details utility script (Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>)

---------

Signed-off-by: Varun Giridhar <32874672+varungiridhar@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: varungiridhar <varungiridhar21@gmail.com>
Co-authored-by: Varun Giridhar <32874672+varungiridhar@users.noreply.github.com>
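The decoder bullet above pins the architecture down well enough to sketch. A minimal sketch, assuming illustrative channel widths and a 96×96 output (neither is stated in the commit):

```python
# Sketch of the debug decoder described above: Linear -> reshape ->
# 5x stride-2 ConvTranspose2d -> Tanh, fed the *detached* latent z_pred.
# Channel widths and the 96x96 output size are illustrative assumptions.
import torch
from torch import nn

class WMImageDecoderSketch(nn.Module):
    def __init__(self, latent_dim: int = 256):
        super().__init__()
        self.fc = nn.Linear(latent_dim, 64 * 3 * 3)  # seed a 3x3 feature map
        chans = [64, 64, 32, 32, 16, 3]
        layers: list[nn.Module] = []
        for i, (c_in, c_out) in enumerate(zip(chans[:-1], chans[1:])):
            layers.append(nn.ConvTranspose2d(c_in, c_out, 4, stride=2, padding=1))
            layers.append(nn.Tanh() if i == 4 else nn.ReLU())  # Tanh only on the last layer
        self.deconv = nn.Sequential(*layers)

    def forward(self, z_pred: torch.Tensor) -> torch.Tensor:
        # Detach so decoder_loss (MSE) never backprops into the world model.
        h = self.fc(z_pred.detach()).view(-1, 64, 3, 3)
        return self.deconv(h)  # (B, 3, 96, 96) in [-1, 1]
```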
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… gcp

- Add GCPPlanner: Monte Carlo score-function gradient estimator that iteratively refines action trajectories via gradient descent on the latent cosine-similarity cost
- Antithetic perturbation pairs for variance reduction
- Early stopping when max abs action change < convergence_tol
- New PlanningConfig fields: lr, lr_decay, convergence_tol, antithetic
- Register "gcp" in make_planner(); "mppi" unchanged
- Rename MCGradPlanner → GCPPlanner, "mcgrad" → "gcp" throughout

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
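For reference, roughly what one antithetic score-function refinement step looks like; `cost_fn` and all names and defaults here are assumptions, not the actual GCPPlanner API:

```python
# One refinement step of a Monte Carlo score-function gradient estimator
# with antithetic perturbation pairs. cost_fn stands in for the latent
# cosine-similarity cost; names and defaults are illustrative.
import torch

def score_function_step(actions, cost_fn, n_pairs=32, sigma=0.1,
                        lr=0.05, convergence_tol=1e-4):
    """actions: (T, A) action sequence. Returns (new_actions, converged)."""
    eps = torch.randn(n_pairs, *actions.shape) * sigma
    c_plus = torch.stack([cost_fn(actions + e) for e in eps])
    c_minus = torch.stack([cost_fn(actions - e) for e in eps])
    # Antithetic estimator: E[(c(a+e) - c(a-e)) / (2 sigma^2) * e] ~ grad c(a).
    weights = (c_plus - c_minus).view(-1, 1, 1) / (2 * sigma**2)
    grad = (weights * eps).mean(dim=0)
    new_actions = actions - lr * grad
    # Early stopping criterion from the commit: max abs action change < tol.
    converged = (new_actions - actions).abs().max().item() < convergence_tol
    return new_actions, converged
```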
Test-time latent planning (MPPI + GCP) for ACT+AWM policy
…coder

GCPPlanner now differentiates the cosine-similarity cost directly w.r.t. the action sequence via torch.autograd.grad through the WM decoder, rather than estimating gradients via the score-function trick.

Key changes:
- GCPPlanner.optimize: use torch.enable_grad() + autograd.grad instead of antithetic perturbation sampling; raise RuntimeError with clear diagnostics if cost has no grad_fn or grad is None (indicates a wiring bug rather than silently returning unoptimized actions)
- MPPIPlanner.optimize: owns its own torch.no_grad() context (moved from @torch.no_grad on _plan_action_chunk) for cleaner context ownership per planner
- _plan_action_chunk: remove @torch.no_grad() decorator; each planner now manages its own gradient context
- lerobot_eval.py: switch the select_action call from torch.inference_mode() to torch.no_grad() so GCPPlanner's torch.enable_grad() block can override it (inference_mode tensors are permanently non-differentiable and cannot be overridden by enable_grad)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
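The inference_mode caveat in the last bullet is easy to trip over; a self-contained repro of the asymmetry (illustrative, not lerobot code):

```python
# Why lerobot_eval.py had to switch to no_grad: enable_grad() can
# re-enable autograd inside no_grad(), but not inside inference_mode().
import torch

x = torch.ones(3, requires_grad=True)

with torch.no_grad():
    with torch.enable_grad():
        y = (x * 2).sum()              # grad_fn is recorded here
        (g,) = torch.autograd.grad(y, x)
        assert torch.equal(g, torch.full((3,), 2.0))

with torch.inference_mode():
    with torch.enable_grad():
        y = (x * 2).sum()
        # y is an inference tensor: it has no grad_fn even under
        # enable_grad(), so torch.autograd.grad(y, x) would raise.
        assert y.grad_fn is None
```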
Algorithm identifier changes: "gcp" → "gbp", GCPPlanner → GBPPlanner, and all associated docstring/comment references. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
feat(planning): latent-space test-time planning for ACT+AWM (MPPI & GBP)
…ogging

Major refactor of self-improvement utilities:

- Pretrain replay mixing: both finetune() and finetune_wm() mix online data with pretrain replay from lerobot/pusht at configurable ratios. finetune: 50% pretrain / 50% success; finetune_wm: 60% pretrain / 40% online.
- finetune_wm(): new WM-only finetuning function that freezes all params except WM decoder internals. Crucially freezes wm_cross_attn_proj so the WM decoder sees the same key-value space as during pretraining.
- finetune(): end-to-end training on success data + pretrain replay. Returns (checkpoint_path, new_global_step) for continuous step tracking.
- Comprehensive wandb logging: core loss curves, validation metrics on a pretrain held-out set, representation health (norms, stds, effective rank).
- TrajectoryBuffer: added "failure_only" mode for WM training on suboptimal trajectories.
- Playground: scaffolding for the eval → finetune → finetune_wm pipeline with wandb integration and configurable hyperparameters.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Use video_backend='pyav' for the pretrain replay dataset (torchcodec FFmpeg libs not available on all nodes)
- Only include observation features that exist in the dataset (lerobot/pusht has no observation.environment_state)
- Set num_workers=0 for pretrain DataLoaders
- Use compute_inference.sh for eval jobs in the smoke test

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- lerobot_train.py: cudnn_deterministic=True now sets all 4 flags: CUBLAS_WORKSPACE_CONFIG, cudnn.benchmark=False, allow_tf32=False, use_deterministic_algorithms(True)
- lerobot_eval.py: eval always runs in deterministic mode

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
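What the four settings amount to in plain PyTorch (the helper name is illustrative, not the lerobot API):

```python
# The four determinism settings named above, as plain PyTorch calls.
import os
import torch

def set_deterministic() -> None:
    # cuBLAS requires this env var for deterministic GEMMs on CUDA >= 10.2.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
    torch.backends.cudnn.benchmark = False
    torch.backends.cuda.matmul.allow_tf32 = False
    torch.backends.cudnn.allow_tf32 = False
    # Raises on any op that has no deterministic implementation.
    torch.use_deterministic_algorithms(True)
```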
Clean up main branch by deleting experimental policy directories that are no longer needed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add mixing parameter ("naive" vs "ratio") to finetune/finetune_wm:
  naive concatenates online+pretrain data and samples uniformly,
  ratio keeps the fixed per-batch split (see the sketch after this list).
- Add load_optimizer toggle to resume Adam momentum/variance from
checkpoint, with name-based param matching across different
param group layouts (pretrain 2-group vs finetune 1-group).
- Replace evaluate_final SLURM submission with inline GPU eval.
- Remove unused collect_eval_results and subprocess import.
- Fix checkpoint save layout: pretrained_model/ + training_state/
as siblings, matching the pretrain checkpoint structure.
- Add output_dir parameter to prevent nested save paths across
iterations.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
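A sketch of the two mixing modes; names and signatures are assumptions, not the actual finetune()/finetune_wm() code:

```python
# Illustrative sketch of "naive" vs "ratio" pretrain-replay mixing.
from torch.utils.data import ConcatDataset, DataLoader, Dataset

def make_mixed_loaders(online: Dataset, pretrain: Dataset, mode: str,
                       batch_size: int = 64, pretrain_frac: float = 0.5):
    if mode == "naive":
        # Concatenate and sample uniformly: the realized per-batch split
        # drifts with the relative sizes of the two datasets.
        return DataLoader(ConcatDataset([online, pretrain]),
                          batch_size=batch_size, shuffle=True)
    if mode == "ratio":
        # Fixed per-batch split: draw n_pre pretrain samples and the rest
        # online, then concatenate the two sub-batches in the training loop.
        n_pre = int(batch_size * pretrain_frac)
        return (DataLoader(pretrain, batch_size=n_pre, shuffle=True),
                DataLoader(online, batch_size=batch_size - n_pre, shuffle=True))
    raise ValueError(f"unknown mixing mode: {mode}")
```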
- Add EVAL_N_EPISODES, EVAL_USE_PLANNING, EVAL_PLANNING_ALGORITHM, EVAL_PLANNING_OVERRIDES config knobs to playground for full eval flexibility (BC-only, GBP, MPPI with custom params).
- Add planning_overrides parameter to eval_and_collect and evaluate_final, applied via setattr on PlanningConfig.
- Guard finetune on FINETUNE_STEPS > 0 and finetune_wm on FINETUNE_WM_STEPS > 0 so setting either to 0 cleanly skips it.
- Default FINETUNE_WM_STEPS to 0 (skip WM finetune by default).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
These modules were deleted in 8b0be16 but the imports were left behind, causing ModuleNotFoundError on any policy load. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Completes cleanup started in 8b0be16 — all imports, config branches, policy class lookups, and processor branches for the deleted modules are now removed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…mpat fields

- Remove all act_awm, awm, fawm imports and branches from factory.py
- Restore multi_stage, gripper_closed_threshold, gripper_closed_steps fields in PlanningConfig for checkpoint compatibility

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add image_resize, wm_visual_pool, wm_pool_size, log_wm_action_sensitivity to ACTSimpleWithAWMHeadConfig so draccus can deserialize older checkpoints. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…tune

Adds bc_mask_mode param to finetune() with three modes:

- "none" (default): full BC+WM loss on all samples
- "failure": zero BC loss on failure episodes, full on successes
- "all": zero BC loss everywhere (WM-only through the e2e pipeline)

Also adds FINETUNE_ONLINE_MODE to control which buffer episodes enter e2e finetuning (success_only/all/failure_only).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
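A minimal sketch of the three modes as a per-sample BC-loss multiplier; tensor names are assumptions, not the finetune() internals:

```python
# Per-sample multiplier applied to the BC loss; the WM loss stays unmasked.
import torch

def bc_loss_mask(is_failure: torch.Tensor, mode: str) -> torch.Tensor:
    """is_failure: (B,) bool tensor marking failure-episode samples."""
    if mode == "none":      # full BC+WM loss on all samples
        return torch.ones_like(is_failure, dtype=torch.float)
    if mode == "failure":   # zero BC loss on failures, full on successes
        return (~is_failure).float()
    if mode == "all":       # WM-only: zero BC loss everywhere
        return torch.zeros_like(is_failure, dtype=torch.float)
    raise ValueError(f"unknown bc_mask_mode: {mode}")
```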
- Delete self_improvement_playground.py (old orchestrator with inline training loop) — replaced by self_improvement.py, which delegates to lerobot-train --resume.
- Delete self_improvement_utils.py (1,348-line training/data library) — replaced by self_improvement_data.py + _FinetuneDataset in lerobot_train.py.
- Untrack prompt_run_and_eval.md (kept on disk, added to .gitignore).
- Also gitignore results_self_improvement_deterministic.tsv.

The 3-line bc_loss_mask check in modeling_act_simple_with_awm_head.py is intentionally kept — the new pipeline reuses it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rain/eval

Refactors the self-improvement pipeline to reuse existing training and evaluation infrastructure instead of maintaining a separate training loop.

New files:
- self_improvement.py: orchestrator that collects on-policy data via eval_policy, packages it as a LeRobotDataset, merges it with pretrain data, and calls lerobot-train --resume for continued training.
- self_improvement_data.py: data packaging utilities — converts eval rollouts to LeRobotDataset, creates merged (pretrain + online) datasets on disk with bc_loss_mask support.

Changes to existing files:
- configs/train.py: add override_lr field for post-resume LR override.
- lerobot_train.py: apply override_lr after loading optimizer state.
- wandb_utils.py: gracefully handle a missing wandb run on resume with a new output_dir (self-improvement finetuning creates a new run).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Don't crash on import when sys.argv[1] is missing (allows importing eval_and_collect/run_finetune from other scripts).
- Add test_self_improvement_v2.py: full pipeline test (collect → package → merge → finetune via --resume). All tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Replace the 7-min on-disk merge with an instant _FinetuneDataset wrapper in lerobot_train.py. Uses --online_dataset_root to concatenate pretrain + online datasets at load time (zero data copying).
- _FinetuneDataset handles mismatched keys (next.reward etc.) and injects bc_loss_mask=1.0 for pretrain samples when needed.
- Add online_dataset_root field to TrainPipelineConfig.
- Remove dead merge code from self_improvement_data.py.
- Add EVAL_AVG_MAX_REWARD / EVAL_EP_S output lines for TSV logging.
- Update orchestrator docstring and run_finetune signature.
- Update test to use the instant concat path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
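The wrapper idea in miniature; a sketch, with _FinetuneDataset internals beyond what the commit states treated as assumptions:

```python
# Instant pretrain+online concat: no data is copied or re-encoded.
from torch.utils.data import Dataset

class FinetuneDatasetSketch(Dataset):
    def __init__(self, pretrain: Dataset, online: Dataset):
        self.pretrain, self.online = pretrain, online

    def __len__(self) -> int:
        return len(self.pretrain) + len(self.online)

    def __getitem__(self, i: int) -> dict:
        if i < len(self.pretrain):
            item = dict(self.pretrain[i])
            # Pretrain samples get full BC loss when the online set uses masks.
            item.setdefault("bc_loss_mask", 1.0)
        else:
            item = dict(self.online[i - len(self.pretrain)])
            # Drop keys absent from the pretrain schema, e.g. next.reward.
            item.pop("next.reward", None)
        return item
```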
…er fix) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…eckpoint) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Without this, LambdaLR.step() immediately overwrites the manual LR override on the very first training step, making override_lr a no-op. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
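A minimal repro of that failure mode, with one way to make the override stick (whether the actual fix does exactly this is not stated here):

```python
# LambdaLR recomputes lr from base_lrs on every step(), so writing
# param_group["lr"] directly survives at most one step.
import torch

p = torch.nn.Parameter(torch.zeros(1))
opt = torch.optim.Adam([p], lr=1e-4)
sched = torch.optim.lr_scheduler.LambdaLR(opt, lambda step: 1.0)

opt.param_groups[0]["lr"] = 1e-5      # manual override
sched.step()
print(opt.param_groups[0]["lr"])      # 1e-4: override silently undone

sched.base_lrs = [1e-5]               # overriding base_lrs persists
sched.step()
print(opt.param_groups[0]["lr"])      # 1e-5
```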
… runs Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…d train integration

Adds a complete self-improvement pipeline that collects on-policy trajectories, packages them into LeRobotDatasets, and finetunes via in-process train() calls.

Key changes:
- self_improvement.py: CLI-configured orchestrator (collect → package → train → eval) with multi-iteration support, data accumulation, dual BC/planning eval, and deterministic execution
- self_improvement_data.py: converts eval rollouts into LeRobotDatasets with optional bc_loss_mask for failure masking
- lerobot_train.py: _FinetuneDataset for instant pretrain+online concatenation, dataset= parameter for caller-provided datasets, LR override post-resume, parameter freezing via trainable_param_keywords, FinetuneDataset-aware sampler
- configs/train.py: override_lr, trainable_param_keywords, online_dataset_root fields; validate() supports in-process callers pre-setting checkpoint_path
- pyproject.toml: lerobot-self-improve entry point

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…r WM representations

SIGReg (Sketch Isotropic Gaussian Regularizer) encourages latent embeddings to be approximately Gaussian-distributed via random projections and the Epps-Pulley statistic, replacing the hard L2 norm constraint with a soft regularization loss. Based on LeWorldModel (arXiv:2603.19312).

Made-with: Cursor
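A minimal sketch of a SIGReg-style loss under one reading of that description: project latents onto random unit directions and penalize an Epps-Pulley-style gap between each 1-D projection's empirical characteristic function and that of N(0, 1). The frequency grid, weighting, and constants are illustrative, not the paper's exact form:

```python
# SIGReg-style soft Gaussianity penalty on (B, D) latents.
import math
import torch

def sigreg_sketch(z: torch.Tensor, n_proj: int = 64, n_t: int = 17) -> torch.Tensor:
    B, D = z.shape
    u = torch.randn(n_proj, D, device=z.device)
    u = u / u.norm(dim=1, keepdim=True)              # random unit directions
    x = z @ u.T                                      # (B, n_proj) projections
    t = torch.linspace(-4, 4, n_t, device=z.device)  # frequency grid
    tx = x.unsqueeze(-1) * t                         # (B, n_proj, n_t)
    ecf_re, ecf_im = tx.cos().mean(0), tx.sin().mean(0)
    gauss_cf = torch.exp(-0.5 * t**2)                # CF of N(0, 1), real-valued
    # Epps-Pulley-style statistic: weighted L2 gap between the empirical
    # CF of each projection and the standard-Gaussian CF.
    w = torch.exp(-0.5 * t**2) / math.sqrt(2 * math.pi)
    gap = (ecf_re - gauss_cf) ** 2 + ecf_im**2       # (n_proj, n_t)
    return (gap * w).sum(-1).mean()
```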
Integrates FastWAM (Wan2.2-TI2V-5B based VLA) into LeRobot as a plug-and-play policy for LIBERO evaluation.

Key design decisions:
- __init__ always loads VAE + T5 text encoder from Wan2.2 pretrained; fine-tuned DiT weights come from model.safetensors via from_pretrained
- CausalConv3d.forward uses F.conv3d directly to avoid dispatch to slow_conv3d_forward (a CPU-only kernel) on modern GPUs (H100/A100)
- FastWAMPolicy.to() syncs model.device (a plain Python attr, not updated by nn.Module.to()) to keep the VAE/text encoder on the right device
- MIN_MAX normalization for state+action; IDENTITY for images (the VAE normalises to [-1,1] internally)
- LiberoProcessorStep flips images 180° (LIBERO raw frames are upside-down)
- FastWAMGripperRemapStep: 1-2x then sign() maps the checkpoint gripper convention to the LIBERO convention
- libero_10 max_steps bumped from 520 → 700 to match the original eval config

Evaluated at 80% success (1 episode/task) on libero_10 with num_inference_steps=10 on H100.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
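The gripper remap is compact enough to sketch; assuming the gripper occupies the last action dimension (not stated above):

```python
# Sketch of FastWAMGripperRemapStep's mapping: 1 - 2x, then sign().
import torch

def gripper_remap_sketch(action: torch.Tensor) -> torch.Tensor:
    """action: (..., A); the gripper-last assumption is illustrative."""
    action = action.clone()
    action[..., -1] = torch.sign(1.0 - 2.0 * action[..., -1])
    return action
```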
Move FastWAM checkpoints from fastwam/ to awm/ shared directory:
hf_checkpoint_minmax → awm/fastwam_checkpoint
wan22_weights → awm/fastwam_wan22_weights
Hardcode defaults so users need zero env setup:
- modeling_fastwam.py: os.environ.setdefault("DIFFSYNTH_MODEL_BASE_PATH", ...)
so Wan2.2 weights are found automatically on import
- configuration_fastwam.py: FASTWAM_CHECKPOINT_PATH and
FASTWAM_WAN22_WEIGHTS_PATH constants pointing to awm/ locations
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add processor_fastwam.py and fix the forward/backward training path.

processor_fastwam.py:
- make_fastwam_pre_post_processors: standard pre/post pipeline with MIN_MAX normalization for state+action, IDENTITY for images (the VAE handles [-1,1] internally), device placement

modeling_fastwam.py:
- forward: unpack the (loss, loss_dict) tuple from training_loss correctly
- forward: unsqueeze proprio (B,D) → (B,1,D) as build_inputs expects 3D
- _prepare_video_for_training: handle 4D images (n_obs_steps=1) by tiling the single frame to T=5 (the minimum valid T with T%4==1 and T>1)

configuration_fastwam.py:
- observation_delta_indices: return list(range(n_obs_steps)) when n_obs_steps > 1 so datasets return multi-frame video tensors
- __post_init__: validate T%4==1 and chunk_size%(T-1)==0 for n_obs_steps>1

Verified: forward+backward with B=2, 4D images, loss=0.78, 1651 params with nonzero gradients (VAE/text encoder frozen).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
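A sketch of the n_obs_steps=1 tiling path; the function name and shapes are assumptions (the real _prepare_video_for_training signature may differ):

```python
# Tile a single observation frame to the minimum valid clip length for
# the Wan2.2 VAE: T % 4 == 1 and T > 1 gives T = 5.
import torch

def prepare_video_sketch(images: torch.Tensor) -> torch.Tensor:
    """images: (B, C, H, W) single-frame batch -> (B, C, T=5, H, W)."""
    assert images.dim() == 4, "expects a 4D single-frame batch"
    video = images.unsqueeze(2).repeat(1, 1, 5, 1, 1)
    T = video.shape[2]
    assert T % 4 == 1 and T > 1   # temporal constraint from the config check
    return video
```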
Integrates FastWAM policy into the lerobot pipeline alongside the existing ACT+AWM policy. Resolves a trivial .gitignore conflict by keeping entries from both branches.

Made-with: Cursor
…olicy

Adds variance and covariance regularization to the AWM latent space to encourage isotropic Gaussian structure, improving world-model prediction quality for planning.

Made-with: Cursor
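A minimal VICReg-style variance + covariance penalty of the kind this commit describes; the coefficients and epsilon are illustrative, not the actual values:

```python
# Variance term: hinge pushing each latent dimension's std toward 1.
# Covariance term: penalize off-diagonal entries of the latent covariance.
import torch

def var_cov_reg(z: torch.Tensor, var_weight: float = 1.0,
                cov_weight: float = 0.04) -> torch.Tensor:
    """z: (B, D) latents. Encourages unit variance and zero covariance."""
    z = z - z.mean(dim=0)
    std = torch.sqrt(z.var(dim=0) + 1e-4)
    var_loss = torch.relu(1.0 - std).mean()
    cov = (z.T @ z) / (z.shape[0] - 1)               # (D, D)
    off_diag = cov - torch.diag(torch.diag(cov))
    cov_loss = off_diag.pow(2).sum() / z.shape[1]
    return var_weight * var_loss + cov_weight * cov_loss
```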
use_sigreg and sigreg_weight config flags (disabled by default)
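A hypothetical shape for these flags (names from the text; their placement and the default weight are assumptions):

```python
# Illustrative config fragment; the real flags live in the policy config.
from dataclasses import dataclass

@dataclass
class WMRegularizationConfigSketch:
    use_sigreg: bool = False    # disabled by default, per the PR
    sigreg_weight: float = 1.0  # scale on the SIGReg loss when enabled
```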