
Model Training


1. ACT

Below is a training example for Task 1 using the ACT policy, with full hyperparameters:


# Minimal training command
/isaac-sim/python.sh src/lerobot/scripts/lerobot_train.py \
--dataset.repo_id=your_org/your_dataset \
--dataset.video_backend=pyav \
--policy.type=act \
--output_dir=challenge2026_baseline/Part_Sorting/act \
--dataset.root=datasets/Part_Sorting/ \
--job_name=part_sorting_act \
--policy.device=cuda \
--wandb.enable=false \
--policy.repo_id=none \
--policy.push_to_hub=false


# Detailed training command (task1)
/isaac-sim/python.sh src/lerobot/scripts/lerobot_train.py \
--dataset.repo_id=your_org/your_dataset \
--dataset.root=datasets/Part_Sorting/ \
--dataset.video_backend=pyav \
--policy.type=act \
--policy.n_obs_steps=1 \
--policy.chunk_size=50 \
--policy.n_action_steps=50 \
--policy.vision_backbone=resnet18 \
--policy.pretrained_backbone_weights=ResNet18_Weights.IMAGENET1K_V1 \
--policy.dim_model=256 \
--policy.n_heads=4 \
--policy.dim_feedforward=1024 \
--policy.n_encoder_layers=4 \
--policy.n_decoder_layers=1 \
--policy.use_vae=true \
--policy.latent_dim=32 \
--policy.n_vae_encoder_layers=4 \
--policy.dropout=0.1 \
--policy.kl_weight=10.0 \
--policy.optimizer_lr=1e-5 \
--policy.optimizer_weight_decay=1e-4 \
--policy.optimizer_lr_backbone=1e-5 \
--policy.device=cuda \
--policy.use_amp=true \
--policy.push_to_hub=false \
--output_dir=challenge2026_baseline/Part_Sorting/act \
--job_name=part_sorting_act \
--resume=false \
--seed=1000 \
--num_workers=8 \
--batch_size=8 \
--steps=100000 \
--eval_freq=0 \
--log_freq=200 \
--save_checkpoint=true \
--save_freq=5000 \
--wandb.entity=your_wandb_entity

Replace your_org/your_dataset with your own dataset repo ID, challenge2026_baseline/Part_Sorting/act with your own output path, and your_wandb_entity with your WandB username or team name. If you don't use WandB, remove the --wandb.entity argument. On first run, ACT training downloads the ImageNet-pretrained ResNet-18 weights file resnet18-f37072fd.pth.

Dataset & output arguments

| Argument | Description | Default / Notes |
|---|---|---|
| dataset.repo_id | Dataset ID (Hugging Face or local org name) | Required |
| dataset.root | Local dataset root path | Required |
| output_dir | Directory to save checkpoints and logs | Required |
| job_name | Run identifier (shown in logs / WandB) | Optional |
| resume | Resume training from the last checkpoint | false |
| seed | Global random seed | 1000 |

Training loop arguments

| Argument | Description | Default / Notes |
|---|---|---|
| steps | Total training steps | 100000 |
| batch_size | Batch size | 8 |
| num_workers | DataLoader worker processes | 8 |
| eval_freq | Evaluation interval (0 disables) | 0 |
| log_freq | Log print interval | 200 |
| save_checkpoint | Whether to save checkpoints | true |
| save_freq | Checkpoint save interval (steps) | 5000 |
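
As a quick sanity check on the training-loop settings, the arithmetic below works out how many samples, checkpoints, and log lines the detailed ACT command implies. It is only a sketch: it assumes one checkpoint is written every save_freq steps and one log line every log_freq steps, and whether the script writes an extra final checkpoint is not covered here.

```python
# Values copied from the detailed ACT command.
steps = 100_000
batch_size = 8
save_freq = 5_000
log_freq = 200

# Samples drawn over the whole run (not unique samples; the dataset is revisited).
samples_seen = steps * batch_size

# Assumption: one checkpoint every save_freq steps, one log line every log_freq steps.
num_checkpoints = steps // save_freq
num_log_lines = steps // log_freq

print(samples_seen, num_checkpoints, num_log_lines)  # 800000 20 500
```

With save_freq=5000, expect roughly 20 checkpoint directories under output_dir, which is worth checking against available disk space before launching larger policies.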

ACT policy arguments

| Argument | Description | Default / Notes |
|---|---|---|
| policy.type | Policy algorithm type | act / pi0 |
| policy.device | Device | cuda / cpu |
| policy.use_amp | Enable mixed-precision training | true |
| policy.n_obs_steps | Number of observation steps | 1 |
| policy.chunk_size | Action chunk length | 50 |
| policy.n_action_steps | Action steps executed per inference | 50 |
| policy.vision_backbone | Vision encoder backbone | resnet18 |
| policy.pretrained_backbone_weights | Backbone pretrained weights | ResNet18_Weights.IMAGENET1K_V1 |
| policy.dim_model | Transformer model dimension | 256 |
| policy.n_heads | Number of attention heads | 4 |
| policy.dim_feedforward | Feed-forward dimension | 1024 |
| policy.n_encoder_layers | Encoder layers | 4 |
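
The relationship between chunk_size and n_action_steps can be illustrated with a small sketch. It assumes the policy predicts a chunk of chunk_size actions, executes the first n_action_steps of them, and only then runs inference again (no temporal ensembling). The helper names are hypothetical and are not part of the training script.

```python
def check_act_chunking(chunk_size: int, n_action_steps: int) -> None:
    """Hypothetical check: executed steps cannot exceed the predicted chunk."""
    if n_action_steps > chunk_size:
        raise ValueError("n_action_steps must be <= chunk_size")


def policy_query_steps(episode_len: int, n_action_steps: int) -> list[int]:
    """Environment steps at which a new action chunk would be predicted."""
    return list(range(0, episode_len, n_action_steps))


check_act_chunking(chunk_size=50, n_action_steps=50)
queries = policy_query_steps(episode_len=200, n_action_steps=50)
print(queries)  # [0, 50, 100, 150]
```

Lowering n_action_steps below chunk_size makes the policy re-plan more often at the cost of more inference calls per episode.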

2. Diffusion Policy (DP)

Below is a training example using Diffusion Policy, with full hyperparameters:

/isaac-sim/python.sh src/lerobot/scripts/lerobot_train.py \
--dataset.repo_id=your_org/your_dataset \
--dataset.root=datasets/Part_Sorting \
--dataset.video_backend=pyav \
--output_dir=challenge2026_baseline/Part_Sorting/diffusion \
--policy.repo_id=none \
--policy.type=diffusion \
--policy.n_obs_steps=2 \
--policy.horizon=16 \
--policy.n_action_steps=8 \
--policy.vision_backbone=resnet18 \
--policy.pretrained_backbone_weights=null \
--policy.resize_shape=null \
--policy.crop_ratio=1.0 \
--policy.crop_shape=null \
--policy.crop_is_random=true \
--policy.use_group_norm=true \
--policy.spatial_softmax_num_keypoints=32 \
--policy.use_separate_rgb_encoder_per_camera=false \
--policy.down_dims='[512,1024,2048]' \
--policy.kernel_size=5 \
--policy.n_groups=8 \
--policy.diffusion_step_embed_dim=128 \
--policy.use_film_scale_modulation=true \
--policy.noise_scheduler_type=DDPM \
--policy.num_train_timesteps=100 \
--policy.beta_schedule=squaredcos_cap_v2 \
--policy.beta_start=0.0001 \
--policy.beta_end=0.02 \
--policy.prediction_type=epsilon \
--policy.clip_sample=true \
--policy.clip_sample_range=1.0 \
--policy.num_inference_steps=null \
--policy.compile_model=false \
--policy.compile_mode=reduce-overhead \
--policy.do_mask_loss_for_padding=false \
--policy.optimizer_lr=1e-4 \
--policy.optimizer_betas='[0.95,0.999]' \
--policy.optimizer_eps=1e-8 \
--policy.optimizer_weight_decay=1e-6 \
--policy.scheduler_name=cosine \
--policy.scheduler_warmup_steps=500 \
--job_name=part_sorting_diffusion \
--resume=false \
--seed=1000 \
--num_workers=8 \
--batch_size=32 \
--steps=100000 \
--eval_freq=0 \
--log_freq=200 \
--save_checkpoint=true \
--save_freq=5000

Replace your_org/your_dataset with your own dataset repo ID, and replace challenge2026_baseline/Part_Sorting/diffusion with your own output path.

Diffusion Policy arguments - input/output structure

| Argument | Description | Default / Notes |
|---|---|---|
| policy.type | Policy algorithm type | diffusion |
| policy.n_obs_steps | Number of observation steps | 2 |
| policy.horizon | Action prediction horizon | 16 |
| policy.n_action_steps | Action steps executed per inference | 8 |
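
The three values above interact: the policy predicts horizon actions, but only a window of n_action_steps is executed. The sketch below assumes the common Diffusion Policy convention in which the executed window starts at the current observation, that is at index n_obs_steps - 1 of the predicted sequence. The function name is hypothetical.

```python
def executed_action_window(n_obs_steps: int, horizon: int, n_action_steps: int) -> tuple[int, int]:
    """Return the [start, end) slice of the predicted horizon that gets executed.

    Assumption: the predicted sequence is aligned so that index n_obs_steps - 1
    corresponds to the current time step.
    """
    start = n_obs_steps - 1
    end = start + n_action_steps
    if end > horizon:
        raise ValueError("n_action_steps must be <= horizon - n_obs_steps + 1")
    return start, end


window = executed_action_window(n_obs_steps=2, horizon=16, n_action_steps=8)
print(window)  # (1, 9)
```

Under that assumption, n_action_steps can be at most horizon - n_obs_steps + 1, which is 15 for the values in the command.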

Diffusion Policy arguments - vision backbone

| Argument | Description | Default / Notes |
|---|---|---|
| policy.vision_backbone | Vision encoder backbone | resnet18 |
| policy.pretrained_backbone_weights | Backbone pretrained weights | null |
| policy.resize_shape | Resize shape (H, W) | null |
| policy.crop_ratio | Crop ratio, in (0, 1] | 1.0 |
| policy.crop_shape | Crop shape (H, W) | null |
| policy.crop_is_random | Random crop (during training) | true |
| policy.use_group_norm | Use GroupNorm instead of BN | true |
| policy.spatial_softmax_num_keypoints | Number of SpatialSoftmax keypoints | 32 |
| policy.use_separate_rgb_encoder_per_camera | Separate encoder per camera | false |

Diffusion Policy arguments - UNet architecture

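| Argument | Description | Default / Notes |
|---|---|---|
| policy.down_dims | UNet downsampling dims | [512,1024,2048] |
| policy.kernel_size | Convolution kernel size | 5 |
| policy.n_groups | GroupNorm groups | 8 |
| policy.diffusion_step_embed_dim | Diffusion step embedding dim | 128 |
| policy.use_film_scale_modulation | Use FiLM scale modulation | true |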
policy.use_film_scale_modulationUse FiLM scale modulationtrue

Diffusion Policy arguments - noise scheduler

| Argument | Description | Default / Notes |
|---|---|---|
| policy.noise_scheduler_type | Scheduler type | DDPM / DDIM |
| policy.num_train_timesteps | Diffusion steps (train) | 100 |
| policy.beta_schedule | Beta schedule | squaredcos_cap_v2 |
| policy.beta_start | Beta start | 0.0001 |
| policy.beta_end | Beta end | 0.02 |
| policy.prediction_type | Prediction type | epsilon / sample |
| policy.clip_sample | Clip samples | true |
| policy.clip_sample_range | Clip range | 1.0 |
| policy.num_inference_steps | Inference steps | null (same as train steps) |
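
To make the squaredcos_cap_v2 setting concrete, here is a minimal sketch of the squared-cosine beta schedule as it is usually defined (an offset of 0.008 and a cap of 0.999 are the customary constants and are assumed here). The function name is hypothetical; the real scheduler object is created by the training code.

```python
import math


def squaredcos_cap_v2_betas(num_train_timesteps: int, max_beta: float = 0.999) -> list[float]:
    """Betas derived from a squared-cosine cumulative alpha curve, capped at max_beta."""

    def alpha_bar(t: float) -> float:
        # Cumulative signal level at normalized time t in [0, 1].
        return math.cos((t + 0.008) / 1.008 * math.pi / 2) ** 2

    betas = []
    for i in range(num_train_timesteps):
        t1 = i / num_train_timesteps
        t2 = (i + 1) / num_train_timesteps
        betas.append(min(1 - alpha_bar(t2) / alpha_bar(t1), max_beta))
    return betas


betas = squaredcos_cap_v2_betas(100)
print(len(betas), betas[-1])  # 100 0.999
```

Note that this schedule is built from the cosine curve alone. With squaredcos_cap_v2 selected, beta_start and beta_end most likely have no effect, although that is worth verifying against the installed scheduler implementation.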

Diffusion Policy arguments - optimizer & scheduler

| Argument | Description | Default / Notes |
|---|---|---|
| policy.optimizer_lr | Learning rate | 1e-4 |
| policy.optimizer_betas | Adam betas | [0.95,0.999] |
| policy.optimizer_eps | Adam eps | 1e-8 |
| policy.optimizer_weight_decay | Weight decay | 1e-6 |
| policy.scheduler_name | LR scheduler | cosine |
| policy.scheduler_warmup_steps | Warmup steps | 500 |
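
The effect of scheduler_name=cosine with 500 warmup steps can be previewed with the sketch below. It assumes the usual cosine-with-linear-warmup shape (linear ramp from 0 to the base rate, then a half-cosine decay to 0 at the final step). The function is illustrative only and is not the scheduler the script instantiates.

```python
import math


def cosine_warmup_lr(step: int, base_lr: float = 1e-4,
                     warmup_steps: int = 500, total_steps: int = 100_000) -> float:
    """Learning rate at a given step under linear warmup followed by cosine decay."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


for s in (0, 250, 500, 50_000, 100_000):
    print(s, cosine_warmup_lr(s))
```

Because the decay is tied to total_steps in this sketch, changing --steps also changes how quickly the learning rate falls.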

Diffusion Policy arguments - other

| Argument | Description | Default / Notes |
|---|---|---|
| policy.compile_model | Compile model | false |
| policy.compile_mode | Compile mode | reduce-overhead |
| policy.do_mask_loss_for_padding | Mask padding loss | false |

3. π₀ (PI0)

Download pretrained weights:

# Download pretrained weights
hf download \
lerobot/pi0_base \
--local-dir pretrained/pi0_base

hf download \
lerobot/pi05_base \
--local-dir pretrained/pi05_base

hf download google/paligemma-3b-pt-224 \
--local-dir pretrained/paligemma-3b-pt-224

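In /workspace/GlobalHumanoidRobotChallenge_2026_Baseline/src/lerobot/processor/tokenizer_processor.py, find the block that sets self.input_tokenizer and replace it with the following, so that the PaliGemma tokenizer is loaded from local files only: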

if self.tokenizer is not None:
    # Use provided tokenizer object directly
    self.input_tokenizer = self.tokenizer
elif self.tokenizer_name is not None:
    if AutoTokenizer is None:
        raise ImportError("AutoTokenizer is not available")

    # If tokenizer_name contains paligemma, it is a pi0 model; force local offline loading
    if "paligemma" in self.tokenizer_name.lower():
        # Adjust this path to the PaliGemma snapshot directory on your machine
        self.input_tokenizer = AutoTokenizer.from_pretrained(
            "/root/.cache/huggingface/hub/models--google--paligemma-3b-pt-224/snapshots/35e4f46485b4d07967e7e9935bc3786aad50687c",
            local_files_only=True,
        )
    else:
        # Otherwise (e.g., act/smolvla), load normally using the provided tokenizer_name
        self.input_tokenizer = AutoTokenizer.from_pretrained(self.tokenizer_name)

else:
    raise ValueError(
        "Either 'tokenizer' or 'tokenizer_name' must be provided. "
        "Pass a tokenizer object directly or a tokenizer name to auto-load."
    )

Below is a training example using π₀ (PI0), with full hyperparameters:

# Minimal training command
/isaac-sim/python.sh src/lerobot/scripts/lerobot_train.py \
--policy.path=lerobot/pi0_base \
--dataset.repo_id=your_org/your_dataset \
--batch_size=64 \
--steps=20000 \
--output_dir=challenge2026_baseline/Part_Sorting/pi0 \
--job_name=part_sorting_pi0 \
--policy.device=cuda \
--wandb.enable=true

# Detailed training command
/isaac-sim/python.sh src/lerobot/scripts/lerobot_train.py \
--dataset.repo_id=your_org/your_dataset \
--dataset.root=datasets/Part_Sorting \
--policy.type=pi0 \
--policy.paligemma_variant=gemma_2b \
--policy.action_expert_variant=gemma_300m \
--policy.dtype=float32 \
--policy.n_obs_steps=1 \
--policy.chunk_size=50 \
--policy.n_action_steps=50 \
--policy.max_state_dim=32 \
--policy.max_action_dim=32 \
--policy.num_inference_steps=10 \
--policy.time_sampling_beta_alpha=1.5 \
--policy.time_sampling_beta_beta=1.0 \
--policy.time_sampling_scale=0.999 \
--policy.time_sampling_offset=0.001 \
--policy.min_period=0.004 \
--policy.max_period=4.0 \
--policy.image_resolution='[224,224]' \
--policy.empty_cameras=0 \
--policy.gradient_checkpointing=false \
--policy.compile_model=false \
--policy.compile_mode=max-autotune \
--policy.freeze_vision_encoder=false \
--policy.train_expert_only=false \
--policy.optimizer_lr=2.5e-5 \
--policy.optimizer_betas='[0.9,0.95]' \
--policy.optimizer_eps=1e-8 \
--policy.optimizer_weight_decay=0.01 \
--policy.optimizer_grad_clip_norm=1.0 \
--policy.scheduler_warmup_steps=1000 \
--policy.scheduler_decay_steps=30000 \
--policy.scheduler_decay_lr=2.5e-6 \
--policy.tokenizer_max_length=48 \
--output_dir=challenge2026_baseline/Part_Sorting/pi0 \
--job_name=part_sorting_pi0 \
--resume=false \
--seed=1000 \
--num_workers=8 \
--batch_size=8 \
--steps=100000 \
--eval_freq=0 \
--log_freq=200 \
--save_checkpoint=true \
--save_freq=5000 \
--wandb.entity=your_wandb_entity

Replace your_org/your_dataset with your own dataset repo ID, replace challenge2026_baseline/Part_Sorting/pi0 with your own output path, and replace your_wandb_entity with your WandB username or team name. If you don't use WandB, you can remove the --wandb.entity argument.

π₀ policy arguments - model architecture

| Argument | Description | Default / Notes |
|---|---|---|
| policy.type | Policy algorithm type | pi0 |
| policy.paligemma_variant | PaliGemma variant | gemma_2b |
| policy.action_expert_variant | Action Expert variant | gemma_300m |
| policy.dtype | Data type | float32 |

π₀ policy arguments - input/output structure

| Argument | Description | Default / Notes |
|---|---|---|
| policy.n_obs_steps | Number of observation steps | 1 |
| policy.chunk_size | Action chunk size | 50 |
| policy.n_action_steps | Action steps executed | 50 |
| policy.max_state_dim | Max state dim (padded to) | 32 |
| policy.max_action_dim | Max action dim (padded to) | 32 |
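
The "padded to" note for max_state_dim and max_action_dim can be pictured as follows. This sketch assumes zero-padding at the end of the vector; the helper name and the 14-dimensional example state are hypothetical.

```python
def pad_to_dim(vector: list[float], target_dim: int) -> list[float]:
    """Right-pad a state or action vector with zeros up to target_dim."""
    if len(vector) > target_dim:
        raise ValueError(f"vector has {len(vector)} dims, more than target_dim={target_dim}")
    return vector + [0.0] * (target_dim - len(vector))


state = [0.1] * 14          # hypothetical 14-dimensional robot state
padded = pad_to_dim(state, target_dim=32)
print(len(padded))          # 32
```

A robot whose state or action vector has more than 32 dimensions would need larger max_state_dim and max_action_dim values.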

π₀ policy arguments - flow matching

| Argument | Description | Default / Notes |
|---|---|---|
| policy.num_inference_steps | Denoising steps (inference) | 10 |
| policy.time_sampling_beta_alpha | Time-sampling beta α | 1.5 |
| policy.time_sampling_beta_beta | Time-sampling beta β | 1.0 |
| policy.time_sampling_scale | Time-sampling scale | 0.999 |
| policy.time_sampling_offset | Time-sampling offset | 0.001 |
| policy.min_period | Minimum period | 0.004 |
| policy.max_period | Maximum period | 4.0 |
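
The four time-sampling arguments appear to describe how the flow-matching time t is drawn during training. The sketch below assumes t = Beta(α, β) · scale + offset, which keeps t inside [0.001, 1.0] for the listed values; the function name is hypothetical and the standard library random module stands in for the tensor code.

```python
import random


def sample_flow_time(rng: random.Random, alpha: float = 1.5, beta: float = 1.0,
                     scale: float = 0.999, offset: float = 0.001) -> float:
    """Draw one training time step: a Beta(alpha, beta) sample, scaled and shifted."""
    return rng.betavariate(alpha, beta) * scale + offset


rng = random.Random(1000)
times = [sample_flow_time(rng) for _ in range(1000)]
print(min(times), max(times))
```

Beta(1.5, 1.0) has mean 0.6, so under this assumption samples lean toward the larger values of t rather than being uniform.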

π₀ policy arguments - images & cameras

| Argument | Description | Default / Notes |
|---|---|---|
| policy.image_resolution | Image resolution (H, W) | [224,224] |
| policy.empty_cameras | Number of empty cameras | 0 |

π₀ policy arguments - training settings

| Argument | Description | Default / Notes |
|---|---|---|
| policy.gradient_checkpointing | Enable gradient checkpointing | false |
| policy.compile_model | Compile model | false |
| policy.compile_mode | Compile mode | max-autotune |

π₀ policy arguments - fine-tuning

| Argument | Description | Default / Notes |
|---|---|---|
| policy.freeze_vision_encoder | Freeze vision encoder | false |
| policy.train_expert_only | Train Action Expert only | false |

π₀ policy arguments - optimizer

| Argument | Description | Default / Notes |
|---|---|---|
| policy.optimizer_lr | Learning rate | 2.5e-5 |
| policy.optimizer_betas | AdamW betas | [0.9,0.95] |
| policy.optimizer_eps | AdamW eps | 1e-8 |
| policy.optimizer_weight_decay | Weight decay | 0.01 |
| policy.optimizer_grad_clip_norm | Gradient clip norm | 1.0 |

π₀ policy arguments - LR scheduler

| Argument | Description | Default / Notes |
|---|---|---|
| policy.scheduler_warmup_steps | Warmup steps | 1000 |
| policy.scheduler_decay_steps | Decay steps | 30000 |
| policy.scheduler_decay_lr | Decay learning rate | 2.5e-6 |
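
How the three scheduler values combine with optimizer_lr can be sketched as a warmup followed by a cosine decay to a floor. This is a simplified model under stated assumptions (linear warmup from 0, cosine decay from the peak rate to scheduler_decay_lr over scheduler_decay_steps, then constant); the actual scheduler may differ in details such as the warmup start value or automatic rescaling.

```python
import math


def pi0_lr(step: int, peak_lr: float = 2.5e-5, decay_lr: float = 2.5e-6,
           warmup_steps: int = 1_000, decay_steps: int = 30_000) -> float:
    """Simplified warmup + cosine-decay-to-floor learning rate (an assumption, see lead-in)."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    clipped = min(step, decay_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * clipped / decay_steps))
    return decay_lr + (peak_lr - decay_lr) * cosine


for s in (0, 500, 1_000, 15_000, 30_000, 100_000):
    print(s, pi0_lr(s))
```

If the schedule behaves like this, a run with --steps=100000 and scheduler_decay_steps=30000 would spend its last 70000 steps at the floor rate of 2.5e-6, so it may be worth setting the two values together.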

π₀ policy arguments - tokenizer

| Argument | Description | Default / Notes |
|---|---|---|
| policy.tokenizer_max_length | Max tokenizer length | 48 |

4. π₀.₅ (PI05)

Below is a training example using π₀.₅ (PI05) with full hyperparameters. π₀.₅ is an enhanced version of π₀ that supports open-world generalization. Key differences include QUANTILES normalization, a longer tokenizer length, and AdaRMS conditioning.

# Minimal training command
/isaac-sim/python.sh src/lerobot/scripts/lerobot_train.py \
--dataset.repo_id=your_org/your_dataset \
--policy.type=pi05 \
--output_dir=challenge2026_baseline/Part_Sorting/pi05 \
--job_name=part_sorting_pi05 \
--policy.repo_id=your_repo_id \
--policy.pretrained_path=lerobot/pi05_base \
--policy.compile_model=true \
--policy.gradient_checkpointing=true \
--wandb.enable=true \
--policy.dtype=bfloat16 \
--policy.freeze_vision_encoder=false \
--policy.train_expert_only=false \
--steps=3000 \
--policy.device=cuda \
--batch_size=32

# Detailed training command
/isaac-sim/python.sh src/lerobot/scripts/lerobot_train.py \
--dataset.repo_id=your_org/your_dataset \
--dataset.root=datasets/Part_Sorting/ \
--policy.type=pi05 \
--policy.paligemma_variant=gemma_2b \
--policy.action_expert_variant=gemma_300m \
--policy.dtype=float32 \
--policy.n_obs_steps=1 \
--policy.chunk_size=50 \
--policy.n_action_steps=50 \
--policy.max_state_dim=32 \
--policy.max_action_dim=32 \
--policy.num_inference_steps=10 \
--policy.time_sampling_beta_alpha=1.5 \
--policy.time_sampling_beta_beta=1.0 \
--policy.time_sampling_scale=0.999 \
--policy.time_sampling_offset=0.001 \
--policy.min_period=0.004 \
--policy.max_period=4.0 \
--policy.image_resolution='[224,224]' \
--policy.empty_cameras=0 \
--policy.gradient_checkpointing=false \
--policy.compile_model=false \
--policy.compile_mode=max-autotune \
--policy.freeze_vision_encoder=false \
--policy.train_expert_only=false \
--policy.optimizer_lr=2.5e-5 \
--policy.optimizer_betas='[0.9,0.95]' \
--policy.optimizer_eps=1e-8 \
--policy.optimizer_weight_decay=0.01 \
--policy.optimizer_grad_clip_norm=1.0 \
--policy.scheduler_warmup_steps=1000 \
--policy.scheduler_decay_steps=30000 \
--policy.scheduler_decay_lr=2.5e-6 \
--policy.tokenizer_max_length=200 \
--output_dir=challenge2026_baseline/Part_Sorting/pi05 \
--job_name=part_sorting_pi05 \
--resume=false \
--seed=1000 \
--num_workers=8 \
--batch_size=8 \
--steps=100000 \
--eval_freq=0 \
--log_freq=200 \
--save_checkpoint=true \
--save_freq=5000 \
--wandb.entity=your_wandb_entity

Replace your_org/your_dataset with your own dataset repo ID, replace challenge2026_baseline/Part_Sorting/pi05 with your own output path, and replace your_wandb_entity with your WandB username or team name. If you don't use WandB, you can remove the --wandb.entity argument.

π₀.₅ policy arguments - model architecture

| Argument | Description | Default / Notes |
|---|---|---|
| policy.type | Policy algorithm type | pi05 |
| policy.paligemma_variant | PaliGemma variant | gemma_2b |
| policy.action_expert_variant | Action Expert variant | gemma_300m |
| policy.dtype | Data type | float32 |

π₀.₅ policy arguments - input/output structure

| Argument | Description | Default / Notes |
|---|---|---|
| policy.n_obs_steps | Number of observation steps | 1 |
| policy.chunk_size | Action chunk size | 50 |
| policy.n_action_steps | Action steps executed | 50 |
| policy.max_state_dim | Max state dim (padded to) | 32 |
| policy.max_action_dim | Max action dim (padded to) | 32 |

π₀.₅ policy arguments - flow matching

| Argument | Description | Default / Notes |
|---|---|---|
| policy.num_inference_steps | Denoising steps (inference) | 10 |
| policy.time_sampling_beta_alpha | Time-sampling beta α | 1.5 |
| policy.time_sampling_beta_beta | Time-sampling beta β | 1.0 |
| policy.time_sampling_scale | Time-sampling scale | 0.999 |
| policy.time_sampling_offset | Time-sampling offset | 0.001 |
| policy.min_period | Minimum period | 0.004 |
| policy.max_period | Maximum period | 4.0 |
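
min_period and max_period most plausibly bound the periods of a sinusoidal embedding of the flow-matching time. The sketch below assumes periods spaced geometrically between the two values, with the sine half followed by the cosine half; the function name and the embedding width of 8 are illustrative only.

```python
import math


def sinusoidal_time_embedding(t: float, dim: int,
                              min_period: float = 0.004, max_period: float = 4.0) -> list[float]:
    """Embed a scalar time with sin/cos pairs whose periods span [min_period, max_period]."""
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    half = dim // 2
    sin_part, cos_part = [], []
    for i in range(half):
        fraction = i / (half - 1) if half > 1 else 0.0
        # Geometric interpolation between the shortest and longest period.
        period = min_period * (max_period / min_period) ** fraction
        angle = 2.0 * math.pi * t / period
        sin_part.append(math.sin(angle))
        cos_part.append(math.cos(angle))
    return sin_part + cos_part


emb = sinusoidal_time_embedding(t=0.5, dim=8)
print(len(emb))  # 8
```

Short periods let the network distinguish nearby time values, while the longest period (4.0) varies slowly across the whole [0, 1] range.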

π₀.₅ policy arguments - images & cameras

| Argument | Description | Default / Notes |
|---|---|---|
| policy.image_resolution | Image resolution (H, W) | [224,224] |
| policy.empty_cameras | Number of empty cameras | 0 |

π₀.₅ policy arguments - training settings

| Argument | Description | Default / Notes |
|---|---|---|
| policy.gradient_checkpointing | Enable gradient checkpointing | false |
| policy.compile_model | Compile model | false |
| policy.compile_mode | Compile mode | max-autotune |

π₀.₅ policy arguments - fine-tuning

| Argument | Description | Default / Notes |
|---|---|---|
| policy.freeze_vision_encoder | Freeze vision encoder | false |
| policy.train_expert_only | Train Action Expert only | false |

π₀.₅ policy arguments - optimizer

| Argument | Description | Default / Notes |
|---|---|---|
| policy.optimizer_lr | Learning rate | 2.5e-5 |
| policy.optimizer_betas | AdamW betas | [0.9,0.95] |
| policy.optimizer_eps | AdamW eps | 1e-8 |
| policy.optimizer_weight_decay | Weight decay | 0.01 |
| policy.optimizer_grad_clip_norm | Gradient clip norm | 1.0 |

π₀.₅ policy arguments - LR scheduler

| Argument | Description | Default / Notes |
|---|---|---|
| policy.scheduler_warmup_steps | Warmup steps | 1000 |
| policy.scheduler_decay_steps | Decay steps | 30000 |
| policy.scheduler_decay_lr | Decay learning rate | 2.5e-6 |

π₀.₅ policy arguments - tokenizer

| Argument | Description | Default / Notes |
|---|---|---|
| policy.tokenizer_max_length | Max tokenizer length | 200 (π₀ uses 48) |

Key differences between π₀ and π₀.₅

| Feature | π₀ | π₀.₅ |
|---|---|---|
| Time conditioning injection | Concatenate time and action via action_time_mlp_* | AdaRMS conditioning via time_mlp_* |
| AdaRMS | Not used | Used in Action Expert |
| Tokenizer length | 48 tokens | 200 tokens |
| Discrete state input | False (uses state_proj layer) | True |
| Parameter count | Higher (includes state embedding) | Lower (no state embedding) |
| State normalization | MEAN_STD | QUANTILES |
| Action normalization | MEAN_STD | QUANTILES |
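
The difference between the two normalization modes in the last rows can be sketched as follows. The formulas are assumptions: MEAN_STD is taken as standardization, and QUANTILES as a linear map that sends a low and a high quantile (assumed here to be the 1st and 99th percentiles) to -1 and +1. The function names and the eps guard are illustrative.

```python
def normalize_mean_std(x: float, mean: float, std: float, eps: float = 1e-8) -> float:
    """Standardize: zero mean, unit variance (sensitive to outliers through std)."""
    return (x - mean) / (std + eps)


def normalize_quantiles(x: float, q_low: float, q_high: float, eps: float = 1e-8) -> float:
    """Map [q_low, q_high] linearly onto [-1, 1]; values outside the quantiles fall outside."""
    return 2.0 * (x - q_low) / (q_high - q_low + eps) - 1.0


print(normalize_mean_std(7.0, mean=5.0, std=2.0))       # ~1.0
print(normalize_quantiles(5.0, q_low=0.0, q_high=10.0))  # ~0.0
```

Quantile statistics must be present in the dataset metadata for this mode to work, so a dataset prepared only with mean and std may need its statistics recomputed before π₀.₅ training.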

5. SmolVLA

Below is a fine-tuning example using the SmolVLA policy. SmolVLA is built on the SmolVLM2-500M-Video-Instruct vision-language model and supports open-world generalization.

# Minimal training command
/isaac-sim/python.sh src/lerobot/scripts/lerobot_train.py \
--policy.path=lerobot/smolvla_base \
--dataset.repo_id=your_org/your_dataset \
--batch_size=64 \
--steps=20000 \
--output_dir=challenge2026_baseline/Part_Sorting/smolvla \
--job_name=part_sorting_smolvla \
--policy.device=cuda \
--wandb.enable=true

# Detailed training command
/isaac-sim/python.sh src/lerobot/scripts/lerobot_train.py \
--dataset.repo_id=your_org/your_dataset \
--dataset.root=datasets/Part_Sorting/ \
--policy.type=smolvla \
--policy.vlm_model_name=HuggingFaceTB/SmolVLM2-500M-Video-Instruct \
--policy.load_vlm_weights=true \
--policy.dtype=float32 \
--policy.n_obs_steps=1 \
--policy.chunk_size=50 \
--policy.n_action_steps=50 \
--policy.max_state_dim=32 \
--policy.max_action_dim=32 \
--policy.num_steps=10 \
--policy.tokenizer_max_length=48 \
--policy.image_resolution='[224,224]' \
--policy.empty_cameras=0 \
--policy.freeze_vision_encoder=true \
--policy.train_expert_only=true \
--policy.train_state_proj=true \
--policy.gradient_checkpointing=false \
--policy.compile_model=false \
--policy.compile_mode=max-autotune \
--policy.attention_mode=cross_attn \
--policy.num_vlm_layers=16 \
--policy.self_attn_every_n_layers=2 \
--policy.expert_width_multiplier=0.75 \
--policy.optimizer_lr=1e-4 \
--policy.optimizer_betas='[0.9,0.95]' \
--policy.optimizer_eps=1e-8 \
--policy.optimizer_weight_decay=1e-10 \
--policy.optimizer_grad_clip_norm=10.0 \
--policy.scheduler_warmup_steps=1000 \
--policy.scheduler_decay_steps=30000 \
--policy.scheduler_decay_lr=2.5e-6 \
--policy.min_period=0.004 \
--policy.max_period=4.0 \
--output_dir=challenge2026_baseline/Part_Sorting/smolvla \
--job_name=part_sorting_smolvla \
--resume=false \
--seed=1000 \
--num_workers=8 \
--batch_size=8 \
--steps=100000 \
--eval_freq=0 \
--log_freq=200 \
--save_checkpoint=true \
--save_freq=5000 \
--wandb.entity=your_wandb_entity

Replace your_org/your_dataset with your own dataset repo ID, replace challenge2026_baseline/Part_Sorting/smolvla with your own output path, and replace your_wandb_entity with your WandB username or team name. If you don't use WandB, you can remove the --wandb.entity argument.

SmolVLA policy arguments - model architecture

| Argument | Description | Default / Notes |
|---|---|---|
| policy.type | Policy algorithm type | smolvla |
| policy.vlm_model_name | VLM backbone | HuggingFaceTB/SmolVLM2-500M-Video-Instruct |
| policy.load_vlm_weights | Load pretrained VLM weights | true |
| policy.dtype | Data type | float32 |

SmolVLA policy arguments - input/output structure

| Argument | Description | Default / Notes |
|---|---|---|
| policy.n_obs_steps | Number of observation steps | 1 |
| policy.chunk_size | Action chunk size | 50 |
| policy.n_action_steps | Action steps executed | 50 |
| policy.max_state_dim | Max state dim (padded to) | 32 |
| policy.max_action_dim | Max action dim (padded to) | 32 |

SmolVLA policy arguments - decoding & tokenizer

| Argument | Description | Default / Notes |
|---|---|---|
| policy.num_steps | Denoising steps (inference) | 10 |
| policy.tokenizer_max_length | Max tokenizer length | 48 |
| policy.use_cache | Use attention cache | true |
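
The number of denoising steps controls how finely the flow is integrated at inference time. The toy sketch below assumes the common flow-matching convention in which sampling starts from noise at t = 1 and takes num_steps Euler steps of size -1/num_steps toward t = 0. The velocity function is a hand-written stand-in for the trained network, and all names are hypothetical.

```python
from typing import Callable


def euler_denoise(noise: list[float],
                  velocity_fn: Callable[[list[float], float], list[float]],
                  num_steps: int = 10) -> list[float]:
    """Integrate from t=1 (noise) to t=0 (actions) with fixed-size Euler steps."""
    dt = -1.0 / num_steps
    x = list(noise)
    t = 1.0
    for _ in range(num_steps):
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
        t += dt
    return x


target = [0.5, -0.25, 1.0]   # the "true" action the toy field points to
noise = [1.2, 0.3, -0.7]


def ideal_velocity(x: list[float], t: float) -> list[float]:
    # For a straight-line path x_t = t * noise + (1 - t) * target, the velocity is constant.
    return [n - a for n, a in zip(noise, target)]


actions = euler_denoise(noise, ideal_velocity, num_steps=10)
print(actions)
```

With a perfectly straight flow, a single step would already suffice; a learned velocity field is only approximately straight, which is why several steps are used in practice, each costing one forward pass of the action expert.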

SmolVLA policy arguments - images & cameras

| Argument | Description | Default / Notes |
|---|---|---|
| policy.image_resolution | Image preprocessing resolution (H, W) | [224,224] |
| policy.empty_cameras | Number of empty cameras | 0 |
| policy.add_image_special_tokens | Use image special tokens | false |

SmolVLA policy arguments - fine-tuning

| Argument | Description | Default / Notes |
|---|---|---|
| policy.freeze_vision_encoder | Freeze vision encoder | true |
| policy.train_expert_only | Train Action Expert only | true |
| policy.train_state_proj | Train state projection | true |

SmolVLA policy arguments - transformer architecture

| Argument | Description | Default / Notes |
|---|---|---|
| policy.attention_mode | Attention mode | cross_attn |
| policy.num_vlm_layers | Number of VLM layers used | 16 |
| policy.self_attn_every_n_layers | Insert self-attention every N layers | 2 |
| policy.expert_width_multiplier | Action Expert hidden width multiplier | 0.75 |
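
One way to read attention_mode=cross_attn together with self_attn_every_n_layers=2 is that the action expert alternates between self-attention and cross-attention layers. The sketch below encodes that reading as an assumption (layers whose index is a multiple of N use self-attention, the rest cross-attend to the VLM features); the function name is hypothetical and the real layer selection may differ.

```python
def expert_layer_types(num_layers: int = 16,
                       self_attn_every_n_layers: int = 2,
                       attention_mode: str = "cross_attn") -> list[str]:
    """Label each expert layer as self-attention or cross-attention (assumed rule)."""
    types = []
    for idx in range(num_layers):
        use_self = (
            "cross" not in attention_mode
            or (self_attn_every_n_layers > 0 and idx % self_attn_every_n_layers == 0)
        )
        types.append("self_attn" if use_self else "cross_attn")
    return types


layers = expert_layer_types()
print(layers.count("self_attn"), layers.count("cross_attn"))  # 8 8
```

Under this reading, raising self_attn_every_n_layers makes self-attention layers rarer, and a value of 0 leaves only cross-attention layers.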

SmolVLA policy arguments - optimizer

| Argument | Description | Default / Notes |
|---|---|---|
| policy.optimizer_lr | Learning rate | 1e-4 |
| policy.optimizer_betas | AdamW betas | [0.9,0.95] |
| policy.optimizer_eps | AdamW eps | 1e-8 |
| policy.optimizer_weight_decay | Weight decay | 1e-10 |
| policy.optimizer_grad_clip_norm | Gradient clip norm | 10.0 |

SmolVLA policy arguments - LR scheduler

| Argument | Description | Default / Notes |
|---|---|---|
| policy.scheduler_warmup_steps | Warmup steps | 1000 |
| policy.scheduler_decay_steps | Decay steps | 30000 |
| policy.scheduler_decay_lr | Decay learning rate | 2.5e-6 |

SmolVLA policy arguments - training settings

| Argument | Description | Default / Notes |
|---|---|---|
| policy.gradient_checkpointing | Enable gradient checkpointing | false |
| policy.compile_model | Compile model | false |
| policy.compile_mode | Compile mode | max-autotune |