Hi team,
When running sample/llada_rl_rollout.py, I hit an out-of-memory (OOM) error when combining DeepSpeed ZeRO-3 with LoRA and modules_to_save.
It seems that the current rollout script may not fully support ZeRO-3, or may need additional configuration to handle the larger memory footprint introduced by modules_to_save (e.g., keeping wte and ff_out fully trainable).
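For context, here is a back-of-envelope estimate of why modules_to_save is so costly: PEFT keeps a full trainable copy of each listed module, so the embedding and output head carry full gradients and optimizer states in addition to the tiny LoRA adapters. The dimensions below are hypothetical placeholders, not LLaDA's actual config:

```python
# Rough estimate of the extra trainable memory added by
# modules_to_save=["wte", "ff_out"] on top of the LoRA adapters.
# NOTE: vocab_size and hidden_size are HYPOTHETICAL values for illustration.
vocab_size = 126_464   # hypothetical vocabulary size
hidden_size = 4096     # hypothetical hidden dimension

# wte (input embedding) and ff_out (output head) are both vocab x hidden.
trainable_params = 2 * vocab_size * hidden_size

# Typical mixed-precision Adam footprint per trainable parameter:
# 2 B bf16 weight + 2 B bf16 grad + 4 B fp32 master + 8 B fp32 Adam m/v = 16 B
bytes_per_param = 16
extra_gib = trainable_params * bytes_per_param / 2**30

print(f"extra trainable params: {trainable_params / 1e6:.0f}M")
print(f"approx. extra memory:   {extra_gib:.1f} GiB (before ZeRO-3 partitioning)")
```

Even under these rough assumptions, the two saved modules alone account for roughly a billion extra trainable parameters, which plausibly explains the OOM unless ZeRO-3 partitions and/or offloads them correctly.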
Could you please confirm whether ZeRO-3 is officially supported for rollout (and if so, what the correct setup is)?
If not currently supported, it would be great to include guidance or example configs for using llada_rl_rollout.py with ZeRO-3 in the documentation.
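In case it helps, this is roughly the kind of ZeRO-3 config I was experimenting with. It is not an official config, just a sketch using standard DeepSpeed zero_optimization options (CPU offload of params and optimizer states) that I assumed would reduce memory pressure:

```json
{
  "train_micro_batch_size_per_gpu": 1,
  "bf16": { "enabled": true },
  "zero_optimization": {
    "stage": 3,
    "offload_param": { "device": "cpu", "pin_memory": true },
    "offload_optimizer": { "device": "cpu", "pin_memory": true },
    "stage3_param_persistence_threshold": 1e6,
    "stage3_gather_16bit_weights_on_model_save": true
  }
}
```

If there is a known-good config for llada_rl_rollout.py, I would be happy to test it and report back.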