Maximum number of parameter elements to fetch ahead of use. Used by ZeRO-3, ZeRO-3 Offload, ZeRO-Infinity, and ZeRO-Inference. Constraint: the value must be greater than or equal to 0 (ge = 0).
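As a rough illustration, the "ge = 0" constraint can be checked before a configuration is passed on. The text above does not name the setting, so the key `stage3_prefetch_bucket_size` is an assumption here, and `check_prefetch` is a hypothetical helper rather than part of DeepSpeed.

```python
def check_prefetch(zero_cfg: dict, key: str = "stage3_prefetch_bucket_size") -> int:
    """Return the prefetch limit (in parameter elements) if it satisfies ge = 0.

    `key` is an assumed name for the setting described above.
    """
    value = int(zero_cfg[key])
    if value < 0:
        # Mirrors the documented constraint: the value must be >= 0.
        raise ValueError(f"{key} must be >= 0, got {value}")
    return value


# Hypothetical ZeRO section of a config; the number is illustrative only.
zero_cfg = {"stage3_prefetch_bucket_size": 50_000_000}
limit = check_prefetch(zero_cfg)
```

A value of 0 passes the check, since the constraint is inclusive.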
Example ZeRO-3 Configurations · Use ZeRO to partition the optimizer states (stage 1), gradients (stage 2), and parameters (stage 3). · Additionally offload the ...
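The cumulative stages listed above can be sketched as a small lookup next to a minimal configuration dictionary. This is only a sketch: `partitioned_states` is a hypothetical helper, and the dictionary shows just the stage selection, leaving out the offload settings that the truncated text does not spell out.

```python
import json

# What each ZeRO stage adds to the set of partitioned state, per the text above.
STAGE_ADDS = {1: "optimizer states", 2: "gradients", 3: "parameters"}


def partitioned_states(stage: int) -> list:
    """List everything partitioned at a given stage (stages are cumulative)."""
    if stage not in STAGE_ADDS:
        raise ValueError(f"expected stage 1, 2, or 3, got {stage}")
    return [STAGE_ADDS[s] for s in range(1, stage + 1)]


# Minimal config fragment selecting stage 3; other settings are omitted.
ds_config = {"zero_optimization": {"stage": 3}}

print(json.dumps(ds_config, indent=2))
print(partitioned_states(ds_config["zero_optimization"]["stage"]))
```

Writing the stages as a cumulative list makes it explicit that selecting stage 3 also partitions the optimizer states and gradients.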
2021/03/07 · DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.