Pipeline / model parallelism:
GitHub - kakaobrain/torchgpipe: A GPipe implementation in PyTorch
PiPPy: Automated Pipeline Parallelism for PyTorch
Single-Machine Model Parallel Best Practices — PyTorch Tutorials 2.1.0+cu121 documentation
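The model-parallel tutorial above boils down to placing different layers on different devices and moving activations between them. A minimal sketch, assuming two visible GPUs (layer sizes are illustrative):

```python
import torch
import torch.nn as nn

class TwoGPUModel(nn.Module):
    # Manual model parallelism: stage 0 on cuda:0, stage 1 on cuda:1.
    def __init__(self):
        super().__init__()
        self.seq1 = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU()).to("cuda:0")
        self.seq2 = nn.Sequential(nn.Linear(4096, 1024)).to("cuda:1")

    def forward(self, x):
        x = self.seq1(x.to("cuda:0"))
        # Activations hop between devices, so the GPUs run serially;
        # torchgpipe/PiPPy avoid this by splitting each batch into
        # micro-batches so the stages overlap.
        return self.seq2(x.to("cuda:1"))

model = TwoGPUModel()
out = model(torch.randn(8, 1024))   # output lives on cuda:1
```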
Parallelism intro:
Pipeline Parallelism — PyTorch 2.1 documentation
Training Transformer models using Pipeline Parallelism — PyTorch Tutorials 2.1.0+cu121 documentation
Efficient Training on Multiple GPUs
Tensor Parallelism - Amazon SageMaker
Getting Started with Machine Learning on Amazon SageMaker - Amazon Web Services
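The pipeline-parallelism docs linked above center on torch.distributed.pipeline.sync.Pipe (present in PyTorch 2.1, removed in later releases). A hedged sketch of that API, again assuming two GPUs:

```python
import os
import torch
import torch.nn as nn
from torch.distributed import rpc
from torch.distributed.pipeline.sync import Pipe

os.environ["MASTER_ADDR"] = "localhost"
os.environ["MASTER_PORT"] = "29500"
rpc.init_rpc("worker", rank=0, world_size=1)  # Pipe requires the RPC framework

model = nn.Sequential(
    nn.Linear(1024, 4096).to("cuda:0"),  # stage 0
    nn.Linear(4096, 1024).to("cuda:1"),  # stage 1
)
model = Pipe(model, chunks=8)  # split each batch into 8 micro-batches

# forward() returns an RRef; local_value() fetches the tensor
out = model(torch.randn(64, 1024, device="cuda:0")).local_value()
```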
Papers + YouTube:
ZeRO: Memory Optimizations Toward Training Trillion Parameter Models - Microsoft Research
ZeRO & Fastest BERT: Increasing the scale and speed of deep learning training in DeepSpeed
Turing-NLG, DeepSpeed and the ZeRO optimizer
Large Model Training and Inference with DeepSpeed // Samyam Rajbhandari // LLMs in Prod Conference
Ultimate Guide To Scaling ML Models - Megatron-LM | ZeRO | DeepSpeed | Mixed Precision
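The core of the ZeRO paper is a memory argument: with mixed-precision Adam, a model with Ψ parameters costs roughly 16Ψ bytes per GPU (2Ψ fp16 params + 2Ψ fp16 grads + 12Ψ fp32 optimizer states), and the three ZeRO stages shard successively more of that across the N data-parallel ranks. Back-of-envelope numbers for the paper's 7.5B-parameter, 64-GPU example:

```python
# Per-GPU memory for the ZeRO stages, following the paper's accounting.
psi = 7.5e9   # parameters (the paper's 7.5B example)
N = 64        # data-parallel GPUs

baseline = (2 + 2 + 12) * psi             # everything replicated
stage1   = (2 + 2) * psi + 12 * psi / N   # shard optimizer states
stage2   = 2 * psi + (2 + 12) * psi / N   # ... + gradients
stage3   = (2 + 2 + 12) * psi / N         # ... + parameters

for name, b in [("baseline", baseline), ("ZeRO-1", stage1),
                ("ZeRO-2", stage2), ("ZeRO-3", stage3)]:
    print(f"{name}: {b / 1e9:.1f} GB per GPU")
# baseline: 120.0 GB, ZeRO-1: 31.4 GB, ZeRO-2: 16.6 GB, ZeRO-3: 1.9 GB
```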
FSDP:
Getting Started with Fully Sharded Data Parallel(FSDP) — PyTorch Tutorials 2.1.0+cu121 documentation
Introducing PyTorch Fully Sharded Data Parallel (FSDP) API
Fully Sharded Data Parallel: faster AI training with fewer GPUs
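From the FSDP tutorial above, the minimal pattern is: initialize a process group, then wrap the model so parameters, gradients, and optimizer state are sharded across ranks. A sketch assuming launch via torchrun with one process per GPU (model sizes are illustrative):

```python
import os
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

dist.init_process_group("nccl")
rank = int(os.environ["LOCAL_RANK"])  # set by torchrun
torch.cuda.set_device(rank)

model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024))
model = FSDP(model.to(rank))  # parameters are sharded across all ranks

opt = torch.optim.Adam(model.parameters(), lr=1e-4)
loss = model(torch.randn(8, 1024, device=rank)).sum()
loss.backward()   # gradients are reduce-scattered to their owning shard
opt.step()
dist.destroy_process_group()
```

Run with e.g. `torchrun --nproc_per_node=2 train.py`.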
DeepSpeed:
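DeepSpeed's entry point is deepspeed.initialize, driven by a JSON-style config dict; the ZeRO stage from the papers above is a single config knob. A hedged sketch with illustrative sizes (launch with the deepspeed CLI so distributed state is set up):

```python
import torch.nn as nn
import deepspeed

ds_config = {
    "train_batch_size": 32,
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 2},   # shard optimizer states + grads
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}

model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024))
engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)

# Training step: the engine owns backward() and step().
# loss = engine(batch).sum()
# engine.backward(loss)
# engine.step()
```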
Megatron:
GitHub - NVIDIA/Megatron-LM: Ongoing research training transformer models at scale
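Megatron-LM's tensor parallelism splits individual weight matrices across GPUs, e.g. column-parallel linear layers. A single-process illustration of the underlying math (in Megatron the shards live on different ranks and the concat is an all-gather):

```python
import torch

x = torch.randn(8, 1024)          # activations
W = torch.randn(1024, 4096)       # full weight, [in_features, out_features]

W1, W2 = W.chunk(2, dim=1)        # column shards for two "ranks"
y_sharded = torch.cat([x @ W1, x @ W2], dim=1)  # each shard computed locally

# Sharded result matches the unsharded layer.
assert torch.allclose(y_sharded, x @ W, atol=1e-4)
```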