Pipeline / model parallelism:
GitHub - kakaobrain/torchgpipe: A GPipe implementation in PyTorch
PiPPy: Automated Pipeline Parallelism for PyTorch
Single-Machine Model Parallel Best Practices — PyTorch Tutorials 2.1.0+cu121 documentation
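The model-parallel tutorial above boils down to placing different layers on different devices and moving activations between them. A minimal sketch, assuming two visible GPUs (layer sizes are illustrative):

```python
import torch
import torch.nn as nn

class TwoGPUModel(nn.Module):
    # Manual model parallelism: stage 0 on cuda:0, stage 1 on cuda:1.
    def __init__(self):
        super().__init__()
        self.seq1 = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU()).to("cuda:0")
        self.seq2 = nn.Sequential(nn.Linear(4096, 1024)).to("cuda:1")

    def forward(self, x):
        x = self.seq1(x.to("cuda:0"))
        # Activations hop between devices, so the GPUs run serially;
        # torchgpipe/PiPPy avoid this by splitting each batch into
        # micro-batches so the stages overlap.
        return self.seq2(x.to("cuda:1"))

model = TwoGPUModel()
out = model(torch.randn(8, 1024))   # output lives on cuda:1
```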
Parallelism intro:
Pipeline Parallelism — PyTorch 2.1 documentation
Training Transformer models using Pipeline Parallelism — PyTorch Tutorials 2.1.0+cu121 documentation
Efficient Training on Multiple GPUs
Tensor Parallelism - Amazon SageMaker
Getting Started with Machine Learning on Amazon SageMaker - Amazon Web Services
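The pipeline-parallelism docs linked above center on torch.distributed.pipeline.sync.Pipe (present in PyTorch 2.1, removed in later releases). A hedged sketch of that API, again assuming two GPUs:

```python
import os
import torch
import torch.nn as nn
from torch.distributed import rpc
from torch.distributed.pipeline.sync import Pipe

os.environ["MASTER_ADDR"] = "localhost"
os.environ["MASTER_PORT"] = "29500"
rpc.init_rpc("worker", rank=0, world_size=1)  # Pipe requires the RPC framework

model = nn.Sequential(
    nn.Linear(1024, 4096).to("cuda:0"),  # stage 0
    nn.Linear(4096, 1024).to("cuda:1"),  # stage 1
)
model = Pipe(model, chunks=8)  # split each batch into 8 micro-batches

# forward() returns an RRef; local_value() fetches the tensor
out = model(torch.randn(64, 1024, device="cuda:0")).local_value()
```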
Papers + YouTube:
ZeRO: Memory Optimizations Toward Training Trillion Parameter Models - Microsoft Research
ZeRO & Fastest BERT: Increasing the scale and speed of deep learning training in DeepSpeed
Turing-NLG, DeepSpeed and the ZeRO optimizer
Large Model Training and Inference with DeepSpeed // Samyam Rajbhandari // LLMs in Prod Conference
Ultimate Guide To Scaling ML Models - Megatron-LM | ZeRO | DeepSpeed | Mixed Precision
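The core of the ZeRO paper is a memory argument: with mixed-precision Adam, a model with Ψ parameters costs roughly 16Ψ bytes per GPU (2Ψ fp16 params + 2Ψ fp16 grads + 12Ψ fp32 optimizer states), and the three ZeRO stages shard successively more of that across the N data-parallel ranks. Back-of-envelope numbers for the paper's 7.5B-parameter, 64-GPU example:

```python
# Per-GPU memory for the ZeRO stages, following the paper's accounting.
psi = 7.5e9   # parameters (the paper's 7.5B example)
N = 64        # data-parallel GPUs

baseline = (2 + 2 + 12) * psi             # everything replicated
stage1   = (2 + 2) * psi + 12 * psi / N   # shard optimizer states
stage2   = 2 * psi + (2 + 12) * psi / N   # ... + gradients
stage3   = (2 + 2 + 12) * psi / N         # ... + parameters

for name, b in [("baseline", baseline), ("ZeRO-1", stage1),
                ("ZeRO-2", stage2), ("ZeRO-3", stage3)]:
    print(f"{name}: {b / 1e9:.1f} GB per GPU")
# baseline: 120.0 GB, ZeRO-1: 31.4 GB, ZeRO-2: 16.6 GB, ZeRO-3: 1.9 GB
```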
FSDP:
Getting Started with Fully Sharded Data Parallel(FSDP) — PyTorch Tutorials 2.1.0+cu121 documentation
Introducing PyTorch Fully Sharded Data Parallel (FSDP) API
Fully Sharded Data Parallel: faster AI training with fewer GPUs
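From the FSDP tutorial above, the minimal pattern is: initialize a process group, then wrap the model so parameters, gradients, and optimizer state are sharded across ranks. A sketch assuming launch via torchrun with one process per GPU (model sizes are illustrative):

```python
import os
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

dist.init_process_group("nccl")
rank = int(os.environ["LOCAL_RANK"])  # set by torchrun
torch.cuda.set_device(rank)

model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024))
model = FSDP(model.to(rank))  # parameters are sharded across all ranks

opt = torch.optim.Adam(model.parameters(), lr=1e-4)
loss = model(torch.randn(8, 1024, device=rank)).sum()
loss.backward()   # gradients are reduce-scattered to their owning shard
opt.step()
dist.destroy_process_group()
```

Run with e.g. `torchrun --nproc_per_node=2 train.py`.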
DeepSpeed:
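DeepSpeed's entry point is deepspeed.initialize, driven by a JSON-style config dict; the ZeRO stage from the papers above is a single config knob. A hedged sketch with illustrative sizes (launch with the deepspeed CLI so distributed state is set up):

```python
import torch.nn as nn
import deepspeed

ds_config = {
    "train_batch_size": 32,
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 2},   # shard optimizer states + grads
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}

model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024))
engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)

# Training step: the engine owns backward() and step().
# loss = engine(batch).sum()
# engine.backward(loss)
# engine.step()
```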
Megatron:
GitHub - NVIDIA/Megatron-LM: Ongoing research training transformer models at scale
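Megatron-LM's tensor parallelism splits individual weight matrices across GPUs, e.g. column-parallel linear layers. A single-process illustration of the underlying math (in Megatron the shards live on different ranks and the concat is an all-gather):

```python
import torch

x = torch.randn(8, 1024)          # activations
W = torch.randn(1024, 4096)       # full weight, [in_features, out_features]

W1, W2 = W.chunk(2, dim=1)        # column shards for two "ranks"
y_sharded = torch.cat([x @ W1, x @ W2], dim=1)  # each shard computed locally

# Sharded result matches the unsharded layer.
assert torch.allclose(y_sharded, x @ W, atol=1e-4)
```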