ShallowSpeed


A tiny proof-of-concept of distributed training for sequential deep learning models, implemented using plain NumPy & mpi4py.

Currently implements:

  • Sequential models / deep MLPs, trained using SGD.
  • Data parallel training with interleaved communication & computation, similar to PyTorch's DistributedDataParallel (see the sketch after this list).
  • Pipeline parallel training:
    • Naive schedule without interleaved stages.
    • GPipe schedule with interleaved FWD & interleaved BWD passes.
    • (soon) PipeDream-Flush schedule with additional interleaving between FWD & BWD passes.
  • Any combination of DP & PP algorithms.
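
The data parallel interleaving can be pictured with a short, hedged sketch. The code below is illustrative only (allreduce_gradients and the grads list are hypothetical names, not ShallowSpeed's actual API): each rank computes gradients on its own shard of the batch, per-layer gradients are summed across ranks with non-blocking all-reduces so communication can overlap with the rest of the backward pass, then averaged before the SGD step.

# Illustrative sketch only, not the repository's implementation.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD

def allreduce_gradients(grads):
    """Average per-layer gradient arrays across all data parallel ranks."""
    requests = []
    for g in grads:
        # Non-blocking, in-place sum; can overlap with backward computation
        # that is still producing gradients for earlier layers.
        requests.append(comm.Iallreduce(MPI.IN_PLACE, g, op=MPI.SUM))
    MPI.Request.Waitall(requests)
    for g in grads:
        g /= comm.Get_size()

if __name__ == "__main__":
    grads = [np.ones((4, 4)), np.ones(4)]  # stand-in for per-layer gradients
    allreduce_gradients(grads)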

Setup

conda env create
pip install -e .
# M1 Macs: conda install "libblas=*=*accelerate"
python download_dataset.py
pytest

Usage

# Sequential training
python train.py
# Data parallel distributed training
mpirun -n 4 python train.py --dp 4
# Pipeline parallel distributed training
mpirun -n 4 python train.py --pp 4 --schedule naive
# Data & pipeline parallel distributed training
mpirun -n 8 python train.py --dp 2 --pp 4 --schedule gpipe
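
When --dp and --pp are combined, the total number of MPI ranks is the product of the two (2 x 4 = 8 in the last command). Below is a hedged sketch, not the repository's actual rank layout, of how such a grid could be carved into communicators with mpi4py: ranks in the same pipeline share a communicator for passing activations between stages, and ranks holding the same stage share a communicator for gradient all-reduce.

# Illustrative sketch only; variable names are hypothetical.
from mpi4py import MPI

dp_size, pp_size = 2, 4
world = MPI.COMM_WORLD
rank = world.Get_rank()
assert world.Get_size() == dp_size * pp_size

pipeline_index = rank // pp_size  # which pipeline replica this rank belongs to
stage_index = rank % pp_size      # which stage it holds within that pipeline

pipeline_comm = world.Split(color=pipeline_index, key=stage_index)       # pass activations along this
data_parallel_comm = world.Split(color=stage_index, key=pipeline_index)  # all-reduce gradients over this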

Internals
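
As a conceptual illustration of the GPipe schedule listed above (not the code in this repository), each pipeline stage first runs the forward pass for every microbatch, then runs the backward passes in reverse order; the function below is a hypothetical sketch of that ordering.

# Illustrative sketch only.
def gpipe_schedule(num_microbatches):
    """Per-stage order of operations as (op, microbatch_index) pairs."""
    schedule = [("forward", i) for i in range(num_microbatches)]
    schedule += [("backward", i) for i in reversed(range(num_microbatches))]
    return schedule

print(gpipe_schedule(4))
# [('forward', 0), ('forward', 1), ('forward', 2), ('forward', 3),
#  ('backward', 3), ('backward', 2), ('backward', 1), ('backward', 0)]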
