Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning

publication / ai-systems

OSDI January 01, 2022

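Lianmin Zheng, Zhuohan Li, Hao Zhang, et al.

By automating the choice of parallelization strategy for distributed training through joint compiler and runtime optimization, this paper is an important reference for AI infrastructure work.

Lianmin Zheng, Zhuohan Li, Hao Zhang, et al. (2022). 'Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning.' OSDI.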

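This paper explains, from a systems perspective, how to automatically combine parallelization strategies of different granularities (inter-operator and intra-operator).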