GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding


ICLR January 01, 2021

Dmitry Lepikhin, HyoukJoong Lee, Yuanzhong Xu, et al.

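A representative work on automatic sharding and sparse (conditional) computation for large models; a good bridge between AI engineering and low-level cluster resource scheduling.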

Dmitry Lepikhin, HyoukJoong Lee, Yuanzhong Xu, et al. (2021). 'GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding.' ICLR.


This paper is a good introduction to the problem of automatic sharding in distributed training.