Publications

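A collection of entry points to papers I return to repeatedly over the long term, with priority given to container orchestration, cluster scheduling, platform engineering, and AI systems.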

featured publications

Representative papers on Kubernetes, scheduling systems, platform engineering, and AI infrastructure are shown first.

3 papers

orchestration 2016

Borg, Omega, and Kubernetes

Brendan Burns, Brian Grant, David Oppenheimer, Eric Brewer, John Wilkes

Published in ACM Queue, 2016

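Traces the evolution from Borg through Omega to Kubernetes; core material for understanding the design trade-offs and control-plane abstractions of container orchestration systems.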

Brendan Burns, Brian Grant, David Oppenheimer, Eric Brewer, John Wilkes. (2016). "Borg, Omega, and Kubernetes." ACM Queue.

orchestration 2015

Large-scale cluster management at Google with Borg

Abhishek Verma, Luis Pedrosa, Madhukar Korupolu, David Oppenheimer, Eric Tune, John Wilkes

Published in EuroSys, 2015

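A classic paper on large-scale cluster management; well suited to building an overall picture of resource isolation, scheduling, priorities, and a highly available control plane.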

Abhishek Verma, Luis Pedrosa, Madhukar Korupolu, David Oppenheimer, Eric Tune, John Wilkes. (2015). "Large-scale cluster management at Google with Borg." EuroSys.

scheduling 2013

Omega: flexible, scalable schedulers for large compute clusters

Malte Schwarzkopf, Andy Konwinski, Michael Abd-El-Malek, John Wilkes

Published in EuroSys, 2013

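Rethinks scheduler design through shared state and optimistic concurrency control; a good basis for comparing single-scheduler and multi-scheduler architectures.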

Malte Schwarzkopf, Andy Konwinski, Michael Abd-El-Malek, John Wilkes. (2013). "Omega: flexible, scalable schedulers for large compute clusters." EuroSys.

publication categories

Browse papers, paper records, and locally archived material by research topic.

11 papers

infrastructure 2018

The Datacenter as a Computer

Luiz André Barroso, Urs Hölzle, Parthasarathy Ranganathan

Published in Synthesis Lectures on Computer Architecture, 2018

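Examines datacenter design from a whole-system perspective; good background reading for platform engineering, resource management, and capacity planning.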

Luiz André Barroso, Urs Hölzle, Parthasarathy Ranganathan. (2018). "The Datacenter as a Computer." Synthesis Lectures on Computer Architecture.

ai-systems 2022

Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning

Lianmin Zheng, Zhuohan Li, Hao Zhang, et al.

Published in OSDI, 2022

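Automates parallelization strategies for distributed training through joint compiler and runtime optimization; an important reference for AI infrastructure.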

Lianmin Zheng, Zhuohan Li, Hao Zhang, et al. (2022). "Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning." OSDI.

ai-systems 2021

GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding

Dmitry Lepikhin, HyoukJoong Lee, Yuanzhong Xu, et al.

Published in ICLR, 2021

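A representative work on automatic sharding and sparse computation for large models; useful for connecting AI engineering with low-level cluster resource scheduling problems.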

Dmitry Lepikhin, HyoukJoong Lee, Yuanzhong Xu, et al. (2021). "GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding." ICLR.

archive 2025

Attention Residuals

Archived material

Published in Archived Research Material, 2025

A locally archived paper, shown publicly as part of a long-term research record.

Archived research material available through the site PDF archive.
