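A collection of entry points to papers I return to and reread over the long term, with priority given to container orchestration, cluster scheduling, platform engineering, and AI systems.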

featured publications

Representative papers on Kubernetes, scheduling systems, platform engineering, and AI infrastructure are featured first.

3 papers
orchestration 2016

Borg, Omega, and Kubernetes

Brendan Burns, Brian Grant, David Oppenheimer, Eric Brewer, John Wilkes

Published in ACM Queue, 2016

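Traces the evolution from Borg and Omega to Kubernetes; core reading for understanding the design trade-offs of container orchestration systems and their control-plane abstractions.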

Brendan Burns, Brian Grant, David Oppenheimer, Eric Brewer, John Wilkes. (2016). "Borg, Omega, and Kubernetes." ACM Queue.

  • kubernetes
  • scheduler
  • control-plane
orchestration 2015

Large-scale cluster management at Google with Borg

Abhishek Verma, Luis Pedrosa, Madhukar Korupolu, David Oppenheimer, Eric Tune, John Wilkes

Published in EuroSys, 2015

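A classic paper on large-scale cluster management; a good foundation for a holistic view of resource isolation, scheduling, priorities, and a highly available control plane.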

Abhishek Verma, Luis Pedrosa, Madhukar Korupolu, David Oppenheimer, Eric Tune, John Wilkes. (2015). "Large-scale cluster management at Google with Borg." EuroSys.

  • cluster-management
  • borg
  • scheduling
scheduling 2013

Omega: flexible, scalable schedulers for large compute clusters

Malte Schwarzkopf, Andy Konwinski, Michael Abd-El-Malek, John Wilkes

Published in EuroSys, 2013

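Rethinks scheduler design around shared state and optimistic concurrency control; well suited for comparing monolithic (single-scheduler) and multi-scheduler architectures.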

Malte Schwarzkopf, Andy Konwinski, Michael Abd-El-Malek, John Wilkes. (2013). "Omega: flexible, scalable schedulers for large compute clusters." EuroSys.

  • omega
  • scheduler
  • concurrency

publication categories

Browse papers, paper notes, and locally archived material by research topic.

11 papers

Orchestration Papers

Papers on container orchestration, control planes, and cluster management.

orchestration 2016

Borg, Omega, and Kubernetes

Brendan Burns, Brian Grant, David Oppenheimer, Eric Brewer, John Wilkes

Published in ACM Queue, 2016

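Traces the evolution from Borg and Omega to Kubernetes; core reading for understanding the design trade-offs of container orchestration systems and their control-plane abstractions.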

Brendan Burns, Brian Grant, David Oppenheimer, Eric Brewer, John Wilkes. (2016). "Borg, Omega, and Kubernetes." ACM Queue.

  • kubernetes
  • scheduler
  • control-plane
orchestration 2015

Large-scale cluster management at Google with Borg

Abhishek Verma, Luis Pedrosa, Madhukar Korupolu, David Oppenheimer, Eric Tune, John Wilkes

Published in EuroSys, 2015

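A classic paper on large-scale cluster management; a good foundation for a holistic view of resource isolation, scheduling, priorities, and a highly available control plane.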

Abhishek Verma, Luis Pedrosa, Madhukar Korupolu, David Oppenheimer, Eric Tune, John Wilkes. (2015). "Large-scale cluster management at Google with Borg." EuroSys.

  • cluster-management
  • borg
  • scheduling

Scheduling Papers

Papers on resource allocation, concurrency control, and scheduler design.

scheduling 2013

Omega: flexible, scalable schedulers for large compute clusters

Malte Schwarzkopf, Andy Konwinski, Michael Abd-El-Malek, John Wilkes

Published in EuroSys, 2013

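Rethinks scheduler design around shared state and optimistic concurrency control; well suited for comparing monolithic (single-scheduler) and multi-scheduler architectures.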

Malte Schwarzkopf, Andy Konwinski, Michael Abd-El-Malek, John Wilkes. (2013). "Omega: flexible, scalable schedulers for large compute clusters." EuroSys.

  • omega
  • scheduler
  • concurrency

Infrastructure Papers

Background papers on datacenters, platform engineering, and system architecture.

infrastructure 2018

The Datacenter as a Computer

Luiz André Barroso, Urs Hölzle, Parthasarathy Ranganathan

Published in Synthesis Lectures on Computer Architecture, 2018

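Approaches datacenter design by treating the whole datacenter as a single computer; useful background reading for platform engineering, resource management, and system capacity planning.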

Luiz André Barroso, Urs Hölzle, Parthasarathy Ranganathan. (2018). "The Datacenter as a Computer." Synthesis Lectures on Computer Architecture.

  • datacenter
  • infrastructure
  • architecture

AI Systems Papers

Papers on large-model systems, distributed training, and automatic parallelism.

ai-systems 2021

GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding

Dmitry Lepikhin, HyoukJoong Lee, Yuanzhong Xu, et al.

Published in ICLR, 2021

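A representative work on automatic sharding and sparse (conditional) computation for giant models; useful for connecting AI engineering with the underlying problem of cluster resource scheduling.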

Dmitry Lepikhin, HyoukJoong Lee, Yuanzhong Xu, et al. (2021). "GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding." ICLR.

  • llm
  • sharding
  • distributed-training

Archived Papers

Locally archived research material and paper PDFs.

archive 2025

Attention Residuals

Archived material

Published in Archived Research Material, 2025

A locally archived paper, shown publicly as part of a long-term research record.

Archived research material available through the site PDF archive.

  • local-pdf
  • resources