木叶吟
GPU Scheduling
Tear Up the Bubble Boom: Lessons Learned From a Deep Learning Research and Development Cluster
With the proliferation of deep learning, there exists a strong need to efficiently operate GPU clusters for deep learning production in …
Zehua Yang, Zhisheng YE, Tianhao Fu, Jing Luo, Xiong Wei, Yingwei Luo, Xiaolin Wang, Zhenlin Wang, Tianwei Zhang
ASTRAEA: A Fair Deep Learning Scheduler for Multi-tenant GPU Clusters
We design a new and practical GPU scheduler, ASTRAEA, to enforce the desired fairness among tenants and jobs in deep learning training clusters.
Zhisheng YE, Peng Sun, Wei Gao, Tianwei Zhang, Xiaolin Wang, Shengen Yan, Yingwei Luo
Chronus: A Novel Deadline-aware Scheduler for Deep Learning Training Jobs
We present Chronus, an end-to-end scheduling system for deep learning training that provides deadline guarantees for SLO jobs while maximizing the performance of best-effort jobs.
Wei Gao, Zhisheng YE, Peng Sun, Yonggang Wen, Tianwei Zhang