木叶吟
CONCUR: High-Throughput Agentic Batch Inference of LLM via Congestion-Based Concurrency Control
Batch inference for agentic workloads stresses the GPU key-value (KV) cache in a sustained and cumulative manner, often causing severe …
Qiaoling Chen, Zhisheng YE, Tian Tang, Peng Sun, Boyu Tian, Guoteng Wang, Shenggui Li, Yonggang Wen, Zhenhua Han, Tianwei Zhang
FlowGPU: Transparent and Efficient GPU Checkpointing and Restore
GPU checkpointing and restore promises to enable emerging tasks, such as deep learning, to benefit from functionalities like task …
Zehua Yang, Xiao Zheng, Yonghao Zou, Junyang Zhang, Zhisheng YE, Feng Xie, Xiaolin Wang, Yingwei Luo, Zhenlin Wang, Diyu Zhou
Latency-SLO-Aware Memory Offloading for Large Language Model Inference
Offloading large language model (LLM) state to host memory during inference promises to reduce operational costs by supporting larger …
Chenxiang Ma, Hanyu Zhao, Zhisheng YE, Zehua Yang, Tianhao Fu, Jiaxun Han, Jie Zhang, Yingwei Luo, Xiaolin Wang, Zhenlin Wang, Yong Li, Diyu Zhou
ResiHP: Taming LLM Training Failures with Dynamic Hybrid Parallelism
Hybrid parallelism underpins large-scale LLM training across tens of thousands of GPUs. At such scale, hardware failures on individual …
Tenghui Ma, Jihu Guo, Wei Gao, Sitian Lu, Zhisheng YE, Dahua Lin, Hanjing Wang
Characterization of Large Language Model Development in the Datacenter
Large Language Models (LLMs) have presented impressive performance across several transformative tasks. However, it is non-trivial to …
Qinghao Hu, Zhisheng YE, Zerui Wang, Guoteng Wang, Meng Zhang, Qiaoling Chen, Peng Sun, Dahua Lin, Xiaolin Wang, Yingwei Luo, Yonggang Wen, Tianwei Zhang
Hydro: Surrogate-Based Hyperparameter Tuning Service in Datacenters
Hyperparameter tuning is an essential step in deep learning model development that provides better model performance at the cost of …
Qinghao Hu, Zhisheng YE, Meng Zhang, Qiaoling Chen, Peng Sun, Yonggang Wen, Tianwei Zhang
Tear Up the Bubble Boom: Lessons Learned From a Deep Learning Research and Development Cluster
With the proliferation of deep learning, there exists a strong need to efficiently operate GPU clusters for deep learning production in …
Zehua Yang, Zhisheng YE, Tianhao Fu, Jing Luo, Xiong Wei, Yingwei Luo, Xiaolin Wang, Zhenlin Wang, Tianwei Zhang
Chronus: A Novel Deadline-aware Scheduler for Deep Learning Training Jobs
We present Chronus, an end-to-end scheduling system for deep learning training that provides deadline guarantees for SLO jobs and maximizes the performance of best-effort jobs.
Wei Gao, Zhisheng YE, Peng Sun, Yonggang Wen, Tianwei Zhang