木叶吟

CONCUR: High-Throughput Agentic Batch Inference of LLM via Congestion-Based Concurrency Control

Batch inference for agentic workloads stresses the GPU key-value (KV) cache in a sustained and cumulative manner, often causing severe …

Qiaoling Chen, Zhisheng YE, Tian Tang, Peng Sun, Boyu Tian, Guoteng Wang, Shenggui Li, Yonggang Wen, Zhenhua Han, Tianwei Zhang

FlowGPU: Transparent and Efficient GPU Checkpointing and Restore

GPU checkpointing and restore promises to enable emerging tasks, such as deep learning, to benefit from functionalities like task …

Zehua Yang, Xiao Zheng, Yonghao Zou, Junyang Zhang, Zhisheng YE, Feng Xie, Xiaolin Wang, Yingwei Luo, Zhenlin Wang, Diyu Zhou

Latency-SLO-Aware Memory Offloading for Large Language Model Inference

Offloading the state of large language models (LLMs) to host memory during inference promises to reduce operational costs by supporting larger …

Chenxiang Ma, Hanyu Zhao, Zhisheng YE, Zehua Yang, Tianhao Fu, Jiaxun Han, Jie Zhang, Yingwei Luo, Xiaolin Wang, Zhenlin Wang, Yong Li, Diyu Zhou

ResiHP: Taming LLM Training Failures with Dynamic Hybrid Parallelism

Hybrid parallelism underpins large-scale LLM training across tens of thousands of GPUs. At such scale, hardware failures on individual …

Tenghui Ma, Jihu Guo, Wei Gao, Sitian Lu, Zhisheng YE, Dahua Lin, Hanjing Wang

Characterization of Large Language Model Development in the Datacenter

Large Language Models (LLMs) have presented impressive performance across several transformative tasks. However, it is non-trivial to …

Qinghao Hu, Zhisheng YE, Zerui Wang, Guoteng Wang, Meng Zhang, Qiaoling Chen, Peng Sun, Dahua Lin, Xiaolin Wang, Yingwei Luo, Yonggang Wen, Tianwei Zhang

Hydro: Surrogate-Based Hyperparameter Tuning Service in Datacenters

Hyperparameter tuning is an essential step in deep learning model development that provides better model performance at the cost of …

Qinghao Hu, Zhisheng YE, Meng Zhang, Qiaoling Chen, Peng Sun, Yonggang Wen, Tianwei Zhang

Tear Up the Bubble Boom: Lessons Learned From a Deep Learning Research and Development Cluster

With the proliferation of deep learning, there exists a strong need to efficiently operate GPU clusters for deep learning production in …

Zehua Yang, Zhisheng YE, Tianhao Fu, Jing Luo, Xiong Wei, Yingwei Luo, Xiaolin Wang, Zhenlin Wang, Tianwei Zhang

Chronus: A Novel Deadline-aware Scheduler for Deep Learning Training Jobs

We present Chronus, an end-to-end scheduling system for deep learning training that provides deadline guarantees for SLO jobs and maximizes the performance of best-effort jobs.

Wei Gao, Zhisheng YE, Peng Sun, Yonggang Wen, Tianwei Zhang
