木叶吟
木叶吟
Home
Experience
Publications
Posts
CV
Light
Dark
Automatic
LLM Serving
CONCUR: High-Throughput Agentic Batch Inference of LLM via Congestion-Based Concurrency Control
Batch inference for agentic workloads stresses the GPU key-value (KV) cache in a sustained and cumulative manner, often causing severe …
Qiaoling Chen
,
Zhisheng YE
,
Tian Tang
,
Peng Sun
,
Boyu Tian
,
Guoteng Wang
,
Shenggui Li
,
Yonggang Wen
,
Zhenhua Han
,
Tianwei Zhang
Preprint
PDF
Cite
DOI
Cite
×