木叶吟
木叶吟
Home
Experience
Posts
Publications
Services
CV
Light
Dark
Automatic
English
中文 (简体)
LLM Agents
CONCUR: Controlling Mid-Phase Thrashing in Agentic Batch Inference
A technical note on CONCUR, an agent-level admission control layer that prevents KV cache collapse during long-running agentic LLM inference.
Zhisheng YE
May 17, 2026
4 min read
CONCUR:让 Agent 批量推理避开中期拥塞
一篇关于 CONCUR 的技术笔记:它在 agent 层做准入控制,避免长时间运行的 LLM agent 推理把 KV cache 推入失控区间。
Zhisheng YE
May 17, 2026
CONCUR: High-Throughput Agentic Batch Inference of LLM via Congestion-Based Concurrency Control
Batch inference for agentic workloads stresses the GPU key-value (KV) cache in a sustained and cumulative manner, often causing severe …
Qiaoling Chen
,
Zhisheng YE
,
Tian Tang
,
Peng Sun
,
Boyu Tian
,
Guoteng Wang
,
Shenggui Li
,
Yonggang Wen
,
Zhenhua Han
,
Tianwei Zhang
Preprint
PDF
Cite
Cite
×