木叶吟
木叶吟
Home
Experience
Posts
Publications
Services
CV
Light
Dark
Automatic
English
中文 (简体)
LLM Agents
SpecGen: Accelerating Agentic Kernel Optimization with Speculative Generation
We present SpecGen, an agentic kernel optimization system that uses speculative generation to fork candidate kernels during LLM reasoning, validating and profiling them in parallel to reduce end-to-end optimization time and improve resource utilization.
Jihu Guo
,
Sitian Lu
,
Tenghui Ma
,
Wei Gao
,
Zhisheng YE
,
Xingcheng Zhang
,
Dahua Lin
Preprint
PDF
Cite
CONCUR: Controlling Mid-Phase Thrashing in Agentic Batch Inference
A technical note on CONCUR, an agent-level admission control layer that prevents KV cache collapse during long-running agentic LLM inference.
Zhisheng YE
May 17, 2026
5 min read
CONCUR:让 Agent 批量推理避开中期拥塞
一篇关于 CONCUR 的技术笔记:它在 agent 层做准入控制,避免长时间运行的 LLM agent 推理把 KV cache 推入失控区间。
Zhisheng YE
May 17, 2026
CONCUR: High-Throughput Agentic Batch Inference of LLM via Congestion-Based Concurrency Control
Batch inference for agentic workloads stresses the GPU key-value (KV) cache in a sustained and cumulative manner, often causing severe …
Qiaoling Chen
,
Zhisheng YE
,
Tian Tang
,
Peng Sun
,
Boyu Tian
,
Guoteng Wang
,
Shenggui Li
,
Yonggang Wen
,
Zhenhua Han
,
Tianwei Zhang
Preprint
PDF
Cite
Cite
×