LLM Agents

SpecGen: Accelerating Agentic Kernel Optimization with Speculative Generation

We present SpecGen, an agentic kernel optimization system that uses speculative generation to fork candidate kernels during LLM reasoning, validating and profiling them in parallel to reduce end-to-end optimization time and improve resource utilization.

Jihu Guo, Sitian Lu, Tenghui Ma, Wei Gao, Zhisheng YE, Xingcheng Zhang, Dahua Lin

CONCUR: Controlling Mid-Phase Thrashing in Agentic Batch Inference

A technical note on CONCUR, an agent-level admission control layer that prevents KV cache collapse during long-running agentic LLM inference.

Zhisheng YE

May 17, 2026 5 min read

CONCUR: Controlling Mid-Phase Thrashing in Agentic Batch Inference

CONCUR：让 Agent 批量推理避开中期拥塞

一篇关于 CONCUR 的技术笔记：它在 agent 层做准入控制，避免长时间运行的 LLM agent 推理把 KV cache 推入失控区间。

Zhisheng YE

May 17, 2026

CONCUR：让 Agent 批量推理避开中期拥塞

CONCUR: High-Throughput Agentic Batch Inference of LLM via Congestion-Based Concurrency Control

Batch inference for agentic workloads stresses the GPU key-value (KV) cache in a sustained and cumulative manner, often causing severe …

Qiaoling Chen, Zhisheng YE, Tian Tang, Peng Sun, Boyu Tian, Guoteng Wang, Shenggui Li, Yonggang Wen, Zhenhua Han, Tianwei Zhang

CONCUR: High-Throughput Agentic Batch Inference of LLM via Congestion-Based Concurrency Control