Chenxiang Ma,
Hanyu Zhao,
Zhisheng YE,
Zehua Yang,
Tianhao Fu,
Jiaxun Han,
Jie Zhang,
Yingwei Luo,
Xiaolin Wang,
Zhenlin Wang,
Yong Li,
Diyu Zhou
(2026).
Latency-SLO-Aware Memory Offloading for Large Language Model Inference.
ICS.

Xintian Han,
Honggang Chen,
Quan Lin,
Jingyue Gao,
Xiangyuan Ren,
Lifei Zhu,
Zhisheng YE,
Shikang Wu,
XiongHang Xie,
Xiaochu Gan,
Bingzheng Wei,
Peng Xu,
Zhe Wang,
Yuchao Zheng,
Jingjian Lin,
Di Wu,
Junfeng Ge
(2025).
LEMUR: Large Scale End-to-End Multimodal Recommendation.
arXiv.

Qinghao Hu,
Zhisheng YE,
Zerui Wang,
Guoteng Wang,
Meng Zhang,
Qiaoling Chen,
Peng Sun,
Dahua Lin,
Xiaolin Wang,
Yingwei Luo,
Yonggang Wen,
Tianwei Zhang
(2024).
Characterization of Large Language Model Development in the Datacenter.
NSDI.