Zhisheng YE

Machine Learning Systems Researcher

Bytedance

Peking University

Biography

Hi there! I am Zhisheng Ye, currently a machine learning systems researcher in Applied Machine Learning at Bytedance, where I build efficient and practical systems for emerging recommendation models and LLM workloads.

I received my Ph.D. from the Institute of Networking and Energy-efficient Computing (NEEC) at Peking University in 2024, under the joint supervision of Prof. Yingwei Luo, the director of NEEC, and Prof. Xiaolin Wang. Before that, I received a B.S. degree in Computer Science and Technology from the School of Electronics Engineering and Computer Science (EECS) at Peking University, China, in 2019.

My research interests include resource management in machine learning systems and building efficient and practical systems for emerging LLM workloads. As a former member of PKUSC, I am also interested in high-performance computing and GPU systems. I have received mentorship from Prof. Tianwei Zhang of NTU and collaborated closely with his students, including Wei Gao, Qinghao Hu, Meng Zhang, and Qiaoling Chen. I have also been mentored by and collaborated with Peng Sun.

Download my CV.

Interests
  • AI Infrastructure for LLMs
  • Machine Learning Systems
  • Resource Management
Education
  • Ph.D. in Computer Architecture, 2024

    Peking University

  • B.S. in Computer Science and Technology, 2019

    Peking University

Experience

Bytedance
Machine Learning Systems Researcher
Jul 2024 – Present Beijing, China

Shanghai AI Laboratory
Research Intern
Jul 2022 – Jan 2024 Beijing, China
  • Large-scale model (e.g., LLM, MoE) training infrastructure optimization.
  • Deeply involved in the development of InternLM.

Sensetime Research
Research Intern
Sep 2019 – Jun 2022 Beijing, China
  • Supercomputing cluster scheduling and optimization for deep learning training workloads at Sensetime Research (now SenseCore).
  • Designed and implemented a fair scheduler for deep learning training (DLT) jobs, as first author.

Peng Cheng Laboratory
Research Intern
Jul 2018 – Sep 2021 Shenzhen, China
  • Contributed to the development of OpenI-Octopus, an open-source scheduler for deep learning training workloads based on Kubernetes.
  • Safe GPU sharing and efficient migration mechanisms on Kubernetes.
  • Monitoring and logging systems.

Peking University Cluster Competition Team
Team member
Sep 2018 – Jun 2019 Beijing, China
  • Participated in analyzing, compiling, profiling, and optimizing general HPC tasks and improving their parallelizability.
  • First Prize (Team), ASC19 Student Supercomputer Challenge

Recent Publications

(2026). CONCUR: High-Throughput Agentic Batch Inference of LLM via Congestion-Based Concurrency Control. arXiv.

(2025). LEMUR: Large Scale End-to-End Multimodal Recommendation. arXiv.

(2025). Memory Offloading for Large Language Model Inference with Latency SLO Guarantees. arXiv.

(2024). Characterization of Large Language Model Development in the Datacenter. In NSDI.