Publications

(2026). Latency-SLO-Aware Memory Offloading for Large Language Model Inference. In ICS.

(2026). ResiHP: Taming LLM Training Failures with Dynamic Hybrid Parallelism. Preprint.

(2026). CONCUR: High-Throughput Agentic Batch Inference of LLM via Congestion-Based Concurrency Control. Preprint.

(2025). LEMUR: Large Scale End-to-End Multimodal Recommendation. arXiv.

(2024). Characterization of Large Language Model Development in the Datacenter. In NSDI.

(2023). Deep Learning Workload Scheduling in GPU Datacenters: A Survey. In CSUR.

(2023). AMSP: Super-Scaling LLM Training via Advanced Model States Partitioning. arXiv.

(2023). Hydro: Surrogate-Based Hyperparameter Tuning Service in Datacenters. In OSDI.

(2022). Tear Up the Bubble Boom: Lessons Learned From a Deep Learning Research and Development Cluster. In ICCD.

(2021). ASTRAEA: A Fair Deep Learning Scheduler for Multi-tenant GPU Clusters. In TPDS.
