OctoPipe: Reducing Pipeline Bubbles for Heterogeneous Models via Co-Optimizing Partitioning, Placement, and Scheduling

Abstract

Pipeline parallelism is widely used to train large language models (LLMs). However, increasing heterogeneity in model architectures exacerbates pipeline bubbles, thereby reducing training efficiency. Prior approaches typically optimize only a single dimension (i.e., partitioning, placement, or scheduling) of a pipeline, leaving substantial pipeline bubbles. Although a natural approach to further reduce bubbles is co-optimization, it introduces complex performance modeling, a combinatorial search space, and irregular execution orders. We propose OctoPipe, a novel pipeline parallelism system that co-optimizes partitioning, placement, and scheduling. First, we build a graph-based pipeline simulator to provide accurate performance estimates for co-optimization. Second, we develop an iterative bubble-aware tuner to efficiently explore the large search space. Third, we implement a unified pipeline executor that dynamically orchestrates computation and communication to support irregular execution orders without deadlocks while maximizing communication-computation overlap. Experiments show that OctoPipe achieves 1.22-2.14x throughput improvement over Megatron-LM across various heterogeneous LLM architectures and scales.
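To make the notion of a pipeline bubble concrete, the following minimal sketch estimates idle time in a forward-only pipeline whose stages have unequal costs. It is not OctoPipe's graph-based simulator: the function name `simulate_forward_pipeline`, the stage times, and the simplifications (no backward pass, no communication cost) are all assumptions made for illustration. It only shows why an unbalanced partition of a heterogeneous model leaves more of the pipeline idle than a balanced one with the same total work.

```python
def simulate_forward_pipeline(stage_times, num_microbatches):
    """Return (makespan, bubble_fraction) for a forward-only pipeline.

    Hypothetical toy model: task (s, m) may start only after stage s-1 has
    finished microbatch m and stage s has finished microbatch m-1.
    Communication and backward passes are ignored.
    """
    num_stages = len(stage_times)
    finish = [[0.0] * num_microbatches for _ in range(num_stages)]
    for s in range(num_stages):
        for m in range(num_microbatches):
            prev_stage = finish[s - 1][m] if s > 0 else 0.0
            prev_batch = finish[s][m - 1] if m > 0 else 0.0
            finish[s][m] = max(prev_stage, prev_batch) + stage_times[s]
    makespan = finish[-1][-1]
    # Busy time summed over all stages, versus total device time available.
    busy = sum(stage_times) * num_microbatches
    bubble_fraction = 1.0 - busy / (num_stages * makespan)
    return makespan, bubble_fraction


# Same total work per microbatch (8 time units), split two different ways.
balanced = simulate_forward_pipeline([2.0, 2.0, 2.0, 2.0], 8)
skewed = simulate_forward_pipeline([1.0, 1.0, 1.0, 5.0], 8)
print(f"balanced: makespan={balanced[0]}, bubble={balanced[1]:.3f}")
print(f"skewed:   makespan={skewed[0]}, bubble={skewed[1]:.3f}")
```

In this toy setting the skewed split roughly doubles the makespan because every other stage waits on the slowest one. Partitioning alone cannot always remove that imbalance in a heterogeneous model, which is the motivation the abstract gives for tuning placement and scheduling together with it.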

Publication
In The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC)
Zhisheng YE
Machine Learning Systems Researcher

My research interests include AI infrastructure for LLMs, algorithm-system co-design for machine learning systems, and resource management.
