OctoPipe: Reducing Pipeline Bubbles for Heterogeneous Models via Co-Optimizing Partitioning, Placement, and Scheduling

Jihu Guo, Tenghui Ma, Wei Gao, Peng Sun, Xun Chen, Jiaxing Li, Zhisheng YE, Yuyang Jin, Dahua Lin

November 2026

Abstract

Pipeline parallelism is widely used to train large language models (LLMs). However, increasing heterogeneity in model architectures exacerbates pipeline bubbles, thereby reducing training efficiency. Prior approaches typically optimize only a single dimension of a pipeline (partitioning, placement, or scheduling), leaving substantial pipeline bubbles. Co-optimizing all three is a natural way to reduce bubbles further, but it requires complex performance modeling, opens a combinatorial search space, and produces irregular execution orders. We propose OctoPipe, a novel pipeline parallelism system that co-optimizes partitioning, placement, and scheduling. First, we build a graph-based pipeline simulator to provide accurate performance estimates for co-optimization. Second, we develop an iterative bubble-aware tuner to efficiently explore the large search space. Third, we implement a unified pipeline executor that dynamically orchestrates computation and communication to support irregular execution orders without deadlocks while maximizing communication-computation overlap. Experiments show that OctoPipe achieves 1.22x to 2.14x higher throughput than Megatron-LM across various heterogeneous LLM architectures and scales.
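To make the notion of a pipeline bubble concrete, the following is a minimal sketch of how uneven stage costs inflate idle time. It is not OctoPipe's graph-based simulator: it assumes a simple all-forward-then-all-backward (GPipe-style) schedule, ignores communication, and uses made-up per-stage timings. The function name `simulate_gpipe` and all parameters are hypothetical and chosen only for illustration.

```python
def simulate_gpipe(fwd, bwd, num_microbatches):
    """Return (makespan, bubble_fraction) for an all-forward-then-all-backward
    schedule. fwd[s] and bwd[s] are illustrative per-micro-batch costs of
    stage s; communication time is assumed to be zero."""
    num_stages = len(fwd)
    free_at = [0.0] * num_stages  # time at which each stage becomes idle

    # Forward phase: micro-batch m on stage s waits for stage s-1 and for
    # stage s to finish its previous micro-batch.
    fwd_done = [[0.0] * num_stages for _ in range(num_microbatches)]
    for m in range(num_microbatches):
        for s in range(num_stages):
            ready = fwd_done[m][s - 1] if s > 0 else 0.0
            start = max(ready, free_at[s])
            fwd_done[m][s] = start + fwd[s]
            free_at[s] = fwd_done[m][s]

    # Backward phase: gradients flow from the last stage to the first.
    bwd_done = [[0.0] * num_stages for _ in range(num_microbatches)]
    for m in range(num_microbatches):
        for s in reversed(range(num_stages)):
            ready = bwd_done[m][s + 1] if s < num_stages - 1 else fwd_done[m][s]
            start = max(ready, free_at[s])
            bwd_done[m][s] = start + bwd[s]
            free_at[s] = bwd_done[m][s]

    makespan = max(free_at)
    busy = num_microbatches * (sum(fwd) + sum(bwd))  # total useful work
    bubble_fraction = 1.0 - busy / (num_stages * makespan)
    return makespan, bubble_fraction


# Same total work per micro-batch, split evenly vs. unevenly over 4 stages.
balanced_span, balanced_bubble = simulate_gpipe([2, 2, 2, 2], [4, 4, 4, 4], 4)
skewed_span, skewed_bubble = simulate_gpipe([1, 1, 2, 4], [2, 2, 4, 8], 4)
print(f"balanced: makespan={balanced_span}, bubble={balanced_bubble:.3f}")
print(f"skewed:   makespan={skewed_span}, bubble={skewed_bubble:.3f}")
```

Under these assumptions the balanced split has a bubble fraction of 3/7, matching the usual (S-1)/(M+S-1) closed form for S stages and M micro-batches, while the skewed split rises to 0.6 with the same total work. That gap is the kind of inefficiency that tuning partitioning, placement, and scheduling together aims to close.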

Type

Conference paper

Publication

In The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC)

LLM Training, Distributed Training

Zhisheng YE

Machine Learning Systems Researcher
