Hydro: Surrogate-Based Hyperparameter Tuning Service in Datacenters

System design

Abstract

Hyperparameter tuning is an essential step in deep learning model development that provides better model performance at the cost of substantial resources. While existing systems can improve tuning efficiency, they still fail to handle large models with billions of parameters and efficiently leverage cluster resources. Motivated by these deficiencies, we introduce Hydro, a surrogate-based hyperparameter tuning service that optimizes tuning workloads in both the job-level and cluster-level granularities. Specifically, it consists of two key components: (1) Hydro Tuner automatically generates and optimizes surrogate models via scaling, parametrization and fusion; (2) Hydro Coordinator improves tuning efficiency and cluster-wide resource utilization by adaptively leveraging ephemeral and heterogeneous resources. Our comprehensive experiments on two tuning algorithms across six models show that Hydro Tuner can dramatically reduce tuning makespan by up to 78.5x compared with Ray Tune and no reduction in tuning quality.

Publication
In Operating Systems Design and Implementation
Zhisheng YE
Zhisheng YE
CS Ph.D. student

My research interests include distributed systems, machine learning systems and resource management, etc.

Related