Fine-Grained Accelerator Partitioning for Machine Learning and Scientific Computing in Function as a Service Platform

SC23 Proceedings

Workshop: 13th International Workshop on Runtime and Operating Systems for Supercomputers (ROSS)

Authors: Aditya Dhakal and Philipp Raith (Hewlett Packard Labs), Logan Ward (Argonne National Laboratory), Rolando P. Hong Enriquez and Gourav Rattihalli (Hewlett Packard Labs), Kyle Chard (University of Chicago), Ian Foster (Argonne National Laboratory), and Dejan Milojicic (Hewlett Packard Labs)

Abstract: Function-as-a-service (FaaS) is a promising execution environment for high-performance computing (HPC) and machine learning (ML) applications because it offers developers a simple way to write and deploy programs. GPUs and other accelerators are now indispensable for HPC and ML workloads. However, we have observed that state-of-the-art FaaS frameworks usually treat an accelerator as a single device that runs a single workload, and offer little support for multiplexing accelerators.

In this work, we present techniques to multiplex GPUs with Parsl, a popular FaaS framework. With our enhancements, multiplexing a GPU yields up to 60% lower task completion time and a 250% improvement in the throughput of a large language model, compared with running without multiplexing. We plan to extend GPU multiplexing support in FaaS platforms by tackling two challenges: changing the compute resources of a partition, and estimating how to right-size a GPU partition for a function.