Workshop: 13th International Workshop on Runtime and Operating Systems for Supercomputers (ROSS)
Authors: Aditya Dhakal and Philipp Raith (Hewlett Packard Labs), Logan Ward (Argonne National Laboratory), Rolando P. Hong Enriquez and Gourav Rattihalli (Hewlett Packard Labs), Kyle Chard (University of Chicago), Ian Foster (Argonne National Laboratory), and Dejan Milojicic (Hewlett Packard Labs)
Abstract: Function-as-a-service (FaaS) is a promising execution environment for high-performance computing (HPC) and machine learning (ML) applications, as it offers developers a simple way to write and deploy programs. GPUs and other accelerators are now indispensable for HPC and ML workloads. However, we observe that state-of-the-art FaaS frameworks usually treat each accelerator as a single device that runs a single workload, and offer little support for multiplexing accelerators.
In this work, we present techniques to multiplex GPUs with Parsl, a popular FaaS framework. With our enhancements, we show up to 60% lower task completion time and a 250% improvement in the throughput of a large language model when multiplexing a GPU, compared with running without multiplexing. We plan to extend support for GPU multiplexing in FaaS platforms by tackling two challenges: changing the compute resources in a partition, and estimating how to right-size a GPU partition for a function.