SC23 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Doctoral Showcase Archive

Corralling the Computing Continuum: Mobilizing Modern Distributed Resources for Machine Learning and Accessible Computing

Author: Matt Baughman (University of Chicago)

Advisors: Kyle Chard (University of Chicago; Argonne National Laboratory (ANL), Data Science and Learning Division), Ian Foster (Argonne National Laboratory (ANL); University of Chicago)

Abstract: To achieve the resource-agnostic flexibility of compute described by the computing continuum, we combined our work in workload profiling and cost estimation with task provisioning to present DELTA, a framework for serverless workload placement across a computing ecosystem. To address the dynamic availability of modern computing resources as well as the multiple costs involved in computing, we presented extensions of our framework as DELTA+, which demonstrated resource provisioning and support for multidimensional compute costs.

To bring this idea of resource abstraction via serverless computing into the rapidly growing field of federated learning, we developed and released FLoX: Federated Learning on funcX. This framework was built from the ground up around a serverless computing paradigm with experimentation and usability in mind. Extending the lessons learned from DELTA about self-adaptive systems, we began exploring the potential for automating the tradeoffs found in FLoX and in federated learning in general.

Looking ahead, we are developing FLoX into a much more robust framework that enables the use of a wide range of computing resources while abstracting away the difficulties of configuring and optimizing a federated learning experiment. Additionally, we are actively working on a re-release of DELTA that combines all extensions into one framework, with updated cost and execution time predictors and complete resource provisioning capability. Finally, we are designing an integration between FLoX and DELTA that will enable serverless federated learning (FL) to automatically place each component of an FL flow and move data as necessary to make the best use of the available resources.

Thesis Canvas: pdf
