SC23 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Research Posters Archive

Radium: Transparent Distributed Execution via Process Virtualization

Authors: Aidan Cully, Husheng Zhou, Dusan Veljko, Hyojong Kim, Vance Miller, Joel Zambrano, and Mazhar Memon (VMware Inc)

Abstract: The soaring demand for AI has led to a surge in specialized computation hardware, which is difficult to share among end users through conventional virtualization methods. Moreover, the extensive data required by AI often cannot be conveniently co-located with the compute resources, making data migration costly or impractical. To address these issues, Radium offers a userspace framework that employs process virtualization, thread execution migration, and distributed shared memory. With Radium, an unmodified application binary runs in an encapsulated virtualized environment, and its execution can be transparently distributed among the nodes where resources are located. Radium enables resource aggregation with little performance penalty, even over high-latency network connections. By choosing syscalls as the virtualization boundary, Radium inherently supports novel hardware without modifying existing infrastructure or applications.

Best Poster Finalist (BP): no

