SC23 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Workshops Archive

RDARuntime: An OS for AI Accelerators


Workshop: 13th International Workshop on Runtime and Operating Systems for Supercomputers (ROSS)

Authors: Benjamin Glick, Arjun Sabnis, Renate Kempf, Arnav Goel, Aarti Lalwani, Guoyao Feng, and Kiran Ranganath (SambaNova Systems Inc)


Abstract: Today's supercomputers are more heterogeneous than ever before. As the share of AI workloads in data centers continues to grow, the share of GPUs and AI-specific hardware grows with it. AI accelerators are different from traditional hardware, affecting all aspects of system design, from data-center scale to single-chip scale. AI accelerators are much more efficient than CPUs or GPUs for some HPC workloads, especially in AI for Science. They also add complexity to system architecture, management, and programming. Although runtime frameworks are critical to reducing system complexity, there is little literature describing AI accelerator runtimes. In this paper, we introduce RDARuntime, an AI-specific OS tailored for the development and operation of SambaNova's reconfigurable dataflow architecture. We describe the architecture, our design decisions, and some of the results we have achieved, along with lessons we have learned while helping to deploy the Reconfigurable Dataflow Unit (RDU) to production environments.




