CXL-Based Memory Disaggregation for HPC and AI Workloads

SC23 Proceedings

Exhibitor Forums Archive

Authors: Hokyoon Lee and Jungmin Choi (SK hynix)

Abstract: Compute Express Link (CXL) is composable by nature, which enables the disaggregation of memory resources via CXL.mem transactions. In this forum, we demonstrate two use cases, memory pooling and memory sharing, and the benefits each offers to users.

Memory Pooling Case: A key to alleviating the memory stranding issue

The memory utilization of each host server in a compute cluster varies over time, which forces system operators to provision each server with enough DRAM for its peak utilization when running real-time or interactive applications. Unused memory in one server cannot be used by other servers, so it becomes stranded. SK hynix’s Niagara, a CXL-based pooled memory solution, addresses this stranded memory issue. Our FPGA-based pooled memory solution can be connected to four host servers and supports four DDR DIMM channels with a maximum capacity of 1 TB. In our exhibition booth, we will demonstrate how Niagara alleviates memory stranding with its Elastic Memory feature.
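The provisioning argument above can be illustrated with some simple arithmetic. The following Python sketch is a hypothetical illustration, not a model of Niagara or its Elastic Memory feature: the host names and demand figures are invented, and it assumes an idealized pool that can reassign capacity between hosts instantly.

```python
# Hypothetical per-host memory demand in GiB, sampled at four points in time.
# These numbers are invented for illustration only.
demand_gib = {
    "host0": [100, 220, 90, 60],
    "host1": [200, 80, 70, 110],
    "host2": [60, 90, 240, 100],
    "host3": [90, 70, 100, 230],
}


def per_host_provisioning(demand):
    """Total DRAM needed when every host is sized for its own peak."""
    return sum(max(samples) for samples in demand.values())


def pooled_provisioning(demand):
    """Total DRAM needed when capacity is sized for the peak of the
    combined demand (assumes an idealized, instantly reassignable pool)."""
    combined = [sum(step) for step in zip(*demand.values())]
    return max(combined)


static_total = per_host_provisioning(demand_gib)   # 220 + 200 + 240 + 230 = 890
pooled_total = pooled_provisioning(demand_gib)     # max(450, 460, 500, 500) = 500
stranded = static_total - pooled_total             # 390 GiB never used at once

print(f"per-host peak provisioning: {static_total} GiB")
print(f"pooled provisioning:        {pooled_total} GiB")
print(f"capacity stranded by static provisioning: {stranded} GiB")
```

Because the hosts in this made-up trace do not peak at the same time, sizing for the combined peak needs far less total capacity than sizing each host for its own peak. The gap between the two totals is the stranded memory that pooling aims to recover.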

Memory Sharing Case: A key to realizing a zero-copy distributed computing framework

Conventional distributed computing frameworks such as Spark and Ray suffer from heavy network traffic when distributing data and tasks to the computing nodes in a cluster. To address this issue, we have implemented a memory sharing feature in Niagara so that multiple host servers can directly access the same shared data without transferring it over a network. In this forum, we demonstrate the effectiveness of memory sharing with a real workload in the Ray framework, which is known for its use in ChatGPT.
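The idea of several parties reading the same bytes in place, rather than receiving copies, can be sketched on a single machine. The example below is only an analogy: it uses operating-system shared memory from Python's standard `multiprocessing.shared_memory` module, not CXL, Niagara, or Ray, and the function name `demo_shared_region` is hypothetical.

```python
from multiprocessing import shared_memory


def demo_shared_region(payload: bytes) -> bytes:
    """Write through one handle and observe the change through a second
    handle attached to the same region by name (payload must be non-empty)."""
    # "Producer" creates a named shared region and places the data in it.
    writer = shared_memory.SharedMemory(create=True, size=len(payload))
    try:
        writer.buf[:len(payload)] = payload

        # "Consumer" attaches to the same region by name. No data is sent;
        # both handles map the same underlying memory.
        reader = shared_memory.SharedMemory(name=writer.name)
        try:
            # An in-place update by the producer is visible to the consumer.
            writer.buf[0] = ord("X")
            # bytes() copies here only so the result outlives the mapping.
            seen = bytes(reader.buf[:len(payload)])
        finally:
            reader.close()
    finally:
        writer.close()
        writer.unlink()  # release the region once no handle needs it
    return seen


if __name__ == "__main__":
    print(demo_shared_region(b"hello"))
```

In this analogy the two handles stand in for two hosts attached to the same shared region. Only the region's name is exchanged, while the data itself stays where it is.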
