SC23 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Birds of a Feather

Increasing Memory Utilization and Reducing Total Memory Cost Using CXL


Authors: Kurtis Bowman (CXL Consortium), Kurt Lender (Intel Corporation), Tracy Spitler (IntelliProp Inc.), Vijay Nain (Micron Technology Inc.), Bill Gervasi (Wolley Inc.)

Abstract: CXL’s advanced memory expansion and fabric management capabilities can be used to increase system scalability and flexibility across multiple compute domains, enabling resource sharing for higher performance, reduced software-stack complexity, and lower overall datacenter memory cost. The fabric enhancements and memory expansion features included in CXL 3.0 deliver the new levels of composability required by the large models used in HPC and AI in the modern datacenter. Expert representatives from CXL Consortium member companies who are implementing the specification will explore CXL 3.0 features, the new use cases they enable, and ROI examples for deploying CXL-attached memory.

Long Description: The CXL Consortium is an industry standards body dedicated to advancing Compute Express Link™ (CXL™) technology. Our mission is to develop and promote CXL, an open coherent interconnect, and to enable an interoperable ecosystem for heterogeneous memory and computing solutions.

CXL is designed to address growing high-performance computational workloads by supporting heterogeneous processing and memory systems, with applications in Artificial Intelligence, Machine Learning, Analytics, Cloudification of the Network and Edge, and High-Performance Computing. It does so by enabling coherency and memory semantics on top of PCI Express® (PCIe®) I/O semantics, for optimized performance in evolving usage models.
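
As a concrete illustration of what memory semantics over PCIe mean to software, the sketch below assumes a Linux host on which a CXL Type 3 memory expander has been onlined as a CPU-less NUMA node (node 2 is a hypothetical example) and uses the standard libnuma API to place an allocation on that node; the buffer then behaves as ordinary, cache-coherent system memory. This is a minimal sketch under those assumptions, not a definitive deployment recipe.

/*
 * Illustrative sketch only: assumes a Linux host where a CXL Type 3 memory
 * expander is exposed as a CPU-less NUMA node (node 2 here is hypothetical)
 * and that libnuma is installed. Build with: gcc cxl_alloc.c -lnuma
 */
#include <numa.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CXL_NODE 2          /* hypothetical NUMA node backed by CXL memory */

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA is not available on this system\n");
        return EXIT_FAILURE;
    }

    size_t size = 1UL << 30;                       /* 1 GiB buffer */
    void *buf = numa_alloc_onnode(size, CXL_NODE); /* place pages on the CXL node */
    if (buf == NULL) {
        perror("numa_alloc_onnode");
        return EXIT_FAILURE;
    }

    memset(buf, 0, size);   /* touch the pages so they are actually faulted in */
    /* ... use buf as ordinary, cache-coherent system memory ... */

    numa_free(buf, size);
    return EXIT_SUCCESS;
}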

Building on the success of CXL 2.0 and feedback from industry and end users, CXL 3.0 expands on previous technology generations to increase scalability and optimize system-level flows with advanced switching and fabric capabilities, efficient peer-to-peer communications, and fine-grained resource sharing across multiple compute domains. The specification also introduces fabric capabilities beyond the traditional tree-based architectural structures of PCIe and previous CXL generations. The CXL fabric can support up to 4,096 nodes, each of which can be a CPU host, a CXL accelerator (with or without memory), a PCIe device, or a Global Fabric Attached Memory (GFAM) device. This architecture enables the construction of powerful solutions built from compute and memory elements arranged to meet the needs of today’s workloads.
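
To make the host-side view of expanded memory a bit more tangible, the sketch below assumes a Linux host whose kernel includes CXL driver support, which enumerates CXL devices (memory devices named mem0, mem1, and so on) under /sys/bus/cxl/devices; listing those entries is one simple way to confirm that CXL-attached memory expanders are visible to the operating system. This reflects an assumption about a particular host environment, not a requirement of the CXL specification itself.

/*
 * Illustrative sketch only: assumes a Linux host with CXL driver support,
 * which exposes CXL devices (memdevs, ports, decoders) under
 * /sys/bus/cxl/devices. Prints the memory devices it finds there.
 */
#include <dirent.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    const char *path = "/sys/bus/cxl/devices";
    DIR *dir = opendir(path);
    if (dir == NULL) {
        perror("opendir");   /* no CXL support, or no devices enumerated */
        return 1;
    }

    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (strncmp(entry->d_name, "mem", 3) == 0)   /* memdevs: mem0, mem1, ... */
            printf("CXL memory device: %s\n", entry->d_name);
    }

    closedir(dir);
    return 0;
}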

The session will start with a lively panel discussion of the challenges the supercomputing industry faces today and how CXL can address them. Our panelists will also explore the benefits and system-level scalability of CXL 3.0, reliability at scale for expanded memory footprints, and how to integrate I/O subsystems into disaggregated CXL-based systems.

We invite the HPC/AI community to engage directly with CXL technology experts, share the challenges they face, and gain insights from our expert panelists. Attendees will learn more about the emerging ecosystem of CXL devices and can provide their views to the Consortium on how the specification should evolve to meet the community’s requirements.


Website: https://www.computeexpresslink.org/




