SC23 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Workshops Archive

Dynamic Memory Provisioning on Disaggregated HPC Systems


Workshop: Workshop on Memory Technologies, Systems, and Applications

Authors: Felippe Zacarias (Universitat Politècnica de Catalunya, Barcelona Supercomputing Center); Paul Carpenter (Barcelona Supercomputing Center); and Vinicius Petrucci (Micron Technology Inc)


Abstract: Disaggregated memory aims to break the rigid boundaries between node memory hierarchies by providing memory as a pooled resource. The resource manager allocates the system's memory at job submission time, but it is hard for users to know a job's precise peak memory footprint, and prior work has shown that users have an incentive to overestimate. This leads to significant overallocation, which wastes most of the physical memory in the system. We present a way to reclaim much of this overallocated memory. We extend the Slurm job scheduler to dynamically reallocate memory according to each job's current memory footprint. We enhance an existing Slurm simulator to model this situation and combine publicly available traces to model HPC systems of up to 1490 nodes. We show that dynamic memory provisioning increases throughput per dollar by up to 38% compared to a system with static allocation of disaggregated memory.
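
The difference between static and dynamic provisioning described in the abstract can be illustrated with a minimal sketch. This is not the authors' Slurm extension or simulator: the `Job` class, the function names, and the `headroom` safety margin are all hypothetical, and the sketch assumes an allocation is never grown beyond the original request.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    requested_gb: float  # memory requested at submission time
    footprint_gb: float  # memory the job is using right now


def static_allocation(jobs):
    """Pool memory held when every job keeps its full request."""
    return sum(job.requested_gb for job in jobs)


def dynamic_allocation(jobs, headroom=0.25):
    """Pool memory held when each allocation tracks the current footprint.

    `headroom` is a hypothetical safety margin added on top of the
    footprint. Assumption: an allocation is capped at the job's request.
    """
    return sum(
        min(job.requested_gb, job.footprint_gb * (1.0 + headroom))
        for job in jobs
    )


def reclaimable(jobs, headroom=0.25):
    """Memory returned to the pool by switching from static to dynamic."""
    return static_allocation(jobs) - dynamic_allocation(jobs, headroom)
```

With three illustrative jobs requesting 64, 32, and 128 GB while using 20, 30, and 16 GB, static allocation holds 224 GB, whereas the dynamic sketch holds 77 GB and returns 147 GB to the pool. These numbers are made up for the example and are not taken from the paper's traces.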




