Workshop: PMBS23: The 14th International Workshop on Performance Modeling, Benchmarking, and Simulation of High-Performance Computer Systems
Authors: Zhengji Zhao, Ermal Rrapaj, Sridutt Bhalachandra, Brian Austin, Hai Ah Nam, and Nicholas Wright (Lawrence Berkeley National Laboratory (LBNL))
Abstract: Power has become a key limiting factor in supercomputing. Understanding the power signatures of current production workloads is essential to address this limit and continue advancing scientific computing at scale. This paper analyzes the power characteristics of NERSC production workloads at the system and application levels. Our system-level analysis reveals a large gap between average and peak power usage, indicating significant power swings as different applications run on the system. At the application level, we select four workflow benchmarks representing NERSC's production workloads, analyze their power characteristics, and attempt to correlate the observed power timeline patterns with GPU performance metrics and application profiling data. We find that different applications have distinct power usage patterns and that their average and peak power usage vary widely. We discuss how these findings may help improve the operational power efficiency of the current system and their implications for future system procurement.