SC23 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Technical Papers Archive

5 ExaFlop/s HPL-MxP Benchmark with Linear Scalability on the 40-Million-Core Sunway Supercomputer


Authors: Rongfen Lin (National Research Center of Parallel Computer Engineering and Technology, China; Tsinghua University, China); Xinhui Yuan (National Research Center of Parallel Computer Engineering and Technology, China); Wei Xue (Tsinghua University, China; Qinghai University); WanWang Yin (National Research Center of Parallel Computer Engineering and Technology, China); Jienan Yao (Tsinghua University, China); Junda Shi (National Research Center of Parallel Computer Engineering and Technology, China); Qiang Sun (National Research Center of Parallel Computer Engineering and Technology, China); Chaobo Song (National Research Center of Parallel Computer Engineering and Technology, China); and Fei Wang (Tsinghua University, China; National Research Center of Parallel Computer Engineering and Technology)

Abstract: HPL-MxP is an emerging high-performance benchmark used to measure the mixed-precision computing capability of leading supercomputers. This work presents our efforts on the new Sunway supercomputer, which scale the benchmark linearly to over 40 million cores, sustain an overall mixed-precision performance exceeding 5 ExaFlop/s, and achieve over 85% of peak performance, the highest efficiency reached among all heterogeneous systems on the HPL-MxP list. The optimizations in our HPL-MxP implementation include: (1) a Two-Direction Look-Ahead and Overlap algorithm that overlaps all communication with computation; (2) a multi-level process-mapping and communication-scheduling method that makes the best possible use of the network while keeping the algorithm flow conflict-free; and (3) a CG-Fusion computing framework that eliminates up to 60% of inter-chip communication and removes the memory-access bottleneck while serving computation and communication simultaneously. This work may also provide useful insights for tuning cutting-edge applications on Sunway supercomputers and on other heterogeneous supercomputers.
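
For readers unfamiliar with what a mixed-precision solve involves, the following is a minimal single-process sketch, not the implementation described in the abstract. It factors a diagonally dominant matrix in emulated FP32 and then recovers FP64-level accuracy through plain iterative refinement. Every name here (`to_f32`, `lu_factor_f32`, `refine`, `make_system`) is hypothetical, and plain refinement is an assumption made for brevity; the benchmark's reference code is understood to use a GMRES-based refinement, and none of the paper's parallel optimizations are represented.

```python
import random
import struct


def to_f32(x):
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def lu_factor_f32(a):
    """LU factorization without pivoting; every stored value is rounded to FP32."""
    n = len(a)
    lu = [[to_f32(v) for v in row] for row in a]
    for k in range(n):
        for i in range(k + 1, n):
            lu[i][k] = to_f32(lu[i][k] / lu[k][k])
            for j in range(k + 1, n):
                lu[i][j] = to_f32(lu[i][j] - to_f32(lu[i][k] * lu[k][j]))
    return lu


def lu_solve(lu, b):
    """Solve (L U) x = b with unit-diagonal L, using FP64 arithmetic."""
    n = len(b)
    x = list(b)
    for i in range(n):                 # forward substitution with L
        for j in range(i):
            x[i] -= lu[i][j] * x[j]
    for i in reversed(range(n)):       # backward substitution with U
        for j in range(i + 1, n):
            x[i] -= lu[i][j] * x[j]
        x[i] /= lu[i][i]
    return x


def residual(a, x, b):
    """Compute r = b - A x in FP64."""
    n = len(b)
    return [b[i] - sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]


def refine(a, b, iters=5):
    """Low-precision factorization followed by FP64 iterative refinement."""
    lu = lu_factor_f32(a)
    x = lu_solve(lu, b)
    history = []                       # max-norm of the residual per step
    for _ in range(iters):
        r = residual(a, x, b)
        history.append(max(abs(v) for v in r))
        d = lu_solve(lu, r)            # correction from the cheap factors
        x = [xi + di for xi, di in zip(x, d)]
    history.append(max(abs(v) for v in residual(a, x, b)))
    return x, history


def make_system(n, seed=0):
    """Build a diagonally dominant test system with a known solution."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    for i in range(n):
        a[i][i] += n                   # dominance makes pivoting unnecessary
    x_true = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    b = [sum(a[i][j] * x_true[j] for j in range(n)) for i in range(n)]
    return a, b, x_true


if __name__ == "__main__":
    a, b, x_true = make_system(20)
    x, history = refine(a, b)
    print("residual per refinement step:", history)
    print("max error:", max(abs(u - v) for u, v in zip(x, x_true)))
```

The expensive O(n^3) factorization runs entirely in low precision, while each refinement step costs only O(n^2) in FP64, which is why a mixed-precision benchmark can report rates far above a machine's FP64 figures.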



