Authors: Yang Liu and Nan Ding (Lawrence Berkeley National Laboratory (LBNL)), Piyush Sao (Oak Ridge National Laboratory (ORNL)), and Samuel Williams and Xiaoye Sherry Li (Lawrence Berkeley National Laboratory (LBNL))
Abstract: This paper presents a unified framework for reducing the communication costs of sparse triangular solvers (SpTRSV) on CPU and GPU clusters. The framework builds upon a 3D communication-avoiding process layout that distributes a sparse triangular matrix across a set of 2D process grids. It significantly reduces inter-process communication by replicating computation and using sparse allreduce operations across the 2D grids. The layout also allows for the integration of a number of communication-optimized 2D SpTRSV algorithms, including binary communication tree-based CPU algorithms and one-sided GPU communication (e.g., NVSHMEM)-based algorithms. With all these communication reduction schemes, the resulting SpTRSV exhibits significantly better scalability than existing work on leadership CPU and GPU clusters such as Cori, Perlmutter and Crusher.
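For readers unfamiliar with the kernel itself, the following is a minimal sequential sketch of what an SpTRSV computes: a forward substitution that solves L x = b for a sparse lower-triangular L. It is not the distributed 3D algorithm described in the abstract; the function name `sptrsv_lower_csr` and the choice of CSR storage are assumptions made for illustration only.

```python
def sptrsv_lower_csr(indptr, indices, data, b):
    """Solve L x = b by forward substitution.

    L is lower triangular, stored in CSR form (indptr, indices, data),
    and is assumed to have a nonzero entry on every diagonal position.
    """
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        acc = b[i]
        diag = None
        # Walk the stored nonzeros of row i.
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            if j < i:
                # Entry left of the diagonal: x[j] is already known.
                acc -= data[k] * x[j]
            elif j == i:
                diag = data[k]
        if diag is None or diag == 0:
            raise ZeroDivisionError(f"missing or zero diagonal in row {i}")
        x[i] = acc / diag
    return x


# Example: L = [[2, 0, 0], [1, 3, 0], [0, 4, 5]], b = [2, 7, 23]
indptr = [0, 1, 3, 5]
indices = [0, 0, 1, 1, 2]
data = [2.0, 1.0, 3.0, 4.0, 5.0]
b = [2.0, 7.0, 23.0]
x = sptrsv_lower_csr(indptr, indices, data, b)
```

Each row depends on previously computed entries of x, which is why distributed versions of this kernel are dominated by communication and why the process layout matters.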