Workshop: 2023 International Workshop on Performance, Portability, and Productivity in HPC (P3HPC)
Authors: Oscar Antepara, Samuel Williams, and Hans Johansen (Lawrence Berkeley National Laboratory, LBNL) and Tuowen Zhao, Samantha Hirsch, Priya Goyal, and Mary Hall (University of Utah)
Abstract: In this new era, where multiple GPU vendors lead the supercomputing landscape and multiple programming models are available to users, the drive to achieve performance portability across platforms faces new challenges. Consider stencil algorithms, where architecture-specific solutions are required to optimize for the parallelism and memory hierarchies of emerging systems. In this work, we analyze the performance portability of BrickLib, a domain-specific library and vector code generator for stencils. BrickLib employs fine-grained data blocking to reduce the large amount of data movement associated with stencils. We compare GPUs from three vendors (NVIDIA, AMD, and Intel) and their associated programming models (CUDA, HIP, and SYCL). Across a wide range of stencil configurations, we show that BrickLib achieves good overall performance regardless of machine or programming model. We also introduce correlation models as a new tool for comparing architectures and programming models using Roofline model data.