Workshop: PMBS23: The 14th International Workshop on Performance Modeling, Benchmarking, and Simulation of High-Performance Computer Systems
Authors: Sergej-Alexander Breiter (Ludwig Maximilian University of Munich), James D. Trotter (Simula Research Laboratory), and Karl Fürlinger (Ludwig Maximilian University of Munich)
Abstract: One of the novel features of the Fujitsu A64FX CPU is the sector cache. This feature provides hardware-supported partitioning of the L1 and L2 caches and lets the programmer control which partition data is placed in. This paper presents an in-depth study of how to apply the sector cache to a frequently used sparse matrix-vector multiplication (SpMV) kernel. A performance model based on reuse analysis is used to predict cache behavior and to identify situations in which the sector cache improves data reuse. Across a collection of 490 sparse matrices, the model predicts the number of L2 cache misses to within 2–3% for both sequential SpMV and parallel SpMV with 48 threads. Further experiments show how various sector cache configurations affect performance: the median speedup is about 1.05×, and the maximum speedup is about 1.6×.
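
To illustrate why partitioning can help SpMV, here is a minimal sketch, not the authors' model or code. It assumes the matrix is stored in CSR format, that every array element occupies its own cache line, and that each cache partition behaves as a fully associative LRU cache. The names spmv_csr, lru_misses, and access_traces are hypothetical helpers introduced only for this example, and no A64FX-specific API is used. The matrix arrays are read once in streaming fashion, while the input vector x is reused across rows, so keeping them in separate partitions can stop streamed data from evicting reusable data.

```python
from collections import OrderedDict


def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR format."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        acc = 0.0
        # values and col_idx are read once, front to back (streaming).
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # x is read indirectly and may be reused by later rows.
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


def lru_misses(trace, capacity):
    """Count misses of a fully associative LRU cache with `capacity` lines."""
    cache = OrderedDict()
    misses = 0
    for line in trace:
        if line in cache:
            cache.move_to_end(line)        # hit: mark as most recently used
        else:
            misses += 1
            cache[line] = None
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used line
    return misses


def access_traces(row_ptr, col_idx):
    """Return (matrix, vector, combined) access traces of one SpMV pass.

    Simplifying assumption: one array element per cache line.
    """
    matrix, vector, combined = [], [], []
    for i in range(len(row_ptr) - 1):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            matrix.append(("a", k))
            vector.append(("x", col_idx[k]))
            combined += [("a", k), ("x", col_idx[k])]
    return matrix, vector, combined


# Toy 3x2 matrix in which every row touches both entries of x.
row_ptr = [0, 2, 4, 6]
col_idx = [0, 1, 0, 1, 0, 1]
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

matrix, vector, combined = access_traces(row_ptr, col_idx)
shared_misses = lru_misses(combined, 3)                           # one 3-line cache
split_misses = lru_misses(matrix, 1) + lru_misses(vector, 2)      # 1 + 2 lines
print(shared_misses, split_misses)
```

In this toy setting the shared three-line cache misses on every access, because the streamed matrix entries keep evicting x. With one line reserved for the matrix and two for x, only the compulsory misses remain. Real caches have limited associativity and multi-element lines, so the numbers are purely illustrative.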