Workshop: Second International Workshop on RISC-V for HPC
Authors: Francisco Igual and Luis Piñuel (Complutense University of Madrid); Sandra Catalán (Jaume I University, Spain); Héctor Martínez (Universidad de Córdoba); and Adrián Castelló and Enrique Quintana-Ortí (Universidad Politecnica de Valencia)
Abstract: In this paper, we propose and evaluate several optimized implementations of the general matrix multiplication (Gemm) on two different RISC-V architecture cores implementing the RISC-V vector extension (RVV): C906 and C910 from T-HEAD. Specifically, we address the performance portability problem across these processor cores by means of an automatic assembly code generator, written in Python, capable of emitting RVV code for high performance computing (HPC), with a variety of combinations of specific and general optimizations.
Our experimental results, obtained with a number of automatically generated micro-kernels for Gemm on both RISC-V architectures, reveal that the impact of each optimization differs with the target architecture, and highlight the importance of automatically generating HPC RVV code to achieve performance portability while reducing the developers' effort. In addition, these optimizations show important performance gains with respect to a state-of-the-art tuned BLAS library (OpenBLAS), reaching 3x and 1.3x speed-ups for the C910 and C906, respectively.
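To make the idea of a Python-based assembly generator concrete, the following is a minimal sketch and not the authors' tool. It emits the loop body of a toy Gemm micro-kernel as text. Everything specific here is an assumption made for illustration: the function name `emit_microkernel`, its parameters, the register assignment (`a0` and `a1` as pointers into A and B, `t1` as the byte stride of A), and the use of RVV 1.0 mnemonic spellings (`vle32.v`, `vfmacc.vf`), which may differ from the spelling required by the vector extension version that a given core implements.

```python
def emit_microkernel(mr_vregs=2, nr=4, unroll=1):
    """Return assembly lines for one k-loop body of a toy Gemm micro-kernel.

    Hypothetical layout (an assumption, not taken from the paper):
      v0 .. v(mr_vregs-1)   hold a column slice of A
      ft0 .. ft(nr-1)       hold nr scalars from a row of B
      remaining v registers hold the mr_vregs x nr accumulators of C
    """
    if nr > 12:
        raise ValueError("only ft0..ft11 are used as scalar temporaries")
    if mr_vregs + mr_vregs * nr > 32:
        raise ValueError("micro-tile does not fit in 32 vector registers")

    lines = []
    for _ in range(unroll):
        # Load the column slice of A, one vector register at a time.
        for i in range(mr_vregs):
            lines.append(f"vle32.v v{i}, (a0)")
            lines.append("add a0, a0, t1")
        # Load nr single-precision scalars of B, then advance the pointer.
        for j in range(nr):
            lines.append(f"flw ft{j}, {4 * j}(a1)")
        lines.append(f"addi a1, a1, {4 * nr}")
        # Rank-1 update: C[:, j] += B[j] * A[:] for every accumulator.
        for j in range(nr):
            for i in range(mr_vregs):
                acc = mr_vregs + j * mr_vregs + i
                lines.append(f"vfmacc.vf v{acc}, ft{j}, v{i}")
    return lines


if __name__ == "__main__":
    print("\n".join(emit_microkernel(mr_vregs=2, nr=4, unroll=1)))
```

Changing `mr_vregs`, `nr`, or `unroll` yields a different micro-kernel variant from the same script, which is the kind of parameterized exploration the abstract describes; a real generator would also handle the loop control, vector length configuration, prefetching, and the store of C.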