Automatic Generation of Micro-Kernels for Performance Portability of Matrix Multiplication on RISC-V Vector Processors

SC23 Proceedings

Workshop: Second International Workshop on RISC-V for HPC

Authors: Francisco Igual and Luis Piñuel (Complutense University of Madrid); Sandra Catalán (Jaume I University, Spain); Héctor Martínez (Universidad de Córdoba); and Adrián Castelló and Enrique Quintana-Ortí (Universidad Politecnica de Valencia)

Abstract: In this paper, we propose and evaluate several optimized implementations of the general matrix multiplication (Gemm) on two RISC-V cores that implement the RISC-V vector extension (RVV): the C906 and C910 from T-HEAD. Specifically, we address the performance portability problem across these processor cores by means of an automatic assembly code generator, written in Python, that emits RVV code for high performance computing (HPC) with a variety of combinations of specific and general optimizations.

Our experimental results, obtained with a number of automatically generated micro-kernels for Gemm on both RISC-V cores, show that the impact of each optimization depends on the target architecture, and highlight the importance of automatically generating HPC RVV code to achieve performance portability while reducing developer effort. In addition, these optimizations deliver significant performance gains with respect to a state-of-the-art tuned BLAS library (OpenBLAS), reaching speed-ups of 3x and 1.3x for the C910 and C906, respectively.
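
To make the idea of a Python-based assembly generator concrete, the following is a minimal sketch and not the authors' tool. It emits only the accumulation loop of an FP32 micro-kernel built from rank-1 updates, and everything in it is an assumption made for illustration: the function name `generate_microkernel`, its parameters, the register assignment (`a0` = loop count, `a1` = packed A pointer, `a2` = packed B pointer), and the use of RVV 1.0 mnemonics. The cores evaluated in the paper may target a different revision of the vector extension with different mnemonics.

```python
def generate_microkernel(mr_vregs: int, nr: int, unroll: int = 1) -> str:
    """Return assembly text for the accumulation loop of an FP32 micro-kernel.

    Hypothetical conventions (not taken from the paper):
      a0 = number of k iterations, assumed to be a multiple of `unroll`
      a1 = pointer to packed A (mr_vregs vectors per k step)
      a2 = pointer to packed B (nr scalars per k step)
    Accumulators live in v0 .. v(mr_vregs*nr - 1); A is loaded just above them.
    Storing the accumulators back to C is intentionally left out.
    """
    n_acc = mr_vregs * nr
    if mr_vregs < 1 or n_acc + mr_vregs > 32:
        raise ValueError("micro-kernel does not fit in 32 vector registers")
    if not 1 <= nr <= 12:
        raise ValueError("nr must fit in scalar registers ft0..ft11")
    if not 1 <= unroll <= 64:
        raise ValueError("unroll must be between 1 and 64")

    a_base = n_acc  # first vector register used for the A column
    lines = [
        f"# FP32 micro-kernel: {mr_vregs} vreg(s) x {nr} columns, unroll {unroll}",
        "    vsetvli t0, zero, e32, m1, ta, ma",  # request the maximum vector length
        "    slli    t1, t0, 2",                  # bytes per vector of FP32 elements
    ]
    # Zero every accumulator register.
    for acc in range(n_acc):
        lines.append(f"    vmv.v.i v{acc}, 0")

    lines.append("1:")
    for _ in range(unroll):
        # Load one column of the A panel.
        for i in range(mr_vregs):
            lines.append(f"    vle32.v v{a_base + i}, (a1)")
            lines.append("    add     a1, a1, t1")
        # One rank-1 update: each B scalar scales the A column into its accumulators.
        for j in range(nr):
            lines.append(f"    flw     ft{j}, {4 * j}(a2)")
            for i in range(mr_vregs):
                acc = j * mr_vregs + i
                lines.append(f"    vfmacc.vf v{acc}, ft{j}, v{a_base + i}")
        lines.append(f"    addi    a2, a2, {4 * nr}")
    lines.append(f"    addi    a0, a0, -{unroll}")
    lines.append("    bnez    a0, 1b")
    return "\n".join(lines) + "\n"
```

For instance, `generate_microkernel(2, 4, unroll=2)` returns a loop body with 16 `vfmacc.vf` instructions. Varying the three parameters produces different register-blocking and unrolling variants from the same script, which is the kind of design-space exploration that a generator makes cheap compared with hand-written assembly.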
