Hardware Specialization: Estimating Monte Carlo Cross-Section Lookup Kernel Performance and Area

SC23 Proceedings



Workshop: PMBS23: The 14th International Workshop on Performance Modeling, Benchmarking, and Simulation of High-Performance Computer Systems

Authors: Kazutomo Yoshii, John Tramm, and Bryce Allen (Argonne National Laboratory); Tomohiro Ueno and Kentaro Sano (RIKEN Center for Computational Science (R-CCS)); and Andrew Siegel and Pete Beckman (Argonne National Laboratory)

Abstract: Hardware specialization is one of the promising directions in the post-Moore era, and it is imperative to understand how hardware specialization paradigms can benefit HPC. An essential question is how to estimate the theoretical performance of an optimally specialized architecture without requiring extensive hardware development expertise and effort.

Focusing on the Monte Carlo cross-section lookup kernel, known for its notably low resource utilization, we address this question by developing a workflow that leverages open-source hardware tools to simulate a specialized architecture's timing and estimate its resource usage. We implement the building blocks of the kernel pipeline in the Chisel hardware construction language and generate Verilog code for resource estimation. Our late-breaking results show a kernel latency of 46 cycles per lookup, compared with 680 cycles for the optimized CPU code, and a potential 15k pipeline copies within a 698 mm² die, a die size reflective of the Intel Xeon Platinum 8180.
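As a rough illustration of what the quoted figures imply, the short Python sketch below derives two back-of-the-envelope quantities from the abstract: the ratio of CPU to specialized cycles per lookup (about 14.8) and the average die area available per pipeline copy. The function and constant names are hypothetical, "15k" is taken as exactly 15,000, and the area figure assumes the entire die is devoted to pipeline copies; none of these assumptions come from the paper. The cycle ratio is also not a wall-clock speedup, since the abstract does not state the clock frequencies involved.

```python
# Figures quoted in the abstract.
SPECIALIZED_CYCLES_PER_LOOKUP = 46   # specialized kernel latency, cycles per lookup
CPU_CYCLES_PER_LOOKUP = 680          # optimized CPU code, cycles per lookup
DIE_AREA_MM2 = 698.0                 # die area used as the budget, in mm^2

# Assumption: the abstract's "15k" is treated as exactly 15,000 copies.
PIPELINE_COPIES = 15_000


def latency_ratio(cpu_cycles: float, specialized_cycles: float) -> float:
    """Ratio of CPU cycles to specialized-pipeline cycles for one lookup.

    This compares cycle counts only; it is not a wall-clock speedup
    unless both designs run at the same clock frequency.
    """
    return cpu_cycles / specialized_cycles


def area_per_copy_mm2(die_area_mm2: float, copies: int) -> float:
    """Average die area per pipeline copy.

    Assumption: the whole die is filled with pipeline copies, with no
    area reserved for memory, interconnect, or I/O.
    """
    return die_area_mm2 / copies


if __name__ == "__main__":
    ratio = latency_ratio(CPU_CYCLES_PER_LOOKUP, SPECIALIZED_CYCLES_PER_LOOKUP)
    area = area_per_copy_mm2(DIE_AREA_MM2, PIPELINE_COPIES)
    print(f"cycle ratio (CPU / specialized): {ratio:.1f}")
    print(f"average area per pipeline copy: {area:.4f} mm^2")
```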
