Authors: Claudio Barone (Pacific Northwest National Laboratory (PNNL)), Giovanni Gozzi and Michele Fiorito (Politecnico di Milano), Ankur Limaye and Antonino Tumeo (Pacific Northwest National Laboratory (PNNL)), and Fabrizio Ferrandi (Politecnico di Milano)
Abstract: Accelerators based on reconfigurable devices are becoming popular for data analytics in high-performance computing and cloud computing systems. However, designing these accelerators is difficult. High-Level Synthesis (HLS) tools can help by generating RTL designs from high-level languages, but they tend to optimize the computational part of the kernel and often overlook data movement and memory accesses. In many applications, however, memory operations account for a significant share of the overall execution time and can be the actual bottleneck limiting performance, especially when the accelerator accesses large, possibly remote, memories.
We propose an approach based on generating and integrating highly customizable accelerator caches that exploit spatial and temporal locality to reduce the latency with which an HLS-generated accelerator accesses external memory. We integrate the approach into a state-of-the-art open-source HLS tool and show how it makes it easy to explore trade-offs between performance and resource utilization with minimal user effort.
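The abstract does not describe the cache organization the tool generates. As a hedged illustration of the performance/resource trade-off it mentions, the sketch below models a hypothetical direct-mapped cache with two knobs: the number of lines and the line size. `DirectMappedCache`, `run`, and the access trace are illustrative assumptions, not the authors' generator or its parameters.

```python
class DirectMappedCache:
    """Hypothetical direct-mapped cache model (illustration only, not the HLS tool's design)."""

    def __init__(self, num_lines: int, line_bytes: int):
        self.num_lines = num_lines    # more lines -> more on-chip memory, more temporal reuse
        self.line_bytes = line_bytes  # longer lines -> more spatial locality per external fetch
        self.tags = [None] * num_lines
        self.hits = 0
        self.misses = 0

    def access(self, addr: int) -> bool:
        block = addr // self.line_bytes   # which memory block holds this byte address
        index = block % self.num_lines    # the single line this block may occupy
        tag = block // self.num_lines     # identifies which block is currently in that line
        if self.tags[index] == tag:
            self.hits += 1
            return True
        self.tags[index] = tag            # miss: fetch the whole line from external memory
        self.misses += 1
        return False

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def run(num_lines: int, line_bytes: int, trace: list) -> float:
    """Replay an address trace through a fresh cache and return its hit rate."""
    cache = DirectMappedCache(num_lines, line_bytes)
    for addr in trace:
        cache.access(addr)
    return cache.hit_rate()


# Two sequential passes over a 1 KiB array of 4-byte words.
TRACE = [4 * i for i in range(256)] * 2

if __name__ == "__main__":
    for num_lines, line_bytes in [(8, 16), (8, 64), (64, 16)]:
        size = num_lines * line_bytes
        print(f"{num_lines} lines x {line_bytes} B = {size} B: hit rate {run(num_lines, line_bytes, TRACE):.4f}")
```

On this trace, the 128 B cache reaches a hit rate of 0.75 from spatial locality alone, longer 64 B lines raise it to 0.9375, and a 1 KiB cache with 16 B lines reaches 0.875 because the second pass hits entirely. This is the kind of performance versus on-chip memory trade-off that configurable cache parameters expose.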
Best Poster Finalist (BP): yes