SC23 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Exhibitor Forums Archive

Overcoming the Cost of Data Movement in AI Inference Accelerators


Authors: Arun Iyengar (Untether AI)

Abstract: The largest performance bottleneck and the largest consumer of energy in neural network acceleration is fetching weight and activation values ahead of general matrix-vector (GEMV) or general matrix-matrix (GEMM) computation. Traditional von Neumann architectures, even with large on-chip caches, spend as much as 90% of their energy on data movement and only 10% on actual calculations, which in most cases limits their energy efficiency to low single-digit TOPS/W. Analog in-memory compute, where the memory cell is used as part of the MAC calculation, suffers from accuracy issues and requires additional support circuitry, such as analog-to-digital and digital-to-analog converters and compensation circuits. This overhead negates the inherent low-power advantage and limits the state of the art to 3 TOPS/W.
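The 90%/10% energy split quoted above bounds how much any reduction in data-movement energy can help. The following is a minimal Python sketch of that arithmetic, not a model of any particular chip; the function name and the 10x reduction factor are illustrative assumptions, and only the 90% figure comes from the abstract.

```python
def efficiency_gain(movement_fraction: float, movement_reduction: float) -> float:
    """Return the energy-efficiency multiplier obtained when the energy spent
    on data movement is divided by `movement_reduction` and compute energy
    is left unchanged (hypothetical helper, for illustration only)."""
    compute_fraction = 1.0 - movement_fraction
    # New total energy, normalized so the baseline total is 1.0.
    new_energy = compute_fraction + movement_fraction / movement_reduction
    return 1.0 / new_energy


# 90% of energy spent on data movement, as quoted in the abstract.
MOVEMENT_FRACTION = 0.90

# Assumed scenario: data movement becomes 10x cheaper.
gain_10x = efficiency_gain(MOVEMENT_FRACTION, 10.0)

# Limiting case: data movement becomes free.
gain_limit = efficiency_gain(MOVEMENT_FRACTION, float("inf"))

print(f"10x cheaper data movement: {gain_10x:.2f}x efficiency")
print(f"free data movement:        {gain_limit:.2f}x efficiency")
```

Under these assumptions, making data movement 10x cheaper improves overall efficiency by only about 5.3x, and the ceiling with free data movement is 10x, because the 10% spent on calculation remains.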

The novel Untether AI at-memory compute architecture stores all weights on-chip in specially designed low-power SRAM built from high-density bit cells that are tuned to feed the processing elements (PEs) directly with minimal energy. Because the PEs sit directly adjacent to the SRAM cells, each bit access uses only 2 femtojoules. This is an order-of-magnitude improvement over compiled memory cells and three orders of magnitude better than fetching weights from external DRAM.
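To put the 2 femtojoule figure in context, here is a minimal Python sketch of the energy needed to fetch one layer's weights once. Only the 2 fJ per bit access comes from the abstract. The compiled-memory and DRAM values are simply that figure scaled by the stated one and three orders of magnitude, not measured numbers, and the layer size and 8-bit weight width are hypothetical.

```python
# Energy per bit access in femtojoules.
AT_MEMORY_FJ_PER_BIT = 2.0                               # stated in the abstract
COMPILED_SRAM_FJ_PER_BIT = AT_MEMORY_FJ_PER_BIT * 10     # implied: one order of magnitude
EXTERNAL_DRAM_FJ_PER_BIT = AT_MEMORY_FJ_PER_BIT * 1000   # implied: three orders of magnitude

FJ_TO_NJ = 1e-6  # 1 femtojoule = 1e-6 nanojoules


def fetch_energy_nanojoules(num_weights: int, bits_per_weight: int,
                            fj_per_bit: float) -> float:
    """Energy in nanojoules to read every weight exactly once."""
    return num_weights * bits_per_weight * fj_per_bit * FJ_TO_NJ


# Hypothetical layer: one million 8-bit weights.
NUM_WEIGHTS = 1_000_000
BITS_PER_WEIGHT = 8

sources = [
    ("at-memory SRAM", AT_MEMORY_FJ_PER_BIT),
    ("compiled SRAM", COMPILED_SRAM_FJ_PER_BIT),
    ("external DRAM", EXTERNAL_DRAM_FJ_PER_BIT),
]
for name, fj_per_bit in sources:
    energy = fetch_energy_nanojoules(NUM_WEIGHTS, BITS_PER_WEIGHT, fj_per_bit)
    print(f"{name:>15}: {energy:,.0f} nJ")
```

For this assumed layer, one pass over the weights costs roughly 16 nJ at 2 fJ per bit, against roughly 160 nJ and 16,000 nJ for the two scaled reference points.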



Presentation: file

