Overcoming the Cost of Data Movement in AI Inference Accelerators

SC23 Proceedings

Exhibitor Forums Archive

Authors: Arun Iyengar (Untether AI)

Abstract: The largest performance bottleneck and energy cost in neural network acceleration is fetching weight and activation values ahead of general matrix-vector (GEMV) or general matrix-matrix (GEMM) computation. Traditional von Neumann architectures, even with large on-chip caches, spend as much as 90% of their energy on data movement and only 10% on actual calculation, which in most cases limits their energy efficiency to low single-digit TOPS/W. Analog in-memory compute, where the memory cell takes part in the MAC calculation, suffers from accuracy issues and requires additional support circuitry, such as analog-to-digital and digital-to-analog converters and compensation circuitry, which negates its inherent low-power advantage and limits the state of the art to 3 TOPS/W.
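To make the 90%/10% split concrete, the following is a minimal sketch of the arithmetic it implies, not a model taken from the presentation. It assumes compute energy stays fixed while data-movement energy is divided by some factor; the function name `efficiency_gain` and its parameters are hypothetical.

```python
def efficiency_gain(movement_fraction: float, movement_reduction: float) -> float:
    """Overall energy-efficiency gain when only data-movement energy shrinks.

    movement_fraction:  share of total energy spent moving data (0.9 in the abstract).
    movement_reduction: factor by which data-movement energy is divided (assumed).
    Compute energy is assumed to stay unchanged.
    """
    compute_fraction = 1.0 - movement_fraction
    new_total_energy = compute_fraction + movement_fraction / movement_reduction
    return 1.0 / new_total_energy


# Cutting data-movement energy by 10x yields roughly a 5.3x overall gain.
gain_10x = efficiency_gain(0.9, 10.0)

# Even if data movement cost nothing, the gain is capped near 10x,
# because the remaining 10% compute energy then dominates.
gain_limit = efficiency_gain(0.9, float("inf"))

print(f"10x cheaper data movement: {gain_10x:.2f}x overall")
print(f"free data movement:        {gain_limit:.2f}x overall")
```

Under these assumptions, reducing data movement is the only lever that can deliver a large gain: with a 90% share, nearly a 10x improvement is available before compute energy becomes the limit.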

The novel Untether AI at-memory compute architecture stores all weights on-chip in specially designed low-power SRAM, built from high-density bit cells tuned to feed the processing elements (PEs) directly with minimal energy. Because the PEs sit directly adjacent to the SRAM cells, each bit access uses only 2 femtojoules. This is an order-of-magnitude improvement over compiled memory cells and three orders of magnitude better than fetching weights from external DRAM.
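As a rough illustration of what these ratios mean per layer, the sketch below compares weight-fetch energy across the three memory options. Only the 2 fJ per bit figure comes from the abstract; the other two per-bit values are derived from the stated "order of magnitude" and "three orders of magnitude" ratios, and the layer size and 8-bit weight width are assumptions chosen for the example.

```python
# Energy per bit access, in femtojoules.
AT_MEMORY_FJ_PER_BIT = 2.0                                   # stated in the abstract
COMPILED_SRAM_FJ_PER_BIT = AT_MEMORY_FJ_PER_BIT * 10         # implied by "order of magnitude"
EXTERNAL_DRAM_FJ_PER_BIT = AT_MEMORY_FJ_PER_BIT * 1000       # implied by "three orders of magnitude"


def fetch_energy_nj(num_weights: int, bits_per_weight: int, fj_per_bit: float) -> float:
    """Energy in nanojoules to read every weight once."""
    total_fj = num_weights * bits_per_weight * fj_per_bit
    return total_fj * 1e-6  # 1 nJ = 1e6 fJ


# Hypothetical layer: one million 8-bit weights (both values are assumptions).
NUM_WEIGHTS = 1_000_000
BITS_PER_WEIGHT = 8

at_memory_nj = fetch_energy_nj(NUM_WEIGHTS, BITS_PER_WEIGHT, AT_MEMORY_FJ_PER_BIT)
compiled_nj = fetch_energy_nj(NUM_WEIGHTS, BITS_PER_WEIGHT, COMPILED_SRAM_FJ_PER_BIT)
dram_nj = fetch_energy_nj(NUM_WEIGHTS, BITS_PER_WEIGHT, EXTERNAL_DRAM_FJ_PER_BIT)

print(f"at-memory SRAM: {at_memory_nj:10.1f} nJ")
print(f"compiled SRAM:  {compiled_nj:10.1f} nJ")
print(f"external DRAM:  {dram_nj:10.1f} nJ")
```

For this assumed layer, a single pass over the weights costs about 16 nJ at-memory, about 160 nJ from compiled SRAM, and about 16 µJ from external DRAM.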
