SC23 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Research Posters Archive

A High-Performance I/O Framework for Accelerating DNN Model Updates Within Deep Learning Workflow

Authors: Jie Ye and Jaime Cernuda (Illinois Institute of Technology), Bogdan Nicolae (Argonne National Laboratory), and Anthony Kougkas and Xian-He Sun (Illinois Institute of Technology)

Abstract: In traditional deep learning workflows, AI applications (producers) train DNN models offline using fixed datasets, while inference serving systems (consumers) load the trained models to serve real-time inference queries. In practice, AI applications often operate in dynamic environments where data is constantly changing. In contrast to offline learning, continuous learning frequently (re)trains models to adapt to the ever-changing data. This demands regular deployment of the DNN models, increasing the model update frequency between producers and consumers. Typically, producers and consumers are connected via a model repository such as a parallel file system (PFS), which can result in high model update latency due to the I/O bottleneck of the PFS. To address this, our work introduces a high-performance I/O framework that speeds up model updates between producers and consumers. It employs a cache-aware model handler to minimize latency and an intelligent performance predictor to maintain a balance between training and inference performance.
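The producer/consumer exchange the abstract describes can be illustrated with a minimal sketch: instead of writing each retrained model to a PFS and having the inference server read it back, the producer publishes versions into a shared in-memory cache that consumers poll. All class and method names below are hypothetical illustrations of the idea, not the authors' actual API.

```python
import threading


class ModelCache:
    """Hypothetical in-memory model repository.

    A training producer publishes new model versions here, and inference
    consumers fetch the latest version directly from memory, bypassing
    the PFS I/O path that the poster identifies as the bottleneck.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._versions = {}   # version number -> serialized model bytes
        self._latest = 0      # highest published version so far

    def publish(self, model_bytes: bytes) -> int:
        # Producer side: store a newly (re)trained model and bump the version.
        with self._lock:
            self._latest += 1
            self._versions[self._latest] = model_bytes
            return self._latest

    def fetch_latest(self):
        # Consumer side: read the newest model version without touching disk.
        with self._lock:
            return self._latest, self._versions.get(self._latest)


# Usage: a producer pushes two retrained versions; the consumer sees v2.
cache = ModelCache()
cache.publish(b"weights-v1")
cache.publish(b"weights-v2")
version, weights = cache.fetch_latest()
```

A real implementation would also need the performance predictor the abstract mentions, to decide when publishing a new version would interfere with ongoing training or inference; that policy layer is omitted here.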

Best Poster Finalist (BP): no

