SC23 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Panels Archive

Understanding the Performance, Reproducibility, Validation, Portability, and Sustainability of Coupled HPC Simulation and Deep Learning Calculations

Moderator: Ada Sedova (Oak Ridge National Laboratory (ORNL))

Panelists: Jeyan Thiyagalingam (Rutherford Appleton Laboratory, Science and Technology Facilities Council (STFC)), Daniel Reed (University of Utah), Karthik Kashinath (NVIDIA Corporation), Wesley Brewer (Oak Ridge National Laboratory (ORNL)), Daniel Martinez-Gonzalez (NASA Ames Research Center)

Abstract: Recent advances in deep learning (DL) for scientific computing have paved the way for a new type of integrated programming environment. This environment must support the seamless integration of simulation applications with deep learning frameworks using methods such as in-memory coupling and inference serving. For HPC in particular, this environment brings a slew of challenges, forcing developers to revisit decades of solved problems in scientific computing: kernel optimization, verification and validation strategies, and building and porting practices. Interfacing HPC simulation codes with DL frameworks from industry, whose philosophies and strategies may differ from those within HPC, raises critical questions about how these two communities can work together to develop sustainable, integrated programming environments that are trustworthy, vetted, and portable, and in which HPC communities can express requirements for scientific software and track ownership. Discussion is needed about how to overcome these challenges. Here, panelists from academia, national laboratories, and industry will start that conversation, sharing perspectives and experiences.
