Authors: Filippo Spiga (NVIDIA Corporation, AHUG), Simon McIntosh-Smith (University of Bristol, AHUG), Eva Siegmann (Stony Brook University, AHUG), Miwako Tsuji (RIKEN R-CCS, AHUG), Ross Miller (Oak Ridge National Laboratory), Hitoshi Murai (RIKEN Center for Computational Science (R-CCS)), Thomas Green (Cardiff University), Nicolas Renon (CALMIP), Zach Cobell (The Water Institute), Robert Harrison (Stony Brook University), John Cazes (Texas Advanced Computing Center (TACC))
Abstract: This BoF brings together the Arm HPC community to discuss experiences and lessons learnt in delivering and operating Arm-based HPC systems. The topic of Arm HPC ecosystem maturity has been extensively discussed, focusing especially on the upper part of the stack (compilers, libraries, applications). This BoF instead looks at the other side of the coin: the administration and management of systems. Primed by a short opening session from well-recognized experts in the community, the host and panel will invite attendees to share experiences and ask probing questions. Audience participation is strongly encouraged.
Long Description: The Arm architecture has made a strong impact on the HPC community, as evidenced by several projects including the Japanese Fugaku supercomputer, Sandia’s Astra system, the UK’s GW4/EPSRC efforts, and Europe’s “Mont-Blanc” project. Arm is now a major player in the HPC field, and there is a rapidly growing diversity of Arm-based HPC platforms. CPUs from Ampere, Fujitsu, and AWS power HPC at all scales, and NVIDIA, SiPearl, and others have announced accelerated Arm-based platforms. This BoF is part of a multi-year series with a common “Arm in HPC” theme. With each edition, the core topic evolves to allow discussion and debate on new topics and to engage a new set of speakers from the community. The topic of Arm HPC ecosystem maturity has been extensively discussed, focusing especially on the upper part of the stack (compilers, libraries, applications). Over the past decade, Arm has matured from research projects into a solid alternative to x86. Several systems have been deployed in recent years (Astra, Fugaku, Isambard, Ookami), and more are scheduled to be delivered and come online in the next 12 months. This round, instead of focusing on user experience, ecosystem readiness, or the role of standards (both hardware and software), the BoF contributors will explore operational aspects such as deployment, management, and day-to-day administration of either production systems or advanced multi-node test-beds. We seek to provide a venue where practitioners with similar experiences and needs can meet for collaboration, exchange, and fruitful technical discussion.
Current list of proposed speakers:
- Ross Miller (ORNL) [ ‘Wombat’, NVIDIA Arm HPC Developer Kit ]
- Fujiyoshi Syoji (RIKEN R-CCS) or Hitoshi Murai (RIKEN R-CCS) [ ‘Fugaku’, Fujitsu A64FX ]
- Tom Green (GW4, United Kingdom) [ ‘Isambard2’, Marvell ThunderX2 and Fujitsu A64FX ]
- Nicolas Renon (CALMIP, France) [ ‘Tupan’, NVIDIA Arm HPC Developer Kit ]
- A representative of PCSS (Poland) [ Huawei-based system ]
- An AWS customer [ AWS Graviton virtual cluster ]
Website: https://a-hug.org/events/sc-2023-ahug-event/