SC23 Proceedings

SC Technical Program Archives

Technical Papers

  • 5 ExaFlop/s HPL-MxP Benchmark with Linear Scalability on the 40-Million-Core Sunway Supercomputer Rongfen Lin (National Research Center of Parallel Computer Engineering and Technology, China; Tsinghua University, China); Xinhui Yuan (National Research Center of Parallel Computer Engineering and Technology, China); Wei Xue (Tsinghua University, China; Qinghai University); WanWang Yin (National Research Center of Parallel Computer Engineering and Technology, China); Jienan Yao (Tsinghua University, China); Junda Shi (National Research Center of Parallel Computer Engineering and Technology, China); Qiang Sun (National Research Center of Parallel Computer Engineering and Technology, China); Chaobo Song (National Research Center of Parallel Computer Engineering and Technology, China); and Fei Wang (Tsinghua University, China; National Research Center of Parallel Computer Engineering and Technology)
  • 69.7-PFlops Extreme Scale Earthquake Simulation with Crossing Multi-Faults and Topography on Sunway Wubing Wan and Lin Gan (Tsinghua University, China; National Supercomputing Center in Wuxi, China); Wenqiang Wang (Southern University of Science and Technology, China); Zekun Yin and Haodong Tian (Shandong University, China); Zhenguo Zhang (Southern University of Science and Technology, China); Yinuo Wang (Tsinghua University, China); Mengyuan Hua and Xiaohui Liu (Shandong University, China); Shengye Xiang and Zeyu Song (Tsinghua University, China); Zhongqiu He and Zijia Wang (Southern University of Science and Technology, China); Ping Gao (Tsinghua University, China; National Supercomputing Center in Wuxi, China); Yaojian Chen (Tsinghua University, China); Xiaohui Duan (Shandong University, China); Xin Liu (National Supercomputing Center in Wuxi); Wei Zhang (Southern University of Science and Technology, China); Haohuan Fu and Wei Xue (Tsinghua University, China); Weiguo Liu (Shandong University, China); Guangwen Yang (Tsinghua University, China); and Xiaofei Chen (Southern University of Science and Technology, China)
  • Accelerating Communications in Federated Applications with Transparent Object Proxies J. Gregory Pauloski and Valerie Hayot-Sasson (University of Chicago); Logan Ward (Argonne National Laboratory (ANL)); Nathaniel Hudson and Charlie Sabino (University of Chicago); and Matt Baughman, Kyle Chard, and Ian Foster (University of Chicago, Argonne National Laboratory (ANL))
  • Adaptive Workload-Balanced Scheduling Strategy for Global Ocean Data Assimilation on Massive GPUs Junmin Xiao (Institute of Computing Technology, Chinese Academy of Sciences); Chaoyang Shui (Institute of Computing Technology, Chinese Academy of Sciences); and Di Cai, Kangyu Wang, Yunfei Pang, Mingyi Li, Hui Ma, and Guangming Tan (Institute of Computing Technology, Chinese Academy of Sciences)
  • ADT-FSE: A New Encoder for SZ Tao Lu (DapuStor Corporation); Yu Zhong, Zibin Sun, Xiang Chen, You Zhou, and Fei Wu (Huazhong University of Science & Technology); and Ying Yang, Yunxin Huang, and Yafei Yang (DapuStor Corporation)
  • AMRIC: A Novel In Situ Lossy Compression Framework for Efficient I/O in Adaptive Mesh Refinement Applications Daoce Wang (Indiana University), Jesus Pulido and Pascal Grosset (Los Alamos National Laboratory (LANL)), Jiannan Tian and Sian Jin (Indiana University), Houjun Tang and Jean Sexton (Lawrence Berkeley National Laboratory (LBNL)), Sheng Di (Argonne National Laboratory (ANL)), Kai Zhao (Florida State University), Bo Fang (Pacific Northwest National Laboratory (PNNL)), Zarija Lukić (Lawrence Berkeley National Laboratory (LBNL)), Franck Cappello (Argonne National Laboratory (ANL)), James Ahrens (Los Alamos National Laboratory (LANL)), and Dingwen Tao (Indiana University)
  • ANT-MOC: Scalable Neutral Particle Transport Using 3D Method of Characteristics on Multi-GPU Systems Shunde Li and Zongguo Wang (Computer Network Information Center, Chinese Academy of Sciences); Lingkun Bu (National Center for Materials Service Safety, University of Science and Technology Beijing); Jue Wang and Zhikuang Xin (Computer Network Information Center, Chinese Academy of Sciences); Shigang Li (School of Computer Science, Beijing University of Posts and Telecommunications); Yangang Wang and Yangde Feng (Computer Network Information Center, Chinese Academy of Sciences); Peng Shi (National Center for Materials Service Safety, University of Science and Technology Beijing); Yun Hu (China Institute of Atomic Energy); and Xuebin Chi (Computer Network Information Center, Chinese Academy of Sciences)
  • Application Performance Modeling via Tensor Completion Edward Hutter and Edgar Solomonik (University of Illinois)
  • Automated Mapping of Task-Based Programs onto Distributed and Heterogeneous Machines Thiago S. F. X. Teixeira (Stanford University), Alexandra Henzinger (Massachusetts Institute of Technology (MIT)), and Rohan Yadav and Alex Aiken (Stanford University)
  • Automatic Generation of Distributed-Memory Mappings for Tensor Computations Martin Kong, Raneem Abu Yosef, and Atanas Rountev (Ohio State University) and P. Sadayappan (University of Utah)
  • Big Data Assimilation: Real-Time 30-Second-Refresh Heavy Rain Forecast Using Fugaku during Tokyo Olympics and Paralympics Takemasa Miyoshi, Arata Amemiya, Shigenori Otsuka, Yasumitsu Maejima, and James Taylor (RIKEN); Takumi Honda (Hokkaido University, Japan; RIKEN); Hirofumi Tomita and Seiya Nishizawa (RIKEN); Kenta Sueki (RIKEN, Meteorological Research Institute); Tsuyoshi Yamaura (RIKEN); Yutaka Ishikawa (National Institute of Informatics); Shinsuke Satoh (National Institute for Information and Communications Technology); Tomoo Ushio (Osaka University); Kana Koike (MTI Ltd.); and Atsuya Uno (RIKEN, National Research Institute for Earth Science and Disaster Resilience)
  • BLAD: Adaptive Load Balanced Scheduling and Operator Overlap Pipeline for Accelerating the Dynamic GNN Training Kaihua Fu, Quan Chen, Yuzhuo Yang, Jiuchen Shi, Chao Li, and Minyi Guo (Shanghai Jiao Tong University)
  • Breaking Boundaries: Distributed Domain Decomposition with Scalable Physics-Informed Neural PDE Solvers Arthur Feeney, Zitong Li, Ramin Bostanabad, and Aparna Chandramowlishwaran (University of California, Irvine)
  • Bringing Order to Sparsity: A Sparse Matrix Reordering Study on Multicore CPUs James D. Trotter (Simula Research Laboratory); Sinan Ekmekçibaşı (Istanbul University - Cerrahpaşa); Johannes Langguth (Simula Research Laboratory; University of Bergen, Norway); Tugba Torun and Emre Düzakın (Koç University); Aleksandar Ilic (INESC-ID, IST, University of Lisbon); and Didem Unat (Koç University, Turkey)
  • Calculon: a Methodology and Tool for High-Level Codesign of Systems and Large Language Models Mikhail Isaev (Georgia Institute of Technology), Nic McDonald and Larry Dennison (NVIDIA Corporation), and Richard Vuduc (Georgia Institute of Technology)
  • Choosing the Best Parallelization and Implementation Styles for Graph Analytics Codes: Lessons Learned from 1106 Programs Yiqian Liu, Noushin Azami, Avery VanAusdal, and Martin Burtscher (Texas State University)
  • Cloud Computing to Enable Wearable-Driven Longitudinal Hemodynamic Maps Cyrus Tanade, Emily Rakestraw, and William Ladd (Duke University); Erik Draeger (Lawrence Livermore National Laboratory); and Amanda Randles (Duke University)
  • Clover: Toward Sustainable AI with Carbon-Aware Machine Learning Inference Service Baolin Li (Northeastern University); Siddharth Samsi and Vijay Gadepally (Massachusetts Institute of Technology (MIT), Lincoln Laboratory); and Devesh Tiwari (Northeastern University)
  • Co-Design Hardware and Algorithm for Vector Search Wenqi Jiang (ETH Zurich); Shigang Li (Beijing University of Posts and Telecommunications); Yu Zhu, Johannes de Fine Licht, Zhenhao He, and Runbin Shi (ETH Zurich); Cedric Renggli (Apple Inc); Shuai Zhang (ETH Zurich); Theodoros Rekatsinas (Apple Inc); and Torsten Hoefler and Gustavo Alonso (ETH Zurich)
  • cuSZp: An Ultra-Fast GPU Error-Bounded Lossy Compression Framework with Optimized End-to-End Performance Yafan Huang (University of Iowa), Sheng Di (Argonne National Laboratory (ANL)), Xiaodong Yu (Stevens Institute of Technology), Guanpeng Li (University of Iowa), and Franck Cappello (Argonne National Laboratory (ANL))
  • DASP: Specific Dense Matrix Multiply-Accumulate Units Accelerated General Sparse Matrix-Vector Multiplication Yuechen Lu and Weifeng Liu (China University of Petroleum-Beijing)
  • Data Flow Lifecycles for Optimizing Workflow Coordination Hyungro Lee and Luanzheng Guo (Pacific Northwest National Laboratory (PNNL)), Meng Tang (Illinois Institute of Technology), Jesun Firoz and Nathan Tallent (Pacific Northwest National Laboratory (PNNL)), and Anthony Kougkas and Xian-He Sun (Illinois Institute of Technology)
  • Demystifying and Mitigating Cross-Layer Deficiencies of Soft Error Protection in Instruction Duplication Zhengyang He and Yafan Huang (University of Iowa), Hui Xu (Fudan University), Dingwen Tao (Indiana University), and Guanpeng Li (University of Iowa)
  • Design Considerations and Analysis of Multi-Level Erasure Coding in Large-Scale Data Centers Meng Wang, Jiajun Mao, and Rajdeep Rana (University of Chicago); John Bent (Los Alamos National Laboratory (LANL)); Serkay Olmez (Seagate Research); Anjus George (Oak Ridge National Laboratory (ORNL)); Garrett Wilson Ransom (Los Alamos National Laboratory (LANL)); Jun Li (CUNY Queens College and Graduate Center); and Haryadi S. Gunawi (University of Chicago)
  • DGAP: Efficient Dynamic Graph Analysis on Persistent Memory Abdullah Al Raqibul Islam and Dong Dai (University of North Carolina, Charlotte)
  • DistTGL: Distributed Memory-Based Temporal Graph Neural Network Training Hongkuan Zhou (University of Southern California (USC)); Da Zheng, Xiang Song, and George Karypis (AWS AI); and Viktor Prasanna (University of Southern California (USC))
  • DPS: Adaptive Power Management for Overprovisioned Systems Jianru Ding and Henry Hoffmann (University of Chicago)
  • EasyScale: Elastic Training with Consistent Accuracy and Improved Utilization on GPUs Mingzhen Li (Beihang University), Wencong Xiao (Unaffiliated), Hailong Yang and Biao Sun (Beihang University), Hanyu Zhao and Shiru Ren (Unaffiliated), Zhongzhi Luan (Beihang University), Xianyan Jia (Unaffiliated), Yi Liu (Beihang University), Yong Li and Wei Lin (Unaffiliated), and Depei Qian (Beihang University)
  • Efficient Maximal Biclique Enumeration on GPUs Zhe Pan, Shuibing He, and Xu Li (Zhejiang University); Xuechen Zhang (Washington State University, Vancouver); and Rui Wang and Gang Chen (Zhejiang University)
  • Embracing Irregular Parallelism in HPC with YGM Trevor Steil (Lawrence Livermore National Laboratory), Tahsin Reza (University of Waterloo), and Benjamin Priest and Roger Pearce (Lawrence Livermore National Laboratory)
  • Enabling Real World Scale Structural Superlubricity All-Atom Simulation on the Next-Generation Sunway Supercomputer Xiaohui Duan (Shandong University, National Supercomputing Center in Wuxi); Jin Wang (International School for Advanced Studies, Italy); Ping Gao (Tsinghua University, National Supercomputing Center in Wuxi); Ming Ma (Tsinghua University); Lin Gan (Tsinghua University, National Supercomputing Center in Wuxi); Xin Liu (National Supercomputing Center in Wuxi); Haohuan Fu and Wei Xue (Tsinghua University, National Supercomputing Center in Wuxi); Dexun Chen (National Supercomputing Center in Wuxi); Guangwen Yang (Tsinghua University, National Supercomputing Center in Wuxi); and Weiguo Liu (Shandong University, National Supercomputing Center in Wuxi)
  • Enhance the Strong Scaling of LAMMPS on Fugaku Jianxiong Li, Tong Zhao, Zhuoqiang Guo, and Shunchen Shi (State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences); Lijun Liu (Department of Mechanical Engineering, Graduate School of Engineering, Osaka University); and Guangming Tan, Weile Jia, Guojun Yuan, and Zhan Wang (State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences)
  • Enhancing Adaptive Physics Refinement Simulations through the Addition of Realistic Red Blood Cell Counts Sayan Roychowdhury, Samreen T. Mahmud, Aristotle Martin, Peter Balogh, and Daniel F. Puleri (Duke University); John Gounley (Oak Ridge National Laboratory (ORNL)); Erik W. Draeger (Lawrence Livermore National Laboratory); and Amanda Randles (Duke University)
  • Establishing a Modeling System in 3-km Horizontal Resolution for Global Atmospheric Circulation Triggered by Submarine Volcanic Eruptions with 200 Billion Smoothed Particles Hydrodynamics Junshi Chen and Shenghong Huang (University of Science and Technology of China (USTC), Laoshan Laboratory); Ziyu Zhang, Xiaoyu Hao, and Jun Gu (University of Science and Technology of China); Hong An, Chun Zhao, and Yan Hu (University of Science and Technology of China (USTC), Laoshan Laboratory); Zhanming Wang, Longkui Chen, Yifan Luo, Jineng Yao, Yi Zhang, Yang Zhao, and Zhihao Wang (University of Science and Technology of China); Dongning Jia and Zhao Jin (Laoshan Laboratory); Changming Song and Xisheng Luo (University of Science and Technology of China); and Xiaobin He and Dexun Chen (National Research Center of Parallel Computer Engineering and Technology, China)
  • Exascale Multiphysics Nuclear Reactor Simulations for Advanced Designs Elia Merzari (Pennsylvania State University); Steven Hamilton and Tom Evans (Oak Ridge National Laboratory (ORNL)); Misun Min (Argonne National Laboratory (ANL)); Paul Fischer (University of Illinois); Stefan Kerkemeier, Jun Fang, and Paul Romano (Argonne National Laboratory (ANL)); Yu-Hsiang Lan (University of Illinois); Malachi Phillips (University of Illinois, Sandia National Laboratories); Elliott Biondo and Katherine Royston (Oak Ridge National Laboratory (ORNL)); Tim Warburton (Virginia Tech); Noel Chalmers (Advanced Micro Devices (AMD) Inc); and Thilina Rathnayake (University of Illinois)
  • Experiences Readying Applications for Exascale Nicholas Malaya (Advanced Micro Devices (AMD) Inc); Bronson Messer (Oak Ridge National Laboratory (ORNL)); Joseph Glenski (Hewlett Packard Enterprise (HPE)); Antigoni Georgiadou, Justin Lietz, and Kalyana Gottiparthi (Oak Ridge National Laboratory (ORNL)); Marc Day (National Renewable Energy Laboratory (NREL)); Jackie Chen (Sandia National Laboratories); Jon Rood and Lucas Esclapez (National Renewable Energy Laboratory (NREL)); James White III (Hewlett Packard Enterprise (HPE)); Gustav R. Jansen (Oak Ridge National Laboratory (ORNL)); Nicholas Curtis (AMD Research); Stephen Nichols (Oak Ridge National Laboratory (ORNL)); Jakub Kurzak, Noel Chalmers, Chip Freitag, Paul Bauman, and Alessandro Fanfarillo (AMD Research); Reuben D. Budiardja and Thomas Papatheodore (Oak Ridge National Laboratory (ORNL)); Nicholas Frontiere (Argonne National Laboratory (ANL)); Damon McDougall (AMD Research); Matthew Norman, Sarat Sreepathi, Philip Roth, and Dmytro Bykov (Oak Ridge National Laboratory (ORNL)); Noah Wolfe and Paul Mullowney (AMD Research); Markus Eisenbach (Oak Ridge National Laboratory (ORNL)); Marc T. Henry de Frahan (National Renewable Energy Laboratory (NREL)); and Wayne Joubert (Oak Ridge National Laboratory (ORNL))
  • Experimental Evaluation of Xanadu X8 Photonic Quantum Computer: Error Measurement, Characterization, and Implications Aditya Ranjan (Northeastern University); Tirthak Patel (Rice University); and Harshitta Gandhi, Daniel Silver, William Cutler, and Devesh Tiwari (Northeastern University)
  • Exploring the Ultimate Regime of Turbulent Rayleigh–Bénard Convection through Unprecedented Spectral-Element Simulations Niclas Jansson, Martin Karp, Adalberto Perez, and Timofey Mukha (KTH Royal Institute of Technology); Yi Ju (Max Planck Computing and Data Facility); Jiahui Liu and Szilárd Páll (KTH Royal Institute of Technology); Erwin Laure (Max Planck Computing and Data Facility); Tino Weinkauf (KTH Royal Institute of Technology); Jörg Schumacher (Technische Universität Ilmenau); Philipp Schlatter (Friedrich-Alexander-Universität (FAU) Erlangen-Nürnberg; KTH Royal Institute of Technology, Sweden); and Stefano Markidis (KTH Royal Institute of Technology)
  • FASDA: An FPGA-Aided, Scalable, and Distributed Accelerator for Range-Limited Molecular Dynamics Chunshu Wu (Boston University); Tong Geng (University of Rochester); Anqi Guo, Sahan Bandara, and Pouya Haghi (Boston University); Chuan Liu (University of Rochester); Ang Li (Pacific Northwest National Laboratory (PNNL)); and Martin Herbordt (Boston University)
  • Fine-Grained Policy-Driven I/O Sharing for Burst Buffers Ed Karrels (University of Illinois), Lei Huang (Texas Advanced Computing Center (TACC)), Yuhong Kan and Ishank Arora (University of Texas), Yinzhi Wang (Texas Advanced Computing Center (TACC)), Daniel S. Katz and William Gropp (University of Illinois), and Zhao Zhang (Texas Advanced Computing Center (TACC))
  • FISCO-BCOS: An Enterprise-Grade Permissioned Blockchain System with High-Performance Huizhong Li (ICT/CAS, UCAS; WeBank Blockchain Team); Yujie Chen, Xiang Shi, Xingqiang Bai, Nan Mo, Wenlin Li, Rui Guo, and Zhang Wang (WeBank Blockchain Team); and Yi Sun (ICT/CAS, UCAS)
  • FORGE: Pre-Training Open Foundation Models for Science Junqi Yin, Sajal Dash, Feiyi Wang, and Mallikarjun Shankar (Oak Ridge National Laboratory (ORNL))
  • Frontier: Exploring Exascale Scott Atchley and Christopher Zimmer (Oak Ridge National Laboratory (ORNL)); John Lange (Oak Ridge National Laboratory (ORNL), University of Pittsburgh); David Bernholdt, Veronica Melesse Vergara, Thomas Beck, Michael Brim, and Reuben Budiardja (Oak Ridge National Laboratory (ORNL)); Sunita Chandrasekaran (University of Delaware); Markus Eisenbach, Thomas Evans, and Matthew Ezell (Oak Ridge National Laboratory (ORNL)); Nicholas Frontiere (Argonne National Laboratory (ANL)); Antigoni Georgiadou (Oak Ridge National Laboratory (ORNL)); Joe Glenski (Hewlett Packard Enterprise (HPE)); Philipp Grete (University of Hamburg); Steven Hamilton and John Holmen (Oak Ridge National Laboratory (ORNL)); Axel Huebl (Lawrence Berkeley National Laboratory (LBNL)); Daniel Jacobson and Wayne Joubert (Oak Ridge National Laboratory (ORNL)); Kim McMahon (Hewlett Packard Enterprise (HPE)); Elia Merzari (Pennsylvania State University); Stan Moore (Sandia National Laboratories); Andrew Myers (Lawrence Berkeley National Laboratory (LBNL)); Stephen Nichols, Sarp Oral, and Thomas Papatheodore (Oak Ridge National Laboratory (ORNL)); Danny Perez (Los Alamos National Laboratory (LANL)); David M. Rogers (Oak Ridge National Laboratory (ORNL)); Evan Schneider (University of Pittsburgh); Jean-Luc Vay (Lawrence Berkeley National Laboratory (LBNL)); and P. K. Yeung (Georgia Institute of Technology)
  • FuzzyFlow: Leveraging Dataflow to Find and Squash Program Optimization Bugs Philipp Schaad, Timo Schneider, Tal Ben-Nun, Alexandru Calotoiu, Alexandros Nikolaos Ziogas, and Torsten Hoefler (ETH Zurich)
  • A GPU Algorithm for Detecting Strongly Connected Components Ghadeer Alabandi (Texas State University); William Sands and George Biros (University of Texas, Oden Institute); and Martin Burtscher (Texas State University)
  • Graph3PO: A Temporal Graph Data Processing Method for Latency QoS Guarantee in Object Cloud Storage System Wang Zhang, Zhan Shi, Ziyi Liao, and Yiling Li (Huazhong University of Science and Technology (HUST)); Yu Du (Alibaba Group); and Yutong Wu, Fang Wang, and Dan Feng (Huazhong University of Science and Technology (HUST))
  • The Graph Database Interface: Scaling Online Transactional and Analytical Graph Workloads to Hundreds of Thousands of Cores Maciej Besta, Robert Gerstenberger, and Marc Fischer (ETH Zurich); Michał Podstawski (TCL Eagle Lab, Warsaw University of Technology); Nils Blach, Berke Egeli, and Georgy Mitenkov (ETH Zurich); Wojciech Chlapek (ICM UW); Marek Michalewicz (Sano Centre for Computational Medicine); Hubert Niewiadomski (Cledar); Jürgen Müller (BASF SE); and Torsten Hoefler (ETH Zurich)
  • GRAPHINE: Enhanced Neutral Atom Quantum Computing Using Application-Specific Rydberg Atom Arrangement Tirthak Patel (Rice University) and Daniel Silver and Devesh Tiwari (Northeastern University)
  • GraphSet: High Performance Graph Mining through Equivalent Set Transformations Tianhui Shi, Jidong Zhai, Haojie Wang, Qiqian Chen, Mingshu Zhai, Zixu Hao, Haoyu Yang, and Wenguang Chen (Tsinghua University, China)
  • GreenNFV: Energy-Efficient Network Function Virtualization with Service Level Agreement Constraints Md S. Q. Zulkar Nine and Tevfik Kosar (University at Buffalo), Muhammed Fatih Bulut (IBM TJ Watson Research Center), and Jinho Hwang (Meta)
  • Hanayo: Harnessing Wave-Like Pipeline Parallelism for Enhanced Large Model Training Efficiency Ziming Liu, Shenggan Cheng, Haotian Zhou, and Yang You (National University of Singapore)
  • HEAR: Homomorphically Encrypted Allreduce Marcin Chrapek, Mikhail Khalilov, and Torsten Hoefler (ETH Zurich)
  • High Throughput Training of Deep Surrogates from Large Ensemble Runs Lucas Meyer (National Institute for Research in Digital Science and Technology, Electricité de France); Marc Schouler and Robert Alexander Caulk (National Institute for Research in Digital Science and Technology); Alejandro Ribes (Electricité de France); and Bruno Raffin (National Institute for Research in Digital Science and Technology)
  • High-Performance and Programmable Attentional Graph Neural Networks with Global Tensor Formulations Maciej Besta (ETH Zurich); Paweł Renc (AGH-UST, Sano Centre for Computational Medicine); Robert Gerstenberger (ETH Zurich); Paolo Sylos Labini (Free University of Bozen-Bolzano, ETH Zurich); Alexandros Ziogas, Tiancheng Chen, Lukas Gianinazzi, Florian Scheidl, Kalman Szenes, Armon Carigiet, and Patrick Iff (ETH Zurich); Grzegorz Kwasniewski (NextSilicon); Raghavendra Kanakagiri (University of Illinois); Chio Ge and Sammy Jaeger (ETH Zurich); Jarosław Wąs (AGH-UST); Flavio Vella (University of Trento); and Torsten Hoefler (ETH Zurich)
  • A High-Performance MST Implementation for GPUs Alex Fallin, Andres Gonzalez, Jarim Seo, and Martin Burtscher (Texas State University)
  • High-Performance SVD Partial Spectrum Computation David Keyes and Hatem Ltaief (King Abdullah University of Science and Technology (KAUST)); Yuji Nakatsukasa (Mathematical Institute, University of Oxford); and Dalal Sukkari (University of Tennessee, Innovative Computing Laboratory (ICL))
  • HPAC-Offload: Accelerating HPC Applications with Portable Approximate Computing on the GPU Zane Fink (Lawrence Livermore National Laboratory, University of Illinois) and Konstantinos Parasyris, Giorgis Georgakoudis, and Harshitha Menon (Lawrence Livermore National Laboratory)
  • I/O in WRF: A Case Study in Modern Parallel I/O Techniques Zanhua Huang, Kaiyuan Hou, Ankit Agrawal, and Alok Choudhary (Northwestern University); Robert Ross (Argonne National Laboratory (ANL)); and Wei-Keng Liao (Northwestern University)
  • Interference-Aware Multiplexing for Deep Learning in GPU Clusters: A Middleware Approach Wenyan Chen (University of Macau; Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences); Zizhao Mo and Huanle Xu (University of Macau); Kejiang Ye (Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences); and Chengzhong Xu (University of Macau)
  • Itoyori: Reconciling Global Address Space and Global Fork-Join Task Parallelism Shumpei Shiina and Kenjiro Taura (University of Tokyo)
  • Large-Scale Materials Modeling at Quantum Accuracy: Ab Initio Simulations of Quasicrystals and Interacting Extended Defects in Metallic Alloys Sambit Das, Bikash Kanungo, and Vishal Subramanian (University of Michigan); Gourab Panigrahi and Phani Motamarri (Indian Institute of Science); David Rogers (Oak Ridge National Laboratory (ORNL)); and Paul Zimmerman and Vikram Gavini (University of Michigan)
  • Large-Scale Simulation of Structural Dynamics Computing on GPU Clusters Yumeng Shi (Computer Network Information Center, Chinese Academy of Sciences); Ningming Nie (Computer Network Information Center, Chinese Academy of Sciences; University of Chinese Academy of Sciences); Shunde Li and Jue Wang (Computer Network Information Center, Chinese Academy of Sciences); Kehao Lin (Hangzhou Dianzi University; Computer Network Information Center, Chinese Academy of Sciences); Chunbao Zhou (Computer Network Information Center, Chinese Academy of Sciences); Shigang Li (School of Computer Science, Beijing University of Posts and Telecommunications); Kehan Yao (Hangzhou Dianzi University); Yangde Feng (Computer Network Information Center, Chinese Academy of Sciences); Yan Zeng (Hangzhou Dianzi University); Fang Liu and Yangang Wang (Computer Network Information Center, Chinese Academy of Sciences); and Yue Gao (China Institute of Atomic Energy)
  • Legate Sparse: Distributed Sparse Computing in Python Rohan Yadav (Stanford University); Wonchan Lee, Melih Elibol, Manolis Papadakis, Taylor Lee-Patti, and Michael Garland (NVIDIA Corporation); Alex Aiken and Fredrik Kjolstad (Stanford University); and Michael Bauer (NVIDIA Corporation)
  • Leveraging the Compute Power of Two HPC Systems for Higher-Dimensional Grid-Based Simulations with the Widely-Distributed Sparse Grid Combination Technique Theresa Pollinger, Alexander Van Craen, Christoph Niethammer, Marcel Breyer, and Dirk Pflüger (University of Stuttgart)
  • MBFGraph: An SSD-Based External Graph System for Evolving Graphs Chun-Yi Liu (Micron Technology Inc), Wonil Choi (Hanyang University), and Soheil Khadirsharbiyani and Mahmut Kandemir (Pennsylvania State University)
  • Mirage: Toward Low-interruption Services on Batch GPU Clusters with Reinforcement Learning Qiyang Ding (University of Texas), Pengfei Zheng (University of Wisconsin), Shreyas Kudari (University of Texas), Shivaram Venkataraman (University of Wisconsin), and Zhao Zhang (Texas Advanced Computing Center (TACC))
  • Mitigating Coupling Map Constrained Correlated Measurement Errors on Quantum Devices Alan Robertson and Shuaiwen Song (University of Sydney)
  • NNQS-Transformer: An Efficient and Scalable Neural Network Quantum States Approach for Ab Initio Quantum Chemistry Yangjun Wu (Institute of Computing Technology, Chinese Academy of Sciences); Chu Guo (Hunan Normal University); Yi Fan (University of Science and Technology of China); Pengyu Zhou (Institute of Computing Technology, Chinese Academy of Sciences); and Honghui Shang (University of Science and Technology of China)
  • Optimizing Direct Convolutions on ARM Multi-Cores Pengyu Wang, Weiling Yang, Jianbin Fang, Dezun Dong, Chun Huang, Peng Zhang, and Tao Tang (National University of Defense Technology (NUDT), China) and Zheng Wang (School of Computing, University of Leeds, United Kingdom)
  • Optimizing High-Performance Linpack for Exascale Accelerated Architectures Noel Chalmers, Jakub Kurzak, Damon McDougall, and Paul Bauman (Advanced Micro Devices (AMD) Inc)
  • Optimizing MPI Collectives on Shared Memory Multi-Cores Jintao Peng, Jianbin Fang, Jie Liu, Min Xie, Yi Dai, Bo Yang, and Shengguo Li (National University of Defense Technology (NUDT), China) and Zheng Wang (School of Computing at the University of Leeds)
  • Optimizing Reconfigurable Optical Datacenters: The Power of Randomization Marcin Bienkowski (University of Wroclaw), David Fuchssteiner (University of Vienna), and Stefan Schmid (Technical University of Berlin)
  • PanguLU: A Scalable Regular Two-Dimensional Block-Cyclic Sparse Direct Solver on Distributed Heterogeneous Systems Xu Fu, Bingbin Zhang, Tengcheng Wang, Wenhao Li, Yuechen Lu, Enxin Yi, Jianqi Zhao, Xiaohan Geng, Fangying Li, Jingwen Zhang, Zhou Jin, and Weifeng Liu (China University of Petroleum-Beijing)
  • Parallel Top-K Algorithms on GPU: A Comprehensive Study and New Methods Jingrong Zhang, Akira Naruse, Xipeng Li, and Yong Wang (NVIDIA Corporation)
  • PeeK: A Prune-Centric Approach for K Shortest Path Computation Wang Feng (University of North Texas), Shiyang Chen and Hang Liu (Rutgers University), and Yuede Ji (University of North Texas)
  • Phases, Modalities, Spatial and Temporal Locality: Domain Specific ML Prefetcher for Accelerating Graph Analytics Pengmiao Zhang (University of Southern California (USC)), Rajgopal Kannan (DEVCOM Army Research Lab), and Viktor K. Prasanna (University of Southern California (USC))
  • Portable and Scalable All-Electron Quantum Perturbation Simulations on Exascale Supercomputers Zhikun Wu, Yangjun Wu, and Ying Liu (Institute of Computing Technology, Chinese Academy of Sciences); Honghui Shang (University of Science and Technology of China); Yingxiang Gao (National Supercomputer Center in Tianjin); Zhongcheng Zhang and Yuyang Zhang (Institute of Computing Technology, Chinese Academy of Sciences); Yingchi Long (Institute of Computing Technology, Chinese Academy of Sciences; Harbin Institute of Technology); and Xiaobing Feng and Huiming Cui (Institute of Computing Technology, Chinese Academy of Sciences)
  • Prodigy: Toward Unsupervised Anomaly Detection in Production HPC Systems Burak Aksar (Boston University, Sandia National Laboratories); Efe Sencan (Boston University); Benjamin Schwaller, Omar Aaziz, Vitus J. Leung, and Jim Brandt (Sandia National Laboratories); and Brian Kulis, Manuel Egele, and Ayse K. Coskun (Boston University)
  • A Quantitative Approach for Adopting Disaggregated Memory in HPC Systems Jacob Wahlgren and Gabin Schieffer (KTH Royal Institute of Technology), Maya Gokhale (Lawrence Livermore National Laboratory), and Ivy B. Peng (KTH Royal Institute of Technology)
  • Rapid Simulations of Atmospheric Data Assimilation of Hourly-Scale Phenomena with Modern Neural Networks Yiyuan Li, Xiting Ju, Yi Xiao, Qilong Jia, and Yongxiao Zhou (Tsinghua University, China); Simeng Qian (National Supercomputing Center in Wuxi); Rongfen Lin (National Research Center of Parallel Computer Engineering and Technology, China); Bin Yang (Tsinghua University, China); Shupeng Shi (National Supercomputing Center in Wuxi); Xin Liu, Jie Gao, Zhen Wang, Sha Liu, Jian Tan, and Xuan Wang (National Research Center of Parallel Computer Engineering and Technology); Zhengding Hu (University of Science and Technology of China); Limin Yan (Beijing Sankuai Online Technology Co, Ltd; National Supercomputing Center in Wuxi); and Wei Xue (Tsinghua University, China; Department of Computer Technology and Application, Qinghai University)
  • ReFloat: Low-Cost Floating-Point Processing in ReRAM for Accelerating Iterative Linear Solvers Linghao Song (University of California Los Angeles (UCLA)), Fan Chen (Indiana University), and Hai Li and Yiran Chen (Duke University)
  • Rethinking Deployment for Serverless Functions: A Performance-First Perspective Yiming Li, Laiping Zhao, Yanan Yang, and Wenyu Qu (Tianjin University)
  • Runtime Composition of Iterations for Fusing Loop-Carried Sparse Dependence Kazem Cheshmi (McMaster University); Michelle Mills Strout (University of Arizona, Hewlett Packard Enterprise (HPE)); and Maryam Mehri Dehnavi (University of Toronto)
  • Scalable Tuning of (OpenMP) GPU Applications via Kernel Record and Replay Konstantinos Parasyris and Giorgis Georgakoudis (Lawrence Livermore National Laboratory), Esteban Rangel (Argonne National Laboratory (ANL)), and Ignacio Laguna and Johannes Doerfert (Lawrence Livermore National Laboratory)
  • Scaling the “Memory Wall” for Multi-Dimensional Seismic Processing with Algebraic Compression on Cerebras CS-2 Systems David Keyes, Hatem Ltaief, and Yuxi Hong (King Abdullah University of Science and Technology (KAUST)); Leighton Wilson and Mathias Jacquelin (Cerebras Systems, Inc.); and Matteo Ravasi (King Abdullah University of Science and Technology (KAUST))
  • Scaling the Leading Accuracy of Deep Equivariant Models to Biomolecular Simulations of Realistic Size Boris Kozinsky, Albert Musaelian, Anders Johansson, and Simon Batzner (Harvard University)
  • The Simple Cloud-Resolving E3SM Atmosphere Model Running on the Frontier Exascale System Mark Taylor (Sandia National Laboratories); Peter M. Caldwell (Lawrence Livermore National Laboratory); Luca Bertagna and Conrad Clevenger (Sandia National Laboratories); Aaron S. Donahue (Lawrence Livermore National Laboratory); James G. Foucar, Oksana Guba, and Benjamin R. Hillman (Sandia National Laboratories); Noel Keen (Lawrence Berkeley National Laboratory (LBNL)); Jayesh Krishna (Argonne National Laboratory); Matthew R. Norman and Sarat Sreepathi (Oak Ridge National Laboratory (ORNL)); Christopher R. Terai (Lawrence Livermore National Laboratory); James B. White III (Hewlett Packard Enterprise (HPE)); Danqing Wu (Argonne National Laboratory (ANL)); Andrew G. Salinger (Sandia National Laboratories); Renata B. McCoy (Lawrence Livermore National Laboratory); L. Ruby Leung (Pacific Northwest National Laboratory (PNNL)); and David C. Bader (Lawrence Livermore National Laboratory)
  • Space Efficient Sequence Alignment for SRAM-Based Computing: X-Drop on the Graphcore IPU Luk Burchard (Simula Research Laboratory); Max Xiaohang Zhao (Charité Universitätsmedizin Berlin); Johannes Langguth (Simula Research Laboratory, University of Bergen); Aydın Buluç (Lawrence Berkeley National Laboratory (LBNL)); and Giulia Guidi (Cornell University)
  • Structural Coding: A Low-Cost Scheme to Protect CNNs from Large-Granularity Memory Faults Ali Asgari Khoshouyeh (University of British Columbia); Florian Geissler, Seyed Qutub, and Michael Paulitsch (Intel Corporation); and Prashant Nair and Karthik Pattabiraman (University of British Columbia)
  • SYnergy: Fine-Grained Energy-Efficient Heterogeneous Computing for Scalable Energy Saving Kaijie Fan (TU Berlin, University of Salerno); Marco D'Antonio, Lorenzo Carpentieri, and Biagio Cosenza (University of Salerno); and Federico Ficarelli and Daniele Cesarini (CINECA)
  • TANGO: Re-Thinking Quantization for Graph Neural Network Training on GPUs Shiyang Chen (Rutgers University); Da Zheng (Amazon); Caiwen Ding (University of Connecticut); Chengying Huan (Institute of Software, Chinese Academy of Sciences); Yuede Ji (University of North Texas); and Hang Liu (Rutgers University)
  • Toward Exascale Computation for Turbomachinery Flows Yuhang Fu, Weiqi Shen, Jiahuan Cui, and Yao Zheng (Zhejiang University); Guangwen Yang and Zhao Liu (Tsinghua University, China; National Supercomputing Center in Wuxi); Jifa Zhang, Tingwei Ji, and Fangfang Xie (Zhejiang University); Xiaojing Lv, Hanyue Liu, and Xu Liu (National Supercomputing Center in Wuxi); Xiyang Liu and Xiaoyu Song (Taiyuan University of Technology); Guocheng Tao (Zhejiang University); Yan Yan (Xi’an Jiaotong-Liverpool University); Paul Tucker (University of Cambridge); Steven Miller (University of Florida); Shirui Luo and Seid Koric (University of Illinois); and Weimin Zheng (Tsinghua University)
  • Toward Sustainable HPC: Carbon Footprint Estimation and Environmental Implications of HPC Systems Baolin Li, Rohan Basu Roy, and Daniel Wang (Northeastern University); Siddharth Samsi and Vijay Gadepally (Massachusetts Institute of Technology (MIT), Lincoln Laboratory); and Devesh Tiwari (Northeastern University)
  • TrivialSpy: Identifying Software Triviality via Fine-Grained and Dataflow-Based Value Profiling Xin You, Hailong Yang, Kelun Lei, Zhongzhi Luan, and Depei Qian (Beihang University)
  • Understanding the Effects of Permanent Faults in GPU’s Parallelism Management and Control Units Juan David Guerrero Balaguera and Josie Esteban Rodriguez Condia (Politecnico di Torino); Fernando Fernandes dos Santos (University of Rennes, Inria Rennes - Bretagne Atlantique Research Centre); Matteo Sonza Reorda (Politecnico di Torino); and Paolo Rech (University of Trento)
  • Unified Communication Optimization Strategies for Sparse Triangular Solver on CPU and GPU Clusters Yang Liu and Nan Ding (Lawrence Berkeley National Laboratory (LBNL)), Piyush Sao (Oak Ridge National Laboratory (ORNL)), and Samuel Williams and Xiaoye Sherry Li (Lawrence Berkeley National Laboratory (LBNL))
  • Unity ECC: Unified Memory Protection Against Bit and Chip Errors Dongwhee Kim, Jaeyoon Lee, and Wonyeong Jung (Sungkyunkwan University); Michael Sullivan (NVIDIA Corporation); and Jungrae Kim (Sungkyunkwan University)
  • VENOM: A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores Roberto L. Castro (Universidade da Coruña), Andrei Ivanov (ETH Zürich), Diego Andrade (Universidade da Coruña), Tal Ben-Nun (ETH Zürich), Basilio B. Fraguela (Universidade da Coruña), and Torsten Hoefler (ETH Zürich)
  • Xfast: Extreme File Attribute Stat Acceleration for Lustre Yingjin Qian (Data Direct Networks), Wen Cheng and Lingfang Zeng (Zhejiang Lab), Xi Li (Data Direct Networks), Marc-André Vef (Johannes Gutenberg University Mainz), Andreas Dilger and Siyao Lai (Whamcloud Inc), Shuichi Ihara (Data Direct Networks), Yong Fan (Intel Corporation), and André Brinkmann (Johannes Gutenberg University Mainz)

