Authors: Thomas Eickermann (Juelich Supercomputing Centre (JSC)), Ryousei Takano (National Institute of Advanced Industrial Science & Technology), William Thigpen (NASA), Volker Lindenstruth (Goethe University Frankfurt), Xavier Delaruelle (French Alternative Energies and Atomic Energy Commission (CEA)), Ken Wood (KLW Engineering), Axel Auweter (MEGWARE Computer), Chris Tanner (NASA), Romain Fihue (French Alternative Energies and Atomic Energy Commission (CEA)), Peter Seto (Energy Efficient HPC Working Group)
Abstract: Modular and container-based industrial structures for HPC buildings are now common. They reduce CapEx through shorter design-build schedules and commodity pricing of the structural envelope, and they improve flexibility for expansion and upgrades. Power, cooling and compute machinery have widely varying life cycles, so HPC facilities require constant modification and renovation. Commodity structures can reduce this problem. Replacing concrete with steel and stacking compute racks vertically might allow 3-D cube compute architectures with low-latency communication and high accessibility for servicing. The transition from air to liquid cooling will drive this change.
Long Description: Modular and container-based industrial structures for HPC buildings are now common. These structures readily deliver CapEx savings from short design-build schedules and commodity pricing of the structural envelope, along with flexibility for expansion and upgrades. Cubic 3-D compute architectures with low-latency communication and high accessibility for servicing can readily be created by eliminating concrete structural members and using steel, multi-level, stacked racks. The facility, infrastructure and computer life cycles are better matched as compute machinery continues to evolve.
The organizers of this BoF include people who have direct experience with modular and container-based HPC facilities.
- The Modular Supercomputing Facility (MSF) at NASA uses energy-efficient, self-contained modules to house its machines. The MSF has reduced water use by as much as 96% and electricity used for cooling by about 90%, compared with running the same computer resources in a traditional data center. Its modular approach makes it easy to upgrade the facility for fast-turnaround work on high-priority NASA missions.
- The AI Bridging Cloud Infrastructure (ABCI) data center in Japan is an ultra-high-density data center that uses a low-cost, lightweight "warehouse" with a double-structured design including internal scaffolding for racks and cooling pods. It achieves 20 times the thermal density of ordinary data centers.
- In France, the Alternative Energies and Atomic Energy Commission (CEA) has a modular HPC infrastructure. It can be deployed more rapidly and extended more easily than a classical building. This was a joint CEA-ATOS project that comprised 72 containers prepared at the factory. It hosts an ATOS XH200 supercomputer, pre-assembled at the ATOS factory, with blades inserted in the containerised racks on premises.
- The Jülich Supercomputing Centre will deploy a modular data center for Jupiter, the first European exascale system. They anticipate that pre-fabricated modules will reduce the time for planning and installation, that sizing can be tailored to each system so that no over-provisioning is needed, and that containers can be added or changed, which makes it easier to leverage technology advances. The expected lifetime of a container is 15 years. Construction of a generic concrete flat slab measuring 80 m x 40 m is now underway.
The primary purpose of this BoF is to bring together these early adopters of modular and container-based data centers in order to build a community and to learn from each other. Until now, there has been no forum where these sites could meet each other and talk about their experiences. These sites have individually presented their work at forums hosted by the Energy Efficient HPC Working Group [https://eehpcwg.llnl.gov/], at the European Workshop for HPC Infrastructure (hosted by PRACE), and at other venues. This will be the first opportunity for a gathering of these key early adopter sites. It is also a venue for soliciting participation and interest from the broader community.
A short description of the session, along with the presentations and a synopsis of the discussion, will be posted on the EE HPC WG website.
Website: https://sites.google.com/lbl.gov/eehpcwgmodulardatacenters?usp=sharing