Workshop: 5th International Workshop on Containers and New Orchestration Paradigms for Isolated Environments in HPC (CANOPIE-HPC)
Authors: Reid Priedhorsky, Jordan Ogas, and Claude H. (Rusty) Davis IV (Los Alamos National Laboratory (LANL)); Z. Noah Hounshel (Los Alamos National Laboratory (LANL), U. North Carolina Wilmington); Ashlyn Lee (Los Alamos National Laboratory (LANL), Colorado State University); Benjamin Stormer (Los Alamos National Laboratory (LANL), University of Texas); and R. Shane Goff (Los Alamos National Laboratory (LANL))
Abstract: A popular approach to deploying scientific applications in high performance computing (HPC) is Linux containers, which package an application and all its dependencies as a single unit. This unit, called an image, is built by interpreting the instructions in a machine-readable recipe, and building is faster with a build cache that stores instruction results for re-use. The standard approach (used, e.g., by Docker and Podman) is a many-layered union filesystem that encodes the differences between layers as tar archives.
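As a toy illustration of what a build cache does, the sketch below keys each instruction's result on the instruction text chained with the key of the state it ran on. The `BuildCache` class and the `WRITE <path> <text>` instruction format are hypothetical and are neither Charliecloud's nor Docker's interface. Under this scheme, rebuilding an unchanged recipe reuses every stored result, while editing one instruction invalidates it and every instruction after it.

```python
import hashlib


class BuildCache:
    """Toy build cache (hypothetical): maps a cache key to the image state after an instruction."""

    def __init__(self):
        self.entries = {}  # cache key -> image state (dict of path -> text)
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key(parent_key, instruction):
        # A stored result is reusable only if the same instruction ran on the
        # same parent state, so the key chains the parent's key with the text.
        return hashlib.sha256((parent_key + "\n" + instruction).encode()).hexdigest()

    def build(self, recipe, execute):
        """Run each instruction, reusing cached results while the chain matches."""
        state, parent_key = {}, ""  # "" stands for an empty base image
        for instruction in recipe:
            k = self.key(parent_key, instruction)
            if k in self.entries:
                self.hits += 1
                state = self.entries[k]
            else:
                self.misses += 1
                state = execute(state, instruction)
                self.entries[k] = state
            parent_key = k
        return state


def toy_execute(state, instruction):
    # Hypothetical instruction format: "WRITE <path> <text>".
    _, path, text = instruction.split(" ", 2)
    new_state = dict(state)  # never mutate a state that may be cached
    new_state[path] = text
    return new_state


cache = BuildCache()
recipe = ["WRITE /etc/a hello", "WRITE /opt/b world", "WRITE /opt/c done"]
cache.build(recipe, toy_execute)  # cold cache: every instruction runs
cache.build(recipe, toy_execute)  # warm cache: every instruction is a hit
edited = ["WRITE /etc/a hello", "WRITE /opt/b WORLD", "WRITE /opt/c done"]
final = cache.build(edited, toy_execute)  # first instruction hits; the rest rerun
print(cache.hits, cache.misses)  # 4 5
```

The third build shows the chaining: `/opt/c` is unchanged in the recipe, but its instruction still reruns because the state it runs on changed.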
We describe a new approach, implemented in Charliecloud: store the image, as it changes during the build, in a Git repository. Our experiments show that this approach performs similarly to layered caches in both build time and disk usage, with a considerable advantage for recipes with many instructions. Our approach also has structural advantages: a better diff format, lower cache overhead, and better file de-duplication. These results show that a Git-based cache for layer-free container implementations is not only possible but may outperform the layered approach on important dimensions.
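The de-duplication advantage follows from how Git stores content: each file is a blob named by the SHA-1 hash of a short header plus its bytes, so identical files share one object regardless of path or of the build step that produced them. The sketch below, using hypothetical names (`ContentStore`, `layered_bytes`), compares this with a simplified diff-per-layer scheme that archives every new or changed file in full. It ignores tar headers, compression, deletions, and Git's pack-file deltas, so the byte counts illustrate the mechanism only and are not measurements from the paper.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    # Git names file content by SHA-1 over "blob <size>\0" followed by the bytes.
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


class ContentStore:
    """Hypothetical content-addressed store in the style of Git's object database."""

    def __init__(self):
        self.objects = {}  # blob id -> bytes

    def add_tree(self, files):
        """Store a snapshot {path: bytes} and return {path: blob id}."""
        tree = {}
        for path, data in files.items():
            oid = git_blob_id(data)
            self.objects.setdefault(oid, data)  # identical content is stored once
            tree[path] = oid
        return tree

    def stored_bytes(self):
        return sum(len(d) for d in self.objects.values())


def layered_bytes(snapshots):
    """Bytes a simplified diff-per-layer scheme archives: each new or changed file, in full."""
    total, previous = 0, {}
    for snap in snapshots:
        for path, data in snap.items():
            if previous.get(path) != data:
                total += len(data)
        previous = snap
    return total


lib = b"x" * 1000
snap1 = {"/usr/lib/libfoo.so": lib, "/etc/conf": b"v1"}
snap2 = {"/usr/lib/libfoo.so": lib, "/opt/copy.so": lib, "/etc/conf": b"v2"}

store = ContentStore()
store.add_tree(snap1)
store.add_tree(snap2)
print(store.stored_bytes(), layered_bytes([snap1, snap2]))  # 1004 2004
```

Here the second layer re-archives the 1000-byte library under its new path, whereas the content-addressed store recognizes the bytes it already holds.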