SC23 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Workshops Archive

Charliecloud’s Layer-Free, Git-Based Container Build Cache


Workshop: 5th International Workshop on Containers and New Orchestration Paradigms for Isolated Environments in HPC (CANOPIE-HPC)

Authors: Reid Priedhorsky, Jordan Ogas, and Claude H. (Rusty) Davis IV (Los Alamos National Laboratory (LANL)); Z. Noah Hounshel (Los Alamos National Laboratory (LANL), U. North Carolina Wilmington); Ashlyn Lee (Los Alamos National Laboratory (LANL), Colorado State University); Benjamin Stormer (Los Alamos National Laboratory (LANL), University of Texas); and R. Shane Goff (Los Alamos National Laboratory (LANL))


Abstract: A popular approach to deploying scientific applications in high performance computing (HPC) is Linux containers, which package an application and all its dependencies as a single unit, called an image. An image is built by interpreting instructions in a machine-readable recipe; builds are faster with a build cache that stores instruction results for re-use. The standard approach (used, e.g., by Docker and Podman) is a many-layered union filesystem, encoding differences between layers as tar archives.

We describe a new approach, implemented in Charliecloud: store changing images in a Git repository. Our experiments show this performs similarly to layered caches on both build time and disk usage, with a considerable advantage for many-instruction recipes. Our approach also has structural advantages: better diff format, lower cache overhead, and better file de-duplication. These results show that a Git-based cache for layer-free container implementations is not only possible but may outperform the layered approach on important dimensions.
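
To make the build-cache idea in the abstract concrete, the following is a minimal, generic sketch of caching instruction results for re-use. It is not Charliecloud's Git-based implementation or the layered scheme used by Docker and Podman; the function names, the "root" starting state, the in-memory dictionary, and the sample recipes are all assumptions made for illustration. Each cache key is derived from the parent state and the instruction text, so changing one instruction invalidates that instruction and everything after it.

```python
import hashlib


def state_id(parent_id: str, instruction: str) -> str:
    """Hypothetical cache key: hash of the parent state ID plus the instruction text."""
    return hashlib.sha256(f"{parent_id}\n{instruction}".encode()).hexdigest()


def build(recipe, cache, execute):
    """Walk a recipe, reusing cached results and executing only on a miss.

    Returns (final_state_id, hits, misses).
    """
    parent = "root"  # assumed ID for the empty starting state
    hits = misses = 0
    for instruction in recipe:
        sid = state_id(parent, instruction)
        if sid in cache:
            hits += 1  # result already stored; skip execution
        else:
            cache[sid] = execute(instruction)  # store the result for later builds
            misses += 1
        parent = sid  # later keys depend on every earlier instruction
    return parent, hits, misses


def run(instruction):
    """Stand-in for actually executing an instruction (illustrative only)."""
    return f"result of {instruction!r}"


cache = {}
recipe_v1 = ["FROM almalinux:8", "RUN dnf install -y gcc", "COPY . /src"]
recipe_v2 = ["FROM almalinux:8", "RUN dnf install -y gcc make", "COPY . /src"]

first = build(recipe_v1, cache, run)   # cold cache: every instruction executes
second = build(recipe_v1, cache, run)  # unchanged recipe: every instruction hits
third = build(recipe_v2, cache, run)   # edited RUN: only FROM is reused
```

In this sketch the third build reuses only the first instruction, because the edited `RUN` line changes the key of every instruction that follows it. How the cached results themselves are stored (tar-archive layers versus commits in a Git repository) is the design question the paper addresses.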




