SC23 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Technical Papers Archive

Design Considerations and Analysis of Multi-Level Erasure Coding in Large-Scale Data Centers

Authors: Meng Wang, Jiajun Mao, and Rajdeep Rana (University of Chicago); John Bent (Los Alamos National Laboratory (LANL)); Serkay Olmez (Seagate Research); Anjus George (Oak Ridge National Laboratory (ORNL)); Garrett Wilson Ransom (Los Alamos National Laboratory (LANL)); Jun Li (CUNY Queens College and Graduate Center); and Haryadi S. Gunawi (University of Chicago)

Abstract: Multi-level erasure coding (MLEC) has seen large deployments in the field, but there is no in-depth study of design considerations for MLEC at scale. In this paper, we provide comprehensive design considerations and analysis of MLEC at scale. We introduce the design space of MLEC in multiple dimensions, including code parameter selections, chunk placement schemes, and repair methods. We quantify their performance and durability, and show which MLEC schemes and repair methods provide the best tolerance against independent and correlated failures and reduce repair network traffic by orders of magnitude. To achieve this, we use several evaluation strategies, including simulation, splitting, dynamic programming, and mathematical modeling. We also compare the performance and durability of MLEC with other EC schemes such as SLEC and LRC, and show that MLEC can provide high durability with higher encoding throughput and less repair network traffic than both SLEC and LRC.
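
As a rough illustration of the code-parameter dimension mentioned in the abstract, the sketch below assumes a two-level layout in which a network-level (k_n + p_n) code is stacked on top of a local-level (k_l + p_l) code, with an MDS code at each level. The names ECLevel, storage_overhead, and guaranteed_tolerance, and the example parameters (10+2 over 17+3), are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ECLevel:
    """One erasure-coding level with k data chunks and p parity chunks."""
    k: int
    p: int

    @property
    def width(self) -> int:
        # Total chunks in one stripe at this level.
        return self.k + self.p


def storage_overhead(network: ECLevel, local: ECLevel) -> float:
    """Raw bytes stored per byte of user data for the stacked code."""
    return (network.width * local.width) / (network.k * local.k)


def guaranteed_tolerance(network: ECLevel, local: ECLevel) -> int:
    """Largest number of arbitrary chunk losses in one stripe that is
    always recoverable, assuming an MDS code at each level.

    A local stripe is lost only after p_l + 1 chunk failures, and the
    network stripe is lost only after p_n + 1 local stripes are lost,
    so the smallest fatal pattern has (p_n + 1) * (p_l + 1) chunks.
    """
    return (network.p + 1) * (local.p + 1) - 1


if __name__ == "__main__":
    net, loc = ECLevel(k=10, p=2), ECLevel(k=17, p=3)  # illustrative only
    print(f"overhead: {storage_overhead(net, loc):.3f}x")
    print(f"guaranteed tolerance: {guaranteed_tolerance(net, loc)} chunks")
```

This captures only the worst-case, placement-independent view; the durability under independent and correlated failures that the abstract refers to also depends on chunk placement and repair method, which this sketch does not model.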
