Workshop: The 9th International Workshop on Data Analysis and Reduction for Big Scientific Data (DRBSD-9)
Authors: Robert R. Underwood (Argonne National Laboratory (ANL), University of Chicago); Sheng Di (Argonne National Laboratory, University of Chicago); Sian Jin (Indiana University); Md Hasanur Rahman (University of Iowa); Arham Khan (University of Chicago); and Franck Cappello (Argonne National Laboratory, University of Chicago)
Abstract: Over recent years, substantial effort has gone into developing systems that infer compression performance without running compressors. This work has driven down the error in the estimates, reduced their runtimes, and improved their generality. However, these efforts are uncoordinated, which increases the work required to compare them. Subtle differences in sampling approaches and nuances in the interfaces make it laborious to port applications between them and to reproduce experiments. Additionally, many of these methods call for substantial amounts of training data to produce reliable estimates, as well as scalable codes to perform the training. In this work, we present LibPressio-Predict, a scalable library for applications that use predictions of compression performance, and LibPressio-Bench, a scalable tool to run these experiments quickly at scale. We use this tool to systematically evaluate three recent compression prediction approaches on all 48 timesteps and 13 fields of the Hurricane Isabel dataset.