Workshop: PDSW23: 8th International Parallel Data Systems Workshop
Authors: Dominik Scheinert, Soeren Becker, Jonathan Will, and Luis Englaender (Technical University of Berlin) and Lauritz Thamsen (University of Glasgow)
Abstract: Optimizing the underlying cluster configurations of distributed data processing frameworks can be complex and often requires performance modeling techniques due to the multitude of performance-affecting factors. These approaches are not always applicable because they need substantial training data. At the same time, data analytics jobs often share common characteristics, such as algorithm implementations, which suggests the potential for collaborative performance modeling. Current collaborative approaches, however, mainly assume a centralized storage infrastructure, which comes with its own potential drawbacks, namely with regard to data privacy, storage costs, or system maintenance. We envision a peer-to-peer-based data distribution layer that facilitates data sovereignty, failure resilience, and ad-hoc collaboration, thereby fostering cross-context resource optimization approaches for big data analytics.