SC23 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Research Posters Archive

Graph Based Anomaly Detection in Chimbuko: Feasible or Fallible?

Authors: Chase Phelps, Ankur Lahiry, and Tanzima Z. Islam (Texas State University) and Christopher Kelly (Brookhaven National Laboratory)

Abstract: Performance anomaly detection can aid in discovering algorithmic inefficiencies or hardware issues in an application’s environment. The Chimbuko framework monitors large-scale workflow applications in real time and identifies function executions that deviate from accumulated statistics (performance anomalies). Performance anomalies across runs correlate with variation in an application’s execution times; quicker resolution of performance anomalies caused by hardware issues improves cluster performance. Chimbuko stores both anomalous and normal executions as events. In this study, we investigate the applicability of graph-based deep learning methods for anomaly classification. We hypothesize that transforming this data into a graph will allow correlations to be modeled, thus allowing graph-based methods to learn embeddings that improve the effectiveness of downstream anomaly classification tasks. Our evaluations demonstrate that the graph-based methods yield up to 95% accuracy and outperform a state-of-the-art gradient-based method. Moreover, we explain the classification model’s decision-making process using explainable AI techniques.
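
As a rough illustration of what flagging "function executions that deviate from accumulated statistics" can look like, the following minimal Python sketch keeps a running mean and standard deviation of execution time per function and labels an execution anomalous when it falls more than a fixed number of standard deviations from the mean. This is not Chimbuko's implementation: the names RunningStats and label_events, the 3-sigma threshold, and the warm-up count are all assumptions made for the example.

```python
import math


class RunningStats:
    """Welford's online mean/variance for one function's execution times."""

    def __init__(self):
        self.n = 0        # number of samples seen so far
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self):
        # Sample standard deviation; 0.0 until at least two samples exist.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def label_events(events, threshold=3.0, warmup=5):
    """Label each (function_name, runtime) event as anomalous or normal.

    Hypothetical rule: an execution is anomalous if its runtime lies more
    than `threshold` standard deviations from the mean accumulated so far
    for that function. The first `warmup` executions of each function are
    always labeled normal because the statistics are not yet meaningful.
    """
    stats = {}
    labels = []
    for func, runtime in events:
        s = stats.setdefault(func, RunningStats())
        sd = s.std()
        is_anomaly = (
            s.n >= warmup
            and sd > 0
            and abs(runtime - s.mean) > threshold * sd
        )
        labels.append(is_anomaly)
        # Both normal and anomalous executions update the statistics here.
        s.update(runtime)
    return labels
```

Both the normal and the anomalous labels produced this way would be kept as events, which is the kind of labeled data a downstream classifier could be trained on.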

Best Poster Finalist (BP): no

