SC23 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Workshops Archive

Self-Service Monitoring of HPC and Openstack Jobs for Users


Workshop: HPC Systems Professionals Workshop (HPCSYSPROS23)

Authors: Simon Guilbault (Université Laval)


Abstract: Using the compute capacity of an HPC or OpenStack cluster correctly is often a stumbling block for users, especially those from non-traditional domains where a cluster is only a tool and not the subject of their research.

This paper describes a web portal called TrailblazingTurtle, built for HPC and OpenStack clusters, that lets users view the resources used and wasted by their jobs without having to modify their workflow. The metrics are collected from various data sources on the cluster to enable monitoring at the job and VM level, and are presented to users and staff members as a simple web application. This platform makes it easy for newer users to request the correct quantity of computing resources for their work, to see their impact on the shared file system, and to follow the evolution of their group's priority in Slurm.




