Replay with Feedback: How does the performance of HPC system impact user submission behavior?
Journal article, Future Generation Computer Systems, Year: 2024

Abstract

High Performance Computing (HPC) is a key infrastructure for solving large-scale scientific problems, from weather to quantum simulations. Scheduling jobs on HPC infrastructures is complex due to their scale, the varied behaviors of their users, and the multiple objectives involved, from performance to ecological impact. Schedulers are evaluated on data center simulations because evaluating them in situ is complex and costly. One key element of this evaluation is the behavioral model of users. Most studies are limited to replaying past workloads of existing data centers. This reduces the realism of performance evaluation whenever the scheduler and the hardware infrastructure are not exactly the same as those that produced the workload, since any such change could affect the behavior of the users. In this article, we introduce a novel model, “Replay with Feedback”, which accounts for the impact of HPC system performance on user submission behavior in simulations. Instead of keeping the original timestamps of job submissions, we identify and use the relationships between each user's jobs. We propose an open-source implementation of this model, along with an extensive and reproducible set of experiments to assess the impact of scheduler and infrastructure changes. We also provide new metrics adapted to the flexibility of user submission behaviors. Results show that this model is a step towards more realistic simulations of schedulers in HPC systems.
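
The idea of replacing fixed submission timestamps with relationships between a user's jobs can be illustrated with a toy simulation. The following is a minimal sketch and not the authors' open-source implementation: the `Job` fields `after` and `think_time`, the `simulate` function, the single-node first-come-first-served scheduler, and the `speedup` parameter are all assumptions introduced here for illustration. It also assumes the dependency graph has no cycles.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    job_id: int
    submit: float                 # original submission timestamp from the trace
    runtime: float                # runtime on the original machine
    after: Optional[int] = None   # hypothetical: id of the job this one follows
    think_time: float = 0.0       # hypothetical: delay after the predecessor ends


def simulate(jobs, speedup=1.0, feedback=False):
    """Toy single-node FCFS simulation.

    Returns {job_id: (submit_time, finish_time)}.
    With feedback=False, every job keeps its original timestamp (plain replay).
    With feedback=True, a job that follows another one is submitted only once
    its predecessor has finished in the simulation, plus its think time.
    """
    pending = {j.job_id: j for j in jobs}
    result = {}
    node_free_at = 0.0
    while pending:
        # Collect the jobs whose submission time is already known.
        ready = []
        for j in pending.values():
            if feedback and j.after is not None:
                if j.after not in result:
                    continue  # predecessor has not finished yet
                submit = result[j.after][1] + j.think_time
            else:
                submit = j.submit
            ready.append((submit, j.job_id))
        # FCFS: serve the earliest known submission next.
        submit, job_id = min(ready)
        job = pending.pop(job_id)
        start = max(submit, node_free_at)
        finish = start + job.runtime / speedup
        node_free_at = finish
        result[job_id] = (submit, finish)
    return result


# Toy trace: job 2 was submitted 10 time units after job 1 finished.
jobs = [
    Job(1, submit=0, runtime=100),
    Job(2, submit=110, runtime=50, after=1, think_time=10),
    Job(3, submit=20, runtime=30),
]

# Same trace on a machine half as fast, with and without feedback.
plain = simulate(jobs, speedup=0.5, feedback=False)
with_feedback = simulate(jobs, speedup=0.5, feedback=True)
print("job 2 submit, plain replay:", plain[2][0])
print("job 2 submit, with feedback:", with_feedback[2][0])
```

In this toy trace, halving the machine speed leaves job 2's submission at t=110 under plain replay, before job 1 has finished at t=200, whereas with feedback the submission moves to t=210. On the original machine (`speedup=1.0`) both modes reproduce the trace timestamps.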
Main file
1-s2.0-S0167739X24000219-main.pdf (2.26 MB)
Origin: Publisher files allowed on an open archive

Dates and versions

hal-04432711, version 1 (12-03-2024)

Cite

Maël Madon, Georges da Costa, Jean-Marc Pierson. Replay with Feedback: How does the performance of HPC system impact user submission behavior?. Future Generation Computer Systems, 2024, 155, pp.66-79. ⟨10.1016/j.future.2024.01.024⟩. ⟨hal-04432711⟩