The first holistic dataset of a tier-0 Top10 supercomputer, which will also serve the Graph-Massivizer project

We are pleased to announce the publication of a new paper by our partners at the University of Bologna, which showcases the results of their collaborative work. The paper presents the culmination of 10 years of research, highlighting significant findings related to supercomputers and the highly complex data they produce. It serves as a comprehensive resource, providing in-depth analysis, novel methodologies, and practical applications derived from this long-running study. With its publication, the authors aim to contribute to the broader scientific community and foster further advances in the field. This achievement is also expected to benefit future research and development in our Graph-Massivizer project.

Supercomputers, the most advanced computing machines available to society, play a crucial role in driving economic, industrial, and societal progress. Scientists, engineers, decision-makers, and data analysts use them to tackle complex problems through computation. However, supercomputers and their accompanying data centers are themselves intricate, power-intensive systems. Improving their efficiency, availability, and resilience is of utmost importance and is the focus of numerous research and engineering efforts. Yet researchers face a significant obstacle: a lack of reliable data that accurately describes the behavior of operational supercomputers.

In this paper, the authors present the outcomes of a ten-year project to develop a monitoring framework called EXAMON, which has been deployed on the Italian supercomputers at the CINECA data center. They unveil the first comprehensive dataset of a tier-0 Top10 supercomputer, encompassing management, workload, facility, and infrastructure data from the Marconi100 supercomputer over two and a half years of operation. This dataset, which has been published via Zenodo, is the largest publicly available dataset to date, with an uncompressed size of 49.9 TB. Additionally, the authors offer open-source software modules that simplify data access and provide practical usage examples.

Martin Molan, PhD, Università di Bologna

For a more in-depth analysis, please read the full paper here: https://www.nature.com/articles/s41597-023-02174-3