In this document, the EuroHPC JU Center of Excellence in Exascale CFD (CEEC) aims to provide users and application developers with a brief overview of the possibilities, limitations, and best practices for measuring energy consumption on European HPC systems. CEEC is working to reduce the energy footprint of its consortium codes on such systems by applying novel algorithmic solutions. However, when we first explored options for collecting energy measurements on both local and European HPC systems, we found that there is no single approach to measuring energy and that taking these measurements is more difficult than measuring time-to-solution with, e.g., basic start-end time calls. This difficulty often stems from a requirement for privileged access to specific hardware counters. Mitigation strategies for this restriction exist and enable users to collect energy measurements, but they are not widely known. We describe these strategies and follow them with concrete examples from CEEC of how to collect energy measurements. We believe this will help to increase awareness, and thus the use, of energy consumption measurements in the application development process.
Furthermore, we describe two other important issues: 1) the granularity and overhead of measurements, since energy = power × time, and 2) what is included in the number delivered by a tool, framework, or workload manager, as multiple factors contribute to it. We strive to be concise and precise, aiming to provide a glimpse of energy measurement methods as well as many references for further exploration. Our takeaway messages are:
- The community and data centers need to facilitate energy measurements on European HPC systems and teach users how to conduct such measurements.
- The community and data centers need to provide transparent and easy-to-use guides for each European HPC system, or at least for each large one, outlining the ways to collect energy measurements.
In CEEC, we are taking the first steps towards spreading these messages, aiming to create a larger consortium, including experts and data centers, that can contribute to and update this document. Explore and stay tuned!