Time-based analysis of core energy usage

Nov 8, 2022 · hpc

Being able to analyze the energy usage of every core in every CPU of the cluster enables us to detect jobs that are not making good use of allocated cores over time.

Here is a node that is using 1 core, but this is only a one-dimensional snapshot in time:

Looking at the energy usage over time since the job started shows quite clearly that the job is single-threaded, and not a very intense one at that. This node is undersubscribed, and the parameters of this job should be adjusted:
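As a rough illustration of the idea, the sketch below counts how many cores are doing real work in each sample interval. The data layout (`energy[t][c]` in joules), the function name `active_cores_over_time` and the `idle_joules` threshold are all assumptions made for this example, not the format our monitoring actually uses.

```python
from typing import List


def active_cores_over_time(energy: List[List[float]], idle_joules: float) -> List[int]:
    """Count the busy cores in each sample interval.

    energy[t][c] is assumed to hold the energy (in joules) that core c
    consumed during sample interval t. A core counts as busy when it used
    more than idle_joules in that interval (a hypothetical threshold).
    """
    return [sum(1 for joules in sample if joules > idle_joules) for sample in energy]


# Made-up readings for a 4-core node over five intervals: only core 0 works.
samples = [
    [2.1, 0.2, 0.2, 0.1],
    [2.3, 0.1, 0.2, 0.2],
    [1.9, 0.2, 0.1, 0.2],
    [2.0, 0.2, 0.2, 0.1],
    [2.2, 0.1, 0.2, 0.2],
]

counts = active_cores_over_time(samples, idle_joules=0.5)
print(counts)  # [1, 1, 1, 1, 1]
```

A count that stays at 1 for the whole life of the job is the single-threaded pattern described above.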

Alternatively, here is a node that appears to be underutilized:

However, looking at the energy usage over time shows us that the job is “bursty”, utilizing all the cores from time to time, and the parameters are in fact correct.
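The difference between an undersubscribed job and a bursty one can be sketched in a few lines of code. This is a minimal example under stated assumptions: the input is a per-interval count of busy cores, and the function name `classify_usage`, the labels it returns and the `full_fraction` cut-off of 0.9 are hypothetical choices for illustration only.

```python
from typing import List


def classify_usage(active_counts: List[int], allocated: int,
                   full_fraction: float = 0.9) -> str:
    """Label a job from the number of busy cores seen in each interval.

    active_counts[t] is the number of busy cores in sample interval t and
    allocated is the number of cores given to the job. full_fraction is an
    assumed cut-off for what counts as "using the allocation".
    """
    if not active_counts:
        raise ValueError("need at least one sample")
    full = full_fraction * allocated
    peak = max(active_counts)
    mean = sum(active_counts) / len(active_counts)
    if peak < full:
        # The job never comes close to using what it was allocated.
        return "undersubscribed"
    if mean < full:
        # The job reaches the full allocation, but only some of the time.
        return "bursty"
    return "steady"


print(classify_usage([1, 1, 1, 1], allocated=8))           # undersubscribed
print(classify_usage([1, 1, 8, 1, 1, 8, 1], allocated=8))  # bursty
print(classify_usage([8, 8, 8, 7], allocated=8))           # steady
```

A single snapshot of the second job would most likely catch it with one busy core, which is why the peak over the whole run matters and not only the latest sample.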


UCTHPC