Hi,
we noticed an error message every time the Node exporter metrics endpoint is hit on our HPC cluster when enabling --collector.cpu.info:
level=ERROR source=http.go:175 msg="error gathering metrics: 192 error(s) occurred:
* [from Gatherer #2] collected metric node_cpu_frequency_hertz label:{name:"cpu" value:"0"} gauge:{value:2.4e+09} has help "Current CPU thread frequency in hertz." but should have "CPU frequency in hertz from /proc/cpuinfo."
[repeating for all other cores]
Oddly, this only seems to happen on nodes with a relatively high number of cores (192 in the case above), but not on our smaller nodes. Also, the metric node_cpu_frequency_hertz is exported just fine with Current CPU thread frequency in hertz as the help text. My Go knowledge is basic at best, but it seems strange to me that cpufreq_common.go and cpu_linux.go both define a Desc with the same FQName but different help texts:
node_exporter/collector/cpu_linux.go, lines 102 to 106 in b9b6820:

    cpuFrequencyHz: prometheus.NewDesc(
        prometheus.BuildFQName(namespace, cpuCollectorSubsystem, "frequency_hertz"),
        "CPU frequency in hertz from /proc/cpuinfo.",
        []string{"package", "core", "cpu"}, nil,
    ),

node_exporter/collector/cpufreq_common.go, lines 23 to 27 in b9b6820:

    cpuFreqHertzDesc = prometheus.NewDesc(
        prometheus.BuildFQName(namespace, cpuCollectorSubsystem, "frequency_hertz"),
        "Current CPU thread frequency in hertz.",
        []string{"cpu"}, nil,
    )
The error indeed disappears when the two help texts are the same, but I don't know if that only fixes the symptom or if a better fix would be to avoid the FQName duplication in the first place.
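As a rough illustration of the suspected mechanism, here is a stdlib-only sketch. It is not the client_golang code: `desc` and `gather` are hypothetical names, and the sketch assumes that the consistency check keeps the first help text it sees for a metric name and reports every later metric whose help text differs. Only the FQName, the two help texts, the label sets and the error wording are taken from the report above.

```go
package main

import "fmt"

// desc is a hypothetical stand-in for a metric descriptor.
type desc struct {
	fqName string
	help   string
	labels []string
}

// gather models (in simplified form, as an assumption) a per-scrape check:
// the first help text seen for a name is remembered, and every later
// descriptor with the same name but a different help text yields an error.
func gather(collected []desc) []error {
	helpByName := make(map[string]string)
	var errs []error
	for _, d := range collected {
		want, seen := helpByName[d.fqName]
		if !seen {
			helpByName[d.fqName] = d.help
			continue
		}
		if d.help != want {
			errs = append(errs, fmt.Errorf(
				"collected metric %s has help %q but should have %q",
				d.fqName, d.help, want))
		}
	}
	return errs
}

func main() {
	// Same FQName, different help texts, as in the two snippets above.
	cpuInfo := desc{
		fqName: "node_cpu_frequency_hertz",
		help:   "CPU frequency in hertz from /proc/cpuinfo.",
		labels: []string{"package", "core", "cpu"},
	}
	cpuFreq := desc{
		fqName: "node_cpu_frequency_hertz",
		help:   "Current CPU thread frequency in hertz.",
		labels: []string{"cpu"},
	}
	for _, err := range gather([]desc{cpuInfo, cpuFreq}) {
		fmt.Println(err)
	}
}
```

In this model, giving both descriptors the same help text makes the error go away, which matches what was observed, but both collectors still emit a metric under the same name with different label sets. That is why it is unclear whether aligning the help texts is a real fix.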
Details about test environment
- Tested with both the EPEL version of node-exporter for Rocky Linux 9 and a fresh build against the master branch:
node_exporter, version 1.11.1 (branch: tarball, revision: 1.el9)
build user:
build date: 20260407
go version: go1.25.8 (Red Hat 1.25.8-1.el9_7)
platform: linux/amd64
tags: rpm_crashtraceback,libtrust_openssl,netgo,osusergo,static_build
node_exporter, version (branch: , revision: b9b6820db7e125a4666280cedb2c3abcd45f9327-modified)
build user:
build date:
go version: go1.25.7
platform: linux/amd64
tags: unknown
- The error message appears on a dual-socket AMD EPYC 9654 node with 192 cores, but not on a dual-socket Intel Xeon Gold 6140 node with 36 cores.
- Full command line of node-exporter (the port was changed to avoid interfering with the normal monitoring):
prometheus-node-exporter --web.listen-address=127.0.0.1:8100 --collector.cpu.info