--- kubernetes/apps/observability/dcgm-exporter/app Kustomization: observability/dcgm-exporter OCIRepository: observability/dcgm-exporter
+++ kubernetes/apps/observability/dcgm-exporter/app Kustomization: observability/dcgm-exporter OCIRepository: observability/dcgm-exporter
@@ -10,9 +10,9 @@
 spec:
   interval: 15m
   layerSelector:
     mediaType: application/vnd.cncf.helm.chart.content.v1.tar+gzip
     operation: copy
   ref:
-    tag: 4.7.1
+    tag: 4.8.1
   url: oci://ghcr.io/home-operations/charts-mirror/dcgm-exporter
--- HelmRelease: observability/dcgm-exporter ConfigMap: observability/exporter-metrics-config-map
+++ HelmRelease: observability/dcgm-exporter ConfigMap: observability/exporter-metrics-config-map
@@ -47,12 +47,13 @@
# DCGM_FI_DEV_LOW_UTIL_VIOLATION, counter, Throttling duration due to low utilization (in ns).
# DCGM_FI_DEV_RELIABILITY_VIOLATION, counter, Throttling duration due to reliability constraints (in ns).
# Memory usage
DCGM_FI_DEV_FB_FREE, gauge, Framebuffer memory free (in MiB).
DCGM_FI_DEV_FB_USED, gauge, Framebuffer memory used (in MiB).
+ DCGM_FI_DEV_FB_RESERVED, gauge, Framebuffer memory reserved (in MiB).
# ECC
# DCGM_FI_DEV_ECC_SBE_VOL_TOTAL, counter, Total number of single-bit volatile ECC errors.
# DCGM_FI_DEV_ECC_DBE_VOL_TOTAL, counter, Total number of double-bit volatile ECC errors.
# DCGM_FI_DEV_ECC_SBE_AGG_TOTAL, counter, Total number of single-bit persistent ECC errors.
# DCGM_FI_DEV_ECC_DBE_AGG_TOTAL, counter, Total number of double-bit persistent ECC errors.
@@ -75,18 +76,21 @@
# Remapped rows
DCGM_FI_DEV_UNCORRECTABLE_REMAPPED_ROWS, counter, Number of remapped rows for uncorrectable errors
DCGM_FI_DEV_CORRECTABLE_REMAPPED_ROWS, counter, Number of remapped rows for correctable errors
DCGM_FI_DEV_ROW_REMAP_FAILURE, gauge, Whether remapping of rows has failed
+ # Static configuration information. These appear as labels on the other metrics
+ DCGM_FI_DRIVER_VERSION, label, Driver Version
+
# DCP metrics
DCGM_FI_PROF_GR_ENGINE_ACTIVE, gauge, Ratio of time the graphics engine is active.
# DCGM_FI_PROF_SM_ACTIVE, gauge, The ratio of cycles an SM has at least 1 warp assigned.
# DCGM_FI_PROF_SM_OCCUPANCY, gauge, The ratio of number of warps resident on an SM.
DCGM_FI_PROF_PIPE_TENSOR_ACTIVE, gauge, Ratio of cycles the tensor (HMMA) pipe is active.
DCGM_FI_PROF_DRAM_ACTIVE, gauge, Ratio of cycles the device memory interface is active sending or receiving data.
# DCGM_FI_PROF_PIPE_FP64_ACTIVE, gauge, Ratio of cycles the fp64 pipes are active.
# DCGM_FI_PROF_PIPE_FP32_ACTIVE, gauge, Ratio of cycles the fp32 pipes are active.
# DCGM_FI_PROF_PIPE_FP16_ACTIVE, gauge, Ratio of cycles the fp16 pipes are active.
- DCGM_FI_PROF_PCIE_TX_BYTES, counter, The number of bytes of active pcie tx data including both header and payload.
- DCGM_FI_PROF_PCIE_RX_BYTES, counter, The number of bytes of active pcie rx data including both header and payload.
+ DCGM_FI_PROF_PCIE_TX_BYTES, gauge, The rate of data transmitted over the PCIe bus - including both protocol headers and data payloads - in bytes per second.
+ DCGM_FI_PROF_PCIE_RX_BYTES, gauge, The rate of data received over the PCIe bus - including both protocol headers and data payloads - in bytes per second.
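Review note on the ConfigMap hunk above: `DCGM_FI_PROF_PCIE_TX_BYTES` and `DCGM_FI_PROF_PCIE_RX_BYTES` move from `counter` to `gauge`, with the description now stating bytes per second, so any dashboard or alert that wraps these series in `rate()` may need a second look. The sketch below is one way to spot such type changes between two counters files. It assumes the `name, type, help` CSV layout shown in the diff; `parse_counters` and `type_changes` are hypothetical helper names, not part of dcgm-exporter, and the sample data is a subset of the lines in this PR.

```python
OLD = """\
DCGM_FI_DEV_FB_FREE, gauge, Framebuffer memory free (in MiB).
DCGM_FI_DEV_FB_USED, gauge, Framebuffer memory used (in MiB).
# DCGM_FI_PROF_SM_ACTIVE, gauge, The ratio of cycles an SM has at least 1 warp assigned.
DCGM_FI_PROF_PCIE_TX_BYTES, counter, The number of bytes of active pcie tx data including both header and payload.
DCGM_FI_PROF_PCIE_RX_BYTES, counter, The number of bytes of active pcie rx data including both header and payload.
"""

NEW = """\
DCGM_FI_DEV_FB_FREE, gauge, Framebuffer memory free (in MiB).
DCGM_FI_DEV_FB_USED, gauge, Framebuffer memory used (in MiB).
DCGM_FI_DEV_FB_RESERVED, gauge, Framebuffer memory reserved (in MiB).
DCGM_FI_DRIVER_VERSION, label, Driver Version
# DCGM_FI_PROF_SM_ACTIVE, gauge, The ratio of cycles an SM has at least 1 warp assigned.
DCGM_FI_PROF_PCIE_TX_BYTES, gauge, The rate of data transmitted over the PCIe bus - including both protocol headers and data payloads - in bytes per second.
DCGM_FI_PROF_PCIE_RX_BYTES, gauge, The rate of data received over the PCIe bus - including both protocol headers and data payloads - in bytes per second.
"""


def parse_counters(text):
    """Map metric name -> declared type, skipping blanks and commented-out lines."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first two commas only, so commas in the help text survive.
        parts = [p.strip() for p in line.split(",", 2)]
        if len(parts) >= 2:
            metrics[parts[0]] = parts[1]
    return metrics


def type_changes(old, new):
    """Return {name: (old_type, new_type)} for metrics present in both files."""
    return {
        name: (old[name], new[name])
        for name in old.keys() & new.keys()
        if old[name] != new[name]
    }


old, new = parse_counters(OLD), parse_counters(NEW)
changed = type_changes(old, new)
added = sorted(new.keys() - old.keys())

for name in sorted(changed):
    print(f"{name}: {changed[name][0]} -> {changed[name][1]}")
print("added:", ", ".join(added))
```

Commented-out metrics are ignored on purpose, since they are not exported in either version.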
--- HelmRelease: observability/dcgm-exporter DaemonSet: observability/dcgm-exporter
+++ HelmRelease: observability/dcgm-exporter DaemonSet: observability/dcgm-exporter
@@ -61,13 +61,13 @@
             add:
             - SYS_ADMIN
             drop:
             - ALL
           runAsNonRoot: false
           runAsUser: 0
-        image: nvcr.io/nvidia/k8s/dcgm-exporter:4.4.2-4.7.1-ubuntu22.04
+        image: nvcr.io/nvidia/k8s/dcgm-exporter:4.5.2-4.8.1-distroless
         imagePullPolicy: IfNotPresent
         args:
         - -f
         - /etc/dcgm-exporter/default-counters.csv
         env:
         - name: DCGM_EXPORTER_KUBERNETES
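Review note on the DaemonSet hunk above: the image tag changes in all three of its parts, not just the exporter version. A minimal sketch for splitting the tag follows, assuming the layout is `<DCGM version>-<exporter version>-<base image>`; that reading is an assumption inferred from the two tags in this diff, and `ExporterTag` and `parse_tag` are hypothetical names.

```python
from typing import NamedTuple


class ExporterTag(NamedTuple):
    dcgm: str      # assumed: bundled DCGM version
    exporter: str  # assumed: dcgm-exporter version (matches the chart tag here)
    base: str      # assumed: base image flavour


def parse_tag(image):
    """Split 'repo/name:a-b-c' into its three dash-separated tag fields."""
    tag = image.rsplit(":", 1)[1]
    dcgm, exporter, base = tag.split("-", 2)
    return ExporterTag(dcgm, exporter, base)


old_tag = parse_tag("nvcr.io/nvidia/k8s/dcgm-exporter:4.4.2-4.7.1-ubuntu22.04")
new_tag = parse_tag("nvcr.io/nvidia/k8s/dcgm-exporter:4.5.2-4.8.1-distroless")

# Report every field that differs between the two tags.
for field in ExporterTag._fields:
    before, after = getattr(old_tag, field), getattr(new_tag, field)
    if before != after:
        print(f"{field}: {before} -> {after}")
```

The base image switch from `ubuntu22.04` to `distroless` is worth noting separately: distroless images usually ship without a shell, so debugging workflows that rely on `kubectl exec` into the container may no longer work.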
--- HelmRelease: observability/dcgm-exporter ServiceMonitor: observability/dcgm-exporter
+++ HelmRelease: observability/dcgm-exporter ServiceMonitor: observability/dcgm-exporter
@@ -19,10 +19,11 @@
     matchNames:
     - observability
   endpoints:
   - port: metrics
     path: /metrics
     interval: 15s
+    scrapeTimeout: 10s
     honorLabels: false
     relabelings: []
     metricRelabelings: []
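Review note on the ServiceMonitor hunk above: the chart now sets an explicit `scrapeTimeout: 10s` next to `interval: 15s`. Prometheus expects the scrape timeout to be no greater than the scrape interval, so overriding either value in Helm values should keep that relationship. A minimal sketch of the check follows; `to_seconds` and `timeout_fits` are hypothetical helpers, and the duration parser covers only the simple `<number><unit>` sequences used here, not every form Prometheus accepts.

```python
import re

# Seconds per unit for Prometheus-style duration strings.
_UNITS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}
_PART = re.compile(r"(\d+)(ms|s|m|h|d|w|y)")


def to_seconds(duration):
    """Convert a duration such as '15s' or '1m30s' to seconds."""
    parts = _PART.findall(duration)
    # Reject strings with leftover characters the pattern did not consume.
    if not parts or "".join(n + u for n, u in parts) != duration:
        raise ValueError(f"unsupported duration: {duration!r}")
    return sum(int(n) * _UNITS[u] for n, u in parts)


def timeout_fits(interval, scrape_timeout):
    """True when the scrape timeout does not exceed the scrape interval."""
    return to_seconds(scrape_timeout) <= to_seconds(interval)


print(timeout_fits("15s", "10s"))
```

With the values from this diff the timeout fits inside the interval, so no values override should be needed for this change.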
…cgm-exporter ( 4.7.1 ➔ 4.8.1 )
This PR contains the following updates:
4.7.1 → 4.8.1

Configuration
📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).
🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.
♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
This PR has been generated by Renovate Bot.