[BUG]: Helm chart NIMCache templates missing nodeSelector and tolerations fields #1636

@dafinley

Version: 26.1.2
Installation method: Kubernetes/Helm (NIM Operator mode)
Describe the bug:
The NIMCache templates in the nv-ingest Helm chart (version 26.1.2) do not render nodeSelector or tolerations fields, even when these values are provided in the Helm values file. This causes NIMCache pods to be unschedulable on clusters where GPU nodes have taints or require specific node selection.
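
For reference, a values file along these lines should reproduce the problem. This is a minimal sketch: the key path follows the `.Values.nimOperator.embedqa.*` references used in the template, while the node pool label and the toleration are the same illustrative ones used in the workaround further down, not values required by the chart.

```yaml
# values.yaml (illustrative; label and taint key are assumptions for a GKE GPU pool)
nimOperator:
  embedqa:
    nodeSelector:
      cloud.google.com/gke-nodepool: gpu-pool
    tolerations:
      - key: nvidia.com/gpu
        operator: Exists
        effect: NoSchedule
```

With values like these, the rendered NIMService carries both fields, but the rendered NIMCache carries neither.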
In templates/llama-3.2-nv-embedqa-1b-v2.yaml, the NIMService section correctly renders nodeSelector and tolerations:

kind: NIMService
spec:
  nodeSelector:
{{ toYaml .Values.nimOperator.embedqa.nodeSelector | nindent 4 }}
  tolerations:
{{ toYaml .Values.nimOperator.embedqa.tolerations | nindent 4 }}

However, the NIMCache section in the same template only renders source and storage, omitting nodeSelector and tolerations entirely:

kind: NIMCache
spec:
  source:
    ngc:
      modelPuller: "..."
      pullSecret: "..."
      authSecret: ...
  storage:
    pvc:
      ...

Expected behavior:
The NIMCache templates should include nodeSelector and tolerations fields, similar to NIMService:

kind: NIMCache
spec:
  source:
    ...
  storage:
    ...
  nodeSelector:
{{ toYaml .Values.nimOperator.embedqa.nodeSelector | nindent 4 }}
  tolerations:
{{ toYaml .Values.nimOperator.embedqa.tolerations | nindent 4 }}
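
One possible variant of the expected template wraps each field in a `with` block, so the key is omitted entirely when no value is set. This is only a sketch of a common Helm pattern, not necessarily how the chart maintainers would implement it; the `source` and `storage` sections are elided exactly as above.

```yaml
kind: NIMCache
spec:
  source:
    ...
  storage:
    ...
  # Render nodeSelector only when a non-empty value is provided
  {{- with .Values.nimOperator.embedqa.nodeSelector }}
  nodeSelector:
    {{- toYaml . | nindent 4 }}
  {{- end }}
  # Render tolerations only when a non-empty list is provided
  {{- with .Values.nimOperator.embedqa.tolerations }}
  tolerations:
    {{- toYaml . | nindent 4 }}
  {{- end }}
```

The other affected templates would need the same change, using their own values keys in place of `embedqa`.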

Workaround:
Until the chart is fixed, each NIMCache resource has to be patched manually after the Helm deployment:

kubectl patch nimcache <name> -n nim --type=merge -p '{
  "spec": {
    "nodeSelector": {"cloud.google.com/gke-nodepool": "gpu-pool"},
    "tolerations": [{"key": "nvidia.com/gpu", "operator": "Exists", "effect": "NoSchedule"}]
  }
}'

Affected templates:

  • templates/llama-3.2-nv-embedqa-1b-v2.yaml
  • templates/nemoretriever-graphic-elements-v1.yaml
  • templates/nemoretriever-ocr-v1.yaml
  • templates/nemoretriever-page-elements-v3.yaml
  • templates/nemoretriever-table-structure-v1.yaml
