latest/ug/nodes/hybrid-nodes-creds.adoc: 3 additions & 1 deletion
@@ -84,6 +84,7 @@ To run the steps in this section, the IAM principal using the {aws} console or {
 **`rolesanywhere:CreateProfile`
 **`iam:PassRole`

+[#hybrid-nodes-creds-cloudformation]
 === {aws} CloudFormation

 Install and configure the {aws} CLI, if you haven't already. See link:cli/latest/userguide/getting-started-install.html[Installing or updating to the last version of the {aws} CLI,type="documentation"].
@@ -165,7 +166,8 @@ aws cloudformation deploy \
 --capabilities CAPABILITY_NAMED_IAM
 ----

-*{aws} CLI*
+[#hybrid-nodes-creds-awscli]
+=== {aws} CLI

 Install and configure the {aws} CLI, if you haven't already. See link:cli/latest/userguide/getting-started-install.html[Installing or updating to the last version of the {aws} CLI,type="documentation"].
latest/ug/workloads/workloads-add-ons-available-eks.adoc: 27 additions & 0 deletions
@@ -66,6 +66,10 @@ You can use any of the following Amazon EKS add-ons.
 |<<addons-hyperpod-observability>>
 |EC2, EKS Auto Mode,

+|Amazon SageMaker HyperPod training operator enables efficient distributed training on Amazon EKS clusters with advanced scheduling and resource management capabilities.
+|<<addons-hyperpod-training-operator>>
+|EC2, EKS Auto Mode
+
 |A Kubernetes agent that collects and reports network flow data to Amazon CloudWatch, enabling comprehensive monitoring of TCP connections across cluster nodes.
 |<<addons-network-flow>>
 |EC2, EKS Auto Mode
@@ -437,6 +441,29 @@ This add-on uses the IAM roles for service accounts capability of Amazon EKS. Fo

 To learn more about the add-on and its capabilities, see link:sagemaker/latest/dg/sagemaker-hyperpod-eks-cluster-observability-cluster.html["SageMaker HyperPod Observability",type="documentation"].

+[#addons-hyperpod-training-operator]
+== Amazon SageMaker HyperPod training operator
+
+The Amazon SageMaker HyperPod training operator helps you accelerate generative AI model development by efficiently managing distributed training across large GPU clusters. It introduces intelligent fault recovery, hang job detection, and process-level management capabilities that minimize training disruptions and reduce costs. Unlike traditional training infrastructure that requires complete job restarts when failures occur, this operator implements surgical process recovery to keep your training jobs running smoothly.
+
+The operator also works with HyperPod's health monitoring and observability functions, providing real-time visibility into training execution and automatic monitoring of critical metrics like loss spikes and throughput degradation. You can define recovery policies through simple YAML configurations without code changes, allowing you to quickly respond to and recover from unrecoverable training states. These monitoring and recovery capabilities work together to maintain optimal training performance while minimizing operational overhead.
+
+The Amazon EKS add-on name is `amazon-sagemaker-hyperpod-training-operator`.
+
+For more information, see link:sagemaker/latest/dg/sagemaker-eks-operator.html[Using the HyperPod training operator,type="documentation"] in the _Amazon SageMaker Developer Guide_.
+
+=== Required IAM permissions
+
+This add-on requires IAM permissions, and uses EKS Pod Identity.
+
+{aws} suggests the `AmazonSageMakerHyperPodTrainingOperatorAccess` link:aws-managed-policy/latest/reference/AmazonSageMakerHyperPodTrainingOperatorAccess.html["managed policy",type="documentation"].
+
+For more information, see link:sagemaker/latest/dg/sagemaker-eks-operator-install.html#sagemaker-eks-operator-install-operator[Installing the training operator,type="documentation"] in the _Amazon SageMaker Developer Guide_.
+
+=== Additional information
+
+To learn more about the add-on, see link:sagemaker/latest/dg/sagemaker-eks-operator.html["SageMaker HyperPod training operator",type="documentation"].
latest/ug/workloads/workloads-add-ons-available-vendors.adoc: 24 additions & 0 deletions
@@ -333,6 +333,30 @@ A managed policy isn't used with this add-on.

 Custom permissions aren't used with this add-on.

+[#add-on-instana]
+== IBM Instana
+
+The add-on name is `instana-agent` and the namespace is `instana-agent`. IBM publishes the add-on.
+
+For information about the add-on, see link:blogs/ibm-redhat/implement-observability-for-amazon-eks-workloads-using-the-instana-amazon-eks-add-on/[Implement observability for Amazon EKS workloads using the Instana Amazon EKS add-on,type="marketing"] and link:blogs/ibm-redhat/monitor-and-optimize-amazon-eks-costs-with-ibm-instana-and-kubecost/[Monitor and optimize Amazon EKS costs with IBM Instana and Kubecost,type="marketing"] in the {aws} Blog.
+
+Instana Observability (Instana) offers an Amazon EKS Add-on that deploys Instana agents to Amazon EKS clusters. Customers can use this add-on to collect and analyze real-time performance data to gain insights into their containerized applications. The Instana Amazon EKS add-on provides visibility across your Kubernetes environments. Once deployed, the Instana agent automatically discovers components within your Amazon EKS clusters including nodes, namespaces, deployments, services, and pods.