SPLAT-2167: Added dedicated host support for AWS #374

Open
vr4manta wants to merge 6 commits into openshift:main from vr4manta:SPLAT-2167

Conversation

@vr4manta vr4manta commented Oct 1, 2025

SPLAT-2167

Changes

  • Added dedicated host support for AWS
  • Created new dedicated host tests
  • Fixed tests that were breaking due to updates upstream that were pulled in
  • Added missing permission ec2:DescribeInstanceTypes to cluster api credentials request

Dependencies

Notes

Dedicated host support appears to need a permission that was missing. While running these changes, the following warning event was observed in the cluster CAPI operator namespace:

6s          Warning   FailedDescribeInstanceTypes   awscluster/ngirard-dh-5bb5w                           insufficient permissions to describe instance types for instance type "m6i.xlarge", falling back to the default architecture of "x86_64": operation error EC2: DescribeInstanceTypes, https response error StatusCode: 403, RequestID: 387549b4-ab58-48af-b14d-3882b6c7da52, api error UnauthorizedOperation: You are not authorized to perform this operation. User: arn:aws:iam::726924432237:user/ngirard-dh-5bb5w-openshift-cluster-api-aws-72f7q is not authorized to perform: ec2:DescribeInstanceTypes because no identity-based policy allows the ec2:DescribeInstanceTypes action

Summary by CodeRabbit

  • New Features

    • Added AWS dedicated-host support with tenancy modes ("default", "dedicated", "host"), HostID validation, and deterministic dynamic host-allocation tag ordering.
  • Security / Permissions

    • AWS credentials policy expanded to allow instance-type discovery (ec2:DescribeInstanceTypes).
  • Tests

    • Expanded conversion and fuzz tests covering host-affinity, HostID formats, dynamic host allocation, and validation scenarios.
  • Bug Fixes

    • Infra diffs now include host-affinity and HostID, enabling drift detection.
  • Chores

    • Dependency/module version updates.
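
The "deterministic dynamic host-allocation tag ordering" item above matters because Go randomizes map iteration order, so turning a tag map into a list without sorting could produce a different order on each conversion. A minimal sketch of the idea follows. The `Tag` type and the `sortedTags` helper are hypothetical names for illustration and are not taken from this PR.

```go
package main

import (
	"fmt"
	"sort"
)

// Tag is a hypothetical key/value pair used only for this illustration.
type Tag struct {
	Key   string
	Value string
}

// sortedTags converts a tag map into a slice ordered by key, so that
// repeated conversions of the same map always yield the same slice.
func sortedTags(m map[string]string) []Tag {
	tags := make([]Tag, 0, len(m))
	for k, v := range m {
		tags = append(tags, Tag{Key: k, Value: v})
	}
	// Map keys are unique, so ordering by key alone is a total order.
	sort.Slice(tags, func(i, j int) bool { return tags[i].Key < tags[j].Key })
	return tags
}

func main() {
	in := map[string]string{"team": "splat", "env": "dev", "cost-center": "42"}
	for _, t := range sortedTags(in) {
		fmt.Printf("%s=%s\n", t.Key, t.Value)
	}
}
```

Sorting by key is one possible choice; any stable, total ordering would give the same determinism.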

@openshift-ci-robot added the jira/valid-reference label (Indicates that this PR references a valid Jira ticket of any type.) on Oct 1, 2025
@openshift-ci-robot

openshift-ci-robot commented Oct 1, 2025

@vr4manta: This pull request references SPLAT-2167 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.

@openshift-ci bot added the do-not-merge/work-in-progress label (Indicates that a PR should not merge because it is a work in progress.) on Oct 1, 2025
@openshift-ci
Contributor

openshift-ci bot commented Oct 1, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@coderabbitai

coderabbitai bot commented Oct 1, 2025

Walkthrough

Adds AWS dedicated-host placement support across CAPA↔MAPI conversions, extends fuzz and unit tests for placement scenarios, re-enables host field diffing in infra comparisons, bumps several module deps, and adds ec2:DescribeInstanceTypes to the credentials-request.

Changes

Cohort: Credentials Request
File(s): manifests/0000_30_cluster-api_01_credentials-request.yaml
Summary: Added ec2:DescribeInstanceTypes to AWS provider IAM actions.

Cohort: CAPA → MAPI conversion
File(s): pkg/conversion/capi2mapi/aws.go, pkg/conversion/capi2mapi/aws_fuzz_test.go, pkg/conversion/capi2mapi/aws_test.go
Summary: Add dedicated-host support: tenancy constants (TenancyDefault, TenancyDedicated, TenancyHost), HostID regex/validation and new error messages, host-affinity conversion helpers, deterministic tag sorting, integrate Placement into toProviderSpec/status, and expand fuzz/tests for HostAffinity/HostID cases.

Cohort: MAPI → CAPA conversion
File(s): pkg/conversion/mapi2capi/aws.go, pkg/conversion/mapi2capi/aws_fuzz_test.go, pkg/conversion/mapi2capi/aws_test.go
Summary: Implement conversion helpers mapping MAPI HostPlacement → CAPA fields (HostAffinity, HostID, DynamicHostAllocation); populate AWSMachineSpec/status from placement; add fuzzPlacement and placement tests; sort dynamic allocation tags deterministically.

Cohort: Controllers (machineset / machinesync)
File(s): pkg/controllers/machinesetsync/machineset_sync_controller.go, pkg/controllers/machinesync/machine_sync_mapi2capi_infrastructure.go
Summary: Removed diff-ignore options for AWS spec.hostID and spec.hostAffinity; those fields are now considered in infra template and infra machine comparisons.

Cohort: Module updates
File(s): go.mod, e2e/go.mod, hack/tools/go.mod
Summary: Bumped sigs.k8s.io/cluster-api-provider-aws and openshift/api references, added a replace for github.com/openshift/api, and bumped kustomize/tool versions in hack/tools/go.mod.
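
The "HostID regex/validation" item in the CAPA → MAPI summary above can be illustrated with a short sketch. The exact pattern used in pkg/conversion/capi2mapi/aws.go is not shown in this thread, so the pattern below is an assumption: it treats a Dedicated Host ID as `h-` followed by either 8 or 17 lowercase hexadecimal characters. The names `hostIDPattern` and `isValidHostID` are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// hostIDPattern is an ASSUMED shape for an EC2 Dedicated Host ID:
// "h-" plus 8 or 17 lowercase hex characters. The PR's real pattern may differ.
var hostIDPattern = regexp.MustCompile(`^h-([0-9a-f]{8}|[0-9a-f]{17})$`)

// isValidHostID reports whether id matches the assumed host ID pattern.
func isValidHostID(id string) bool {
	return hostIDPattern.MatchString(id)
}

func main() {
	candidates := []string{
		"h-0123456789abcdef0", // 17 hex characters
		"h-1234abcd",          // 8 hex characters
		"h-XYZ",               // not hexadecimal
		"i-0123456789abcdef0", // instance-style prefix, not a host ID
	}
	for _, id := range candidates {
		fmt.Printf("%-22q valid=%v\n", id, isValidHostID(id))
	}
}
```

Anchoring the pattern with `^` and `$` keeps a valid-looking substring inside a longer string from passing.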

Sequence Diagram(s)

sequenceDiagram
  participant CAPA as CAPA AWSMachine
  participant C2M as capi2mapi.converter
  participant MAPI as MAPI ProviderConfig/Placement
  participant M2C as mapi2capi.converter
  participant Controller as machineset/machinesync

  CAPA->>C2M: provide AWSMachineSpec (Tenancy, HostAffinity, HostID, DynamicHostAllocation)
  C2M->>MAPI: convert to ProviderConfig.Placement & ProviderStatus.DedicatedHost
  MAPI->>M2C: expose ProviderConfig.Placement
  M2C->>CAPA: convert Placement back to AWSMachineSpec/Status
  Controller->>MAPI: compute diffs (includes hostID/hostAffinity)
  Controller->>Controller: act on drift/updates

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 I hopped through specs and regex with glee,

Mapped HostAffinity, HostID, tags in a spree,
Fuzzers twirled through eight playful ways,
Diffs now mind hosts by nights and days,
A tiny rabbit patch, neat as can be.

@openshift-merge-robot added the needs-rebase label (Indicates a PR cannot be merged because it has merge conflicts with HEAD.) on Oct 1, 2025
@openshift-merge-robot removed the needs-rebase label (Indicates a PR cannot be merged because it has merge conflicts with HEAD.) on Oct 1, 2025
@vr4manta force-pushed the SPLAT-2167 branch 2 times, most recently from 6355141 to 7ba97c3, on November 10, 2025 16:22
@vr4manta force-pushed the SPLAT-2167 branch 2 times, most recently from dc53d82 to e5cbce2, on November 11, 2025 17:16
@vr4manta
Author

/test all

@vr4manta
Author

/test all

@vr4manta force-pushed the SPLAT-2167 branch 2 times, most recently from ed41da7 to 320eac0, on November 13, 2025 14:40
@vr4manta marked this pull request as ready for review on November 14, 2025 13:15
@openshift-ci bot removed the do-not-merge/work-in-progress label (Indicates that a PR should not merge because it is a work in progress.) on Nov 14, 2025
@sunzhaohua2
Contributor

/test unit

@sunzhaohua2
Contributor

Some migration cases failed in the capi-techpreview job. @huali9, can you take a look?

            Reason: "FailedToConvertCAPIMachineToMAPI",
            Message: "failed to convert Cluster API machine to Machine API machine: spec.dedicatedHost: Required value: either id or dynamicHostAllocation is required when hostAffinity is host",

@huali9
Contributor

huali9 commented Feb 27, 2026

Some migration cases failed in the capi-techpreview job. @huali9, can you take a look?

            Reason: "FailedToConvertCAPIMachineToMAPI",
            Message: "failed to convert Cluster API machine to Machine API machine: spec.dedicatedHost: Required value: either id or dynamicHostAllocation is required when hostAffinity is host",

Sure. I checked the cluster must-gather, and hostAffinity still defaults to host. @damdo already changed that upstream, so a downstream rebase PR may be needed.
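
The conversion error quoted above can be read as a simple rule: when hostAffinity is host, either a host ID or a dynamic host allocation must be present. The sketch below is a hypothetical illustration of that rule only. The `dedicatedHost` struct and `validateDedicatedHost` function are invented for this example, and the real types in the converter are not shown in this thread.

```go
package main

import (
	"errors"
	"fmt"
)

// dedicatedHost is a hypothetical, simplified stand-in for the
// dedicated-host fields that the conversion inspects.
type dedicatedHost struct {
	HostAffinity          string // e.g. "host" or "default"
	HostID                string // empty when unset
	DynamicHostAllocation bool   // simplified: true when dynamic allocation is configured
}

// errHostTargetRequired reuses the message text from the failure quoted above.
var errHostTargetRequired = errors.New(
	"either id or dynamicHostAllocation is required when hostAffinity is host")

// validateDedicatedHost rejects "host" affinity that names no host and
// requests no dynamic allocation; every other combination is accepted here.
func validateDedicatedHost(h dedicatedHost) error {
	if h.HostAffinity == "host" && h.HostID == "" && !h.DynamicHostAllocation {
		return errHostTargetRequired
	}
	return nil
}

func main() {
	// Affinity defaulted to "host" with nothing else set: the failing case.
	fmt.Println(validateDedicatedHost(dedicatedHost{HostAffinity: "host"}))
	// The same affinity with an explicit host ID passes.
	fmt.Println(validateDedicatedHost(dedicatedHost{HostAffinity: "host", HostID: "h-0123456789abcdef0"}))
}
```

Under this reading, a machine whose hostAffinity is defaulted to host while the other fields stay empty would fail conversion, which is consistent with the must-gather observation.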

@damdo
Member

damdo commented Feb 27, 2026

@huali9 yes, we need an upstream release for these:
kubernetes-sigs/cluster-api-provider-aws@v2.10.1...release-2.10

and then an upstream -> downstream sync

@openshift-ci bot removed the lgtm label (Indicates that a PR is ready to be merged.) on Mar 3, 2026
@sunzhaohua2
Contributor

/test e2e-aws-capi-techpreview

@vr4manta
Author

vr4manta commented Mar 5, 2026

I'm looking into the e2e failure to see what is up

@huali9
Contributor

huali9 commented Mar 6, 2026

"failed to update MAPI machine set: admission webhook \"validation.machineset.machine.openshift.io\" denied the request: spec.placement.host: Forbidden: host may only be specified when tenancy is 'host'"

The error is caused by an inconsistency between MAPI and CAPI.
CAPI allows setting only hostAffinity: default, but MAPI does not allow the corresponding configuration, which sets only:

            host:
              affinity: AnyAvailable  

I previously raised a bug for this: https://issues.redhat.com/browse/OCPBUGS-73821
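
The webhook rejection quoted above amounts to one rule: a host block is forbidden unless tenancy is host. The sketch below illustrates that rule with hypothetical, simplified types; `placement`, `hostPlacement`, and `validatePlacement` are invented for this example and are not the webhook's actual code. It also assumes the converted MAPI object carries a host block with affinity AnyAvailable and no tenancy, which the thread implies but does not show.

```go
package main

import (
	"errors"
	"fmt"
)

// hostPlacement is a hypothetical stand-in for the MAPI host block.
type hostPlacement struct {
	Affinity string // e.g. "AnyAvailable"
}

// placement is a hypothetical, simplified view of the MAPI placement fields.
type placement struct {
	Tenancy string         // "default", "dedicated", or "host"; empty when unset
	Host    *hostPlacement // nil when no host block is set
}

// validatePlacement mirrors the rule in the quoted webhook message:
// a host block is only accepted together with tenancy "host".
func validatePlacement(p placement) error {
	if p.Host != nil && p.Tenancy != "host" {
		return errors.New("host may only be specified when tenancy is 'host'")
	}
	return nil
}

func main() {
	// Host block without tenancy: the shape assumed to come out of the conversion.
	fmt.Println(validatePlacement(placement{Host: &hostPlacement{Affinity: "AnyAvailable"}}))
	// The same host block with tenancy "host" is accepted.
	fmt.Println(validatePlacement(placement{Tenancy: "host", Host: &hostPlacement{Affinity: "AnyAvailable"}}))
}
```

If that assumption holds, either the converter would need to omit the host block when tenancy is unset, or the two APIs would need to agree on whether an affinity-only configuration is valid.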

@openshift-merge-robot added the needs-rebase label (Indicates a PR cannot be merged because it has merge conflicts with HEAD.) on Mar 9, 2026
@openshift-merge-robot removed the needs-rebase label (Indicates a PR cannot be merged because it has merge conflicts with HEAD.) on Mar 9, 2026
@vr4manta
Author

vr4manta commented Mar 9, 2026

/test e2e-aws-capi-techpreview

@vr4manta
Author

vr4manta commented Mar 9, 2026

/test e2e-aws-capi-techpreview

@vr4manta
Author

vr4manta commented Mar 9, 2026

Fixed e2e. I found a few minor issues in the conversion logic (in both directions: capi2mapi and mapi2capi). I am now manually running some of the tests from OCPBUGS-73821. Some tests may not be completely valid, but I am making sure the behavior is the same on both sides.

@huali9
Contributor

huali9 commented Mar 10, 2026

/test regression-clusterinfra-aws-ipi-techpreview-capi

@huali9
Contributor

huali9 commented Mar 10, 2026

Thank you for addressing the e2e issues. I'm currently testing the feature. Given the scope of the changes, I'll need some time to perform a full retest of the functionality to ensure everything works as expected.

However, if you prefer to merge the PR sooner, I can add the verified label now, as regression tests (e2e-aws-capi-techpreview and regression-clusterinfra-aws-ipi-techpreview-capi) have passed successfully. Please let me know which option works best for you.

@JoelSpeed
Contributor

/lgtm

@openshift-ci bot added the lgtm label (Indicates that a PR is ready to be merged.) on Mar 10, 2026
@openshift-ci-robot

Tests from second stage were triggered manually. Pipeline can be controlled only manually, until HEAD changes. Use command to trigger second stage.

@vr4manta
Author

/test e2e-aws-ovn e2e-aws-ovn-serial-1of2 e2e-aws-ovn-serial-2of2 e2e-aws-ovn-techpreview e2e-aws-ovn-techpreview-upgrade

@openshift-ci
Contributor

openshift-ci bot commented Mar 10, 2026

@vr4manta: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                              Commit   Details  Required  Rerun command
ci/prow/okd-scos-e2e-aws-ovn           3820d2b  link     false     /test okd-scos-e2e-aws-ovn
ci/prow/e2e-openstack-ovn-techpreview  68b083f  link     true      /test e2e-openstack-ovn-techpreview
ci/prow/e2e-metal3-capi-techpreview    68b083f  link     false     /test e2e-metal3-capi-techpreview

@vr4manta
Author

/retest

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.

10 participants