
CFP-42466-change-cilium-pod-ip-pool-crd.md#79

Merged
gandro merged 1 commit into cilium:main from kyounghunJang:main
Jan 12, 2026

Conversation

@kyounghunJang
Contributor

Related

@xmulligan
Member

cc: @cilium/sig-ipam

Member

@gandro gandro left a comment


Thanks for the CFP. Did you use AI to generate the CFP or parts of it? If so, please declare this.

I have a really hard time following the intent behind certain sections.

In general, I think the CFP should focus clearly on the CR changes, the intended behavior and most importantly on the migration path. Low-level implementation details are less relevant.

@kyounghunJang
Contributor Author

@gandro
Sorry for providing unclear context.
I’ll make sure to clarify and refocus the CFP to clearly describe the CR changes, intended behavior, and migration path.
Thanks a lot for the feedback — I’ll update the proposal accordingly.

@kyounghunJang kyounghunJang marked this pull request as draft November 6, 2025 02:05
@kyounghunJang kyounghunJang marked this pull request as ready for review November 12, 2025 07:44
@kyounghunJang kyounghunJang requested a review from gandro November 12, 2025 08:34
Member

@gandro gandro left a comment


Thanks for reworking the CFP! I found it much easier to follow. I think we need more details on the migration though.

Member

@gandro gandro left a comment


Thanks for the update. One thing that still seems unaddressed is the question of what the actual migration process is.

I would expect that the operator converts the existing v2alpha1 objects to v2, and then ingests the v2 objects as described in the CFP. But when does it do that conversion? How do we make sure it is only done once?

@kyounghunJang
Contributor Author

Thanks for the update. One thing that still seems unaddressed is the question of what the actual migration process is.

I would expect that the operator converts the existing v2alpha1 objects to v2, and then ingests the v2 objects as described in the CFP. But when does it do that conversion? How do we make sure it is only done once?

I think we can prevent duplicate processing by using an annotation flag.
For example, something like `cilium.io/migrated-to-v2="true"` should work.

Regarding when the operator should perform the conversion, I believe there are two possible approaches:

1. Run the conversion during the operator’s initialization phase. This reduces the number of management points, but if the amount of data to migrate is large, it may delay the operator’s startup.

2. Perform a lazy migration. Add the migration logic to the reconcile loop. This way, the operator’s initialization is not impacted, and the migration can be handled more flexibly over time.

Personally, I think the second approach is better.
What are your thoughts on this?

Thank you!
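The idempotent, annotation-gated lazy migration discussed above could be sketched roughly like this. The annotation key matches the one proposed in the comment; everything else (`PoolV2Alpha1`, `reconcilePool`) is a hypothetical illustration, not actual Cilium operator code:

```go
package main

import "fmt"

// migratedAnnotation marks a v2alpha1 pool that has already been converted,
// so a later reconcile pass does not convert it a second time.
const migratedAnnotation = "cilium.io/migrated-to-v2"

// PoolV2Alpha1 is a minimal stand-in for a v2alpha1 CiliumPodIPPool object.
type PoolV2Alpha1 struct {
	Name        string
	Annotations map[string]string
	CIDRs       []string
}

// reconcilePool performs the lazy migration step: pools that already carry
// the annotation are skipped, so the conversion happens at most once.
func reconcilePool(p *PoolV2Alpha1) (converted bool) {
	if p.Annotations[migratedAnnotation] == "true" {
		return false // already migrated; the v2 object is the source of truth
	}
	// ... create the corresponding v2 object here ...
	if p.Annotations == nil {
		p.Annotations = map[string]string{}
	}
	p.Annotations[migratedAnnotation] = "true"
	return true
}

func main() {
	p := &PoolV2Alpha1{Name: "green-pool", CIDRs: []string{"10.20.0.0/16"}}
	fmt.Println(reconcilePool(p)) // first pass converts
	fmt.Println(reconcilePool(p)) // second pass is a no-op
}
```

Because the flag lives on the object itself, the check survives operator restarts, which is what makes the reconcile-loop placement safe.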

@gandro
Member

gandro commented Nov 25, 2025

In the current operator architecture, we don't really have an initialization phase. There are Hive start hooks, but they cannot be used for long running tasks. Therefore I don't think option 1 is really feasible.

The important bits to consider are:
a) Make sure migration can be turned off for users who don't want/need it.
b) Make sure the multi-pool allocator correctly copes with pools being migrated (which I think should just work)
c) What's the semantics of the annotation? Does it mean we no longer look at that pool at all? What happens if v2alpha1 and v2 differ? Probably best to assume v2 objects are the source of truth post migration.
d) Document clearly when it is safe for the user to delete the old v2alpha1 objects (i.e. migration has succeeded without errors, and a downgrade to earlier Cilium versions for any reason (potentially unrelated to multi-pool) is no longer needed).
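Point (c), treating v2 objects as the source of truth when both API versions of a pool exist, could look roughly like this. All type and function names here are hypothetical sketches, not Cilium code:

```go
package main

import "fmt"

// Pool is a minimal stand-in for a CiliumPodIPPool at either API version.
type Pool struct {
	Name    string
	Version string // "v2" or "v2alpha1"
}

// preferred merges both views of the pools: when a pool exists in both
// versions, the v2 object wins and the v2alpha1 copy is ignored.
func preferred(v2, v2alpha1 map[string]Pool) map[string]Pool {
	out := map[string]Pool{}
	for name, p := range v2alpha1 {
		out[name] = p
	}
	for name, p := range v2 { // v2 overwrites any v2alpha1 duplicate
		out[name] = p
	}
	return out
}

func main() {
	v2 := map[string]Pool{"green-pool": {Name: "green-pool", Version: "v2"}}
	old := map[string]Pool{
		"green-pool": {Name: "green-pool", Version: "v2alpha1"},
		"blue-pool":  {Name: "blue-pool", Version: "v2alpha1"},
	}
	merged := preferred(v2, old)
	fmt.Println(merged["green-pool"].Version, merged["blue-pool"].Version)
}
```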

@kyounghunJang
Contributor Author

@gandro

Sorry for the late reply. I’d appreciate your thoughts on whether the following approach makes sense.

a) I think the migration behavior can be controlled through Helm values or a ConfigMap. For example, we could introduce a flag like enableV2Migration.

c) The annotation I had in mind would indicate whether the user’s CRD has been successfully migrated from v2alpha1 to v2.
If the flag is set to true, it means the migration from v2alpha1 to v2 was completed successfully, and the object should now follow the v2 definition.
Additionally, if both v2 and v2alpha1 objects exist at the same time, I believe we should ensure that only the v2 object is used.

Thank you!

@gandro
Member

gandro commented Dec 1, 2025

Additionally, if both v2 and v2alpha objects exist at the same time, I believe we should ensure that only the v2 object is used.

Yes. I don't think we want to ingest both v2 and v2alpha objects concurrently, that will lead to unnecessary complexity. I think the operator should only ingest v2, and the migrator ingests v2alpha and produces v2.

Given that I think we have discussed the major discussion points now, could you update the CFP to reflect the decisions taken? In other words, I don't think some of the questions are open questions anymore, and we have decided on an approach.

Let's document the approach that you are planning to take for the implementation, and then once everyone is happy with that plan we can move the CFP into the implementable state.

@kyounghunJang kyounghunJang force-pushed the main branch 2 times, most recently from cc7b2ad to bb1bad9 on December 7, 2025 12:17
- **Struct Change**: The `cidr` field in `spec` will be changed from `[]string` to `[]struct { cidr: string, unallocatable: bool }`.
- **Allocator Logic**: The allocator will check the `unallocatable` flag and skip those CIDRs during IP allocation.
- **Migration Strategy (Lazy Migration)**:
- **Single Source of Truth**: The Operator will prefer the `v2` version of CiliumPodIPPool when both `v2` and `v2alpha1` objects exist. If only `v2alpha1` is present, the Operator will ingest the `v2alpha1` object.
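The struct change and the allocator skip described in these bullet points could be sketched as follows. The `cidr` and `unallocatable` field names follow the CFP text; the surrounding types and the helper function are hypothetical:

```go
package main

import (
	"fmt"
	"net/netip"
)

// PoolCIDR mirrors the proposed spec change: each entry pairs a CIDR with an
// unallocatable flag instead of being a bare string.
type PoolCIDR struct {
	CIDR          string `json:"cidr"`
	Unallocatable bool   `json:"unallocatable,omitempty"`
}

// allocatableCIDRs returns only the prefixes the allocator may hand out IPs
// from, skipping entries marked unallocatable.
func allocatableCIDRs(cidrs []PoolCIDR) ([]netip.Prefix, error) {
	var out []netip.Prefix
	for _, c := range cidrs {
		if c.Unallocatable {
			continue
		}
		p, err := netip.ParsePrefix(c.CIDR)
		if err != nil {
			return nil, err
		}
		out = append(out, p)
	}
	return out, nil
}

func main() {
	spec := []PoolCIDR{
		{CIDR: "10.20.0.0/16", Unallocatable: true},
		{CIDR: "10.30.0.0/16"},
	}
	got, _ := allocatableCIDRs(spec)
	fmt.Println(got) // only the second CIDR remains allocatable
}
```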
Member


IMO, we should only ingest v2. If only v2alpha objects are present, let's rely on the conversion.

Member

@gandro gandro left a comment


Unblocking the "changes requested". I think this is in a good enough state to be implementable.

Member

@pippolo84 pippolo84 left a comment


LGTM minus a minor nit. Thanks! 💯

@networkop

what do you think about the following use case: instead of only controlling the availability of the entire CIDR, allow users to specify parts of that CIDR? so something like this:

```yaml
apiVersion: cilium.io/v2
kind: CiliumPodIPPool
metadata:
  name: green-pool
spec:
  ipv4:
    cidrs:
      - cidr: 10.20.0.0/16
        reservedRange: 10.20.0.0-10.20.0.99
      - cidr: 10.30.0.0/16
```

This would enable a use case where PodIPPool CIDR is shared with an external entity, e.g. Pods are attached and use L2 pod announcement while the default gateway resides in both Cilium and the TOR switch. This would allow users to disable the assignment of default gateway IP and potentially any other IPs from the same CIDR/prefix. (there was a similar use case described in cilium/cilium#31600)

@kyounghunJang
Contributor Author

@gandro @pippolo84
I also agree with the suggestion that using reservedRange allows for more flexible handling.
If everyone is aligned, I can update the CFP based on this approach.

what do you think about the following use case: instead of only controlling the availability of the entire CIDR, allow users to specify parts of that CIDR? so something like this:

```yaml
apiVersion: cilium.io/v2
kind: CiliumPodIPPool
metadata:
  name: green-pool
spec:
  ipv4:
    cidrs:
      - cidr: 10.20.0.0/16
        reservedRange: 10.20.0.0-10.20.0.99
      - cidr: 10.30.0.0/16
```

@gandro
Member

gandro commented Dec 17, 2025

@kyounghunJang Thanks for addressing the feedback.

Could you change the state to Implementable? And maybe address the feedback by Fabio here #79 (comment)

Note

Implementable: CFPs that have been approved by one committer and merged into this repository are given the implementable status. This means that the ideas in the CFP have been agreed in principle and the coding work can now be started. When merged, the CFP should be up to date and the relevant stakeholders should be in alignment even if they are still going through the practical ramifications of how to implement it.

I assume that while implementing the CFP there might be some schema changes required during implementation or code review, but I feel we have overall agreement on what to change and how to get there. We can make further changes to the CR in this document even after this PR has been merged.

Once you change the Status to implementable, I will merge it.

Side-note: I'm about to be out of office yet again until next year, feel free to already start working on the code even if it isn't merged yet. Assuming no last minute discussion emerge, I will merge it once I'm back.

Signed-off-by: Kyounghoon Jang <matkimchi_@naver.com>
@gandro
Member

gandro commented Jan 12, 2026

Merging this as implementable as per the comment above. Thanks!

@gandro gandro merged commit ded6c42 into cilium:main Jan 12, 2026
1 check passed