
fix: haproxy: drain connections when disabling endpoints#668

Closed
christf wants to merge 1 commit into openshift:master from christf:drain

Conversation

@christf

@christf christf commented Aug 26, 2025

Sending set-server maint will stop sending traffic to endpoints, which will cause traffic to be dropped. This change instructs haproxy to gracefully drain an endpoint instead, while sending new connections to other ready endpoints.

See [1] for further information on the difference between drain and maint.

[1] https://www.haproxy.com/documentation/haproxy-configuration-manual/new/latest/management/#section-9.3.-set-server
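
For context on the two commands being compared, here is a minimal, hypothetical sketch (not the router's actual code, which goes through its UpdateServerState abstraction) of issuing them against HAProxy's runtime API over the admin socket. The socket path and backend/server names are assumptions.

package main

import (
	"fmt"
	"io"
	"net"
)

// runtimeCommand sends a single command on HAProxy's admin socket and
// returns the reply; HAProxy closes the connection after answering.
func runtimeCommand(socketPath, cmd string) (string, error) {
	conn, err := net.Dial("unix", socketPath)
	if err != nil {
		return "", err
	}
	defer conn.Close()
	if _, err := fmt.Fprintf(conn, "%s\n", cmd); err != nil {
		return "", err
	}
	reply, err := io.ReadAll(conn)
	return string(reply), err
}

func main() {
	sock := "/var/lib/haproxy/run/haproxy.sock" // assumed admin socket path

	// drain: stop load balancing new traffic to the server, but keep health
	// checks running and let persistent/sticky clients finish.
	fmt.Println(runtimeCommand(sock, "set server be_http/pod-a state drain"))

	// maint: take the server fully out of service, including health checks
	// (equivalent to "disable server").
	fmt.Println(runtimeCommand(sock, "set server be_http/pod-a state maint"))
}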

@openshift-ci openshift-ci Bot requested review from alebedev87 and knobunc August 26, 2025 16:33
@openshift-ci openshift-ci Bot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Aug 26, 2025
@openshift-ci
Contributor

openshift-ci Bot commented Aug 26, 2025

Hi @christf. Thanks for your PR.

I'm waiting for an openshift member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

 func (b *Backend) DisableServer(name string) error {
 	log.V(4).Info("disabling server with maint state", "server", name)
-	return b.UpdateServerState(name, BackendServerStateMaint)
+	return b.UpdateServerState(name, BackendServerStateDrain)
Contributor


1. Currently, set server state is not used in any shipped OpenShift product. It's part of the Dynamic Configuration Manager feature, which is still in TechPreview.
2. The router watches endpoints and reacts to changes; DisableServer is used for deleted endpoints. That is, the corresponding pods are not there anymore, so the server should be disabled.

Author

@christf christf Aug 27, 2025


Thank you for the feedback! I am aware of (1) and I am raising this PR to eventually be able to use this tech preview feature.

Can we dig a bit into (2) please?
As per my understanding, DisableServer is run when kube-proxy is notified that an endpoint is to be removed. As per kubernetes/kubernetes#106476, the notification to remove an endpoint happens at around the same time as the pod is asked to terminate. So the pods are still very much ready to serve requests, and they need to continue to do so until they have handled all in-flight requests. During this time the router must ensure no new requests are sent to these pods while still retaining the active connections to those pods that are about to be terminated.
If "maint" is used, all in-flight connections are broken. "drain" will keep them alive until they are closed by either end of the connection (either the clients are done, or the pod gets SIGKILLed, which is already governed by a timeout).
The goal of this change is to support rolling deployments without losing a single request.

There is another bit missing to make it perfect, which is finding a way to delay the SIGTERM to the pod until the endpoint has been drained. But that is another can of worms.

Contributor


The reasoning about the second point seems to be valid. Let me try to check our test coverage for this use case.

Author


How to progress this?

Contributor


Sorry for the long delay.

After testing of the maintenance mode by my colleague (@jcmoraisjr), we cannot confirm the following statement:

If "maint" is used, all in-flight connections are broken.

Established connections remained intact, letting servers respond to in-flight queries. Also, I didn't find any confirmation of the statement above in the official HAProxy documentation. Can you provide any reference that would confirm the statement?

Another concern we had was that the drain mode allowed new connections (for sticky sessions), which may result in error responses when the pod has disappeared but the HAProxy health check hasn't found this out yet.

@alebedev87
Contributor

/assign

@candita
Contributor

candita commented Sep 24, 2025

/label ok-to-test

@candita
Contributor

candita commented Sep 24, 2025

/ok-to-test

@openshift-ci openshift-ci Bot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Sep 24, 2025
@openshift-ci
Contributor

openshift-ci Bot commented Sep 24, 2025

@candita: Can not set label ok-to-test: Must be member in one of these teams: [openshift-patch-managers openshift-staff-engineers openshift-release-oversight openshift-sustaining-engineers]

Details

In response to this:

/label ok-to-test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

sending set-server maint will stop sending traffic to endpoints, which
will cause traffic to be dropped. This instructs haproxy to gracefully
drain an endpoint while sending new connections to other ready endpoints

see [1] for further information on the difference between drain and maint

[1] https://www.haproxy.com/documentation/haproxy-configuration-manual/new/latest/management/#section-9.3.-set-server
@openshift-ci
Contributor

openshift-ci Bot commented Oct 22, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from alebedev87. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci
Contributor

openshift-ci Bot commented Oct 23, 2025

@christf: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-metal-ipi-ovn-router 1723e77 link false /test e2e-metal-ipi-ovn-router
ci/prow/e2e-metal-ipi-ovn-ipv6 1723e77 link false /test e2e-metal-ipi-ovn-ipv6
ci/prow/e2e-metal-ipi-ovn-dualstack 1723e77 link false /test e2e-metal-ipi-ovn-dualstack
ci/prow/e2e-aws-serial 1723e77 link true /test e2e-aws-serial
ci/prow/e2e-agnostic 1b414f3 link true /test e2e-agnostic
ci/prow/okd-scos-e2e-aws-ovn 1b414f3 link false /test okd-scos-e2e-aws-ovn

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@christf
Author

christf commented Oct 23, 2025

It is unclear to me how this change is related to the failing tests. Please advise how best to proceed.

@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci Bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 22, 2026
@alebedev87
Contributor

/remove-lifecycle stale

@openshift-ci openshift-ci Bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 22, 2026
@alebedev87
Contributor

Closing for the moment, as we didn't manage to confirm that the maint state drops in-flight connections.

/close

@openshift-ci openshift-ci Bot closed this Mar 27, 2026
@openshift-ci
Contributor

openshift-ci Bot commented Mar 27, 2026

@alebedev87: Closed this PR.

Details

In response to this:

Closing for the moment, as we didn't manage to confirm that the maint state drops in-flight connections.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@christf
Author

christf commented Apr 6, 2026

Quoting from https://github.com/haproxy/haproxy/blob/master/doc/management.txt:

Setting the state to "maint" disables any traffic to the server as well as any health checks. This is the equivalent of the "disable server" command. Setting the mode to "drain" only removes the server from load balancing but still allows it to be checked and to accept new persistent connections. Changes are propagated to tracking servers if any.

Since this seems a bit vague, I went down the rabbit hole of reading the haproxy code. I found
commit 1385e33eb089093dbc970dbc2759d2969ae533c5, which fixed a bug in September 2024.
OCP4 is using ancient versions of haproxy, for example: https://docs.redhat.com/en/documentation/openshift_container_platform/4.16/html/release_notes/ocp-4-16-release-notes

This might explain articles like https://medium.com/@ushaushraghunath/haproxy-maint-vs-drain-what-really-happens-to-connections-and-api-requests-6c3da376c299 or https://serverfault.com/questions/705991/haproxy-how-to-prepare-a-server-maintenance-without-kicking-app-sessions that point in the same direction as this PR.

In any case, I still think drain is the correct path forward because of long sticky sessions. Were those part of your tests?

@jcmoraisjr
Member

Hi @christf, the reason for putting a server in maintenance/drain mode is that the underlying endpoint is not available anymore, either due to a scale-in operation, a rolling update, or the like. Our best option would be to remove the server instead, but this API wasn't available when DCM was developed.

The difference between maint and drain, according to the management doc you pointed to, is that the former removes the server from load balancing, while the latter also removes it but adds an exception for clients whose cookie/sticky session/persistent connection points to it. I also agree that the latter could be better for persistent connections, but this is not an option for us since the endpoint will be dead in a moment, and maybe it is already dead when this API call is made to HAProxy.

From my understanding, what Willy fixed was the ability to redistribute new connections from the queue to new servers. Since it is from the queue, it's not about in-flight connections, but instead new ones from clients waiting for a chance to connect to that (persistent) backend server. This is indeed a desirable fix to improve resilience in the case of slow backends or a small backend server maxconn, but it is not related to in-flight connections.

Last but not least, the enable/disable API call approach is dated and needs an update; we are working on add/del API calls instead, making the enable/disable calls deprecated and probably going to be removed. That said, even if drain were an option for us, it would likely be removed in the next release.
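
As an illustration of the add/del direction mentioned above, here is a minimal sketch assuming an HAProxy version whose runtime API exposes the "del server" command and an admin socket at an assumed path; per the discussion, deletion is refused while the server still has active connections, so the server is taken out of service first. Backend and server names are illustrative assumptions.

package main

import (
	"fmt"
	"io"
	"net"
)

// send issues one runtime API command and returns HAProxy's reply.
func send(socketPath, cmd string) (string, error) {
	conn, err := net.Dial("unix", socketPath)
	if err != nil {
		return "", err
	}
	defer conn.Close()
	if _, err := fmt.Fprintf(conn, "%s\n", cmd); err != nil {
		return "", err
	}
	out, err := io.ReadAll(conn)
	return string(out), err
}

func main() {
	sock := "/var/lib/haproxy/run/haproxy.sock" // assumed admin socket path

	// Take the server out of service first, then delete it; "del server"
	// fails if the server still has active connections.
	for _, cmd := range []string{
		"set server be_http/pod-a state maint",
		"del server be_http/pod-a",
	} {
		reply, err := send(sock, cmd)
		fmt.Printf("%s -> %q (err=%v)\n", cmd, reply, err)
	}
}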

@christf
Author

christf commented Apr 6, 2026

Hi @christf, the reason for putting a server in maintenance/drain mode is that the underlying endpoint is not available anymore, either due to a scale-in operation, a rolling update, or the like.

I argue that there is a design flaw in kubernetes in this area that causes lost requests on rolling deployments without need. The previously referenced thread on kubernetes/kubernetes#106476 discusses this.

It seems that everyone agrees that pods should only be terminated after the load balancer has been drained, but the architecture of kubernetes wasn't built that way. The workaround is to put a sleep xx into the container's pre-stop hook and thereby delay SIGTERM by the maximum duration of a connection (see the sketch after this comment). In those cases the shutdown of the pod is delayed until all connections are handled. For this to work, haproxy mustn't break connections itself when removing an endpoint. When the synchronization problem inside k8s is resolved, this becomes even more relevant. (Ideally, haproxy receives the signal to drain a backend, then signals back when it is drained, and only then is the backend sent SIGTERM. That is a different can of worms, though.)

So, no, this does not get executed when the backend is not available anymore. It gets executed whenever haproxy receives the request, which is asynchronous to the pod lifecycle. The point is: the user can exercise control over how long the backend still lives, and therefore it matters whether haproxy breaks connections.

I do agree that we do not have to patch something that is about to be deprecated. Will the new API support the use case (removing a backend only after the last connection has been closed by the client or backend)?
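
As a concrete illustration of the preStop workaround mentioned in the comment above, a minimal sketch using k8s.io/api/core/v1 types (a recent version, where the handler type is LifecycleHandler). The container name, image, and sleep duration are illustrative assumptions, not part of this PR.

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// drainFriendlyContainer returns a container spec whose preStop hook sleeps
// before SIGTERM is delivered, giving the load balancer time to drain the
// endpoint while the pod keeps serving in-flight requests.
func drainFriendlyContainer() corev1.Container {
	return corev1.Container{
		Name:  "app",                 // hypothetical container name
		Image: "example.invalid/app", // hypothetical image
		Lifecycle: &corev1.Lifecycle{
			// Pick a duration at least as long as the longest request or
			// connection that must be allowed to finish; the pod's
			// terminationGracePeriodSeconds has to exceed it.
			PreStop: &corev1.LifecycleHandler{
				Exec: &corev1.ExecAction{Command: []string{"sleep", "30"}},
			},
		},
	}
}

func main() {
	c := drainFriendlyContainer()
	fmt.Printf("preStop command: %v\n", c.Lifecycle.PreStop.Exec.Command)
}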

@jcmoraisjr
Member

The point is: the user can exercise control over how long the backend still lives, and therefore it matters whether haproxy breaks connections.

To the best of my knowledge, this is not what happens. A connection is not dropped once it is established against a backend server when an API call moves the server's state to maint. Moving to maint mode means that no new connections can reach that server; established connections are preserved. My guess is that articles stating otherwise are wrong in some way. I'll be happy to run a reproducer proving me wrong.

The del server API call does the same: removal of the backend server fails if there is at least one active connection. So, answering your question, the new approach will also preserve in-flight connections, just like the maint API call, and just like we expect.

Note that not only the maint and del server API calls preserve established connections; SIGUSR1 against the process makes it move the listening sockets to the next process, but the process stays alive until the last connection is finished by either the client or the backend server. All of them (maint, del server, SIGUSR1) are consistent with each other.

I agree, however, that we are simply kicking new persistent connections off that server in cases where the server would survive longer in order to drain its current requests. Maybe using the drain API call would fit here, but we'd need to move it to maint or use the del API call sooner or later; otherwise the server would stay alive for some persistent clients, making the removal of a backend server much more complex. Moreover, sending new connections to a server that is about to be removed sounds counterintuitive: clients need to reach a new one sooner or later, making all this effort not so useful.
