fix: haproxy: drain connections when disabling endpoints #668

christf wants to merge 1 commit into openshift:master
Conversation
Hi @christf. Thanks for your PR. I'm waiting for an openshift member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
```diff
 func (b *Backend) DisableServer(name string) error {
 	log.V(4).Info("disabling server with maint state", "server", name)
-	return b.UpdateServerState(name, BackendServerStateMaint)
+	return b.UpdateServerState(name, BackendServerStateDrain)
```
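For context, `BackendServerStateMaint` and `BackendServerStateDrain` correspond to states set through haproxy's runtime API (`set server <backend>/<server> state drain|maint`, documented in section 9.3 of the management manual referenced later in this thread). Below is a minimal sketch of issuing that command over the admin socket; the helper, socket path, and backend/server names are illustrative assumptions, not the router's actual code:

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"net"
	"time"
)

// setServerState issues "set server <backend>/<server> state <state>" on
// haproxy's runtime API socket. "drain" removes the server from load
// balancing while letting established connections finish; "maint" takes it
// out of rotation for maintenance.
func setServerState(socketPath, backend, server, state string) (string, error) {
	conn, err := net.DialTimeout("unix", socketPath, 2*time.Second)
	if err != nil {
		return "", err
	}
	defer conn.Close()

	cmd := fmt.Sprintf("set server %s/%s state %s\n", backend, server, state)
	if _, err := conn.Write([]byte(cmd)); err != nil {
		return "", err
	}
	// haproxy answers with a (possibly empty) line and closes the socket.
	reply, err := bufio.NewReader(conn).ReadString('\n')
	if err != nil && err != io.EOF {
		return "", err
	}
	return reply, nil
}

func main() {
	reply, err := setServerState("/var/run/haproxy.sock", "be_app", "pod-1", "drain")
	if err != nil {
		panic(err)
	}
	fmt.Printf("haproxy replied: %q\n", reply)
}
```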
Two points:

1. Currently `set server state` is not used on any shipped OpenShift product. It's part of the Dynamic Configuration Manager feature, which is still in TechPreview.
2. The router watches for endpoints and reacts to changes; `DisableServer` is used for deleted endpoints. That is, the corresponding pods are not there anymore, so the server should be disabled.
Thank you for the feedback! I am aware of (1) and I am raising this PR to eventually be able to use this tech preview feature.
Can we dig a bit into (2) please?
As per my understanding, DisableServer is run when kube-proxy is notified that an endpoint is to be removed. As per kubernetes/kubernetes#106476, the notification to remove an endpoint happens at around the same time as the pod is asked to terminate. So the pods are still very much ready to serve requests, and they need to continue to do so until they have handled all in-flight requests. During this time the router must ensure no new requests are sent to these pods while still retaining the active connections to the pods that are about to be terminated.
If "maint" is used, all in-flight connections are being broken. "drain" will keep them alive until they are being closed by either end of the connection (either clients are done, or the pod gets SIGKILLED which is governed by a timeout already)
The goal of this change is to support rolling deployments without losing a single request.
There is another bit missing to make it perfect, which is finding a way to delay the SIGTERM to the pod until the endpoint has been drained. But that is another can of worms.
The reasoning about the second point seems to be valid. Let me try to check our test coverage for this use case.
Sorry for the long delay.
After testing of the maintenance mode done by my colleague (@jcmoraisjr), we cannot confirm the following statement:
If "maint" is used, all in-flight connections are being broken.
Established connections remained intact, letting servers respond to in-flight queries. Also, I didn't find any confirmation of the statement above in the official HAProxy documentation. Can you provide us with any reference that would confirm the statement?

Another concern we had was that the drain mode allows new connections (for sticky sessions), which may result in error responses when the pod has disappeared but the HAProxy health check hasn't found this out yet.
/assign
/label ok-to-test
/ok-to-test
@candita: Can not set label ok-to-test: Must be member in one of these teams: [openshift-patch-managers openshift-staff-engineers openshift-release-oversight openshift-sustaining-engineers]

In response to this: /label ok-to-test
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
sending set-server maint will stop sending traffic to endpoints, which will cause traffic to be dropped. This change instructs haproxy to gracefully drain an endpoint while sending new connections to other ready endpoints. See [1] for further information on the difference between drain and maint.

[1] https://www.haproxy.com/documentation/haproxy-configuration-manual/new/latest/management/#section-9.3.-set-server
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files. Approvers can indicate their approval by writing `/approve` in a comment.
@christf: The following tests failed, say `/retest` to rerun all failed tests or `/retest-required` to rerun all mandatory failed tests:

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
It is unclear to me how this change is related to the failing test. Please advise how best to proceed.
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting `/remove-lifecycle stale`. If this issue is safe to close now please do so with `/close`.

/lifecycle stale
/remove-lifecycle stale
Closing for the moment as we didn't manage to confirm that "maint" breaks in-flight connections.

/close
@alebedev87: Closed this PR.

In response to this: /close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Quoting from https://github.com/haproxy/haproxy/blob/master/doc/management.txt. Since this seems a bit vague, I stepped into the rabbit hole of looking at the haproxy code. What I found might explain articles like https://medium.com/@ushaushraghunath/haproxy-maint-vs-drain-what-really-happens-to-connections-and-api-requests-6c3da376c299 or https://serverfault.com/questions/705991/haproxy-how-to-prepare-a-server-maintenance-without-kicking-app-sessions, which go in the same direction as this PR.

In any case, I still think drain is the correct path forward because of long sticky sessions. Were those part of your tests?
Hi @christf, the reason for putting a server in maintenance/drain mode is that the underlying endpoint is not available anymore, due to a scale-in operation, a rolling update, or the like. Our best option would be to remove the server instead, but this API wasn't available when DCM was developed.

The difference between maint and drain, according to the management doc you pointed to, is that the former removes the server from the balancing, while the latter also removes it but adds an exception for clients whose cookie/sticky session/persistent connection points to it. I also agree that the latter could be better for persistent connections, but this is not an option for us since the endpoint should be dead in a moment, and maybe it should already be dead when this API call is made to HAProxy.

From my understanding, what Willy fixed was the ability to redistribute new connections from the queue to new servers. Since it is from the queue, it's not about in-flight connections, but instead new ones from clients waiting for a chance to connect to that (persistent) backend server. This is indeed a desirable fix to improve resilience in case of slow backends or a small backend server maxconn, but it is not related to in-flight connections.

Last but not least, the enable/disable API call approach is dated and needs an update. We are working on add/del API calls instead, making the enable/disable calls deprecated and probably going to be removed. That said, even if drain were an option for us, it would likely be removed in the next release.
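To make the add/del approach above concrete: a hedged sketch of the maint-then-remove sequence against the runtime API. Per management.txt, `del server` is refused while the server still has active, idle, or queued connections; the helper, socket path, names, and the idea of surfacing the raw reply are assumptions, not DCM's actual implementation:

```go
package main

import (
	"fmt"
	"io"
	"net"
	"time"
)

// runtimeCommand writes a single command to haproxy's runtime API socket
// and returns the raw reply (same hypothetical helper as in the earlier sketch).
func runtimeCommand(socketPath, cmd string) (string, error) {
	conn, err := net.DialTimeout("unix", socketPath, 2*time.Second)
	if err != nil {
		return "", err
	}
	defer conn.Close()
	if _, err := conn.Write([]byte(cmd + "\n")); err != nil {
		return "", err
	}
	reply, err := io.ReadAll(conn) // haproxy closes the socket after replying
	if err != nil {
		return "", err
	}
	return string(reply), nil
}

// removeServer sketches the maint-then-remove sequence: put the server in
// maint, then issue "del server". The removal is documented to fail while
// connections remain, so a real caller would retry until it succeeds.
// Reply strings vary by haproxy version, so the raw reply is returned as-is.
func removeServer(socketPath, backend, server string) (string, error) {
	target := backend + "/" + server
	if _, err := runtimeCommand(socketPath, "set server "+target+" state maint"); err != nil {
		return "", err
	}
	return runtimeCommand(socketPath, "del server "+target)
}

func main() {
	reply, err := removeServer("/var/run/haproxy.sock", "be_app", "pod-1")
	if err != nil {
		panic(err)
	}
	fmt.Printf("del server reply: %q\n", reply)
}
```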
I argue that there is a design flaw in kubernetes that causes lost requests on rolling deployments without need. The previously referenced thread on kubernetes/kubernetes#106476 discusses that. It seems that everyone agrees that pods should only be terminated after the load balancer has been drained, but the architecture of kubernetes wasn't built that way.

The workaround is to put sleep xx into the pre-stop hook of the container and thereby delay SIGTERM by the max duration of a connection. In those cases the shutdown of the pod is delayed until all connections are handled. For this to work, haproxy mustn't break connections itself when removing an endpoint. When the synchronization problem inside k8s is resolved, this is even more relevant. (Ideally, haproxy receives the signal to drain a backend, then signals back when it is drained, and then the backend is sent a SIGTERM. That is a different can of worms though.)

So, no: this does not get executed when the backend is not available anymore. This gets executed whenever haproxy receives the request, which is async to the pod lifecycle. The point is that the user can exercise control over how long the backend still lives, and therefore it matters whether haproxy breaks connections.

I do agree that we do not have to patch something that is about to be deprecated. Will the new API support the use case (removing a backend only after the last connection has been closed by the client or backend)?
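For illustration, a minimal sketch of the pre-stop sleep workaround using k8s.io/api types (assuming a version where the handler type is `LifecycleHandler`, i.e. 1.23+; the names, image, and duration are made up):

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

func main() {
	// A preStop sleep delays SIGTERM, giving the router time to drain the
	// endpoint while the pod keeps serving in-flight requests. The sleep
	// should cover the longest expected connection lifetime, and the pod's
	// terminationGracePeriodSeconds must exceed it, or the kubelet will
	// SIGKILL the container before the hook finishes.
	container := corev1.Container{
		Name:  "app",
		Image: "registry.example.com/app:latest", // illustrative
		Lifecycle: &corev1.Lifecycle{
			PreStop: &corev1.LifecycleHandler{
				Exec: &corev1.ExecAction{
					Command: []string{"sh", "-c", "sleep 30"},
				},
			},
		},
	}
	fmt.Println(container.Lifecycle.PreStop.Exec.Command)
}
```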
This is not what happens, to the best of my knowledge. A connection is not dropped once established against a backend server when an API call moves its state to maint. Moving to maint mode means that no new connections can reach that server; established connections are preserved. My guess is that articles stating otherwise are wrong in some way. I'll be happy to run some reproducer proving me wrong.

The del server API call does the same: the removal of the backend server fails in case there is at least one active connection. So, answering your question: the new approach will also preserve in-flight connections, just like the maint API call, and just like we expect. Note that not only the maint and del server API calls preserve established connections; SIGUSR1 against the process makes it move the listening sockets to the next process, but the process stays alive until the last connection is finished by either the client or the backend server. All of them (maint, del server, SIGUSR1) are consistent with each other.

I agree, however, that we are simply kicking new persistent connections from that server in cases where the server would survive longer in order to drain their current requests. Maybe using the drain API call would fit here, but we'd need to move it to maint or use the del API call sooner or later, otherwise the server would stay alive for some persistent clients, making the removal of a backend server much more complex. Moreover, sending new connections to a server that is about to be removed sounds counterintuitive: clients need to reach a new one sooner or later, making all this effort not so useful.
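In the spirit of the reproducer mentioned above, a hedged sketch of one way to probe this: hold a keep-alive connection through haproxy, flip the server to maint via the runtime API, then reuse the connection. Addresses, backend/server names, and the socket path are made-up assumptions, and success alone isn't conclusive since the HTTP transport may transparently redial, so connection reuse should also be confirmed out of band:

```go
package main

import (
	"fmt"
	"io"
	"net"
	"net/http"
	"time"
)

// runtimeCommand sends one command to haproxy's runtime API socket
// (same hypothetical helper as in the earlier sketches).
func runtimeCommand(socketPath, cmd string) error {
	conn, err := net.DialTimeout("unix", socketPath, 2*time.Second)
	if err != nil {
		return err
	}
	defer conn.Close()
	_, err = conn.Write([]byte(cmd + "\n"))
	return err
}

// get performs a request and drains the body so the connection can be reused.
func get(client *http.Client, url string) error {
	resp, err := client.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	_, err = io.Copy(io.Discard, resp.Body)
	return err
}

func main() {
	// Keep-alive transport: both requests should reuse one TCP connection
	// (verify with haproxy logs or netstat, since Go may redial silently).
	client := &http.Client{Transport: &http.Transport{MaxIdleConns: 1}}

	if err := get(client, "http://127.0.0.1:8080/"); err != nil {
		panic(err) // first request establishes the connection
	}
	if err := runtimeCommand("/var/run/haproxy.sock",
		"set server be_app/pod-1 state maint"); err != nil {
		panic(err)
	}
	if err := get(client, "http://127.0.0.1:8080/"); err != nil {
		fmt.Println("second request failed:", err)
		return
	}
	fmt.Println("second request succeeded")
}
```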