fix: haproxy: drain connections when disabling endpoints #668

christf wants to merge 1 commit into openshift:master
Conversation
Hi @christf. Thanks for your PR. I'm waiting for an openshift member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
```diff
 func (b *Backend) DisableServer(name string) error {
 	log.V(4).Info("disabling server with maint state", "server", name)
-	return b.UpdateServerState(name, BackendServerStateMaint)
+	return b.UpdateServerState(name, BackendServerStateDrain)
```
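For context, `BackendServerStateMaint` and `BackendServerStateDrain` correspond to states set through haproxy's runtime API (`set server <backend>/<server> state drain|maint`, documented in section 9.3 of the management manual referenced later in this thread). Below is a minimal sketch of issuing that command over the admin socket; the helper, socket path, and backend/server names are illustrative assumptions, not the router's actual code:

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"net"
	"time"
)

// setServerState issues "set server <backend>/<server> state <state>" on
// haproxy's runtime API socket. "drain" removes the server from load
// balancing while letting established connections finish; "maint" takes it
// out of rotation for maintenance.
func setServerState(socketPath, backend, server, state string) (string, error) {
	conn, err := net.DialTimeout("unix", socketPath, 2*time.Second)
	if err != nil {
		return "", err
	}
	defer conn.Close()

	cmd := fmt.Sprintf("set server %s/%s state %s\n", backend, server, state)
	if _, err := conn.Write([]byte(cmd)); err != nil {
		return "", err
	}
	// haproxy answers with a (possibly empty) line and closes the socket.
	reply, err := bufio.NewReader(conn).ReadString('\n')
	if err != nil && err != io.EOF {
		return "", err
	}
	return reply, nil
}

func main() {
	reply, err := setServerState("/var/run/haproxy.sock", "be_app", "pod-1", "drain")
	if err != nil {
		panic(err)
	}
	fmt.Printf("haproxy replied: %q\n", reply)
}
```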
Two points:

1. Currently `set server state` is not used on any shipped OpenShift product. It's part of the Dynamic Configuration Manager feature, which is still in TechPreview.
2. The router watches for endpoints and reacts to changes; `DisableServer` is used for deleted endpoints. That is, the corresponding pods are not there anymore, so the server should be disabled.
Thank you for the feedback! I am aware of (1) and I am raising this PR to eventually be able to use this tech preview feature.
Can we dig a bit into (2) please?
As per my understanding, DisableServer is run when kube-proxy is notified that an endpoint is to be removed. As per kubernetes/kubernetes#106476, the notification to remove an endpoint happens at around the same time as the pod is asked to terminate. So the pods are still very much ready to serve requests, and they need to continue to do so until they have handled all in-flight requests. During this time the router must ensure no new requests are sent to these pods while still retaining the active connections to the pods that are about to be terminated.
If "maint" is used, all in-flight connections are being broken. "drain" will keep them alive until they are being closed by either end of the connection (either clients are done, or the pod gets SIGKILLED which is governed by a timeout already)
The goal of this change is to support rolling deployments without losing a single request.
There is another bit missing to make it perfect, which is finding a way to delay the SIGTERM to the pod until the endpoint has been drained. But that is another can of worms.
The reasoning about the second point seems to be valid. Let me try to check our test coverage for this use case.
Sorry for the long delay.
After testing of the maintenance mode done by my colleague (@jcmoraisjr), we cannot confirm the following statement:
If "maint" is used, all in-flight connections are being broken.
Established connections remained intact, letting servers respond to in-flight queries. Also, I didn't find any confirmation of the statement above in the official HAProxy documentation. Can you provide us with any reference that would confirm the statement?

Another concern we had was that the drain mode allows new connections (for sticky sessions), which may result in error responses when the pod has disappeared but the HAProxy health check hasn't found this out yet.
/assign
/label ok-to-test
/ok-to-test
@candita: Can not set label ok-to-test: Must be member in one of these teams: [openshift-patch-managers openshift-staff-engineers openshift-release-oversight openshift-sustaining-engineers]

In response to this: /label ok-to-test
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
sending set-server maint will stop sending traffic to endpoints, which will cause traffic to be dropped. This change instructs haproxy to gracefully drain an endpoint while sending new connections to other ready endpoints. See [1] for further information on the difference between drain and maint.

[1] https://www.haproxy.com/documentation/haproxy-configuration-manual/new/latest/management/#section-9.3.-set-server
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files. Approvers can indicate their approval by writing `/approve` in a comment.
@christf: The following tests failed, say `/retest` to rerun all failed tests or `/retest-required` to rerun all mandatory failed tests:

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
It is unclear to me how this change is related to the failing test. Please advise how best to proceed.
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting `/remove-lifecycle stale`. If this issue is safe to close now please do so with `/close`.

/lifecycle stale
/remove-lifecycle stale
Closing for the moment as we didn't manage to confirm that "maint" breaks in-flight connections.

/close
@alebedev87: Closed this PR.

In response to this: /close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Quoting from https://github.com/haproxy/haproxy/blob/master/doc/management.txt. Since this seems a bit vague, I stepped into the rabbit hole of looking at the haproxy code. What I found might explain articles like https://medium.com/@ushaushraghunath/haproxy-maint-vs-drain-what-really-happens-to-connections-and-api-requests-6c3da376c299 or https://serverfault.com/questions/705991/haproxy-how-to-prepare-a-server-maintenance-without-kicking-app-sessions, which go in the same direction as this PR.

In any case, I still think drain is the correct path forward because of long sticky sessions. Were those part of your tests?
Hi @christf, the reason for putting a server in maintenance/drain mode is that the underlying endpoint is not available anymore, due to a scale-in operation, a rolling update, or the like. Our best option would be to remove the server instead, but this API wasn't available when DCM was developed.

The difference between maint and drain, according to the management doc you pointed to, is that the former removes the server from the balancing, while the latter also removes it but adds an exception for clients whose cookie/sticky session/persistent connection points to it. I also agree that the latter could be better for persistent connections, but this is not an option for us since the endpoint should be dead in a moment, and maybe it should already be dead when this API call is made to HAProxy.

From my understanding, what Willy fixed was the ability to redistribute new connections from the queue to new servers. Since it is from the queue, it's not about in-flight connections, but instead new ones from clients waiting for a chance to connect to that (persistent) backend server. This is indeed a desirable fix to improve resilience in case of slow backends or a small backend server maxconn, but it is not related to in-flight connections.

Last but not least, the enable/disable API call approach is dated and needs an update. We are working on add/del API calls instead, making the enable/disable calls deprecated and probably going to be removed. That said, even if drain were an option for us, it would likely be removed in the next release.
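To make the add/del approach above concrete: a hedged sketch of the maint-then-remove sequence against the runtime API. Per management.txt, `del server` is refused while the server still has active, idle, or queued connections; the helper, socket path, names, and the idea of surfacing the raw reply are assumptions, not DCM's actual implementation:

```go
package main

import (
	"fmt"
	"io"
	"net"
	"time"
)

// runtimeCommand writes a single command to haproxy's runtime API socket
// and returns the raw reply (same hypothetical helper as in the earlier sketch).
func runtimeCommand(socketPath, cmd string) (string, error) {
	conn, err := net.DialTimeout("unix", socketPath, 2*time.Second)
	if err != nil {
		return "", err
	}
	defer conn.Close()
	if _, err := conn.Write([]byte(cmd + "\n")); err != nil {
		return "", err
	}
	reply, err := io.ReadAll(conn) // haproxy closes the socket after replying
	if err != nil {
		return "", err
	}
	return string(reply), nil
}

// removeServer sketches the maint-then-remove sequence: put the server in
// maint, then issue "del server". The removal is documented to fail while
// connections remain, so a real caller would retry until it succeeds.
// Reply strings vary by haproxy version, so the raw reply is returned as-is.
func removeServer(socketPath, backend, server string) (string, error) {
	target := backend + "/" + server
	if _, err := runtimeCommand(socketPath, "set server "+target+" state maint"); err != nil {
		return "", err
	}
	return runtimeCommand(socketPath, "del server "+target)
}

func main() {
	reply, err := removeServer("/var/run/haproxy.sock", "be_app", "pod-1")
	if err != nil {
		panic(err)
	}
	fmt.Printf("del server reply: %q\n", reply)
}
```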
I argue that there is a design flaw in kubernetes that causes lost requests on rolling deployments without need. The previously referenced thread on kubernetes/kubernetes#106476 discusses that. It seems that everyone agrees that pods should only be terminated after the load balancer has been drained, but the architecture of kubernetes wasn't built that way.

The workaround is to put sleep xx into the pre-stop hook of the container and thereby delay SIGTERM by the max duration of a connection. In those cases the shutdown of the pod is delayed until all connections are handled. For this to work, haproxy mustn't break connections itself when removing an endpoint. When the synchronization problem inside k8s is resolved, this is even more relevant. (Ideally, haproxy receives the signal to drain a backend, then signals back when it is drained, and then the backend is sent a SIGTERM. That is a different can of worms though.)

So, no: this does not get executed when the backend is not available anymore. This gets executed whenever haproxy receives the request, which is async to the pod lifecycle. The point is that the user can exercise control over how long the backend still lives, and therefore it matters whether haproxy breaks connections.

I do agree that we do not have to patch something that is about to be deprecated. Will the new API support the use case (removing a backend only after the last connection has been closed by the client or backend)?
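For illustration, a minimal sketch of the pre-stop sleep workaround using k8s.io/api types (assuming a version where the handler type is `LifecycleHandler`, i.e. 1.23+; the names, image, and duration are made up):

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

func main() {
	// A preStop sleep delays SIGTERM, giving the router time to drain the
	// endpoint while the pod keeps serving in-flight requests. The sleep
	// should cover the longest expected connection lifetime, and the pod's
	// terminationGracePeriodSeconds must exceed it, or the kubelet will
	// SIGKILL the container before the hook finishes.
	container := corev1.Container{
		Name:  "app",
		Image: "registry.example.com/app:latest", // illustrative
		Lifecycle: &corev1.Lifecycle{
			PreStop: &corev1.LifecycleHandler{
				Exec: &corev1.ExecAction{
					Command: []string{"sh", "-c", "sleep 30"},
				},
			},
		},
	}
	fmt.Println(container.Lifecycle.PreStop.Exec.Command)
}
```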
This is not what happens, to the best of my knowledge. A connection is not dropped once established against a backend server when an API call moves its state to maint. Moving to maint mode means that no new connections can reach that server; established connections are preserved. My guess is that articles stating otherwise are wrong in some way. I'll be happy to run some reproducer proving me wrong.

The del server API call does the same: the removal of the backend server fails in case there is at least one active connection. So, answering your question: the new approach will also preserve in-flight connections, just like the maint API call, and just like we expect. Note that not only the maint and del server API calls preserve established connections; SIGUSR1 against the process makes it move the listening sockets to the next process, but the process stays alive until the last connection is finished by either the client or the backend server. All of them (maint, del server, SIGUSR1) are consistent with each other.

I agree, however, that we are simply kicking new persistent connections from that server in cases where the server would survive longer in order to drain their current requests. Maybe using the drain API call would fit here, but we'd need to move it to maint or use the del API call sooner or later, otherwise the server would stay alive for some persistent clients, making the removal of a backend server much more complex. Moreover, sending new connections to a server that is about to be removed sounds counterintuitive: clients need to reach a new one sooner or later, making all this effort not so useful.
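In the spirit of the reproducer mentioned above, a hedged sketch of one way to probe this: hold a keep-alive connection through haproxy, flip the server to maint via the runtime API, then reuse the connection. Addresses, backend/server names, and the socket path are made-up assumptions, and success alone isn't conclusive since the HTTP transport may transparently redial, so connection reuse should also be confirmed out of band:

```go
package main

import (
	"fmt"
	"io"
	"net"
	"net/http"
	"time"
)

// runtimeCommand sends one command to haproxy's runtime API socket
// (same hypothetical helper as in the earlier sketches).
func runtimeCommand(socketPath, cmd string) error {
	conn, err := net.DialTimeout("unix", socketPath, 2*time.Second)
	if err != nil {
		return err
	}
	defer conn.Close()
	_, err = conn.Write([]byte(cmd + "\n"))
	return err
}

// get performs a request and drains the body so the connection can be reused.
func get(client *http.Client, url string) error {
	resp, err := client.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	_, err = io.Copy(io.Discard, resp.Body)
	return err
}

func main() {
	// Keep-alive transport: both requests should reuse one TCP connection
	// (verify with haproxy logs or netstat, since Go may redial silently).
	client := &http.Client{Transport: &http.Transport{MaxIdleConns: 1}}

	if err := get(client, "http://127.0.0.1:8080/"); err != nil {
		panic(err) // first request establishes the connection
	}
	if err := runtimeCommand("/var/run/haproxy.sock",
		"set server be_app/pod-1 state maint"); err != nil {
		panic(err)
	}
	if err := get(client, "http://127.0.0.1:8080/"); err != nil {
		fmt.Println("second request failed:", err)
		return
	}
	fmt.Println("second request succeeded")
}
```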