Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
57 changes: 57 additions & 0 deletions images/router/haproxy/start-haproxy
Original file line number Diff line number Diff line change
@@ -0,0 +1,57 @@
#!/bin/bash
set -euo pipefail

##
## This script expects HAProxy running in foreground, do not initialize it in daemon mode!

timeoutStr=${ROUTER_GRACEFUL_SHUTDOWN_DELAY:-45s}
if ! [[ "$timeoutStr" =~ ^[0-9]+s$ ]]; then
echo "Invalid timeout: $timeoutStr"
exit 1
fi

timeout="${timeoutStr%s}"
signaled=0

stopHAProxy() {
signaled=1

# HAProxy handles SIGUSR1 by finishing its process as soon as all the current connections,
# active or not, are closed by either the client or the backend server.
echo "Sending SIGUSR1 to HAProxy process $haproxyPID"
kill -s USR1 $haproxyPID
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

kill commands are unguarded under set -e — risk of premature script termination

set -euo pipefail is active for the whole script. Both kill -s USR1 $haproxyPID (line 22) and kill -s TERM $haproxyPID (line 37) will cause the script to abort if HAProxy has already exited. The race on line 37 is realistically reachable: HAProxy can drain and die naturally during the kill -0 poll loop, then the loop falls through by timeout — at that point HAProxy is already gone and kill -s TERM fails. A similar, narrower race exists at line 22.

🐛 Proposed fix
-    kill -s USR1 $haproxyPID
+    kill -s USR1 $haproxyPID 2>/dev/null || true
-    kill -s TERM $haproxyPID
+    kill -s TERM $haproxyPID 2>/dev/null || true

Also applies to: 37-37

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@images/router/haproxy/start-haproxy` at line 22, The unguarded kill
invocations using haproxyPID (the calls to "kill -s USR1 $haproxyPID" and "kill
-s TERM $haproxyPID") will abort the script under set -e if HAProxy already
exited; change them to tolerate a missing PID by either checking the process
first (use "kill -0 $haproxyPID" and only send the signal if that succeeds) or
temporarily disable exit-on-error around the signal send (e.g., set +e; kill -s
USR1 $haproxyPID || true; set -e) so the kill in the shutdown/poll loop (the
code that currently uses kill -0 for polling) does not cause the script to
terminate on a race.


# Poll the process, retuning as soon as it is not alive anymore.
for i in $(seq 1 $timeout); do
sleep 1
if ! kill -0 $haproxyPID 2>/dev/null; then
echo "All connections are closed"
return
fi
done

# HAProxy handles SIGTERM by closing all the TCP connections and waiting for the full TCP handshake
# (FIN / ACK / FIN-ACK) from both sides, and only after that it finishes. `kill` runs asynchronous,
# and `wait` maintains this script alive until haproxy finishes.
echo "SIGUSR1 timed out, sending SIGTERM to HAProxy process $haproxyPID"
kill -s TERM $haproxyPID
}

echo "Starting HAProxy. SIGUSR1 timeout is ${timeout}s"
/usr/sbin/haproxy "$@" &
haproxyPID=$!

trap stopHAProxy SIGTERM SIGUSR1 SIGINT
trap "kill -s HUP $haproxyPID" SIGHUP

exit_code=0
while kill -0 $haproxyPID 2>/dev/null; do
# Start `wait` again in case it returned due to a non terminate signal, e.g. SIGHUP
wait $haproxyPID || exit_code=$?
done

# Received signal (usually SIGTERM) takes precedence during `wait`, overriding with a clean exit
[ "$signaled" = 1 ] && exit 0

echo "haproxy exited with status code $exit_code"
exit $exit_code
48 changes: 46 additions & 2 deletions pkg/router/template/router.go
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@ package templaterouter

import (
"bytes"
"context"
"crypto/md5"
"encoding/pem"
"fmt"
Expand All @@ -15,10 +16,12 @@ import (
"text/template"
"time"

"github.com/bcicen/go-haproxy"
"github.com/fsnotify/fsnotify"
"github.com/prometheus/client_golang/prometheus"

"k8s.io/apimachinery/pkg/util/sets"
"k8s.io/apimachinery/pkg/util/wait"

routev1 "github.com/openshift/api/route/v1"

Expand Down Expand Up @@ -666,8 +669,22 @@ func (r *templateRouter) writeCertificates(cfg *ServiceAliasConfig) error {
return nil
}

// reloadRouter executes the router's reload script.
// reloadRouter reloads haproxy.
func (r *templateRouter) reloadRouter(shutdown bool) error {
adminSocket := os.Getenv("ROUTER_HAPROXY_ADMIN_UNIX_SOCKET")
if adminSocket != "" {
if shutdown {
// We are in HAProxy's master/worker mode, currently implemented as a sidecar,
// so there is no local process to handle and the sidecar one already received SIGTERM.
return nil
}
return r.reloadRouterExternal(adminSocket)
}
return r.reloadRouterEmbedded(shutdown)
}

// reloadRouterEmbedded executes the router's reload script.
func (r *templateRouter) reloadRouterEmbedded(shutdown bool) error {
if r.reloadFn != nil {
return r.reloadFn(shutdown)
}
Expand All @@ -679,7 +696,34 @@ func (r *templateRouter) reloadRouter(shutdown bool) error {
if err != nil {
return fmt.Errorf("error reloading router: %v\n%s", err, string(out))
}
log.V(0).Info("router reloaded", "output", string(out))
log.V(0).Info("router reloaded", "mode", "embedded", "output", string(out))
return nil
}

// reloadRouterExternal sends a reload command to the external haproxy.
func (r *templateRouter) reloadRouterExternal(adminSocket string) error {
// TODO missing application's context
_ = wait.PollUntilContextCancel(context.Background(), 2*time.Second, true, func(ctx context.Context) (done bool, err error) {
_, errstat := os.Lstat(adminSocket)
if errstat != nil {
log.Info("waiting for haproxy socket", "message", errstat.Error())
return false, nil
}
return true, nil
})
Comment on lines +706 to +713
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Unbounded socket poll blocks the reload goroutine permanently if HAProxy's admin socket never appears

context.Background() never cancels, so if the sidecar never creates the socket (crash, misconfigured path, wrong volume mount), wait.PollUntilContextCancel loops forever and the router's commit/reload flow is permanently blocked — no further route changes will ever be applied.

Additionally, the error result is discarded (_ =); even after a proper context is plumbed in, a cancelled poll will be silently ignored and execution will proceed to RunCommand against a socket that may not exist.

🛡️ Proposed fix
-    // TODO missing application's context
-    _ = wait.PollUntilContextCancel(context.Background(), 2*time.Second, true, func(ctx context.Context) (done bool, err error) {
+    pollCtx, cancel := context.WithTimeout(ctx, 30*time.Second) // propagate caller's context; add a sensible upper bound
+    defer cancel()
+    if err := wait.PollUntilContextCancel(pollCtx, 2*time.Second, true, func(ctx context.Context) (done bool, err error) {
         _, errstat := os.Lstat(adminSocket)
         if errstat != nil {
             log.Info("waiting for haproxy socket", "message", errstat.Error())
             return false, nil
         }
         return true, nil
-    })
+    }); err != nil {
+        return fmt.Errorf("timed out waiting for haproxy admin socket %q: %w", adminSocket, err)
+    }

The caller (reloadRouterExternal) would need to accept a ctx context.Context parameter, threaded from the application-level context.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/router/template/router.go` around lines 706 - 713, The poll currently
uses context.Background() and discards its error, which can block reloads
forever; change reloadRouterExternal to accept a ctx context.Context (threaded
from the caller), pass that ctx into wait.PollUntilContextCancel instead of
context.Background(), capture the returned error (remove the "_ =" discard), and
if the poll returns an error or context cancellation, abort/return the error
before calling RunCommand against adminSocket so you don't attempt to use a
missing socket; update callers to propagate the application context into
reloadRouterExternal.

client := haproxy.HAProxyClient{Addr: "unix://" + adminSocket, Timeout: 10 /*seconds*/}
outputBuffer, err := client.RunCommand("reload")
if err != nil {
return fmt.Errorf("error connecting haproxy: %w", err)
}
output := outputBuffer.String()

// `reload` command is synchronous since haproxy 2.7, so it is safe to continue as soon as it returns.
// It should return Success=1 in the first line in case everything went well, anything else is considered a failure.
if !strings.HasPrefix(output, "Success=1") {
return fmt.Errorf("error reloading router: %s", output)
}
log.Info("router reloaded", "mode", "sidecar")
return nil
}

Expand Down