70 changes: 64 additions & 6 deletions docs/ARCHITECTURE.md
@@ -51,11 +51,19 @@ storage-e2e/
│ │ └── vm_block_device.go
│ │
│ ├── infrastructure/ # Infrastructure layer
│ │ └── ssh/ # SSH operations
│ │ └── ssh/ # SSH operations (legacy)
│ │ ├── client.go
│ │ ├── interface.go
│ │ ├── tunnel.go
│ │ └── types.go
│ │ ├── types.go
│ │ └── v2/ # Self-healing SSH client (Dialer/Route + Tunnel)
│ │ ├── client.go # New, Client, Close + package docs
│ │ ├── conn.go # connection core: snapshot/refresh/keepalive + withConn
│ │ ├── dialer.go # Dialer interface, Route, chain closer
│ │ ├── endpoint.go # Endpoint, auth, host/key resolution
│ │ ├── errors.go # transient classification
│ │ ├── options.go # functional options
│ │ └── tunnel.go # Tunnel, accept loop
│ │
│ └── logger/ # Structured logging
│ ├── logger.go # Logger implementation
@@ -445,10 +453,18 @@ internal/kubernetes/ # Internal Kubernetes clients

```
infrastructure/ssh/
├── client.go # SSH client implementation (Exec, ExecCapture, tunnels)
├── interface.go # SSH client interface
├── tunnel.go # Port forwarding and tunneling
└── types.go # SSH-related types
├── client.go # SSH client implementation (Exec, ExecCapture, tunnels) [legacy]
├── interface.go # SSH client interface [legacy]
├── tunnel.go # Port forwarding and tunneling [legacy]
├── types.go # SSH-related types [legacy]
└── v2/ # Self-healing SSH client (see below)
├── client.go # New, Client, Close + package docs
├── conn.go # connection core: snapshot/refresh/keepalive + withConn executor
├── dialer.go # Dialer interface, Route, chain closer
├── endpoint.go # Endpoint, auth, host/key resolution
├── errors.go # transient classification
├── options.go # functional options
└── tunnel.go # Tunnel, accept loop
```

**Responsibilities**:
@@ -466,6 +482,48 @@ infrastructure/ssh/
- `ExecCapture` keeps stdout and stderr separate while preserving retry/reconnect behavior
- Proper resource cleanup

#### 3.4.1 Self-healing SSH client (`internal/infrastructure/ssh/v2/`)

A ground-up rewrite that lives alongside the legacy package; no consumers have
migrated yet. It separates **how we connect** (directly or via jump hosts) from
**what we do over the connection** (currently only tunneling), and hides every
reconnect from callers.

**Design**:

- `Dialer` is the injection point: `Dial(ctx) (*ssh.Client, io.Closer, error)` +
`Describe()`. `Route(first Endpoint, more ...Endpoint)` builds the built-in
implementation; the last hop is always the target, so the `(first, more...)`
signature guarantees at least one hop at compile time. The returned `io.Closer`
tears down the whole chain (target + every jump + ssh-agent connections).
- `Endpoint` describes a single host: `User`, `Addr` (`host` or `host:port`,
default `:22`), `KeyPath` (`~` expanded), optional `Passphrase`
(falls back to `SSH_PASSPHRASE` then ssh-agent), optional per-hop `HostKey`.
- The unexported `conn` core owns the current `*ssh.Client`, its chain `Closer`,
and a generation counter under a mutex. `snapshot` reads them; `refresh`
re-dials via `singleflight` keyed on the failed generation so concurrent
reconnects collapse into one and a stale generation never tears down a freshly
healed link. The slow `Dial` runs outside the lock on a detached context
(`context.WithoutCancel` + timeout) so one caller's cancellation can't abort
the shared flight.
- A single generic executor `withConn[T]` runs an operation against the live
client and heals on transient failures (bounded by `WithRetries`); the tunnel
uses it today and `Run`/`Upload` are designed to reuse it unchanged.
- Optional keepalive (`WithKeepalive`) probes the link and heals through the same
`refresh` path; every heal is logged at WARN.
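
The snapshot/refresh/`withConn` interplay described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the package's code: `link` stands in for `*ssh.Client` plus its chain closer, `errDropped` stands in for whatever the real transient classifier accepts, the hand-rolled `flight` replaces `golang.org/x/sync/singleflight` so the sketch needs only the standard library, and the 5-second dial timeout and `demo` helper are made up for the example.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"time"
)

// Hypothetical stand-ins: link for *ssh.Client plus its chain closer,
// errDropped for an error the real classifier would call transient.
type link struct{ id int }

var errDropped = errors.New("link dropped")

// flight is one in-progress re-dial that late callers can join.
type flight struct {
	done chan struct{}
	err  error
}

type conn struct {
	mu     sync.Mutex
	cur    *link
	gen    uint64
	flight *flight
	dial   func(context.Context) (*link, error)
}

// snapshot returns the live link and the generation it belongs to.
func (c *conn) snapshot() (*link, uint64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.cur, c.gen
}

// refresh re-dials only when failedGen is still the current generation.
func (c *conn) refresh(ctx context.Context, failedGen uint64) error {
	c.mu.Lock()
	if c.gen != failedGen { // already healed by another caller
		c.mu.Unlock()
		return nil
	}
	if f := c.flight; f != nil { // join the re-dial already in progress
		c.mu.Unlock()
		<-f.done
		return f.err
	}
	f := &flight{done: make(chan struct{})}
	c.flight = f
	c.mu.Unlock()

	// Dial outside the lock on a detached context, so one caller's
	// cancellation cannot abort a flight that others are waiting on.
	dctx, cancel := context.WithTimeout(context.WithoutCancel(ctx), 5*time.Second)
	l, err := c.dial(dctx)
	cancel()

	c.mu.Lock()
	if err == nil {
		c.cur, c.gen = l, c.gen+1
	}
	c.flight = nil
	c.mu.Unlock()
	f.err = err
	close(f.done)
	return err
}

// withConn runs op on the live link, healing on transient failures.
func withConn[T any](ctx context.Context, c *conn, retries int, op func(*link) (T, error)) (T, error) {
	for attempt := 0; ; attempt++ {
		l, gen := c.snapshot()
		v, err := op(l)
		if err == nil || !errors.Is(err, errDropped) || attempt >= retries {
			return v, err
		}
		if rerr := c.refresh(ctx, gen); rerr != nil {
			return v, rerr
		}
	}
}

// demo fails once on the initial link, then succeeds on the healed one.
func demo() (id, dials int, err error) {
	c := &conn{cur: &link{id: 0}, dial: func(context.Context) (*link, error) {
		dials++
		return &link{id: dials}, nil
	}}
	id, err = withConn(context.Background(), c, 2, func(l *link) (int, error) {
		if l.id == 0 {
			return 0, errDropped
		}
		return l.id, nil
	})
	return id, dials, err
}

func main() { fmt.Println(demo()) }
```

Passing the failed generation to `refresh` is what keeps a late caller from tearing down a link that was already healed: its generation no longer matches, so it returns without dialing.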

**Public API v1**: `New(ctx, Dialer, ...Option)`, `Client.Tunnel(ctx, remotePort)`,
and `Client.Close`. `Client.Tunnel` opens a self-healing local forward on a free
`127.0.0.1` port and exposes `Tunnel.LocalAddr` and `Tunnel.Close`. Options:
`WithKeepalive`, `WithRetries`, `WithLogger`, `WithHostKeyCallback`,
`WithInsecureIgnoreHostKey`. Host key verification defaults to
`InsecureIgnoreHostKey`, a conscious default for ephemeral e2e VMs.
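
For readers unfamiliar with the functional-options pattern behind `WithRetries` and friends, here is a minimal sketch. The `Option` shape (a function applied to an options struct) follows from how `New` calls `opt(&o)`; everything else is an assumption for illustration: the parameter types of `WithRetries` and `WithKeepalive`, the default of 3 retries, the meaning of a zero keepalive, and the `apply` helper.

```go
package main

import (
	"fmt"
	"time"
)

// options collects tunables; field names and defaults are assumptions.
type options struct {
	retries   int
	keepalive time.Duration // zero means "no keepalive probe" in this sketch
}

// Option mutates options before the client is built.
type Option func(*options)

// WithRetries bounds how many times an operation heals and retries.
func WithRetries(n int) Option {
	return func(o *options) { o.retries = n }
}

// WithKeepalive enables a periodic liveness probe.
func WithKeepalive(every time.Duration) Option {
	return func(o *options) { o.keepalive = every }
}

func defaultOptions() options {
	return options{retries: 3} // hypothetical default
}

// apply mirrors the option loop in New: defaults first, then each option in order.
func apply(opts ...Option) options {
	o := defaultOptions()
	for _, opt := range opts {
		opt(&o)
	}
	return o
}

func main() {
	o := apply(WithRetries(5), WithKeepalive(30*time.Second))
	fmt.Println(o.retries, o.keepalive)
}
```

Because options are applied in order over the defaults, a later option overrides an earlier one, and callers that pass nothing get the defaults unchanged.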

**Extension points (designed, not yet implemented)**: `Run` and `Upload`. `Run`
retries transparently only when the session fails to open; a mid-flight drop
heals the link but surfaces the error to avoid double side effects, and an
opt-in `Idempotent` enables true retry. Transient-error classification uses
`errors.Is`/`errors.As` against standard types, never error-string matching.
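
A classifier in that spirit might look like the sketch below. It is an illustration only: the function name `isTransient`, the `examples` helper, and the particular set of errors treated as transient are assumptions, not the contents of `errors.go`. The one subtlety worth showing is that `context.DeadlineExceeded` also satisfies `net.Error` with `Timeout() == true`, so caller-side cancellation has to be ruled out before the timeout check.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"io"
	"net"
	"syscall"
)

// isTransient reports whether err looks like a dropped or stalled link.
// The set of errors treated as transient here is an assumption.
func isTransient(err error) bool {
	if err == nil {
		return false
	}
	// The caller gave up; re-dialing would not help.
	if errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) {
		return false
	}
	for _, target := range []error{io.EOF, io.ErrUnexpectedEOF, net.ErrClosed, syscall.ECONNRESET, syscall.EPIPE} {
		if errors.Is(err, target) {
			return true
		}
	}
	// Network-level timeouts are matched by type, not by message text.
	var ne net.Error
	return errors.As(err, &ne) && ne.Timeout()
}

// examples classifies a few representative errors.
func examples() map[string]bool {
	return map[string]bool{
		"wrapped EOF":     isTransient(fmt.Errorf("read: %w", io.EOF)),
		"conn reset":      isTransient(&net.OpError{Op: "read", Err: syscall.ECONNRESET}),
		"caller deadline": isTransient(context.DeadlineExceeded),
		"auth failure":    isTransient(errors.New("unable to authenticate")),
	}
}

func main() {
	for name, transient := range examples() {
		fmt.Printf("%-16s transient=%v\n", name, transient)
	}
}
```

Matching on sentinel values and types survives wrapping with `%w` and changes in message wording, which string matching does not.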

### 3.5 Logger Module (`internal/logger/`)

```
15 changes: 14 additions & 1 deletion docs/WORKLOG.md
@@ -110,7 +110,20 @@ All notable changes to this repository are documented here. New entries are appe
- **Add** `pkg/config/config_test.go`: unit tests for `config.New` covering provider parsing, missing required `TEST_CLUSTER_PROVIDER` (error), empty-value handling, and table-driven provider values.

---

## 2026-06-18

- **Add** new self-healing SSH client package `internal/infrastructure/ssh/v2` (parallel to the legacy
  `internal/infrastructure/ssh`, no consumers migrated yet). Separates connection strategy from operations: `Dialer`
  interface with `Route(first, more...)` for direct or multi-jump chains; `Endpoint` (auth via key file +
  passphrase/ssh-agent fallback, `~` expansion, per-hop host key); a `conn` core with `snapshot`/`refresh`
  (singleflight + generation counter to dedupe concurrent reconnects) and optional keepalive that heals through the
  same path; a single generic reconnect-aware executor `withConn[T]` reused by all operations; public API v1 `New` +
  `Client.Tunnel` (transparently self-healing local forward on a free `127.0.0.1` port) + `Close`, plus
  `Tunnel.LocalAddr`/`Close` and options `WithKeepalive`/`WithRetries`/`WithLogger`/`WithHostKeyCallback`/
  `WithInsecureIgnoreHostKey`. Extension points (`Run`/`Upload`, `*ExitError`) laid out without core changes.
  Transient-error classification via `errors.Is`/`As` (no string matching). Table-driven tests with an in-process SSH
  server, run with `-race`.
- **Add** `golang.org/x/sync` as a direct dependency (singleflight) in `go.mod`.

---

## 2026-06-19

- **Bugfix** `internal/config.ResolveModulePullOverrides`: detect malformed `${...}` on the original string (stripping
2 changes: 1 addition & 1 deletion go.mod
@@ -12,6 +12,7 @@ require (
github.com/onsi/gomega v1.39.1
github.com/pkg/sftp v1.13.10
golang.org/x/crypto v0.52.0
golang.org/x/sync v0.21.0
golang.org/x/term v0.43.0
gopkg.in/yaml.v3 v3.0.1
k8s.io/api v0.34.2
@@ -248,7 +249,6 @@ require (
golang.org/x/mod v0.35.0 // indirect
golang.org/x/net v0.54.0 // indirect
golang.org/x/oauth2 v0.30.0 // indirect
golang.org/x/sync v0.20.0 // indirect
golang.org/x/sys v0.45.0 // indirect
golang.org/x/text v0.37.0 // indirect
golang.org/x/time v0.12.0 // indirect
4 changes: 2 additions & 2 deletions go.sum
@@ -716,8 +716,8 @@ golang.org/x/sync v0.0.0-20201020160332-67f06af15bc9/go.mod h1:RxMgew5VJxzue5/jJ
golang.org/x/sync v0.0.0-20210220032951-036812b2e83c/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.0.0-20220722155255-886fb9371eb4/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.1.0/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.20.0 h1:e0PTpb7pjO8GAtTs2dQ6jYa5BWYlMuX047Dco/pItO4=
golang.org/x/sync v0.20.0/go.mod h1:9xrNwdLfx4jkKbNva9FpL6vEN7evnE43NNNJQ2LF3+0=
golang.org/x/sync v0.21.0 h1:HLII4xRRTtCRkxYp4HNFF0Js/Og6q2i++KXbg0gHCwM=
golang.org/x/sync v0.21.0/go.mod h1:9xrNwdLfx4jkKbNva9FpL6vEN7evnE43NNNJQ2LF3+0=
golang.org/x/sys v0.0.0-20180830151530-49385e6e1522/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
golang.org/x/sys v0.0.0-20180909124046-d0be0721c37e/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
golang.org/x/sys v0.0.0-20190215142949-d0b11bdaac8a/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
74 changes: 74 additions & 0 deletions internal/infrastructure/ssh/v2/client.go
@@ -0,0 +1,74 @@
/*
Copyright 2025 Flant JSC

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

// Package ssh provides a self-healing SSH client whose connection strategy
// ("how we connect": directly or through jump hosts) is separated from the
// operations performed over it ("what we do": currently tunneling).
//
// The injection point is the Dialer: Route builds one for a direct connection or
// an arbitrary chain of jump hosts. New opens a Client over a Dialer and hides
// every reconnect: callers invoke methods and never reason about reconnection.
// All operations funnel through a single reconnect-aware executor (withConn) over
// a shared connection core (conn), so future operations such as Run and Upload
// can be added without touching the healing logic.
//
// The primary use case is opening a tunnel to the API server of a closed
// Kubernetes cluster and pointing a kubeconfig at it:
//
//	c, _ := ssh.New(ctx, ssh.Route(jumpEp, targetEp))
//	defer c.Close()
//	t, _ := c.Tunnel(ctx, 6443)
//	defer t.Close()
//	cfg := &rest.Config{Host: "https://" + t.LocalAddr()}
package ssh

import (
	"context"
	"errors"
	"log/slog"
)

// Client is a self-healing SSH client. Its operations run over a shared
// connection core that reconnects transparently.
type Client struct {
	conn    *conn
	retries int
	log     *slog.Logger
}

// New opens a Client over d, applying opts on top of the defaults.
// It returns an error if d is nil or the connection core cannot be created.
func New(ctx context.Context, d Dialer, opts ...Option) (*Client, error) {
	if d == nil {
		return nil, errors.New("ssh: nil dialer")
	}

	o := defaultOptions()
	for _, opt := range opts {
		opt(&o)
	}

	// Dialers that implement hostKeyDefaulter receive the configured
	// host-key policy as their default.
	if hkd, ok := d.(hostKeyDefaulter); ok {
		hkd.setDefaultHostKey(o.hostKey)
	}

	core, err := newConn(ctx, d, o)
	if err != nil {
		return nil, err
	}

	return &Client{conn: core, retries: o.retries, log: o.log}, nil
}

// Close shuts down the underlying connection core.
func (c *Client) Close() error {
	return c.conn.Close()
}