diff --git a/README.md b/README.md index e776e98..46c95ab 100644 --- a/README.md +++ b/README.md @@ -79,8 +79,6 @@ Testkit-specific env variables: - `CSI_CEPH_OSD_STORAGE_CLASS` — pre-existing block-mode StorageClass used to back Rook OSD PVCs. When empty, a `sds-local-volume` Thick SC is auto-provisioned via `EnsureDefaultStorageClass`. -- `CSI_CEPH_MODULE_PULL_OVERRIDE` — image tag for `csi-ceph`'s - ModulePullOverride (dev registries only, e.g. when testing a PR build). #### `modulePullOverride` in `cluster_config.yml` @@ -95,7 +93,32 @@ dkpParameters: modulePullOverride: pr131 ``` -`${VAR}` placeholders in `modulePullOverride` are rejected at config load time. +`${VAR}` placeholders **inside** `modulePullOverride` are rejected at config +load time — the static file stays literal and readable. + +##### Per-module env override (for CI) + +To point the module-under-test at a CI image without editing the committed +YAML, set the per-module env var `_MODULE_PULL_OVERRIDE` (the module +name upper-cased, `-`/`.` → `_`). It overrides the static value at load time; +unset modules keep their YAML default. Examples: + +- `csi-ceph` → `CSI_CEPH_MODULE_PULL_OVERRIDE` +- `sds-elastic` → `SDS_ELASTIC_MODULE_PULL_OVERRIDE` + +```bash +SDS_ELASTIC_MODULE_PULL_OVERRIDE=pr123 go test ./tests/ +``` + +Each applied override is logged at load time, naming both the static tag and +the env var that takes precedence, e.g.: + +``` +modulePullOverride[sds-elastic]: cluster_config.yml pins tag "main", but SDS_ELASTIC_MODULE_PULL_OVERRIDE="pr123" is set — using tag "pr123" for this test run +``` + +A single global tag (e.g. `MODULE_IMAGE_TAG`) is intentionally **not** used: +configs with several dev modules would be ambiguous. Use one var per module. 
### csi-all-stress-tests diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md index 6d128a0..354b1b6 100644 --- a/docs/ARCHITECTURE.md +++ b/docs/ARCHITECTURE.md @@ -341,6 +341,7 @@ config/ ├── config.go # Main configuration operations ├── env.go # Environment variable definitions and validation ├── types.go # Configuration type definitions +├── overrides.go # Per-module modulePullOverride env overrides └── images.go # OS image URL definitions ``` @@ -780,6 +781,7 @@ logger.Error("Failed to create resource: %v", err) | `TEST_CLUSTER_CLEANUP` | `false` | Cleanup cluster after tests | | `LOG_LEVEL` | `debug` | Log level (debug/info/warn/error) | | `KUBE_CONFIG_PATH` | - | Explicit kubeconfig path. Used when SSH retrieval of `/etc/kubernetes/{super-admin,admin}.conf` from the master fails. If unset and SSH also fails, `GetKubeconfig` returns an error (no silent fallback to `~/.kube/config`). | +| `_MODULE_PULL_OVERRIDE` | - | Per-module override of a module's `modulePullOverride` at config load (module name upper-cased, non-`[A-Z0-9]` → `_`; e.g. `SDS_ELASTIC_MODULE_PULL_OVERRIDE`, `CSI_CEPH_MODULE_PULL_OVERRIDE`). Replaces the static `cluster_config.yml` tag for CI image builds (`pr`/`mr`); each applied override is logged at INFO. The static YAML stays literal — `${VAR}` inside `modulePullOverride` is still rejected. See `internal/config/overrides.go`. | ### Commander Variables (only when `TEST_CLUSTER_CREATE_MODE=commander`) | Variable | Default | Description | diff --git a/docs/FUNCTIONS_GLOSSARY.md b/docs/FUNCTIONS_GLOSSARY.md index 77faf96..d46228b 100644 --- a/docs/FUNCTIONS_GLOSSARY.md +++ b/docs/FUNCTIONS_GLOSSARY.md @@ -27,15 +27,11 @@ All exported functions available in the `pkg/` directory, grouped by resource. 
- [Modules](#modules) - [Retry](#retry) - [Rook Config Override](#rook-config-override) -- [Ceph Credentials](#ceph-credentials) -- [CephCluster (Rook)](#cephcluster-rook) -- [CephBlockPool (Rook)](#cephblockpool-rook) - [CephFilesystem (Rook)](#cephfilesystem-rook) -- [CephClusterConnection / CephClusterAuthentication (csi-ceph)](#cephclusterconnection--cephclusterauthentication-csi-ceph) -- [CephStorageClass (csi-ceph)](#cephstorageclass-csi-ceph) +- [ElasticCluster / ElasticStorageClass (sds-elastic)](#elasticcluster--elasticstorageclass-sds-elastic) +- [Rook verifiers (sds-elastic renamed group)](#rook-verifiers-sds-elastic-renamed-group) - [Default StorageClass (Testkit)](#default-storageclass-testkit) -- [Ceph StorageClass (Testkit)](#ceph-storageclass-testkit) -- [Ceph Cluster (Testkit) — no csi-ceph wiring](#ceph-cluster-testkit--no-csi-ceph-wiring) +- [Elastic (Testkit)](#elastic-testkit) - [Stress Tests (Testkit)](#stress-tests-testkit) - [Ceph CRC (Testkit)](#ceph-crc-testkit) @@ -198,6 +194,7 @@ All exported functions available in the `pkg/` directory, grouped by resource. - `GetConsumableBlockDevices(ctx, kubeconfig)` — Returns all consumable BlockDevices from the cluster. - `GetConsumableBlockDevicesByNode(ctx, kubeconfig, nodeName)` — Returns consumable BlockDevices for a specific node. +- `LabelBlockDevice(ctx, kubeconfig, name, labelKey, labelValue)` — Sets a label on a single `BlockDevice` CR (via the dynamic client on `BlockDeviceGVR`). Idempotent (skips the update when the label already matches) and tolerant of optimistic-concurrency conflicts. Used to mark disks eligible for adoption by an `ElasticCluster.spec.storage.blockDeviceSelector`. ## LVMVolumeGroup @@ -269,30 +266,6 @@ All exported functions available in the `pkg/` directory, grouped by resource. - `DeleteRookConfigOverride(ctx, kubeconfig, namespace)` — Removes the ConfigMap; safe if it does not exist. 
- `RenderCephGlobalConfig(globals)` — Pure helper that renders a `[global]` section for `ceph.conf` from a `map[string]string`. Keys are sorted so the output is byte-stable across calls with logically-equivalent maps (used by `SetRookConfigOverride` to avoid spurious ConfigMap updates and by callers that need to compare the desired vs. live ConfigMap content before deciding to roll daemons). -## Ceph Credentials - -`pkg/kubernetes/cephcredentials.go` - -- `WaitForCephCredentials(ctx, kubeconfig, namespace, timeout)` — Polls Rook's `rook-ceph-mon` Secret and `rook-ceph-mon-endpoints` ConfigMap until all pieces required to connect a CSI client to the cluster (`fsid`, admin user, admin key, monitor endpoints) are present. Returns a `*CephCredentials`. - -## CephCluster (Rook) - -`pkg/kubernetes/cephcluster.go` - -- `CreateCephCluster(ctx, kubeconfig, cfg)` — Creates or updates a Rook `CephCluster` CR using `CephClusterConfig` (image, mon/mgr counts, network provider, OSD storage class / count / size, data-dir host path, etc.). Idempotent. **Fail-fast:** if an existing CR has `metadata.deletionTimestamp != nil`, returns an error instead of trying to update a Terminating object (which would silently no-op and trap the next `WaitForCephClusterReady` for 15-20 minutes). -- `WaitForCephClusterReady(ctx, kubeconfig, namespace, name, timeout)` — Blocks until `status.state == "Created"` (or `status.phase == "Ready"`). HEALTH_WARN is tolerated so single-OSD test clusters still succeed. Each Get is bounded by `PollGetTimeout` (30s) and consecutive Get failures emit WARN, so a dropped SSH tunnel surfaces in seconds instead of after the readyTimeout. **Fail-fast** when the CR comes back with `deletionTimestamp != nil` — there's no point waiting for Ready on a Terminating object. -- `DeleteCephCluster(ctx, kubeconfig, namespace, name)` — Fire-and-forget delete; NotFound is treated as success. Does NOT garbage-collect OSD data on host disks. 
Pair with `WaitForCephClusterGone` if the next step depends on the CR being fully GC'd (e.g. before re-creating the cluster, or to detect a stuck `cephcluster.ceph.rook.io` finalizer early). -- `WaitForCephClusterGone(ctx, kubeconfig, namespace, name, timeout)` — Polls until the CR returns NotFound (default `CephClusterGoneTimeout` = 10m when timeout is 0). Logs deletionTimestamp / finalizers progress periodically, so a stuck finalizer (typical after a teardown that left dependents alive — see `DeletionIsBlocked`) is visible immediately instead of after a silent timeout. Fail-fast on timeout: does NOT auto-strip finalizers — investigate the cluster manually before re-running. - -## CephBlockPool (Rook) - -`pkg/kubernetes/cephblockpool.go` - -- `CreateCephBlockPool(ctx, kubeconfig, cfg)` — Creates or updates a Rook `CephBlockPool` from `CephBlockPoolConfig` (replicated with optional `requireSafeReplicaSize` override, or erasure-coded with `dataChunks`/`codingChunks`; `failureDomain`). **Fail-fast** when the existing CR has `deletionTimestamp != nil`. -- `WaitForCephBlockPoolReady(ctx, kubeconfig, namespace, name, timeout)` — Polls until `status.phase == "Ready"`. Each Get is bounded by `PollGetTimeout` (30s) and consecutive Get failures emit WARN, so a dropped SSH tunnel surfaces in seconds instead of after the readyTimeout. Fail-fast on `deletionTimestamp != nil`. -- `DeleteCephBlockPool(ctx, kubeconfig, namespace, name)` — Fire-and-forget delete; NotFound is treated as success. Pair with `WaitForCephBlockPoolGone` to make sure the parent CephCluster's deletion isn't blocked by `ObjectHasDependents`. -- `WaitForCephBlockPoolGone(ctx, kubeconfig, namespace, name, timeout)` — Polls until the CR is GC'd (default `CephBlockPoolGoneTimeout` = 5m). Logs progress periodically. - ## CephFilesystem (Rook) `pkg/kubernetes/cephfilesystem.go` @@ -303,26 +276,33 @@ All exported functions available in the `pkg/` directory, grouped by resource. 
- `WaitForCephFilesystemGone(ctx, kubeconfig, namespace, name, timeout)` — Polls until the CR is GC'd (default `CephFilesystemGoneTimeout` = 5m). Logs progress periodically. - `CephFSDataPoolFullName(fsName, dataPoolName)` — Returns the full Ceph pool name (`-`) that should be passed to `CephStorageClass.spec.cephFS.pool`. -## CephClusterConnection / CephClusterAuthentication (csi-ceph) +## ElasticCluster / ElasticStorageClass (sds-elastic) + +`pkg/kubernetes/elasticcluster.go`, `pkg/kubernetes/elasticstorageclass.go` -`pkg/kubernetes/cephclusterconnection.go` +Low-level helpers over the cluster-scoped `storage.deckhouse.io/v1alpha1` `ElasticCluster` (`ec`) and `ElasticStorageClass` (`esc`) CRs. Both are addressed as `unstructured` via the dynamic client, so storage-e2e takes no build dependency on the sds-elastic module. Condition types and teardown reasons are mirrored as plain-string constants (`ElasticClusterCondition*`, `ElasticClusterReason*`, `ElasticStorageClassCondition*`, `ElasticStorageClassReason*`) — keep in sync with `sds-elastic/api/v1alpha1`. -- `CreateCephClusterAuthentication(ctx, kubeconfig, cfg)` — Creates or updates a `CephClusterAuthentication` CR (`userID` + `userKey`) used by csi-ceph to log in to Ceph. **Fail-fast** when the existing CR has `deletionTimestamp != nil`. -- `DeleteCephClusterAuthentication(ctx, kubeconfig, name)` — Fire-and-forget delete; NotFound is treated as success. -- `WaitForCephClusterAuthenticationGone(ctx, kubeconfig, name, timeout)` — Polls until the CR is GC'd (default `CephClusterAuthenticationGoneTimeout` = 1m). -- `CreateCephClusterConnection(ctx, kubeconfig, cfg)` — Creates or updates a `CephClusterConnection` CR (`clusterID == fsid`, `monitors`, `userID`, `userKey`). `clusterID` is immutable: existing-resource updates leave it unchanged and only sync monitors/user. **Fail-fast** when the existing CR has `deletionTimestamp != nil`. 
-- `DeleteCephClusterConnection(ctx, kubeconfig, name)` — Fire-and-forget delete; NotFound is treated as success. -- `WaitForCephClusterConnectionGone(ctx, kubeconfig, name, timeout)` — Polls until the CR is GC'd (default `CephClusterConnectionGoneTimeout` = 1m). -- `WaitForCephClusterConnectionCreated(ctx, kubeconfig, name, timeout)` — Polls until csi-ceph reports `status.phase == "Created"` (credentials + monitors validated against the live Ceph cluster). +- `CreateElasticCluster(ctx, kubeconfig, params)` — Creates or updates an `ElasticCluster` from `ElasticClusterParams` (name + `nodeSelector` / `blockDeviceSelector` matchLabels + optional `network.{public,cluster}`). Idempotent; **fail-fast** on a Terminating existing CR. +- `WaitForElasticClusterCondition(ctx, kubeconfig, name, condType, wantStatus, timeout)` — Blocks until the EC has the named status condition at the wanted status. Refuses to wait on a Terminating object. +- `WaitForElasticClusterReady(ctx, kubeconfig, name, timeout)` — Convenience: waits for `Ready=True`. +- `GetElasticClusterCondition(ctx, kubeconfig, name, condType)` — Single GET returning `(status, reason, message, found)`; wrap in a Gomega `Eventually`/`Consistently` to assert teardown-guard reasons on a Terminating CR. +- `GetElasticClusterCephTopology(ctx, kubeconfig, name)` — Reads `status.cephTopology` (effective mon/mgr counts + promotion reason). `found` is false until the controller records a topology. +- `DeleteElasticCluster(ctx, kubeconfig, name)` / `WaitForElasticClusterGone(ctx, kubeconfig, name, timeout)` — Fire-and-forget delete + GC wait (default `ElasticClusterGoneTimeout` = 15m). +- `CreateElasticStorageClass(ctx, kubeconfig, params)` — Creates or updates an `ElasticStorageClass` from `ElasticStorageClassParams` (`clusterRef`, `type` RBD/CephFS, `replication`). Idempotent; fail-fast on Terminating. 
+- `WaitForElasticStorageClassCondition` / `WaitForElasticStorageClassReady` / `GetElasticStorageClassCondition` — Same shape as the EC helpers. +- `AnnotateElasticStorageClassForceDeletion(ctx, kubeconfig, name)` — Sets `sds-elastic.deckhouse.io/force-deletion=true`, authorising the destructive purge of a non-empty RBD pool; never bypasses the bound-PV guard. +- `DeleteElasticStorageClass(ctx, kubeconfig, name)` / `WaitForElasticStorageClassGone(ctx, kubeconfig, name, timeout)` — Fire-and-forget delete + GC wait (default `ElasticStorageClassGoneTimeout` = 10m). -## CephStorageClass (csi-ceph) +## Rook verifiers (sds-elastic renamed group) -`pkg/kubernetes/cephstorageclass.go` +`pkg/kubernetes/elasticrook.go` -- `CreateCephStorageClass(ctx, kubeconfig, cfg)` — Creates or updates a csi-ceph `CephStorageClass` CR (RBD by default; CephFS when `Type == "CephFS"` and `CephFSName` / `CephFSPool` are set). The csi-ceph controller provisions a corresponding core `storage.k8s.io/v1 StorageClass` as a side effect. **Fail-fast** when the existing CR has `deletionTimestamp != nil`. -- `DeleteCephStorageClass(ctx, kubeconfig, name)` — Fire-and-forget delete; the controller removes the backing StorageClass. -- `WaitForCephStorageClassGone(ctx, kubeconfig, name, timeout)` — Polls until the CR is GC'd (default `CephStorageClassGoneTimeout` = 1m). -- `WaitForCephStorageClassCreated(ctx, kubeconfig, name, timeout)` — Polls until `status.phase == "Created"`. +The sds-elastic module ships a vendored Rook operator whose API group is renamed from upstream `ceph.rook.io` to `internal.sdselastic.deckhouse.io`. These helpers verify the Rook resources the EC controller created and that no upstream group leaked. + +- `WaitForElasticRookCephClusterReady(ctx, kubeconfig, namespace, name, timeout)` — Blocks until the renamed-group `CephCluster` reports `state=Created` (or `phase=Ready`). 
+- `WaitForElasticRookCephBlockPoolReady` / `WaitForElasticRookCephFilesystemReady` — Block until the renamed-group pool/filesystem reports `status.phase=Ready`. +- `ListElasticRookCephClusterNames(ctx, kubeconfig, namespace)` — Names of all renamed-group `CephCluster`s in the namespace. +- `ServerHasAPIGroup(ctx, kubeconfig, group)` — Discovery check; used to assert the upstream `ceph.rook.io` group is absent on a cluster running sds-elastic. ## Default StorageClass (Testkit) @@ -331,19 +311,17 @@ All exported functions available in the `pkg/` directory, grouped by resource. - `CreateDefaultStorageClass(ctx, kubeconfig, cfg)` — High-level helper: discovers nodes, enables sds-node-configurator/sds-local-volume modules, labels nodes, optionally attaches VirtualDisks, creates LVMVolumeGroups (Thick or Thin with thin pool), creates LocalStorageClass, waits for StorageClass. Configured via `DefaultStorageClassConfig`. - `EnsureDefaultStorageClass(ctx, kubeconfig, cfg)` — Idempotent wrapper around `CreateDefaultStorageClass`. Checks if StorageClass already exists, skips creation if so, then sets it as the cluster default via "global" ModuleConfig. -## Ceph StorageClass (Testkit) - -`pkg/testkit/ceph.go` - -- `EnsureCephStorageClass(ctx, kubeconfig, cfg)` — High-level end-to-end helper that turns an empty test cluster into one with a working csi-ceph `StorageClass`. Steps: (1) enable `sds-node-configurator`, `sds-elastic`, `csi-ceph` modules and wait Ready; (2) optionally call `EnsureDefaultStorageClass` to auto-provision a sds-local-volume SC for OSDs when `OSDStorageClass` is empty; (3) seed `rook-config-override` with `GlobalCephConfigOverrides` (e.g. 
`ms_crc_data=false`); (4) create Rook `CephCluster` and wait Created; (5) create the backing pool primitive — `CephBlockPool` (when `Type == "RBD"`, default) or `CephFilesystem` (when `Type == "CephFS"`) — and wait Ready; (6) read fsid/monitors/admin-key from Rook-managed secrets; (7) wire csi-ceph by creating `CephClusterAuthentication` + `CephClusterConnection`; (8) create the matching `CephStorageClass` (RBD pool or `-` for CephFS) and wait for the backing core StorageClass. Idempotent; returns the resulting StorageClass name. -- `EnsureDefaultCephStorageClass(ctx, kubeconfig, cfg)` — `EnsureCephStorageClass` + `SetGlobalDefaultStorageClass` so new PVCs without an explicit `storageClassName` use the provisioned Ceph (RBD or CephFS) class. -- `TeardownCephStorageClass(ctx, kubeconfig, cfg)` — Reverse of `EnsureCephStorageClass`. After every Delete it now waits for the CR to be fully GC'd via the matching `WaitForXxxGone` helper. Order is: `CephStorageClass` → `CephClusterConnection` → `CephClusterAuthentication` → (`CephBlockPool` or `CephFilesystem` per `cfg.Type`) → `CephCluster` (unless `SkipClusterTeardown`) → `rook-config-override` ConfigMap. Without those waits the parent `CephCluster` would be deleted before its dependents are gone, Rook would record `DeletionIsBlocked / ObjectHasDependents`, and the next test run would either find a stuck Terminating CR or hang in `WaitForCephClusterReady`. Fail-fast on a Wait*Gone timeout: errors are aggregated and returned, no auto-strip of finalizers — investigate the cluster manually before re-running. NotFound is still treated as success; subsequent deletions are still attempted on partial failures. +## Elastic (Testkit) -## Ceph Cluster (Testkit) — no csi-ceph wiring +`pkg/testkit/elastic.go` -`pkg/testkit/ceph_cluster.go` +High-level helpers that drive the sds-elastic stack end-to-end. They assume the modules (`sds-node-configurator`, `csi-ceph`, `sds-elastic`) are already enabled on the cluster (e.g. 
via the suite's `cluster_config.yml`); module enablement is intentionally out of scope here. Type/replication enums are re-exported (`ElasticStorageClassType*`, `ElasticReplication*`). -- `EnsureCephCluster(ctx, kubeconfig, cfg)` — "Stop-before-csi-ceph" variant of `EnsureCephStorageClass`: brings up a Rook-managed Ceph cluster + CephBlockPool via sds-elastic alone. Steps: (1) enable `sds-node-configurator` + `sds-elastic` (does **not** enable `csi-ceph`); (2) resolve/provision OSD backing StorageClass (reuses `EnsureDefaultStorageClass`); (3) seed `rook-config-override` with `GlobalCephConfigOverrides`; (4) create Rook `CephCluster` and wait Created; (5) create `CephBlockPool` and wait Ready. Does not create `CephClusterConnection`/`CephClusterAuthentication`/`CephStorageClass`. Useful when tests need a live Ceph backend to talk to directly (e.g. from within csi-ceph's own e2e) without the testkit preselecting a csi-ceph-backed StorageClass. Idempotent; returns the pool name. +- `EnsureElasticOSDBlockDevices(ctx, kubeconfig, cfg)` — Prepares raw disks for OSD adoption: resolves storage nodes (explicit list or all workers), labels them, waits for `>= MinBlockDevices` consumable `BlockDevice`s to surface on them, then labels those BDs. Returns the labelled BD names. Node/BD label key/value default to the `sds-elastic-e2e.storage.deckhouse.io/*` constants and must match the EC selectors. +- `EnsureElasticCluster(ctx, kubeconfig, cfg)` — Creates (or reuses) an `ElasticCluster` with the given selectors and waits until `Ready` (default 25m). Returns the EC name. +- `EnsureElasticStorageClass(ctx, kubeconfig, cfg)` — Creates (or reuses) an `ElasticStorageClass`, waits until `Ready` (default 10m), and confirms the 1:1-named core StorageClass exists. Returns the StorageClass name. +- `TeardownElasticStorageClass(ctx, kubeconfig, name, force, timeout)` — Optionally sets the force-deletion annotation, deletes the ESC, and waits until it is gone. 
Force authorises an RBD pool purge but never bypasses the bound-PV guard. +- `TeardownElasticCluster(ctx, kubeconfig, name, timeout)` — Deletes the EC and waits until the controller finalizer (Rook CephCluster + csi-ceph teardown) completes. Tear down referencing ESCs first, or the EC sticks on the non-bypassable `StorageClassesExist` guard. ## Stress Tests (Testkit) diff --git a/docs/WORKLOG.md b/docs/WORKLOG.md index c217718..5f7d321 100644 --- a/docs/WORKLOG.md +++ b/docs/WORKLOG.md @@ -4,6 +4,25 @@ All notable changes to this repository are documented here. New entries are appe --- +## 2026-06-15 + +- **Update** `pkg/kubernetes/nodegroup.go::CreateStaticNodeGroup`: wrap the existence-check + create in `retry.DoVoid` (backoff 2s→15s, ×1.5, 30 attempts, bounded by the caller's `NodeGroupTimeout` context). Right after `dhctl bootstrap` the node-manager validating webhook (`node-controller-webhook` in `d8-cloud-instance-manager`) is frequently still unreachable, so the apiserver rejects the create with a transient `InternalError` (`failed calling webhook ... connect: operation not permitted`). `retry.IsRetryable` already classifies both `InternalError` and `failed calling webhook` as transient; the loop re-reads the NodeGroup each attempt so it stays idempotent even if a prior attempt created it without us seeing the success response. + - **Why**: the suite previously failed deterministically on freshly bootstrapped clusters because the single-shot create raced the webhook's readiness. Retrying converges instead of aborting the whole run. +- **Update** `internal/config/config.go`: `NodeGroupTimeout` 2m → 4m (now a retry budget, not a single attempt) and `SecretsWaitTimeout` 2m → 10m. Bootstrap secret materialization and webhook convergence routinely exceed 2m on slower/nested clusters, so the old values produced spurious bootstrap failures. 
+ +--- + +## 2026-06-14 + +- **Add** `internal/config/overrides.go` (`ApplyModulePullOverrideEnv`, `EnvKeyForModulePullOverride`, `ModulePullOverrideChange`): per-module env override for `modulePullOverride`, keyed by module name (`sds-elastic` → `SDS_ELASTIC_MODULE_PULL_OVERRIDE`, `csi-ceph` → `CSI_CEPH_MODULE_PULL_OVERRIDE`; name upper-cased, non-`[A-Z0-9]` → `_`). When the var is set it replaces the module's static `modulePullOverride` at config-load time; the YAML keeps a literal default (`main`). + - **Why**: a dozen-plus module e2e suites need to pin the module-under-test to a CI image tag (`pr`/`mr`) without editing the committed `cluster_config.yml`. Per-suite Makefile rendering (envsubst) does not scale, drifts across repos, and breaks plain `go test ./tests/`. Centralizing the substitution in the shared library gives every suite one contract. + - **Why per-module, not a single global `${VAR}`**: directly addresses the 2026-05-20 review objection — a global `MODULE_IMAGE_TAG` is ambiguous when several modules in one config need different tags. A per-module key is explicit and matches the pre-existing `CSI_CEPH_MODULE_PULL_OVERRIDE` README precedent. In-YAML `${...}` stays rejected by `ValidateModulePullOverrides`; the env override is a separate, explicit channel applied right before validation. +- **Update** `pkg/cluster/cluster.go::loadClusterConfigFromPath` and `internal/cluster/cluster.go::LoadClusterConfig`: call `ApplyModulePullOverrideEnv` after `yaml.Unmarshal` / before `ValidateModulePullOverrides`, logging each applied override at INFO and naming BOTH the static `cluster_config.yml` tag and the env var/tag that wins, e.g. `modulePullOverride[sds-elastic]: cluster_config.yml pins tag "main", but SDS_ELASTIC_MODULE_PULL_OVERRIDE="pr123" is set — using tag "pr123" for this test run`. +- **Add** `internal/config/overrides_test.go`: env-key normalization plus override / no-env / equal-value / empty-YAML-default cases. 
+- **Update** `README.md`: documented the `_MODULE_PULL_OVERRIDE` per-module override, the load-time log line, and why a single global tag is intentionally avoided.
+
+---
+
 ## 2026-05-06
 
 - **Add** `UploadPrivate` on `ssh.SSHClient` (`internal/infrastructure/ssh`): SFTP `Chmod` immediately after `Create`, before payload copy; `uploadOverSFTPOnce`, `uploadWithSFTPRetries`, `jumpUploadWithSFTPRetries`; passphrase `BootstrapCluster` uses it with `install -d -m 0700` staging (`pkg/cluster/setup.go`); ARCHITECTURE mentions ssh uploads
diff --git a/internal/cluster/cluster.go b/internal/cluster/cluster.go
index caee1c9..05d0b1f 100644
--- a/internal/cluster/cluster.go
+++ b/internal/cluster/cluster.go
@@ -76,6 +76,13 @@ func LoadClusterConfig(configFilename string) (*config.ClusterDefinition, error)
 		return nil, fmt.Errorf("failed to parse YAML config: %w", err)
 	}
 
+	// Apply per-module modulePullOverride env overrides (e.g.
+	// SDS_ELASTIC_MODULE_PULL_OVERRIDE) before validation, logging each one so
+	// the running image tag's source is unambiguous.
+	for _, ch := range config.ApplyModulePullOverrideEnv(&clusterDef) {
+		logger.Info("%s", ch.LogLine())
+	}
+
 	if err := config.ValidateModulePullOverrides(&clusterDef); err != nil {
 		return nil, err
 	}
diff --git a/internal/config/config.go b/internal/config/config.go
index 9b19b18..92fcb4c 100644
--- a/internal/config/config.go
+++ b/internal/config/config.go
@@ -56,8 +56,8 @@ const (
 	// Kubernetes operations
 	ModuleCheckTimeout = 10 * time.Second // Timeout for checking module status
 	NamespaceTimeout = 30 * time.Second // Timeout for creating namespace
-	NodeGroupTimeout = 2 * time.Minute // Timeout for creating NodeGroup (API can be slow right after bootstrap)
-	SecretsWaitTimeout = 2 * time.Minute // Timeout for waiting for bootstrap secrets to appear
+	NodeGroupTimeout = 4 * time.Minute // Timeout (with retries) for creating NodeGroup; the node-manager validating webhook is often unreachable for a while right after bootstrap
+	SecretsWaitTimeout = 10 * time.Minute // Timeout for waiting for bootstrap secrets to appear
 	ClusterHealthTimeout = 15 * time.Minute // Timeout for cluster health check
 	ModuleDeployTimeout = 15 * time.Minute // Timeout for waiting for ONE module to be ready
 
diff --git a/internal/config/overrides.go b/internal/config/overrides.go
new file mode 100644
index 0000000..e7e54d6
--- /dev/null
+++ b/internal/config/overrides.go
@@ -0,0 +1,113 @@
+/*
+Copyright 2026 Flant JSC
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+*/
+
+package config
+
+import (
+	"fmt"
+	"os"
+	"strings"
+)
+
+// ModulePullOverrideEnvSuffix is appended to the normalized module name to form
+// the per-module env var that overrides modulePullOverride. For example module
+// "sds-elastic" maps to "SDS_ELASTIC_MODULE_PULL_OVERRIDE".
+const ModulePullOverrideEnvSuffix = "_MODULE_PULL_OVERRIDE"
+
+// ModulePullOverrideDefaultTag is the image tag storage-e2e applies for dev
+// registries when a module declares no modulePullOverride. It is surfaced here
+// only so logs can name the effective default when the YAML value was empty.
+const ModulePullOverrideDefaultTag = "main"
+
+// EnvKeyForModulePullOverride returns the per-module env var name that overrides
+// a module's modulePullOverride. The module name is upper-cased and every
+// character invalid in a shell env var (anything outside [A-Z0-9]) is replaced
+// with '_', so "sds-elastic" -> "SDS_ELASTIC_MODULE_PULL_OVERRIDE" and
+// "csi-ceph" -> "CSI_CEPH_MODULE_PULL_OVERRIDE".
+func EnvKeyForModulePullOverride(moduleName string) string {
+	norm := strings.Map(func(r rune) rune {
+		switch {
+		case r >= 'a' && r <= 'z':
+			return r - ('a' - 'A')
+		case r >= 'A' && r <= 'Z', r >= '0' && r <= '9':
+			return r
+		default:
+			return '_'
+		}
+	}, moduleName)
+	return norm + ModulePullOverrideEnvSuffix
+}
+
+// ModulePullOverrideChange records a single env-driven override of a module's
+// modulePullOverride so the caller can log it explicitly.
+type ModulePullOverrideChange struct {
+	Module   string // module name
+	EnvVar   string // env var that triggered the override
+	FromYAML string // value declared in cluster_config.yml ("" when unset)
+	ToEnv    string // value taken from the env var (the effective tag)
+}
+
+// LogLine renders a human-readable explanation of the override, naming BOTH the
+// static cluster_config.yml value and the env var/tag that takes precedence, so
+// the test output makes the source of the running image tag unambiguous.
+func (c ModulePullOverrideChange) LogLine() string {
+	if c.FromYAML == "" {
+		return fmt.Sprintf(
+			"modulePullOverride[%s]: cluster_config.yml sets no tag (effective default %q), but %s=%q is set — using tag %q for this test run",
+			c.Module, ModulePullOverrideDefaultTag, c.EnvVar, c.ToEnv, c.ToEnv,
+		)
+	}
+	return fmt.Sprintf(
+		"modulePullOverride[%s]: cluster_config.yml pins tag %q, but %s=%q is set — using tag %q for this test run",
+		c.Module, c.FromYAML, c.EnvVar, c.ToEnv, c.ToEnv,
+	)
+}
+
+// ApplyModulePullOverrideEnv overrides each module's ModulePullOverride from its
+// per-module env var (see EnvKeyForModulePullOverride) when that var is set and
+// differs from the static value. This is the sanctioned, per-module channel for
+// pointing the module-under-test at a CI image tag (pr/mr/main) without
+// editing the committed cluster_config.yml — chosen over a single global
+// MODULE_IMAGE_TAG so configs with several dev modules stay unambiguous.
+//
+// In-YAML ${VAR} templating remains unsupported (ValidateModulePullOverrides
+// rejects it): the static file keeps literal, readable defaults and this env
+// channel is applied right before validation. Returns the applied changes so
+// the caller (which owns the logger) can report them; mutates def in place.
+func ApplyModulePullOverrideEnv(def *ClusterDefinition) []ModulePullOverrideChange {
+	if def == nil {
+		return nil
+	}
+	var changes []ModulePullOverrideChange
+	for _, m := range def.DKPParameters.Modules {
+		if m == nil {
+			continue
+		}
+		key := EnvKeyForModulePullOverride(m.Name)
+		val := strings.TrimSpace(os.Getenv(key))
+		if val == "" || val == m.ModulePullOverride {
+			continue
+		}
+		changes = append(changes, ModulePullOverrideChange{
+			Module:   m.Name,
+			EnvVar:   key,
+			FromYAML: m.ModulePullOverride,
+			ToEnv:    val,
+		})
+		m.ModulePullOverride = val
+	}
+	return changes
+}
diff --git a/internal/config/overrides_test.go b/internal/config/overrides_test.go
new file mode 100644
index 0000000..8875e78
--- /dev/null
+++ b/internal/config/overrides_test.go
@@ -0,0 +1,107 @@
+/*
+Copyright 2026 Flant JSC
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+*/
+
+package config
+
+import (
+	"strings"
+	"testing"
+)
+
+func TestEnvKeyForModulePullOverride(t *testing.T) {
+	cases := map[string]string{
+		"sds-elastic":           "SDS_ELASTIC_MODULE_PULL_OVERRIDE",
+		"csi-ceph":              "CSI_CEPH_MODULE_PULL_OVERRIDE",
+		"sds-node-configurator": "SDS_NODE_CONFIGURATOR_MODULE_PULL_OVERRIDE",
+		"snapshot-controller":   "SNAPSHOT_CONTROLLER_MODULE_PULL_OVERRIDE",
+	}
+	for module, want := range cases {
+		if got := EnvKeyForModulePullOverride(module); got != want {
+			t.Errorf("EnvKeyForModulePullOverride(%q) = %q, want %q", module, got, want)
+		}
+	}
+}
+
+func newDef(modules ...*ModuleConfig) *ClusterDefinition {
+	return &ClusterDefinition{DKPParameters: DKPParameters{Modules: modules}}
+}
+
+func TestApplyModulePullOverrideEnv_OverridesAndRecords(t *testing.T) {
+	t.Setenv("SDS_ELASTIC_MODULE_PULL_OVERRIDE", "pr123")
+
+	def := newDef(
+		&ModuleConfig{Name: "sds-elastic", ModulePullOverride: "main"},
+		&ModuleConfig{Name: "csi-ceph", ModulePullOverride: "main"},
+	)
+
+	changes := ApplyModulePullOverrideEnv(def)
+	if len(changes) != 1 {
+		t.Fatalf("expected 1 change, got %d: %+v", len(changes), changes)
+	}
+	if got := def.DKPParameters.Modules[0].ModulePullOverride; got != "pr123" {
+		t.Errorf("sds-elastic ModulePullOverride = %q, want pr123", got)
+	}
+	if got := def.DKPParameters.Modules[1].ModulePullOverride; got != "main" {
+		t.Errorf("csi-ceph ModulePullOverride = %q, want main (untouched)", got)
+	}
+
+	ch := changes[0]
+	if ch.Module != "sds-elastic" || ch.EnvVar != "SDS_ELASTIC_MODULE_PULL_OVERRIDE" ||
+		ch.FromYAML != "main" || ch.ToEnv != "pr123" {
+		t.Errorf("unexpected change: %+v", ch)
+	}
+	line := ch.LogLine()
+	for _, want := range []string{"sds-elastic", `"main"`, "SDS_ELASTIC_MODULE_PULL_OVERRIDE", `"pr123"`} {
+		if !strings.Contains(line, want) {
+			t.Errorf("LogLine() = %q, missing %q", line, want)
+		}
+	}
+}
+
+func TestApplyModulePullOverrideEnv_NoEnvIsNoop(t *testing.T) {
+	def := newDef(&ModuleConfig{Name: "sds-elastic", ModulePullOverride: "main"})
+	if changes := ApplyModulePullOverrideEnv(def); len(changes) != 0 {
+		t.Fatalf("expected no changes without env, got %+v", changes)
+	}
+	if got := def.DKPParameters.Modules[0].ModulePullOverride; got != "main" {
+		t.Errorf("ModulePullOverride = %q, want main (untouched)", got)
+	}
+}
+
+func TestApplyModulePullOverrideEnv_EqualValueIsNoop(t *testing.T) {
+	t.Setenv("SDS_ELASTIC_MODULE_PULL_OVERRIDE", "main")
+	def := newDef(&ModuleConfig{Name: "sds-elastic", ModulePullOverride: "main"})
+	if changes := ApplyModulePullOverrideEnv(def); len(changes) != 0 {
+		t.Fatalf("expected no changes when env equals YAML, got %+v", changes)
+	}
+}
+
+func TestApplyModulePullOverrideEnv_EmptyYAMLDefaultLogged(t *testing.T) {
+	t.Setenv("SDS_ELASTIC_MODULE_PULL_OVERRIDE", "pr7")
+	def := newDef(&ModuleConfig{Name: "sds-elastic"})
+
+	changes := ApplyModulePullOverrideEnv(def)
+	if len(changes) != 1 {
+		t.Fatalf("expected 1 change, got %d", len(changes))
+	}
+	if got := def.DKPParameters.Modules[0].ModulePullOverride; got != "pr7" {
+		t.Errorf("ModulePullOverride = %q, want pr7", got)
+	}
+	// With no static value the log should name the effective default tag.
+	if line := changes[0].LogLine(); !strings.Contains(line, ModulePullOverrideDefaultTag) {
+		t.Errorf("LogLine() = %q, expected to mention default %q", line, ModulePullOverrideDefaultTag)
+	}
+}
diff --git a/pkg/cluster/cluster.go b/pkg/cluster/cluster.go
index a7ccb69..1255841 100644
--- a/pkg/cluster/cluster.go
+++ b/pkg/cluster/cluster.go
@@ -149,6 +149,13 @@ func loadClusterConfigFromPath(configPath string) (*config.ClusterDefinition, er
 		return nil, fmt.Errorf("failed to parse YAML config: %w", err)
 	}
 
+	// Apply per-module modulePullOverride env overrides (e.g.
+	// SDS_ELASTIC_MODULE_PULL_OVERRIDE) before validation, logging each one so
+	// the running image tag's source is unambiguous.
+	for _, ch := range config.ApplyModulePullOverrideEnv(&clusterDef) {
+		logger.Info("%s", ch.LogLine())
+	}
+
 	if err := config.ValidateModulePullOverrides(&clusterDef); err != nil {
 		return nil, err
 	}
diff --git a/pkg/kubernetes/blockdevice.go b/pkg/kubernetes/blockdevice.go
index 769d7fc..979fa64 100644
--- a/pkg/kubernetes/blockdevice.go
+++ b/pkg/kubernetes/blockdevice.go
@@ -20,12 +20,25 @@ import (
 	"context"
 	"fmt"
 
+	apierrors "k8s.io/apimachinery/pkg/api/errors"
+	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
+	"k8s.io/apimachinery/pkg/runtime/schema"
 	"k8s.io/client-go/rest"
 
 	"github.com/deckhouse/storage-e2e/internal/kubernetes/storage"
 	"github.com/deckhouse/storage-e2e/internal/logger"
 )
 
+// BlockDeviceGVR is the GroupVersionResource of the sds-node-configurator
+// BlockDevice CR (cluster-scoped). Used to label individual BlockDevices so a
+// selector (e.g. ElasticCluster.spec.storage.blockDeviceSelector) can adopt
+// them for OSDs.
+var BlockDeviceGVR = schema.GroupVersionResource{
+	Group:    "storage.deckhouse.io",
+	Version:  "v1alpha1",
+	Resource: "blockdevices",
+}
+
 // BlockDevice represents a block device in the cluster (re-export for public API)
 type BlockDevice = storage.BlockDeviceInfo
 
@@ -68,3 +81,45 @@ func GetConsumableBlockDevicesByNode(ctx context.Context, kubeconfig *rest.Confi
 	logger.Debug("Found %d consumable BlockDevices on node %s", len(blockDevices), nodeName)
 	return blockDevices, nil
 }
+
+// LabelBlockDevice sets a label on a single BlockDevice CR. Idempotent (skips
+// the update when the label already has the desired value) and tolerant of
+// optimistic-concurrency conflicts. Used to mark BlockDevices eligible for
+// adoption by an ElasticCluster's blockDeviceSelector.
+func LabelBlockDevice(ctx context.Context, kubeconfig *rest.Config, name, labelKey, labelValue string) error {
+	dynamicClient, err := NewDynamicClientWithRetry(ctx, kubeconfig)
+	if err != nil {
+		return fmt.Errorf("failed to create dynamic client: %w", err)
+	}
+
+	const maxRetries = 5
+	var lastErr error
+	for attempt := 0; attempt < maxRetries; attempt++ {
+		bd, err := dynamicClient.Resource(BlockDeviceGVR).Get(ctx, name, metav1.GetOptions{})
+		if err != nil {
+			return fmt.Errorf("failed to get BlockDevice %s: %w", name, err)
+		}
+		labels := bd.GetLabels()
+		if labels[labelKey] == labelValue {
+			logger.Debug("BlockDevice %s already has label %s=%s", name, labelKey, labelValue)
+			return nil
+		}
+		if labels == nil {
+			labels = map[string]string{}
+		}
+		labels[labelKey] = labelValue
+		bd.SetLabels(labels)
+
+		_, lastErr = dynamicClient.Resource(BlockDeviceGVR).Update(ctx, bd, metav1.UpdateOptions{})
+		if lastErr == nil {
+			logger.Info("Labeled BlockDevice %s with %s=%s", name, labelKey, labelValue)
+			return nil
+		}
+		if apierrors.IsConflict(lastErr) {
+			logger.Debug("Conflict labeling BlockDevice %s (attempt %d/%d), retrying...", name, attempt+1, maxRetries)
+			continue
+		}
+		return fmt.Errorf("failed to label BlockDevice %s: %w", name, lastErr)
+	}
+	return fmt.Errorf("failed to label BlockDevice %s after %d attempts: %w", name, maxRetries, lastErr)
+}
diff --git a/pkg/kubernetes/cephblockpool.go b/pkg/kubernetes/cephblockpool.go
deleted file mode 100644
index 8ad2dfc..0000000
--- a/pkg/kubernetes/cephblockpool.go
+++ /dev/null
@@ -1,225 +0,0 @@
-/*
-Copyright 2025 Flant JSC
-
-Licensed under the Apache License, Version 2.0 (the "License");
-you may not use this file except in compliance with the License.
-You may obtain a copy of the License at
-
-    http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License.
-*/
-
-package kubernetes
-
-import (
-	"context"
-	"fmt"
-	"time"
-
-	apierrors "k8s.io/apimachinery/pkg/api/errors"
-	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
-	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
-	"k8s.io/apimachinery/pkg/runtime/schema"
-	"k8s.io/client-go/rest"
-
-	"github.com/deckhouse/storage-e2e/internal/logger"
-)
-
-// CephBlockPoolGVR is the GroupVersionResource of Rook's CephBlockPool.
-var CephBlockPoolGVR = schema.GroupVersionResource{
-	Group:    "ceph.rook.io",
-	Version:  "v1",
-	Resource: "cephblockpools",
-}
-
-// CephBlockPoolConfig describes a minimal replicated or erasure-coded Ceph
-// RBD pool managed by Rook. Exactly one of ReplicaSize or ErasureCoded must
-// be set; leaving both zero defaults to a single-replica pool suitable for
-// single-node test clusters.
-type CephBlockPoolConfig struct {
-	// Name of the CephBlockPool CR (also becomes the Ceph pool name).
-	Name string
-
-	// Namespace the Rook operator watches (typically "d8-sds-elastic").
-	Namespace string
-
-	// FailureDomain is the CRUSH failure domain: "host" or "osd" (default: "host").
-	FailureDomain string
-
-	// --- Replicated pool knobs (used when ErasureCoded is nil) ---
-
-	// ReplicaSize is the number of object copies. Default: 1.
-	ReplicaSize int
-
-	// RequireSafeReplicaSize toggles Ceph's safeguard against single-replica
-	// pools. When nil, it is set to `false` for ReplicaSize==1 (unsafe single
-	// replica, accepted for e2e test clusters) and left unset otherwise.
-	RequireSafeReplicaSize *bool
-
-	// --- Erasure-coded pool knobs ---
-
-	// ErasureCoded, when non-nil, produces an EC pool instead of a replicated
-	// one. Its fields map to `spec.erasureCoded.{dataChunks,codingChunks}`.
-	ErasureCoded *CephBlockPoolErasureCoded
-}
-
-// CephBlockPoolErasureCoded configures a Ceph erasure-coded RBD pool.
-type CephBlockPoolErasureCoded struct {
-	DataChunks   int
-	CodingChunks int
-}
-
-// CreateCephBlockPool creates (or updates, if already present) a CephBlockPool
-// in the given namespace from the provided configuration. It is idempotent and
-// safe to call on every test run.
-func CreateCephBlockPool(ctx context.Context, kubeconfig *rest.Config, cfg CephBlockPoolConfig) error {
-	if cfg.Name == "" {
-		return fmt.Errorf("CephBlockPool name is required")
-	}
-	if cfg.Namespace == "" {
-		return fmt.Errorf("CephBlockPool namespace is required")
-	}
-	if cfg.ErasureCoded == nil && cfg.ReplicaSize <= 0 {
-		cfg.ReplicaSize = 1
-	}
-	if cfg.FailureDomain == "" {
-		cfg.FailureDomain = "host"
-	}
-
-	spec := map[string]interface{}{
-		"failureDomain": cfg.FailureDomain,
-	}
-
-	if cfg.ErasureCoded != nil {
-		if cfg.ErasureCoded.DataChunks <= 0 || cfg.ErasureCoded.CodingChunks <= 0 {
-			return fmt.Errorf("ErasureCoded pool requires positive dataChunks and codingChunks")
-		}
-		spec["erasureCoded"] = map[string]interface{}{
-			"dataChunks":   int64(cfg.ErasureCoded.DataChunks),
-			"codingChunks": int64(cfg.ErasureCoded.CodingChunks),
-		}
-	} else {
-		replicated := map[string]interface{}{
-			"size": int64(cfg.ReplicaSize),
-		}
-		requireSafe := cfg.RequireSafeReplicaSize
-		if requireSafe == nil && cfg.ReplicaSize == 1 {
-			f := false
-			requireSafe = &f
-		}
-		if requireSafe != nil {
-			replicated["requireSafeReplicaSize"] = *requireSafe
-		}
-		spec["replicated"] = replicated
-	}
-
-	obj := &unstructured.Unstructured{
-		Object: map[string]interface{}{
-			"apiVersion": "ceph.rook.io/v1",
-			"kind":       "CephBlockPool",
-			"metadata": map[string]interface{}{
-				"name":      cfg.Name,
-				"namespace": cfg.Namespace,
-			},
-			"spec": spec,
-		},
-	}
-
-	dynamicClient, err := NewDynamicClientWithRetry(ctx, kubeconfig)
-	if err != nil {
-		return fmt.Errorf("failed to create dynamic client: %w", err)
-	}
-
-	logger.Info("Creating CephBlockPool %s/%s", cfg.Namespace, cfg.Name)
-	_, err = dynamicClient.Resource(CephBlockPoolGVR).Namespace(cfg.Namespace).Create(ctx, obj, metav1.CreateOptions{})
-	if err == nil {
-		logger.Success("CephBlockPool %s/%s created", cfg.Namespace, cfg.Name)
-		return nil
-	}
-	if !apierrors.IsAlreadyExists(err) {
-		return fmt.Errorf("failed to create CephBlockPool %s/%s: %w", cfg.Namespace, cfg.Name, err)
-	}
-
-	logger.Info("CephBlockPool %s/%s already exists, updating spec", cfg.Namespace, cfg.Name)
-	existing, err := dynamicClient.Resource(CephBlockPoolGVR).Namespace(cfg.Namespace).Get(ctx, cfg.Name, metav1.GetOptions{})
-	if err != nil {
-		return fmt.Errorf("failed to fetch existing CephBlockPool %s/%s: %w", cfg.Namespace, cfg.Name, err)
-	}
-	if err := errIfTerminating(existing, "CephBlockPool", formatRef(cfg.Namespace, cfg.Name)); err != nil {
-		return err
-	}
-	existing.Object["spec"] = spec
-	if _, err := dynamicClient.Resource(CephBlockPoolGVR).Namespace(cfg.Namespace).Update(ctx, existing, metav1.UpdateOptions{}); err != nil {
-		return fmt.Errorf("failed to update CephBlockPool %s/%s: %w", cfg.Namespace, cfg.Name, err)
-	}
-	return nil
-}
-
-// WaitForCephBlockPoolReady blocks until the CephBlockPool reports
-// `status.phase == "Ready"`. Rook transitions the pool from Progressing to
-// Ready once the Ceph OSDs have accepted the new pool and its CRUSH rule.
-//
-// Per-call deadlines and loud (WARN) logging on consecutive network failures
-// are inherited from pollResourceUntilReady, so a dropped SSH tunnel surfaces
-// in seconds instead of after the parent timeout.
-func WaitForCephBlockPoolReady(ctx context.Context, kubeconfig *rest.Config, namespace, name string, timeout time.Duration) error {
-	return pollResourceUntilReady(
-		ctx, kubeconfig, CephBlockPoolGVR, namespace, name,
-		timeout, PollTickInterval, "CephBlockPool",
-		func(obj *unstructured.Unstructured) (bool, string) {
-			phase, _, _ := unstructured.NestedString(obj.Object, "status", "phase")
-			if phase == "Ready" {
-				return true, "phase=Ready"
-			}
-			logger.Debug("CephBlockPool %s/%s phase: %q, waiting...", obj.GetNamespace(), obj.GetName(), phase)
-			return false, ""
-		},
-	)
-}
-
-// DeleteCephBlockPool deletes a CephBlockPool. Safe to call if the pool does
-// not exist. NOTE: this is fire-and-forget — the API call returns as soon as
-// the apiserver accepts the request, but Rook may still be running its
-// finalizer (`cephblockpool.ceph.rook.io`) for a few minutes afterwards. If
-// you want to be certain the CR is fully gone before continuing, follow up
-// with WaitForCephBlockPoolGone.
-func DeleteCephBlockPool(ctx context.Context, kubeconfig *rest.Config, namespace, name string) error {
-	dynamicClient, err := NewDynamicClientWithRetry(ctx, kubeconfig)
-	if err != nil {
-		return fmt.Errorf("failed to create dynamic client: %w", err)
-	}
-
-	if err := dynamicClient.Resource(CephBlockPoolGVR).Namespace(namespace).Delete(ctx, name, metav1.DeleteOptions{}); err != nil {
-		if apierrors.IsNotFound(err) {
-			return nil
-		}
-		return fmt.Errorf("failed to delete CephBlockPool %s/%s: %w", namespace, name, err)
-	}
-	logger.Info("Deleted CephBlockPool %s/%s", namespace, name)
-	return nil
-}
-
-// CephBlockPoolGoneTimeout is the default budget for WaitForCephBlockPoolGone.
-// Rook removes the underlying RBD pool from Ceph before lifting the
-// finalizer; with one OSD the pool delete normally completes in seconds but
-// can take a few minutes if the cluster is unhealthy.
-const CephBlockPoolGoneTimeout = 5 * time.Minute
-
-// WaitForCephBlockPoolGone polls until the CephBlockPool is fully GC'd by
-// Kubernetes (GET returns NotFound). Use this after DeleteCephBlockPool to
-// be sure the parent CephCluster won't be blocked by `ObjectHasDependents`
-// when it gets deleted next.
-func WaitForCephBlockPoolGone(ctx context.Context, kubeconfig *rest.Config, namespace, name string, timeout time.Duration) error {
-	if timeout <= 0 {
-		timeout = CephBlockPoolGoneTimeout
-	}
-	return pollResourceUntilGone(
-		ctx, kubeconfig, CephBlockPoolGVR, namespace, name,
-		timeout, PollTickInterval, "CephBlockPool",
-	)
-}
diff --git a/pkg/kubernetes/cephcluster.go b/pkg/kubernetes/cephcluster.go
deleted file mode 100644
index 501d8d8..0000000
--- a/pkg/kubernetes/cephcluster.go
+++ /dev/null
@@ -1,411 +0,0 @@
-/*
-Copyright 2025 Flant JSC
-
-Licensed under the Apache License, Version 2.0 (the "License");
-you may not use this file except in compliance with the License.
-You may obtain a copy of the License at
-
-    http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License.
-*/
-
-package kubernetes
-
-import (
-	"context"
-	"fmt"
-	"time"
-
-	apierrors "k8s.io/apimachinery/pkg/api/errors"
-	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
-	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
-	"k8s.io/apimachinery/pkg/runtime/schema"
-	"k8s.io/client-go/rest"
-
-	"github.com/deckhouse/storage-e2e/internal/logger"
-)
-
-// CephClusterGVR is the GroupVersionResource of Rook's CephCluster.
-var CephClusterGVR = schema.GroupVersionResource{
-	Group:    "ceph.rook.io",
-	Version:  "v1",
-	Resource: "cephclusters",
-}
-
-// Defaults shared between CephClusterConfig and the testkit-level helper.
-const (
-	DefaultRookNamespace       = "d8-sds-elastic"
-	DefaultCephClusterName     = "ceph-cluster"
-	DefaultCephImage           = "quay.io/ceph/ceph:v18.2.7"
-	DefaultDataDirHostPath     = "/var/lib/rook"
-	DefaultOSDStorageClassSize = "10Gi"
-)
-
-// CephClusterConfig describes a Rook-managed Ceph cluster suitable for e2e
-// testing. It is intentionally narrower than Rook's native CephCluster CRD:
-// knobs that don't matter for our scenarios are hidden behind hard-coded
-// defaults (mirroring the values from the internal Flant wiki instruction
-// on deploying sds-elastic + Rook + Ceph on LVM).
-type CephClusterConfig struct {
-	// Name of the CephCluster (default: "ceph-cluster").
-	Name string
-
-	// Namespace where Rook watches (default: "d8-sds-elastic").
-	Namespace string
-
-	// CephImage is the Ceph container image tag.
-	// Default: "quay.io/ceph/ceph:v18.2.7".
-	CephImage string
-
-	// AllowUnsupportedCephVersion flips spec.cephVersion.allowUnsupported.
-	// Default: true (e2e clusters are allowed to run any version Ceph ships).
-	AllowUnsupportedCephVersion *bool
-
-	// MonCount / MgrCount are the Rook mon/mgr replica counts. Defaults:
-	// 1 / 1, which is appropriate for single-node / tiny test clusters.
-	MonCount int
-	MgrCount int
-
-	// AllowMultipleMonPerNode allows multiple mons on the same node
-	// (required for single-node clusters). Default: true.
-	AllowMultipleMonPerNode *bool
-
-	// DataDirHostPath is where Rook persists mon/OSD data on each node.
-	// Default: "/var/lib/rook".
-	DataDirHostPath string
-
-	// NetworkProvider selects the Rook networking mode. Supported values:
-	//   "" — default CNI pod network (suitable for in-cluster e2e);
-	//   "host" — host networking (matches the Flant wiki production layout).
-	NetworkProvider string
-
-	// PublicNetworkCIDRs / ClusterNetworkCIDRs are the public/cluster CIDRs
-	// plumbed into `spec.network.addressRanges` when NetworkProvider is
-	// non-empty. They are ignored for the default (CNI) mode.
-	PublicNetworkCIDRs  []string
-	ClusterNetworkCIDRs []string
-
-	// --- OSD backing ---
-
-	// OSDStorageClass is the name of a k8s StorageClass able to hand out
-	// block-mode PVCs. Those PVCs are used by Rook's
-	// `storage.storageClassDeviceSets` to back OSDs.
-	OSDStorageClass string
-
-	// OSDCount is the number of OSDs to provision (default: 1).
-	OSDCount int
-
-	// OSDSize is the size of each OSD PVC (default: "10Gi").
-	OSDSize string
-
-	// OSDDeviceSetName is the `storageClassDeviceSets[].name` (default:
-	// "set1"). Changing it is useful mostly for debugging.
-	OSDDeviceSetName string
-}
-
-func (c *CephClusterConfig) applyDefaults() {
-	if c.Name == "" {
-		c.Name = DefaultCephClusterName
-	}
-	if c.Namespace == "" {
-		c.Namespace = DefaultRookNamespace
-	}
-	if c.CephImage == "" {
-		c.CephImage = DefaultCephImage
-	}
-	if c.AllowUnsupportedCephVersion == nil {
-		t := true
-		c.AllowUnsupportedCephVersion = &t
-	}
-	if c.MonCount <= 0 {
-		c.MonCount = 1
-	}
-	if c.MgrCount <= 0 {
-		c.MgrCount = 1
-	}
-	if c.AllowMultipleMonPerNode == nil {
-		t := true
-		c.AllowMultipleMonPerNode = &t
-	}
-	if c.DataDirHostPath == "" {
-		c.DataDirHostPath = DefaultDataDirHostPath
-	}
-	if c.OSDCount <= 0 {
-		c.OSDCount = 1
-	}
-	if c.OSDSize == "" {
-		c.OSDSize = DefaultOSDStorageClassSize
-	}
-	if c.OSDDeviceSetName == "" {
-		c.OSDDeviceSetName = "set1"
-	}
-}
-
-// CreateCephCluster creates (or updates) a CephCluster in the given namespace.
-// It is idempotent: if the resource already exists, its spec is overwritten
-// with the freshly-rendered one so callers can tweak `CephClusterConfig` and
-// re-apply without manual cleanup.
-func CreateCephCluster(ctx context.Context, kubeconfig *rest.Config, cfg CephClusterConfig) error {
-	cfg.applyDefaults()
-
-	if cfg.OSDStorageClass == "" {
-		return fmt.Errorf("CephCluster requires OSDStorageClass (backing StorageClass for OSD PVCs)")
-	}
-
-	spec := buildCephClusterSpec(cfg)
-
-	obj := &unstructured.Unstructured{
-		Object: map[string]interface{}{
-			"apiVersion": "ceph.rook.io/v1",
-			"kind":       "CephCluster",
-			"metadata": map[string]interface{}{
-				"name":      cfg.Name,
-				"namespace": cfg.Namespace,
-			},
-			"spec": spec,
-		},
-	}
-
-	dynamicClient, err := NewDynamicClientWithRetry(ctx, kubeconfig)
-	if err != nil {
-		return fmt.Errorf("failed to create dynamic client: %w", err)
-	}
-
-	logger.Info("Creating CephCluster %s/%s (image=%s, mon=%d, mgr=%d, osd=%d x %s on SC %s)",
-		cfg.Namespace, cfg.Name, cfg.CephImage, cfg.MonCount, cfg.MgrCount, cfg.OSDCount, cfg.OSDSize, cfg.OSDStorageClass)
-
-	_, err = dynamicClient.Resource(CephClusterGVR).Namespace(cfg.Namespace).Create(ctx, obj, metav1.CreateOptions{})
-	if err == nil {
-		logger.Success("CephCluster %s/%s created", cfg.Namespace, cfg.Name)
-		return nil
-	}
-	if !apierrors.IsAlreadyExists(err) {
-		return fmt.Errorf("failed to create CephCluster %s/%s: %w", cfg.Namespace, cfg.Name, err)
-	}
-
-	logger.Info("CephCluster %s/%s already exists, updating spec", cfg.Namespace, cfg.Name)
-	existing, err := dynamicClient.Resource(CephClusterGVR).Namespace(cfg.Namespace).Get(ctx, cfg.Name, metav1.GetOptions{})
-	if err != nil {
-		return fmt.Errorf("failed to fetch existing CephCluster %s/%s: %w", cfg.Namespace, cfg.Name, err)
-	}
-	if err := errIfTerminating(existing, "CephCluster", formatRef(cfg.Namespace, cfg.Name)); err != nil {
-		return err
-	}
-	existing.Object["spec"] = spec
-	if _, err := dynamicClient.Resource(CephClusterGVR).Namespace(cfg.Namespace).Update(ctx, existing, metav1.UpdateOptions{}); err != nil {
-		return fmt.Errorf("failed to update CephCluster %s/%s: %w", cfg.Namespace, cfg.Name, err)
-	}
-	return nil
-}
-
-// buildCephClusterSpec renders the spec portion of a CephCluster object. The
-// choice of fields follows the Flant internal wiki instruction for
-// sds-elastic + Rook + Ceph, stripped down to the parts that matter in e2e:
-//   - mon/mgr counts come from the config (1/1 by default for single-node);
-//   - network.provider=host is opt-in via NetworkProvider;
-//   - OSDs are backed by one `storageClassDeviceSets[0]` entry that points
-//     to a user-supplied StorageClass capable of issuing block-mode PVCs.
-func buildCephClusterSpec(cfg CephClusterConfig) map[string]interface{} {
-	spec := map[string]interface{}{
-		"cephVersion": map[string]interface{}{
-			"image":            cfg.CephImage,
-			"allowUnsupported": *cfg.AllowUnsupportedCephVersion,
-		},
-		"dataDirHostPath":                            cfg.DataDirHostPath,
-		"skipUpgradeChecks":                          false,
-		"continueUpgradeAfterChecksEvenIfNotHealthy": false,
-		"mon": map[string]interface{}{
-			"count":                int64(cfg.MonCount),
-			"allowMultiplePerNode": *cfg.AllowMultipleMonPerNode,
-		},
-		"mgr": map[string]interface{}{
-			"count":                int64(cfg.MgrCount),
-			"allowMultiplePerNode": *cfg.AllowMultipleMonPerNode,
-			"modules": []interface{}{
-				map[string]interface{}{
-					"name":    "pg_autoscaler",
-					"enabled": true,
-				},
-			},
-		},
-		"dashboard": map[string]interface{}{
-			"enabled": false,
-			"ssl":     false,
-		},
-		"crashCollector": map[string]interface{}{
-			"disable": false,
-		},
-		"logCollector": map[string]interface{}{
-			"enabled":     true,
-			"periodicity": "daily",
-			"maxLogSize":  "100M",
-		},
-		"priorityClassNames": map[string]interface{}{
-			"mon": "system-node-critical",
-			"osd": "system-node-critical",
-			"mgr": "system-cluster-critical",
-		},
-		"disruptionManagement": map[string]interface{}{
-			"managePodBudgets":      true,
-			"osdMaintenanceTimeout": int64(30),
-			"pgHealthCheckTimeout":  int64(0),
-		},
-		"storage": map[string]interface{}{
-			"useAllNodes":   true,
-			"useAllDevices": false,
-			"storageClassDeviceSets": []interface{}{
-				map[string]interface{}{
-					"name":            cfg.OSDDeviceSetName,
-					"count":           int64(cfg.OSDCount),
-					"portable":        false,
-					"tuneDeviceClass": true,
-					"volumeClaimTemplates": []interface{}{
-						map[string]interface{}{
-							"metadata": map[string]interface{}{
-								"name": "data",
-							},
-							"spec": map[string]interface{}{
-								"resources": map[string]interface{}{
-									"requests": map[string]interface{}{
-										"storage": cfg.OSDSize,
-									},
-								},
-								"storageClassName": cfg.OSDStorageClass,
-								"volumeMode":       "Block",
-								"accessModes":      []interface{}{"ReadWriteOnce"},
-							},
-						},
-					},
-				},
-			},
-		},
-	}
-
-	if cfg.NetworkProvider != "" {
-		network := map[string]interface{}{
-			"provider": cfg.NetworkProvider,
-			"connections": map[string]interface{}{
-				"encryption":   map[string]interface{}{"enabled": false},
-				"compression":  map[string]interface{}{"enabled": false},
-				"requireMsgr2": false,
-			},
-		}
-
-		addrs := map[string]interface{}{}
-		if len(cfg.PublicNetworkCIDRs) > 0 {
-			addrs["public"] = toInterfaceSlice(cfg.PublicNetworkCIDRs)
-		}
-		if len(cfg.ClusterNetworkCIDRs) > 0 {
-			addrs["cluster"] = toInterfaceSlice(cfg.ClusterNetworkCIDRs)
-		}
-		if len(addrs) > 0 {
-			network["addressRanges"] = addrs
-		}
-		spec["network"] = network
-	}
-
-	return spec
-}
-
-// toInterfaceSlice converts a []string to a []interface{} so it can be
-// embedded into an `unstructured.Unstructured`'s object tree.
-func toInterfaceSlice(in []string) []interface{} {
-	out := make([]interface{}, len(in))
-	for i, v := range in {
-		out[i] = v
-	}
-	return out
-}
-
-// WaitForCephClusterReady blocks until the CephCluster status reports that
-// Ceph is up and healthy. Rook exposes the cluster state through two status
-// fields:
-//   - `status.state` — overall lifecycle phase ("Creating", "Created",
-//     "Updating", "Error");
-//   - `status.ceph.health` — the Ceph health summary ("HEALTH_OK",
-//     "HEALTH_WARN", "HEALTH_ERR"). On a single-OSD test cluster Ceph often
-//     sits in HEALTH_WARN (PGs undersized, no replicas), which we still treat
-//     as "good enough" as long as `status.state == "Created"`.
-//
-// We return success once `state == "Created"`. HEALTH_ERR is reported in the
-// log and does not short-circuit (Rook may recover).
-//
-// Network errors are logged loud (WARN) after a few consecutive failures so a
-// dropped SSH tunnel surfaces in seconds instead of getting buried in Debug
-// output. See pollResourceUntilReady for the per-call deadline rationale.
-func WaitForCephClusterReady(ctx context.Context, kubeconfig *rest.Config, namespace, name string, timeout time.Duration) error {
-	return pollResourceUntilReady(
-		ctx, kubeconfig, CephClusterGVR, namespace, name,
-		timeout, 10*time.Second, "CephCluster",
-		func(obj *unstructured.Unstructured) (bool, string) {
-			state, _, _ := unstructured.NestedString(obj.Object, "status", "state")
-			health, _, _ := unstructured.NestedString(obj.Object, "status", "ceph", "health")
-			phase, _, _ := unstructured.NestedString(obj.Object, "status", "phase")
-
-			if state == "Created" || phase == "Ready" {
-				return true, fmt.Sprintf("state=%s phase=%s ceph health: %s", state, phase, health)
-			}
-			logger.Debug("CephCluster %s/%s state=%q phase=%q health=%q",
-				obj.GetNamespace(), obj.GetName(), state, phase, health)
-			return false, ""
-		},
-	)
-}
-
-// DeleteCephCluster removes a CephCluster. Tearing down the cluster this way
-// is a *destructive* operation — Rook will leave OSD data on host disks under
-// `dataDirHostPath` and operator-managed PVCs will not be garbage-collected
-// automatically. The operation is still idempotent: a NotFound error is
-// swallowed.
-//
-// NOTE: this is fire-and-forget. The apiserver returns success as soon as it
-// records the delete intent; Rook then runs its `cephcluster.ceph.rook.io`
-// finalizer for several minutes, removing pools, mon/mgr/osd pods, and so
-// on. If any dependent CR (CephBlockPool, CephFilesystem, ...) is still
-// alive, Rook records `DeletionIsBlocked / ObjectHasDependents` and the CR
-// stays in `phase=Deleting` indefinitely. Always tear down dependents first
-// (and call WaitForCephBlockPoolGone / WaitForCephFilesystemGone on them)
-// before invoking DeleteCephCluster, then follow up with
-// WaitForCephClusterGone.
-func DeleteCephCluster(ctx context.Context, kubeconfig *rest.Config, namespace, name string) error {
-	dynamicClient, err := NewDynamicClientWithRetry(ctx, kubeconfig)
-	if err != nil {
-		return fmt.Errorf("failed to create dynamic client: %w", err)
-	}
-
-	if err := dynamicClient.Resource(CephClusterGVR).Namespace(namespace).Delete(ctx, name, metav1.DeleteOptions{}); err != nil {
-		if apierrors.IsNotFound(err) {
-			return nil
-		}
-		return fmt.Errorf("failed to delete CephCluster %s/%s: %w", namespace, name, err)
-	}
-	logger.Info("Deleted CephCluster %s/%s", namespace, name)
-	return nil
-}
-
-// CephClusterGoneTimeout is the default budget for WaitForCephClusterGone.
-// Rook needs to drain mon/mgr/osd pods, remove the CRUSH map, and unset
-// finalizers — easily 5+ minutes on a single-OSD cluster, longer on
-// degraded ones.
-const CephClusterGoneTimeout = 10 * time.Minute
-
-// WaitForCephClusterGone polls until the CephCluster is fully GC'd by
-// Kubernetes (GET returns NotFound). The poller logs the
-// deletionTimestamp/finalizers progress periodically so a stuck finalizer
-// (typical e2e failure: orphan dependent CR, broken Ceph health) is
-// immediately visible in the test log instead of being hidden behind a
-// silent timeout.
-func WaitForCephClusterGone(ctx context.Context, kubeconfig *rest.Config, namespace, name string, timeout time.Duration) error {
-	if timeout <= 0 {
-		timeout = CephClusterGoneTimeout
-	}
-	return pollResourceUntilGone(
-		ctx, kubeconfig, CephClusterGVR, namespace, name,
-		timeout, PollTickInterval, "CephCluster",
-	)
-}
diff --git a/pkg/kubernetes/cephclusterconnection.go b/pkg/kubernetes/cephclusterconnection.go
deleted file mode 100644
index f8117db..0000000
--- a/pkg/kubernetes/cephclusterconnection.go
+++ /dev/null
@@ -1,313 +0,0 @@
-/*
-Copyright 2025 Flant JSC
-
-Licensed under the Apache License, Version 2.0 (the "License");
-you may not use this file except in compliance with the License.
-You may obtain a copy of the License at
-
-    http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License.
-*/
-
-package kubernetes
-
-import (
-	"context"
-	"fmt"
-	"time"
-
-	apierrors "k8s.io/apimachinery/pkg/api/errors"
-	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
-	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
-	"k8s.io/apimachinery/pkg/runtime/schema"
-	"k8s.io/client-go/rest"
-
-	"github.com/deckhouse/storage-e2e/internal/logger"
-)
-
-// GVRs of the csi-ceph cluster-scoped CRs. We use unstructured to avoid
-// pulling github.com/deckhouse/csi-ceph/api into go.mod just for these
-// tiny types.
-var (
-	CephClusterConnectionGVR = schema.GroupVersionResource{
-		Group:    "storage.deckhouse.io",
-		Version:  "v1alpha1",
-		Resource: "cephclusterconnections",
-	}
-	CephClusterAuthenticationGVR = schema.GroupVersionResource{
-		Group:    "storage.deckhouse.io",
-		Version:  "v1alpha1",
-		Resource: "cephclusterauthentications",
-	}
-)
-
-// CephClusterAuthenticationConfig describes CephX credentials that csi-ceph
-// reuses for every StorageClass that references the authentication.
-type CephClusterAuthenticationConfig struct {
-	// Name of the CephClusterAuthentication CR.
-	Name string
-	// UserID is the Ceph user (typically "admin").
-	UserID string
-	// UserKey is the CephX key of UserID.
-	UserKey string
-}
-
-// CreateCephClusterAuthentication creates (or updates) a
-// CephClusterAuthentication CR with the given CephX credentials.
-func CreateCephClusterAuthentication(ctx context.Context, kubeconfig *rest.Config, cfg CephClusterAuthenticationConfig) error {
-	if cfg.Name == "" {
-		return fmt.Errorf("CephClusterAuthentication name is required")
-	}
-	if cfg.UserID == "" {
-		return fmt.Errorf("CephClusterAuthentication UserID is required")
-	}
-	if cfg.UserKey == "" {
-		return fmt.Errorf("CephClusterAuthentication UserKey is required")
-	}
-
-	obj := &unstructured.Unstructured{
-		Object: map[string]interface{}{
-			"apiVersion": "storage.deckhouse.io/v1alpha1",
-			"kind":       "CephClusterAuthentication",
-			"metadata": map[string]interface{}{
-				"name": cfg.Name,
-			},
-			"spec": map[string]interface{}{
-				"userID":  cfg.UserID,
-				"userKey": cfg.UserKey,
-			},
-		},
-	}
-
-	dynamicClient, err := NewDynamicClientWithRetry(ctx, kubeconfig)
-	if err != nil {
-		return fmt.Errorf("failed to create dynamic client: %w", err)
-	}
-
-	logger.Info("Creating CephClusterAuthentication %s (userID=%s)", cfg.Name, cfg.UserID)
-	_, err = dynamicClient.Resource(CephClusterAuthenticationGVR).Create(ctx, obj, metav1.CreateOptions{})
-	if err == nil {
-		return nil
-	}
-	if !apierrors.IsAlreadyExists(err) {
-		return fmt.Errorf("failed to create CephClusterAuthentication %s: %w", cfg.Name, err)
-	}
-
-	logger.Info("CephClusterAuthentication %s already exists, updating spec", cfg.Name)
-	existing, err := dynamicClient.Resource(CephClusterAuthenticationGVR).Get(ctx, cfg.Name, metav1.GetOptions{})
-	if err != nil {
-		return fmt.Errorf("failed to fetch CephClusterAuthentication %s: %w", cfg.Name, err)
-	}
-	if err := errIfTerminating(existing, "CephClusterAuthentication", cfg.Name); err != nil {
-		return err
-	}
-	existing.Object["spec"] = obj.Object["spec"]
-	if _, err := dynamicClient.Resource(CephClusterAuthenticationGVR).Update(ctx, existing, metav1.UpdateOptions{}); err != nil {
-		return fmt.Errorf("failed to update CephClusterAuthentication %s: %w", cfg.Name, err)
-	}
-	return nil
-}
-
-// DeleteCephClusterAuthentication removes a CephClusterAuthentication.
-// NotFound is treated as success. Pair with WaitForCephClusterAuthenticationGone
-// when teardown order matters.
-func DeleteCephClusterAuthentication(ctx context.Context, kubeconfig *rest.Config, name string) error {
-	dynamicClient, err := NewDynamicClientWithRetry(ctx, kubeconfig)
-	if err != nil {
-		return fmt.Errorf("failed to create dynamic client: %w", err)
-	}
-	if err := dynamicClient.Resource(CephClusterAuthenticationGVR).Delete(ctx, name, metav1.DeleteOptions{}); err != nil {
-		if apierrors.IsNotFound(err) {
-			return nil
-		}
-		return fmt.Errorf("failed to delete CephClusterAuthentication %s: %w", name, err)
-	}
-	logger.Info("Deleted CephClusterAuthentication %s", name)
-	return nil
-}
-
-// CephClusterAuthenticationGoneTimeout is the default budget for
-// WaitForCephClusterAuthenticationGone. The CR has no heavy finalizer.
-const CephClusterAuthenticationGoneTimeout = 1 * time.Minute
-
-// WaitForCephClusterAuthenticationGone polls until the CephClusterAuthentication
-// is fully GC'd by Kubernetes (GET returns NotFound).
-func WaitForCephClusterAuthenticationGone(ctx context.Context, kubeconfig *rest.Config, name string, timeout time.Duration) error { - if timeout <= 0 { - timeout = CephClusterAuthenticationGoneTimeout - } - return pollResourceUntilGone( - ctx, kubeconfig, CephClusterAuthenticationGVR, "", name, - timeout, PollTickInterval, "CephClusterAuthentication", - ) -} - -// CephClusterConnectionConfig describes a csi-ceph CephClusterConnection CR. -// Its spec.clusterID (== Ceph fsid) is immutable once created. -type CephClusterConnectionConfig struct { - // Name of the CephClusterConnection CR. - Name string - // ClusterID is the Ceph fsid. Immutable after creation. - ClusterID string - // Monitors is the list of `ip:port` monitor endpoints. - Monitors []string - // UserID is the Ceph user (typically "admin"). - UserID string - // UserKey is the CephX key of UserID. - UserKey string -} - -// CreateCephClusterConnection creates (or updates) a CephClusterConnection CR. -// If the resource already exists we do *not* attempt to update spec.clusterID -// (which the CRD marks immutable) — only Monitors/UserID/UserKey are synced. 
-func CreateCephClusterConnection(ctx context.Context, kubeconfig *rest.Config, cfg CephClusterConnectionConfig) error { - if cfg.Name == "" { - return fmt.Errorf("CephClusterConnection name is required") - } - if cfg.ClusterID == "" { - return fmt.Errorf("CephClusterConnection ClusterID (fsid) is required") - } - if len(cfg.Monitors) == 0 { - return fmt.Errorf("CephClusterConnection Monitors is required") - } - - monitors := make([]interface{}, len(cfg.Monitors)) - for i, m := range cfg.Monitors { - monitors[i] = m - } - - obj := &unstructured.Unstructured{ - Object: map[string]interface{}{ - "apiVersion": "storage.deckhouse.io/v1alpha1", - "kind": "CephClusterConnection", - "metadata": map[string]interface{}{ - "name": cfg.Name, - }, - "spec": map[string]interface{}{ - "clusterID": cfg.ClusterID, - "monitors": monitors, - "userID": cfg.UserID, - "userKey": cfg.UserKey, - }, - }, - } - - dynamicClient, err := NewDynamicClientWithRetry(ctx, kubeconfig) - if err != nil { - return fmt.Errorf("failed to create dynamic client: %w", err) - } - - logger.Info("Creating CephClusterConnection %s (clusterID=%s, mons=%d)", cfg.Name, cfg.ClusterID, len(cfg.Monitors)) - _, err = dynamicClient.Resource(CephClusterConnectionGVR).Create(ctx, obj, metav1.CreateOptions{}) - if err == nil { - return nil - } - if !apierrors.IsAlreadyExists(err) { - return fmt.Errorf("failed to create CephClusterConnection %s: %w", cfg.Name, err) - } - - logger.Info("CephClusterConnection %s already exists, syncing monitors/userID/userKey", cfg.Name) - existing, err := dynamicClient.Resource(CephClusterConnectionGVR).Get(ctx, cfg.Name, metav1.GetOptions{}) - if err != nil { - return fmt.Errorf("failed to fetch CephClusterConnection %s: %w", cfg.Name, err) - } - if err := errIfTerminating(existing, "CephClusterConnection", cfg.Name); err != nil { - return err - } - if err := unstructured.SetNestedSlice(existing.Object, monitors, "spec", "monitors"); err != nil { - return fmt.Errorf("set monitors: %w", 
err) - } - if err := unstructured.SetNestedField(existing.Object, cfg.UserID, "spec", "userID"); err != nil { - return fmt.Errorf("set userID: %w", err) - } - if err := unstructured.SetNestedField(existing.Object, cfg.UserKey, "spec", "userKey"); err != nil { - return fmt.Errorf("set userKey: %w", err) - } - if _, err := dynamicClient.Resource(CephClusterConnectionGVR).Update(ctx, existing, metav1.UpdateOptions{}); err != nil { - return fmt.Errorf("failed to update CephClusterConnection %s: %w", cfg.Name, err) - } - return nil -} - -// DeleteCephClusterConnection removes a CephClusterConnection. -// NotFound is treated as success. Pair with WaitForCephClusterConnectionGone -// when teardown order matters. -func DeleteCephClusterConnection(ctx context.Context, kubeconfig *rest.Config, name string) error { - dynamicClient, err := NewDynamicClientWithRetry(ctx, kubeconfig) - if err != nil { - return fmt.Errorf("failed to create dynamic client: %w", err) - } - if err := dynamicClient.Resource(CephClusterConnectionGVR).Delete(ctx, name, metav1.DeleteOptions{}); err != nil { - if apierrors.IsNotFound(err) { - return nil - } - return fmt.Errorf("failed to delete CephClusterConnection %s: %w", name, err) - } - logger.Info("Deleted CephClusterConnection %s", name) - return nil -} - -// CephClusterConnectionGoneTimeout is the default budget for -// WaitForCephClusterConnectionGone. The CR has no heavy finalizer. -const CephClusterConnectionGoneTimeout = 1 * time.Minute - -// WaitForCephClusterConnectionGone polls until the CephClusterConnection is -// fully GC'd by Kubernetes (GET returns NotFound). 
-func WaitForCephClusterConnectionGone(ctx context.Context, kubeconfig *rest.Config, name string, timeout time.Duration) error { - if timeout <= 0 { - timeout = CephClusterConnectionGoneTimeout - } - return pollResourceUntilGone( - ctx, kubeconfig, CephClusterConnectionGVR, "", name, - timeout, PollTickInterval, "CephClusterConnection", - ) -} - -// WaitForCephClusterConnectionCreated polls until the CephClusterConnection -// status reports phase=Created. csi-ceph's controller flips the status from -// Pending to Created once it has verified the supplied fsid / monitors / -// CephX credentials against the real Ceph cluster. -func WaitForCephClusterConnectionCreated(ctx context.Context, kubeconfig *rest.Config, name string, timeout time.Duration) error { - if name == "" { - return fmt.Errorf("name is required") - } - - logger.Debug("Waiting for CephClusterConnection %s phase=Created (timeout: %v)", name, timeout) - - dynamicClient, err := NewDynamicClientWithRetry(ctx, kubeconfig) - if err != nil { - return fmt.Errorf("failed to create dynamic client: %w", err) - } - - ctx, cancel := context.WithTimeout(ctx, timeout) - defer cancel() - - ticker := time.NewTicker(5 * time.Second) - defer ticker.Stop() - - for { - obj, err := dynamicClient.Resource(CephClusterConnectionGVR).Get(ctx, name, metav1.GetOptions{}) - if err == nil { - phase, _, _ := unstructured.NestedString(obj.Object, "status", "phase") - reason, _, _ := unstructured.NestedString(obj.Object, "status", "reason") - if phase == "Created" { - logger.Success("CephClusterConnection %s is Created", name) - return nil - } - logger.Debug("CephClusterConnection %s phase=%q reason=%q", name, phase, reason) - } else if !apierrors.IsNotFound(err) { - logger.Debug("Error getting CephClusterConnection %s: %v", name, err) - } - - select { - case <-ctx.Done(): - return fmt.Errorf("timeout waiting for CephClusterConnection %s: %w", name, ctx.Err()) - case <-ticker.C: - } - } -} diff --git 
a/pkg/kubernetes/cephcredentials.go b/pkg/kubernetes/cephcredentials.go deleted file mode 100644 index 11f68ec..0000000 --- a/pkg/kubernetes/cephcredentials.go +++ /dev/null @@ -1,183 +0,0 @@ -/* -Copyright 2025 Flant JSC - -Licensed under the Apache License, Version 2.0 (the "License"); -you may not use this file except in compliance with the License. -You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. -*/ - -package kubernetes - -import ( - "context" - "fmt" - "sort" - "strings" - "time" - - apierrors "k8s.io/apimachinery/pkg/api/errors" - metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" - "k8s.io/client-go/rest" - - "github.com/deckhouse/storage-e2e/internal/logger" -) - -// Well-known Rook resources that hold Ceph connection data. -const ( - // RookMonSecretName is the Secret that the Rook operator populates with - // admin credentials and cluster fsid once the CephCluster is bootstrapped. - RookMonSecretName = "rook-ceph-mon" - - // RookMonEndpointsConfigMapName is the ConfigMap the operator keeps in - // sync with the current set of Ceph monitors. - RookMonEndpointsConfigMapName = "rook-ceph-mon-endpoints" -) - -// CephCredentials holds the information a Ceph CSI client needs to connect -// to a cluster bootstrapped by Rook. -type CephCredentials struct { - // FSID is the Ceph cluster unique identifier. - FSID string - - // AdminUser is the Ceph user name (typically "admin"). - AdminUser string - - // AdminKey is the CephX key for AdminUser. - AdminKey string - - // Monitors is the list of monitor endpoints in "IP:PORT" form, sorted - // alphabetically to make the output stable across runs. 
- Monitors []string -} - -// WaitForCephCredentials blocks until all pieces of information required to -// connect to the Rook-managed Ceph cluster are populated: -// - Secret `rook-ceph-mon` exists and has `fsid`, `ceph-username`, `ceph-secret`. -// - ConfigMap `rook-ceph-mon-endpoints` exists and has at least one reachable monitor. -// -// The returned CephCredentials is suitable for wiring csi-ceph CRs -// (CephClusterConnection, CephClusterAuthentication). -func WaitForCephCredentials(ctx context.Context, kubeconfig *rest.Config, namespace string, timeout time.Duration) (*CephCredentials, error) { - if namespace == "" { - return nil, fmt.Errorf("namespace is required") - } - - logger.Debug("Waiting for Ceph credentials in %s (timeout: %v)", namespace, timeout) - - clientset, err := NewClientsetWithRetry(ctx, kubeconfig) - if err != nil { - return nil, fmt.Errorf("failed to create clientset: %w", err) - } - - ctx, cancel := context.WithTimeout(ctx, timeout) - defer cancel() - - ticker := time.NewTicker(5 * time.Second) - defer ticker.Stop() - - for { - secret, err := clientset.CoreV1().Secrets(namespace).Get(ctx, RookMonSecretName, metav1.GetOptions{}) - if err != nil && !apierrors.IsNotFound(err) { - logger.Debug("Failed to get Secret %s/%s: %v", namespace, RookMonSecretName, err) - } - - cm, cmErr := clientset.CoreV1().ConfigMaps(namespace).Get(ctx, RookMonEndpointsConfigMapName, metav1.GetOptions{}) - if cmErr != nil && !apierrors.IsNotFound(cmErr) { - logger.Debug("Failed to get ConfigMap %s/%s: %v", namespace, RookMonEndpointsConfigMapName, cmErr) - } - - if err == nil && cmErr == nil { - creds, extractErr := extractCephCredentials(secret.Data, cm.Data) - if extractErr == nil { - logger.Success("Ceph credentials ready in %s (fsid=%s, %d monitor(s))", namespace, creds.FSID, len(creds.Monitors)) - return creds, nil - } - logger.Debug("Rook credentials not complete yet: %v", extractErr) - } - - select { - case <-ctx.Done(): - return nil, fmt.Errorf("timeout 
waiting for Ceph credentials in %s: %w", namespace, ctx.Err()) - case <-ticker.C: - } - } -} - -// extractCephCredentials parses the Rook-managed Secret/ConfigMap payloads -// into a CephCredentials struct. It returns an error if any required field -// is missing so the caller can keep polling until the operator has populated -// everything. -func extractCephCredentials(secretData map[string][]byte, cmData map[string]string) (*CephCredentials, error) { - fsid := strings.TrimSpace(string(secretData["fsid"])) - if fsid == "" { - return nil, fmt.Errorf("Secret %s is missing `fsid`", RookMonSecretName) - } - - adminUser := strings.TrimSpace(string(secretData["ceph-username"])) - if adminUser == "" { - adminUser = "client.admin" - } - adminUser = strings.TrimPrefix(adminUser, "client.") - - adminKey := strings.TrimSpace(string(secretData["ceph-secret"])) - if adminKey == "" { - return nil, fmt.Errorf("Secret %s is missing `ceph-secret`", RookMonSecretName) - } - - raw, ok := cmData["data"] - if !ok { - return nil, fmt.Errorf("ConfigMap %s is missing `data`", RookMonEndpointsConfigMapName) - } - monitors, err := parseMonEndpoints(raw) - if err != nil { - return nil, err - } - if len(monitors) == 0 { - return nil, fmt.Errorf("ConfigMap %s has no populated monitor endpoints", RookMonEndpointsConfigMapName) - } - - return &CephCredentials{ - FSID: fsid, - AdminUser: adminUser, - AdminKey: adminKey, - Monitors: monitors, - }, nil -} - -// parseMonEndpoints parses the Rook-maintained monitor endpoints string. -// -// Rook stores the current mon list in the `data` key of the -// `rook-ceph-mon-endpoints` ConfigMap as a comma-separated list of -// `<name>=<ip>:<port>` pairs, for example: -// -// a=10.0.0.1:6789,b=10.0.0.2:6789,c=10.0.0.3:6789 -// -// This helper returns just the `<ip>:<port>` portion of every entry, sorted -// alphabetically for stable output.
-func parseMonEndpoints(raw string) ([]string, error) { - out := []string{} - for _, part := range strings.Split(raw, ",") { - part = strings.TrimSpace(part) - if part == "" { - continue - } - // Strip the "=" prefix if present. - if idx := strings.Index(part, "="); idx >= 0 { - part = part[idx+1:] - } - if part == "" { - continue - } - out = append(out, part) - } - sort.Strings(out) - return out, nil -} diff --git a/pkg/kubernetes/cephfilesystem.go b/pkg/kubernetes/cephfilesystem.go index 91fab14..5088bd4 100644 --- a/pkg/kubernetes/cephfilesystem.go +++ b/pkg/kubernetes/cephfilesystem.go @@ -39,7 +39,7 @@ var CephFilesystemGVR = schema.GroupVersionResource{ // CephFilesystemConfig describes a minimal Rook CephFilesystem with one // metadata pool and exactly one data pool. Defaults are tuned for tiny -// single-node test clusters and mirror CephBlockPoolConfig conventions. +// single-node test clusters. type CephFilesystemConfig struct { // Name of the CephFilesystem CR. Name string diff --git a/pkg/kubernetes/cephstorageclass.go b/pkg/kubernetes/cephstorageclass.go deleted file mode 100644 index 942dd49..0000000 --- a/pkg/kubernetes/cephstorageclass.go +++ /dev/null @@ -1,252 +0,0 @@ -/* -Copyright 2025 Flant JSC - -Licensed under the Apache License, Version 2.0 (the "License"); -you may not use this file except in compliance with the License. -You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. 
-*/ - -package kubernetes - -import ( - "context" - "fmt" - "time" - - apierrors "k8s.io/apimachinery/pkg/api/errors" - metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" - "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured" - "k8s.io/apimachinery/pkg/runtime/schema" - "k8s.io/client-go/rest" - - "github.com/deckhouse/storage-e2e/internal/logger" -) - -// CephStorageClassGVR points at csi-ceph's CephStorageClass CR (not to be -// confused with Rook's CephCluster / CephBlockPool). -var CephStorageClassGVR = schema.GroupVersionResource{ - Group: "storage.deckhouse.io", - Version: "v1alpha1", - Resource: "cephstorageclasses", -} - -// Supported CephStorageClass types, mirroring csi-ceph's CRD enum. -const ( - CephStorageClassTypeRBD = "RBD" - CephStorageClassTypeCephFS = "CephFS" -) - -// CephStorageClassConfig is an intentionally narrow shape tailored for the -// e2e scenarios we care about today — an RBD StorageClass backed by a single -// block pool. CephFS variant is supported but requires FSName+FSPool to be -// set by the caller. -type CephStorageClassConfig struct { - // Name of the CephStorageClass CR (becomes the k8s StorageClass name). - Name string - - // ClusterConnectionName points at a CephClusterConnection CR. - ClusterConnectionName string - - // ClusterAuthenticationName points at a CephClusterAuthentication CR. - ClusterAuthenticationName string - - // ReclaimPolicy mirrors StorageClass.ReclaimPolicy ("Delete" / "Retain"). - // Default: "Delete". - ReclaimPolicy string - - // Type is "RBD" (default) or "CephFS". - Type string - - // --- RBD options (Type == "RBD") --- - - // RBDPool is the Ceph pool name (e.g. "ceph-rbd-r1"). - RBDPool string - - // RBDDefaultFSType picks the filesystem mkfs on volume attach. - // Default: "ext4". - RBDDefaultFSType string - - // --- CephFS options (Type == "CephFS") --- - CephFSName string // Name of the CephFilesystem. - CephFSPool string // Pool to use inside that filesystem. 
-} - -// CreateCephStorageClass creates (or updates) a CephStorageClass CR. On -// success the csi-ceph controller provisions a corresponding core -// storage.k8s.io/v1 StorageClass in the cluster. -func CreateCephStorageClass(ctx context.Context, kubeconfig *rest.Config, cfg CephStorageClassConfig) error { - if cfg.Name == "" { - return fmt.Errorf("CephStorageClass name is required") - } - if cfg.ClusterConnectionName == "" { - return fmt.Errorf("CephStorageClass ClusterConnectionName is required") - } - if cfg.ClusterAuthenticationName == "" { - return fmt.Errorf("CephStorageClass ClusterAuthenticationName is required") - } - if cfg.Type == "" { - cfg.Type = CephStorageClassTypeRBD - } - if cfg.ReclaimPolicy == "" { - cfg.ReclaimPolicy = "Delete" - } - - spec := map[string]interface{}{ - "clusterConnectionName": cfg.ClusterConnectionName, - "clusterAuthenticationName": cfg.ClusterAuthenticationName, - "reclaimPolicy": cfg.ReclaimPolicy, - "type": cfg.Type, - } - - switch cfg.Type { - case CephStorageClassTypeRBD: - if cfg.RBDPool == "" { - return fmt.Errorf("CephStorageClass of type RBD requires RBDPool") - } - if cfg.RBDDefaultFSType == "" { - cfg.RBDDefaultFSType = "ext4" - } - spec["rbd"] = map[string]interface{}{ - "defaultFSType": cfg.RBDDefaultFSType, - "pool": cfg.RBDPool, - } - case CephStorageClassTypeCephFS: - if cfg.CephFSName == "" || cfg.CephFSPool == "" { - return fmt.Errorf("CephStorageClass of type CephFS requires CephFSName and CephFSPool") - } - spec["cephFS"] = map[string]interface{}{ - "fsName": cfg.CephFSName, - "pool": cfg.CephFSPool, - } - default: - return fmt.Errorf("unsupported CephStorageClass Type: %s", cfg.Type) - } - - obj := &unstructured.Unstructured{ - Object: map[string]interface{}{ - "apiVersion": "storage.deckhouse.io/v1alpha1", - "kind": "CephStorageClass", - "metadata": map[string]interface{}{ - "name": cfg.Name, - }, - "spec": spec, - }, - } - - dynamicClient, err := NewDynamicClientWithRetry(ctx, kubeconfig) - if err != nil 
{ - return fmt.Errorf("failed to create dynamic client: %w", err) - } - - logger.Info("Creating CephStorageClass %s (type=%s, conn=%s, auth=%s)", - cfg.Name, cfg.Type, cfg.ClusterConnectionName, cfg.ClusterAuthenticationName) - _, err = dynamicClient.Resource(CephStorageClassGVR).Create(ctx, obj, metav1.CreateOptions{}) - if err == nil { - return nil - } - if !apierrors.IsAlreadyExists(err) { - return fmt.Errorf("failed to create CephStorageClass %s: %w", cfg.Name, err) - } - - logger.Info("CephStorageClass %s already exists, updating spec", cfg.Name) - existing, err := dynamicClient.Resource(CephStorageClassGVR).Get(ctx, cfg.Name, metav1.GetOptions{}) - if err != nil { - return fmt.Errorf("failed to fetch CephStorageClass %s: %w", cfg.Name, err) - } - if err := errIfTerminating(existing, "CephStorageClass", cfg.Name); err != nil { - return err - } - existing.Object["spec"] = spec - if _, err := dynamicClient.Resource(CephStorageClassGVR).Update(ctx, existing, metav1.UpdateOptions{}); err != nil { - return fmt.Errorf("failed to update CephStorageClass %s: %w", cfg.Name, err) - } - return nil -} - -// DeleteCephStorageClass removes a CephStorageClass. NotFound is treated as -// success. The underlying k8s StorageClass is removed by the csi-ceph -// controller as a side effect. Use WaitForCephStorageClassGone to confirm -// the CR is fully GC'd. 
-func DeleteCephStorageClass(ctx context.Context, kubeconfig *rest.Config, name string) error { - dynamicClient, err := NewDynamicClientWithRetry(ctx, kubeconfig) - if err != nil { - return fmt.Errorf("failed to create dynamic client: %w", err) - } - if err := dynamicClient.Resource(CephStorageClassGVR).Delete(ctx, name, metav1.DeleteOptions{}); err != nil { - if apierrors.IsNotFound(err) { - return nil - } - return fmt.Errorf("failed to delete CephStorageClass %s: %w", name, err) - } - logger.Info("Deleted CephStorageClass %s", name) - return nil -} - -// CephStorageClassGoneTimeout is the default budget for -// WaitForCephStorageClassGone. CephStorageClass has no heavyweight finalizer -// (csi-ceph just deletes the backing k8s StorageClass), so this typically -// completes in seconds. -const CephStorageClassGoneTimeout = 1 * time.Minute - -// WaitForCephStorageClassGone polls until the CephStorageClass is fully GC'd -// by Kubernetes (GET returns NotFound). -func WaitForCephStorageClassGone(ctx context.Context, kubeconfig *rest.Config, name string, timeout time.Duration) error { - if timeout <= 0 { - timeout = CephStorageClassGoneTimeout - } - return pollResourceUntilGone( - ctx, kubeconfig, CephStorageClassGVR, "", name, - timeout, PollTickInterval, "CephStorageClass", - ) -} - -// WaitForCephStorageClassCreated polls until the CephStorageClass status -// reports phase=Created (the csi-ceph controller flips this once the backing -// k8s StorageClass has been provisioned). 
-func WaitForCephStorageClassCreated(ctx context.Context, kubeconfig *rest.Config, name string, timeout time.Duration) error { - if name == "" { - return fmt.Errorf("name is required") - } - - logger.Debug("Waiting for CephStorageClass %s phase=Created (timeout: %v)", name, timeout) - - dynamicClient, err := NewDynamicClientWithRetry(ctx, kubeconfig) - if err != nil { - return fmt.Errorf("failed to create dynamic client: %w", err) - } - - ctx, cancel := context.WithTimeout(ctx, timeout) - defer cancel() - - ticker := time.NewTicker(3 * time.Second) - defer ticker.Stop() - - for { - obj, err := dynamicClient.Resource(CephStorageClassGVR).Get(ctx, name, metav1.GetOptions{}) - if err == nil { - phase, _, _ := unstructured.NestedString(obj.Object, "status", "phase") - reason, _, _ := unstructured.NestedString(obj.Object, "status", "reason") - if phase == "Created" { - logger.Success("CephStorageClass %s is Created", name) - return nil - } - logger.Debug("CephStorageClass %s phase=%q reason=%q", name, phase, reason) - } else if !apierrors.IsNotFound(err) { - logger.Debug("Error getting CephStorageClass %s: %v", name, err) - } - - select { - case <-ctx.Done(): - return fmt.Errorf("timeout waiting for CephStorageClass %s: %w", name, ctx.Err()) - case <-ticker.C: - } - } -} diff --git a/pkg/kubernetes/elasticcluster.go b/pkg/kubernetes/elasticcluster.go new file mode 100644 index 0000000..4e82357 --- /dev/null +++ b/pkg/kubernetes/elasticcluster.go @@ -0,0 +1,332 @@ +/* +Copyright 2026 Flant JSC + +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+See the License for the specific language governing permissions and +limitations under the License. +*/ + +package kubernetes + +import ( + "context" + "fmt" + "time" + + apierrors "k8s.io/apimachinery/pkg/api/errors" + metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" + "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured" + "k8s.io/apimachinery/pkg/runtime/schema" + "k8s.io/client-go/rest" + + "github.com/deckhouse/storage-e2e/internal/logger" +) + +// ElasticClusterGVR is the GroupVersionResource of the sds-elastic +// ElasticCluster CR. It is cluster-scoped (no namespace), so all dynamic +// calls below omit .Namespace(). The CR is the high-level entry point of the +// sds-elastic module: the controller turns it into a Rook CephCluster +// (renamed group internal.sdselastic.deckhouse.io) backed by LVM-local OSDs. +var ElasticClusterGVR = schema.GroupVersionResource{ + Group: "storage.deckhouse.io", + Version: "v1alpha1", + Resource: "elasticclusters", +} + +// Well-known ElasticCluster status condition types. Mirrored from +// sds-elastic/api/v1alpha1 (ECCondition*) and kept here as plain strings so +// storage-e2e does not take a build dependency on the sds-elastic module. +// Keep in sync with the api package. +const ( + ElasticClusterConditionReady = "Ready" + ElasticClusterConditionStorageReady = "StorageReady" + ElasticClusterConditionCephClusterReady = "CephClusterReady" + ElasticClusterConditionCredentialsReady = "CredentialsReady" + ElasticClusterConditionCsiCephReady = "CsiCephReady" +) + +// Well-known ElasticCluster teardown reasons set on the aggregate Ready +// condition while the CR is being deleted. Domain-level on purpose (they +// never name the underlying Rook/csi-ceph resources). Mirrored from +// sds-elastic/api/v1alpha1 (ECReason*). 
+const ( + ElasticClusterReasonStorageClassesExist = "StorageClassesExist" + ElasticClusterReasonVolumesExist = "VolumesExist" + ElasticClusterReasonTerminating = "Terminating" +) + +// ElasticClusterParams is the minimal description of an ElasticCluster the +// e2e suite needs to render. Selectors are expressed as plain matchLabels +// maps (the only selector form the suite exercises); spec.network is emitted +// only when both CIDRs are provided (otherwise Rook uses host networking on +// every storage-node IP, which is what the default e2e cluster wants). +type ElasticClusterParams struct { + // Name of the ElasticCluster (cluster-scoped, so no namespace). + Name string + + // NodeSelectorMatchLabels populates spec.storage.nodeSelector.matchLabels. + // Must be non-empty: it is how the controller picks storage nodes. + NodeSelectorMatchLabels map[string]string + + // BlockDeviceSelectorMatchLabels populates + // spec.storage.blockDeviceSelector.matchLabels. Must be non-empty: it is + // how the controller adopts BlockDevices for OSDs. + BlockDeviceSelectorMatchLabels map[string]string + + // NetworkPublic / NetworkCluster optionally pin spec.network.{public, + // cluster}. Both must be set together; otherwise spec.network is omitted. + NetworkPublic string + NetworkCluster string + + // Labels / Annotations are applied verbatim to metadata. + Labels map[string]string + Annotations map[string]string +} + +// buildElasticClusterObject renders the unstructured ElasticCluster object +// from params. It deliberately sets only the fields the e2e suite controls; +// everything else is left to the CRD defaults / controller. 
+func buildElasticClusterObject(params ElasticClusterParams) *unstructured.Unstructured { + storage := map[string]interface{}{ + "nodeSelector": map[string]interface{}{ + "matchLabels": toStringMapInterface(params.NodeSelectorMatchLabels), + }, + "blockDeviceSelector": map[string]interface{}{ + "matchLabels": toStringMapInterface(params.BlockDeviceSelectorMatchLabels), + }, + } + + spec := map[string]interface{}{ + "storage": storage, + } + if params.NetworkPublic != "" && params.NetworkCluster != "" { + spec["network"] = map[string]interface{}{ + "public": params.NetworkPublic, + "cluster": params.NetworkCluster, + } + } + + meta := map[string]interface{}{ + "name": params.Name, + } + if len(params.Labels) > 0 { + meta["labels"] = toStringMapInterface(params.Labels) + } + if len(params.Annotations) > 0 { + meta["annotations"] = toStringMapInterface(params.Annotations) + } + + return &unstructured.Unstructured{ + Object: map[string]interface{}{ + "apiVersion": ElasticClusterGVR.Group + "/" + ElasticClusterGVR.Version, + "kind": "ElasticCluster", + "metadata": meta, + "spec": spec, + }, + } +} + +// CreateElasticCluster creates (or updates the spec of) an ElasticCluster. +// Idempotent: re-running overwrites spec so callers can tweak ElasticClusterParams +// and re-apply. Fails fast if the existing CR is Terminating (its spec update +// would be a no-op while the finalizer unwinds, and a follow-up wait-Ready +// would hang on a never-Ready object). 
+func CreateElasticCluster(ctx context.Context, kubeconfig *rest.Config, params ElasticClusterParams) error { + if params.Name == "" { + return fmt.Errorf("ElasticCluster requires a Name") + } + if len(params.NodeSelectorMatchLabels) == 0 { + return fmt.Errorf("ElasticCluster %s requires NodeSelectorMatchLabels", params.Name) + } + if len(params.BlockDeviceSelectorMatchLabels) == 0 { + return fmt.Errorf("ElasticCluster %s requires BlockDeviceSelectorMatchLabels", params.Name) + } + + obj := buildElasticClusterObject(params) + + dynamicClient, err := NewDynamicClientWithRetry(ctx, kubeconfig) + if err != nil { + return fmt.Errorf("failed to create dynamic client: %w", err) + } + + logger.Info("Creating ElasticCluster %s (nodeSelector=%v, blockDeviceSelector=%v)", + params.Name, params.NodeSelectorMatchLabels, params.BlockDeviceSelectorMatchLabels) + + _, err = dynamicClient.Resource(ElasticClusterGVR).Create(ctx, obj, metav1.CreateOptions{}) + if err == nil { + logger.Success("ElasticCluster %s created", params.Name) + return nil + } + if !apierrors.IsAlreadyExists(err) { + return fmt.Errorf("failed to create ElasticCluster %s: %w", params.Name, err) + } + + logger.Info("ElasticCluster %s already exists, updating spec", params.Name) + existing, err := dynamicClient.Resource(ElasticClusterGVR).Get(ctx, params.Name, metav1.GetOptions{}) + if err != nil { + return fmt.Errorf("failed to fetch existing ElasticCluster %s: %w", params.Name, err) + } + if err := errIfTerminating(existing, "ElasticCluster", params.Name); err != nil { + return err + } + existing.Object["spec"] = obj.Object["spec"] + if _, err := dynamicClient.Resource(ElasticClusterGVR).Update(ctx, existing, metav1.UpdateOptions{}); err != nil { + return fmt.Errorf("failed to update ElasticCluster %s: %w", params.Name, err) + } + return nil +} + +// WaitForElasticClusterCondition blocks until the named ElasticCluster has a +// status condition of the given type observed at the wanted status (e.g. 
+// type="Ready", status="True"). It refuses to wait on a Terminating object — +// use GetElasticClusterCondition + a Gomega Eventually loop when you need to +// observe a teardown-guard reason on a CR that is already being deleted. +func WaitForElasticClusterCondition(ctx context.Context, kubeconfig *rest.Config, name, condType, wantStatus string, timeout time.Duration) error { + return pollResourceUntilReady( + ctx, kubeconfig, ElasticClusterGVR, "", name, + timeout, PollTickInterval, "ElasticCluster", + func(obj *unstructured.Unstructured) (bool, string) { + status, reason, message, found := findUnstructuredCondition(obj, condType) + if found && status == wantStatus { + return true, fmt.Sprintf("%s=%s reason=%s", condType, status, reason) + } + phase, _, _ := unstructured.NestedString(obj.Object, "status", "phase") + logger.Debug("ElasticCluster %s %s=%q (want %q) phase=%q reason=%q msg=%q", + name, condType, status, wantStatus, phase, reason, message) + return false, "" + }, + ) +} + +// WaitForElasticClusterReady waits for the aggregate Ready condition to flip +// to True (i.e. storage staged, Rook CephCluster up, credentials backed up, +// csi-ceph wired). +func WaitForElasticClusterReady(ctx context.Context, kubeconfig *rest.Config, name string, timeout time.Duration) error { + return WaitForElasticClusterCondition(ctx, kubeconfig, name, ElasticClusterConditionReady, "True", timeout) +} + +// GetElasticClusterCondition returns the (status, reason, message) of the +// named condition on the ElasticCluster, plus whether the condition exists. +// Single GET, no waiting — meant to be wrapped in a Gomega Eventually / +// Consistently when asserting teardown-guard reasons on a Terminating CR. 
+func GetElasticClusterCondition(ctx context.Context, kubeconfig *rest.Config, name, condType string) (status, reason, message string, found bool, err error) { + dynamicClient, err := NewDynamicClientWithRetry(ctx, kubeconfig) + if err != nil { + return "", "", "", false, fmt.Errorf("failed to create dynamic client: %w", err) + } + obj, err := dynamicClient.Resource(ElasticClusterGVR).Get(ctx, name, metav1.GetOptions{}) + if err != nil { + return "", "", "", false, fmt.Errorf("failed to get ElasticCluster %s: %w", name, err) + } + status, reason, message, found = findUnstructuredCondition(obj, condType) + return status, reason, message, found, nil +} + +// ElasticClusterCephTopology mirrors status.cephTopology — the effective +// mon/mgr counts the controller asked Rook to apply, plus the audit reason. +type ElasticClusterCephTopology struct { + MonCount int64 + MgrCount int64 + Reason string +} + +// GetElasticClusterCephTopology reads status.cephTopology of the named +// ElasticCluster. found is false when the controller has not recorded a +// topology yet (cluster still bootstrapping). 
+func GetElasticClusterCephTopology(ctx context.Context, kubeconfig *rest.Config, name string) (topology ElasticClusterCephTopology, found bool, err error) { + dynamicClient, err := NewDynamicClientWithRetry(ctx, kubeconfig) + if err != nil { + return ElasticClusterCephTopology{}, false, fmt.Errorf("failed to create dynamic client: %w", err) + } + obj, err := dynamicClient.Resource(ElasticClusterGVR).Get(ctx, name, metav1.GetOptions{}) + if err != nil { + return ElasticClusterCephTopology{}, false, fmt.Errorf("failed to get ElasticCluster %s: %w", name, err) + } + raw, ok, err := unstructured.NestedMap(obj.Object, "status", "cephTopology") + if err != nil || !ok || raw == nil { + return ElasticClusterCephTopology{}, false, nil + } + mon, _, _ := unstructured.NestedInt64(obj.Object, "status", "cephTopology", "monCount") + mgr, _, _ := unstructured.NestedInt64(obj.Object, "status", "cephTopology", "mgrCount") + reason, _, _ := unstructured.NestedString(obj.Object, "status", "cephTopology", "reason") + return ElasticClusterCephTopology{MonCount: mon, MgrCount: mgr, Reason: reason}, true, nil +} + +// DeleteElasticCluster removes an ElasticCluster. Idempotent (NotFound is +// swallowed). Fire-and-forget: the controller then runs its ordered teardown +// finalizer (delete CephCluster + csi-ceph wiring the operator cannot delete +// by hand). Follow with WaitForElasticClusterGone to be sure it is GC'd. 
+func DeleteElasticCluster(ctx context.Context, kubeconfig *rest.Config, name string) error { + dynamicClient, err := NewDynamicClientWithRetry(ctx, kubeconfig) + if err != nil { + return fmt.Errorf("failed to create dynamic client: %w", err) + } + if err := dynamicClient.Resource(ElasticClusterGVR).Delete(ctx, name, metav1.DeleteOptions{}); err != nil { + if apierrors.IsNotFound(err) { + return nil + } + return fmt.Errorf("failed to delete ElasticCluster %s: %w", name, err) + } + logger.Info("Deleted ElasticCluster %s (controller teardown in progress)", name) + return nil +} + +// ElasticClusterGoneTimeout is the default budget for WaitForElasticClusterGone. +// The controller tears down the whole Rook CephCluster (mon/mgr/osd drain, +// CRUSH map removal) before releasing the finalizer — easily 10+ minutes. +const ElasticClusterGoneTimeout = 15 * time.Minute + +// WaitForElasticClusterGone polls until the ElasticCluster GET returns +// NotFound (Kubernetes has GC'd it after the controller finalizer completed). +func WaitForElasticClusterGone(ctx context.Context, kubeconfig *rest.Config, name string, timeout time.Duration) error { + if timeout <= 0 { + timeout = ElasticClusterGoneTimeout + } + return pollResourceUntilGone( + ctx, kubeconfig, ElasticClusterGVR, "", name, + timeout, PollTickInterval, "ElasticCluster", + ) +} + +// findUnstructuredCondition extracts the (status, reason, message) of a +// metav1.Condition-shaped entry from obj.status.conditions. Shared by the +// ElasticCluster and ElasticStorageClass helpers in this package. 
+func findUnstructuredCondition(obj *unstructured.Unstructured, condType string) (status, reason, message string, found bool) { + conditions, ok, err := unstructured.NestedSlice(obj.Object, "status", "conditions") + if err != nil || !ok { + return "", "", "", false + } + for _, c := range conditions { + m, ok := c.(map[string]interface{}) + if !ok { + continue + } + t, _, _ := unstructured.NestedString(m, "type") + if t != condType { + continue + } + status, _, _ = unstructured.NestedString(m, "status") + reason, _, _ = unstructured.NestedString(m, "reason") + message, _, _ = unstructured.NestedString(m, "message") + return status, reason, message, true + } + return "", "", "", false +} + +// toStringMapInterface converts a map[string]string to map[string]interface{} +// so it can be embedded into an unstructured object tree. +func toStringMapInterface(in map[string]string) map[string]interface{} { + out := make(map[string]interface{}, len(in)) + for k, v := range in { + out[k] = v + } + return out +} diff --git a/pkg/kubernetes/elasticrook.go b/pkg/kubernetes/elasticrook.go new file mode 100644 index 0000000..87dcac3 --- /dev/null +++ b/pkg/kubernetes/elasticrook.go @@ -0,0 +1,163 @@ +/* +Copyright 2026 Flant JSC + +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. 
+*/ + +package kubernetes + +import ( + "context" + "fmt" + "time" + + metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" + "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured" + "k8s.io/apimachinery/pkg/runtime/schema" + "k8s.io/client-go/rest" + + "github.com/deckhouse/storage-e2e/internal/logger" +) + +// The sds-elastic module ships a vendored Rook operator whose API group is +// renamed from the upstream ceph.rook.io to internal.sdselastic.deckhouse.io +// (see sds-elastic images/operator/patches). The verifiers below address the +// renamed group so the e2e suite can assert that the Rook resources the +// sds-elastic controller created are healthy AND that no upstream ceph.rook.io +// objects leaked onto the cluster (handled at the suite level via discovery). +const ( + // ElasticRookGroup is the renamed Rook API group used by sds-elastic. + ElasticRookGroup = "internal.sdselastic.deckhouse.io" + // ElasticRookVersion is the renamed Rook API version. + ElasticRookVersion = "v1" + // UpstreamRookGroup is the upstream Rook API group that MUST NOT appear + // on a cluster running sds-elastic. + UpstreamRookGroup = "ceph.rook.io" +) + +var ( + // ElasticRookCephClusterGVR is the renamed-group CephCluster GVR. + ElasticRookCephClusterGVR = schema.GroupVersionResource{ + Group: ElasticRookGroup, + Version: ElasticRookVersion, + Resource: "cephclusters", + } + // ElasticRookCephBlockPoolGVR is the renamed-group CephBlockPool GVR. + ElasticRookCephBlockPoolGVR = schema.GroupVersionResource{ + Group: ElasticRookGroup, + Version: ElasticRookVersion, + Resource: "cephblockpools", + } + // ElasticRookCephFilesystemGVR is the renamed-group CephFilesystem GVR. + ElasticRookCephFilesystemGVR = schema.GroupVersionResource{ + Group: ElasticRookGroup, + Version: ElasticRookVersion, + Resource: "cephfilesystems", + } +) + +// WaitForElasticRookCephClusterReady blocks until the renamed-group +// CephCluster reports state=Created (or phase=Ready). 
Mirrors the readiness +// logic of WaitForCephClusterReady but against internal.sdselastic.deckhouse.io. +func WaitForElasticRookCephClusterReady(ctx context.Context, kubeconfig *rest.Config, namespace, name string, timeout time.Duration) error { + return pollResourceUntilReady( + ctx, kubeconfig, ElasticRookCephClusterGVR, namespace, name, + timeout, 10*time.Second, "Rook CephCluster", + func(obj *unstructured.Unstructured) (bool, string) { + state, _, _ := unstructured.NestedString(obj.Object, "status", "state") + health, _, _ := unstructured.NestedString(obj.Object, "status", "ceph", "health") + phase, _, _ := unstructured.NestedString(obj.Object, "status", "phase") + if state == "Created" || phase == "Ready" { + return true, fmt.Sprintf("state=%s phase=%s ceph health: %s", state, phase, health) + } + logger.Debug("Rook CephCluster %s/%s state=%q phase=%q health=%q", + obj.GetNamespace(), obj.GetName(), state, phase, health) + return false, "" + }, + ) +} + +// WaitForElasticRookCephBlockPoolReady blocks until the renamed-group +// CephBlockPool reports status.phase=Ready. +func WaitForElasticRookCephBlockPoolReady(ctx context.Context, kubeconfig *rest.Config, namespace, name string, timeout time.Duration) error { + return pollResourceUntilReady( + ctx, kubeconfig, ElasticRookCephBlockPoolGVR, namespace, name, + timeout, PollTickInterval, "Rook CephBlockPool", + func(obj *unstructured.Unstructured) (bool, string) { + phase, _, _ := unstructured.NestedString(obj.Object, "status", "phase") + if phase == "Ready" { + return true, "phase=Ready" + } + logger.Debug("Rook CephBlockPool %s/%s phase=%q", obj.GetNamespace(), obj.GetName(), phase) + return false, "" + }, + ) +} + +// WaitForElasticRookCephFilesystemReady blocks until the renamed-group +// CephFilesystem reports status.phase=Ready. 
+func WaitForElasticRookCephFilesystemReady(ctx context.Context, kubeconfig *rest.Config, namespace, name string, timeout time.Duration) error { + return pollResourceUntilReady( + ctx, kubeconfig, ElasticRookCephFilesystemGVR, namespace, name, + timeout, PollTickInterval, "Rook CephFilesystem", + func(obj *unstructured.Unstructured) (bool, string) { + phase, _, _ := unstructured.NestedString(obj.Object, "status", "phase") + if phase == "Ready" { + return true, "phase=Ready" + } + logger.Debug("Rook CephFilesystem %s/%s phase=%q", obj.GetNamespace(), obj.GetName(), phase) + return false, "" + }, + ) +} + +// ListElasticRookCephClusterNames returns the names of all renamed-group +// CephClusters in the namespace. Used to assert the sds-elastic controller +// created exactly the CephCluster(s) it should have. +func ListElasticRookCephClusterNames(ctx context.Context, kubeconfig *rest.Config, namespace string) ([]string, error) { + dynamicClient, err := NewDynamicClientWithRetry(ctx, kubeconfig) + if err != nil { + return nil, fmt.Errorf("failed to create dynamic client: %w", err) + } + list, err := dynamicClient.Resource(ElasticRookCephClusterGVR).Namespace(namespace).List(ctx, metav1.ListOptions{}) + if err != nil { + return nil, fmt.Errorf("failed to list Rook CephClusters in %s: %w", namespace, err) + } + names := make([]string, 0, len(list.Items)) + for i := range list.Items { + names = append(names, list.Items[i].GetName()) + } + return names, nil +} + +// ServerHasAPIGroup reports whether the apiserver advertises the given API +// group in discovery. The e2e suite uses it to assert that the upstream +// ceph.rook.io group is absent on a cluster running sds-elastic (the module +// renames Rook to internal.sdselastic.deckhouse.io to avoid clobbering a +// user-installed upstream Rook). 
+func ServerHasAPIGroup(ctx context.Context, kubeconfig *rest.Config, group string) (bool, error) { + clientset, err := NewClientsetWithRetry(ctx, kubeconfig) + if err != nil { + return false, fmt.Errorf("failed to create clientset: %w", err) + } + groups, err := clientset.Discovery().ServerGroups() + if err != nil { + return false, fmt.Errorf("failed to list server API groups: %w", err) + } + for i := range groups.Groups { + if groups.Groups[i].Name == group { + return true, nil + } + } + return false, nil +} diff --git a/pkg/kubernetes/elasticstorageclass.go b/pkg/kubernetes/elasticstorageclass.go new file mode 100644 index 0000000..134765f --- /dev/null +++ b/pkg/kubernetes/elasticstorageclass.go @@ -0,0 +1,296 @@ +/* +Copyright 2026 Flant JSC + +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +*/ + +package kubernetes + +import ( + "context" + "fmt" + "time" + + apierrors "k8s.io/apimachinery/pkg/api/errors" + metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" + "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured" + "k8s.io/apimachinery/pkg/runtime/schema" + "k8s.io/client-go/rest" + + "github.com/deckhouse/storage-e2e/internal/logger" +) + +// ElasticStorageClassGVR is the GroupVersionResource of the sds-elastic +// ElasticStorageClass CR (cluster-scoped). The controller maps it to a Ceph +// pool/filesystem plus a 1:1-named csi-ceph CephStorageClass and a core +// storage.k8s.io/v1 StorageClass of the same name. 
+var ElasticStorageClassGVR = schema.GroupVersionResource{ + Group: "storage.deckhouse.io", + Version: "v1alpha1", + Resource: "elasticstorageclasses", +} + +// ElasticStorageClass spec enums (mirrored from sds-elastic/api/v1alpha1 so +// storage-e2e stays free of a build dependency on the module). Keep in sync. +const ( + ElasticStorageClassTypeRBD = "RBD" + ElasticStorageClassTypeCephFS = "CephFS" + + ElasticReplicationAvailabilityWithoutConsistency = "AvailabilityWithoutConsistency" + ElasticReplicationConsistencyAndAvailability = "ConsistencyAndAvailability" + ElasticReplicationHighRedundancy = "HighRedundancy" + ElasticReplicationErasureCodedCompact = "ErasureCodedCompact" +) + +// Well-known ElasticStorageClass status condition types (mirror of +// ESCCondition* in the api package). +const ( + ElasticStorageClassConditionReady = "Ready" + ElasticStorageClassConditionPoolReady = "PoolReady" + ElasticStorageClassConditionCsiStorageClassReady = "CsiStorageClassReady" +) + +// Well-known ElasticStorageClass teardown reasons set on the aggregate Ready +// condition while the CR is being deleted (mirror of ESCReason* in the api +// package). +const ( + ElasticStorageClassReasonBoundVolumesExist = "BoundVolumesExist" + ElasticStorageClassReasonDataPresentInPool = "DataPresentInPool" + ElasticStorageClassReasonFilesystemNotEmpty = "FilesystemNotEmpty" + ElasticStorageClassReasonTerminating = "Terminating" +) + +// ElasticStorageClassForceDeleteAnnotation, set to "true" on an +// ElasticStorageClass, authorises the destructive purge of a non-empty RBD +// pool (the controller propagates it to the underlying CephBlockPool as the +// Rook force-deletion annotation). It NEVER bypasses the bound-PV guard. +// Mirror of v1alpha1 ESCForceDeleteAnnotation. +const ElasticStorageClassForceDeleteAnnotation = "sds-elastic.deckhouse.io/force-deletion" + +// ElasticStorageClassParams is the minimal description of an +// ElasticStorageClass the e2e suite renders. 
+type ElasticStorageClassParams struct { + // Name of the ElasticStorageClass; also the name of the resulting + // csi-ceph CephStorageClass and the core k8s StorageClass. + Name string + + // ClusterRef is the ElasticCluster this ESC belongs to. Required. + ClusterRef string + + // Type selects RBD (block) or CephFS (shared filesystem). Required. + Type string + + // Replication picks the high-level replication strategy. Empty defaults + // to ConsistencyAndAvailability (the CRD default). + Replication string + + // Labels / Annotations are applied verbatim to metadata. + Labels map[string]string + Annotations map[string]string +} + +func buildElasticStorageClassObject(params ElasticStorageClassParams) *unstructured.Unstructured { + spec := map[string]interface{}{ + "clusterRef": params.ClusterRef, + "type": params.Type, + } + if params.Replication != "" { + spec["replication"] = params.Replication + } + + meta := map[string]interface{}{ + "name": params.Name, + } + if len(params.Labels) > 0 { + meta["labels"] = toStringMapInterface(params.Labels) + } + if len(params.Annotations) > 0 { + meta["annotations"] = toStringMapInterface(params.Annotations) + } + + return &unstructured.Unstructured{ + Object: map[string]interface{}{ + "apiVersion": ElasticStorageClassGVR.Group + "/" + ElasticStorageClassGVR.Version, + "kind": "ElasticStorageClass", + "metadata": meta, + "spec": spec, + }, + } +} + +// CreateElasticStorageClass creates (or updates the spec of) an +// ElasticStorageClass. Idempotent; fails fast on a Terminating existing CR. 
+func CreateElasticStorageClass(ctx context.Context, kubeconfig *rest.Config, params ElasticStorageClassParams) error { + if params.Name == "" { + return fmt.Errorf("ElasticStorageClass requires a Name") + } + if params.ClusterRef == "" { + return fmt.Errorf("ElasticStorageClass %s requires a ClusterRef", params.Name) + } + if params.Type == "" { + return fmt.Errorf("ElasticStorageClass %s requires a Type (RBD or CephFS)", params.Name) + } + + obj := buildElasticStorageClassObject(params) + + dynamicClient, err := NewDynamicClientWithRetry(ctx, kubeconfig) + if err != nil { + return fmt.Errorf("failed to create dynamic client: %w", err) + } + + logger.Info("Creating ElasticStorageClass %s (clusterRef=%s, type=%s, replication=%s)", + params.Name, params.ClusterRef, params.Type, params.Replication) + + _, err = dynamicClient.Resource(ElasticStorageClassGVR).Create(ctx, obj, metav1.CreateOptions{}) + if err == nil { + logger.Success("ElasticStorageClass %s created", params.Name) + return nil + } + if !apierrors.IsAlreadyExists(err) { + return fmt.Errorf("failed to create ElasticStorageClass %s: %w", params.Name, err) + } + + logger.Info("ElasticStorageClass %s already exists, updating spec", params.Name) + existing, err := dynamicClient.Resource(ElasticStorageClassGVR).Get(ctx, params.Name, metav1.GetOptions{}) + if err != nil { + return fmt.Errorf("failed to fetch existing ElasticStorageClass %s: %w", params.Name, err) + } + if err := errIfTerminating(existing, "ElasticStorageClass", params.Name); err != nil { + return err + } + existing.Object["spec"] = obj.Object["spec"] + if _, err := dynamicClient.Resource(ElasticStorageClassGVR).Update(ctx, existing, metav1.UpdateOptions{}); err != nil { + return fmt.Errorf("failed to update ElasticStorageClass %s: %w", params.Name, err) + } + return nil +} + +// WaitForElasticStorageClassCondition blocks until the named ESC has a status +// condition of the given type at the wanted status. 
Refuses to wait on a +// Terminating object (see WaitForElasticClusterCondition). +func WaitForElasticStorageClassCondition(ctx context.Context, kubeconfig *rest.Config, name, condType, wantStatus string, timeout time.Duration) error { + return pollResourceUntilReady( + ctx, kubeconfig, ElasticStorageClassGVR, "", name, + timeout, PollTickInterval, "ElasticStorageClass", + func(obj *unstructured.Unstructured) (bool, string) { + status, reason, message, found := findUnstructuredCondition(obj, condType) + if found && status == wantStatus { + return true, fmt.Sprintf("%s=%s reason=%s", condType, status, reason) + } + phase, _, _ := unstructured.NestedString(obj.Object, "status", "phase") + logger.Debug("ElasticStorageClass %s %s=%q (want %q) phase=%q reason=%q msg=%q", + name, condType, status, wantStatus, phase, reason, message) + return false, "" + }, + ) +} + +// WaitForElasticStorageClassReady waits for the aggregate Ready condition to +// flip to True (pool/filesystem provisioned, csi-ceph SC materialised). +func WaitForElasticStorageClassReady(ctx context.Context, kubeconfig *rest.Config, name string, timeout time.Duration) error { + return WaitForElasticStorageClassCondition(ctx, kubeconfig, name, ElasticStorageClassConditionReady, "True", timeout) +} + +// GetElasticStorageClassCondition returns the (status, reason, message) of the +// named condition on the ESC, plus whether it exists. Single GET; wrap in a +// Gomega Eventually/Consistently to assert teardown-guard reasons. 
+func GetElasticStorageClassCondition(ctx context.Context, kubeconfig *rest.Config, name, condType string) (status, reason, message string, found bool, err error) { + dynamicClient, err := NewDynamicClientWithRetry(ctx, kubeconfig) + if err != nil { + return "", "", "", false, fmt.Errorf("failed to create dynamic client: %w", err) + } + obj, err := dynamicClient.Resource(ElasticStorageClassGVR).Get(ctx, name, metav1.GetOptions{}) + if err != nil { + return "", "", "", false, fmt.Errorf("failed to get ElasticStorageClass %s: %w", name, err) + } + status, reason, message, found = findUnstructuredCondition(obj, condType) + return status, reason, message, found, nil +} + +// AnnotateElasticStorageClassForceDeletion sets the force-deletion annotation +// on the named ESC, authorising the destructive purge of a non-empty RBD +// pool. It never bypasses the bound-PV guard. Idempotent; retries on +// optimistic-concurrency conflicts. +func AnnotateElasticStorageClassForceDeletion(ctx context.Context, kubeconfig *rest.Config, name string) error { + dynamicClient, err := NewDynamicClientWithRetry(ctx, kubeconfig) + if err != nil { + return fmt.Errorf("failed to create dynamic client: %w", err) + } + + const maxRetries = 5 + var lastErr error + for attempt := 0; attempt < maxRetries; attempt++ { + existing, err := dynamicClient.Resource(ElasticStorageClassGVR).Get(ctx, name, metav1.GetOptions{}) + if err != nil { + return fmt.Errorf("failed to get ElasticStorageClass %s: %w", name, err) + } + annotations := existing.GetAnnotations() + if annotations[ElasticStorageClassForceDeleteAnnotation] == "true" { + logger.Debug("ElasticStorageClass %s already has force-deletion annotation", name) + return nil + } + if annotations == nil { + annotations = map[string]string{} + } + annotations[ElasticStorageClassForceDeleteAnnotation] = "true" + existing.SetAnnotations(annotations) + + _, lastErr = dynamicClient.Resource(ElasticStorageClassGVR).Update(ctx, existing, 
metav1.UpdateOptions{}) + if lastErr == nil { + logger.Info("Annotated ElasticStorageClass %s with %s=true", name, ElasticStorageClassForceDeleteAnnotation) + return nil + } + if apierrors.IsConflict(lastErr) { + logger.Debug("Conflict annotating ElasticStorageClass %s (attempt %d/%d), retrying...", name, attempt+1, maxRetries) + continue + } + return fmt.Errorf("failed to annotate ElasticStorageClass %s: %w", name, lastErr) + } + return fmt.Errorf("failed to annotate ElasticStorageClass %s after %d attempts: %w", name, maxRetries, lastErr) +} + +// DeleteElasticStorageClass removes an ElasticStorageClass. Idempotent. +// Fire-and-forget: the controller runs the destructive pool/filesystem +// teardown (and the bound-PV guard) under its finalizer. Follow with +// WaitForElasticStorageClassGone. +func DeleteElasticStorageClass(ctx context.Context, kubeconfig *rest.Config, name string) error { + dynamicClient, err := NewDynamicClientWithRetry(ctx, kubeconfig) + if err != nil { + return fmt.Errorf("failed to create dynamic client: %w", err) + } + if err := dynamicClient.Resource(ElasticStorageClassGVR).Delete(ctx, name, metav1.DeleteOptions{}); err != nil { + if apierrors.IsNotFound(err) { + return nil + } + return fmt.Errorf("failed to delete ElasticStorageClass %s: %w", name, err) + } + logger.Info("Deleted ElasticStorageClass %s (controller teardown in progress)", name) + return nil +} + +// ElasticStorageClassGoneTimeout is the default budget for +// WaitForElasticStorageClassGone. Pool/filesystem teardown plus the csi-ceph +// SC removal take a few minutes; a force-deletion purge of a populated pool +// can take longer. +const ElasticStorageClassGoneTimeout = 10 * time.Minute + +// WaitForElasticStorageClassGone polls until the ESC GET returns NotFound. 
+func WaitForElasticStorageClassGone(ctx context.Context, kubeconfig *rest.Config, name string, timeout time.Duration) error { + if timeout <= 0 { + timeout = ElasticStorageClassGoneTimeout + } + return pollResourceUntilGone( + ctx, kubeconfig, ElasticStorageClassGVR, "", name, + timeout, PollTickInterval, "ElasticStorageClass", + ) +} diff --git a/pkg/kubernetes/nodegroup.go b/pkg/kubernetes/nodegroup.go index 7686fed..f55e348 100644 --- a/pkg/kubernetes/nodegroup.go +++ b/pkg/kubernetes/nodegroup.go @@ -19,29 +19,52 @@ package kubernetes import ( "context" "fmt" + "time" "k8s.io/apimachinery/pkg/api/errors" "k8s.io/client-go/rest" "github.com/deckhouse/storage-e2e/internal/kubernetes/deckhouse" + "github.com/deckhouse/storage-e2e/pkg/retry" ) -// CreateStaticNodeGroup creates a NodeGroup resource with Static nodeType +// CreateStaticNodeGroup creates a NodeGroup resource with Static nodeType. +// +// Right after bootstrap the node-manager validating webhook +// (node-controller-webhook in d8-cloud-instance-manager) is frequently not +// reachable yet, so the apiserver rejects the create with a transient +// InternalError ("failed calling webhook ... connect: operation not +// permitted"). We retry with backoff until the webhook converges; +// retry.IsRetryable already classifies both InternalError and +// "failed calling webhook" as transient. The loop is bounded by the caller's +// context (config.NodeGroupTimeout). 
func CreateStaticNodeGroup(ctx context.Context, config *rest.Config, name string) error { - // Check if NodeGroup already exists - _, err := deckhouse.GetNodeGroup(ctx, config, name) - if err == nil { - // NodeGroup already exists, nothing to do - return nil - } - if !errors.IsNotFound(err) { - return fmt.Errorf("failed to check if nodegroup %s exists: %w", name, err) + retryCfg := retry.Config{ + MaxRetries: 30, + InitialWait: 2 * time.Second, + MaxWait: 15 * time.Second, + Backoff: 1.5, + LogRetries: true, } - // Create NodeGroup with Static nodeType - if err := deckhouse.CreateNodeGroup(ctx, config, name, "Static"); err != nil { - return fmt.Errorf("failed to create nodegroup %s: %w", name, err) - } + return retry.DoVoid(ctx, retryCfg, fmt.Sprintf("create NodeGroup %s", name), func() error { + // Check if NodeGroup already exists. A previous (retried) attempt may + // have created it even though we never saw a success response, so this + // keeps the operation idempotent across retries. + _, err := deckhouse.GetNodeGroup(ctx, config, name) + if err == nil { + // NodeGroup already exists, nothing to do + return nil + } + if !errors.IsNotFound(err) { + return fmt.Errorf("failed to check if nodegroup %s exists: %w", name, err) + } - return nil + // Create NodeGroup with Static nodeType + if err := deckhouse.CreateNodeGroup(ctx, config, name, "Static"); err != nil { + return fmt.Errorf("failed to create nodegroup %s: %w", name, err) + } + + return nil + }) } diff --git a/pkg/kubernetes/rookconfigoverride.go b/pkg/kubernetes/rookconfigoverride.go index dab8aad..5e30439 100644 --- a/pkg/kubernetes/rookconfigoverride.go +++ b/pkg/kubernetes/rookconfigoverride.go @@ -30,6 +30,12 @@ import ( "github.com/deckhouse/storage-e2e/internal/logger" ) +// DefaultRookNamespace is the namespace the sds-elastic module deploys its +// vendored Rook operator (and the rook-config-override ConfigMap) into. 
It is +// the default scope for the Rook-daemon helpers that survive the removal of +// the raw-Rook cluster builders. +const DefaultRookNamespace = "d8-sds-elastic" + // RookConfigOverrideName is the well-known ConfigMap name Rook reads Ceph // config overrides from (see Rook docs: "Advanced Configuration – Custom // ceph.conf Settings"). Rook watches this ConfigMap in its operator namespace diff --git a/pkg/testkit/ceph.go b/pkg/testkit/ceph.go deleted file mode 100644 index 7427967..0000000 --- a/pkg/testkit/ceph.go +++ /dev/null @@ -1,622 +0,0 @@ -/* -Copyright 2025 Flant JSC - -Licensed under the Apache License, Version 2.0 (the "License"); -you may not use this file except in compliance with the License. -You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. -*/ - -package testkit - -import ( - "context" - "fmt" - "time" - - "k8s.io/client-go/rest" - - "github.com/deckhouse/storage-e2e/internal/infrastructure/ssh" - "github.com/deckhouse/storage-e2e/internal/logger" - "github.com/deckhouse/storage-e2e/pkg/kubernetes" -) - -// Re-exports of the supported CephStorageClass types so callers don't have -// to import the lower-level pkg/kubernetes package just to set cfg.Type. -const ( - CephStorageClassTypeRBD = kubernetes.CephStorageClassTypeRBD - CephStorageClassTypeCephFS = kubernetes.CephStorageClassTypeCephFS -) - -// CephStorageClassConfig controls the end-to-end provisioning of a -// Rook-managed Ceph cluster plus a csi-ceph-backed k8s StorageClass: -// -// 1. Enables Deckhouse modules required for the stack: -// sds-node-configurator, sds-elastic (Rook), csi-ceph. -// 2. 
(Optional) Falls back to EnsureDefaultStorageClass to produce a -// sds-local-volume StorageClass for backing OSD PVCs. -// 3. Seeds `rook-config-override` with per-test global Ceph settings -// (e.g. `ms_crc_data = false` for the PR #131 scenario). -// 4. Creates a CephCluster (Rook) and waits until it is Created. -// 5. Creates a CephBlockPool and waits until it is Ready. -// 6. Reads fsid / monitors / CephX admin key from Rook-managed secrets -// and wires them into CephClusterConnection + CephClusterAuthentication -// CRs so csi-ceph can talk to the cluster. -// 7. Creates a CephStorageClass CR and waits for the csi-ceph controller -// to materialize a core storage.k8s.io/v1 StorageClass. -// -// Only StorageClassName is strictly required; everything else has sensible -// defaults tuned for single-node / tiny test clusters. -type CephStorageClassConfig struct { - // --- Top-level identity --- - - // StorageClassName is the name of the CephStorageClass CR (and of the - // resulting k8s StorageClass). Required. - StorageClassName string - - // Namespace is the Rook / sds-elastic namespace. Default: "d8-sds-elastic". - Namespace string - - // --- sds-elastic / Rook CephCluster --- - - // CephClusterName is the Rook CephCluster name. Default: "ceph-cluster". - CephClusterName string - - // CephImage is the Ceph container image tag. Default: "quay.io/ceph/ceph:v18.2.7". - CephImage string - - // MonCount / MgrCount are the Rook mon/mgr replica counts. - // Defaults: 1 / 1 (good for 1..3 node test clusters). - MonCount int - MgrCount int - - // NetworkProvider: "" for CNI (default), "host" for host networking. - NetworkProvider string - PublicNetworkCIDRs []string - ClusterNetworkCIDRs []string - - // GlobalCephConfigOverrides populates `rook-config-override` under - // `[global]`, e.g. {"ms_crc_data": "false"}. nil / empty map leaves - // the ConfigMap untouched except for creating it as an empty `[global]`. 
- GlobalCephConfigOverrides map[string]string - - // --- OSD backing --- - - // OSDStorageClass is a block-capable StorageClass used to back OSD PVCs. - // When empty, EnsureDefaultStorageClass is invoked with - // OSDBackingStorageClass* to provision a sds-local-volume SC. - OSDStorageClass string - - // OSDCount is the number of OSDs. Default: 1. - OSDCount int - - // OSDSize is the size of each OSD PVC. Default: "10Gi". - OSDSize string - - // --- Fallback SC provisioning via sds-local-volume (when OSDStorageClass is empty) --- - - // OSDBackingStorageClassName names the sds-local-volume SC that we - // auto-provision for OSDs. Default: "sds-local-volume-thin-ceph-osd". - OSDBackingStorageClassName string - - // OSDBackingLVMType passed to EnsureDefaultStorageClass ("Thick"/"Thin"). - // Default: "Thick" (simpler for block-mode PVCs used as Ceph OSDs). - OSDBackingLVMType string - - // OSDBackingIncludeMasters exposes EnsureDefaultStorageClass.IncludeMasters. - OSDBackingIncludeMasters bool - - // OSDBackingBaseKubeconfig/VMNamespace/BaseStorageClassName are plumbed - // through to EnsureDefaultStorageClass to enable automatic VirtualDisk - // attachment on nested-VM clusters. - OSDBackingBaseKubeconfig *rest.Config - OSDBackingVMNamespace string - OSDBackingBaseStorageClassName string - - // MasterSSH is optional SSH access to the control plane. Not used by - // EnsureCephStorageClass in this revision; callers may set it for - // follow-up bootstrap or diagnostics hooks. - MasterSSH ssh.SSHClient - - // --- CephBlockPool --- - - // PoolName is the Rook CephBlockPool name (also becomes the Ceph pool - // name referenced by CephStorageClass.spec.rbd.pool). - // Default: "ceph-rbd-r". - PoolName string - - // ReplicaSize is the CephBlockPool replication factor. Default: 1. - ReplicaSize int - - // FailureDomain is the CRUSH failure domain: "host" or "osd". - // Default: "osd" when ReplicaSize==1, "host" otherwise. 
- FailureDomain string - - // --- Pool kind --- - - // Type selects the backing Ceph primitive: "RBD" (default) provisions a - // CephBlockPool; "CephFS" provisions a CephFilesystem. The resulting - // csi-ceph CephStorageClass CR mirrors this choice via spec.type. - Type string - - // --- CephFilesystem (used only when Type == "CephFS") --- - - // CephFSName is the Rook CephFilesystem name. Default: "ceph-fs". - CephFSName string - - // CephFSDataPoolName is the per-filesystem data pool name (Rook-side, - // not the full Ceph pool name). Default: "data0". - CephFSDataPoolName string - - // CephFSMetadataReplicas is the metadata pool replication factor. - // Default: ReplicaSize. - CephFSMetadataReplicas int - - // CephFSDataReplicas is the data pool replication factor. - // Default: ReplicaSize. - CephFSDataReplicas int - - // CephFSActiveMDSCount is the number of active MDS daemons. Default: 1. - CephFSActiveMDSCount int - - // --- csi-ceph wiring --- - - // ClusterConnectionName and ClusterAuthenticationName point at the - // CephClusterConnection / CephClusterAuthentication CRs we create. - // Defaults: both "<StorageClassName>-conn". - ClusterConnectionName string - ClusterAuthenticationName string - - // RBDDefaultFSType picks the mkfs used on attach. Default: "ext4". - RBDDefaultFSType string - - // --- Modules --- - - // SkipModuleEnablement disables the module-enable step (useful when the - // caller has already configured ModuleConfig on the cluster). - SkipModuleEnablement bool - - // SkipClusterTeardown leaves the underlying Rook CephCluster and the - // rook-config-override ConfigMap in place during TeardownCephStorageClass. - // Use it when several StorageClasses share a single CephCluster — the - // "owning" call should leave the flag false and tear the cluster down - // last, while every other teardown sets it to true and only removes its - // SC-specific resources (CephStorageClass / connection / auth / pool / - // filesystem). 
- SkipClusterTeardown bool - - // SdsElasticSettings overrides `spec.settings` of the sds-elastic - // ModuleConfig. Defaults to the minimal set that makes sense on a - // single-node test cluster. - SdsElasticSettings map[string]interface{} - - // CsiCephSettings overrides `spec.settings` of the csi-ceph ModuleConfig. - CsiCephSettings map[string]interface{} - - // CsiCephModulePullOverride pins a specific csi-ceph image tag (dev - // registry only). Useful for testing PRs that haven't been released yet. - CsiCephModulePullOverride string - - // --- Timeouts --- - - ModulesReadyTimeout time.Duration // default 15m - CephClusterReadyTimeout time.Duration // default 20m - CephPoolReadyTimeout time.Duration // default 10m - CephFilesystemReadyTimeout time.Duration // default 10m - CredentialsTimeout time.Duration // default 10m - CSICephPhaseTimeout time.Duration // default 5m - StorageClassWaitTimeout time.Duration // default 2m -} - -func (c *CephStorageClassConfig) applyDefaults() { - if c.Namespace == "" { - c.Namespace = kubernetes.DefaultRookNamespace - } - if c.CephClusterName == "" { - c.CephClusterName = kubernetes.DefaultCephClusterName - } - if c.CephImage == "" { - c.CephImage = kubernetes.DefaultCephImage - } - if c.MonCount <= 0 { - c.MonCount = 1 - } - if c.MgrCount <= 0 { - c.MgrCount = 1 - } - if c.OSDCount <= 0 { - c.OSDCount = 1 - } - if c.OSDSize == "" { - c.OSDSize = kubernetes.DefaultOSDStorageClassSize - } - if c.OSDBackingStorageClassName == "" { - c.OSDBackingStorageClassName = "sds-local-volume-thick-ceph-osd" - } - if c.OSDBackingLVMType == "" { - c.OSDBackingLVMType = "Thick" - } - if c.ReplicaSize <= 0 { - c.ReplicaSize = 1 - } - if c.PoolName == "" { - c.PoolName = fmt.Sprintf("ceph-rbd-r%d", c.ReplicaSize) - } - if c.FailureDomain == "" { - if c.ReplicaSize == 1 { - c.FailureDomain = "osd" - } else { - c.FailureDomain = "host" - } - } - if c.ClusterConnectionName == "" { - c.ClusterConnectionName = c.StorageClassName + "-conn" - } - if 
c.ClusterAuthenticationName == "" { - c.ClusterAuthenticationName = c.StorageClassName + "-conn" - } - if c.RBDDefaultFSType == "" { - c.RBDDefaultFSType = "ext4" - } - if c.Type == "" { - c.Type = kubernetes.CephStorageClassTypeRBD - } - if c.CephFSName == "" { - c.CephFSName = "ceph-fs" - } - if c.CephFSDataPoolName == "" { - c.CephFSDataPoolName = "data0" - } - if c.CephFSMetadataReplicas <= 0 { - c.CephFSMetadataReplicas = c.ReplicaSize - } - if c.CephFSDataReplicas <= 0 { - c.CephFSDataReplicas = c.ReplicaSize - } - if c.CephFSActiveMDSCount <= 0 { - c.CephFSActiveMDSCount = 1 - } - if c.ModulesReadyTimeout == 0 { - c.ModulesReadyTimeout = 15 * time.Minute - } - if c.CephClusterReadyTimeout == 0 { - c.CephClusterReadyTimeout = 20 * time.Minute - } - if c.CephPoolReadyTimeout == 0 { - c.CephPoolReadyTimeout = 10 * time.Minute - } - if c.CephFilesystemReadyTimeout == 0 { - c.CephFilesystemReadyTimeout = 10 * time.Minute - } - if c.CredentialsTimeout == 0 { - c.CredentialsTimeout = 10 * time.Minute - } - if c.CSICephPhaseTimeout == 0 { - c.CSICephPhaseTimeout = 5 * time.Minute - } - if c.StorageClassWaitTimeout == 0 { - c.StorageClassWaitTimeout = 2 * time.Minute - } -} - -// EnsureCephStorageClass is the high-level entry point that turns an empty -// cluster into one with a working csi-ceph StorageClass. See -// CephStorageClassConfig for the step-by-step flow. -// -// The function is idempotent: re-running it picks up the existing Rook -// CephCluster / pool / csi-ceph CRs and only fills in whatever is still -// missing. Returns the name of the resulting k8s StorageClass. 
-func EnsureCephStorageClass(ctx context.Context, kubeconfig *rest.Config, cfg CephStorageClassConfig) (string, error) { - cfg.applyDefaults() - - if cfg.StorageClassName == "" { - return "", fmt.Errorf("StorageClassName is required") - } - - logger.Step(1, "Enabling Deckhouse modules for csi-ceph (sds-node-configurator, sds-elastic, csi-ceph)") - if !cfg.SkipModuleEnablement { - if err := ensureCephModules(ctx, kubeconfig, cfg); err != nil { - return "", fmt.Errorf("enable ceph modules: %w", err) - } - } - logger.StepComplete(1, "Modules enabled") - - logger.Step(2, "Resolving OSD backing StorageClass") - osdSC, err := ensureOSDBackingStorageClass(ctx, kubeconfig, &cfg) - if err != nil { - return "", fmt.Errorf("resolve OSD backing StorageClass: %w", err) - } - logger.StepComplete(2, "OSD backing StorageClass: %s", osdSC) - - logger.Step(3, "Seeding rook-config-override ConfigMap") - if err := kubernetes.SetRookConfigOverride(ctx, kubeconfig, cfg.Namespace, cfg.GlobalCephConfigOverrides); err != nil { - return "", fmt.Errorf("set rook-config-override: %w", err) - } - logger.StepComplete(3, "rook-config-override ready (%d global key(s))", len(cfg.GlobalCephConfigOverrides)) - - logger.Step(4, "Creating Rook CephCluster %s/%s", cfg.Namespace, cfg.CephClusterName) - if err := kubernetes.CreateCephCluster(ctx, kubeconfig, kubernetes.CephClusterConfig{ - Name: cfg.CephClusterName, - Namespace: cfg.Namespace, - CephImage: cfg.CephImage, - MonCount: cfg.MonCount, - MgrCount: cfg.MgrCount, - NetworkProvider: cfg.NetworkProvider, - PublicNetworkCIDRs: cfg.PublicNetworkCIDRs, - ClusterNetworkCIDRs: cfg.ClusterNetworkCIDRs, - OSDStorageClass: osdSC, - OSDCount: cfg.OSDCount, - OSDSize: cfg.OSDSize, - }); err != nil { - return "", fmt.Errorf("create CephCluster: %w", err) - } - if err := kubernetes.WaitForCephClusterReady(ctx, kubeconfig, cfg.Namespace, cfg.CephClusterName, cfg.CephClusterReadyTimeout); err != nil { - return "", fmt.Errorf("wait CephCluster: %w", err) - } - 
logger.StepComplete(4, "CephCluster %s/%s is Created", cfg.Namespace, cfg.CephClusterName) - - switch cfg.Type { - case kubernetes.CephStorageClassTypeRBD: - logger.Step(5, "Creating CephBlockPool %s/%s (replica=%d, failureDomain=%s)", - cfg.Namespace, cfg.PoolName, cfg.ReplicaSize, cfg.FailureDomain) - if err := kubernetes.CreateCephBlockPool(ctx, kubeconfig, kubernetes.CephBlockPoolConfig{ - Name: cfg.PoolName, - Namespace: cfg.Namespace, - FailureDomain: cfg.FailureDomain, - ReplicaSize: cfg.ReplicaSize, - }); err != nil { - return "", fmt.Errorf("create CephBlockPool: %w", err) - } - if err := kubernetes.WaitForCephBlockPoolReady(ctx, kubeconfig, cfg.Namespace, cfg.PoolName, cfg.CephPoolReadyTimeout); err != nil { - return "", fmt.Errorf("wait CephBlockPool: %w", err) - } - logger.StepComplete(5, "CephBlockPool %s/%s is Ready", cfg.Namespace, cfg.PoolName) - case kubernetes.CephStorageClassTypeCephFS: - logger.Step(5, "Creating CephFilesystem %s/%s (metadata replica=%d, data pool %q replica=%d, failureDomain=%s, activeMDS=%d)", - cfg.Namespace, cfg.CephFSName, - cfg.CephFSMetadataReplicas, cfg.CephFSDataPoolName, cfg.CephFSDataReplicas, - cfg.FailureDomain, cfg.CephFSActiveMDSCount) - if err := kubernetes.CreateCephFilesystem(ctx, kubeconfig, kubernetes.CephFilesystemConfig{ - Name: cfg.CephFSName, - Namespace: cfg.Namespace, - FailureDomain: cfg.FailureDomain, - MetadataPoolReplicas: cfg.CephFSMetadataReplicas, - DataPoolName: cfg.CephFSDataPoolName, - DataPoolReplicas: cfg.CephFSDataReplicas, - MetadataServerActiveCount: cfg.CephFSActiveMDSCount, - }); err != nil { - return "", fmt.Errorf("create CephFilesystem: %w", err) - } - if err := kubernetes.WaitForCephFilesystemReady(ctx, kubeconfig, cfg.Namespace, cfg.CephFSName, cfg.CephFilesystemReadyTimeout); err != nil { - return "", fmt.Errorf("wait CephFilesystem: %w", err) - } - logger.StepComplete(5, "CephFilesystem %s/%s is Ready", cfg.Namespace, cfg.CephFSName) - default: - return "", 
fmt.Errorf("unsupported CephStorageClass Type: %s", cfg.Type) - } - - logger.Step(6, "Extracting Rook-managed Ceph credentials (fsid, monitors, admin key)") - creds, err := kubernetes.WaitForCephCredentials(ctx, kubeconfig, cfg.Namespace, cfg.CredentialsTimeout) - if err != nil { - return "", fmt.Errorf("wait ceph credentials: %w", err) - } - logger.StepComplete(6, "Ceph credentials: fsid=%s, user=%s, %d monitor(s): %v", - creds.FSID, creds.AdminUser, len(creds.Monitors), creds.Monitors) - - logger.Step(7, "Wiring csi-ceph: CephClusterAuthentication %q + CephClusterConnection %q", - cfg.ClusterAuthenticationName, cfg.ClusterConnectionName) - if err := kubernetes.CreateCephClusterAuthentication(ctx, kubeconfig, kubernetes.CephClusterAuthenticationConfig{ - Name: cfg.ClusterAuthenticationName, - UserID: creds.AdminUser, - UserKey: creds.AdminKey, - }); err != nil { - return "", fmt.Errorf("create CephClusterAuthentication: %w", err) - } - if err := kubernetes.CreateCephClusterConnection(ctx, kubeconfig, kubernetes.CephClusterConnectionConfig{ - Name: cfg.ClusterConnectionName, - ClusterID: creds.FSID, - Monitors: creds.Monitors, - UserID: creds.AdminUser, - UserKey: creds.AdminKey, - }); err != nil { - return "", fmt.Errorf("create CephClusterConnection: %w", err) - } - if err := kubernetes.WaitForCephClusterConnectionCreated(ctx, kubeconfig, cfg.ClusterConnectionName, cfg.CSICephPhaseTimeout); err != nil { - return "", fmt.Errorf("wait CephClusterConnection: %w", err) - } - logger.StepComplete(7, "csi-ceph wired against Ceph cluster %s", creds.FSID) - - logger.Step(8, "Creating CephStorageClass %q (type=%s) → StorageClass", cfg.StorageClassName, cfg.Type) - cscCfg := kubernetes.CephStorageClassConfig{ - Name: cfg.StorageClassName, - ClusterConnectionName: cfg.ClusterConnectionName, - ClusterAuthenticationName: cfg.ClusterAuthenticationName, - Type: cfg.Type, - } - switch cfg.Type { - case kubernetes.CephStorageClassTypeRBD: - cscCfg.RBDPool = cfg.PoolName - 
cscCfg.RBDDefaultFSType = cfg.RBDDefaultFSType - case kubernetes.CephStorageClassTypeCephFS: - cscCfg.CephFSName = cfg.CephFSName - cscCfg.CephFSPool = kubernetes.CephFSDataPoolFullName(cfg.CephFSName, cfg.CephFSDataPoolName) - default: - return "", fmt.Errorf("unsupported CephStorageClass Type: %s", cfg.Type) - } - if err := kubernetes.CreateCephStorageClass(ctx, kubeconfig, cscCfg); err != nil { - return "", fmt.Errorf("create CephStorageClass: %w", err) - } - if err := kubernetes.WaitForCephStorageClassCreated(ctx, kubeconfig, cfg.StorageClassName, cfg.CSICephPhaseTimeout); err != nil { - return "", fmt.Errorf("wait CephStorageClass: %w", err) - } - if err := kubernetes.WaitForStorageClass(ctx, kubeconfig, cfg.StorageClassName, cfg.StorageClassWaitTimeout); err != nil { - return "", fmt.Errorf("wait core StorageClass: %w", err) - } - logger.StepComplete(8, "StorageClass %s is available", cfg.StorageClassName) - - switch cfg.Type { - case kubernetes.CephStorageClassTypeCephFS: - logger.Success("Ceph e2e stack ready: CephCluster %s/%s + filesystem %s → StorageClass %s", - cfg.Namespace, cfg.CephClusterName, cfg.CephFSName, cfg.StorageClassName) - default: - logger.Success("Ceph e2e stack ready: CephCluster %s/%s + pool %s → StorageClass %s", - cfg.Namespace, cfg.CephClusterName, cfg.PoolName, cfg.StorageClassName) - } - return cfg.StorageClassName, nil -} - -// TeardownCephStorageClass removes the csi-ceph wiring + Rook CephCluster + -// pool + rook-config-override produced by EnsureCephStorageClass. Safe to -// call on partial state (missing resources are skipped — the first error is -// returned but subsequent deletions are still attempted). -// -// Each Delete is followed by a Wait*Gone that waits for the apiserver to -// actually GC the CR. 
Without this synchronization the next test run (in -// alwaysUseExisting mode, or a fresh bootstrap that re-creates the same -// namespace) would race against Rook's finalizer and either: -// - find the CR still in Terminating and try to update its spec (no-op -// while the controller unwinds the finalizer); -// - delete the parent CephCluster while a child CephBlockPool / -// CephFilesystem is still alive — Rook then sets `DeletionIsBlocked / -// ObjectHasDependents` and the CephCluster sticks in `phase=Deleting` -// forever. -// -// On a Wait*Gone timeout we DO NOT auto-strip finalizers: the failure is -// surfaced as an aggregated error so the operator can investigate the -// cluster (typical reasons: HEALTH_ERR Ceph, stuck OSD prepare, dead mgr). -// -// It deliberately does NOT disable the Deckhouse modules: they may be owned -// by the cluster admin, and re-bootstrapping is cheaper than a full -// module-disable → module-enable cycle. -func TeardownCephStorageClass(ctx context.Context, kubeconfig *rest.Config, cfg CephStorageClassConfig) error { - cfg.applyDefaults() - - var firstErr error - note := func(err error, what string) { - if err == nil { - return - } - logger.Warn("teardown: %s: %v", what, err) - if firstErr == nil { - firstErr = fmt.Errorf("%s: %w", what, err) - } - } - - logger.Info("Tearing down csi-ceph StorageClass %q (type=%s)", cfg.StorageClassName, cfg.Type) - - // 1. CephStorageClass: leaf, no finalizer dependency on the rest. - note(kubernetes.DeleteCephStorageClass(ctx, kubeconfig, cfg.StorageClassName), "delete CephStorageClass") - note(kubernetes.WaitForCephStorageClassGone(ctx, kubeconfig, cfg.StorageClassName, 0), "wait CephStorageClass gone") - - // 2. CephClusterConnection / CephClusterAuthentication: csi-ceph CRs. - // Order between conn and auth doesn't matter — neither depends on the - // other. 
- note(kubernetes.DeleteCephClusterConnection(ctx, kubeconfig, cfg.ClusterConnectionName), "delete CephClusterConnection") - note(kubernetes.WaitForCephClusterConnectionGone(ctx, kubeconfig, cfg.ClusterConnectionName, 0), "wait CephClusterConnection gone") - - note(kubernetes.DeleteCephClusterAuthentication(ctx, kubeconfig, cfg.ClusterAuthenticationName), "delete CephClusterAuthentication") - note(kubernetes.WaitForCephClusterAuthenticationGone(ctx, kubeconfig, cfg.ClusterAuthenticationName, 0), "wait CephClusterAuthentication gone") - - // 3. Pool / Filesystem: must be fully gone before deleting CephCluster, - // otherwise Rook records DeletionIsBlocked / ObjectHasDependents. - switch cfg.Type { - case kubernetes.CephStorageClassTypeCephFS: - note(kubernetes.DeleteCephFilesystem(ctx, kubeconfig, cfg.Namespace, cfg.CephFSName), "delete CephFilesystem") - note(kubernetes.WaitForCephFilesystemGone(ctx, kubeconfig, cfg.Namespace, cfg.CephFSName, 0), "wait CephFilesystem gone") - default: - note(kubernetes.DeleteCephBlockPool(ctx, kubeconfig, cfg.Namespace, cfg.PoolName), "delete CephBlockPool") - note(kubernetes.WaitForCephBlockPoolGone(ctx, kubeconfig, cfg.Namespace, cfg.PoolName, 0), "wait CephBlockPool gone") - } - - // 4. CephCluster: only when this teardown call owns it (the other - // TeardownCephStorageClass call shares the same Rook cluster — see - // SkipClusterTeardown doc-comment). 
- if !cfg.SkipClusterTeardown { - note(kubernetes.DeleteCephCluster(ctx, kubeconfig, cfg.Namespace, cfg.CephClusterName), "delete CephCluster") - note(kubernetes.WaitForCephClusterGone(ctx, kubeconfig, cfg.Namespace, cfg.CephClusterName, 0), "wait CephCluster gone") - note(kubernetes.DeleteRookConfigOverride(ctx, kubeconfig, cfg.Namespace), "delete rook-config-override") - } else { - logger.Info("Skipping CephCluster + rook-config-override teardown (SkipClusterTeardown=true)") - } - return firstErr -} - -// EnsureDefaultCephStorageClass is EnsureCephStorageClass + SetGlobalDefaultStorageClass. -// After this call new PVCs without an explicit storageClassName will use the -// freshly-provisioned Ceph RBD class. -func EnsureDefaultCephStorageClass(ctx context.Context, kubeconfig *rest.Config, cfg CephStorageClassConfig) (string, error) { - scName, err := EnsureCephStorageClass(ctx, kubeconfig, cfg) - if err != nil { - return "", err - } - if err := kubernetes.SetGlobalDefaultStorageClass(ctx, kubeconfig, scName); err != nil { - return "", fmt.Errorf("set %s as default in global ModuleConfig: %w", scName, err) - } - logger.Success("StorageClass %s set as cluster default", scName) - return scName, nil -} - -// ensureCephModules enables sds-node-configurator + sds-elastic + csi-ceph -// and waits for their Ready phase. 
-func ensureCephModules(ctx context.Context, kubeconfig *rest.Config, cfg CephStorageClassConfig) error { - sdsElasticSettings := cfg.SdsElasticSettings - if sdsElasticSettings == nil { - sdsElasticSettings = map[string]interface{}{} - } - - csiCephSettings := cfg.CsiCephSettings - if csiCephSettings == nil { - csiCephSettings = map[string]interface{}{} - } - - modules := []kubernetes.ModuleSpec{ - { - Name: "sds-node-configurator", - Version: 1, - Enabled: true, - }, - { - Name: "sds-elastic", - Version: 1, - Enabled: true, - Settings: sdsElasticSettings, - Dependencies: []string{"sds-node-configurator"}, - }, - { - Name: "csi-ceph", - Version: 1, - Enabled: true, - Settings: csiCephSettings, - Dependencies: []string{"sds-elastic"}, - ModulePullOverride: cfg.CsiCephModulePullOverride, - }, - } - return kubernetes.EnableModulesAndWait(ctx, kubeconfig, nil, nil, modules, cfg.ModulesReadyTimeout) -} - -// ensureOSDBackingStorageClass returns an already-existing SC name (if the -// caller supplied OSDStorageClass) or delegates to EnsureDefaultStorageClass -// to provision a sds-local-volume SC on the fly. 
-func ensureOSDBackingStorageClass(ctx context.Context, kubeconfig *rest.Config, cfg *CephStorageClassConfig) (string, error) { - if cfg.OSDStorageClass != "" { - logger.Info("Using pre-existing OSD backing StorageClass %s", cfg.OSDStorageClass) - return cfg.OSDStorageClass, nil - } - - localCfg := DefaultStorageClassConfig{ - StorageClassName: cfg.OSDBackingStorageClassName, - LVMType: cfg.OSDBackingLVMType, - IncludeMasters: cfg.OSDBackingIncludeMasters, - BaseKubeconfig: cfg.OSDBackingBaseKubeconfig, - VMNamespace: cfg.OSDBackingVMNamespace, - BaseStorageClassName: cfg.OSDBackingBaseStorageClassName, - } - return EnsureDefaultStorageClass(ctx, kubeconfig, localCfg) -} diff --git a/pkg/testkit/ceph_cluster.go b/pkg/testkit/ceph_cluster.go deleted file mode 100644 index cf683f2..0000000 --- a/pkg/testkit/ceph_cluster.go +++ /dev/null @@ -1,295 +0,0 @@ -/* -Copyright 2026 Flant JSC - -Licensed under the Apache License, Version 2.0 (the "License"); -you may not use this file except in compliance with the License. -You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. -*/ - -package testkit - -import ( - "context" - "fmt" - "time" - - "k8s.io/client-go/rest" - - "github.com/deckhouse/storage-e2e/internal/logger" - "github.com/deckhouse/storage-e2e/pkg/kubernetes" -) - -// RookCephClusterConfig configures EnsureCephCluster — the "just bring up -// a Rook-managed Ceph cluster + pool" variant of EnsureCephStorageClass. 
-// -// Unlike EnsureCephStorageClass, EnsureCephCluster does NOT: -// - enable the `csi-ceph` Deckhouse module; -// - create CephClusterConnection / CephClusterAuthentication CRs; -// - create a CephStorageClass CR / materialize a core StorageClass. -// -// It stops once the Rook CephCluster is Created and the CephBlockPool is -// Ready. Use this when the test suite needs a live Ceph backend to exercise -// (e.g. to run rbd / ceph CLI against it, or to hook some other client) but -// deliberately does NOT want csi-ceph in the picture. -type RookCephClusterConfig struct { - // --- Namespacing / naming --- - - // Namespace is the Rook / sds-elastic namespace. Default: "d8-sds-elastic". - Namespace string - - // CephClusterName is the Rook CephCluster name. Default: "ceph-cluster". - CephClusterName string - - // CephImage is the Ceph container image. Default: - // "quay.io/ceph/ceph:v18.2.7". - CephImage string - - // MonCount / MgrCount are the Rook mon/mgr replica counts. - // Defaults: 1 / 1 (appropriate for 1..3-node test clusters). - MonCount int - MgrCount int - - // NetworkProvider: "" for CNI (default), "host" for host networking. - NetworkProvider string - PublicNetworkCIDRs []string - ClusterNetworkCIDRs []string - - // GlobalCephConfigOverrides populates `rook-config-override` under - // `[global]`, e.g. {"ms_crc_data": "false"} for the csi-ceph - // msCrcData matrix. nil leaves the ConfigMap otherwise empty. - GlobalCephConfigOverrides map[string]string - - // --- OSD backing --- - - // OSDStorageClass is a block-capable StorageClass used to back OSD PVCs. - // When empty, EnsureDefaultStorageClass is invoked with - // OSDBacking* to provision a sds-local-volume SC on the fly. - OSDStorageClass string - - // OSDCount is the number of OSDs. Default: 1. - OSDCount int - - // OSDSize is the size of each OSD PVC. Default: kubernetes.DefaultOSDStorageClassSize. 
- OSDSize string - - // --- Fallback SC provisioning via sds-local-volume --- - - // OSDBackingStorageClassName names the sds-local-volume SC we auto- - // provision for OSDs. Default: "sds-local-volume-thick-ceph-osd". - OSDBackingStorageClassName string - - // OSDBackingLVMType ("Thick"/"Thin"). Default: "Thick". - OSDBackingLVMType string - - OSDBackingIncludeMasters bool - OSDBackingBaseKubeconfig *rest.Config - OSDBackingVMNamespace string - OSDBackingBaseStorageClassName string - - // --- CephBlockPool --- - - // PoolName is the Rook CephBlockPool name. Default: - // "ceph-rbd-r<ReplicaSize>". - PoolName string - - // ReplicaSize is the CephBlockPool replication factor. Default: 1. - ReplicaSize int - - // FailureDomain: "host" or "osd". Default: "osd" when ReplicaSize==1, - // "host" otherwise. - FailureDomain string - - // --- Modules --- - - // SkipModuleEnablement disables the module-enable step (useful when - // the caller has already enabled sds-node-configurator + sds-elastic - // through other means). - SkipModuleEnablement bool - - // SdsElasticSettings overrides `spec.settings` of the sds-elastic - // ModuleConfig. Defaults to an empty map. 
- SdsElasticSettings map[string]interface{} - - // --- Timeouts --- - - ModulesReadyTimeout time.Duration // default 15m - CephClusterReadyTimeout time.Duration // default 20m - CephPoolReadyTimeout time.Duration // default 10m -} - -func (c *RookCephClusterConfig) applyDefaults() { - if c.Namespace == "" { - c.Namespace = kubernetes.DefaultRookNamespace - } - if c.CephClusterName == "" { - c.CephClusterName = kubernetes.DefaultCephClusterName - } - if c.CephImage == "" { - c.CephImage = kubernetes.DefaultCephImage - } - if c.MonCount <= 0 { - c.MonCount = 1 - } - if c.MgrCount <= 0 { - c.MgrCount = 1 - } - if c.OSDCount <= 0 { - c.OSDCount = 1 - } - if c.OSDSize == "" { - c.OSDSize = kubernetes.DefaultOSDStorageClassSize - } - if c.OSDBackingStorageClassName == "" { - c.OSDBackingStorageClassName = "sds-local-volume-thick-ceph-osd" - } - if c.OSDBackingLVMType == "" { - c.OSDBackingLVMType = "Thick" - } - if c.ReplicaSize <= 0 { - c.ReplicaSize = 1 - } - if c.PoolName == "" { - c.PoolName = fmt.Sprintf("ceph-rbd-r%d", c.ReplicaSize) - } - if c.FailureDomain == "" { - if c.ReplicaSize == 1 { - c.FailureDomain = "osd" - } else { - c.FailureDomain = "host" - } - } - if c.ModulesReadyTimeout == 0 { - c.ModulesReadyTimeout = 15 * time.Minute - } - if c.CephClusterReadyTimeout == 0 { - c.CephClusterReadyTimeout = 20 * time.Minute - } - if c.CephPoolReadyTimeout == 0 { - c.CephPoolReadyTimeout = 10 * time.Minute - } -} - -// EnsureCephCluster brings up (or reuses) a Rook-managed Ceph cluster plus -// a CephBlockPool via sds-elastic — without touching csi-ceph. -// -// Flow: -// 1. Enable Deckhouse modules: sds-node-configurator + sds-elastic. -// 2. Resolve an OSD backing StorageClass (re-using EnsureDefaultStorageClass -// when none is pre-provided). -// 3. Seed `rook-config-override` with per-test global Ceph settings. -// 4. Create the Rook CephCluster and wait until it is Created. -// 5. Create the CephBlockPool and wait until it is Ready. 
-// -// Idempotent: re-running picks up existing resources. Returns the pool -// name (same one callers would reference as Ceph pool, e.g. for a -// subsequent `rbd create`/`CephStorageClass.rbd.pool`). -func EnsureCephCluster(ctx context.Context, kubeconfig *rest.Config, cfg RookCephClusterConfig) (string, error) { - cfg.applyDefaults() - - logger.Step(1, "Enabling Deckhouse modules for Rook (sds-node-configurator, sds-elastic)") - if !cfg.SkipModuleEnablement { - if err := ensureRookModules(ctx, kubeconfig, cfg.SdsElasticSettings, cfg.ModulesReadyTimeout); err != nil { - return "", fmt.Errorf("enable rook modules: %w", err) - } - } - logger.StepComplete(1, "Modules enabled") - - logger.Step(2, "Resolving OSD backing StorageClass") - osdSC := cfg.OSDStorageClass - if osdSC == "" { - local := DefaultStorageClassConfig{ - StorageClassName: cfg.OSDBackingStorageClassName, - LVMType: cfg.OSDBackingLVMType, - IncludeMasters: cfg.OSDBackingIncludeMasters, - BaseKubeconfig: cfg.OSDBackingBaseKubeconfig, - VMNamespace: cfg.OSDBackingVMNamespace, - BaseStorageClassName: cfg.OSDBackingBaseStorageClassName, - } - name, err := EnsureDefaultStorageClass(ctx, kubeconfig, local) - if err != nil { - return "", fmt.Errorf("resolve OSD backing StorageClass: %w", err) - } - osdSC = name - } else { - logger.Info("Using pre-existing OSD backing StorageClass %s", osdSC) - } - logger.StepComplete(2, "OSD backing StorageClass: %s", osdSC) - - logger.Step(3, "Seeding rook-config-override ConfigMap") - if err := kubernetes.SetRookConfigOverride(ctx, kubeconfig, cfg.Namespace, cfg.GlobalCephConfigOverrides); err != nil { - return "", fmt.Errorf("set rook-config-override: %w", err) - } - logger.StepComplete(3, "rook-config-override ready (%d global key(s))", len(cfg.GlobalCephConfigOverrides)) - - logger.Step(4, "Creating Rook CephCluster %s/%s", cfg.Namespace, cfg.CephClusterName) - if err := kubernetes.CreateCephCluster(ctx, kubeconfig, kubernetes.CephClusterConfig{ - Name: 
cfg.CephClusterName, - Namespace: cfg.Namespace, - CephImage: cfg.CephImage, - MonCount: cfg.MonCount, - MgrCount: cfg.MgrCount, - NetworkProvider: cfg.NetworkProvider, - PublicNetworkCIDRs: cfg.PublicNetworkCIDRs, - ClusterNetworkCIDRs: cfg.ClusterNetworkCIDRs, - OSDStorageClass: osdSC, - OSDCount: cfg.OSDCount, - OSDSize: cfg.OSDSize, - }); err != nil { - return "", fmt.Errorf("create CephCluster: %w", err) - } - if err := kubernetes.WaitForCephClusterReady(ctx, kubeconfig, cfg.Namespace, cfg.CephClusterName, cfg.CephClusterReadyTimeout); err != nil { - return "", fmt.Errorf("wait CephCluster: %w", err) - } - logger.StepComplete(4, "CephCluster %s/%s is Created", cfg.Namespace, cfg.CephClusterName) - - logger.Step(5, "Creating CephBlockPool %s/%s (replica=%d, failureDomain=%s)", - cfg.Namespace, cfg.PoolName, cfg.ReplicaSize, cfg.FailureDomain) - if err := kubernetes.CreateCephBlockPool(ctx, kubeconfig, kubernetes.CephBlockPoolConfig{ - Name: cfg.PoolName, - Namespace: cfg.Namespace, - FailureDomain: cfg.FailureDomain, - ReplicaSize: cfg.ReplicaSize, - }); err != nil { - return "", fmt.Errorf("create CephBlockPool: %w", err) - } - if err := kubernetes.WaitForCephBlockPoolReady(ctx, kubeconfig, cfg.Namespace, cfg.PoolName, cfg.CephPoolReadyTimeout); err != nil { - return "", fmt.Errorf("wait CephBlockPool: %w", err) - } - logger.StepComplete(5, "CephBlockPool %s/%s is Ready", cfg.Namespace, cfg.PoolName) - - logger.Success("Ceph cluster ready: CephCluster %s/%s + pool %s (no csi-ceph wiring)", - cfg.Namespace, cfg.CephClusterName, cfg.PoolName) - return cfg.PoolName, nil -} - -// ensureRookModules enables sds-node-configurator + sds-elastic (and nothing -// else). Used by EnsureCephCluster and as the Rook-only step of -// EnsureCephStorageClass's module list. 
-func ensureRookModules(ctx context.Context, kubeconfig *rest.Config, sdsElasticSettings map[string]interface{}, readyTimeout time.Duration) error { - if sdsElasticSettings == nil { - sdsElasticSettings = map[string]interface{}{} - } - modules := []kubernetes.ModuleSpec{ - { - Name: "sds-node-configurator", - Version: 1, - Enabled: true, - }, - { - Name: "sds-elastic", - Version: 1, - Enabled: true, - Settings: sdsElasticSettings, - Dependencies: []string{"sds-node-configurator"}, - }, - } - return kubernetes.EnableModulesAndWait(ctx, kubeconfig, nil, nil, modules, readyTimeout) -} diff --git a/pkg/testkit/elastic.go b/pkg/testkit/elastic.go new file mode 100644 index 0000000..3d7d231 --- /dev/null +++ b/pkg/testkit/elastic.go @@ -0,0 +1,383 @@ +/* +Copyright 2026 Flant JSC + +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +*/ + +package testkit + +import ( + "context" + "fmt" + "time" + + "k8s.io/client-go/rest" + + "github.com/deckhouse/storage-e2e/internal/logger" + "github.com/deckhouse/storage-e2e/pkg/kubernetes" +) + +// Re-exports of the supported ElasticStorageClass enums so callers don't have +// to import the lower-level pkg/kubernetes package just to set cfg.Type / +// cfg.Replication. 
+const ( + ElasticStorageClassTypeRBD = kubernetes.ElasticStorageClassTypeRBD + ElasticStorageClassTypeCephFS = kubernetes.ElasticStorageClassTypeCephFS + + ElasticReplicationAvailabilityWithoutConsistency = kubernetes.ElasticReplicationAvailabilityWithoutConsistency + ElasticReplicationConsistencyAndAvailability = kubernetes.ElasticReplicationConsistencyAndAvailability + ElasticReplicationHighRedundancy = kubernetes.ElasticReplicationHighRedundancy + ElasticReplicationErasureCodedCompact = kubernetes.ElasticReplicationErasureCodedCompact +) + +// Default labels used to mark storage nodes / OSD BlockDevices for an +// ElasticCluster in e2e. The keys are namespaced under a dedicated e2e prefix +// so they never collide with anything the module itself sets. +const ( + DefaultElasticStorageNodeLabelKey = "sds-elastic-e2e.storage.deckhouse.io/storage-node" + DefaultElasticStorageNodeLabelValue = "true" + DefaultElasticOSDLabelKey = "sds-elastic-e2e.storage.deckhouse.io/osd" + DefaultElasticOSDLabelValue = "true" + + // DefaultElasticClusterReadyTimeout covers Rook bringing up the full + // CephCluster (mon/mgr/osd) on top of LVM-local storage, plus the + // credential backup and csi-ceph wiring stages of the EC reconcile. + DefaultElasticClusterReadyTimeout = 25 * time.Minute + + // DefaultElasticStorageClassReadyTimeout covers pool/filesystem + // provisioning + csi-ceph StorageClass materialisation. + DefaultElasticStorageClassReadyTimeout = 10 * time.Minute +) + +// ElasticOSDBlockDevicesConfig describes how to prepare raw disks for OSD +// adoption by an ElasticCluster: label a set of storage nodes, wait for +// sds-node-configurator to publish consumable BlockDevices on them, then +// label those BlockDevices so the EC's blockDeviceSelector can adopt them. +type ElasticOSDBlockDevicesConfig struct { + // StorageNodeNames is the explicit set of nodes to use. When empty, all + // worker nodes are used. 
+ StorageNodeNames []string + + // NodeLabelKey / NodeLabelValue is applied to every storage node and is + // what ElasticCluster.spec.storage.nodeSelector.matchLabels should match. + // Defaults: DefaultElasticStorageNodeLabelKey / DefaultElasticStorageNodeLabelValue. + NodeLabelKey string + NodeLabelValue string + + // BlockDeviceLabelKey / BlockDeviceLabelValue is applied to every + // consumable BlockDevice found on the storage nodes and is what + // ElasticCluster.spec.storage.blockDeviceSelector.matchLabels should + // match. Defaults: DefaultElasticOSDLabelKey / DefaultElasticOSDLabelValue. + BlockDeviceLabelKey string + BlockDeviceLabelValue string + + // MinBlockDevices is the minimum number of consumable BlockDevices that + // must appear on the storage nodes before labelling proceeds. Default: 1. + MinBlockDevices int + + // BlockDeviceWaitTimeout bounds the wait for consumable BlockDevices to + // surface. Default: 10m. + BlockDeviceWaitTimeout time.Duration +} + +func (c *ElasticOSDBlockDevicesConfig) applyDefaults() { + if c.NodeLabelKey == "" { + c.NodeLabelKey = DefaultElasticStorageNodeLabelKey + } + if c.NodeLabelValue == "" { + c.NodeLabelValue = DefaultElasticStorageNodeLabelValue + } + if c.BlockDeviceLabelKey == "" { + c.BlockDeviceLabelKey = DefaultElasticOSDLabelKey + } + if c.BlockDeviceLabelValue == "" { + c.BlockDeviceLabelValue = DefaultElasticOSDLabelValue + } + if c.MinBlockDevices <= 0 { + c.MinBlockDevices = 1 + } + if c.BlockDeviceWaitTimeout == 0 { + c.BlockDeviceWaitTimeout = 10 * time.Minute + } +} + +const blockDevicePollInterval = 10 * time.Second + +// EnsureElasticOSDBlockDevices labels storage nodes and the consumable +// BlockDevices on them so an ElasticCluster can adopt the disks for OSDs. +// Returns the names of the labelled BlockDevices. +// +// Idempotent: re-running re-applies the (already present) labels. It does NOT +// create the ElasticCluster — call EnsureElasticCluster afterwards with +// matching selectors. 
+func EnsureElasticOSDBlockDevices(ctx context.Context, kubeconfig *rest.Config, cfg ElasticOSDBlockDevicesConfig) ([]string, error) { + cfg.applyDefaults() + + logger.Step(1, "Resolving storage nodes") + nodeNames := cfg.StorageNodeNames + if len(nodeNames) == 0 { + workers, err := kubernetes.GetWorkerNodes(ctx, kubeconfig) + if err != nil { + return nil, fmt.Errorf("list worker nodes: %w", err) + } + for i := range workers { + nodeNames = append(nodeNames, workers[i].Name) + } + } + if len(nodeNames) == 0 { + return nil, fmt.Errorf("no storage nodes resolved (StorageNodeNames empty and no worker nodes found)") + } + logger.StepComplete(1, "Storage nodes: %v", nodeNames) + + logger.Step(2, "Labelling storage nodes with %s=%s", cfg.NodeLabelKey, cfg.NodeLabelValue) + if err := kubernetes.LabelNodes(ctx, kubeconfig, nodeNames, cfg.NodeLabelKey, cfg.NodeLabelValue); err != nil { + return nil, fmt.Errorf("label storage nodes: %w", err) + } + logger.StepComplete(2, "Storage nodes labelled") + + logger.Step(3, "Waiting for >= %d consumable BlockDevice(s) on storage nodes (timeout %v)", + cfg.MinBlockDevices, cfg.BlockDeviceWaitTimeout) + bds, err := waitForConsumableBlockDevicesOnNodes(ctx, kubeconfig, nodeNames, cfg.MinBlockDevices, cfg.BlockDeviceWaitTimeout) + if err != nil { + return nil, err + } + logger.StepComplete(3, "Found %d consumable BlockDevice(s)", len(bds)) + + logger.Step(4, "Labelling %d BlockDevice(s) with %s=%s", len(bds), cfg.BlockDeviceLabelKey, cfg.BlockDeviceLabelValue) + labelled := make([]string, 0, len(bds)) + for _, bd := range bds { + if err := kubernetes.LabelBlockDevice(ctx, kubeconfig, bd.Name, cfg.BlockDeviceLabelKey, cfg.BlockDeviceLabelValue); err != nil { + return nil, fmt.Errorf("label BlockDevice %s: %w", bd.Name, err) + } + labelled = append(labelled, bd.Name) + } + logger.StepComplete(4, "Labelled BlockDevices: %v", labelled) + + logger.Success("Prepared %d OSD BlockDevice(s) across %d storage node(s)", len(labelled), 
len(nodeNames)) + return labelled, nil +} + +// waitForConsumableBlockDevicesOnNodes polls until at least minCount +// consumable BlockDevices live on the given nodes (or the timeout fires). +func waitForConsumableBlockDevicesOnNodes(ctx context.Context, kubeconfig *rest.Config, nodeNames []string, minCount int, timeout time.Duration) ([]kubernetes.BlockDevice, error) { + nodeSet := make(map[string]struct{}, len(nodeNames)) + for _, n := range nodeNames { + nodeSet[n] = struct{}{} + } + + deadlineCtx, cancel := context.WithTimeout(ctx, timeout) + defer cancel() + + ticker := time.NewTicker(blockDevicePollInterval) + defer ticker.Stop() + + var lastSeen int + for { + all, err := kubernetes.GetConsumableBlockDevices(deadlineCtx, kubeconfig) + if err != nil { + logger.Debug("listing consumable BlockDevices failed (will retry): %v", err) + } else { + var onNodes []kubernetes.BlockDevice + for _, bd := range all { + if _, ok := nodeSet[bd.NodeName]; ok { + onNodes = append(onNodes, bd) + } + } + lastSeen = len(onNodes) + if lastSeen >= minCount { + return onNodes, nil + } + logger.Debug("consumable BlockDevices on storage nodes: %d/%d", lastSeen, minCount) + } + + select { + case <-deadlineCtx.Done(): + return nil, fmt.Errorf("timeout waiting for >= %d consumable BlockDevices on storage nodes (last seen %d): %w", + minCount, lastSeen, deadlineCtx.Err()) + case <-ticker.C: + } + } +} + +// ElasticClusterConfig drives EnsureElasticCluster: create an ElasticCluster +// with the given selectors and wait until it is Ready. +type ElasticClusterConfig struct { + // Name of the ElasticCluster (cluster-scoped). Required. + Name string + + // NodeSelectorMatchLabels / BlockDeviceSelectorMatchLabels populate the + // EC storage selectors. They should match the labels applied by + // EnsureElasticOSDBlockDevices. Required. + NodeSelectorMatchLabels map[string]string + BlockDeviceSelectorMatchLabels map[string]string + + // NetworkPublic / NetworkCluster optionally pin spec.network. 
+ NetworkPublic string + NetworkCluster string + + Labels map[string]string + Annotations map[string]string + + // ReadyTimeout bounds the wait for the EC Ready condition. + // Default: DefaultElasticClusterReadyTimeout. + ReadyTimeout time.Duration +} + +// EnsureElasticCluster creates (or reuses) an ElasticCluster and waits until +// its aggregate Ready condition is True. Returns the EC name. +func EnsureElasticCluster(ctx context.Context, kubeconfig *rest.Config, cfg ElasticClusterConfig) (string, error) { + if cfg.Name == "" { + return "", fmt.Errorf("ElasticClusterConfig.Name is required") + } + if cfg.ReadyTimeout == 0 { + cfg.ReadyTimeout = DefaultElasticClusterReadyTimeout + } + + logger.Step(1, "Creating ElasticCluster %s", cfg.Name) + if err := kubernetes.CreateElasticCluster(ctx, kubeconfig, kubernetes.ElasticClusterParams{ + Name: cfg.Name, + NodeSelectorMatchLabels: cfg.NodeSelectorMatchLabels, + BlockDeviceSelectorMatchLabels: cfg.BlockDeviceSelectorMatchLabels, + NetworkPublic: cfg.NetworkPublic, + NetworkCluster: cfg.NetworkCluster, + Labels: cfg.Labels, + Annotations: cfg.Annotations, + }); err != nil { + return "", fmt.Errorf("create ElasticCluster: %w", err) + } + logger.StepComplete(1, "ElasticCluster %s created", cfg.Name) + + logger.Step(2, "Waiting for ElasticCluster %s to become Ready (timeout %v)", cfg.Name, cfg.ReadyTimeout) + if err := kubernetes.WaitForElasticClusterReady(ctx, kubeconfig, cfg.Name, cfg.ReadyTimeout); err != nil { + return "", fmt.Errorf("wait ElasticCluster Ready: %w", err) + } + logger.StepComplete(2, "ElasticCluster %s is Ready", cfg.Name) + + logger.Success("ElasticCluster %s ready", cfg.Name) + return cfg.Name, nil +} + +// ElasticStorageClassConfig drives EnsureElasticStorageClass. +type ElasticStorageClassConfig struct { + // Name of the ESC; also the resulting csi-ceph CephStorageClass and core + // k8s StorageClass name. Required. + Name string + + // ClusterRef is the owning ElasticCluster. Required. 
+ ClusterRef string + + // Type selects RBD or CephFS. Required. + Type string + + // Replication picks the strategy. Empty defaults to the CRD default + // (ConsistencyAndAvailability). + Replication string + + Labels map[string]string + Annotations map[string]string + + // ReadyTimeout bounds the wait for the ESC Ready condition. + // Default: DefaultElasticStorageClassReadyTimeout. + ReadyTimeout time.Duration + + // StorageClassWaitTimeout bounds the extra wait for the core k8s + // StorageClass to materialise after the ESC is Ready. Default: 2m. + StorageClassWaitTimeout time.Duration +} + +// EnsureElasticStorageClass creates (or reuses) an ElasticStorageClass, waits +// until it is Ready, and confirms the 1:1-named core StorageClass exists. +// Returns the StorageClass name (== ESC name). +func EnsureElasticStorageClass(ctx context.Context, kubeconfig *rest.Config, cfg ElasticStorageClassConfig) (string, error) { + if cfg.Name == "" { + return "", fmt.Errorf("ElasticStorageClassConfig.Name is required") + } + if cfg.ClusterRef == "" { + return "", fmt.Errorf("ElasticStorageClassConfig.ClusterRef is required") + } + if cfg.Type == "" { + return "", fmt.Errorf("ElasticStorageClassConfig.Type is required (RBD or CephFS)") + } + if cfg.ReadyTimeout == 0 { + cfg.ReadyTimeout = DefaultElasticStorageClassReadyTimeout + } + if cfg.StorageClassWaitTimeout == 0 { + cfg.StorageClassWaitTimeout = 2 * time.Minute + } + + logger.Step(1, "Creating ElasticStorageClass %s (clusterRef=%s, type=%s, replication=%s)", + cfg.Name, cfg.ClusterRef, cfg.Type, cfg.Replication) + if err := kubernetes.CreateElasticStorageClass(ctx, kubeconfig, kubernetes.ElasticStorageClassParams{ + Name: cfg.Name, + ClusterRef: cfg.ClusterRef, + Type: cfg.Type, + Replication: cfg.Replication, + Labels: cfg.Labels, + Annotations: cfg.Annotations, + }); err != nil { + return "", fmt.Errorf("create ElasticStorageClass: %w", err) + } + logger.StepComplete(1, "ElasticStorageClass %s created", cfg.Name) + + 
logger.Step(2, "Waiting for ElasticStorageClass %s to become Ready (timeout %v)", cfg.Name, cfg.ReadyTimeout) + if err := kubernetes.WaitForElasticStorageClassReady(ctx, kubeconfig, cfg.Name, cfg.ReadyTimeout); err != nil { + return "", fmt.Errorf("wait ElasticStorageClass Ready: %w", err) + } + logger.StepComplete(2, "ElasticStorageClass %s is Ready", cfg.Name) + + logger.Step(3, "Waiting for core StorageClass %s to materialise", cfg.Name) + if err := kubernetes.WaitForStorageClass(ctx, kubeconfig, cfg.Name, cfg.StorageClassWaitTimeout); err != nil { + return "", fmt.Errorf("wait core StorageClass: %w", err) + } + logger.StepComplete(3, "StorageClass %s is available", cfg.Name) + + logger.Success("ElasticStorageClass %s ready (type=%s)", cfg.Name, cfg.Type) + return cfg.Name, nil +} + +// TeardownElasticStorageClass deletes an ElasticStorageClass and waits until +// it is fully gone. When force is true, the destructive force-deletion +// annotation is set first (authorising the purge of a non-empty RBD pool); it +// never bypasses the bound-PV guard. Safe to call on missing resources. 
+func TeardownElasticStorageClass(ctx context.Context, kubeconfig *rest.Config, name string, force bool, timeout time.Duration) error { + if force { + logger.Info("Setting force-deletion annotation on ElasticStorageClass %s", name) + if err := kubernetes.AnnotateElasticStorageClassForceDeletion(ctx, kubeconfig, name); err != nil { + return fmt.Errorf("annotate ElasticStorageClass force-deletion: %w", err) + } + } + if err := kubernetes.DeleteElasticStorageClass(ctx, kubeconfig, name); err != nil { + return fmt.Errorf("delete ElasticStorageClass: %w", err) + } + if err := kubernetes.WaitForElasticStorageClassGone(ctx, kubeconfig, name, timeout); err != nil { + return fmt.Errorf("wait ElasticStorageClass gone: %w", err) + } + logger.Success("ElasticStorageClass %s torn down", name) + return nil +} + +// TeardownElasticCluster deletes an ElasticCluster and waits until it is fully +// gone (the controller finalizer first tears down the whole Rook CephCluster +// and the csi-ceph wiring). Safe to call on missing resources. Tear down all +// referencing ElasticStorageClasses first — otherwise the EC sticks on the +// non-bypassable StorageClassesExist guard. +func TeardownElasticCluster(ctx context.Context, kubeconfig *rest.Config, name string, timeout time.Duration) error { + if err := kubernetes.DeleteElasticCluster(ctx, kubeconfig, name); err != nil { + return fmt.Errorf("delete ElasticCluster: %w", err) + } + if err := kubernetes.WaitForElasticClusterGone(ctx, kubeconfig, name, timeout); err != nil { + return fmt.Errorf("wait ElasticCluster gone: %w", err) + } + logger.Success("ElasticCluster %s torn down", name) + return nil +}