`configs/kueue/docs/adrs/ADR-001-kueue-adoption.md`

# ADR-001: Kueue Adoption

- Status: Accepted
- Date: January 2025

## Context

CANFAR needs a Kubernetes-native way to control admission, queueing, quotas,
borrowing, reclaim, and visibility for a mix of interactive, persistent, and
batch science workloads. The platform must handle very large pending backlogs
without treating direct Kubernetes scheduling as the tenant policy layer.

## Decision

CANFAR adopts Kueue as the admission and quota orchestration layer for the
Science Platform. Kubernetes remains the runtime scheduler and execution plane.
`skaha` remains the main user submission entry point.

## Consequences

Kueue provides the needed queue, quota, priority, cohort, and visibility
primitives. It also creates a clean path to future topology-aware scheduling and
MultiKueue.

CANFAR must still solve identity, project mapping, and accounting outside
Kueue. Kueue is not the tenant system of record.

## Alternatives considered

- Continue with direct Kubernetes scheduling and custom ad hoc controls
- Build a custom scheduling layer or scheduler plugin stack
- Treat the backlog problem as only a `skaha` rate-limiting problem

These alternatives either move too much policy into custom code or fail to give
native cohort, quota, and admission control semantics.
`configs/kueue/docs/adrs/ADR-002-workload-apis.md`

# ADR-002: Supported Workload APIs

- Status: Accepted
- Date: Spring 2025

## Context

The target architecture must support a broad workload taxonomy, but the current
repository baseline and the need for a safe operational rollout make it unwise
to treat every Kueue integration as a production commitment.

## Decision

Production support centers on `batch/v1.Job`, including Indexed Job
usage patterns for large independent fan-out work. Protected interactive and
persistent workloads may be brought under Kueue using mature controller
patterns, but only where the team can verify the operational behavior.

`JobSet`, MPI, Ray, and other advanced or distributed controllers remain part of
the target taxonomy and future roadmap, not the initial production commitment.
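As a sketch of the first production lane, an Indexed Job enters Kueue through the standard `kueue.x-k8s.io/queue-name` label. The queue and job names below are hypothetical, and the example assumes a Kueue-managed `workloads` namespace:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: fanout-example                     # hypothetical job name
  namespace: workloads
  labels:
    kueue.x-k8s.io/queue-name: lq-example  # hypothetical LocalQueue
spec:
  completions: 100
  parallelism: 10
  completionMode: Indexed    # large independent fan-out work
  suspend: true              # created suspended; Kueue unsuspends on admission
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: worker
          image: busybox
          command: ["sh", "-c", "echo processing shard $JOB_COMPLETION_INDEX"]
```

Each index runs as an independent pod, which matches the large independent fan-out pattern named above.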

## Consequences

The platform gets a safe first production lane for large-scale batch admission
without blocking future support for more advanced workload types.

The package still documents the full workload taxonomy so later phases do not
need to invent a new fairness or queue model.

## Alternatives considered

- Promise production support for every Kueue integration
- Delay all interactive or persistent integration until after batch-only rollout

The first option creates avoidable operational risk. The second option breaks
the desired unified scheduling model too early.
`configs/kueue/docs/adrs/ADR-003-shared-workloads-namespace.md`

# ADR-003: Shared `workloads` namespace now, namespace evolution later

- Status: Accepted
- Date: March 12, 2026

## Context

The current Kueue repository baseline uses multiple managed namespaces, but the
target architecture wants one shared Kueue-managed namespace at first so queue
governance, RBAC, and visibility can be kept simple while the new tenant model
is introduced.

## Decision

Use one shared `workloads` namespace for Kueue-managed user workloads in the
target single-cluster design. Create project-scoped `LocalQueue` objects in that
shared namespace on demand.

Supported future layouts include one namespace per community, one per workload
class, or a hybrid of the two.
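A project-scoped queue in the shared namespace could look like the following sketch (the queue and `ClusterQueue` names are hypothetical):

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: lq-skysurvey          # hypothetical project queue, created on demand
  namespace: workloads        # the single shared Kueue-managed namespace
spec:
  clusterQueue: cq-astronomy  # hypothetical community-level ClusterQueue
```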

## Consequences

This keeps the initial rollout simpler and reduces the number of moving parts
while project-based fairness and community ownership are introduced.

Future namespace splits remain possible without changing the core
community-project-cohort model.

## Alternatives considered

- Start immediately with one namespace per community
- Start immediately with one namespace per workload class

Both alternatives add governance and visibility complexity too early in the
rollout.
`configs/kueue/docs/adrs/ADR-004-standalone-control-service.md`

# ADR-004: Standalone accounting and control service

- Status: Accepted
- Date: March 12, 2026

## Context

Kueue cannot serve as the system of record for communities, projects, POSIX
group mapping, delegated project administration, or accounting relationships.
Those concerns are fundamental to CANFAR's policy and visibility.

## Decision

Define a new standalone accounting and control service as a required future
dependency of the platform. The service remains out of scope for implementation
in this package, but it is in scope for architecture and requirements.

The service must support:

- community creation and management by cluster admins
- project creation inside a community by delegated project admins
- project-to-group mapping and later user or group resolution
- override request workflows for temporary fair-share changes
- exposure of tenant metadata to `skaha` and the future visibility UI
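One hypothetical shape for the tenant metadata the service would expose, purely illustrative since the service is in scope for requirements but not yet designed (all names and fields are assumptions):

```yaml
# hypothetical tenant record served by the future control service
community: astronomy
projects:
  - name: skysurvey
    admins: ["example-admin"]          # delegated project admins
    posixGroups: ["grp-skysurvey"]     # group-to-project mapping
    localQueue: lq-skysurvey           # queue created on demand in `workloads`
    overrides:
      - type: fair-share-weight        # temporary fair-share change
        value: 2.0
        expires: "2026-04-01T00:00:00Z"
```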

## Consequences

The scheduler design stays clean. Kueue owns admission and quota behavior while
the control service owns tenant and policy metadata.

The rollout now has an explicit dependency that must be addressed in later
phases rather than hidden behind manual configuration.

## Alternatives considered

- Extend an existing service implicitly without naming a new component
- Keep project metadata as static Kubernetes configuration only

Both alternatives hide ownership and make future admin workflows difficult to
design and operate.
# ADR-005: Fairness, workload priority, and preemption model

- Status: Accepted
- Date: March 12, 2026

## Context

CANFAR needs fair competition between projects, community ownership of
resources, borrowing of idle capacity, and a workload-ordering model that keeps
interactive work ahead of batch work inside each project.

## Decision

Use the following split model:

- Community = `ClusterQueue`
- Project = `LocalQueue`
- Multiple communities sharing capacity = `Cohort`
- Project competition inside one community = Admission Fair Sharing with
adjustable `LocalQueue` weights
- Workload ordering inside one project = `WorkloadPriorityClass`

Use cohort borrowing and reclaim for community-level resource ownership. Use
project-local workload priority to select interactive work before lower-priority
batch work inside the chosen project queue.
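A minimal sketch of the community and priority halves of this model, with hypothetical names and quota figures (the Admission Fair Sharing weights on `LocalQueue` objects are omitted because their exact shape depends on the Kueue version in use):

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: cq-astronomy             # hypothetical community queue
spec:
  cohort: shared-capacity        # communities in one cohort may borrow idle quota
  resourceGroups:
    - coveredResources: ["cpu", "memory"]
      flavors:
        - name: default-flavor   # hypothetical ResourceFlavor
          resources:
            - name: cpu
              nominalQuota: "512"
            - name: memory
              nominalQuota: 2Ti
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: WorkloadPriorityClass
metadata:
  name: interactive              # orders work inside a project queue
value: 10000
description: "Interactive sessions admitted ahead of batch work"
```

Borrowed cohort capacity can be reclaimed by the owning community, which is what preserves community ownership while idle resources are used.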

## Consequences

This preserves community ownership while still maximizing idle cluster use. It
also avoids pretending that project fair-share and workload priority are the
same thing.

Cross-community competition remains community-scoped rather than global
project-scoped. That is intentional.

## Alternatives considered

- One global project fair-share plane across all communities
- Priority-only scheduling without project fair-share weights
- Community-only fairness with no project-level balancing

These alternatives either ignore community ownership or fail to give projects a
meaningful fairness model inside a community.
`configs/kueue/docs/adrs/ADR-006-posix-group-project-mapping.md`

# ADR-006: POSIX group to project mapping options

- Status: Proposed
- Date: March 12, 2026

## Context

Projects may contain multiple POSIX groups and communities may contain multiple
projects. The open question is whether a POSIX group may belong to more than one
project.

This decision changes the submission experience because ambiguous group mapping
may force the API layer to require an explicit project field.

## Options

### Option A: One group maps to exactly one project

Under this option, a POSIX group may not belong to multiple projects.

#### Benefits

- `skaha` can often infer project and community from group context
- submission stays simpler for users
- visibility and accounting reasoning stay easier to explain

#### Costs

- the identity model is stricter
- some administrative use cases may need new group structures

### Option B: A group may map to multiple projects

Under this option, a POSIX group may belong to more than one project.

#### Benefits

- the identity model is more flexible
- administrators can reuse groups across projects

#### Costs

- the submission path must require explicit project selection in ambiguous cases
- user experience becomes more complex
- the control service and UI must explain ambiguity clearly

## Current direction

Leave the decision open. The architecture and UI must support both models until
the tenant administration workflow is finalized.
`configs/kueue/docs/adrs/ADR-007-resource-flavor-taxonomy.md`

# ADR-007: ResourceFlavor taxonomy and topology model

- Status: Accepted
- Date: March 12, 2026

## Context

CANFAR needs a flavor model that captures resource identity across cluster,
zone, accelerator type, storage class, and later topology-aware scheduling
domains. The model must stay readable to operators and extensible to future
MultiKueue deployments.

## Decision

Use `ResourceFlavor` as the canonical scheduler-facing identity for placement and
hardware classes. Standardize flavor naming around stable placement and hardware
dimensions rather than workload class.

Adopt the following naming pattern:

`rf-<cluster>-<zone>-<resource-class>[-<accelerator-class>]`

Examples:

- `rf-ca-west-01-cpu-standard`
- `rf-ca-west-01-cpu-highmem`
- `rf-ca-west-01-gpu-a100`

Treat topology-aware scheduling as a future phase. When topology becomes active,
use `Topology` objects and flavor association rather than encoding full topology
hierarchy into the flavor name itself.
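A flavor following this naming pattern could be bound to nodes as in the sketch below. The node label keys are hypothetical and depend on the cluster's labeling scheme; the taint toleration assumes GPU nodes carry the common `nvidia.com/gpu` taint:

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: rf-ca-west-01-gpu-a100
spec:
  nodeLabels:
    # hypothetical label keys; actual keys depend on the cluster
    canfar.net/zone: ca-west-01
    canfar.net/accelerator: a100
  tolerations:
    - key: nvidia.com/gpu   # assumed taint on GPU nodes
      operator: Exists
      effect: NoSchedule
```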

## Consequences

Operators get a stable taxonomy that works in both single-cluster and future
manager-worker designs. Users and admins can also read flavor identity in a
predictable way.

## Alternatives considered

- Opaque flavor names with documentation-only meaning
- One flavor per workload class
- Encoding every topology dimension directly in the flavor name

These alternatives either hide meaning or create unnecessary flavor sprawl.
# ADR-008: Queue enforcement and managed namespace model

- Status: Accepted
- Date: March 12, 2026

## Context

Kueue policy only works predictably when managed workloads land in managed
namespaces and carry valid queue information. CANFAR requires users to
submit through `skaha`, not through raw Kubernetes APIs without platform policy.

## Decision

Use explicitly managed namespaces for Kueue-managed user work. In the target
state this is one shared `workloads` namespace. The submission path must resolve
and apply a `LocalQueue` explicitly.

Keep `manageJobsWithoutQueueName` disabled and reject malformed or unqueued
submissions in managed namespaces through admission policy and service-side
validation.
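One way to express the rejection of unqueued submissions, sketched here with a Kubernetes `ValidatingAdmissionPolicy` (the policy name and message are hypothetical; a companion `ValidatingAdmissionPolicyBinding` would scope it to the managed `workloads` namespace):

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: require-queue-name       # hypothetical policy name
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
      - apiGroups: ["batch"]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["jobs"]
  validations:
    - expression: >-
        has(object.metadata.labels) &&
        'kueue.x-k8s.io/queue-name' in object.metadata.labels
      message: "Jobs in managed namespaces must name a LocalQueue."
```

With `manageJobsWithoutQueueName` disabled in the Kueue configuration, anything that slips past this check is simply ignored by Kueue rather than silently queued.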

## Consequences

The scheduler does not need to guess tenant identity. Platform policy remains
explicit, and visibility stays consistent with actual queue assignment.

Future namespace evolution remains possible as long as the same enforcement
principles are preserved.

## Alternatives considered

- Allow silent default queue assignment everywhere
- Allow users to create unmanaged work in the same namespaces as managed work

These alternatives make fairness and explanation harder to trust.
`configs/kueue/docs/adrs/ADR-009-visibility-and-ui-scope.md`

# ADR-009: Visibility and UI scope

- Status: Accepted
- Date: March 12, 2026

## Context

Fair scheduling without understandable visibility will be perceived as arbitrary.
CANFAR's users, project admins, and cluster admins all need different levels of
insight into ownership, pending reasons, and current queue position.

## Decision

Treat visibility as a first-class architectural concern. The initial production
rollout relies on `kubectl`, Grafana, Kueue metrics, and the pending-workloads
visibility API.
Later phases add a read-only queue UI and then guided admin workflows.

The UI must explain scheduling outcomes in terms of:

- fair-share position
- workload priority
- quota exhaustion
- insufficient resource availability
- policy rejection

## Consequences

The architecture gains a clear product surface instead of assuming that raw
conditions or controller logs are enough.

This also creates a requirement for the control service to expose tenant and
override metadata to the UI.

## Alternatives considered

- Delay visibility until after scheduling is complete
- Rely only on Kubernetes-native object inspection

These alternatives make correct policy look opaque to most users.