diff --git a/content/en/docs/plugins.md b/content/en/docs/plugins.md
index cc8020ba..3f133ab2 100644
--- a/content/en/docs/plugins.md
+++ b/content/en/docs/plugins.md
@@ -15,7 +15,140 @@ linktitle = "Plugins"
 weight = 3
 +++
+### Env
+
+#### Introduction
+
+The Env plugin is a core component of Volcano Job, designed for scenarios where Pods need to know their index position within a task. When a Volcano Job is created, these indices are automatically registered as environment variables, so each Pod can determine its position within the task group. This is particularly important for distributed computing frameworks such as MPI, TensorFlow, and PyTorch, which must coordinate multiple nodes to complete a computation.
+
+#### Use Cases
+
+The Env plugin is particularly suitable for the following scenarios:
+
+1. **Distributed Machine Learning**: In distributed training with frameworks like TensorFlow and PyTorch, each worker node needs to know its role (such as parameter server or worker) and its index position within the work group.
+2. **Data Parallel Processing**: When multiple Pods process different data shards, each Pod can read its index from an environment variable to determine which data range it should handle.
+3. **MPI Parallel Computing**: In high-performance computing scenarios, MPI tasks require each process to know its rank for proper inter-process communication.
+
+#### Key Features
+
+- Automatically registers the `VK_TASK_INDEX` and `VC_TASK_INDEX` environment variables for each Pod
+- Index values range from 0 to the number of replicas minus 1, indicating the Pod's position in the task
+- No additional configuration required; simply register the plugin in the Job definition
+- Integrates seamlessly with other Volcano plugins (such as Gang and SVC) to enhance distributed task coordination
+
+#### Usage
+
+Adding the Env plugin to a Volcano Job definition is straightforward:
+
+```yaml
+spec:
+  plugins:
+    env: []  # Register the Env plugin; no values are needed in the array
+```
+
+For more information about the Env plugin, please refer to the [Volcano Env Plugin Guide](https://github.com/volcano-sh/volcano/blob/master/docs/user-guide/how_to_use_env_plugin.md).
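+
+As a concrete illustration, the sketch below is a minimal, hypothetical Job (the name `env-demo`, the `busybox` image, and the echo command are placeholders) whose containers simply print the injected index:
+
+```yaml
+apiVersion: batch.volcano.sh/v1alpha1
+kind: Job
+metadata:
+  name: env-demo
+spec:
+  minAvailable: 3
+  schedulerName: volcano
+  plugins:
+    env: []                      # inject VK_TASK_INDEX / VC_TASK_INDEX into every Pod
+  tasks:
+    - replicas: 3
+      name: worker
+      template:
+        spec:
+          restartPolicy: OnFailure
+          containers:
+            - name: worker
+              image: busybox     # placeholder image
+              command: ["sh", "-c", "echo my index is $VC_TASK_INDEX"]
+```
+
+Each of the three Pods prints a different value from 0 to 2, which is exactly the index a data-parallel worker would use to pick its shard.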
+
+### SSH
+
+#### Introduction
+
+The SSH plugin provides password-free login between the Pods of a Volcano Job, which is essential for workloads like MPI. It is typically used together with the SVC plugin to enable efficient communication between nodes in a distributed computing environment.
+
+#### Use Cases
+
+The SSH plugin is particularly suitable for the following scenarios (a combined example follows the notes below):
+
+1. **MPI Parallel Computing**: MPI frameworks require unobstructed communication between nodes, and password-free SSH login is a key part of that infrastructure.
+2. **Distributed Machine Learning**: During distributed training, the master node may need to connect to worker nodes via SSH to execute commands or monitor status.
+3. **Cluster Management**: When administrative operations need to be performed across multiple Pods in a job, password-free SSH simplifies the workflow.
+4. **High-Performance Computing**: HPC workloads typically require efficient communication and coordination between nodes, which the SSH plugin provides.
+
+#### Key Features
+
+- Automatically configures password-free SSH login for all Pods in the Job
+- Creates a Secret containing `authorized_keys`, `id_rsa`, `config`, and `id_rsa.pub`
+- Mounts the SSH configuration files to the specified path in every container of the Job
+- Provides a `/root/.ssh/config` file containing the hostname and subdomain mappings for all Pods in the Job
+- Supports customization of SSH keys and configuration paths
+
+#### Configuration Parameters
+
+| Parameter           | Type   | Default Value | Required | Description                                   |
+| ------------------- | ------ | ------------- | -------- | --------------------------------------------- |
+| `ssh-key-file-path` | String | `/root/.ssh`  | No       | Path for storing SSH private and public keys  |
+| `ssh-private-key`   | String | Default key   | No       | Input string for the private key              |
+| `ssh-public-key`    | String | Default key   | No       | Input string for the public key               |
+
+#### Usage
+
+Adding the SSH plugin to a Volcano Job definition is straightforward:
+
+```yaml
+spec:
+  plugins:
+    ssh: []  # Register the SSH plugin; no additional parameters are needed in most cases
+    svc: []  # Typically used together with the SVC plugin
+```
+
+#### Important Notes
+
+- If `ssh-key-file-path` is configured, ensure the target directory contains the private and public keys. In most cases, it is recommended to keep the default value.
+- If `ssh-private-key` or `ssh-public-key` is configured, ensure the values are correct. In most cases, it is recommended to use the default keys.
+- Once the SSH plugin is configured, a Secret named `<job-name>-ssh` is created containing the required SSH configuration files.
+- Ensure the `sshd` service is available in all containers, otherwise SSH login will not work.
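+
+Putting this together, here is a hedged sketch of an MPI-style Job combining the SSH and SVC plugins, loosely modeled on Volcano's MPI example (the job name, images, and commands are illustrative placeholders; a real MPI job needs an image with `sshd` and `mpirun` installed):
+
+```yaml
+apiVersion: batch.volcano.sh/v1alpha1
+kind: Job
+metadata:
+  name: mpi-demo
+spec:
+  minAvailable: 3
+  schedulerName: volcano
+  plugins:
+    ssh: []                           # password-free SSH between all Pods of the Job
+    svc: []                           # stable DNS names so mpirun can reach the workers
+  tasks:
+    - replicas: 1
+      name: mpimaster
+      template:
+        spec:
+          restartPolicy: OnFailure
+          containers:
+            - name: master
+              image: example/mpi:latest   # placeholder; must contain sshd and mpirun
+              command: ["sh", "-c", "mpirun -np 2 --hostfile /etc/volcano/mpiworker.host hostname"]
+    - replicas: 2
+      name: mpiworker
+      template:
+        spec:
+          restartPolicy: OnFailure
+          containers:
+            - name: worker
+              image: example/mpi:latest   # placeholder
+              command: ["sh", "-c", "/usr/sbin/sshd -D"]
+```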
+
+For more information about the SSH plugin, please refer to the [Volcano SSH Plugin Guide](https://github.com/volcano-sh/volcano/blob/master/docs/user-guide/how_to_use_ssh_plugin.md).
+
+### SVC
+
+#### Introduction
+
+The SVC plugin provides communication capabilities between the Pods of a Volcano Job, which is essential for workloads like TensorFlow and MPI. For example, TensorFlow jobs require communication between parameter servers (PS) and workers. Volcano's SVC plugin lets the Pods of a Job reach each other via domain names, greatly simplifying the deployment of distributed applications.
+
+#### Use Cases
+
+The SVC plugin is particularly suitable for the following scenarios:
+
+1. **Distributed Machine Learning**: Frameworks like TensorFlow and PyTorch require efficient communication between worker nodes and parameter servers.
+2. **Big Data Processing**: Frameworks like Spark require communication between Drivers and Executors.
+3. **High-Performance Computing**: Parallel computing frameworks like MPI require low-latency communication between nodes.
+4. **Microservice Architecture**: When a job contains multiple interdependent service components.
+
+#### Key Features
+
+- Automatically sets `hostname` (the Pod name) and `subdomain` (the Job name) for all Pods
+- Registers the environment variables `VC_%s_NUM` (number of task replicas) and `VC_%s_HOSTS` (domain names of all Pods of the task) in all containers
+- Creates a ConfigMap containing the replica count and Pod domain names of every task, mounted under the `/etc/volcano/` directory
+- Creates a headless Service with the same name as the Job
+- Optionally creates NetworkPolicy objects to control communication between Pods
+
+#### Configuration Parameters
+
+| Parameter                     | Type    | Default | Description                                                 |
+| ----------------------------- | ------- | ------- | ----------------------------------------------------------- |
+| `publish-not-ready-addresses` | Boolean | `false` | Whether to publish Pod addresses before the Pods are ready  |
+| `disable-network-policy`      | Boolean | `false` | Whether to disable creating a NetworkPolicy for the Job     |
+
+#### Usage
+
+Adding the SVC plugin to a Volcano Job definition:
+
+```yaml
+spec:
+  plugins:
+    svc: []  # Use the default configuration
+    # Or customize the configuration:
+    # svc: ["--publish-not-ready-addresses=true", "--disable-network-policy=true"]
+```
+
+#### Important Notes
+
+- Your Kubernetes cluster requires a DNS add-on (such as CoreDNS)
+- The Kubernetes version should be >= v1.14
+- Resources created by the SVC plugin (ConfigMap, Service, NetworkPolicy) are managed automatically along with the Job
+- Pod domain information can be accessed via the environment variables or the mounted configuration files
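+
+For a task named `worker`, the `%s` placeholders above materialize as `VC_WORKER_NUM` and `VC_WORKER_HOSTS`, and each Pod is reachable at `<pod-name>.<job-name>`. The hypothetical sketch below (job name, image, and command are placeholders) shows a container inspecting both sources:
+
+```yaml
+apiVersion: batch.volcano.sh/v1alpha1
+kind: Job
+metadata:
+  name: svc-demo
+spec:
+  minAvailable: 2
+  schedulerName: volcano
+  plugins:
+    svc: []                      # headless Service + DNS names + VC_* variables
+  tasks:
+    - replicas: 2
+      name: worker
+      template:
+        spec:
+          restartPolicy: OnFailure
+          containers:
+            - name: worker
+              image: busybox     # placeholder image
+              command: ["sh", "-c", "echo peers: $VC_WORKER_HOSTS; cat /etc/volcano/worker.host"]
+```
+
+Here `VC_WORKER_HOSTS` would expand to something like `svc-demo-worker-0.svc-demo,svc-demo-worker-1.svc-demo`, and the same list is available in the mounted file `/etc/volcano/worker.host`.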
+
+For more information about the SVC plugin, please refer to the [Volcano SVC Plugin Guide](https://github.com/volcano-sh/volcano/blob/master/docs/user-guide/how_to_use_ssh_plugin.md).
 ### Gang
@@ -174,3 +307,146 @@ The Numa-Aware Plugin aims to address these limitations.
 Common scenarios for NUMA-Aware are computation-intensive jobs that are sensitive to CPU parameters, scheduling delays. Such as scientific calculation, video decoding, animation rendering, big data offline processing and other specific scenes.
+
+### Capacity
+
+#### Introduction
+
+The Capacity plugin manages queue resource quotas in the Volcano scheduler. It ensures that resources are allocated to queues according to preset quotas and supports hierarchical queue structures. Its main functions include tracking queue resource usage, ensuring queues do not exceed their resource limits, supporting resource preemption, and managing job enqueuing logic.
+
+The Capacity plugin achieves precise control over resource allocation by monitoring each queue's allocated, requested, guaranteed, and elastic resources. It also supports hierarchical queue structures, allowing administrators to create parent-child queue relationships to implement more complex resource management strategies.
+
+#### Scenarios
+
+- Multi-tenant environments: Where multiple teams or departments share cluster resources, queue quotas limit the resource usage of each tenant and ensure fair distribution.
+- Resource guarantee requirements: When critical workloads need guaranteed resources, setting a queue's guaranteed resources ensures those workloads always receive what they need.
+- Hierarchical resource management: In large organizations, hierarchical queue structures implement multi-level resource management for departments, teams, and projects, where higher-level queues control the resource usage of lower-level queues.
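+
+As a hedged sketch, the Queue below uses the capacity-style quota fields; the `deserved` field is honored when the capacity plugin replaces the proportion plugin in the scheduler configuration, and field availability depends on your Volcano version:
+
+```yaml
+apiVersion: scheduling.volcano.sh/v1beta1
+kind: Queue
+metadata:
+  name: team-a
+spec:
+  # Quota this queue is entitled to; usage beyond it is elastic capacity
+  # that can be reclaimed when other queues need their share.
+  deserved:
+    cpu: "8"
+    memory: 16Gi
+  # Resources reserved for this queue even while it is idle.
+  guarantee:
+    resource:
+      cpu: "2"
+      memory: 4Gi
+```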
+
+### CDP
+
+#### Introduction
+
+The CDP (cooldown protection) plugin is designed for elastic scheduling scenarios in the Volcano scheduler. In elastic scheduling environments, preemptible Pods may frequently switch between being preempted and resuming operation. Without a cooldown protection mechanism, these Pods might be preempted again shortly after starting, degrading service stability.
+
+The CDP plugin provides a cooldown period for Pods, ensuring they are not preempted for a certain time after entering the Running state, thereby improving service stability. This protection is particularly important for applications that need some startup time before they can provide stable service.
+
+#### Scenarios
+
+- Elastic training systems: In machine learning training tasks, model training Pods need stable running time to learn effectively. CDP ensures these Pods are not preempted immediately after startup, improving training efficiency.
+- Elastic service systems: Applications providing online services typically need to initialize and warm up before serving traffic. CDP guarantees these service Pods enough time to complete initialization.
+- Clusters with intense resource competition: In resource-constrained clusters, high-priority tasks may frequently preempt resources from low-priority tasks. CDP protects low-priority tasks that still require stable running time.
+- Stateful applications: Frequent preemption and recovery may lead to inconsistent state or data loss. CDP reduces the occurrence of such situations.
+- Applications with long startup times: If such applications are repeatedly preempted during startup, they might never become ready. CDP ensures they get at least one complete startup cycle.
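+
+Unlike the Job plugins above, which are configured in the Job spec, scheduler plugins such as CDP are enabled in the scheduler configuration. A minimal sketch of a `volcano-scheduler.conf` tier that registers the plugin (the pod-side annotations declaring preemptibility and the cooldown window vary by release, so check your version's docs):
+
+```yaml
+actions: "enqueue, allocate, preempt, backfill"
+tiers:
+- plugins:
+  - name: priority
+  - name: gang
+  - name: conformance
+  - name: cdp          # cooldown protection for recently started Pods
+- plugins:
+  - name: drf
+  - name: predicates
+  - name: proportion
+  - name: nodeorder
+```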
+
+### Conformance
+
+#### Introduction
+
+The Conformance plugin is a safety plugin in the Volcano scheduler that protects critical Kubernetes system Pods from preemption or reclamation. It ensures the stable operation of system-critical components, preventing scheduling decisions from affecting the core functionality of the cluster.
+
+The Conformance plugin identifies critical Pods by their priority class name and namespace. It filters out Pods with system-level priorities or Pods running in system namespaces, so they never become targets for preemption or resource reclamation.
+
+#### Scenarios
+
+- System component protection: Ensures that core Kubernetes components running in the kube-system namespace (such as kube-apiserver, kube-scheduler, and kube-controller-manager) are not preempted to satisfy user workload scheduling.
+- Cluster stability assurance: By preventing critical Pods from being preempted, it maintains the basic functionality and stability of the cluster, ensuring management functions keep working even under resource pressure.
+
+### DeviceShare
+
+#### Introduction
+
+The DeviceShare plugin is the Volcano scheduler component for managing and scheduling shared device resources, particularly high-value computing resources like GPUs. It supports several device sharing modes, including GPU sharing (GPUShare) and virtual GPU (VGPU), enabling clusters to utilize limited device resources more efficiently.
+
+Through a fine-grained device resource allocation mechanism, the DeviceShare plugin allows multiple tasks to share the same physical device, improving device utilization and cluster throughput. It provides device-aware predicate and score functions to ensure tasks are scheduled to suitable nodes, and supports node locking to prevent problems caused by resource contention.
+
+#### Scenarios
+
+- GPU sharing environments: In machine learning and deep learning workloads, many tasks only need part of a GPU. With GPU sharing, multiple tasks can share the same physical GPU, improving resource utilization.
+- Mixed workloads: In clusters running both compute-intensive and non-compute-intensive tasks, DeviceShare helps allocate GPU resources more rationally so that resources are not wasted.
+- Virtual GPU applications: For environments supporting virtual GPU technology, DeviceShare provides VGPU scheduling support, enabling effective management and allocation of virtualized GPU resources.
+
+### Extender
+
+#### Introduction
+
+The Extender plugin is an extension mechanism of the Volcano scheduler that lets users integrate custom scheduling logic into Volcano through HTTP interfaces. It delegates part or all of the scheduling decision process to an external system via HTTP calls, enabling the Volcano scheduler to support more complex, domain-specific scheduling requirements.
+
+The Extender plugin supports extension points for multiple scheduling phases, including session opening/closing, node predicate, node prioritization, task preemption, resource reclamation, queue overuse checking, and job enqueuing checking. Users can implement one or more of these interfaces as needed to customize scheduling behavior, as sketched after the list below.
+
+#### Scenarios
+
+- Domain-specific scheduling requirements: When the standard Volcano scheduler cannot satisfy complex scheduling requirements of a specific domain (such as HPC or AI training), the Extender plugin can integrate specialized scheduling logic.
+- External system integration: Existing scheduling or resource management systems can be smoothly integrated with Volcano via the Extender plugin.
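+
+A hedged configuration sketch for the scheduler configuration: it points the extender at a hypothetical external service at `http://127.0.0.1:8713` and wires up two of the extension verbs (argument names follow recent Volcano releases; verify them against your version before use):
+
+```yaml
+actions: "enqueue, allocate, backfill"
+tiers:
+- plugins:
+  - name: gang
+  - name: extender
+    arguments:
+      extender.urlPrefix: "http://127.0.0.1:8713"   # hypothetical external scheduler endpoint
+      extender.httpTimeout: "1s"
+      extender.predicateVerb: "predicate"           # POST <urlPrefix>/predicate during node filtering
+      extender.prioritizeVerb: "prioritize"         # POST <urlPrefix>/prioritize during node scoring
+      extender.ignorable: true                      # scheduling continues if the extender is unreachable
+```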
+
+### NodeGroup
+
+#### Introduction
+
+The NodeGroup plugin is the Volcano scheduler component for managing node group affinity and anti-affinity. It lets users control workload placement based on the relationship between queues and node groups, providing a higher-level resource allocation and isolation mechanism. With the NodeGroup plugin, users can define affinity and anti-affinity rules between queues and specific node groups, as either hard (required) or soft (preferred) requirements.
+
+The NodeGroup plugin identifies the node group a node belongs to through a labeling mechanism and, based on the queue's affinity configuration, performs node predicate and scoring during scheduling. This allows administrators to control more precisely how the workloads of different queues are distributed across the cluster; see the sketch after the list below.
+
+#### Scenarios
+
+- Resource isolation: In multi-tenant environments, the workloads of different tenants can be restricted to specific node groups, avoiding resource interference and improving security and performance stability.
+- Hardware affinity: When a cluster contains nodes with different hardware configurations (such as GPU nodes or high-memory nodes), NodeGroup can steer specific types of workloads to the appropriate hardware.
+- Failure domain isolation: Distributing workloads across different node groups reduces the blast radius of single-point failures and improves system availability.
+- Progressive upgrades: During cluster upgrades, NodeGroup can control workload distribution between old and new node groups, enabling a smooth transition.
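+
+A hedged sketch under the usual conventions (nodes are tagged with the `volcano.sh/nodegroup-name` label and the queue declares its rules; verify the exact field names against your Volcano version):
+
+```yaml
+# Label nodes into groups first, e.g.:
+#   kubectl label node node1 volcano.sh/nodegroup-name=group-a
+apiVersion: scheduling.volcano.sh/v1beta1
+kind: Queue
+metadata:
+  name: team-a
+spec:
+  affinity:
+    nodeGroupAffinity:
+      requiredDuringSchedulingIgnoredDuringExecution:
+        - group-a                # hard rule: this queue's jobs run on group-a nodes
+      preferredDuringSchedulingIgnoredDuringExecution:
+        - group-b                # soft preference
+    nodeGroupAntiAffinity:
+      requiredDuringSchedulingIgnoredDuringExecution:
+        - group-c                # hard rule: never run on group-c nodes
+```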
+
+### Overcommit
+
+#### Introduction
+
+The Overcommit plugin implements resource overcommitment in the Volcano scheduler. By setting an overcommit factor, it allows the cluster to accept more job enqueuing requests than its physical resources would otherwise admit, improving cluster resource utilization and job throughput.
+
+The Overcommit plugin decides whether a new job can be enqueued by computing the cluster's total resources, used resources, and the resource requirements of already enqueued jobs, combined with the overcommit factor. The overcommit factor defines the proportion by which the cluster may exceed its physical capacity; the default value of 1.2 means the cluster can accept resource requests exceeding its actual capacity by 20%.
+
+#### Scenarios
+
+- Resource utilization optimization: In practice, many applications request more resources than they actually use. Overcommitting lets the cluster accept more jobs, improving overall resource utilization.
+- Elastic workload environments: For workloads with fluctuating resource demands, overcommitting can temporarily admit more jobs during demand peaks, improving elasticity and responsiveness.
+- Batch processing clusters: In batch-dominated environments, jobs rarely hit their peak usage simultaneously. Overcommitting increases job throughput and reduces job waiting times.
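+
+The factor is set as a plugin argument in the scheduler configuration. This hedged sketch raises it from the default 1.2 to 1.5:
+
+```yaml
+actions: "enqueue, allocate, backfill"
+tiers:
+- plugins:
+  - name: overcommit
+    arguments:
+      overcommit-factor: 1.5   # admit jobs up to 150% of physical capacity at enqueue time
+```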
+
+### PDB
+
+#### Introduction
+
+The PDB plugin protects application availability in the Volcano scheduler. It ensures that resource reclamation and preemption respect the availability constraints defined by Kubernetes PodDisruptionBudget objects, preventing service interruptions caused by scheduling decisions.
+
+By integrating with Kubernetes PodDisruptionBudget resources, the PDB plugin checks, when selecting victim tasks, whether evicting each potential victim would violate a PDB constraint. If removing a Pod would drop the number of application instances below the minimum availability defined by the PDB, that Pod is not selected as a victim, protecting application availability.
+
+#### Scenarios
+
+- **High-availability service protection**: For online services requiring high availability (such as web or database services), the PDB plugin ensures that during resource reclamation and preemption the number of available instances does not fall below the preset threshold, avoiding service interruptions.
+- **Stateful application management**: For stateful applications (such as distributed databases and message queues), the PDB plugin prevents too many instances from being evicted at once, reducing data replication and synchronization pressure and keeping the system stable.
+
+### Rescheduling
+
+#### Introduction
+
+The Rescheduling plugin optimizes cluster resource utilization in the Volcano scheduler. It periodically evaluates cluster state, identifies imbalanced resource allocation, and proactively triggers task rescheduling to achieve a better resource distribution.
+
+The Rescheduling plugin supports multiple rescheduling strategies. The default `lowNodeUtilization` strategy identifies under-utilized nodes and migrates tasks from them to nodes with higher utilization, improving overall cluster efficiency. The plugin performs rescheduling evaluations at a configurable interval (5 minutes by default) to continuously optimize resource allocation.
+
+#### Scenarios
+
+- Resource utilization optimization: For long-running clusters, resource allocation may become imbalanced over time. The Rescheduling plugin periodically rebalances allocation, improving overall utilization.
+- Node resource fragment consolidation: When a cluster has several under-utilized nodes, Rescheduling can consolidate fragments through task migration, freeing whole nodes for large tasks or maintenance.
+- Periodic maintenance: As part of routine cluster maintenance, Rescheduling can optimize resource allocation during off-peak hours in preparation for peak periods.
+- Post-scaling optimization: After cluster auto-scaling, resource allocation may not be optimal. Rescheduling can re-optimize task distribution after scaling operations.
+
+### ResourceQuota
+
+#### Introduction
+
+The ResourceQuota plugin implements namespace resource quota control in the Volcano scheduler. It ensures that jobs comply, at enqueue time, with the namespace resource limits defined by Kubernetes ResourceQuota objects, preventing a single namespace from consuming excessive cluster resources.
+
+The ResourceQuota plugin decides whether a job can be enqueued by checking the job's minimum resource requirements (MinResources) against the namespace's quota status. When a job's requirements plus the namespace's already used resources would exceed the quota, the job is rejected from enqueuing and a corresponding event is recorded. The plugin also tracks pending resource usage, ensuring that the combined requirements of multiple jobs within the same scheduling cycle do not exceed the namespace quota.
+
+#### Scenarios
+
+The ResourceQuota plugin is applicable to the following scenarios:
+
+- Multi-tenant environments: Where multiple teams or projects share the same cluster, the ResourceQuota plugin ensures each tenant can only use the resources allocated to its namespace, preventing resource contention and "noisy neighbor" problems.
+- Resource allocation management: Administrators can set different namespace quotas to achieve reasonable, fine-grained allocation of cluster resources and ensure important workloads receive enough resources.
+- Prevention of resource abuse: The ResourceQuota plugin prevents excessive resource requests caused by program errors or malicious behavior, protecting cluster stability.
diff --git a/content/zh/docs/plugins.md b/content/zh/docs/plugins.md
index 0d016245..f54a39f0 100644
--- a/content/zh/docs/plugins.md
+++ b/content/zh/docs/plugins.md
@@ -15,6 +15,141 @@ linktitle = "Plugins"
 weight = 3
 +++
+### Env
+
+#### 简介
+
+Env 插件是 Volcano Job 的一个重要组件,专为需要 Pod 感知其在任务中索引位置的业务场景设计。创建 Volcano Job 时,这些索引会自动注册为环境变量,使每个 Pod 能够了解自己在任务组中的位置。这对于 MPI、TensorFlow、PyTorch 等分布式计算框架尤为重要,因为它们需要协调多个节点共同完成计算任务。
+
+#### 场景
+
+Env 插件特别适用于以下场景:
+
+1. **分布式机器学习**:在 TensorFlow、PyTorch 等框架的分布式训练中,每个工作节点需要知道自己的角色(如参数服务器或工作节点)以及在工作组中的索引位置。
+2. **数据并行处理**:当多个 Pod 需要处理不同数据分片时,每个 Pod 可以通过环境变量获取自己的索引,从而确定应处理的数据范围。
+3. **MPI 并行计算**:在高性能计算场景中,MPI 任务需要每个进程了解自己的 rank,以便正确地进行进程间通信。
+
+#### 关键特性
+
+- 自动为每个 Pod 注册 `VK_TASK_INDEX` 和 `VC_TASK_INDEX` 环境变量
+- 索引值范围从 0 到副本数量减 1,表示 Pod 在任务中的位置
+- 无需额外配置,只需在 Job 定义中注册插件即可使用
+- 可与其他 Volcano 插件(如 Gang、SVC 等)配合使用,增强分布式任务的协调能力
+
+#### 使用方法
+
+在 Volcano Job 定义中添加 Env 插件非常简单:
+
+```yaml
+spec:
+  plugins:
+    env: []  # 注册 Env 插件,数组中不需要任何值
+```
+
+如需了解更多信息,请参考 [Volcano Env 插件指南](https://github.com/volcano-sh/volcano/blob/master/docs/user-guide/how_to_use_env_plugin.md)。
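+
+下面给出一个示意性的最小示例(Job 名称 `env-demo`、镜像和命令均为占位符),容器直接打印注入的索引:
+
+```yaml
+apiVersion: batch.volcano.sh/v1alpha1
+kind: Job
+metadata:
+  name: env-demo
+spec:
+  minAvailable: 3
+  schedulerName: volcano
+  plugins:
+    env: []                      # 为每个 Pod 注入 VK_TASK_INDEX / VC_TASK_INDEX
+  tasks:
+    - replicas: 3
+      name: worker
+      template:
+        spec:
+          restartPolicy: OnFailure
+          containers:
+            - name: worker
+              image: busybox     # 占位镜像
+              command: ["sh", "-c", "echo my index is $VC_TASK_INDEX"]
+```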
+
+### SSH
+
+#### 简介
+
+SSH 插件为 Volcano Job 中的 Pod 之间提供免密登录功能,这对 MPI 之类的工作负载来说必不可少。它通常与 SVC 插件一起使用,以实现分布式计算环境中节点间的高效通信。
+
+#### 应用场景
+
+SSH 插件特别适用于以下场景:
+
+1. **MPI 并行计算**:MPI 框架需要各节点间能够无障碍通信,免密 SSH 登录是其基础设施的关键部分。
+2. **分布式机器学习**:在分布式训练过程中,主节点可能需要通过 SSH 连接到工作节点执行命令或监控状态。
+3. **集群管理**:当需要在作业的多个 Pod 之间执行管理操作时,免密 SSH 可以简化操作流程。
+4. **高性能计算**:HPC 工作负载通常需要节点间的高效通信和协调,SSH 插件提供了这种能力。
+
+#### 关键特性
+
+- 自动为 Job 中的所有 Pod 配置 SSH 免密登录
+- 创建包含 `authorized_keys`、`id_rsa`、`config` 和 `id_rsa.pub` 的 Secret
+- 将 SSH 配置文件挂载到 Job 中所有容器的指定路径
+- 提供 `/root/.ssh/config` 文件,包含 Job 中所有 Pod 的主机名和子域名对应关系
+- 支持自定义 SSH 密钥和配置路径
+
+#### 配置参数
+
+| 参数 | 类型 | 默认值 | 必填 | 描述 |
+| ------------------- | ------ | ------------ | ---- | ----------------------------- |
+| `ssh-key-file-path` | 字符串 | `/root/.ssh` | 否 | 用于存储 SSH 私钥和公钥的路径 |
+| `ssh-private-key` | 字符串 | 默认私钥 | 否 | 私钥的输入字符串 |
+| `ssh-public-key` | 字符串 | 默认公钥 | 否 | 公钥的输入字符串 |
+
+#### 使用方法
+
+在 Volcano Job 定义中添加 SSH 插件非常简单:
+
+```yaml
+spec:
+  plugins:
+    ssh: []  # 注册 SSH 插件,大多数情况下不需要额外参数
+    svc: []  # 通常与 SVC 插件一起使用
+```
+
+#### 注意事项
+
+- 如果配置了 `ssh-key-file-path`,请确保目标目录下存在私钥和公钥。大多数情况下建议保持默认值。
+- 如果配置了 `ssh-private-key` 或 `ssh-public-key`,请确保值正确。大多数情况下建议使用默认密钥。
+- 一旦配置了 SSH 插件,将创建一个名为 `<作业名>-ssh` 的 Secret,其中包含所需的 SSH 配置文件。
+- 请确保所有容器中 `sshd` 服务可用,否则 SSH 登录功能将无法正常工作。
+
+如需了解更多信息,请参考 [Volcano SSH 插件指南](https://github.com/volcano-sh/volcano/blob/master/docs/user-guide/how_to_use_ssh_plugin.md)。
+
+### SVC
+
+#### 简介
+
+SVC 插件为 Volcano Job 中的 Pod 之间提供通信能力,这对 TensorFlow、MPI 之类的工作负载来说必不可少。例如,TensorFlow 作业需要在参数服务器(PS)和工作节点(Worker)之间进行通信。Volcano 的 SVC 插件使 Job 中的 Pod 能够通过域名相互访问,大大简化了分布式应用的部署。
+
+#### 应用场景
+
+SVC 插件特别适用于以下场景:
+
+1. **分布式机器学习**:TensorFlow、PyTorch 等框架需要工作节点和参数服务器之间的高效通信。
+2. **大数据处理**:Spark 等框架中的 Driver 和 Executor 需要相互通信。
+3. **高性能计算**:MPI 等并行计算框架需要节点间的低延迟通信。
+4. **微服务架构**:当一个作业包含多个相互依赖的服务组件时。
+
+#### 关键特性
+
+- 自动为所有 Pod 设置 `hostname`(Pod 名称)和 `subdomain`(Job 名称)
+- 为所有容器注册环境变量 `VC_%s_NUM`(任务副本数)和 `VC_%s_HOSTS`(任务下所有 Pod 的域名)
+- 创建包含所有任务副本数和 Pod 域名的 ConfigMap,并挂载到 `/etc/volcano/` 目录
+- 创建与 Job 同名的无头服务(Headless Service)
+- 可选择性地创建 NetworkPolicy 对象以控制 Pod 间通信
+
+#### 配置参数
+
+| 参数 | 类型 | 默认值 | 描述 |
+| ----------------------------- | ------ | ------- | ----------------------------- |
+| `publish-not-ready-addresses` | 布尔值 | `false` | 是否在 Pod 未就绪时发布其地址 |
+| `disable-network-policy` | 布尔值 | `false` | 是否禁用为 Job 创建网络策略 |
+
+#### 使用方法
+
+在 Volcano Job 定义中添加 SVC 插件:
+
+```yaml
+spec:
+  plugins:
+    svc: []  # 使用默认配置
+    # 或者自定义配置
+    # svc: ["--publish-not-ready-addresses=true", "--disable-network-policy=true"]
+```
+
+#### 注意事项
+
+- 您的 Kubernetes 集群需要 DNS 插件(如 CoreDNS)
+- Kubernetes 版本需要 >= v1.14
+- SVC 插件创建的资源(ConfigMap、Service、NetworkPolicy)会随 Job 一起被自动管理
+- 可以通过环境变量或挂载的配置文件访问 Pod 域名信息
+
+如需了解更多信息,请参考 [Volcano SVC 插件指南](https://github.com/volcano-sh/volcano/blob/master/docs/user-guide/how_to_use_ssh_plugin.md)。
+
 ### Gang
 {{
 }}
@@ -152,7 +287,7 @@ Tdm的全称是Time Division Multiplexing。在一些场景中,一些节点既
-#### Numa-aware
+### Numa-aware
@@ -172,5 +307,145 @@ Numa-aware plugin致力于解决如上局限。
 Numa-aware的常见场景是那些对cpu参数敏感\调度延迟敏感的计算密集型作业。如科学计算、视频解码、动漫动画渲染、大数据离线处理等具体场景。
+
+### Capacity
+
+#### 简介
+
+Capacity 插件是 Volcano 调度器中负责管理队列资源配额的插件。它确保各个队列按照预设的资源配额进行资源分配,并支持层级队列结构。Capacity 插件的主要功能包括:跟踪队列的资源使用情况、确保队列不超过其资源上限、支持资源抢占以及管理作业入队逻辑。
+
+Capacity 插件通过监控每个队列的已分配资源、请求资源、保障资源和弹性资源,实现对资源分配的精确控制。它还支持层级队列结构,允许管理员创建父子队列关系,实现更复杂的资源管理策略。
+
+#### 场景
+
+- 多租户环境:在多个团队或部门共享集群资源的环境中,通过队列资源配额限制各租户的资源使用,确保资源公平分配。
+- 资源保障需求:当某些关键业务需要资源保障时,可以通过设置队列的 guarantee 资源确保这些业务始终能获得所需资源。
+- 层级资源管理:在大型组织中,通过层级队列结构实现部门、团队、项目的多级资源管理,上级队列可以控制下级队列的资源使用。
+
+### CDP
+
+#### 简介
+
+CDP(冷却保护)插件是 Volcano 调度器中专为弹性调度场景设计的插件。在弹性调度环境中,可抢占任务的 Pod 可能会频繁地在被抢占和恢复运行之间切换;如果没有冷却保护机制,这些 Pod 可能在刚启动后的很短时间内再次被抢占,导致服务稳定性下降。
+
+CDP 插件通过为 Pod 提供冷却时间保护,确保 Pod 在进入 Running 状态后的一段时间内不会被抢占,从而提高服务的稳定性。这种保护机制对于需要一定启动时间才能提供稳定服务的应用尤为重要。
+
+#### 场景
+
+- 弹性训练系统:在机器学习训练任务中,模型训练 Pod 需要一定的稳定运行时间才能有效学习。CDP 可以确保这些 Pod 在刚启动后不会立即被抢占,提高训练效率。
+- 弹性服务系统:对于提供在线服务的应用,Pod 启动后通常需要进行初始化、预热等操作才能正常提供服务。CDP 可以保证这些服务 Pod 有足够的时间完成初始化。
+- 资源争抢激烈的集群:在资源紧张的集群中,高优先级任务可能频繁抢占低优先级任务的资源。CDP 可以为优先级较低但仍需要一定稳定运行时间的任务提供保护。
+- 有状态应用:对于有状态应用,频繁的抢占和恢复可能导致状态不一致或数据丢失。CDP 可以减少这种情况的发生。
+- 启动时间较长的应用:某些应用启动时间较长,如果在启动过程中被频繁抢占,可能永远无法正常提供服务。CDP 可以确保这些应用至少有一个完整的启动周期。
+
+### Conformance
+
+#### 简介
+
+Conformance 插件是 Volcano 调度器中的一个安全插件,旨在保护 Kubernetes 系统中的关键 Pod 不被抢占或回收。该插件确保系统关键组件的稳定运行,防止调度决策影响集群的核心功能。
+
+Conformance 插件通过识别特定的优先级类名(PriorityClassName)和命名空间来判断 Pod 是否为关键 Pod。它会过滤掉具有系统级别优先级或运行在系统命名空间中的 Pod,使这些 Pod 不会成为抢占或资源回收的目标。
+
+#### 场景
+
+- 系统组件保护:确保 kube-system 命名空间中运行的核心 Kubernetes 组件(如 kube-apiserver、kube-scheduler、kube-controller-manager 等)不会因为用户工作负载的调度需求而被抢占。
+- 集群稳定性保障:通过防止关键 Pod 被抢占,维护集群的基本功能和稳定性,即使在资源紧张的情况下也能保证集群管理功能正常运行。
+
+### DeviceShare
+
+#### 简介
+
+DeviceShare 插件是 Volcano 调度器中专门用于管理和调度共享设备资源的组件,特别是针对 GPU 等高价值计算资源。该插件支持多种设备共享模式,包括 GPU 共享(GPUShare)和虚拟 GPU(VGPU),使集群能够更高效地利用有限的设备资源。
+
+DeviceShare 插件通过细粒度的设备资源分配机制,允许多个任务共享同一个物理设备,从而提高设备利用率和集群吞吐量。它提供了设备资源的预选(Predicate)和优选(Score)功能,确保任务被调度到合适的节点上,同时支持节点锁定功能,防止资源争用导致的问题。
+
+#### 场景
+
+- GPU 共享环境:在机器学习和深度学习工作负载中,许多任务可能只需要部分 GPU 资源。通过 GPU 共享,多个任务可以共享同一个物理 GPU,提高资源利用率。
+- 混合工作负载:在同时运行计算密集型和非计算密集型任务的集群中,DeviceShare 可以帮助更合理地分配 GPU 资源,确保资源不被浪费。
+- 虚拟 GPU 应用:对于支持虚拟 GPU 技术的环境,DeviceShare 提供了 VGPU 调度支持,使虚拟化 GPU 资源能够被有效管理和分配。
+
+### Extender
+
+#### 简介
+
+Extender 插件是 Volcano 调度器的扩展机制,允许用户通过 HTTP 接口将自定义的调度逻辑集成到 Volcano 调度系统中。该插件通过 HTTP 调用外部服务,将调度决策的部分或全部环节委托给外部系统处理,使 Volcano 调度器能够支持更复杂、更特定领域的调度需求。
+
+Extender 插件支持多种调度阶段的扩展,包括会话开启/关闭、节点预选(Predicate)、节点优选(Prioritize)、任务抢占(Preemptable)、资源回收(Reclaimable)、队列过载检查(QueueOverused)以及作业入队检查(JobEnqueueable)等。用户可以根据需要实现这些接口中的一个或多个,以定制调度行为。
+
+#### 场景
+
+- 特定领域的调度需求:当标准 Volcano 调度器无法满足特定领域(如 HPC、AI 训练等)的复杂调度需求时,可以通过 Extender 插件集成专门的调度逻辑。
+- 外部系统集成:对于已有的调度系统或资源管理系统,可以通过 Extender 插件将其与 Volcano 集成,实现平滑过渡。
+
+### NodeGroup
+
+#### 简介
+
+NodeGroup 插件是 Volcano 调度器中用于管理节点组亲和性和反亲和性的组件。该插件允许用户基于队列和节点组之间的关系来控制工作负载的分布,提供了一种更高级别的资源分配和隔离机制。通过 NodeGroup 插件,用户可以定义队列与特定节点组之间的亲和性(Affinity)和反亲和性(Anti-Affinity)规则,这些规则可以是硬性要求(Required)或软性偏好(Preferred)。
+
+NodeGroup 插件通过标签机制识别节点所属的节点组,并根据队列的亲和性配置,在调度过程中进行节点预选(Predicate)和优选(Score)。这使得管理员可以更精细地控制不同队列的工作负载在集群中的分布方式,示例见下方场景列表之后。
+
+#### 场景
+
+- 资源隔离:在多租户环境中,可以将不同租户的工作负载限制在特定的节点组上,避免资源干扰,提高安全性和性能稳定性。
+- 硬件亲和性:当集群中存在不同硬件配置的节点时(如 GPU 节点、高内存节点等),可以通过 NodeGroup 将特定类型的工作负载引导到合适的硬件节点上。
+- 故障域隔离:通过将工作负载分散到不同的节点组,可以减少单点故障的影响范围,提高系统的可用性。
+- 渐进式升级:在集群升级过程中,可以使用 NodeGroup 控制工作负载在新旧节点组之间的分布,实现平滑过渡。
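+
+以下是一个示意性示例(假定节点已通过 `volcano.sh/nodegroup-name` 标签划分节点组;具体字段名请以所用 Volcano 版本的文档为准):
+
+```yaml
+# 先为节点打标签划分节点组,例如:
+#   kubectl label node node1 volcano.sh/nodegroup-name=group-a
+apiVersion: scheduling.volcano.sh/v1beta1
+kind: Queue
+metadata:
+  name: team-a
+spec:
+  affinity:
+    nodeGroupAffinity:
+      requiredDuringSchedulingIgnoredDuringExecution:
+        - group-a                # 硬性要求:该队列的作业必须调度到 group-a
+      preferredDuringSchedulingIgnoredDuringExecution:
+        - group-b                # 软性偏好
+    nodeGroupAntiAffinity:
+      requiredDuringSchedulingIgnoredDuringExecution:
+        - group-c                # 硬性反亲和:不调度到 group-c
+```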
+
+### Overcommit
+
+#### 简介
+
+Overcommit 插件是 Volcano 调度器中用于实现资源超售(Resource Overcommitting)的插件。该插件允许集群在物理资源不足的情况下,通过设置超售因子(Overcommit Factor)来接受更多的作业入队请求,从而提高集群的资源利用率和作业吞吐量。
+
+Overcommit 插件通过计算集群的总资源、已使用资源和已入队作业的资源需求,结合超售因子,来决定新的作业请求是否可以入队。超售因子定义了集群可以超出其物理资源容量的比例,默认值为 1.2,表示集群可以接受超出其实际容量 20% 的资源请求。
+
+#### 场景
+
+- 资源利用率优化:在实际运行中,许多应用程序的资源请求往往高于其实际使用量。通过资源超售,可以接受更多的作业,提高集群的整体资源利用率。
+- 弹性工作负载环境:对于具有波动性资源需求的工作负载,超售机制可以在资源需求高峰期临时接受更多作业,提高系统的弹性和响应能力。
+- 批处理作业集群:在以批处理作业为主的环境中,各作业的资源使用通常不会同时达到峰值。通过超售,可以增加集群的作业吞吐量,减少作业等待时间。
+
+### PDB
+
+#### 简介
+
+PDB 插件是 Volcano 调度器中用于保护应用可用性的插件。该插件确保在资源回收和抢占过程中,遵守 Kubernetes 的 PodDisruptionBudget 资源对象定义的应用可用性约束,防止因调度决策导致的服务中断。
+
+PDB 插件通过与 Kubernetes 的 PodDisruptionBudget 资源集成,在选择牺牲者(victims)任务时,会检查每个潜在的牺牲者是否会违反 PDB 约束。如果移除某个 Pod 会导致应用实例数低于 PDB 定义的最小可用实例数,那么该 Pod 将不会被选为牺牲者,从而保护应用的可用性。
+
+#### 场景
+
+- **高可用服务保护**:对于需要保持高可用性的在线服务(如 Web 服务、数据库服务等),PDB 插件可以确保在资源回收和抢占过程中,服务的可用实例数不会低于预设的阈值,避免服务中断。
+- **有状态应用管理**:对于有状态应用(如分布式数据库、消息队列等),PDB 插件可以防止过多的实例同时被驱逐,减少数据复制和同步的压力,保持系统稳定性。
+
+### Rescheduling
+
+#### 简介
+
+Rescheduling 插件是 Volcano 调度器中用于优化集群资源利用率的插件。该插件通过周期性地评估集群状态,识别资源分配不均衡的情况,并主动触发任务重调度,以实现更优的资源分布和利用率。
+
+Rescheduling 插件支持多种重调度策略,默认使用 "lowNodeUtilization" 策略。该策略专注于识别利用率低的节点,并将任务从低利用率节点迁移到利用率更高的节点,从而提高整体集群效率。插件按可配置的时间间隔(默认为 5 分钟)周期性地执行重调度评估,确保集群资源分配持续优化。
+
+#### 场景
+
+- 资源利用率优化:对于长时间运行的集群,资源分配可能随时间变得不均衡。Rescheduling 插件可以定期重新平衡资源分配,提高整体利用率。
+- 节点资源碎片整合:当集群中存在多个低利用率节点时,Rescheduling 可以通过任务迁移整合资源碎片,释放完整节点用于大型任务或节点维护。
+- 定期维护:作为集群定期维护流程的一部分,Rescheduling 可以在低峰期优化资源分配,为高峰期做准备。
+- 弹性伸缩后优化:在集群自动伸缩后,资源分配可能不是最优的。Rescheduling 可以在伸缩操作后重新优化任务分布。
+
+### ResourceQuota
+
+#### 简介
+
+ResourceQuota 插件是 Volcano 调度器中用于实现命名空间资源配额控制的插件。该插件确保作业在入队时遵守 Kubernetes ResourceQuota 资源对象定义的命名空间资源限制,防止单个命名空间过度消耗集群资源。
+
+ResourceQuota 插件通过检查作业的最小资源需求(MinResources)与命名空间的资源配额状态,判断作业是否可以入队。当作业的资源需求加上命名空间已使用的资源超过配额限制时,作业将被拒绝入队,并记录相应的事件信息。插件还维护了一个待处理资源使用量的跟踪机制,确保同一调度周期内多个作业的资源需求不会超过命名空间配额。
+
+#### 场景
+
+ResourceQuota 插件适用于以下场景:
+
+- 多租户环境:在多个团队或项目共享同一集群的环境中,ResourceQuota 插件可以确保每个租户只能使用分配给其命名空间的资源量,防止资源争用和"邻居噪音"问题。
+- 资源分配管理:管理员可以通过设置不同命名空间的资源配额,实现集群资源的合理分配和精细化管理,确保重要业务获得足够资源。
+- 防止资源滥用:ResourceQuota 插件可以防止因程序错误或恶意行为导致的资源过度申请,保护集群稳定性。
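+
+该插件直接复用标准的 Kubernetes ResourceQuota 对象,下面是一个示意(命名空间与配额数值仅为占位示例):
+
+```yaml
+apiVersion: v1
+kind: ResourceQuota
+metadata:
+  name: team-quota
+  namespace: team-a        # 示意命名空间
+spec:
+  hard:
+    requests.cpu: "16"     # 该命名空间内所有工作负载的 CPU 请求上限
+    requests.memory: 32Gi
+```
+
+作业入队时,调度器会将其 MinResources 与该命名空间的已用量及上述配额进行比较,超出则拒绝入队。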