
Kubernetes

Architecture

Master Node - Manage, Plan, Schedule, Monitor

ETCD

ETCD basics

Q: What is ETCD?
A: ETCD is a distributed, reliable key-value store that is simple, secure, and fast.

Q: What is a Key-Value Store?
A: A store that keeps data as key-value pairs:

| Key      | Value    |
|----------|----------|
| Name     | John Doe |
| Age      | 45       |
| Location | New York |
| Salary   | 5000     |

in contrast to a tabular/relational layout:

| Name        | Age | Location  |
|-------------|-----|-----------|
| John Doe    | 45  | New York  |
| Dave Smith  | 33  | New York  |
| Aryan Kumar | 10  | New York  |
| Lauren Rob  | 13  | Bangalore |
| Lily Oliver | 15  | Bangalore |

Put Name "John Doe"
Get Name "John Doe"

Q: How to get started quickly?
A: Install ETCD: download, extract, and run ./etcd. ETCD listens by default on port 2379. The default client is etcdctl.

Q: How to operate ETCD?

./etcdctl set key1 value1
./etcdctl get key1
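
With ETCDCTL API version 3 (see the ETCD versions section below), the equivalent write command is put rather than set; a quick sketch:

export ETCDCTL_API=3
./etcdctl put key1 value1
./etcdctl get key1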

ETCD in kubernetes

etcd store information about: Nodes, Pods, Configs, Secrets, Accounts, Roles, Bindings, Others

Setup in k8s - manual
wget -q --https-only \
    "https://github.com/coreos/etcd/releases/download/v3.3.9/etcd-v3.3.9-linux-amd64.tar.gz"
ExecStart=/usr/local/bin/etcd \\
--name ${ETCD_NAME} \\
--cert-file=/etc/etcd/kubernetes.pem \\
--key-file=/etc/etcd/kubernetes-key.pem \\
--peer-cert-file=/etc/etcd/kubernetes.pem \\
--peer-key-file=/etc/etcd/kubernetes-key.pem \\
--trusted-ca-file=/etc/etcd/ca.pem \\
--peer-trusted-ca-file=/etc/etcd/ca.pem \\
--peer-client-cert-auth \\
--client-cert-auth \\
--initial-advertise-peer-urls https://${INTERNAL_IP}:2380 \\
--listen-peer-urls https://${INTERNAL_IP}:2380 \\
--listen-client-urls https://${INTERNAL_IP}:2379,https://127.0.0.1:2379 \\
--advertise-client-urls https://${INTERNAL_IP}:2379 \\
--initial-cluster-token etcd-cluster-0 \\
--initial-cluster controller-0=https://${CONTROLLER0_IP}:2380,controller-1=https://${CONTROLLER1_IP}:2380 \\
--initial-cluster-state new \\
--data-dir=/var/lib/etcd
Setup in k8s - kubeadm

Automatic installation

kubectl get pods -n kube-system
kubectl exec etcd-kind-control-plane -n kube-system -- etcdctl get / --prefix --keys-only
kubectl exec etcd-kind-control-plane -n kube-system -- sh -c "ETCDCTL_API=3 etcdctl get / --prefix --keys-only --limit=10 --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt  --key /etc/kubernetes/pki/etcd/server.key"
ETCD in HA Environment

3 instances: --initial-cluster controller-0=https://${CONTROLLER0_IP}:2380,controller-1=https://${CONTROLLER1_IP}:2380

ETCD versions (optional) - additional information about the ETCDCTL utility

ETCDCTL is the CLI tool used to interact with ETCD.

ETCDCTL can interact with the ETCD Server using two API versions - Version 2 and Version 3. By default it is set to use Version 2. Each version has a different set of commands.

For example ETCDCTL version 2 supports the following commands:

etcdctl backup
etcdctl cluster-health
etcdctl mk
etcdctl mkdir
etcdctl set

Whereas the commands are different in version 3

etcdctl snapshot save
etcdctl endpoint health
etcdctl get
etcdctl put

To set the right version of the API, set the ETCDCTL_API environment variable:

export ETCDCTL_API=3

When the API version is not set, it is assumed to be version 2, and the version 3 commands listed above don't work. When the API version is set to 3, the version 2 commands listed above don't work.

Apart from that, you must also specify the path to the certificate files so that ETCDCTL can authenticate to the ETCD API Server. The certificate files are available on the etcd-master at the following paths. We discuss certificates in more detail in the security section of this course, so don't worry if this looks complex:

--cacert /etc/kubernetes/pki/etcd/ca.crt
--cert /etc/kubernetes/pki/etcd/server.crt
--key /etc/kubernetes/pki/etcd/server.key

So for the commands I showed in the previous video to work you must specify the ETCDCTL API version and path to certificate files. Below is the final form:

kubectl exec etcd-master -n kube-system -- sh -c "ETCDCTL_API=3 etcdctl get / --prefix --keys-only --limit=10 --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key"

kube-scheduler

Decides which pod goes on which node:

  1. Filter nodes based on CPU and memory requirements
  2. Rank nodes (more free resources = better node)

View kube-scheduler options- kubeadm

cat /etc/kubernetes/manifests/kube-scheduler.yaml

View kube-scheduler options - service

ps -aux | grep kube-scheduler

Kube-Controller-Manager

Watches status and remediates the situation.

Node-Controller

The Node-Controller, via kube-apiserver, watches the status of worker nodes:

  1. Node Monitor Period = 5s
  2. Node Monitor Grace Period = 40s (after this time the node is marked as unreachable)
  3. POD Eviction Timeout = 5m (after this time pods are evicted from the unreachable node)
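
These timings map to kube-controller-manager flags; a sketch with the default values above (note that --pod-eviction-timeout is deprecated in recent Kubernetes releases):

kube-controller-manager \\
--node-monitor-period=5s \\
--node-monitor-grace-period=40s \\
--pod-eviction-timeout=5m0s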

Replication-Controller

Ensures the specified number of pod replicas is running (backs ReplicaSets).

View kube-controller-manager options - kubeadm

cat /etc/kubernetes/manifests/kube-controller-manager.yaml

View kube-controller-manager options - service

cat /etc/systemd/system/kube-controller-manager.service
ps -aux | grep kube-controller-manager

kube-apiserver

Creating a pod

  1. Authenticate User
  2. Validate Request
  3. Retrieve data
  4. Update ETCD
  5. Scheduler
  6. Kubelet

kubectl vs curl -X POST /api/v1/namespaces/default/pods…[other]
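
Every kubectl action translates to such a REST call. A hedged sketch of creating a pod directly against the API (assumes $APISERVER and $TOKEN are set up for your cluster):

curl -X POST https://$APISERVER/api/v1/namespaces/default/pods \
  --header "Authorization: Bearer $TOKEN" \
  --header "Content-Type: application/json" \
  --data '{"apiVersion":"v1","kind":"Pod","metadata":{"name":"nginx"},"spec":{"containers":[{"name":"nginx","image":"nginx"}]}}'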

Building kube-apiserver the hard way

kube-apiserver.service

View api-server options - kubeadm

kubectl get pods -n kube-system
kubectl exec kube-apiserver-kind-control-plane -n kube-system -- cat /etc/kubernetes/manifests/kube-apiserver.yaml

Worker Node - Host Application as Containers

Container runtime engine (rkt, containerd, Docker)

Kubelet

  1. Register Node
  2. Create Pods
  3. Monitor node and pods

View kubelet options

ps -aux | grep kubelet

Kube-proxy

Runs on each node. When a service is created, kube-proxy is updated and creates the forwarding rules for it.

Install kube-proxy and run it as a service.

kube-proxy.service

ExecStart=/usr/local/bin/kube-proxy \\
--config=/var/lib/kube-proxy/kube-proxy-config.yaml
Restart=on-failure
RestartSec=5

kube-proxy can be deployed as a DaemonSet.

POD

Imperative way

kubectl run nginx --image nginx
kubectl get pods
kubectl delete pod nginx

Declarative way

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
  labels:
    app: myapp
    type: front-end
spec:
  containers:
  - name: nginx-container
    image: nginx
EOF
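
The commands below reference a ReplicationController named myapp-rc and a ReplicaSet named myapp-replicaset; minimal sketches of those manifests (names and labels assumed to match the commands):

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ReplicationController
metadata:
  name: myapp-rc
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: nginx-container
        image: nginx
EOF

cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: myapp-replicaset
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: nginx-container
        image: nginx
EOF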

kubectl get replicationcontroller
kubectl get pods
kubectl delete replicationcontroller myapp-rc

kubectl get replicaset
kubectl get pods
kubectl delete replicaset myapp-replicaset

Labels and Selectors

A ReplicaSet monitors pods, and it needs to know which pods to monitor. That is what the selector and matching labels are for:

# in the ReplicaSet spec:
selector:
  matchLabels:
    tier: front-end
# in the pod template metadata:
metadata:
  name: myapp-pod
  labels:
    tier: front-end

Scale

Set replicas: 6 in the definition file, then:

kubectl replace -f replicaset-definition.yml

or scale directly:

kubectl scale --replicas=6 -f replicaset-definition.yml
kubectl scale --replicas=6 replicaset myapp-replicaset

Deployment

Abstraction above ReplicaSet. Adds release capabilities (rolling updates and rollbacks).

Deploy

cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-deployment
  labels:
    app: myapp
    type: front-end
spec:
  template:
    metadata:
      name: myapp-pod
      labels:
        app: myapp
        type: front-end
    spec:
      containers:
      - name: nginx-container
        image: nginx
  replicas: 3
  selector:
    matchLabels:
      type: front-end
EOF
kubectl get deployment
kubectl get pods
kubectl delete deployment myapp-deployment
kubectl get all

As you might have seen already, it is a bit difficult to create and edit YAML files, especially in the CLI. During the exam, you might find it difficult to copy and paste YAML files from the browser to the terminal. Using the kubectl run command can help in generating a YAML template. And sometimes, you can even get away with just the kubectl run command without having to create a YAML file at all. For example, if you were asked to create a pod or deployment with a specific name and image, you can simply run the kubectl run command.

Use the below set of commands and try the previous practice tests again, but this time use the commands instead of YAML files. Try to use these as much as you can going forward in all exercises.

Reference (Bookmark this page for exam. It will be very handy):

https://kubernetes.io/docs/reference/kubectl/conventions/

Create an NGINX Pod

kubectl run nginx --image=nginx

Generate POD manifest YAML file (-o yaml) without creating the pod (--dry-run=client)

kubectl run nginx --image=nginx --dry-run=client -o yaml

Create a deployment

kubectl create deployment --image=nginx nginx

Generate Deployment YAML file (-o yaml) without creating it (--dry-run=client)

kubectl create deployment --image=nginx nginx --dry-run=client -o yaml

Generate Deployment YAML file (-o yaml) without creating it (--dry-run=client), to be edited for 4 replicas (--replicas=4)

kubectl create deployment --image=nginx nginx --dry-run=client -o yaml > nginx-deployment.yaml

Save it to a file, make necessary changes to the file (for example, adding more replicas) and then create the deployment.

kubectl create -f nginx-deployment.yaml

OR

In k8s version 1.19+, we can specify the --replicas option to create a deployment with 4 replicas.

kubectl create deployment --image=nginx nginx --replicas=4 --dry-run=client -o yaml > nginx-deployment.yaml

Services

NodePort

  1. targetPort - pod port (if not specified, the same as port)
  2. port - service port (mandatory)
  3. nodePort - node port (if not specified, one available from the range 30000-32767 will be used)

Services have a built-in load balancer. Algorithm: Random; SessionAffinity: Yes.

cat << EOF | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: myapp-service
spec:
  type: NodePort
  ports:
  - targetPort: 80
    port: 80
    nodePort: 30008
  selector:
    app: myapp
    type: front-end
EOF
kubectl get services

kubectl cluster-info
curl http://172.18.0.2:30008

If the pods are on separate nodes with separate IP addresses, the service still works, and curl against each node IP works:

curl http://172.18.0.2:30008
curl http://172.18.0.3:30008
curl http://172.18.0.4:30008

ClusterIP

cat << EOF | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: backend
spec:
  type: ClusterIP
  ports:
  - targetPort: 80
    port: 80
  selector:
    app: myapp
    type: backend
EOF

LoadBalancer
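
A minimal sketch of a LoadBalancer service, mirroring the NodePort example above (service name assumed); on cloud providers this provisions an external load balancer, while on bare clusters it behaves like a NodePort:

cat << EOF | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: myapp-lb
spec:
  type: LoadBalancer
  ports:
  - targetPort: 80
    port: 80
  selector:
    app: myapp
    type: front-end
EOF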

Namespaces

Default namespaces:

  1. default
  2. kube-system
  3. kube-public

Both services (service-one and db-service) inside the default namespace:

mysql.connect("db-service")

Service-one in default, db-service in dev namespace:

mysql.connect("db-service.dev.svc.cluster.local")

When a service is created, a DNS record is added in the form below: db-service.dev.svc.cluster.local

  1. cluster.local - domain
  2. svc - subdomain
  3. dev - namespace
  4. db-service - service name

Create namespace:

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Namespace
metadata:
  name: dev
EOF

Set context:

kubectl config set-context $(kubectl config current-context) --namespace=dev
kubectl get pods

List all pods in all namespaces

kubectl get pods --all-namespaces

Resource quota:

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: dev
spec:
  hard:
    pods: "10"
    requests.cpu: "4"
    requests.memory: 5Gi
    limits.cpu: "10"
    limits.memory: 10Gi
EOF

Imperative vs Declarative

Create a pod with name nginx and image nginx

kubectl run --image=nginx nginx

Create a deployment with name nginx and image nginx

kubectl create deployment --image=nginx nginx

Create a service of ClusterIP type with port and targetPort 80

kubectl expose deployment nginx --port 80

Edit deployment

kubectl edit deployment nginx

Scale deployment

kubectl scale deployment nginx --replicas=5

Change the image for a deployment (nginx= refers to the container name)

kubectl set image deployment nginx nginx=nginx:1.18

Create kubernetes resource

kubectl create -f nginx.yaml

Replace kubernetes resource

kubectl replace -f nginx.yaml

Delete kubernetes resource

kubectl delete -f nginx.yaml

For declarative approach:

kubectl apply -f nginx.yaml

Certification tips

While you would be working mostly the declarative way - using definition files, imperative commands can help in getting one time tasks done quickly, as well as generate a definition template easily. This would help save considerable amount of time during your exams.

Before we begin, familiarize yourself with two options that can come in handy while working with the commands below:

--dry-run: By default, as soon as the command is run, the resource will be created. If you simply want to test your command, use the --dry-run=client option. This will not create the resource; instead, it will tell you whether the resource can be created and whether your command is right.

-o yaml: This will output the resource definition in YAML format on screen.

Use the above two in combination to generate a resource definition file quickly, that you can then modify and create resources as required, instead of creating the files from scratch.

POD Create an NGINX Pod

kubectl run nginx --image=nginx

Generate POD Manifest YAML file (-o yaml). Don’t create it(–dry-run)

kubectl run nginx --image=nginx --dry-run=client -o yaml

Deployment Create a deployment

kubectl create deployment --image=nginx nginx

Generate Deployment YAML file (-o yaml). Don’t create it(–dry-run)

kubectl create deployment --image=nginx nginx --dry-run=client -o yaml

Generate Deployment with 4 Replicas

kubectl create deployment nginx --image=nginx --replicas=4

You can also scale a deployment using the kubectl scale command.

kubectl scale deployment nginx --replicas=4

Another way to do this is to save the YAML definition to a file and modify

kubectl create deployment nginx --image=nginx --dry-run=client -o yaml > nginx-deployment.yaml

You can then update the YAML file with the replicas or any other field before creating the deployment.

Service Create a Service named redis-service of type ClusterIP to expose pod redis on port 6379

kubectl expose pod redis --port=6379 --name redis-service --dry-run=client -o yaml

(This will automatically use the pod’s labels as selectors) Or

kubectl create service clusterip redis --tcp=6379:6379 --dry-run=client -o yaml

(This will not use the pod's labels as selectors; instead it will assume the selector is app=redis. You cannot pass in selectors as an option, so it does not work very well if your pod has a different label set. Generate the file and modify the selectors before creating the service.)
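
A hedged sketch of that workflow (file name assumed): generate the YAML, fix the selector, then apply:

kubectl create service clusterip redis --tcp=6379:6379 --dry-run=client -o yaml > redis-service.yaml
# edit redis-service.yaml: replace the generated selector (app: redis) with the pod's actual labels
kubectl apply -f redis-service.yaml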

Create a Service named nginx of type NodePort to expose pod nginx’s port 80 on port 30080 on the nodes:

kubectl expose pod nginx --type=NodePort --port=80 --name=nginx-service --dry-run=client -o yaml

(This will automatically use the pod’s labels as selectors, but you cannot specify the node port. You have to generate a definition file and then add the node port in manually before creating the service with the pod.) Or

kubectl create service nodeport nginx --tcp=80:80 --node-port=30080 --dry-run=client -o yaml

(This will not use the pods labels as selectors)

Both the above commands have their own challenges. While one of them cannot accept a selector, the other cannot accept a node port. I would recommend going with the kubectl expose command. If you need to specify a node port, generate a definition file using the same command and manually input the node port before creating the service. Reference:

https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands

https://kubernetes.io/docs/reference/kubectl/conventions/

Scheduling

Manual scheduling

If there is no scheduler, the pod remains Pending (READY 0/1). We can add the nodeName field to the pod definition to assign it to a node manually. This works only for newly created pods.

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 8080
  nodeName: node02
EOF

If the pod is already created, we need to create a Binding object instead and POST its JSON equivalent to the pod's binding API:

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Binding
metadata:
  name: nginx
target:
  apiVersion: v1
  kind: Node
  name: node02
EOF
curl --header "Content-Type: application/json" --request POST --data '{"apiVersion":"v1", "kind": "Binding" ... }' http://$SERVER/api/v1/namespaces/default/pods/$PODNAME/binding

Labels and Selectors

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: simple-webapp
  labels:
    app: App1
    function: Front-end
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 8080
EOF
kubectl get pods --selector app=App1

kubectl taint nodes node-name key=value:taint-effect

kubectl taint nodes node1 app=blue:NoSchedule

Untaint node node1 (by adding a - sign at the end):

kubectl taint nodes node1 app=blue:NoSchedule-

Taint effects:

  1. NoSchedule - pods will not be scheduled on the node
  2. PreferNoSchedule - the scheduler tries to avoid placing the pod on the node
  3. NoExecute - new pods will not be scheduled on the node, and existing pods will be evicted if they do not tolerate the taint

Tolerations: each value in tolerations needs to be in double quotes (""):

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 8080
  tolerations:
  - key: "app"
    operator: "Equal"
    value: "blue"
    effect: "NoSchedule"
EOF

Master nodes have their own taints. A kind cluster has no taints on the kind-control-plane node:

kubectl get nodes
kubectl describe node kind-control-plane | grep Taint

kubectl label nodes node-1 size=Large
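
A pod can then target that node with a nodeSelector; a minimal sketch using the size=Large label just applied (pod name assumed):

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
spec:
  containers:
  - name: nginx
    image: nginx
  nodeSelector:
    size: Large
EOF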

Node affinity

Advanced capabilities for selecting nodes for pod

The snippet below is equivalent to the nodeSelector example above:

cat << EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 8080
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: size
            operator: In
            values:
            - Large
EOF

Apply pod if node is medium or large

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: size
          operator: In
          values:
          - Medium
          - Large

Apply pod only if node is not small

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: size
          operator: NotIn
          values:
          - Small

Apply pod to node if such (size) label exists

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: size
          operator: Exists

Node affinity types. Available:

  1. requiredDuringSchedulingIgnoredDuringExecution
  2. preferredDuringSchedulingIgnoredDuringExecution

Planned:

  3. requiredDuringSchedulingRequiredDuringExecution

| DuringScheduling | DuringExecution | No matching label on node | Label removed during run |
|------------------|-----------------|---------------------------|--------------------------|
| Required         | Ignored         | not scheduled             | will run                 |
| Preferred        | Ignored         | scheduled on random node  | will run                 |
| Required         | Required        | not scheduled             | pod will be removed      |

Resource Requirements and Limits

  1. 1m CPU - minimal amount of CPU
  2. 1 vCPU - default k8s CPU limit
  3. 512Mi - default k8s memory limit

Resource Request: the minimal amount of resources needed by a pod. The scheduler takes this information and checks whether there is room on a node for the pod, based on its resource requests.
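
A minimal sketch of per-container requests and limits (the values are illustrative assumptions):

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
spec:
  containers:
  - name: nginx
    image: nginx
    resources:
      requests:
        memory: "256Mi"
        cpu: "0.5"
      limits:
        memory: "512Mi"
        cpu: "1"
EOF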

In the previous lecture, I said - “When a pod is created the containers are assigned a default CPU request of .5 and memory of 256Mi”. For the POD to pick up those defaults you must have first set those as default values for request and limit by creating a LimitRange in that namespace.

apiVersion: v1
kind: LimitRange
metadata:
    name: mem-limit-range
spec:
    limits:
    - default:
        memory: 512Mi
    defaultRequest:
        memory: 256Mi
    type: Container

https://kubernetes.io/docs/tasks/administer-cluster/manage-resources/memory-default-namespace/

apiVersion: v1
kind: LimitRange
metadata:
    name: cpu-limit-range
spec:
    limits:
    - default:
        cpu: 1
    defaultRequest:
        cpu: 0.5
    type: Container

https://kubernetes.io/docs/tasks/administer-cluster/manage-resources/cpu-default-namespace/

References:

https://kubernetes.io/docs/tasks/configure-pod-container/assign-memory-resource

Daemonsets

A DaemonSet ensures that one replica of a pod runs on each available node. Useful for monitoring and networking agents. kube-proxy is deployed as a DaemonSet.

cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: monitoring-daemon
spec:
  selector:
    matchLabels:
      app: monitoring-agent
  template:
    metadata:
      labels:
        app: monitoring-agent
    spec:
      containers:
      - name: monitoring-agent
        image: monitoring-agent
EOF

Static PODs

Static pods are managed directly by the kubelet, without the API server - useful when there is no master and only a single node is left. Use case: deploying control plane components on master nodes.

kubelet.service

ExecStart=/usr/local/bin/kubelet \\
--container-runtime=remote \\
--container-runtime-endpoint=unix:///var/run/containerd/containerd.sock \\
--pod-manifest-path=/etc/kubernetes/manifests \\
--kubeconfig=/var/lib/kubelet/kubeconfig \\
--network-plugin=cni \\
--register-node=true \\
--v=2

OR:

kubelet.service

ExecStart=/usr/local/bin/kubelet \\
--container-runtime=remote \\
--container-runtime-endpoint=unix:///var/run/containerd/containerd.sock \\
--config=kubeconfig.yaml \\
--kubeconfig=/var/lib/kubelet/kubeconfig \\
--network-plugin=cni \\
--register-node=true \\
--v=2

kubeconfig.yaml

staticPodPath: /etc/kubernetes/manifests
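
To create a static pod, drop a manifest into that path; the kubelet picks it up automatically (a sketch, file and pod names assumed):

cat <<EOF > /etc/kubernetes/manifests/static-nginx.yaml
apiVersion: v1
kind: Pod
metadata:
  name: static-nginx
spec:
  containers:
  - name: nginx
    image: nginx
EOF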

To view static pods (via the container runtime, e.g. Docker):

docker ps

Task:

We just created a new static pod named static-greenbox. Find it and delete it. This question is a bit tricky. But if you use the knowledge you gained in the previous questions in this lab, you should be able to find the answer to it.

Solutions

First, let’s identify the node in which the pod called static-greenbox is created. To do this, run:

root@controlplane:~# kubectl get pods --all-namespaces -o wide | grep static-greenbox
default       static-greenbox-node01   1/1   Running   0   19s   10.244.1.2   node01   <none>   <none>
root@controlplane:~#

From the result of this command, we can see that the pod is running on node01.

Next, SSH to node01 and identify the path configured for static pods in this node. Important: The path need not be /etc/kubernetes/manifests. Make sure to check the path configured in the kubelet configuration file.

root@controlplane:~# ssh node01
root@node01:~# ps -ef | grep /usr/bin/kubelet
root       752   654  0 00:30 pts/0    00:00:00 grep --color=auto /usr/bin/kubelet
root     28567     1  0 00:22 ?        00:00:11 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --network-plugin=cni --pod-infra-container-image=k8s.gcr.io/pause:3.2
root@node01:~# grep -i staticpod /var/lib/kubelet/config.yaml
staticPodPath: /etc/just-to-mess-with-you
root@node01:~#

Here the staticPodPath is /etc/just-to-mess-with-you

Navigate to this directory and delete the YAML file:

root@node01:/etc/just-to-mess-with-you# ls
greenbox.yaml
root@node01:/etc/just-to-mess-with-you# rm -rf greenbox.yaml
root@node01:/etc/just-to-mess-with-you#

Exit out of node01 using CTRL + D or type exit. You should return to the controlplane node. Check if the static-greenbox pod has been deleted:

root@controlplane:~# kubectl get pods --all-namespaces -o wide | grep static-greenbox
root@controlplane:~#

Multiple schedulers

Default scheduler

The default scheduler name can be changed in a config file as below:

scheduler-config.yaml

apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: default-scheduler

Custom scheduler

my-scheduler-config.yaml

apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: my-scheduler

Deploy an additional scheduler:

wget https://storage.googleapis.com/kubernetes-release/release/v1.12.0/bin/linux/amd64/kube-scheduler

kube-scheduler.service

ExecStart=/usr/local/bin/kube-scheduler \\
--config=/etc/kubernetes/config/kube-scheduler.yaml

my-scheduler.service

ExecStart=/usr/local/bin/kube-scheduler \\
--config=/etc/kubernetes/config/my-scheduler-config.yaml

Deployment with kubeadm as Pod:

apiVersion: v1
kind: Pod
metadata:
  name: my-custom-scheduler
  namespace: kube-system
spec:
  containers:
    - command:
        - kube-scheduler
        - --address=127.0.0.1
        - --kubeconfig=/etc/kubernetes/scheduler.conf
        - --config=/etc/kubernetes/my-scheduler-config.yaml
      image: k8s.gcr.io/kube-scheduler-amd64:v1.11.3
      name: kube-scheduler

Leader election

apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: my-scheduler
leaderElection:
  leaderElect: true
  resourceNamespace: kube-system
  resourceName: lock-object-my-scheduler

How to use a custom scheduler

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
    - image: nginx
      name: nginx
  schedulerName: my-custom-scheduler

Priority Class

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
globalDefault: false
description: "This priority class should be used for xyz services pods only."
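
A pod opts into the class via priorityClassName (a sketch; pod name assumed):

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  priorityClassName: high-priority
  containers:
  - image: nginx
    name: nginx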

Scheduling process

Before being allocated to a node, pods remain in the scheduling queue. Next, nodes are filtered: unsuitable nodes are rejected. Then the remaining nodes are scored, and the pod is bound to the best one. Example:

Scheduling queue: several pods; the selected pod needs 6 cores.
Filtering: Node1 = 4 cores - not suitable; Node2 = 4 cores - not suitable; Node3 = 8 cores - suitable; Node4 = 12 cores - suitable.
Scoring: Node3 = 8 - 6 = 2; Node4 = 12 - 6 = 6.
Binding: the pod is bound to Node4.

Scheduling plugins per phase:

  1. Scheduling queue - PrioritySort
  2. Filtering - NodeResourcesFit, NodeName, NodeUnschedulable
  3. Scoring - NodeResourcesFit, ImageLocality
  4. Binding - DefaultBinder