Troubleshooting Guide

Common issues and solutions when using vind.

Cluster Creation Issues

Cluster Won't Start

Symptoms:

  • vcluster create fails
  • Container exits immediately
  • No cluster appears in vcluster list

Solutions:

  1. Check Docker is running:

    docker ps
  2. Check Docker resources:

    docker system df
    docker system prune  # If needed
  3. Check available disk space:

    df -h
  4. View detailed logs:

    vcluster create my-cluster --debug
  5. Check Docker logs:

    docker exec vcluster.cp.my-cluster journalctl -u vcluster --no-pager
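
The first three checks above could be combined into a small preflight function run before `vcluster create`. This is a minimal sketch, not part of vind: the function names `disk_usage_pct` and `preflight` and the 90% disk threshold are assumptions chosen for illustration.

```shell
# Print the used-space percentage (0-100) of the filesystem holding path $1.
disk_usage_pct() {
  df -P "$1" | awk 'NR==2 { gsub("%", "", $5); print $5 }'
}

# Hypothetical preflight: returns non-zero on the first failed check.
preflight() {
  max_pct=${1:-90}   # assumed threshold, tune for your machine
  if ! docker info >/dev/null 2>&1; then
    echo "Docker daemon is not reachable; start Docker first" >&2
    return 1
  fi
  pct=$(disk_usage_pct /)
  if [ "$pct" -gt "$max_pct" ]; then
    echo "Disk is ${pct}% full (limit ${max_pct}%)" >&2
    return 1
  fi
  echo "preflight ok"
}
```

A possible use is `preflight 90 && vcluster create my-cluster`, so that creation is only attempted when both checks pass.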

Port Already in Use

Symptoms:

  • Error: port is already allocated
  • Cannot bind to port

Solutions:

  1. Find what's using the port:

    # macOS/Linux
    lsof -i :8443
    
    # Or
    netstat -an | grep 8443
  2. Use a different port:

    vcluster create my-cluster \
      --set 'experimental.docker.ports[0]=8444:8443'
  3. Stop conflicting service:

    # Find and stop the service using the port
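
When several ports are taken, choosing a replacement by hand gets tedious. The sketch below picks the first free port at or above a starting value, given a list of ports already in use on standard input. The helper name `next_free_port` is hypothetical, and how you produce the list of used ports depends on your platform.

```shell
# Read used ports (one per line) from stdin and print the first port
# greater than or equal to $1 that is not in that list.
next_free_port() {
  start=$1
  used=$(cat)
  port=$start
  while printf '%s\n' "$used" | grep -qx "$port"; do
    port=$((port + 1))
  done
  printf '%s\n' "$port"
}
```

One way to feed it, assuming `lsof` is installed, is `lsof -nP -iTCP -sTCP:LISTEN | awk 'NR>1 { n = split($9, a, ":"); print a[n] }' | next_free_port 8443`. The result can then be used as the host side of the port mapping shown in step 2.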

Insufficient Resources

Symptoms:

  • Container fails to start
  • Out of memory errors
  • CPU throttling

Solutions:

  1. Check Docker resources:

    docker stats
  2. Increase Docker resources:

    • Docker Desktop: Settings → Resources
    • Increase CPU and Memory allocation
  3. Pause other clusters:

    vcluster list
    vcluster pause other-cluster

Platform UI Issues

Clusters Not Showing in Platform UI

Symptoms:

  • Created clusters don't appear in the Platform UI
  • Dashboard shows no clusters

Cause:

The vCluster Platform only discovers clusters created after the platform is started. If you create clusters first and then start the platform, they will not be synced.

Solutions:

  1. Start the platform first, then create clusters:

    # Start platform first
    vcluster platform start
    
    # Then create clusters - they will appear in the UI
    vcluster create my-cluster
  2. If you already have clusters created before the platform:

    • Delete and recreate the clusters after starting the platform
    vcluster platform start
    vcluster delete my-cluster
    vcluster create my-cluster

Connection Issues

Cannot Connect to Cluster

Symptoms:

  • vcluster connect fails
  • kubectl commands fail
  • Connection refused errors

Solutions:

  1. Verify cluster is running:

    vcluster list
    docker ps | grep vcluster
  2. Check cluster status:

    vcluster describe my-cluster
  3. Reconnect:

    vcluster connect my-cluster --update-current
  4. Check kubeconfig:

    kubectl config get-contexts
    kubectl config use-context vcluster-docker_my-cluster
  5. Verify port is accessible:

    # Get the port
    docker port vcluster.cp.my-cluster
    
    # Test connection
    curl -k https://localhost:<port>
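
A connection can also fail simply because the API server is still starting, so it may be worth retrying a check a few times before digging deeper. The following is a generic retry helper, offered as a sketch; the name `retry` and its argument order are assumptions, not a vcluster feature.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS is exhausted.
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    # Sleep only if another attempt follows.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

For example, `retry 10 3 kubectl get nodes` would try up to ten times with three seconds between attempts, and return non-zero if the cluster never answers.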

Kubeconfig Not Found

Symptoms:

  • kubectl says context not found
  • Cannot switch contexts

Solutions:

  1. Reconnect to cluster:

    vcluster connect my-cluster
  2. Reconnect and update the current kubeconfig context:

    vcluster connect my-cluster --update-current
  3. If the context is stale, disconnect before reconnecting:

    vcluster disconnect my-cluster
  4. Check kubeconfig location:

    echo $KUBECONFIG
    kubectl config view

Load Balancer Issues

LoadBalancer Service Has No External IP

Symptoms:

  • Service created but EXTERNAL-IP is <pending>
  • Cannot access service

Solutions:

  1. Confirm the load balancer has not been disabled in your configuration (it is enabled by default).

  2. Check Docker network:

    docker network inspect vcluster-my-cluster
  3. On macOS, check port forwarding. Binding privileged ports (below 1024) may require sudo.

Cannot Access LoadBalancer Service

Symptoms:

  • Service has EXTERNAL-IP but cannot access
  • Connection refused

Solutions:

  1. Check service status:

    kubectl get svc my-service
  2. Verify IP is in Docker network:

    docker network inspect vcluster-my-cluster | grep -A 5 "IPAM"
  3. Test connectivity:

    curl http://<EXTERNAL-IP>
    ping <EXTERNAL-IP>
  4. Check firewall rules:

    # macOS
    sudo pfctl -sr
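
Step 2 asks whether the service's EXTERNAL-IP falls inside the Docker network's subnet, which is easy to misjudge by eye. The sketch below does the comparison for IPv4 addresses. The function names `ip_to_int` and `ip_in_cidr` are hypothetical, and the subnet has to be read from the `IPAM` section of the `docker network inspect` output.

```shell
# Convert a dotted-quad IPv4 address to a single integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1          # split the address into its four octets
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# ip_in_cidr IP CIDR: succeeds if IP lies inside CIDR (e.g. 172.18.0.0/16).
ip_in_cidr() {
  ip=$1
  net=${2%/*}        # network address, the part before the slash
  bits=${2#*/}       # prefix length, the part after the slash
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$ip") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}
```

For instance, `ip_in_cidr 172.18.0.5 172.18.0.0/16 && echo "in subnet"` prints the message, while an address outside the range makes the function return non-zero.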

Registry Proxy Issues

Registry Proxy Not Working

Symptoms:

  • Images still pull from registry
  • No caching benefit

Solutions:

  1. Verify containerd storage:

    docker info | grep "Storage Driver"
    # Should show containerd or overlay2 with containerd
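  2. Confirm the registry proxy has not been disabled in your configuration (it is enabled by default).

  3. Enable the containerd image store if step 1 shows it is not in use (in Docker Desktop: Settings → General → "Use containerd for pulling and storing images").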

  4. Verify containerd socket:

    ls -la /var/run/docker/containerd/containerd.sock
  5. Check logs:

    docker exec vcluster.cp.my-cluster journalctl -u vcluster --no-pager | grep -i registry

Sleep/Wake Issues

Cannot Pause Cluster

Symptoms:

  • vcluster pause fails
  • Cluster doesn't stop

Solutions:

  1. Check cluster status:

    vcluster list
    docker ps | grep vcluster
  2. Manually stop containers:

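    docker stop vcluster.cp.my-cluster
    # Shell globs do not match container names, so select node containers with a filter
    docker stop $(docker ps -q --filter "name=vcluster.node.my-cluster")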
  3. Check for stuck processes:

    docker ps -a | grep vcluster

Cannot Resume Cluster

Symptoms:

  • vcluster resume fails
  • Cluster doesn't start

Solutions:

  1. Check container status:

    docker ps -a | grep vcluster
  2. Manually start containers:

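    docker start vcluster.cp.my-cluster
    # Shell globs do not match container names, so select node containers with a filter
    docker start $(docker ps -aq --filter "name=vcluster.node.my-cluster")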
  3. Check Docker resources:

    docker system df
  4. View container logs:

    docker exec vcluster.cp.my-cluster journalctl -u vcluster --no-pager

Network Issues

Cannot Reach Services

Symptoms:

  • Pods cannot communicate
  • Services unreachable
  • DNS resolution fails

Solutions:

  1. Check network configuration:

    docker network inspect vcluster-my-cluster
  2. Verify CNI is working:

    kubectl get pods -n kube-system | grep cni
  3. Check pod networking:

    kubectl get pods -o wide
    kubectl describe pod <pod-name>
  4. Test DNS:

    kubectl run -it --rm debug --image=busybox -- nslookup kubernetes.default

DNS Not Working

Symptoms:

  • Cannot resolve service names
  • DNS queries fail

Solutions:

  1. Check CoreDNS:

    kubectl get pods -n kube-system | grep coredns
    kubectl logs -n kube-system <coredns-pod>
  2. Verify DNS service:

    kubectl get svc -n kube-system kube-dns
  3. Check DNS config:

    kubectl get configmap -n kube-system coredns -o yaml

External Node Issues

Cannot Join External Node

Symptoms:

  • Join command fails
  • Node doesn't appear in cluster

Solutions:

  1. If the join script (for example, on an EC2 instance) does not run when piped directly through curl | bash, download it first and then run it with sudo:

    # Download the script first
    curl -L -o join-script.sh "<join-script-url>"
    chmod +x join-script.sh
    sudo ./join-script.sh
  2. Verify VPN is enabled:

    vcluster describe my-cluster | grep -i vpn
  3. Check join token:

    # Get valid join token
    kubectl get secret join-token -o jsonpath='{.data.token}' | base64 -d
  4. Verify network connectivity:

    # From external node
    ping <control-plane-ip>
    curl -k https://<control-plane-ip>:8443
  5. Check platform URL:

    # Ensure platform is accessible
    curl https://<platform-url>/health
  6. View join logs:

    # On external node
    journalctl -u vcluster-join

Performance Issues

Slow Cluster Operations

Symptoms:

  • Commands take a long time
  • High CPU/memory usage

Solutions:

  1. Check resource usage:

    docker stats
  2. Pause unused clusters:

    vcluster list
    vcluster pause unused-cluster
  3. Clean up Docker:

    # Warning: -a also removes all unused images, not only dangling ones
    docker system prune -a
  4. Increase Docker resources:

    • Docker Desktop: Settings → Resources

High Memory Usage

Symptoms:

  • System running out of memory
  • Docker containers killed

Solutions:

  1. Check memory usage:

    docker stats
  2. Limit cluster resources:

    experimental:
      docker:
        args:
          - "--memory=2g"
          - "--cpus=2"
  3. Pause unused clusters:

    vcluster pause my-cluster
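
The limits in step 2 need to live in a configuration file that is passed at creation time. As a sketch, the helper below writes such a file with the memory and CPU values as parameters. The function name `write_limits_config` and the file name are hypothetical; the YAML keys are the ones shown in step 2.

```shell
# write_limits_config FILE MEMORY CPUS
# Writes a config file that passes --memory and --cpus to the container.
write_limits_config() {
  cat > "$1" <<EOF
experimental:
  docker:
    args:
      - "--memory=$2"
      - "--cpus=$3"
EOF
}
```

After `write_limits_config vcluster.yaml 2g 2`, the file could be supplied with something like `vcluster create my-cluster -f vcluster.yaml`, assuming your vcluster CLI version accepts a values file through `-f`.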

Getting Help

Debug Mode

Enable debug logging:

vcluster create my-cluster --debug

View Logs

# Control plane logs
docker exec vcluster.cp.my-cluster journalctl -u vcluster --no-pager

# Node logs (for node named worker-1)
docker exec vcluster.node.my-cluster.worker-1 journalctl -u kubelet --no-pager
# or for containerd
docker exec vcluster.node.my-cluster.worker-1 journalctl -u containerd --no-pager

Collect Information

When reporting issues, collect:

  1. vCluster version:

    vcluster version
  2. Docker version:

    docker version
  3. System info:

    uname -a
    docker info
  4. Cluster status:

    vcluster list
    vcluster describe my-cluster
  5. Logs:

    docker exec vcluster.cp.my-cluster journalctl -u vcluster --no-pager > vcluster.log
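
The five items above can be gathered in one pass so that a report is not missing a piece. The following sketch writes each command's output to its own file and keeps going when a command fails. The function names `capture` and `collect_diagnostics` and the default output directory are assumptions for illustration.

```shell
# capture OUT_DIR NAME COMMAND [ARGS...]
# Saves stdout and stderr of COMMAND to OUT_DIR/NAME.txt and notes failures.
capture() {
  out_dir=$1
  name=$2
  shift 2
  if ! "$@" > "$out_dir/$name.txt" 2>&1; then
    echo "command failed: $*" >> "$out_dir/$name.txt"
  fi
}

# collect_diagnostics CLUSTER [OUT_DIR]
collect_diagnostics() {
  cluster=$1
  out_dir=${2:-vcluster-diagnostics}   # hypothetical default directory
  mkdir -p "$out_dir"
  capture "$out_dir" vcluster-version vcluster version
  capture "$out_dir" docker-version docker version
  capture "$out_dir" uname uname -a
  capture "$out_dir" docker-info docker info
  capture "$out_dir" vcluster-list vcluster list
  capture "$out_dir" vcluster-describe vcluster describe "$cluster"
  capture "$out_dir" vcluster-log \
    docker exec "vcluster.cp.$cluster" journalctl -u vcluster --no-pager
}
```

Running `collect_diagnostics my-cluster` leaves one text file per command in the output directory, which can then be attached to an issue. Review the files first, since `docker info` and logs may contain details you do not want to share.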

Community Support