Skip to content

guang/docker-spark

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

16 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

docker-spark

This is my docker setup for deploying spark (tailored towards pyspark) on kubernetes, based on other peeps' existing work

Notice that things like port mappings are delegated to the orchestration tool. As a result, the template for master pod should look something like

{
  "kind": "Pod",
  "apiVersion": "v1",
  "metadata": {
    "name": "{{USER_NAME}}-spark",
    "labels": {
      "name": "{{USER_NAME}}-spark-master",
      "owner": "{{USER_NAME}}"
    }
  },
  "spec": {
    "containers": [
      {
        "name": "{{USER_NAME}}-spark",
        "image": "guangyang/docker-spark:latest",
        "command": ["/bin/bash", "-c", "/start-master.sh {{USER_NAME}}-spark"],
        "env": [
          {
            "name": "SPARK_MASTER_PORT",
            "value": "7077"
          },
          {
            "name": "SPARK_MASTER_WEBUI_PORT",
            "value": "8080"
          }
        ],
        "ports": [
          {
            "containerPort": 7077,
            "protocol": "TCP"
          },
          {
            "containerPort": 8080,
            "protocol": "TCP"
          }
        ]
      }
    ]
  }
}

while the worker replication controller should look something like

{
  "kind": "ReplicationController",
  "apiVersion": "v1",
  "metadata": {
    "name": "{{USER_NAME}}-spark-worker-controller",
    "labels": {
      "name": "{{USER_NAME}}-spark-worker",
          "owner": "{{USER_NAME}}"
    }
  },
  "spec": {
    "replicas": 2,
    "selector": {
      "name": "{{USER_NAME}}-spark-worker"
    },
    "template": {
      "metadata": {
        "labels": {
          "name": "{{USER_NAME}}-spark-worker",
          "uses": "{{USER_NAME}}-spark",
          "owner": "{{USER_NAME}}"
        }
      },
      "spec": {
        "containers": [
          {
            "name": "{{USER_NAME}}-spark-worker",
            "image": "guangyang/docker-spark:latest",
            "command": ["/bin/bash", "-c", "/start-worker.sh spark://{{USER_NAME}}-spark:7077"],
            "ports": [
              {
                "hostPort": 8888,
                "containerPort": 8888
              }
            ]
          }
        ]
      }
    }
  }
}

where {{USER_NAME}} should be replaced with a dns-friendly name of your choosing

Love to hear feedbacks! To get started with spark on kubernetes, check out kube's doc on spark

About

Need more spark clusters

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages