GitHub project page
https://github.com/helm/charts/tree/master/stable/spark

Chart Details

This chart will do the following:

  • 1 x Spark Master with port 8080 exposed on an external LoadBalancer (changed to 8088 in this walkthrough).
  • 3 x Spark Workers with a HorizontalPodAutoscaler (disabled by default) that scales up to a maximum of 10 pods when CPU usage reaches 50% of 100m. To inspect a node's resources, run: kubectl describe node ec-k8s-n1
  • 1 x Zeppelin with port 8080 exposed on an external LoadBalancer.
  • All components run as Kubernetes Deployments.
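
To make the autoscaling numbers concrete: a target of 50% on a 100m CPU value works out to an average of about 50m per worker pod. The sketch below is only that arithmetic. The helper name hpa_threshold_millicores is made up for illustration, and it assumes the chart applies Worker.Cpu as the container's CPU request, which is what the autoscaler's percentage is measured against.

```shell
#!/bin/sh
# hpa_threshold_millicores is a hypothetical helper, not part of the chart.
# It converts a CPU request in millicores and a target percentage into the
# average per-pod usage (in millicores) at which scaling would be considered.
hpa_threshold_millicores() {
  request_m=$1   # e.g. 100 for Cpu: "100m"
  target_pct=$2  # e.g. 50 for CpuTargetPercentage: 50
  echo $(( request_m * target_pct / 100 ))
}

# Values taken from the chart description above.
hpa_threshold_millicores 100 50
```

With autoscaling enabled, kubectl get hpa should show the current and target utilization for the worker Deployment.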

Getting Started

Pulling the images from a mirror in advance is recommended, because k8s.gcr.io is not reachable from mainland China.

## Run on every node
#
[root@ec-k8s-n1 ~]# docker pull mirrorgooglecontainers/spark-master:1.5.1_v3
1.5.1_v3: Pulling from mirrorgooglecontainers/spark-master
674ded4e0a75: Pull complete 
a3ed95caeb02: Pull complete 
3fe37ed373c9: Pull complete 
......
e601161a0962: Pull complete 
68d5e6e808db: Pull complete 
8634d83ff7e4: Pull complete 
Digest: sha256:014120548840b7281756eb15aef0991c27b75db52cca30caae44a98df4b302db
Status: Downloaded newer image for mirrorgooglecontainers/spark-master:1.5.1_v3

# 
[root@ec-k8s-n1 ~]# docker pull mirrorgooglecontainers/spark-worker:1.5.1_v3
1.5.1_v3: Pulling from mirrorgooglecontainers/spark-worker
674ded4e0a75: Already exists 
a3ed95caeb02: Pull complete 
3fe37ed373c9: Already exists 
ddf1745d8563: Already exists 
bd3317f86714: Already exists 
......
e601161a0962: Already exists 
6ce453fb9a45: Pull complete 
1e35521fd858: Pull complete 
Digest: sha256:525b3026139248fad7c3d9c93a3a1273fde5c9563039dda4ccacf5541fe133e1
Status: Downloaded newer image for mirrorgooglecontainers/spark-worker:1.5.1_v3

#
[root@ec-k8s-n1 ~]# docker pull apache/zeppelin:0.7.3
0.7.3: Pulling from apache/zeppelin
9fb6c798fa41: Pull complete 
3b61febd4aef: Pull complete 
.......
224ffdb7a825: Pull complete 
5adcc546cd9a: Pull complete 
128ebdda1a7a: Pull complete 
Digest: sha256:2dbb8d9afb2002873ec91144ac28146d112dfc7ba48ef21b7de4e76bf4cd95cd
Status: Downloaded newer image for apache/zeppelin:0.7.3
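
Since the same three pulls have to be repeated on every node, they can be wrapped in a small loop. This is a minimal sketch: the function name pull_images and the DRY_RUN switch are assumptions added for illustration, and the image names and tags are the ones used above.

```shell
#!/bin/sh
# Images and tags as used in this walkthrough.
IMAGES="mirrorgooglecontainers/spark-master:1.5.1_v3
mirrorgooglecontainers/spark-worker:1.5.1_v3
apache/zeppelin:0.7.3"

# pull_images is a hypothetical helper. With DRY_RUN=1 (the default here) it
# only prints the commands; set DRY_RUN=0 on a node with Docker installed to
# actually pull the images.
pull_images() {
  for image in $IMAGES; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "docker pull $image"
    else
      docker pull "$image" || return 1
    fi
  done
}

pull_images
```

Running it with DRY_RUN=0 on each node performs the same pulls as the transcript above.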

Clone

## Clone the helm/charts project
#
[root@ec-k8s-m1 helm]# git clone https://github.com/helm/charts.git ./charts

Edit

## /helm/charts/stable/spark
## Add an external IP for the spark-master WebUI
#
[root@ec-k8s-m1 spark]# vim templates/spark-master-deployment.yaml

spec:
  externalIPs:
    # External access address (set on the WebUI Service)
    - {{ .Values.WebUi.ExternalIP }}

## Add an external IP for zeppelin
#
[root@ec-k8s-m1 spark]# vim templates/spark-zeppelin-deployment.yaml

spec:
  externalIPs:
    # External access address (set on the Zeppelin Service)
    - {{ .Values.Zeppelin.ExternalIP }}
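
The field name is easy to mistype (the Service field is the plural externalIPs), so a quick check of the edited templates can help. The helper check_external_ips below is a hypothetical name introduced for this sketch; it only greps for the field and does not validate the rest of the template.

```shell
#!/bin/sh
# check_external_ips is a hypothetical helper: it reports whether a template
# file contains the plural Service field name "externalIPs:".
check_external_ips() {
  if grep -q '^[[:space:]]*externalIPs:' "$1"; then
    echo "ok: $1"
  else
    echo "missing externalIPs: $1"
    return 1
  fi
}

# Check whichever template files are passed on the command line.
for f in "$@"; do
  check_external_ips "$f"
done
```

For example, save it under any name and pass templates/spark-master-deployment.yaml and templates/spark-zeppelin-deployment.yaml as arguments. Running helm lint . or adding --dry-run --debug to the install command is another way to inspect the chart before deploying.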

Set Parameters

## Custom parameters
#  Master:
#    Image: "mirrorgooglecontainers/spark-master"
#  WebUi:
#    ExternalIP: "172.16.0.81"
#    ServicePort: 8088
#  Worker:
#    Image: "mirrorgooglecontainers/spark-worker"
#  Zeppelin:
#    ExternalIP: "172.16.0.82"
#    ServicePort: 8080


[root@ec-k8s-m1 spark]# vim values.yaml 

# Default values for spark.
# This is a YAML-formatted file.
# Declare name/value pairs to be passed into your templates.
# name: value

Spark:
  Path: "/opt/spark"

Master:
  Name: master
  # Image: "k8s.gcr.io/spark"
  Image: "mirrorgooglecontainers/spark-master"
  ImageTag: "1.5.1_v3"
  Replicas: 1
  Component: "spark-master"
  Cpu: "100m"
  Memory: "512Mi"
  ServicePort: 7077
  ContainerPort: 7077
  # Set Master JVM memory. Default 1g
  # DaemonMemory: 1g
  ServiceType: LoadBalancer

WebUi:
  Name: webui
  # Added: external access IP
  ExternalIP: "172.16.0.81"
  ServicePort: 8088
  ContainerPort: 8080

Worker:
  Name: worker
  # Image: "k8s.gcr.io/spark"
  Image: "mirrorgooglecontainers/spark-worker"
  ImageTag: "1.5.1_v3"
  Replicas: 3
  Component: "spark-worker"
  Cpu: "100m"
  Memory: "512Mi"
  ContainerPort: 8081
  # Set Worker JVM memory. Default 1g
  # DaemonMemory: 1g
  # Set how much total memory workers have to give executors
  # ExecutorMemory: 1g
  Autoscaling:
    Enabled: false
  ReplicasMax: 10
  CpuTargetPercentage: 50

Zeppelin:
  Name: zeppelin
  Image: "apache/zeppelin"
  ImageTag: "0.7.3"
  Replicas: 1
  Component: "zeppelin"
  Cpu: "100m"
  # Added: external access IP
  ExternalIP: "172.16.0.82"
  ServicePort: 8080
  ContainerPort: 8080
  ServiceType: LoadBalancer
  Ingress:
    Enabled: false
    Path: "/"
    Tls: []
  #    - Hosts:
  #    SecretName: zeppelin
  # Used to create an Ingress record.
  # Hosts:
  # - example.local
  # Annotations:
  #   kubernetes.io/ingress.class: nginx
  #   kubernetes.io/tls-acme: "true"
  # Tls:
  #   Enabled: true
  # Secrets must be manually created in the namespace.
  #   SecretName: example-tls
  #   Hosts:
  #   - example.local
  Persistence:
    Config:
      Enabled: false
      ## etcd data Persistent Volume Storage Class
      ## If defined, storageClassName: <storageClass>
      ## If set to "-", storageClassName: "", which disables dynamic provisioning
      ## If undefined (the default) or set to null, no storageClassName spec is
      ## set, choosing the default provisioner. (gp2 on AWS, standard on
      ## GKE, AWS & OpenStack)
      StorageClass: "-"
      ## Set default PVC size
      Size: 10G
      ## Set default PVC access mode: https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes
      AccessMode: ReadWriteOnce
    Notebook:
      Enabled: false
      StorageClass: "-"
      Size: 10G
      AccessMode: ReadWriteOnce


Installing

##
#
[root@ec-k8s-m1 spark]# helm install . --name ec-spark -f values.yaml
NAME:   ec-spark
LAST DEPLOYED: Thu Oct 25 23:32:24 2018
NAMESPACE: default
STATUS: DEPLOYED

RESOURCES:
==> v1/Service
NAME               AGE
ec-spark-master    2s
ec-spark-webui     2s
ec-spark-zeppelin  2s

==> v1beta1/Deployment
ec-spark-master    2s
ec-spark-worker    2s
ec-spark-zeppelin  2s

==> v1/Pod(related)

NAME                                READY  STATUS             RESTARTS  AGE
ec-spark-master-754fc8c4c-dbrjw     0/1    Pending            0         2s
ec-spark-worker-98d56445c-8jkt9     0/1    Pending            0         2s
ec-spark-worker-98d56445c-p8htd     0/1    Pending            0         2s
ec-spark-worker-98d56445c-zpcbw     0/1    Pending            0         2s
ec-spark-zeppelin-5c7985bc76-7nptp  0/1    ContainerCreating  0         2s


NOTES:
1. Get the Spark URL to visit by running these commands in the same shell:
  
  NOTE: It may take a few minutes for the LoadBalancer IP to be available.
  You can watch the status of by running 'kubectl get svc --namespace default -w ec-spark-webui'
  
  export SPARK_SERVICE_IP=$(kubectl get svc --namespace default ec-spark-webui -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
  echo http://$SPARK_SERVICE_IP:8088

2. Get the Zeppelin URL to visit by running these commands in the same shell:
  
  NOTE: It may take a few minutes for the LoadBalancer IP to be available.
  You can watch the status of by running 'kubectl get svc --namespace default -w ec-spark-zeppelin'

  export ZEPPELIN_SERVICE_IP=$(kubectl get svc --namespace default ec-spark-zeppelin -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
  echo http://$ZEPPELIN_SERVICE_IP:8080


#
[root@ec-k8s-m1 spark]# kubectl get pods -o wide
NAME                                 READY     STATUS    RESTARTS   AGE       IP                NODE        NOMINATED NODE
ec-spark-master-754fc8c4c-dbrjw      1/1       Running   1          1m        192.168.136.154   ec-k8s-n3   <none>
ec-spark-worker-98d56445c-8jkt9      1/1       Running   1          1m        192.168.136.155   ec-k8s-n3   <none>
ec-spark-worker-98d56445c-p8htd      1/1       Running   0          1m        192.168.54.204    ec-nfs01    <none>
ec-spark-worker-98d56445c-zpcbw      1/1       Running   0          1m        192.168.231.69    ec-k8s-n1   <none>
ec-spark-zeppelin-5c7985bc76-7nptp   1/1       Running   0          1m        192.168.136.153   ec-k8s-n3   <none>
mysql-0                              1/1       Running   0          22h       192.168.136.148   ec-k8s-n3   <none>

#
[root@ec-k8s-m1 spark]# kubectl get svc
NAME                TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
ec-spark-master     ClusterIP      10.96.158.169    <none>        7077/TCP         2m
ec-spark-webui      LoadBalancer   10.103.204.239   172.16.0.81   8088:30885/TCP   2m
ec-spark-zeppelin   LoadBalancer   10.96.84.142     172.16.0.82   8080:31254/TCP   2m
kubernetes          ClusterIP      10.96.0.1        <none>        443/TCP          27d

Screenshots (images not included): kubectl get output, Spark WebUI, Zeppelin