[ZEPPELIN-3840] Zeppelin on Kubernetes
### What type of PR is it?
This PR adds the ability to run Zeppelin on Kubernetes. It aims for:
- Zero configuration to start Zeppelin on Kubernetes (and Spark on Kubernetes).
- Running everything on Kubernetes: Zeppelin, interpreters, Spark.
- High customizability, to accommodate various user configurations and extensions.

Key features are:
- Provides a `zeppelin-server.yaml` file for `kubectl` to run the Zeppelin server
- All interpreters automatically run as Pods.
- The Spark interpreter is automatically configured to use [Spark on Kubernetes](https://spark.apache.org/docs/latest/running-on-kubernetes.html)
- A reverse proxy is configured to access the Spark UI

To do
- [x] Document how reverse proxy for Spark UI works and how to configure custom domain.
- [x] Document how to customize zeppelin-server and interpreter yaml.
- [x] Document new configurations
- [x] Document how to mount volume for notebook and configurations

### How it works

#### Run Zeppelin Server on Kubernetes
`k8s/zeppelin-server.yaml` is provided to run Zeppelin Server with a few sidecars and configurations. The file is easy to publish (users can fetch it with `curl`) and highly customizable, while still including everything necessary.

#### K8s Interpreter launcher
This PR adds a new module, `launcher-k8s-standard`, under the `zeppelin/zeppelin-plugins/launcher/k8s-standard/` directory. This launcher is [automatically selected](https://github.com/apache/zeppelin/pull/3240/files#diff-82fddd2ffb77aaffc4b9cf7b5b1eaa79) when Zeppelin is running on Kubernetes. The launcher handles both the Spark interpreter and all other interpreters. It launches each interpreter as a Pod using the template [k8s/interpreter/100-interpreter-pod.yaml](https://github.com/apache/zeppelin/pull/3240/files#diff-d9ce62e2c992d32f0184d7edb862f3c4). The filename has the `100-` prefix because the launcher consumes all files in the directory in alphabetical order on interpreter start/stop.
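
To illustrate the ordering rule, here is a minimal shell sketch. It assumes "alphabetical order" means plain byte-wise sorting of filenames, which this description does not spell out. The temporary directory and the `050-` and `200-` file names are hypothetical; only `100-interpreter-pod.yaml` comes from this PR.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical spec directory standing in for k8s/interpreter/.
spec_dir="$(mktemp -d)"
trap 'rm -rf "$spec_dir"' EXIT

# Only 100-interpreter-pod.yaml is named in the PR; the other two are made up.
touch "$spec_dir/100-interpreter-pod.yaml" \
      "$spec_dir/050-extra-configmap.yaml" \
      "$spec_dir/200-extra-service.yaml"

# Print the spec file names of a directory in byte-wise alphabetical order
# (assumption: this mirrors the order in which the launcher consumes them).
ordered_specs() {
  (cd "$1" && printf '%s\n' *.yaml | LC_ALL=C sort)
}

ordered_specs "$spec_dir"
```

Under this assumption the `050-` file is listed before the interpreter Pod template and the `200-` file after it, so a numeric prefix is enough to place a custom spec on either side of the template.
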
Users can drop more files here to extend or customize the interpreter, and the filename can be used to control the order. The template is rendered by [jinjava](https://github.com/HubSpot/jinjava).

#### Spark interpreter
When the interpreter group is `spark`, K8sRemoteInterpreterProcess automatically [sets the necessary Spark configuration](https://github.com/apache/zeppelin/pull/3240/files#diff-6d1d3084f55bdd519e39ede4a619e73dR297) to use [Spark on Kubernetes](https://spark.apache.org/docs/latest/running-on-kubernetes.html). The user doesn't have to configure anything. It uses client mode.

#### Spark UI
We could make users manually configure port-forwarding or take some other step to access the Spark UI, but that's not optimal. Ideally, the Spark UI is automatically accessible whenever the user has access to the Zeppelin UI, without any extra configuration. To enable this, the Zeppelin server Pod has a reverse proxy as a sidecar, which splits traffic between the Zeppelin server and the Spark UI running in the other Pod. It assumes both `service.domain.com` and `*.service.domain.com` point to the nginx proxy address. `service.domain.com` is directed to ZeppelinServer, and `*.service.domain.com` is directed to the interpreter Pod.

`<port>-<interpreter pod svc name>.service.domain.com` is the convention for accessing any application running in the interpreter Pod. If the Spark interpreter Pod is running with the name `spark-axefeg` and the Spark UI is running on port 4040,
```
4040-spark-axefeg.service.domain.com
```
is the address to access the Spark UI.

The default service domain is [local.zeppelin-project.org:8080](https://github.com/apache/zeppelin/pull/3240/files#diff-56ccb2e2c2617b27dbaae866d9431e51R22). Both `local.zeppelin-project.org` and `*.local.zeppelin-project.org` point to `127.0.0.1`, so it works with `kubectl port-forward`.

### What is the Jira issue?
https://issues.apache.org/jira/browse/ZEPPELIN-3840

### How should this be tested?
Prepare a Kubernetes cluster with enough resources (cpus > 5, mem > 6g).
If you're using [minikube](https://github.com/kubernetes/minikube), check your capacity using the `kubectl describe node` command before starting. You'll need to build the Zeppelin docker image and the Spark docker image to test. Please follow the guide in docs/quickstart/kubernetes.md.

To quickly try it without building docker images, I have uploaded pre-built images to Docker Hub: `moon/zeppelin:0.9.0-SNAPSHOT` and `moon/spark:2.4.0`. Try the following commands
```
ZEPPELIN_SERVER_YAML="curl -s https://raw.githubusercontent.com/Leemoonsoo/zeppelin/kubernetes/k8s/zeppelin-server.yaml"
$ZEPPELIN_SERVER_YAML | sed 's/apache\/zeppelin:0.9.0-SNAPSHOT/moon\/zeppelin:0.9.0-SNAPSHOT/' | sed 's/spark:2.4.0/moon\/spark:2.4.0/' | kubectl apply -f -
```
And port forward
```
kubectl port-forward zeppelin-server 8080:80
```
And browse http://localhost:8080

To clean up
```
$ZEPPELIN_SERVER_YAML | sed 's/apache\/zeppelin:0.9.0-SNAPSHOT/moon\/zeppelin:0.9.0-SNAPSHOT/' | sed 's/spark:2.4.0/moon\/spark:2.4.0/' | kubectl delete -f -
```

### Screenshots (if appropriate)
See this video https://youtu.be/7E4ZGn4pnTo

### Future work
- Per-interpreter docker image
- Blocking communication between interpreter Pods.
- The Spark interpreter Pod has a Role with CRUD on any pod/service in the same namespace, which should be restricted to Spark executor Pods only.
- Per-note interpreter mode by default when Zeppelin is running on Kubernetes

### Questions:
* Do the license files need an update? no
* Are there breaking changes for older versions? no
* Does this need documentation?
yes

Author: Lee moon soo <[email protected]>
Author: Lee moon soo <[email protected]>

Closes apache#3240 from Leemoonsoo/kubernetes and squashes the following commits:

0100a36 [Lee moon soo] update how it works on docs, add some comments on yaml files
423412a [Lee moon soo] zeppelin.k8s.mode -> zeppelin.run.mode
4e7d817 [Lee moon soo] localtest.me -> local.zeppelin-project.org
993a0e4 [Lee moon soo] document configurations
9ab6fc4 [Lee moon soo] address code review
22e090f [Lee moon soo] logger -> LOGGER
11960dd [Lee moon soo] update corresponding test as well
3b652a4 [Lee moon soo] Make spark executor set ownerreference correctly
1a3a070 [Lee moon soo] Set ownerreference to Role and Rolebinding of interpreter
e2dc88a [Lee moon soo] suppress error log when wait target is already removed
fa36c18 [Lee moon soo] Make spark master configurable
b4f58a9 [Lee moon soo] sig term for quick termination
64a56b5 [Lee moon soo] Add docs
e9ce64f [Lee moon soo] update dockerfile
ec09b8b [Lee moon soo] add test
3078bac [Lee moon soo] spark ui support
9341fcb [Lee moon soo] install kubectl and configure log4j in docker image
0f7c0d4 [Lee moon soo] add license
f305611 [Lee moon soo] rename file
2b579ff [Lee moon soo] let user override namespace
f4166ad [Lee moon soo] make spark container image configurable
0d472ea [Lee moon soo] load properties and environment variables
b0e2c36 [Lee moon soo] Rbac role, rolebinding
2960dcb [Lee moon soo] configure namespace
a4072e6 [Lee moon soo] add signal handler
7a87367 [Lee moon soo] configure spark on kubernetes
263d859 [Lee moon soo] use headless service for interpreter pod
7fe9823 [Lee moon soo] interpreter pod cascade delete on zeppelin-server delete
86e8764 [Lee moon soo] add services on RBAC
18b8f68 [Lee moon soo] print spec file contents on debug log
0dea383 [Lee moon soo] create and connect interpreter pod
9f1b7a1 [Lee moon soo] run kubernetes launcher
2fd2ac8 [Lee moon soo] kubernetes mode configuration
58f9f19 [Lee moon soo] add rbac
36cf391 [Lee moon soo] correct plugin name
52bb6c7 [Lee moon soo] add k8s dir in package
5f602a6 [Lee moon soo] K8sRemoteInterpreterProcess
07489f7 [Lee moon soo] kubectl with exec
d2f3d5b [Lee moon soo] add k8s-standard launcher module