
ZEPPELIN-2898. Support Yarn-Cluster for Spark Interpreter #2577

Closed
wants to merge 1 commit into from

Conversation


@zjffdu zjffdu commented Sep 9, 2017

What is this PR for?

This is the first version of yarn-cluster support for SparkInterpreter. I just delegate all the work to spark-submit: since yarn-cluster is natively supported by Spark, we don't need to reinvent the wheel. There is still improvement to be done in the future, e.g. I put some Spark-specific logic in InterpreterSetting, which is not good practice. I plan to improve it when I refactor the Interpreter class (ZEPPELIN-2685).

Besides that, I also added MiniHadoopCluster & MiniZeppelin, which help with integration testing of yarn-client & yarn-cluster mode; otherwise I would have to verify yarn-client & yarn-cluster mode manually, which could easily lead to regressions in the future.

Note:

  • SPARK_HOME must be specified for yarn-cluster mode
  • HADOOP_CONF_DIR must be specified for yarn-cluster mode

What type of PR is it?

[Feature]

Todos

  • - Task

What is the Jira issue?

https://github.com/zjffdu/zeppelin/tree/ZEPPELIN-2898

How should this be tested?

System test is added in SparkInterpreterIT.

Questions:

  • Does the licenses files need update? No
  • Is there breaking changes for older versions? No
  • Does this needs documentation? No


zjffdu commented Sep 9, 2017

@Leemoonsoo @jongyoul Could you help review?

// Only one of py4j-0.9-src.zip and py4j-0.8.2.1-src.zip should exist
// TODO(zjffdu): this is not maintainable when a new version is added.
String[] pythonLibs = new String[]{"pyspark.zip", "py4j-0.9-src.zip", "py4j-0.8.2.1-src.zip",
    "py4j-0.10.1-src.zip", "py4j-0.10.3-src.zip", "py4j-0.10.4-src.zip"};
Member

we really need to fix this....

Member

Yes, I wouldn't have thought it would be used for this long...

Contributor Author

Sure, this is on my plan for the next step.
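The hardcoded list above could be replaced by scanning the directory at runtime for whichever py4j source zip ships with the installed Spark. A minimal sketch of that idea (the `Py4jFinder` class and its method are hypothetical, not code from this PR):

```java
import java.io.File;
import java.io.FilenameFilter;

// Hypothetical sketch: instead of hardcoding every known py4j version,
// scan a lib directory (e.g. SPARK_HOME/python/lib) for py4j-*-src.zip.
public class Py4jFinder {
  // Returns the names of py4j source zips found under libDir,
  // or an empty array if the directory does not exist.
  public static String[] findPy4jZips(File libDir) {
    File[] matches = libDir.listFiles(new FilenameFilter() {
      public boolean accept(File dir, String name) {
        return name.startsWith("py4j-") && name.endsWith("-src.zip");
      }
    });
    if (matches == null) {
      return new String[0];
    }
    String[] names = new String[matches.length];
    for (int i = 0; i < matches.length; i++) {
      names[i] = matches[i].getName();
    }
    return names;
  }
}
```

This would make new Spark/py4j releases work without touching Zeppelin's code, at the cost of needing a policy for what to do when zero or multiple zips are found (the question debated below).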

@zjffdu zjffdu force-pushed the ZEPPELIN-2898 branch 21 times, most recently from b116cd7 to 3f34f8c Compare September 13, 2017 03:06

zjffdu commented Sep 13, 2017

@Leemoonsoo @jongyoul As I mentioned in the PR description, this is not a perfect PR. If you don't have any more comments, I will merge it and continue with the next PR to improve it. Thanks

@felixcheung felixcheung left a comment (Member)

can you document more of MiniHadoopCluster, MiniZeppelin?

@@ -0,0 +1,23 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
Member

why do we need to release with a new .properties file?

Contributor Author

Because the default log4j.properties uses DailyRollingFileAppender, which is not suitable for a YARN container.
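For illustration, a YARN-friendly log4j.properties along these lines might log to the console so YARN's own log aggregation captures interpreter output. This snippet is a sketch of the idea, not the actual file added in the PR:

```properties
# Illustrative only: log to stdout so YARN container log aggregation
# collects the output, instead of DailyRollingFileAppender writing to disk.
log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{ISO8601} %-5p [%t] %c{2}: %m%n
```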

additionalPythonPath = additionalPythonPath + ":" + py4jLibPath;
} else {
additionalPythonPath = py4jLibPath;
if (addBuiltinPy4j) {
Member

nit addBuiltinPy4j is a bit confusing? useBuiltinPy4j? addBuiltinPy4jPath?

Map<String, String> envs = EnvironmentUtils.getProcEnvironment();
if (envs.containsKey("PYTHONPATH")) {
if (envs.containsKey("PYTHONPATH") && additionalPythonPath != null) {
Member

|| instead? if PYTHONPATH is not set but additionalPythonPath is, don't we want to set it?
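The semantics the reviewer suggests — still honoring additionalPythonPath when PYTHONPATH is unset — could look roughly like this (`PythonPathMerger` is a hypothetical helper for illustration, not code from the PR):

```java
import java.util.Map;

// Hypothetical sketch of the reviewer's suggestion: build the final
// PYTHONPATH from whichever of the two sources is present, so that
// additionalPythonPath still takes effect when PYTHONPATH is unset.
public class PythonPathMerger {
  public static String mergePythonPath(Map<String, String> envs, String additionalPythonPath) {
    String existing = envs.get("PYTHONPATH");
    if (existing != null && additionalPythonPath != null) {
      return existing + ":" + additionalPythonPath;
    } else if (existing != null) {
      return existing;
    } else {
      return additionalPythonPath;  // may be null if neither source is set
    }
  }
}
```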

if (py4j.length == 0) {
throw new RuntimeException("No py4j files found under " + sparkHome + "/python/lib");
} else if (py4j.length > 1) {
throw new RuntimeException("Multiple py4j files found under " + sparkHome + "/python/lib");
Member

One wonders if this might be a bit too restrictive... I for one have multiple py4j versions, and Spark always picks the right one.

Contributor Author

Because Spark hardcodes each py4j version, but Zeppelin needs to work with multiple versions of Spark.

Member

sure, but it might break people with existing setup...

Contributor Author

Why? If there are multiple versions of py4j under SPARK_HOME/python/lib or ZEPPELIN_HOME/interpreter/spark/pyspark, something must be wrong with the user's environment.

Member

Maybe, maybe it is an edge case.
Do we need to force the logic here? Is it possible to call, say, sbin/spark-config.sh?

@zjffdu zjffdu (Contributor Author) commented Sep 15, 2017

I think sbin/spark-config.sh only affects the pyspark shell. Spark also hardcodes the py4j version in Scala code when it needs to detect py4j and distribute it to YARN containers. See https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala#L35

return env;
}

private void setupPropertiesForPySpark(Properties sparkProperties, String sparkMaster) {
if (sparkMaster.startsWith("yarn")) {
Member

use isYarnMode()?

}
if (key.equals("master")) {
sparkMaster = property.getProperty("master");
Member

should this do the same as the other code (or perhaps re-use as a method) - fall back to get spark.master?
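The fallback the reviewer suggests could be factored into a shared helper, sketched here with a hypothetical class name:

```java
import java.util.Properties;

// Hypothetical helper for the reviewer's suggestion: resolve the master
// setting from "master" first, falling back to "spark.master", so every
// place that reads the master shares one code path.
public class SparkMasterResolver {
  public static String resolveSparkMaster(Properties props) {
    String master = props.getProperty("master");
    if (master == null) {
      master = props.getProperty("spark.master");
    }
    return master;
  }
}
```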


setupPropertiesForPySpark(sparkProperties, sparkMaster);
setupPropertiesForSparkR(sparkProperties, sparkMaster, property.getProperty("SPARK_HOME"));
if (sparkMaster != null && sparkMaster.equals("yarn-cluster")) {
Member

shouldn't this match isYarnMode()?

if (sparkMaster != null) {
sparkConfBuilder.append(" --master " + sparkMaster);
}
if (sparkMaster.equals("yarn-cluster")) {
Member

ditto

@zjffdu zjffdu (Contributor Author) commented Sep 15, 2017

No, this only works for yarn-cluster mode, not yarn-client mode. yarn-client mode still uses the default log4j.properties, because the driver runs on the Zeppelin host.

Member

Well, my point is that we are not handling yarn-cluster mode consistently.
yarn-cluster mode can be set by master=yarn, deployMode=cluster, or by
master=yarn-cluster (deprecated).
We have both in this PR... I think we should either support both, or only the supported, non-deprecated way.

Contributor Author

Right, I was planning to fix it in the next PR. This is messy. Let me fix it in this PR.
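A single predicate covering both spellings the reviewer mentions might look like this sketch (`YarnModeChecker` is hypothetical; `spark.submit.deployMode` would supply the deployMode argument):

```java
// Hypothetical sketch of one check for both spellings of yarn-cluster mode:
// the deprecated master=yarn-cluster, and the current master=yarn combined
// with deployMode=cluster.
public class YarnModeChecker {
  public static boolean isYarnClusterMode(String master, String deployMode) {
    if (master == null) {
      return false;
    }
    return master.equals("yarn-cluster")
        || (master.equals("yarn") && "cluster".equals(deployMode));
  }
}
```

Using one predicate everywhere would keep the scattered `sparkMaster.equals("yarn-cluster")` checks in this PR consistent.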

if (sparkRPath.exists() && sparkRPath.isFile()) {
mergeSparkProperty(sparkProperties, "spark.yarn.dist.archives", sparkRPath.getAbsolutePath());
} else {
LOGGER.warn("sparkr.zip is not found, sparkr may not work.");
Member

sparkr may not work => SparkR may not work

@zjffdu zjffdu force-pushed the ZEPPELIN-2898 branch 2 times, most recently from 42beef2 to ed04545 Compare September 15, 2017 05:40

zjffdu commented Sep 18, 2017

@Leemoonsoo @jongyoul @felixcheung Any more comments?

@jongyoul (Member)

I don't


zjffdu commented Sep 18, 2017

Thanks, I will merge it and continue with the follow-up PR.

@asfgit asfgit closed this in 5d71510 Sep 18, 2017
@felixcheung (Member)

hey @zjffdu how are we tracking the follow-up tasks, if there are any?
for example, it'd be good to have documentation on how to run yarn-cluster mode in spark.md


zjffdu commented Sep 19, 2017

@felixcheung I will do the follow up in https://issues.apache.org/jira/browse/ZEPPELIN-2685

asfgit pushed a commit that referenced this pull request Oct 14, 2017
### What is this PR for?
Follow up of #2577. Main changes on Interpreter
* Declare `InterpreterException`, a checked exception, on the abstract methods of `Interpreter`; this enforces that interpreter implementations throw `InterpreterException`.
* field name refactoring.

     * `property` -> `properties`
     * `getProperty()` -> `getProperties()`
* Introduce a launcher layer for interpreter launching. Currently we only use a shell script to launch interpreters, but it could be any other service or component, such as a Livy server or other 3rd-party tools; we may even create a separate module for the interpreter launcher.

     * abstract class `InterpreterLauncher`
     * For now, only 2 implementations: `ShellScriptLauncher` & `SparkInterpreterLauncher`. We could add a method in class `Interpreter` to allow an interpreter to specify its own launcher class, but that could be future work.

### What type of PR is it?
[Improvement | Refactoring]

### Todos
* [ ] - Task

### What is the Jira issue?
* https://issues.apache.org/jira/browse/ZEPPELIN-2685

### How should this be tested?
Unit tests are included: `ShellScriptLauncherTest` & `SparkInterpreterLauncherTest`

### Screenshots (if appropriate)

### Questions:
* Does the licenses files need update? No
* Is there breaking changes for older versions? No
* Does this needs documentation? No

Author: Jeff Zhang <[email protected]>

Closes #2592 from zjffdu/ZEPPELIN-2685 and squashes the following commits:

17dc2f1 [Jeff Zhang] address comments
e545cc3 [Jeff Zhang] ZEPPELIN-2685. Improvement on Interpreter class