
ZEPPELIN-2898. Support Yarn-Cluster for Spark Interpreter #2577

Closed
wants to merge 1 commit into from

Conversation


@zjffdu zjffdu commented Sep 9, 2017

What is this PR for?

This is the first version of yarn-cluster support for SparkInterpreter. I just delegate all the work to spark-submit: since yarn-cluster is natively supported by Spark, we don't need to reinvent the wheel. There is still improvement to be done in the future, e.g. I put some Spark-specific logic in InterpreterSetting, which is not good practice. I plan to improve it when I refactor the Interpreter class (ZEPPELIN-2685).

Besides that, I also added MiniHadoopCluster & MiniZeppelin, which help with integration testing of yarn-client & yarn-cluster mode; otherwise I would have to verify yarn-client & yarn-cluster mode manually, which could easily lead to regressions in the future.

Note:

  • SPARK_HOME must be specified for yarn-cluster mode
  • HADOOP_CONF_DIR must be specified for yarn-cluster mode

What type of PR is it?

[Feature]

Todos

  • - Task

What is the Jira issue?

https://github.com/zjffdu/zeppelin/tree/ZEPPELIN-2898

How should this be tested?

System test is added in SparkInterpreterIT.

Questions:

  • Does the licenses files need update? No
  • Is there breaking changes for older versions? No
  • Does this needs documentation? No


zjffdu commented Sep 9, 2017

@Leemoonsoo @jongyoul Could you help review?

// Only one of py4j-0.9-src.zip and py4j-0.8.2.1-src.zip should exist
// TODO(zjffdu): this is not maintainable when a new version is added.
String[] pythonLibs = new String[]{"pyspark.zip", "py4j-0.9-src.zip", "py4j-0.8.2.1-src.zip",
    "py4j-0.10.1-src.zip", "py4j-0.10.3-src.zip", "py4j-0.10.4-src.zip"};
Member

we really need to fix this....

Member

Yes, I wouldn't have thought it would be used for this long...

Contributor Author

Sure, this is on my plan for the next step.
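The hardcoded list above could be replaced by scanning the directory at runtime for whichever py4j source zip ships with the installed Spark. A minimal sketch of that idea (the `Py4jFinder` class and its method are hypothetical, not code from this PR):

```java
import java.io.File;
import java.io.FilenameFilter;

// Hypothetical sketch: instead of hardcoding every known py4j version,
// scan a lib directory (e.g. SPARK_HOME/python/lib) for py4j-*-src.zip.
public class Py4jFinder {
  // Returns the names of py4j source zips found under libDir,
  // or an empty array if the directory does not exist.
  public static String[] findPy4jZips(File libDir) {
    File[] matches = libDir.listFiles(new FilenameFilter() {
      public boolean accept(File dir, String name) {
        return name.startsWith("py4j-") && name.endsWith("-src.zip");
      }
    });
    if (matches == null) {
      return new String[0];
    }
    String[] names = new String[matches.length];
    for (int i = 0; i < matches.length; i++) {
      names[i] = matches[i].getName();
    }
    return names;
  }
}
```

This would make new Spark/py4j releases work without touching Zeppelin's code, at the cost of needing a policy for what to do when zero or multiple zips are found (the question debated below).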

@zjffdu zjffdu force-pushed the ZEPPELIN-2898 branch 21 times, most recently from b116cd7 to 3f34f8c Compare September 13, 2017 03:06

zjffdu commented Sep 13, 2017

@Leemoonsoo @jongyoul As I mentioned in the PR description, this is not a perfect PR. If you don't have any more comments, I will merge it and continue with the next PR to improve it. Thanks

@felixcheung felixcheung left a comment (Member)

can you document more of MiniHadoopCluster, MiniZeppelin?

@@ -0,0 +1,23 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
Member

why do we need to release with a new .properties file?

Contributor Author

Because the default log4j.properties uses DailyRollingFileAppender, which is not suitable for a YARN container.
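For illustration, a YARN-friendly log4j.properties along these lines might log to the console so YARN's own log aggregation captures interpreter output. This snippet is a sketch of the idea, not the actual file added in the PR:

```properties
# Illustrative only: log to stdout so YARN container log aggregation
# collects the output, instead of DailyRollingFileAppender writing to disk.
log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{ISO8601} %-5p [%t] %c{2}: %m%n
```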

additionalPythonPath = additionalPythonPath + ":" + py4jLibPath;
} else {
additionalPythonPath = py4jLibPath;
if (addBuiltinPy4j) {
Member

nit addBuiltinPy4j is a bit confusing? useBuiltinPy4j? addBuiltinPy4jPath?

Map<String, String> envs = EnvironmentUtils.getProcEnvironment();
if (envs.containsKey("PYTHONPATH")) {
if (envs.containsKey("PYTHONPATH") && additionalPythonPath != null) {
Member

|| instead? if PYTHONPATH is not set but additionalPythonPath is, don't we want to set it?
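The semantics the reviewer suggests — still honoring additionalPythonPath when PYTHONPATH is unset — could look roughly like this (`PythonPathMerger` is a hypothetical helper for illustration, not code from the PR):

```java
import java.util.Map;

// Hypothetical sketch of the reviewer's suggestion: build the final
// PYTHONPATH from whichever of the two sources is present, so that
// additionalPythonPath still takes effect when PYTHONPATH is unset.
public class PythonPathMerger {
  public static String mergePythonPath(Map<String, String> envs, String additionalPythonPath) {
    String existing = envs.get("PYTHONPATH");
    if (existing != null && additionalPythonPath != null) {
      return existing + ":" + additionalPythonPath;
    } else if (existing != null) {
      return existing;
    } else {
      return additionalPythonPath;  // may be null if neither source is set
    }
  }
}
```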

if (py4j.length == 0) {
throw new RuntimeException("No py4j files found under " + sparkHome + "/python/lib");
} else if (py4j.length > 1) {
throw new RuntimeException("Multiple py4j files found under " + sparkHome + "/python/lib");
Member

One wonders if this might be a bit too restrictive... I for one have multiple py4j versions, and Spark always picks the right one.

Contributor Author

Because Spark hardcodes each py4j version, but Zeppelin needs to work with multiple versions of Spark.

Member

sure, but it might break people with existing setup...

Contributor Author

Why? If there are multiple versions of py4j under SPARK_HOME/python/lib or ZEPPELIN_HOME/interpreter/spark/pyspark, something must be wrong with the user's environment.

Member

Maybe, maybe it is an edge case.
Do we need to force the logic here? Is it possible to call, say, sbin/spark-config.sh?

@zjffdu zjffdu (Contributor Author) commented Sep 15, 2017

I think sbin/spark-config.sh only affects the pyspark shell. Spark also hardcodes the py4j version in Scala code when it needs to detect py4j and distribute it to YARN containers. See https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala#L35

return env;
}

private void setupPropertiesForPySpark(Properties sparkProperties, String sparkMaster) {
if (sparkMaster.startsWith("yarn")) {
Member

use isYarnMode()?

}
if (key.equals("master")) {
sparkMaster = property.getProperty("master");
Member

should this do the same as the other code (or perhaps re-use as a method) - fall back to get spark.master?
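The fallback the reviewer suggests could be factored into a shared helper, sketched here with a hypothetical class name:

```java
import java.util.Properties;

// Hypothetical helper for the reviewer's suggestion: resolve the master
// setting from "master" first, falling back to "spark.master", so every
// place that reads the master shares one code path.
public class SparkMasterResolver {
  public static String resolveSparkMaster(Properties props) {
    String master = props.getProperty("master");
    if (master == null) {
      master = props.getProperty("spark.master");
    }
    return master;
  }
}
```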


setupPropertiesForPySpark(sparkProperties, sparkMaster);
setupPropertiesForSparkR(sparkProperties, sparkMaster, property.getProperty("SPARK_HOME"));
if (sparkMaster != null && sparkMaster.equals("yarn-cluster")) {
Member

shouldn't this match isYarnMode()?

if (sparkMaster != null) {
sparkConfBuilder.append(" --master " + sparkMaster);
}
if (sparkMaster.equals("yarn-cluster")) {
Member

ditto

@zjffdu zjffdu (Contributor Author) commented Sep 15, 2017

No, this only works for yarn-cluster mode, not yarn-client mode. yarn-client mode still uses the default log4j.properties, because the driver runs on the Zeppelin host.

Member

Well, my point is that we are not handling yarn-cluster mode consistently.
yarn-cluster mode can be set by master=yarn, deployMode=cluster, or by
master=yarn-cluster (deprecated).
We have both in this PR... I think we should either support both, or only the supported, non-deprecated way.

Contributor Author

Right, I was planning to fix it in the next PR. This is messy. Let me fix it in this PR.
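A single predicate covering both spellings the reviewer mentions might look like this sketch (`YarnModeChecker` is hypothetical; `spark.submit.deployMode` would supply the deployMode argument):

```java
// Hypothetical sketch of one check for both spellings of yarn-cluster mode:
// the deprecated master=yarn-cluster, and the current master=yarn combined
// with deployMode=cluster.
public class YarnModeChecker {
  public static boolean isYarnClusterMode(String master, String deployMode) {
    if (master == null) {
      return false;
    }
    return master.equals("yarn-cluster")
        || (master.equals("yarn") && "cluster".equals(deployMode));
  }
}
```

Using one predicate everywhere would keep the scattered `sparkMaster.equals("yarn-cluster")` checks in this PR consistent.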

if (sparkRPath.exists() && sparkRPath.isFile()) {
mergeSparkProperty(sparkProperties, "spark.yarn.dist.archives", sparkRPath.getAbsolutePath());
} else {
LOGGER.warn("sparkr.zip is not found, sparkr may not work.");
Member

sparkr may not work => SparkR may not work

@zjffdu zjffdu force-pushed the ZEPPELIN-2898 branch 2 times, most recently from 42beef2 to ed04545 Compare September 15, 2017 05:40

zjffdu commented Sep 18, 2017

@Leemoonsoo @jongyoul @felixcheung Any more comments?

@jongyoul (Member)

I don't


zjffdu commented Sep 18, 2017

Thanks, I will merge it and continue with the follow-up PR.

@asfgit asfgit closed this in 5d71510 Sep 18, 2017
@felixcheung (Member)

hey @zjffdu how are we tracking the follow-up tasks, if there are any?
for example, it'd be good to have documentation on how to run yarn-cluster mode in spark.md


zjffdu commented Sep 19, 2017

@felixcheung I will do the follow up in https://issues.apache.org/jira/browse/ZEPPELIN-2685

asfgit pushed a commit that referenced this pull request Oct 14, 2017
### What is this PR for?
Follow up of #2577. Main changes on Interpreter
* Declare `InterpreterException`, a checked exception, on the abstract methods of `Interpreter`; this enforces that interpreter implementations throw `InterpreterException`.
* field name refactoring.

     * `property` -> `properties`
     * `getProperty()` -> `getProperties()`
* Introduce a launcher layer for interpreter launching. Currently we only use a shell script to launch interpreters, but it could be any other service or component, such as a Livy server or other 3rd-party tools; we may even create a separate module for the interpreter launcher.

     * abstract class `InterpreterLauncher`
     * For now, only 2 implementations: `ShellScriptLauncher` & `SparkInterpreterLauncher`. We could add a method in class `Interpreter` to allow an interpreter to specify its own launcher class, but that could be future work.

### What type of PR is it?
[Improvement | Refactoring]

### Todos
* [ ] - Task

### What is the Jira issue?
* https://issues.apache.org/jira/browse/ZEPPELIN-2685

### How should this be tested?
Unit tests are included: `ShellScriptLauncherTest` & `SparkInterpreterLauncherTest`

### Screenshots (if appropriate)

### Questions:
* Does the licenses files need update? No
* Is there breaking changes for older versions? No
* Does this needs documentation? No

Author: Jeff Zhang <[email protected]>

Closes #2592 from zjffdu/ZEPPELIN-2685 and squashes the following commits:

17dc2f1 [Jeff Zhang] address comments
e545cc3 [Jeff Zhang] ZEPPELIN-2685. Improvement on Interpreter class