Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(Minor) Fix in SparkParallelTracker to work with Spark2.3 #3062

Merged
merged 1 commit into from
Jan 25, 2018

Conversation

tomasatdatabricks
Copy link
Contributor

This PR changes how xgboost registers and removes Listeners to SparkContext so that it works on Spark2.3 while it continues to work on versions of Spark >= 2.0.

Spark 2.3 includes changes in LiveListenerBus which are incompatible with xgboost's SparkParallelTracker - the SparkContext::listenerBus member variable changed type and it is no longer possible to register or deregister listeners by directly accessing list of listeners in it. I've changed it so that listener is registered via public method on SparkContext. The public method for removing listener has been added only in Spark2.3 and in order to run on older versions of Spark I still call the method on the listenerBus instance.

This change means xgboost will not compile against/ run on spark version < 2.0.

@superbobry
Copy link
Contributor

I'd say it's OK not to support 1.6.X. What do you think @CodingCat?

@CodingCat
Copy link
Member

we do not support spark 1.x in anyway since we have integrated with Spark 2.x ML pipeline

@codecov-io
Copy link

codecov-io commented Jan 25, 2018

Codecov Report

Merging #3062 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master    #3062      +/-   ##
============================================
+ Coverage     43.72%   43.72%   +<.01%     
  Complexity      228      228              
============================================
  Files           159      159              
  Lines         12351    12352       +1     
  Branches        466      466              
============================================
+ Hits           5400     5401       +1     
  Misses         6759     6759              
  Partials        192      192
Impacted Files Coverage Δ Complexity Δ
...ala/org/apache/spark/SparkParallelismTracker.scala 89.18% <100%> (+0.3%) 9 <1> (ø) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f87802f...cacec07. Read the comment docs.

@superbobry
Copy link
Contributor

We'll need to detail this in the docs, I guess.

Thanks @tomasatdatabricks!

@superbobry superbobry merged commit 5ef6846 into dmlc:master Jan 25, 2018
hcho3 pushed a commit that referenced this pull request Aug 13, 2018
This pull request amends the broken #3062 allow Spark 2.2 to work.

Please note this won't work in Spark <=2.1 as sc.removeSparkListener was implemented in Spark 2.2. (So perhaps a more general method is better, although that is what was attempted in #3062)

This PR fixes: #3208, #3151 and the discussion in #1927.

I do find it strange that #3062 dose not work in Spark 2.2, it's probably due to some sort of public/private issue in the org.apache.spark.scheduler.LiveListenerBus class inheritance (In Spark itself). The error is: `java.lang.NoSuchMethodError: org.apache.spark.scheduler.LiveListenerBus.removeListener(Ljava/lang/Object;)V`
@lock lock bot locked as resolved and limited conversation to collaborators Jan 18, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants