-
Notifications
You must be signed in to change notification settings - Fork 28.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-20677][MLLIB][ML] Follow-up to ALS recommend-all performance PRs #17919
Conversation
Test build #76665 has finished for PR 17919 at commit
|
cc @mpjlu @jkbradley |
Thanks, I am ok for this change. |
taking a look |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just 1 question/comment
Thanks for doing this!
@@ -451,6 +439,8 @@ class ALSModel private[ml] ( | |||
@Since("1.6.0") | |||
object ALSModel extends MLReadable[ALSModel] { | |||
|
|||
@transient private[recommendation] val _f2jBLAS = new F2jBLAS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this require significant initialization? You could use org.apache.spark.ml.linalg.BLAS.f2jBLAS
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No more or less than using ml.linalg.BLAS
- I did think of that but the var
needs to be exposed as private[ml]
. If we're ok with that then it'll be slightly cleaner to use that, yes.
Just decided to use |
Test build #76792 has finished for PR 17919 at commit
|
Jenkins retest this please |
Test build #76801 has finished for PR 17919 at commit
|
Merged to master/branch-2.2 |
Small clean ups from #17742 and #17845. ## How was this patch tested? Existing unit tests. Author: Nick Pentreath <[email protected]> Closes #17919 from MLnick/SPARK-20677-als-perf-followup. (cherry picked from commit 25b4f41) Signed-off-by: Nick Pentreath <[email protected]>
Hi, @MLnick, We find that just do repartition for userFeatures and productFeatures can improve the efficiency significantly on the ALS recommendForAll(). Here is our procedure:
Firstly, when you submit spark mission with "spark.default.parallelism=x" the stage for recommendForAll will be splited the number of x^2 tasks, due to the partition number of userFeatures is equal to x and productFeatures number is equal to x. This is not reasonable. Too much network I/O operation to finish the stage. Secondly, submitting spark mission with "spark.dynamicAllocation.enabled=true" may cause data uneven distribution on executors. We found that some executors may take n GB data(who start early), but others may just take m MB data(who start later). This may cause a few executors execute tasks slowly with high GC or crash by OOM. We did some test to repartition on the userFeatures and productFeatures. Here is it. case 1: case 2: Note that the partition number of userFeatures and productFeatures may be different. Above test based on the fix #17742 and #17845. We strongly suggest that provide interface to user to have a chance to do re-partition for 2 kinds of features. Thanks Here is the patch for mllib, with 2 new public function of MatrixFactorizationModel diff --git a/mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala b/mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala
require(rank > 0)
|
Hi @auskalia , you are right. repartition can improve the performance of recommendForAll. |
Hi @mpjlu , your are right. But I consider that sometimes we have to use several spark mission to finish our work, especially the resource is insufficient in hadoop cluster. Due to save and reload file in different mission is a common method for engineering application. So I recommend to export an interface to try do repartition features for client. |
Small clean ups from apache#17742 and apache#17845. ## How was this patch tested? Existing unit tests. Author: Nick Pentreath <[email protected]> Closes apache#17919 from MLnick/SPARK-20677-als-perf-followup.
Small clean ups from apache#17742 and apache#17845. ## How was this patch tested? Existing unit tests. Author: Nick Pentreath <[email protected]> Closes apache#17919 from MLnick/SPARK-20677-als-perf-followup.
Small clean ups from #17742 and #17845.
How was this patch tested?
Existing unit tests.