
[SPARK-48773] Document config "spark.default.parallelism" by config builder framework #47171

Closed

Conversation

amaliujia (Contributor)

What changes were proposed in this pull request?

Document the config "spark.default.parallelism". This config is used by Spark but is not documented through the config builder framework, even though it already appears on the Spark website: https://spark.apache.org/docs/latest/configuration.html.
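For reference, a minimal example of how an end user sets this config (standard Spark API; the app name and value here are arbitrary):

// Setting spark.default.parallelism programmatically via SparkConf.
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("example")
  .set("spark.default.parallelism", "200")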

Why are the changes needed?

This documents an existing Spark config in the config builder framework.

Does this PR introduce any user-facing change?

NO.

How was this patch tested?

N/A

Was this patch authored or co-authored using generative AI tooling?

N/A

@github-actions github-actions bot added the CORE label Jul 2, 2024
@amaliujia amaliujia changed the title [SPARK-48773] Document config "spark.default.parallelism" [SPARK-48773] Document config "spark.default.parallelism" by config builder framework Jul 2, 2024
"For distributed shuffle operations like reduceByKey and join, the largest number of " +
"partitions in a parent RDD. For operations like parallelize with no parent RDDs, " +
"it depends on the cluster manager: Local mode: number of cores on the local machine " +
"Mesos fine grained mode: 8 Others: total number of cores on all executor nodes or 2, " +
Member:

Mesos has been removed.

Contributor Author:

Updated.

@@ -42,6 +42,17 @@ package object config {
private[spark] val SPARK_TASK_PREFIX = "spark.task"
private[spark] val LISTENER_BUS_EVENT_QUEUE_PREFIX = "spark.scheduler.listenerbus.eventqueue"

private[spark] val DEFAULT_PARALLELISM =
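For context, the truncated definition above plausibly continues along these lines. This is a sketch assembled from the doc wording discussed in this thread, not the verbatim merged code:

// Hypothetical completion of the DEFAULT_PARALLELISM entry using Spark's
// internal ConfigBuilder API; details (doc text, version) are assumptions.
private[spark] val DEFAULT_PARALLELISM =
  ConfigBuilder("spark.default.parallelism")
    .doc("Default number of partitions in RDDs returned by transformations " +
      "like join, reduceByKey, and parallelize when not set by user.")
    .version("0.5.0")
    .intConf
    .createOptional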
Member:

Can we replace all hardcoded ones with this?

Contributor:

+1

@amaliujia (Contributor Author), Jul 2, 2024:

Done.

I also found that the PySpark side uses the same string. However, I am not sure how PySpark generally deals with configs, so I left the PySpark side unchanged.
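For illustration, replacing a hardcoded key with the new entry might look like the following sketch (the call site and the totalCores variable are hypothetical):

// Before: the config key is hardcoded as a string at each call site.
val parallelism = conf.getInt("spark.default.parallelism", totalCores)

// After: the call site references the shared config entry instead.
val parallelism = conf.get(config.DEFAULT_PARALLELISM).getOrElse(totalCores)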

Comment on lines +51 to +52
"it depends on the cluster manager. For example in Local mode, it defaults to the " +
"number of cores on the local machine")
Member:

Since we no longer have fine-grained schedulers, do both local mode and the other modes use the total number of cores?

Contributor Author:

This is something I am not sure about. @cloud-fan, do you remember?

Contributor:

I think so, but we don't need to expose such low-level details to end users.
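For the curious, the fallback logic lives in the scheduler backends and looks roughly like the following sketch (paraphrased from Spark's source; not part of this PR's diff):

// Local mode (LocalSchedulerBackend): fall back to the number of local cores.
override def defaultParallelism(): Int =
  scheduler.conf.getInt("spark.default.parallelism", totalCores)

// Cluster modes (CoarseGrainedSchedulerBackend): fall back to
// max(total executor cores, 2).
override def defaultParallelism(): Int =
  conf.getInt("spark.default.parallelism", math.max(totalCoreCount.get(), 2))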

@cloud-fan (Contributor):

thanks, merging to master!

@cloud-fan cloud-fan closed this in 896c15e Jul 11, 2024
@dongjoon-hyun (Member) left a comment:

Thank you, @amaliujia and all!

jingz-db pushed a commit to jingz-db/spark that referenced this pull request Jul 22, 2024

Closes apache#47171 from amaliujia/document_spark_default_paramllel.

Authored-by: Rui Wang <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
attilapiros pushed a commit to attilapiros/spark that referenced this pull request Oct 4, 2024