
[SPARK-48773] Document config "spark.default.parallelism" by config builder framework #47171

Closed

Conversation

amaliujia (Contributor)

What changes were proposed in this pull request?

Document the config "spark.default.parallelism". This config is used by Spark but is not documented through the config builder framework, even though it already appears on the Spark website: https://spark.apache.org/docs/latest/configuration.html.
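For reference, a minimal example of how an end user sets this config (standard Spark API; the app name and value here are arbitrary):

// Setting spark.default.parallelism programmatically via SparkConf.
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("example")
  .set("spark.default.parallelism", "200")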

Why are the changes needed?

This documents an existing Spark config in the config builder framework.

Does this PR introduce any user-facing change?

NO.

How was this patch tested?

N/A

Was this patch authored or co-authored using generative AI tooling?

N/A

@github-actions github-actions bot added the CORE label Jul 2, 2024
@amaliujia amaliujia changed the title [SPARK-48773] Document config "spark.default.parallelism" [SPARK-48773] Document config "spark.default.parallelism" by config builder framework Jul 2, 2024
"For distributed shuffle operations like reduceByKey and join, the largest number of " +
"partitions in a parent RDD. For operations like parallelize with no parent RDDs, " +
"it depends on the cluster manager: Local mode: number of cores on the local machine " +
"Mesos fine grained mode: 8 Others: total number of cores on all executor nodes or 2, " +
Member:

Mesos has been removed.

Contributor Author:

Updated.

@@ -42,6 +42,17 @@ package object config {
private[spark] val SPARK_TASK_PREFIX = "spark.task"
private[spark] val LISTENER_BUS_EVENT_QUEUE_PREFIX = "spark.scheduler.listenerbus.eventqueue"

private[spark] val DEFAULT_PARALLELISM =
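For context, the truncated definition above plausibly continues along these lines. This is a sketch assembled from the doc wording discussed in this thread, not the verbatim merged code:

// Hypothetical completion of the DEFAULT_PARALLELISM entry using Spark's
// internal ConfigBuilder API; details (doc text, version) are assumptions.
private[spark] val DEFAULT_PARALLELISM =
  ConfigBuilder("spark.default.parallelism")
    .doc("Default number of partitions in RDDs returned by transformations " +
      "like join, reduceByKey, and parallelize when not set by user.")
    .version("0.5.0")
    .intConf
    .createOptional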
Member:

Can we replace all hardcoded ones with this?

Contributor:

+1

@amaliujia (Contributor Author), Jul 2, 2024:

Done.

I also found that the PySpark side uses the same string. However, I am not sure how PySpark generally deals with configs, so I left the PySpark side unchanged.
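For illustration, replacing a hardcoded key with the new entry might look like the following sketch (the call site and the totalCores variable are hypothetical):

// Before: the config key is hardcoded as a string at each call site.
val parallelism = conf.getInt("spark.default.parallelism", totalCores)

// After: the call site references the shared config entry instead.
val parallelism = conf.get(config.DEFAULT_PARALLELISM).getOrElse(totalCores)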

Comment on lines +51 to +52
"it depends on the cluster manager. For example in Local mode, it defaults to the " +
"number of cores on the local machine")
Member:

Since we no longer have fine-grained schedulers, do both local mode and the other modes use the total number of cores?

Contributor Author:

This is something I am not sure about. @cloud-fan, do you remember?

Contributor:

I think so, but we don't need to expose such low-level details to end users.
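For the curious, the fallback logic lives in the scheduler backends and looks roughly like the following sketch (paraphrased from Spark's source; not part of this PR's diff):

// Local mode (LocalSchedulerBackend): fall back to the number of local cores.
override def defaultParallelism(): Int =
  scheduler.conf.getInt("spark.default.parallelism", totalCores)

// Cluster modes (CoarseGrainedSchedulerBackend): fall back to
// max(total executor cores, 2).
override def defaultParallelism(): Int =
  conf.getInt("spark.default.parallelism", math.max(totalCoreCount.get(), 2))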

@cloud-fan (Contributor):

thanks, merging to master!

@cloud-fan cloud-fan closed this in 896c15e Jul 11, 2024
@dongjoon-hyun (Member) left a comment:

Thank you, @amaliujia and all!

jingz-db pushed a commit to jingz-db/spark that referenced this pull request Jul 22, 2024

Closes apache#47171 from amaliujia/document_spark_default_paramllel.

Authored-by: Rui Wang <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
attilapiros pushed a commit to attilapiros/spark that referenced this pull request Oct 4, 2024