[SPARK-48817][SQL] Eagerly execute union multi commands together #47224

Closed
wants to merge 2 commits into from

Member

@wForget wForget commented Jul 5, 2024

What changes were proposed in this pull request?

Eagerly execute the multiple commands of a union together, in one execution.

Why are the changes needed?

A multi-insert is split into multiple SQL executions, so the exchange is not reused.

SQL to reproduce:

create table wangzhen_t1(c1 int);
create table wangzhen_t2(c1 int);
create table wangzhen_t3(c1 int);
insert into wangzhen_t1 values (1), (2), (3);

from (select /*+ REPARTITION(3) */ c1 from wangzhen_t1)
insert overwrite table wangzhen_t2 select c1
insert overwrite table wangzhen_t3 select c1; 

In Spark 3.1, there is only one SQL execution and the exchange is reused.

image

However, in Spark 3.5, the statement is split into multiple executions and the exchange is not reused.

image
image
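
As a rough illustration of why the split matters, the toy model below (plain Python, not Spark code) treats each SQL execution as its own exchange-reuse scope. The `count_shuffles` function, the `REPARTITION` key, and the plan shapes are hypothetical stand-ins invented for this sketch. The assumption is that identical exchanges are deduplicated only within a single execution's plan, so two executions that need the same exchange each compute it.

```python
from typing import Iterable, List


def count_shuffles(executions: Iterable[List[str]]) -> int:
    """Count shuffles actually run, assuming reuse is scoped to one execution.

    Each execution is a list of exchange keys (a stand-in for a canonicalized
    exchange subplan). Within one execution a repeated key is reused; across
    executions nothing is shared.
    """
    total = 0
    for exchange_keys in executions:
        seen = set()  # reuse scope: reset for every execution
        for key in exchange_keys:
            if key not in seen:
                seen.add(key)
                total += 1
    return total


# Hypothetical key for the REPARTITION(3) exchange over wangzhen_t1.
REPARTITION = "repartition(3) <- scan wangzhen_t1"

# Both inserts planned in one execution: the second insert reuses the exchange.
single_execution = [[REPARTITION, REPARTITION]]

# Each insert run as its own execution: the exchange is computed twice.
split_executions = [[REPARTITION], [REPARTITION]]

print(count_shuffles(single_execution))  # 1
print(count_shuffles(split_executions))  # 2
```

In this model the two-insert statement costs one shuffle when planned together and two when split, which is the difference the screenshots show between Spark 3.1 and 3.5.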

Does this PR introduce any user-facing change?

Yes, multi-inserts will be executed in a single SQL execution.

How was this patch tested?

Added a unit test.

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Jul 5, 2024
@wForget
Member Author

wForget commented Jul 5, 2024

It seems to be caused by #32513

@wForget
Member Author

wForget commented Jul 5, 2024

@cloud-fan @beliefer Could you please take a look?

Contributor

@ulysses-you ulysses-you left a comment

lgtm except some minor comments

@ulysses-you
Contributor

thanks, merged to master

@cloud-fan
Contributor

late LGTM

biruktesf-db pushed a commit to biruktesf-db/spark that referenced this pull request Jul 11, 2024
### What changes were proposed in this pull request?

Eagerly execute the multiple commands of a union together, in one execution.

### Why are the changes needed?
A multi-insert is split into multiple SQL executions, so the exchange is not reused.

Reproduce sql:

```sql
create table wangzhen_t1(c1 int);
create table wangzhen_t2(c1 int);
create table wangzhen_t3(c1 int);
insert into wangzhen_t1 values (1), (2), (3);

from (select /*+ REPARTITION(3) */ c1 from wangzhen_t1)
insert overwrite table wangzhen_t2 select c1
insert overwrite table wangzhen_t3 select c1;
```

In Spark 3.1, there is only one SQL execution and the exchange is reused.

![image](https://github.com/apache/spark/assets/17894939/5ff68392-aaa8-4e6b-8cac-1687880796b9)

However, in Spark 3.5, the statement is split into multiple executions and the exchange is not reused.

![image](https://github.com/apache/spark/assets/17894939/afdb14b6-5007-4923-802d-535149974ecf)
![image](https://github.com/apache/spark/assets/17894939/0d60e8db-9da7-4906-8d07-2b622b55e6ab)

### Does this PR introduce _any_ user-facing change?

Yes, multi-inserts will be executed in a single SQL execution.

### How was this patch tested?

added unit test

### Was this patch authored or co-authored using generative AI tooling?

No

Closes apache#47224 from wForget/SPARK-48817.

Authored-by: wforget <[email protected]>
Signed-off-by: youxiduo <[email protected]>
jingz-db pushed a commit to jingz-db/spark that referenced this pull request Jul 22, 2024
attilapiros pushed a commit to attilapiros/spark that referenced this pull request Oct 4, 2024
3 participants