[SPARK-48803][SQL] Throw internal error in Orc(De)serializer to align with ParquetWriteSupport #47208

Closed
yaooqinn wants to merge 1 commit into apache:master from yaooqinn:SPARK-48803

Conversation

@yaooqinn yaooqinn commented Jul 4, 2024

What changes were proposed in this pull request?

A kind of follow-up to #44275: this PR aligns two similar code paths that raised different error messages for the same unsupported data type, so that both now throw the same internal error.

```java
24/07/03 16:29:01 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
org.apache.spark.SparkException: [INTERNAL_ERROR] Unsupported data type VarcharType(64). SQLSTATE: XX000
	at org.apache.spark.SparkException$.internalError(SparkException.scala:92)
	at org.apache.spark.SparkException$.internalError(SparkException.scala:96)
```

```java
org.apache.spark.SparkUnsupportedOperationException: VarcharType(64) is not supported yet.
	at org.apache.spark.sql.errors.QueryExecutionErrors$.dataTypeUnsupportedYetError(QueryExecutionErrors.scala:993)
	at org.apache.spark.sql.execution.datasources.orc.OrcSerializer.newConverter(OrcSerializer.scala:209)
	at org.apache.spark.sql.execution.datasources.orc.OrcSerializer.$anonfun$converters$2(OrcSerializer.scala:35)
	at scala.collection.immutable.List.map(List.scala:247)
```
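
For orientation, below is a minimal Scala sketch of the unified behaviour (a hypothetical helper name, not the actual diff): the unsupported-type fallback in the ORC (de)serializer now throws the `[INTERNAL_ERROR]` built by `SparkException.internalError`, as the Parquet write path already did, instead of the separate `dataTypeUnsupportedYetError`.

```scala
import org.apache.spark.SparkException
import org.apache.spark.sql.types.DataType

object OrcUnsupportedTypeSketch {
  // Sketch only: raise the same internal error used by the Parquet write path
  // when no converter can be created for a data type. The real change lives in
  // the newConverter methods of OrcSerializer/OrcDeserializer.
  def unsupportedTypeError(dataType: DataType): Throwable =
    SparkException.internalError(s"Unsupported data type $dataType.")
}
```

With this, both code paths surface the `[INTERNAL_ERROR] Unsupported data type ...` message shown in the first trace above.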

Why are the changes needed?

Improvement: the ORC (de)serializer now reports the same internal error for unsupported data types as ParquetWriteSupport, keeping the failure mode consistent across data sources.

Does this PR introduce any user-facing change?

No. Users should not encounter these errors in normal usage.

How was this patch tested?

Passing existing tests.

Was this patch authored or co-authored using generative AI tooling?

No.


@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM. Thank you for making these consistent across data sources, @yaooqinn .
Merged to master for Apache Spark 4.0.0-preview2.

yaooqinn commented Jul 9, 2024

Thank you @dongjoon-hyun

@yaooqinn yaooqinn deleted the SPARK-48803 branch July 9, 2024 03:17
ericm-db pushed a commit to ericm-db/spark that referenced this pull request Jul 10, 2024
jingz-db pushed a commit to jingz-db/spark that referenced this pull request Jul 22, 2024
MaxGekk pushed a commit that referenced this pull request Aug 16, 2024
…TEMP_2088`

### What changes were proposed in this pull request?
This PR is a follow-up to #47208.

### Why are the changes needed?
In #47208, the method `dataTypeUnsupportedYetError`, which used `_LEGACY_ERROR_TEMP_2088`, was removed, but the corresponding error condition `_LEGACY_ERROR_TEMP_2088` was not deleted at the same time. The Spark code repo no longer uses this error condition.
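
For context, a rough Scala sketch of the shape of the removed helper (a reconstruction for illustration, not the exact original code): it raised the `_LEGACY_ERROR_TEMP_2088` condition, which is left without callers once the ORC paths switched to `SparkException.internalError`.

```scala
import org.apache.spark.SparkUnsupportedOperationException
import org.apache.spark.sql.types.DataType

object RemovedErrorHelperSketch {
  // Approximate shape of the removed QueryExecutionErrors.dataTypeUnsupportedYetError:
  // it built a SparkUnsupportedOperationException from the now-unused
  // `_LEGACY_ERROR_TEMP_2088` error condition.
  def dataTypeUnsupportedYetError(dataType: DataType): SparkUnsupportedOperationException =
    new SparkUnsupportedOperationException(
      errorClass = "_LEGACY_ERROR_TEMP_2088",
      messageParameters = Map("dataType" -> dataType.toString))
}
```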

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #47783 from panbingkun/remove_unused_error_condition.

Authored-by: panbingkun <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
IvanK-db pushed a commit to IvanK-db/spark that referenced this pull request Sep 20, 2024