Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-48414][PYTHON] Fix breaking change in python's fromJson #46737

Closed

Conversation

stefankandic
Copy link
Contributor

What changes were proposed in this pull request?

Fix breaking change in fromJson method by having default param values.

Why are the changes needed?

In order to not break clients that don't care about collations.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing UTs.

Was this patch authored or co-authored using generative AI tooling?

No.

fieldPath: str,
collationsMap: Optional[Dict[str, str]],
fieldPath: str = "",
collationsMap: Optional[Dict[str, str]] = None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test please?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extensive testing was added in #46280, this change is just so that the API for the fromJson method doesn't change which we missed in the previous PR

fieldPath: str,
collationsMap: Optional[Dict[str, str]],
fieldPath: str = "",
collationsMap: Optional[Dict[str, str]] = None,
) -> "ArrayType":
elementType = _parse_datatype_json_value(
json["elementType"], fieldPath + ".element", collationsMap
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like it's .element when empty strings are given. Are they expected?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if you are deserializing a schema where an array is on the column a the collationsMap would be {"a.element": collation}. So if you just want to deserialize a map that is not a part of the larger schema the collationsMap should be {".element": collation}

Added tests to verify this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like it has to be {"element": collation} and {"key": collation}. .element sounds a bit awkward. While I am fine because this JSON format is supposed to be internal-only, is this backward compatible? E.g., can old JSON data types be read? I have seen many tickets related to this, e.g., #43474

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the second thought I agree with you, it definitely makes more sense to special case this so we don't have this weird dangling dot.

This should be completely backwards compatible; if we don't care about collations and don't pass fieldPath and collationsMap everything will be the same as before.

@HyukjinKwon
Copy link
Member

Merged to master.

HyukjinKwon pushed a commit that referenced this pull request Jul 27, 2024
…scala

### What changes were proposed in this pull request?

When deserializing map/array that is not part of the struct field, the key in collation map should just be `{"element": collation}` instead of `{".element": collation}`.

### Why are the changes needed?

To be consistent with the behavior on the pyspark side (#46737).

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Unit tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #47497 from stefankandic/complexTypeDeSer.

Authored-by: Stefan Kandic <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
ilicmarkodb pushed a commit to ilicmarkodb/spark that referenced this pull request Jul 29, 2024
### What changes were proposed in this pull request?

Fix breaking change in `fromJson` method by having default param values.

### Why are the changes needed?

In order to not break clients that don't care about collations.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing UTs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#46737 from stefankandic/fromJsonBreakingChange.

Authored-by: Stefan Kandic <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
ilicmarkodb pushed a commit to ilicmarkodb/spark that referenced this pull request Jul 29, 2024
…scala

### What changes were proposed in this pull request?

When deserializing map/array that is not part of the struct field, the key in collation map should just be `{"element": collation}` instead of `{".element": collation}`.

### Why are the changes needed?

To be consistent with the behavior on the pyspark side (apache#46737).

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Unit tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#47497 from stefankandic/complexTypeDeSer.

Authored-by: Stefan Kandic <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
fusheng-rd pushed a commit to fusheng-rd/spark that referenced this pull request Aug 6, 2024
### What changes were proposed in this pull request?

Fix breaking change in `fromJson` method by having default param values.

### Why are the changes needed?

In order to not break clients that don't care about collations.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing UTs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#46737 from stefankandic/fromJsonBreakingChange.

Authored-by: Stefan Kandic <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
fusheng-rd pushed a commit to fusheng-rd/spark that referenced this pull request Aug 6, 2024
…scala

### What changes were proposed in this pull request?

When deserializing map/array that is not part of the struct field, the key in collation map should just be `{"element": collation}` instead of `{".element": collation}`.

### Why are the changes needed?

To be consistent with the behavior on the pyspark side (apache#46737).

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Unit tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#47497 from stefankandic/complexTypeDeSer.

Authored-by: Stefan Kandic <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
attilapiros pushed a commit to attilapiros/spark that referenced this pull request Oct 4, 2024
### What changes were proposed in this pull request?

Fix breaking change in `fromJson` method by having default param values.

### Why are the changes needed?

In order to not break clients that don't care about collations.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing UTs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#46737 from stefankandic/fromJsonBreakingChange.

Authored-by: Stefan Kandic <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
attilapiros pushed a commit to attilapiros/spark that referenced this pull request Oct 4, 2024
…scala

### What changes were proposed in this pull request?

When deserializing map/array that is not part of the struct field, the key in collation map should just be `{"element": collation}` instead of `{".element": collation}`.

### Why are the changes needed?

To be consistent with the behavior on the pyspark side (apache#46737).

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Unit tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#47497 from stefankandic/complexTypeDeSer.

Authored-by: Stefan Kandic <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants