Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert nth_value to UDAF #11287

Merged
merged 38 commits into from
Jul 8, 2024
Merged

Convert nth_value to UDAF #11287

merged 38 commits into from
Jul 8, 2024

Conversation

jcsherin
Copy link
Contributor

@jcsherin jcsherin commented Jul 5, 2024

Which issue does this PR close?

Closes #11284.

Rationale for this change

What changes are included in this PR?

  • Converts nth_value to UDAF
  • Expression can be reversed so that N will be from the end instead of the start
  • When nth_value is used in window aggregations the existing builtin function is used instead of this UDAF. An explicit bypass is done where the logical plan nodes are created. This is similar to how it works for first_value and last_value.

Are these changes tested?

  • Test in roundtrip_physical_plan is updated
  • Use of lowercase naming in sqllogictest plans

Are there any user-facing changes?

Pending methods are:
- `accumulator`
- `state_fields`
- `reverse_expr`
This error message is manually formatted to remain consistent with
existing error statements. It is not formatted by running:
```
cargo test -p datafusion-sqllogictest --test sqllogictests errors -- --complete
```
This reverts commit 1913efda49e585816286b54b371d4166ac894d1f.
This reverts commit 4bb204f6ec8028c4e3313db5af3fabfcdaf7fea8.
This reverts commit fa61ce62dcae6eae6f8e9c9900ebf8cff5023bc0.
@github-actions github-actions bot added sql SQL Planner logical-expr Logical plan and expressions physical-expr Physical Expressions sqllogictest SQL Logic Tests (.slt) labels Jul 5, 2024
self.nullable, // This should be the same as field()
format_state_name(self.name(), "nth_value"),
Field::new("item", args.input_type.clone(), true),
false,
Copy link
Contributor Author

@jcsherin jcsherin Jul 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should nullable be configurable? But it is unavailable in StateFieldArgs. I think it is related to #11274 and #11094.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think given the existing nth function, we should let nullable configurable. And, the nullability is actually for the list element. We should add nullable in StateFieldArgs.

let mut fields = vec![Field::new_list(
            format_state_name(self.name(), "nth_value"),
            Field::new("item", args.input_type.clone(), self.nullable),
            false)]

@eejbyfeldt is working on it in #11063

Copy link
Contributor Author

@jcsherin jcsherin Jul 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Marked it as a TODO in comments so that it can be completed once it is added to StateFieldArgs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

based on #11299, I think we will have non-null for nth_value too (similar to first value).

@jcsherin jcsherin changed the title Convert udaf nth value Convert nth_value to UDAF Jul 5, 2024
self.nullable, // This should be the same as field()
format_state_name(self.name(), "nth_value"),
Field::new("item", args.input_type.clone(), true),
false,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think given the existing nth function, we should let nullable configurable. And, the nullability is actually for the list element. We should add nullable in StateFieldArgs.

let mut fields = vec![Field::new_list(
            format_state_name(self.name(), "nth_value"),
            Field::new("item", args.input_type.clone(), self.nullable),
            false)]

@eejbyfeldt is working on it in #11063

datafusion/proto/tests/cases/roundtrip_physical_plan.rs Outdated Show resolved Hide resolved
datafusion/proto/tests/cases/roundtrip_physical_plan.rs Outdated Show resolved Hide resolved
datafusion/sql/src/expr/function.rs Show resolved Hide resolved
datafusion/functions-aggregate/src/nth_value.rs Outdated Show resolved Hide resolved
@jcsherin
Copy link
Contributor Author

jcsherin commented Jul 6, 2024

Summary:

  • Marked as TODO: make the nullability of list field element configurable. This can be completed in a follow up PR after Add input_nullable for UDAF args StateField and Accumulator #11063.
  • Extract merge_ordered_arrays to physical-expr-common. It is required by both nth_value and array_agg_ordered. This can be moved back into functions-aggregate after the remaining functions are converted.
  • Simplify argument by using nth_value_udaf in roundtrip_physical_plan.

@jayzhan211 Thank you for the code review and providing context. It was very helpful.

Copy link
Contributor

@jayzhan211 jayzhan211 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me too -- thank you @jcsherin and @jayzhan211

cc @mustafasrepo

@@ -39,8 +39,6 @@ pub enum AggregateFunction {
Max,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this list is quite close to empty 🤞

ordering_req: LexOrdering,
signature: Signature,
/// Determines whether `N` is relative to the beginning or the end
/// of the aggregation. When set to `true`, then `N` is from the end.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

@alamb
Copy link
Contributor

alamb commented Jul 8, 2024

I merged this PR up from main to ensure all the CI still passes and once they do I plan to merge it

@alamb alamb merged commit 4ac1428 into apache:main Jul 8, 2024
25 checks passed
@alamb
Copy link
Contributor

alamb commented Jul 8, 2024

🚀

@alamb
Copy link
Contributor

alamb commented Jul 8, 2024

Thanks again @jayzhan211 and @jcsherin

@alamb alamb added the api change Changes the API exposed to users of the crate label Jul 9, 2024
@jcsherin jcsherin deleted the convert-udaf-nth-value branch July 11, 2024 18:32
findepi pushed a commit to findepi/datafusion that referenced this pull request Jul 16, 2024
* Copies `NthValueAccumulator` to `functions-aggregate`

* Partial implementation of `AggregateUDFImpl`

Pending methods are:
- `accumulator`
- `state_fields`
- `reverse_expr`

* Implements `accumulator` method

* Retains existing comments verbatim

* Removes unnecessary path prefix

* Implements `reverse_expr` method

* Adds `nullable` field to `NthValue`

* Revert to existing name

* Implements `state_fields` method

* Removes `nth_value` from `physical-expr`

* Adds default

* Exports `nth_value`

* Fixes build error in physical plan roundtrip test

* Minor: formatting

* Parses `N` from input expression

* Fixes build error by using `nth_value_udaf`

* Fixes `reverse_expr` by passing correct `N`

* Update plan with lowercase UDF name

* Updates error message for incorrect no. of arguments

This error message is manually formatted to remain consistent with
existing error statements. It is not formatted by running:
```
cargo test -p datafusion-sqllogictest --test sqllogictests errors -- --complete
```

* Fixes nullable "item" in `state_fields`

* Minor: fix formatting after resolving conflicts

* Updates multiple existing plans with lowercase name

* Implements `retract_batch` for window aggregations

* Fixes: regex mismatch for error message in CI

* Revert "Updates multiple existing plans with lowercase name"

This reverts commit 1913efda49e585816286b54b371d4166ac894d1f.

* Revert "Implements `retract_batch` for window aggregations"

This reverts commit 4bb204f6ec8028c4e3313db5af3fabfcdaf7fea8.

* Fixes: use builtin window function instead of udaf

* Revert "Updates error message for incorrect no. of arguments"

This reverts commit fa61ce62dcae6eae6f8e9c9900ebf8cff5023bc0.

* Refactor: renames field and method

* Removes hack for nullability

* Minor: refactors `reverse_expr`

* Minor: removes unncessary path prefix

* Minor: cleanup arguments for creating aggregate expr

* Refactor: extracts `merge_ordered_arrays` to `physical-expr-common`

* Minor: adds todo for configuring nullability

* Retrigger CI

---------

Co-authored-by: Andrew Lamb <[email protected]>
xinlifoobar pushed a commit to xinlifoobar/datafusion that referenced this pull request Jul 18, 2024
* Copies `NthValueAccumulator` to `functions-aggregate`

* Partial implementation of `AggregateUDFImpl`

Pending methods are:
- `accumulator`
- `state_fields`
- `reverse_expr`

* Implements `accumulator` method

* Retains existing comments verbatim

* Removes unnecessary path prefix

* Implements `reverse_expr` method

* Adds `nullable` field to `NthValue`

* Revert to existing name

* Implements `state_fields` method

* Removes `nth_value` from `physical-expr`

* Adds default

* Exports `nth_value`

* Fixes build error in physical plan roundtrip test

* Minor: formatting

* Parses `N` from input expression

* Fixes build error by using `nth_value_udaf`

* Fixes `reverse_expr` by passing correct `N`

* Update plan with lowercase UDF name

* Updates error message for incorrect no. of arguments

This error message is manually formatted to remain consistent with
existing error statements. It is not formatted by running:
```
cargo test -p datafusion-sqllogictest --test sqllogictests errors -- --complete
```

* Fixes nullable "item" in `state_fields`

* Minor: fix formatting after resolving conflicts

* Updates multiple existing plans with lowercase name

* Implements `retract_batch` for window aggregations

* Fixes: regex mismatch for error message in CI

* Revert "Updates multiple existing plans with lowercase name"

This reverts commit 1913efda49e585816286b54b371d4166ac894d1f.

* Revert "Implements `retract_batch` for window aggregations"

This reverts commit 4bb204f6ec8028c4e3313db5af3fabfcdaf7fea8.

* Fixes: use builtin window function instead of udaf

* Revert "Updates error message for incorrect no. of arguments"

This reverts commit fa61ce62dcae6eae6f8e9c9900ebf8cff5023bc0.

* Refactor: renames field and method

* Removes hack for nullability

* Minor: refactors `reverse_expr`

* Minor: removes unncessary path prefix

* Minor: cleanup arguments for creating aggregate expr

* Refactor: extracts `merge_ordered_arrays` to `physical-expr-common`

* Minor: adds todo for configuring nullability

* Retrigger CI

---------

Co-authored-by: Andrew Lamb <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api change Changes the API exposed to users of the crate logical-expr Logical plan and expressions physical-expr Physical Expressions sql SQL Planner sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Convert NthValue to UDAF
3 participants