-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor group_split()
to use dplyr_row_slice()
#5167
Conversation
R/group_split.R
Outdated
data <- .tbl | ||
grouped_data <- group_by(.tbl, ...) | ||
|
||
data <- group_split_col_slice(data, grouped_data, keep) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the function names to more clearly indicate that this is about dropping the grouping variables out of the splits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, if we're grouping here, can't we just call group_split.grouped_df()
? And then group_split_col_slice
can be inlined into group_split.grouped_df
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't we just call group_split.grouped_df()
I don't think so. By design, group_split.grouped_df()
coerces the grouped-df to tibble and splits that, so you'd end up with a list of tibbles, rather than a list of the tibble subclasses.
To me it's about gathering two pieces of information: an ungrouped data frame (possibly subclassed) to split, and a grouped data frame holding information about how to split.
With group_split.data.frame()
you have the ungrouped thing you split on, so you have to generate the grouped data.
With group_split.grouped_df()
you have the grouped data, but need the ungrouped thing to split on. The best we can do here is make the grouped data into a bare tibble. This is why Earo will need a group_split.grouped_ts()
method to be able to return a list of tbl_ts.
Once the two pieces of information have been gathered, they share a common implementation in group_split_impl()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, makes sense. Thanks!
Closes #5165
group_split()
has been refactored to accomplish a few goals:It now uses
dplyr_row_slice()
to split with. Previously this wasvec_slice()
.There was a case where we used
[, i]
rather than 1D[i]
whenkeep = FALSE
. This has been fixed.group-splitting a rowwise data frame used to return a list-of rowwise data frames. I believe this is inconsistent, and it should just return a list of tibbles, like with grouped-dfs. The change has been made to make this more consistent.
Methods for
data.frame
,grouped_df
, androwwise_df
all now share the same implementation@earowang, this makes
group_split(pedestrian, Sensor)
work since you have adplyr_row_slice()
method. You will still need agroup_split.grouped_ts()
method. It should just have to callNextMethod()
to get thegroup_split.grouped_df()
behavior, and then map over the resulting list of tibbles, coercing them to tbl_ts if applicable. I don't think we can do much better here. Adding that method should fixnest_by()
automatically.