Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor group_split() to use dplyr_row_slice() #5167

Merged
merged 7 commits into from
Apr 30, 2020

Conversation

DavisVaughan
Copy link
Member

@DavisVaughan DavisVaughan commented Apr 30, 2020

Closes #5165

group_split() has been refactored to accomplish a few goals:

  • It now uses dplyr_row_slice() to split with. Previously this was vec_slice().

  • There was a case where we used [, i] rather than 1D [i] when keep = FALSE. This has been fixed.

  • group-splitting a rowwise data frame used to return a list-of rowwise data frames. I believe this is inconsistent, and it should just return a list of tibbles, like with grouped-dfs. The change has been made to make this more consistent.

  • Methods for data.frame, grouped_df, and rowwise_df all now share the same implementation

@earowang, this makes group_split(pedestrian, Sensor) work since you have a dplyr_row_slice() method. You will still need a group_split.grouped_ts() method. It should just have to call NextMethod() to get the group_split.grouped_df() behavior, and then map over the resulting list of tibbles, coercing them to tbl_ts if applicable. I don't think we can do much better here. Adding that method should fix nest_by() automatically.

R/group_split.R Outdated
data <- .tbl
grouped_data <- group_by(.tbl, ...)

data <- group_split_col_slice(data, grouped_data, keep)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the function names to more clearly indicate that this is about dropping the grouping variables out of the splits.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, if we're grouping here, can't we just call group_split.grouped_df()? And then group_split_col_slice can be inlined into group_split.grouped_df?

Copy link
Member Author

@DavisVaughan DavisVaughan Apr 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't we just call group_split.grouped_df()

I don't think so. By design, group_split.grouped_df() coerces the grouped-df to tibble and splits that, so you'd end up with a list of tibbles, rather than a list of the tibble subclasses.


To me it's about gathering two pieces of information: an ungrouped data frame (possibly subclassed) to split, and a grouped data frame holding information about how to split.

With group_split.data.frame() you have the ungrouped thing you split on, so you have to generate the grouped data.

With group_split.grouped_df() you have the grouped data, but need the ungrouped thing to split on. The best we can do here is make the grouped data into a bare tibble. This is why Earo will need a group_split.grouped_ts() method to be able to return a list of tbl_ts.

Once the two pieces of information have been gathered, they share a common implementation in group_split_impl()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, makes sense. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Should nest_by() and group_split() (and friends) preserve subclass, if dplyr_row_slice() implemented?
2 participants