Refactor `group_split()` to use `dplyr_row_slice()` #5167

DavisVaughan · 2020-04-30T16:13:23Z

group_split() has been refactored to accomplish a few goals:

- It now uses dplyr_row_slice() to split rows. Previously it used vec_slice().
- When keep = FALSE, there was a case where we used [, i] rather than 1D [i]. This has been fixed.
- Group-splitting a rowwise data frame used to return a list of rowwise data frames. I believe this is inconsistent: it should return a list of tibbles, as it does for grouped data frames. It now does.
- Methods for data.frame, grouped_df, and rowwise_df now all share the same implementation.
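
As a small illustration of the [, i] versus 1D [i] distinction, here is a base R sketch. The data frame and column names are made up for this example, and the PR does not spell out its motivation, so treat this as one plausible reason rather than the stated one. With a plain data frame, 2D indexing can simplify to a bare vector when a single column remains, while 1D indexing always returns a data frame:

```r
# Made-up example data; `g` plays the role of the grouping column.
df <- data.frame(g = c("a", "a", "b"), x = 1:3)

# Columns to retain when the grouping column is dropped (keep = FALSE).
keep_cols <- setdiff(names(df), "g")

# 2D indexing: with a single remaining column, a base data frame
# is simplified to a plain vector (drop = TRUE is the default).
two_d <- df[, keep_cols]

# 1D indexing: always selects columns and returns a data frame.
one_d <- df[keep_cols]

is.data.frame(two_d)
is.data.frame(one_d)
```

Tibbles do not simplify in this way, so the difference mostly shows up for other data frame classes.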

@earowang, this makes group_split(pedestrian, Sensor) work since you have a dplyr_row_slice() method. You will still need a group_split.grouped_ts() method. It should just have to call NextMethod() to get the group_split.grouped_df() behavior, and then map over the resulting list of tibbles, coercing them to tbl_ts if applicable. I don't think we can do much better here. Adding that method should fix nest_by() automatically.
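
The method described above could look roughly like the following sketch. The classes grouped_demo and tbl_demo are hypothetical stand-ins for grouped_ts and tbl_ts (tsibble is not loaded here), and the coercion step is a bare class assignment rather than whatever tsibble would actually need:

```r
library(dplyr)

# Hypothetical constructor: tags an existing grouped_df with an extra class,
# standing in for a grouped subclass such as grouped_ts.
new_grouped_demo <- function(x) {
  stopifnot(inherits(x, "grouped_df"))
  class(x) <- c("grouped_demo", class(x))
  x
}

# Sketch of the suggested method: defer to the grouped_df method, then
# map over the resulting list of tibbles and coerce each piece.
group_split.grouped_demo <- function(.tbl, ...) {
  pieces <- NextMethod()
  lapply(pieces, function(piece) {
    # Placeholder coercion; a real method would rebuild the subclass properly.
    class(piece) <- c("tbl_demo", class(piece))
    piece
  })
}

df <- tibble(g = c("a", "a", "b"), x = 1:3)
out <- group_split(new_grouped_demo(group_by(df, g)))
```

Note that this sketch returns a plain list, not the list-of structure that the grouped_df method produces.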

hadley · 2020-04-30T18:28:19Z

R/group_split.R

+  data <- .tbl
+  grouped_data <- group_by(.tbl, ...)
+
+  data <- group_split_col_slice(data, grouped_data, keep)


I think the function name needs to more clearly indicate that this is about dropping the grouping variables out of the splits.

Also, if we're grouping here, can't we just call group_split.grouped_df()? And then group_split_col_slice can be inlined into group_split.grouped_df?

> can't we just call group_split.grouped_df()

I don't think so. By design, group_split.grouped_df() coerces the grouped-df to tibble and splits that, so you'd end up with a list of tibbles, rather than a list of the tibble subclasses.

To me it's about gathering two pieces of information: an ungrouped data frame (possibly subclassed) to split, and a grouped data frame holding information about how to split.

With group_split.data.frame() you have the ungrouped thing you split on, so you have to generate the grouped data.

With group_split.grouped_df() you have the grouped data, but need the ungrouped thing to split on. The best we can do here is make the grouped data into a bare tibble. This is why Earo will need a group_split.grouped_ts() method to be able to return a list of tbl_ts.

Once the two pieces of information have been gathered, they share a common implementation in group_split_impl()
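
The "two pieces of information" idea can be sketched with exported dplyr functions only. The helper name split_sketch() is hypothetical and this is not dplyr's actual group_split_impl(), just a minimal approximation of the shape described above: one ungrouped data frame to slice, and one grouped data frame that says how to slice it.

```r
library(dplyr)

# Hypothetical helper, not dplyr's internal implementation.
# `data` is the ungrouped thing to split; `grouped_data` carries the grouping.
split_sketch <- function(data, grouped_data, keep = TRUE) {
  if (!keep) {
    # Drop the grouping columns from the pieces.
    data <- data[setdiff(names(data), group_vars(grouped_data))]
  }
  # One integer vector of row positions per group.
  indices <- group_rows(grouped_data)
  # Slice rows through the dplyr_row_slice() generic so that classes
  # with their own method get a chance to handle the slicing.
  lapply(indices, function(i) dplyr_row_slice(data, i))
}

df <- tibble(g = c("a", "b", "a"), x = 1:3)

# data.frame case: we hold the ungrouped data and generate the grouping.
pieces <- split_sketch(df, group_by(df, g), keep = FALSE)

# grouped_df case: we hold the grouping and derive the ungrouped data.
gdf <- group_by(df, g)
pieces2 <- split_sketch(ungroup(gdf), gdf)
```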

Ok, makes sense. Thanks!

DavisVaughan added 6 commits April 30, 2020 11:38

Refactor group_split() to use dplyr_row_slice() and 1D [

edf9efb

Be stricter in existing group_split() tests

ab23e5d

Add tests regarding the return value of group_split()

04d764f

Add tests to ensure that dplyr_row_slice() and [ are called

fad18d9

Don't even call group_split_col_slice() for rowwise dfs

f6a7bd0

Simplify loop with a map()

aebfd19

DavisVaughan requested a review from hadley April 30, 2020 17:12

hadley reviewed Apr 30, 2020

Use clearer drop_cols() helper

f2025b2

hadley merged commit d353ff1 into tidyverse:master Apr 30, 2020

DavisVaughan deleted the group-split-refactor branch April 30, 2020 19:59

This was referenced Apr 30, 2020

Should nest_by() and group_split() (and friends) preserve subclass, if dplyr_row_slice() implemented? #5165

Closed

Simplify group_split() using ungroup(), not as_tibble() #5175

Merged
