vctrs:: based summarise() filter() slice() mutate() #4523

Merged: 32 commits, Aug 30, 2019

Commits (32):
54321a5
Simplify internals of summarise()
romainfrancois Aug 7, 2019
98a4968
Forget about hybrid n_distinct() for now so that we can eliminate Mul…
romainfrancois Aug 7, 2019
afeb41a
Forget MultipleVectorVisitors c++ class
romainfrancois Aug 7, 2019
7ff4738
Forget VectorVisitor and derived classes
romainfrancois Aug 7, 2019
28035d2
Forget CharacterVectorOrderer c++ class
romainfrancois Aug 7, 2019
020dc12
Experimental summarise2()
romainfrancois Aug 7, 2019
3aaddd3
some TODOs [ci skip]
romainfrancois Aug 7, 2019
12b2d94
astyle
romainfrancois Aug 12, 2019
9d9eaec
assert results of summarise_one() validates vec_size() == 1
romainfrancois Aug 12, 2019
cb79658
pure R implementation of summarise2()
romainfrancois Aug 12, 2019
c189349
Experiment with summarise2(.size =)
romainfrancois Aug 12, 2019
a27abeb
support for n() in summarise()
romainfrancois Aug 13, 2019
970a862
install .data pronoun on summarise_data_mask()
romainfrancois Aug 13, 2019
2aef73d
adapt to changes in summarise()
romainfrancois Aug 13, 2019
36962cf
Better support consecutive reuse of summaries
romainfrancois Aug 13, 2019
43a7444
workaround for case when there are 0 groups
romainfrancois Aug 15, 2019
f13c686
Fix tests
romainfrancois Aug 15, 2019
ebbd15c
temporary workaround for summarise_data_mask() vs rowwise() until vec…
romainfrancois Aug 15, 2019
832630d
Abandon old C++ implementation of summarise()
romainfrancois Aug 15, 2019
3754061
Update compatibility vignette
romainfrancois Aug 15, 2019
7e40a77
test update, min gives a warning in summarise()
romainfrancois Aug 19, 2019
da2d938
only auto splice when it's a data frame
romainfrancois Aug 19, 2019
dad4cb4
vctrs based filter()
romainfrancois Aug 19, 2019
45cb714
vctrs based slice()
romainfrancois Aug 20, 2019
278b52e
Abandon old internal slice() support code
romainfrancois Aug 20, 2019
9688558
vctrs:: implementation of mutate()
romainfrancois Aug 23, 2019
1d455ae
Abandon pre vctrs:: mutate() internal impl
romainfrancois Aug 23, 2019
6bf809d
Remove hybrid evaluation implementation, so that we can simplify the …
romainfrancois Aug 23, 2019
ed32f0e
Remove previous C++ version of data mask
romainfrancois Aug 23, 2019
bc11d34
Promoting data mask to an R6 class
romainfrancois Aug 30, 2019
1b19afb
Remove test that is now irrelevant
romainfrancois Aug 30, 2019
2f1d017
Update tibble requirement, until https://github.com/tidyverse/tibble/…
romainfrancois Aug 30, 2019
pure R implementation of summarise2()
romainfrancois committed Aug 12, 2019
commit cb79658f59a4edb5d791c44c57bc3eb00cd9316d
R/RcppExports.R (4 changes: 0 additions & 4 deletions)
@@ -144,10 +144,6 @@ summarise_impl <- function(df, dots, caller_env) {
     .Call(`_dplyr_summarise_impl`, df, dots, caller_env)
 }

-summarise_one <- function(df, summaries, quosure, caller_env) {
-    .Call(`_dplyr_summarise_one`, df, summaries, quosure, caller_env)
-}
-
 hybrid_impl <- function(df, quosure, caller_env) {
     .Call(`_dplyr_hybrid_impl`, df, quosure, caller_env)
 }
R/tbl-df.r (56 changes: 51 additions & 5 deletions)
@@ -198,32 +198,78 @@ assert_all_size_one <- function(x) {
  }
}

summarise_data_mask <- function(data, rows) {
  chunks_env <- env()
  map2(data, names(data), function(col, nm) {
    env_bind_lazy(chunks_env, !!nm := map(rows, vec_slice, x = col))
  })

  bottom <- env()
  column_names <- set_names(names(data))

  .current_group_index <- NA_integer_
  env_bind_active(bottom, !!!map(column_names, function(column) {
    function() {
      chunks_env[[column]][[.current_group_index]]
    }
  }))

  mask <- new_data_mask(bottom)
  mask$.set_current_group <- function(group_index) {
    .current_group_index <<- group_index
  }
  mask$.add_summarised <- function(name, chunks) {
    env_bind_active(bottom, !!name := function() {
      chunks[[.current_group_index]]
    })
  }

  mask
}

#' @export
summarise2 <- function(.data, ...) {
  dots <- enquos(...)
  dots_names <- names(dots)

  rows <- group_rows(.data)
  mask <- summarise_data_mask(.data, rows)
  caller <- caller_env()

  summaries <- list()
  for (i in seq_along(dots)) {
    # summarise_one() gives a list in which each element is the result of
    # a list in which each element is the result of
    # evaluating the quosure in the "sliced data mask"
    #
    # vec_c() simplifies it to a vctr (might be a data frame)
    #
    # TODO: implement an R version of summarise_one()
    # TODO: reinject hybrid evaluation at the R level
    chunks <- summarise_one(.data, summaries, dots[[i]], caller_env())
    chunks <- map(seq_along(rows), function(group) {
      mask$.set_current_group(group)
      eval_tidy(dots[[i]], mask, env = caller)
    })

    assert_all_size_one(chunks)

    # vec_c() simplifies it to a vctr (might be a data frame)
    result <- vec_c(!!!chunks)

    if (is.null(dots_names) || dots_names[i] == "") {
      # auto splice when the quosure is not named
      if (is.data.frame(result)) {
        summaries <- append(summaries, list2(!!!result))

        # remember each result separately
        map2(seq_along(result), names(result), function(i, nm) {
          mask$.add_summarised(nm, map(chunks, i))
        })
      } else {
        abort("cannot auto splice non data frame results")
      }
    } else {
      # treat as a single output otherwise
      summaries <- append(summaries, list2(!!dots_names[i] := result))

      # remember
      mask$.add_summarised(dots_names[i], chunks)
    }

  }
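The summarise_data_mask() added above slices every column into per-group chunks and exposes each column name as an active binding that returns the chunk for the current group index; .add_summarised() registers finished summaries the same way so later expressions can reuse them. Below is a minimal base-R approximation of that idea, not dplyr code: make_group_mask(), set_group() and add_summarised() are hypothetical names, base makeActiveBinding() stands in for rlang's env_bind_active(), and columns are sliced eagerly rather than through env_bind_lazy().

```r
# Base-R sketch of a per-group data mask (hypothetical names; not dplyr API).
make_group_mask <- function(data, rows, env = parent.frame()) {
  # Slice each column once per group; dplyr binds these lazily instead.
  chunks <- lapply(data, function(col) lapply(rows, function(idx) col[idx]))
  current_group <- NA_integer_

  # Names not found in the mask fall through to `env` (e.g. sum(), mean()).
  mask <- new.env(parent = env)

  # Each column becomes an active binding that returns the chunk of the
  # group currently selected with set_group().
  invisible(lapply(names(data), function(column) {
    force(column)
    makeActiveBinding(column, function() chunks[[column]][[current_group]], mask)
  }))

  # Select which group the active bindings read from.
  set_group <- function(i) current_group <<- i

  # Register a finished summary (one value per group) so later expressions
  # can use it by name; it replaces any column binding with the same name.
  add_summarised <- function(name, values) {
    force(values)
    if (exists(name, envir = mask, inherits = FALSE)) rm(list = name, envir = mask)
    makeActiveBinding(name, function() values[[current_group]], mask)
  }

  list(env = mask, set_group = set_group, add_summarised = add_summarised)
}

# Usage: two groups, "a" (rows 1 and 3) and "b" (row 2).
df <- data.frame(g = c("a", "b", "a"), x = c(1, 2, 3))
m <- make_group_mask(df, split(seq_len(nrow(df)), df$g))
m$set_group(1)
eval(quote(sum(x)), m$env)          # 4
m$add_summarised("total", list(4, 2))
eval(quote(x / total), m$env)       # 0.25 0.75
```

Because the bindings read a single shared index, switching groups costs one assignment rather than rebuilding the environment, which is the property the R version of the mask relies on.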
src/RcppExports.cpp (14 changes: 0 additions & 14 deletions)
@@ -362,19 +362,6 @@ BEGIN_RCPP
     return rcpp_result_gen;
 END_RCPP
 }
-// summarise_one
-SEXP summarise_one(Rcpp::DataFrame df, Rcpp::List summaries, dplyr::Quosure quosure, SEXP caller_env);
-RcppExport SEXP _dplyr_summarise_one(SEXP dfSEXP, SEXP summariesSEXP, SEXP quosureSEXP, SEXP caller_envSEXP) {
-BEGIN_RCPP
-    Rcpp::RObject rcpp_result_gen;
-    Rcpp::traits::input_parameter< Rcpp::DataFrame >::type df(dfSEXP);
-    Rcpp::traits::input_parameter< Rcpp::List >::type summaries(summariesSEXP);
-    Rcpp::traits::input_parameter< dplyr::Quosure >::type quosure(quosureSEXP);
-    Rcpp::traits::input_parameter< SEXP >::type caller_env(caller_envSEXP);
-    rcpp_result_gen = Rcpp::wrap(summarise_one(df, summaries, quosure, caller_env));
-    return rcpp_result_gen;
-END_RCPP
-}
 // hybrid_impl
 SEXP hybrid_impl(Rcpp::DataFrame df, dplyr::Quosure quosure, SEXP caller_env);
 RcppExport SEXP _dplyr_hybrid_impl(SEXP dfSEXP, SEXP quosureSEXP, SEXP caller_envSEXP) {
@@ -546,7 +533,6 @@ static const R_CallMethodDef CallEntries[] = {
     {"_dplyr_mutate_impl", (DL_FUNC) &_dplyr_mutate_impl, 3},
     {"_dplyr_select_impl", (DL_FUNC) &_dplyr_select_impl, 2},
     {"_dplyr_summarise_impl", (DL_FUNC) &_dplyr_summarise_impl, 3},
-    {"_dplyr_summarise_one", (DL_FUNC) &_dplyr_summarise_one, 4},
     {"_dplyr_hybrid_impl", (DL_FUNC) &_dplyr_hybrid_impl, 3},
     {"_dplyr_test_comparisons", (DL_FUNC) &_dplyr_test_comparisons, 0},
     {"_dplyr_test_matches", (DL_FUNC) &_dplyr_test_matches, 0},
src/summarise.cpp (47 changes: 0 additions & 47 deletions)
@@ -89,53 +89,6 @@ SEXP summarise_impl(Rcpp::DataFrame df, dplyr::QuosureList dots, SEXP caller_env
   }
 }

-namespace dplyr {
-
-template <typename SlicedTibble>
-SEXP summarise_one_impl(const Rcpp::DataFrame& df, const Rcpp::List& summaries, const dplyr::Quosure& quosure, SEXP caller_env) {
-  SlicedTibble gdf(df);
-
-  DataMask<SlicedTibble> mask(gdf);
-
-  // register the summaries
-  SEXP summaries_names = Rf_getAttrib(summaries, R_NamesSymbol);
-  for (int i = 0; i < summaries.size(); i++) {
-    mask.input_summarised(SymbolString(STRING_ELT(summaries_names, i)), summaries[i]);
-  }
-
-  mask.setup();
-
-  int ngroups = gdf.ngroups();
-  Rcpp::List result(ngroups);
-  typename SlicedTibble::group_iterator it = gdf.group_begin();
-  for (int i = 0; i < ngroups; i++, ++it) {
-    result[i] = mask.eval(quosure, *it);
-  }
-
-  return result;
-
-}
-
-}
-
-
-
-
-// [[Rcpp::export(rng = false)]]
-SEXP summarise_one(Rcpp::DataFrame df, Rcpp::List summaries, dplyr::Quosure quosure, SEXP caller_env) {
-
-  // check_valid_colnames(df);
-  if (Rcpp::is<dplyr::RowwiseDataFrame>(df)) {
-    return dplyr::summarise_one_impl<dplyr::RowwiseDataFrame>(df, summaries, quosure, caller_env);
-  } else if (Rcpp::is<dplyr::GroupedDataFrame>(df)) {
-    return dplyr::summarise_one_impl<dplyr::GroupedDataFrame>(df, summaries, quosure, caller_env);
-  } else {
-    return dplyr::summarise_one_impl<dplyr::NaturalDataFrame>(df, summaries, quosure, caller_env);
-  }
-}
-
-
-
 namespace dplyr {

 template <typename SlicedTibble>
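The deleted summarise_one_impl() looped over groups and evaluated the quosure in a C++ DataMask; with this commit that loop lives in summarise2() in R. The following is a rough base-R sketch of the same control flow, under stated assumptions: summarise_groups() is a hypothetical name, expressions are plain quoted calls rather than quosures, each group is evaluated in a list of sliced columns plus earlier summaries instead of an rlang data mask, and unlist() replaces vctrs::vec_c() (so classes such as Date are dropped). It keeps the rules visible in summarise2(): every per-group result must have size 1, unnamed results are auto-spliced only when they are data frames, and each output is remembered so later expressions can refer to it.

```r
# Base-R sketch of the summarise2() loop (hypothetical name summarise_groups()).
# exprs: list of quoted calls; unnamed entries must return one-row data frames.
# Zero groups are not handled here.
summarise_groups <- function(data, rows, exprs, env = parent.frame()) {
  out <- list()        # final columns, one value per group
  per_group <- list()  # per-group values of earlier summaries, for reuse
  for (i in seq_along(exprs)) {
    name <- names(exprs)[i]

    # Evaluate the expression once per group, seeing that group's slice of
    # every column plus the group's value of each earlier summary.
    chunks <- lapply(seq_along(rows), function(g) {
      idx <- rows[[g]]
      mask <- lapply(data, function(col) col[idx])
      for (nm in names(per_group)) mask[[nm]] <- per_group[[nm]][[g]]
      eval(exprs[[i]], envir = mask, enclos = env)
    })

    # Like assert_all_size_one(): one value (or one row) per group.
    sizes <- vapply(chunks, function(x) if (is.data.frame(x)) nrow(x) else length(x), integer(1))
    if (any(sizes != 1L)) stop("each summary must have size 1 per group")

    if (is.null(name) || name == "") {
      # Unnamed: auto-splice, which only makes sense for data frame results.
      if (!all(vapply(chunks, is.data.frame, logical(1)))) {
        stop("cannot auto splice non data frame results")
      }
      for (nm in names(chunks[[1]])) {
        per_group[[nm]] <- lapply(chunks, function(d) d[[nm]])
        out[[nm]] <- unlist(per_group[[nm]])
      }
    } else {
      per_group[[name]] <- chunks
      out[[name]] <- unlist(chunks)
    }
  }
  as.data.frame(out)
}

# Usage: columns total, share, lo, hi with one row per group.
df <- data.frame(g = c("a", "b", "a"), x = c(1, 2, 3))
rows <- split(seq_len(nrow(df)), df$g)
summarise_groups(df, rows, list(
  total = quote(sum(x)),
  share = quote(x[1] / total),                 # reuses `total`
  quote(data.frame(lo = min(x), hi = max(x)))  # auto-spliced into lo, hi
))
```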