-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for raw vectors #2952
Support for raw vectors #2952
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this looks good. Could you please split it (=remove hybrid-related changes), and add a few more tests?
inst/include/dplyr_RcppExports.h
Outdated
@@ -73,7 +73,7 @@ namespace dplyr { | |||
RObject rcpp_result_gen; | |||
{ | |||
RNGScope RCPP_rngScope_gen; | |||
rcpp_result_gen = p_build_index_cpp(Shield<SEXP>(Rcpp::wrap(data))); | |||
rcpp_result_gen = p_build_index_cpp(Rcpp::wrap(data)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need to update Rcpp?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
} | ||
|
||
bool can_promote(SEXP x) const { | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this promotion actually tested anywhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✔️ in 8bd7a1f
// because RAWSXP does not have the NA concept | ||
template <> | ||
template <typename Container> | ||
inline SEXP MatrixColumnSubsetVisitor<RAWSXP>::subset_int(const Container& index) const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have a test for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've just added a test for it here: ec3bd34
i.e.
test_that( "slice handles raw matrices", {
df <- data.frame( a = 1:4 )
df$b <- matrix(as.raw(1:8), ncol = 2)
expect_identical(
slice( df, 1:2 )$b,
matrix(as.raw(c(1,2,5,6)), ncol = 2)
)
})
but matrices are not widely supported, as e.g.
> filter(df, a < 4)
Error: Column `b` must be a 1d atomic vector or a list
Matrix<RAWSXP> res(n, nc); | ||
for (int h = 0; h < nc; h++) { | ||
Column column = res.column(h); | ||
Column source_column = const_cast<Matrix<RAWSXP>&>(data).column(h); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we use a ConstColumn
here instead of the const_cast
? Also in the default implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure. this is mostly a cry for auto
though.
tests/testthat/helper-astyle.R
Outdated
@@ -19,7 +19,7 @@ astyle <- function(extra_args = character()) { | |||
"--align-reference=type" | |||
) | |||
|
|||
src_path <- normalizePath(map_chr(c("../../src", "../../inst/include"), testthat::test_path)) | |||
src_path <- normalizePath(purrr::map_chr(c("../../src", "../../inst/include"), testthat::test_path)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, we have defined a "poor man's" map_chr()
in this package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed now, sorry about that.
src/join.cpp
Outdated
{ | ||
switch (TYPEOF(right.get_data())) { | ||
case RAWSXP: { | ||
return new JoinVisitorImpl<RAWSXP, RAWSXP, ACCEPT_NA_MATCH> (left, right, warn_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have a test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✔️ ec4c9a7
inst/include/dplyr/Result/Rank.h
Outdated
@@ -230,6 +230,88 @@ class Rank_Impl : public Result, public Increment { | |||
Map map; | |||
}; | |||
|
|||
template <typename Increment, bool ascending> | |||
class Rank_Impl<RAWSXP, Increment, ascending> : public Result, public Increment { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to specialize this class? Actually, I'd rather review all hybrid-related changes in a separate PR, and keep forwarding raw columns to R evaluation for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dropped it now. The specialization was because there is no notion of NA for raw vectors.
very low hanging 🍉 🍌 🍎 almost no chance we want to rank based one raw vector, so R evaluation is fine.
Thanks for the review. I'll revise it when j get a chance. Class specializations for raw usually are because raw don't have the concept of NA. I'm not sure this can be split into two PRs. |
Thanks, please take your time. I'd argue that we don't need to include the hybrid handlers in this PR, because they'll just offload to standard evaluation if they're not implemented. |
test for `MatrixColumnSubsetVisitor<RAWSXP>::subset_int`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@krlmlr I think it looks good now
} | ||
|
||
bool can_promote(SEXP x) const { | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✔️ in 8bd7a1f
// because RAWSXP does not have the NA concept | ||
template <> | ||
template <typename Container> | ||
inline SEXP MatrixColumnSubsetVisitor<RAWSXP>::subset_int(const Container& index) const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've just added a test for it here: ec3bd34
i.e.
test_that( "slice handles raw matrices", {
df <- data.frame( a = 1:4 )
df$b <- matrix(as.raw(1:8), ncol = 2)
expect_identical(
slice( df, 1:2 )$b,
matrix(as.raw(c(1,2,5,6)), ncol = 2)
)
})
but matrices are not widely supported, as e.g.
> filter(df, a < 4)
Error: Column `b` must be a 1d atomic vector or a list
Matrix<RAWSXP> res(n, nc); | ||
for (int h = 0; h < nc; h++) { | ||
Column column = res.column(h); | ||
Column source_column = const_cast<Matrix<RAWSXP>&>(data).column(h); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure. this is mostly a cry for auto
though.
inst/include/dplyr/Result/Rank.h
Outdated
@@ -230,6 +230,88 @@ class Rank_Impl : public Result, public Increment { | |||
Map map; | |||
}; | |||
|
|||
template <typename Increment, bool ascending> | |||
class Rank_Impl<RAWSXP, Increment, ascending> : public Result, public Increment { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dropped it now. The specialization was because there is no notion of NA for raw vectors.
very low hanging 🍉 🍌 🍎 almost no chance we want to rank based one raw vector, so R evaluation is fine.
inst/include/dplyr_RcppExports.h
Outdated
@@ -73,7 +73,7 @@ namespace dplyr { | |||
RObject rcpp_result_gen; | |||
{ | |||
RNGScope RCPP_rngScope_gen; | |||
rcpp_result_gen = p_build_index_cpp(Shield<SEXP>(Rcpp::wrap(data))); | |||
rcpp_result_gen = p_build_index_cpp(Rcpp::wrap(data)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
src/join.cpp
Outdated
{ | ||
switch (TYPEOF(right.get_data())) { | ||
case RAWSXP: { | ||
return new JoinVisitorImpl<RAWSXP, RAWSXP, ACCEPT_NA_MATCH> (left, right, warn_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✔️ ec4c9a7
tests/testthat/helper-astyle.R
Outdated
@@ -19,7 +19,7 @@ astyle <- function(extra_args = character()) { | |||
"--align-reference=type" | |||
) | |||
|
|||
src_path <- normalizePath(map_chr(c("../../src", "../../inst/include"), testthat::test_path)) | |||
src_path <- normalizePath(purrr::map_chr(c("../../src", "../../inst/include"), testthat::test_path)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed now, sorry about that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I like it. We might be able to make this code slightly more compact in a few places. Could you please rerun astyle, too?
inst/include/dplyr/Result/Lag.h
Outdated
inline Lag<RAWSXP>::Lag(SEXP data_, int n_, const RObject& def_, bool is_summary_) : | ||
data(data_), | ||
n(n_), | ||
def((Rbyte)0), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use the new default_value()
here (and also for Lead
), instead of overriding the constructor?
} | ||
|
||
bool is_na(int) const { | ||
return false ; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we could define our own is_na()
akin to default_value()
and use that in the default implementation of VectorVisitorImpl
? Or perhaps there's a way to just override VectorVisitorImpl<RAWSXP>::is_na()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems we don't need to because RawVector::is_na already does the right thing.
https://github.com/RcppCore/Rcpp/blob/6f81b4684481dbd9bb554dd95e66725fc3b63a8c/inst/include/Rcpp/traits/is_na.h#L31
However, NumericVector::is_na()
does not. https://twitter.com/romain_francois/status/938792189932556288
I'll open an issue to discuss this, it might be one of those cases where we depend on an upstream bug.
I removed the VectorVisitorImpl
specialization, and just specialized comparisons
for RAWSXP
, mostly to get rid of the two if
branches in is_less
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm seeing:
is.na(NaN)
#> [1] TRUE
So the Rcpp version seems consistent with what R is doing. Why doesn't this work for us?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you echo this on #3250 ?
I guess I have a problem with R then. I thought NA
and NaN
were two very different concepts, and assumed that is.na
would 🍒 pick only NA
…`Lag` constructors instead of override them for `RAWSXP`.
… lower level `comparisons` instead.
Done.
Perhaps devtools or one of its implementation 📦 could setup some git hook to run |
Thanks, I need to revisit the I'm not sure astyle is the best tool, it's just what I knew best at that time. Other projects use the clang formatter. I think a Git hook would have its own problems, but we might want to simplify calling astyle or the clang formatter, maybe in the context of the styler package? |
Thanks! |
This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/ |
This adds support for raw vectors across the verbs. I had to modify some tests accordingly.
My use case is for the
collage
pkg: https://github.com/ThinkRstat/collage where we have raw vectors for red, green and blue channels of colors.This is related to (but independent of) this purrr pr tidyverse/purrr#345