Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for raw vectors #2952

Merged
merged 18 commits into from
Dec 21, 2017
Merged

Support for raw vectors #2952

merged 18 commits into from
Dec 21, 2017

Conversation

romainfrancois
Copy link
Member

This adds support for raw vectors across the verbs. I had to modify some tests accordingly.

My use case is for the collage pkg: https://github.com/ThinkRstat/collage where we have raw vectors for red, green and blue channels of colors.

This is related to (but independent of) this purrr pr tidyverse/purrr#345

Copy link
Member

@krlmlr krlmlr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this looks good. Could you please split it (=remove hybrid-related changes), and add a few more tests?

@@ -73,7 +73,7 @@ namespace dplyr {
RObject rcpp_result_gen;
{
RNGScope RCPP_rngScope_gen;
rcpp_result_gen = p_build_index_cpp(Shield<SEXP>(Rcpp::wrap(data)));
rcpp_result_gen = p_build_index_cpp(Rcpp::wrap(data));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need to update Rcpp?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

}

bool can_promote(SEXP x) const {
return
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this promotion actually tested anywhere?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️ in 8bd7a1f

// because RAWSXP does not have the NA concept
template <>
template <typename Container>
inline SEXP MatrixColumnSubsetVisitor<RAWSXP>::subset_int(const Container& index) const {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a test for this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just added a test for it here: ec3bd34

i.e.

test_that( "slice handles raw matrices", {
  df <- data.frame( a = 1:4 )
  df$b <- matrix(as.raw(1:8), ncol = 2)

  expect_identical(
    slice( df, 1:2 )$b,
    matrix(as.raw(c(1,2,5,6)), ncol = 2)
  )
})

but matrices are not widely supported, as e.g.

> filter(df, a < 4)
 Error: Column `b` must be a 1d atomic vector or a list 

Matrix<RAWSXP> res(n, nc);
for (int h = 0; h < nc; h++) {
Column column = res.column(h);
Column source_column = const_cast<Matrix<RAWSXP>&>(data).column(h);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use a ConstColumn here instead of the const_cast? Also in the default implementation?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure. this is mostly a cry for auto though.

@@ -19,7 +19,7 @@ astyle <- function(extra_args = character()) {
"--align-reference=type"
)

src_path <- normalizePath(map_chr(c("../../src", "../../inst/include"), testthat::test_path))
src_path <- normalizePath(purrr::map_chr(c("../../src", "../../inst/include"), testthat::test_path))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, we have defined a "poor man's" map_chr() in this package.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed now, sorry about that.

src/join.cpp Outdated
{
switch (TYPEOF(right.get_data())) {
case RAWSXP: {
return new JoinVisitorImpl<RAWSXP, RAWSXP, ACCEPT_NA_MATCH> (left, right, warn_);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a test?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️ ec4c9a7

@@ -230,6 +230,88 @@ class Rank_Impl : public Result, public Increment {
Map map;
};

template <typename Increment, bool ascending>
class Rank_Impl<RAWSXP, Increment, ascending> : public Result, public Increment {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to specialize this class? Actually, I'd rather review all hybrid-related changes in a separate PR, and keep forwarding raw columns to R evaluation for now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dropped it now. The specialization was because there is no notion of NA for raw vectors.
very low hanging 🍉 🍌 🍎 almost no chance we want to rank based one raw vector, so R evaluation is fine.

@romainfrancois
Copy link
Member Author

Thanks for the review. I'll revise it when j get a chance.

Class specializations for raw usually are because raw don't have the concept of NA.

I'm not sure this can be split into two PRs.

@krlmlr
Copy link
Member

krlmlr commented Jul 15, 2017

Thanks, please take your time.

I'd argue that we don't need to include the hybrid handlers in this PR, because they'll just offload to standard evaluation if they're not implemented.

Copy link
Member Author

@romainfrancois romainfrancois left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@krlmlr I think it looks good now

}

bool can_promote(SEXP x) const {
return
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️ in 8bd7a1f

// because RAWSXP does not have the NA concept
template <>
template <typename Container>
inline SEXP MatrixColumnSubsetVisitor<RAWSXP>::subset_int(const Container& index) const {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just added a test for it here: ec3bd34

i.e.

test_that( "slice handles raw matrices", {
  df <- data.frame( a = 1:4 )
  df$b <- matrix(as.raw(1:8), ncol = 2)

  expect_identical(
    slice( df, 1:2 )$b,
    matrix(as.raw(c(1,2,5,6)), ncol = 2)
  )
})

but matrices are not widely supported, as e.g.

> filter(df, a < 4)
 Error: Column `b` must be a 1d atomic vector or a list 

Matrix<RAWSXP> res(n, nc);
for (int h = 0; h < nc; h++) {
Column column = res.column(h);
Column source_column = const_cast<Matrix<RAWSXP>&>(data).column(h);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure. this is mostly a cry for auto though.

@@ -230,6 +230,88 @@ class Rank_Impl : public Result, public Increment {
Map map;
};

template <typename Increment, bool ascending>
class Rank_Impl<RAWSXP, Increment, ascending> : public Result, public Increment {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dropped it now. The specialization was because there is no notion of NA for raw vectors.
very low hanging 🍉 🍌 🍎 almost no chance we want to rank based one raw vector, so R evaluation is fine.

@@ -73,7 +73,7 @@ namespace dplyr {
RObject rcpp_result_gen;
{
RNGScope RCPP_rngScope_gen;
rcpp_result_gen = p_build_index_cpp(Shield<SEXP>(Rcpp::wrap(data)));
rcpp_result_gen = p_build_index_cpp(Rcpp::wrap(data));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

src/join.cpp Outdated
{
switch (TYPEOF(right.get_data())) {
case RAWSXP: {
return new JoinVisitorImpl<RAWSXP, RAWSXP, ACCEPT_NA_MATCH> (left, right, warn_);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️ ec4c9a7

@@ -19,7 +19,7 @@ astyle <- function(extra_args = character()) {
"--align-reference=type"
)

src_path <- normalizePath(map_chr(c("../../src", "../../inst/include"), testthat::test_path))
src_path <- normalizePath(purrr::map_chr(c("../../src", "../../inst/include"), testthat::test_path))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed now, sorry about that.

Copy link
Member

@krlmlr krlmlr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I like it. We might be able to make this code slightly more compact in a few places. Could you please rerun astyle, too?

inline Lag<RAWSXP>::Lag(SEXP data_, int n_, const RObject& def_, bool is_summary_) :
data(data_),
n(n_),
def((Rbyte)0),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use the new default_value() here (and also for Lead), instead of overriding the constructor?

}

bool is_na(int) const {
return false ;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we could define our own is_na() akin to default_value() and use that in the default implementation of VectorVisitorImpl? Or perhaps there's a way to just override VectorVisitorImpl<RAWSXP>::is_na()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems we don't need to because RawVector::is_na already does the right thing.
https://github.com/RcppCore/Rcpp/blob/6f81b4684481dbd9bb554dd95e66725fc3b63a8c/inst/include/Rcpp/traits/is_na.h#L31

However, NumericVector::is_na() does not. https://twitter.com/romain_francois/status/938792189932556288

I'll open an issue to discuss this, it might be one of those cases where we depend on an upstream bug.

I removed the VectorVisitorImpl specialization, and just specialized comparisons for RAWSXP, mostly to get rid of the two if branches in is_less.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm seeing:

is.na(NaN)
#> [1] TRUE

So the Rcpp version seems consistent with what R is doing. Why doesn't this work for us?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you echo this on #3250 ?

I guess I have a problem with R then. I thought NA and NaN were two very different concepts, and assumed that is.na would 🍒 pick only NA

Romain Francois added 2 commits December 14, 2017 08:21
…`Lag` constructors instead of override them for `RAWSXP`.
@romainfrancois
Copy link
Member Author

Done.
Running astyle did not change anything this time, perhaps I'm not using the same version. I have:

$ astyle --version
Artistic Style Version 3.0

Perhaps devtools or one of its implementation 📦 could setup some git hook to run astyle, maybe as part of use_Rcpp ? @hadley ?

@krlmlr
Copy link
Member

krlmlr commented Dec 14, 2017

Thanks, I need to revisit the astyle parameters.

I'm not sure astyle is the best tool, it's just what I knew best at that time. Other projects use the clang formatter. I think a Git hook would have its own problems, but we might want to simplify calling astyle or the clang formatter, maybe in the context of the styler package?

@krlmlr krlmlr requested a review from hadley December 14, 2017 09:17
@krlmlr krlmlr merged commit c04deb3 into master Dec 21, 2017
@krlmlr
Copy link
Member

krlmlr commented Dec 21, 2017

Thanks!

@romainfrancois romainfrancois deleted the raw branch May 8, 2018 07:30
@lock
Copy link

lock bot commented Nov 4, 2018

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

@lock lock bot locked and limited conversation to collaborators Nov 4, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants