Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement list_sizes() #539

Merged
merged 4 commits into from
Mar 18, 2020
Merged

Implement list_sizes() #539

merged 4 commits into from
Mar 18, 2020

Conversation

romainfrancois
Copy link
Contributor

vec_sizes() is to vec_size() what lengths() is to length():

library(vctrs)

data <- list(iris, mtcars, 1:10)
vec_sizes(data)
#> [1] 150  32  10

The 😈 plan is to use this in a vctrs based implementation of things like dplyr::summarise() and dplyr::mutate() e.g. evaluate the expression in each group and then assert that the sizes are good (all equal to 1 if summaries, equal to the group sizes on a mutate).

The PR is wip, if we want that I'll add unit tests, etc ...

@romainfrancois
Copy link
Contributor Author

Alternatively or additionally, it would be nice to have a C api for vec_size() that we can use from internals of e.g. dplyr.

romainfrancois added a commit to tidyverse/dplyr that referenced this pull request Aug 12, 2019
Could replace map_int(x, vec_size) by vec_sizes() from r-lib/vctrs#539
@DavisVaughan
Copy link
Member

Not sure if it matters, but it looks like this would also work on a data frame and return the size of each column (all the same size)?

@lionel-
Copy link
Member

lionel- commented Aug 12, 2019

I think it'd be better to export vec_size() as a C callable, if needed.

Not sure if it matters, but it looks like this would also work on a data frame and return the size of each column (all the same size)?

That doesn't seem appropriate. In general this function seems ill-suited for vctrs at the moment because it's about [[.

@romainfrancois
Copy link
Contributor Author

Sure. I can live with map_int(vec_size) for now, having vec_size() C callable means I can implement something that tests is all the sizes are 1 instead of getting all the sizes and then do a any(sizes != 1)

@lionel- lionel- closed this Mar 6, 2020
@lionel-
Copy link
Member

lionel- commented Mar 6, 2020

Let's revisit once we have figured out vec_slice2() and the tbl_ API.

@lionel- lionel- reopened this Mar 18, 2020
@lionel-
Copy link
Member

lionel- commented Mar 18, 2020

From a discussion with @DavisVaughan, we could have list_sizes() that fails with atomic vectors (including dfs).

@DavisVaughan
Copy link
Member

@romainfrancois I'll take over this PR if that's alright with you

@romainfrancois
Copy link
Contributor Author

@DavisVaughan sure. dplyr still has a dplyr_vec_sizes but is unused at this point.

@DavisVaughan DavisVaughan changed the title Additional function vec_sizes() Implement list_sizes() Mar 18, 2020
@DavisVaughan
Copy link
Member

Pretty nice improvement

library(purrr)
library(vctrs)

x <- rep_len(list(1:5 + 0L), 100)

bench::mark(
  map_int(x, vec_size),
  list_sizes(x)
)
#> # A tibble: 2 x 6
#>   expression                min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>           <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 map_int(x, vec_size)  63.34µs  77.62µs    12802.    68.6KB     91.8
#> 2 list_sizes(x)          2.81µs   3.08µs   287798.     4.3KB      0

list_sizes(mtcars)
#> Error: `x` must be a list.

Created on 2020-03-18 by the reprex package (v0.3.0)

NEWS.md Outdated Show resolved Hide resolved
}

// [[ register() ]]
SEXP vctrs_list_sizes(SEXP x) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: Do we need a registered alias since they have the same signature?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to try and always keep a separate function prefixed with vctrs_ that gets exported, whether it is strictly required or not

@DavisVaughan DavisVaughan merged commit 51df406 into r-lib:master Mar 18, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants