-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
prefer vctrs::new_data_frame() over tibble(!!!) #5371
Conversation
R/bind.r
Outdated
# weird edge case about named factors | ||
if (is.factor(.x)) { | ||
.x <- set_names(map(.x, unname), names(.x)) | ||
} | ||
new_data_frame(as.list(.x), class = classes_tbl) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
vec_chop()
can be used to transform to list generically, this should fix the factor names issue. But probably best let vec_rbind()
deal with atomic vectors here. The special handling is only needed for lists.
Should the constructor be tibble::new_tibble()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, letting vctrs::vec_rbind()
handle vectors deals with the weird named factor case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This however gives a data.frame()
where it gives a tbl_df
in 1.0.0:
library(dplyr, warn.conflicts = FALSE)
library(rlang)
fct <- set_names(factor(c("a", "b")), c("x", "y"))
bind_rows(fct, fct)
#> x y
#> 1 a b
#> 2 a b
Created on 2020-06-30 by the reprex package (v0.3.0.9001)
but I guess I can work around it
This is a performance improvement. To match performance from 0.8.5 with list, we would need some sort of option in However in 1.0.0 we introduced some special handling that would "force" tibbles when we would give vectors or bare lists as part of I don't think it is a big deal but that is potentially a breaking change, or at least might create some friction with |
The special casing of bare lists is because this: str(vctrs::vec_rbind(
list(a = 1, b = 2),
list(a = 3, b = 4)
))
#> 'data.frame': 2 obs. of 2 variables:
#> $ a:List of 2
#> ..$ : num 1
#> ..$ : num 3
#> $ b:List of 2
#> ..$ : num 2
#> ..$ : num 4
str(dplyr::bind_rows(
list(a = 1, b = 2),
list(a = 3, b = 4)
))
#> 'data.frame': 2 obs. of 2 variables:
#> $ a: num 1 3
#> $ b: num 2 4 Created on 2020-07-01 by the reprex package (v0.3.0.9001) but this is something for |
With this code: library(tibble)
library(bench)
f <- function(){
library(dplyr, warn.conflicts = FALSE)
library(purrr)
new_list <- map(1:10000, function(x) {
list(A = 2 * x, B = 4 * x)
})
bench::mark(bind_rows(new_list))
}
callr::r(f) we get these results with this branch:
these with master:
it isn't quite the performance of dplyr 0.8.5:
but to get there |
f75c5eb
to
cf65cd1
Compare
cf65cd1
to
e77d834
Compare
The overall file looks suspicious, but the new problems section gives:
|
371ba9d
to
e77d834
Compare
revdep results on this branch highlighted the same list of packages as in master in terms of |
* Follow up to #5371 * Better fix * cloud_check()
part of #5370 but would probably need more help from
vctrs
(probably r-lib/vctrs#1169).