Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement relationship and refine join match warning #6753

Merged
merged 5 commits into from
Feb 24, 2023
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
Update to - rather than _
  • Loading branch information
DavisVaughan committed Feb 24, 2023
commit 8c2b5ddb56788f5cb8108bdefd688ea4f44dc002
10 changes: 5 additions & 5 deletions NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -17,21 +17,21 @@

* The mutating joins gain a new `relationship` argument, allowing you to
enforce one of the following relationship constraints between the keys of
`x` and `y`: `"none"`, `"one_to_one"`, `"one_to_many"`, `"many_to_one"`,
`"many_to_many"`, or `"warn_many_to_many"`.
`x` and `y`: `"none"`, `"one-to-one"`, `"one-to-many"`, `"many-to-one"`,
`"many-to-many"`, or `"warn-many-to-many"`.

For example, `"many_to_one"` enforces that each row in `x` can match at
For example, `"many-to-one"` enforces that each row in `x` can match at
most 1 row in `y`. If a row in `x` matches >1 rows in `y`, an error is
thrown. This option serves as the replacement for `multiple = "error"`.

For equality joins, `relationship` defaults to `"warn_many_to_many"` to
For equality joins, `relationship` defaults to `"warn-many-to-many"` to
warn if an unexpected many-to-many relationship is detected, and otherwise
defaults to `"none"` for inequality, rolling, and overlap joins.

This change unfortunately does mean that if you have set `multiple = "all"` to
avoid a warning and you happened to be doing a many-to-many style join, then
you will need to replace `multiple = "all"` with
`relationship = "many_to_many"` to silence the new warning, but we believe
`relationship = "many-to-many"` to silence the new warning, but we believe
this should be rare since many-to-many relationships are fairly uncommon.

* `pick()` now returns a 1 row, 0 column tibble when `...` evaluates to an
Expand Down
8 changes: 4 additions & 4 deletions R/join-rows.R
Original file line number Diff line number Diff line change
Expand Up @@ -222,7 +222,7 @@ rethrow_warning_join_relationship_many_to_many <- function(cnd, call) {
i = glue("Row {j} of `y` matches multiple rows in `x`."),
i = paste0(
"If a many-to-many relationship is expected, ",
"set `relationship = \"many_to_many\"` to silence this warning."
"set `relationship = \"many-to-many\"` to silence this warning."
)
),
class = "dplyr_warning_join_relationship_many_to_many",
Expand Down Expand Up @@ -377,7 +377,7 @@ standardise_join_relationship <- function(relationship, condition, user_env) {
# - Inequality and overlap joins often generate many-to-many relationships
# by nature
# - Rolling joins are a little trickier, but we've decided that not warning is
# probably easier to explain. `relationship = "many_to_one"` can always be
# probably easier to explain. `relationship = "many-to-one"` can always be
# used explicitly as needed.
any_inequality <- any(condition != "==")

Expand All @@ -388,7 +388,7 @@ standardise_join_relationship <- function(relationship, condition, user_env) {
# to `relationship` to silence it
"none"
} else {
"warn_many_to_many"
"warn-many-to-many"
}
}

Expand All @@ -400,7 +400,7 @@ warn_join_multiple <- function(what, user_env = caller_env(2)) {
lifecycle::deprecate_warn(
when = "1.1.1",
what = I(what),
with = I('`relationship = "many_to_one"`'),
with = I('`relationship = "many-to-one"`'),
user_env = user_env,
always = TRUE
)
Expand Down
22 changes: 11 additions & 11 deletions R/join.R
Original file line number Diff line number Diff line change
Expand Up @@ -41,7 +41,7 @@
#' rows returned from the join.
#'
#' If a many-to-many relationship is expected, silence this warning by
#' explicitly setting `relationship = "many_to_many"`.
#' explicitly setting `relationship = "many-to-many"`.
#'
#' In production code, it is best to preemptively set `relationship` to whatever
#' relationship you expect to exist between the keys of `x` and `y`, as this
Expand All @@ -55,7 +55,7 @@
#'
#' Rolling joins don't warn on many-to-many relationships either, but many
#' rolling joins follow a many-to-one relationship, so it is often useful to
#' set `relationship = "many_to_one"` to enforce this.
#' set `relationship = "many-to-one"` to enforce this.
#'
#' Note that in SQL, most database providers won't let you specify a
#' many-to-many relationship between two tables, instead requiring that you
Expand Down Expand Up @@ -171,32 +171,32 @@
#' invalidated, an error is thrown.
#'
#' - `NULL`, the default, chooses:
#' - `"warn_many_to_many"` for equality joins
#' - `"warn-many-to-many"` for equality joins
#' - `"none"` for inequality joins, rolling joins, and overlap joins
#'
#' See the _Many-to-many relationships_ section for more details.
#'
#' - `"none"` doesn't perform any relationship checks.
#'
#' - `"one_to_one"` expects:
#' - `"one-to-one"` expects:
#' - Each row in `x` matches at most 1 row in `y`.
#' - Each row in `y` matches at most 1 row in `x`.
#'
#' - `"one_to_many"` expects:
#' - `"one-to-many"` expects:
#' - Each row in `y` matches at most 1 row in `x`.
#'
#' - `"many_to_one"` expects:
#' - `"many-to-one"` expects:
#' - Each row in `x` matches at most 1 row in `y`.
#'
#' - `"many_to_many"` doesn't perform any relationship checks, and is
#' - `"many-to-many"` doesn't perform any relationship checks, and is
#' identical to `"none"`, but is provided to allow you to be explicit about
#' this relationship if you know it exists.
#'
#' - `"warn_many_to_many"` doesn't assume there is any known relationship, but
#' - `"warn-many-to-many"` doesn't assume there is any known relationship, but
#' will warn if `x` and `y` have a many-to-many relationship
#' (which is typically unexpected), encouraging you to either take a closer
#' look at your inputs or make this relationship explicit by specifying
#' `"many_to_many"`.
#' `"many-to-many"`.
#'
#' `relationship` doesn't handle cases where there are zero matches. For that,
#' see `unmatched`.
Expand Down Expand Up @@ -232,8 +232,8 @@
#' df3 %>% left_join(df2)
#'
#' # In the rare case where a many-to-many relationship is expected, set
#' # `relationship = "many_to_many"` to silence this warning
#' df3 %>% left_join(df2, relationship = "many_to_many")
#' # `relationship = "many-to-many"` to silence this warning
#' df3 %>% left_join(df2, relationship = "many-to-many")
#'
#' # Use `join_by()` with a condition other than `==` to perform an inequality
#' # join. Here we match on every instance where `df1$x > df2$x`.
Expand Down
22 changes: 11 additions & 11 deletions man/mutate-joins.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

24 changes: 12 additions & 12 deletions tests/testthat/_snaps/join-rows.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@
Detected an unexpected many-to-many relationship between `x` and `y`.
i Row 1 of `x` matches multiple rows in `y`.
i Row 1 of `y` matches multiple rows in `x`.
i If a many-to-many relationship is expected, set `relationship = "many_to_many"` to silence this warning.
i If a many-to-many relationship is expected, set `relationship = "many-to-many"` to silence this warning.

# join_rows() allows `unmatched` to be specified independently for inner joins

Expand All @@ -32,7 +32,7 @@
# join_rows() gives meaningful one-to-one errors

Code
join_rows(1, c(1, 1), relationship = "one_to_one")
join_rows(1, c(1, 1), relationship = "one-to-one")
Condition
Error:
! Each row in `x` must match at most 1 row in `y`.
Expand All @@ -41,7 +41,7 @@
---

Code
join_rows(c(1, 1), 1, relationship = "one_to_one")
join_rows(c(1, 1), 1, relationship = "one-to-one")
Condition
Error:
! Each row in `y` must match at most 1 row in `x`.
Expand All @@ -50,7 +50,7 @@
# join_rows() gives meaningful one-to-many errors

Code
join_rows(c(1, 1), 1, relationship = "one_to_many")
join_rows(c(1, 1), 1, relationship = "one-to-many")
Condition
Error:
! Each row in `y` must match at most 1 row in `x`.
Expand All @@ -59,7 +59,7 @@
# join_rows() gives meaningful many-to-one errors

Code
join_rows(1, c(1, 1), relationship = "many_to_one")
join_rows(1, c(1, 1), relationship = "many-to-one")
Condition
Error:
! Each row in `x` must match at most 1 row in `y`.
Expand All @@ -68,13 +68,13 @@
# join_rows() gives meaningful many-to-many warnings

Code
join_rows(c(1, 1), c(1, 1), relationship = "warn_many_to_many")
join_rows(c(1, 1), c(1, 1), relationship = "warn-many-to-many")
Condition
Warning:
Detected an unexpected many-to-many relationship between `x` and `y`.
i Row 1 of `x` matches multiple rows in `y`.
i Row 1 of `y` matches multiple rows in `x`.
i If a many-to-many relationship is expected, set `relationship = "many_to_many"` to silence this warning.
i If a many-to-many relationship is expected, set `relationship = "many-to-many"` to silence this warning.
Output
$x
[1] 1 1 2 2
Expand All @@ -92,7 +92,7 @@
Detected an unexpected many-to-many relationship between `x` and `y`.
i Row 1 of `x` matches multiple rows in `y`.
i Row 1 of `y` matches multiple rows in `x`.
i If a many-to-many relationship is expected, set `relationship = "many_to_many"` to silence this warning.
i If a many-to-many relationship is expected, set `relationship = "many-to-many"` to silence this warning.
Output
x
1 1
Expand Down Expand Up @@ -351,7 +351,7 @@
Condition
Warning:
Specifying `multiple = "error"` was deprecated in dplyr 1.1.1.
i Please use `relationship = "many_to_one"` instead.
i Please use `relationship = "many-to-one"` instead.
Error:
! Each row in `x` must match at most 1 row in `y`.
i Row 2 of `x` matches multiple rows in `y`.
Expand All @@ -363,7 +363,7 @@
Condition
Warning:
Specifying `multiple = "error"` was deprecated in dplyr 1.1.1.
i Please use `relationship = "many_to_one"` instead.
i Please use `relationship = "many-to-one"` instead.
Error in `left_join()`:
! Each row in `x` must match at most 1 row in `y`.
i Row 2 of `x` matches multiple rows in `y`.
Expand All @@ -375,7 +375,7 @@
Condition
Warning:
Specifying `multiple = "warning"` was deprecated in dplyr 1.1.1.
i Please use `relationship = "many_to_one"` instead.
i Please use `relationship = "many-to-one"` instead.
Warning:
Each row in `x` is expected to match at most 1 row in `y`.
i Row 2 of `x` matches multiple rows.
Expand All @@ -387,7 +387,7 @@
Condition
Warning:
Specifying `multiple = "warning"` was deprecated in dplyr 1.1.1.
i Please use `relationship = "many_to_one"` instead.
i Please use `relationship = "many-to-one"` instead.
Warning in `left_join()`:
Each row in `x` is expected to match at most 1 row in `y`.
i Row 2 of `x` matches multiple rows.
Expand Down
4 changes: 2 additions & 2 deletions tests/testthat/_snaps/join.md
Original file line number Diff line number Diff line change
Expand Up @@ -59,7 +59,7 @@
Detected an unexpected many-to-many relationship between `x` and `y`.
i Row 1 of `x` matches multiple rows in `y`.
i Row 1 of `y` matches multiple rows in `x`.
i If a many-to-many relationship is expected, set `relationship = "many_to_many"` to silence this warning.
i If a many-to-many relationship is expected, set `relationship = "many-to-many"` to silence this warning.

# mutating joins compute common columns

Expand Down Expand Up @@ -202,7 +202,7 @@
# `by = character()` technically respects `relationship`

Code
left_join(df, df, by = character(), relationship = "many_to_one")
left_join(df, df, by = character(), relationship = "many-to-one")
Condition
Error in `left_join()`:
! Each row in `x` must match at most 1 row in `y`.
Expand Down
4 changes: 2 additions & 2 deletions tests/testthat/test-group-by.R
Original file line number Diff line number Diff line change
Expand Up @@ -55,8 +55,8 @@ test_that("joins preserve grouping", {
df <- data.frame(x = rep(1:2, each = 4), y = rep(1:4, each = 2))
g <- group_by(df, x)

expect_equal(group_vars(inner_join(g, g, by = c("x", "y"), relationship = "many_to_many")), "x")
expect_equal(group_vars(left_join(g, g, by = c("x", "y"), relationship = "many_to_many")), "x")
expect_equal(group_vars(inner_join(g, g, by = c("x", "y"), relationship = "many-to-many")), "x")
expect_equal(group_vars(left_join(g, g, by = c("x", "y"), relationship = "many-to-many")), "x")
expect_equal(group_vars(semi_join(g, g, by = c("x", "y"))), "x")
expect_equal(group_vars(anti_join(g, g, by = c("x", "y"))), "x")
})
Expand Down
12 changes: 6 additions & 6 deletions tests/testthat/test-join-rows.R
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
test_that("`relationship` default behavior is correct", {
# "warn_many_to_many" for equality joins
# "warn-many-to-many" for equality joins
expect_snapshot(out <- join_rows(c(1, 1), c(1, 1), condition = "=="))
expect_equal(out$x, c(1L, 1L, 2L, 2L))
expect_equal(out$y, c(1L, 2L, 1L, 2L))
Expand Down Expand Up @@ -170,28 +170,28 @@ test_that("join_rows() expects incompatible type errors to have been handled by

test_that("join_rows() gives meaningful one-to-one errors", {
expect_snapshot(error = TRUE, {
join_rows(1, c(1, 1), relationship = "one_to_one")
join_rows(1, c(1, 1), relationship = "one-to-one")
})
expect_snapshot(error = TRUE, {
join_rows(c(1, 1), 1, relationship = "one_to_one")
join_rows(c(1, 1), 1, relationship = "one-to-one")
})
})

test_that("join_rows() gives meaningful one-to-many errors", {
expect_snapshot(error = TRUE, {
join_rows(c(1, 1), 1, relationship = "one_to_many")
join_rows(c(1, 1), 1, relationship = "one-to-many")
})
})

test_that("join_rows() gives meaningful many-to-one errors", {
expect_snapshot(error = TRUE, {
join_rows(1, c(1, 1), relationship = "many_to_one")
join_rows(1, c(1, 1), relationship = "many-to-one")
})
})

test_that("join_rows() gives meaningful many-to-many warnings", {
expect_snapshot({
join_rows(c(1, 1), c(1, 1), relationship = "warn_many_to_many")
join_rows(c(1, 1), c(1, 1), relationship = "warn-many-to-many")
})

# With proof that the defaults flow through user facing functions
Expand Down
Loading