Clarify differences between count() and tally() #5349

Merged
merged 8 commits into from
Jun 29, 2020
Conversation

hadley
Member

@hadley hadley commented Jun 22, 2020

And revert count() to 0.8.5 behaviour. Fixes #5298

And revert count() to 0.8.5 behaviour. Fixes #5163
@hadley hadley changed the title Clearly differences between count() and tally() Clarify differences between count() and tally() Jun 22, 2020
@hadley
Member Author

hadley commented Jun 23, 2020

@lionel- could you check my reasoning here please?

Member

@lionel- lionel- left a comment

count() and tally() have always behaved differently because they use different default values (NULL vs missing): dc92c24

The historical behaviour is:

  • tally() guesses the weight variable
  • count() doesn't guess.

This behaviour was preserved when we switched to tidy eval. However, it was inadvertently changed in 0.8.2 with #4408. Since that version (almost one year old), neither count() nor tally() tries to guess the weighting column.

I think not guessing is the expected behaviour for most people. I also like that count and tally are consistent. Maybe we should sanction the 0.8.2 behaviour that the weight column is never guessed? In that case, we don't need the guess_wt() sentinel.
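To make the "never guess" option concrete, here is a minimal sketch of what it would mean for callers. The data frame `df` and the result names are made up for illustration, and the unweighted results assume a dplyr release in which the weight column is not guessed (the NEWS entry under review says 1.0.0 weighted by `n` automatically).

```r
library(dplyr)

# Toy data that happens to contain a column called `n`.
df <- data.frame(
  g = c("a", "a", "b"),
  n = c(10, 20, 30)
)

# No weight supplied: every row counts once, even though `n` exists.
# (Assumption: a dplyr version where the weight is never guessed.)
unweighted <- df %>% count(g)

# Weight supplied explicitly: sums `n` within each group.
weighted <- df %>% count(g, wt = n)

# tally() follows the same rule on the ungrouped data frame.
rows_total <- df %>% tally()
weighted_total <- df %>% tally(wt = n)
```

Under that assumption, `unweighted$n` holds the row counts per group and `weighted$n` holds the per-group sums of `n`, so anyone who wants weighting has to ask for it with `wt =`.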

@hadley
Member Author

hadley commented Jun 23, 2020

OK, if we accidentally broke the autoguessing in 0.8.2, then let's remove it altogether.

@hadley
Member Author

hadley commented Jun 23, 2020

@yutannihilation since you have an eye for detail, would you mind taking a look at this PR? Most importantly, is the reasoning in the NEWS clear, and does it make sense to you? Thanks!

@yutannihilation
Member

Looks good! Just one thing to confirm: is it safe to keep using wt = n()? If users are recommended to change it, the NEWS item might need a few words on this.

@hadley
Member Author

hadley commented Jun 24, 2020

Good point, probably worth keeping that check around just as a precaution.

Member

@romainfrancois romainfrancois left a comment

LGTM, will be simpler to explain (including to self).

(#5324).
* `count()` and `tally()` no longer automatically weights by column `n` if
present (#5298). dplyr 1.0.0 introduced this behaviour because of Hadley's
faulty memory. Historically `tally()` automatically weighted and `count()`
Member

😂

@romainfrancois romainfrancois added this to the 1.0.1 milestone Jun 29, 2020
@hadley hadley merged commit bc49875 into master Jun 29, 2020
@hadley hadley deleted the count-wt branch June 29, 2020 12:13
Successfully merging this pull request may close these issues.

Default behavior of count() seems to have changed in 1.0.0
4 participants