Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid cleaning the data mask #3443

Merged
merged 1 commit into from
Mar 19, 2018
Merged

Conversation

krlmlr
Copy link
Member

@krlmlr krlmlr commented Mar 18, 2018

by only sharing proxy objects with a weak reference to the data source. Closes #3318.

@romainfrancois: This patch covers pretty much all of the current implementation to access variables in a grouped (and also ungrouped) operation. The patch only changes ownership: Previously we held ownership of objects used to handle the callbacks from R, now we create proxy objects that are owned by an XPtr and are aware if the underlying real object is still alive.

@romainfrancois @lionel-: I haven't found a way around using two proxy objects: one, because I need a shared_ptr<>, and the second which I actually share that contains the weak reference to the first proxy object. Is there a better way? Also, happy to take feedback on the naming and the comments, and of course the implementation.

by only sharing proxy objects with a weak connection to the data source
@romainfrancois
Copy link
Member

Not sure I can review much without spending lots of time on it. I guess I'll wait until I land on that part of the code again.

@lionel-
Copy link
Member

lionel- commented Mar 19, 2018

I don't have time to review deeply but the approach seems reasonable. The evaluation semantics it creates on the R side are better because we now get warnings when the dplyr objects have been destroyed. I think it should be errors instead of warnings though.

Ideally we'd provide slow substitutes for the requested actions, perhaps just evaluate some R code. To accomplish this the active binding should provide the mask environment alongside the column name to the callback handler. A reference to the mask can be kept in the closure env of the active binding. From there we should be able to get the data necessary to complete the operation. Or am I missing something?

Hmm.... Yeah for grouped operations that would be harder. If a closure or formula is created during a grouped operation, how do we keep track of the group id? Perhaps by creating children of the mask for each group and keep a group reference there? Then the caller env would have to be passed to the callback handler along with the column name.

It'd likely be a lot of work to fix corner case semantics and no guarantee we'll be able to make it 100% robust. Let's just mark it as a limitation of eval semantics in dplyr.

Edit: Or provide a slow substitute for the ungrouped case and fail with an error for the grouped case. That seems to make sense.

@krlmlr
Copy link
Member Author

krlmlr commented Mar 19, 2018

Thanks. By "slow substitutes for the requested actions", do you mean that a$b should continue to work with a leaked overscope/data mask a? Let's think about this when somebody comes up with a use case ;-)

I'm hesitant to throw an error, because this operation returned a silent NULL before. Let's not risk breaking downstream dependencies, but I've added an item to our list if breaking changes.

@lionel-
Copy link
Member

lionel- commented Mar 19, 2018

If a downstream dep relies on that NULL return it's likely a bug. The warning will cause CMD check failures anyway so we might as well throw an error I think.

@romainfrancois
Copy link
Member

yes, I generally don't mind breaking dependencies that rely on bugs.

@krlmlr
Copy link
Member Author

krlmlr commented Mar 19, 2018

Why would a warning cause a failure in R CMD check?

@lionel-
Copy link
Member

lionel- commented Mar 19, 2018

Doesn't CRAN examine warnings in revdep checks just like revdepcheck does? I'm not sure but I think they do.

@krlmlr
Copy link
Member Author

krlmlr commented Mar 19, 2018

Check WARNINGs, yes. But not testthat warnings or warnings in vignettes.

@krlmlr krlmlr merged commit 20ef7bf into tidyverse:master Mar 19, 2018
@krlmlr krlmlr deleted the f-#3318-hybrid-proxy branch March 19, 2018 15:18
@krlmlr
Copy link
Member Author

krlmlr commented Mar 19, 2018

Also, throwing an error would fail for the original use case, krlmlr/bindrcpp#2.

@lionel-
Copy link
Member

lionel- commented Mar 19, 2018

Also, throwing an error would fail for the original use case, krlmlr/bindrcpp#2.

Hmmm. Looks like the new behaviour will make warnings pop up out of nowhere in many cases :/

@lionel-
Copy link
Member

lionel- commented Mar 19, 2018

@krlmlr I think we might want to reverse this if that causes warnings in knitr.

@krlmlr
Copy link
Member Author

krlmlr commented Mar 20, 2018

I think leaking the data mask and accessing it later will be rare. We can revisit when users start asking us about "Hybrid callback proxy out of scope" warnings ;-)

@lionel-
Copy link
Member

lionel- commented Mar 20, 2018

People do all sorts of things in do() and mutate() + map(), including creation of closures and formulas. A typical way would be stats models where the formula is recorded in the object. That means leaking the data mask is not that rare. I'm just concerned about this knitr thing that apparently examines all accessible variables?

@krlmlr
Copy link
Member Author

krlmlr commented Mar 20, 2018

knitr caching serializes objects from the global environment. It's very easy to turn off the warning if you think it's the right thing to do.

@lionel-
Copy link
Member

lionel- commented Mar 20, 2018

It's definitely not the right thing to do if we keep the objects there.

@krlmlr
Copy link
Member Author

krlmlr commented Mar 20, 2018

Can you please rephrase? What do you suggest we should do?

@lionel-
Copy link
Member

lionel- commented Mar 20, 2018

I mean we shouldn't turn off warnings when the variables are accessed.

@krlmlr
Copy link
Member Author

krlmlr commented Mar 20, 2018

This means that knitr users may see these warnings occasionally in conjunction with cached chunks. I don't mind for now.

@lock
Copy link

lock bot commented Sep 17, 2018

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

@lock lock bot locked and limited conversation to collaborators Sep 17, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants