-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid cleaning the data mask #3443
Conversation
by only sharing proxy objects with a weak connection to the data source
Not sure I can review much without spending lots of time on it. I guess I'll wait until I land on that part of the code again. |
I don't have time to review deeply but the approach seems reasonable. The evaluation semantics it creates on the R side are better because we now get warnings when the dplyr objects have been destroyed. I think it should be errors instead of warnings though. Ideally we'd provide slow substitutes for the requested actions, perhaps just evaluate some R code. To accomplish this the active binding should provide the mask environment alongside the column name to the callback handler. A reference to the mask can be kept in the closure env of the active binding. From there we should be able to get the data necessary to complete the operation. Or am I missing something? Hmm.... Yeah for grouped operations that would be harder. If a closure or formula is created during a grouped operation, how do we keep track of the group id? Perhaps by creating children of the mask for each group and keep a group reference there? Then the caller env would have to be passed to the callback handler along with the column name. It'd likely be a lot of work to fix corner case semantics and no guarantee we'll be able to make it 100% robust. Let's just mark it as a limitation of eval semantics in dplyr. Edit: Or provide a slow substitute for the ungrouped case and fail with an error for the grouped case. That seems to make sense. |
Thanks. By "slow substitutes for the requested actions", do you mean that I'm hesitant to throw an error, because this operation returned a silent |
If a downstream dep relies on that NULL return it's likely a bug. The warning will cause CMD check failures anyway so we might as well throw an error I think. |
yes, I generally don't mind breaking dependencies that rely on bugs. |
Why would a warning cause a failure in |
Doesn't CRAN examine warnings in revdep checks just like revdepcheck does? I'm not sure but I think they do. |
Check |
Also, throwing an error would fail for the original use case, krlmlr/bindrcpp#2. |
Hmmm. Looks like the new behaviour will make warnings pop up out of nowhere in many cases :/ |
@krlmlr I think we might want to reverse this if that causes warnings in knitr. |
I think leaking the data mask and accessing it later will be rare. We can revisit when users start asking us about "Hybrid callback proxy out of scope" warnings ;-) |
People do all sorts of things in |
knitr caching serializes objects from the global environment. It's very easy to turn off the warning if you think it's the right thing to do. |
It's definitely not the right thing to do if we keep the objects there. |
Can you please rephrase? What do you suggest we should do? |
I mean we shouldn't turn off warnings when the variables are accessed. |
This means that knitr users may see these warnings occasionally in conjunction with cached chunks. I don't mind for now. |
This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/ |
by only sharing proxy objects with a weak reference to the data source. Closes #3318.
@romainfrancois: This patch covers pretty much all of the current implementation to access variables in a grouped (and also ungrouped) operation. The patch only changes ownership: Previously we held ownership of objects used to handle the callbacks from R, now we create proxy objects that are owned by an
XPtr
and are aware if the underlying real object is still alive.@romainfrancois @lionel-: I haven't found a way around using two proxy objects: one, because I need a
shared_ptr<>
, and the second which I actually share that contains the weak reference to the first proxy object. Is there a better way? Also, happy to take feedback on the naming and the comments, and of course the implementation.