Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Copying a filtered collection with ImmutableSet.copyOf() should require visiting each element only once #7143

Closed
1 task done
asheldon opened this issue Apr 9, 2024 · 1 comment · Fixed by #7144
Closed
1 task done
Assignees
Labels
P2 package=collect type=defect Bug, not working as expected type=enhancement Make an existing feature better

Comments

@asheldon
Copy link

asheldon commented Apr 9, 2024

Description

Since Google Guava 33.1.0 in 5c4f5b2, using ImmutableSet to copy a filtered collection requires iterating over and applying the predicate function, along with any side effects, to each element twice. The expected number of times is once per element.

This change message seems to indicate this behavior was understood by the author.

I could imagine bad performance for users who call ImmutableSet.copyOf(Collections.filter(...)). If that's an issue in practice, then maybe we should insert a special case for it inside copyOf. I at least made sure to call size() only once.

I believe the code should handle this case if only to avoid unnecessary executions of the filter function which may negate any potential allocation savings. If this change cannot be made, please consider replacing size() with isEmpty(). For collections where size() is not constant time, isEmpty() is almost always cheaper, including for filtered collections.

Example

import com.google.common.collect.Collections2;
import com.google.common.collect.ImmutableSet;

import java.util.ArrayList;
import java.util.List;

public class FilteredImmutableRepro {
    public static void main(String[] unused) {
        List<String> example = new ArrayList<>();
        example.add("one");
        example.add("two");
        example.add("three");

        ImmutableSet.copyOf(Collections2.filter(example, element -> {
            System.out.println(element);
            return true;
        }));
    }
}

Expected Behavior

one
two
three

Actual Behavior

one
two
three
one
two
three

Packages

com.google.common.collect

Platforms

No response

Checklist

@asheldon asheldon added the type=defect Bug, not working as expected label Apr 9, 2024
@cpovirk
Copy link
Member

cpovirk commented Apr 9, 2024

Huh, why didn't I use isEmpty() there? Oh, I see: The "more significant optimization [that] may be coming in cl/590212390" would use size(), and I didn't reconsider that when I carved off a less ambitious optimization to try first.

I can change to isEmpty(), but it is possible that a future change will switch back to size(). As you observed, the filtered-collection problem is a cost that I think we'd be willing to pay in general: We regret providing filtered collections, and we believe that the JDK folks took a better approach when they designed Stream. ImmutableSet is probably one of many APIs in Guava that both calls size() and then iterates, so filtered collections are going to remain susceptible to this general kind of issue.

Still, there's no reason to use size() today, even setting aside filtered-collection concerns. So I can change to isEmpty() to at least buy you some time to consider taking further action. (I suspect that you could switch to ImmutableSet.copyOf(ImmutableList.copyOf(filteredCollection)), but I do encourage moving as far away from filtered collections as you can get away with :))

@cpovirk cpovirk self-assigned this Apr 9, 2024
@cpovirk cpovirk added type=enhancement Make an existing feature better package=collect P2 labels Apr 9, 2024
copybara-service bot pushed a commit that referenced this issue Apr 9, 2024
That way is simpler.

It's also somewhat better for users who pass a filtered collection, **but:**
- We may undo this change in the future (as we are considering approaches that would actually use the result of `size` for more than an emptiness check).
- A nonempty filtered collection will still apply the filter at least once during the `isEmpty` call, so that can still cause trouble if the filter has side effects or "call the filter until it returns `true`" is slow enough to matter.
- Other methods in Guava likely still trigger similar behavior when given a filtered collection, as do methods outside of Guava.

Fixes #7143 (for now)

RELNOTES=n/a
PiperOrigin-RevId: 623223284
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P2 package=collect type=defect Bug, not working as expected type=enhancement Make an existing feature better
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants