-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ExpressionFunctionValues should cache per-hit value [LUCENE-8574] #9620
Comments
Robert Muir (@rmuir) (migrated from JIRA) +1, I remember jack added this explicitly to prevent the situation where the same subexpr gets called multiple times for the same document and it wastefully recomputes the whole thing more than once. |
Alan Woodward (@romseygeek) (migrated from JIRA) There's a top-level cache in ExpressionValueSource but I think I missed the per-hit cache. It should be easy enough to add a caching DoubleValues implementation and wrap each variable. |
Robert Muir (@rmuir) (migrated from JIRA) I don't think we need any caching DoubleValues implementation. We can just simply put the code back. |
Robert Muir (@rmuir) (migrated from JIRA) Hopefully this is enough? If we have to handle the case where advanceExact() is called over and over again we have to add 'currentDoc' back too so it works more like jack's original cache. But we should avoid adding abstractions here that will make things harder on the compiler. |
Michael McCandless (@mikemccand) (migrated from JIRA) +1, patch looks good. |
ASF subversion and git services (migrated from JIRA) Commit 2991acf in lucene-solr's branch refs/heads/master from Patrick Zhai #10431: Upgrade HPPC to 0.8.2 (#1560)
|
Patrick Zhai (@zhaih) (migrated from JIRA) Sry, wrong commit message pointed to here, the correct issue should be #10431. BTW, is this patch ever merged? |
Patrick Zhai (@zhaih) (migrated from JIRA) I've checked the current release and couldn't see this patch merged. And I think there's no other changes introducing similar functionality (not so sure). Probably we should merge this? |
Patrick Zhai (@zhaih) (migrated from JIRA) I've attached a unit test showing a case that current code could not handle. And it seems the patch attached to this issue could not handle it as well (since DoubleValues generated for the same LeafReaderContext is not the same, we still get tons of DoubleValues created). |
Robert Muir (@rmuir) (migrated from JIRA) please, lets keep the boolean and not bring NaN into this. |
Patrick Zhai (@zhaih) (migrated from JIRA) Ah, yes will use boolean instead of NaN. I'm just verifying whether the patch works so quickly inserted few lines of code without much thinking. But how should we fix this issue correctly? Since the easy fix patch seems not solving the problem. |
Michael McCandless (@mikemccand) (migrated from JIRA)
+1
Hmm, good catch! So we somehow need to ensure that we use the same |
Patrick Zhai (@zhaih) (migrated from JIRA) Made a PR(apache/lucene-solr#1613) about this issue Basically making a new DoubleValuesSource class which pass in |
Michael McCandless (@mikemccand) (migrated from JIRA) @zhaih does this fix your combinatoric adversarial test case ( I.e. before the fix, that test runs for way too long (days), but after, it runs quickly? I left some small comments on the PR. Thanks for tackling this! |
Patrick Zhai (@zhaih) (migrated from JIRA) @mikemccand Yes it fixes the test case. Before the fix the test will cause Thanks for reviewing that, I've addressed those comments! |
Michael McCandless (@mikemccand) (migrated from JIRA) Thanks @zhaih I like that test case! Fibonacci should be simple computation :) I think the PR looks great. Since the pre-existing |
Michael McCandless (@mikemccand) (migrated from JIRA) Thanks @zhaih, the PR looks great, and now passes |
ASF subversion and git services (migrated from JIRA) Commit 60e0d8a in lucene-solr's branch refs/heads/master from Michael McCandless LUCENE-8574: the DoubleValues for dependent bindings for an expression are now cached and reused and no longer inefficiently recomputed per hit |
ASF subversion and git services (migrated from JIRA) Commit 1791d7e in lucene-solr's branch refs/heads/branch_8x from Michael McCandless LUCENE-8574: the DoubleValues for dependent bindings for an expression are now cached and reused and no longer inefficiently recomputed per hit |
Michael McCandless (@mikemccand) (migrated from JIRA) Thanks @zhaih! |
Adrien Grand (@jpountz) (migrated from JIRA) Closing after the 9.0.0 release |
The original version of
ExpressionFunctionValues
had a simple per-hit cache, so that nested expressions that reference the same common variable would compute the value for that variable the first time it was referenced and then use that cached value for all subsequent invocations, within one hit. I think it was accidentally removed in #8660?This is quite important if you have non-trivial expressions that reference the same variable multiple times.
E.g. if I have these expressions:
Then evaluating x should only cause b's value to be computed once (for a given hit), but today it's computed twice. The problem is combinatoric if b then references another variable multiple times, etc.
I think to fix this we just need to restore the per-hit cache?
Migrated from LUCENE-8574 by Michael McCandless (@mikemccand), resolved Jul 09 2020
Attachments: LUCENE-8574.patch, unit_test.patch
Pull requests: apache/lucene-solr#1560, apache/lucene-solr#1613
The text was updated successfully, but these errors were encountered: