Querier: Deterministic buffered queries #4780
Comments
Looks good, I didn't think about this 👍 What do you mean by
Good point - so we can buffer all, as we do right now, but to avoid holding all series in memory on the Querier, we could put everything on disk (tmp) and mmap it. For large queries this is not too slow - it reduces peak memory while allowing consistent throughput. This would be needed if we wanted to allow fanning out to infinite-cardinality queries.
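A minimal sketch of the spill-to-disk idea from this comment, under assumptions: the helper name `spillAndMap` and the raw-bytes interface are illustrative, not the actual Thanos code, and this uses `syscall.Mmap`, so it is Unix-only:

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// spillAndMap (hypothetical helper) writes buffered response bytes to a
// temp file and mmaps it read-only, trading peak RSS for page-cache-backed
// reads. Returns the mapped bytes and a cleanup function.
func spillAndMap(data []byte) ([]byte, func(), error) {
	f, err := os.CreateTemp("", "querier-buffer-*")
	if err != nil {
		return nil, nil, err
	}
	if _, err := f.Write(data); err != nil {
		f.Close()
		return nil, nil, err
	}
	// Map the file read-only; the kernel pages it in on demand, so the
	// series data no longer has to live on the Go heap.
	m, err := syscall.Mmap(int(f.Fd()), 0, len(data),
		syscall.PROT_READ, syscall.MAP_SHARED)
	if err != nil {
		f.Close()
		return nil, nil, err
	}
	cleanup := func() {
		syscall.Munmap(m)
		f.Close()
		os.Remove(f.Name())
	}
	return m, cleanup, nil
}

func main() {
	buf, done, err := spillAndMap([]byte("series bytes here"))
	if err != nil {
		panic(err)
	}
	defer done()
	fmt.Println(string(buf))
}
```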
Looks good. Really nice diagrams and easy-to-understand tracing graphs as well. I am sure we have the same
Hello 👋 Looks like there was no activity on this issue for the last two months.
Yeah, this is still needed. If no one picks this up soon, I might try implementing something.
Taking a fresh look (I might use this case in my …). First, we need a solid e2e benchmark for this.
👍 That would be great, we'd be able to compare then. I'm working on implementing a tournament tree in Go. We could use a min-heap, but it's not as efficient. Couldn't find any implementation of a loser tree in Go :s
So essentially a k-way merge solution?
Yeah. Planning to open a PR next week, almost done. I decided to go with a heap-based solution for now, since it is easier to maintain and I doubt a selection/tournament tree would bring much improvement, because it needs auxiliary nodes. We actually already have Series() benchmarks - the improvement looks sweet.
PTAL fix: #5296
Hello 👋 Looks like there was no activity on this issue for the last two months. |
Done with k-way I believe. |
I plan to give a formal proposal + PoC, but I'm starting this issue to cover the idea for solving the problem we have on the Querier fanout side, which creates confusion, annoyance, and extra latency overall.
I explained this problem to the team and on the latest Community Hours, but here is the initial info:
Problem
The current Querier algorithm looks as follows:
While it's not easy to spot, what happens in practice is that MergeSort chooses which StoreAPI to fetch a message from one-by-one. On average it's round-robin, but that depends on the order of series. In practice this looks as follows: if we run a query (
start_timestamp_ms:1626134100000 end_timestamp_ms:1627344000000 matchers:<type:RE name:"__name__" value:"continuous_app_metric9.*" >
) for 2w against 5 different StoreAPIs (I literally used interactive_test), it takes 6+ seconds on my machine. This test has Jaeger, so I added lower-granularity spans (bwplotka@e75ce70) that tell us the exact wait times for:
querier.streamSeriesSet.waitForStoreResult
querier.streamSeriesSet.waitForMergeSort
What we see for this query is that those goroutines are starved most of the time. E.g. this sidecar fetch was waiting 500ms just to be able to pass its data to merge sort:
This means the sidecar's gRPC service itself was waiting to send its messages too.
This happens for all stores. Essentially, computation and the slowest stores cause additive latency.
And ONLY when we have merged everything can PromQL start doing its job, due to prometheus/prometheus#7326
Consequences:
Note that we see many of those "wait" times being significant across many message writes:
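A toy model (my assumption, not measured Thanos timings) of why lock-step merging is additive: if the merger pulls one message at a time, each pull blocks serially on one store, so per-store latencies stack up; with buffering, the stores stream concurrently and the Querier only waits for the slowest one:

```go
package main

import "fmt"

// lockStepTotalMS models the worst case of the current algorithm:
// every pull blocks on exactly one store, so latencies sum up.
func lockStepTotalMS(latencyMS []int, messagesPerStore int) int {
	total := 0
	for _, l := range latencyMS {
		total += l * messagesPerStore
	}
	return total
}

// bufferedTotalMS models the buffered approach: all stores stream
// concurrently, so the total is bounded by the slowest store.
func bufferedTotalMS(latencyMS []int, messagesPerStore int) int {
	max := 0
	for _, l := range latencyMS {
		if t := l * messagesPerStore; t > max {
			max = t
		}
	}
	return max
}

func main() {
	latencies := []int{10, 10, 10, 10, 50} // 5 StoreAPIs, one slow
	fmt.Println("lock-step:", lockStepTotalMS(latencies, 100), "ms")
	fmt.Println("buffered: ", bufferedTotalMS(latencies, 100), "ms")
}
```

Real streaming overlaps fetches one message deep, so the lock-step number is an upper bound, but the additive-vs-max shape is the point.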
Solution
The solution is kind of... simple: buffer everything, merge on top of the buffers, and start evaluating PromQL as soon as possible. We have to fetch as fast as possible so that StoreAPIs are free to perform further fetches and free memory (the sooner we finish the job, the more we can do!). The following diagram shows the general idea: