queryfrontend 20% latency overhead #4571

Closed
bwplotka opened this issue Aug 17, 2021 · 10 comments
Comments

@bwplotka (Member)

We see that first (cold-cache) range queries can take an extra ~20% of time when they go through query-frontend's query-range path instead of hitting the querier directly.

AC:

  • We understand the bottlenecks
  • We have a micro benchmark for this path
  • Ideally: we fix the latency; it should not add 20% to a ~1m query, IMO

cc @matej-g

@GiedriusS (Member) commented Aug 17, 2021

I was thinking that maybe our HTTP client isn't honoring keep-alive properly. I believe this is the relevant place where the request is handled: https://github.com/cortexproject/cortex/blob/master/pkg/frontend/transport/handler.go#L119. Isn't this missing an io.Copy(ioutil.Discard, r.Body), just like we do in other places (https://awmanoj.github.io/tech/2016/12/16/keep-alive-http-requests-in-golang/)?
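
To illustrate the general rule the linked post describes (this is a generic Go sketch, not the Cortex handler code): a body has to be fully read and closed before the underlying connection can be returned to the transport's idle pool, so even a caller that ignores the payload should drain it.

```go
package clientutil

import (
	"io"
	"io/ioutil"
	"net/http"
)

// checkStatus issues a GET but only looks at the status code. Even though the
// payload is not needed, the body must still be drained and closed; otherwise
// the connection cannot be returned to the Transport's idle pool and the next
// request pays for a fresh TCP (and possibly TLS) handshake.
func checkStatus(client *http.Client, url string) (int, error) {
	resp, err := client.Get(url)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()

	// Drain any unread remainder so the keep-alive connection can be reused.
	if _, err := io.Copy(ioutil.Discard, resp.Body); err != nil {
		return 0, err
	}
	return resp.StatusCode, nil
}
```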

@GiedriusS (Member)

Yes, so that is the problem, albeit in a different place. On top of that, the default http.DefaultTransport is used, which is quite limited in the number of idle connections it keeps: https://github.com/cortexproject/cortex/blob/master/pkg/frontend/downstream_roundtripper.go#L39. Let me try to fix this.
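
For reference, http.DefaultTransport keeps at most http.DefaultMaxIdleConnsPerHost (2) idle connections per host, so under concurrent load most requests end up opening new connections. A generic sketch of a round tripper with a larger idle pool (values are illustrative; this is not the Thanos/Cortex code):

```go
package clientutil

import (
	"net"
	"net/http"
	"time"
)

// newDownstreamTripper builds a transport tuned for many concurrent requests
// to a single downstream host. The stock http.DefaultTransport keeps at most
// 2 idle connections per host (http.DefaultMaxIdleConnsPerHost), which defeats
// keep-alive once concurrency exceeds that.
func newDownstreamTripper() http.RoundTripper {
	return &http.Transport{
		Proxy: http.ProxyFromEnvironment,
		DialContext: (&net.Dialer{
			Timeout:   30 * time.Second,
			KeepAlive: 30 * time.Second,
		}).DialContext,
		MaxIdleConns:          100,
		MaxIdleConnsPerHost:   100, // default is 2; raise it for one busy downstream
		IdleConnTimeout:       90 * time.Second,
		TLSHandshakeTimeout:   10 * time.Second,
		ExpectContinueTimeout: 1 * time.Second,
	}
}
```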

@matej-g (Collaborator) commented Aug 18, 2021

@GiedriusS wow, impressively fast action 😁. Once this lands in Cortex I'll be happy to poke around some more and see how much of an improvement we get over the latency overhead mentioned in the description, which we are currently observing on our side.

GiedriusS added a commit to GiedriusS/thanos that referenced this issue Sep 1, 2021
Make the HTTP downstream tripper configurable so that it would be
possible to set the number of maximum idle connections per host to more
than 2 so that HTTP keep-alive connections could be used on higher
loads.

Solves thanos-io#4571.

Signed-off-by: Giedrius Statkevičius <[email protected]>
bwplotka pushed a commit that referenced this issue Sep 1, 2021
* query-frontend: make HTTP downstream tripper configurable

Make the HTTP downstream tripper configurable so that it would be
possible to set the number of maximum idle connections per host to more
than 2 so that HTTP keep-alive connections could be used on higher
loads.

Solves #4571.

Signed-off-by: Giedrius Statkevičius <[email protected]>

* CHANGELOG: add entry

Signed-off-by: Giedrius Statkevičius <[email protected]>

@GiedriusS (Member)

This should be fixed with #4623 and setting max_idle_conns_per_host to a high number such as 100. @matej-g could you please try this out?
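
A rough sketch of what that configuration could look like once #4623 is in; the flag name and any fields other than max_idle_conns_per_host are assumptions here, so check thanos query-frontend --help for the exact names in your version:

```yaml
# Hypothetical example. Passed to query-frontend via the downstream-tripper
# configuration flag added in #4623 (e.g. --query-frontend.downstream-tripper-config,
# if that is the flag name in your build). Only max_idle_conns_per_host is
# confirmed in this thread; take any other knobs from the --help output.
max_idle_conns_per_host: 100
```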

@matej-g (Collaborator) commented Sep 13, 2021

Hey @GiedriusS,
So I did a number of 'macro' tests last week (comparable to our original latency report). I compared the latest main with v0.22.0. I ran the tests both with the interactive example locally (spinning up the two different versions of query-frontend) and on a test cluster on AWS. In all cases I used the same data, generated for the interactive example via the thanosbench tool (roughly two weeks' worth of data).
I also tested a number of different settings (cache enabled / disabled; split interval 24h / 1h / disabled). The conclusion, however, is that I could not see any discernible difference between the previous version and the version with the configurable round tripper (using the recommended setting of 100 max idle connections per host). Queries took roughly the same time on both instances of query-frontend in all cases.

However, one interesting finding concerns the overhead of splitting a query and 'stitching' the results back together. Below I'm pasting some numbers and queries I ran, but it seems that on a 1-week range with the default 24h splitting, the response time can be as much as 2-3x higher compared to going directly through query (and the overhead seems to increase with the range).

EDIT: OK, I think this was caused by the interactive test overriding the default for max concurrent queries down to 1, which I wasn't aware of; with only one concurrent query allowed, the sub-queries produced by splitting run sequentially instead of in parallel, which increases the overall wait time.

@GiedriusS (Member)

> EDIT: OK, I think this was caused by the interactive test overriding the default for max concurrent queries down to 1 […]

Mhm, is it reproducible when a higher number of maximum concurrent queries is permitted?

@matej-g (Collaborator) commented Sep 16, 2021

It's not reproducible with a higher number of concurrent queries (e.g. the default of 4 or higher); with that configuration, the times are pretty similar for both query and query-frontend.

I am actually now unsure how precise the original report of those 20% was, and whether something else was going on that would explain the difference. But from what I can conclude from my tests, there is no perceivable difference for the end user between querying through the frontend and querying directly (with the obvious exception of the case where the operator configures query to handle only 1 concurrent query, which introduces wait time for split queries). I will check with the reporter of the issue once more to wrap this up and make sure I did not miss anything.

@matej-g (Collaborator) commented Oct 1, 2021

So, as I said in my findings above, I could not confirm the reported 20% latency increase on query-frontend; for me, that is the conclusion of this issue.

As for the rest of the task, I opened a PR that adds some benchmarks to query-frontend, so feel free to have a look: #4723
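
Not the code from #4723, just a rough sketch of how the query-frontend HTTP path can be micro-benchmarked in Go (the benchmark name and the stub downstream are made up for illustration):

```go
// benchmarks_test.go
package benchmarks

import (
	"io"
	"io/ioutil"
	"net/http"
	"net/http/httptest"
	"testing"
)

// BenchmarkDownstreamRoundTrip times single round trips against a stub
// downstream, so transport-level overhead (idle-pool exhaustion, new TCP
// connections) shows up directly in the numbers.
func BenchmarkDownstreamRoundTrip(b *testing.B) {
	downstream := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte(`{"status":"success","data":{"resultType":"matrix","result":[]}}`))
	}))
	defer downstream.Close()

	client := &http.Client{Transport: &http.Transport{MaxIdleConnsPerHost: 100}}

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		resp, err := client.Get(downstream.URL + "/api/v1/query_range")
		if err != nil {
			b.Fatal(err)
		}
		// Drain and close so the connection goes back to the idle pool.
		io.Copy(ioutil.Discard, resp.Body)
		resp.Body.Close()
	}
}
```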

someshkoli pushed a commit to someshkoli/thanos that referenced this issue Nov 7, 2021
…4623)

* query-frontend: make HTTP downstream tripper configurable

Make the HTTP downstream tripper configurable so that it would be
possible to set the number of maximum idle connections per host to more
than 2 so that HTTP keep-alive connections could be used on higher
loads.

Solves thanos-io#4571.

Signed-off-by: Giedrius Statkevičius <[email protected]>

* CHANGELOG: add entry

Signed-off-by: Giedrius Statkevičius <[email protected]>

@GiedriusS (Member)

@matej-g let's close this, then? (:

@matej-g (Collaborator) commented Nov 12, 2021

Ah, right 🙂 Please close it
