Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store: If request ctx has an error we do not increment opsFailures counter #3179

Merged
merged 3 commits into from
Sep 21, 2020

Conversation

ipstatic
Copy link
Contributor

@ipstatic ipstatic commented Sep 16, 2020

Signed-off-by: Jarod Watkins [email protected]

Resolves #3149

  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Changes

If a request is cancelled or times out we will no longer increment the ops failure counter.

Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we need to just skip context.Canceled.

I think timeouts might be a legit failures, no?

@@ -412,7 +412,7 @@ func (b *metricBucket) Delete(ctx context.Context, name string) error {

start := time.Now()
if err := b.bkt.Delete(ctx, name); err != nil {
if !b.isOpFailureExpected(err) {
if !b.isOpFailureExpected(err) && ctx.Err() == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if !b.isOpFailureExpected(err) && ctx.Err() == nil {
if !b.isOpFailureExpected(err) && ctx.Err() != context.Canceled {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree timeouts are legit failures, but this comment apply to any place, not just here, right? Also, is there a good reason why we don't we the context.Canceled check isOpFailureExpected()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I have observed, cancelled requests can cause timeouts:
Screen Shot 2020-09-10 at 10 43 58 AM

If you notice, query cancels the request but store reports it as aborted. In my opinion that should be reported as cancelled, and not increase the error count.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or does Aborted != timeout?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree timeouts are legit failures, but this comment apply to any place, not just here, right? Also, is there a good reason why we don't we the context.Canceled check isOpFailureExpected()?

If that is a better place for this then I have no problems making the change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isOpFailureExpected is really only for user specified function what to expect. I believe Cancel is valid to be ignored no matter what user specifies, no? (:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right @bwplotka 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the PR to use that context.

@ipstatic
Copy link
Contributor Author

Is there a way for me to view the circleci results without giving them access to my Github account?

@bwplotka
Copy link
Member

@ipstatic
Copy link
Contributor Author

When I attempt to visit that URL, I get redirected to login via Github/Bitbucket and it requires read/write access to both public and private repos. I just didn't know if there was a way to view the logs "ready only" or not having to login at all.

@bwplotka
Copy link
Member

Try incognito mode - works just fine (:

Signed-off-by: Jarod Watkins <[email protected]>
Signed-off-by: Jarod Watkins <[email protected]>
Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing, thanks!

Some other solution at some point is to have error status as label (:

Good enough for now +1

@bwplotka bwplotka merged commit 49f332d into thanos-io:master Sep 21, 2020
@ipstatic
Copy link
Contributor Author

Thank you both!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Store: Cancelled/Aborted GRPC Requests Increment thanos_objstore_bucket_operation_failures_total
3 participants