Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

compact: accept malformed index #953

Merged
merged 8 commits into from
Mar 25, 2019

Conversation

jjneely
Copy link
Contributor

@jjneely jjneely commented Mar 20, 2019

This PR attempts to address #888

Changes

Verification

I've run this code with and without the added CLI option against my Prometheus TSDB blocks known to exhibit this condition. The condition is flagged appropriately. New compacted blocks are built.

WIP Status

Handle cases where we detect that postings have labels listed
incorrectly due to Prometheus Isuue thanos-io#5372.  With a command line option
set these specific errors can be ignored as they happen with Prometheus
2.8.0 and lesser versions.
The VerifyIndex() function explicitly states testing of the invariant
ordering, so rather than adding additional parameters to change its
behavior when --debug.accept-malformed-index is set, we skip the
verification step on the newly compacted TSDB block.  This allows
compaction to happen as normal when out of order labels are present in
the index.
@jjneely
Copy link
Contributor Author

jjneely commented Mar 21, 2019

Circleci failes with: reloader_test.go:352: unexpected error: build hash: open /tmp/reloader-rules-test545805072/rules.yaml: no such file or directory

Not sure what this is, but I don't believe this relates to my changes...

pkg/compact/compact.go Outdated Show resolved Hide resolved
pkg/compact/compact.go Outdated Show resolved Hide resolved
pkg/compact/compact.go Outdated Show resolved Hide resolved
Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally I like this, but let's address @povilasv comments before merge.

Sorry for this CI fail it's a flakiness. I restarted build.

pkg/compact/compact.go Outdated Show resolved Hide resolved
* Use fields instead of function parameters
* use the same variable name everywhere
@bwplotka
Copy link
Member

Can we remove WIP from title and commit?

Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just one nit ((:

@@ -908,7 +918,7 @@ func (c *BucketCompactor) Compact(ctx context.Context) error {

level.Info(c.logger).Log("msg", "start of compaction")

groups, err := c.sy.Groups()
groups, err := c.sy.Groups(c.acceptMalformedIndex)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we put to group field as well? Essentially if it is option like this then it suggests that we change this dynamically, which we don't. Also bool argument in function is discouraged ):

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at that this morning, which would mean I need to modify the Syncer to support this flag rather than BucketCompactor. We'd pass in the command line option here:

https://github.com/jjneely/thanos/blob/jjneely/accept-malformed-index/cmd/thanos/compact.go#L170

rather than here:

https://github.com/jjneely/thanos/blob/jjneely/accept-malformed-index/cmd/thanos/compact.go#L200

That seemed like not a straight forward path. Then I found that compaction Groups were created by the Syncer rather than the BlockCompactor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, changed.

@jjneely jjneely changed the title WIP: compact: accept malformed index compact: accept malformed index Mar 22, 2019
Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks (:

@jjneely
Copy link
Contributor Author

jjneely commented Mar 22, 2019

Up for a rerun of CircleCI? The failure seems new and unrelated most likely....

Copy link
Member

@GiedriusS GiedriusS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing job! Just two minor nits about a comment and a log message, and we can merge this.

OutOfOrderLabels int
}

// PrometheusIssue5372Err returns an error if statsd indicates postings with out
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The wording seems a bit off here. Perhaps: statsd indicates -> stats indicate ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in the most recent patch

return stats, errors.Errorf("out-of-order label set %s for series %d", lset, id)
stats.OutOfOrderLabels++
level.Warn(logger).Log("msg",
"out-of-order label set: known bug in Prometheus 2.8.0 and below",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should suggest users here to enable --debug.accept-malformed-index here? I mean we would only tell users about a known bug now but wouldn't tell them how to fix it so that's weird 😄 we could be more user-friendly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I addressed this in pkg/compact/compact.go where the error is generated that would otherwise halt execution of the compact component. Rather than here were it just becomes noise when the CLI option is enabled.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, that's even better 👍

Copy link
Member

@GiedriusS GiedriusS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you a lot for your work on this and for everyone's reviews :)

@GiedriusS GiedriusS merged commit c75388a into thanos-io:master Mar 25, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants