Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LUCENE-9396: Improve truncation detection for points. #1557

Merged
merged 5 commits into from
Jun 16, 2020

Conversation

jpountz
Copy link
Contributor

@jpountz jpountz commented Jun 9, 2020

TestAllFilesDetectTruncation still passes with this change.

@jpountz jpountz changed the title LUCENE-9396: Improve corruption detection for points. LUCENE-9396: Improve truncation detection for points. Jun 10, 2020
@rmuir
Copy link
Member

rmuir commented Jun 10, 2020

Should we keep the existing check also? It does more than just check length, it validates this "footer" too (e.g. detecting invalid bits).

@jpountz
Copy link
Contributor Author

jpountz commented Jun 11, 2020

@rmuir I combined them in an overloaded retrieveChecksum(IndexInput, long) variant, what do you think?

@rmuir
Copy link
Member

rmuir commented Jun 12, 2020

I like the new organization by having an optional parameter to retrieve. But I think the exception handling can be simpler here? I think it would be better to check the file's length and if its wrong throw exception, then call retrieveChecksum? Since we know the length beforehand, there isn't sense IMO in doing suppressed dances here.

@rmuir
Copy link
Member

rmuir commented Jun 12, 2020

i'm proposing something like this for the new method:

if (expectedLength < footerLength()) {
  throw new IllegalArgumentException("expectedLength cannot be less than the footer length");
}
if (in.length() < expectedLength) {
  throw new CorruptIndexException("truncated file: length=" + in.length() + " but expectedLength==" + expectedLength, in);
} else if (in.length() > expectedLength) {
  throw new CorruptIndexException("file too long: length=" + in.length() + " but expectedLength==" + expectedLength, in);
}
return retrieveChecksum(in);

@jpountz
Copy link
Contributor Author

jpountz commented Jun 12, 2020

I liked doing the suppressed dance to still get information about the shape of the footer, e.g. for the case when the footer would be correct but the length retrieved from the meta file would be incorrect, similarly to how checkFooter tells us that the checksum passed when reading a file failed even though the checksum is correct. No strong feelings, I don't mind removing it if you don't like it.

@rmuir
Copy link
Member

rmuir commented Jun 12, 2020

How would the length in the meta file be incorrect? It comes from a checksum-verified file, right?

I think with the current PR, you'll get a somewhat confusing exception for a truncated file, probably something like this:

CorruptIndexException("misplaced codec footer (file truncated?): remaining=" + remaining + ", expected=" + expected + ", fp=" + in.getFilePointer(), in)
   |--- suppressed: new CorruptIndexException("truncated file: length=" + in.length() + " but expectedLength==" + expectedLength, in)

Since we actually know the length up-front, we can take advantage of that to just deliver the better exception: truncated file directly to the user?

@jpountz
Copy link
Contributor Author

jpountz commented Jun 12, 2020

I think it's doing the opposite actually:

CorruptIndexException("truncated file: length=" + in.length() + " but expectedLength==" + expectedLength, in)
   |--- suppressed: CorruptIndexException("misplaced codec footer (file truncated?): remaining=" + remaining + ", expected=" + expected + ", fp=" + in.getFilePointer(), in)

@rmuir
Copy link
Member

rmuir commented Jun 12, 2020

Yeah, you are right, its not that bad. But still i would prefer to just deliver the clear exception without adding a less-clear suppressed one.

@jpountz
Copy link
Contributor Author

jpountz commented Jun 16, 2020

I implemented your suggestion.

@jpountz jpountz merged commit 2b61b20 into apache:master Jun 16, 2020
@jpountz jpountz deleted the points_corruption_detection branch June 16, 2020 10:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants