-
Notifications
You must be signed in to change notification settings - Fork 13.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KAFKA-6697; JBOD configured broker should not die if log directory is invalid #4752
Conversation
Thanks for the PR. Can we add a test? |
@ijuma Yeah I am going to add a test. I am opening the pull request and Becket told me this looks good to him. I am applying the patch internally at LinkedIn first before working on the test. |
@ijuma Sorry for the delay. Unit test is added now. |
db0fc20
to
8043ec4
Compare
@ijuma @hachikuji Do you have time to review and merge this bug fix for Kafka 2.0 release? Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, left a few comments and questions.
@@ -162,6 +160,11 @@ class LogManager(logDirs: Seq[File], | |||
} | |||
if (!dir.isDirectory || !dir.canRead) | |||
throw new IOException(dir.getAbsolutePath + " is not a readable log directory.") | |||
|
|||
if (canonicalPaths.contains(dir.getCanonicalPath)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth adding a comment that getCanonicalPath
can fail if the disk is bad.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. Added comment getCanonicalPath can throw IOException if the disk is bad
.
@@ -162,6 +160,11 @@ class LogManager(logDirs: Seq[File], | |||
} | |||
if (!dir.isDirectory || !dir.canRead) | |||
throw new IOException(dir.getAbsolutePath + " is not a readable log directory.") | |||
|
|||
if (canonicalPaths.contains(dir.getCanonicalPath)) | |||
throw new KafkaException("Duplicate log directory found: " + dirs.mkString(", ")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use String interpolation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. I updated this method to use interpolation. Will take care in the future :)
|
||
if (canonicalPaths.contains(dir.getCanonicalPath)) | ||
throw new KafkaException("Duplicate log directory found: " + dirs.mkString(", ")) | ||
canonicalPaths.add(dir.getCanonicalPath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can simplify by just checking the result of calling add
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Updated the patch as suggested.
*/ | ||
@Test | ||
def testCreateLogWithInvalidLogDir() { | ||
val dirs = Seq(logDir, new File("\u0000")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, is this the behaviour we want? It seems to me that we should just fail start-up if log dirs is invalid (like this case). But we should handle the case where getCanonicalPath
fails due to a bad disk better (which I think is the goal of this PR).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this is just a hack in the test to let getCanonicalPath
throw exception. I agree that we should just fail startup if the dirs is invalid. Is there a way to determine whether the log dir is invalid other than using getCanonicalPath
? If it is not easy, then maybe we can just stick to the current behavior where we will mark the log directory as offline if the log dir is invalid. User will notice the issue using either OfflineLogDirectoryCount sensor or by checking the exception in the log.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lindong28 Seems like the method is package private:
final boolean isInvalid() {
if (status == null) {
status = (this.path.indexOf('\u0000') < 0) ? PathStatus.CHECKED
: PathStatus.INVALID;
}
return status == PathStatus.INVALID;
}
It's pretty trivial though, so we could just copy the implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ijuma It seems like a naive test and the Java doc says returning false does not guarantee that the path is valid
. Not sure if it is worthwhile to add 10 lines of code just to detect null character in the value of log directory config.
If we add this check in source code, I currently don't have a good idea to let getCanonicalPath() throw IOException. Do you like me to add this check in source code and then find another way to test the fix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, let's expand the comment about getCanonicalPath
throwing an exception if the disk is bad to also mention that an invalid path throws the same error and we treat both the same at the moment. Also mention near where the File is created in the test that this triggers the same code path, but it's not the case we are trying to catch. It's just easier to simulate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Thanks for the suggestion. I have updated the comment as suggested.
Added to 2.0.0 milestone since this is a bug fix and it is ready for review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tweaked the comments a little, LGTM.
A broker with multiple log dirs will die on startup if dir.getCanonicalPath() throws IOException for one of the log dirs. We should mark such log directory as offline instead and the broker should start if there is a healthy log dir. Reviewers: Ismael Juma <[email protected]>
) A broker with multiple log dirs will die on startup if dir.getCanonicalPath() throws IOException for one of the log dirs. We should mark such log directory as offline instead and the broker should start if there is a healthy log dir. Reviewers: Ismael Juma <[email protected]>
Currently JBOD configured broker will still die on startup if dir.getCanonicalPath() throws IOException. We should mark such log directory as offline and broker should still run if there is good disk.
Committer Checklist (excluded from commit message)