Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KAFKA-6697; JBOD configured broker should not die if log directory is invalid #4752

Merged
merged 5 commits into from
Jun 20, 2018

Conversation

lindong28
Copy link
Member

@lindong28 lindong28 commented Mar 21, 2018

Currently JBOD configured broker will still die on startup if dir.getCanonicalPath() throws IOException. We should mark such log directory as offline and broker should still run if there is good disk.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@ijuma
Copy link
Contributor

ijuma commented Mar 21, 2018

Thanks for the PR. Can we add a test?

@lindong28
Copy link
Member Author

@ijuma Yeah I am going to add a test. I am opening the pull request and Becket told me this looks good to him. I am applying the patch internally at LinkedIn first before working on the test.

@lindong28
Copy link
Member Author

@ijuma Sorry for the delay. Unit test is added now.

@lindong28 lindong28 force-pushed the KAFKA-6697 branch 3 times, most recently from db0fc20 to 8043ec4 Compare March 24, 2018 19:12
@lindong28
Copy link
Member Author

lindong28 commented May 29, 2018

@ijuma @hachikuji Do you have time to review and merge this bug fix for Kafka 2.0 release? Thanks!

Copy link
Contributor

@ijuma ijuma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, left a few comments and questions.

@@ -162,6 +160,11 @@ class LogManager(logDirs: Seq[File],
}
if (!dir.isDirectory || !dir.canRead)
throw new IOException(dir.getAbsolutePath + " is not a readable log directory.")

if (canonicalPaths.contains(dir.getCanonicalPath))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth adding a comment that getCanonicalPath can fail if the disk is bad.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Added comment getCanonicalPath can throw IOException if the disk is bad.

@@ -162,6 +160,11 @@ class LogManager(logDirs: Seq[File],
}
if (!dir.isDirectory || !dir.canRead)
throw new IOException(dir.getAbsolutePath + " is not a readable log directory.")

if (canonicalPaths.contains(dir.getCanonicalPath))
throw new KafkaException("Duplicate log directory found: " + dirs.mkString(", "))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use String interpolation?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I updated this method to use interpolation. Will take care in the future :)


if (canonicalPaths.contains(dir.getCanonicalPath))
throw new KafkaException("Duplicate log directory found: " + dirs.mkString(", "))
canonicalPaths.add(dir.getCanonicalPath)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can simplify by just checking the result of calling add?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Updated the patch as suggested.

*/
@Test
def testCreateLogWithInvalidLogDir() {
val dirs = Seq(logDir, new File("\u0000"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, is this the behaviour we want? It seems to me that we should just fail start-up if log dirs is invalid (like this case). But we should handle the case where getCanonicalPath fails due to a bad disk better (which I think is the goal of this PR).

Copy link
Member Author

@lindong28 lindong28 May 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this is just a hack in the test to let getCanonicalPath throw exception. I agree that we should just fail startup if the dirs is invalid. Is there a way to determine whether the log dir is invalid other than using getCanonicalPath? If it is not easy, then maybe we can just stick to the current behavior where we will mark the log directory as offline if the log dir is invalid. User will notice the issue using either OfflineLogDirectoryCount sensor or by checking the exception in the log.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lindong28 Seems like the method is package private:

final boolean isInvalid() {
        if (status == null) {
            status = (this.path.indexOf('\u0000') < 0) ? PathStatus.CHECKED
                                                       : PathStatus.INVALID;
        }
        return status == PathStatus.INVALID;
    }

It's pretty trivial though, so we could just copy the implementation.

Copy link
Member Author

@lindong28 lindong28 Jun 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ijuma It seems like a naive test and the Java doc says returning false does not guarantee that the path is valid. Not sure if it is worthwhile to add 10 lines of code just to detect null character in the value of log directory config.

If we add this check in source code, I currently don't have a good idea to let getCanonicalPath() throw IOException. Do you like me to add this check in source code and then find another way to test the fix?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, let's expand the comment about getCanonicalPath throwing an exception if the disk is bad to also mention that an invalid path throws the same error and we treat both the same at the moment. Also mention near where the File is created in the test that this triggers the same code path, but it's not the case we are trying to catch. It's just easier to simulate.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Thanks for the suggestion. I have updated the comment as suggested.

@lindong28 lindong28 added this to the 2.0.0 milestone Jun 14, 2018
@lindong28
Copy link
Member Author

Added to 2.0.0 milestone since this is a bug fix and it is ready for review.

@ijuma ijuma self-assigned this Jun 18, 2018
Copy link
Contributor

@ijuma ijuma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tweaked the comments a little, LGTM.

@ijuma ijuma merged commit fd969be into apache:trunk Jun 20, 2018
ijuma pushed a commit that referenced this pull request Jun 20, 2018
A broker with multiple log dirs will die on startup if dir.getCanonicalPath() throws
IOException for one of the log dirs. We should mark such log directory as offline
instead and the broker should start if there is a healthy log dir.

Reviewers: Ismael Juma <[email protected]>
@lindong28 lindong28 deleted the KAFKA-6697 branch June 21, 2018 19:14
ying-zheng pushed a commit to ying-zheng/kafka that referenced this pull request Jul 6, 2018
)

A broker with multiple log dirs will die on startup if dir.getCanonicalPath() throws
IOException for one of the log dirs. We should mark such log directory as offline
instead and the broker should start if there is a healthy log dir.

Reviewers: Ismael Juma <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants