-
Notifications
You must be signed in to change notification settings - Fork 13.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KAFKA-13972; Ensure replica state deleted after reassignment cancellation #13107
KAFKA-13972; Ensure replica state deleted after reassignment cancellation #13107
Conversation
core/src/main/scala/kafka/controller/ControllerChannelManager.scala
Outdated
Show resolved
Hide resolved
@jolshan @dajac This patch has been updated to loosen the epoch check on the broker side. The original approach seemed a little risky in the case a reassignment is cancelled and resubmitted. It might be possible for a |
@@ -390,7 +390,7 @@ class ReplicaManager(val config: KafkaConfig, | |||
// epoch, a sentinel value (NoEpoch) is used and bypass the epoch validation. | |||
if (requestLeaderEpoch == LeaderAndIsr.EpochDuringDelete || | |||
requestLeaderEpoch == LeaderAndIsr.NoEpoch || | |||
requestLeaderEpoch > currentLeaderEpoch) { | |||
requestLeaderEpoch >= currentLeaderEpoch) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious -- was the epoch check put in place because we were concerned about stale stop replicas? Just trying to figure out why we need it and the implications for adding the current epoch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this is part of KIP-570. I will take a look.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like the concern there was reassignment. I think equal to the leader epoch is ok though because if we are a replica for the current leader epoch then we can't send a stop replica unless the reassignment was cancelled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the main thing is that a new reassignment will always have a leader epoch bump, so we are still ensured that the StopReplica
cannot be reordered. It is still possible to have a stray replica though if the StopReplica
from a cancellation is received before the initial LeaderAndIsr
when the reassignment began. A better fix would be to bump the epoch after cancellation, but I didn't see a really simple way to do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Thanks for clarifying.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks Jason. I'm excited to see this fix. :)
…tion (apache#13107) When a reassignment is cancelled, we need to delete the partition state of adding replicas. Failing to do so causes "stray" replicas which take up disk space and can cause topicId conflicts if the topic is later recreated. Currently, this logic does not work because the leader epoch does not always get bumped after cancellation. Without a leader epoch bump, the replica will ignore StopReplica requests sent by the controller and the replica may remain online. In this patch, we fix the problem by loosening the epoch check on the broker side when a StopReplica request is received. Instead of ignoring the request when the request epoch matches the current epoch, the request will be accepted. Note, this problem only affects the ZK controller. The integration tests added here nevertheless cover both metadata modes. Reviewers: David Jacot <[email protected]>, Justine Olshan <[email protected]>
…tion (#13107) When a reassignment is cancelled, we need to delete the partition state of adding replicas. Failing to do so causes "stray" replicas which take up disk space and can cause topicId conflicts if the topic is later recreated. Currently, this logic does not work because the leader epoch does not always get bumped after cancellation. Without a leader epoch bump, the replica will ignore StopReplica requests sent by the controller and the replica may remain online. In this patch, we fix the problem by loosening the epoch check on the broker side when a StopReplica request is received. Instead of ignoring the request when the request epoch matches the current epoch, the request will be accepted. Note, this problem only affects the ZK controller. The integration tests added here nevertheless cover both metadata modes. Reviewers: David Jacot <[email protected]>, Justine Olshan <[email protected]>
After this #13107 PR, an if-else block became unreachable. We need remove it and make the code clean. Reviewers: Luke Chen <[email protected]>, Divij Vaidya <[email protected]>
…he#15220) After this apache#13107 PR, an if-else block became unreachable. We need remove it and make the code clean. Reviewers: Luke Chen <[email protected]>, Divij Vaidya <[email protected]>
…he#15220) After this apache#13107 PR, an if-else block became unreachable. We need remove it and make the code clean. Reviewers: Luke Chen <[email protected]>, Divij Vaidya <[email protected]>
…he#15220) After this apache#13107 PR, an if-else block became unreachable. We need remove it and make the code clean. Reviewers: Luke Chen <[email protected]>, Divij Vaidya <[email protected]>
When a reassignment is cancelled, we need to delete the partition state of adding replicas. Failing to do so causes "stray" replicas which take up disk space and can cause topicId conflicts if the topic is later recreated. Currently, this logic does not work because the leader epoch does not always get bumped after cancellation. Without a leader epoch bump, the replica will ignore
StopReplica
requests sent by the controller and the replica may remain online.In this patch, we fix the problem by loosening the epoch check on the broker side when a
StopReplica
request is received. Instead of ignoring the request when the request epoch matches the current epoch, the request will be accepted.Note, this problem only affects the ZK controller. The integration tests added here nevertheless cover both metadata modes.
Committer Checklist (excluded from commit message)