HDFS-17453. IncrementalBlockReport can have race condition with Edit Log Tailer #6708
Conversation
I am planning to add another unit test that simulates this race condition.
In principle this looks like the proper fix.
Let's add the other unit test you mentioned.
💔 -1 overall
This message was automatically generated.
@goiri I have added the unit test which reproduces the exact race condition. I confirmed this by testing the unit test against a branch without the fix and caught the false corrupt replicas.
@sodonnel Could you take a look at my PR? I think it addresses the issue you mentioned here https://issues.apache.org/jira/browse/HDFS-15422?focusedCommentId=17287194&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17287194
@kihwal Could you take a look at my PR? I think it addresses the issue you mentioned here https://issues.apache.org/jira/browse/HDFS-14941?focusedCommentId=17140156&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17140156
spies.add(nnSpy);
}

Thread.sleep(1000);
Can we do better than sleep?
waitFor() something is better.
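The suggestion above is to poll for a condition with a bounded timeout instead of a fixed `Thread.sleep(1000)`, which is both slower and flakier. A minimal self-contained sketch of such a waitFor-style helper is below; the name `waitFor` and its parameters mirror the spirit of Hadoop's `GenericTestUtils.waitFor`, but this is an illustrative standalone version, not the real Hadoop API.

```java
import java.util.function.Supplier;

public class WaitForSketch {
  // Poll check every checkEveryMillis until it returns true,
  // failing if timeoutMillis elapses first.
  static void waitFor(Supplier<Boolean> check, long checkEveryMillis,
                      long timeoutMillis) throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMillis;
    while (!check.get()) {
      if (System.currentTimeMillis() > deadline) {
        throw new IllegalStateException("Timed out waiting for condition");
      }
      Thread.sleep(checkEveryMillis);
    }
  }

  public static void main(String[] args) throws Exception {
    long start = System.currentTimeMillis();
    // The condition becomes true after ~200 ms; waitFor returns as soon
    // as it does, rather than always sleeping a fixed full second.
    waitFor(() -> System.currentTimeMillis() - start >= 200, 10, 5000);
    System.out.println("condition met");
  }
}
```

The advantage in a test is that the wait ends as soon as the condition holds, and a genuine failure surfaces as a timeout rather than a silent race.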
// the old reported block will be processed and marked as corrupt by the ANN.
// See HDFS-17453
int size = queue.size();
if (queue.removeIf(rbi -> rbi.storageInfo.equals(storageInfo) &&
We could make this more robust to nulls with:
void enqueueReportedBlock(DatanodeStorageInfo storageInfo, Block block,
ReplicaState reportedState) {
if (storageInfo == null || block == null || reportedState == null) {
return;
}
...
if (queue.removeIf(rbi -> storageInfo.equals(rbi.storageInfo) &&
...
I added these null checks, but I needed to remove the reportedState null check because a null value is used by removeStoredBlock.
🎊 +1 overall
This message was automatically generated.
@dannytbecker thanks for your work. I'm just wondering what the case is where one block has three generation stamps.
Description of PR
Summary
There is a race condition between IncrementalBlockReports (IBR) and EditLogTailer in Standby NameNode (SNN) which can lead to leaked IBRs and false corrupt blocks after HA Failover. The race condition occurs when the SNN loads the edit logs before it receives the block reports from DataNode (DN).
Example
In the following example there is a block (b1) with 3 generation stamps (gs1, gs2, gs3).
If the example above happens for every DN which stores b1, then when the HA failover happens, b1 will be incorrectly marked as corrupt. This will be fixed when the first DN sends a FullBlockReport or an IBR for b1.
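The core of the fix is that when the Standby NameNode queues a new incremental block report for a block it cannot yet process, it first drops any stale queued report for the same replica, so a later edit-log tail cannot replay an outdated generation stamp and flag a false corrupt replica. The following is a toy, self-contained model of that dedup logic; the class and field names (`ReportedBlock`, `storageId`, `genStamp`) are illustrative stand-ins, not the real HDFS types.

```java
import java.util.ArrayDeque;
import java.util.Objects;
import java.util.Queue;

public class PendingIbrSketch {
  static final class ReportedBlock {
    final String storageId;
    final long blockId;
    final long genStamp;
    ReportedBlock(String storageId, long blockId, long genStamp) {
      this.storageId = storageId;
      this.blockId = blockId;
      this.genStamp = genStamp;
    }
  }

  final Queue<ReportedBlock> queue = new ArrayDeque<>();

  void enqueueReportedBlock(ReportedBlock rbi) {
    // Drop older queued reports for the same storage and block so only
    // the report with the newest generation stamp remains to be replayed.
    queue.removeIf(q -> Objects.equals(q.storageId, rbi.storageId)
        && q.blockId == rbi.blockId
        && q.genStamp < rbi.genStamp);
    queue.add(rbi);
  }

  public static void main(String[] args) {
    PendingIbrSketch pending = new PendingIbrSketch();
    pending.enqueueReportedBlock(new ReportedBlock("dn1-storage", 1L, 1L)); // gs1
    pending.enqueueReportedBlock(new ReportedBlock("dn1-storage", 1L, 2L)); // gs2 evicts gs1
    pending.enqueueReportedBlock(new ReportedBlock("dn1-storage", 1L, 3L)); // gs3 evicts gs2
    System.out.println(pending.queue.size()); // only the newest report remains
  }
}
```

In this toy run, queuing reports for the same block at gs1, gs2, and gs3 leaves a single entry in the queue, mirroring how the patch prevents the stale gs1/gs2 reports from being processed after failover.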
Logs from Active Cluster
I added the following logs to confirm this issue in an active cluster:
Logs from nn1 (Active):
Logs from nn2 (Standby):
Logs from nn2 when it transitions to Active:
How was this patch tested?
For code changes:
- If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?