Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HBASE-25903 ReadOnlyZKClient APIs - CompletableFuture.get() calls can cause threads to hang forver when ZK client create throws Non IOException #3293

Merged
merged 2 commits into from
May 31, 2021

Conversation

anoopsjohn
Copy link
Contributor

No description provided.

… cause threads to hang forver when ZK client create throws Non IOException
@@ -344,6 +344,10 @@ private void run() {
} catch (IOException e) {
task.connectFailed(e);
continue;
} catch (Exception e) {
// won't be doing any retries for non IOE.
task.closed(e);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why call closed here? This is a connect failed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually in connect fail case, zk is supposed to throw IOE and we handle that above. The connectFailed will cause the retries. As mentioned in Jira, some old versions of zk has a bug and cause a RuntimeException while zk hostname resolve. So as such, on our dev branches this issue wont happen as we moved to zk 3.5.7
Still I think its better to handle here. Else we will end up hanging the Future.get() call ever. In my cluster version where zk client is old, I see this causes ReplicationSource init to hang forver and WALs getting accumulated and replication lag. Later when we see HDFS disk alerts, then only analyzed. the culprit was DNS which gives temp hostname resolution issues.
I thought like whether we should catch Exception instead of IOE in 1st place and do connectFailed only so that we will get retry. But just changed my mind later because its generic Exception we catch. Theoretically its IOE what we have to retry right? Still am open for change. There is nothing wrong if do few more retries before give up.
BTW in my internal version, i will handle the latter way as I can not upgrade zk client version that easily.
Thoughts Duo?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Apache9 So u prefer handle any Exception as connect failed and so retry? Am ok for that too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, if you think this is an unrecoverable error, we should just quit the whole ReadOnlyZKClient loop, instead of calling closed here but do not break out the loop? If not, why give up retry at the first time?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got ur view Duo. Thanks. Ya I took a middle approach. Thought about this also.
I am very much fine for handling Exception as how we do IOE and do connectFailed(). After retry anyways will come out. Ideally the exception will be IOE. I read like u also okf with that. I will make that change.

@Apache-HBase
Copy link

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 4s Docker mode activated.
-0 ⚠️ yetus 0m 4s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 4m 30s master passed
+1 💚 compile 0m 30s master passed
+1 💚 shadedjars 8m 13s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 28s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 4m 19s the patch passed
+1 💚 compile 0m 30s the patch passed
+1 💚 javac 0m 30s the patch passed
+1 💚 shadedjars 8m 14s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 26s the patch passed
_ Other Tests _
+1 💚 unit 1m 19s hbase-client in the patch passed.
30m 50s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3293/1/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #3293
Optional Tests javac javadoc unit shadedjars compile
uname Linux e360d056b945 4.15.0-65-generic #74-Ubuntu SMP Tue Sep 17 17:06:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / fe70fce
Default Java AdoptOpenJDK-11.0.10+9
Test Results https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3293/1/testReport/
Max. process+thread count 311 (vs. ulimit of 30000)
modules C: hbase-client U: hbase-client
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3293/1/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase
Copy link

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 15s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 4m 21s master passed
+1 💚 compile 0m 28s master passed
+1 💚 shadedjars 9m 27s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 27s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 4m 29s the patch passed
+1 💚 compile 0m 32s the patch passed
+1 💚 javac 0m 32s the patch passed
+1 💚 shadedjars 9m 5s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 21s the patch passed
_ Other Tests _
+1 💚 unit 1m 32s hbase-client in the patch passed.
33m 14s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3293/1/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile
GITHUB PR #3293
Optional Tests javac javadoc unit shadedjars compile
uname Linux 8c6b53ca65d0 4.15.0-136-generic #140-Ubuntu SMP Thu Jan 28 05:20:47 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / fe70fce
Default Java AdoptOpenJDK-1.8.0_282-b08
Test Results https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3293/1/testReport/
Max. process+thread count 220 (vs. ulimit of 30000)
modules C: hbase-client U: hbase-client
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3293/1/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase
Copy link

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 8s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ master Compile Tests _
+1 💚 mvninstall 4m 18s master passed
+1 💚 compile 0m 56s master passed
+1 💚 checkstyle 0m 29s master passed
+1 💚 spotbugs 1m 5s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 4m 3s the patch passed
+1 💚 compile 0m 54s the patch passed
+1 💚 javac 0m 54s the patch passed
+1 💚 checkstyle 0m 27s the patch passed
+1 💚 whitespace 0m 1s The patch has no whitespace issues.
+1 💚 hadoopcheck 20m 1s Patch does not cause any errors with Hadoop 3.1.2 3.2.1 3.3.0.
+1 💚 spotbugs 1m 11s the patch passed
_ Other Tests _
+1 💚 asflicense 0m 13s The patch does not generate ASF License warnings.
43m 9s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3293/1/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #3293
Optional Tests dupname asflicense javac spotbugs hadoopcheck hbaseanti checkstyle compile
uname Linux 82fb7c184d8b 4.15.0-128-generic #131-Ubuntu SMP Wed Dec 9 06:57:35 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / fe70fce
Default Java AdoptOpenJDK-1.8.0_282-b08
Max. process+thread count 86 (vs. ulimit of 30000)
modules C: hbase-client U: hbase-client
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3293/1/console
versions git=2.17.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

… cause threads to hang forver when ZK client create throws Non IOException
@Apache-HBase
Copy link

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 6s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 4m 10s master passed
+1 💚 compile 0m 30s master passed
+1 💚 shadedjars 8m 9s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 26s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 4m 14s the patch passed
+1 💚 compile 0m 29s the patch passed
+1 💚 javac 0m 29s the patch passed
+1 💚 shadedjars 8m 11s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 27s the patch passed
_ Other Tests _
+1 💚 unit 1m 20s hbase-client in the patch passed.
30m 15s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3293/2/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #3293
Optional Tests javac javadoc unit shadedjars compile
uname Linux 30325b650855 4.15.0-65-generic #74-Ubuntu SMP Tue Sep 17 17:06:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 3f7d289
Default Java AdoptOpenJDK-11.0.10+9
Test Results https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3293/2/testReport/
Max. process+thread count 292 (vs. ulimit of 30000)
modules C: hbase-client U: hbase-client
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3293/2/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase
Copy link

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 6s Docker mode activated.
-0 ⚠️ yetus 0m 2s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 4m 24s master passed
+1 💚 compile 0m 27s master passed
+1 💚 shadedjars 9m 8s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 24s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 4m 7s the patch passed
+1 💚 compile 0m 25s the patch passed
+1 💚 javac 0m 25s the patch passed
+1 💚 shadedjars 9m 5s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 23s the patch passed
_ Other Tests _
+1 💚 unit 1m 30s hbase-client in the patch passed.
32m 12s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3293/2/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile
GITHUB PR #3293
Optional Tests javac javadoc unit shadedjars compile
uname Linux 380a81c7b79e 4.15.0-136-generic #140-Ubuntu SMP Thu Jan 28 05:20:47 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 3f7d289
Default Java AdoptOpenJDK-1.8.0_282-b08
Test Results https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3293/2/testReport/
Max. process+thread count 220 (vs. ulimit of 30000)
modules C: hbase-client U: hbase-client
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3293/2/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase
Copy link

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 30s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ master Compile Tests _
+1 💚 mvninstall 4m 22s master passed
+1 💚 compile 1m 4s master passed
+1 💚 checkstyle 0m 30s master passed
+1 💚 spotbugs 1m 7s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 4m 13s the patch passed
+1 💚 compile 1m 1s the patch passed
+1 💚 javac 1m 1s the patch passed
+1 💚 checkstyle 0m 32s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 hadoopcheck 19m 54s Patch does not cause any errors with Hadoop 3.1.2 3.2.1 3.3.0.
+1 💚 spotbugs 1m 19s the patch passed
_ Other Tests _
+1 💚 asflicense 0m 15s The patch does not generate ASF License warnings.
43m 12s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3293/2/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #3293
Optional Tests dupname asflicense javac spotbugs hadoopcheck hbaseanti checkstyle compile
uname Linux 31efc7cdb85c 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 3f7d289
Default Java AdoptOpenJDK-1.8.0_282-b08
Max. process+thread count 96 (vs. ulimit of 30000)
modules C: hbase-client U: hbase-client
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3293/2/console
versions git=2.17.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@anoopsjohn anoopsjohn merged commit 1ccba10 into apache:master May 31, 2021
anoopsjohn added a commit that referenced this pull request May 31, 2021
… cause threads to hang forver when ZK client create throws Non IOException (#3293)

Signed-off-by: zhangduo <[email protected]>
anoopsjohn added a commit that referenced this pull request May 31, 2021
… cause threads to hang forver when ZK client create throws Non IOException (#3293)

Signed-off-by: zhangduo <[email protected]>
anoopsjohn added a commit that referenced this pull request May 31, 2021
… cause threads to hang forver when ZK client create throws Non IOException (#3293)

Signed-off-by: zhangduo <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants