Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HIVE-23394: Fix Flaky TestJdbcGenericUDTFGetSplits2#testGenericUDTFOr… #4249

Merged
merged 2 commits into from
May 8, 2023

Conversation

simhadri-g
Copy link
Member

@simhadri-g simhadri-g commented Apr 19, 2023

…derBySplitCount1 and TestJdbcGenericUDTFGetSplits#testGenericUDTFOrderBySplitCount1

What changes were proposed in this pull request?

Fix 2 flaky tests :

  1. TestJdbcGenericUDTFGetSplits2#testGenericUDTFOderBySplitCount1
  2. TestJdbcGenericUDTFGetSplits#testGenericUDTFOrderBySplitCount1

Why are the changes needed?

To get tests to pass we had to :

  1. Remove the conf hive.llap.external.splits.order.by.force.single.split as it no longer exists in hive. https://github.com/apache/hive/pull/2318/files
  2. Break the test into smaller methods to prevent timeout.

Does this PR introduce any user-facing change?

How was this patch tested?

Flaky check test: http://ci.hive.apache.org/job/hive-flaky-check/664/

@ayushtkn
Copy link
Member

The original ticket mentioned:

statements in itests/hive-unit/src/test/java/org/apache/hive/jdbc/AbstractTestJdbcGenericUDTFGetSplits.java are not closed properly - however fixing that doesn't fix the issue...

And the exception isn't test timeout but connection issue, I am not sure if by spliting the test into two parts is a good idea, Should try to solve keeping it together. May be by spliting we are avoiding the problem not solving...

public void testGenericUDTFOrderBySplitCount1() throws Exception {
super.testGenericUDTFOrderBySplitCount1("get_splits", new int[]{10, 1, 0, 2, 2, 2, 1, 10});
super.testGenericUDTFOrderBySplitCount1("get_splits", new int[] { 10, 5, 0 });

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why some splits removed here? If the dataset is not the same, won't hide the actual issue?

public void testGenericUDTFOrderBySplitCount1() throws Exception {
super.testGenericUDTFOrderBySplitCount1("get_llap_splits", new int[]{12, 3, 1, 4, 4, 4, 3, 12});
super.testGenericUDTFOrderBySplitCount1("get_llap_splits", new int[] { 12, 7, 1 });

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

@simhadri-g
Copy link
Member Author

Thanks, @ayushtkn and @TuroczyX for the review.

Looking at the stack trace mentioned in HIVE-23394
The "out of sequence response" error occurs when two threads use the same HiveMetaStoreClient instance.
It seems that the metastore client socket is reading the RPC response from a different call, hence the out-of-sequence exception.

Looking at the tests , there are 2 tests that use AbstractTestJdbcGenericUDTFGetSplits:

  1. TestJdbcGenericUDTFGetSplits2#testGenericUDTFOderBySplitCount1
  2. TestJdbcGenericUDTFGetSplits#testGenericUDTFOrderBySplitCount1

These 2 tests call the same AbstractTestJdbcGenericUDTFGetSplits.java, which spins up a new miniHS2 instance.
I guess if these 2 tests run concurrently, it may cause the test to be flaky.

I think we should merge these 2 tests to a single file and ensure it runs sequentially to fix the flakiness.

Error Message
Failed to close statement
Stacktrace
java.sql.SQLException: Failed to close statement
	at org.apache.hive.jdbc.HiveStatement.closeStatementIfNeeded(HiveStatement.java:200)
	at org.apache.hive.jdbc.HiveStatement.closeClientOperation(HiveStatement.java:205)
	at org.apache.hive.jdbc.HiveStatement.close(HiveStatement.java:222)
	at org.apache.hive.jdbc.AbstractTestJdbcGenericUDTFGetSplits.runQuery(AbstractTestJdbcGenericUDTFGetSplits.java:135)
	at org.apache.hive.jdbc.AbstractTestJdbcGenericUDTFGetSplits.testGenericUDTFOrderBySplitCount1(AbstractTestJdbcGenericUDTFGetSplits.java:164)
	at org.apache.hive.jdbc.TestJdbcGenericUDTFGetSplits2.testGenericUDTFOrderBySplitCount1(TestJdbcGenericUDTFGetSplits2.java:28)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
Caused by: org.apache.thrift.TApplicationException: CloseOperation failed: out of sequence response
	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:84)
	at org.apache.hive.service.rpc.thrift.TCLIService$Client.recv_CloseOperation(TCLIService.java:521)
	at org.apache.hive.service.rpc.thrift.TCLIService$Client.CloseOperation(TCLIService.java:508)
	at sun.reflect.GeneratedMethodAccessor40.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hive.jdbc.HiveConnection$SynchronizedHandler.invoke(HiveConnection.java:1732)
	at com.sun.proxy.$Proxy146.CloseOperation(Unknown Source)
	at org.apache.hive.jdbc.HiveStatement.closeStatementIfNeeded(HiveStatement.java:193)
	... 14 more

@ayushtkn
Copy link
Member

The "out of sequence response" error occurs when two threads use the same HiveMetaStoreClient instance.
It seems that the metastore client socket is reading the RPC response from a different call, hence the out-of-sequence exception.

That sounds like a bug in the RPC layer, two threads sharing HMSClient instance is same as two threads sharing FileSystem instances and invoking calls to server.

The RPC layer should isolate those calls and shouldn't goof up. Every RPC AFAIK should be associated with RPC callId and ClientId, Atleast the Hadoop RPC layer does pass that.

This should be easily reproducible then in a normal multithreading execution?

@simhadri-g
Copy link
Member Author

Yes, the same issue was faced in multiple jiras.

It's well described here as well.
https://issues.apache.org/jira/browse/HIVE-10404

I think it should be fixed now by https://issues.apache.org/jira/browse/HIVE-10956 .

I will check and see if i can still repro.

…derBySplitCount1 and TestJdbcGenericUDTFGetSplits#testGenericUDTFOrderBySplitCount1
@simhadri-g
Copy link
Member Author

simhadri-g commented Apr 27, 2023

Hi Guys,

The leaking thread has been fixed by this PR. #2967

The test we are enabling checks the following 2 udtfs:
"get_splits()"
"get_llap_splits()".

After enabling the test, since both the tests have similar code and I merged it into a single file to reduce code duplication.

Also, because of the following 2 changes, the value of splits has changed. :
252ce32#diff-47b7f6f11c3aabbf6811b6d4d2e48162b68e4d9f30241b956e17b0d3960ec223

76f8182#diff-47b7f6f11c3aabbf6811b6d4d2e48162b68e4d9f30241b956e17b0d3960ec223

At the time these changes were made the test (testGenericUDTFOrderBySplitCount1) was ignored. So they didn’t update they only updated the splits here. But if we look at the testGenericUDTFOrderBySplitCount1OnPartitionedTable partition test in the same file, they have been updated.

Copy link
Member

@deniskuzZ deniskuzZ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM +1, pending tests

@sonarcloud
Copy link

sonarcloud bot commented May 5, 2023

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
No Duplication information No Duplication information

@deniskuzZ deniskuzZ merged commit c0c2031 into apache:master May 8, 2023
@simhadri-g
Copy link
Member Author

Thanks @aturoczy , @deniskuzZ , @ayushtkn for the review!

yeahyung pushed a commit to yeahyung/hive that referenced this pull request Jul 20, 2023
…OrderBySplitCount1 (Simhadri Govindappa, reviewed by Attila Turoczy, Ayush Saxena, Denys Kuzmenko)

Closes apache#4249
tarak271 pushed a commit to tarak271/hive-1 that referenced this pull request Dec 19, 2023
…OrderBySplitCount1 (Simhadri Govindappa, reviewed by Attila Turoczy, Ayush Saxena, Denys Kuzmenko)

Closes apache#4249
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants