HADOOP-16729 out of band deletes #952
💔 -1 overall
This message was automatically generated. |
…testFileMetadataExpiresTtl
* Tests for being in sync/not being in sync now compare etag and version IDs. Highlights that we aren't always getting back version IDs.
* Parameterized OOB test now includes auth flag in method name, for ease of debugging
* minor: formatting
Change-Id: I7ea062d6996c9ca00d036347c310e5d2e0fa60fe
26b6f09
to
b5e394d
Compare
🎊 +1 overall
This message was automatically generated. |
Incorporates HADOOP-16368 "S3A list operation doesn't pick up etags from results" so that the tests can use that to validate consistency rather than just timestamps. Also includes some changes to ITestS3GuardOutOfBandOperations added while trying to debug the problem.
Change-Id: Id809886841442a8cc42bff8f7046ade69b94e013
Change-Id: Ic0a0710d2092c2e60eabbb7bc140fac3a1545297
Tested: S3 Ireland (versioned) with s3guard. One failure: the usual intermittent ITestS3AContractGetFileStatusV1List.
|
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
I have just pushed up a PR with changes. If I didn't need this in so that I could base my own PR atop it, I'd be seriously considering saying "Use Java 8 time over millis, as it guarantees that there won't be any bits of the code which assume it is seconds" latter is awful about assessing the value of all enumerated moves & countermoves.
In this instance, S3Guard.addAncestors(), which walks up the tree, uses an isDeleted() check. Should that include TTL probes?
Note: I'm not going to do that now, because I've pushed that work further into DDB itself (needed to deal with scale issues); if changes are needed then they should be based off that patch. Please review that code and suggest the next action.
Imports
Keep the ordering of imports as we expect.
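For the "Java 8 time over millis" point, a minimal sketch of what a typed TTL check could look like. `TtlCheck` and `isExpired` are hypothetical names for illustration only, not S3Guard code; the point is that `Duration` and `Instant` carry their units, so a caller cannot pass seconds where milliseconds are expected.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper, not part of S3Guard: TTL checks on typed time values. */
public final class TtlCheck {
  private TtlCheck() {
  }

  /**
   * Returns true if an entry last updated at {@code lastUpdated} has outlived
   * {@code ttl} as of {@code now}. An entry exactly at its TTL is not expired.
   */
  public static boolean isExpired(Instant lastUpdated, Duration ttl, Instant now) {
    return lastUpdated.plus(ttl).isBefore(now);
  }

  public static void main(String[] args) {
    Instant now = Instant.now();
    Duration ttl = Duration.ofSeconds(30);
    // One minute old with a 30 second TTL: expired.
    System.out.println(isExpired(now.minus(Duration.ofMinutes(1)), ttl, now));
    // Five seconds old with a 30 second TTL: still live.
    System.out.println(isExpired(now.minusSeconds(5), ttl, now));
  }
}
```

Passing `now` in explicitly, rather than reading the clock inside the method, also makes TTL expiry tests deterministic.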
Tests
I got a failure in a test run in teardown of
It's not caused by this PR and is addressed in my rename PR, which doesn't do that cleanup unless fs != null. Also (I believe) unrelated: the Magic Committer ITest is playing up, and as the logs of the AM don't seem to be saved, I can't quite debug it.
Could this be related? Well, iff the path was still in S3 and the tombstone hadn't expired, yes. But I doubt that. Something I will worry about myself. I'll add some more assertions.... Plus that v1 listing again.
Next Steps
Gabor - pull this test down and do the scale test runs with all the options (auth, nonauth, local) and see how well it goes. If we don't see problems or we believe they are transient unrelated issues, then I'll vote on this later on today |
💔 -1 overall
This message was automatically generated. |
My test results:
local: error seems unrelated.
dynamo: testMRJob failure is known; testDynamoTableTagging failed for me the first time. Unrelated.
|
|
Teardown timeout, "it happens". Converting this from an illegal argument exception to a new PathIOException subclass is part of HADOOP-15183. It happens sporadically as DDB deletion is eventually consistent |
OK, did a final test run, one failure: HADOOP-16375 |
My test failure is clearly unrelated. Gabor's may be related, but only if the delete tombstones expired between the listing and the delete. I'll leave him to improve the test debugging on a failure. (Catch the IOE, do a ContractTestUtils.lsR, print it, etc.) +1 for this as is. |
Further test results (running with
sequential-integration-tests
Maybe bump up the timeout? So this actually means that there's only a timeout and the flaky ITestS3AContractGetFileStatusV1List. |
That's a sign that somehow the delete didn't go through. I wouldn't say "keep extending it"; it is more likely that there's something there which wasn't deleted. That could be some other test putting it in, or a failure of that first list call to find and delete everything. I'd suggest thinking of "what diagnostics could you collect" rather than just hoping the problem will go away with timeouts, e.g. a deep listing of directory trees before the rm is started, and again on every listing != empty outcome. |
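A rough sketch of that diagnostics pattern, using `java.nio.file` on a local directory as a stand-in for the Hadoop `FileSystem` API so that it runs on its own. `TeardownDiagnostics`, `deepList`, `leftoverReport` and `deleteAndVerify` are hypothetical names, not existing test utilities; a real S3A test would do the listing through the filesystem under test.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical teardown helper: deep-list before rm, and again if anything survives. */
public final class TeardownDiagnostics {
  private TeardownDiagnostics() {
  }

  /** Recursively lists every entry under dir (excluding dir itself), sorted. */
  public static List<String> deepList(Path dir) throws IOException {
    if (!Files.exists(dir)) {
      return Collections.emptyList();
    }
    try (Stream<Path> walk = Files.walk(dir)) {
      return walk.filter(p -> !p.equals(dir))
          .map(Path::toString)
          .sorted()
          .collect(Collectors.toList());
    }
  }

  /** Builds the message to log or throw when a delete leaves entries behind. */
  public static String leftoverReport(List<String> before, List<String> after) {
    return "Delete left " + after.size() + " entries behind.\n"
        + "Listing before delete:\n  " + String.join("\n  ", before) + "\n"
        + "Listing after delete:\n  " + String.join("\n  ", after);
  }

  /** Deletes dir recursively and fails with both listings if anything is left. */
  public static void deleteAndVerify(Path dir) throws IOException {
    List<String> before = deepList(dir);
    if (Files.exists(dir)) {
      List<Path> paths;
      try (Stream<Path> walk = Files.walk(dir)) {
        // Reverse order puts children ahead of their parent directories.
        paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
      }
      for (Path p : paths) {
        Files.delete(p);
      }
    }
    List<String> after = deepList(dir);
    if (!after.isEmpty()) {
      throw new IOException(leftoverReport(before, after));
    }
  }
}
```

On a local filesystem the failure branch should never fire; against an eventually consistent store, the "before" and "after" listings in the exception are what would show whether another test added the entry or the first listing missed it.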
I should add: both these test failures are related to listing directories in one form or another, during the test and after previous tests created (and should have cleaned up) files from immediately previous test cases. Which means that they are potentially signs of a problem. Interesting that it is only ever the v1 test which fails. Note: we can't just say "oh, this is eventual consistency at work", not if the tests are running with S3Guard on. With S3Guard enabled, it is an implicit requirement that no tests fail from eventual consistency errors on listings. HEAD/GET calls may still have some surprises |
* fix up javadocs Change-Id: I8652eceeb8010c82a9892378c434ee5db2ad51f9
6adb4ac
to
eb534b0
Compare
Change-Id: Iac23037e039b9a3eea2060d65fbf097198064ec8
merged with trunk |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
…job redeploys Author: Ray Matharu <[email protected]> Reviewers: Jagadish<[email protected]> Closes apache#952 from rmatharu/test-standbyimprovements
This is Gabor's #802 PR rebased to trunk and with some extra changes on top