-
Notifications
You must be signed in to change notification settings - Fork 6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
librbd: compare and write against a clone can result in failure #18887
Conversation
4ce1871
to
bfac405
Compare
Test run (pre-shrink object map fix): http://pulpito.ceph.com/jdillaman-2017-11-12_12:43:25-rbd-wip-jd-testing-distro-basic-smithi/ |
... fsx tests now passing after fixing shrink issue: http://pulpito.ceph.com/jdillaman-2017-11-13_11:22:59-rbd-wip-jd-testing-distro-basic-smithi/ |
src/librbd/io/CopyupRequest.cc
Outdated
for (size_t i=0; i<m_pending_requests.size(); ++i) { | ||
ObjectRequest<> *req = m_pending_requests[i]; | ||
auto req = m_pending_requests[i]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: probably for (auto req : m_pending_requests)
instead?
src/librbd/io/ObjectRequest.cc
Outdated
librados::ObjectWriteOperation *wr) { | ||
I *image_ctx = this->m_ictx; | ||
RWLock::RLocker snap_locker(image_ctx->snap_lock); | ||
if (image_ctx->enable_alloc_hint && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: may be no need to check for image_ctx->enable_alloc_hint
here as it will be checked by ObjectRequest<I>::add_write_hint()
anyway?
@dillaman fsx failure: Reproduced locally with:
|
Another failure case: http://pulpito.ceph.com/trociny-2017-11-14_21:42:50-rbd-wip-mgolub-testing-distro-basic-smithi/ Reproduced locally with:
(It looks like rbd-nbd is important as I can't reproduce without |
@trociny The |
... of course rbd-nbd does not function at all on F26. |
Hm, is this something F26 specific? Just in case, didn't you forget to set PATH properly in root environment? |
@trociny No, it starts rbd-nbd daemon and maps the device just fine. The first time I ran it it immediately failed on the first write call with an "out of space" error. The second time I ran it (after cleaning up my environment), it failed with "no extend on truncate! not posix" -- so it doesn't seem to be picking up the resize operations correctly. |
... I'm playing around w/ it to see if I can get it working while an Ubuntu build is in-progress. |
src/test/librbd/fsx.cc
Outdated
@@ -2069,6 +2069,13 @@ docompareandwrite(unsigned offset, unsigned size) | |||
if (o_direct) | |||
size -= size % writebdy; | |||
|
|||
if (skip_partial_discard) { | |||
if (!quiet && testcalls > simulatedopcount && !o_direct) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose !o_direct
check is not needed here. I assume it is if (size == 0)
specific because we may truncate size in o_direct
case.
Probably it is a good idea to move this if (skip_partial_discard)
block above if (o_direct)
block or below if (size == 0)
block as they look related.
@trociny I had to add the following to get F26 to rescan the partition table and pick up the size change on the
|
@trociny Fixed the issue -- the |
@dillaman TestMockIoObjectRequest need tweaking. See jenkins logs. |
@trociny Sorry about that -- should be all better now. |
src/librbd/io/CopyupRequest.cc
Outdated
@@ -264,7 +256,11 @@ bool CopyupRequest<I>::should_complete(int r) | |||
|
|||
case STATE_COPYUP: | |||
// invoked via a finisher in librados, so thread safe |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dillaman I assume this comment is not relevant now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yanked it
…ent file Signed-off-by: Jason Dillaman <[email protected]>
Signed-off-by: Jason Dillaman <[email protected]>
Signed-off-by: Jason Dillaman <[email protected]>
The ReadResult wrapper requires the buffer extents to know how to properly reconstruct the out buffer. Signed-off-by: Jason Dillaman <[email protected]>
Signed-off-by: Jason Dillaman <[email protected]>
Signed-off-by: Jason Dillaman <[email protected]>
Signed-off-by: Jason Dillaman <[email protected]>
This adds support for sparse-reads and ensures all object reads utilize a single, tested code path. Signed-off-by: Jason Dillaman <[email protected]>
The compare-and-write object operation cannot be executed concurrently within a copyup operation since the object might not exist yet (if not performing a deep-copy). Signed-off-by: Jason Dillaman <[email protected]>
The initial copyup was not receiving a write hint and the code for hints was duplicated multiple times. Additionally, the object map state should match the last executed write op. Signed-off-by: Jason Dillaman <[email protected]>
…e machine Signed-off-by: Jason Dillaman <[email protected]>
Fixes: http://tracker.ceph.com/issues/20789 Signed-off-by: Jason Dillaman <[email protected]>
Signed-off-by: Jason Dillaman <[email protected]>
@ceph-jenkins retest this please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
No description provided.