Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

librbd: compare and write against a clone can result in failure #18887

Merged
merged 13 commits into from
Nov 16, 2017

Conversation

dillaman
Copy link

No description provided.

@dillaman
Copy link
Author

@dillaman dillaman changed the title [DNM] librbd: compare and write against a clone can result in failure librbd: compare and write against a clone can result in failure Nov 13, 2017
@dillaman
Copy link
Author

... fsx tests now passing after fixing shrink issue: http://pulpito.ceph.com/jdillaman-2017-11-13_11:22:59-rbd-wip-jd-testing-distro-basic-smithi/

for (size_t i=0; i<m_pending_requests.size(); ++i) {
ObjectRequest<> *req = m_pending_requests[i];
auto req = m_pending_requests[i];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: probably for (auto req : m_pending_requests) instead?

librados::ObjectWriteOperation *wr) {
I *image_ctx = this->m_ictx;
RWLock::RLocker snap_locker(image_ctx->snap_lock);
if (image_ctx->enable_alloc_hint &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: may be no need to check for image_ctx->enable_alloc_hint here as it will be checked by ObjectRequest<I>::add_write_hint() anyway?

@trociny
Copy link
Contributor

trociny commented Nov 15, 2017

@dillaman fsx failure:

http://qa-proxy.ceph.com/teuthology/trociny-2017-11-14_21:38:44-rbd-wip-mgolub-testing-distro-basic-smithi/1849099/teuthology.log

Reproduced locally with:

CEPH_ARGS='--rbd_skip_partial_discard=true' ceph_test_librbd_fsx -d -W -R -p 100 -P /tmp/fsx -r 1 -w 1 -t 1 -h 1 -l 250000000 -S 6423 -N 20000 rbd image_client.0

@trociny
Copy link
Contributor

trociny commented Nov 15, 2017

Another failure case:

http://pulpito.ceph.com/trociny-2017-11-14_21:42:50-rbd-wip-mgolub-testing-distro-basic-smithi/

Reproduced locally with:

ceph_test_librbd_fsx -d -W -R -p 100 -P /tmp/fsx -r 512 -w 512 -t 512 -h 512 -l 250000000 -S 9072 -N 6000 -M rbd image_client.5

(It looks like rbd-nbd is important as I can't reproduce without -M)

@dillaman
Copy link
Author

@trociny The skip_partial_discard failure looks unrelated since compare and write would never have worked given how it's handled by fsx. It might just be better to just disable the compare and write operator when skip_partial_discard is enabled. I'll take a look at the rbd-nbd failure and see if I can reproduce).

@dillaman
Copy link
Author

... of course rbd-nbd does not function at all on F26.

@trociny
Copy link
Contributor

trociny commented Nov 15, 2017

... of course rbd-nbd does not function at all on F26.

Hm, is this something F26 specific? Just in case, didn't you forget to set PATH properly in root environment?

@dillaman
Copy link
Author

@trociny No, it starts rbd-nbd daemon and maps the device just fine. The first time I ran it it immediately failed on the first write call with an "out of space" error. The second time I ran it (after cleaning up my environment), it failed with "no extend on truncate! not posix" -- so it doesn't seem to be picking up the resize operations correctly.

@dillaman
Copy link
Author

... I'm playing around w/ it to see if I can get it working while an Ubuntu build is in-progress.

@@ -2069,6 +2069,13 @@ docompareandwrite(unsigned offset, unsigned size)
if (o_direct)
size -= size % writebdy;

if (skip_partial_discard) {
if (!quiet && testcalls > simulatedopcount && !o_direct)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose !o_direct check is not needed here. I assume it is if (size == 0) specific because we may truncate size in o_direct case.

Probably it is a good idea to move this if (skip_partial_discard) block above if (o_direct) block or below if (size == 0) block as they look related.

@dillaman
Copy link
Author

@trociny I had to add the following to get F26 to rescan the partition table and pick up the size change on the /dev/nbdX device:

diff --git a/src/tools/rbd_nbd/rbd-nbd.cc b/src/tools/rbd_nbd/rbd-nbd.cc
index 371f5301ba..669b986641 100644
--- a/src/tools/rbd_nbd/rbd-nbd.cc
+++ b/src/tools/rbd_nbd/rbd-nbd.cc
@@ -476,6 +476,10 @@ public:
         } else {
           size = new_size;
         }
+        if (ioctl(fd, BLKRRPART, NULL) < 0) {
+            derr << "rescan of partition table failed: " << cpp_strerror(errno)
+                 << dendl;
+        }
         if (image.invalidate_cache() < 0)
             derr << "invalidate rbd cache failed" << dendl;
       }

@dillaman
Copy link
Author

@trociny Fixed the issue -- the ObjectReadRequest was skipping the read immediately if the object map said the object didn't exist. I moved that check to ObjectReadRequest::read_object so that it properly checks the writeback cache first.

@trociny
Copy link
Contributor

trociny commented Nov 15, 2017

@dillaman TestMockIoObjectRequest need tweaking. See jenkins logs.

@dillaman
Copy link
Author

@trociny Sorry about that -- should be all better now.

@@ -264,7 +256,11 @@ bool CopyupRequest<I>::should_complete(int r)

case STATE_COPYUP:
// invoked via a finisher in librados, so thread safe
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dillaman I assume this comment is not relevant now?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yanked it

Jason Dillaman added 12 commits November 16, 2017 07:31
The ReadResult wrapper requires the buffer extents to know
how to properly reconstruct the out buffer.

Signed-off-by: Jason Dillaman <[email protected]>
This adds support for sparse-reads and ensures all object reads
utilize a single, tested code path.

Signed-off-by: Jason Dillaman <[email protected]>
The compare-and-write object operation cannot be executed
concurrently within a copyup operation since the object might not
exist yet (if not performing a deep-copy).

Signed-off-by: Jason Dillaman <[email protected]>
The initial copyup was not receiving a write hint and the
code for hints was duplicated multiple times. Additionally,
the object map state should match the last executed write op.

Signed-off-by: Jason Dillaman <[email protected]>
@trociny
Copy link
Contributor

trociny commented Nov 16, 2017

@ceph-jenkins retest this please

Copy link
Contributor

@trociny trociny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@trociny trociny merged commit 8defef4 into ceph:master Nov 16, 2017
@dillaman dillaman deleted the wip-20789 branch November 16, 2017 17:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants