
OSDs fail to come online with multiple worker nodes in the same AZ. #131

Closed
davidvossel opened this issue Sep 17, 2019 · 5 comments
Labels: lifecycle/rotten

Comments

@davidvossel
Contributor

On AWS, some OSDs fail to come online when multiple worker nodes exist in the same availability zone (AZ).

The only consistent factor I've been able to identify is that OSDs fail when two worker nodes exist in the same AZ. I don't understand the root cause yet. Below are the data points I have:

Environments that work

Success: 3/3 OSDs come online

  • Replica 3
  • 3 worker nodes
  • Even distribution across 3 AZs

Success: 2/2 OSDs come online

  • Replica 2
  • 2 worker nodes
  • Even distribution across 2 AZs

Environments that fail

Failure: 2/3 or 1/3 OSDs come online

  • Replica 3
  • 3 worker nodes
  • Even distribution across 2 AZs

Failure: 1/2 OSDs come online

  • Replica 2
  • 3 worker nodes
  • Even distribution across 2 AZs
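To check the node-to-AZ spread for any of the configurations above, something like the following works (the zone label shown is the one in use on clusters of this vintage; newer clusters use topology.kubernetes.io/zone):

  # List nodes together with the availability zone each one sits in
  kubectl get nodes -L failure-domain.beta.kubernetes.io/zone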

Failure debug

Events:
  Type     Reason              Age                 From                                   Message
  ----     ------              ----                ----                                   -------
  Normal   Scheduled           46m                 default-scheduler                      Successfully assigned openshift-storage/rook-ceph-osd-0-574bb9858b-kfltj to ip-10-0-129-254.ec2.internal
  Warning  FailedAttachVolume  46m                 attachdetach-controller                Multi-Attach error for volume "pvc-230b857f-d8af-11e9-b766-0e15af88a5cc" Volume is already used by pod(s) rook-ceph-osd-prepare-example-deviceset-2-khhxc-22t6n
  Warning  FailedMount         88s (x20 over 44m)  kubelet, ip-10-0-129-254.ec2.internal  Unable to mount volumes for pod "rook-ceph-osd-0-574bb9858b-kfltj_openshift-storage(43ee9141-d8af-11e9-bed1-0a89cbfd94e0)": timeout expired waiting for volumes to attach or mount for pod "openshift-storage"/"rook-ceph-osd-0-574bb9858b-kfltj". list of unmounted volumes=[example-deviceset-2-khhxc]. list of unattached volumes=[rook-data rook-config-override rook-ceph-log devices example-deviceset-2-khhxc example-deviceset-2-khhxc-bridge rook-binaries run-udev rook-ceph-osd-token-fcbk9]
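The Multi-Attach error above suggests the EBS volume is still attached to whichever node ran the prepare job. One way to confirm this (pod names are taken from the events above; NODE is a placeholder for the prepare job's node, which the first command reveals):

  # Compare the node the prepare pod ran on with the node the osd pod
  # was scheduled to; they differ in the failing case.
  kubectl -n openshift-storage get pods -o wide | grep osd

  # Check whether the prepare job's node still reports the EBS volume
  # as attached (substitute the node name for NODE).
  kubectl get node NODE -o jsonpath='{.status.volumesAttached}'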
@davidvossel
Contributor Author

We have a root cause for why this is occurring.

The issue is that the PVC cannot be detached from the node where the OSD's prepare job ran. If the resulting OSD pod is scheduled on a different node than the prepare job, the PVC can never be attached there, since an EBS-backed volume can only be attached to one node at a time.

This is why we only hit the issue when there is more than one node in an AZ. An EBS volume is also bound to a single AZ, so when there is only one node per AZ the scheduler has no other option and always places the OSD on the same node the prepare job ran on.

Rook is tracking a fix for this in rook/rook#3755.
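Until that lands, a possible manual workaround (my assumption, not what rook/rook#3755 implements, and the Rook operator may revert the patch) is to pin the stuck OSD deployment to the node its prepare job ran on, so the already-attached volume can be mounted:

  # 1. Find the node that ran the prepare job (pod name from the events above):
  kubectl -n openshift-storage get pod \
    rook-ceph-osd-prepare-example-deviceset-2-khhxc-22t6n \
    -o jsonpath='{.spec.nodeName}'

  # 2. Pin the OSD deployment to that node (replace NODE with the output above):
  kubectl -n openshift-storage patch deployment rook-ceph-osd-0 -p \
    '{"spec":{"template":{"spec":{"nodeSelector":{"kubernetes.io/hostname":"NODE"}}}}}'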

@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot added the lifecycle/stale label on Sep 17, 2020
@openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Oct 19, 2020
@openshift-bot

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci-robot

@openshift-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
