OSDs fail to come online with multiple worker nodes in the same AZ. #131
Comments
We have a root cause for this. The PVC cannot be detached from the node that ran the OSD's prepare job. If the resulting OSD is scheduled on a different node from the prepare job, the PVC can never be attached to it. This is why we only hit the issue when there is more than one node in an AZ. When there is only one node per AZ, the scheduler has no other option and always places the OSD on the node the prepare job ran on. Rook is tracking a fix for this in this PR: rook/rook#3755
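The reasoning above can be sketched as a small model. This is only an illustration, not Rook or Kubernetes code: it assumes the volume stays attached to whichever node ran the prepare job, and that the scheduler may place the OSD on any node in the same AZ. The node names and the functions `osd_outcomes` and `can_fail` are hypothetical.

```python
from itertools import product


def osd_outcomes(nodes_in_az):
    """Enumerate every (prepare_node, osd_node) placement within one AZ.

    Assumption (from the root cause above): the PVC stays attached to the
    prepare job's node, so the OSD can only start on that same node.
    """
    outcomes = []
    for prepare_node, osd_node in product(nodes_in_az, repeat=2):
        osd_starts = prepare_node == osd_node
        outcomes.append((prepare_node, osd_node, osd_starts))
    return outcomes


def can_fail(nodes_in_az):
    """True if at least one placement leaves the OSD unable to attach the PVC."""
    return any(not osd_starts for _, _, osd_starts in osd_outcomes(nodes_in_az))


# Hypothetical node names, for illustration only.
print(can_fail(["worker-a1"]))               # one node per AZ
print(can_fail(["worker-a1", "worker-a2"]))  # two nodes in the same AZ
```

With one node the only placement is the matching one, so `can_fail` returns `False`. With two nodes, the placements where the prepare job and the OSD land on different nodes make it return `True`.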
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting If this issue is safe to close now please do so with /lifecycle stale
Stale issues rot after 30d of inactivity. Mark the issue as fresh by commenting If this issue is safe to close now please do so with /lifecycle rotten
Rotten issues close after 30d of inactivity. Reopen the issue by commenting /close
@openshift-bot: Closing this issue.
On AWS, some OSDs fail to come online when multiple worker nodes exist in the same availability zone (AZ).
The only consistent factor I've been able to identify is that OSDs fail when two worker nodes exist in the same AZ. I don't understand the root cause yet. Below are the data points I have:
Environments that work
Success: 3/3 osds come online
Success: 2/2 osds come online
Environments that fail
Failure: 2/3 or 1/3 osds come online
Failure: 1/2 osds come online
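To check whether a cluster matches the failing pattern, one option is to group worker nodes by their zone label. The sketch below is a minimal example that works on a plain dict of node labels, not a live cluster. The helper `zones_with_multiple_workers` and the node names are hypothetical. The two label keys are the standard Kubernetes zone labels (the older beta one and its replacement).

```python
from collections import defaultdict

# Standard Kubernetes zone labels: current key first, legacy beta key second.
ZONE_LABELS = (
    "topology.kubernetes.io/zone",
    "failure-domain.beta.kubernetes.io/zone",
)


def zones_with_multiple_workers(node_labels):
    """Return {zone: [node names]} for zones that hold more than one node.

    node_labels maps a node name to that node's label dict.
    """
    by_zone = defaultdict(list)
    for name, labels in node_labels.items():
        zone = next((labels[key] for key in ZONE_LABELS if key in labels), None)
        if zone is not None:
            by_zone[zone].append(name)
    return {zone: sorted(names) for zone, names in by_zone.items() if len(names) > 1}


# Hypothetical nodes, for illustration only.
nodes = {
    "worker-1": {"failure-domain.beta.kubernetes.io/zone": "us-east-1a"},
    "worker-2": {"failure-domain.beta.kubernetes.io/zone": "us-east-1a"},
    "worker-3": {"failure-domain.beta.kubernetes.io/zone": "us-east-1b"},
}
print(zones_with_multiple_workers(nodes))
```

Any zone listed in the result has two or more workers, which is the layout where the failures above were observed.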
Failure debug