Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix potential deadlock in SchedulingQueue #50

Merged

Conversation

TaoYang526
Copy link
Contributor

There may be a potential deadlock when:
(1) scheduling thread holds the lock of parent queue and waits for read lock of child queue (calling stack: Scheduler#singleStepSchedule -> Scheduler#findAllocationAsks -> Scheduler#findNextAllocationAskCandidate -> Scheduler#sortSubqueuesFromQueue -> SchedulingQueue#GetPendingResource)
and meanwhile
(2) scheduler event handler holds the lock of child queue and waits for read lock of parent queue (calling stack: Scheduler#handleSchedulerEvent -> Scheduler#processAllocationUpdateEvent -> Scheduler#updateSchedulingRequest -> SchedulingQueue#IncPendingResource)

I think the proper sequence of nested lock should be from parent queue to child queue, the contrary way should be corrected.

Deadlock stacks:
image

@yangwwei
Copy link
Contributor

Nice catch @TaoYang526
I will get this merged once we received your ICLA. Thanks!

Copy link
Contributor

@yangwwei yangwwei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@yangwwei yangwwei added the bug Something isn't working label Sep 19, 2019
@yangwwei yangwwei merged commit bf86aa9 into apache:master Sep 19, 2019
@yangwwei
Copy link
Contributor

Thanks @TaoYang526 for getting this fixed!

@TaoYang526
Copy link
Contributor Author

Thanks @yangwwei for the review and commit.

pbacsko pushed a commit to pbacsko/incubator-yunikorn-core that referenced this pull request Feb 5, 2024
The makefile in scheduler-interface module should not use bash syntax
for tests "[[". They do not work in a normal shell.
Bring checks inline with the other repos.

Fixes: apache#50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants