Skip to content

Commit

Permalink
workqueue: fix possible deadlock in idle worker rebinding
Browse files Browse the repository at this point in the history
Currently, rebind_workers() and idle_worker_rebind() are two-way
interlocked.  rebind_workers() waits for idle workers to finish
rebinding and rebound idle workers wait for rebind_workers() to finish
rebinding busy workers before proceeding.

Unfortunately, this isn't enough.  The second wait from idle workers
is implemented as follows.

	wait_event(gcwq->rebind_hold, !(worker->flags & WORKER_REBIND));

rebind_workers() clears WORKER_REBIND, wakes up the idle workers and
then returns.  If CPU hotplug cycle happens again before one of the
idle workers finishes the above wait_event(), rebind_workers() will
repeat the first part of the handshake - set WORKER_REBIND again and
wait for the idle worker to finish rebinding - and this leads to
deadlock because the idle worker would be waiting for WORKER_REBIND to
clear.

This is fixed by adding another interlocking step at the end -
rebind_workers() now waits for all the idle workers to finish the
above WORKER_REBIND wait before returning.  This ensures that all
rebinding steps are complete on all idle workers before the next
hotplug cycle can happen.

This problem was diagnosed by Lai Jiangshan who also posted a patch to
fix the issue, upon which this patch is based.

This is the minimal fix and further patches are scheduled for the next
merge window to simplify the CPU hotplug path.

Signed-off-by: Tejun Heo <[email protected]>
Original-patch-by: Lai Jiangshan <[email protected]>
LKML-Reference: <[email protected]>
  • Loading branch information
htejun committed Sep 5, 2012
1 parent 90beca5 commit ec58815
Showing 1 changed file with 27 additions and 2 deletions.
29 changes: 27 additions & 2 deletions kernel/workqueue.c
Original file line number Diff line number Diff line change
Expand Up @@ -1326,6 +1326,15 @@ static void idle_worker_rebind(struct worker *worker)

/* we did our part, wait for rebind_workers() to finish up */
wait_event(gcwq->rebind_hold, !(worker->flags & WORKER_REBIND));

/*
* rebind_workers() shouldn't finish until all workers passed the
* above WORKER_REBIND wait. Tell it when done.
*/
spin_lock_irq(&worker->pool->gcwq->lock);
if (!--worker->idle_rebind->cnt)
complete(&worker->idle_rebind->done);
spin_unlock_irq(&worker->pool->gcwq->lock);
}

/*
Expand Down Expand Up @@ -1448,12 +1457,28 @@ static void rebind_workers(struct global_cwq *gcwq)
* be cleared inside idle_worker_rebind(). Clear and release.
* Clearing %WORKER_REBIND from this foreign context is safe
* because these workers are still guaranteed to be idle.
*
* We need to make sure all idle workers passed WORKER_REBIND wait
* in idle_worker_rebind() before returning; otherwise, workers can
* get stuck at the wait if hotplug cycle repeats.
*/
for_each_worker_pool(pool, gcwq)
list_for_each_entry(worker, &pool->idle_list, entry)
idle_rebind.cnt = 1;
INIT_COMPLETION(idle_rebind.done);

for_each_worker_pool(pool, gcwq) {
list_for_each_entry(worker, &pool->idle_list, entry) {
worker->flags &= ~WORKER_REBIND;
idle_rebind.cnt++;
}
}

wake_up_all(&gcwq->rebind_hold);

if (--idle_rebind.cnt) {
spin_unlock_irq(&gcwq->lock);
wait_for_completion(&idle_rebind.done);
spin_lock_irq(&gcwq->lock);
}
}

static struct worker *alloc_worker(void)
Expand Down

0 comments on commit ec58815

Please sign in to comment.