Skip to content

Commit

Permalink
ocfs2: fix deadlock between o2hb thread and o2net_wq
Browse files Browse the repository at this point in the history
The following case may lead to o2net_wq and o2hb thread deadlock on
o2hb_callback_sem.
Currently there are 2 nodes say N1, N2 in the cluster. And N2 down, at
the same time, N3 tries to join the cluster. So N1 will handle node
down (N2) and join (N3) simultaneously.
    o2hb                               o2net_wq
    ->o2hb_do_disk_heartbeat
    ->o2hb_check_slot
    ->o2hb_run_event_list
    ->o2hb_fire_callbacks
    ->down_write(&o2hb_callback_sem)
    ->o2net_hb_node_down_cb
    ->flush_workqueue(o2net_wq)
                                       ->o2net_process_message
                                       ->dlm_query_join_handler
                                       ->o2hb_check_node_heartbeating
                                       ->o2hb_fill_node_map
                                       ->down_read(&o2hb_callback_sem)

No need to take o2hb_callback_sem in dlm_query_join_handler,
o2hb_live_lock is enough to protect live node map.

Signed-off-by: Joseph Qi <[email protected]>
Cc: xMark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: jiangyiwen <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
  • Loading branch information
josephhz authored and torvalds committed Oct 10, 2014
1 parent 5046f18 commit 70e82a1
Show file tree
Hide file tree
Showing 3 changed files with 21 additions and 1 deletion.
19 changes: 19 additions & 0 deletions fs/ocfs2/cluster/heartbeat.c
Original file line number Diff line number Diff line change
Expand Up @@ -2572,6 +2572,25 @@ int o2hb_check_node_heartbeating(u8 node_num)
}
EXPORT_SYMBOL_GPL(o2hb_check_node_heartbeating);

int o2hb_check_node_heartbeating_no_sem(u8 node_num)
{
unsigned long testing_map[BITS_TO_LONGS(O2NM_MAX_NODES)];
unsigned long flags;

spin_lock_irqsave(&o2hb_live_lock, flags);
o2hb_fill_node_map_from_callback(testing_map, sizeof(testing_map));
spin_unlock_irqrestore(&o2hb_live_lock, flags);
if (!test_bit(node_num, testing_map)) {
mlog(ML_HEARTBEAT,
"node (%u) does not have heartbeating enabled.\n",
node_num);
return 0;
}

return 1;
}
EXPORT_SYMBOL_GPL(o2hb_check_node_heartbeating_no_sem);

int o2hb_check_node_heartbeating_from_callback(u8 node_num)
{
unsigned long testing_map[BITS_TO_LONGS(O2NM_MAX_NODES)];
Expand Down
1 change: 1 addition & 0 deletions fs/ocfs2/cluster/heartbeat.h
Original file line number Diff line number Diff line change
Expand Up @@ -80,6 +80,7 @@ void o2hb_fill_node_map(unsigned long *map,
void o2hb_exit(void);
int o2hb_init(void);
int o2hb_check_node_heartbeating(u8 node_num);
int o2hb_check_node_heartbeating_no_sem(u8 node_num);
int o2hb_check_node_heartbeating_from_callback(u8 node_num);
int o2hb_check_local_node_heartbeating(void);
void o2hb_stop_all_regions(void);
Expand Down
2 changes: 1 addition & 1 deletion fs/ocfs2/dlm/dlmdomain.c
Original file line number Diff line number Diff line change
Expand Up @@ -839,7 +839,7 @@ static int dlm_query_join_handler(struct o2net_msg *msg, u32 len, void *data,
* to back off and try again. This gives heartbeat a chance
* to catch up.
*/
if (!o2hb_check_node_heartbeating(query->node_idx)) {
if (!o2hb_check_node_heartbeating_no_sem(query->node_idx)) {
mlog(0, "node %u is not in our live map yet\n",
query->node_idx);

Expand Down

0 comments on commit 70e82a1

Please sign in to comment.