Skip to content

Commit

Permalink
mm, oom_reaper: fix memory corruption
Browse files Browse the repository at this point in the history
David Rientjes has reported the following memory corruption while the
oom reaper tries to unmap the victims address space

  BUG: Bad page map in process oom_reaper  pte:6353826300000000 pmd:00000000
  addr:00007f50cab1d000 vm_flags:08100073 anon_vma:ffff9eea335603f0 mapping:          (null) index:7f50cab1d
  file:          (null) fault:          (null) mmap:          (null) readpage:          (null)
  CPU: 2 PID: 1001 Comm: oom_reaper
  Call Trace:
     unmap_page_range+0x1068/0x1130
     __oom_reap_task_mm+0xd5/0x16b
     oom_reaper+0xff/0x14c
     kthread+0xc1/0xe0

Tetsuo Handa has noticed that the synchronization inside exit_mmap is
insufficient.  We only synchronize with the oom reaper if
tsk_is_oom_victim which is not true if the final __mmput is called from
a different context than the oom victim exit path.  This can trivially
happen from context of any task which has grabbed mm reference (e.g.  to
read /proc/<pid>/ file which requires mm etc.).

The race would look like this

  oom_reaper		oom_victim		task
						mmget_not_zero
			do_exit
			  mmput
  __oom_reap_task_mm				mmput
  						  __mmput
						    exit_mmap
						      remove_vma
    unmap_page_range

Fix this issue by providing a new mm_is_oom_victim() helper which
operates on the mm struct rather than a task.  Any context which
operates on a remote mm struct should use this helper in place of
tsk_is_oom_victim.  The flag is set in mark_oom_victim and never cleared
so it is stable in the exit_mmap path.

Debugged by Tetsuo Handa.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: 2129258 ("mm: oom: let oom_reap_task and exit_mmap run concurrently")
Signed-off-by: Michal Hocko <[email protected]>
Reported-by: David Rientjes <[email protected]>
Acked-by: David Rientjes <[email protected]>
Cc: Tetsuo Handa <[email protected]>
Cc: Andrea Argangeli <[email protected]>
Cc: <[email protected]>	[4.14]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
  • Loading branch information
Michal Hocko authored and torvalds committed Dec 15, 2017
1 parent bdcf0a4 commit 4837fe3
Show file tree
Hide file tree
Showing 4 changed files with 18 additions and 6 deletions.
9 changes: 9 additions & 0 deletions include/linux/oom.h
Original file line number Diff line number Diff line change
Expand Up @@ -66,6 +66,15 @@ static inline bool tsk_is_oom_victim(struct task_struct * tsk)
return tsk->signal->oom_mm;
}

/*
* Use this helper if tsk->mm != mm and the victim mm needs a special
* handling. This is guaranteed to stay true after once set.
*/
static inline bool mm_is_oom_victim(struct mm_struct *mm)
{
return test_bit(MMF_OOM_VICTIM, &mm->flags);
}

/*
* Checks whether a page fault on the given mm is still reliable.
* This is no longer true if the oom reaper started to reap the
Expand Down
1 change: 1 addition & 0 deletions include/linux/sched/coredump.h
Original file line number Diff line number Diff line change
Expand Up @@ -70,6 +70,7 @@ static inline int get_dumpable(struct mm_struct *mm)
#define MMF_UNSTABLE 22 /* mm is unstable for copy_from_user */
#define MMF_HUGE_ZERO_PAGE 23 /* mm has ever used the global huge zero page */
#define MMF_DISABLE_THP 24 /* disable THP for all VMAs */
#define MMF_OOM_VICTIM 25 /* mm is the oom victim */
#define MMF_DISABLE_THP_MASK (1 << MMF_DISABLE_THP)

#define MMF_INIT_MASK (MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\
Expand Down
10 changes: 5 additions & 5 deletions mm/mmap.c
Original file line number Diff line number Diff line change
Expand Up @@ -3019,20 +3019,20 @@ void exit_mmap(struct mm_struct *mm)
/* Use -1 here to ensure all VMAs in the mm are unmapped */
unmap_vmas(&tlb, vma, 0, -1);

set_bit(MMF_OOM_SKIP, &mm->flags);
if (unlikely(tsk_is_oom_victim(current))) {
if (unlikely(mm_is_oom_victim(mm))) {
/*
* Wait for oom_reap_task() to stop working on this
* mm. Because MMF_OOM_SKIP is already set before
* calling down_read(), oom_reap_task() will not run
* on this "mm" post up_write().
*
* tsk_is_oom_victim() cannot be set from under us
* either because current->mm is already set to NULL
* mm_is_oom_victim() cannot be set from under us
* either because victim->mm is already set to NULL
* under task_lock before calling mmput and oom_mm is
* set not NULL by the OOM killer only if current->mm
* set not NULL by the OOM killer only if victim->mm
* is found not NULL while holding the task_lock.
*/
set_bit(MMF_OOM_SKIP, &mm->flags);
down_write(&mm->mmap_sem);
up_write(&mm->mmap_sem);
}
Expand Down
4 changes: 3 additions & 1 deletion mm/oom_kill.c
Original file line number Diff line number Diff line change
Expand Up @@ -683,8 +683,10 @@ static void mark_oom_victim(struct task_struct *tsk)
return;

/* oom_mm is bound to the signal struct life time. */
if (!cmpxchg(&tsk->signal->oom_mm, NULL, mm))
if (!cmpxchg(&tsk->signal->oom_mm, NULL, mm)) {
mmgrab(tsk->signal->oom_mm);
set_bit(MMF_OOM_VICTIM, &mm->flags);
}

/*
* Make sure that the task is woken up from uninterruptible sleep
Expand Down

0 comments on commit 4837fe3

Please sign in to comment.