userfaultfd: use RCU to free the task struct when fork fails

On the fork failure path, the task structure is freed while
get_mem_cgroup_from_mm() holds rcu_read_lock() and dereferences mm->owner.

  get_mem_cgroup_from_mm()                failing fork()
  ----                                    ---
  task = mm->owner
                                          mm->owner = NULL;
                                          free(task)
  if (task) *task; /* use after free */

The fix is to free the task with RCU in the fork failure path too,
exactly as already happens on the regular exit(2) path.  That is enough
to make the rcu_read_lock() in get_mem_cgroup_from_mm() (left side above)
effective at preventing a use after free when the task structure is
dereferenced.
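To make the ordering argument concrete, here is a deliberately simplified, single-threaded toy model of RCU-deferred freeing in userspace C. Every name in it (toy_task, toy_owner, toy_rcu_read_lock(), toy_call_rcu()) is hypothetical; it only mimics the guarantee the fix relies on and is not the kernel's RCU. The toy conservatively waits until no reader at all is active, whereas real RCU only has to wait for readers that were already running when call_rcu() was queued.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for task_struct; "freed" replaces a real free()
 * so the effect of deferral can be observed. */
struct toy_task {
	int pid;
	int freed;
	struct toy_task *next;      /* link in the pending-callback list */
};

static struct toy_task *toy_owner;  /* stands in for mm->owner */
static int readers;                 /* open read-side critical sections */
static struct toy_task *pending;    /* frees waiting for a grace period */

/* Run deferred frees only once no reader can still hold a reference. */
static void toy_try_end_grace_period(void)
{
	if (readers != 0)
		return;
	while (pending) {
		struct toy_task *t = pending;

		pending = t->next;
		t->freed = 1;       /* the real code would call free_task() */
	}
}

static void toy_rcu_read_lock(void)
{
	readers++;
}

static void toy_rcu_read_unlock(void)
{
	assert(readers > 0);
	readers--;
	toy_try_end_grace_period();
}

/* Analogue of call_rcu(): queue the free instead of doing it now. */
static void toy_call_rcu(struct toy_task *t)
{
	t->next = pending;
	pending = t;
	toy_try_end_grace_period();
}
```

Replaying the diagram above: the reader snapshots toy_owner inside its critical section, the failing fork clears toy_owner and queues the free, and the snapshot stays valid until the reader calls toy_rcu_read_unlock().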

An alternative fix would be to defer delivering the userfaultfd contexts
to the monitor until fork() is guaranteed to succeed.  That would require
more changes, because it would create a strict ordering dependency: the
uffd methods would have to be called after the last branch that can
fail.  This solution, by contrast, only adds a dependency in common code:
if fork ends up failing, set mm->owner to NULL and free, with RCU, the
task struct that mm->owner pointed to.  The userfaultfd methods can still
be called anywhere during fork, and the userland monitor will keep
discarding the orphaned "mm"s coming from failed forks.

This race cannot trigger if CONFIG_MEMCG=n at build time, because
mm->owner is only maintained when CONFIG_MEMCG is enabled.
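The patch's delayed_free_task() picks between call_rcu() and a plain free_task() with IS_ENABLED(CONFIG_MEMCG) rather than #ifdef. The sketch below is a stripped-down, userspace copy of the preprocessor trick IS_ENABLED() is built on (include/linux/kconfig.h), shown under these assumptions: the TOY_ names and toy_free_uses_rcu() are hypothetical, the module (=m) case is dropped, and an extra dummy argument is passed so the variadic macro is valid ISO C.

```c
#include <assert.h>

#define CONFIG_MEMCG 1   /* pretend Kconfig set CONFIG_MEMCG=y */

/* If "option" is defined as 1, token pasting yields TOY_ARG_PLACEHOLDER_1,
 * which expands to "0," and shifts a 1 into the second argument slot.
 * Otherwise the pasted token is junk and the second argument is 0. */
#define TOY_ARG_PLACEHOLDER_1 0,
#define TOY_TAKE_SECOND_ARG(ignored, val, ...) val
#define TOY_IS_ENABLED(option) TOY_IS_DEFINED_1(option)
#define TOY_IS_DEFINED_1(val) TOY_IS_DEFINED_2(TOY_ARG_PLACEHOLDER_##val)
#define TOY_IS_DEFINED_2(arg1_or_junk) \
	TOY_TAKE_SECOND_ARG(arg1_or_junk 1, 0, 0)

/* The result is an integer constant expression. */
static_assert(TOY_IS_ENABLED(CONFIG_MEMCG) == 1, "CONFIG_MEMCG should be on");

/* Same shape as delayed_free_task(): returns 1 when the RCU path is taken. */
static int toy_free_uses_rcu(void)
{
	if (TOY_IS_ENABLED(CONFIG_MEMCG))
		return 1;   /* would be call_rcu(&tsk->rcu, __delayed_free_task) */
	return 0;           /* would be free_task(tsk) */
}
```

Because the macro yields a plain 0 or 1, both branches are still compiled and type-checked and the dead one is left to the optimizer, which is presumably why it was preferred over an #ifdef here.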

[[email protected]: improve changelog, reduce #ifdefs per Michal]
  Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Fixes: 893e26e ("userfaultfd: non-cooperative: Add fork() event")
Signed-off-by: Andrea Arcangeli <[email protected]>
Tested-by: zhong jiang <[email protected]>
Reported-by: [email protected]
Cc: Oleg Nesterov <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: "Kirill A . Shutemov" <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: zhong jiang <[email protected]>
Cc: [email protected]
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
aagit authored and torvalds committed May 15, 2019
1 parent acb2ec3 commit c3f3ce0
Showing 1 changed file (kernel/fork.c) with 29 additions and 2 deletions.
@@ -955,6 +955,15 @@ static void mm_init_aio(struct mm_struct *mm)
 #endif
 }
 
+static __always_inline void mm_clear_owner(struct mm_struct *mm,
+					   struct task_struct *p)
+{
+#ifdef CONFIG_MEMCG
+	if (mm->owner == p)
+		WRITE_ONCE(mm->owner, NULL);
+#endif
+}
+
 static void mm_init_owner(struct mm_struct *mm, struct task_struct *p)
 {
 #ifdef CONFIG_MEMCG
@@ -1343,6 +1352,7 @@ static struct mm_struct *dup_mm(struct task_struct *tsk,
 free_pt:
 	/* don't put binfmt in mmput, we haven't got module yet */
 	mm->binfmt = NULL;
+	mm_init_owner(mm, NULL);
 	mmput(mm);
 
 fail_nomem:
@@ -1726,6 +1736,21 @@ static int pidfd_create(struct pid *pid)
 	return fd;
 }
 
+static void __delayed_free_task(struct rcu_head *rhp)
+{
+	struct task_struct *tsk = container_of(rhp, struct task_struct, rcu);
+
+	free_task(tsk);
+}
+
+static __always_inline void delayed_free_task(struct task_struct *tsk)
+{
+	if (IS_ENABLED(CONFIG_MEMCG))
+		call_rcu(&tsk->rcu, __delayed_free_task);
+	else
+		free_task(tsk);
+}
+
 /*
  * This creates a new process as a copy of the old one,
  * but does not actually start it yet.
@@ -2233,8 +2258,10 @@ static __latent_entropy struct task_struct *copy_process(
 bad_fork_cleanup_namespaces:
 	exit_task_namespaces(p);
 bad_fork_cleanup_mm:
-	if (p->mm)
+	if (p->mm) {
+		mm_clear_owner(p->mm, p);
 		mmput(p->mm);
+	}
 bad_fork_cleanup_signal:
 	if (!(clone_flags & CLONE_THREAD))
 		free_signal_struct(p->signal);
@@ -2265,7 +2292,7 @@ static __latent_entropy struct task_struct *copy_process(
 bad_fork_free:
 	p->state = TASK_DEAD;
 	put_task_stack(p);
-	free_task(p);
+	delayed_free_task(p);
 fork_out:
 	spin_lock_irq(&current->sighand->siglock);
 	hlist_del_init(&delayed.node);

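The conditional in mm_clear_owner() matters because the failing child is not always the owner of p->mm. The sketch below models that check with hypothetical userspace structs (toy_task, toy_mm, toy_fork_cleanup_mm() are illustrative only, and the plain store stands in for WRITE_ONCE()). The assumption being illustrated: after dup_mm() the new mm is owned by the child and must be unpublished, while with CLONE_VM the mm is shared and its owner is normally another task, which must be left alone.

```c
#include <assert.h>
#include <stddef.h>

struct toy_task;

/* Hypothetical stand-in for mm_struct with CONFIG_MEMCG=y. */
struct toy_mm {
	struct toy_task *owner;   /* plays the role of mm->owner */
};

/* Hypothetical stand-in for task_struct. */
struct toy_task {
	struct toy_mm *mm;
};

/* Same logic as mm_clear_owner(): only clear the pointer if it refers
 * to the task that is being torn down. */
static void toy_mm_clear_owner(struct toy_mm *mm, struct toy_task *p)
{
	assert(mm != NULL && p != NULL);
	if (mm->owner == p)
		mm->owner = NULL;   /* the kernel uses WRITE_ONCE() here */
}

/* Models the bad_fork_cleanup_mm step of a failing fork. */
static void toy_fork_cleanup_mm(struct toy_task *p)
{
	if (p->mm)
		toy_mm_clear_owner(p->mm, p);
	/* mmput(p->mm) and, later, delayed_free_task(p) would follow. */
}
```

Clearing the owner before the task is queued for RCU freeing means new readers see NULL, while readers that already loaded the pointer are covered by the grace period.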