mlock: mlocked pages are unevictable
Make sure that mlocked pages also live on the unevictable LRU, so kswapd
will not scan them over and over again.

This is achieved through various strategies:

1) add yet another page flag--PG_mlocked--to indicate that
   the page is locked for efficient testing in vmscan and,
   optionally, fault path.  This allows early culling of
   unevictable pages, preventing them from getting to
   page_referenced()/try_to_unmap().  Also allows separate
   accounting of mlock'd pages, as Nick's original patch
   did.

   Note:  Nick's original mlock patch used a PG_mlocked
   flag.  I had removed this in favor of the PG_unevictable
   flag + an mlock_count [new page struct member].  I
   restored the PG_mlocked flag to eliminate the new
   count field.

2) add the mlock/unevictable infrastructure to mm/mlock.c,
   with internal APIs in mm/internal.h.  This is a rework
   of Nick's original patch to these files, taking into
   account that mlocked pages are now kept on unevictable
   LRU list.

3) update vmscan.c:page_evictable() to check PageMlocked()
   and, if a vma is passed in, its vm_flags.  Note that the vma
   will only be passed in for new pages in the fault path,
   and then only if the "cull unevictable pages in fault
   path" patch is included.  (A sketch of this fault-path
   culling test follows this list.)

4) add try_to_munlock() to rmap.c to walk a page's rmap and
   ClearPageMlocked() if no other vmas have it mlocked.
   Reuses as much of try_to_unmap() as possible.  This
   effectively replaces the use of one of the lru list links
   as an mlock count.  If this mechanism lets pages in mlocked
   vmas leak through without PG_mlocked set [I don't know that
   it does], we should catch them later in try_to_unmap().  One
   hopes this will be rare, as it will be relatively expensive.
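
Strategies 1) and 3) combine into a single test at fault time: a new
page being mapped into a VM_LOCKED, non-VM_SPECIAL vma is marked
PG_mlocked before it ever reaches an LRU list.  The standalone sketch
below models that test after the is_mlocked_vma() helper added to
mm/internal.h in this patch; the flag values and the two tiny structs
are illustrative stand-ins, not kernel ABI.

#include <stdio.h>

#define VM_LOCKED	0x1
#define VM_IO		0x2
#define VM_DONTEXPAND	0x4
#define VM_RESERVED	0x8
#define VM_PFNMAP	0x10
#define VM_SPECIAL	(VM_IO | VM_DONTEXPAND | VM_RESERVED | VM_PFNMAP)

struct vma { unsigned long vm_flags; };	/* stand-in for vm_area_struct */
struct page { int mlocked; };		/* stand-in for PG_mlocked */

/* Mirrors is_mlocked_vma(): cull a new page as mlocked only when its
 * vma is VM_LOCKED and none of the VM_SPECIAL bits are set. */
static int cull_mlocked(struct vma *vma, struct page *page)
{
	if ((vma->vm_flags & (VM_LOCKED | VM_SPECIAL)) != VM_LOCKED)
		return 0;
	page->mlocked = 1;	/* SetPageMlocked() in the kernel */
	return 1;
}

int main(void)
{
	struct vma locked = { VM_LOCKED };
	struct vma locked_io = { VM_LOCKED | VM_IO };
	struct page p1 = { 0 }, p2 = { 0 };

	/* culled: plain mlocked vma */
	printf("mlocked vma: culled=%d\n", cull_mlocked(&locked, &p1));
	/* not culled: VM_SPECIAL vmas are not mlock()able */
	printf("mlocked VM_IO vma: culled=%d\n", cull_mlocked(&locked_io, &p2));
	return 0;
}

A page that fails this test stays on the normal LRUs and is caught
later, if at all, by page_referenced()/try_to_unmap().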

Original mm/internal.h, mm/rmap.c and mm/mlock.c changes:
Signed-off-by: Nick Piggin <[email protected]>

splitlru: introduce __get_user_pages():

  The new munlock processing needs GUP_FLAGS_IGNORE_VMA_PERMISSIONS,
  because the current get_user_pages() can't grab PROT_NONE pages and
  therefore PROT_NONE pages can't be munlocked.
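
  In other words, the munlock path must be able to take a reference on
  every page in the region even when the vma grants no access at all.
  Below is a hedged sketch of the resulting call shape, using the
  __get_user_pages() declaration this patch adds to mm/internal.h; the
  wrapper and its name, munlock_grab_pages(), are illustrative only,
  not the actual mm/mlock.c code.

/* Illustrative only: grab the pages backing [start, start + nr) so
 * their PG_mlocked state can be cleared, even in a PROT_NONE vma. */
static int munlock_grab_pages(struct task_struct *tsk,
			      struct mm_struct *mm,
			      unsigned long start, int nr,
			      struct page **pages)
{
	/* plain get_user_pages() would fail here: a PROT_NONE vma has
	 * neither VM_READ nor VM_WRITE, so its vm_flags check rejects
	 * the range; GUP_FLAGS_IGNORE_VMA_PERMISSIONS skips that check */
	return __get_user_pages(tsk, mm, start, nr,
				GUP_FLAGS_IGNORE_VMA_PERMISSIONS,
				pages, NULL);
}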

[[email protected]: fix this for pagemap-pass-mm-into-pagewalkers.patch]
[[email protected]: untangle patch interdependencies]
[[email protected]: fix things after out-of-order merging]
[[email protected]: fix page-flags mess]
[[email protected]: fix munlock page table walk - now requires 'mm']
[[email protected]: build fix]
[[email protected]: fix truncate race and sevaral comments]
[[email protected]: splitlru: introduce __get_user_pages()]
Signed-off-by: KOSAKI Motohiro <[email protected]>
Signed-off-by: Rik van Riel <[email protected]>
Signed-off-by: Lee Schermerhorn <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Matt Mackall <[email protected]>
Signed-off-by: Hugh Dickins <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Nick Piggin authored and torvalds committed Oct 20, 2008
1 parent 89e004e commit b291f00
Showing 13 changed files with 817 additions and 91 deletions.
5 changes: 5 additions & 0 deletions include/linux/mm.h
@@ -131,6 +131,11 @@ extern unsigned int kobjsize(const void *objp);
#define VM_SequentialReadHint(v) ((v)->vm_flags & VM_SEQ_READ)
#define VM_RandomReadHint(v) ((v)->vm_flags & VM_RAND_READ)

/*
 * special vmas that are non-mergable, non-mlock()able
 */
#define VM_SPECIAL (VM_IO | VM_DONTEXPAND | VM_RESERVED | VM_PFNMAP)

/*
 * mapping from the currently active vm_flags protection bits (the
 * low four bits) to a page protection mask..
19 changes: 16 additions & 3 deletions include/linux/page-flags.h
@@ -96,6 +96,7 @@ enum pageflags {
	PG_swapbacked,		/* Page is backed by RAM/swap */
#ifdef CONFIG_UNEVICTABLE_LRU
	PG_unevictable,		/* Page is "unevictable" */
	PG_mlocked,		/* Page is vma mlocked */
#endif
#ifdef CONFIG_IA64_UNCACHED_ALLOCATOR
	PG_uncached,		/* Page has been mapped as uncached */
@@ -232,7 +233,17 @@ PAGEFLAG_FALSE(SwapCache)
#ifdef CONFIG_UNEVICTABLE_LRU
PAGEFLAG(Unevictable, unevictable) __CLEARPAGEFLAG(Unevictable, unevictable)
	TESTCLEARFLAG(Unevictable, unevictable)

#define MLOCK_PAGES 1
PAGEFLAG(Mlocked, mlocked) __CLEARPAGEFLAG(Mlocked, mlocked)
	TESTSCFLAG(Mlocked, mlocked)

#else

#define MLOCK_PAGES 0
PAGEFLAG_FALSE(Mlocked)
	SETPAGEFLAG_NOOP(Mlocked) TESTCLEARFLAG_FALSE(Mlocked)

PAGEFLAG_FALSE(Unevictable) TESTCLEARFLAG_FALSE(Unevictable)
	SETPAGEFLAG_NOOP(Unevictable) CLEARPAGEFLAG_NOOP(Unevictable)
	__CLEARPAGEFLAG_NOOP(Unevictable)
@@ -354,15 +365,17 @@ static inline void __ClearPageTail(struct page *page)
#endif /* !PAGEFLAGS_EXTENDED */

#ifdef CONFIG_UNEVICTABLE_LRU
#define __PG_UNEVICTABLE (1 << PG_unevictable)
#define __PG_UNEVICTABLE (1 << PG_unevictable)
#define __PG_MLOCKED (1 << PG_mlocked)
#else
#define __PG_UNEVICTABLE 0
#define __PG_UNEVICTABLE 0
#define __PG_MLOCKED 0
#endif

#define PAGE_FLAGS	(1 << PG_lru | 1 << PG_private | 1 << PG_locked | \
			 1 << PG_buddy | 1 << PG_writeback | \
			 1 << PG_slab | 1 << PG_swapcache | 1 << PG_active | \
			 __PG_UNEVICTABLE)
			 __PG_UNEVICTABLE | __PG_MLOCKED)

/*
* Flags checked in bad_page(). Pages on the free list should not have
14 changes: 14 additions & 0 deletions include/linux/rmap.h
@@ -117,6 +117,19 @@ unsigned long page_address_in_vma(struct page *, struct vm_area_struct *);
*/
int page_mkclean(struct page *);

#ifdef CONFIG_UNEVICTABLE_LRU
/*
 * called in munlock()/munmap() path to check for other vmas holding
 * the page mlocked.
 */
int try_to_munlock(struct page *);
#else
static inline int try_to_munlock(struct page *page)
{
	return 0;	/* a.k.a. SWAP_SUCCESS */
}
#endif

#else /* !CONFIG_MMU */

#define anon_vma_init() do {} while (0)
@@ -140,5 +153,6 @@ static inline int page_mkclean(struct page *page)
#define SWAP_SUCCESS 0
#define SWAP_AGAIN 1
#define SWAP_FAIL 2
#define SWAP_MLOCK 3

#endif /* _LINUX_RMAP_H */
71 changes: 71 additions & 0 deletions mm/internal.h
@@ -61,6 +61,10 @@ static inline unsigned long page_order(struct page *page)
	return page_private(page);
}

extern int mlock_vma_pages_range(struct vm_area_struct *vma,
			unsigned long start, unsigned long end);
extern void munlock_vma_pages_all(struct vm_area_struct *vma);

#ifdef CONFIG_UNEVICTABLE_LRU
/*
 * unevictable_migrate_page() called only from migrate_page_copy() to
@@ -79,6 +83,65 @@ static inline void unevictable_migrate_page(struct page *new, struct page *old)
}
#endif

#ifdef CONFIG_UNEVICTABLE_LRU
/*
 * Called only in fault path via page_evictable() for a new page
 * to determine if it's being mapped into a LOCKED vma.
 * If so, mark page as mlocked.
 */
static inline int is_mlocked_vma(struct vm_area_struct *vma, struct page *page)
{
	VM_BUG_ON(PageLRU(page));

	if (likely((vma->vm_flags & (VM_LOCKED | VM_SPECIAL)) != VM_LOCKED))
		return 0;

	SetPageMlocked(page);
	return 1;
}

/*
 * must be called with vma's mmap_sem held for read, and page locked.
 */
extern void mlock_vma_page(struct page *page);

/*
 * Clear the page's PageMlocked(). This can be useful in a situation where
 * we want to unconditionally remove a page from the pagecache -- e.g.,
 * on truncation or freeing.
 *
 * It is legal to call this function for any page, mlocked or not.
 * If called for a page that is still mapped by mlocked vmas, all we do
 * is revert to lazy LRU behaviour -- semantics are not broken.
 */
extern void __clear_page_mlock(struct page *page);
static inline void clear_page_mlock(struct page *page)
{
	if (unlikely(TestClearPageMlocked(page)))
		__clear_page_mlock(page);
}

/*
 * mlock_migrate_page - called only from migrate_page_copy() to
 * migrate the Mlocked page flag
 */
static inline void mlock_migrate_page(struct page *newpage, struct page *page)
{
	if (TestClearPageMlocked(page))
		SetPageMlocked(newpage);
}


#else /* CONFIG_UNEVICTABLE_LRU */
static inline int is_mlocked_vma(struct vm_area_struct *v, struct page *p)
{
	return 0;
}
static inline void clear_page_mlock(struct page *page) { }
static inline void mlock_vma_page(struct page *page) { }
static inline void mlock_migrate_page(struct page *new, struct page *old) { }

#endif /* CONFIG_UNEVICTABLE_LRU */

/*
* FLATMEM and DISCONTIGMEM configurations use alloc_bootmem_node,
@@ -148,4 +211,12 @@ static inline void mminit_validate_memmodel_limits(unsigned long *start_pfn,
}
#endif /* CONFIG_SPARSEMEM */

#define GUP_FLAGS_WRITE			 0x1
#define GUP_FLAGS_FORCE			 0x2
#define GUP_FLAGS_IGNORE_VMA_PERMISSIONS 0x4

int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
		     unsigned long start, int len, int flags,
		     struct page **pages, struct vm_area_struct **vmas);

#endif
56 changes: 49 additions & 7 deletions mm/memory.c
@@ -64,6 +64,8 @@

#include "internal.h"

#include "internal.h"

#ifndef CONFIG_NEED_MULTIPLE_NODES
/* use the per-pgdat data instead for discontigmem - mbligh */
unsigned long max_mapnr;
@@ -1129,12 +1131,17 @@ static inline int use_zero_page(struct vm_area_struct *vma)
	return !vma->vm_ops || !vma->vm_ops->fault;
}

int get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
		unsigned long start, int len, int write, int force,


int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
		unsigned long start, int len, int flags,
		struct page **pages, struct vm_area_struct **vmas)
{
	int i;
	unsigned int vm_flags;
	unsigned int vm_flags = 0;
	int write = !!(flags & GUP_FLAGS_WRITE);
	int force = !!(flags & GUP_FLAGS_FORCE);
	int ignore = !!(flags & GUP_FLAGS_IGNORE_VMA_PERMISSIONS);

	if (len <= 0)
		return 0;
@@ -1158,7 +1165,9 @@ int get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
			pud_t *pud;
			pmd_t *pmd;
			pte_t *pte;
			if (write) /* user gate pages are read-only */

			/* user gate pages are read-only */
			if (!ignore && write)
				return i ? : -EFAULT;
			if (pg > TASK_SIZE)
				pgd = pgd_offset_k(pg);
@@ -1190,8 +1199,9 @@ int get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
			continue;
		}

		if (!vma || (vma->vm_flags & (VM_IO | VM_PFNMAP))
				|| !(vm_flags & vma->vm_flags))
		if (!vma ||
		    (vma->vm_flags & (VM_IO | VM_PFNMAP)) ||
		    (!ignore && !(vm_flags & vma->vm_flags)))
			return i ? : -EFAULT;

		if (is_vm_hugetlb_page(vma)) {
@@ -1266,6 +1276,23 @@ int get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
	} while (len);
	return i;
}

int get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
		unsigned long start, int len, int write, int force,
		struct page **pages, struct vm_area_struct **vmas)
{
	int flags = 0;

	if (write)
		flags |= GUP_FLAGS_WRITE;
	if (force)
		flags |= GUP_FLAGS_FORCE;

	return __get_user_pages(tsk, mm,
				start, len, flags,
				pages, vmas);
}

EXPORT_SYMBOL(get_user_pages);

pte_t *get_locked_pte(struct mm_struct *mm, unsigned long addr,
@@ -1858,6 +1885,15 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
	new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, address);
	if (!new_page)
		goto oom;
	/*
	 * Don't let another task, with possibly unlocked vma,
	 * keep the mlocked page.
	 */
	if (vma->vm_flags & VM_LOCKED) {
		lock_page(old_page);	/* for LRU manipulation */
		clear_page_mlock(old_page);
		unlock_page(old_page);
	}
	cow_user_page(new_page, old_page, address, vma);
	__SetPageUptodate(new_page);

@@ -2325,7 +2361,7 @@ static int do_swap_page(struct mm_struct *mm, struct vm_area_struct *vma,
	page_add_anon_rmap(page, vma, address);

	swap_free(entry);
	if (vm_swap_full())
	if (vm_swap_full() || (vma->vm_flags & VM_LOCKED) || PageMlocked(page))
		remove_exclusive_swap_page(page);
	unlock_page(page);

@@ -2465,6 +2501,12 @@ static int __do_fault(struct mm_struct *mm, struct vm_area_struct *vma,
				ret = VM_FAULT_OOM;
				goto out;
			}
			/*
			 * Don't let another task, with possibly unlocked vma,
			 * keep the mlocked page.
			 */
			if (vma->vm_flags & VM_LOCKED)
				clear_page_mlock(vmf.page);
			copy_user_highpage(page, vmf.page, address, vma);
			__SetPageUptodate(page);
		} else {
2 changes: 2 additions & 0 deletions mm/migrate.c
@@ -371,6 +371,8 @@ static void migrate_page_copy(struct page *newpage, struct page *page)
		__set_page_dirty_nobuffers(newpage);
	}

	mlock_migrate_page(newpage, page);

#ifdef CONFIG_SWAP
	ClearPageSwapCache(page);
#endif
(Diffs for the remaining seven of the 13 changed files did not load.)