mm,thp,rmap: simplify compound page mapcount handling
Compound page (folio) mapcount calculations have been different for anon
and file (or shmem) THPs, and involved the obscure PageDoubleMap flag. 
And each huge mapping and unmapping of a file (or shmem) THP involved
atomically incrementing and decrementing the mapcount of every subpage of
that huge page, dirtying many struct page cachelines.

Add subpages_mapcount field to the struct folio and first tail page, so
that the total of subpage mapcounts is available in one place near the
head: then page_mapcount() and total_mapcount() and page_mapped(), and
their folio equivalents, are so quick that anon and file and hugetlb don't
need to be optimized differently.  Delete the unloved PageDoubleMap.
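
As a worked illustration (numbers invented; arithmetic follows the
helpers added below): a THP mapped by one PMD in one process, and
pte-mapped at three of its subpages in another, ends with
compound_mapcount 1 and subpages_mapcount 3, so total_mapcount()
returns 1 + 3 = 4 without reading any tail _mapcount; and
page_mapcount() on one of those three subpages returns its own 1 plus
the compound 1, i.e. 2.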

page_add and page_remove rmap functions must now maintain the
subpages_mapcount as well as the subpage _mapcount, when dealing with pte
mappings of huge pages; and correct maintenance of NR_ANON_MAPPED and
NR_FILE_MAPPED statistics still needs reading through the subpages, using
nr_subpages_unmapped() - but only when first or last pmd mapping finds
subpages_mapcount raised (double-map case, not the common case).
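
A hedged sketch of that dual bookkeeping (simplified; the real
page_add_*_rmap code in mm/rmap.c also handles locking, statistics and
the pmd path; this mirrors __page_dup_rmap() in the rmap.h hunk below):

/*
 * Simplified sketch, not the patch's exact code: a pte mapping of a
 * subpage raises both that subpage's _mapcount and the head's
 * subpages_mapcount; a pmd mapping touches only compound_mapcount.
 */
static void sketch_add_rmap(struct page *page, bool compound)
{
	if (compound) {
		atomic_inc(compound_mapcount_ptr(page));
	} else {
		if (PageCompound(page))
			atomic_inc(subpages_mapcount_ptr(compound_head(page)));
		atomic_inc(&page->_mapcount);
	}
}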

But are those counts (used to decide when to split an anon THP, and in
vmscan's pagecache_reclaimable heuristic) correctly maintained?  Not
quite: since page_remove_rmap() (and also split_huge_pmd()) is often
called without page lock, there can be races when a subpage pte mapcount
goes 0<->1 while a compound pmd mapcount 0<->1 scan is in progress -
races which the
previous implementation had prevented.  The statistics might become
inaccurate, and even drift down until they underflow through 0.  That is
not good enough, but is better dealt with in a followup patch.

Update a few comments on first and second tail page overlaid fields. 
hugepage_add_new_anon_rmap() has to "increment" compound_mapcount, but
subpages_mapcount and compound_pincount are already correctly at 0, so
delete its reinitialization of compound_pincount.

A simple 100 X munmap(mmap(2GB, MAP_SHARED|MAP_POPULATE, tmpfs), 2GB) took
18 seconds on small pages, and used to take 1 second on huge pages, but
now takes 119 milliseconds on huge pages.  Mapping by pmds a second time
used to take 860ms and now takes 92ms; mapping by pmds after mapping by
ptes (when the scan is needed) used to take 870ms and now takes 495ms. 
But there might be some benchmarks which would show a slowdown, because
tail struct pages now fall out of cache until final freeing checks them.
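
A minimal userspace sketch of that benchmark (hedged: the tmpfs mount
point and file name are invented, and the mount is assumed to have huge
pages enabled, e.g. huge=always, for the THP case):

/*
 * Hedged sketch of 100 X munmap(mmap(2GB, MAP_SHARED|MAP_POPULATE,
 * tmpfs), 2GB); paths and error handling invented for illustration.
 */
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#define SIZE (2UL << 30)	/* 2GB */

int main(void)
{
	int fd = open("/mnt/tmpfs/bench", O_CREAT | O_RDWR, 0600);
	int i;

	if (fd < 0 || ftruncate(fd, SIZE) < 0)
		exit(1);
	for (i = 0; i < 100; i++) {
		char *p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
			       MAP_SHARED | MAP_POPULATE, fd, 0);
		if (p == MAP_FAILED)
			exit(1);
		munmap(p, SIZE);	/* unmaps, and un-rmaps, every page */
	}
	close(fd);
	return 0;
}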

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Hugh Dickins <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: James Houghton <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Mina Almasry <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Sidhartha Kumar <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Zach O'Keefe <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Hugh Dickins authored and akpm00 committed Nov 30, 2022
1 parent dad6a5e commit cb67f42
Showing 13 changed files with 194 additions and 261 deletions.
18 changes: 0 additions & 18 deletions Documentation/mm/transhuge.rst
@@ -125,24 +125,6 @@ pages:
->_mapcount of all sub-pages in order to have race-free detection of
last unmap of subpages.

PageDoubleMap() indicates that the page is *possibly* mapped with PTEs.

For anonymous pages, PageDoubleMap() also indicates ->_mapcount in all
subpages is offset up by one. This additional reference is required to
get race-free detection of unmap of subpages when we have them mapped with
both PMDs and PTEs.

This optimization is required to lower the overhead of per-subpage mapcount
tracking. The alternative is to alter ->_mapcount in all subpages on each
map/unmap of the whole compound page.

For anonymous pages, we set PG_double_map when a PMD of the page is split
for the first time, but still have a PMD mapping. The additional references
go away with the last compound_mapcount.

File pages get PG_double_map set on the first map of the page with PTE and
goes away when the page gets evicted from the page cache.

split_huge_page internally has to distribute the refcounts in the head
page to the tail pages before clearing all PG_head/tail bits from the page
structures. It can be done easily for refcounts taken by page table
85 changes: 62 additions & 23 deletions include/linux/mm.h
@@ -818,8 +818,8 @@ static inline int is_vmalloc_or_module_addr(const void *x)
/*
* How many times the entire folio is mapped as a single unit (eg by a
* PMD or PUD entry). This is probably not what you want, except for
* debugging purposes; look at folio_mapcount() or page_mapcount()
* instead.
* debugging purposes - it does not include PTE-mapped sub-pages; look
* at folio_mapcount() or page_mapcount() or total_mapcount() instead.
*/
static inline int folio_entire_mapcount(struct folio *folio)
{
@@ -829,12 +829,20 @@ static inline int folio_entire_mapcount(struct folio *folio)

/*
* Mapcount of compound page as a whole, does not include mapped sub-pages.
*
* Must be called only for compound pages.
* Must be called only on head of compound page.
*/
static inline int compound_mapcount(struct page *page)
static inline int head_compound_mapcount(struct page *head)
{
return folio_entire_mapcount(page_folio(page));
return atomic_read(compound_mapcount_ptr(head)) + 1;
}

/*
* Sum of mapcounts of sub-pages, does not include compound mapcount.
* Must be called only on head of compound page.
*/
static inline int head_subpages_mapcount(struct page *head)
{
return atomic_read(subpages_mapcount_ptr(head));
}

/*
@@ -847,37 +855,71 @@ static inline void page_mapcount_reset(struct page *page)
atomic_set(&(page)->_mapcount, -1);
}

int __page_mapcount(struct page *page);

/*
* Mapcount of 0-order page; when compound sub-page, includes
* compound_mapcount().
* compound_mapcount of compound_head of page.
*
* Result is undefined for pages which cannot be mapped into userspace.
* For example SLAB or special types of pages. See function page_has_type().
* They use this place in struct page differently.
*/
static inline int page_mapcount(struct page *page)
{
if (unlikely(PageCompound(page)))
return __page_mapcount(page);
return atomic_read(&page->_mapcount) + 1;
}
int mapcount = atomic_read(&page->_mapcount) + 1;

int folio_mapcount(struct folio *folio);
if (likely(!PageCompound(page)))
return mapcount;
page = compound_head(page);
return head_compound_mapcount(page) + mapcount;
}

#ifdef CONFIG_TRANSPARENT_HUGEPAGE
static inline int total_mapcount(struct page *page)
{
return folio_mapcount(page_folio(page));
if (likely(!PageCompound(page)))
return atomic_read(&page->_mapcount) + 1;
page = compound_head(page);
return head_compound_mapcount(page) + head_subpages_mapcount(page);
}

#else
static inline int total_mapcount(struct page *page)
/*
* Return true if this page is mapped into pagetables.
* For compound page it returns true if any subpage of compound page is mapped,
* even if this particular subpage is not itself mapped by any PTE or PMD.
*/
static inline bool page_mapped(struct page *page)
{
return page_mapcount(page);
return total_mapcount(page) > 0;
}

/**
* folio_mapcount() - Calculate the number of mappings of this folio.
* @folio: The folio.
*
* A large folio tracks both how many times the entire folio is mapped,
* and how many times each individual page in the folio is mapped.
* This function calculates the total number of times the folio is
* mapped.
*
* Return: The number of times this folio is mapped.
*/
static inline int folio_mapcount(struct folio *folio)
{
if (likely(!folio_test_large(folio)))
return atomic_read(&folio->_mapcount) + 1;
return atomic_read(folio_mapcount_ptr(folio)) + 1 +
atomic_read(folio_subpages_mapcount_ptr(folio));
}

/**
* folio_mapped - Is this folio mapped into userspace?
* @folio: The folio.
*
* Return: True if any page in this folio is referenced by user page tables.
*/
static inline bool folio_mapped(struct folio *folio)
{
return folio_mapcount(folio) > 0;
}
#endif

static inline struct page *virt_to_head_page(const void *x)
{
@@ -1800,9 +1842,6 @@ static inline pgoff_t page_index(struct page *page)
return page->index;
}

bool page_mapped(struct page *page);
bool folio_mapped(struct folio *folio);

/*
* Return true only if the page has been allocated with
* ALLOC_NO_WATERMARKS and the low watermark was not
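
A hypothetical caller (not part of the patch) showing how the new mm.h
helpers compose, for a compound or an order-0 page:

/*
 * Hypothetical debugging helper, invented for illustration only.
 */
static void report_mapcounts(struct page *page)
{
	struct page *head = compound_head(page);

	pr_info("entire:%d subpages:%d page:%d total:%d mapped:%d\n",
		PageCompound(page) ? head_compound_mapcount(head) : 0,
		PageCompound(page) ? head_subpages_mapcount(head) : 0,
		page_mapcount(page), total_mapcount(page),
		page_mapped(page));
}
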
21 changes: 18 additions & 3 deletions include/linux/mm_types.h
@@ -142,6 +142,7 @@ struct page {
unsigned char compound_dtor;
unsigned char compound_order;
atomic_t compound_mapcount;
atomic_t subpages_mapcount;
atomic_t compound_pincount;
#ifdef CONFIG_64BIT
unsigned int compound_nr; /* 1 << compound_order */
@@ -270,7 +271,8 @@ struct page {
* @_head_1: Points to the folio. Do not use.
* @_folio_dtor: Which destructor to use for this folio.
* @_folio_order: Do not use directly, call folio_order().
* @_total_mapcount: Do not use directly, call folio_entire_mapcount().
* @_compound_mapcount: Do not use directly, call folio_entire_mapcount().
* @_subpages_mapcount: Do not use directly, call folio_mapcount().
* @_pincount: Do not use directly, call folio_maybe_dma_pinned().
* @_folio_nr_pages: Do not use directly, call folio_nr_pages().
* @_flags_2: For alignment. Do not use.
@@ -323,7 +325,8 @@ struct folio {
unsigned long _head_1;
unsigned char _folio_dtor;
unsigned char _folio_order;
atomic_t _total_mapcount;
atomic_t _compound_mapcount;
atomic_t _subpages_mapcount;
atomic_t _pincount;
#ifdef CONFIG_64BIT
unsigned int _folio_nr_pages;
@@ -365,7 +368,8 @@ FOLIO_MATCH(flags, _flags_1);
FOLIO_MATCH(compound_head, _head_1);
FOLIO_MATCH(compound_dtor, _folio_dtor);
FOLIO_MATCH(compound_order, _folio_order);
FOLIO_MATCH(compound_mapcount, _total_mapcount);
FOLIO_MATCH(compound_mapcount, _compound_mapcount);
FOLIO_MATCH(subpages_mapcount, _subpages_mapcount);
FOLIO_MATCH(compound_pincount, _pincount);
#ifdef CONFIG_64BIT
FOLIO_MATCH(compound_nr, _folio_nr_pages);
@@ -388,11 +392,22 @@ static inline atomic_t *folio_mapcount_ptr(struct folio *folio)
return &tail->compound_mapcount;
}

static inline atomic_t *folio_subpages_mapcount_ptr(struct folio *folio)
{
struct page *tail = &folio->page + 1;
return &tail->subpages_mapcount;
}

static inline atomic_t *compound_mapcount_ptr(struct page *page)
{
return &page[1].compound_mapcount;
}

static inline atomic_t *subpages_mapcount_ptr(struct page *page)
{
return &page[1].subpages_mapcount;
}

static inline atomic_t *compound_pincount_ptr(struct page *page)
{
return &page[1].compound_pincount;
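
Both new accessors reach the same counter; a hedged sketch (the check
below is invented, not in the patch) making that concrete:

/*
 * Invented check: the folio- and page-based accessors resolve to the
 * same atomic_t in the first tail page.
 */
static inline void assert_subpages_ptr_match(struct folio *folio)
{
	VM_BUG_ON_FOLIO(folio_subpages_mapcount_ptr(folio) !=
			subpages_mapcount_ptr(&folio->page), folio);
}
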
21 changes: 0 additions & 21 deletions include/linux/page-flags.h
@@ -176,9 +176,6 @@ enum pageflags {
/* SLOB */
PG_slob_free = PG_private,

/* Compound pages. Stored in first tail page's flags */
PG_double_map = PG_workingset,

#ifdef CONFIG_MEMORY_FAILURE
/*
* Compound pages. Stored in first tail page's flags.
@@ -874,29 +871,11 @@ static inline int PageTransTail(struct page *page)
{
return PageTail(page);
}

/*
* PageDoubleMap indicates that the compound page is mapped with PTEs as well
* as PMDs.
*
* This is required for optimization of rmap operations for THP: we can postpone
* per small page mapcount accounting (and its overhead from atomic operations)
* until the first PMD split.
*
* For the page PageDoubleMap means ->_mapcount in all sub-pages is offset up
* by one. This reference will go away with last compound_mapcount.
*
* See also __split_huge_pmd_locked() and page_remove_anon_compound_rmap().
*/
PAGEFLAG(DoubleMap, double_map, PF_SECOND)
TESTSCFLAG(DoubleMap, double_map, PF_SECOND)
#else
TESTPAGEFLAG_FALSE(TransHuge, transhuge)
TESTPAGEFLAG_FALSE(TransCompound, transcompound)
TESTPAGEFLAG_FALSE(TransCompoundMap, transcompoundmap)
TESTPAGEFLAG_FALSE(TransTail, transtail)
PAGEFLAG_FALSE(DoubleMap, double_map)
TESTSCFLAG_FALSE(DoubleMap, double_map)
#endif

#if defined(CONFIG_MEMORY_FAILURE) && defined(CONFIG_TRANSPARENT_HUGEPAGE)
2 changes: 2 additions & 0 deletions include/linux/rmap.h
@@ -206,6 +206,8 @@ void hugepage_add_new_anon_rmap(struct page *, struct vm_area_struct *,

static inline void __page_dup_rmap(struct page *page, bool compound)
{
if (!compound && PageCompound(page))
atomic_inc(subpages_mapcount_ptr(compound_head(page)));
atomic_inc(compound ? compound_mapcount_ptr(page) : &page->_mapcount);
}

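For example (hedged; the caller named here is existing kernel code, not
part of this patch): when fork's copy_present_pte() duplicates a pte
mapping of a THP subpage via page_dup_file_rmap(page, false), the
head's subpages_mapcount now rises together with the subpage's
_mapcount, so folio_mapcount() stays exact without walking tail pages.
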
5 changes: 3 additions & 2 deletions mm/debug.c
@@ -94,9 +94,10 @@ static void __dump_page(struct page *page)
page, page_ref_count(head), mapcount, mapping,
page_to_pgoff(page), page_to_pfn(page));
if (compound) {
pr_warn("head:%p order:%u compound_mapcount:%d compound_pincount:%d\n",
pr_warn("head:%p order:%u compound_mapcount:%d subpages_mapcount:%d compound_pincount:%d\n",
head, compound_order(head),
folio_entire_mapcount(folio),
head_compound_mapcount(head),
head_subpages_mapcount(head),
head_compound_pincount(head));
}

6 changes: 0 additions & 6 deletions mm/folio-compat.c
@@ -39,12 +39,6 @@ void wait_for_stable_page(struct page *page)
}
EXPORT_SYMBOL_GPL(wait_for_stable_page);

bool page_mapped(struct page *page)
{
return folio_mapped(page_folio(page));
}
EXPORT_SYMBOL(page_mapped);

void mark_page_accessed(struct page *page)
{
folio_mark_accessed(page_folio(page));
36 changes: 8 additions & 28 deletions mm/huge_memory.c
@@ -2142,6 +2142,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,

VM_BUG_ON_PAGE(!page_count(page), page);
page_ref_add(page, HPAGE_PMD_NR - 1);
atomic_add(HPAGE_PMD_NR, subpages_mapcount_ptr(page));

/*
* Without "freeze", we'll simply split the PMD, propagating the
@@ -2225,33 +2226,8 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
pte_unmap(pte);
}

if (!pmd_migration) {
/*
* Set PG_double_map before dropping compound_mapcount to avoid
* false-negative page_mapped().
*/
if (compound_mapcount(page) > 1 &&
!TestSetPageDoubleMap(page)) {
for (i = 0; i < HPAGE_PMD_NR; i++)
atomic_inc(&page[i]._mapcount);
}

lock_page_memcg(page);
if (atomic_add_negative(-1, compound_mapcount_ptr(page))) {
/* Last compound_mapcount is gone. */
__mod_lruvec_page_state(page, NR_ANON_THPS,
-HPAGE_PMD_NR);
if (TestClearPageDoubleMap(page)) {
/* No need in mapcount reference anymore */
for (i = 0; i < HPAGE_PMD_NR; i++)
atomic_dec(&page[i]._mapcount);
}
}
unlock_page_memcg(page);

/* Above is effectively page_remove_rmap(page, vma, true) */
munlock_vma_page(page, vma, true);
}
if (!pmd_migration)
page_remove_rmap(page, vma, true);

smp_wmb(); /* make pte visible before pmd */
pmd_populate(mm, pmd, pgtable);
@@ -2453,7 +2429,7 @@ static void __split_huge_page_tail(struct page *head, int tail,
(1L << PG_dirty) |
LRU_GEN_MASK | LRU_REFS_MASK));

/* ->mapping in first tail page is compound_mapcount */
/* ->mapping in first and second tail page is replaced by other uses */
VM_BUG_ON_PAGE(tail > 2 && page_tail->mapping != TAIL_MAPPING,
page_tail);
page_tail->mapping = head->mapping;
@@ -2463,6 +2439,10 @@
* page->private should not be set in tail pages with the exception
* of swap cache pages that store the swp_entry_t in tail pages.
* Fix up and warn once if private is unexpectedly set.
*
* What of 32-bit systems, on which head[1].compound_pincount overlays
* head[1].private? No problem: THP_SWAP is not enabled on 32-bit, and
* compound_pincount must be 0 for folio_ref_freeze() to have succeeded.
*/
if (!folio_test_swapcache(page_folio(head))) {
VM_WARN_ON_ONCE_PAGE(page_tail->private != 0, page_tail);
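
A worked reading of the split path above (hedged; HPAGE_PMD_NR assumed
512): splitting a THP mapped only by one PMD first adds 512 to
subpages_mapcount, then page_remove_rmap(page, vma, true) drops
compound_mapcount 1->0; total_mapcount() goes from 1 to 512, one per
new pte, and because subpages_mapcount is raised before the compound
mapcount falls, page_mapped() never sees a false negative in between -
the job PG_double_map used to do.
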
2 changes: 2 additions & 0 deletions mm/hugetlb.c
@@ -1333,6 +1333,7 @@ static void __destroy_compound_gigantic_page(struct page *page,
struct page *p;

atomic_set(compound_mapcount_ptr(page), 0);
atomic_set(subpages_mapcount_ptr(page), 0);
atomic_set(compound_pincount_ptr(page), 0);

for (i = 1; i < nr_pages; i++) {
@@ -1852,6 +1853,7 @@ static bool __prep_compound_gigantic_page(struct page *page, unsigned int order,
set_compound_head(p, page);
}
atomic_set(compound_mapcount_ptr(page), -1);
atomic_set(subpages_mapcount_ptr(page), 0);
atomic_set(compound_pincount_ptr(page), 0);
return true;

11 changes: 2 additions & 9 deletions mm/khugepaged.c
@@ -1238,15 +1238,8 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
/*
* Check if the page has any GUP (or other external) pins.
*
* Here the check is racy it may see total_mapcount > refcount
* in some cases.
* For example, one process with one forked child process.
* The parent has the PMD split due to MADV_DONTNEED, then
* the child is trying unmap the whole PMD, but khugepaged
* may be scanning the parent between the child has
* PageDoubleMap flag cleared and dec the mapcount. So
* khugepaged may see total_mapcount > refcount.
*
* Here the check may be racy:
* it may see total_mapcount > refcount in some cases?
* But such case is ephemeral we could always retry collapse
* later. However it may report false positive if the page
* has excessive GUP pins (i.e. 512). Anyway the same check