author		Andrew Morton <akpm@linux-foundation.org>	2024-04-10 14:00:41 -0700
committer	Andrew Morton <akpm@linux-foundation.org>	2024-04-10 14:00:41 -0700
commit		f92a3cca4a66258bb53db53ec73b01f6e3a1abef (patch)
tree		570107cf75ecb1fbab730b4c745a930c9f69a0be
parent		701e94e0cdc925a85b13d80b899d243d07746037 (diff)
download	25-new-f92a3cca4a66258bb53db53ec73b01f6e3a1abef.tar.gz
foo
84 files changed, 3119 insertions, 10 deletions
diff --git a/patches/documentation-admin-guide-cgroup-v1-memoryrst-dont-reference-page_mapcount.patch b/patches/documentation-admin-guide-cgroup-v1-memoryrst-dont-reference-page_mapcount.patch new file mode 100644 index 000000000..840b4dba4 --- /dev/null +++ b/patches/documentation-admin-guide-cgroup-v1-memoryrst-dont-reference-page_mapcount.patch @@ -0,0 +1,45 @@ +From: David Hildenbrand <david@redhat.com> +Subject: Documentation/admin-guide/cgroup-v1/memory.rst: don't reference page_mapcount() +Date: Tue, 9 Apr 2024 21:23:01 +0200 + +Let's stop talking about page_mapcount(). + +Link: https://lkml.kernel.org/r/20240409192301.907377-19-david@redhat.com +Signed-off-by: David Hildenbrand <david@redhat.com> +Cc: Chris Zankel <chris@zankel.net> +Cc: Hugh Dickins <hughd@google.com> +Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> +Cc: Jonathan Corbet <corbet@lwn.net> +Cc: Matthew Wilcox (Oracle) <willy@infradead.org> +Cc: Max Filippov <jcmvbkbc@gmail.com> +Cc: Miaohe Lin <linmiaohe@huawei.com> +Cc: Muchun Song <muchun.song@linux.dev> +Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> +Cc: Peter Xu <peterx@redhat.com> +Cc: Richard Chang <richardycc@google.com> +Cc: Rich Felker <dalias@libc.org> +Cc: Ryan Roberts <ryan.roberts@arm.com> +Cc: Yang Shi <shy828301@gmail.com> +Cc: Yin Fengwei <fengwei.yin@intel.com> +Cc: Yoshinori Sato <ysato@users.sourceforge.jp> +Cc: Zi Yan <ziy@nvidia.com> +Signed-off-by: Andrew Morton <akpm@linux-foundation.org> +--- + + Documentation/admin-guide/cgroup-v1/memory.rst | 4 ++-- + 1 file changed, 2 insertions(+), 2 deletions(-) + +--- a/Documentation/admin-guide/cgroup-v1/memory.rst~documentation-admin-guide-cgroup-v1-memoryrst-dont-reference-page_mapcount ++++ a/Documentation/admin-guide/cgroup-v1/memory.rst +@@ -802,8 +802,8 @@ a page or a swap can be moved only when + | | anonymous pages, file pages (and swaps) in the range mmapped by the task | + | | will be moved even if the task hasn't done page fault, i.e. they might | + | | not be the task's "RSS", but other task's "RSS" that maps the same file. | +-| | And mapcount of the page is ignored (the page can be moved even if | +-| | page_mapcount(page) > 1). You must enable Swap Extension (see 2.4) to | ++| | The mapcount of the page is ignored (the page can be moved independent | ++| | of the mapcount). You must enable Swap Extension (see 2.4) to | + | | enable move of swap charges. | + +---+--------------------------------------------------------------------------+ + +_ diff --git a/patches/drivers-virt-acrn-fix-pfnmap-pte-checks-in-acrn_vm_ram_map.patch b/patches/drivers-virt-acrn-fix-pfnmap-pte-checks-in-acrn_vm_ram_map.patch new file mode 100644 index 000000000..dcfbfc811 --- /dev/null +++ b/patches/drivers-virt-acrn-fix-pfnmap-pte-checks-in-acrn_vm_ram_map.patch @@ -0,0 +1,187 @@ +From: David Hildenbrand <david@redhat.com> +Subject: drivers/virt/acrn: fix PFNMAP PTE checks in acrn_vm_ram_map() +Date: Wed, 10 Apr 2024 17:55:25 +0200 + +Patch series "mm: follow_pte() improvements and acrn follow_pte() fixes". + +Patch #1 fixes a bunch of issues I spotted in the acrn driver. It +compiles, that's all I know. I'll appreciate some review and testing from +acrn folks. + +Patch #2+#3 improve follow_pte(), passing a VMA instead of the MM, adding +more sanity checks, and improving the documentation. Gave it a quick test +on x86-64 using VM_PAT that ends up using follow_pte(). 
+
+
+This patch (of 3):
+
+We currently miss handling various cases, resulting in a dangerous
+follow_pte() (previously follow_pfn()) usage.
+
+(1) We're not checking PTE write permissions.
+
+Maybe we should simply always require pte_write() like we do for
+pin_user_pages_fast(FOLL_WRITE)? Hard to tell, so let's check for
+ACRN_MEM_ACCESS_WRITE for now.
+
+(2) We're not rejecting refcounted pages.
+
+As we are not using MMU notifiers, messing with refcounted pages is
+dangerous and can result in use-after-free. Let's make sure to reject them.
+
+(3) We are only looking at the first PTE of a bigger range.
+
+We only look up a single PTE, but memmap->len may span a larger area.
+Let's loop over all involved PTEs and make sure the PFN range is
+actually contiguous. Reject everything else: it couldn't have worked
+either way, and rather made us access PFNs we shouldn't be accessing.
+
+Link: https://lkml.kernel.org/r/20240410155527.474777-1-david@redhat.com
+Link: https://lkml.kernel.org/r/20240410155527.474777-2-david@redhat.com
+Fixes: 8a6e85f75a83 ("virt: acrn: obtain pa from VMA with PFNMAP flag")
+Signed-off-by: David Hildenbrand <david@redhat.com>
+Cc: Alex Williamson <alex.williamson@redhat.com>
+Cc: Christoph Hellwig <hch@lst.de>
+Cc: Fei Li <fei1.li@intel.com>
+Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
+Cc: Heiko Carstens <hca@linux.ibm.com>
+Cc: Ingo Molnar <mingo@redhat.com>
+Cc: Paolo Bonzini <pbonzini@redhat.com>
+Cc: Yonghua Huang <yonghua.huang@intel.com>
+Cc: Sean Christopherson <seanjc@google.com>
+Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
+---
+
+ drivers/virt/acrn/mm.c | 63 +++++++++++++++++++++++++++++----------
+ 1 file changed, 47 insertions(+), 16 deletions(-)
+
+--- a/drivers/virt/acrn/mm.c~drivers-virt-acrn-fix-pfnmap-pte-checks-in-acrn_vm_ram_map
++++ a/drivers/virt/acrn/mm.c
+@@ -156,23 +156,29 @@ int acrn_vm_memseg_unmap(struct acrn_vm
+ int acrn_vm_ram_map(struct acrn_vm *vm, struct acrn_vm_memmap *memmap)
+ {
+ 	struct vm_memory_region_batch *regions_info;
+-	int nr_pages, i = 0, order, nr_regions = 0;
++	int nr_pages, i, order, nr_regions = 0;
+ 	struct vm_memory_mapping *region_mapping;
+ 	struct vm_memory_region_op *vm_region;
+ 	struct page **pages = NULL, *page;
+ 	void *remap_vaddr;
+ 	int ret, pinned;
+ 	u64 user_vm_pa;
+-	unsigned long pfn;
+ 	struct vm_area_struct *vma;
+
+ 	if (!vm || !memmap)
+ 		return -EINVAL;
+
++	/* Get the page number of the map region */
++	nr_pages = memmap->len >> PAGE_SHIFT;
++	if (!nr_pages)
++		return -EINVAL;
++
+ 	mmap_read_lock(current->mm);
+ 	vma = vma_lookup(current->mm, memmap->vma_base);
+ 	if (vma && ((vma->vm_flags & VM_PFNMAP) != 0)) {
++		unsigned long start_pfn, cur_pfn;
+ 		spinlock_t *ptl;
++		bool writable;
+ 		pte_t *ptep;
+
+ 		if ((memmap->vma_base + memmap->len) > vma->vm_end) {
+@@ -180,25 +186,53 @@ int acrn_vm_ram_map(struct acrn_vm *vm,
+ 			return -EINVAL;
+ 		}
+
+-		ret = follow_pte(vma->vm_mm, memmap->vma_base, &ptep, &ptl);
+-		if (ret < 0) {
+-			mmap_read_unlock(current->mm);
++		for (i = 0; i < nr_pages; i++) {
++			ret = follow_pte(vma->vm_mm,
++					 memmap->vma_base + i * PAGE_SIZE,
++					 &ptep, &ptl);
++			if (ret)
++				break;
++
++			cur_pfn = pte_pfn(ptep_get(ptep));
++			if (i == 0)
++				start_pfn = cur_pfn;
++			writable = !!pte_write(ptep_get(ptep));
++			pte_unmap_unlock(ptep, ptl);
++
++			/* Disallow write access if the PTE is not writable. */
++			if (!writable &&
++			    (memmap->attr & ACRN_MEM_ACCESS_WRITE)) {
++				ret = -EFAULT;
++				break;
++			}
++
++			/* Disallow refcounted pages. */
++			if (pfn_valid(cur_pfn) &&
++			    !PageReserved(pfn_to_page(cur_pfn))) {
++				ret = -EFAULT;
++				break;
++			}
++
++			/* Disallow non-contiguous ranges. */
++			if (cur_pfn != start_pfn + i) {
++				ret = -EINVAL;
++				break;
++			}
++		}
++		mmap_read_unlock(current->mm);
++
++		if (ret) {
+ 			dev_dbg(acrn_dev.this_device,
+ 				"Failed to lookup PFN at VMA:%pK.\n", (void *)memmap->vma_base);
+ 			return ret;
+ 		}
+-		pfn = pte_pfn(ptep_get(ptep));
+-		pte_unmap_unlock(ptep, ptl);
+-		mmap_read_unlock(current->mm);
+
+ 		return acrn_mm_region_add(vm, memmap->user_vm_pa,
+-				PFN_PHYS(pfn), memmap->len,
++				PFN_PHYS(start_pfn), memmap->len,
+ 				ACRN_MEM_TYPE_WB, memmap->attr);
+ 	}
+ 	mmap_read_unlock(current->mm);
+
+-	/* Get the page number of the map region */
+-	nr_pages = memmap->len >> PAGE_SHIFT;
+ 	pages = vzalloc(array_size(nr_pages, sizeof(*pages)));
+ 	if (!pages)
+ 		return -ENOMEM;
+@@ -242,12 +276,11 @@ int acrn_vm_ram_map(struct acrn_vm *vm,
+ 	mutex_unlock(&vm->regions_mapping_lock);
+
+ 	/* Calculate count of vm_memory_region_op */
+-	while (i < nr_pages) {
++	for (i = 0; i < nr_pages; i += 1 << order) {
+ 		page = pages[i];
+ 		VM_BUG_ON_PAGE(PageTail(page), page);
+ 		order = compound_order(page);
+ 		nr_regions++;
+-		i += 1 << order;
+ 	}
+
+ 	/* Prepare the vm_memory_region_batch */
+@@ -264,8 +297,7 @@ int acrn_vm_ram_map(struct acrn_vm *vm,
+ 	regions_info->vmid = vm->vmid;
+ 	regions_info->regions_gpa = virt_to_phys(vm_region);
+ 	user_vm_pa = memmap->user_vm_pa;
+-	i = 0;
+-	while (i < nr_pages) {
++	for (i = 0; i < nr_pages; i += 1 << order) {
+ 		u32 region_size;
+
+ 		page = pages[i];
+@@ -281,7 +313,6 @@ int acrn_vm_ram_map(struct acrn_vm *vm,
+
+ 		vm_region++;
+ 		user_vm_pa += region_size;
+-		i += 1 << order;
+ 	}
+
+ 	/* Inform the ACRN Hypervisor to set up EPT mappings */
+_
diff --git a/patches/fork-defer-linking-file-vma-until-vma-is-fully-initialized.patch b/patches/fork-defer-linking-file-vma-until-vma-is-fully-initialized.patch
new file mode 100644
index 000000000..7a4d2b510
--- /dev/null
+++ b/patches/fork-defer-linking-file-vma-until-vma-is-fully-initialized.patch
@@ -0,0 +1,103 @@
+From: Miaohe Lin <linmiaohe@huawei.com>
+Subject: fork: defer linking file vma until vma is fully initialized
+Date: Wed, 10 Apr 2024 17:14:41 +0800
+
+Thorvald reported a WARNING [1]. And the root cause is the below race:
+
+ CPU 1					CPU 2
+ fork					hugetlbfs_fallocate
+  dup_mmap				 hugetlbfs_punch_hole
+   i_mmap_lock_write(mapping);
+   vma_interval_tree_insert_after -- Child vma is visible through i_mmap tree.
+   i_mmap_unlock_write(mapping);
+   hugetlb_dup_vma_private -- Clear vma_lock outside i_mmap_rwsem!
+					 i_mmap_lock_write(mapping);
+					 hugetlb_vmdelete_list
+					  vma_interval_tree_foreach
+					   hugetlb_vma_trylock_write -- Vma_lock is cleared.
+   tmp->vm_ops->open -- Alloc new vma_lock outside i_mmap_rwsem!
+					   hugetlb_vma_unlock_write -- Vma_lock is assigned!!!
+					 i_mmap_unlock_write(mapping);
+
+hugetlb_dup_vma_private() and hugetlb_vm_op_open() are called outside
+the i_mmap_rwsem lock while the vma lock can be used at the same time.
+Fix this by deferring linking the file vma until the vma is fully
+initialized. Those vmas should be initialized first before they can be
+used.
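
For illustration, a minimal sketch of the ordering invariant the fix enforces in dup_mmap(); sketch_link_into_i_mmap() is a hypothetical stand-in for the i_mmap linking block, and the real function does considerably more:

/*
 * Sketch only: a child vma must be fully initialized -- hugetlb private
 * data cleared, vm_ops->open() called -- before it becomes reachable
 * through the file's i_mmap interval tree, because concurrent users
 * (e.g. hugetlb_vmdelete_list()) only hold i_mmap_rwsem.
 */
static void sketch_dup_one_vma(struct vm_area_struct *tmp)
{
	if (is_vm_hugetlb_page(tmp))
		hugetlb_dup_vma_private(tmp);	/* reset vma_lock first */
	if (tmp->vm_ops && tmp->vm_ops->open)
		tmp->vm_ops->open(tmp);		/* finish initialization */
	if (tmp->vm_file)
		sketch_link_into_i_mmap(tmp);	/* only now publish the vma */
}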
+ +Link: https://lkml.kernel.org/r/20240410091441.3539905-1-linmiaohe@huawei.com +Fixes: 8d9bfb260814 ("hugetlb: add vma based lock for pmd sharing") +Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> +Reported-by: Thorvald Natvig <thorvald@google.com> +Closes: https://lore.kernel.org/linux-mm/20240129161735.6gmjsswx62o4pbja@revolver/T/ [1] +Cc: Christian Brauner <brauner@kernel.org> +Cc: Heiko Carstens <hca@linux.ibm.com> +Cc: Jane Chu <jane.chu@oracle.com> +Cc: Kent Overstreet <kent.overstreet@linux.dev> +Cc: Liam R. Howlett <Liam.Howlett@oracle.com> +Cc: Mateusz Guzik <mjguzik@gmail.com> +Cc: Matthew Wilcox (Oracle) <willy@infradead.org> +Cc: Miaohe Lin <linmiaohe@huawei.com> +Cc: Muchun Song <muchun.song@linux.dev> +Cc: Oleg Nesterov <oleg@redhat.com> +Cc: Peng Zhang <zhangpeng.00@bytedance.com> +Cc: Tycho Andersen <tandersen@netflix.com> +Cc: <stable@vger.kernel.org> +Signed-off-by: Andrew Morton <akpm@linux-foundation.org> +--- + + kernel/fork.c | 33 +++++++++++++++++---------------- + 1 file changed, 17 insertions(+), 16 deletions(-) + +--- a/kernel/fork.c~fork-defer-linking-file-vma-until-vma-is-fully-initialized ++++ a/kernel/fork.c +@@ -714,6 +714,23 @@ static __latent_entropy int dup_mmap(str + } else if (anon_vma_fork(tmp, mpnt)) + goto fail_nomem_anon_vma_fork; + vm_flags_clear(tmp, VM_LOCKED_MASK); ++ /* ++ * Copy/update hugetlb private vma information. ++ */ ++ if (is_vm_hugetlb_page(tmp)) ++ hugetlb_dup_vma_private(tmp); ++ ++ /* ++ * Link the vma into the MT. After using __mt_dup(), memory ++ * allocation is not necessary here, so it cannot fail. ++ */ ++ vma_iter_bulk_store(&vmi, tmp); ++ ++ mm->map_count++; ++ ++ if (tmp->vm_ops && tmp->vm_ops->open) ++ tmp->vm_ops->open(tmp); ++ + file = tmp->vm_file; + if (file) { + struct address_space *mapping = file->f_mapping; +@@ -730,25 +747,9 @@ static __latent_entropy int dup_mmap(str + i_mmap_unlock_write(mapping); + } + +- /* +- * Copy/update hugetlb private vma information. +- */ +- if (is_vm_hugetlb_page(tmp)) +- hugetlb_dup_vma_private(tmp); +- +- /* +- * Link the vma into the MT. After using __mt_dup(), memory +- * allocation is not necessary here, so it cannot fail. +- */ +- vma_iter_bulk_store(&vmi, tmp); +- +- mm->map_count++; + if (!(tmp->vm_flags & VM_WIPEONFORK)) + retval = copy_page_range(tmp, mpnt); + +- if (tmp->vm_ops && tmp->vm_ops->open) +- tmp->vm_ops->open(tmp); +- + if (retval) { + mpnt = vma_next(&vmi); + goto loop_out; +_ diff --git a/patches/mm-allow-for-detecting-underflows-with-page_mapcount-again.patch b/patches/mm-allow-for-detecting-underflows-with-page_mapcount-again.patch new file mode 100644 index 000000000..d890f6a68 --- /dev/null +++ b/patches/mm-allow-for-detecting-underflows-with-page_mapcount-again.patch @@ -0,0 +1,133 @@ +From: David Hildenbrand <david@redhat.com> +Subject: mm: allow for detecting underflows with page_mapcount() again +Date: Tue, 9 Apr 2024 21:22:44 +0200 + +Patch series "mm: mapcount for large folios + page_mapcount() cleanups". + +This series tracks the mapcount of large folios in a single value, so it +can be read efficiently and atomically, just like the mapcount of small +folios. + +folio_mapcount() is then used in a couple more places, most notably to +reduce false negatives in folio_likely_mapped_shared(), and many users of +page_mapcount() are cleaned up (that's maybe why you got CCed on the full +series, sorry sh+xtensa folks! :) ). + +The remaining s390x user and one KSM user of page_mapcount() are getting +removed separately on the list right now. 
I have patches to handle the
+other KSM one, the khugepaged one and the kpagecount one; as they are not
+as "obvious", I will send them out separately in the future. Once that is
+all in place, I'm planning on moving page_mapcount() into
+fs/proc/task_mmu.c, the remaining user for the time being (and we can
+discuss at LSF/MM details on that :) ).
+
+I proposed the mapcount for large folios (previously called total
+mapcount) originally in part of [1] and I later included it in [2] where
+it is a requirement. In the meantime, I changed the patch a bit so I
+dropped all RB's. During the discussion of [1], Peter Xu correctly raised
+that this additional tracking might affect the performance when PMD->PTE
+remapping THPs. In the meantime, I addressed that by batching RMAP
+operations during fork(), unmap/zap and when PMD->PTE remapping THPs.
+
+Running some of my micro-benchmarks [3] (fork, munmap, cow-byte, remap) on
+1 GiB of memory backed by folios with the same order, I observe the
+following on an Intel(R) Xeon(R) Silver 4210R CPU @ 2.40GHz tuned for
+reproducible results as much as possible:
+
+Standard deviation is mostly < 1%, except for order-9, where it's < 2% for
+fork() and munmap().
+
+(1) Small folios are not affected (< 1%) in all 4 microbenchmarks.
+(2) Order-4 folios are not affected (< 1%) in all 4 microbenchmarks. A bit
+    weird compared to the other orders ...
+(3) PMD->PTE remapping of order-9 THPs is not affected (< 1%).
+(4) COW-byte (COWing a single page by writing a single byte) is not
+    affected for any order (< 1%). The page copy_fault overhead dominates
+    everything.
+(5) fork() is mostly not affected (< 1%), except order-2, where we have
+    a slowdown of ~4%. Already for order-3 folios, we're down to a slowdown
+    of < 1%.
+(6) munmap() sees a slowdown by < 3% for some orders (order-5,
+    order-6, order-9), but less for others (< 1% for order-4 and order-8,
+    < 2% for order-2, order-3, order-7).
+
+Especially the fork() and munmap() benchmarks are sensitive to each added
+instruction and other system noise, so I suspect some of the change and
+observed weirdness (order-4) is due to code layout changes and other
+factors, but not really due to the added atomics.
+
+So in the common case where we can batch, the added atomics don't really
+make a big difference, especially in light of the recent improvements for
+large folios that we recently gained due to batching. Surprisingly, for
+some cases where we cannot batch (e.g., COW), the added atomics don't seem
+to matter, because other overhead dominates.
+
+My fork and munmap micro-benchmarks don't cover cases where we cannot
+batch-process bigger parts of large folios. As this is not the common
+case, I'm not worrying about that right now.
+
+Future work is batching RMAP operations during swapout and folio
+migration.
+
+[1] https://lore.kernel.org/all/20230809083256.699513-1-david@redhat.com/
+[2] https://lore.kernel.org/all/20231124132626.235350-1-david@redhat.com/
+[3] https://gitlab.com/davidhildenbrand/scratchspace/-/raw/main/pte-mapped-folio-benchmarks.c?ref_type=heads
+
+
+This patch (of 18):
+
+Commit 53277bcf126d ("mm: support page_mapcount() on page_has_type()
+pages") made it impossible to detect mapcount underflows by treating any
+negative raw mapcount value as a mapcount of 0.
+
+We perform such underflow checks in zap_present_folio_ptes() and
+zap_huge_pmd(), which would currently no longer trigger.
+ +Let's check against PAGE_MAPCOUNT_RESERVE instead by using +page_type_has_type(), like page_has_type() would, so we can still catch +some underflows. + +Link: https://lkml.kernel.org/r/20240409192301.907377-1-david@redhat.com +Link: https://lkml.kernel.org/r/20240409192301.907377-2-david@redhat.com +Fixes: 53277bcf126d ("mm: support page_mapcount() on page_has_type() pages") +Signed-off-by: David Hildenbrand <david@redhat.com> +Cc: Chris Zankel <chris@zankel.net> +Cc: Hugh Dickins <hughd@google.com> +Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> +Cc: Jonathan Corbet <corbet@lwn.net> +Cc: Matthew Wilcox (Oracle) <willy@infradead.org> +Cc: Max Filippov <jcmvbkbc@gmail.com> +Cc: Miaohe Lin <linmiaohe@huawei.com> +Cc: Muchun Song <muchun.song@linux.dev> +Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> +Cc: Peter Xu <peterx@redhat.com> +Cc: Richard Chang <richardycc@google.com> +Cc: Rich Felker <dalias@libc.org> +Cc: Ryan Roberts <ryan.roberts@arm.com> +Cc: Yang Shi <shy828301@gmail.com> +Cc: Yin Fengwei <fengwei.yin@intel.com> +Cc: Yoshinori Sato <ysato@users.sourceforge.jp> +Cc: Zi Yan <ziy@nvidia.com> +Signed-off-by: Andrew Morton <akpm@linux-foundation.org> +--- + + include/linux/mm.h | 5 ++--- + 1 file changed, 2 insertions(+), 3 deletions(-) + +--- a/include/linux/mm.h~mm-allow-for-detecting-underflows-with-page_mapcount-again ++++ a/include/linux/mm.h +@@ -1229,11 +1229,10 @@ static inline void page_mapcount_reset(s + */ + static inline int page_mapcount(struct page *page) + { +- int mapcount = atomic_read(&page->_mapcount) + 1; ++ int mapcount = atomic_read(&page->_mapcount); + + /* Handle page_has_type() pages */ +- if (mapcount < 0) +- mapcount = 0; ++ mapcount = page_type_has_type(mapcount) ? 0 : mapcount + 1; + if (unlikely(PageCompound(page))) + mapcount += folio_entire_mapcount(page_folio(page)); + +_ diff --git a/patches/mm-debug-print-only-page-mapcount-excluding-folio-entire-mapcount-in-__dump_folio.patch b/patches/mm-debug-print-only-page-mapcount-excluding-folio-entire-mapcount-in-__dump_folio.patch new file mode 100644 index 000000000..3173e507c --- /dev/null +++ b/patches/mm-debug-print-only-page-mapcount-excluding-folio-entire-mapcount-in-__dump_folio.patch @@ -0,0 +1,57 @@ +From: David Hildenbrand <david@redhat.com> +Subject: mm/debug: print only page mapcount (excluding folio entire mapcount) in __dump_folio() +Date: Tue, 9 Apr 2024 21:23:00 +0200 + +Let's simplify and only print the page mapcount: we already print the +large folio mapcount and the entire folio mapcount for large folios +separately; that should be sufficient to figure out what's happening. + +While at it, print the page mapcount also if it had an underflow, +filtering out only typed pages. 
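
A small userspace model of the raw-value arithmetic may help; the cutoff constant is illustrative, standing in for the kernel's PAGE_MAPCOUNT_RESERVE / page_type_has_type() machinery:

#include <assert.h>
#include <stdbool.h>

#define SKETCH_MAPCOUNT_RESERVE	(-128)	/* illustrative cutoff */

/* Typed pages (slab, page tables, ...) reuse _mapcount to store a page
 * type below the cutoff and must be reported as unmapped. */
static bool sketch_has_type(int raw)
{
	return raw < SKETCH_MAPCOUNT_RESERVE;
}

static int sketch_report_mapcount(int raw)
{
	return sketch_has_type(raw) ? 0 : raw + 1;
}

int main(void)
{
	assert(sketch_report_mapcount(-1) == 0);	/* unmapped */
	assert(sketch_report_mapcount(0) == 1);		/* mapped once */
	assert(sketch_report_mapcount(-2) == -1);	/* underflow now visible */
	assert(sketch_report_mapcount(-200) == 0);	/* typed page filtered */
	return 0;
}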
+
+Link: https://lkml.kernel.org/r/20240409192301.907377-18-david@redhat.com
+Signed-off-by: David Hildenbrand <david@redhat.com>
+Cc: Chris Zankel <chris@zankel.net>
+Cc: Hugh Dickins <hughd@google.com>
+Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
+Cc: Jonathan Corbet <corbet@lwn.net>
+Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
+Cc: Max Filippov <jcmvbkbc@gmail.com>
+Cc: Miaohe Lin <linmiaohe@huawei.com>
+Cc: Muchun Song <muchun.song@linux.dev>
+Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
+Cc: Peter Xu <peterx@redhat.com>
+Cc: Richard Chang <richardycc@google.com>
+Cc: Rich Felker <dalias@libc.org>
+Cc: Ryan Roberts <ryan.roberts@arm.com>
+Cc: Yang Shi <shy828301@gmail.com>
+Cc: Yin Fengwei <fengwei.yin@intel.com>
+Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
+Cc: Zi Yan <ziy@nvidia.com>
+Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
+---
+
+ mm/debug.c | 9 ++-------
+ 1 file changed, 2 insertions(+), 7 deletions(-)
+
+--- a/mm/debug.c~mm-debug-print-only-page-mapcount-excluding-folio-entire-mapcount-in-__dump_folio
++++ a/mm/debug.c
+@@ -55,15 +55,10 @@ static void __dump_folio(struct folio *f
+ 		unsigned long pfn, unsigned long idx)
+ {
+ 	struct address_space *mapping = folio_mapping(folio);
+-	int mapcount = atomic_read(&page->_mapcount) + 1;
++	int mapcount = atomic_read(&page->_mapcount);
+ 	char *type = "";
+
+-	/* Open-code page_mapcount() to avoid looking up a stale folio */
+-	if (mapcount < 0)
+-		mapcount = 0;
+-	if (folio_test_large(folio))
+-		mapcount += folio_entire_mapcount(folio);
+-
++	mapcount = page_type_has_type(mapcount) ? 0 : mapcount + 1;
+ 	pr_warn("page: refcount:%d mapcount:%d mapping:%p index:%#lx pfn:%#lx\n",
+ 			folio_ref_count(folio), mapcount, mapping,
+ 			folio->index + idx, pfn);
+_
diff --git a/patches/mm-filemap-use-folio_mapcount-in-filemap_unaccount_folio.patch b/patches/mm-filemap-use-folio_mapcount-in-filemap_unaccount_folio.patch
new file mode 100644
index 000000000..1a8bf452f
--- /dev/null
+++ b/patches/mm-filemap-use-folio_mapcount-in-filemap_unaccount_folio.patch
@@ -0,0 +1,49 @@
+From: David Hildenbrand <david@redhat.com>
+Subject: mm/filemap: use folio_mapcount() in filemap_unaccount_folio()
+Date: Tue, 9 Apr 2024 21:22:56 +0200
+
+We want to limit the use of page_mapcount() to the places where it is
+absolutely necessary.
+
+Let's use folio_mapcount() instead in filemap_unaccount_folio().
+
+No functional change intended, because we're only dealing with small
+folios.
+ +Link: https://lkml.kernel.org/r/20240409192301.907377-14-david@redhat.com +Signed-off-by: David Hildenbrand <david@redhat.com> +Cc: Chris Zankel <chris@zankel.net> +Cc: Hugh Dickins <hughd@google.com> +Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> +Cc: Jonathan Corbet <corbet@lwn.net> +Cc: Matthew Wilcox (Oracle) <willy@infradead.org> +Cc: Max Filippov <jcmvbkbc@gmail.com> +Cc: Miaohe Lin <linmiaohe@huawei.com> +Cc: Muchun Song <muchun.song@linux.dev> +Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> +Cc: Peter Xu <peterx@redhat.com> +Cc: Richard Chang <richardycc@google.com> +Cc: Rich Felker <dalias@libc.org> +Cc: Ryan Roberts <ryan.roberts@arm.com> +Cc: Yang Shi <shy828301@gmail.com> +Cc: Yin Fengwei <fengwei.yin@intel.com> +Cc: Yoshinori Sato <ysato@users.sourceforge.jp> +Cc: Zi Yan <ziy@nvidia.com> +Signed-off-by: Andrew Morton <akpm@linux-foundation.org> +--- + + mm/filemap.c | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +--- a/mm/filemap.c~mm-filemap-use-folio_mapcount-in-filemap_unaccount_folio ++++ a/mm/filemap.c +@@ -168,7 +168,7 @@ static void filemap_unaccount_folio(stru + add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE); + + if (mapping_exiting(mapping) && !folio_test_large(folio)) { +- int mapcount = page_mapcount(&folio->page); ++ int mapcount = folio_mapcount(folio); + + if (folio_ref_count(folio) >= mapcount + 2) { + /* +_ diff --git a/patches/mm-huge_memory-use-folio_mapcount-in-zap_huge_pmd-sanity-check.patch b/patches/mm-huge_memory-use-folio_mapcount-in-zap_huge_pmd-sanity-check.patch new file mode 100644 index 000000000..8a1c0de67 --- /dev/null +++ b/patches/mm-huge_memory-use-folio_mapcount-in-zap_huge_pmd-sanity-check.patch @@ -0,0 +1,50 @@ +From: David Hildenbrand <david@redhat.com> +Subject: mm/huge_memory: use folio_mapcount() in zap_huge_pmd() sanity check +Date: Tue, 9 Apr 2024 21:22:51 +0200 + +We want to limit the use of page_mapcount() to the places where it is +absolutely necessary. Let's similarly check for folio_mapcount() +underflows instead of page_mapcount() underflows like we do in +zap_present_folio_ptes() now. + +Instead of the VM_BUG_ON(), we should actually be doing something like +print_bad_pte(). For now, let's keep it simple and use WARN_ON_ONCE(), +performing that check independently of DEBUG_VM. 
+ +Link: https://lkml.kernel.org/r/20240409192301.907377-9-david@redhat.com +Signed-off-by: David Hildenbrand <david@redhat.com> +Cc: Chris Zankel <chris@zankel.net> +Cc: Hugh Dickins <hughd@google.com> +Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> +Cc: Jonathan Corbet <corbet@lwn.net> +Cc: Matthew Wilcox (Oracle) <willy@infradead.org> +Cc: Max Filippov <jcmvbkbc@gmail.com> +Cc: Miaohe Lin <linmiaohe@huawei.com> +Cc: Muchun Song <muchun.song@linux.dev> +Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> +Cc: Peter Xu <peterx@redhat.com> +Cc: Richard Chang <richardycc@google.com> +Cc: Rich Felker <dalias@libc.org> +Cc: Ryan Roberts <ryan.roberts@arm.com> +Cc: Yang Shi <shy828301@gmail.com> +Cc: Yin Fengwei <fengwei.yin@intel.com> +Cc: Yoshinori Sato <ysato@users.sourceforge.jp> +Cc: Zi Yan <ziy@nvidia.com> +Signed-off-by: Andrew Morton <akpm@linux-foundation.org> +--- + + mm/huge_memory.c | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +--- a/mm/huge_memory.c~mm-huge_memory-use-folio_mapcount-in-zap_huge_pmd-sanity-check ++++ a/mm/huge_memory.c +@@ -1851,7 +1851,7 @@ int zap_huge_pmd(struct mmu_gather *tlb, + + folio = page_folio(page); + folio_remove_rmap_pmd(folio, page, vma); +- VM_BUG_ON_PAGE(page_mapcount(page) < 0, page); ++ WARN_ON_ONCE(folio_mapcount(folio) < 0); + VM_BUG_ON_PAGE(!PageHead(page), page); + } else if (thp_migration_supported()) { + swp_entry_t entry; +_ diff --git a/patches/mm-improve-folio_likely_mapped_shared-using-the-mapcount-of-large-folios.patch b/patches/mm-improve-folio_likely_mapped_shared-using-the-mapcount-of-large-folios.patch new file mode 100644 index 000000000..e4abf5b2b --- /dev/null +++ b/patches/mm-improve-folio_likely_mapped_shared-using-the-mapcount-of-large-folios.patch @@ -0,0 +1,76 @@ +From: David Hildenbrand <david@redhat.com> +Subject: mm: improve folio_likely_mapped_shared() using the mapcount of large folios +Date: Tue, 9 Apr 2024 21:22:48 +0200 + +We can now read the mapcount of large folios very efficiently. Use it to +improve our handling of partially-mappable folios, falling back to making +a guess only in case the folio is not "obviously mapped shared". + +We can now better detect partially-mappable folios where the first page is +not mapped as "mapped shared", reducing "false negatives"; but false +negatives are still possible. + +While at it, fixup a wrong comment (false positive vs. false negative) +for KSM folios. 
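
A userspace model of the new heuristic for a large, non-hugetlb folio, with illustrative numbers (the kernel reads the first subpage's raw _mapcount for the final guess; here it is passed in as a 1-based count):

#include <assert.h>
#include <stdbool.h>

static bool sketch_likely_mapped_shared(int mapcount, int nr_pages,
					int entire_mapcount,
					int first_subpage_mapcount)
{
	if (mapcount <= 1)
		return false;			/* single mapping: exclusive */
	if (entire_mapcount || mapcount > nr_pages)
		return true;			/* some page mapped >= twice */
	return first_subpage_mapcount > 1;	/* otherwise: guess */
}

int main(void)
{
	/* order-2 folio (4 pages), each page mapped once: exclusive */
	assert(!sketch_likely_mapped_shared(4, 4, 0, 1));
	/* 5 mappings on 4 pages: some page must be mapped twice */
	assert(sketch_likely_mapped_shared(5, 4, 0, 1));
	/* 2 mappings, first subpage mapped twice: guessed shared */
	assert(sketch_likely_mapped_shared(2, 4, 0, 2));
	return 0;
}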
+ +Link: https://lkml.kernel.org/r/20240409192301.907377-6-david@redhat.com +Signed-off-by: David Hildenbrand <david@redhat.com> +Cc: Chris Zankel <chris@zankel.net> +Cc: Hugh Dickins <hughd@google.com> +Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> +Cc: Jonathan Corbet <corbet@lwn.net> +Cc: Matthew Wilcox (Oracle) <willy@infradead.org> +Cc: Max Filippov <jcmvbkbc@gmail.com> +Cc: Miaohe Lin <linmiaohe@huawei.com> +Cc: Muchun Song <muchun.song@linux.dev> +Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> +Cc: Peter Xu <peterx@redhat.com> +Cc: Richard Chang <richardycc@google.com> +Cc: Rich Felker <dalias@libc.org> +Cc: Ryan Roberts <ryan.roberts@arm.com> +Cc: Yang Shi <shy828301@gmail.com> +Cc: Yin Fengwei <fengwei.yin@intel.com> +Cc: Yoshinori Sato <ysato@users.sourceforge.jp> +Cc: Zi Yan <ziy@nvidia.com> +Signed-off-by: Andrew Morton <akpm@linux-foundation.org> +--- + + include/linux/mm.h | 19 +++++++++++++++++-- + 1 file changed, 17 insertions(+), 2 deletions(-) + +--- a/include/linux/mm.h~mm-improve-folio_likely_mapped_shared-using-the-mapcount-of-large-folios ++++ a/include/linux/mm.h +@@ -2183,7 +2183,7 @@ static inline size_t folio_size(struct f + * indicate "mapped shared" (false positive) when two VMAs in the same MM + * cover the same file range. + * #. For (small) KSM folios, the return value can wrongly indicate "mapped +- * shared" (false negative), when the folio is mapped multiple times into ++ * shared" (false positive), when the folio is mapped multiple times into + * the same MM. + * + * Further, this function only considers current page table mappings that +@@ -2200,7 +2200,22 @@ static inline size_t folio_size(struct f + */ + static inline bool folio_likely_mapped_shared(struct folio *folio) + { +- return page_mapcount(folio_page(folio, 0)) > 1; ++ int mapcount = folio_mapcount(folio); ++ ++ /* Only partially-mappable folios require more care. */ ++ if (!folio_test_large(folio) || unlikely(folio_test_hugetlb(folio))) ++ return mapcount > 1; ++ ++ /* A single mapping implies "mapped exclusively". */ ++ if (mapcount <= 1) ++ return false; ++ ++ /* If any page is mapped more than once we treat it "mapped shared". */ ++ if (folio_entire_mapcount(folio) || mapcount > folio_nr_pages(folio)) ++ return true; ++ ++ /* Let's guess based on the first subpage. */ ++ return atomic_read(&folio->_mapcount) > 0; + } + + #ifndef HAVE_ARCH_MAKE_PAGE_ACCESSIBLE +_ diff --git a/patches/mm-init-remove-the-unnecessary-special-treatment-for-memory-less-node-v2.patch b/patches/mm-init-remove-the-unnecessary-special-treatment-for-memory-less-node-v2.patch new file mode 100644 index 000000000..66006ad07 --- /dev/null +++ b/patches/mm-init-remove-the-unnecessary-special-treatment-for-memory-less-node-v2.patch @@ -0,0 +1,40 @@ +From: Baoquan He <bhe@redhat.com> +Subject: mm-init-remove-the-unnecessary-special-treatment-for-memory-less-node-v2 +Date: Wed, 10 Apr 2024 11:35:29 +0800 + +redo code comments, per Mike + +As Mike suggested, the old code comments above the 'continue' statement is +still useful for easier understanding code and system behaviour. So +rephrase and move them above line 'if (pgdat->node_present_pages)'. +Thanks to Mike. 
+
+Link: https://lkml.kernel.org/r/ZhYJAVQRYJSTKZng@MiWiFi-R3L-srv
+Signed-off-by: Baoquan He <bhe@redhat.com>
+Cc: Mel Gorman <mgorman@suse.de>
+Cc: "Mike Rapoport (IBM)" <rppt@kernel.org>
+Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
+---
+
+ mm/mm_init.c | 9 ++++++++-
+ 1 file changed, 8 insertions(+), 1 deletion(-)
+
+--- a/mm/mm_init.c~mm-init-remove-the-unnecessary-special-treatment-for-memory-less-node-v2
++++ a/mm/mm_init.c
+@@ -1840,7 +1840,14 @@ void __init free_area_init(unsigned long
+ 		pgdat = NODE_DATA(nid);
+ 		free_area_init_node(nid);
+
+-		/* Any memory on that node */
++		/*
++		 * No sysfs hierarchy will be created via register_one_node()
++		 * for memory-less node because here it's not marked as N_MEMORY
++		 * and won't be set online later. The benefit is userspace
++		 * program won't be confused by sysfs files/directories of
++		 * memory-less node. The pgdat will get fully initialized by
++		 * hotadd_init_pgdat() when memory is hotplugged into this node.
++		 */
+ 		if (pgdat->node_present_pages) {
+ 			node_set_state(nid, N_MEMORY);
+ 			check_for_memory(pgdat);
+_
diff --git a/patches/mm-make-folio_mapcount-return-0-for-small-typed-folios.patch b/patches/mm-make-folio_mapcount-return-0-for-small-typed-folios.patch
new file mode 100644
index 000000000..78eabbbcd
--- /dev/null
+++ b/patches/mm-make-folio_mapcount-return-0-for-small-typed-folios.patch
@@ -0,0 +1,60 @@
+From: David Hildenbrand <david@redhat.com>
+Subject: mm: make folio_mapcount() return 0 for small typed folios
+Date: Tue, 9 Apr 2024 21:22:49 +0200
+
+We already handle it properly for large folios. Let's also return "0" for
+small typed folios, like page_mapcount() currently would.
+
+Consequently, folio_mapcount() will never return negative values for typed
+folios, but may return negative values for underflows.
+
+Link: https://lkml.kernel.org/r/20240409192301.907377-7-david@redhat.com
+Signed-off-by: David Hildenbrand <david@redhat.com>
+Cc: Chris Zankel <chris@zankel.net>
+Cc: Hugh Dickins <hughd@google.com>
+Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
+Cc: Jonathan Corbet <corbet@lwn.net>
+Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
+Cc: Max Filippov <jcmvbkbc@gmail.com>
+Cc: Miaohe Lin <linmiaohe@huawei.com>
+Cc: Muchun Song <muchun.song@linux.dev>
+Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
+Cc: Peter Xu <peterx@redhat.com>
+Cc: Richard Chang <richardycc@google.com>
+Cc: Rich Felker <dalias@libc.org>
+Cc: Ryan Roberts <ryan.roberts@arm.com>
+Cc: Yang Shi <shy828301@gmail.com>
+Cc: Yin Fengwei <fengwei.yin@intel.com>
+Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
+Cc: Zi Yan <ziy@nvidia.com>
+Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
+---
+
+ include/linux/mm.h | 11 +++++++++--
+ 1 file changed, 9 insertions(+), 2 deletions(-)
+
+--- a/include/linux/mm.h~mm-make-folio_mapcount-return-0-for-small-typed-folios
++++ a/include/linux/mm.h
+@@ -1260,12 +1260,19 @@ static inline int folio_large_mapcount(c
+  * references the entire folio counts exactly once, even when such special
+  * page table entries are comprised of multiple ordinary page table entries.
+  *
++ * Will report 0 for pages which cannot be mapped into userspace, such as
++ * slab, page tables and similar.
++ *
+  * Return: The number of times this folio is mapped.
+ */ + static inline int folio_mapcount(const struct folio *folio) + { +- if (likely(!folio_test_large(folio))) +- return atomic_read(&folio->_mapcount) + 1; ++ int mapcount; ++ ++ if (likely(!folio_test_large(folio))) { ++ mapcount = atomic_read(&folio->_mapcount); ++ return page_type_has_type(mapcount) ? 0 : mapcount + 1; ++ } + return folio_large_mapcount(folio); + } + +_ diff --git a/patches/mm-memory-failure-use-folio_mapcount-in-hwpoison_user_mappings.patch b/patches/mm-memory-failure-use-folio_mapcount-in-hwpoison_user_mappings.patch new file mode 100644 index 000000000..4d82b0e10 --- /dev/null +++ b/patches/mm-memory-failure-use-folio_mapcount-in-hwpoison_user_mappings.patch @@ -0,0 +1,48 @@ +From: David Hildenbrand <david@redhat.com> +Subject: mm/memory-failure: use folio_mapcount() in hwpoison_user_mappings() +Date: Tue, 9 Apr 2024 21:22:52 +0200 + +We want to limit the use of page_mapcount() to the places where it is +absolutely necessary. We can only unmap full folios; page_mapped(), which +we check here, is translated to folio_mapped() -- based on +folio_mapcount(). So let's print the folio mapcount instead. + +Link: https://lkml.kernel.org/r/20240409192301.907377-10-david@redhat.com +Signed-off-by: David Hildenbrand <david@redhat.com> +Cc: Chris Zankel <chris@zankel.net> +Cc: Hugh Dickins <hughd@google.com> +Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> +Cc: Jonathan Corbet <corbet@lwn.net> +Cc: Matthew Wilcox (Oracle) <willy@infradead.org> +Cc: Max Filippov <jcmvbkbc@gmail.com> +Cc: Miaohe Lin <linmiaohe@huawei.com> +Cc: Muchun Song <muchun.song@linux.dev> +Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> +Cc: Peter Xu <peterx@redhat.com> +Cc: Richard Chang <richardycc@google.com> +Cc: Rich Felker <dalias@libc.org> +Cc: Ryan Roberts <ryan.roberts@arm.com> +Cc: Yang Shi <shy828301@gmail.com> +Cc: Yin Fengwei <fengwei.yin@intel.com> +Cc: Yoshinori Sato <ysato@users.sourceforge.jp> +Cc: Zi Yan <ziy@nvidia.com> +Signed-off-by: Andrew Morton <akpm@linux-foundation.org> +--- + + mm/memory-failure.c | 4 ++-- + 1 file changed, 2 insertions(+), 2 deletions(-) + +--- a/mm/memory-failure.c~mm-memory-failure-use-folio_mapcount-in-hwpoison_user_mappings ++++ a/mm/memory-failure.c +@@ -1628,8 +1628,8 @@ static bool hwpoison_user_mappings(struc + + unmap_success = !page_mapped(p); + if (!unmap_success) +- pr_err("%#lx: failed to unmap page (mapcount=%d)\n", +- pfn, page_mapcount(p)); ++ pr_err("%#lx: failed to unmap page (folio mapcount=%d)\n", ++ pfn, folio_mapcount(page_folio(p))); + + /* + * try_to_unmap() might put mlocked page in lru cache, so call +_ diff --git a/patches/mm-memory-use-folio_mapcount-in-zap_present_folio_ptes.patch b/patches/mm-memory-use-folio_mapcount-in-zap_present_folio_ptes.patch new file mode 100644 index 000000000..2f2087ae1 --- /dev/null +++ b/patches/mm-memory-use-folio_mapcount-in-zap_present_folio_ptes.patch @@ -0,0 +1,57 @@ +From: David Hildenbrand <david@redhat.com> +Subject: mm/memory: use folio_mapcount() in zap_present_folio_ptes() +Date: Tue, 9 Apr 2024 21:22:50 +0200 + +We want to limit the use of page_mapcount() to the places where it is +absolutely necessary. In zap_present_folio_ptes(), let's simply check the +folio mapcount(). If there is some issue, it will underflow at some point +either way when unmapping. + +As indicated already in commit 10ebac4f95e7 ("mm/memory: optimize +unmap/zap with PTE-mapped THP"), we already documented "If we ever have a +cheap folio_mapcount(), we might just want to check for underflows +there.". 
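
As the next paragraph explains, the folio-level check catches underflows the first-subpage check misses; a userspace model with illustrative numbers, treating the folio mapcount as the sum of per-page mapcounts:

#include <assert.h>

int main(void)
{
	/* Raw per-page _mapcount of an order-2 folio after a batch unmap;
	 * -1 means unmapped, page 2 underflowed somewhere earlier. */
	int raw[4] = { -1, -1, -2, -1 };
	int folio_mapcount = 0;
	int i;

	for (i = 0; i < 4; i++)
		folio_mapcount += raw[i] + 1;

	assert(raw[0] + 1 == 0);	/* old check (first subpage): silent */
	assert(folio_mapcount == -1);	/* new folio-level check: fires */
	return 0;
}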
+ +There is no change for small folios. For large folios, we'll now catch +more underflows when batch-unmapping, because instead of only testing the +mapcount of the first subpage, we'll test if the folio mapcount +underflows. + +Link: https://lkml.kernel.org/r/20240409192301.907377-8-david@redhat.com +Signed-off-by: David Hildenbrand <david@redhat.com> +Cc: Chris Zankel <chris@zankel.net> +Cc: Hugh Dickins <hughd@google.com> +Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> +Cc: Jonathan Corbet <corbet@lwn.net> +Cc: Matthew Wilcox (Oracle) <willy@infradead.org> +Cc: Max Filippov <jcmvbkbc@gmail.com> +Cc: Miaohe Lin <linmiaohe@huawei.com> +Cc: Muchun Song <muchun.song@linux.dev> +Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> +Cc: Peter Xu <peterx@redhat.com> +Cc: Richard Chang <richardycc@google.com> +Cc: Rich Felker <dalias@libc.org> +Cc: Ryan Roberts <ryan.roberts@arm.com> +Cc: Yang Shi <shy828301@gmail.com> +Cc: Yin Fengwei <fengwei.yin@intel.com> +Cc: Yoshinori Sato <ysato@users.sourceforge.jp> +Cc: Zi Yan <ziy@nvidia.com> +Signed-off-by: Andrew Morton <akpm@linux-foundation.org> +--- + + mm/memory.c | 3 +-- + 1 file changed, 1 insertion(+), 2 deletions(-) + +--- a/mm/memory.c~mm-memory-use-folio_mapcount-in-zap_present_folio_ptes ++++ a/mm/memory.c +@@ -1502,8 +1502,7 @@ static __always_inline void zap_present_ + if (!delay_rmap) { + folio_remove_rmap_ptes(folio, page, nr, vma); + +- /* Only sanity-check the first page in a batch. */ +- if (unlikely(page_mapcount(page) < 0)) ++ if (unlikely(folio_mapcount(folio) < 0)) + print_bad_pte(vma, addr, ptent, page); + } + +_ diff --git a/patches/mm-migrate-use-folio_likely_mapped_shared-in-add_page_for_migration.patch b/patches/mm-migrate-use-folio_likely_mapped_shared-in-add_page_for_migration.patch new file mode 100644 index 000000000..38bec3312 --- /dev/null +++ b/patches/mm-migrate-use-folio_likely_mapped_shared-in-add_page_for_migration.patch @@ -0,0 +1,50 @@ +From: David Hildenbrand <david@redhat.com> +Subject: mm/migrate: use folio_likely_mapped_shared() in add_page_for_migration() +Date: Tue, 9 Apr 2024 21:22:54 +0200 + +We want to limit the use of page_mapcount() to the places where it is +absolutely necessary. In add_page_for_migration(), we actually want to +check if the folio is mapped shared, to reject such folios. So let's use +folio_likely_mapped_shared() instead. + +For small folios, fully mapped THP, and hugetlb folios, there is no change. +For partially mapped, shared THP, we should now do a better job at +rejecting such folios. 
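
The behavioral effect, sketched as an excerpt of the updated gate (see the diff below) with illustrative numbers; migrate_all corresponds to MPOL_MF_MOVE_ALL in the existing code:

	err = -EACCES;
	if (folio_likely_mapped_shared(folio) && !migrate_all)
		goto out_putfolio;
	/*
	 * Illustrative: a THP fully mapped by process B and additionally
	 * half-mapped by process A has folio_mapcount() > folio_nr_pages(),
	 * so it is now rejected here, while the old page_mapcount(page) > 1
	 * test on a single subpage (== 1 for B's mapping) let it through.
	 */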
+ +Link: https://lkml.kernel.org/r/20240409192301.907377-12-david@redhat.com +Signed-off-by: David Hildenbrand <david@redhat.com> +Cc: Chris Zankel <chris@zankel.net> +Cc: Hugh Dickins <hughd@google.com> +Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> +Cc: Jonathan Corbet <corbet@lwn.net> +Cc: Matthew Wilcox (Oracle) <willy@infradead.org> +Cc: Max Filippov <jcmvbkbc@gmail.com> +Cc: Miaohe Lin <linmiaohe@huawei.com> +Cc: Muchun Song <muchun.song@linux.dev> +Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> +Cc: Peter Xu <peterx@redhat.com> +Cc: Richard Chang <richardycc@google.com> +Cc: Rich Felker <dalias@libc.org> +Cc: Ryan Roberts <ryan.roberts@arm.com> +Cc: Yang Shi <shy828301@gmail.com> +Cc: Yin Fengwei <fengwei.yin@intel.com> +Cc: Yoshinori Sato <ysato@users.sourceforge.jp> +Cc: Zi Yan <ziy@nvidia.com> +Signed-off-by: Andrew Morton <akpm@linux-foundation.org> +--- + + mm/migrate.c | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +--- a/mm/migrate.c~mm-migrate-use-folio_likely_mapped_shared-in-add_page_for_migration ++++ a/mm/migrate.c +@@ -2140,7 +2140,7 @@ static int add_page_for_migration(struct + goto out_putfolio; + + err = -EACCES; +- if (page_mapcount(page) > 1 && !migrate_all) ++ if (folio_likely_mapped_shared(folio) && !migrate_all) + goto out_putfolio; + + err = -EBUSY; +_ diff --git a/patches/mm-migrate_device-use-folio_mapcount-in-migrate_vma_check_page.patch b/patches/mm-migrate_device-use-folio_mapcount-in-migrate_vma_check_page.patch new file mode 100644 index 000000000..464cfb6d2 --- /dev/null +++ b/patches/mm-migrate_device-use-folio_mapcount-in-migrate_vma_check_page.patch @@ -0,0 +1,74 @@ +From: David Hildenbrand <david@redhat.com> +Subject: mm/migrate_device: use folio_mapcount() in migrate_vma_check_page() +Date: Tue, 9 Apr 2024 21:22:57 +0200 + +We want to limit the use of page_mapcount() to the places where it is +absolutely necessary. Let's convert migrate_vma_check_page() to work on a +folio internally so we can remove the page_mapcount() usage. + +Note that we reject any large folios. + +There is a lot more folio conversion to be had, but that has to wait for +another day. No functional change intended. 
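
A userspace model of the reference accounting with illustrative numbers; which extras are expected (caller/isolation reference, +1 for ZONE_DEVICE, +1 plus private for file-backed) follows the comments in the existing code:

#include <assert.h>
#include <stdbool.h>

/* Migratable only if every reference is accounted for: the mappings
 * plus the expected extras. A surplus reference (e.g. a pin) blocks
 * migration. */
static bool sketch_check_page(int refcount, int mapcount, int extra)
{
	return (refcount - extra) <= mapcount;
}

int main(void)
{
	/* anon page mapped once: 1 mapping ref + 2 expected extras */
	assert(sketch_check_page(3, 1, 2));
	/* same page with one stray reference: not migratable */
	assert(!sketch_check_page(4, 1, 2));
	return 0;
}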
+ +Link: https://lkml.kernel.org/r/20240409192301.907377-15-david@redhat.com +Signed-off-by: David Hildenbrand <david@redhat.com> +Cc: Chris Zankel <chris@zankel.net> +Cc: Hugh Dickins <hughd@google.com> +Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> +Cc: Jonathan Corbet <corbet@lwn.net> +Cc: Matthew Wilcox (Oracle) <willy@infradead.org> +Cc: Max Filippov <jcmvbkbc@gmail.com> +Cc: Miaohe Lin <linmiaohe@huawei.com> +Cc: Muchun Song <muchun.song@linux.dev> +Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> +Cc: Peter Xu <peterx@redhat.com> +Cc: Richard Chang <richardycc@google.com> +Cc: Rich Felker <dalias@libc.org> +Cc: Ryan Roberts <ryan.roberts@arm.com> +Cc: Yang Shi <shy828301@gmail.com> +Cc: Yin Fengwei <fengwei.yin@intel.com> +Cc: Yoshinori Sato <ysato@users.sourceforge.jp> +Cc: Zi Yan <ziy@nvidia.com> +Signed-off-by: Andrew Morton <akpm@linux-foundation.org> +--- + + mm/migrate_device.c | 12 +++++++----- + 1 file changed, 7 insertions(+), 5 deletions(-) + +--- a/mm/migrate_device.c~mm-migrate_device-use-folio_mapcount-in-migrate_vma_check_page ++++ a/mm/migrate_device.c +@@ -324,6 +324,8 @@ static void migrate_vma_collect(struct m + */ + static bool migrate_vma_check_page(struct page *page, struct page *fault_page) + { ++ struct folio *folio = page_folio(page); ++ + /* + * One extra ref because caller holds an extra reference, either from + * isolate_lru_page() for a regular page, or migrate_vma_collect() for +@@ -336,18 +338,18 @@ static bool migrate_vma_check_page(struc + * check them than regular pages, because they can be mapped with a pmd + * or with a pte (split pte mapping). + */ +- if (PageCompound(page)) ++ if (folio_test_large(folio)) + return false; + + /* Page from ZONE_DEVICE have one extra reference */ +- if (is_zone_device_page(page)) ++ if (folio_is_zone_device(folio)) + extra++; + + /* For file back page */ +- if (page_mapping(page)) +- extra += 1 + page_has_private(page); ++ if (folio_mapping(folio)) ++ extra += 1 + folio_has_private(folio); + +- if ((page_count(page) - extra) > page_mapcount(page)) ++ if ((folio_ref_count(folio) - extra) > folio_mapcount(folio)) + return false; + + return true; +_ diff --git a/patches/mm-page_alloc-use-folio_mapped-in-__alloc_contig_migrate_range.patch b/patches/mm-page_alloc-use-folio_mapped-in-__alloc_contig_migrate_range.patch new file mode 100644 index 000000000..aea368ce7 --- /dev/null +++ b/patches/mm-page_alloc-use-folio_mapped-in-__alloc_contig_migrate_range.patch @@ -0,0 +1,63 @@ +From: David Hildenbrand <david@redhat.com> +Subject: mm/page_alloc: use folio_mapped() in __alloc_contig_migrate_range() +Date: Tue, 9 Apr 2024 21:22:53 +0200 + +We want to limit the use of page_mapcount() to the places where it is +absolutely necessary. + +For tracing purposes, we use page_mapcount() in +__alloc_contig_migrate_range(). Adding that mapcount to total_mapped +sounds strange: total_migrated and total_reclaimed would count each page +only once, not multiple times. + +But then, isolate_migratepages_range() adds each folio only once to the +list. So for large folios, we would query the mapcount of the first page +of the folio, which doesn't make too much sense for large folios. + +Let's simply use folio_mapped() * folio_nr_pages(), which makes more sense +as nr_migratepages is also incremented by the number of pages in the folio +in case of successful migration. 
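
The unit change, annotated on the new loop (excerpt from the diff below; the order-4 numbers are illustrative):

	list_for_each_entry(page, &cc->migratepages, lru) {
		struct folio *folio = page_folio(page);

		/* One list entry per folio: count pages, not one subpage's
		 * mapcount. A mapped order-4 folio now adds 16 (was: the
		 * head page's mapcount, typically 1), matching the units of
		 * nr_migratepages and total_migrated. */
		total_mapped += folio_mapped(folio) * folio_nr_pages(folio);
	}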
+ +Link: https://lkml.kernel.org/r/20240409192301.907377-11-david@redhat.com +Signed-off-by: David Hildenbrand <david@redhat.com> +Cc: Chris Zankel <chris@zankel.net> +Cc: Hugh Dickins <hughd@google.com> +Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> +Cc: Jonathan Corbet <corbet@lwn.net> +Cc: Matthew Wilcox (Oracle) <willy@infradead.org> +Cc: Max Filippov <jcmvbkbc@gmail.com> +Cc: Miaohe Lin <linmiaohe@huawei.com> +Cc: Muchun Song <muchun.song@linux.dev> +Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> +Cc: Peter Xu <peterx@redhat.com> +Cc: Richard Chang <richardycc@google.com> +Cc: Rich Felker <dalias@libc.org> +Cc: Ryan Roberts <ryan.roberts@arm.com> +Cc: Yang Shi <shy828301@gmail.com> +Cc: Yin Fengwei <fengwei.yin@intel.com> +Cc: Yoshinori Sato <ysato@users.sourceforge.jp> +Cc: Zi Yan <ziy@nvidia.com> +Signed-off-by: Andrew Morton <akpm@linux-foundation.org> +--- + + mm/page_alloc.c | 8 ++++++-- + 1 file changed, 6 insertions(+), 2 deletions(-) + +--- a/mm/page_alloc.c~mm-page_alloc-use-folio_mapped-in-__alloc_contig_migrate_range ++++ a/mm/page_alloc.c +@@ -6355,8 +6355,12 @@ int __alloc_contig_migrate_range(struct + + if (trace_mm_alloc_contig_migrate_range_info_enabled()) { + total_reclaimed += nr_reclaimed; +- list_for_each_entry(page, &cc->migratepages, lru) +- total_mapped += page_mapcount(page); ++ list_for_each_entry(page, &cc->migratepages, lru) { ++ struct folio *folio = page_folio(page); ++ ++ total_mapped += folio_mapped(folio) * ++ folio_nr_pages(folio); ++ } + } + + ret = migrate_pages(&cc->migratepages, alloc_migration_target, +_ diff --git a/patches/mm-pass-vma-instead-of-mm-to-follow_pte.patch b/patches/mm-pass-vma-instead-of-mm-to-follow_pte.patch new file mode 100644 index 000000000..28daf6a37 --- /dev/null +++ b/patches/mm-pass-vma-instead-of-mm-to-follow_pte.patch @@ -0,0 +1,186 @@ +From: David Hildenbrand <david@redhat.com> +Subject: mm: pass VMA instead of MM to follow_pte() +Date: Wed, 10 Apr 2024 17:55:26 +0200 + +... and centralize the VM_IO/VM_PFNMAP sanity check in there. We'll +now also perform these sanity checks for direct follow_pte() +invocations. + +For generic_access_phys(), we might now check multiple times: nothing to +worry about, really. 
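
A hedged sketch of the resulting caller pattern (not a complete driver; locking and teardown per follow_pte()'s documented contract):

/* Look up the PFN backing a PFNMAP address; the VM_IO|VM_PFNMAP sanity
 * check now happens inside follow_pte() itself. */
static int sketch_lookup_pfn(struct mm_struct *mm, unsigned long addr,
			     unsigned long *pfn)
{
	struct vm_area_struct *vma;
	spinlock_t *ptl;
	pte_t *ptep;
	int ret = -EINVAL;

	mmap_read_lock(mm);
	vma = vma_lookup(mm, addr);
	if (vma) {
		ret = follow_pte(vma, addr, &ptep, &ptl);  /* was: vma->vm_mm */
		if (!ret) {
			*pfn = pte_pfn(ptep_get(ptep));
			pte_unmap_unlock(ptep, ptl);
		}
	}
	mmap_read_unlock(mm);
	return ret;
}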
+ +Link: https://lkml.kernel.org/r/20240410155527.474777-3-david@redhat.com +Signed-off-by: David Hildenbrand <david@redhat.com> +Acked-by: Sean Christopherson <seanjc@google.com> [KVM] +Cc: Alex Williamson <alex.williamson@redhat.com> +Cc: Christoph Hellwig <hch@lst.de> +Cc: Fei Li <fei1.li@intel.com> +Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> +Cc: Heiko Carstens <hca@linux.ibm.com> +Cc: Ingo Molnar <mingo@redhat.com> +Cc: Paolo Bonzini <pbonzini@redhat.com> +Cc: Yonghua Huang <yonghua.huang@intel.com> +Signed-off-by: Andrew Morton <akpm@linux-foundation.org> +--- + + arch/s390/pci/pci_mmio.c | 4 ++-- + arch/x86/mm/pat/memtype.c | 5 +---- + drivers/vfio/vfio_iommu_type1.c | 4 ++-- + drivers/virt/acrn/mm.c | 3 +-- + include/linux/mm.h | 2 +- + mm/memory.c | 15 ++++++++------- + virt/kvm/kvm_main.c | 4 ++-- + 7 files changed, 17 insertions(+), 20 deletions(-) + +--- a/arch/s390/pci/pci_mmio.c~mm-pass-vma-instead-of-mm-to-follow_pte ++++ a/arch/s390/pci/pci_mmio.c +@@ -169,7 +169,7 @@ SYSCALL_DEFINE3(s390_pci_mmio_write, uns + if (!(vma->vm_flags & VM_WRITE)) + goto out_unlock_mmap; + +- ret = follow_pte(vma->vm_mm, mmio_addr, &ptep, &ptl); ++ ret = follow_pte(vma, mmio_addr, &ptep, &ptl); + if (ret) + goto out_unlock_mmap; + +@@ -308,7 +308,7 @@ SYSCALL_DEFINE3(s390_pci_mmio_read, unsi + if (!(vma->vm_flags & VM_WRITE)) + goto out_unlock_mmap; + +- ret = follow_pte(vma->vm_mm, mmio_addr, &ptep, &ptl); ++ ret = follow_pte(vma, mmio_addr, &ptep, &ptl); + if (ret) + goto out_unlock_mmap; + +--- a/arch/x86/mm/pat/memtype.c~mm-pass-vma-instead-of-mm-to-follow_pte ++++ a/arch/x86/mm/pat/memtype.c +@@ -954,10 +954,7 @@ static int follow_phys(struct vm_area_st + pte_t *ptep, pte; + spinlock_t *ptl; + +- if (!(vma->vm_flags & (VM_IO | VM_PFNMAP))) +- return -EINVAL; +- +- if (follow_pte(vma->vm_mm, vma->vm_start, &ptep, &ptl)) ++ if (follow_pte(vma, vma->vm_start, &ptep, &ptl)) + return -EINVAL; + + pte = ptep_get(ptep); +--- a/drivers/vfio/vfio_iommu_type1.c~mm-pass-vma-instead-of-mm-to-follow_pte ++++ a/drivers/vfio/vfio_iommu_type1.c +@@ -518,7 +518,7 @@ static int follow_fault_pfn(struct vm_ar + spinlock_t *ptl; + int ret; + +- ret = follow_pte(vma->vm_mm, vaddr, &ptep, &ptl); ++ ret = follow_pte(vma, vaddr, &ptep, &ptl); + if (ret) { + bool unlocked = false; + +@@ -532,7 +532,7 @@ static int follow_fault_pfn(struct vm_ar + if (ret) + return ret; + +- ret = follow_pte(vma->vm_mm, vaddr, &ptep, &ptl); ++ ret = follow_pte(vma, vaddr, &ptep, &ptl); + if (ret) + return ret; + } +--- a/drivers/virt/acrn/mm.c~mm-pass-vma-instead-of-mm-to-follow_pte ++++ a/drivers/virt/acrn/mm.c +@@ -187,8 +187,7 @@ int acrn_vm_ram_map(struct acrn_vm *vm, + } + + for (i = 0; i < nr_pages; i++) { +- ret = follow_pte(vma->vm_mm, +- memmap->vma_base + i * PAGE_SIZE, ++ ret = follow_pte(vma, memmap->vma_base + i * PAGE_SIZE, + &ptep, &ptl); + if (ret) + break; +--- a/include/linux/mm.h~mm-pass-vma-instead-of-mm-to-follow_pte ++++ a/include/linux/mm.h +@@ -2420,7 +2420,7 @@ void free_pgd_range(struct mmu_gather *t + unsigned long end, unsigned long floor, unsigned long ceiling); + int + copy_page_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma); +-int follow_pte(struct mm_struct *mm, unsigned long address, ++int follow_pte(struct vm_area_struct *vma, unsigned long address, + pte_t **ptepp, spinlock_t **ptlp); + int generic_access_phys(struct vm_area_struct *vma, unsigned long addr, + void *buf, int len, int write); +--- a/mm/memory.c~mm-pass-vma-instead-of-mm-to-follow_pte ++++ a/mm/memory.c 
+@@ -5926,7 +5926,7 @@ int __pmd_alloc(struct mm_struct *mm, pu + + /** + * follow_pte - look up PTE at a user virtual address +- * @mm: the mm_struct of the target address space ++ * @vma: the memory mapping + * @address: user virtual address + * @ptepp: location to store found PTE + * @ptlp: location to store the lock for the PTE +@@ -5945,15 +5945,19 @@ int __pmd_alloc(struct mm_struct *mm, pu + * + * Return: zero on success, -ve otherwise. + */ +-int follow_pte(struct mm_struct *mm, unsigned long address, ++int follow_pte(struct vm_area_struct *vma, unsigned long address, + pte_t **ptepp, spinlock_t **ptlp) + { ++ struct mm_struct *mm = vma->vm_mm; + pgd_t *pgd; + p4d_t *p4d; + pud_t *pud; + pmd_t *pmd; + pte_t *ptep; + ++ if (!(vma->vm_flags & (VM_IO | VM_PFNMAP))) ++ goto out; ++ + pgd = pgd_offset(mm, address); + if (pgd_none(*pgd) || unlikely(pgd_bad(*pgd))) + goto out; +@@ -6007,11 +6011,8 @@ int generic_access_phys(struct vm_area_s + int offset = offset_in_page(addr); + int ret = -EINVAL; + +- if (!(vma->vm_flags & (VM_IO | VM_PFNMAP))) +- return -EINVAL; +- + retry: +- if (follow_pte(vma->vm_mm, addr, &ptep, &ptl)) ++ if (follow_pte(vma, addr, &ptep, &ptl)) + return -EINVAL; + pte = ptep_get(ptep); + pte_unmap_unlock(ptep, ptl); +@@ -6026,7 +6027,7 @@ retry: + if (!maddr) + return -ENOMEM; + +- if (follow_pte(vma->vm_mm, addr, &ptep, &ptl)) ++ if (follow_pte(vma, addr, &ptep, &ptl)) + goto out_unmap; + + if (!pte_same(pte, ptep_get(ptep))) { +--- a/virt/kvm/kvm_main.c~mm-pass-vma-instead-of-mm-to-follow_pte ++++ a/virt/kvm/kvm_main.c +@@ -2902,7 +2902,7 @@ static int hva_to_pfn_remapped(struct vm + spinlock_t *ptl; + int r; + +- r = follow_pte(vma->vm_mm, addr, &ptep, &ptl); ++ r = follow_pte(vma, addr, &ptep, &ptl); + if (r) { + /* + * get_user_pages fails for VM_IO and VM_PFNMAP vmas and does +@@ -2917,7 +2917,7 @@ static int hva_to_pfn_remapped(struct vm + if (r) + return r; + +- r = follow_pte(vma->vm_mm, addr, &ptep, &ptl); ++ r = follow_pte(vma, addr, &ptep, &ptl); + if (r) + return r; + } +_ diff --git a/patches/mm-rmap-add-fast-path-for-small-folios-when-adding-removing-duplicating.patch b/patches/mm-rmap-add-fast-path-for-small-folios-when-adding-removing-duplicating.patch new file mode 100644 index 000000000..7dd9ae0f9 --- /dev/null +++ b/patches/mm-rmap-add-fast-path-for-small-folios-when-adding-removing-duplicating.patch @@ -0,0 +1,116 @@ +From: David Hildenbrand <david@redhat.com> +Subject: mm/rmap: add fast-path for small folios when adding/removing/duplicating +Date: Tue, 9 Apr 2024 21:22:46 +0200 + +Let's add a fast-path for small folios to all relevant rmap functions. +Note that only RMAP_LEVEL_PTE applies. + +This is a preparation for tracking the mapcount of large folios in a +single value. 
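
The shape of the fast path, excerpted and simplified from the rmap add path in the diff below:

	case RMAP_LEVEL_PTE:
		/* A small folio has exactly one page and none of the
		 * large-folio bookkeeping (_nr_pages_mapped,
		 * _entire_mapcount), so one atomic op replaces the
		 * per-page loop. */
		if (!folio_test_large(folio)) {
			nr = atomic_inc_and_test(&page->_mapcount);
			break;	/* -1 -> 0 means first mapping */
		}
		/* large folios keep the existing per-page loop */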
+ +Link: https://lkml.kernel.org/r/20240409192301.907377-4-david@redhat.com +Signed-off-by: David Hildenbrand <david@redhat.com> +Cc: Chris Zankel <chris@zankel.net> +Cc: Hugh Dickins <hughd@google.com> +Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> +Cc: Jonathan Corbet <corbet@lwn.net> +Cc: Matthew Wilcox (Oracle) <willy@infradead.org> +Cc: Max Filippov <jcmvbkbc@gmail.com> +Cc: Miaohe Lin <linmiaohe@huawei.com> +Cc: Muchun Song <muchun.song@linux.dev> +Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> +Cc: Peter Xu <peterx@redhat.com> +Cc: Richard Chang <richardycc@google.com> +Cc: Rich Felker <dalias@libc.org> +Cc: Ryan Roberts <ryan.roberts@arm.com> +Cc: Yang Shi <shy828301@gmail.com> +Cc: Yin Fengwei <fengwei.yin@intel.com> +Cc: Yoshinori Sato <ysato@users.sourceforge.jp> +Cc: Zi Yan <ziy@nvidia.com> +Signed-off-by: Andrew Morton <akpm@linux-foundation.org> +--- + + include/linux/rmap.h | 13 +++++++++++++ + mm/rmap.c | 26 ++++++++++++++++---------- + 2 files changed, 29 insertions(+), 10 deletions(-) + +--- a/include/linux/rmap.h~mm-rmap-add-fast-path-for-small-folios-when-adding-removing-duplicating ++++ a/include/linux/rmap.h +@@ -322,6 +322,11 @@ static __always_inline void __folio_dup_ + + switch (level) { + case RMAP_LEVEL_PTE: ++ if (!folio_test_large(folio)) { ++ atomic_inc(&page->_mapcount); ++ break; ++ } ++ + do { + atomic_inc(&page->_mapcount); + } while (page++, --nr_pages > 0); +@@ -405,6 +410,14 @@ static __always_inline int __folio_try_d + if (PageAnonExclusive(page + i)) + return -EBUSY; + } ++ ++ if (!folio_test_large(folio)) { ++ if (PageAnonExclusive(page)) ++ ClearPageAnonExclusive(page); ++ atomic_inc(&page->_mapcount); ++ break; ++ } ++ + do { + if (PageAnonExclusive(page)) + ClearPageAnonExclusive(page); +--- a/mm/rmap.c~mm-rmap-add-fast-path-for-small-folios-when-adding-removing-duplicating ++++ a/mm/rmap.c +@@ -1172,15 +1172,18 @@ static __always_inline unsigned int __fo + + switch (level) { + case RMAP_LEVEL_PTE: ++ if (!folio_test_large(folio)) { ++ nr = atomic_inc_and_test(&page->_mapcount); ++ break; ++ } ++ + do { + first = atomic_inc_and_test(&page->_mapcount); +- if (first && folio_test_large(folio)) { ++ if (first) { + first = atomic_inc_return_relaxed(mapped); +- first = (first < ENTIRELY_MAPPED); ++ if (first < ENTIRELY_MAPPED) ++ nr++; + } +- +- if (first) +- nr++; + } while (page++, --nr_pages > 0); + break; + case RMAP_LEVEL_PMD: +@@ -1514,15 +1517,18 @@ static __always_inline void __folio_remo + + switch (level) { + case RMAP_LEVEL_PTE: ++ if (!folio_test_large(folio)) { ++ nr = atomic_add_negative(-1, &page->_mapcount); ++ break; ++ } ++ + do { + last = atomic_add_negative(-1, &page->_mapcount); +- if (last && folio_test_large(folio)) { ++ if (last) { + last = atomic_dec_return_relaxed(mapped); +- last = (last < ENTIRELY_MAPPED); ++ if (last < ENTIRELY_MAPPED) ++ nr++; + } +- +- if (last) +- nr++; + } while (page++, --nr_pages > 0); + break; + case RMAP_LEVEL_PMD: +_ diff --git a/patches/mm-rmap-always-inline-anon-file-rmap-duplication-of-a-single-pte.patch b/patches/mm-rmap-always-inline-anon-file-rmap-duplication-of-a-single-pte.patch new file mode 100644 index 000000000..ac24799c0 --- /dev/null +++ b/patches/mm-rmap-always-inline-anon-file-rmap-duplication-of-a-single-pte.patch @@ -0,0 +1,70 @@ +From: David Hildenbrand <david@redhat.com> +Subject: mm/rmap: always inline anon/file rmap duplication of a single PTE +Date: Tue, 9 Apr 2024 21:22:45 +0200 + +As we grow the code, the compiler might make stupid decisions and +unnecessarily 
degrade fork() performance. Let's make sure to always
+inline functions that operate on a single PTE so the compiler will always
+optimize out the loop and avoid a function call.
+
+This is a preparation for maintaining a total mapcount for large folios.
+
+Link: https://lkml.kernel.org/r/20240409192301.907377-3-david@redhat.com
+Signed-off-by: David Hildenbrand <david@redhat.com>
+Cc: Chris Zankel <chris@zankel.net>
+Cc: Hugh Dickins <hughd@google.com>
+Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
+Cc: Jonathan Corbet <corbet@lwn.net>
+Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
+Cc: Max Filippov <jcmvbkbc@gmail.com>
+Cc: Miaohe Lin <linmiaohe@huawei.com>
+Cc: Muchun Song <muchun.song@linux.dev>
+Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
+Cc: Peter Xu <peterx@redhat.com>
+Cc: Richard Chang <richardycc@google.com>
+Cc: Rich Felker <dalias@libc.org>
+Cc: Ryan Roberts <ryan.roberts@arm.com>
+Cc: Yang Shi <shy828301@gmail.com>
+Cc: Yin Fengwei <fengwei.yin@intel.com>
+Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
+Cc: Zi Yan <ziy@nvidia.com>
+Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
+---
+
+ include/linux/rmap.h | 17 +++++++++++++----
+ 1 file changed, 13 insertions(+), 4 deletions(-)
+
+--- a/include/linux/rmap.h~mm-rmap-always-inline-anon-file-rmap-duplication-of-a-single-pte
++++ a/include/linux/rmap.h
+@@ -347,8 +347,12 @@ static inline void folio_dup_file_rmap_p
+ {
+ __folio_dup_file_rmap(folio, page, nr_pages, RMAP_LEVEL_PTE);
+ }
+-#define folio_dup_file_rmap_pte(folio, page) \
+- folio_dup_file_rmap_ptes(folio, page, 1)
++
++static __always_inline void folio_dup_file_rmap_pte(struct folio *folio,
++ struct page *page)
++{
++ __folio_dup_file_rmap(folio, page, 1, RMAP_LEVEL_PTE);
++}
+
+ /**
+ * folio_dup_file_rmap_pmd - duplicate a PMD mapping of a page range of a folio
+@@ -448,8 +452,13 @@ static inline int folio_try_dup_anon_rma
+ return __folio_try_dup_anon_rmap(folio, page, nr_pages, src_vma,
+ RMAP_LEVEL_PTE);
+ }
+-#define folio_try_dup_anon_rmap_pte(folio, page, vma) \
+- folio_try_dup_anon_rmap_ptes(folio, page, 1, vma)
++
++static __always_inline int folio_try_dup_anon_rmap_pte(struct folio *folio,
++ struct page *page, struct vm_area_struct *src_vma)
++{
++ return __folio_try_dup_anon_rmap(folio, page, 1, src_vma,
++ RMAP_LEVEL_PTE);
++}
+
+ /**
+ * folio_try_dup_anon_rmap_pmd - try duplicating a PMD mapping of a page range
+_ diff --git a/patches/mm-track-mapcount-of-large-folios-in-single-value.patch b/patches/mm-track-mapcount-of-large-folios-in-single-value.patch new file mode 100644 index 000000000..73535167b --- /dev/null +++ b/patches/mm-track-mapcount-of-large-folios-in-single-value.patch @@ -0,0 +1,450 @@
+From: David Hildenbrand <david@redhat.com>
+Subject: mm: track mapcount of large folios in single value
+Date: Tue, 9 Apr 2024 21:22:47 +0200
+
+Let's track the mapcount of large folios in a single value. The mapcount
+of a large folio currently corresponds to the sum of the entire mapcount
+and all page mapcounts.
+
+This sum is what we actually want to know in folio_mapcount() and it is
+also sufficient for implementing folio_mapped().
+
+With PTE-mapped THP becoming more important and more widely used, we want
+to avoid looping over all pages of a folio just to obtain the mapcount of
+large folios. The comment "In the common case, avoid the loop when no
+pages mapped by PTE" in folio_total_mapcount() no longer holds for
+mTHP that are always mapped by PTE.
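+
+To make the cost concrete, here is a simplified sketch (illustration
+only, not the verbatim kernel code) of the O(nr_pages) read that this
+patch replaces with a single O(1) atomic read of the new
+folio->_large_mapcount field:
+
+	/* Sketch: roughly what folio_total_mapcount() has to do today. */
+	static int total_mapcount_loop(const struct folio *folio)
+	{
+		int i, nr_pages = folio_nr_pages(folio);
+		int mapcount = folio_entire_mapcount(folio);
+
+		/* Each page's _mapcount is biased by -1, hence the "+ 1". */
+		for (i = 0; i < nr_pages; i++)
+			mapcount += atomic_read(&folio_page(folio, i)->_mapcount) + 1;
+		return mapcount;
+	}
+
+	/* Sketch: the same answer with this patch applied. */
+	static int total_mapcount_single(const struct folio *folio)
+	{
+		return atomic_read(&folio->_large_mapcount) + 1;
+	}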
+
+Further, we are planning on using folio_mapcount() more frequently, and
+might even want to remove page mapcounts for large folios in some kernel
+configs. Therefore, allow for reading the mapcount of large folios
+efficiently and atomically without looping over any pages.
+
+Maintain the mapcount also for hugetlb pages for simplicity. Use the new
+mapcount to implement folio_mapcount() and folio_mapped(). Make
+page_mapped() simply call folio_mapped(). We can now get rid of
+folio_large_is_mapped().
+
+_nr_pages_mapped is now only used in rmap code and for debugging purposes.
+Keep folio_nr_pages_mapped() around, but document that its use should be
+limited to rmap internals and debugging purposes.
+
+This change implies one additional atomic add/sub whenever
+mapping/unmapping (parts of) a large folio.
+
+As we now batch RMAP operations for PTE-mapped THP during fork(), during
+unmap/zap, and when PTE-remapping a PMD-mapped THP, and we adjust the
+large mapcount for a PTE batch only once, the added overhead in the common
+case is small. Only when unmapping individual pages of a large folio
+(e.g., during COW), the overhead might be bigger in comparison, but it's
+essentially one additional atomic operation.
+
+Note that the refcount would overflow before the new mapcount could:
+each mapping requires a folio reference. Extend the
+documentation of folio_mapcount().
+
+Link: https://lkml.kernel.org/r/20240409192301.907377-5-david@redhat.com
+Signed-off-by: David Hildenbrand <david@redhat.com>
+Cc: Chris Zankel <chris@zankel.net>
+Cc: Hugh Dickins <hughd@google.com>
+Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
+Cc: Jonathan Corbet <corbet@lwn.net>
+Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
+Cc: Max Filippov <jcmvbkbc@gmail.com>
+Cc: Miaohe Lin <linmiaohe@huawei.com>
+Cc: Muchun Song <muchun.song@linux.dev>
+Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
+Cc: Peter Xu <peterx@redhat.com>
+Cc: Richard Chang <richardycc@google.com>
+Cc: Rich Felker <dalias@libc.org>
+Cc: Ryan Roberts <ryan.roberts@arm.com>
+Cc: Yang Shi <shy828301@gmail.com>
+Cc: Yin Fengwei <fengwei.yin@intel.com>
+Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
+Cc: Zi Yan <ziy@nvidia.com>
+Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
+---
+
+ Documentation/mm/transhuge.rst | 12 ++++----
+ include/linux/mm.h | 44 ++++++++++++++-----------------
+ include/linux/mm_types.h | 5 ++-
+ include/linux/rmap.h | 10 +++++++
+ mm/debug.c | 3 +-
+ mm/hugetlb.c | 4 +-
+ mm/internal.h | 3 ++
+ mm/khugepaged.c | 2 -
+ mm/page_alloc.c | 4 ++
+ mm/rmap.c | 34 +++++++----------------
+ 10 files changed, 62 insertions(+), 59 deletions(-)
+
+--- a/Documentation/mm/transhuge.rst~mm-track-mapcount-of-large-folios-in-single-value
++++ a/Documentation/mm/transhuge.rst
+@@ -116,14 +116,14 @@ pages:
+ succeeds on tail pages.
+
+ - map/unmap of a PMD entry for the whole THP increment/decrement
+- folio->_entire_mapcount and also increment/decrement
+- folio->_nr_pages_mapped by ENTIRELY_MAPPED when _entire_mapcount
+- goes from -1 to 0 or 0 to -1.
++ folio->_entire_mapcount, increment/decrement folio->_large_mapcount
++ and also increment/decrement folio->_nr_pages_mapped by ENTIRELY_MAPPED
++ when _entire_mapcount goes from -1 to 0 or 0 to -1.
+
+ - map/unmap of individual pages with PTE entry increment/decrement
+- page->_mapcount and also increment/decrement folio->_nr_pages_mapped
+- when page->_mapcount goes from -1 to 0 or 0 to -1 as this counts
+- the number of pages mapped by PTE.
++ page->_mapcount, increment/decrement folio->_large_mapcount and also
++ increment/decrement folio->_nr_pages_mapped when page->_mapcount goes
++ from -1 to 0 or 0 to -1 as this counts the number of pages mapped by PTE.
+
+ split_huge_page internally has to distribute the refcounts in the head
+ page to the tail pages before clearing all PG_head/tail bits from the page
+--- a/include/linux/mm.h~mm-track-mapcount-of-large-folios-in-single-value
++++ a/include/linux/mm.h
+@@ -1239,16 +1239,26 @@ static inline int page_mapcount(struct p
+ return mapcount;
+ }
+
+-int folio_total_mapcount(const struct folio *folio);
++static inline int folio_large_mapcount(const struct folio *folio)
++{
++ VM_WARN_ON_FOLIO(!folio_test_large(folio), folio);
++ return atomic_read(&folio->_large_mapcount) + 1;
++}
+
+ /**
+- * folio_mapcount() - Calculate the number of mappings of this folio.
++ * folio_mapcount() - Number of mappings of this folio.
+ * @folio: The folio.
+ *
+- * A large folio tracks both how many times the entire folio is mapped,
+- * and how many times each individual page in the folio is mapped.
+- * This function calculates the total number of times the folio is
+- * mapped.
++ * The folio mapcount corresponds to the number of present user page table
++ * entries that reference any part of a folio. Each such present user page
++ * table entry must be paired with exactly one folio reference.
++ *
++ * For ordinary folios, each user page table entry (PTE/PMD/PUD/...) counts
++ * exactly once.
++ *
++ * For hugetlb folios, each abstracted "hugetlb" user page table entry that
++ * references the entire folio counts exactly once, even when such special
++ * page table entries are comprised of multiple ordinary page table entries.
+ *
+ * Return: The number of times this folio is mapped.
+ */
+@@ -1256,17 +1266,7 @@ static inline int folio_mapcount(const s
+ {
+ if (likely(!folio_test_large(folio)))
+ return atomic_read(&folio->_mapcount) + 1;
+- return folio_total_mapcount(folio);
+-}
+-
+-static inline bool folio_large_is_mapped(const struct folio *folio)
+-{
+- /*
+- * Reading _entire_mapcount below could be omitted if hugetlb
+- * participated in incrementing nr_pages_mapped when compound mapped.
+- */
+- return atomic_read(&folio->_nr_pages_mapped) > 0 ||
+- atomic_read(&folio->_entire_mapcount) >= 0;
++ return folio_large_mapcount(folio);
+ }
+
+ /**
+@@ -1275,11 +1275,9 @@ static inline bool folio_large_is_mapped
+ *
+ * Return: True if any page in this folio is referenced by user page tables.
+ */
+-static inline bool folio_mapped(struct folio *folio)
++static inline bool folio_mapped(const struct folio *folio)
+ {
+- if (likely(!folio_test_large(folio)))
+- return atomic_read(&folio->_mapcount) >= 0;
+- return folio_large_is_mapped(folio);
++ return folio_mapcount(folio) >= 1;
+ }
+
+ /*
+@@ -1289,9 +1287,7 @@ static inline bool folio_mapped(struct f
+ */
+ static inline bool page_mapped(const struct page *page)
+ {
+- if (likely(!PageCompound(page)))
+- return atomic_read(&page->_mapcount) >= 0;
+- return folio_large_is_mapped(page_folio(page));
++ return folio_mapped(page_folio(page));
+ }
+
+ static inline struct page *virt_to_head_page(const void *x)
+--- a/include/linux/mm_types.h~mm-track-mapcount-of-large-folios-in-single-value
++++ a/include/linux/mm_types.h
+@@ -289,7 +289,8 @@ typedef struct {
+ * @virtual: Virtual address in the kernel direct map.
+ * @_last_cpupid: IDs of last CPU and last process that accessed the folio.
+ * @_entire_mapcount: Do not use directly, call folio_entire_mapcount(). +- * @_nr_pages_mapped: Do not use directly, call folio_mapcount(). ++ * @_large_mapcount: Do not use directly, call folio_mapcount(). ++ * @_nr_pages_mapped: Do not use outside of rmap and debug code. + * @_pincount: Do not use directly, call folio_maybe_dma_pinned(). + * @_folio_nr_pages: Do not use directly, call folio_nr_pages(). + * @_hugetlb_subpool: Do not use directly, use accessor in hugetlb.h. +@@ -348,8 +349,8 @@ struct folio { + struct { + unsigned long _flags_1; + unsigned long _head_1; +- unsigned long _folio_avail; + /* public: */ ++ atomic_t _large_mapcount; + atomic_t _entire_mapcount; + atomic_t _nr_pages_mapped; + atomic_t _pincount; +--- a/include/linux/rmap.h~mm-track-mapcount-of-large-folios-in-single-value ++++ a/include/linux/rmap.h +@@ -273,6 +273,7 @@ static inline int hugetlb_try_dup_anon_r + ClearPageAnonExclusive(&folio->page); + } + atomic_inc(&folio->_entire_mapcount); ++ atomic_inc(&folio->_large_mapcount); + return 0; + } + +@@ -306,6 +307,7 @@ static inline void hugetlb_add_file_rmap + VM_WARN_ON_FOLIO(folio_test_anon(folio), folio); + + atomic_inc(&folio->_entire_mapcount); ++ atomic_inc(&folio->_large_mapcount); + } + + static inline void hugetlb_remove_rmap(struct folio *folio) +@@ -313,11 +315,14 @@ static inline void hugetlb_remove_rmap(s + VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio); + + atomic_dec(&folio->_entire_mapcount); ++ atomic_dec(&folio->_large_mapcount); + } + + static __always_inline void __folio_dup_file_rmap(struct folio *folio, + struct page *page, int nr_pages, enum rmap_level level) + { ++ const int orig_nr_pages = nr_pages; ++ + __folio_rmap_sanity_checks(folio, page, nr_pages, level); + + switch (level) { +@@ -330,9 +335,11 @@ static __always_inline void __folio_dup_ + do { + atomic_inc(&page->_mapcount); + } while (page++, --nr_pages > 0); ++ atomic_add(orig_nr_pages, &folio->_large_mapcount); + break; + case RMAP_LEVEL_PMD: + atomic_inc(&folio->_entire_mapcount); ++ atomic_inc(&folio->_large_mapcount); + break; + } + } +@@ -382,6 +389,7 @@ static __always_inline int __folio_try_d + struct page *page, int nr_pages, struct vm_area_struct *src_vma, + enum rmap_level level) + { ++ const int orig_nr_pages = nr_pages; + bool maybe_pinned; + int i; + +@@ -423,6 +431,7 @@ static __always_inline int __folio_try_d + ClearPageAnonExclusive(page); + atomic_inc(&page->_mapcount); + } while (page++, --nr_pages > 0); ++ atomic_add(orig_nr_pages, &folio->_large_mapcount); + break; + case RMAP_LEVEL_PMD: + if (PageAnonExclusive(page)) { +@@ -431,6 +440,7 @@ static __always_inline int __folio_try_d + ClearPageAnonExclusive(page); + } + atomic_inc(&folio->_entire_mapcount); ++ atomic_inc(&folio->_large_mapcount); + break; + } + return 0; +--- a/mm/debug.c~mm-track-mapcount-of-large-folios-in-single-value ++++ a/mm/debug.c +@@ -68,8 +68,9 @@ static void __dump_folio(struct folio *f + folio_ref_count(folio), mapcount, mapping, + folio->index + idx, pfn); + if (folio_test_large(folio)) { +- pr_warn("head: order:%u entire_mapcount:%d nr_pages_mapped:%d pincount:%d\n", ++ pr_warn("head: order:%u mapcount:%d entire_mapcount:%d nr_pages_mapped:%d pincount:%d\n", + folio_order(folio), ++ folio_mapcount(folio), + folio_entire_mapcount(folio), + folio_nr_pages_mapped(folio), + atomic_read(&folio->_pincount)); +--- a/mm/hugetlb.c~mm-track-mapcount-of-large-folios-in-single-value ++++ a/mm/hugetlb.c +@@ -1517,7 +1517,7 @@ static void __destroy_compound_gigantic_ + struct page *p; 
+ + atomic_set(&folio->_entire_mapcount, 0); +- atomic_set(&folio->_nr_pages_mapped, 0); ++ atomic_set(&folio->_large_mapcount, 0); + atomic_set(&folio->_pincount, 0); + + for (i = 1; i < nr_pages; i++) { +@@ -2120,7 +2120,7 @@ static bool __prep_compound_gigantic_fol + /* we rely on prep_new_hugetlb_folio to set the hugetlb flag */ + folio_set_order(folio, order); + atomic_set(&folio->_entire_mapcount, -1); +- atomic_set(&folio->_nr_pages_mapped, 0); ++ atomic_set(&folio->_large_mapcount, -1); + atomic_set(&folio->_pincount, 0); + return true; + +--- a/mm/internal.h~mm-track-mapcount-of-large-folios-in-single-value ++++ a/mm/internal.h +@@ -72,6 +72,8 @@ void page_writeback_init(void); + /* + * How many individual pages have an elevated _mapcount. Excludes + * the folio's entire_mapcount. ++ * ++ * Don't use this function outside of debugging code. + */ + static inline int folio_nr_pages_mapped(const struct folio *folio) + { +@@ -611,6 +613,7 @@ static inline void prep_compound_head(st + struct folio *folio = (struct folio *)page; + + folio_set_order(folio, order); ++ atomic_set(&folio->_large_mapcount, -1); + atomic_set(&folio->_entire_mapcount, -1); + atomic_set(&folio->_nr_pages_mapped, 0); + atomic_set(&folio->_pincount, 0); +--- a/mm/khugepaged.c~mm-track-mapcount-of-large-folios-in-single-value ++++ a/mm/khugepaged.c +@@ -1358,7 +1358,7 @@ static int hpage_collapse_scan_pmd(struc + * Check if the page has any GUP (or other external) pins. + * + * Here the check may be racy: +- * it may see total_mapcount > refcount in some cases? ++ * it may see folio_mapcount() > folio_ref_count(). + * But such case is ephemeral we could always retry collapse + * later. However it may report false positive if the page + * has excessive GUP pins (i.e. 512). Anyway the same check +--- a/mm/page_alloc.c~mm-track-mapcount-of-large-folios-in-single-value ++++ a/mm/page_alloc.c +@@ -943,6 +943,10 @@ static int free_tail_page_prepare(struct + bad_page(page, "nonzero entire_mapcount"); + goto out; + } ++ if (unlikely(folio_large_mapcount(folio))) { ++ bad_page(page, "nonzero large_mapcount"); ++ goto out; ++ } + if (unlikely(atomic_read(&folio->_nr_pages_mapped))) { + bad_page(page, "nonzero nr_pages_mapped"); + goto out; +--- a/mm/rmap.c~mm-track-mapcount-of-large-folios-in-single-value ++++ a/mm/rmap.c +@@ -1138,34 +1138,12 @@ int pfn_mkclean_range(unsigned long pfn, + return page_vma_mkclean_one(&pvmw); + } + +-int folio_total_mapcount(const struct folio *folio) +-{ +- int mapcount = folio_entire_mapcount(folio); +- int nr_pages; +- int i; +- +- /* In the common case, avoid the loop when no pages mapped by PTE */ +- if (folio_nr_pages_mapped(folio) == 0) +- return mapcount; +- /* +- * Add all the PTE mappings of those pages mapped by PTE. +- * Limit the loop to folio_nr_pages_mapped()? +- * Perhaps: given all the raciness, that may be a good or a bad idea. 
+- */ +- nr_pages = folio_nr_pages(folio); +- for (i = 0; i < nr_pages; i++) +- mapcount += atomic_read(&folio_page(folio, i)->_mapcount); +- +- /* But each of those _mapcounts was based on -1 */ +- mapcount += nr_pages; +- return mapcount; +-} +- + static __always_inline unsigned int __folio_add_rmap(struct folio *folio, + struct page *page, int nr_pages, enum rmap_level level, + int *nr_pmdmapped) + { + atomic_t *mapped = &folio->_nr_pages_mapped; ++ const int orig_nr_pages = nr_pages; + int first, nr = 0; + + __folio_rmap_sanity_checks(folio, page, nr_pages, level); +@@ -1185,6 +1163,7 @@ static __always_inline unsigned int __fo + nr++; + } + } while (page++, --nr_pages > 0); ++ atomic_add(orig_nr_pages, &folio->_large_mapcount); + break; + case RMAP_LEVEL_PMD: + first = atomic_inc_and_test(&folio->_entire_mapcount); +@@ -1201,6 +1180,7 @@ static __always_inline unsigned int __fo + nr = 0; + } + } ++ atomic_inc(&folio->_large_mapcount); + break; + } + return nr; +@@ -1436,10 +1416,14 @@ void folio_add_new_anon_rmap(struct foli + SetPageAnonExclusive(page); + } + ++ /* increment count (starts at -1) */ ++ atomic_set(&folio->_large_mapcount, nr - 1); + atomic_set(&folio->_nr_pages_mapped, nr); + } else { + /* increment count (starts at -1) */ + atomic_set(&folio->_entire_mapcount, 0); ++ /* increment count (starts at -1) */ ++ atomic_set(&folio->_large_mapcount, 0); + atomic_set(&folio->_nr_pages_mapped, ENTIRELY_MAPPED); + SetPageAnonExclusive(&folio->page); + __lruvec_stat_mod_folio(folio, NR_ANON_THPS, nr); +@@ -1522,6 +1506,7 @@ static __always_inline void __folio_remo + break; + } + ++ atomic_sub(nr_pages, &folio->_large_mapcount); + do { + last = atomic_add_negative(-1, &page->_mapcount); + if (last) { +@@ -1532,6 +1517,7 @@ static __always_inline void __folio_remo + } while (page++, --nr_pages > 0); + break; + case RMAP_LEVEL_PMD: ++ atomic_dec(&folio->_large_mapcount); + last = atomic_add_negative(-1, &folio->_entire_mapcount); + if (last) { + nr = atomic_sub_return_relaxed(ENTIRELY_MAPPED, mapped); +@@ -2714,6 +2700,7 @@ void hugetlb_add_anon_rmap(struct folio + VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio); + + atomic_inc(&folio->_entire_mapcount); ++ atomic_inc(&folio->_large_mapcount); + if (flags & RMAP_EXCLUSIVE) + SetPageAnonExclusive(&folio->page); + VM_WARN_ON_FOLIO(folio_entire_mapcount(folio) > 1 && +@@ -2728,6 +2715,7 @@ void hugetlb_add_new_anon_rmap(struct fo + BUG_ON(address < vma->vm_start || address >= vma->vm_end); + /* increment count (starts at -1) */ + atomic_set(&folio->_entire_mapcount, 0); ++ atomic_set(&folio->_large_mapcount, 0); + folio_clear_hugetlb_restore_reserve(folio); + __folio_set_anon(folio, vma, address, true); + SetPageAnonExclusive(&folio->page); +_ diff --git a/patches/nilfs2-add-kernel-doc-comments-to-nilfs_btree_convert_and_insert.patch b/patches/nilfs2-add-kernel-doc-comments-to-nilfs_btree_convert_and_insert.patch new file mode 100644 index 000000000..55417a732 --- /dev/null +++ b/patches/nilfs2-add-kernel-doc-comments-to-nilfs_btree_convert_and_insert.patch @@ -0,0 +1,49 @@ +From: Yang Li <yang.lee@linux.alibaba.com> +Subject: nilfs2: add kernel-doc comments to nilfs_btree_convert_and_insert() +Date: Wed, 10 Apr 2024 16:56:28 +0900 + +This commit adds kernel-doc style comments with complete parameter +descriptions for the function nilfs_btree_convert_and_insert. 
+ +Link: https://lkml.kernel.org/r/20240410075629.3441-3-konishi.ryusuke@gmail.com +Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> +Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> +Signed-off-by: Andrew Morton <akpm@linux-foundation.org> +--- + + fs/nilfs2/btree.c | 23 ++++++++++++++++------- + 1 file changed, 16 insertions(+), 7 deletions(-) + +--- a/fs/nilfs2/btree.c~nilfs2-add-kernel-doc-comments-to-nilfs_btree_convert_and_insert ++++ a/fs/nilfs2/btree.c +@@ -1857,13 +1857,22 @@ nilfs_btree_commit_convert_and_insert(st + } + + /** +- * nilfs_btree_convert_and_insert - +- * @bmap: +- * @key: +- * @ptr: +- * @keys: +- * @ptrs: +- * @n: ++ * nilfs_btree_convert_and_insert - Convert and insert entries into a B-tree ++ * @btree: NILFS B-tree structure ++ * @key: Key of the new entry to be inserted ++ * @ptr: Pointer (block number) associated with the key to be inserted ++ * @keys: Array of keys to be inserted in addition to @key ++ * @ptrs: Array of pointers associated with @keys ++ * @n: Number of keys and pointers in @keys and @ptrs ++ * ++ * This function is used to insert a new entry specified by @key and @ptr, ++ * along with additional entries specified by @keys and @ptrs arrays, into a ++ * NILFS B-tree. ++ * It prepares the necessary changes by allocating the required blocks and any ++ * necessary intermediate nodes. It converts configurations from other forms of ++ * block mapping (the one that currently exists is direct mapping) to a B-tree. ++ * ++ * Return: 0 on success or a negative error code on failure. + */ + int nilfs_btree_convert_and_insert(struct nilfs_bmap *btree, + __u64 key, __u64 ptr, +_ diff --git a/patches/nilfs2-add-kernel-doc-comments-to-nilfs_do_roll_forward.patch b/patches/nilfs2-add-kernel-doc-comments-to-nilfs_do_roll_forward.patch new file mode 100644 index 000000000..8291e214e --- /dev/null +++ b/patches/nilfs2-add-kernel-doc-comments-to-nilfs_do_roll_forward.patch @@ -0,0 +1,30 @@ +From: Yang Li <yang.lee@linux.alibaba.com> +Subject: nilfs2: add kernel-doc comments to nilfs_do_roll_forward() +Date: Wed, 10 Apr 2024 16:56:27 +0900 + +Patch series "nilfs2: fix missing kernel-doc comments". + +This commit adds kernel-doc style comments with complete parameter +descriptions for the function nilfs_do_roll_forward. 
+ +Link: https://lkml.kernel.org/r/20240410075629.3441-1-konishi.ryusuke@gmail.com +Link: https://lkml.kernel.org/r/20240410075629.3441-2-konishi.ryusuke@gmail.com +Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> +Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> +Signed-off-by: Andrew Morton <akpm@linux-foundation.org> +--- + + fs/nilfs2/recovery.c | 1 + + 1 file changed, 1 insertion(+) + +--- a/fs/nilfs2/recovery.c~nilfs2-add-kernel-doc-comments-to-nilfs_do_roll_forward ++++ a/fs/nilfs2/recovery.c +@@ -563,6 +563,7 @@ static int nilfs_recover_dsync_blocks(st + * checkpoint + * @nilfs: nilfs object + * @sb: super block instance ++ * @root: NILFS root instance + * @ri: pointer to a nilfs_recovery_info + */ + static int nilfs_do_roll_forward(struct the_nilfs *nilfs, +_ diff --git a/patches/nilfs2-add-kernel-doc-comments-to-nilfs_remove_all_gcinodes.patch b/patches/nilfs2-add-kernel-doc-comments-to-nilfs_remove_all_gcinodes.patch new file mode 100644 index 000000000..e41ffabf2 --- /dev/null +++ b/patches/nilfs2-add-kernel-doc-comments-to-nilfs_remove_all_gcinodes.patch @@ -0,0 +1,27 @@ +From: Yang Li <yang.lee@linux.alibaba.com> +Subject: nilfs2: add kernel-doc comments to nilfs_remove_all_gcinodes() +Date: Wed, 10 Apr 2024 16:56:29 +0900 + +This commit adds kernel-doc style comments with complete parameter +descriptions for the function nilfs_remove_all_gcinodes. + +Link: https://lkml.kernel.org/r/20240410075629.3441-4-konishi.ryusuke@gmail.com +Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> +Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> +Signed-off-by: Andrew Morton <akpm@linux-foundation.org> +--- + + fs/nilfs2/gcinode.c | 1 + + 1 file changed, 1 insertion(+) + +--- a/fs/nilfs2/gcinode.c~nilfs2-add-kernel-doc-comments-to-nilfs_remove_all_gcinodes ++++ a/fs/nilfs2/gcinode.c +@@ -175,6 +175,7 @@ int nilfs_init_gcinode(struct inode *ino + + /** + * nilfs_remove_all_gcinodes() - remove all unprocessed gc inodes ++ * @nilfs: NILFS filesystem instance + */ + void nilfs_remove_all_gcinodes(struct the_nilfs *nilfs) + { +_ diff --git a/patches/arm-mm-drop-vm_fault_badmap-vm_fault_badaccess.patch b/patches/old/arm-mm-drop-vm_fault_badmap-vm_fault_badaccess.patch index 05948b801..05948b801 100644 --- a/patches/arm-mm-drop-vm_fault_badmap-vm_fault_badaccess.patch +++ b/patches/old/arm-mm-drop-vm_fault_badmap-vm_fault_badaccess.patch diff --git a/patches/arm64-mm-drop-vm_fault_badmap-vm_fault_badaccess.patch b/patches/old/arm64-mm-drop-vm_fault_badmap-vm_fault_badaccess.patch index 868cd6d1f..868cd6d1f 100644 --- a/patches/arm64-mm-drop-vm_fault_badmap-vm_fault_badaccess.patch +++ b/patches/old/arm64-mm-drop-vm_fault_badmap-vm_fault_badaccess.patch diff --git a/patches/sh-mm-cache-use-folio_mapped-in-copy_from_user_page.patch b/patches/sh-mm-cache-use-folio_mapped-in-copy_from_user_page.patch new file mode 100644 index 000000000..ee7cbdb09 --- /dev/null +++ b/patches/sh-mm-cache-use-folio_mapped-in-copy_from_user_page.patch @@ -0,0 +1,51 @@ +From: David Hildenbrand <david@redhat.com> +Subject: sh/mm/cache: use folio_mapped() in copy_from_user_page() +Date: Tue, 9 Apr 2024 21:22:55 +0200 + +We want to limit the use of page_mapcount() to the places where it is +absolutely necessary. + +We're already using folio_mapped in copy_user_highpage() and +copy_to_user_page() for a similar purpose so ... let's also simply use it +for copy_from_user_page(). + +There is no change for small folios. Likely we won't stumble over many +large folios on sh in that code either way. 
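+
+The semantic shift can be sketched as follows (hypothetical helper name,
+for illustration only; the actual change is the one-liner in
+arch/sh/mm/cache.c below):
+
+	/* Illustration only: the shape of the check after this patch. */
+	static bool aliasing_copy_needs_kmap(struct page *page)
+	{
+		struct folio *folio = page_folio(page);
+
+		/*
+		 * Old: page_mapcount(page) != 0 -- is this exact page mapped?
+		 * New: folio_mapped(folio) -- is any page of the folio mapped?
+		 * For D-cache alias handling, the broader folio-level question
+		 * is the safer one to ask.
+		 */
+		return folio_mapped(folio) &&
+		       test_bit(PG_dcache_clean, &folio->flags);
+	}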
+ +Link: https://lkml.kernel.org/r/20240409192301.907377-13-david@redhat.com +Signed-off-by: David Hildenbrand <david@redhat.com> +Cc: Chris Zankel <chris@zankel.net> +Cc: Hugh Dickins <hughd@google.com> +Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> +Cc: Jonathan Corbet <corbet@lwn.net> +Cc: Matthew Wilcox (Oracle) <willy@infradead.org> +Cc: Max Filippov <jcmvbkbc@gmail.com> +Cc: Miaohe Lin <linmiaohe@huawei.com> +Cc: Muchun Song <muchun.song@linux.dev> +Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> +Cc: Peter Xu <peterx@redhat.com> +Cc: Richard Chang <richardycc@google.com> +Cc: Rich Felker <dalias@libc.org> +Cc: Ryan Roberts <ryan.roberts@arm.com> +Cc: Yang Shi <shy828301@gmail.com> +Cc: Yin Fengwei <fengwei.yin@intel.com> +Cc: Yoshinori Sato <ysato@users.sourceforge.jp> +Cc: Zi Yan <ziy@nvidia.com> +Signed-off-by: Andrew Morton <akpm@linux-foundation.org> +--- + + arch/sh/mm/cache.c | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +--- a/arch/sh/mm/cache.c~sh-mm-cache-use-folio_mapped-in-copy_from_user_page ++++ a/arch/sh/mm/cache.c +@@ -84,7 +84,7 @@ void copy_from_user_page(struct vm_area_ + { + struct folio *folio = page_folio(page); + +- if (boot_cpu_data.dcache.n_aliases && page_mapcount(page) && ++ if (boot_cpu_data.dcache.n_aliases && folio_mapped(folio) && + test_bit(PG_dcache_clean, &folio->flags)) { + void *vfrom = kmap_coherent(page, vaddr) + (vaddr & ~PAGE_MASK); + memcpy(dst, vfrom, len); +_ diff --git a/patches/trace-events-page_ref-trace-the-raw-page-mapcount-value.patch b/patches/trace-events-page_ref-trace-the-raw-page-mapcount-value.patch new file mode 100644 index 000000000..1dfb32b14 --- /dev/null +++ b/patches/trace-events-page_ref-trace-the-raw-page-mapcount-value.patch @@ -0,0 +1,59 @@ +From: David Hildenbrand <david@redhat.com> +Subject: trace/events/page_ref: trace the raw page mapcount value +Date: Tue, 9 Apr 2024 21:22:58 +0200 + +We want to limit the use of page_mapcount() to the places where it is +absolutely necessary. We already trace raw page->refcount, raw +page->flags and raw page->mapping, and don't involve any folios. Let's +also trace the raw mapcount value that does not consider the entire +mapcount of large folios, and we don't add "1" to it. + +When dealing with typed folios, this makes a lot more sense. ... and +it's for debugging purposes only either way. 
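+
+For readers of the resulting trace output, the convention behind the raw
+value can be summarized as (sketch, not new kernel code):
+
+	int raw = atomic_read(&page->_mapcount);
+
+	/*
+	 * raw == -1: the page is not mapped by any PTE.
+	 * raw ==  0: the page is mapped exactly once.
+	 *
+	 * page_mapcount() would have reported raw + 1 (and, for pages of
+	 * large folios, folded in the folio's entire mapcount); the trace
+	 * event now reports the raw biased value as-is.
+	 */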
+
+Link: https://lkml.kernel.org/r/20240409192301.907377-16-david@redhat.com
+Signed-off-by: David Hildenbrand <david@redhat.com>
+Cc: Chris Zankel <chris@zankel.net>
+Cc: Hugh Dickins <hughd@google.com>
+Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
+Cc: Jonathan Corbet <corbet@lwn.net>
+Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
+Cc: Max Filippov <jcmvbkbc@gmail.com>
+Cc: Miaohe Lin <linmiaohe@huawei.com>
+Cc: Muchun Song <muchun.song@linux.dev>
+Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
+Cc: Peter Xu <peterx@redhat.com>
+Cc: Richard Chang <richardycc@google.com>
+Cc: Rich Felker <dalias@libc.org>
+Cc: Ryan Roberts <ryan.roberts@arm.com>
+Cc: Yang Shi <shy828301@gmail.com>
+Cc: Yin Fengwei <fengwei.yin@intel.com>
+Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
+Cc: Zi Yan <ziy@nvidia.com>
+Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
+---
+
+ include/trace/events/page_ref.h | 4 ++--
+ 1 file changed, 2 insertions(+), 2 deletions(-)
+
+--- a/include/trace/events/page_ref.h~trace-events-page_ref-trace-the-raw-page-mapcount-value
++++ a/include/trace/events/page_ref.h
+@@ -30,7 +30,7 @@ DECLARE_EVENT_CLASS(page_ref_mod_templat
+ __entry->pfn = page_to_pfn(page);
+ __entry->flags = page->flags;
+ __entry->count = page_ref_count(page);
+- __entry->mapcount = page_mapcount(page);
++ __entry->mapcount = atomic_read(&page->_mapcount);
+ __entry->mapping = page->mapping;
+ __entry->mt = get_pageblock_migratetype(page);
+ __entry->val = v;
+@@ -79,7 +79,7 @@ DECLARE_EVENT_CLASS(page_ref_mod_and_tes
+ __entry->pfn = page_to_pfn(page);
+ __entry->flags = page->flags;
+ __entry->count = page_ref_count(page);
+- __entry->mapcount = page_mapcount(page);
++ __entry->mapcount = atomic_read(&page->_mapcount);
+ __entry->mapping = page->mapping;
+ __entry->mt = get_pageblock_migratetype(page);
+ __entry->val = v;
+_ diff --git a/patches/xtensa-mm-convert-check_tlb_entry-to-sanity-check-folios.patch b/patches/xtensa-mm-convert-check_tlb_entry-to-sanity-check-folios.patch new file mode 100644 index 000000000..4974b8551 --- /dev/null +++ b/patches/xtensa-mm-convert-check_tlb_entry-to-sanity-check-folios.patch @@ -0,0 +1,63 @@
+From: David Hildenbrand <david@redhat.com>
+Subject: xtensa/mm: convert check_tlb_entry() to sanity check folios
+Date: Tue, 9 Apr 2024 21:22:59 +0200
+
+We want to limit the use of page_mapcount() to the places where it is
+absolutely necessary. So let's convert check_tlb_entry() to perform
+sanity checks on folios instead of pages.
+
+This essentially already happened: page_count() is mapped to
+folio_ref_count(), and page_mapped() to folio_mapped() internally.
+However, we would have printed the page_mapcount(), which does not really
+match what page_mapped() would have checked.
+
+Let's simply print the folio mapcount to avoid using page_mapcount(). For
+small folios there is no change.
+ +Link: https://lkml.kernel.org/r/20240409192301.907377-17-david@redhat.com +Signed-off-by: David Hildenbrand <david@redhat.com> +Cc: Chris Zankel <chris@zankel.net> +Cc: Hugh Dickins <hughd@google.com> +Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> +Cc: Jonathan Corbet <corbet@lwn.net> +Cc: Matthew Wilcox (Oracle) <willy@infradead.org> +Cc: Max Filippov <jcmvbkbc@gmail.com> +Cc: Miaohe Lin <linmiaohe@huawei.com> +Cc: Muchun Song <muchun.song@linux.dev> +Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> +Cc: Peter Xu <peterx@redhat.com> +Cc: Richard Chang <richardycc@google.com> +Cc: Rich Felker <dalias@libc.org> +Cc: Ryan Roberts <ryan.roberts@arm.com> +Cc: Yang Shi <shy828301@gmail.com> +Cc: Yin Fengwei <fengwei.yin@intel.com> +Cc: Yoshinori Sato <ysato@users.sourceforge.jp> +Cc: Zi Yan <ziy@nvidia.com> +Signed-off-by: Andrew Morton <akpm@linux-foundation.org> +--- + + arch/xtensa/mm/tlb.c | 11 ++++++----- + 1 file changed, 6 insertions(+), 5 deletions(-) + +--- a/arch/xtensa/mm/tlb.c~xtensa-mm-convert-check_tlb_entry-to-sanity-check-folios ++++ a/arch/xtensa/mm/tlb.c +@@ -256,12 +256,13 @@ static int check_tlb_entry(unsigned w, u + dtlb ? 'D' : 'I', w, e, r0, r1, pte); + if (pte == 0 || !pte_present(__pte(pte))) { + struct page *p = pfn_to_page(r1 >> PAGE_SHIFT); +- pr_err("page refcount: %d, mapcount: %d\n", +- page_count(p), +- page_mapcount(p)); +- if (!page_count(p)) ++ struct folio *f = page_folio(p); ++ ++ pr_err("folio refcount: %d, mapcount: %d\n", ++ folio_ref_count(f), folio_mapcount(f)); ++ if (!folio_ref_count(f)) + rc |= TLB_INSANE; +- else if (page_mapcount(p)) ++ else if (folio_mapped(f)) + rc |= TLB_SUSPICIOUS; + } else { + rc |= TLB_INSANE; +_ diff --git a/pc/arm-mm-drop-vm_fault_badmap-vm_fault_badaccess.pc b/pc/arm-mm-drop-vm_fault_badmap-vm_fault_badaccess.pc deleted file mode 100644 index af2fa0892..000000000 --- a/pc/arm-mm-drop-vm_fault_badmap-vm_fault_badaccess.pc +++ /dev/null @@ -1 +0,0 @@ -arch/arm/mm/fault.c diff --git a/pc/arm64-mm-drop-vm_fault_badmap-vm_fault_badaccess.pc b/pc/arm64-mm-drop-vm_fault_badmap-vm_fault_badaccess.pc deleted file mode 100644 index a7784d3ec..000000000 --- a/pc/arm64-mm-drop-vm_fault_badmap-vm_fault_badaccess.pc +++ /dev/null @@ -1 +0,0 @@ -arch/arm64/mm/fault.c diff --git a/pc/devel-series b/pc/devel-series index a9416dd3b..2388e4e54 100644 --- a/pc/devel-series +++ b/pc/devel-series @@ -94,6 +94,8 @@ mmpage_owner-defer-enablement-of-static-branch.patch # mm-shmem-inline-shmem_is_huge-for-disabled-transparent-hugepages.patch # +fork-defer-linking-file-vma-until-vma-is-fully-initialized.patch +# ### hfe # #ENDBRANCH mm-hotfixes-unstable @@ -252,6 +254,7 @@ mm-page_alloc-close-migratetype-race-between-freeing-and-stealing.patch mm-page_alloc-set-migratetype-inside-move_freepages.patch mm-page_isolation-prepare-for-hygienic-freelists.patch mm-page_isolation-prepare-for-hygienic-freelists-fix.patch +#mm-page_alloc-consolidate-free-page-accounting.patch: check review mm-page_alloc-consolidate-free-page-accounting.patch mm-page_alloc-consolidate-free-page-accounting-fix.patch mm-page_alloc-consolidate-free-page-accounting-fix-2.patch @@ -343,8 +346,8 @@ mm-slab-move-slab_memcg-hooks-to-mm-memcontrolc.patch # #mm-move-array-mem_section-init-code-out-of-memory_present.patch: https://lkml.kernel.org/r/Zgu_jjcLtEF-TlUj@kernel.org mm-move-array-mem_section-init-code-out-of-memory_present.patch -#mm-init-remove-the-unnecessary-special-treatment-for-memory-less-node.patch: https://lkml.kernel.org/r/ZgvCuzgaJaXkAucR@kernel.org 
mm-init-remove-the-unnecessary-special-treatment-for-memory-less-node.patch +mm-init-remove-the-unnecessary-special-treatment-for-memory-less-node-v2.patch mm-make-__absent_pages_in_range-as-static.patch mm-page_allocc-remove-unneeded-codes-in-numa-version-of-build_zonelists.patch mm-page_allocc-remove-unneeded-codes-in-numa-version-of-build_zonelists-v2.patch @@ -482,9 +485,6 @@ riscv-mm-accelerate-pagefault-when-badaccess-fix.patch s390-mm-accelerate-pagefault-when-badaccess.patch x86-mm-accelerate-pagefault-when-badaccess.patch # -#arm64-mm-drop-vm_fault_badmap-vm_fault_badaccess.patch: https://lkml.kernel.org/r/ZhVQnM9hAdpt5WjT@arm.com -arm64-mm-drop-vm_fault_badmap-vm_fault_badaccess.patch -arm-mm-drop-vm_fault_badmap-vm_fault_badaccess.patch # mm-remove-struct-page-from-get_shadow_from_swap_cache.patch # @@ -526,7 +526,7 @@ mm-add-per-order-mthp-anon_alloc-and-anon_alloc_fallback-counters-fix.patch mm-add-per-order-mthp-anon_swpout-and-anon_swpout_fallback-counters.patch # memory-tier-dax-kmem-introduce-an-abstract-layer-for-finding-allocating-and-putting-memory-types.patch -#memory-tier-create-cpuless-memory-tiers-after-obtaining-hmat-info.patch: https://lkml.kernel.org/r/20240405150244.00004b49@Huawei.com TBU? +#memory-tier-create-cpuless-memory-tiers-after-obtaining-hmat-info.patch: https://lkml.kernel.org/r/20240405150244.00004b49@Huawei.com TBU? check review memory-tier-create-cpuless-memory-tiers-after-obtaining-hmat-info.patch # mm-mmap-make-vma_wants_writenotify-return-bool.patch @@ -537,6 +537,29 @@ mmswap-add-document-about-rcu-read-lock-and-swapoff-interaction.patch # mm-always-sanity-check-anon_vma-first-for-per-vma-locks.patch # +drivers-virt-acrn-fix-pfnmap-pte-checks-in-acrn_vm_ram_map.patch +mm-pass-vma-instead-of-mm-to-follow_pte.patch +mm-follow_pte-improvements.patch +# +mm-allow-for-detecting-underflows-with-page_mapcount-again.patch +mm-rmap-always-inline-anon-file-rmap-duplication-of-a-single-pte.patch +mm-rmap-add-fast-path-for-small-folios-when-adding-removing-duplicating.patch +mm-track-mapcount-of-large-folios-in-single-value.patch +mm-improve-folio_likely_mapped_shared-using-the-mapcount-of-large-folios.patch +mm-make-folio_mapcount-return-0-for-small-typed-folios.patch +mm-memory-use-folio_mapcount-in-zap_present_folio_ptes.patch +mm-huge_memory-use-folio_mapcount-in-zap_huge_pmd-sanity-check.patch +mm-memory-failure-use-folio_mapcount-in-hwpoison_user_mappings.patch +mm-page_alloc-use-folio_mapped-in-__alloc_contig_migrate_range.patch +mm-migrate-use-folio_likely_mapped_shared-in-add_page_for_migration.patch +sh-mm-cache-use-folio_mapped-in-copy_from_user_page.patch +mm-filemap-use-folio_mapcount-in-filemap_unaccount_folio.patch +mm-migrate_device-use-folio_mapcount-in-migrate_vma_check_page.patch +trace-events-page_ref-trace-the-raw-page-mapcount-value.patch +xtensa-mm-convert-check_tlb_entry-to-sanity-check-folios.patch +mm-debug-print-only-page-mapcount-excluding-folio-entire-mapcount-in-__dump_folio.patch +documentation-admin-guide-cgroup-v1-memoryrst-dont-reference-page_mapcount.patch +# # # # @@ -661,4 +684,8 @@ test_hexdump-avoid-string-truncation-warning.patch block-partitions-ldm-convert-strncpy-to-strscpy.patch blktrace-convert-strncpy-to-strscpy_pad.patch # +nilfs2-add-kernel-doc-comments-to-nilfs_do_roll_forward.patch +nilfs2-add-kernel-doc-comments-to-nilfs_btree_convert_and_insert.patch +nilfs2-add-kernel-doc-comments-to-nilfs_remove_all_gcinodes.patch +# #ENDBRANCH mm-nonmm-unstable diff --git 
a/pc/documentation-admin-guide-cgroup-v1-memoryrst-dont-reference-page_mapcount.pc b/pc/documentation-admin-guide-cgroup-v1-memoryrst-dont-reference-page_mapcount.pc new file mode 100644 index 000000000..2698a376c --- /dev/null +++ b/pc/documentation-admin-guide-cgroup-v1-memoryrst-dont-reference-page_mapcount.pc @@ -0,0 +1 @@ +Documentation/admin-guide/cgroup-v1/memory.rst diff --git a/pc/drivers-virt-acrn-fix-pfnmap-pte-checks-in-acrn_vm_ram_map.pc b/pc/drivers-virt-acrn-fix-pfnmap-pte-checks-in-acrn_vm_ram_map.pc new file mode 100644 index 000000000..68cb88310 --- /dev/null +++ b/pc/drivers-virt-acrn-fix-pfnmap-pte-checks-in-acrn_vm_ram_map.pc @@ -0,0 +1 @@ +drivers/virt/acrn/mm.c diff --git a/pc/fork-defer-linking-file-vma-until-vma-is-fully-initialized.pc b/pc/fork-defer-linking-file-vma-until-vma-is-fully-initialized.pc new file mode 100644 index 000000000..8c04222ff --- /dev/null +++ b/pc/fork-defer-linking-file-vma-until-vma-is-fully-initialized.pc @@ -0,0 +1 @@ +kernel/fork.c diff --git a/pc/mm-allow-for-detecting-underflows-with-page_mapcount-again.pc b/pc/mm-allow-for-detecting-underflows-with-page_mapcount-again.pc new file mode 100644 index 000000000..476581c1d --- /dev/null +++ b/pc/mm-allow-for-detecting-underflows-with-page_mapcount-again.pc @@ -0,0 +1 @@ +include/linux/mm.h diff --git a/pc/mm-debug-print-only-page-mapcount-excluding-folio-entire-mapcount-in-__dump_folio.pc b/pc/mm-debug-print-only-page-mapcount-excluding-folio-entire-mapcount-in-__dump_folio.pc new file mode 100644 index 000000000..35593df66 --- /dev/null +++ b/pc/mm-debug-print-only-page-mapcount-excluding-folio-entire-mapcount-in-__dump_folio.pc @@ -0,0 +1 @@ +mm/debug.c diff --git a/pc/mm-filemap-use-folio_mapcount-in-filemap_unaccount_folio.pc b/pc/mm-filemap-use-folio_mapcount-in-filemap_unaccount_folio.pc new file mode 100644 index 000000000..cc4355cce --- /dev/null +++ b/pc/mm-filemap-use-folio_mapcount-in-filemap_unaccount_folio.pc @@ -0,0 +1 @@ +mm/filemap.c diff --git a/pc/mm-huge_memory-use-folio_mapcount-in-zap_huge_pmd-sanity-check.pc b/pc/mm-huge_memory-use-folio_mapcount-in-zap_huge_pmd-sanity-check.pc new file mode 100644 index 000000000..b35bccbe3 --- /dev/null +++ b/pc/mm-huge_memory-use-folio_mapcount-in-zap_huge_pmd-sanity-check.pc @@ -0,0 +1 @@ +mm/huge_memory.c diff --git a/pc/mm-improve-folio_likely_mapped_shared-using-the-mapcount-of-large-folios.pc b/pc/mm-improve-folio_likely_mapped_shared-using-the-mapcount-of-large-folios.pc new file mode 100644 index 000000000..476581c1d --- /dev/null +++ b/pc/mm-improve-folio_likely_mapped_shared-using-the-mapcount-of-large-folios.pc @@ -0,0 +1 @@ +include/linux/mm.h diff --git a/pc/mm-init-remove-the-unnecessary-special-treatment-for-memory-less-node-v2.pc b/pc/mm-init-remove-the-unnecessary-special-treatment-for-memory-less-node-v2.pc new file mode 100644 index 000000000..f9a04a055 --- /dev/null +++ b/pc/mm-init-remove-the-unnecessary-special-treatment-for-memory-less-node-v2.pc @@ -0,0 +1 @@ +mm/mm_init.c diff --git a/pc/mm-make-folio_mapcount-return-0-for-small-typed-folios.pc b/pc/mm-make-folio_mapcount-return-0-for-small-typed-folios.pc new file mode 100644 index 000000000..476581c1d --- /dev/null +++ b/pc/mm-make-folio_mapcount-return-0-for-small-typed-folios.pc @@ -0,0 +1 @@ +include/linux/mm.h diff --git a/pc/mm-memory-failure-use-folio_mapcount-in-hwpoison_user_mappings.pc b/pc/mm-memory-failure-use-folio_mapcount-in-hwpoison_user_mappings.pc new file mode 100644 index 000000000..709648673 --- /dev/null +++ 
b/pc/mm-memory-failure-use-folio_mapcount-in-hwpoison_user_mappings.pc @@ -0,0 +1 @@ +mm/memory-failure.c diff --git a/pc/mm-memory-use-folio_mapcount-in-zap_present_folio_ptes.pc b/pc/mm-memory-use-folio_mapcount-in-zap_present_folio_ptes.pc new file mode 100644 index 000000000..cf949a50f --- /dev/null +++ b/pc/mm-memory-use-folio_mapcount-in-zap_present_folio_ptes.pc @@ -0,0 +1 @@ +mm/memory.c diff --git a/pc/mm-migrate-use-folio_likely_mapped_shared-in-add_page_for_migration.pc b/pc/mm-migrate-use-folio_likely_mapped_shared-in-add_page_for_migration.pc new file mode 100644 index 000000000..5cf4c320d --- /dev/null +++ b/pc/mm-migrate-use-folio_likely_mapped_shared-in-add_page_for_migration.pc @@ -0,0 +1 @@ +mm/migrate.c diff --git a/pc/mm-migrate_device-use-folio_mapcount-in-migrate_vma_check_page.pc b/pc/mm-migrate_device-use-folio_mapcount-in-migrate_vma_check_page.pc new file mode 100644 index 000000000..fc348dde7 --- /dev/null +++ b/pc/mm-migrate_device-use-folio_mapcount-in-migrate_vma_check_page.pc @@ -0,0 +1 @@ +mm/migrate_device.c diff --git a/pc/mm-page_alloc-use-folio_mapped-in-__alloc_contig_migrate_range.pc b/pc/mm-page_alloc-use-folio_mapped-in-__alloc_contig_migrate_range.pc new file mode 100644 index 000000000..5a02802e5 --- /dev/null +++ b/pc/mm-page_alloc-use-folio_mapped-in-__alloc_contig_migrate_range.pc @@ -0,0 +1 @@ +mm/page_alloc.c diff --git a/pc/mm-pass-vma-instead-of-mm-to-follow_pte.pc b/pc/mm-pass-vma-instead-of-mm-to-follow_pte.pc new file mode 100644 index 000000000..69aa23dd7 --- /dev/null +++ b/pc/mm-pass-vma-instead-of-mm-to-follow_pte.pc @@ -0,0 +1,7 @@ +arch/s390/pci/pci_mmio.c +arch/x86/mm/pat/memtype.c +drivers/vfio/vfio_iommu_type1.c +drivers/virt/acrn/mm.c +include/linux/mm.h +mm/memory.c +virt/kvm/kvm_main.c diff --git a/pc/mm-rmap-add-fast-path-for-small-folios-when-adding-removing-duplicating.pc b/pc/mm-rmap-add-fast-path-for-small-folios-when-adding-removing-duplicating.pc new file mode 100644 index 000000000..361b696c4 --- /dev/null +++ b/pc/mm-rmap-add-fast-path-for-small-folios-when-adding-removing-duplicating.pc @@ -0,0 +1,2 @@ +include/linux/rmap.h +mm/rmap.c diff --git a/pc/mm-rmap-always-inline-anon-file-rmap-duplication-of-a-single-pte.pc b/pc/mm-rmap-always-inline-anon-file-rmap-duplication-of-a-single-pte.pc new file mode 100644 index 000000000..54f1370cf --- /dev/null +++ b/pc/mm-rmap-always-inline-anon-file-rmap-duplication-of-a-single-pte.pc @@ -0,0 +1 @@ +include/linux/rmap.h diff --git a/pc/mm-track-mapcount-of-large-folios-in-single-value.pc b/pc/mm-track-mapcount-of-large-folios-in-single-value.pc new file mode 100644 index 000000000..76f188259 --- /dev/null +++ b/pc/mm-track-mapcount-of-large-folios-in-single-value.pc @@ -0,0 +1,10 @@ +Documentation/mm/transhuge.rst +include/linux/mm.h +include/linux/mm_types.h +include/linux/rmap.h +mm/debug.c +mm/hugetlb.c +mm/internal.h +mm/khugepaged.c +mm/page_alloc.c +mm/rmap.c diff --git a/pc/nilfs2-add-kernel-doc-comments-to-nilfs_btree_convert_and_insert.pc b/pc/nilfs2-add-kernel-doc-comments-to-nilfs_btree_convert_and_insert.pc new file mode 100644 index 000000000..10f3d8ae7 --- /dev/null +++ b/pc/nilfs2-add-kernel-doc-comments-to-nilfs_btree_convert_and_insert.pc @@ -0,0 +1 @@ +fs/nilfs2/btree.c diff --git a/pc/nilfs2-add-kernel-doc-comments-to-nilfs_do_roll_forward.pc b/pc/nilfs2-add-kernel-doc-comments-to-nilfs_do_roll_forward.pc new file mode 100644 index 000000000..016c94782 --- /dev/null +++ b/pc/nilfs2-add-kernel-doc-comments-to-nilfs_do_roll_forward.pc @@ -0,0 +1 @@ 
+fs/nilfs2/recovery.c diff --git a/pc/nilfs2-add-kernel-doc-comments-to-nilfs_remove_all_gcinodes.pc b/pc/nilfs2-add-kernel-doc-comments-to-nilfs_remove_all_gcinodes.pc new file mode 100644 index 000000000..a05a4e3d2 --- /dev/null +++ b/pc/nilfs2-add-kernel-doc-comments-to-nilfs_remove_all_gcinodes.pc @@ -0,0 +1 @@ +fs/nilfs2/gcinode.c diff --git a/pc/sh-mm-cache-use-folio_mapped-in-copy_from_user_page.pc b/pc/sh-mm-cache-use-folio_mapped-in-copy_from_user_page.pc new file mode 100644 index 000000000..b0cc6688f --- /dev/null +++ b/pc/sh-mm-cache-use-folio_mapped-in-copy_from_user_page.pc @@ -0,0 +1 @@ +arch/sh/mm/cache.c diff --git a/pc/trace-events-page_ref-trace-the-raw-page-mapcount-value.pc b/pc/trace-events-page_ref-trace-the-raw-page-mapcount-value.pc new file mode 100644 index 000000000..1f91aeaf1 --- /dev/null +++ b/pc/trace-events-page_ref-trace-the-raw-page-mapcount-value.pc @@ -0,0 +1 @@ +include/trace/events/page_ref.h diff --git a/pc/xtensa-mm-convert-check_tlb_entry-to-sanity-check-folios.pc b/pc/xtensa-mm-convert-check_tlb_entry-to-sanity-check-folios.pc new file mode 100644 index 000000000..0f03be441 --- /dev/null +++ b/pc/xtensa-mm-convert-check_tlb_entry-to-sanity-check-folios.pc @@ -0,0 +1 @@ +arch/xtensa/mm/tlb.c diff --git a/txt/documentation-admin-guide-cgroup-v1-memoryrst-dont-reference-page_mapcount.txt b/txt/documentation-admin-guide-cgroup-v1-memoryrst-dont-reference-page_mapcount.txt new file mode 100644 index 000000000..e49dd95ef --- /dev/null +++ b/txt/documentation-admin-guide-cgroup-v1-memoryrst-dont-reference-page_mapcount.txt @@ -0,0 +1,25 @@ +From: David Hildenbrand <david@redhat.com> +Subject: Documentation/admin-guide/cgroup-v1/memory.rst: don't reference page_mapcount() +Date: Tue, 9 Apr 2024 21:23:01 +0200 + +Let's stop talking about page_mapcount(). + +Link: https://lkml.kernel.org/r/20240409192301.907377-19-david@redhat.com +Signed-off-by: David Hildenbrand <david@redhat.com> +Cc: Chris Zankel <chris@zankel.net> +Cc: Hugh Dickins <hughd@google.com> +Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> +Cc: Jonathan Corbet <corbet@lwn.net> +Cc: Matthew Wilcox (Oracle) <willy@infradead.org> +Cc: Max Filippov <jcmvbkbc@gmail.com> +Cc: Miaohe Lin <linmiaohe@huawei.com> +Cc: Muchun Song <muchun.song@linux.dev> +Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> +Cc: Peter Xu <peterx@redhat.com> +Cc: Richard Chang <richardycc@google.com> +Cc: Rich Felker <dalias@libc.org> +Cc: Ryan Roberts <ryan.roberts@arm.com> +Cc: Yang Shi <shy828301@gmail.com> +Cc: Yin Fengwei <fengwei.yin@intel.com> +Cc: Yoshinori Sato <ysato@users.sourceforge.jp> +Cc: Zi Yan <ziy@nvidia.com> diff --git a/txt/drivers-virt-acrn-fix-pfnmap-pte-checks-in-acrn_vm_ram_map.txt b/txt/drivers-virt-acrn-fix-pfnmap-pte-checks-in-acrn_vm_ram_map.txt new file mode 100644 index 000000000..77e13890c --- /dev/null +++ b/txt/drivers-virt-acrn-fix-pfnmap-pte-checks-in-acrn_vm_ram_map.txt @@ -0,0 +1,51 @@ +From: David Hildenbrand <david@redhat.com> +Subject: drivers/virt/acrn: fix PFNMAP PTE checks in acrn_vm_ram_map() +Date: Wed, 10 Apr 2024 17:55:25 +0200 + +Patch series "mm: follow_pte() improvements and acrn follow_pte() fixes". + +Patch #1 fixes a bunch of issues I spotted in the acrn driver. It +compiles, that's all I know. I'll appreciate some review and testing from +acrn folks. + +Patch #2+#3 improve follow_pte(), passing a VMA instead of the MM, adding +more sanity checks, and improving the documentation. Gave it a quick test +on x86-64 using VM_PAT that ends up using follow_pte(). 
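+
+In code terms, the interface change in patch #2 is roughly the following
+(sketch of a call site, not the full hunks):
+
+	spinlock_t *ptl;
+	pte_t *ptep;
+	int ret;
+
+	/* Before: callers pass the MM and must vet the VMA themselves. */
+	ret = follow_pte(vma->vm_mm, addr, &ptep, &ptl);
+
+	/*
+	 * After: callers pass the VMA, so follow_pte() itself can check
+	 * that this is a VM_IO/VM_PFNMAP mapping before handing out a PTE.
+	 */
+	ret = follow_pte(vma, addr, &ptep, &ptl);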
+
+
+This patch (of 3):
+
+We currently miss handling various cases, resulting in a dangerous
+follow_pte() (previously follow_pfn()) usage.
+
+(1) We're not checking PTE write permissions.
+
+Maybe we should simply always require pte_write() like we do for
+pin_user_pages_fast(FOLL_WRITE)? Hard to tell, so let's check for
+ACRN_MEM_ACCESS_WRITE for now.
+
+(2) We're not rejecting refcounted pages.
+
+As we are not using MMU notifiers, messing with refcounted pages is
+dangerous and can result in use-after-free. Let's make sure to reject them.
+
+(3) We are only looking at the first PTE of a bigger range.
+
+We only look up a single PTE, but memmap->len may span a larger area.
+Let's loop over all involved PTEs and make sure the PFN range is
+actually contiguous. Reject everything else: it couldn't have worked
+either way, and rather made us access PFNs we shouldn't be accessing.
+
+Link: https://lkml.kernel.org/r/20240410155527.474777-1-david@redhat.com
+Link: https://lkml.kernel.org/r/20240410155527.474777-2-david@redhat.com
+Fixes: 8a6e85f75a83 ("virt: acrn: obtain pa from VMA with PFNMAP flag")
+Signed-off-by: David Hildenbrand <david@redhat.com>
+Cc: Alex Williamson <alex.williamson@redhat.com>
+Cc: Christoph Hellwig <hch@lst.de>
+Cc: Fei Li <fei1.li@intel.com>
+Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
+Cc: Heiko Carstens <hca@linux.ibm.com>
+Cc: Ingo Molnar <mingo@redhat.com>
+Cc: Paolo Bonzini <pbonzini@redhat.com>
+Cc: Yonghua Huang <yonghua.huang@intel.com>
+Cc: Sean Christopherson <seanjc@google.com> diff --git a/txt/fork-defer-linking-file-vma-until-vma-is-fully-initialized.txt b/txt/fork-defer-linking-file-vma-until-vma-is-fully-initialized.txt new file mode 100644 index 000000000..630d6474b --- /dev/null +++ b/txt/fork-defer-linking-file-vma-until-vma-is-fully-initialized.txt @@ -0,0 +1,44 @@
+From: Miaohe Lin <linmiaohe@huawei.com>
+Subject: fork: defer linking file vma until vma is fully initialized
+Date: Wed, 10 Apr 2024 17:14:41 +0800
+
+Thorvald reported a WARNING [1]. And the root cause is the race below:
+
+ CPU 1 CPU 2
+ fork hugetlbfs_fallocate
+ dup_mmap hugetlbfs_punch_hole
+ i_mmap_lock_write(mapping);
+ vma_interval_tree_insert_after -- Child vma is visible through i_mmap tree.
+ i_mmap_unlock_write(mapping);
+ hugetlb_dup_vma_private -- Clear vma_lock outside i_mmap_rwsem!
+ i_mmap_lock_write(mapping);
+ hugetlb_vmdelete_list
+ vma_interval_tree_foreach
+ hugetlb_vma_trylock_write -- Vma_lock is cleared.
+ tmp->vm_ops->open -- Alloc new vma_lock outside i_mmap_rwsem!
+ hugetlb_vma_unlock_write -- Vma_lock is assigned!!!
+ i_mmap_unlock_write(mapping);
+
+hugetlb_dup_vma_private() and hugetlb_vm_op_open() are called outside the
+i_mmap_rwsem lock, while the vma lock can be used at the same time. Fix
+this by deferring linking the file vma until the vma is fully initialized.
+Those vmas should be initialized first before they can be used.
+
+Link: https://lkml.kernel.org/r/20240410091441.3539905-1-linmiaohe@huawei.com
+Fixes: 8d9bfb260814 ("hugetlb: add vma based lock for pmd sharing")
+Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
+Reported-by: Thorvald Natvig <thorvald@google.com>
+Closes: https://lore.kernel.org/linux-mm/20240129161735.6gmjsswx62o4pbja@revolver/T/ [1]
+Cc: Christian Brauner <brauner@kernel.org>
+Cc: Heiko Carstens <hca@linux.ibm.com>
+Cc: Jane Chu <jane.chu@oracle.com>
+Cc: Kent Overstreet <kent.overstreet@linux.dev>
+Cc: Liam R.
Howlett <Liam.Howlett@oracle.com>
+Cc: Mateusz Guzik <mjguzik@gmail.com>
+Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
+Cc: Miaohe Lin <linmiaohe@huawei.com>
+Cc: Muchun Song <muchun.song@linux.dev>
+Cc: Oleg Nesterov <oleg@redhat.com>
+Cc: Peng Zhang <zhangpeng.00@bytedance.com>
+Cc: Tycho Andersen <tandersen@netflix.com>
+Cc: <stable@vger.kernel.org> diff --git a/txt/mm-allow-for-detecting-underflows-with-page_mapcount-again.txt b/txt/mm-allow-for-detecting-underflows-with-page_mapcount-again.txt new file mode 100644 index 000000000..9fac4cb3e --- /dev/null +++ b/txt/mm-allow-for-detecting-underflows-with-page_mapcount-again.txt @@ -0,0 +1,110 @@
+From: David Hildenbrand <david@redhat.com>
+Subject: mm: allow for detecting underflows with page_mapcount() again
+Date: Tue, 9 Apr 2024 21:22:44 +0200
+
+Patch series "mm: mapcount for large folios + page_mapcount() cleanups".
+
+This series tracks the mapcount of large folios in a single value, so it
+can be read efficiently and atomically, just like the mapcount of small
+folios.
+
+folio_mapcount() is then used in a couple more places, most notably to
+reduce false negatives in folio_likely_mapped_shared(), and many users of
+page_mapcount() are cleaned up (that's maybe why you got CCed on the full
+series, sorry sh+xtensa folks! :) ).
+
+The remaining s390x user and one KSM user of page_mapcount() are getting
+removed separately on the list right now. I have patches to handle the
+other KSM one, the khugepaged one and the kpagecount one; as they are not
+as "obvious", I will send them out separately in the future. Once that is
+all in place, I'm planning on moving page_mapcount() into
+fs/proc/task_mmu.c, the remaining user for the time being (and we can
+discuss at LSF/MM details on that :) ).
+
+I proposed the mapcount for large folios (previously called total
+mapcount) originally in part of [1] and I later included it in [2] where
+it is a requirement. In the meantime, I changed the patch a bit so I
+dropped all RB's. During the discussion of [1], Peter Xu correctly raised
+that this additional tracking might affect the performance when PMD->PTE
+remapping THPs. In the meantime, I addressed that by batching RMAP
+operations during fork(), unmap/zap and when PMD->PTE remapping THPs.
+
+Running some of my micro-benchmarks [3] (fork,munmap,cow-byte,remap) on 1
+GiB of memory backed by folios with the same order, I observe the
+following on an Intel(R) Xeon(R) Silver 4210R CPU @ 2.40GHz tuned for
+reproducible results as much as possible:
+
+Standard deviation is mostly < 1%, except for order-9, where it's < 2% for
+fork() and munmap().
+
+(1) Small folios are not affected (< 1%) in all 4 microbenchmarks.
+(2) Order-4 folios are not affected (< 1%) in all 4 microbenchmarks. A bit
+ weird compared to the other orders ...
+(3) PMD->PTE remapping of order-9 THPs is not affected (< 1%)
+(4) COW-byte (COWing a single page by writing a single byte) is not
+ affected for any order (< 1 %). The page copy_fault overhead dominates
+ everything.
+(5) fork() is mostly not affected (< 1%), except order-2, where we have
+ a slowdown of ~4%. Already for order-3 folios, we're down to a slowdown
+ of < 1%.
+(6) munmap() sees a slowdown by < 3% for some orders (order-5,
+ order-6, order-9), but less for others (< 1% for order-4 and order-8,
+ < 2% for order-2, order-3, order-7).
+
+Especially the fork() and munmap() benchmarks are sensitive to each added
+instruction and other system noise, so I suspect some of the change and
+observed weirdness (order-4) is due to code layout changes and other
+factors, but not really due to the added atomics.
+
+So in the common case where we can batch, the added atomics don't really
+make a big difference, especially in light of the improvements for
+large folios that we recently gained due to batching. Surprisingly, for
+some cases where we cannot batch (e.g., COW), the added atomics don't seem
+to matter, because other overhead dominates.
+
+My fork and munmap micro-benchmarks don't cover cases where we cannot
+batch-process bigger parts of large folios. As this is not the common
+case, I'm not worrying about that right now.
+
+Future work is batching RMAP operations during swapout and folio
+migration.
+
+[1] https://lore.kernel.org/all/20230809083256.699513-1-david@redhat.com/
+[2] https://lore.kernel.org/all/20231124132626.235350-1-david@redhat.com/
+[3] https://gitlab.com/davidhildenbrand/scratchspace/-/raw/main/pte-mapped-folio-benchmarks.c?ref_type=heads
+
+
+This patch (of 18):
+
+Commit 53277bcf126d ("mm: support page_mapcount() on page_has_type()
+pages") made it impossible to detect mapcount underflows by treating any
+negative raw mapcount value as a mapcount of 0.
+
+We perform such underflow checks in zap_present_folio_ptes() and
+zap_huge_pmd(), which would currently no longer trigger.
+
+Let's check against PAGE_MAPCOUNT_RESERVE instead by using
+page_type_has_type(), like page_has_type() would, so we can still catch
+some underflows.
+
+Link: https://lkml.kernel.org/r/20240409192301.907377-1-david@redhat.com
+Link: https://lkml.kernel.org/r/20240409192301.907377-2-david@redhat.com
+Fixes: 53277bcf126d ("mm: support page_mapcount() on page_has_type() pages")
+Signed-off-by: David Hildenbrand <david@redhat.com>
+Cc: Chris Zankel <chris@zankel.net>
+Cc: Hugh Dickins <hughd@google.com>
+Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
+Cc: Jonathan Corbet <corbet@lwn.net>
+Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
+Cc: Max Filippov <jcmvbkbc@gmail.com>
+Cc: Miaohe Lin <linmiaohe@huawei.com>
+Cc: Muchun Song <muchun.song@linux.dev>
+Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
+Cc: Peter Xu <peterx@redhat.com>
+Cc: Richard Chang <richardycc@google.com>
+Cc: Rich Felker <dalias@libc.org>
+Cc: Ryan Roberts <ryan.roberts@arm.com>
+Cc: Yang Shi <shy828301@gmail.com>
+Cc: Yin Fengwei <fengwei.yin@intel.com>
+Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
+Cc: Zi Yan <ziy@nvidia.com> diff --git a/txt/mm-debug-print-only-page-mapcount-excluding-folio-entire-mapcount-in-__dump_folio.txt b/txt/mm-debug-print-only-page-mapcount-excluding-folio-entire-mapcount-in-__dump_folio.txt new file mode 100644 index 000000000..cfde676b5 --- /dev/null +++ b/txt/mm-debug-print-only-page-mapcount-excluding-folio-entire-mapcount-in-__dump_folio.txt @@ -0,0 +1,30 @@
+From: David Hildenbrand <david@redhat.com>
+Subject: mm/debug: print only page mapcount (excluding folio entire mapcount) in __dump_folio()
+Date: Tue, 9 Apr 2024 21:23:00 +0200
+
+Let's simplify and only print the page mapcount: we already print the
+large folio mapcount and the entire folio mapcount for large folios
+separately; that should be sufficient to figure out what's happening.
+
+While at it, print the page mapcount also if it had an underflow,
+filtering out only typed pages.
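+
+A plausible shape of that check (hypothetical simplification, not the
+exact hunk):
+
+	int raw = atomic_read(&page->_mapcount);
+	/*
+	 * Typed pages encode a page type rather than a mapcount: print 0
+	 * for them, but let genuine underflows (raw < -1, yet not a page
+	 * type) remain visible.
+	 */
+	int mapcount = page_type_has_type(raw) ? 0 : raw + 1;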
+ +Link: https://lkml.kernel.org/r/20240409192301.907377-18-david@redhat.com +Signed-off-by: David Hildenbrand <david@redhat.com> +Cc: Chris Zankel <chris@zankel.net> +Cc: Hugh Dickins <hughd@google.com> +Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> +Cc: Jonathan Corbet <corbet@lwn.net> +Cc: Matthew Wilcox (Oracle) <willy@infradead.org> +Cc: Max Filippov <jcmvbkbc@gmail.com> +Cc: Miaohe Lin <linmiaohe@huawei.com> +Cc: Muchun Song <muchun.song@linux.dev> +Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> +Cc: Peter Xu <peterx@redhat.com> +Cc: Richard Chang <richardycc@google.com> +Cc: Rich Felker <dalias@libc.org> +Cc: Ryan Roberts <ryan.roberts@arm.com> +Cc: Yang Shi <shy828301@gmail.com> +Cc: Yin Fengwei <fengwei.yin@intel.com> +Cc: Yoshinori Sato <ysato@users.sourceforge.jp> +Cc: Zi Yan <ziy@nvidia.com> diff --git a/txt/mm-filemap-use-folio_mapcount-in-filemap_unaccount_folio.txt b/txt/mm-filemap-use-folio_mapcount-in-filemap_unaccount_folio.txt new file mode 100644 index 000000000..0f01fd5c2 --- /dev/null +++ b/txt/mm-filemap-use-folio_mapcount-in-filemap_unaccount_folio.txt @@ -0,0 +1,31 @@ +From: David Hildenbrand <david@redhat.com> +Subject: mm/filemap: use folio_mapcount() in filemap_unaccount_folio() +Date: Tue, 9 Apr 2024 21:22:56 +0200 + +We want to limit the use of page_mapcount() to the places where it is +absolutely necessary. + +Let's use folio_mapcount() instead of page_mapcount() in +filemap_unaccount_folio(). + +No functional change intended, because we're only dealing with small +folios. + +Link: https://lkml.kernel.org/r/20240409192301.907377-14-david@redhat.com +Signed-off-by: David Hildenbrand <david@redhat.com> +Cc: Chris Zankel <chris@zankel.net> +Cc: Hugh Dickins <hughd@google.com> +Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> +Cc: Jonathan Corbet <corbet@lwn.net> +Cc: Matthew Wilcox (Oracle) <willy@infradead.org> +Cc: Max Filippov <jcmvbkbc@gmail.com> +Cc: Miaohe Lin <linmiaohe@huawei.com> +Cc: Muchun Song <muchun.song@linux.dev> +Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> +Cc: Peter Xu <peterx@redhat.com> +Cc: Richard Chang <richardycc@google.com> +Cc: Rich Felker <dalias@libc.org> +Cc: Ryan Roberts <ryan.roberts@arm.com> +Cc: Yang Shi <shy828301@gmail.com> +Cc: Yin Fengwei <fengwei.yin@intel.com> +Cc: Yoshinori Sato <ysato@users.sourceforge.jp> +Cc: Zi Yan <ziy@nvidia.com> diff --git a/txt/mm-huge_memory-use-folio_mapcount-in-zap_huge_pmd-sanity-check.txt b/txt/mm-huge_memory-use-folio_mapcount-in-zap_huge_pmd-sanity-check.txt new file mode 100644 index 000000000..3a51914aa --- /dev/null +++ b/txt/mm-huge_memory-use-folio_mapcount-in-zap_huge_pmd-sanity-check.txt @@ -0,0 +1,32 @@ +From: David Hildenbrand <david@redhat.com> +Subject: mm/huge_memory: use folio_mapcount() in zap_huge_pmd() sanity check +Date: Tue, 9 Apr 2024 21:22:51 +0200 + +We want to limit the use of page_mapcount() to the places where it is +absolutely necessary. Let's similarly check for folio_mapcount() +underflows instead of page_mapcount() underflows, like we now do in +zap_present_folio_ptes(). + +Instead of the VM_BUG_ON(), we should actually be doing something like +print_bad_pte(). For now, let's keep it simple and use WARN_ON_ONCE(), +performing that check independently of DEBUG_VM.
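As background on that choice: VM_BUG_ON() is compiled out without CONFIG_DEBUG_VM and crashes the kernel when it fires, whereas WARN_ON_ONCE() is always built in and warns exactly once. Here is a userspace analogue of the warn-once pattern, written in GNU C like the kernel macro it loosely mimics; all names below are made up.

#include <stdbool.h>
#include <stdio.h>

/* Report the first violation, stay quiet afterwards, never crash. */
#define TOY_WARN_ON_ONCE(cond) ({				\
	static bool __warned;					\
	bool __cond = (cond);					\
	if (__cond && !__warned) {				\
		__warned = true;				\
		fprintf(stderr, "warning: %s (%s:%d)\n",	\
			#cond, __FILE__, __LINE__);		\
	}							\
	__cond;							\
})

int main(void)
{
	int mapcount = -1;	/* pretend an underflow happened */
	int i;

	for (i = 0; i < 3; i++) {
		/* Fires once, even though the condition stays true. */
		if (TOY_WARN_ON_ONCE(mapcount < 0))
			continue;
	}
	return 0;
}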
+ +Link: https://lkml.kernel.org/r/20240409192301.907377-9-david@redhat.com +Signed-off-by: David Hildenbrand <david@redhat.com> +Cc: Chris Zankel <chris@zankel.net> +Cc: Hugh Dickins <hughd@google.com> +Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> +Cc: Jonathan Corbet <corbet@lwn.net> +Cc: Matthew Wilcox (Oracle) <willy@infradead.org> +Cc: Max Filippov <jcmvbkbc@gmail.com> +Cc: Miaohe Lin <linmiaohe@huawei.com> +Cc: Muchun Song <muchun.song@linux.dev> +Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> +Cc: Peter Xu <peterx@redhat.com> +Cc: Richard Chang <richardycc@google.com> +Cc: Rich Felker <dalias@libc.org> +Cc: Ryan Roberts <ryan.roberts@arm.com> +Cc: Yang Shi <shy828301@gmail.com> +Cc: Yin Fengwei <fengwei.yin@intel.com> +Cc: Yoshinori Sato <ysato@users.sourceforge.jp> +Cc: Zi Yan <ziy@nvidia.com> diff --git a/txt/mm-improve-folio_likely_mapped_shared-using-the-mapcount-of-large-folios.txt b/txt/mm-improve-folio_likely_mapped_shared-using-the-mapcount-of-large-folios.txt new file mode 100644 index 000000000..2d4ab923a --- /dev/null +++ b/txt/mm-improve-folio_likely_mapped_shared-using-the-mapcount-of-large-folios.txt @@ -0,0 +1,34 @@ +From: David Hildenbrand <david@redhat.com> +Subject: mm: improve folio_likely_mapped_shared() using the mapcount of large folios +Date: Tue, 9 Apr 2024 21:22:48 +0200 + +We can now read the mapcount of large folios very efficiently. Use it to +improve our handling of partially-mappable folios, falling back to making +a guess only when the folio is not "obviously mapped shared". + +We can now better detect, as "mapped shared", partially-mappable folios +whose first page is not mapped, reducing "false negatives"; false +negatives are still possible, though. + +While at it, fix up a wrong comment (false positive vs. false negative) +for KSM folios. + +Link: https://lkml.kernel.org/r/20240409192301.907377-6-david@redhat.com +Signed-off-by: David Hildenbrand <david@redhat.com> +Cc: Chris Zankel <chris@zankel.net> +Cc: Hugh Dickins <hughd@google.com> +Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> +Cc: Jonathan Corbet <corbet@lwn.net> +Cc: Matthew Wilcox (Oracle) <willy@infradead.org> +Cc: Max Filippov <jcmvbkbc@gmail.com> +Cc: Miaohe Lin <linmiaohe@huawei.com> +Cc: Muchun Song <muchun.song@linux.dev> +Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> +Cc: Peter Xu <peterx@redhat.com> +Cc: Richard Chang <richardycc@google.com> +Cc: Rich Felker <dalias@libc.org> +Cc: Ryan Roberts <ryan.roberts@arm.com> +Cc: Yang Shi <shy828301@gmail.com> +Cc: Yin Fengwei <fengwei.yin@intel.com> +Cc: Yoshinori Sato <ysato@users.sourceforge.jp> +Cc: Zi Yan <ziy@nvidia.com> diff --git a/txt/mm-init-remove-the-unnecessary-special-treatment-for-memory-less-node-v2.txt b/txt/mm-init-remove-the-unnecessary-special-treatment-for-memory-less-node-v2.txt new file mode 100644 index 000000000..cb24a05fb --- /dev/null +++ b/txt/mm-init-remove-the-unnecessary-special-treatment-for-memory-less-node-v2.txt @@ -0,0 +1,15 @@ +From: Baoquan He <bhe@redhat.com> +Subject: mm-init-remove-the-unnecessary-special-treatment-for-memory-less-node-v2 +Date: Wed, 10 Apr 2024 11:35:29 +0800 + +redo code comments, per Mike + +As Mike suggested, the old code comments above the 'continue' statement +are still useful for understanding the code and system behaviour. So +rephrase them and move them above the line 'if (pgdat->node_present_pages)'. +Thanks to Mike.
+ +Link: https://lkml.kernel.org/r/ZhYJAVQRYJSTKZng@MiWiFi-R3L-srv +Signed-off-by: Baoquan He <bhe@redhat.com> +Cc: Mel Gorman <mgorman@suse.de> +Cc: "Mike Rapoport (IBM)" <rppt@kernel.org> diff --git a/txt/mm-init-remove-the-unnecessary-special-treatment-for-memory-less-node.txt b/txt/mm-init-remove-the-unnecessary-special-treatment-for-memory-less-node.txt index 8432ced96..5417640d0 100644 --- a/txt/mm-init-remove-the-unnecessary-special-treatment-for-memory-less-node.txt +++ b/txt/mm-init-remove-the-unnecessary-special-treatment-for-memory-less-node.txt @@ -9,8 +9,10 @@ memory-less node. The 'continue;' statement inside for_each_node() loop of free_area_init() is gilding the lily. Here, remove the special handling to make memory-less node share the same -code flow as normal node. And the code comment above the 'continue' -statement is not needed either. +code flow as normal node. + +And also rephrase the code comments above the 'continue' statement +and move them above the line 'if (pgdat->node_present_pages)'. Link: https://lkml.kernel.org/r/20240326061134.1055295-3-bhe@redhat.com Signed-off-by: Baoquan He <bhe@redhat.com> diff --git a/txt/mm-make-folio_mapcount-return-0-for-small-typed-folios.txt b/txt/mm-make-folio_mapcount-return-0-for-small-typed-folios.txt new file mode 100644 index 000000000..27fe7737b --- /dev/null +++ b/txt/mm-make-folio_mapcount-return-0-for-small-typed-folios.txt @@ -0,0 +1,29 @@ +From: David Hildenbrand <david@redhat.com> +Subject: mm: make folio_mapcount() return 0 for small typed folios +Date: Tue, 9 Apr 2024 21:22:49 +0200 + +We already handle it properly for large folios. Let's also return "0" for +small typed folios, like page_mapcount() currently would. + +Consequently, folio_mapcount() will never return negative values for typed +folios, but may return negative values for underflows. + +Link: https://lkml.kernel.org/r/20240409192301.907377-7-david@redhat.com +Signed-off-by: David Hildenbrand <david@redhat.com> +Cc: Chris Zankel <chris@zankel.net> +Cc: Hugh Dickins <hughd@google.com> +Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> +Cc: Jonathan Corbet <corbet@lwn.net> +Cc: Matthew Wilcox (Oracle) <willy@infradead.org> +Cc: Max Filippov <jcmvbkbc@gmail.com> +Cc: Miaohe Lin <linmiaohe@huawei.com> +Cc: Muchun Song <muchun.song@linux.dev> +Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> +Cc: Peter Xu <peterx@redhat.com> +Cc: Richard Chang <richardycc@google.com> +Cc: Rich Felker <dalias@libc.org> +Cc: Ryan Roberts <ryan.roberts@arm.com> +Cc: Yang Shi <shy828301@gmail.com> +Cc: Yin Fengwei <fengwei.yin@intel.com> +Cc: Yoshinori Sato <ysato@users.sourceforge.jp> +Cc: Zi Yan <ziy@nvidia.com> diff --git a/txt/mm-memory-failure-use-folio_mapcount-in-hwpoison_user_mappings.txt b/txt/mm-memory-failure-use-folio_mapcount-in-hwpoison_user_mappings.txt new file mode 100644 index 000000000..7cb884cce --- /dev/null +++ b/txt/mm-memory-failure-use-folio_mapcount-in-hwpoison_user_mappings.txt @@ -0,0 +1,28 @@ +From: David Hildenbrand <david@redhat.com> +Subject: mm/memory-failure: use folio_mapcount() in hwpoison_user_mappings() +Date: Tue, 9 Apr 2024 21:22:52 +0200 + +We want to limit the use of page_mapcount() to the places where it is +absolutely necessary. We can only unmap full folios; page_mapped(), which +we check here, is translated to folio_mapped() -- based on +folio_mapcount(). So let's print the folio mapcount instead.
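The translation chain invoked here is short: a folio counts as mapped when its mapcount is positive, and asking whether a page is mapped forwards to the folio-level question. A toy restatement, with invented names that only model the kernel helpers:

#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy folio with just the series' single, folio-wide mapcount. */
struct toy_folio {
	atomic_int mapcount;
};

static int toy_folio_mapcount(struct toy_folio *folio)
{
	return atomic_load(&folio->mapcount);
}

static bool toy_folio_mapped(struct toy_folio *folio)
{
	return toy_folio_mapcount(folio) > 0;
}

/* "Is this page mapped?" is simply the folio-level question. */
static bool toy_page_mapped(struct toy_folio *folio)
{
	return toy_folio_mapped(folio);
}

int main(void)
{
	struct toy_folio folio = { 0 };

	assert(!toy_page_mapped(&folio));
	atomic_fetch_add(&folio.mapcount, 1);	/* one mapping added */
	assert(toy_page_mapped(&folio));
	return 0;
}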
+ +Link: https://lkml.kernel.org/r/20240409192301.907377-10-david@redhat.com +Signed-off-by: David Hildenbrand <david@redhat.com> +Cc: Chris Zankel <chris@zankel.net> +Cc: Hugh Dickins <hughd@google.com> +Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> +Cc: Jonathan Corbet <corbet@lwn.net> +Cc: Matthew Wilcox (Oracle) <willy@infradead.org> +Cc: Max Filippov <jcmvbkbc@gmail.com> +Cc: Miaohe Lin <linmiaohe@huawei.com> +Cc: Muchun Song <muchun.song@linux.dev> +Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> +Cc: Peter Xu <peterx@redhat.com> +Cc: Richard Chang <richardycc@google.com> +Cc: Rich Felker <dalias@libc.org> +Cc: Ryan Roberts <ryan.roberts@arm.com> +Cc: Yang Shi <shy828301@gmail.com> +Cc: Yin Fengwei <fengwei.yin@intel.com> +Cc: Yoshinori Sato <ysato@users.sourceforge.jp> +Cc: Zi Yan <ziy@nvidia.com> diff --git a/txt/mm-memory-use-folio_mapcount-in-zap_present_folio_ptes.txt b/txt/mm-memory-use-folio_mapcount-in-zap_present_folio_ptes.txt new file mode 100644 index 000000000..e84035f3c --- /dev/null +++ b/txt/mm-memory-use-folio_mapcount-in-zap_present_folio_ptes.txt @@ -0,0 +1,38 @@ +From: David Hildenbrand <david@redhat.com> +Subject: mm/memory: use folio_mapcount() in zap_present_folio_ptes() +Date: Tue, 9 Apr 2024 21:22:50 +0200 + +We want to limit the use of page_mapcount() to the places where it is +absolutely necessary. In zap_present_folio_ptes(), let's simply check the +folio mapcount. If there is some issue, it will underflow at some point +either way when unmapping. + +As already indicated in commit 10ebac4f95e7 ("mm/memory: optimize +unmap/zap with PTE-mapped THP"), we documented "If we ever have a +cheap folio_mapcount(), we might just want to check for underflows +there.". + +There is no change for small folios. For large folios, we'll now catch +more underflows when batch-unmapping, because instead of only testing the +mapcount of the first subpage, we'll test whether the folio mapcount +underflows. + +Link: https://lkml.kernel.org/r/20240409192301.907377-8-david@redhat.com +Signed-off-by: David Hildenbrand <david@redhat.com> +Cc: Chris Zankel <chris@zankel.net> +Cc: Hugh Dickins <hughd@google.com> +Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> +Cc: Jonathan Corbet <corbet@lwn.net> +Cc: Matthew Wilcox (Oracle) <willy@infradead.org> +Cc: Max Filippov <jcmvbkbc@gmail.com> +Cc: Miaohe Lin <linmiaohe@huawei.com> +Cc: Muchun Song <muchun.song@linux.dev> +Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> +Cc: Peter Xu <peterx@redhat.com> +Cc: Richard Chang <richardycc@google.com> +Cc: Rich Felker <dalias@libc.org> +Cc: Ryan Roberts <ryan.roberts@arm.com> +Cc: Yang Shi <shy828301@gmail.com> +Cc: Yin Fengwei <fengwei.yin@intel.com> +Cc: Yoshinori Sato <ysato@users.sourceforge.jp> +Cc: Zi Yan <ziy@nvidia.com> diff --git a/txt/mm-migrate-use-folio_likely_mapped_shared-in-add_page_for_migration.txt b/txt/mm-migrate-use-folio_likely_mapped_shared-in-add_page_for_migration.txt new file mode 100644 index 000000000..edcca30ba --- /dev/null +++ b/txt/mm-migrate-use-folio_likely_mapped_shared-in-add_page_for_migration.txt @@ -0,0 +1,32 @@ +From: David Hildenbrand <david@redhat.com> +Subject: mm/migrate: use folio_likely_mapped_shared() in add_page_for_migration() +Date: Tue, 9 Apr 2024 21:22:54 +0200 + +We want to limit the use of page_mapcount() to the places where it is +absolutely necessary. In add_page_for_migration(), we actually want to +check if the folio is mapped shared, to reject such folios. So let's use +folio_likely_mapped_shared() instead.
+ +For small folios, fully mapped THP, and hugetlb folios, there is no change. +For partially mapped, shared THP, we should now do a better job at +rejecting such folios. + +Link: https://lkml.kernel.org/r/20240409192301.907377-12-david@redhat.com +Signed-off-by: David Hildenbrand <david@redhat.com> +Cc: Chris Zankel <chris@zankel.net> +Cc: Hugh Dickins <hughd@google.com> +Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> +Cc: Jonathan Corbet <corbet@lwn.net> +Cc: Matthew Wilcox (Oracle) <willy@infradead.org> +Cc: Max Filippov <jcmvbkbc@gmail.com> +Cc: Miaohe Lin <linmiaohe@huawei.com> +Cc: Muchun Song <muchun.song@linux.dev> +Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> +Cc: Peter Xu <peterx@redhat.com> +Cc: Richard Chang <richardycc@google.com> +Cc: Rich Felker <dalias@libc.org> +Cc: Ryan Roberts <ryan.roberts@arm.com> +Cc: Yang Shi <shy828301@gmail.com> +Cc: Yin Fengwei <fengwei.yin@intel.com> +Cc: Yoshinori Sato <ysato@users.sourceforge.jp> +Cc: Zi Yan <ziy@nvidia.com> diff --git a/txt/mm-migrate_device-use-folio_mapcount-in-migrate_vma_check_page.txt b/txt/mm-migrate_device-use-folio_mapcount-in-migrate_vma_check_page.txt new file mode 100644 index 000000000..b131cce38 --- /dev/null +++ b/txt/mm-migrate_device-use-folio_mapcount-in-migrate_vma_check_page.txt @@ -0,0 +1,32 @@ +From: David Hildenbrand <david@redhat.com> +Subject: mm/migrate_device: use folio_mapcount() in migrate_vma_check_page() +Date: Tue, 9 Apr 2024 21:22:57 +0200 + +We want to limit the use of page_mapcount() to the places where it is +absolutely necessary. Let's convert migrate_vma_check_page() to work on a +folio internally so we can remove the page_mapcount() usage. + +Note that we reject any large folios. + +There is a lot more folio conversion to be had, but that has to wait for +another day. No functional change intended. + +Link: https://lkml.kernel.org/r/20240409192301.907377-15-david@redhat.com +Signed-off-by: David Hildenbrand <david@redhat.com> +Cc: Chris Zankel <chris@zankel.net> +Cc: Hugh Dickins <hughd@google.com> +Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> +Cc: Jonathan Corbet <corbet@lwn.net> +Cc: Matthew Wilcox (Oracle) <willy@infradead.org> +Cc: Max Filippov <jcmvbkbc@gmail.com> +Cc: Miaohe Lin <linmiaohe@huawei.com> +Cc: Muchun Song <muchun.song@linux.dev> +Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> +Cc: Peter Xu <peterx@redhat.com> +Cc: Richard Chang <richardycc@google.com> +Cc: Rich Felker <dalias@libc.org> +Cc: Ryan Roberts <ryan.roberts@arm.com> +Cc: Yang Shi <shy828301@gmail.com> +Cc: Yin Fengwei <fengwei.yin@intel.com> +Cc: Yoshinori Sato <ysato@users.sourceforge.jp> +Cc: Zi Yan <ziy@nvidia.com> diff --git a/txt/mm-page_alloc-use-folio_mapped-in-__alloc_contig_migrate_range.txt b/txt/mm-page_alloc-use-folio_mapped-in-__alloc_contig_migrate_range.txt new file mode 100644 index 000000000..caaa36374 --- /dev/null +++ b/txt/mm-page_alloc-use-folio_mapped-in-__alloc_contig_migrate_range.txt @@ -0,0 +1,39 @@ +From: David Hildenbrand <david@redhat.com> +Subject: mm/page_alloc: use folio_mapped() in __alloc_contig_migrate_range() +Date: Tue, 9 Apr 2024 21:22:53 +0200 + +We want to limit the use of page_mapcount() to the places where it is +absolutely necessary. + +For tracing purposes, we use page_mapcount() in +__alloc_contig_migrate_range(). Adding that mapcount to total_mapped +sounds strange: total_migrated and total_reclaimed would count each page +only once, not multiple times. + +But then, isolate_migratepages_range() adds each folio only once to the +list. 
So for large folios, we would query the mapcount of the first page +of the folio, which doesn't make too much sense. + +Let's simply use folio_mapped() * folio_nr_pages(), which makes more sense +as nr_migratepages is also incremented by the number of pages in the folio +in case of successful migration. + +Link: https://lkml.kernel.org/r/20240409192301.907377-11-david@redhat.com +Signed-off-by: David Hildenbrand <david@redhat.com> +Cc: Chris Zankel <chris@zankel.net> +Cc: Hugh Dickins <hughd@google.com> +Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> +Cc: Jonathan Corbet <corbet@lwn.net> +Cc: Matthew Wilcox (Oracle) <willy@infradead.org> +Cc: Max Filippov <jcmvbkbc@gmail.com> +Cc: Miaohe Lin <linmiaohe@huawei.com> +Cc: Muchun Song <muchun.song@linux.dev> +Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> +Cc: Peter Xu <peterx@redhat.com> +Cc: Richard Chang <richardycc@google.com> +Cc: Rich Felker <dalias@libc.org> +Cc: Ryan Roberts <ryan.roberts@arm.com> +Cc: Yang Shi <shy828301@gmail.com> +Cc: Yin Fengwei <fengwei.yin@intel.com> +Cc: Yoshinori Sato <ysato@users.sourceforge.jp> +Cc: Zi Yan <ziy@nvidia.com> diff --git a/txt/mm-pass-vma-instead-of-mm-to-follow_pte.txt b/txt/mm-pass-vma-instead-of-mm-to-follow_pte.txt new file mode 100644 index 000000000..9cf97d088 --- /dev/null +++ b/txt/mm-pass-vma-instead-of-mm-to-follow_pte.txt @@ -0,0 +1,22 @@ +From: David Hildenbrand <david@redhat.com> +Subject: mm: pass VMA instead of MM to follow_pte() +Date: Wed, 10 Apr 2024 17:55:26 +0200 + +... and centralize the VM_IO/VM_PFNMAP sanity check in there. We'll +now also perform these sanity checks for direct follow_pte() +invocations. + +For generic_access_phys(), we might now check multiple times: nothing to +worry about, really. + +Link: https://lkml.kernel.org/r/20240410155527.474777-3-david@redhat.com +Signed-off-by: David Hildenbrand <david@redhat.com> +Acked-by: Sean Christopherson <seanjc@google.com> [KVM] +Cc: Alex Williamson <alex.williamson@redhat.com> +Cc: Christoph Hellwig <hch@lst.de> +Cc: Fei Li <fei1.li@intel.com> +Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> +Cc: Heiko Carstens <hca@linux.ibm.com> +Cc: Ingo Molnar <mingo@redhat.com> +Cc: Paolo Bonzini <pbonzini@redhat.com> +Cc: Yonghua Huang <yonghua.huang@intel.com> diff --git a/txt/mm-rmap-add-fast-path-for-small-folios-when-adding-removing-duplicating.txt b/txt/mm-rmap-add-fast-path-for-small-folios-when-adding-removing-duplicating.txt new file mode 100644 index 000000000..b2c5ab42c --- /dev/null +++ b/txt/mm-rmap-add-fast-path-for-small-folios-when-adding-removing-duplicating.txt @@ -0,0 +1,29 @@ +From: David Hildenbrand <david@redhat.com> +Subject: mm/rmap: add fast-path for small folios when adding/removing/duplicating +Date: Tue, 9 Apr 2024 21:22:46 +0200 + +Let's add a fast-path for small folios to all relevant rmap functions. +Note that only RMAP_LEVEL_PTE applies. + +This is a preparation for tracking the mapcount of large folios in a +single value.
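To illustrate the shape of such a fast-path, here is a toy model. The struct layout and helper names are invented for this sketch and do not match the kernel's struct folio; the point is only that order-0 folios touch a single counter and skip the large-folio bookkeeping entirely:

#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy folio; real folios keep far more rmap state than this. */
struct toy_folio {
	int nr_pages;			/* 1 for small folios */
	atomic_int page_mapcount;	/* per-page mapcount (collapsed here) */
	atomic_int large_mapcount;	/* folio-wide mapcount (large folios) */
};

static bool toy_folio_is_large(struct toy_folio *folio)
{
	return folio->nr_pages > 1;
}

static void toy_folio_add_rmap_pte(struct toy_folio *folio)
{
	/* Fast path: small folios adjust exactly one counter. */
	if (!toy_folio_is_large(folio)) {
		atomic_fetch_add(&folio->page_mapcount, 1);
		return;
	}
	/* Large folios additionally maintain the folio-wide mapcount. */
	atomic_fetch_add(&folio->page_mapcount, 1);
	atomic_fetch_add(&folio->large_mapcount, 1);
}

int main(void)
{
	struct toy_folio small = { .nr_pages = 1 };
	struct toy_folio large = { .nr_pages = 512 };	/* a PMD-sized folio */

	toy_folio_add_rmap_pte(&small);
	toy_folio_add_rmap_pte(&large);
	assert(atomic_load(&small.large_mapcount) == 0);
	assert(atomic_load(&large.large_mapcount) == 1);
	return 0;
}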
+ +Link: https://lkml.kernel.org/r/20240409192301.907377-4-david@redhat.com +Signed-off-by: David Hildenbrand <david@redhat.com> +Cc: Chris Zankel <chris@zankel.net> +Cc: Hugh Dickins <hughd@google.com> +Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> +Cc: Jonathan Corbet <corbet@lwn.net> +Cc: Matthew Wilcox (Oracle) <willy@infradead.org> +Cc: Max Filippov <jcmvbkbc@gmail.com> +Cc: Miaohe Lin <linmiaohe@huawei.com> +Cc: Muchun Song <muchun.song@linux.dev> +Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> +Cc: Peter Xu <peterx@redhat.com> +Cc: Richard Chang <richardycc@google.com> +Cc: Rich Felker <dalias@libc.org> +Cc: Ryan Roberts <ryan.roberts@arm.com> +Cc: Yang Shi <shy828301@gmail.com> +Cc: Yin Fengwei <fengwei.yin@intel.com> +Cc: Yoshinori Sato <ysato@users.sourceforge.jp> +Cc: Zi Yan <ziy@nvidia.com> diff --git a/txt/mm-rmap-always-inline-anon-file-rmap-duplication-of-a-single-pte.txt b/txt/mm-rmap-always-inline-anon-file-rmap-duplication-of-a-single-pte.txt new file mode 100644 index 000000000..5acda0735 --- /dev/null +++ b/txt/mm-rmap-always-inline-anon-file-rmap-duplication-of-a-single-pte.txt @@ -0,0 +1,30 @@ +From: David Hildenbrand <david@redhat.com> +Subject: mm/rmap: always inline anon/file rmap duplication of a single PTE +Date: Tue, 9 Apr 2024 21:22:45 +0200 + +As we grow the code, the compiler might make stupid decisions and +unnecessarily degrade fork() performance. Let's make sure to always +inline functions that operate on a single PTE so the compiler will always +optimize out the loop and avoid a function call. + +This is a preparation for maintaining a total mapcount for large folios. + +Link: https://lkml.kernel.org/r/20240409192301.907377-3-david@redhat.com +Signed-off-by: David Hildenbrand <david@redhat.com> +Cc: Chris Zankel <chris@zankel.net> +Cc: Hugh Dickins <hughd@google.com> +Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> +Cc: Jonathan Corbet <corbet@lwn.net> +Cc: Matthew Wilcox (Oracle) <willy@infradead.org> +Cc: Max Filippov <jcmvbkbc@gmail.com> +Cc: Miaohe Lin <linmiaohe@huawei.com> +Cc: Muchun Song <muchun.song@linux.dev> +Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> +Cc: Peter Xu <peterx@redhat.com> +Cc: Richard Chang <richardycc@google.com> +Cc: Rich Felker <dalias@libc.org> +Cc: Ryan Roberts <ryan.roberts@arm.com> +Cc: Yang Shi <shy828301@gmail.com> +Cc: Yin Fengwei <fengwei.yin@intel.com> +Cc: Yoshinori Sato <ysato@users.sourceforge.jp> +Cc: Zi Yan <ziy@nvidia.com> diff --git a/txt/mm-track-mapcount-of-large-folios-in-single-value.txt b/txt/mm-track-mapcount-of-large-folios-in-single-value.txt new file mode 100644 index 000000000..46b5579f9 --- /dev/null +++ b/txt/mm-track-mapcount-of-large-folios-in-single-value.txt @@ -0,0 +1,64 @@ +From: David Hildenbrand <david@redhat.com> +Subject: mm: track mapcount of large folios in single value +Date: Tue, 9 Apr 2024 21:22:47 +0200 + +Let's track the mapcount of large folios in a single value. The mapcount +of a large folio currently corresponds to the sum of the entire mapcount +and all page mapcounts. + +This sum is what we actually want to know in folio_mapcount() and it is +also sufficient for implementing folio_mapped(). + +With PTE-mapped THP becoming more important and more widely used, we want +to avoid looping over all pages of a folio just to obtain the mapcount of +large folios. The comment "In the common case, avoid the loop when no +pages mapped by PTE" in folio_total_mapcount() no longer holds for +mTHP that are always mapped by PTE.
+ +Further, we are planning on using folio_mapcount() more frequently, and +might even want to remove page mapcounts for large folios in some kernel +configs. Therefore, allow for reading the mapcount of large folios +efficiently and atomically without looping over any pages. + +Maintain the mapcount also for hugetlb pages for simplicity. Use the new +mapcount to implement folio_mapcount() and folio_mapped(). Make +page_mapped() simply call folio_mapped(). We can now get rid of +folio_large_is_mapped(). + +_nr_pages_mapped is now only used in rmap code and for debugging purposes. +Keep folio_nr_pages_mapped() around, but document that its use should be +limited to rmap internals and debugging purposes. + +This change implies one additional atomic add/sub whenever +mapping/unmapping (parts of) a large folio. + +As we now batch RMAP operations for PTE-mapped THP during fork(), during +unmap/zap, and when PTE-remapping a PMD-mapped THP, and we adjust the +large mapcount for a PTE batch only once, the added overhead in the common +case is small. Only when unmapping individual pages of a large folio +(e.g., during COW) might the overhead be bigger in comparison, but it's +essentially one additional atomic operation. + +Note that before the new mapcount could overflow, our refcount would +already overflow: each mapping requires a folio reference. Extend the +documentation of folio_mapcount(). + +Link: https://lkml.kernel.org/r/20240409192301.907377-5-david@redhat.com +Signed-off-by: David Hildenbrand <david@redhat.com> +Cc: Chris Zankel <chris@zankel.net> +Cc: Hugh Dickins <hughd@google.com> +Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> +Cc: Jonathan Corbet <corbet@lwn.net> +Cc: Matthew Wilcox (Oracle) <willy@infradead.org> +Cc: Max Filippov <jcmvbkbc@gmail.com> +Cc: Miaohe Lin <linmiaohe@huawei.com> +Cc: Muchun Song <muchun.song@linux.dev> +Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> +Cc: Peter Xu <peterx@redhat.com> +Cc: Richard Chang <richardycc@google.com> +Cc: Rich Felker <dalias@libc.org> +Cc: Ryan Roberts <ryan.roberts@arm.com> +Cc: Yang Shi <shy828301@gmail.com> +Cc: Yin Fengwei <fengwei.yin@intel.com> +Cc: Yoshinori Sato <ysato@users.sourceforge.jp> +Cc: Zi Yan <ziy@nvidia.com> diff --git a/txt/mmswap-add-document-about-rcu-read-lock-and-swapoff-interaction.txt b/txt/mmswap-add-document-about-rcu-read-lock-and-swapoff-interaction.txt index 8cb9cce6f..c04738c45 100644 --- a/txt/mmswap-add-document-about-rcu-read-lock-and-swapoff-interaction.txt +++ b/txt/mmswap-add-document-about-rcu-read-lock-and-swapoff-interaction.txt @@ -14,6 +14,6 @@ Link: https://lkml.kernel.org/r/20240407065450.498821-1-ying.huang@intel.com Signed-off-by: "Huang, Ying" <ying.huang@intel.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: David Hildenbrand <david@redhat.com> -Cc: Miaohe Lin <linmiaohe@huawei.com> +Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Hugh Dickins <hughd@google.com> Cc: Minchan Kim <minchan@kernel.org> diff --git a/txt/nilfs2-add-kernel-doc-comments-to-nilfs_btree_convert_and_insert.txt b/txt/nilfs2-add-kernel-doc-comments-to-nilfs_btree_convert_and_insert.txt new file mode 100644 index 000000000..291436c25 --- /dev/null +++ b/txt/nilfs2-add-kernel-doc-comments-to-nilfs_btree_convert_and_insert.txt @@ -0,0 +1,10 @@ +From: Yang Li <yang.lee@linux.alibaba.com> +Subject: nilfs2: add kernel-doc comments to nilfs_btree_convert_and_insert() +Date: Wed, 10 Apr 2024 16:56:28 +0900 + +This commit adds kernel-doc style comments with complete parameter
+descriptions for the function nilfs_btree_convert_and_insert. + +Link: https://lkml.kernel.org/r/20240410075629.3441-3-konishi.ryusuke@gmail.com +Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> +Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> diff --git a/txt/nilfs2-add-kernel-doc-comments-to-nilfs_do_roll_forward.txt b/txt/nilfs2-add-kernel-doc-comments-to-nilfs_do_roll_forward.txt new file mode 100644 index 000000000..ca176590d --- /dev/null +++ b/txt/nilfs2-add-kernel-doc-comments-to-nilfs_do_roll_forward.txt @@ -0,0 +1,13 @@ +From: Yang Li <yang.lee@linux.alibaba.com> +Subject: nilfs2: add kernel-doc comments to nilfs_do_roll_forward() +Date: Wed, 10 Apr 2024 16:56:27 +0900 + +Patch series "nilfs2: fix missing kernel-doc comments". + +This commit adds kernel-doc style comments with complete parameter +descriptions for the function nilfs_do_roll_forward. + +Link: https://lkml.kernel.org/r/20240410075629.3441-1-konishi.ryusuke@gmail.com +Link: https://lkml.kernel.org/r/20240410075629.3441-2-konishi.ryusuke@gmail.com +Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> +Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> diff --git a/txt/nilfs2-add-kernel-doc-comments-to-nilfs_remove_all_gcinodes.txt b/txt/nilfs2-add-kernel-doc-comments-to-nilfs_remove_all_gcinodes.txt new file mode 100644 index 000000000..1b79f5900 --- /dev/null +++ b/txt/nilfs2-add-kernel-doc-comments-to-nilfs_remove_all_gcinodes.txt @@ -0,0 +1,10 @@ +From: Yang Li <yang.lee@linux.alibaba.com> +Subject: nilfs2: add kernel-doc comments to nilfs_remove_all_gcinodes() +Date: Wed, 10 Apr 2024 16:56:29 +0900 + +This commit adds kernel-doc style comments with complete parameter +descriptions for the function nilfs_remove_all_gcinodes. + +Link: https://lkml.kernel.org/r/20240410075629.3441-4-konishi.ryusuke@gmail.com +Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> +Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> diff --git a/txt/arm-mm-drop-vm_fault_badmap-vm_fault_badaccess.txt b/txt/old/arm-mm-drop-vm_fault_badmap-vm_fault_badaccess.txt index 9767e155b..9767e155b 100644 --- a/txt/arm-mm-drop-vm_fault_badmap-vm_fault_badaccess.txt +++ b/txt/old/arm-mm-drop-vm_fault_badmap-vm_fault_badaccess.txt diff --git a/txt/arm64-mm-drop-vm_fault_badmap-vm_fault_badaccess.txt b/txt/old/arm64-mm-drop-vm_fault_badmap-vm_fault_badaccess.txt index 578205409..578205409 100644 --- a/txt/arm64-mm-drop-vm_fault_badmap-vm_fault_badaccess.txt +++ b/txt/old/arm64-mm-drop-vm_fault_badmap-vm_fault_badaccess.txt diff --git a/txt/sh-mm-cache-use-folio_mapped-in-copy_from_user_page.txt b/txt/sh-mm-cache-use-folio_mapped-in-copy_from_user_page.txt new file mode 100644 index 000000000..c813e2160 --- /dev/null +++ b/txt/sh-mm-cache-use-folio_mapped-in-copy_from_user_page.txt @@ -0,0 +1,33 @@ +From: David Hildenbrand <david@redhat.com> +Subject: sh/mm/cache: use folio_mapped() in copy_from_user_page() +Date: Tue, 9 Apr 2024 21:22:55 +0200 + +We want to limit the use of page_mapcount() to the places where it is +absolutely necessary. + +We're already using folio_mapped() in copy_user_highpage() and +copy_to_user_page() for a similar purpose so ... let's also simply use it +for copy_from_user_page(). + +There is no change for small folios. Likely we won't stumble over many +large folios on sh in that code either way.
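For readers unfamiliar with the kernel-doc format that the nilfs2 patches above add, a generic example follows; the structure, function, and parameters below are hypothetical, not taken from nilfs2:

struct toy_object;

/**
 * toy_convert_and_insert - convert an inline mapping and insert a key
 * @obj: object whose mapping is being converted (hypothetical)
 * @key: key to insert after the conversion
 * @flags: behavior flags; pass 0 for the default behavior
 *
 * Convert the inline mapping of @obj to a tree-based mapping and then
 * insert @key into it.
 *
 * Return: 0 on success, or a negative error code on failure.
 */
int toy_convert_and_insert(struct toy_object *obj, unsigned long key,
			   unsigned int flags);

scripts/kernel-doc parses the "name - summary", "@param:" and "Return:" markers to generate the rendered documentation.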
+ +Link: https://lkml.kernel.org/r/20240409192301.907377-13-david@redhat.com +Signed-off-by: David Hildenbrand <david@redhat.com> +Cc: Chris Zankel <chris@zankel.net> +Cc: Hugh Dickins <hughd@google.com> +Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> +Cc: Jonathan Corbet <corbet@lwn.net> +Cc: Matthew Wilcox (Oracle) <willy@infradead.org> +Cc: Max Filippov <jcmvbkbc@gmail.com> +Cc: Miaohe Lin <linmiaohe@huawei.com> +Cc: Muchun Song <muchun.song@linux.dev> +Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> +Cc: Peter Xu <peterx@redhat.com> +Cc: Richard Chang <richardycc@google.com> +Cc: Rich Felker <dalias@libc.org> +Cc: Ryan Roberts <ryan.roberts@arm.com> +Cc: Yang Shi <shy828301@gmail.com> +Cc: Yin Fengwei <fengwei.yin@intel.com> +Cc: Yoshinori Sato <ysato@users.sourceforge.jp> +Cc: Zi Yan <ziy@nvidia.com> diff --git a/txt/trace-events-page_ref-trace-the-raw-page-mapcount-value.txt b/txt/trace-events-page_ref-trace-the-raw-page-mapcount-value.txt new file mode 100644 index 000000000..e2e590c49 --- /dev/null +++ b/txt/trace-events-page_ref-trace-the-raw-page-mapcount-value.txt @@ -0,0 +1,32 @@ +From: David Hildenbrand <david@redhat.com> +Subject: trace/events/page_ref: trace the raw page mapcount value +Date: Tue, 9 Apr 2024 21:22:58 +0200 + +We want to limit the use of page_mapcount() to the places where it is +absolutely necessary. We already trace the raw page->refcount, the raw +page->flags and the raw page->mapping, and don't involve any folios. +Let's also trace the raw mapcount value, which does not consider the +entire mapcount of large folios, and to which we don't add "1". + +When dealing with typed folios, this makes a lot more sense. ... and +it's for debugging purposes only either way. + +Link: https://lkml.kernel.org/r/20240409192301.907377-16-david@redhat.com +Signed-off-by: David Hildenbrand <david@redhat.com> +Cc: Chris Zankel <chris@zankel.net> +Cc: Hugh Dickins <hughd@google.com> +Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> +Cc: Jonathan Corbet <corbet@lwn.net> +Cc: Matthew Wilcox (Oracle) <willy@infradead.org> +Cc: Max Filippov <jcmvbkbc@gmail.com> +Cc: Miaohe Lin <linmiaohe@huawei.com> +Cc: Muchun Song <muchun.song@linux.dev> +Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> +Cc: Peter Xu <peterx@redhat.com> +Cc: Richard Chang <richardycc@google.com> +Cc: Rich Felker <dalias@libc.org> +Cc: Ryan Roberts <ryan.roberts@arm.com> +Cc: Yang Shi <shy828301@gmail.com> +Cc: Yin Fengwei <fengwei.yin@intel.com> +Cc: Yoshinori Sato <ysato@users.sourceforge.jp> +Cc: Zi Yan <ziy@nvidia.com> diff --git a/txt/xtensa-mm-convert-check_tlb_entry-to-sanity-check-folios.txt b/txt/xtensa-mm-convert-check_tlb_entry-to-sanity-check-folios.txt new file mode 100644 index 000000000..a307ce0bc --- /dev/null +++ b/txt/xtensa-mm-convert-check_tlb_entry-to-sanity-check-folios.txt @@ -0,0 +1,35 @@ +From: David Hildenbrand <david@redhat.com> +Subject: xtensa/mm: convert check_tlb_entry() to sanity check folios +Date: Tue, 9 Apr 2024 21:22:59 +0200 + +We want to limit the use of page_mapcount() to the places where it is +absolutely necessary. So let's convert check_tlb_entry() to perform +sanity checks on folios instead of pages. + +This essentially already happened: page_count() is mapped to +folio_ref_count(), and page_mapped() to folio_mapped() internally. +However, we would have printed the page_mapcount() value, which does not +really match what page_mapped() would have checked. + +Let's simply print the folio mapcount to avoid using page_mapcount(). For +small folios there is no change.
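The "raw" value in question is the _mapcount field itself, which uses a -1-based encoding: -1 means "not mapped", and readers such as page_mapcount() add 1. A two-assert illustration (the encoding is real, the helper name is invented):

#include <assert.h>

/* What a page_mapcount()-style reader reports for a raw _mapcount. */
static int toy_reported_mapcount(int raw)
{
	return raw + 1;
}

int main(void)
{
	int raw = -1;	/* freshly allocated, unmapped page */

	assert(toy_reported_mapcount(raw) == 0);
	raw++;		/* one PTE now maps the page */
	assert(toy_reported_mapcount(raw) == 1);
	/* A tracer logging the raw value records -1 and 0 for these states. */
	return 0;
}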
+ +Link: https://lkml.kernel.org/r/20240409192301.907377-17-david@redhat.com +Signed-off-by: David Hildenbrand <david@redhat.com> +Cc: Chris Zankel <chris@zankel.net> +Cc: Hugh Dickins <hughd@google.com> +Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> +Cc: Jonathan Corbet <corbet@lwn.net> +Cc: Matthew Wilcox (Oracle) <willy@infradead.org> +Cc: Max Filippov <jcmvbkbc@gmail.com> +Cc: Miaohe Lin <linmiaohe@huawei.com> +Cc: Muchun Song <muchun.song@linux.dev> +Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> +Cc: Peter Xu <peterx@redhat.com> +Cc: Richard Chang <richardycc@google.com> +Cc: Rich Felker <dalias@libc.org> +Cc: Ryan Roberts <ryan.roberts@arm.com> +Cc: Yang Shi <shy828301@gmail.com> +Cc: Yin Fengwei <fengwei.yin@intel.com> +Cc: Yoshinori Sato <ysato@users.sourceforge.jp> +Cc: Zi Yan <ziy@nvidia.com>