linux/arch/arm64/mm/mteswap.c

131 lines
2.5 KiB
C
Raw Normal View History

// SPDX-License-Identifier: GPL-2.0-only
#include <linux/pagemap.h>
#include <linux/xarray.h>
#include <linux/slab.h>
#include <linux/swap.h>
#include <linux/swapops.h>
#include <asm/mte.h>
static DEFINE_XARRAY(mte_pages);
void *mte_allocate_tag_storage(void)
{
/* tags granule is 16 bytes, 2 tags stored per byte */
return kmalloc(MTE_PAGE_TAG_STORAGE, GFP_KERNEL);
}
void mte_free_tag_storage(char *storage)
{
kfree(storage);
}
int mte_save_tags(struct page *page)
{
void *tag_storage, *ret;
arm64: mte: Fix/clarify the PG_mte_tagged semantics Currently the PG_mte_tagged page flag mostly means the page contains valid tags and it should be set after the tags have been cleared or restored. However, in mte_sync_tags() it is set before setting the tags to avoid, in theory, a race with concurrent mprotect(PROT_MTE) for shared pages. However, a concurrent mprotect(PROT_MTE) with a copy on write in another thread can cause the new page to have stale tags. Similarly, tag reading via ptrace() can read stale tags if the PG_mte_tagged flag is set before actually clearing/restoring the tags. Fix the PG_mte_tagged semantics so that it is only set after the tags have been cleared or restored. This is safe for swap restoring into a MAP_SHARED or CoW page since the core code takes the page lock. Add two functions to test and set the PG_mte_tagged flag with acquire and release semantics. The downside is that concurrent mprotect(PROT_MTE) on a MAP_SHARED page may cause tag loss. This is already the case for KVM guests if a VMM changes the page protection while the guest triggers a user_mem_abort(). Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> [pcc@google.com: fix build with CONFIG_ARM64_MTE disabled] Signed-off-by: Peter Collingbourne <pcc@google.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Steven Price <steven.price@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Peter Collingbourne <pcc@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221104011041.290951-3-pcc@google.com
2022-11-04 01:10:35 +00:00
if (!page_mte_tagged(page))
return 0;
tag_storage = mte_allocate_tag_storage();
if (!tag_storage)
return -ENOMEM;
mte_save_page_tags(page_address(page), tag_storage);
mm/swap: stop using page->private on tail pages for THP_SWAP Patch series "mm/swap: stop using page->private on tail pages for THP_SWAP + cleanups". This series stops using page->private on tail pages for THP_SWAP, replaces folio->private by folio->swap for swapcache folios, and starts using "new_folio" for tail pages that we are splitting to remove the usage of page->private for swapcache handling completely. This patch (of 4): Let's stop using page->private on tail pages, making it possible to just unconditionally reuse that field in the tail pages of large folios. The remaining usage of the private field for THP_SWAP is in the THP splitting code (mm/huge_memory.c), that we'll handle separately later. Update the THP_SWAP documentation and sanity checks in mm_types.h and __split_huge_page_tail(). [david@redhat.com: stop using page->private on tail pages for THP_SWAP] Link: https://lkml.kernel.org/r/6f0a82a3-6948-20d9-580b-be1dbf415701@redhat.com Link: https://lkml.kernel.org/r/20230821160849.531668-1-david@redhat.com Link: https://lkml.kernel.org/r/20230821160849.531668-2-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64] Reviewed-by: Yosry Ahmed <yosryahmed@google.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Hugh Dickins <hughd@google.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Peter Xu <peterx@redhat.com> Cc: Seth Jennings <sjenning@redhat.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-21 16:08:46 +00:00
/* lookup the swap entry.val from the page */
ret = xa_store(&mte_pages, page_swap_entry(page).val, tag_storage,
GFP_KERNEL);
if (WARN(xa_is_err(ret), "Failed to store MTE tags")) {
mte_free_tag_storage(tag_storage);
return xa_err(ret);
} else if (ret) {
/* Entry is being replaced, free the old entry */
mte_free_tag_storage(ret);
}
return 0;
}
void mte_restore_tags(swp_entry_t entry, struct page *page)
{
void *tags = xa_load(&mte_pages, entry.val);
if (!tags)
return;
if (try_page_mte_tagging(page)) {
arm64: mte: Avoid setting PG_mte_tagged if no tags cleared or restored Prior to commit 69e3b846d8a7 ("arm64: mte: Sync tags for pages where PTE is untagged"), mte_sync_tags() was only called for pte_tagged() entries (those mapped with PROT_MTE). Therefore mte_sync_tags() could safely use test_and_set_bit(PG_mte_tagged, &page->flags) without inadvertently setting PG_mte_tagged on an untagged page. The above commit was required as guests may enable MTE without any control at the stage 2 mapping, nor a PROT_MTE mapping in the VMM. However, the side-effect was that any page with a PTE that looked like swap (or migration) was getting PG_mte_tagged set automatically. A subsequent page copy (e.g. migration) copied the tags to the destination page even if the tags were owned by KASAN. This issue was masked by the page_kasan_tag_reset() call introduced in commit e5b8d9218951 ("arm64: mte: reset the page tag in page->flags"). When this commit was reverted (20794545c146), KASAN started reporting access faults because the overriding tags in a page did not match the original page->flags (with CONFIG_KASAN_HW_TAGS=y): BUG: KASAN: invalid-access in copy_page+0x10/0xd0 arch/arm64/lib/copy_page.S:26 Read at addr f5ff000017f2e000 by task syz-executor.1/2218 Pointer tag: [f5], memory tag: [f2] Move the PG_mte_tagged bit setting from mte_sync_tags() to the actual place where tags are cleared (mte_sync_page_tags()) or restored (mte_restore_tags()). Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Reported-by: syzbot+c2c79c6d6eddc5262b77@syzkaller.appspotmail.com Fixes: 69e3b846d8a7 ("arm64: mte: Sync tags for pages where PTE is untagged") Cc: <stable@vger.kernel.org> # 5.14.x Cc: Steven Price <steven.price@arm.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/0000000000004387dc05e5888ae5@google.com/ Reviewed-by: Steven Price <steven.price@arm.com> Link: https://lore.kernel.org/r/20221006163354.3194102-1-catalin.marinas@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-10-06 16:33:54 +00:00
mte_restore_page_tags(page_address(page), tags);
set_page_mte_tagged(page);
}
}
void mte_invalidate_tags(int type, pgoff_t offset)
{
swp_entry_t entry = swp_entry(type, offset);
void *tags = xa_erase(&mte_pages, entry.val);
mte_free_tag_storage(tags);
}
arm64: mm: swap: support THP_SWAP on hardware with MTE Commit d0637c505f8a1 ("arm64: enable THP_SWAP for arm64") brings up THP_SWAP on ARM64, but it doesn't enable THP_SWP on hardware with MTE as the MTE code works with the assumption tags save/restore is always handling a folio with only one page. The limitation should be removed as more and more ARM64 SoCs have this feature. Co-existence of MTE and THP_SWAP becomes more and more important. This patch makes MTE tags saving support large folios, then we don't need to split large folios into base pages for swapping out on ARM64 SoCs with MTE any more. arch_prepare_to_swap() should take folio rather than page as parameter because we support THP swap-out as a whole. It saves tags for all pages in a large folio. As now we are restoring tags based-on folio, in arch_swap_restore(), we may increase some extra loops and early-exitings while refaulting a large folio which is still in swapcache in do_swap_page(). In case a large folio has nr pages, do_swap_page() will only set the PTE of the particular page which is causing the page fault. Thus do_swap_page() runs nr times, and each time, arch_swap_restore() will loop nr times for those subpages in the folio. So right now the algorithmic complexity becomes O(nr^2). Once we support mapping large folios in do_swap_page(), extra loops and early-exitings will decrease while not being completely removed as a large folio might get partially tagged in corner cases such as, 1. a large folio in swapcache can be partially unmapped, thus, MTE tags for the unmapped pages will be invalidated; 2. users might use mprotect() to set MTEs on a part of a large folio. arch_thp_swp_supported() is dropped since ARM64 MTE was the only one who needed it. Link: https://lkml.kernel.org/r/20240322114136.61386-2-21cnbao@gmail.com Signed-off-by: Barry Song <v-songbaohua@oppo.com> Reviewed-by: Steven Price <steven.price@arm.com> Acked-by: Chris Li <chrisl@kernel.org> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Kemeng Shi <shikemeng@huaweicloud.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Yosry Ahmed <yosryahmed@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: "Mike Rapoport (IBM)" <rppt@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-22 11:41:36 +00:00
static inline void __mte_invalidate_tags(struct page *page)
{
swp_entry_t entry = page_swap_entry(page);
mte_invalidate_tags(swp_type(entry), swp_offset(entry));
}
void mte_invalidate_tags_area(int type)
{
swp_entry_t entry = swp_entry(type, 0);
swp_entry_t last_entry = swp_entry(type + 1, 0);
void *tags;
XA_STATE(xa_state, &mte_pages, entry.val);
xa_lock(&mte_pages);
xas_for_each(&xa_state, tags, last_entry.val - 1) {
__xa_erase(&mte_pages, xa_state.xa_index);
mte_free_tag_storage(tags);
}
xa_unlock(&mte_pages);
}
arm64: mm: swap: support THP_SWAP on hardware with MTE Commit d0637c505f8a1 ("arm64: enable THP_SWAP for arm64") brings up THP_SWAP on ARM64, but it doesn't enable THP_SWP on hardware with MTE as the MTE code works with the assumption tags save/restore is always handling a folio with only one page. The limitation should be removed as more and more ARM64 SoCs have this feature. Co-existence of MTE and THP_SWAP becomes more and more important. This patch makes MTE tags saving support large folios, then we don't need to split large folios into base pages for swapping out on ARM64 SoCs with MTE any more. arch_prepare_to_swap() should take folio rather than page as parameter because we support THP swap-out as a whole. It saves tags for all pages in a large folio. As now we are restoring tags based-on folio, in arch_swap_restore(), we may increase some extra loops and early-exitings while refaulting a large folio which is still in swapcache in do_swap_page(). In case a large folio has nr pages, do_swap_page() will only set the PTE of the particular page which is causing the page fault. Thus do_swap_page() runs nr times, and each time, arch_swap_restore() will loop nr times for those subpages in the folio. So right now the algorithmic complexity becomes O(nr^2). Once we support mapping large folios in do_swap_page(), extra loops and early-exitings will decrease while not being completely removed as a large folio might get partially tagged in corner cases such as, 1. a large folio in swapcache can be partially unmapped, thus, MTE tags for the unmapped pages will be invalidated; 2. users might use mprotect() to set MTEs on a part of a large folio. arch_thp_swp_supported() is dropped since ARM64 MTE was the only one who needed it. Link: https://lkml.kernel.org/r/20240322114136.61386-2-21cnbao@gmail.com Signed-off-by: Barry Song <v-songbaohua@oppo.com> Reviewed-by: Steven Price <steven.price@arm.com> Acked-by: Chris Li <chrisl@kernel.org> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Kemeng Shi <shikemeng@huaweicloud.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Yosry Ahmed <yosryahmed@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: "Mike Rapoport (IBM)" <rppt@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-22 11:41:36 +00:00
int arch_prepare_to_swap(struct folio *folio)
{
long i, nr;
int err;
if (!system_supports_mte())
return 0;
nr = folio_nr_pages(folio);
for (i = 0; i < nr; i++) {
err = mte_save_tags(folio_page(folio, i));
if (err)
goto out;
}
return 0;
out:
while (i--)
__mte_invalidate_tags(folio_page(folio, i));
return err;
}
void arch_swap_restore(swp_entry_t entry, struct folio *folio)
{
long i, nr;
if (!system_supports_mte())
return;
nr = folio_nr_pages(folio);
for (i = 0; i < nr; i++) {
mte_restore_tags(entry, folio_page(folio, i));
entry.val++;
}
}