Commit graph

4845 commits

Author SHA1 Message Date
Doug Moore a880104a21 swap_pager: add new page range struct
Define a page_range struct to pair up the two values passed to
freerange functions. Have swp_pager_freeswapspace also take a
page_range argument rather than a pair of arguments.

In swp_pager_meta_free_all, drop a needless test and use a new
helper function to do the cleanup for each swap block.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D45562
2024-06-11 22:54:39 -05:00
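The page_range change pairs a start block with a count. The user-space sketch below shows the coalescing idea under stated assumptions. `struct page_range`'s field names, `update_range()` and `free_range()` are hypothetical stand-ins, not the swap_pager's actual definitions. Adjacent block numbers extend the pending range, and a non-adjacent block flushes it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for page_range: blocks [start, start + num). */
struct page_range {
	uint64_t start;
	uint64_t num;
};

/* Hypothetical sink for a finished run; it only tallies totals here. */
static uint64_t freed_blocks;
static int freed_calls;

static void
free_range(const struct page_range *range)
{
	if (range->num == 0)
		return;		/* nothing accumulated yet */
	freed_blocks += range->num;
	freed_calls++;
}

/*
 * Extend the pending range if blk directly follows it; otherwise flush
 * the pending range and start a new one at blk.
 */
static void
update_range(struct page_range *range, uint64_t blk)
{
	if (range->num != 0 && range->start + range->num == blk) {
		range->num++;
		return;
	}
	free_range(range);
	range->start = blk;
	range->num = 1;
}
```

Feeding blocks 10, 11, 12 and 40, then flushing, produces two free_range() calls covering four blocks in total. Passing one struct instead of two scalars keeps the pair from being swapped at a call site.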
Doug Moore dd0e5c02ab swap_pager: small improvement to find_least
Drop an unneeded test, a branch and a needless computation to save a
few instructions.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D45558
2024-06-11 11:36:23 -05:00
Ryan Libby 1b13e36fcc vm_page_insert: use pctrie combined insert/lookup
This reduces work done under vm_page_insert for large objects.

Reviewed by:	alc, dougm, markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D45486
2024-06-06 10:26:50 -07:00
Ryan Libby 7658d1532c vm_radix: define vm_radix_insert_lookup_lt and use in vm_page_rename
Use the new pctrie combined lookup/insert.  This is an easy application
of the new facility.  There are other places where we do this for pages
that may need more plumbing to use combined lookup/insert.

Reviewed by:	kib (previous version), dougm, markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D45396
2024-06-06 10:26:50 -07:00
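A combined insert/lookup-less-than does both jobs in one search: it places the new key and reports its predecessor. The sketch below uses a small sorted array instead of a radix trie, so it shows only the contract, not the pctrie algorithm. `insert_lookup_lt()` and its signature are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy sorted index standing in for the radix trie. */
#define MAXKEYS	16
static uint64_t keys[MAXKEYS];
static size_t nkeys;

/*
 * Insert key and, from the same search, report the greatest existing
 * key below it via *lt (*found_lt is false if there is none).
 * Returns false if key is already present or the table is full.
 */
static bool
insert_lookup_lt(uint64_t key, bool *found_lt, uint64_t *lt)
{
	size_t pos = 0;

	while (pos < nkeys && keys[pos] < key)
		pos++;
	if ((pos < nkeys && keys[pos] == key) || nkeys == MAXKEYS)
		return false;
	*found_lt = pos > 0;
	if (*found_lt)
		*lt = keys[pos - 1];
	/* Shift larger keys up to open a slot at pos. */
	for (size_t i = nkeys; i > pos; i--)
		keys[i] = keys[i - 1];
	keys[pos] = key;
	nkeys++;
	return true;
}
```

After inserting 10 and 30, inserting 20 reports 10 as its predecessor without a second lookup.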
Konstantin Belousov 9c5d7e4a0c pmap: move the smp_targeted_tlb_shootdown pointer stuff to amd64 pmap.h
Fixes:	bec000c9c1ef409989685bb03ff0532907befb4a
Sponsored by:	The FreeBSD Foundation
2024-06-06 08:15:08 +03:00
Alan Cox 60847070f9 vm: Eliminate a redundant call to vm_reserv_break_all()
When vm_object_collapse() was changed in commit 98087a0 to call
vm_object_terminate(), rather than destroying the object directly, its
call to vm_reserv_break_all() should have been removed, as
vm_object_terminate() calls vm_reserv_break_all().

Reviewed by:	kib, markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D45495
2024-06-05 12:39:47 -05:00
Souradeep Chakrabarti bec000c9c1 amd64: add a func pointer to tlb shootdown function
Make the TLB shootdown function a pointer. By default, it still
points to the system function smp_targeted_tlb_shootdown(). This allows
other implementations to override it in the future.

Reviewed by:	kib
Tested by:	whu
Authored-by:    Souradeep Chakrabarti <schakrabarti@microsoft.com>
Co-Authored-by: Erni Sri Satya Vennela <ernis@microsoft.com>
MFC after:	1 week
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D45174
2024-06-05 12:25:05 +00:00
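Routing calls through a function pointer that defaults to the native routine is a common override pattern. The sketch below assumes a simplified signature; the real smp_targeted_tlb_shootdown() takes pmap and callback arguments not modeled here, and `tlb_shootdown_p` and the other names are hypothetical.

```c
#include <assert.h>

/* Hypothetical simplified signature. */
typedef void (*tlb_shootdown_fn)(unsigned long start, unsigned long end);

static int default_calls, custom_calls;

static void
default_tlb_shootdown(unsigned long start, unsigned long end)
{
	(void)start;
	(void)end;
	default_calls++;
}

/* By default the pointer targets the native implementation. */
static tlb_shootdown_fn tlb_shootdown_p = default_tlb_shootdown;

/* Call sites always go through the pointer, so they never change. */
static void
tlb_shootdown(unsigned long start, unsigned long end)
{
	tlb_shootdown_p(start, end);
}

/* An alternative implementation installed later, e.g. at boot. */
static void
custom_tlb_shootdown(unsigned long start, unsigned long end)
{
	(void)start;
	(void)end;
	custom_calls++;
}
```

Assigning `tlb_shootdown_p = custom_tlb_shootdown;` once redirects every later shootdown.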
Doug Moore 543d55d791 vm_phys: use ilog2(x) instead of fls(x)-1
One of these changes saves two instructions on an amd64
GENERIC-NODEBUG build. The rest are entirely cosmetic, because the
compiler can deduce that x is nonzero, and avoid the needless test.

Reviewed by:	alc
Differential Revision:	https://reviews.freebsd.org/D45331
2024-06-04 13:07:07 -05:00
Alan Cox f1d73aacdc pmap: Skip some superpage promotion attempts that will fail
Implement a simple heuristic to skip pointless promotion attempts by
pmap_enter_quick_locked() and moea64_enter().  Specifically, when
vm_fault() calls pmap_enter_quick() to map neighboring pages at the end
of a copy-on-write fault, there is no point in attempting promotion in
pmap_enter_quick_locked() and moea64_enter().  Promotion will fail
because the base pages have differing protection.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D45431
MFC after:	1 week
2024-06-04 00:38:05 -05:00
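Promotion to a superpage needs every base mapping in the aligned group to be valid and to share the same protection. The sketch below uses a hypothetical PTE bit layout, not any real architecture's. After a copy-on-write fault the faulting page is writable while its neighbors are mapped read-only, so this check fails, which is why attempting promotion there is pointless.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PTE_V		0x1u	/* valid (hypothetical layout) */
#define PTE_W		0x2u	/* writable */
#define PTE_PROT_MASK	(PTE_V | PTE_W)

/*
 * A group of base pages can become one superpage mapping only if every
 * entry is valid and all entries share the same protection bits.
 */
static bool
promotion_possible(const unsigned *ptes, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if ((ptes[i] & PTE_V) == 0)
			return false;
		if ((ptes[i] & PTE_PROT_MASK) != (ptes[0] & PTE_PROT_MASK))
			return false;
	}
	return true;
}
```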
Doug Moore e3537f9235 Revert "subr_pctrie: use ilog2(x) instead of fls(x)-1"
This reverts commit 574ef65069.
2024-06-03 13:07:42 -05:00
Doug Moore 574ef65069 subr_pctrie: use ilog2(x) instead of fls(x)-1
In three instances where fls(x)-1 is used, the compiler does not know
that x is nonzero and so adds needless zero checks.  Using ilog2(x)
instead saves, in each instance, about 4 instructions, including a
conditional, and 16 or so bytes, on an amd64 build.

Reviewed by:	alc
Differential Revision:	https://reviews.freebsd.org/D45330
2024-06-03 12:45:45 -05:00
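The difference comes from the zero case: fls(0) is defined as 0, so fls() must test for zero before counting leading zeros, while ilog2() is undefined for 0 and needs no test. The sketch models both with GCC/Clang's `__builtin_clzl`; `my_fls` and `my_ilog2` are illustrative names, and the kernel's own definitions may differ.

```c
#include <assert.h>
#include <limits.h>

/* fls(x): 1-based index of the most significant set bit; 0 if x == 0.
 * The x == 0 branch is the check the compiler must keep when it cannot
 * prove x is nonzero. */
static int
my_fls(unsigned long x)
{
	return (x == 0 ? 0 : (int)(sizeof(x) * CHAR_BIT) - __builtin_clzl(x));
}

/* ilog2(x): floor(log2(x)), undefined for x == 0, so no branch. */
static int
my_ilog2(unsigned long x)
{
	return (int)(sizeof(x) * CHAR_BIT) - 1 - __builtin_clzl(x);
}
```

For every nonzero x the two agree: my_ilog2(x) == my_fls(x) - 1.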
Mitchell Horne deab57178f Adjust comments referencing vm_mem_init()
I cannot find a time where the function was not named this.

Reviewed by:	kib, markj
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D45383
2024-05-27 18:37:40 -03:00
Ryan Libby 9c975a0d90 pbuf_ctor(): Stop using LK_NOWAIT, use LK_NOWITNESS
The LK_NOWAIT was added to suppress a witness warning, but LK_NOWITNESS
is more what we mean.  This makes pbuf_ctor() more consistent with
buf_alloc(), although, unlike buf_alloc(), for pbuf there should not be
any danger of a wild locker relying on the type stability of the buf to
attempt a lock.  That is, this is essentially cosmetic.

Relevant history:
 - 531f8cfea0 Use dedicated lock name for pbufs
 - 5875b94c74 buf_alloc(): lock the buffer with LK_NOWAIT
 - c9e023541a pbuf_ctor(): lock the buffer with LK_NOWAIT
 - 1fb00c8f10 buf_alloc(): Stop using LK_NOWAIT, use LK_NOWITNESS

Reviewed by:	rew, kib
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D45360
2024-05-26 10:20:52 -07:00
Bojan Novković d25ed65043 uma: Fix improper uses of UMA_MD_SMALL_ALLOC
UMA_MD_SMALL_ALLOC was recently replaced by UMA_USE_DMAP, but
da76d349b6 missed some improper uses of the old symbol.
This change makes sure that UMA_USE_DMAP is used properly in
code that selects uma_small_alloc.

Fixes: da76d349b6
Reported by: eduardo, rlibby
Approved by: markj (mentor)
Differential Revision:	https://reviews.freebsd.org/D45368
2024-05-26 07:27:37 +02:00
Bojan Novković 0a44b8a56d vm: Simplify startup page dumping conditional
This commit introduces the MINIDUMP_STARTUP_PAGE_TRACKING symbol and
uses it to simplify several instances of a complex preprocessor conditional
for adding pages allocated when bootstrapping the kernel to minidumps.

Reviewed by:	markj, mhorne
Approved by:	markj (mentor)
Differential Revision: https://reviews.freebsd.org/D45085
2024-05-25 19:24:55 +02:00
Bojan Novković da76d349b6 uma: Deduplicate uma_small_alloc
This commit refactors the UMA small alloc code and
removes most UMA machine-dependent code.
The existing machine-dependent uma_small_alloc code is almost identical
across all architectures, except for powerpc where using the direct
map addresses involved extra steps in some cases.

The MI/MD split was replaced by a default uma_small_alloc
implementation that can be overridden by architecture-specific code by
defining the UMA_MD_SMALL_ALLOC symbol. Furthermore, UMA_USE_DMAP was
introduced to replace most UMA_MD_SMALL_ALLOC uses.

Reviewed by: markj, kib
Approved by: markj (mentor)
Differential Revision:	https://reviews.freebsd.org/D45084
2024-05-25 19:24:46 +02:00
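The MI default with an MD override can be sketched with the preprocessor. The signature here is simplified and malloc() stands in for allocating a page and returning its direct-map address; `small_alloc_impl` is a hypothetical marker. UMA_USE_DMAP, which decides whether uma_small_alloc is used at all, is not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * The machine-independent default is compiled unless the architecture
 * defines UMA_MD_SMALL_ALLOC and supplies its own uma_small_alloc().
 */
#ifdef UMA_MD_SMALL_ALLOC
static const char *small_alloc_impl = "machine-dependent";

static void *
uma_small_alloc(size_t bytes)
{
	/* Architecture-specific steps (e.g. powerpc's) would go here. */
	return malloc(bytes);
}
#else
static const char *small_alloc_impl = "default";

static void *
uma_small_alloc(size_t bytes)
{
	/* Shared default for every architecture that does not override. */
	return malloc(bytes);
}
#endif
```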
Ryan Libby a216e311a7 vm_pageout_scan_inactive: take a lock break
In vm_pageout_scan_inactive, release the object lock when we go to
refill the scan batch queue so that someone else has a chance to acquire
it.  This improves access latency to the object when the pagedaemon is
processing many consecutive pages from a single object, and also in any
case avoids a hiccup during refill for the last touched object.

Reviewed by:	alc, markj (previous version)
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D45288
2024-05-24 08:52:58 -07:00
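The lock break amounts to dropping the object lock around each batch refill instead of holding it across the whole scan. The toy below replaces the real object lock and page queues with hypothetical counters so it can run single-threaded. It only demonstrates that refills happen unlocked, giving other threads a window to acquire the lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy object lock that counts how often it was released. */
struct obj {
	bool locked;
	int unlocks;
};

static void
obj_lock(struct obj *o)
{
	assert(!o->locked);
	o->locked = true;
}

static void
obj_unlock(struct obj *o)
{
	assert(o->locked);
	o->locked = false;
	o->unlocks++;
}

#define BATCH	4

/* Hypothetical refill; it must run without the object lock held. */
static int
refill_batch(struct obj *o, int *remaining)
{
	int n;

	assert(!o->locked);
	n = *remaining < BATCH ? *remaining : BATCH;
	*remaining -= n;
	return n;
}

/* Process npages in batches, releasing the lock around each refill. */
static int
scan(struct obj *o, int npages)
{
	int processed = 0, remaining = npages, batch;

	obj_lock(o);
	for (;;) {
		obj_unlock(o);		/* lock break: let others in */
		batch = refill_batch(o, &remaining);
		obj_lock(o);
		if (batch == 0)
			break;
		processed += batch;	/* work on the batch under the lock */
	}
	obj_unlock(o);
	return processed;
}
```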
Pawel Jakub Dawidek 56a8aca83a Stop treating size 0 as unknown size in vnode_create_vobject().
Whenever a file is created, vnode_create_vobject() will try to
determine its size by calling vn_getsize_locked(), as size 0 is
ambiguous: it means either that the file size is 0 or that the file
size is unknown.

Introduce a special value for the size argument: VNODE_NO_SIZE.
Only when it is given will vnode_create_vobject() try to obtain the
file's size on its own.

Introduce a dedicated vnode_disk_create_vobject() for use by
g_vfs_open(), so we don't have to call vn_isdisk() in the common case
(for regular files).

Handle the case of mediasize==0 in g_vfs_open().

Reviewed by: alc, kib, markj, olce
Approved by: oshogbo (mentor), allanjude (mentor)
Differential Revision: https://reviews.freebsd.org/D45244
2024-05-23 06:08:14 +00:00
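A sentinel lets 0 mean "empty file" unambiguously. In the sketch below, the value of VNODE_NO_SIZE is an assumption (any value that cannot be a real size works), and `query_size()` is a hypothetical stand-in for vn_getsize_locked().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical value: a negative size can never be real. */
#define VNODE_NO_SIZE	((int64_t)-1)

static int getsize_calls;

/* Stand-in for asking the filesystem for the file's size. */
static int64_t
query_size(void)
{
	getsize_calls++;
	return 8192;
}

/* Size 0 is taken at face value; only the sentinel triggers a lookup. */
static int64_t
create_vobject_size(int64_t size)
{
	if (size == VNODE_NO_SIZE)
		size = query_size();
	return size;
}
```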
Konstantin Belousov 6ada4e8a0a swap-like pagers: assert that writemapping decrease does not pass zero
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D45119
2024-05-13 21:33:29 +03:00
Konstantin Belousov e934040651 cdev_pager_allocate(): ensure that the cdev_pager_ops ctr is called only once
per allocated vm_object.  Otherwise, since constructors are not
idempotent, we leak, e.g., a device reference in the case of a non-managed pager.

PR:	278826
Reported by:	Austin Zhang <austin.zhang@dell.com>
Reviewed by:	alc, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D45113
2024-05-12 04:13:00 +03:00
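The invariant is that the constructor runs only when a new object is created, never when an existing object for the same handle is returned. The sketch uses a toy fixed-size registry; `pager_allocate()` and `pager_ctor()` are hypothetical, and the locking and race handling in the real cdev_pager_allocate() are not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Toy registry mapping a device handle to its pager object. */
struct pobj {
	const void *handle;
	int refs;
};

#define NOBJS	8
static struct pobj objs[NOBJS];
static int ctor_calls;

/* Constructor with a non-idempotent side effect (e.g. a dev ref). */
static void
pager_ctor(struct pobj *o)
{
	(void)o;
	ctor_calls++;
}

static struct pobj *
pager_allocate(const void *handle)
{
	struct pobj *free_slot = NULL;

	for (int i = 0; i < NOBJS; i++) {
		if (objs[i].handle == handle) {
			/* Existing object: take a reference, skip the ctor. */
			objs[i].refs++;
			return &objs[i];
		}
		if (objs[i].handle == NULL && free_slot == NULL)
			free_slot = &objs[i];
	}
	if (free_slot == NULL)
		return NULL;
	free_slot->handle = handle;
	free_slot->refs = 1;
	pager_ctor(free_slot);	/* only call for this object */
	return free_slot;
}
```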
John Baldwin 9e0164087c vm: Change the return types of kernacc and useracc to bool
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D45156
2024-05-10 13:43:56 -07:00
Mark Johnston 661a83f9bf vm: Fix error handling in vm_thread_stack_back()
vm_object_page_remove() wants to busy the page, but that won't work
here.  (Kernel stack pages are always busy.)

Make the error handling path look more like vm_thread_stack_dispose().

Reported by:	pho
Reviewed by:	kib, bnovkov
Fixes:	7a79d06697 ("vm: improve kstack_object pindex calculation to avoid pindex holes")
Differential Revision:	https://reviews.freebsd.org/D45019
2024-04-30 09:45:48 -04:00
Mark Johnston 800da341bc thread: Simplify sanitizer integration with thread creation
fork() may allocate a new thread in one of two ways: from UMA, or cached
in a freed proc that was just allocated from UMA.  In either case, KASAN
and KMSAN need to initialize some state; in particular they need to
initialize the shadow mapping of the new thread's stack.

This is done differently between KASAN and KMSAN, which is confusing.
This patch improves things a bit:
- Add a new thread_recycle() function, which moves all kernel stack
  handling out of kern_fork.c, since it doesn't really belong there.
- Then, thread_alloc_stack() has only one local caller, so just inline
  it.
- Avoid redundant shadow stack initialization: thread_alloc()
  initializes the KMSAN shadow stack (via kmsan_thread_alloc()) even
though vm_thread_new() already did that.
- Add kasan_thread_alloc(), for consistency with kmsan_thread_alloc().

No functional change intended.

Reviewed by:	khng
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D44891
2024-04-22 11:46:59 -04:00
Bojan Novković 7a79d06697 vm: improve kstack_object pindex calculation to avoid pindex holes
This commit replaces the linear transformation of kernel virtual
addresses to kstack_object pindex values with a non-linear
scheme that circumvents physical memory fragmentation caused by
kernel stack guard pages. The new mapping scheme is used to
effectively "skip" guard pages and assign pindices for
non-guard pages in a contiguous fashion.

The new allocation scheme requires that all default-sized kstack KVAs
come from a separate, specially aligned region of the KVA space.
For this to work, this commit introduces a dedicated per-domain
kstack KVA arena used to allocate kernel stacks of default size.
The behaviour on 32-bit platforms remains unchanged due to a
significantly smaller KVA space.

Aside from fulfilling the requirements imposed by the new scheme, a
separate kstack KVA arena facilitates superpage promotion in the rest
of the kernel and causes most kstacks to have guard pages at both ends.

Reviewed by:  alc, kib, markj
Tested by:    markj
Approved by:  markj (mentor)
Differential Revision: https://reviews.freebsd.org/D38852
2024-04-10 17:37:20 +02:00
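The non-linear mapping can be illustrated with a simplified layout. The sketch assumes, hypothetically, that every default-sized kstack occupies a slot of KSTACK_GUARD guard pages followed by KSTACK_PAGES stack pages, packed back to back from an aligned arena base. It is not the kernel's exact formula. Subtracting the guard pages of the current and all earlier slots gives the stack pages consecutive pindices.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: [guard][stack pages] per slot. */
#define KSTACK_PAGES	4
#define KSTACK_GUARD	1
#define SLOT_PAGES	(KSTACK_PAGES + KSTACK_GUARD)

/* Linear scheme: every guard page leaves a hole in pindex space. */
static uint64_t
pindex_linear(uint64_t page_off)
{
	return page_off;
}

/* Packed scheme: skip guard pages so stack pages are contiguous.
 * page_off is the page offset from the arena base. */
static uint64_t
pindex_packed(uint64_t page_off)
{
	uint64_t slot = page_off / SLOT_PAGES;
	uint64_t off = page_off % SLOT_PAGES;

	assert(off >= KSTACK_GUARD);	/* guard pages have no backing */
	return slot * KSTACK_PAGES + (off - KSTACK_GUARD);
}
```

With this layout, page offsets 1 to 4 map to pindices 0 to 3, and 6 to 9 map to 4 to 7. The linear scheme would instead leave pindices 0 and 5 unused.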
Minsoo Choo 989a2cf19d vm_reserv_reclaim_contig: Return NULL not false
Reviewed by:	dougm, zlei
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D44667
2024-04-10 08:50:16 -04:00
Stephen J. Kiernan cb20a74ca0 vm: add macro to mark arguments used when NUMA is defined
This fixes compiler warnings when -Wunused-arguments is enabled and
not quieted.

Reviewed by:	kib, markj
Obtained from:	Juniper Networks, Inc.
Differential Revision:	https://reviews.freebsd.org/D44623
2024-04-09 10:23:47 -04:00
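A conditional attribute macro of this kind expands to nothing when NUMA is defined and to an "unused" attribute otherwise. In the sketch, the macro name `numa_used` and the `pick_domain()` function are hypothetical. `__attribute__((__unused__))` is the GCC/Clang spelling, which FreeBSD wraps as `__unused`.

```c
#include <assert.h>

/* Hypothetical macro name. */
#ifdef NUMA
#define numa_used
#else
#define numa_used	__attribute__((__unused__))
#endif

/* domain is only consulted on NUMA kernels. */
static int
pick_domain(int domain numa_used)
{
#ifdef NUMA
	return domain;
#else
	return 0;	/* single domain; no unused-parameter warning */
#endif
}
```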
Mark Johnston 4696650782 swap_pager: Unbusy readahead pages after an I/O error
The swap pager itself allocates readahead pages, so should take care to
unbusy them after a read error, just as it does in the non-error case.

PR:		277538
Reviewed by:	olce, dougm, alc, kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D44646
2024-04-08 09:02:48 -04:00
Doug Moore 1526667bc6 vm_reserv: Add vm_reserv_is_populated
Add a function to check whether an aligned block of vm pages is
allocated, for use with impending changes to arm64 superpage
management.

Reviewed by:	alc
Differential Revision:	http://reviews.freebsd.org/D44575
2024-04-07 12:28:52 -05:00
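Reservations track allocated pages in a population bitmap, so "is this aligned block populated" reduces to checking a run of bits. The sketch below assumes a hypothetical 512-page reservation and a plain bit-by-bit loop; the real vm_reserv_is_populated() signature and implementation may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RESERV_PAGES	512	/* hypothetical pages per reservation */
#define WORD_BITS	64

/* Bit i set means page i of the reservation is allocated. */
struct reserv {
	uint64_t popmap[RESERV_PAGES / WORD_BITS];
};

static void
reserv_populate(struct reserv *rv, int index)
{
	rv->popmap[index / WORD_BITS] |= UINT64_C(1) << (index % WORD_BITS);
}

/* True if every page in [index, index + npages) is allocated.
 * npages must be a power of two and index a multiple of npages. */
static bool
reserv_is_populated(const struct reserv *rv, int index, int npages)
{
	assert(npages > 0 && (npages & (npages - 1)) == 0);
	assert(index % npages == 0 && index + npages <= RESERV_PAGES);
	for (int i = index; i < index + npages; i++) {
		if ((rv->popmap[i / WORD_BITS] &
		    (UINT64_C(1) << (i % WORD_BITS))) == 0)
			return false;
	}
	return true;
}
```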
John Baldwin 1f1b2286fd pmap: Convert boolean_t to bool.
Reviewed by:	kib (older version)
Differential Revision:	https://reviews.freebsd.org/D39921
2024-01-31 14:48:26 -08:00
Konstantin Belousov 38f5f2a4af sysctl vm.objects/vm.swap_objects: do not fill vnode info if jailed
Reported by:	Shawn Webb via markj
Reviewed by:	jhb, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2024-01-16 22:15:39 +02:00
Konstantin Belousov 69748e62e8 vm/vm_object.c: minor cleanup
Remove sys/cdefs.h and sys/socket.h includes.
Order sys/ includes alphabetically.
Do not check for NULL before free().

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D43444
2024-01-13 18:45:53 +02:00
Konstantin Belousov b068bb09a1 Add vnode_pager_clean_{a,}sync(9)
Bump __FreeBSD_version for ZFS use.

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D43356
2024-01-11 18:44:53 +02:00
Konstantin Belousov ed1a88a311 vnode_pager_generic_putpages(): rename maxblksz local to max_offset
Requested by:	markj
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D43358
2024-01-11 11:49:37 +02:00
Konstantin Belousov bdb46c21a3 vnode_pager_generic_putpages(): correctly handle clean block at EOF
The loop 'skip clean blocks' checking for the clean blocks in the dirty
pages might end up setting the in_hole to true when exactly at EOF at
the middle of the block, without advancing the prev_offset value. Then
the next block is not dirty, and next_offset is clipped back to poffset
+ maxsize, equal to prev_offset, failing the assertion.

Instead of asserting prev_offset < next_offset, we must skip the write.

Reported by:	asomers
PR:	276191
Reviewed by:	alc, markj
Tested by:	asomers
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D43358
2024-01-11 11:49:37 +02:00
Jason A. Harmening 10f2e94acc vm_page_reclaim_contig(): update comment to chase recent changes
Commit 2619c5ccfe ("Avoid waiting on physical allocations that can't
possibly be satisfied") changed the return value from bool to errno.
Adjust the function description to match reality.
2024-01-02 15:39:36 -06:00
Jason A. Harmening 0ee1cd6da9 vm_page.h: tweak page-busied assertion macros
Fix incorrect macro name and include the value of curthread in the
panic message where relevant.
2023-12-23 23:20:13 -06:00
Jason A. Harmening 2619c5ccfe Avoid waiting on physical allocations that can't possibly be satisfied
- Change vm_page_reclaim_contig[_domain] to return an errno instead
  of a boolean.  0 indicates a successful reclaim, ENOMEM indicates
  lack of available memory to reclaim, with any other error (currently
  only ERANGE) indicating that reclamation is impossible for the
  specified address range.  Change all callers to only follow
  up with vm_page_wait* in the ENOMEM case.

- Introduce vm_domainset_iter_ignore(), which marks the specified
  domain as unavailable for further use by the iterator.  Use this
  function to ignore domains that can't possibly satisfy a physical
  allocation request.  Since WAITOK allocations run the iterators
  repeatedly, this avoids the possibility of infinitely spinning
  in domain iteration if no available domain can satisfy the
  allocation request.

PR:		274252
Reported by:	kevans
Tested by:	kevans
Reviewed by:	markj
Differential Revision: https://reviews.freebsd.org/D42706
2023-12-23 23:01:40 -06:00
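The caller-side rule is to wait and retry only on ENOMEM and give up on any other errno. The sketch shows that rule with a hypothetical `alloc_contig_waitok()` loop. `reclaim_contig()` and `page_wait()` are stubs standing in for vm_page_reclaim_contig() and vm_page_wait(), and the domain iterator and vm_domainset_iter_ignore() are not modeled.

```c
#include <assert.h>
#include <errno.h>

/* Stubbed outcome of a reclaim attempt: 0, ENOMEM or ERANGE. */
static int reclaim_result;
static int wait_calls;

static int
reclaim_contig(void)
{
	return reclaim_result;
}

static void
page_wait(void)
{
	wait_calls++;
	reclaim_result = 0;	/* pretend memory was freed meanwhile */
}

/* Only ENOMEM is worth waiting for; other errors (e.g. ERANGE) mean
 * the request can never be satisfied, so return instead of spinning. */
static int
alloc_contig_waitok(void)
{
	int error;

	for (;;) {
		error = reclaim_contig();
		if (error == 0)
			return 0;
		if (error != ENOMEM)
			return error;
		page_wait();
	}
}
```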
Doug Moore 6dd15b7a23 vm_phys: fix uncalled free_contig
Function vm_phys_free_contig does not always free memory properly when
the npages parameter is less than max block size.  Change it so that it does.

Note that this function is not currently invoked, and this error was
not triggered in earlier versions of the code.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D42891
2023-12-20 21:37:47 -06:00
Pawel Jakub Dawidek 6f3e9bac4d vm: Plug umtx shm object leak.
Reviewed by:	kib
Approved by:	oshogbo
MFC after:	1 week
Sponsored by:	Fudo Security
Differential Revision:	https://reviews.freebsd.org/D43073
2023-12-16 05:18:36 -08:00
Brooks Davis 7893419d49 Remove never implemented sbrk and sstk syscalls
Both system calls were stubs returning EOPNOTSUPP and libc did not
provide _ or __sys_ prefixed symbols.  The actual implementation of
sbrk(2) is on top of the undocumented break(2) system call.

Technically this is a change in ABI, but no non-contrived program ever
called these syscalls.

Reviewed by:	kib, emaste
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D42872
2023-12-04 20:36:08 +00:00
Andrew Turner 839999e7ef vm: Add kva_alloc_aligned
Add a function like kva_alloc that allows us to specify the alignment
of the virtual address space returned.

Reviewed by:	alc, kib, markj
Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D42788
2023-11-30 10:50:03 +00:00
Andrew Turner 8daee410d2 vm: Use vmem_xalloc in kva_alloc
The kernel_arena used in kva_alloc has the qcache disabled. vmem_alloc
will first try to use the qcache before falling back to vmem_xalloc.

Rather than trying to use the qcache in vmem_alloc just call
vmem_xalloc directly.

Reviewed by:	alc, kib, markj
Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D42831
2023-11-30 10:50:03 +00:00
Warner Losh fdafd315ad sys: Automated cleanup of cdefs and other formatting
Apply the following automated changes to try to eliminate
no-longer-needed sys/cdefs.h includes as well as now-empty
blank lines in a row.

Remove /^#if.*\n#endif.*\n#include\s+<sys/cdefs.h>.*\n/
Remove /\n+#include\s+<sys/cdefs.h>.*\n+#if.*\n#endif.*\n+/
Remove /\n+#if.*\n#endif.*\n+/
Remove /^#if.*\n#endif.*\n/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/types.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/param.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/capsicum.h>/

Sponsored by:		Netflix
2023-11-26 22:24:00 -07:00
Warner Losh 29363fb446 sys: Remove ancient SCCS tags.
Remove ancient SCCS tags from the tree using automated scripting, with two
minor fixups to keep things compiling. All the common forms in the tree
were removed with a perl script.

Sponsored by:		Netflix
2023-11-26 22:23:30 -07:00
Doug Moore 2a4897bd4e vm_phys: fix freelist_contig
vm_phys_find_freelist_contig is called to search a list of max-sized
free page blocks and find one that, when joined with adjacent blocks
in memory, can satisfy a request for a memory allocation bigger than
any single max-sized free page block. In commit
fa8a6585c7, I defined this function in
order to offer two improvements: 1) reduce the worst-case search time,
and 2) allow solutions that include less-than max-sized free page
blocks at the front or back of the giant allocation. However, it turns
out that this change introduced an error, reported in Bug
274592. That error concerns failing to check segment boundaries. This
change fixes an error in vm_phys_find_freelist_contig that resolves
that bug. It also abandons improvement 2), because the value of that
improvement is small and because preserving it would require more
testing than I am able to do.

PR:		274592
Reported by:	shafaisal.us@gmail.com
Reviewed by:	alc, markj
Tested by:	shafaisal.us@gmail.com
Fixes:	fa8a6585c7 vm_phys: avoid waste in multipage allocation
MFC after:	10 days
Differential Revision:	https://reviews.freebsd.org/D42509
2023-11-15 03:25:45 -06:00
Alexander Motin f0fa40867d Fix build on powerpc after previous commit.
2023-11-09 21:21:47 -05:00
Alexander Motin a03c23931e uma: Improve memory modified after free panic messages
- Pass the zone pointer to trash_ctor() and report the zone name in the
  panic message.  It may be difficult to figure out the zone just by the
  item size.
- Do not pass user arguments to internal trash calls; pass the zone.
- Report the malloc type name in the same unified panic message.
- Report the corruption offset from the beginning of the item instead of
  the full pointer.  This makes the panic message shorter and more readable.
2023-11-09 19:46:26 -05:00
Alexander Motin 7c566d6cfc uma: Micro-optimize memory trashing
Use u_long for memory accesses instead of uint32_t.  In my tests on
amd64, this reduces the time spent in those functions by ~30%, thanks to
the bigger 64-bit accesses.  i386 still uses 32-bit accesses.

MFC after:	1 month
2023-11-09 13:07:46 -05:00
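The two trashing changes above can be sketched together: fill and verify a freed item one u_long at a time, and report the offset of the first modified word from the start of the item. TRASH_WORD and both function names are hypothetical, not UMA's actual junk value or API. The sketch assumes the item is u_long-aligned and its size is a multiple of sizeof(unsigned long).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical junk pattern (truncated to 32 bits on ILP32). */
#define TRASH_WORD	((unsigned long)0xdeadc0dedeadc0deULL)

/* Fill with the pattern using word-sized stores (64-bit on LP64). */
static void
trash_fill(void *mem, size_t size)
{
	unsigned long *p = mem;

	for (size_t i = 0; i < size / sizeof(*p); i++)
		p[i] = TRASH_WORD;
}

/* Byte offset of the first modified word, or -1 if intact. */
static long
trash_check(const void *mem, size_t size)
{
	const unsigned long *p = mem;

	for (size_t i = 0; i < size / sizeof(*p); i++)
		if (p[i] != TRASH_WORD)
			return (long)(i * sizeof(*p));
	return -1;
}
```

An offset such as 24 is easier to match against a structure layout than a full kernel pointer.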
Bojan Novković e4078494f3 vm_fault: Revert commit 64087fd7f3
The underlying issue that originally triggered a kernel panic was
addressed and the fix was ported to all relevant pmaps, so the
safeguards placed in vm_fault.c can be removed now.

Reviewed by:	alc, kib, markj
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D42517
2023-11-09 10:14:05 -05:00
Olivier Certner 733e0abd28 uma: Permit specifying max of cache line and some custom alignment
To be used for structures for which we want to enforce that pointers to
them have some number of lower bits always set to 0, while still
ensuring we benefit from cache line alignment to avoid false sharing
between structures and fields within the structures (provided they are
properly ordered).

First candidate consumer that comes to mind is 'struct thread', see next
commit.

Reviewed by:            markj, kib
MFC after:              2 weeks
Sponsored by:           The FreeBSD Foundation
Differential Revision:  https://reviews.freebsd.org/D42265
2023-11-02 09:30:03 -04:00
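UMA zone alignment is expressed as a mask (alignment minus one), so taking the larger of the cache-line mask and a custom mask yields the stricter of the two alignments. The sketch assumes a hypothetical 64-byte cache line and invents `align_mask_max_cache()` and `aligned_item_size()` for illustration. It does not reproduce the interface this commit adds.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE_SIZE	64	/* hypothetical; machine-dependent */

/* Both masks have the form 2^k - 1, so the larger mask is the stricter
 * alignment and also satisfies the smaller one. */
static size_t
align_mask_max_cache(size_t custom_mask)
{
	size_t cache_mask = CACHE_LINE_SIZE - 1;

	return custom_mask > cache_mask ? custom_mask : cache_mask;
}

/* Round an item size up so consecutive items keep that alignment. */
static size_t
aligned_item_size(size_t size, size_t mask)
{
	return (size + mask) & ~mask;
}
```

With a custom mask of 255, item pointers keep their low 8 bits zero while still starting on a cache-line boundary.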