Commit graph

4810 commits

Author SHA1 Message Date
Jason A. Harmening 0ee1cd6da9 vm_page.h: tweak page-busied assertion macros
Fix incorrect macro name and include the value of curthread in the
panic message where relevant.
2023-12-23 23:20:13 -06:00
Jason A. Harmening 2619c5ccfe Avoid waiting on physical allocations that can't possibly be satisfied
- Change vm_page_reclaim_contig[_domain] to return an errno instead
  of a boolean.  0 indicates a successful reclaim, ENOMEM indicates
  lack of available memory to reclaim, with any other error (currently
  only ERANGE) indicating that reclamation is impossible for the
  specified address range.  Change all callers to only follow
  up with vm_page_wait* in the ENOMEM case.

- Introduce vm_domainset_iter_ignore(), which marks the specified
  domain as unavailable for further use by the iterator.  Use this
  function to ignore domains that can't possibly satisfy a physical
  allocation request.  Since WAITOK allocations run the iterators
  repeatedly, this avoids the possibility of infinitely spinning
  in domain iteration if no available domain can satisfy the
  allocation request.

PR:		274252
Reported by:	kevans
Tested by:	kevans
Reviewed by:	markj
Differential Revision: https://reviews.freebsd.org/D42706
2023-12-23 23:01:40 -06:00
Doug Moore 6dd15b7a23 vm_phys; fix uncalled free_contig
Function vm_phys_free_contig does not always free memory properly when
the npages parameter is less than max block size.  Change it so that it does.

Note that this function is not currently invoked, and this error was
not triggered in earlier versions of the code.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D42891
2023-12-20 21:37:47 -06:00
Pawel Jakub Dawidek 6f3e9bac4d vm: Plug umtx shm object leak.
Reviewed by:	kib
Approved by:	oshogbo
MFC after:	1 week
Sponsored by:	Fudo Security
Differential Revision:	https://reviews.freebsd.org/D43073
2023-12-16 05:18:36 -08:00
Brooks Davis 7893419d49 Remove never implemented sbrk and sstk syscalls
Both system calls were stubs returning EOPNOTSUPP and libc did not
provide _ or __sys_ prefixed symbols.  The actual implementation of
sbrk(2) is on top of the undocumented break(2) system call.

Technically this is a change in ABI, but no non-contrived program ever
called these syscalls.

Reviewed by:	kib, emaste
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D42872
2023-12-04 20:36:08 +00:00
Andrew Turner 839999e7ef vm: Add kva_alloc_aligned
Add a function like kva_alloc that allows us to specify the alignment
of the virtual address space returned.

Reviewed by:	alc, kib, markj
Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D42788
2023-11-30 10:50:03 +00:00
Andrew Turner 8daee410d2 vm: Use vmem_xalloc in kva_alloc
The kernel_arena used in kva_alloc has the qcache disabled. vmem_alloc
will first try to use the qcache before falling back to vmem_xalloc.

Rather than trying to use the qcache in vmem_alloc just call
vmem_xalloc directly.

Reviewed by:	alc, kib, markj
Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D42831
2023-11-30 10:50:03 +00:00
Warner Losh fdafd315ad sys: Automated cleanup of cdefs and other formatting
Apply the following automated changes to try to eliminate
no-longer-needed sys/cdefs.h includes as well as now-empty
blank lines in a row.

Remove /^#if.*\n#endif.*\n#include\s+<sys/cdefs.h>.*\n/
Remove /\n+#include\s+<sys/cdefs.h>.*\n+#if.*\n#endif.*\n+/
Remove /\n+#if.*\n#endif.*\n+/
Remove /^#if.*\n#endif.*\n/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/types.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/param.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/capsicum.h>/

Sponsored by:		Netflix
2023-11-26 22:24:00 -07:00
Warner Losh 29363fb446 sys: Remove ancient SCCS tags.
Remove ancient SCCS tags from the tree, automated scripting, with two
minor fixup to keep things compiling. All the common forms in the tree
were removed with a perl script.

Sponsored by:		Netflix
2023-11-26 22:23:30 -07:00
Doug Moore 2a4897bd4e vm_phys: fix freelist_contig
vm_phys_find_freelist_contig is called to search a list of max-sized
free page blocks and find one that, when joined with adjacent blocks
in memory, can satisfy a request for a memory allocation bigger than
any single max-sized free page block. In commit
fa8a6585c7, I defined this function in
order to offer two improvements: 1) reduce the worst-case search time,
and 2) allow solutions that include less-than max-sized free page
blocks at the front or back of the giant allocation. However, it turns
out that this change introduced an error, reported in In Bug
274592. That error concerns failing to check segment boundaries. This
change fixes an error in vm_phys_find_freelist_config that resolves
that bug. It also abandons improvement 2), because the value of that
improvement is small and because preserving it would require more
testing than I am able to do.

PR:		274592
Reported by:	shafaisal.us@gmail.com
Reviewed by:	alc, markj
Tested by:	shafaisal.us@gmail.com
Fixes:	fa8a6585c7 vm_phys: avoid waste in multipage allocation
MFC after:	10 days
Differential Revision:	https://reviews.freebsd.org/D42509
2023-11-15 03:25:45 -06:00
Alexander Motin f0fa40867d Fix build on powerpc after previous commit. 2023-11-09 21:21:47 -05:00
Alexander Motin a03c23931e uma: Improve memory modified after free panic messages
- Pass zone pointer to trash_ctor() and report zone name in the panic
message.  It may be difficult to figyre out zone just by the item size.
 - Do not pass user arguments to internal trash calls, pass thezone.
 - Report malloc type name in the same unified panic message.
 - Report corruption offset from the beginning of the items instead of
the full pointer.  It makes panic message shorter and more readable.
2023-11-09 19:46:26 -05:00
Alexander Motin 7c566d6cfc uma: Micro-optimize memory trashing
Use u_long for memory accesses instead of uint32_t.  On my tests on
amd64 this by ~30% reduces time spent in those functions thanks to
bigger 64bit accesses.  i386 still uses 32bit accesses.

MFC after:	1 month
2023-11-09 13:07:46 -05:00
Bojan Novković e4078494f3 vm_fault: Revert commit 64087fd7f3
The underlying issue that originally triggered a kernel panic was
addressed and the fix was ported to all relevant pmaps, so the
safeguards placed in vm_fault.c can be removed now.

Reviewed by:	alc, kib, markj
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D42517
2023-11-09 10:14:05 -05:00
Olivier Certner 733e0abd28 uma: Permit specifying max of cache line and some custom alignment
To be used for structures for which we want to enforce that pointers to
them have some number of lower bits always set to 0, while still
ensuring we benefit from cache line alignment to avoid false sharing
between structures and fields within the structures (provided they are
properly ordered).

First candidate consumer that comes to mind is 'struct thread', see next
commit.

Reviewed by:            markj, kib
MFC after:              2 weeks
Sponsored by:           The FreeBSD Foundation
Differential Revision:  https://reviews.freebsd.org/D42265
2023-11-02 09:30:03 -04:00
Olivier Certner 87090f5e5a uma: New check_align_mask(): Validate alignments (INVARIANTS)
New function check_align_mask() asserts (under INVARIANTS) that the mask
fits in a (signed) integer (see the comment) and that the corresponding
alignment is a power of two.

Use check_align_mask() in uma_set_align_mask() and also in uma_zcreate()
to replace the KASSERT() there (that was checking only for a power of
2).

Reviewed by:            kib, markj
MFC after:              2 weeks
Sponsored by:           The FreeBSD Foundation
Differential Revision:  https://reviews.freebsd.org/D42263
2023-11-02 09:30:03 -04:00
Olivier Certner 3d8f548b9e uma: Make the cache alignment mask unsigned
In uma_set_align_mask(), ensure that the passed value doesn't have its
highest bit set, which would lead to problems since keg/zone alignment
is internally stored as signed integers.  Such big values do not make
sense anyway and indicate some programming error.  A future commit will
introduce checks for this case and other ones.

Reviewed by:            kib, markj
MFC after:              2 weeks
Sponsored by:           The FreeBSD Foundation
Differential Revision:  https://reviews.freebsd.org/D42262
2023-11-02 09:30:03 -04:00
Olivier Certner e557eafe72 uma: UMA_ALIGN_CACHE: Resolve the proper value at use point
Having a special value of -1 that is resolved internally to
'uma_align_cache' provides no significant advantages and prevents
changing that variable to an unsigned type, which is natural for an
alignment mask.  So suppress it and replace its use with a call to
uma_get_align_mask().  The small overhead of the added function call is
irrelevant since UMA_ALIGN_CACHE is only used when creating new zones,
which is not performance critical.

Reviewed by:            markj, kib
MFC after:              2 weeks
Sponsored by:           The FreeBSD Foundation
Differential Revision:  https://reviews.freebsd.org/D42259
2023-11-02 09:30:03 -04:00
Olivier Certner dc8f7692fd uma: Hide 'uma_align_cache'; Create/rename accessors
Create the uma_get_cache_align_mask() accessor and put it in a separate
private header so as to minimize namespace pollution in header/source
files that need only this function and not the whole 'uma.h' header.

Make sure the accessors have '_mask' as a suffix, so that callers are
aware that the real alignment is the power of two that is the mask plus
one.  Rename the stem to something more explicit.  Rename
uma_set_cache_align_mask()'s single parameter to 'mask'.

Hide 'uma_align_cache' to ensure that it cannot be set in any other way
then by a call to uma_set_cache_align_mask(), which will perform sanity
checks in a further commit.  While here, rename it to
'uma_cache_align_mask'.

This is also in preparation for some further changes, such as improving
the sanity checks, eliminating internal resolving of UMA_ALIGN_CACHE and
changing the type of the 'uma_cache_align_mask' variable.

Reviewed by:            markj, kib
MFC after:              2 weeks
Sponsored by:           The FreeBSD Foundation
Differential Revision:  https://reviews.freebsd.org/D42258
2023-11-02 09:30:03 -04:00
Gordon Bergling fc9f1d2c63 uma.h: Fix a typo in a source code comment
- s/setable/settable/

MFC after:	3 days
2023-10-15 14:09:21 +02:00
Zhenlei Huang c415cfc8be vm_phys: Add corresponding sysctl knob for loader tunable
The loader tunable 'vm.numa.disabled' does not have corresponding sysctl
MIB entry. Add it so that it can be retrieved, and `sysctl -T` will also
report it correctly.

Reviewed by:	markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D42138
2023-10-12 18:14:49 +08:00
Zhenlei Huang a55fbda874 vm_page: Add corresponding sysctl knob for loader tunable
The loader tunable 'vm.pgcache_zone_max_pcpu' does not have corresponding
sysctl MIB entry. Add it so that it can be retrieved, and `sysctl -T`
will also report it correctly.

Reviewed by:	markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D42138
2023-10-12 18:14:49 +08:00
Mark Johnston e61568aeee swap_pager: Fix a race in swap_pager_swapoff_object()
When we disable swapping to a device, we scan the full VM object list
looking for objects with swap trie nodes that reference the device in
question.  The pages corresponding to those nodes are paged in.

While paging in, we drop the VM object lock.  Moreover, we do not hold a
reference for the object; swap_pager_swapoff_object() merely bumps the
paging-in-progress counter.  vm_object_terminate() waits for this
counter to drain before proceeding and freeing pages.

However, swap_pager_swapoff_object() decrements the counter before
re-acquiring the VM object lock, which means that vm_object_terminate()
can race to acquire the lock and free the pages.  Then,
swap_pager_swapoff_object() ends up unbusying a freed page.  Fix the
problem by acquiring the lock before waking up sleepers.

PR:		273610
Reported by:	Graham Perrin <grahamperrin@gmail.com>
Reviewed by:	kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D42029
2023-10-02 07:49:52 -04:00
Doug Moore 10db91ecec vm_radix: add a missing paren
429c871ddd left parens unbalanced in a
powerpc case that my testing missed.  Restore balance.

Reported by:	jenkins
2023-09-12 04:19:51 -05:00
Doug Moore 429c871ddd radix_trie: have vm_radix use pctrie code
Implement everything currently in vm_radix.c with calls to functions
in subr_pctrie.c, asccessed via the interface provided by the
DEFINE_PCTRIE_SMR macro.

Add back some #includes removed in the first attempt, and avoid the
use of a discontinued type in a bit of conditionally compiled code.

Reviewed by:	alc, markj
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D41344
2023-09-12 02:42:38 -05:00
Doug Moore 6cec93da46 Revert "radix_trie: have vm_radix use pctrie code"
This reverts commit a494d30465.
2023-09-11 03:35:36 -05:00
Doug Moore a494d30465 radix_trie: have vm_radix use pctrie code
Implement everything currently in vm_radix.c with calls to functions
in subr_pctrie.c, asccessed via the interface provided by the
DEFINE_PCTRIE_SMR macro.

Reviewed by:	alc, markj
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D41344
2023-09-11 01:53:40 -05:00
Konstantin Belousov 8882b7852a add pmap_active_cpus()
For amd64, i386, arm, and riscv, i.e. all architectures except arm64,
the custom implementation is provided since we maintain the bitmask of
active CPUs anyway.

Arm64 uses somewhat naive iteration over CPUs and match current vmspace'
pmap with the argument. It is not guaranteed that vmspace->pmap is the
same as the active pmap, but the inaccuracy should be toleratable.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32360
2023-08-23 03:02:21 +03:00
Konstantin Belousov 5f452214f2 vm_map.c: fix syntax
Fixes:	c718009884
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2023-08-18 16:37:16 +03:00
Konstantin Belousov c718009884 vm_map.c: plug several more places which might modify entry->offset
for the GUARD entries protecting stacks gaps.

syzkaller: https://syzkaller.appspot.com/bug?extid=c325d6a75e4fd0a68714
Reviewed by:	dougm, markj (previous version)
Tested by:	pho (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D41475
2023-08-18 15:43:35 +03:00
Warner Losh 685dc743dc sys: Remove $FreeBSD$: one-line .c pattern
Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
2023-08-16 11:54:36 -06:00
Warner Losh 2ff63af9b8 sys: Remove $FreeBSD$: one-line .h pattern
Remove /^\s*\*+\s*\$FreeBSD\$.*$\n/
2023-08-16 11:54:18 -06:00
Warner Losh 95ee2897e9 sys: Remove $FreeBSD$: two-line .h pattern
Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/
2023-08-16 11:54:11 -06:00
Dmitry Chagin f3e11927dc vm: Allow MAP_32BIT for all architectures
Reviewed by:		alc, kib, markj
Differential revision:	https://reviews.freebsd.org/D41435
2023-08-14 20:20:20 +03:00
Dmitry Chagin 0ddd32b617 vm: MAP_32BIT_MAX_ADDR defined in sys/mman.h
Reviewed by:		kib
Differential revision:	https://reviews.freebsd.org/D41434
2023-08-14 20:18:30 +03:00
Alan Cox 37e5d49e1e vm: Fix address hints of 0 with MAP_32BIT
Also, rename min_addr to default_addr, which better reflects what it
represents.  The min_addr is not a minimum address in the same way that
max_addr is actually a maximum address that can be allocated.  For
example, a non-zero hint can be less than min_addr and be allocated.

Reported by:	dchagin
Reviewed by:	dchagin, kib, markj
Fixes:	d8e6f4946c "vm: Fix anonymous memory clustering under ASLR"
Differential Revision:	https://reviews.freebsd.org/D41397
2023-08-12 02:35:21 -05:00
Konstantin Belousov 9b65fa6940 linuxolator: implement Linux' PROT_GROWSDOWN
From the Linux man page for mprotect(2):
   PROT_GROWSDOWN
       Apply  the  protection  mode  down to the beginning of a mapping
       that grows downward (which should be a stack segment or a
       segment mapped with the MAP_GROWSDOWN flag set).

Reported by:	dchagin
Reviewed by:	alc, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D41099
2023-08-12 09:28:14 +03:00
Konstantin Belousov 90049eabcf vm_map_protect(): add VM_MAP_PROTECT_GROWSDOWN flag
which requests to propagate lowest stack segment protection to the grow gap.
This seems to be required for Linux emulation.

Reported by:	dchagin
Reviewed by:	alc, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D41099
2023-08-12 09:28:14 +03:00
Konstantin Belousov b6037edbd1 vm_map_growstack(): restore stack gap data if gap entry was removed
and then restored.

Reviewed by:	alc, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D41099
2023-08-12 09:28:13 +03:00
Konstantin Belousov 9d7ea6cff7 vm_map: do not allow to merge stack gap entries
At least, offset handling is wrong for them.

Reviewed by:	alc, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D41099
2023-08-12 09:28:13 +03:00
Konstantin Belousov 55be6be12c vm_map_protect(): handle stack protection stored in the stack guard
mprotect(2) on the stack region needs to adjust guard stored protection,
so that e.g. enable executing on stack worked properly on stack growth.

Reported by:	dchagin
Reviewed by:	alc, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D41099
2023-08-12 09:28:13 +03:00
Konstantin Belousov 79169929f0 vm_map_protect(): move guard handling at the last phase into an empty dedicated helper
Restructure the first phase slightly, to facilitate further changes.

Reviewed by:	alc, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D41099
2023-08-12 09:28:13 +03:00
Konstantin Belousov aa928a5216 vm_map_growstack(): handle max protection for stacks
Do not assume that protection is same as max_protection.  Store both in
offset, packed in the same way as the prot syscall parameter.

Reviewed by:	alc, markj (previous version)
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D41099
2023-08-12 09:28:13 +03:00
Konstantin Belousov 0fb6aae7f0 vm_map.c: add CONTAINS_BITS macro
Suggested by:	dougm
Reviewed by:	alc, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D41099
2023-08-12 09:28:13 +03:00
Konstantin Belousov ba41b0de3e Add vm_map_insert1(9)
The function returns the newly created entry.
Use vm_map_insert1() in stack grow code to avoid gap entry re-lookup.

The comment update for vm_map_try_merge_entries() was suggested by dougm.

Suggested by:	alc
Reviewed by:	alc, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D41099
2023-08-12 09:28:13 +03:00
Konstantin Belousov 3b44ee50be vm_map_insert(): update herald comment
Only a part of the object may be mapped.

Noted by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D41099
2023-08-12 09:28:13 +03:00
Konstantin Belousov 9da33e8d10 Update comment describing struct vm_map
There is no list connecting all entries any more, and correspondingly no
order on the list entries.

Reviewed by:	dougm
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Differential revision:	https://reviews.freebsd.org/D41405
2023-08-10 09:01:26 +03:00
Doug Moore e77f4e7f59 vm_phys: tune vm_phys_enqueue_contig loop
Rewrite the final loop in vm_phys_enqueue_contig as a new function,
vm_phys_enq_beg, to reduce amd64 code size.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D41289
2023-08-04 21:09:39 -05:00
Doug Moore ccdb28275d vm_phys_enq_range: no alignment assert for npages==0
Do not assume that when vm_phys_enq_range is passed npages==0 that the
vm_page argument is valid in any way, much less that it has a
page-aligned address. Just don't look at it. Assert nothing about it.

Reported by:	karels
Differential Revision:	https://reviews.freebsd.org/D41317
2023-08-04 13:41:59 -05:00
Doug Moore c9b06fa527 vm_phys_enqueue_contig: handle npages==0
By letting vm_phys_enqueue_contig handle the case when npages == 0,
the callers can stop checking it, and the compiler can stop
zero-checking with every call to ffs(). Letting vm_phys_enqueue_contig
call vm_phys_enqueue_contig for part of its work also saves a few
bytes.

The amd64 object code shrinks by 128 bytes.

Reviewed by:	kib (previous version)
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D41154
2023-08-03 09:19:48 -05:00