Define a page_range struct to pair up the two values passed to
freerange functions. Have swp_pager_freeswapspace also take a
page_range argument rather than a pair of arguments.
In swp_pager_meta_free_all, drop a needless test and use a new
helper function to do the cleanup for each swap block.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D45562
Drop an unneeded test, a branch and a needless computation to save a
few instructions.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D45558
This reduces work done under vm_page_insert for large objects.
Reviewed by: alc, dougm, markj
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D45486
Use the new pctrie combined lookup/insert. This is an easy application
of the new facility. There are other places where we do this for pages
that may need more plumbing to use combined lookup/insert.
Reviewed by: kib (previous version), dougm, markj
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D45396
When vm_object_collapse() was changed in commit 98087a0 to call
vm_object_terminate(), rather than destroying the object directly, its
call to vm_reserv_break_all() should have been removed, as
vm_object_terminate() calls vm_reserv_break_all().
Reviewed by: kib, markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D45495
Make the TLB shootdown function a pointer. By default, it still
points to the system function smp_targeted_tlb_shootdown(). This allows
other implementations to override it in the future.
Reviewed by: kib
Tested by: whu
Authored-by: Souradeep Chakrabarti <schakrabarti@microsoft.com>
Co-Authored-by: Erni Sri Satya Vennela <ernis@microsoft.com>
MFC after: 1 week
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D45174
One of these changes saves two instructions on an amd64
GENERIC-NODEBUG build. The rest are entirely cosmetic, because the
compiler can deduce that x is nonzero, and avoid the needless test.
Reviewed by: alc
Differential Revision: https://reviews.freebsd.org/D45331
Implement a simple heuristic to skip pointless promotion attempts by
pmap_enter_quick_locked() and moea64_enter(). Specifically, when
vm_fault() calls pmap_enter_quick() to map neighboring pages at the end
of a copy-on-write fault, there is no point in attempting promotion in
pmap_enter_quick_locked() and moea64_enter(). Promotion will fail
because the base pages have differing protection.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D45431
MFC after: 1 week
In three instances where fls(x)-1 is used, the compiler does not know
that x is nonzero and so adds needless zero checks. Using ilog2(x)
instead saves, in each instance, about 4 instructions, including a
conditional, and 16 or so bytes, on an amd64 build.
Reviewed by: alc
Differential Revision: https://reviews.freebsd.org/D45330
I cannot find a time when the function was not named this.
Reviewed by: kib, markj
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D45383
The LK_NOWAIT was added to suppress a witness warning, but LK_NOWITNESS
is more what we mean. This makes pbuf_ctor() more consistent with
buf_alloc(), although, unlike buf_alloc(), for pbuf there should not be
any danger of a wild locker relying on the type stability of the buf to
attempt a lock. That is, this is essentially cosmetic.
Relevant history:
- 531f8cfea0 Use dedicated lock name for pbufs
- 5875b94c74 buf_alloc(): lock the buffer with LK_NOWAIT
- c9e023541a pbuf_ctor(): lock the buffer with LK_NOWAIT
- 1fb00c8f10 buf_alloc(): Stop using LK_NOWAIT, use LK_NOWITNESS
Reviewed by: rew, kib
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D45360
UMA_MD_SMALL_ALLOC was recently replaced by UMA_USE_DMAP, but
da76d349b6 missed some improper uses of the old symbol.
This change makes sure that UMA_USE_DMAP is used properly in
code that selects uma_small_alloc.
Fixes: da76d349b6
Reported by: eduardo, rlibby
Approved by: markj (mentor)
Differential Revision: https://reviews.freebsd.org/D45368
This commit introduces the MINIDUMP_STARTUP_PAGE_TRACKING symbol and
uses it to simplify several instances of a complex preprocessor conditional
for adding pages allocated when bootstrapping the kernel to minidumps.
Reviewed by: markj, mhorne
Approved by: markj (mentor)
Differential Revision: https://reviews.freebsd.org/D45085
This commit refactors the UMA small alloc code and
removes most UMA machine-dependent code.
The existing machine-dependent uma_small_alloc code is almost identical
across all architectures, except for powerpc where using the direct
map addresses involved extra steps in some cases.
The MI/MD split was replaced by a default uma_small_alloc
implementation that can be overridden by architecture-specific code by
defining the UMA_MD_SMALL_ALLOC symbol. Furthermore, UMA_USE_DMAP was
introduced to replace most UMA_MD_SMALL_ALLOC uses.
Reviewed by: markj, kib
Approved by: markj (mentor)
Differential Revision: https://reviews.freebsd.org/D45084
In vm_pageout_scan_inactive, release the object lock when we go to
refill the scan batch queue so that someone else has a chance to acquire
it. This improves access latency to the object when the pagedaemon is
processing many consecutive pages from a single object, and also in any
case avoids a hiccup during refill for the last touched object.
Reviewed by: alc, markj (previous version)
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D45288
Whenever a file is created, the vnode_create_vobject() function will
try to determine its size by calling vn_getsize_locked(), as size 0
is ambiguous: it means either that the file size is 0 or that the file
size is unknown.
Introduce a special value for the size argument: VNODE_NO_SIZE.
Only when it is given will vnode_create_vobject() try to obtain the
file's size on its own.
Introduce a dedicated vnode_disk_create_vobject() for use by
g_vfs_open(), so we don't have to call vn_isdisk() in the common case
(for regular files).
Handle the case of mediasize==0 in g_vfs_open().
Reviewed by: alc, kib, markj, olce
Approved by: oshogbo (mentor), allanjude (mentor)
Differential Revision: https://reviews.freebsd.org/D45244
per allocated vm_object. Otherwise, since constructors are not
idempotent, we would, e.g., leak a device reference in the case of a non-managed pager.
PR: 278826
Reported by: Austin Zhang <austin.zhang@dell.com>
Reviewed by: alc, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D45113
vm_object_page_remove() wants to busy the page, but that won't work
here. (Kernel stack pages are always busy.)
Make the error handling path look more like vm_thread_stack_dispose().
Reported by: pho
Reviewed by: kib, bnovkov
Fixes: 7a79d06697 ("vm: improve kstack_object pindex calculation to avoid pindex holes")
Differential Revision: https://reviews.freebsd.org/D45019
fork() may allocate a new thread in one of two ways: from UMA, or cached
in a freed proc that was just allocated from UMA. In either case, KASAN
and KMSAN need to initialize some state; in particular they need to
initialize the shadow mapping of the new thread's stack.
This is done differently between KASAN and KMSAN, which is confusing.
This patch improves things a bit:
- Add a new thread_recycle() function, which moves all kernel stack
handling out of kern_fork.c, since it doesn't really belong there.
- Then, thread_alloc_stack() has only one local caller, so just inline
it.
- Avoid redundant shadow stack initialization: thread_alloc()
initializes the KMSAN shadow stack (via kmsan_thread_alloc()) even
though vm_thread_new() already did that.
- Add kasan_thread_alloc(), for consistency with kmsan_thread_alloc().
No functional change intended.
Reviewed by: khng
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D44891
This commit replaces the linear transformation of kernel virtual
addresses to kstack_object pindex values with a non-linear
scheme that circumvents physical memory fragmentation caused by
kernel stack guard pages. The new mapping scheme is used to
effectively "skip" guard pages and assign pindices for
non-guard pages in a contiguous fashion.
The new allocation scheme requires that all default-sized kstack KVAs
come from a separate, specially aligned region of the KVA space.
For this to work, this commit introduces a dedicated per-domain
kstack KVA arena used to allocate kernel stacks of default size.
The behaviour on 32-bit platforms remains unchanged due to a
significantly smaller KVA space.
Aside from fulfilling the requirements imposed by the new scheme, a
separate kstack KVA arena facilitates superpage promotion in the rest
of kernel and causes most kstacks to have guard pages at both ends.
Reviewed by: alc, kib, markj
Tested by: markj
Approved by: markj (mentor)
Differential Revision: https://reviews.freebsd.org/D38852
This fixes compiler warnings when -Wunused-arguments is enabled and
not quieted.
Reviewed by: kib, markj
Obtained from: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D44623
The swap pager itself allocates readahead pages, so should take care to
unbusy them after a read error, just as it does in the non-error case.
PR: 277538
Reviewed by: olce, dougm, alc, kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D44646
Add a function to check whether an aligned block of vm pages is
allocated, for use with impending changes to arm64 superpage
management.
Reviewed by: alc
Differential Revision: http://reviews.freebsd.org/D44575
Remove sys/cdefs.h and sys/socket.h includes.
Order sys/ includes alphabetically.
Do not check for NULL before free().
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D43444
The 'skip clean blocks' loop, which checks for clean blocks in the dirty
pages, might end up setting in_hole to true when EOF falls exactly in
the middle of the block, without advancing prev_offset. If the next
block is then not dirty, next_offset is clipped back to poffset +
maxsize, which equals prev_offset, and the assertion fails.
Instead of asserting prev_offset < next_offset, we must skip the write.
Reported by: asomers
PR: 276191
Reviewed by: alc, markj
Tested by: asomers
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D43358
Commit 2619c5ccfe ("Avoid waiting on physical allocations that can't
possibly be satisfied") changed the return value from bool to errno.
Adjust the function description to match reality.
- Change vm_page_reclaim_contig[_domain] to return an errno instead
of a boolean. 0 indicates a successful reclaim, ENOMEM indicates
lack of available memory to reclaim, with any other error (currently
only ERANGE) indicating that reclamation is impossible for the
specified address range. Change all callers to only follow
up with vm_page_wait* in the ENOMEM case.
- Introduce vm_domainset_iter_ignore(), which marks the specified
domain as unavailable for further use by the iterator. Use this
function to ignore domains that can't possibly satisfy a physical
allocation request. Since WAITOK allocations run the iterators
repeatedly, this avoids the possibility of infinitely spinning
in domain iteration if no available domain can satisfy the
allocation request.
PR: 274252
Reported by: kevans
Tested by: kevans
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D42706
Function vm_phys_free_contig does not always free memory properly when
the npages parameter is less than the max block size. Change it so that it does.
Note that this function is not currently invoked, and this error was
not triggered in earlier versions of the code.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D42891
Both system calls were stubs returning EOPNOTSUPP and libc did not
provide _ or __sys_ prefixed symbols. The actual implementation of
sbrk(2) is on top of the undocumented break(2) system call.
Technically this is a change in ABI, but no non-contrived program ever
called these syscalls.
Reviewed by: kib, emaste
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D42872
Add a function like kva_alloc that allows us to specify the alignment
of the virtual address space returned.
Reviewed by: alc, kib, markj
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D42788
The kernel_arena used in kva_alloc has the qcache disabled. vmem_alloc
will first try to use the qcache before falling back to vmem_xalloc.
Rather than trying to use the qcache in vmem_alloc just call
vmem_xalloc directly.
Reviewed by: alc, kib, markj
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D42831
Apply the following automated changes to try to eliminate
no-longer-needed sys/cdefs.h includes as well as now-empty
blank lines in a row.
Remove /^#if.*\n#endif.*\n#include\s+<sys/cdefs.h>.*\n/
Remove /\n+#include\s+<sys/cdefs.h>.*\n+#if.*\n#endif.*\n+/
Remove /\n+#if.*\n#endif.*\n+/
Remove /^#if.*\n#endif.*\n/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/types.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/param.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/capsicum.h>/
Sponsored by: Netflix
Remove ancient SCCS tags from the tree via automated scripting, with two
minor fixups to keep things compiling. All the common forms in the tree
were removed with a perl script.
Sponsored by: Netflix
vm_phys_find_freelist_contig is called to search a list of max-sized
free page blocks and find one that, when joined with adjacent blocks
in memory, can satisfy a request for a memory allocation bigger than
any single max-sized free page block. In commit
fa8a6585c7, I defined this function in
order to offer two improvements: 1) reduce the worst-case search time,
and 2) allow solutions that include less-than max-sized free page
blocks at the front or back of the giant allocation. However, it turns
out that this change introduced an error, reported in Bug
274592. That error concerns failing to check segment boundaries. This
change fixes an error in vm_phys_find_freelist_contig that resolves
that bug. It also abandons improvement 2), because the value of that
improvement is small and because preserving it would require more
testing than I am able to do.
PR: 274592
Reported by: shafaisal.us@gmail.com
Reviewed by: alc, markj
Tested by: shafaisal.us@gmail.com
Fixes: fa8a6585c7 vm_phys: avoid waste in multipage allocation
MFC after: 10 days
Differential Revision: https://reviews.freebsd.org/D42509
- Pass the zone pointer to trash_ctor() and report the zone name in the panic
message. It may be difficult to figure out the zone just from the item size.
- Do not pass user arguments to internal trash calls; pass the zone.
- Report the malloc type name in the same unified panic message.
- Report the corruption offset from the beginning of the item instead of
the full pointer. This makes the panic message shorter and more readable.
Use u_long for memory accesses instead of uint32_t. In my tests on
amd64, this reduces the time spent in those functions by ~30%, thanks
to the larger 64-bit accesses. i386 still uses 32-bit accesses.
MFC after: 1 month
The underlying issue that originally triggered a kernel panic was
addressed and the fix was ported to all relevant pmaps, so the
safeguards placed in vm_fault.c can be removed now.
Reviewed by: alc, kib, markj
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D42517
To be used for structures for which we want to enforce that pointers to
them have some number of lower bits always set to 0, while still
ensuring we benefit from cache line alignment to avoid false sharing
between structures and fields within the structures (provided they are
properly ordered).
First candidate consumer that comes to mind is 'struct thread', see next
commit.
Reviewed by: markj, kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D42265