Commit graph

364 commits

Author SHA1 Message Date
Mark Johnston 69ccea1c89 vm_page: Let vm_page_init_page() take a pool parameter
This is useful for a subsequent patch which implements lazy
initialization of vm_page structures using a dedicate vm_phys free page
pool.

No functional change intended.

Reviewed by:	alc, kib, emaste
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D40399
2024-06-13 21:18:59 -04:00
Stephen J. Kiernan cb20a74ca0 vm: add macro to mark arguments used when NUMA is defined
This fixes compiler warnings when -Wunused-arguments is enabled and
not quieted.

Reviewed by:	kib, markj
Obtained from:	Juniper Networks, Inc.
Differential Revision:	https://reviews.freebsd.org/D44623
2024-04-09 10:23:47 -04:00
Jason A. Harmening 0ee1cd6da9 vm_page.h: tweak page-busied assertion macros
Fix incorrect macro name and include the value of curthread in the
panic message where relevant.
2023-12-23 23:20:13 -06:00
Jason A. Harmening 2619c5ccfe Avoid waiting on physical allocations that can't possibly be satisfied
- Change vm_page_reclaim_contig[_domain] to return an errno instead
  of a boolean.  0 indicates a successful reclaim, ENOMEM indicates
  lack of available memory to reclaim, with any other error (currently
  only ERANGE) indicating that reclamation is impossible for the
  specified address range.  Change all callers to only follow
  up with vm_page_wait* in the ENOMEM case.

- Introduce vm_domainset_iter_ignore(), which marks the specified
  domain as unavailable for further use by the iterator.  Use this
  function to ignore domains that can't possibly satisfy a physical
  allocation request.  Since WAITOK allocations run the iterators
  repeatedly, this avoids the possibility of infinitely spinning
  in domain iteration if no available domain can satisfy the
  allocation request.

PR:		274252
Reported by:	kevans
Tested by:	kevans
Reviewed by:	markj
Differential Revision: https://reviews.freebsd.org/D42706
2023-12-23 23:01:40 -06:00
Warner Losh 29363fb446 sys: Remove ancient SCCS tags.
Remove ancient SCCS tags from the tree, automated scripting, with two
minor fixup to keep things compiling. All the common forms in the tree
were removed with a perl script.

Sponsored by:		Netflix
2023-11-26 22:23:30 -07:00
Warner Losh 95ee2897e9 sys: Remove $FreeBSD$: two-line .h pattern
Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/
2023-08-16 11:54:11 -06:00
Doug Moore 9e81742892 vm_phys: add binary segment search
Replace several sequential searches for a segment that contains a
phyiscal address with a call to a function that does it by binary
search.  In vm_page_reclaim_contig_domain_ext, find the first segment
to reclaim from, and reclaim from each subsequent appropriate segment.
Eliminate vm_phys_scan_contig.

Reviewed by:	alc, markj
Differential Revision:	https://reviews.freebsd.org/D40058
2023-06-16 01:43:45 -05:00
Andrew Gallatin 8b0dafdb2f vm: implement vm_page_reclaim_contig_domain_ext()
Implement vm_page_reclaim_contig_domain_ext() to reclaim multiple
contiguous regions at once.  This makes it more efficient for users
that need multiple contiguous regions to reclaim those regions
efficiently.

This is needed because callers like ktls may need to reclaim many
contiguous regions, and each scan of physical memory can take
multiple seconds on a large memory machine (order of 100GB of
RMA).  Rather than modifying the core algorithm, I extended
vm_page_reclaim_contig_domain() to take a "desired_runs" argument to
allow the caller to request that it reclaim more than just a single
run. There is no functional change intended for all existing
callers.

The first user for this interface is the ktls code
(https://reviews.freebsd.org/D39421). By reclaiming multiple runs,
ktls goes from consuming hours of CPU to refill its buffer zone to
just seconds or minutes.

Differential Revision: https://reviews.freebsd.org/D39739
Sponsored by:	Netflix
Reviewed by:	alc, jhb, markj
2023-05-09 13:09:34 -04:00
Konstantin Belousov 934bfc128e Add vm_page_any_valid()
Use it and several other vm_page_*_valid() functions in more places.

Suggested and reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D37024
2022-10-19 20:24:07 +03:00
Konstantin Belousov d950c5898a vm/vm_extern.h, vm/vm_page.h: use sys/kassert.h
instead of fatty sys/systm.h.

Suggested by:	jhb
Reviewed by:	alc, imp, jhb (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D34089
2022-02-01 05:55:35 +02:00
Mark Johnston a2665158d0 vm_page: Remove vm_page_sbusy() and vm_page_xbusy()
They are unused today and cannot be safely used in the face of unlocked
lookup, in which pages may be busied without the object lock held.

Obtained from:	jeff (object_concurrency patches)
Reviewed by:	kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D32948
2021-11-15 13:01:30 -05:00
Mark Johnston 87b646630c vm_page: Consolidate page busy sleep mechanisms
- Modify vm_page_busy_sleep() and vm_page_busy_sleep_unlocked() to take
  a VM_ALLOC_* flag indicating whether to sleep on shared-busy, and fix
  up callers.
- Modify vm_page_busy_sleep() to return a status indicating whether the
  object lock was dropped, and fix up callers.
- Convert callers of vm_page_sleep_if_busy() to use vm_page_busy_sleep()
  instead.
- Remove vm_page_sleep_if_(x)busy().

No functional change intended.

Obtained from:	jeff (object_concurrency patches)
Reviewed by:	kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D32947
2021-11-15 13:01:30 -05:00
Mark Johnston a9d6f1fe0a Remove some remaining references to VM_ALLOC_NOOBJ
Reviewed by:	alc, kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32037
2021-10-19 21:22:56 -04:00
Mark Johnston c40cf9bc62 vm_page: Stop handling VM_ALLOC_NOOBJ in vm_page_alloc_domain_after()
This makes the allocator simpler since it can assume object != NULL.
Also modify the function to unconditionally preserve PG_ZERO, so
VM_ALLOC_ZERO is effectively ignored (and still must be implemented by
the caller for now).

Reviewed by:	alc, kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32033
2021-10-19 21:22:56 -04:00
Mark Johnston 92db9f3bb7 Introduce vm_page_alloc_noobj_contig()
This is the same as vm_page_alloc_noobj(), but allocates physically
contiguous runs of memory.  For now it is implemented in terms of
vm_page_alloc_contig(), with the difference that
vm_page_alloc_noobj_contig() implements VM_ALLOC_ZERO by zeroing the
page.

Reviewed by:	alc, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32005
2021-10-19 21:22:56 -04:00
Mark Johnston b498f71bc5 vm_page: Add a new page allocator interface for unnamed pages
The diff adds vm_page_alloc_noobj() and vm_page_alloc_noobj_domain().
These mostly correspond to vm_page_alloc() and vm_page_alloc_domain()
when no VM object is specified, with the exception that they handle
VM_ALLOC_ZERO by zeroing the page, rather than by preserving PG_ZERO.

This simplifies callers and will permit simplification of the
vm_page_alloc_domain() definition.

Since the new allocator variant is similar to vm_page_alloc_freelist(),
implement both of them using a common backend allocator function.  No
functional change intended.

Reviewed by:	alc, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31985
2021-10-19 21:22:55 -04:00
Konstantin Belousov 5b10e79edb Un-staticise vm_page_init_page()
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30785
2021-06-17 16:58:44 +03:00
Ryan Stone 660344ca44 Add a VM flag to prevent reclaim on a failed contig allocation
If a M_WAITOK contig alloc fails, the VM subsystem will try to
reclaim contiguous memory twice before actually failing the
request.  On a system with 64GB of RAM I've observed this take
400-500ms before it finally gives up, and I believe that this
will only be worse on systems with even more memory.

In certain contexts this delay is extremely harmful, so add a flag
that will skip reclaim for allocation requests to allow those
paths to opt-out of doing an expensive reclaim.

Sponsored by: Dell Inc
Differential Revision:	https://reviews.freebsd.org/D28422
Reviewed by: markj, kib
2021-02-03 16:16:51 -05:00
Mark Johnston 431fb8abd7 vm_phys: Try to clean up NUMA KPIs
It can useful for code outside the VM system to look up the NUMA domain
of a page backing a virtual or physical address, specifically when
creating NUMA-aware data structures.  We have _vm_phys_domain() for
this, but the leading underscore implies that it's an internal function,
and vm_phys.h has dependencies on a number of other headers.

Rename vm_phys_domain() to vm_page_domain(), and _vm_phys_domain() to
vm_phys_domain().  Make the latter an inline function.

Add _vm_phys.h and define struct vm_phys_seg there so that it's easier
to use in other headers.  Include it from vm_page.h so that
vm_page_domain() can be defined there.

Include machine/vmparam.h from _vm_phys.h since it depends directly on
some constants defined there.

Reviewed by:	alc
Reviewed by:	dougm, kib (earlier versions)
Differential Revision:	https://reviews.freebsd.org/D27207
2020-11-19 03:59:21 +00:00
Konstantin Belousov 6f3b523c9a Avoid dump_avail[] redefinition.
Move dump_avail[] extern declaration and inlines into a new header
vm/vm_dumpset.h.  This fixes default gcc build for mips.

Reviewed by:	alc, scottph
Tested by:	kevans (previous version)
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D26741
2020-10-14 22:51:40 +00:00
Bryan Drewery c2c6fb90e0 Use unlocked page lookup for inmem() to avoid object lock contention
Reviewed By:	kib, markj
Submitted by:	mlaier
Sponsored by:	Dell EMC
Differential Revision:	https://reviews.freebsd.org/D26653
2020-10-09 23:49:42 +00:00
Konstantin Belousov 42f96162c3 vm_page_dump_index_to_pa(): Add braces to the expression involving + and &.
The precedence of the '&' operator is less than of '+'.  Added braces
do change the order of evaluation into the natural one, in my opinion.
On the other hand, the value of the expression should not change since
all elements should have page-aligned values.

This fixes a gcc warning reported.

Reported by:	adrian
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2020-10-08 22:46:15 +00:00
D Scott Phillips 00e6614750 Sparsify the vm_page_dump bitmap
On Ampere Altra systems, the sparse population of RAM within the
physical address space causes the vm_page_dump bitmap to be much
larger than necessary, increasing the size from ~8 Mib to > 2 Gib
(and overflowing `int` for the size).

Changing the page dump bitmap also changes the minidump file
format, so changes are also necessary in libkvm.

Reviewed by:	jhb
Approved by:	scottl (implicit)
MFC after:	1 week
Sponsored by:	Ampere Computing, Inc.
Differential Revision:	https://reviews.freebsd.org/D26131
2020-09-21 22:21:59 +00:00
D Scott Phillips ab041f713a Move vm_page_dump bitset array definition to MI code
These definitions were repeated by all architectures, with small
variations. Consolidate the common definitons in machine
independent code and use bitset(9) macros for manipulation. Many
opportunities for deduplication remain in the machine dependent
minidump logic. The only intended functional change is increasing
the bit index type to vm_pindex_t, allowing the indexing of pages
with address of 8 TiB and greater.

Reviewed by:	kib, markj
Approved by:	scottl (implicit)
MFC after:	1 week
Sponsored by:	Ampere Computing, Inc.
Differential Revision:	https://reviews.freebsd.org/D26129
2020-09-21 22:20:37 +00:00
Conrad Meyer 0292c54bdb Add support for multithreading the inactive queue pageout within a domain.
In very high throughput workloads, the inactive scan can become overwhelmed
as you have many cores producing pages and a single core freeing.  Since
Mark's introduction of batched pagequeue operations, we can now run multiple
inactive threads working on independent batches.

To avoid confusing the pid and other control algorithms, I (Jeff) do this in
a mpi-like fan out and collect model that is driven from the primary page
daemon.  It decides whether the shortfall can be overcome with a single
thread and if not dispatches multiple threads and waits for their results.

The heuristic is based on timing the pageout activity and averaging a
pages-per-second variable which is exponentially decayed. This is visible in
sysctl and may be interesting for other purposes.

I (Jeff) have verified that this does indeed double our paging throughput
when used with two threads. With four we tend to run into other contention
problems.  For now I would like to commit this infrastructure with only a
single thread enabled.

The number of worker threads per domain can be controlled with the
'vm.pageout_threads_per_domain' tunable.

Submitted by:	jeff (earlier version)
Discussed with:	markj
Tested by:	pho
Sponsored by:	probably Netflix (based on contemporary commits)
Differential Revision:	https://reviews.freebsd.org/D21629
2020-08-11 20:37:45 +00:00
Mark Johnston efec381dd1 Remove most lingering references to the page lock in comments.
Finish updating comments to reflect new locking protocols introduced
over the past year.  In particular, vm_page_lock is now effectively
unused.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25868
2020-08-04 14:59:43 +00:00
Mark Johnston 958d8f527c Remove the volatile qualifier from busy_lock.
Use atomic(9) to load the lock state.  Some places were doing this
already, so it was inconsistent.  In initialization code, the lock state
is still initialized with plain stores.

Reviewed by:	alc, kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25861
2020-07-29 19:38:49 +00:00
Mark Johnston f72e5be58a vm_page_xbusy_claim(): Use atomics to update busy lock state.
vm_page_xbusy_claim() could clobber the waiter bit.  For its original
use, kernel memory pages, this was not a problem since nothing would
ever block on the busy lock for such pages.  r363607 introduced a new
use where this could in principle be a problem.

Fix the problem by using atomic_cmpset to update the lock owner.  Since
this macro is defined only for INVARIANTS kernels the extra overhead
doesn't seem prohibitive.

Reported by:	vangyzen
Reviewed by:	alc, kib, vangyzen
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25859
2020-07-28 19:50:39 +00:00
Chuck Silvers 4dfa06e114 Add a new function vm_page_free_invalid() for freeing invalid pages
that might be wired.  If the page is wired then it cannot be freed now,
but the thread that eventually unwires it will free it at that point.

Reviewed by:	markj, kib
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D25430
2020-07-17 23:09:36 +00:00
Scott Long ffc568ba8b Revert r362998, r326999 while a better compatibility strategy is devised. 2020-07-09 22:38:36 +00:00
Scott Long b302c2e5c9 Migrate the feature of excluding RAM pages to use "excludelist"
as its nomenclature.

MFC after:	1 week
2020-07-07 20:33:11 +00:00
Mark Johnston a9ea09e548 Re-check for wirings after busying the page in vm_page_release_locked().
A concurrent unlocked lookup can wire the page after
vm_page_release_locked() releases the last wiring, in which case
vm_page_release_locked() must not free the page.  Once the xbusy lock is
acquired, that, the object lock and the fact that the page is unmapped
ensure that the wire count cannot increase, so re-check for new wirings
after the page is xbusied.

Update the comment above vm_page_wired() to reflect the new
synchronization rules.

Reported by:	glebius
Reviewed by:	alc, jeff, kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D24592
2020-04-28 13:51:41 +00:00
Jeff Roberson 6be21eb778 Provide a lock free alternative to resolve bogus pages. This is not likely
to be much of a perf win, just a nice code simplification.

Reviewed by:	markj, kib
Differential Revision:	https://reviews.freebsd.org/D23866
2020-02-28 21:42:48 +00:00
Jeff Roberson c49be4f1c6 Add unlocked grab* function variants that use lockless radix code to
lookup pages.  These variants will fall back to their locked counterparts
if the page is not present.

Discussed with:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D23449
2020-02-27 02:37:27 +00:00
Jeff Roberson e9ceb9dd11 Don't release xbusy on kmem pages. After lockless page lookup we will not
be able to guarantee that they can be racquired without blocking.

Reviewed by:	kib
Discussed with:	markj
Differential Revision:	https://reviews.freebsd.org/D23506
2020-02-19 09:10:11 +00:00
Jeff Roberson f212367b42 Refactor _vm_page_busy_sleep to reduce the delta between the various
sleep routines and introduce a variant that supports lockless sleep.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D23612
2020-02-17 01:08:00 +00:00
Jeff Roberson ee9e43f8dd Add an explicit busy state for free pages. This improves behavior with
potential bugs that access freed pages as well as providing a path
towards lockless page lookup.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D23444
2020-02-04 20:33:01 +00:00
Mark Johnston f0a273c00f Remove a couple of lingering usages of the page lock.
Update vm_page_scan_contig() and vm_page_reclaim_run() to stop using
vm_page_change_lock().  It has no use after r356157.  Remove
vm_page_change_lock() now that it has no users.

Remove an unncessary check for wirings in vm_page_scan_contig(), which
was previously checking twice.  The check is racy until
vm_page_reclaim_run() ensures that the page is unmapped, so one check is
sufficient.

Reviewed by:	jeff, kib (previous versions)
Tested by:	pho (previous version)
Differential Revision:	https://reviews.freebsd.org/D23279
2020-02-01 18:23:51 +00:00
Mark Johnston 727150ff03 Remove some unused functions.
The previous series of patches orphaned some vm_page functions, so
remove them.

Reviewed by:	dougm, kib
Sponsored by:	Netflix, Intel
Differential Revision:	https://reviews.freebsd.org/D22886
2019-12-28 19:04:29 +00:00
Mark Johnston dc71caa037 Update the vm_page.h block comment to reflect recent changes.
Explain the new locking rules for per-page queue state updates.

Reviewed by:	jeff, kib
Sponsored by:	Netflix, Intel
Differential Revision:	https://reviews.freebsd.org/D22884
2019-12-28 19:04:15 +00:00
Mark Johnston 9f5632e6c8 Remove page locking for queue operations.
With the previous reviews, the page lock is no longer required in order
to perform queue operations on a page.  It is also no longer needed in
the page queue scans.  This change effectively eliminates remaining uses
of the page lock and also the false sharing caused by multiple pages
sharing a page lock.

Reviewed by:	jeff
Tested by:	pho
Sponsored by:	Netflix, Intel
Differential Revision:	https://reviews.freebsd.org/D22885
2019-12-28 19:04:00 +00:00
Mark Johnston f3f38e2580 Start implementing queue state updates using fcmpset loops.
This is in preparation for eliminating the use of the vm_page lock for
protecting queue state operations.

Introduce the vm_page_pqstate_commit_*() functions.  These functions act
as helpers around vm_page_astate_fcmpset() and are specialized for
specific types of operations.  vm_page_pqstate_commit() wraps these
functions.

Convert a number of routines to use these new helpers.  Use
vm_page_release_toq() in vm_page_unwire() and vm_page_release() to
atomically release a wiring reference and release the page into a queue.
This has the side effect that vm_page_unwire() will leave the page in
the active queue if it is already present there.

Convert the page queue scans to use the new helpers.  Simplify
vm_pageout_reinsert_inactive(), which requeues pages that were found to
be busy during an inactive queue scan, to avoid duplicating the work of
vm_pqbatch_process_page().  In particular, if PGA_REQUEUE or
PGA_REQUEUE_HEAD is set, let that be handled during batch processing.

Reviewed by:	jeff
Tested by:	pho
Sponsored by:	Netflix, Intel
Differential Revision:	https://reviews.freebsd.org/D22770
Differential Revision:	https://reviews.freebsd.org/D22771
Differential Revision:	https://reviews.freebsd.org/D22772
Differential Revision:	https://reviews.freebsd.org/D22773
Differential Revision:	https://reviews.freebsd.org/D22776
2019-12-28 19:03:32 +00:00
Jeff Roberson 3cf3b4e641 Make page busy state deterministic on free. Pages must be xbusy when
removed from objects including calls to free.  Pages must not be xbusy
when freed and not on an object.  Strengthen assertions to match these
expectations.  In practice very little code had to change busy handling
to meet these rules but we can now make stronger guarantees to busy
holders and avoid conditionally dropping busy in free.

Refine vm_page_remove() and vm_page_replace() semantics now that we have
stronger guarantees about busy state.  This removes redundant and
potentially problematic code that has proliferated.

Discussed with:	markj
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D22822
2019-12-22 06:56:44 +00:00
Mark Johnston c2f22e9790 Fix the aflag shift on big-endian platforms after r355672.
The structure offset is zero regardless of endianness.

Reported by:	brooks
Pointy hat:	markj
2019-12-18 01:56:38 +00:00
Jeff Roberson a808177864 Add a deferred free mechanism for freeing swap space that does not require
an exclusive object lock.

Previously swap space was freed on a best effort basis when a page that
had valid swap was dirtied, thus invalidating the swap copy.  This may be
done inconsistently and requires the object lock which is not always
convenient.

Instead, track when swap space is present.  The first dirty is responsible
for deleting space or setting PGA_SWAP_FREE which will trigger background
scans to free the swap space.

Simplify the locking in vm_fault_dirty() now that we can reliably identify
the first dirty.

Discussed with:	alc, kib, markj
Differential Revision:	https://reviews.freebsd.org/D22654
2019-12-15 03:15:06 +00:00
Mark Johnston cbc080b4c4 Avoid relying on silent type casting in the native atomic_load_32.
Reported by:	np
2019-12-12 23:55:34 +00:00
Mark Johnston 6fbaf6859c Implement atomic state updates using the new vm_page_astate_t structure.
Introduce primitives vm_page_astate_load() and vm_page_astate_fcmpset()
to operate on the 32-bit per-page atomic state.  Modify
vm_page_pqstate_fcmpset() to use them.  No functional change intended.

Introduce PGA_QUEUE_OP_MASK, a subset of PGA_QUEUE_STATE_MASK that only
includes queue operation flags.  This will be used in subsequent
patches.

Reviewed by:	alc, jeff, kib
Sponsored by:	Netflix, Intel
Differential Revision:	https://reviews.freebsd.org/D22753
2019-12-12 21:13:20 +00:00
Mark Johnston 5cff1f4dc3 Introduce vm_page_astate.
This is a 32-bit structure embedded in each vm_page, consisting mostly
of page queue state.  The use of a structure makes it easy to store a
snapshot of a page's queue state in a stack variable and use cmpset
loops to update that state without requiring the page lock.

This change merely adds the structure and updates references to atomic
state fields.  No functional change intended.

Reviewed by:	alc, jeff, kib
Sponsored by:	Netflix, Intel
Differential Revision:	https://reviews.freebsd.org/D22650
2019-12-10 18:14:50 +00:00
Jeff Roberson fb1d575ceb Reduce duplication in grab functions by providing allocflags based inlines.
Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D22635
2019-12-08 01:16:22 +00:00
Jeff Roberson 584061b480 Garbage collect the mostly unused us_keg field. Use appropriately named
union members in vm_page.h to store the zone and slab.  Remove some nearby
dead code.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D22564
2019-11-28 07:49:25 +00:00