system/freebsd-src

mirror of https://github.com/freebsd/freebsd-src synced 2024-07-21 18:27:22 +00:00

Author	SHA1	Message	Date
Ryan Libby	a216e311a7	vm_pageout_scan_inactive: take a lock break In vm_pageout_scan_inactive, release the object lock when we go to refill the scan batch queue so that someone else has a chance to acquire it. This improves access latency to the object when the pagedaemon is processing many consecutive pages from a single object, and also in any case avoids a hiccup during refill for the last touched object. Reviewed by: alc, markj (previous version) Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D45288	2024-05-24 08:52:58 -07:00
Warner Losh	29363fb446	sys: Remove ancient SCCS tags. Remove ancient SCCS tags from the tree, automated scripting, with two minor fixup to keep things compiling. All the common forms in the tree were removed with a perl script. Sponsored by: Netflix	2023-11-26 22:23:30 -07:00
Warner Losh	685dc743dc	sys: Remove $FreeBSD$: one-line .c pattern Remove /^[\s]__FBSDID$"\$FreeBSD\$"$;?\s*\n/	2023-08-16 11:54:36 -06:00
Andrew Gallatin	1cac76c93f	vm: reduce lock contention when processing vm batchqueues Rather than waiting until the batchqueue is full to acquire the lock & process the queue, we now start trying to acquire the lock using trylocks when the batchqueue is 1/2 full. This removes almost all contention on the vm pagequeue mutex for for our busy sendfile() based web workload. It also greadly reduces the amount of time a network driver ithread remains blocked on a mutex, and eliminates some packet drops under heavy load. So that the system does not loose the benefit of processing large batchqueues, I've doubled the size of the batchqueues. This way, when there is no contention, we process the same batch size as before. This has been run for several months on a busy Netflix server, as well as on my personal desktop. Reviewed by: markj Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D37305	2022-12-14 14:34:07 -05:00
Mark Johnston	0cb2610ee2	vm: Remove handling for OBJT_DEFAULT objects Now that OBJT_DEFAULT objects can't be instantiated, we can simplify checks of the form object->type == OBJT_DEFAULT \|\| (object->flags & OBJ_SWAP) != 0. No functional change intended. Reviewed by: alc, kib Tested by: pho Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D35788	2022-07-17 07:09:48 -04:00
Mark Johnston	e123264e4d	vm: Fix racy checks for swap objects Commit `4b8365d752` introduced the ability to dynamically register VM object types, for use by tmpfs, which creates swap-backed objects. As a part of this, checks for such objects changed from object->type == OBJT_DEFAULT \|\| object->type == OBJT_SWAP to object->type == OBJT_DEFAULT \|\| (object->flags & OBJ_SWAP) != 0 In particular, objects of type OBJT_DEFAULT do not have OBJ_SWAP set; the swap pager sets this flag when converting from OBJT_DEFAULT to OBJT_SWAP. A few of these checks are done without the object lock held. It turns out that this can result in false negatives since the swap pager converts objects like so: object->type = OBJT_SWAP; object->flags \|= OBJ_SWAP; Fix the problem by adding explicit tests for OBJT_SWAP objects in unlocked checks. PR: 258932 Fixes: `4b8365d752` ("Add OBJT_SWAP_TMPFS pager") Reported by: bdrewery Reviewed by: kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D35470	2022-06-20 12:48:14 -04:00
Konstantin Belousov	b51927b7b0	Revert "vm_pageout_scans: correct detection of active object" This reverts commit `3de96d664a`. Problem is that it is possible to reach the state with ref_count == 1 for the mapped non-anonymous object. For instance, anonymous posix shmfd or linux shmfs object could be mapped, and then corresponding file descriptor closed, dropping the object reference owned by the shmfd/shmfs file. Then the check in inactive scan assumes that the object and page are not mapped and frees the page, while they are not. PR: 261707 Discussed with: markj Sponsored by: The FreeBSD Foundation MFC after: now	2022-02-10 16:55:10 +02:00
Konstantin Belousov	3de96d664a	vm_pageout_scans: correct detection of active object For non-anonymous swap objects, there is always a reference from the owner to the object to keep it from recycling. Account for it when deciding should we query pmap for hardware active references for the page. As result, we avoid unneeded calls to pmap_ts_referenced(), which for non-mapped page means avoiding unneccessary lock and unlock of the pv list. Reviewed by: markj Discussed with: alc Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D33924	2022-01-22 19:34:32 +02:00
Mark Johnston	4a864f624a	vm_pageout: Print a more accurate message to the console before an OOM kill Previously we'd always print "out of swap space." This can be misleading, as there are other reasons an OOM kill can be triggered. In particular, it's entirely possible to trigger an OOM kill on a system with plenty of free swap space. Reviewed by: kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D33810	2022-01-14 15:04:21 -05:00
Mark Johnston	c4a25e0713	vm_pageout: Group sysctl variables together with sysctl definitions Fix some style bugs while here. No functional change intended. Reviewed by: alc, kib MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D33811	2022-01-11 09:27:45 -05:00
Gordon Bergling	fa7a635f7e	Fix a few typos in source code comments - s/becase/because/ MFC after: 5 days	2021-08-14 09:06:09 +02:00
Konstantin Belousov	0ef5eee9d9	Add vn_lktype_write() and remove repetetive code that calculates vnode locking type for write. Reviewed by: khng, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D31405	2021-08-04 19:40:13 +03:00
Konstantin Belousov	7079449b0b	sys/vm: remove several other uses of OBJT_SWAP_TMPFS Mostly in cases where OBJ_SWAP flag works as well, or by reversing the condition so that object types can be listed. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D30168	2021-05-13 20:10:35 +03:00
Konstantin Belousov	4b8365d752	Add OBJT_SWAP_TMPFS pager This is OBJT_SWAP pager, specialized for tmpfs. Right now, both swap pager and generic vm code have to explicitly handle swap objects which are tmpfs vnode v_object, in the special ways. Replace (almost) all such places with proper methods. Since VM still needs a notion of the 'swap object', regardless of its use, add yet another type-classification flag OBJ_SWAP. Set it in vm_object_allocate() where other type-class flags are set. This change almost completely eliminates the knowledge of tmpfs from VM, and opens a way to make OBJT_SWAP_TMPFS loadable from tmpfs.ko. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D30070	2021-05-07 17:08:03 +03:00
Mark Johnston	2913cc4637	vm_pageout: Avoid rounding down the inactive scan target With helper page daemon threads, enabled by default in r364786, we divide the inactive target by the number of threads, rounding down, and sum the total number of pages freed by the threads. This sum is compared with the original target, but by rounding down we might lose pages, causing the page daemon control loop to conclude that inactive queue scanning isn't keeping up with demand for free pages. Typically this results in excessive swapping. Fix the problem by accounting for the error in the main pagedaemon thread's target. Note that by default the problem will manifest only in systems with >16 CPUs in a NUMA domain. Reviewed by: cem Discussed with: dougm Reported and tested by: dhw, glebius Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D26610	2020-10-02 19:16:06 +00:00
Mark Johnston	97458520cc	Increase the default vm.max_user_wired value. Since r347532 (merged to stable/12) we only count user-wired pages towards the system limit. However, we now also treat pages wired by hypervisors (bhyve and virtualbox) as user-wired, so starting VMs with large amounts of RAM tends to fail due to the low limit. The purpose of the limit is to provide a seatbelt, not to impose some policy on the use of wired memory. Thus, increase the default limit to allow reasonable VM configurations to work without tuning. Reviewed by: kib Discussed with: dougm MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D26424	2020-09-17 16:49:28 +00:00
Eric van Gyzen	609de97e04	vm_pageout_scan_active: ensure ps_delta is initialized Reported by: Coverity Reviewed by: markj MFC after: 2 weeks Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D26212	2020-08-28 19:59:02 +00:00
Conrad Meyer	74f5530d7a	vm_pageout: Scale worker threads with CPUs Autoscale vm_pageout worker threads from r364129 with CPU count. The default is arbitrarily chosen to be 16 CPUs per worker thread, but can be adjusted with the vm.pageout_cpus_per_thread tunable. There will never be less than 1 thread per populated NUMA domain, and the previous arbitrary upper limit (at most ncpus/2 threads per NUMA domain) is preserved. Care is taken to gracefully handle asymmetric NUMA nodes, such as empty node systems (e.g., AMD 2990WX) and systems with nodes of varying size (e.g., some larger >20 core Intel Haswell/Broadwell Xeon). Reviewed by: kib, markj Sponsored by: Isilon Differential Revision: https://reviews.freebsd.org/D26152	2020-08-25 21:36:56 +00:00
Mateusz Guzik	a92a971bbb	vfs: remove the thread argument from vget It was already asserted to be curthread. Semantic patch: @@ expression arg1, arg2, arg3; @@ - vget(arg1, arg2, arg3) + vget(arg1, arg2)	2020-08-16 17:18:54 +00:00
Conrad Meyer	ea7b737a6f	vm_pageout: Correct threshold calculation on single-CPU systems Reported by: Michael Butler X-MFC-With: r364129	2020-08-14 18:48:48 +00:00
Conrad Meyer	0292c54bdb	Add support for multithreading the inactive queue pageout within a domain. In very high throughput workloads, the inactive scan can become overwhelmed as you have many cores producing pages and a single core freeing. Since Mark's introduction of batched pagequeue operations, we can now run multiple inactive threads working on independent batches. To avoid confusing the pid and other control algorithms, I (Jeff) do this in a mpi-like fan out and collect model that is driven from the primary page daemon. It decides whether the shortfall can be overcome with a single thread and if not dispatches multiple threads and waits for their results. The heuristic is based on timing the pageout activity and averaging a pages-per-second variable which is exponentially decayed. This is visible in sysctl and may be interesting for other purposes. I (Jeff) have verified that this does indeed double our paging throughput when used with two threads. With four we tend to run into other contention problems. For now I would like to commit this infrastructure with only a single thread enabled. The number of worker threads per domain can be controlled with the 'vm.pageout_threads_per_domain' tunable. Submitted by: jeff (earlier version) Discussed with: markj Tested by: pho Sponsored by: probably Netflix (based on contemporary commits) Differential Revision: https://reviews.freebsd.org/D21629	2020-08-11 20:37:45 +00:00
Mark Johnston	efec381dd1	Remove most lingering references to the page lock in comments. Finish updating comments to reflect new locking protocols introduced over the past year. In particular, vm_page_lock is now effectively unused. Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D25868	2020-08-04 14:59:43 +00:00
Pawel Biernacki	7029da5c36	Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many) r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren’t properly marked). Use it in preparation for a general review of all nodes. This is non-functional change that adds annotations to SYSCTL_NODE and SYSCTL_PROC nodes using one of the soon-to-be-required flags. Mark all obvious cases as MPSAFE. All entries that haven't been marked as MPSAFE before are by default marked as NEEDGIANT Approved by: kib (mentor, blanket) Commented by: kib, gallatin, melifaro Differential Revision: https://reviews.freebsd.org/D23718	2020-02-26 14:26:36 +00:00
Mateusz Guzik	23ed568caa	vm: remove no longer needed atomic_load_ptr casts	2020-02-14 23:16:29 +00:00
Jonathan T. Looney	3c200db9d2	Modify the vm.panic_on_oom sysctl to take a count of events. Currently, the vm.panic_on_oom sysctl is a boolean which controls the behavior of the VM system when it encounters an out-of-memory situation. If set to 0, the VM system kills the largest process. If set to any other value, the VM system will initiate a panic. This change makes the sysctl a count of events. If set to 0, the VM system kills the largest process. If set to any other value, the VM system will kill the largest process until it has seen the specified number of out-of-memory events. Once it reaches the specified number of events, it will initiate a panic. This change is helpful in capturing cores when the system is in a perpetual cycle of out-of-memory events (as opposed to just hitting one or two sporadic out-of-memory events). Reviewed by: kib MFC after: 2 weeks Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D23601	2020-02-10 18:06:38 +00:00
Alexander Motin	ace409ce9c	Restore loop break in vm_pageout_lowmem(). r355004 removed return statement from this loop with intention to also call uma_reclaim_wakeup(). But in case of vm.lowmem_period=0 it causes infinite loop. Reviewed by: markj Sponsored by: iXsystems, Inc.	2020-01-14 03:27:57 +00:00
Mark Johnston	f7607c300b	Clear queue operation flags when migrating a page to another queue. The page daemon loops may move pages back to the active queue if references are detected. In this case we must take care to clear existing queue operation flags. In particular, PGA_REQUEUE_HEAD may be set, and that flag is only valid if the page belongs to the inactive queue. Also fix a bug in the active queue scan where we were updating "old" instead of "new". This would only have been hit in rare cases where the page moved out of the active queue after the beginning of the scan. Reported by: Bob Prohaska, Idwer Vollering Tested by: Idwer Vollering Reviewed by: alc, kib Differential Revision: https://reviews.freebsd.org/D23001	2020-01-02 19:26:04 +00:00
Mark Johnston	9f5632e6c8	Remove page locking for queue operations. With the previous reviews, the page lock is no longer required in order to perform queue operations on a page. It is also no longer needed in the page queue scans. This change effectively eliminates remaining uses of the page lock and also the false sharing caused by multiple pages sharing a page lock. Reviewed by: jeff Tested by: pho Sponsored by: Netflix, Intel Differential Revision: https://reviews.freebsd.org/D22885	2019-12-28 19:04:00 +00:00
Mark Johnston	b7f30bff2f	Generalize lazy dequeue logic for wired pages. Some recent work aims to remove the use of the page lock for synchronizing updates to page queue state. This change adds a mechanism to preserve the existing behaviour of lazily dequeuing wired pages, which was previously synchronized using the page lock. Handle this by setting PGA_DEQUEUE when a managed page's wire count transitions from 0 to 1. When the page daemon encounters a page with a flag in PGA_QUEUE_OP_MASK set, it creates a batch queue entry for that page, but in so doing it does not modify the page itself and thus racing with a concurrent free of the page is harmless. The flag is advisory; the page daemon still checks for wirings after acquiring the object and page xbusy locks. vm_page_unwire_managed() now clears PGA_DEQUEUE on a 1->0 transition. It must do this before dropping the reference to avoid a use-after-free but also handles races with concurrent wirings to ensure that PGA_DEQUEUE is not left unset on a wired page. Reviewed by: jeff Tested by: pho Sponsored by: Netflix, Intel Differential Revision: https://reviews.freebsd.org/D22882	2019-12-28 19:03:46 +00:00
Mark Johnston	f3f38e2580	Start implementing queue state updates using fcmpset loops. This is in preparation for eliminating the use of the vm_page lock for protecting queue state operations. Introduce the vm_page_pqstate_commit_*() functions. These functions act as helpers around vm_page_astate_fcmpset() and are specialized for specific types of operations. vm_page_pqstate_commit() wraps these functions. Convert a number of routines to use these new helpers. Use vm_page_release_toq() in vm_page_unwire() and vm_page_release() to atomically release a wiring reference and release the page into a queue. This has the side effect that vm_page_unwire() will leave the page in the active queue if it is already present there. Convert the page queue scans to use the new helpers. Simplify vm_pageout_reinsert_inactive(), which requeues pages that were found to be busy during an inactive queue scan, to avoid duplicating the work of vm_pqbatch_process_page(). In particular, if PGA_REQUEUE or PGA_REQUEUE_HEAD is set, let that be handled during batch processing. Reviewed by: jeff Tested by: pho Sponsored by: Netflix, Intel Differential Revision: https://reviews.freebsd.org/D22770 Differential Revision: https://reviews.freebsd.org/D22771 Differential Revision: https://reviews.freebsd.org/D22772 Differential Revision: https://reviews.freebsd.org/D22773 Differential Revision: https://reviews.freebsd.org/D22776	2019-12-28 19:03:32 +00:00
Jeff Roberson	a808177864	Add a deferred free mechanism for freeing swap space that does not require an exclusive object lock. Previously swap space was freed on a best effort basis when a page that had valid swap was dirtied, thus invalidating the swap copy. This may be done inconsistently and requires the object lock which is not always convenient. Instead, track when swap space is present. The first dirty is responsible for deleting space or setting PGA_SWAP_FREE which will trigger background scans to free the swap space. Simplify the locking in vm_fault_dirty() now that we can reliably identify the first dirty. Discussed with: alc, kib, markj Differential Revision: https://reviews.freebsd.org/D22654	2019-12-15 03:15:06 +00:00
Mark Johnston	5cff1f4dc3	Introduce vm_page_astate. This is a 32-bit structure embedded in each vm_page, consisting mostly of page queue state. The use of a structure makes it easy to store a snapshot of a page's queue state in a stack variable and use cmpset loops to update that state without requiring the page lock. This change merely adds the structure and updates references to atomic state fields. No functional change intended. Reviewed by: alc, jeff, kib Sponsored by: Netflix, Intel Differential Revision: https://reviews.freebsd.org/D22650	2019-12-10 18:14:50 +00:00
Mark Johnston	9c770a27ce	Simplify vm_pageout_init_domain() and add a "big picture" comment. Stop subtracting 1024/200 from vmd_page_count/200. I cannot see how such precise accounting can make a difference on modern systems. Add some explanation of what the page daemon does and how it handles memory shortages. Reviewed by: dougm Discussed with: jeff, kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D22396	2019-11-22 16:31:43 +00:00
Mark Johnston	8fc2550837	Reclaim memory from UMA if the page daemon is struggling. Use the UMA reclaim thread to asynchronously drain all caches if there is a severe shortage in a domain. Otherwise we only trigger UMA reclamation every 10s even when the system has completely run out of memory. Stop entirely draining the caches when one domain falls below its min threshold. In some workloads it is normal for one NUMA domain to end up being nearly depleted by kernel memory allocations, for example for the ZFS ARC. The domainset iterators skip domains below the vmd_min_free theshold on the first iteration, so we should allow that mechanism to limit further depletion of the domain's free pages before taking the extreme step of calling uma_reclaim(UMA_RECLAIM_DRAIN_CPU). Discussed with: jeff MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D22395	2019-11-22 16:31:30 +00:00
Jeff Roberson	0012f373e4	(4/6) Protect page valid with the busy lock. Atomics are used for page busy and valid state when the shared busy is held. The details of the locking protocol and valid and dirty synchronization are in the updated vm_page.h comments. Reviewed by: kib, markj Tested by: pho Sponsored by: Netflix, Intel Differential Revision: https://reviews.freebsd.org/D21594	2019-10-15 03:45:41 +00:00
Jeff Roberson	63e9755548	(1/6) Replace busy checks with acquires where it is trival to do so. This is the first in a series of patches that promotes the page busy field to a first class lock that no longer requires the object lock for consistency. Reviewed by: kib, markj Tested by: pho Sponsored by: Netflix, Intel Differential Revision: https://reviews.freebsd.org/D21548	2019-10-15 03:35:11 +00:00
Doug Moore	2288078c5e	Define macro VM_MAP_ENTRY_FOREACH for enumerating the entries in a vm_map. In case the implementation ever changes from using a chain of next pointers, then changing the macro definition will be necessary, but changing all the files that iterate over vm_map entries will not. Drop a counter in vm_object.c that would have an effect only if the vm_map entry count was wrong. Discussed with: alc Reviewed by: markj Tested by: pho (earlier version) Differential Revision: https://reviews.freebsd.org/D21882	2019-10-08 07:14:21 +00:00
Mark Johnston	e8bcf6966b	Revert r352406, which contained changes I didn't intend to commit.	2019-09-16 15:04:45 +00:00
Mark Johnston	41fd4b9422	Fix a couple of nits in r352110. - Remove a dead variable from the amd64 pmap_extract_and_hold(). - Fix grammar in the vm_page_wire man page. Reported by: alc Reviewed by: alc, kib Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D21639	2019-09-16 15:03:12 +00:00
Mark Johnston	fee2a2fa39	Change synchonization rules for vm_page reference counting. There are several mechanisms by which a vm_page reference is held, preventing the page from being freed back to the page allocator. In particular, holding the page's object lock is sufficient to prevent the page from being freed; holding the busy lock or a wiring is sufficent as well. These references are protected by the page lock, which must therefore be acquired for many per-page operations. This results in false sharing since the page locks are external to the vm_page structures themselves and each lock protects multiple structures. Transition to using an atomically updated per-page reference counter. The object's reference is counted using a flag bit in the counter. A second flag bit is used to atomically block new references via pmap_extract_and_hold() while removing managed mappings of a page. Thus, the reference count of a page is guaranteed not to increase if the page is unbusied, unmapped, and the object's write lock is held. As a consequence of this, the page lock no longer protects a page's identity; operations which move pages between objects are now synchronized solely by the objects' locks. The vm_page_wire() and vm_page_unwire() KPIs are changed. The former requires that either the object lock or the busy lock is held. The latter no longer has a return value and may free the page if it releases the last reference to that page. vm_page_unwire_noq() behaves the same as before; the caller is responsible for checking its return value and freeing or enqueuing the page as appropriate. vm_page_wire_mapped() is introduced for use in pmap_extract_and_hold(). It fails if the page is concurrently being unmapped, typically triggering a fallback to the fault handler. vm_page_wire() no longer requires the page lock and vm_page_unwire() now internally acquires the page lock when releasing the last wiring of a page (since the page lock still protects a page's queue state). In particular, synchronization details are no longer leaked into the caller. The change excises the page lock from several frequently executed code paths. In particular, vm_object_terminate() no longer bounces between page locks as it releases an object's pages, and direct I/O and sendfile(SF_NOCACHE) completions no longer require the page lock. In these latter cases we now get linear scalability in the common scenario where different threads are operating on different files. __FreeBSD_version is bumped. The DRM ports have been updated to accomodate the KPI changes. Reviewed by: jeff (earlier version) Tested by: gallatin (earlier version), pho Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D20486	2019-09-09 21:32:42 +00:00
Mark Johnston	7cdeaf3309	Add preliminary support for atomic updates of per-page queue state. Queue operations on a page use the page lock when updating the page to reflect the desired queue state, and the page queue lock when physically enqueuing or dequeuing a page. Multiple pages share a given page lock, but queue state is per-page; this false sharing results in heavy lock contention. Take a small step towards the use of atomic_cmpset to synchronize updates to per-page queue state by introducing vm_page_pqstate_cmpset() and using it in the page daemon. In the longer term the plan is to stop using the page lock to protect page identity and rely only on the object and page busy locks. However, since the page daemon avoids acquiring the object lock except when necessary, some synchronization with a concurrent free of the page is required. vm_page_pqstate_cmpset() can be used to ensure that queue state updates are successful only if the page is not scheduled for a dequeue, which is sufficient for the page daemon. Add vm_page_swapqueue(), which moves a page from one queue to another using vm_page_pqstate_cmpset(). Use it in the active queue scan, which does not use the object lock. Modify vm_page_dequeue_deferred() to use vm_page_pqstate_cmpset() as well. Reviewed by: kib Discussed with: jeff Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D21257	2019-09-03 14:29:58 +00:00
Mark Johnston	08cfa56ea3	Extend uma_reclaim() to permit different reclamation targets. The page daemon periodically invokes uma_reclaim() to reclaim cached items from each zone when the system is under memory pressure. This is important since the size of these caches is unbounded by default. However it also results in bursts of high latency when allocating from heavily used zones as threads miss in the per-CPU caches and must access the keg in order to allocate new items. With r340405 we maintain an estimate of each zone's usage of its (per-NUMA domain) cache of full buckets. Start making use of this estimate to avoid reclaiming the entire cache when under memory pressure. In particular, introduce TRIM, DRAIN and DRAIN_CPU verbs for uma_reclaim() and uma_zone_reclaim(). When trimming, only items in excess of the estimate are reclaimed. Draining a zone reclaims all of the cached full buckets (the previous behaviour of uma_reclaim()), and may further drain the per-CPU caches in extreme cases. Now, when under memory pressure, the page daemon will trim zones rather than draining them. As a result, heavily used zones do not incur bursts of bucket cache misses following reclamation, but large, unused caches will be reclaimed as before. Reviewed by: jeff Tested by: pho (an earlier version) MFC after: 2 months Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D16667	2019-09-01 22:22:43 +00:00
Konstantin Belousov	245139c69d	Fix OOM handling of some corner cases. In addition to pagedaemon initiating OOM, also do it from the vm_fault() internals. Namely, if the thread waits for a free page to satisfy page fault some preconfigured amount of time, trigger OOM. These triggers are rate-limited, due to a usual case of several threads of the same multi-threaded process to enter fault handler simultaneously. The faults from pagedaemon threads participate in the calculation of OOM rate, but are not under the limit. Reviewed by: markj (previous version) Tested by: pho Discussed with: alc Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D13671	2019-08-16 09:43:49 +00:00
Mark Johnston	eeacb3b02f	Merge the vm_page hold and wire mechanisms. The hold_count and wire_count fields of struct vm_page are separate reference counters with similar semantics. The remaining essential differences are that holds are not counted as a reference with respect to LRU, and holds have an implicit free-on-last unhold semantic whereas vm_page_unwire() callers must explicitly determine whether to free the page once the last reference to the page is released. This change removes the KPIs which directly manipulate hold_count. Functions such as vm_fault_quick_hold_pages() now return wired pages instead. Since r328977 the overhead of maintaining LRU for wired pages is lower, and in many cases vm_fault_quick_hold_pages() callers would swap holds for wirings on the returned pages anyway, so with this change we remove a number of page lock acquisitions. No functional change is intended. __FreeBSD_version is bumped. Reviewed by: alc, kib Discussed with: jeff Discussed with: jhb, np (cxgbe) Tested by: pho (previous version) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D19247	2019-07-08 19:46:20 +00:00
Doug Moore	0cab71bcee	Fix style(9) violations involving division by PAGE_SIZE. Reviewed by: alc Approved by: markj (mentor) Differential Revision: https://reviews.freebsd.org/D20847	2019-07-06 15:55:16 +00:00
Mark Johnston	d70f0ab38d	Cache the next queue element when traversing a page queue. When QUEUE_MACRO_DEBUG_TRASH is configured, removing a queue element invalidates its queue linkage pointers. vm_pageout_collect_batch() was relying on these pointers remaining valid after a removal, so modify it to fetch the next queued page before dequeuing the current page. Submitted by: Don Morris <dgmorris@earthlink.net> Reviewed by: cem, vangyzen MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D20842	2019-07-03 18:46:39 +00:00
Mark Johnston	d842aa5114	Add a vm_page_wired() predicate. Use it instead of accessing the wire_count field directly. No functional change intended. Reviewed by: alc, kib MFC after: 1 week Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D20485	2019-06-02 01:00:17 +00:00
Mark Johnston	54a3a11421	Provide separate accounting for user-wired pages. Historically we have not distinguished between kernel wirings and user wirings for accounting purposes. User wirings (via mlock(2)) were subject to a global limit on the number of wired pages, so if large swaths of physical memory were wired by the kernel, as happens with the ZFS ARC among other things, the limit could be exceeded, causing user wirings to fail. The change adds a new counter, v_user_wire_count, which counts the number of virtual pages wired by user processes via mlock(2) and mlockall(2). Only user-wired pages are subject to the system-wide limit which helps provide some safety against deadlocks. In particular, while sources of kernel wirings typically support some backpressure mechanism, there is no way to reclaim user-wired pages shorting of killing the wiring process. The limit is exported as vm.max_user_wired, renamed from vm.max_wired, and changed from u_int to u_long. The choice to count virtual user-wired pages rather than physical pages was done for simplicity. There are mechanisms that can cause user-wired mappings to be destroyed while maintaining a wiring of the backing physical page; these make it difficult to accurately track user wirings at the physical page layer. The change also closes some holes which allowed user wirings to succeed even when they would cause the system limit to be exceeded. For instance, mmap() may now fail with ENOMEM in a process that has called mlockall(MCL_FUTURE) if the new mapping would cause the user wiring limit to be exceeded. Note that bhyve -S is subject to the user wiring limit, which defaults to 1/3 of physical RAM. Users that wish to exceed the limit must tune vm.max_user_wired. Reviewed by: kib, ngie (mlock() test changes) Tested by: pho (earlier version) MFC after: 45 days Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D19908	2019-05-13 16:38:48 +00:00
Doug Moore	64f8d2575a	fls() should find the most significant bit of an int faster than a linear search can, so use it to avoid a linear search in isqrt. Approved by: kib (mentor), markj (mentor) Differential Revision: https://reviews.freebsd.org/D20102	2019-05-03 02:55:54 +00:00
Mark Johnston	46e39081f4	Clear pointers to indicate that the respective locks are released. This fixes a problem in r344231: vm_pageout_launder() may scan two queues when swap is disabled. Reported by: pho MFC with: r344231	2019-02-21 15:44:32 +00:00

1 2 3 4 5 ...

531 commits