system/freebsd-src

mirror of https://github.com/freebsd/freebsd-src synced 2024-09-20 00:33:57 +00:00

Author	SHA1	Message	Date
Edward Tomasz Napierala	0f559a9f09	Make vmdaemon timeout configurable Make vmdaemon timeout configurable, so that one can adjust how often it runs. Here's a trick: set this to 1, then run 'limits -m 0 sh', then run whatever you want with 'ktrace -it XXX', and observe how the working set changes over time. Reviewed By: kib Sponsored By: EPSRC Differential Revision: https://reviews.freebsd.org/D22038	2021-10-17 13:49:29 +01:00
Dawid Gorecki	889b56c8cd	setrlimit: Take stack gap into account. Calling setrlimit with stack gap enabled and with low values of stack resource limit often caused the program to abort immediately after exiting the syscall. This happened due to the fact that the resource limit was calculated assuming that the stack started at sv_usrstack, while with stack gap enabled the stack is moved by a random number of bytes. Save information about stack size in struct vmspace and adjust the rlim_cur value. If the rlim_cur and stack gap is bigger than rlim_max, then the value is truncated to rlim_max. PR: 253208 Reviewed by: kib Obtained from: Semihalf Sponsored by: Stormshield MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D31516	2021-10-15 10:21:47 +02:00
Warner Losh	cdccd11b36	forward declare struct thread sys/sysctl.h moved struct thread forward declaration under #ifdef _KERNEL and so this header fails when included from userland. Add a forward declaration here. Fixes: `99eefc727e` Sponsored by: Netflix	2021-10-11 12:59:39 -06:00
Konstantin Belousov	174aad047e	vm_fault: do not trigger OOM too early Wakeup in vm_waitpfault() does not mean that the thread would get the page on the next vm_page_alloc() call, other thread might steal the free page we were waiting for. On the other hand, this wakeup might come much earlier than just vm_pfault_oom_wait seconds, if the rate of the page reclamation is high enough. If wakeups come fast and we loose the allocation race enough times, OOM could be undeservably triggered much earlier than vm_pfault_oom_attempts x vm_pfault_oom_wait seconds. Fix it by not counting the number of sleeps, but measuring the time to th first allocation failure, and triggering OOM when it was older than oom_attempts x oom_wait seconds. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D32287	2021-10-08 12:24:46 +03:00
Mitchell Horne	31991a5a45	minidump: De-duplicate is_dumpable() The function is identical in each minidump implementation, so move it to vm_phys.c. The only slight exception is powerpc where the function was public, for use in moea64_scan_pmap(). Reviewed by: kib, markj, imp (earlier version) MFC after: 2 weeks Sponsored by: Juniper Networks, Inc. Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D31884	2021-09-29 16:41:52 -03:00
Gleb Smirnoff	183f8e1e57	Externalize nsw_cluster_max and initialize it early. GEOM_ELI needs to know the value, cause it will soon have special memory handling for IO operations associated with swap. Move initialization to swap_pager_init(), which is executed at SI_SUB_VM, unlike swap_pager_swap_init(), which would be executed only when a swap is configured. GEOM_ELI might need the value at SI_SUB_DRIVERS, when disks are tasted by GEOM. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D24400	2021-09-28 11:23:52 -07:00
Gleb Smirnoff	c6213beff4	Add flag BIO_SWAP to mark IOs that are associated with swap. Submitted by: jtl Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D24400	2021-09-28 11:23:51 -07:00
Konstantin Belousov	bd3a668087	vm_page_startup: correct calculation of the starting page Also avoid unneded calculations when phys segment end is the phys_avail[] start. Submitted by: alc Reviewed by: markj MFC after: 1 week Fixes: `181bfb42fd` Differential revision: https://reviews.freebsd.org/D32009	2021-09-19 21:27:55 +03:00
Mark Johnston	d6e77cda9b	uma: Show the count of free slabs in each per-domain keg's sysctl tree This is useful for measuring the number of pages that could be freed from a NOFREE zone under memory pressure. MFC after: 1 week Sponsored by: The FreeBSD Foundation	2021-09-17 14:19:05 -04:00
Konstantin Belousov	181bfb42fd	vm_phys: do not ignore phys_avail[] segments that do not fit completely into vm_phys segments If phys_avail[] segment only intersect with some vm_phys segment, add pages from it to the free list that belong to the given vm_phys_seg, instead of dropping them. The vm_phys segments are generally result of subdivision of phys_avail segments, for instance DMA32 or LOWMEM boundaries split them. On amd64, after UEFI in-place kernel activation (copy_staging disable) was enabled, we typically have a large phys_avail[] segment below 4G which crosses LOWMEM (1M) boundary. With the current way of requiring phys_avail[] fully fit into vm_phys_seg, this memory was ignored. Reported by: madpilot Reviewed by: markj Discussed with: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D31958	2021-09-16 20:01:19 +03:00
Mark Johnston	686aa9287c	swap_pager: Handle large swap_pager_reserve() requests This interface is used solely by md(4) when the MD_RESERVE flag is specified, as in `mdconfig -a -t swap -s 1G -o reserve`. It pre-allocates swap blocks for the entire object. The number of blocks to be reserved is specified as a vm_size_t, but swp_pager_getswapspace() can allocate at most INT_MAX blocks. vm_size_t also seems like the incorrect type to use here it refers only to the size of the VM object, not the size of a mapping. So: - change the type of "size" in swap_pager_reserve() to vm_pindex_t, and - clamp the requested number of blocks for a single swp_pager_getswapspace() call to INT_MAX. Reported by: syzkaller Reviewed by: dougm, alc, kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31875	2021-09-07 14:04:50 -04:00
Bjoern A. Zeeb	eccb516db8	vm: use __func__ for the correct function name In `fee2a2fa39` the KASSERTs in vm_page_unwire_noq() changed from "vm_page_unwire" to "vm_page_unref". While the former no longer was part of that function the latter does not exist as a function and is highly confusing when hit when using tools to lookup the functions and not doing a full-text search. Use %s __func__ for printing the function name, as that will do the right thing as code moves around and functions get renamed. Hit: while debugging a wired page leak with linuxkpi/iwlwifi Sponsored by: The FreeBSD Foundation Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D31635	2021-08-22 17:43:12 +00:00
Gordon Bergling	fa7a635f7e	Fix a few typos in source code comments - s/becase/because/ MFC after: 5 days	2021-08-14 09:06:09 +02:00
Mark Johnston	100949103a	uma: Add KMSAN hooks For now, just hook the allocation path: upon allocation, items are marked as initialized (absent M_ZERO). Some zones are exempted from this when it would otherwise raise false positives. Use kmsan_orig() to update the origin map for UMA and malloc(9) allocations. This allows KMSAN to print the return address when an uninitialized UMA item is implicated in a report. For example: panic: MSan: Uninitialized UMA memory from m_getm2+0x7fe Sponsored by: The FreeBSD Foundation	2021-08-10 21:27:54 -04:00
Mark Johnston	8978608832	amd64: Populate the KMSAN shadow maps and integrate with the VM - During boot, allocate PDP pages for the shadow maps. The region above KERNBASE is currently not shadowed. - Create a dummy shadow for the vm page array. For now, this array is not protected by the shadow map to help reduce kernel memory usage. - Grow shadows when growing the kernel map. - Increase the default kernel stack size when KMSAN is enabled. As with KASAN, sanitizer instrumentation appears to create stack frames large enough that the default value is not sufficient. - Disable UMA's use of the direct map when KMSAN is configured. KMSAN cannot validate the direct map. - Disable unmapped I/O when KMSAN configured. - Lower the limit on paging buffers when KMSAN is configured. Each buffer has a static MAXPHYS-sized allocation of KVA, which in turn eats 2*MAXPHYS of space in the shadow map. Reviewed by: alc, kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31295	2021-08-10 21:27:53 -04:00
Ka Ho Ng	de2e152959	Add vnode_pager_purge_range(9) KPI This KPI is created in addition to the existing vnode_pager_setsize(9) KPI. The KPI is intended for file systems that are able to turn a range of file into sparse range, also known as hole-punching. Sponsored by: The FreeBSD Foundation Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D27194	2021-08-05 22:52:26 +08:00
Konstantin Belousov	0ef5eee9d9	Add vn_lktype_write() and remove repetetive code that calculates vnode locking type for write. Reviewed by: khng, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D31405	2021-08-04 19:40:13 +03:00
Konstantin Belousov	041b7317f7	Add pmap_vm_page_alloc_check() which is the place to put MD asserts about allocated pages. On amd64, verify that allocated page does not belong to the kernel (text, data) or early allocated pages. Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D31121	2021-07-31 16:53:42 +03:00
Mark Johnston	4e8e26a004	redzone: Raise a compile error if KASAN is configured redzone(9) does some munging of the allocation to insert redzones before and after a valid memory buffer, but KASAN does not know about this and will raise false positives if both are configured. Until this is fixed, do not allow both to be configured. Note that KASAN provides similar checking on its own but currently does not force the creation of redzones for all UMA allocations; this should be addressed as well. Sponsored by: The FreeBSD Foundation	2021-07-23 10:47:13 -04:00
Mark Johnston	b0dfc48684	uma: Fix a few problems with KASAN integration - Ensure that all items returned by UMA are aligned to KASAN_SHADOW_SCALE (8). This was true in practice since smaller alignments are not used by any consumers, but we should enforce it anyway. - Use a non-zero code for marking redzones that appear naturally in items that are not a multiple of the scale factor in size. Currently we do not modify keg layouts to force the creation of redzones. - Use a non-zero code for marking freed per-CPU items, otherwise accesses of freed per-CPU items are not detected by the runtime. Sponsored by: The FreeBSD Foundation	2021-07-09 20:38:50 -04:00
Konstantin Belousov	5b10e79edb	Un-staticise vm_page_init_page() Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D30785	2021-06-17 16:58:44 +03:00
Mateusz Guzik	128e25842e	vm: add another pager private flag Move OBJ_SHADOWLIST around to let pager flags be next to each other. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D30258	2021-05-15 20:47:29 +00:00
Konstantin Belousov	28bc23ab92	tmpfs: dynamically register tmpfs pager Remove OBJT_SWAP_TMPFS. Move tmpfs-specific swap pager bits into tmpfs_subr.c. There is no longer any code to directly support tmpfs in sys/vm, most tmpfs knowledge is shared by non-anon swap object type implementation. The tmpfs-specific methods are provided by registered tmpfs pager, which inherits from the swap pager. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D30168	2021-05-13 20:13:34 +03:00
Konstantin Belousov	b730fd30b7	vm: Add KPI to dynamically register pagers Pager is allowed to inherit part of its implementation from the existing pager, which is done by copying non-NULL virtual method slots. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D30168	2021-05-13 20:12:29 +03:00
Konstantin Belousov	7079449b0b	sys/vm: remove several other uses of OBJT_SWAP_TMPFS Mostly in cases where OBJ_SWAP flag works as well, or by reversing the condition so that object types can be listed. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D30168	2021-05-13 20:10:35 +03:00
Konstantin Belousov	3e7a11ca21	vm_object_set_memattr(): handle all object types without listing them explicitly This avoids the need to know all existing object types in advance, by the cost of loosing the assert that unknown object type is handled in a sane manner. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D30168	2021-05-13 20:10:35 +03:00
Konstantin Belousov	00a3fe968b	vm_object_kvme_type(): reimplement by embedding kvme_type into pagerops Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D30168	2021-05-13 20:10:35 +03:00
Mark Johnston	9246b3090c	fork: Suspend other threads if both RFPROC and RFMEM are not set Otherwise, a multithreaded parent process may trigger races in vm_forkproc() if one thread calls rfork() with RFMEM set and another calls rfork() without RFMEM. Also simplify vm_forkproc() a bit, vmspace_unshare() already checks to see if the address space is shared. Reported by: syzbot+0aa7c2bec74c4066c36f@syzkaller.appspotmail.com Reported by: syzbot+ea84cb06937afeae609d@syzkaller.appspotmail.com Reviewed by: kib MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D30220	2021-05-13 08:33:23 -04:00
Mark Johnston	06d1fd9f42	swap_pager: Zero swap info before exporting to userspace Otherwise padding bytes are leaked. Reported by: KMSAN MFC after: 1 week Sponsored by: The FreeBSD Foundation	2021-05-12 12:52:05 -04:00
Konstantin Belousov	d474440ab3	Constify vm_pager-related virtual tables. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D30070	2021-05-07 17:08:03 +03:00
Konstantin Belousov	4b8365d752	Add OBJT_SWAP_TMPFS pager This is OBJT_SWAP pager, specialized for tmpfs. Right now, both swap pager and generic vm code have to explicitly handle swap objects which are tmpfs vnode v_object, in the special ways. Replace (almost) all such places with proper methods. Since VM still needs a notion of the 'swap object', regardless of its use, add yet another type-classification flag OBJ_SWAP. Set it in vm_object_allocate() where other type-class flags are set. This change almost completely eliminates the knowledge of tmpfs from VM, and opens a way to make OBJT_SWAP_TMPFS loadable from tmpfs.ko. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D30070	2021-05-07 17:08:03 +03:00
Konstantin Belousov	0d2dfc6fed	pagertab: use designated initializers Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D30070	2021-05-07 17:08:03 +03:00
Konstantin Belousov	838adc533f	Style enum obj_type Put each type into dedicated line, which makes addition of new types cleaner. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D30070	2021-05-07 17:08:03 +03:00
Konstantin Belousov	a7c198a24b	Implement vm_object_vnode() using vm_pager_getvp() Allow vp_heldp argument to be NULL, in which case the returned vnode is not held for tmpfs swap objects. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D30070	2021-05-07 17:08:03 +03:00
Konstantin Belousov	1390a5cbeb	Add pgo_freespace method Makes the code in vm_object collapse/page_remove cleaner Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D30070	2021-05-07 17:08:03 +03:00
Konstantin Belousov	192112b74f	Add pgo_getvp method This eliminates the staircase of conditions in vm_map_entry_set_vnode_text(). Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D30070	2021-05-07 17:08:03 +03:00
Konstantin Belousov	c23c555bc1	Add pgo_mightbedirty method Used to implement vm_object_mightbedirty() Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D30070	2021-05-07 17:08:03 +03:00
Konstantin Belousov	180bcaa46c	vm_pager: add pgo_set_writeable_dirty method specialized for swap and vnode pagers, and used to implement vm_object_set_writeable_dirty(). Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D30070	2021-05-07 17:08:03 +03:00
Konstantin Belousov	ee4211bca6	vm_pager: style some wrappers Fill lines with the function definitions. Use local var to shorten repeated extra-long expressions. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D30070	2021-05-07 17:08:02 +03:00
Konstantin Belousov	a0850dd057	swappagerops: slightly more style-compliant formatting Remove excess spaces from comments. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D30070	2021-05-07 17:08:02 +03:00
Mark Johnston	9a7c2de364	realloc: Fix KASAN(9) shadow map updates When copying from the old buffer to the new buffer, we don't know the requested size of the old allocation, but only the size of the allocation provided by UMA. This value is "alloc". Because the copy may access bytes in the old allocation's red zone, we must mark the full allocation valid in the shadow map. Do so using the correct size. Reported by: kp Tested by: kp Sponsored by: The FreeBSD Foundation	2021-05-05 17:12:51 -04:00
Alexander Motin	2760658b21	Improve UMA cache reclamation. When estimating working set size, measure only allocation batches, not free batches. Allocation and free patterns can be very different. For example, ZFS on vm_lowmem event can free to UMA few gigabytes of memory in one call, but it does not mean it will request the same amount back that fast too, in fact it won't. Update working set size on every reclamation call, shrinking caches faster under pressure. Lack of this caused repeating vm_lowmem events squeezing more and more memory out of real consumers only to make it stuck in UMA caches. I saw ZFS drop ARC size in half before previous algorithm after periodic WSS update decided to reclaim UMA caches. Introduce voluntary reclamation of UMA caches not used for a long time. For each zdom track longterm minimal cache size watermark, freeing some unused items every UMA_TIMEOUT after first 15 minutes without cache misses. Freed memory can get better use by other consumers. For example, ZFS won't grow its ARC unless it see free memory, since it does not know it is not really used. And even if memory is not really needed, periodic free during inactivity periods should reduce its fragmentation. Reviewed by: markj, jeff (previous version) MFC after: 2 weeks Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D29790	2021-05-02 19:45:23 -04:00
Konstantin Belousov	ecfbddf0cd	sysctl vm.objects: report backing object and swap use For anonymous objects, provide a handle kvo_me naming the object, and report the handle of the backing object. This allows userspace to deconstruct the shadow chain. Right now the handle is the address of the object in KVA, but this is not guaranteed. For the same anonymous objects, report the swap space used for actually swapped out pages, in kvo_swapped field. I do not believe that it is useful to report full 64bit counter there, so only uint32_t value is returned, clamped to the max. For kinfo_vmentry, report anonymous object handle backing the entry, so that the shadow chain for the specific mapping can be deconstructed. Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D29771	2021-04-19 21:32:01 +03:00
Mark Johnston	aabe13f145	uma: Introduce per-domain reclamation functions Make it possible to reclaim items from a specific NUMA domain. - Add uma_zone_reclaim_domain() and uma_reclaim_domain(). - Permit parallel reclamations. Use a counter instead of a flag to synchronize with zone_dtor(). - Use the zone lock to protect cache_shrink() now that parallel reclaims can happen. - Add a sysctl that can be used to trigger reclamation from a specific domain. Currently the new KPIs are unused, so there should be no functional change. Reviewed by: mav MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D29685	2021-04-14 13:03:34 -04:00
Mark Johnston	54f421f9e8	uma: Split bucket_cache_drain() to permit per-domain reclamation Note that the per-domain variant does not shrink the target bucket size. No functional change intended. MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2021-04-14 13:03:34 -04:00
Mark Johnston	2b914b85dd	kmem: Add KASAN state transitions Memory allocated with kmem_* is unmapped upon free, so KASAN doesn't provide a lot of benefit, but since allocations are always a multiple of the page size we can create a redzone when the allocation request size is not a multiple of the page size. MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D29458	2021-04-13 17:42:21 -04:00
Mark Johnston	244f3ec642	kstack: Add KASAN state transitions We allocate kernel stacks using a UMA cache zone. Cache zones have KASAN disabled by default, but in this case it makes sense to enable it. Reviewed by: andrew MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D29457	2021-04-13 17:42:21 -04:00
Mark Johnston	09c8cb717d	uma: Add KASAN state transitions - Add a UMA_ZONE_NOKASAN flag to indicate that items from a particular zone should not be sanitized. This is applied implicitly for NOFREE and cache zones. - Add KASAN call backs which get invoked: 1) when a slab is imported into a keg 2) when an item is allocated from a zone 3) when an item is freed to a zone 4) when a slab is freed back to the VM In state transitions 1 and 3, memory is poisoned so that accesses will trigger a panic. In state transitions 2 and 4, memory is marked valid. - Disable trashing if KASAN is enabled. It just adds extra CPU overhead to catch problems that are detected by KASAN. MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D29456	2021-04-13 17:42:21 -04:00
Mark Johnston	982693bb72	vm_fault: Shoot down multiply mapped COW source page mappings Reviewed by: kib, rlibby Discussed with: alc Approved by: so Security: CVE-2021-29626 Security: FreeBSD-SA-21:08.vm	2021-04-06 14:49:28 -04:00
Konstantin Belousov	89619b747b	Add sysctl debug.uma_reclaim Sponsored by: The FreeBSD Foundation MFC after: 1 week	2021-04-04 20:39:06 +03:00

1 2 3 4 5 ...

4562 commits