Commit graph

244 commits

Author SHA1 Message Date
Jason A. Harmening 2619c5ccfe Avoid waiting on physical allocations that can't possibly be satisfied
- Change vm_page_reclaim_contig[_domain] to return an errno instead
  of a boolean.  0 indicates a successful reclaim, ENOMEM indicates
  lack of available memory to reclaim, with any other error (currently
  only ERANGE) indicating that reclamation is impossible for the
  specified address range.  Change all callers to only follow
  up with vm_page_wait* in the ENOMEM case.

- Introduce vm_domainset_iter_ignore(), which marks the specified
  domain as unavailable for further use by the iterator.  Use this
  function to ignore domains that can't possibly satisfy a physical
  allocation request.  Since WAITOK allocations run the iterators
  repeatedly, this avoids the possibility of infinitely spinning
  in domain iteration if no available domain can satisfy the
  allocation request.

PR:		274252
Reported by:	kevans
Tested by:	kevans
Reviewed by:	markj
Differential Revision: https://reviews.freebsd.org/D42706
2023-12-23 23:01:40 -06:00
Andrew Turner 839999e7ef vm: Add kva_alloc_aligned
Add a function like kva_alloc that allows us to specify the alignment
of the virtual address space returned.

Reviewed by:	alc, kib, markj
Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D42788
2023-11-30 10:50:03 +00:00
Andrew Turner 8daee410d2 vm: Use vmem_xalloc in kva_alloc
The kernel_arena used in kva_alloc has the qcache disabled. vmem_alloc
will first try to use the qcache before falling back to vmem_xalloc.

Rather than trying to use the qcache in vmem_alloc just call
vmem_xalloc directly.

Reviewed by:	alc, kib, markj
Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D42831
2023-11-30 10:50:03 +00:00
Warner Losh 29363fb446 sys: Remove ancient SCCS tags.
Remove ancient SCCS tags from the tree, automated scripting, with two
minor fixup to keep things compiling. All the common forms in the tree
were removed with a perl script.

Sponsored by:		Netflix
2023-11-26 22:23:30 -07:00
Konstantin Belousov 8882b7852a add pmap_active_cpus()
For amd64, i386, arm, and riscv, i.e. all architectures except arm64,
the custom implementation is provided since we maintain the bitmask of
active CPUs anyway.

Arm64 uses somewhat naive iteration over CPUs and match current vmspace'
pmap with the argument. It is not guaranteed that vmspace->pmap is the
same as the active pmap, but the inaccuracy should be toleratable.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32360
2023-08-23 03:02:21 +03:00
Warner Losh 685dc743dc sys: Remove $FreeBSD$: one-line .c pattern
Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
2023-08-16 11:54:36 -06:00
Colin Percival 45cc8519f5 tslog: Annotate parts of SYSINIT cpu
Booting an amd64 kernel on Firecracker with 1 CPU and 128 MB of RAM,
SYSINIT cpu takes roughly 2770 us:
* 2280 us in vm_ksubmap_init
  * 535 us in kmem_malloc
    * 450 us in pmap_zero_page
  * 1720 us in pmap_growkernel
    * 1620 us in pmap_zero_page
* 80 us in bufinit
* 480 us in cpu_setregs
  * 430 us in cpu_setregs calling load_cr0

Much of this is hypervisor overhead: load_cr0 is slow because it traps
to the hypervisor, and 99% of the time in pmap_zero_page is spent when
we first touch the page, presumably due to the host Linux kernel
faulting in backing pages one by one.

Sponsored by:	https://www.patreon.com/cperciva
Differential Revision:	https://reviews.freebsd.org/D40327
2023-06-04 10:16:35 -07:00
John Baldwin f49fd63a6a kmem_malloc/free: Use void * instead of vm_offset_t for kernel pointers.
Reviewed by:	kib, markj
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D36549
2022-09-22 15:09:19 -07:00
Colin Percival b9fd884a8d sys/vm: Add TSLOG to some functions
The functions pbuf_init, kva_alloc, and keg_alloc_slab are significant
contributors to the kernel boot time when FreeBSD boots inside the
Firecracker VMM.  Instrument them so they show up on flamecharts.
2022-08-12 16:53:04 -07:00
Mark Johnston 540da48d83 vm_kern: Update KMSAN shadow maps when allocating kmem memory
This addresses a couple of false positive reports for memory returned by
malloc_large().

Sponsored by:	The FreeBSD Foundation
2022-06-20 12:48:13 -04:00
Mark Johnston e4bdb6857a vm_page: Handle VM_ALLOC_NORECLAIM in the contiguous page allocator
We added _NORECLAIM to request that kmem_alloc_contig_pages() not spend
time scanning physical memory for candidates to reclaim.  In some
situations the scanning can induce large amounts of undesirable latency,
and it's less important that the request be satisfied than it is that we
not spend many milliseconds scanning.

The problem extends to vm_reserv_reclaim_contig(), which unlike
vm_reserv_reclaim() may have to scan the entire list of partially
populated reservations.  Use VM_ALLOC_NORECLAIM to request that this
scan not be executed.[1]

As a side effect, this fixes a regression in 02fb0585e7 ("vm_page:
Drop handling of VM_ALLOC_NOOBJ in vm_page_alloc_contig_domain()")
where VM_ALLOC_CONTIG was not included in VPAC_FLAGS or VPANC_FLAGS even
though it is not masked by kmem_alloc_contig_pages().[2]

Reported by:	gallatin [1], glebius [2]
Reviewed by:	alc, glebius, kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32899
2021-11-11 14:26:41 -05:00
Mark Johnston a4667e09e6 Convert vm_page_alloc() callers to use vm_page_alloc_noobj().
Remove page zeroing code from consumers and stop specifying
VM_ALLOC_NOOBJ.  In a few places, also convert an allocation loop to
simply use VM_ALLOC_WAITOK.

Similarly, convert vm_page_alloc_domain() callers.

Note that callers are now responsible for assigning the pindex.

Reviewed by:	alc, hselasky, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31986
2021-10-19 21:22:56 -04:00
Mark Johnston aabe13f145 uma: Introduce per-domain reclamation functions
Make it possible to reclaim items from a specific NUMA domain.

- Add uma_zone_reclaim_domain() and uma_reclaim_domain().
- Permit parallel reclamations.  Use a counter instead of a flag to
  synchronize with zone_dtor().
- Use the zone lock to protect cache_shrink() now that parallel reclaims
  can happen.
- Add a sysctl that can be used to trigger reclamation from a specific
  domain.

Currently the new KPIs are unused, so there should be no functional
change.

Reviewed by:	mav
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29685
2021-04-14 13:03:34 -04:00
Mark Johnston 2b914b85dd kmem: Add KASAN state transitions
Memory allocated with kmem_* is unmapped upon free, so KASAN doesn't
provide a lot of benefit, but since allocations are always a multiple of
the page size we can create a redzone when the allocation request size
is not a multiple of the page size.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29458
2021-04-13 17:42:21 -04:00
Konstantin Belousov 89619b747b Add sysctl debug.uma_reclaim
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2021-04-04 20:39:06 +03:00
Konstantin Belousov 51a7be5f60 Style
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2021-04-04 20:39:06 +03:00
Mark Johnston 23e875fd97 vm_kern: Avoid sign extension in the KVA_QUANTUM definition
Otherwise, on a powerpc64 NUMA system with hashed page tables, the
first-level superpage reservation size is large enough that the value of
the kernel KVA arena import quantum, KVA_NUMA_IMPORT_QUANTUM, is
negative and gets sign-extended when passed to vmem_set_import().  This
results in a boot-time hang on such platforms.

Reported by:	bdragon
MFC after:	3 days
2021-02-22 15:50:09 -05:00
Ryan Stone 660344ca44 Add a VM flag to prevent reclaim on a failed contig allocation
If a M_WAITOK contig alloc fails, the VM subsystem will try to
reclaim contiguous memory twice before actually failing the
request.  On a system with 64GB of RAM I've observed this take
400-500ms before it finally gives up, and I believe that this
will only be worse on systems with even more memory.

In certain contexts this delay is extremely harmful, so add a flag
that will skip reclaim for allocation requests to allow those
paths to opt-out of doing an expensive reclaim.

Sponsored by: Dell Inc
Differential Revision:	https://reviews.freebsd.org/D28422
Reviewed by: markj, kib
2021-02-03 16:16:51 -05:00
Bryan Drewery 5fee468e83 Revert r368523 which fixed contig allocs waiting forever.
This needs to account for empty NUMA domains or domains which do not
satisfy the requested range.

Discussed with:	markj
2020-12-15 19:38:16 +00:00
Bryan Drewery bbfec1633b contig allocs: Don't retry forever on M_WAITOK.
This restores behavior from before domain iterators were added in
r327895 and r327896.

The vm_domainset_iter_policy() will do a vm_wait_doms() and then
restart its iterator when M_WAITOK is set.  It will also force
the containing loop to have M_NOWAIT.  So we get an unbounded
retry loop rather than the intended bounded retries that
kmem_alloc_contig_pages() already handles.

This also restores M_WAITOK to the vmem_alloc() call in
kmem_alloc_attr_domain() and kmem_alloc_contig_domain().

Reviewed by:	markj, kib
MFC after:	2 weeks
Sponsored by:	Dell EMC
Differential Revision:	https://reviews.freebsd.org/D27507
2020-12-10 20:44:29 +00:00
Mark Johnston 431fb8abd7 vm_phys: Try to clean up NUMA KPIs
It can useful for code outside the VM system to look up the NUMA domain
of a page backing a virtual or physical address, specifically when
creating NUMA-aware data structures.  We have _vm_phys_domain() for
this, but the leading underscore implies that it's an internal function,
and vm_phys.h has dependencies on a number of other headers.

Rename vm_phys_domain() to vm_page_domain(), and _vm_phys_domain() to
vm_phys_domain().  Make the latter an inline function.

Add _vm_phys.h and define struct vm_phys_seg there so that it's easier
to use in other headers.  Include it from vm_page.h so that
vm_page_domain() can be defined there.

Include machine/vmparam.h from _vm_phys.h since it depends directly on
some constants defined there.

Reviewed by:	alc
Reviewed by:	dougm, kib (earlier versions)
Differential Revision:	https://reviews.freebsd.org/D27207
2020-11-19 03:59:21 +00:00
Mark Johnston aea9103e06 Use a large kmem arena import size on NUMA systems.
This helps minimize internal fragmentation that occurs when 2MB imports
are interleaved across NUMA domains.  Virtually all KVA allocations on
direct map platforms consume more than one page, so the fragmentation
manifests as runs of 511 4KB page mappings in the kernel.

Reviewed by:	alc, kib
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26050
2020-08-26 14:31:48 +00:00
Mark Johnston 7efe14cb99 Commit a missing piece of r364302.
This had failed to apply due to a merge conflict.

Reported by:	Jenkins
MFC with:	r364302
2020-08-17 14:06:51 +00:00
Mark Johnston 7dd979dfef Remove the VM map zone.
Today, the zone is only used to allocate a trio of kernel maps: the
kernel map itself, and the exec and pipe submaps.  Maps for user
processes are dynamically allocated but are embedded in the vmspace
structure, which is allocated from its own zone.  Make the
aforementioned kernel maps statically allocated and get rid of the zone.

While here, remove a stale comment above vmspace_alloc() and change the
names of locks initialized in vm_map_init() to match vmspace_zinit().

Reported by:	alc
Reviewed by:	alc, kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26052
2020-08-17 13:02:01 +00:00
Mark Johnston 33655d9546 Factor out the kmem contig page alloc and reclamation code.
kmem_alloc_attr_domain() and kmem_alloc_contig_domain() duplicated each
other's page allocation and reclamation logic.  Place it in a single
function to make it easier to add additional consumers.  No functional
change intended.

Reviewed by:	jeff, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D24475
2020-04-21 16:01:44 +00:00
Mark Johnston 303b77029b Minimize conditional compilation for handling of M_EXEC.
This simplifies some planned changes.  No functional change intended.

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D24474
2020-04-21 15:55:28 +00:00
Jeff Roberson e9ceb9dd11 Don't release xbusy on kmem pages. After lockless page lookup we will not
be able to guarantee that they can be racquired without blocking.

Reviewed by:	kib
Discussed with:	markj
Differential Revision:	https://reviews.freebsd.org/D23506
2020-02-19 09:10:11 +00:00
Warner Losh 58aa35d429 Remove sparc64 kernel support
Remove all sparc64 specific files
Remove all sparc64 ifdefs
Removee indireeect sparc64 ifdefs
2020-02-03 17:35:11 +00:00
Jeff Roberson 91e31c3c08 Consistently use busy and vm_page_valid() rather than touching page bits
directly.  This improves API compliance, asserts, etc.

Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D23283
2020-01-23 04:54:49 +00:00
Jeff Roberson a81c400e75 Simplify VM and UMA startup by eliminating boot pages. Instead use careful
ordering to allocate early pages in the same way boot pages were but only
as needed.  After the KVA allocator has started up we allocate the KVA that
we consumed during boot.  This also makes the boot pages freeable since they
have vm_page structures allocated with the rest of memory.

Parts of this patch were written and tested by markj.

Reviewed by:	glebius, markj
Differential Revision:	https://reviews.freebsd.org/D23102
2020-01-16 05:01:21 +00:00
Mateusz Guzik a314aba874 vm: add missing CLTFLAG_MPSAFE annotations
This covers all vm/* files.
2020-01-12 05:08:57 +00:00
Jeff Roberson 3cf3b4e641 Make page busy state deterministic on free. Pages must be xbusy when
removed from objects including calls to free.  Pages must not be xbusy
when freed and not on an object.  Strengthen assertions to match these
expectations.  In practice very little code had to change busy handling
to meet these rules but we can now make stronger guarantees to busy
holders and avoid conditionally dropping busy in free.

Refine vm_page_remove() and vm_page_replace() semantics now that we have
stronger guarantees about busy state.  This removes redundant and
potentially problematic code that has proliferated.

Discussed with:	markj
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D22822
2019-12-22 06:56:44 +00:00
Mark Johnston f822c9e287 Apply mapping protections to preloaded kernel modules on amd64.
With an upcoming change the amd64 kernel will map preloaded files RW
instead of RWX, so the kernel linker must adjust protections
appropriately using pmap_change_prot().

Reviewed by:	kib
MFC after:	1 month
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D21860
2019-10-18 13:56:45 +00:00
Mark Johnston 9d75f0dc75 Map the vm_page array into KVA on amd64.
r351198 allows the kernel to use domain-local memory to back the vm_page
array (up to 2MB boundaries) and reserves a separate PML4 entry for that
purpose.  One consequence of that change is that the vm_page array is no
longer present in minidumps, which only adds pages mapped above
VM_MIN_KERNEL_ADDRESS.

To avoid the friction caused by having kernel data structures mapped
below VM_MIN_KERNEL_ADDRESS, map the vm_page array starting at
VM_MIN_KERNEL_ADDRESS instead of using a dedicated PML4 entry.

Reviewed by:	kib
Discussed with:	jeff
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21491
2019-09-03 13:18:51 +00:00
Mark Johnston acad79e66f Unconditionally enable debug.vm_lowmem.
It is useful for testing purposes to be able to drain UMA caches, so
do not limit the sysctl to DIAGNOSTIC kernels.

MFC after:	1 week
Sponsored by:	Netflix
2019-08-21 16:01:17 +00:00
Mark Johnston 88ea538a98 Replace uses of vm_page_unwire(m, PQ_NONE) with vm_page_unwire_noq(m).
These calls are not the same in general: the former will dequeue the
page if it is enqueued, while the latter will just leave it alone.  But,
all existing uses of the former apply to unmanaged pages, which are
never enqueued in the first place.  No functional change intended.

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D20470
2019-06-07 18:23:29 +00:00
Konstantin Belousov 9f70117263 Eliminate adj_free field from vm_map_entry.
Drop the adj_free field from vm_map_entry_t. Refine the max_free field
so that p->max_free is the size of the largest gap with one endpoint
in the subtree rooted at p. Change vm_map_findspace so that, first,
the address-based splay is restricted to tree nodes with large-enough
max_free value, to avoid searching for the right starting point in a
subtree where all the gaps are too small. Second, when the address
search leads to a tree search for the first large-enough gap, that gap
is the subject of a splay-search that brings the gap to the top of the
tree, so that an immediate insertion will take constant time.

Break up the splay code into separate components, one for searching
and breaking up the tree and another for reassembling it. Use these
components, and not splay itself, for linking and unlinking. Drop the
after-where parameter to link, as it is computed as a side-effect of
the splay search.

Submitted by:	Doug Moore <dougm@rice.edu>
Reviewed by:	markj
Tested by:	pho
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D17794
2019-03-29 16:53:46 +00:00
Mark Johnston 2b6010705c Improve vmem tuning for platforms without a direct map.
On platforms without a direct map (i.e., platforms without
UMA_MD_SMALL_ALLOC defined), the boundary tag allocator reserves a
number of tags for use when allocating a new slab of boundary tags,
as such platforms require free boundary tags in order to allocate
boundary tags.  r327899 increased the number of boundary tags required
for a KVA allocation in the worst case, and the aforementioned
reservation was not updated accordingly.  In some cases, this could
lead to a system hang.  Fix the problem by increasing this reservation.

Also reduce KVA_QUANTUM on systems lacking superpage support.
The previous import quantum (4MB with a 4KB page size) was quite large
for systems with limited KVA, and fragmentation in kernel_arena could
cause kernel memory allocation failures even with a substantial amount
of free KVA.

Reported and tested by:	jhibbits
Reviewed by:	alc, kib
No objections:	jeff
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D19337
2019-02-25 19:22:13 +00:00
Konstantin Belousov d22ff6e6a2 contigmalloc: handle M_EXEC.
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D19092
2019-02-07 02:00:23 +00:00
Mark Johnston 07702f72e5 Avoid specifying VM_PROT_EXECUTE in mappings from pipe_map and exec_map.
These submaps are used for mapping pipe buffers and execv() argument
strings respectively, so there's no need for such mappings to have
execute permissions.

Reported by:	jhb
Reviewed by:	alc, jhb, kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17827
2018-11-06 21:57:03 +00:00
Mark Johnston 9978bd996b Add malloc_domainset(9) and _domainset variants to other allocator KPIs.
Remove malloc_domain(9) and most other _domain KPIs added in r327900.
The new functions allow the caller to specify a general NUMA domain
selection policy, rather than specifically requesting an allocation from
a specific domain.  The latter policy tends to interact poorly with
M_WAITOK, resulting in situations where a caller is blocked indefinitely
because the specified domain is depleted.  Most existing consumers of
the _domain KPIs are converted to instead use a DOMAINSET_PREF() policy,
in which we fall back to other domains to satisfy the allocation
request.

This change also defines a set of DOMAINSET_FIXED() policies, which
only permit allocations from the specified domain.

Discussed with:	gallatin, jeff
Reported and tested by:	pho (previous version)
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17418
2018-10-30 18:26:34 +00:00
Mark Johnston 920239efde Fix some problems that manifest when NUMA domain 0 is empty.
- In uma_prealloc(), we need to check for an empty domain before the
  first allocation attempt, not after.  Fix this by switching
  uma_prealloc() to use a vm_domainset iterator, which addresses the
  secondary issue of using a signed domain identifier in round-robin
  iteration.
- Don't automatically create a page daemon for domain 0.
- In domainset_empty_vm(), recompute ds_cnt and ds_order after
  excluding empty domains; otherwise we may frequently specify an empty
  domain when calling in to the page allocator, wasting CPU time.
  Convert DOMAINSET_PREF() policies for empty domains to round-robin.
- When freeing bootstrap pages, don't count them towards the per-domain
  total page counts for now: some vm_phys segments are created before
  the SRAT is parsed and are thus always identified as being in domain 0
  even when they are not.  Then, when bootstrap pages are freed, they
  are added to a domain that we had previously thought was empty.  Until
  this is corrected, we simply exclude them from the per-domain page
  count.

Reported and tested by:	Rajesh Kumar <rajfbsd@gmail.com>
Reviewed by:	gallatin
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17704
2018-10-30 17:57:40 +00:00
Mark Johnston 4c29d2de67 Refactor domainset iterators for use by malloc(9) and UMA.
Before this change we had two flavours of vm_domainset iterators: "page"
and "malloc".  The latter was only used for kmem_*() and hard-coded its
behaviour based on kernel_object's policy.  Moreover, its use contained
a race similar to that fixed by r338755 since the kernel_object's
iterator was being run without the object lock.

In some cases it is useful to be able to explicitly specify a policy
(domainset) or policy+iterator (domainset_ref) when performing memory
allocations.  To that end, refactor the vm_dominset_* KPI to permit
this, and get rid of the "malloc" domainset_iter KPI in the process.

Reviewed by:	jeff (previous version)
Tested by:	pho (part of a larger patch)
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17417
2018-10-23 16:35:58 +00:00
Andrew Gallatin 30c5525b3c Allow empty NUMA memory domains to support Threadripper2
The AMD Threadripper 2990WX is basically a slightly crippled Epyc.
Rather than having 4 memory controllers, one per NUMA domain, it has
only 2  memory controllers enabled. This means that only 2 of the
4 NUMA domains can be populated with physical memory, and the
others are empty.

Add support to FreeBSD for empty NUMA domains by:

- creating empty memory domains when parsing the SRAT table,
    rather than failing to parse the table
- not running the pageout deamon threads in empty domains
- adding defensive code to UMA to avoid allocating from empty domains
- adding defensive code to cpuset to avoid binding to an empty domain
    Thanks to Jeff for suggesting this strategy.

Reviewed by:	alc, markj
Approved by:	re (gjb@)
Differential Revision:	https://reviews.freebsd.org/D1683
2018-10-01 14:14:21 +00:00
Mark Johnston 3d14a7bb43 Ensure that "domain" is initialized when vm_ndomains == 1.
Reported by:	alc
Approved by:	re (gjb)
2018-09-24 15:32:46 +00:00
Mark Johnston 969e147aff Ensure that imports into per-domain kmem arenas are KVA_QUANTUM-aligned.
The old code appears to assume that vmem_alloc() would import
size-aligned KVA chunks from the parent kernel_arena, but vmem doesn't
provide this guarantee.

Also remove the unused global RWX arena and add comments explaining why
we have per-domain arenas.

Reported by:	alc
Reviewed by:	alc, kib (previous version)
Approved by:	re (gjb)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17249
2018-09-20 18:29:55 +00:00
Mark Johnston 25ed23cfbb Change the domain selection policy in kmem_back().
Ensure that pages backing the same virtual large page come from the
same physical domain, as kmem_malloc_domain() does.

PR:		231038
Reviewed by:	alc, kib
Approved by:	re (gjb)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17248
2018-09-20 15:45:12 +00:00
Mark Johnston 1aed6d48a8 Move kernel vmem arena initialization to vm_kern.c.
This keeps the initialization coupled together with the kmem_* KPI
implementation, which is the main user of these arenas.

No functional change intended.

Reviewed by:	alc
Approved by:	re (gjb)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17247
2018-09-19 19:13:43 +00:00
Alan Cox 49bfa624ac Eliminate the arena parameter to kmem_free(). Implicitly this corrects an
error in the function hypercall_memfree(), where the wrong arena was being
passed to kmem_free().

Introduce a per-page flag, VPO_KMEM_EXEC, to mark physical pages that are
mapped in kmem with execute permissions.  Use this flag to determine which
arena the kmem virtual addresses are returned to.

Eliminate UMA_SLAB_KRWX.  The introduction of VPO_KMEM_EXEC makes it
redundant.

Update the nearby comment for UMA_SLAB_KERNEL.

Reviewed by:	kib, markj
Discussed with:	jeff
Approved by:	re (marius)
Differential Revision:	https://reviews.freebsd.org/D16845
2018-08-25 19:38:08 +00:00
Alan Cox 83a90bffd8 Eliminate kmem_malloc()'s unused arena parameter. (The arena parameter
became unused in FreeBSD 12.x as a side-effect of the NUMA-related
changes.)

Reviewed by:	kib, markj
Discussed with:	jeff, re@
Differential Revision:	https://reviews.freebsd.org/D16825
2018-08-21 16:43:46 +00:00