LINKER_LOAD_FILE() calls linker_load_dependencies(), which returns
EEXIST if the module to be loaded has already been compiled into the
kernel. Since the format of the module has been recognized by that
point, there is no need to retry loading with a different linker;
doing so makes userland see the misleading error number ENOEXEC.
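The retry behaviour described above can be modelled with a small
userland sketch. This is not the kernel's code: load_fn, try_loaders()
and the two stand-in loaders are hypothetical names, and the model
assumes that success and EEXIST are the only results that mean "format
recognized".

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical loader callback: returns 0 or an errno value. */
typedef int (*load_fn)(const char *path);

/*
 * Try each loader in turn.  Success or EEXIST means the format was
 * recognized, so the result is final.  Only if every loader rejects
 * the file does the caller see ENOEXEC.
 */
static int
try_loaders(const load_fn *loaders, size_t n, const char *path)
{
    for (size_t i = 0; i < n; i++) {
        int error = loaders[i](path);

        if (error == 0 || error == EEXIST)
            return (error);
    }
    return (ENOEXEC);
}

/* Stand-in loaders used only to exercise the loop. */
static int
loader_wrong_format(const char *path)
{
    (void)path;
    return (ENOEXEC);
}

static int
loader_already_present(const char *path)
{
    (void)path;
    return (EEXIST);
}
```

Without the EEXIST check, the loop would fall through to the final
return and report ENOEXEC for a module that is already in the kernel.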
PR: 274936
Reviewed by: dfr
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D42474
Since newnfs_copycred() calls crsetgroups() which in turn calls
crextend() which might do a malloc(M_WAITOK), newnfs_copycred()
cannot be called with a mutex held. Fortunately, the malloc()
call is rarely done, since XU_GROUPS is 16 and the NFS client
uses a maximum of 17 (only 17 groups will cause the malloc() to
be called). Further, it is only a problem if the malloc() tries
to sleep(). As such, this bug does not seem to have caused
problems in practice.
This patch fixes the one place in the NFS client where
newnfs_copycred() is called while a mutex is held by moving the
call to after where the mutex is released.
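A userland sketch of the general pattern follows: take a snapshot
under the lock, drop the lock, and only then make the call that may
sleep. All names here (toy_mtx, shared, sleepable_copy(),
copy_groups()) are hypothetical, and the toy lock only tracks whether
it is held; this is not the NFS client code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MAXGROUPS 17

/* Toy non-sleepable lock; it only records whether it is held. */
struct toy_mtx { bool held; };

static void toy_lock(struct toy_mtx *m) { assert(!m->held); m->held = true; }
static void toy_unlock(struct toy_mtx *m) { assert(m->held); m->held = false; }

struct shared {
    struct toy_mtx mtx;
    int ngroups;
    int groups[TOY_MAXGROUPS];
};

/* Stand-in for a call that may sleep: illegal while the lock is held. */
static int *
sleepable_copy(const struct toy_mtx *m, const int *src, int n)
{
    int *dst;

    assert(!m->held);
    dst = malloc(sizeof(*dst) * (size_t)n);
    if (dst != NULL)
        memcpy(dst, src, sizeof(*dst) * (size_t)n);
    return (dst);
}

/* Snapshot under the lock, then allocate after the lock is released. */
static int *
copy_groups(struct shared *s, int *np)
{
    int snap[TOY_MAXGROUPS];

    toy_lock(&s->mtx);
    assert(s->ngroups > 0 && s->ngroups <= TOY_MAXGROUPS);
    *np = s->ngroups;
    memcpy(snap, s->groups, sizeof(snap[0]) * (size_t)*np);
    toy_unlock(&s->mtx);

    return (sleepable_copy(&s->mtx, snap, *np));
}
```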
Found by inspection while working on an experimental patch.
MFC after: 2 weeks
This should make crash reports a bit more useful without having to ask
for additional information.
Reviewed by: imp
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D42465
Commit 4692906480 made e6000sw's
implementation of miibus_(read|write)reg assume that the softc lock is
held. I presume that is to avoid lock recursion in e6000sw_attach() ->
e6000sw_attach_miibus() -> mii_attach() -> MIIBUS_READREG().
However, the lock assertion in e6000sw_readphy_locked() can fail if a
different driver uses the interface to probe registers. Work around the
problem by providing implementations which lock the softc if it is not
already locked.
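The "lock only if not already locked" wrapper can be sketched as
below. This is a single-threaded model with hypothetical names
(toy_softc, toy_readphy_locked(), toy_readphy()); a real driver would
have to ask whether the current thread owns the lock rather than read
a plain flag.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NREGS 32

struct toy_softc {
    bool locked;                /* stands in for "owned by this thread" */
    int phy_regs[TOY_NREGS];
};

/* Inner routine: the caller must already hold the lock. */
static int
toy_readphy_locked(struct toy_softc *sc, int reg)
{
    assert(sc->locked);
    assert(reg >= 0 && reg < TOY_NREGS);
    return (sc->phy_regs[reg]);
}

/* Wrapper: take the lock only when the caller does not hold it yet. */
static int
toy_readphy(struct toy_softc *sc, int reg)
{
    bool was_locked = sc->locked;
    int val;

    if (!was_locked)
        sc->locked = true;
    val = toy_readphy_locked(sc, reg);
    if (!was_locked)
        sc->locked = false;
    return (val);
}
```

The wrapper leaves the lock state exactly as it found it, so it is
usable both from the attach path (lock held) and from an outside
caller (lock not held).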
PR: 274795
Fixes: 4692906480 ("e6000sw: add readphy and writephy wrappers")
Reviewed by: kp, imp
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D42466
Start reporting nvme errors from devices, like we report ata and scsi
errors.
Sponsored by: Netflix
Reviewed by: mav, jhb
Differential Revision: https://reviews.freebsd.org/D41086
As part of transaction group commit, dsl_pool_sync() sequentially calls
dsl_dataset_sync() for each dirty dataset, which subsequently calls
dmu_objset_sync(). dmu_objset_sync() in turn uses up to 75% of CPU
cores to run sync_dnodes_task() in taskq threads to sync the dirty
dnodes (files).
There are two problems:
1. Each ZVOL in a pool is a separate dataset/objset having a single
dnode. This means the objsets are synchronized serially, which
leads to a bottleneck of ~330K blocks written per second per pool.
2. In the case of multiple dirty dnodes/files on a dataset/objset on a
big system they will be sync'd in parallel taskq threads. However,
it is inefficient to use 75% of CPU cores of a big system to do
that, because of (a) bottlenecks on a single write issue taskq, and
(b) allocation throttling. In addition, were it not for the
allocation throttling, which sorts write requests by bookmark
(logical address), writes for different files could reach the space
allocators interleaved, leading to unwanted fragmentation.
The solution to both problems is to always sync no more and (if
possible) no fewer dnodes at the same time than there are allocators
in the pool.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Edmund Nadolski <edmund.nadolski@ixsystems.com>
Closes #15197
sys/cam/cam.h includes opt_cam.h, so none of the clients need to do
this. cam.h does all the right dancing to conditionally include
opt_cam.h only when it makes sense. It generally only matters when
cam_debug.h is included (it must be included before that). Many of the
stray opt_cam.h includes were after cam_debug.h which would be a problem
were it not included in cam/cam.h. The other users of CAM options that
aren't debug all already include cam/cam.h.
Also trim unneeded sys/cdefs.h includes from the files touched.
Sponsored by: Netflix
Mainly, provide a little more detail on the caller's responsibilities.
Suggested by: kib, jhb
Reviewed by: kib, markj
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D42458
KIOXIA CD8 SSDs routinely take ~25 seconds to delete a non-empty
namespace. In some cases, such as hot-plug, it takes longer, triggering
a timeout and controller reset after just 30 seconds. Linux has for
many years had a separate 60-second timeout for the admin queue. This
patch does the same; being consistent is a benefit in itself.
Sponsored by: iXsystems, Inc.
Reviewed by: imp
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D42454
As VM_LAST was included in the array, the size check had to always pass.
While here modernize the assert itself.
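A hedged illustration of why such a check can never fail: the enum,
the arrays and the NITEMS() macro below are hypothetical, and the
sketch assumes the sentinel was placed in the array with a designated
initializer. With the sentinel slot present, the array length is
pinned to the sentinel's index plus one, no matter how many real
entries are missing.

```c
#include <stddef.h>

#define NITEMS(a) (sizeof(a) / sizeof((a)[0]))

enum toy_stat { TS_ALPHA, TS_BETA, TS_GAMMA, TS_LAST };

/*
 * TS_GAMMA was forgotten, but the TS_LAST slot still forces the
 * array to have TS_LAST + 1 elements, so this assert proves nothing.
 */
static const char *const with_sentinel[] = {
    [TS_ALPHA] = "alpha",
    [TS_BETA] = "beta",
    [TS_LAST] = NULL,
};
_Static_assert(NITEMS(with_sentinel) == TS_LAST + 1,
    "always true while the sentinel slot is present");

/*
 * Without the sentinel the length tracks the highest named entry, so
 * dropping the last name makes the assert fail at compile time.
 */
static const char *const without_sentinel[] = {
    [TS_ALPHA] = "alpha",
    [TS_BETA] = "beta",
    [TS_GAMMA] = "gamma",
};
_Static_assert(NITEMS(without_sentinel) == TS_LAST,
    "fails to compile if the last name is missing");
```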
Sponsored by: Rubicon Communications, LLC ("Netgate")
If we fail to find the pfctl family we should not attempt to make the
call. That means that either pf is not loaded, or it's a very old (i.e.
pre-netlink) version.
Reported by: manu
Sponsored by: Rubicon Communications, LLC ("Netgate")
Current cleanup code assumes that all the fields are allocated and/or set up by
the time cleanup is called, but this is not always true: a failure in mid-setup
of the device will cause the functions to be called with possibly uninitialized
fields.
Fix the functions to cope with such state, while also attempting to make the
cleanup idempotent.
Finally fix an error path during setup that would not mark the device as
closed, and hence prevented the kernel from finishing booting.
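The idempotent-cleanup idea can be sketched in plain C. The structure
and function names are hypothetical and do not reflect the netfront
data structures: every field starts in a known "not set up" state, and
cleanup returns each field to that state, so it is safe after a
partial setup and safe to call twice.

```c
#include <stdlib.h>

struct toy_queue {
    int *ring;      /* NULL when not allocated */
    int *bufs;      /* NULL when not allocated */
    int irq;        /* -1 when not bound */
};

static void
toy_init(struct toy_queue *q)
{
    q->ring = NULL;
    q->bufs = NULL;
    q->irq = -1;
}

/* Copes with partially set up state and can be called repeatedly. */
static void
toy_cleanup(struct toy_queue *q)
{
    if (q->irq != -1)
        q->irq = -1;            /* a real driver would unbind here */
    free(q->bufs);              /* free(NULL) is a no-op */
    q->bufs = NULL;
    free(q->ring);
    q->ring = NULL;
}

/* fail_midway simulates an error after the first allocation. */
static int
toy_setup(struct toy_queue *q, int fail_midway)
{
    toy_init(q);
    q->ring = calloc(16, sizeof(*q->ring));
    if (q->ring == NULL || fail_midway)
        goto fail;
    q->bufs = calloc(16, sizeof(*q->bufs));
    if (q->bufs == NULL)
        goto fail;
    q->irq = 5;
    return (0);
fail:
    toy_cleanup(q);
    return (-1);
}
```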
Fixes: 96375eac94 ("xen-netfront: add multiqueue support")
Sponsored by: Citrix Systems R&D
The current sizing of the array used to store grant table frames is broken, as
the calculation:
    max_nr_glist_frames = (boot_max_nr_grant_frames *
        GREFS_PER_GRANT_FRAME /
        (PAGE_SIZE / sizeof(grant_ref_t)));
is plain bogus: for one, grant_ref_t is the type of the grant reference, not
the entry used to store such references in the grant frames. But even if
the above calculation is switched to use grant_entry_v1_t, it would end up as:
    max_nr_glist_frames = (boot_max_nr_grant_frames *
        (PAGE_SIZE / sizeof(grant_entry_v1_t)) /
        (PAGE_SIZE / sizeof(grant_entry_v1_t)));
which is pointless (note GREFS_PER_GRANT_FRAME has been expanded to (PAGE_SIZE
/ sizeof(grant_entry_v1_t))).
Just use boot_max_nr_grant_frames directly to size the grant table frames
array.
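A worked check of the arithmetic, under stated assumptions: a 4 KiB
page, a 4-byte grant reference and an 8-byte v1 grant entry. The toy_
types below are stand-ins defined only for this sketch, not the Xen
headers.

```c
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u

/* Assumed sizes: 4-byte reference, 8-byte v1 entry. */
typedef uint32_t toy_grant_ref_t;
struct toy_grant_entry_v1 {
    uint16_t flags;
    uint16_t domid;
    uint32_t frame;
};

#define TOY_GREFS_PER_FRAME \
    (TOY_PAGE_SIZE / sizeof(struct toy_grant_entry_v1))

/* Old sizing: multiplies by 512 and divides by 1024, i.e. halves it. */
static unsigned
old_glist_frames(unsigned boot_max)
{
    return ((unsigned)(boot_max * TOY_GREFS_PER_FRAME /
        (TOY_PAGE_SIZE / sizeof(toy_grant_ref_t))));
}

/* New sizing: one array slot per grant frame. */
static unsigned
new_glist_frames(unsigned boot_max)
{
    return (boot_max);
}
```

With these sizes and 64 boot grant frames, the old expression yields
32 slots while 64 are needed.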
Fixes: 30d1eefe39 ("Import OS interfaces to Xen services.")
Sponsored by: Citrix Systems R&D
The tun interface triggers the bpf hook when a packet is transmitted,
whereas the tap interface triggers it when the packet is read from the
character device. This is inconsistent.
Fix the tap device so that it behaves like the tun device.
This is needed for adding support for the tap device to packetdrill.
Reviewed by: kevans, rscheff
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D42467
This patch allows the IPPROTO_UDPLITE-level socket options
UDPLITE_SEND_CSCOV and UDPLITE_RECV_CSCOV to be used on
AF_INET6 sockets in addition to AF_INET sockets.
Reviewed by: ae, rscheff
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D42430
The very few places that rely on malloc/calloc of a zero-size region
won't attempt to dereference it, so just return NULL rather than rolling
the dice with the underlying malloc implementation.
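A minimal sketch of the policy, using hypothetical wrapper names
(toy_malloc(), toy_calloc()) layered over the C library allocator: a
zero-size request gets NULL deterministically instead of whatever the
underlying implementation happens to return.

```c
#include <stdlib.h>

/* Zero-size requests return NULL instead of deferring to malloc(). */
static void *
toy_malloc(size_t size)
{
    if (size == 0)
        return (NULL);
    return (malloc(size));
}

static void *
toy_calloc(size_t number, size_t size)
{
    if (number == 0 || size == 0)
        return (NULL);
    return (calloc(number, size));
}
```

Callers that treat NULL as an allocation failure would need to check
the requested size first; the commit message states that the few
zero-size callers never dereference the result.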
Reported by: brooks, Shawn Webb
We have not had gdb 6.1 in the base system for some time; there is no
need to check for it.
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34478
The previous code would correctly parse strings including quotation
marks (") or backslashes (\), but the tests when creating the export
include them in the final string. This prevents exporting paths
with embedded spaces, for example "/exports/with space". Trying
results in log lines resembling:
mountd[1337]: bad exports list line '/exports/with\ space':
/exports/with\ space: lstat() failed: No such file or directory.
Turns out that when creating its exports list, zfs escapes strings
in a format compatible with vis(3). Since I expect that zfs sharenfs
is the dominating use case for generating an exports list, use
strunvis(3) to parse the export path. The result is lines like the
following allowing spaces:
/exports/with\040space -network 192.168.0 -mask 255.255.255.0
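To illustrate what decoding the "\040" form involves, here is a
hand-rolled decoder for three-digit octal escapes only. It is a
stand-in written for this example, not strunvis(3) itself, which
handles many more encodings; decode_octal_escapes() is a hypothetical
name.

```c
#include <stddef.h>

static int
is_octal(char c)
{
    return (c >= '0' && c <= '7');
}

/*
 * Decode "\ooo" escapes from src into dst.  Returns the decoded
 * length, or -1 if dst is too small.  Any other backslash sequence
 * is copied through unchanged.
 */
static int
decode_octal_escapes(char *dst, size_t dstlen, const char *src)
{
    size_t n = 0;

    if (dstlen == 0)
        return (-1);
    while (*src != '\0') {
        char c = *src;

        /* The first digit is limited to 0-3 so the value fits a byte. */
        if (c == '\\' && src[1] >= '0' && src[1] <= '3' &&
            is_octal(src[2]) && is_octal(src[3])) {
            c = (char)(((src[1] - '0') << 6) |
                ((src[2] - '0') << 3) | (src[3] - '0'));
            src += 4;
        } else {
            src++;
        }
        if (n + 1 >= dstlen)
            return (-1);
        dst[n++] = c;
    }
    dst[n] = '\0';
    return ((int)n);
}
```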
A man page update will be done as a separate commit.
MFC after: 1 month
Reviewed by: rmacklem
Differential Revision: https://reviews.freebsd.org/D42432
The indirection used by bsd.progs.mk in setting META_XTRAS
means the value needs to be passed in the environment to
gendirdeps.mk, as any expansion before then will be empty.
Remove a now misleading comment from bsd.progs.mk
before it includes bsd.prog.mk.
Update gendirdeps.mk to accommodate this.
Reviewed by: stevek
When the cross-mount walking logic in vfs_lookup() was factored into
a separate function, the main cross-mount traversal loop was changed
from a do...while loop conditional on the current vnode having
VIRF_MOUNTPOINT set to an unconditional for(;;) loop. For the
unionfs 'crosslock' case in which the vnode may be re-locked, this
meant that continuing the loop upon finding inconsistent
v_mountedhere state would no longer branch to a check that the vnode
is in fact still a mountpoint. This would in turn lead to over-
iteration and, for INVARIANTS builds, a failed assert on the next
iteration.
Fix this by restoring the previous loop behavior.
Reported by: pho
Tested by: pho
Fixes: 80bd5ef070
MFC after: 1 week
This feature is marked as ZFEATURE_FLAG_READONLY_COMPAT and so is
irrelevant for read-only pool imports by the loader:
"com.delphix:spacemap_v2"
This should cause no functional changes, just a code cleanup.
Sorry, I missed it in the previous commit.
MFC after: 2 months
Add compat.aarch32 tunables for maxssiz, maxdsiz, and maxvmem.
Set the default values to the same as for amd64.
Fix freebsd32 sysentvec on arm64 to provide sv_maxssiz and sv_fixlimit.
PR: 274705
Reviewed by: markj
Tested by: fuz
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D42451
These features are marked as ZFEATURE_FLAG_READONLY_COMPAT and so are
irrelevant for read-only pool imports by the loader:
"com.datto:resilver_defer",
"com.delphix:obsolete_counts",
"com.delphix:spacemap_histogram",
"com.delphix:zpool_checkpoint",
"com.intel:allocation_classes",
"org.zfsonlinux:allocation_classes"
This should cause no functional changes, just a code cleanup.
MFC after: 2 months
Add AMD Zen 4 (znver4) to the list of valid "Intel x86 CPU types"
Reviewed by: emaste
Approved by: emaste
Differential Revision: https://reviews.freebsd.org/D41518
When the scheduler is stopped, mtx_unlock() turns into a no-op, so the
loop
    while (mtx_owned(&Giant))
        mtx_unlock(&Giant);
runs forever if the calling thread has Giant locked.
Reviewed by: mhorne
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D42460
When IPv6 support was added to pfsync, PFSYNC_MINPKT increased such that
we always allocate enough space for either IPv4 or IPv6 headers. IPv6
headers are 20 bytes larger than IPv4 headers. When pfsync_sendout()
does its thing, it ends up allocating enough space for either; thus when
transmitting an IPv4 packet, the last 20 bytes of the buffer are left
uninitialized.
Fix the problem by stashing the length in a local variable and adjusting
it depending on the address family in use.
While here, just zero the entire buffer in one go rather than being
careful to initialize each subheader. This seems simpler and less error
prone.
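The length adjustment and whole-buffer zeroing can be sketched as
follows. The constants and names are illustrative only: the 20- and
40-byte IP header sizes are standard, while TOY_PAYLOAD_HDRLEN and
toy_build_packet() are hypothetical and do not reflect pfsync's real
layout.

```c
#include <stddef.h>
#include <string.h>

enum {
    TOY_IP4_HDRLEN = 20,
    TOY_IP6_HDRLEN = 40,
    TOY_PAYLOAD_HDRLEN = 8      /* hypothetical protocol header */
};

/* Worst-case minimum: sized for the larger (IPv6) header. */
#define TOY_MINPKT (TOY_IP6_HDRLEN + TOY_PAYLOAD_HDRLEN)

enum toy_af { TOY_AF_INET, TOY_AF_INET6 };

/*
 * Prepare buf for transmission and return the number of bytes that
 * belong to the packet, or 0 if buf is too small.
 */
static size_t
toy_build_packet(unsigned char *buf, size_t buflen, enum toy_af af)
{
    size_t len = TOY_MINPKT;

    if (buflen < TOY_MINPKT)
        return (0);

    /* An IPv4 packet is 20 bytes shorter than the worst case. */
    if (af == TOY_AF_INET)
        len -= TOY_IP6_HDRLEN - TOY_IP4_HDRLEN;

    /* Zero everything up front so no byte is left uninitialized. */
    memset(buf, 0, buflen);

    /* First byte of the IP header: version 4 with IHL 5, or version 6. */
    buf[0] = (af == TOY_AF_INET) ? 0x45 : 0x60;
    return (len);
}
```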
Reported by: KMSAN
Reviewed by: kp
Fixes: 6fc7fc2dbb ("pfsync: transport over IPv6")
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D42461
Otherwise a KMSAN report (which panics the system by default) could
trigger a recursive panic.
MFC after: 1 week
Fixes: ca6cd604c8 ("kmsan: Use the correct origin bytes in kmsan_check_arg()")
net/frr[89] revealed an interesting edge-case on arm when dynamically
linking a shared library that declares more than one static TLS variable
with at least one using the "initial-exec" TLS model. In the case
of frr[89], this library was libfrr.so which essentially does the
following:
    #include <stdio.h>
    #include "lib.h"

    static __thread int *a
        __attribute__((tls_model("initial-exec")));

    void lib_test()
    {
        static __thread int b = -1;

        printf("&a = %p\n", &a);
        printf(" a = %p\n", a);
        printf("\n");
        printf("&b = %p\n", &b);
        printf(" b = %d\n", b);
    }
The library allocates a file-scoped `static __thread` pointer with
tls_model("initial-exec") and later a block-scoped TLS int. Notice that
in the above minimal reproducer, `b == -1`. The relocation process does
the wrong thing and ends up pointing both `a` and `b` at the same place
in memory.
The output of the above in the broken state is:
    &a = 0x4009c018
     a = 0xffffffff
    &b = 0x4009c018
     b = -1
With the patch applied, the output becomes:
    &a = 0x4009c01c
     a = 0x0
    &b = 0x4009c018
     b = -1
Reviewed by: kib
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D42415/
Add an implementation of ieee80211_add_vhtcap() which works based on
information derived from the vap (and possibly channel/band but we do
not support that yet in net80211). This is needed because scan
requests in LinuxKPI require this information at times before we have
a BSS.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Reviewed by: adrian, cc
Differential Revision: https://reviews.freebsd.org/D42422