fork1() does not behave if called under Giant. For instance, it might
need to call thread_suspend_check() which explicitly verifies that Giant
is not locked. On the other hand, the kthread KPI is often called from
SYSINIT() which is still Giant-locked.
Correct this by dropping Giant in kthread_add() and kproc_create().
Reported by: pho
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D41694
This vnet loader tunable is defined with SYSCTL_PROC, thus will not be
initialized by kernel on vnet creating and will always have the default
value TCP_FASTOPEN_CCACHE_BUCKET_LIMIT_DEFAULT.
Fix by fetching the value from the corresponding kernel environment during
vnet constructing.
PR: 273509
Reviewed by: #transport, tuexen
Fixes: c560df6f12 This is an implementation of the client side of TCP Fast Open (TFO) [RFC7413]
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D41691
When run in a jail, /dev/mdctl is missing. So skip any tests that use
mdconfig or mdmfs with md in this case: they can't possibly work. This
is in line with other tests that test for presence of required features
and skip if they aren't present. I did this instead of checking for
jails so they can still run in jails that allow creation of md devices.
Sponsored by: Netflix
Overflow in cache_changesize would make the value flip to 0 and stay
there as 0 << 1 does not do anything.
Note callers limit the outcome to something below u_int.
Also note there entire vnode handling thing both in vfs layer as a whole
and this file can't decide whether to long, u_long or u_int.
Notable upstream pull request merges:
#15018 Increase limit of redaction list by using spill block
#15161 Make zoned/jailed zfsprops(7) make more sense
#15216 Relax error reporting in zpool import and zpool split
#15218 Selectable block allocators
#15227 ZIL: Tune some assertions
#15228 ZIL: Revert zl_lock scope reduction
#15233 ZIL: Change ZIOs issue order
Obtained from: OpenZFS
OpenZFS commit: 95f71c019d
ZFS historically has had several space allocators that were
dynamically selectable. While these have been retained in
OpenZFS, only a single allocator has been statically compiled
in. This patch compiles all allocators for OpenZFS and provides
a module parameter to allow for manual selection between them.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Edmund Nadolski <edmund.nadolski@ixsystems.com>
Closes#15218
For zpool import and zpool split, zpool_enable_datasets is called
to mount and share all datasets in a pool. If there is an error
while mounting or sharing any dataset in the pool, the status of
import or split is reported as failure. However, the changes do
show up in zpool list.
This commit updates the error reporting in zpool import and zpool
split path. More descriptive messages are shown to user in case
there is an error during mount or share. Errors in mount or share
do not effect the overall status of zpool import and zpool split.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes#15216
If we fail to create a proc entry in spl_proc_init() we may end up
calling unregister_sysctl_table() twice: one in the failure path of
spl_proc_init() and another time during spl_proc_fini().
Avoid the double call to unregister_sysctl_table() and while at it
refactor the code a bit to reduce code duplication.
This was accidentally introduced when the spl code was
updated for Linux 6.5 compatibility.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Closes#15234Closes#15235
In zil_lwb_write_issue(), after issuing lwb_root_zio/lwb_write_zio,
we have no right to access lwb->lwb_child_zio. If it was not there,
the first two ZIOs may have already completed and freed the lwb.
ZIOs issue in opposite order from children to parent should keep
the lwb valid till the end, since the lwb can be freed only after
lwb_root_zio completion callback.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes#15233
While I have no reports of it, I suspect possible use-after-free
scenario when zil_commit_waiter() tries to dereference zcw_lwb
for lwb already freed by zil_sync(), while zcw_done is not set.
Extension of zl_lock scope as it was originally should block
zil_sync() from freeing the lwb, closing this race.
This reverts #14959 and couple chunks of #14841.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes#15228
In zil_free_lwb() we should first assert lwb_state or the rest of
assertions can be misleading if it is false.
Add lwb_state assertions in zil_lwb_add_block() to make sure we are
not trying to add elements to lwb_vdev_tree after it was processed.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes#15227
Switch to using db_addr_t to hold frame pointer values until they are
verified to be suitably aligned.
Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D41532
If a parent downstream port doesn't support ARI, the code would try to
create VFs anyway but then all PCI config space access to those VFs
would fail.
Tested by: np
Sponsored by: Chelsio Communications
While here, also update a mention of ANSI C.
Sponsored by: Klara, Inc.
Reviewed by: kevans, markj
Differential Revision: https://reviews.freebsd.org/D41686
This option replaces WITH_INIT_ALL_PATTERN and WITH_INIT_ALL_ZERO with
INIT_ALL=pattern and INIT_ALL=zero respectively. As these are
relatively rarely used options no backwards compatibility is
implemented.
Reviewed by: emaste
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D41675
This will enable alternative mallocs to be included in the tree and
selected by setting LIBC_MALLOC. As there is only one today (jemalloc)
this option does nothing, but we expect to add other implementations
in the future. This will also reduce diffs to CheriBSD.
Reviewed by: emaste
Differential Revision: https://reviews.freebsd.org/D41660
Ignore OPT_* values in showconfig out in exising code paths and add
a new path to include descriptions for each. For now, hardcode the
description contents rather than attempting to generate it. This runs
the risk of docs getting out of date, limits the amount of new shell
code added today while a lua rewrite is nearly ready to land.
This change requires a followup commit to enable OPT_* values in
"make showconfig" in order to actually find group options.
Reviewed by: emaste
Differential Revision: https://reviews.freebsd.org/D41681
Support group options where 1 of n values will be selected (or a default
value will be used). After processing, an OPT_FOO will be set to one
value from __FOO_OPTIONS for each FOO in __SINGLE_OPTIONS. If the user
sets FOO that value will be used, otherwise __FOO_DEFAULT will be used.
Options that don't work an a particular system can be remapped to an
alternative using BROKEN_SINGLE_OPTIONS which can be set to a list of
3-tuples of the form:
OPTION broken_value replacement_value
This is somewhat inspired by OPTIONS_SINGLE from ports, but the
structure is quite different with a per-option variable in the style of
MK_FOO={yes,no}.
Reviewed by: imp, emaste
Differential Revision: https://reviews.freebsd.org/D41659
Now armv8_crypto is using FPU_KERN_NOCTX, this results in a kernel panic
in armv8_crypto.c:armv8_crypto_cipher_setup:
panic: recursive fpu_kern_enter while in PCB_FP_NOSAVE state
This is because in armv8_crypto.c:armv8_crypto_cipher_process,
directly after calling fpu_kern_enter() a call is made to
armv8_crypto_cipher_setup(), resulting in nested calls to
fpu_kern_enter() without the required fpu_kern_leave() in between.
Move fpu_kern_enter() in armv8_crypto_cipher_process() after the
call to armv8_crypto_cipher_setup() to resolve this.
Reviewed by: markj, andrew
Fixes: 6485286f53 ("armv8_crypto: Switch to using FPU_KERN_NOCTX")
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D41671
The GICv3 ITS device supports two options for device tables. Currently
we support a single table to hold all device IDs, however when the
device ID space grows large this can be too large for the GITS_BASER
register to describe.
To handle this case, and to reduce the memory needed when this space
is sparse support the second option, the indirect table. The indirect
table is a 2 level table where the first level contains the physical
address of the second with a valid bit. The second level is an ITS
page sized table where each entry is the original entry size.
As we don't need to allocate a second level table for devices IDs that
don't exist this can reduce the allocation size.
Reviewed by: gallatin
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D41555
We currently use the same Arm generic time in both userspace and the
kernel. As we always enable userspace access to the virtual timer we
can tell userspace to use it.
Reviewed by: imp
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D41565
Prior to this commit privileged accounts in a jail could not access to the
filesystem extended attributes in the system namespace. To control access to
the system namespace in a per-jail basis add a new configuration parameter
allow.extattr which is off by default.
Reported by: zirias
Tested by: zirias
Obtained from: HardenedBSD
Reviewed by: kevans, jamie
Differential revision: https://reviews.freebsd.org/D41643
MFC after: 1 week
Relnotes: yes
FreeBSD does not permits manipulating extended attributes in the system
namespace by unprivileged accounts, even if account has appropriate
privileges to access filesystem object.
In Linux the system namespace is used to preserve posix acls. Some Gnu
coreutils binaries uses posix acls, eg, install, ls. And fails if we
unexpectedly return EPERM error from xattr system calls.
In the other hands, in Linux read and write access to the system
namespace depend on the policy implemented for each filesystem, so we'll
mimics we're a filesystem that prohibits this for unpriveleged accounts.
Reported by: zirias
Tested by: zirias
MFC after: 1 week
On Linux ENODATA mean the named attribute does not exist, or the
process has no access to this attribute.
Reported by: zirias
Tested by: zirias
MFC after: 1 week
After 2c10be9e06, we may jump to the bad_far label without `pcb` being
set, resulting in a follow-up fault as we may dereference it immediately
after the jump if td_intr_nesting_level == 0. In this branch, it should
be safe to dereference `td` as we're not handling the special case
mentioned below of accessing it during promotion/demotion.
This seems to fix a null ptr deref I hit during my most recent pkgbase
build attempt on the Windows DevKit, though that was admittedly
encountered while we were on the way to a panic from an apparent
use-after-free in ZFS bits.
Reviewed by: andrew, markj
Fixes: 2c10be9e06 ("arm64: Handle translation faults for thread [..]")
Differential Revision: https://reviews.freebsd.org/D41677
Building module/zfs/dbuf.c for 32-bit targets can result in a warning:
In file included from
/usr/src/sys/contrib/openzfs/include/sys/zfs_context.h:97,
from /usr/src/sys/contrib/openzfs/module/zfs/dbuf.c:32:
/usr/src/sys/contrib/openzfs/module/zfs/dbuf.c: In function
'dmu_buf_will_clone':
/usr/src/sys/contrib/openzfs/lib/libspl/include/assert.h:116:33: error:
cast from pointer to integer of different size
[-Werror=pointer-to-int-cast]
116 | const uint64_t __left = (uint64_t)(LEFT);
\
| ^
/usr/src/sys/contrib/openzfs/lib/libspl/include/assert.h:148:25: note:
in expansion of macro 'VERIFY0'
148 | #define ASSERT0 VERIFY0
| ^~~~~~~
/usr/src/sys/contrib/openzfs/module/zfs/dbuf.c:2704:9: note: in
expansion of macro 'ASSERT0'
2704 | ASSERT0(dbuf_find_dirty_eq(db, tx->tx_txg));
| ^~~~~~~
This is because dbuf_find_dirty_eq() returns a pointer, which if
pointers are 32-bit results in a warning about the cast to uint64_t.
Instead, use the ASSERT3P() macro, with == and NULL as second and third
arguments, which should work regardless of the target's bitness.
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Dimitry Andric <dimitry@andric.com>
Closes#15224
The values of WITH_ and WITHOUT_ options are ignored, but group options
are not.
Reviewed by: imp, emaste
Differential Revision: https://reviews.freebsd.org/D41683
strverscmp(3) and versionsort(3) where first released in 13.2
PR: 273401
Reviewed by: kib
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D41617