Use u_long for memory accesses instead of uint32_t. On my tests on
amd64 this by ~30% reduces time spent in those functions thanks to
bigger 64bit accesses. i386 still uses 32bit accesses.
MFC after: 1 month
The underlying issue that originally triggered a kernel panic was
addressed and the fix was ported to all relevant pmaps, so the
safeguards placed in vm_fault.c can be removed now.
Reviewed by: alc, kib, markj
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D42517
When enabled, pretend that the IPv4 and transport layer checksum
is correct for packets injected via the character device.
This is a prerequisite for adding support for LRO, which will
be added next. Then packetdrill can be used to test the LRO
code in local mode.
Reviewed by: rscheff, zlei
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D42477
This allows us to figure out how many states each hashrow contains. That
can be important to know when debugging performance issues.
A simple probe could be:
dtrace -n 'pf:purge:state:rowcount { @counts["states per row"] = quantize(arg1); }'
dtrace: description 'pf:purge:state:rowcount ' matched 1 probe
^C
states per row
value ------------- Distribution ------------- count
-1 | 0
0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 8257624
1 | 14321
2 | 0
MFC after: 1 week
Sponsored by: Modirum MDPay
This unbreaks clang-tblgen build against the host pseudo platform.
Sponsored by: Juniper Networks, Inc.
MFC after: 3 days
Reviewed by: sjg
Differential Revision: https://reviews.freebsd.org/D42481
Current L2ARC write rate and headroom parameters are very conservative:
l2arc_write_max=8M and l2arc_headroom=2 (ie: a full L2ARC writes at
8 MB/s, scanning 16/32 MB of ARC tail each time; a warming L2ARC runs
at 2x these rates).
These values were selected 15+ years ago based on then-current SSDs
size, performance and endurance. Today we have multi-TB, fast and
cheap SSDs which can sustain much higher read/write rates.
For this reason, this patch increases l2arc_write_max to 32M and
l2arc_headroom to 8 (4x increase for both).
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Gionatan Danti <g.danti@assyoma.it>
Closes#15457
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes#15504
CAM_NVME_STATUS and CAM_REQ_SOFTTIMEOUT were missing, though the latter
hasn't been used yet. The former is being used and showing up in dmesg
output as Unknown 0x420.
Fixes: f564de00f7
Fixes: 774ab87cf2
Sponsored by: Netflix
If pf_ioctl_addrule() returns an error it will have freed the rule
itself. There's no need for the caller to free it again.
PR: 274915
Reported by: Dave Cottlehuber <dch@FreeBSD.org>
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Private read/write mapping can't be used to modify the mapped files, so
they will remain be immutable. Private read/write mappings are usually
used to load the data segment of executable files, rejecting them will
rendering immutable executable files to stop working.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: WHR <msl0000023508@gmail.com>
Closes#15344
As shown in #15404#issuecomment-1765002181, Ubuntu kernel has
'Provides: zfs-dkms', which will cause uninstall of the kernel, when
attempting to install openzfs-zfs-dkms.
As a workaround remove the 'Conflicts: zfs-dkms' definition from
the debian control file.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mart Frauenlob <AllKind@fastest.cc>
Closes#15503
This feature allows disks to be added one at a time to a RAID-Z group,
expanding its capacity incrementally. This feature is especially useful
for small pools (typically with only one RAID-Z group), where there
isn't sufficient hardware to add capacity by adding a whole new RAID-Z
group (typically doubling the number of disks).
== Initiating expansion ==
A new device (disk) can be attached to an existing RAIDZ vdev, by
running `zpool attach POOL raidzP-N NEW_DEVICE`, e.g. `zpool attach tank
raidz2-0 sda`. The new device will become part of the RAIDZ group. A
"raidz expansion" will be initiated, and the new device will contribute
additional space to the RAIDZ group once the expansion completes.
The `feature@raidz_expansion` on-disk feature flag must be `enabled` to
initiate an expansion, and it remains `active` for the life of the pool.
In other words, pools with expanded RAIDZ vdevs can not be imported by
older releases of the ZFS software.
== During expansion ==
The expansion entails reading all allocated space from existing disks in
the RAIDZ group, and rewriting it to the new disks in the RAIDZ group
(including the newly added device).
The expansion progress can be monitored with `zpool status`.
Data redundancy is maintained during (and after) the expansion. If a
disk fails while the expansion is in progress, the expansion pauses
until the health of the RAIDZ vdev is restored (e.g. by replacing the
failed disk and waiting for reconstruction to complete).
The pool remains accessible during expansion. Following a reboot or
export/import, the expansion resumes where it left off.
== After expansion ==
When the expansion completes, the additional space is available for use,
and is reflected in the `available` zfs property (as seen in `zfs list`,
`df`, etc).
Expansion does not change the number of failures that can be tolerated
without data loss (e.g. a RAIDZ2 is still a RAIDZ2 even after
expansion).
A RAIDZ vdev can be expanded multiple times.
After the expansion completes, old blocks remain with their old
data-to-parity ratio (e.g. 5-wide RAIDZ2, has 3 data to 2 parity), but
distributed among the larger set of disks. New blocks will be written
with the new data-to-parity ratio (e.g. a 5-wide RAIDZ2 which has been
expanded once to 6-wide, has 4 data to 2 parity). However, the RAIDZ
vdev's "assumed parity ratio" does not change, so slightly less space
than is expected may be reported for newly-written blocks, according to
`zfs list`, `df`, `ls -s`, and similar tools.
Sponsored-by: The FreeBSD Foundation
Sponsored-by: iXsystems, Inc.
Sponsored-by: vStack
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Authored-by: Matthew Ahrens <mahrens@delphix.com>
Contributions-by: Fedor Uporov <fuporov.vstack@gmail.com>
Contributions-by: Stuart Maybee <stuart.maybee@comcast.net>
Contributions-by: Thorsten Behrens <tbehrens@outlook.com>
Contributions-by: Fmstrat <nospam@nowsci.com>
Contributions-by: Don Brady <dev.fs.zfs@gmail.com>
Signed-off-by: Don Brady <dev.fs.zfs@gmail.com>
Closes#15022
When print-ip-demux.c was introduced on ee67461e, the pfsync_ip_print
function was missed, causing tcpdump to treat pfsync packets on network
interfaces as an unknown protocol.
MFC after: 1 week
Sponsored by: InnoGames GmbH
Differential Revision: https://reviews.freebsd.org/D42504
Let pmap_enter_l2() create wired mappings. In particular, allocate a
leaf PTP for use during demotion. This is the last pmap which requires
such a change ahead of reverting commit 64087fd7f3.
Reviewed by: markj
Sponsored by: Google, Inc. (GSoC 2023)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D41633
Since ingenic JZ4780 SOC support has been removed there is no need
to support ingenic quirks in the UART driver.
Invert of commit b192bae67e
Reviewed by: imp, manu
Approved by: imp, manu (mentor)
Differential Revision: https://reviews.freebsd.org/D42497
With Linux v6.6.0 and GCC 12, when debug build is configured,
implicit conversion error is raised while converting
'enum <anonymous>' to 'boolean_t'. Use 'B_TRUE' instead of
'true' to fix the issue.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes#15489
Add a ZFS feature flag to indicate OpenZFS availability.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gordon Tetlow <gordon@freebsd.org>
Closes#15484
When retrieving a system attribute, the size of the supplied
buffer is ignored. If the buffer is too small to hold the attribute,
sa_attr_op() will write past the end of the buffer.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jason King <jking@racktopsystems.com>
Closes#15476
PR#15459 add all read-only compatible zpool features to grub2
compatibility list. 'obsolete_counts' is a read-only features that
depends on 'device_removal' feature which is not read-only and
is marked as ZFEATURE_FLAG_MOS. Creating a pool with grub2
compatibility enables 'device_removal' feature as well, which is
not desired.
This commit removes the 'obsolete_counts' feature from
grub2 compatibility list, as GRUB only supports read-only
compatible features.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes#15499
This ensures that certificate files or bundles with DOS or Mac line
endings are recognized as such and handled identically to those with
Unix line endings.
PR: 274952
Reviewed by: allanjude
Differential Revision: https://reviews.freebsd.org/D42490
Previously taskq_init_ent() was an empty macro, while actual init
was done by taskq_dispatch_ent(). It could be slightly faster in
case taskq never enqueued. But without it taskq_empty_ent() relied
on the structure being zeroed by somebody else, that is not good.
As a side effect this allows the same task to be queued several
times, that is normal on FreeBSD, that may or may not get useful
here also one day.
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes#15455
- Use sbuf_new_for_sysctl() to reduce double-buffering on sysctl
output.
- Use much faster sbuf_cat() instead of sbuf_printf("%s").
Together it reduces `sysctl kstat.zfs.misc.dbufs` time from minutes
to seconds, making dbufstat almost usable.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes#15495
Add a dataset_kstats_rename function, and call it when renaming
a zvol on FreeBSD and Linux.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alan Somers <asomers@gmail.com>
Sponsored-by: Axcient
Closes#15482Closes#15486
Alien does not honour the %posttrans hook.
So move the dkms uninstall/install scripts to the
%pre/%post hooks in case of package install/upgrade.
In case of package removal, handle that in %preun.
Add removal of all old dkms modules.
Add checking for broken 'dkms status'. Handle that as
good as possible and warn the user about it.
Also add more verbose messages about what we are doing.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mart Frauenlob <AllKind@fastest.cc>
Closes#15415
Otherwise callbacks may trigger KMSAN violations in the dlen == 0 case.
For example, raidz_syn_pq_abd() will compare an uninitialized pointer
with itself before returning. This seems harmless, but let's maintain
good hygiene and avoid passing uninitialized variables, if only to
placate KMSAN.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes#15491
There have been rare cases where the VDEV_ENC_SYSFS_PATH value that zed
gets passed is stale. To mitigate this, dynamically check the sysfs
path at the time of zed event processing, and use the dynamic value if
possible. Note that there will be other times when we can not
dynamically detect the sysfs path (like if a disk disappears) and have
to rely on the old value for things like turning on the fault LED. That
is to say, we can't just blindly use the dynamic path in every case.
Also:
- Add enclosure sysfs entry when running 'zpool add'
- Fix 'slot' and 'enc' zpool.d scripts for nvme
Reviewed-by: Don Brady <dev.fs.zfs@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes#15462
Currently vdev_queue_class_length is responsible for checking how long
the queue length is, however, it doesn't check the length when a list
is used, rather it just returns whether it is empty or not. To fix this
I added a counter variable to vdev_queue_class to keep track of the sync
IO ops, and changed vdev_queue_class_length to reference this variable
instead.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: MigeljanImeri <ImeriMigel@gmail.com>
Closes#15478
Two additional stdio changes followed 86a16ada1e and need to be
reverted as part of the fflush fix.
This reverts commit 6e13794fbe.
This reverts commit bafaa70b6f.
Fixes: d09a3bf72c ("fflush: correct buffer handling in __sflush")
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D42491
errno.h was added in 44cf1e5eb4, which has been reverted.
Fixes: d09a3bf72c ("fflush: correct buffer handling in __sflush")
Sponsored by: The FreeBSD Foundation
Preemptively address a collision with LIBFDT (to be added in the future)
from src.libnames.mk, which gets included via bsd.progs.mk. No
functional change intended.
Reviewed by: imp
MFC after: 1 week
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D42486
This fixes CVE-2014-8611 correctly.
The commit that purported to fix CVE-2014-8611 (805288c2f0) only hid
it behind another bug. Two later commits, 86a16ada1e and
44cf1e5eb4, attempted to address this new bug but mostly just confused
the issue. This commit rolls back the three previous changes and fixes
CVE-2014-8611 correctly.
The key to understanding the bug (and the fix) is that `_w` has
different meanings for different stream modes. If the stream is
unbuffered, it is always zero. If the stream is fully buffered, it is
the amount of space remaining in the buffer (equal to the buffer size
when the buffer is empty and zero when the buffer is full). If the
stream is line-buffered, it is a negative number reflecting the amount
of data in the buffer (zero when the buffer is empty and negative buffer
size when the buffer is full).
At the heart of `fflush()`, we call the stream's write function in a
loop, where `t` represents the return value from the last call and `n`
the amount of data that remains to be written. When the write function
fails, we need to move the unwritten data to the top of the buffer
(unless nothing was written) and adjust `_p` (which points to the next
free location in the buffer) and `_w` accordingly. These variables have
already been set to the values they should have after a successful
flush, so instead of adjusting them down to reflect what was written,
we're adjusting them up to reflect what remains.
The bug was that while `_p` was always adjusted, we only adjusted `_w`
if the stream was fully buffered. The fix is to also adjust `_w` for
line-buffered streams. Everything else is just noise.
Fixes: 805288c2f0
Fixes: 86a16ada1e
Fixes: 44cf1e5eb4
Sponsored by: Klara, Inc.
1. Suppose that linux_queue_delayed_work_on() is called with
non-zero delay and found the work.state WORK_ST_IDLE. It
resets the state to WORK_ST_TIMER and locks timer.mtx. Now, if
linux_cancel_delayed_work_sync() was also called meantime, read
state as WORK_ST_TIMER and already taken the mutex, it is executing
callout_stop() on non-armed callout. Then linux_queue_delayed_work_on()
continues and schedules callout. But the return value from cancel() is
false, making it possible to the requeue from callback to slip in.
2. If linux_cancel_delayed_work_sync() returned true, we need to cancel
again. The requeue from callback could have revived the work.
The end result is that we schedule callout that might be freed, since
cancel_delayed_work_sync() claims that everything was stopped. This
contradicts the way the KPI is used in Linux, where consumers expect
that cancel_delayed_work_sync() is reliable on its own.
Reviewed by: markj
Discussed with: bz
Sponsored by: NVidia networking
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D42468
LINKER_LOAD_FILE() calls linker_load_dependencies() which will return
EEXIST in case the module to be loaded has already been compiled into
the kernel. Since the format of the module is now recognized then there
is no need to retry loading with a different linker, otherwise the
userland will get misleading error number ENOEXEC.
PR: 274936
Reviewed by: dfr
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D42474
Since newnfs_copycred() calls crsetgroups() which in turn calls
crextend() which might do a malloc(M_WAITOK), newnfs_copycred()
cannot be called with a mutex held. Fortunately, the malloc()
call is rarely done, since XU_GROUPS is 16 and the NFS client
uses a maximum of 17 (only 17 groups will cause the malloc() to
be called). Further, it is only a problem if the malloc() tries
to sleep(). As such, this bug does not seem to have caused
problems in practice.
This patch fixes the one place in the NFS client where
newnfs_copycred() is called while a mutex is held by moving the
call to after where the mutex is released.
Found by inspection while working on an experimental patch.
MFC after: 2 weeks
This should make crash reports a bit more useful without having to ask
for additional information.
Reviewed by: imp
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D42465
Commit 4692906480 made e6000sw's
implementation of miibus_(read|write)reg assume that the softc lock is
held. I presume that is to avoid lock recursion in e6000sw_attach() ->
e6000sw_attach_miibus() -> mii_attach() -> MIIBUS_READREG().
However, the lock assertion in e6000sw_readphy_locked() can fail if a
different driver uses the interface to probe registers. Work around the
problem by providing implementations which lock the softc if it is not
already locked.
PR: 274795
Fixes: 4692906480 ("e6000sw: add readphy and writephy wrappers")
Reviewed by: kp, imp
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D42466