Commit graph

287581 commits

Author SHA1 Message Date
Alexander Motin f0fa40867d Fix build on powerpc after previous commit. 2023-11-09 21:21:47 -05:00
Alexander Motin a03c23931e uma: Improve memory modified after free panic messages
- Pass zone pointer to trash_ctor() and report zone name in the panic
  message.  It may be difficult to figure out the zone just by the item size.
- Do not pass user arguments to internal trash calls, pass the zone.
- Report malloc type name in the same unified panic message.
- Report corruption offset from the beginning of the items instead of
  the full pointer.  It makes the panic message shorter and more readable.
2023-11-09 19:46:26 -05:00
Tom Jones 14105aae55 nlm: Fix error messages for failed remote rpcbind contact
In case of a remote rpcbind connection timeout,
the NFS kernel lock manager emits an error message
along the lines of:

    NLM: failed to contact remote rpcbind, stat = 5, port = 28416

In the Bugzilla PR, Garrett Wollman identified the following problems
with that error message:

- The error is in decimal, which can only be deciphered by reading the
  source code.
- The port number is byte-swapped.
- The error message does not identify the client the NLM is trying to
  communicate with.

Fix the shortcomings of the current error message by:

- Printing out the port number correctly.
- Mentioning the remote client.

The low-level decimal error remains an outstanding issue though.
It seems like the error strings describing the error codes live outside
of the kernel code currently.

PR:		244698
Reported by:	wollman
Approved by:	allanjude
Sponsored by:	National Bureau of Economic Research
Sponsored by:	Klara, Inc.
Co-authored-by:	Mateusz Piotrowski <0mp@FreeBSD.org>
2023-11-09 21:54:28 +01:00
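
The port in the sample message above, 28416 (0x6F00), is what port 111 (0x006F, the well-known rpcbind port) looks like when its two network-order bytes are read in little-endian order. A minimal sketch of that effect; both helper names are hypothetical and this is not the code from the commit:

```c
#include <assert.h>
#include <stddef.h>

/* Decode a 16-bit port held in network (big-endian) byte order. */
static unsigned
port_from_wire(const unsigned char wire[2])
{
	assert(wire != NULL);
	return (((unsigned)wire[0] << 8) | wire[1]);
}

/*
 * Misread the same two bytes as little-endian, which is the effect of
 * printing the field without a byte-order conversion on a
 * little-endian machine.
 */
static unsigned
port_misread_le(const unsigned char wire[2])
{
	assert(wire != NULL);
	return (((unsigned)wire[1] << 8) | wire[0]);
}
```

For the wire bytes { 0x00, 0x6f }, the first helper yields 111 and the second yields 28416.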
R. Christian McDonald 6e5b1ff71e libc: enable initial-exec (IE) as default thread-local storage model on arm
As suggested by jrtc27@ in https://reviews.freebsd.org/D42415, this
patch enables IE as default thread-local storage model in libc on arm.

Reviewed by:	kib
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D42445
2023-11-09 21:24:23 +01:00
Konstantin Belousov ede4c412b3 vfs_domount_update(): ensure that 'goto end' works
We need to vfs_op_enter()/vn_seqc_write_start() before jumping to
cleanup.

PR:	274992
Reported by:	trasz
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Fixes:	9ef7a491a4
2023-11-09 22:18:47 +02:00
Konstantin Belousov af21145f33 pf_purge_expired_states(): fix build without SDT probes
Sponsored by:	The FreeBSD Foundation
2023-11-09 22:17:53 +02:00
Umer Saleem 40fccc423a ZTS: Test for all known zpool feature sets
zpool_create_features_007_pos only tested for compat-2020 feature
set. It would be useful to test for all known feature sets. If
any additional feature is found enabled that is not present in
compatibility list or feature set, it should be caught and
reported earlier.

This commit also removes encryption from openzfsonosx-1.8.1
compatibility list. Encryption enables bookmark_v2, since it is
a dependency of encryption, but not listed in openzfsonosx-1.8.1
compatibility list.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #15505
2023-11-09 10:58:23 -08:00
Umer Saleem 15a8fa76b2 Update zpool-features.7 for grub2 compatibility list updates
This commit updates zpool-features.7 man page to add newly added
zpool features to grub2 compatibility list.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #15505
2023-11-09 10:58:09 -08:00
Alexander Motin 1f8a5187ff ktls: Remove unneeded vm/uma_dbg.h include
It was used in original implementation, but is no longer.

MFC after:	2 weeks
2023-11-09 13:53:07 -05:00
Alexander Motin 7c566d6cfc uma: Micro-optimize memory trashing
Use u_long for memory accesses instead of uint32_t.  In my tests on
amd64 this reduces the time spent in those functions by ~30% thanks to
wider 64-bit accesses.  i386 still uses 32-bit accesses.

MFC after:	1 month
2023-11-09 13:07:46 -05:00
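
The idea in the commit above, filling and checking memory one `long`-sized word at a time, can be sketched in userland as follows. This is an illustration under stated assumptions, not the UMA code: the function names and the fill pattern are hypothetical, and the check returns a byte offset in the spirit of the panic message change described earlier in this log.

```c
#include <assert.h>
#include <stddef.h>

/* Build a word-sized junk pattern by repeating a 32-bit value. */
static unsigned long
junk_word(void)
{
	unsigned long j = 0xdeadc0deUL;	/* illustrative pattern */

	/* Two 16-bit shifts stay in range even where long is 32 bits. */
	return (j | ((j << 16) << 16));
}

/* Fill nwords words with the junk pattern, one long-sized store each. */
static void
trash_fill(unsigned long *mem, size_t nwords)
{
	unsigned long j = junk_word();
	size_t i;

	assert(mem != NULL);
	for (i = 0; i < nwords; i++)
		mem[i] = j;
}

/* Return the byte offset of the first modified word, or -1 if intact. */
static long
trash_check(const unsigned long *mem, size_t nwords)
{
	unsigned long j = junk_word();
	size_t i;

	assert(mem != NULL);
	for (i = 0; i < nwords; i++)
		if (mem[i] != j)
			return ((long)(i * sizeof(*mem)));
	return (-1);
}
```

Where `long` is 64 bits wide, each loop iteration covers eight bytes instead of four, which is where the reported saving would come from.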
Bojan Novković e4078494f3 vm_fault: Revert commit 64087fd7f3
The underlying issue that originally triggered a kernel panic was
addressed and the fix was ported to all relevant pmaps, so the
safeguards placed in vm_fault.c can be removed now.

Reviewed by:	alc, kib, markj
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D42517
2023-11-09 10:14:05 -05:00
Michael Tuexen 44669b7650 if_tuntap: remove redundant check
eh can't be NULL, so there is no need to check for it.
Reported by:	zlei
MFC after:	1 week
Sponsored by:	Netflix, Inc.
2023-11-09 11:43:54 +01:00
Michael Tuexen ff69d13a50 if_tuntap: support receive checksum offloading for tap interfaces
When enabled, pretend that the IPv4 and transport layer checksum
is correct for packets injected via the character device.
This is a prerequisite for adding support for LRO, which will
be added next. Then packetdrill can be used to test the LRO
code in local mode.

Reviewed by:		rscheff, zlei
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D42477
2023-11-09 11:37:27 +01:00
Kristof Provost 0d2ab4a4ce pf: add hashtable row count SDT
This allows us to figure out how many states each hashrow contains. That
can be important to know when debugging performance issues.

A simple probe could be:

	dtrace -n 'pf:purge:state:rowcount { @counts["states per row"] = quantize(arg1); }'
	dtrace: description 'pf:purge:state:rowcount ' matched 1 probe
	^C

	  states per row
	           value  ------------- Distribution ------------- count
	              -1 |                                         0
	               0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 8257624
	               1 |                                         14321
	               2 |                                         0

MFC after:	1 week
Sponsored by:	Modirum MDPay
2023-11-09 14:21:53 +01:00
Martin Matuska e716630d4c zfs: merge openzfs/zfs@887a3c533
Notable upstream pull request merges:
 #15022 5caeef02f RAID-Z expansion feature
 #15457 887a3c533 Increase L2ARC write rate and headroom
 #15504 1c1be60fa Unbreak FreeBSD world build after 3bd4df384

Obtained from:	OpenZFS
OpenZFS commit:	887a3c533b
2023-11-09 13:19:17 +01:00
Ka Ho Ng f5b3e68629 dirdeps: Update clang-tblgen dependencies
This unbreaks clang-tblgen build against the host pseudo platform.

Sponsored by:	Juniper Networks, Inc.
MFC after:	3 days
Reviewed by:	sjg
Differential Revision:	https://reviews.freebsd.org/D42481
2023-11-08 19:43:29 -05:00
Ka Ho Ng 5fb425aa00 dirdeps: Update liblldb dependencies
Sponsored by:	Juniper Networks, Inc.
MFC after:	3 days
Reviewed by:	sjg
Differential Revision:	https://reviews.freebsd.org/D42480
2023-11-08 19:43:25 -05:00
shodanshok 887a3c533b
Increase L2ARC write rate and headroom
Current L2ARC write rate and headroom parameters are very conservative:
l2arc_write_max=8M and l2arc_headroom=2 (ie: a full L2ARC writes at
8 MB/s, scanning 16/32 MB of ARC tail each time; a warming L2ARC runs
at 2x these rates).

These values were selected 15+ years ago based on then-current SSDs
size, performance and endurance. Today we have multi-TB, fast and
cheap SSDs which can sustain much higher read/write rates.

For this reason, this patch increases l2arc_write_max to 32M and
l2arc_headroom to 8 (4x increase for both).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Gionatan Danti <g.danti@assyoma.it>
Closes #15457
2023-11-08 16:30:47 -08:00
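
The first scan figure in the message above follows from multiplying the per-pass write size by the headroom factor: 8M x 2 = 16 MB. A worked sketch, assuming this simple product model (it ignores the warm-up boost and any other adjustments); `l2arc_scan_bytes` is a hypothetical helper, not an OpenZFS function:

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of ARC tail scanned per pass: write size times headroom. */
static uint64_t
l2arc_scan_bytes(uint64_t write_max, uint64_t headroom)
{
	/* Guard against overflow of the 64-bit product. */
	assert(headroom == 0 || write_max <= UINT64_MAX / headroom);
	return (write_max * headroom);
}
```

Under the same model, the new defaults give 32 MiB x 8 = 256 MiB per pass.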
Martin Matuška 1c1be60fa2
Unbreak FreeBSD world build after 3bd4df384
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #15504
2023-11-08 16:29:34 -08:00
Warner Losh b2b381d365 cam: Add human readable statuses for some CAM_ status values.
CAM_NVME_STATUS and CAM_REQ_SOFTTIMEOUT were missing, though the latter
hasn't been used yet. The former is being used and showing up in dmesg
output as Unknown 0x420.

Fixes: f564de00f7
Fixes: 774ab87cf2
Sponsored by: Netflix
2023-11-08 15:38:16 -07:00
Kristof Provost a6246a50b6 pf: fix double free if pf_ioctl_addrule() fails
If pf_ioctl_addrule() returns an error it will have freed the rule
itself. There's no need for the caller to free it again.

PR:		274915
Reported by:	Dave Cottlehuber <dch@FreeBSD.org>
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2023-11-08 21:58:52 +01:00
Low-power a160c153e2
Linux: reject read/write mapping to immutable file only on VM_SHARED
Private read/write mappings can't be used to modify the mapped files, so
the files remain immutable. Private read/write mappings are usually
used to load the data segment of executable files; rejecting them would
cause immutable executable files to stop working.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: WHR <msl0000023508@gmail.com>
Closes #15344
2023-11-08 12:19:38 -08:00
AllKind 3a81bf4ad2
Workaround to allow openzfs-zfs-dkms install on Ubuntu
As shown in #15404#issuecomment-1765002181, Ubuntu kernel has
'Provides: zfs-dkms', which will cause uninstall of the kernel, when
attempting to install openzfs-zfs-dkms.
As a workaround remove the 'Conflicts: zfs-dkms' definition from
the debian control file.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mart Frauenlob <AllKind@fastest.cc>
Closes #15503
2023-11-08 10:30:46 -08:00
Don Brady 5caeef02fa
RAID-Z expansion feature
This feature allows disks to be added one at a time to a RAID-Z group,
expanding its capacity incrementally.  This feature is especially useful
for small pools (typically with only one RAID-Z group), where there
isn't sufficient hardware to add capacity by adding a whole new RAID-Z
group (typically doubling the number of disks).

== Initiating expansion ==

A new device (disk) can be attached to an existing RAIDZ vdev, by
running `zpool attach POOL raidzP-N NEW_DEVICE`, e.g. `zpool attach tank
raidz2-0 sda`.  The new device will become part of the RAIDZ group.  A
"raidz expansion" will be initiated, and the new device will contribute
additional space to the RAIDZ group once the expansion completes.

The `feature@raidz_expansion` on-disk feature flag must be `enabled` to
initiate an expansion, and it remains `active` for the life of the pool.
In other words, pools with expanded RAIDZ vdevs can not be imported by
older releases of the ZFS software.

== During expansion ==

The expansion entails reading all allocated space from existing disks in
the RAIDZ group, and rewriting it to the new disks in the RAIDZ group
(including the newly added device).

The expansion progress can be monitored with `zpool status`.

Data redundancy is maintained during (and after) the expansion.  If a
disk fails while the expansion is in progress, the expansion pauses
until the health of the RAIDZ vdev is restored (e.g. by replacing the
failed disk and waiting for reconstruction to complete).

The pool remains accessible during expansion.  Following a reboot or
export/import, the expansion resumes where it left off.

== After expansion ==

When the expansion completes, the additional space is available for use,
and is reflected in the `available` zfs property (as seen in `zfs list`,
`df`, etc).

Expansion does not change the number of failures that can be tolerated
without data loss (e.g. a RAIDZ2 is still a RAIDZ2 even after
expansion).

A RAIDZ vdev can be expanded multiple times.

After the expansion completes, old blocks remain with their old
data-to-parity ratio (e.g. 5-wide RAIDZ2, has 3 data to 2 parity), but
distributed among the larger set of disks.  New blocks will be written
with the new data-to-parity ratio (e.g. a 5-wide RAIDZ2 which has been
expanded once to 6-wide, has 4 data to 2 parity).  However, the RAIDZ
vdev's "assumed parity ratio" does not change, so slightly less space
than is expected may be reported for newly-written blocks, according to
`zfs list`, `df`, `ls -s`, and similar tools.

Sponsored-by: The FreeBSD Foundation
Sponsored-by: iXsystems, Inc.
Sponsored-by: vStack
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Authored-by: Matthew Ahrens <mahrens@delphix.com>
Contributions-by: Fedor Uporov <fuporov.vstack@gmail.com>
Contributions-by: Stuart Maybee <stuart.maybee@comcast.net>
Contributions-by: Thorsten Behrens <tbehrens@outlook.com>
Contributions-by: Fmstrat <nospam@nowsci.com>
Contributions-by: Don Brady <dev.fs.zfs@gmail.com>
Signed-off-by: Don Brady <dev.fs.zfs@gmail.com>
Closes #15022
2023-11-08 10:19:41 -08:00
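
The data-to-parity ratios quoted in the message above can be checked with a little arithmetic. A minimal sketch, assuming parity is added per stripe row and ignoring any allocation padding; `raidz_raw_sectors` is a hypothetical helper, not an OpenZFS function:

```c
#include <assert.h>

/*
 * Raw sectors (data plus parity) needed to store `data` sectors on a
 * RAID-Z vdev of the given width, with `parity` parity sectors per row.
 */
static unsigned
raidz_raw_sectors(unsigned data, unsigned width, unsigned parity)
{
	unsigned data_cols, rows;

	assert(width > parity);
	data_cols = width - parity;
	rows = (data + data_cols - 1) / data_cols;	/* round up */
	return (data + rows * parity);
}
```

For 12 data sectors, a 5-wide RAIDZ2 (3 data to 2 parity) needs 20 raw sectors, while the same vdev expanded to 6-wide (4 data to 2 parity) needs 18.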
Luiz Amaral 85247ee6a2 tcpdump: decode pfsync packets on network interfaces
When print-ip-demux.c was introduced on ee67461e, the pfsync_ip_print
function was missed, causing tcpdump to treat pfsync packets on network
interfaces as an unknown protocol.

MFC after:	1 week
Sponsored by:	InnoGames GmbH
Differential Revision:	https://reviews.freebsd.org/D42504
2023-11-08 16:12:14 +01:00
Bojan Novković d0941ed9b5 riscv: Add a leaf PTP when pmap_enter(psind=1) creates a wired mapping
Let pmap_enter_l2() create wired mappings.  In particular, allocate a
leaf PTP for use during demotion.  This is the last pmap which requires
such a change ahead of reverting commit 64087fd7f3.

Reviewed by:	markj
Sponsored by:	Google, Inc. (GSoC 2023)
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D41633
2023-11-08 07:19:15 -05:00
Mark Johnston 7e5002e3d6 makefs/zfs: Add a regression test which checks file access permissions
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2023-11-08 07:18:58 -05:00
Mark Johnston 50565cf514 makefs/zfs: Don't set ZFS_NO_EXECS_DENIED in file flags
This flag was leftover from testing and should have been removed.

PR:		274938
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2023-11-08 07:04:12 -05:00
Martin Matuska 14c2e0a0c5 zfs: merge openzfs/zfs@9198de8f1
Notable upstream pull request merges:
 #15197 3bd4df384 Improve ZFS objset sync parallelism
 #15455 020f6fd09 FreeBSD: Implement taskq_init_ent()
 #15476 3d86999c7 sa_lookup() ignores buffer size
 #15478 2a154b848 Fix accounting error for pending sync IO ops in
                  zpool iostat
 #15484 dc45a00ea Add kern.features.zfs
 #15486 e36ff84c3 Update the kstat dataset_name when renaming a zvol
 #15491 f4cd1bac7 Make abd_raidz_gen_iterate() pass an initialized
                  pointer to the callback
 #15495 58398cbd0 FreeBSD: Optimize large kstat outputs

Obtained from:	OpenZFS
OpenZFS commit:	9198de8f10
2023-11-08 09:17:55 +01:00
Oskar Holmlund f25b0d6dd7 UART: Remove ingenic xburst (mips) code from ns8250 driver
Since ingenic JZ4780 SOC support has been removed there is no need
to support ingenic quirks in the UART driver.
This is the inverse of commit b192bae67e.

Reviewed by:    imp, manu
Approved by:    imp, manu (mentor)
Differential Revision:  https://reviews.freebsd.org/D42497
2023-11-08 09:03:55 +01:00
Umer Saleem 9198de8f10
Linux 6.6 compat: fix implicit conversion error with debug build
With Linux v6.6.0 and GCC 12, when debug build is configured,
implicit conversion error is raised while converting
'enum <anonymous>' to 'boolean_t'. Use 'B_TRUE' instead of
'true' to fix the issue.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #15489
2023-11-07 13:24:16 -08:00
Gordon Tetlow dc45a00eac
Add kern.features.zfs
Add a ZFS feature flag to indicate OpenZFS availability.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gordon Tetlow <gordon@freebsd.org>
Closes #15484
2023-11-07 13:21:56 -08:00
Antranig Vartanian d6e457328d
ping6(8): Add ping6(8) as MLINK to ping(8)
Reviewed by:	chuck
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D42203
2023-11-08 05:17:37 +08:00
Jason King 3d86999c75
sa_lookup() ignores buffer size.
When retrieving a system attribute, the size of the supplied
buffer is ignored. If the buffer is too small to hold the attribute,
sa_attr_op() will write past the end of the buffer.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jason King <jking@racktopsystems.com>
Closes #15476
2023-11-07 12:11:48 -08:00
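
The class of bug described above is avoided by comparing the attribute length against the caller's buffer size before copying. A minimal sketch of that check; the function name and the choice of EOVERFLOW as the error code are assumptions, and this is not the sa_attr_op() code:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy an attribute into a caller-supplied buffer, refusing to write
 * past its end.  Returns 0 on success or EOVERFLOW if the buffer is
 * too small to hold the attribute.
 */
static int
attr_copy_out(void *buf, size_t buflen, const void *attr, size_t attrlen)
{
	assert(buf != NULL && attr != NULL);
	if (attrlen > buflen)
		return (EOVERFLOW);
	memcpy(buf, attr, attrlen);
	return (0);
}
```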
Umer Saleem 78ac868824
Remove obsolete_counts from grub2 compatibility list
PR#15459 added all read-only compatible zpool features to the grub2
compatibility list. 'obsolete_counts' is a read-only feature that
depends on the 'device_removal' feature, which is not read-only and
is marked as ZFEATURE_FLAG_MOS. Creating a pool with grub2
compatibility enables the 'device_removal' feature as well, which is
not desired.

This commit removes the 'obsolete_counts' feature from
grub2 compatibility list, as GRUB only supports read-only
compatible features.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #15499
2023-11-07 12:04:56 -08:00
Dag-Erling Smørgrav f7d16a627e certctl: Convert line endings before inspecting files.
This ensures that certificate files or bundles with DOS or Mac line
endings are recognized as such and handled identically to those with
Unix line endings.

PR:		274952
Reviewed by:	allanjude
Differential Revision:	https://reviews.freebsd.org/D42490
2023-11-07 20:53:09 +01:00
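
Normalizing DOS (CRLF) and old Mac (bare CR) line endings to Unix LF, as the commit above describes, can be sketched as a single in-place pass. certctl itself is a shell script, so this C function is only an illustration of the transformation; `normalize_eol` is a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Rewrite CRLF and bare CR line endings as LF in place.  Returns the
 * new length, which never exceeds the original length.
 */
static size_t
normalize_eol(char *buf, size_t len)
{
	size_t in, out = 0;

	assert(buf != NULL);
	for (in = 0; in < len; in++) {
		if (buf[in] == '\r') {
			/* Drop the CR of a CRLF pair; the LF is copied next. */
			if (in + 1 < len && buf[in + 1] == '\n')
				continue;
			buf[out++] = '\n';	/* bare CR becomes LF */
		} else
			buf[out++] = buf[in];
	}
	return (out);
}
```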
Alexander Motin 020f6fd093
FreeBSD: Implement taskq_init_ent()
Previously taskq_init_ent() was an empty macro, while the actual init
was done by taskq_dispatch_ent().  That could be slightly faster in
case the entry was never enqueued.  But without it taskq_empty_ent()
relied on the structure being zeroed by somebody else, which is not good.

As a side effect this allows the same task to be queued several
times, which is normal on FreeBSD and may or may not prove useful
here one day.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15455
2023-11-07 11:37:18 -08:00
Alexander Motin 58398cbd03
FreeBSD: Optimize large kstat outputs
- Use sbuf_new_for_sysctl() to reduce double-buffering on sysctl
output.
- Use much faster sbuf_cat() instead of sbuf_printf("%s").

Together it reduces `sysctl kstat.zfs.misc.dbufs` time from minutes
to seconds, making dbufstat almost usable.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15495
2023-11-07 11:35:40 -08:00
Alan Somers e36ff84c33
Update the kstat dataset_name when renaming a zvol
Add a dataset_kstats_rename function, and call it when renaming
a zvol on FreeBSD and Linux.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alan Somers <asomers@gmail.com>
Sponsored-by: Axcient
Closes #15482
Closes #15486
2023-11-07 11:34:50 -08:00
AllKind 9ce567c6ff
Fix dkms installation of deb packages created with Alien.
Alien does not honour the %posttrans hook.
So move the dkms uninstall/install scripts to the
 %pre/%post hooks in case of package install/upgrade.
In case of package removal, handle that in %preun.
Add removal of all old dkms modules.
Add checking for broken 'dkms status'. Handle that as
well as possible and warn the user about it.
Also add more verbose messages about what we are doing.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mart Frauenlob <AllKind@fastest.cc>
Closes #15415
2023-11-07 11:27:29 -08:00
Mark Johnston f4cd1bac72
Make abd_raidz_gen_iterate() pass an initialized pointer to the callback
Otherwise callbacks may trigger KMSAN violations in the dlen == 0 case.
For example, raidz_syn_pq_abd() will compare an uninitialized pointer
with itself before returning.  This seems harmless, but let's maintain
good hygiene and avoid passing uninitialized variables, if only to
placate KMSAN.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #15491
2023-11-07 10:24:15 -08:00
Tony Hutter 358ce2cf28
zed: misc vdev_enc_sysfs_path fixes
There have been rare cases where the VDEV_ENC_SYSFS_PATH value that zed
gets passed is stale.  To mitigate this, dynamically check the sysfs
path at the time of zed event processing, and use the dynamic value if
possible.  Note that there will be other times when we can not
dynamically detect the sysfs path (like if a disk disappears) and have
to rely on the old value for things like turning on the fault LED.  That
is to say, we can't just blindly use the dynamic path in every case.

Also:
	- Add enclosure sysfs entry when running 'zpool add'
	- Fix 'slot' and 'enc' zpool.d scripts for nvme

Reviewed-by: Don Brady <dev.fs.zfs@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #15462
2023-11-07 09:09:24 -08:00
MigeljanImeri 2a154b8484
Fix accounting error for pending sync IO ops in zpool iostat
Currently vdev_queue_class_length is responsible for reporting the
queue length. However, it doesn't check the length when a list is
used; it just returns whether the list is empty or not. To fix this
I added a counter variable to vdev_queue_class to keep track of the sync
IO ops, and changed vdev_queue_class_length to reference this variable
instead.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: MigeljanImeri <ImeriMigel@gmail.com>
Closes #15478
2023-11-07 09:06:14 -08:00
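
The difference between an emptiness test and a maintained counter, as described in the commit above, can be sketched with a small list-backed queue. All names here are hypothetical stand-ins and not the vdev_queue code:

```c
#include <assert.h>
#include <stddef.h>

struct io_node {
	struct io_node *next;
};

/* Hypothetical queue class: a linked list plus an explicit length. */
struct queue_class {
	struct io_node *head;
	size_t count;
};

static void
qc_push(struct queue_class *qc, struct io_node *n)
{
	assert(qc != NULL && n != NULL);
	n->next = qc->head;
	qc->head = n;
	qc->count++;		/* keep the counter in step with the list */
}

static struct io_node *
qc_pop(struct queue_class *qc)
{
	struct io_node *n;

	assert(qc != NULL);
	n = qc->head;
	if (n != NULL) {
		qc->head = n->next;
		qc->count--;
	}
	return (n);
}

/* Emptiness-based answer: only ever 0 or 1, whatever the real length. */
static size_t
qc_length_nonempty(const struct queue_class *qc)
{
	return (qc->head != NULL);
}

/* Counter-based answer: the actual number of queued items, in O(1). */
static size_t
qc_length(const struct queue_class *qc)
{
	return (qc->count);
}
```

With three items queued, the emptiness-based function reports 1 while the counter-based one reports 3.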
Ed Maste 4e0e01bf65 fflush: correct buffer handling in __sflush
Two additional stdio changes followed 86a16ada1e and need to be
reverted as part of the fflush fix.

This reverts commit 6e13794fbe.
This reverts commit bafaa70b6f.

Fixes: d09a3bf72c ("fflush: correct buffer handling in __sflush")
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D42491
2023-11-07 11:03:34 -05:00
Ed Maste 418f026bd5 libc: remove unused errno.h include
errno.h was added in 44cf1e5eb4, which has been reverted.

Fixes: d09a3bf72c ("fflush: correct buffer handling in __sflush")
Sponsored by: The FreeBSD Foundation
2023-11-07 10:23:20 -05:00
Mark Johnston b247ff70e8 stand: Rename LIBFDT to LIBSAFDT
Preemptively address a collision with LIBFDT (to be added in the future)
from src.libnames.mk, which gets included via bsd.progs.mk.  No
functional change intended.

Reviewed by:	imp
MFC after:	1 week
Sponsored by:	Innovate UK
Differential Revision:	https://reviews.freebsd.org/D42486
2023-11-07 09:57:32 -05:00
Dag-Erling Smørgrav b8dbfb0a6c fflush: Add test for buffer handling in __sflush
Sponsored by:	Klara, Inc.
2023-11-07 08:21:12 -05:00
Dag-Erling Smørgrav d09a3bf72c fflush: correct buffer handling in __sflush
This fixes CVE-2014-8611 correctly.

The commit that purported to fix CVE-2014-8611 (805288c2f0) only hid
it behind another bug.  Two later commits, 86a16ada1e and
44cf1e5eb4, attempted to address this new bug but mostly just confused
the issue.  This commit rolls back the three previous changes and fixes
CVE-2014-8611 correctly.

The key to understanding the bug (and the fix) is that `_w` has
different meanings for different stream modes.  If the stream is
unbuffered, it is always zero.  If the stream is fully buffered, it is
the amount of space remaining in the buffer (equal to the buffer size
when the buffer is empty and zero when the buffer is full).  If the
stream is line-buffered, it is a negative number reflecting the amount
of data in the buffer (zero when the buffer is empty and negative buffer
size when the buffer is full).

At the heart of `fflush()`, we call the stream's write function in a
loop, where `t` represents the return value from the last call and `n`
the amount of data that remains to be written.  When the write function
fails, we need to move the unwritten data to the top of the buffer
(unless nothing was written) and adjust `_p` (which points to the next
free location in the buffer) and `_w` accordingly.  These variables have
already been set to the values they should have after a successful
flush, so instead of adjusting them down to reflect what was written,
we're adjusting them up to reflect what remains.

The bug was that while `_p` was always adjusted, we only adjusted `_w`
if the stream was fully buffered.  The fix is to also adjust `_w` for
line-buffered streams.  Everything else is just noise.

Fixes: 805288c2f0
Fixes: 86a16ada1e
Fixes: 44cf1e5eb4
Sponsored by:	Klara, Inc.
2023-11-07 08:21:12 -05:00
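
The bookkeeping described in the message above can be modeled with a toy stream. This is a sketch under stated assumptions, not the libc source: `toy_stream`, `toy_write` and `toy_flush` are hypothetical, the fields `p` and `w` only mirror `_p` and `_w`, and unbuffered streams are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_stream {
	unsigned char *base;	/* start of the buffer */
	int size;		/* buffer size */
	unsigned char *p;	/* next free location, like _p */
	int w;			/* write counter, like _w */
	int line_buffered;	/* nonzero for line-buffered mode */
	int budget;		/* bytes the toy writer accepts in total */
};

/* Toy write function: accepts up to `budget` bytes, then fails. */
static int
toy_write(struct toy_stream *s, const unsigned char *buf, int n)
{
	int t;

	(void)buf;
	if (s->budget <= 0)
		return (-1);
	t = n < s->budget ? n : s->budget;
	s->budget -= t;
	return (t);
}

/* Flush the buffer; returns 0 on success, -1 if the writer failed. */
static int
toy_flush(struct toy_stream *s)
{
	unsigned char *q;
	int n, t;

	assert(s != NULL && s->p >= s->base);
	q = s->base;
	n = (int)(s->p - s->base);	/* data waiting to be written */

	/* Preset p and w to their values after a successful flush. */
	s->p = s->base;
	s->w = s->line_buffered ? 0 : s->size;

	for (; n > 0; n -= t, q += t) {
		t = toy_write(s, q, n);
		if (t <= 0) {
			if (q > s->base)	/* something was written */
				memmove(s->base, q, (size_t)n);
			s->p += n;	/* adjust up by what remains */
			s->w -= n;	/* in both buffered modes */
			return (-1);
		}
	}
	return (0);
}
```

With an 8-byte buffer holding 6 bytes and a writer that accepts 2 bytes before failing, 4 bytes remain at the start of the buffer; `w` ends up as 4 for a fully buffered stream and -4 for a line-buffered one, matching the two meanings of `_w` given in the message.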
Konstantin Belousov 96cb1d7000 linuxkpi linux_work: use 'true' instead of 'non-zero'
Submitted by:	markj
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D42468
2023-11-07 12:58:21 +02:00
Konstantin Belousov 05fe82455f linuxkpi: races between linux_queue_delayed_work_on() and linux_cancel_delayed_work_sync()
1. Suppose that linux_queue_delayed_work_on() is called with a
   non-zero delay and finds work.state to be WORK_ST_IDLE. It
   resets the state to WORK_ST_TIMER and locks timer.mtx. Now, if
   linux_cancel_delayed_work_sync() was also called in the meantime, read
   the state as WORK_ST_TIMER and has already taken the mutex, it is
   executing callout_stop() on a non-armed callout. Then
   linux_queue_delayed_work_on() continues and schedules the callout.
   But the return value from cancel() is false, making it possible for
   the requeue from the callback to slip in.

2. If linux_cancel_delayed_work_sync() returned true, we need to cancel
   again.  The requeue from callback could have revived the work.

The end result is that we schedule callout that might be freed, since
cancel_delayed_work_sync() claims that everything was stopped.  This
contradicts the way the KPI is used in Linux, where consumers expect
that cancel_delayed_work_sync() is reliable on its own.

Reviewed by:	markj
Discussed with:	bz
Sponsored by:	NVidia networking
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D42468
2023-11-07 12:58:04 +02:00