Update to eliminate bogus min to ensure 0 was never passed to
pause. Instead, request 1ms with an 'infinite' precision, which
defaults to whatever the underlying time counter can do. This should
ensure we run fairly quickly to start processing done events, while
still giving a small pause for the system to catch its breath. This rate
limiter still is less than ideal, and this commit doesn't change
that. It should really have no functional change: it just uses a better
interface to express the desired sleep.
Sponsored by: Netflix
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D45316
Add counts for the number of requests that complete with ENOMEM as
kern.geom.nomem_count and the number of times we pause the g_down thread
to let the system recover as kern.geom.pause_count.
Sponsored by: Netflix
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D45309
Whenever a file is created, the vnode_create_vobject() function will
try to determine its size by calling vn_getsize_locked(), as size 0
is ambiguous: it means either the file size is 0 or the file size
is unknown.
Introduce a special value for the size argument: VNODE_NO_SIZE.
Only when it is given will vnode_create_vobject() try to obtain the
file's size on its own.
Introduce a dedicated vnode_disk_create_vobject() for use by
g_vfs_open(), so we don't have to call vn_isdisk() in the common case
(for regular files).
Handle the case of mediasize==0 in g_vfs_open().
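
The sentinel idea can be sketched in userspace as follows. This is a minimal model, not the kernel code: NO_SIZE, its value of -1, lookup_size() and object_size() are hypothetical names chosen for illustration, with lookup_size() standing in for a call such as vn_getsize_locked().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sentinel meaning "the caller does not know the size". */
#define NO_SIZE ((int64_t)-1)

/* Hypothetical stand-in for a size lookup such as vn_getsize_locked(). */
static int64_t
lookup_size(void)
{
	return (4096);
}

/*
 * Return the size the object should be created with. Only the sentinel
 * triggers a lookup; an explicit 0 is honored as "empty file".
 */
static int64_t
object_size(int64_t size)
{
	assert(size >= 0 || size == NO_SIZE);
	if (size == NO_SIZE)
		return (lookup_size());
	return (size);
}
```

With a dedicated sentinel, a caller that already knows the file is empty can pass 0 and skip the lookup entirely.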
Reviewed by: alc, kib, markj, olce
Approved by: oshogbo (mentor), allanjude (mentor)
Differential Revision: https://reviews.freebsd.org/D45244
Implement an API where previously code was directly reaching into the
buf's internal lock.
Reviewed by: mckusick, imp, kib, markj
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D45249
The functions g_eli_init_uma and g_eli_fini_uma are used to track
the number of devices in GELI. There is an issue where the g_eli_create
function may fail before g_eli_init_uma is called, however
g_eli_fini_uma is still executed in the fail path. This can
incorrectly decrease the device count to zero, potentially leading to
the UMA pool being freed. Accessing the device after the pool has been
freed causes a system panic.
This commit resolves the issue by ensuring the device count is increased
earlier.
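
A simplified userspace model of the ordering problem, assuming a plain counter guards a shared pool. The names pool_init(), pool_fini() and create_device() are hypothetical and only mirror the roles of g_eli_init_uma, g_eli_fini_uma and g_eli_create; the real code is more involved.

```c
#include <assert.h>
#include <stdbool.h>

static int  dev_count;   /* number of devices holding the pool */
static bool pool_alive;  /* stands in for the UMA pool */

static void
pool_init(void)
{
	if (dev_count++ == 0)
		pool_alive = true;
}

static void
pool_fini(void)
{
	assert(dev_count > 0);
	if (--dev_count == 0)
		pool_alive = false;
}

/*
 * Hypothetical create routine. Taking the reference first means the
 * shared failure path always drops exactly the reference it took.
 */
static int
create_device(bool fail_early)
{
	pool_init();
	if (fail_early)
		goto fail;
	return (0);
fail:
	pool_fini();
	return (-1);
}
```

In this model a failed create leaves the count, and therefore the pool, exactly as it found them, so an already attached device keeps working.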
PR: 278828
Reported by: Andre Albsmeier <mail@fbsd2.e4m.org>
Reviewed by: asomers
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D45225
If any of the disks can support trim, cascade that up the
stack. Otherwise, trims won't pass through striped raid setups.
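
The "any disk supports it" rule can be sketched as below. This is an illustrative userspace helper; stripe_can_delete() and its array argument are hypothetical and do not reflect how the stripe class actually stores per-disk capabilities.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Report trim support for the stripe if at least one member has it. */
static bool
stripe_can_delete(const bool *disk_can_delete, size_t ndisks)
{
	assert(disk_can_delete != NULL || ndisks == 0);
	for (size_t i = 0; i < ndisks; i++) {
		if (disk_can_delete[i])
			return (true);
	}
	return (false);
}
```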
PR: 277673
Reviewed by: imp (minor style tweaks from bug report)
It reports the value of the g_eli_alloc_sz variable. Allocations of
this size or less will use UMA. Larger allocations will use malloc.
Since malloc is slower, it is useful for users to know this variable so
they can avoid such allocations. For example, ZFS users can set
vfs.zfs.vdev.aggregation_limit to this value.
MFC after: 1 week
Sponsored by: Axcient
Reviewed by: markj, imp
Differential Revision: https://reviews.freebsd.org/D44904
Commit 33cb9b3c3a replaced a g_raid3_destroy_device() call with a
g_raid3_free_device() call, which was incorrect and could lead to a
panic if a RAID3 GEOM failed to start (e.g., due to missing disks).
Reported by: graid3 tests
Fixes: 33cb9b3c3a ("graid3: Fix teardown races")
MFC after: 3 days
Sponsored by: Klara, Inc.
Fix a problem in the graid implementation of Promise RAID1 created with
4+ disks. Such an array generally works fine, but only until reboot, due
to a bug in the metadata writing code. Before the fix, the next taste
erroneously created RAID1E (a kind of RAID10) instead of RAID1, hence
graid used wrong offsets for I/O operations.
The bug did not affect Promise RAID1 arrays with only 2 or 3 disks.
Reviewed by: mav
MFC after: 3 days
Despite the name, req->serror is used in some cases to copy non-error
messages to userspace. So, report errors when copying out so long as
they don't clobber an earlier error.
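
The "don't clobber an earlier error" rule amounts to the pattern below. It is a sketch only: merge_copyout_error() is a hypothetical helper, not a function from the GEOM control code.

```c
#include <assert.h>
#include <errno.h>

/*
 * Keep the first error seen. A failure while copying the message out is
 * reported only when nothing went wrong earlier.
 */
static int
merge_copyout_error(int error, int copyout_error)
{
	assert(error >= 0 && copyout_error >= 0);
	if (error == 0)
		error = copyout_error;
	return (error);
}
```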
Reviewed by: mav, imp
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D43146
Apply the following automated changes to try to eliminate
no-longer-needed sys/cdefs.h includes as well as now-empty
blank lines in a row.
Remove /^#if.*\n#endif.*\n#include\s+<sys/cdefs.h>.*\n/
Remove /\n+#include\s+<sys/cdefs.h>.*\n+#if.*\n#endif.*\n+/
Remove /\n+#if.*\n#endif.*\n+/
Remove /^#if.*\n#endif.*\n/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/types.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/param.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/capsicum.h>/
Sponsored by: Netflix
Remove ancient SCCS tags from the tree, automated scripting, with two
minor fixups to keep things compiling. All the common forms in the tree
were removed with a perl script.
Sponsored by: Netflix
Ensure they are all panic/debugger safe.
Most handlers for this event are for disk drivers/geom modules. There
are a mix of checks being used here (or not), so let's standardize on
checking the presence of the RB_NOSYNC flag.
This flag is set whenever:
1. The kernel has panicked and kern.sync_on_panic=0*
2. We reboot from within the kernel debugger (the "reset" command)
3. Userspace requested it, e.g. by 'reboot -n'
Name the functions consistently.
*This sysctl is tuned to zero by default, but its existence means that
these handlers can be executed after a panic, at the user's discretion.
IMO this use-case is implicitly understood to be risky, and we'd be
better off eliminating it altogether.
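
A userspace sketch of the standardized check. SKETCH_RB_NOSYNC is defined locally and is assumed to mirror the kernel's RB_NOSYNC bit; example_shutdown() and the flushed flag are hypothetical and only illustrate the shape of such a handler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-in, assumed to mirror the kernel's RB_NOSYNC flag. */
#define SKETCH_RB_NOSYNC 0x004

static bool flushed;  /* records whether the handler touched the "device" */

/* Hypothetical shutdown handler in the style of a disk/GEOM module. */
static void
example_shutdown(void *arg, int howto)
{
	(void)arg;
	if ((howto & SKETCH_RB_NOSYNC) != 0)
		return;  /* panic, debugger reset, or 'reboot -n': do nothing */
	flushed = true;
}
```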
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D42337
This is not exhaustive, just done ahead of some upcoming changes to
these files.
Don't include sys/cdefs.h explicitly. No functional change intended.
Reviewed by: imp, jhb
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D42335
Port commit dc399583ba from g_mirror, which has an effectively
identical startup sequence.
This fixes a race that was occasionally causing panics during GEOM test
suite runs on riscv.
MFC after: 1 month
When we're recovering a damaged GPT, or when we're restoring a backed
up partition table, don't enforce the 4k alignment for start/end LBAs.
This is useful for 512e/4kn drives when we're creating a new partition
table or partition. However, when we're trying to fix / restore an old
partition, we shouldn't force this alignment, since in that case it's
more important to use the partition table as is than to optimize
performance by rounding (which isn't required by the standard).
MFC After: 1 week
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D42359
When building a kernel with clang 17 and KTR enabled, such as with the
LINT configurations, a -Werror warning is emitted:
sys/geom/geom_io.c:145:31: error: use of logical '&&' with constant operand [-Werror,-Wconstant-logical-operand]
145 | if ((KTR_COMPILE & KTR_GEOM) && (ktr_mask & KTR_GEOM)) {
| ~~~~~~~~~~~~~~~~~~~~~~~~ ^
sys/geom/geom_io.c:145:31: note: use '&' for a bitwise operation
145 | if ((KTR_COMPILE & KTR_GEOM) && (ktr_mask & KTR_GEOM)) {
| ^~
| &
sys/geom/geom_io.c:145:31: note: remove constant to silence this warning
Replace the multiple uses of the expression with one macro, and in this
macro use "!= 0" to get a logical operand instead of a bitwise one.
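
A minimal sketch of such a macro. The EX_ constants and GEOM_TRACE_ENABLED are hypothetical names with made-up values; they stand in for the kernel's KTR definitions and for whatever name the actual macro uses.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical values standing in for the kernel's KTR definitions. */
#define EX_KTR_GEOM     0x00000100u
#define EX_KTR_COMPILE  (EX_KTR_GEOM)   /* class compiled in */
static unsigned int ex_ktr_mask;        /* runtime mask, initially 0 */

/* Comparing each bitwise result with 0 gives '&&' logical operands. */
#define GEOM_TRACE_ENABLED						\
	((EX_KTR_COMPILE & EX_KTR_GEOM) != 0 &&				\
	    (ex_ktr_mask & EX_KTR_GEOM) != 0)

static bool
geom_trace_enabled(void)
{
	return (GEOM_TRACE_ENABLED);
}
```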
Reviewed by: jhb
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D41823
The LVM label is stored on any of the first four sectors, and the
PV (physical volume) header is stored within the same sector following
the LVM label. The current implementation does not fully check the
offset of the PV header, so attaching a badly formatted LVM PV may crash
the kernel due to an out-of-bounds memory read.
PR: 266562
Reviewed by: jhb
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D36773
Previously a debug kernel would trigger an assertion failure if an I/O
request attempted to read off the end of a concat volume, but a
non-debug kernel would use an invalid sub-disk to try to complete the
request, eventually resulting in some sort of fault in the kernel.
Instead, turn the assertions into explicit checks that fail requests
beyond the end of the volume with EIO. For requests which run over
the end of the volume, return a short request.
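
The two cases can be modelled in userspace as follows. clamp_request() is a hypothetical helper, and treating a request that starts exactly at the end of the volume as "beyond the end" is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Clamp a request against the volume size. Returns EIO when the request
 * starts at or past the end of the volume; otherwise returns 0 and
 * stores in *donep the number of bytes that can be serviced, which is
 * less than 'length' when the request runs over the end.
 */
static int
clamp_request(uint64_t offset, uint64_t length, uint64_t volsize,
    uint64_t *donep)
{
	uint64_t avail;

	assert(donep != NULL);
	if (offset >= volsize)
		return (EIO);
	avail = volsize - offset;
	*donep = length < avail ? length : avail;
	return (0);
}
```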
PR: 257838
Reported by: Robert Morris <rtm@lcs.mit.edu>
Reviewed by: emaste
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D41222
The removal of the sparc64 support in February 2020 obsoleted the
VTOC8 partitioning scheme as no other FreeBSD platform makes use
of it. Moreover, the code is bitrotting as nothing defines e. g.
LOADER_VTOC8_SUPPORT any more and, thus, should go now, too. With
this change, the following commits are reverted as far as VTOC8
is concerned and parts haven't already previously been deleted
along with prior sparc64 removals:
094fcb157da7d366e958ba8d50d08b
The alignment example d9711c28ef added to the VTOC8 section of
gpart.8 is folded into the MBR one.
This should finally conclude the deorbit of sparc64-specific bits.
We had joy, we had fun
we ran Unix on a Sun.
But that source and the song
of FreeBSD have all gone.
Credits to Michael Bueker for the original "Unix on a Sun" and Rod
McKuen for the "Seasons in the Sun" lyrics.
When a storage device reports that it does not support cache flush, the
GEOM disk layer by default returns ENOTSUPP in response to a BIO_FLUSH
command.
On AWS, local volumes do not advertise themselves as having write-cache
enabled. When they are selected for L3 on all HDD nodes, the L3
subsystem may inadvertently kick these L3 devices if a BIO_FLUSH command
fails with an ENOTSUPP return code. The fix is to make GEOM disk return
success (0) when this condition occurs and add a sysctl to make this
error handling config-driven.
Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/710
The SPDX folks have obsoleted the BSD-2-Clause-NetBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.
Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix
The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.
Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix
A one-bit wide bit-field can take only the values 0 and -1. Clang 16
introduced a warning that "implicit truncation from 'int' to a one-bit
wide bit-field changes value from 1 to -1". Fix by using c99 bool.
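
A small sketch of the bool variant. struct flags_sketch and read_enabled_after_set() are hypothetical and exist only to show that a one-bit bool field reads back as 1.

```c
#include <assert.h>
#include <stdbool.h>

struct flags_sketch {
	bool	enabled:1;  /* a one-bit bool field holds 0 or 1 */
};

/* Set the flag and return what reads back from the bit-field. */
static int
read_enabled_after_set(void)
{
	struct flags_sketch f = { .enabled = false };

	f.enabled = true;
	return (f.enabled);
}
```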
Reported by: Clang, via dim
Reviewed by: dim
Sponsored by: The FreeBSD Foundation
If all of the mirror's children have the same rotation rate, report
that. But if they have mixed rotation rates, or if any child has an
unknown rotation rate, report "Unknown".
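
The reporting rule can be sketched as below. EX_RR_UNKNOWN and mirror_rotation_rate() are hypothetical names; using 0 as the "unknown" marker and reporting "unknown" for an empty mirror are assumptions of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_RR_UNKNOWN 0  /* hypothetical "unknown rotation rate" marker */

/*
 * Report the common rate when every child agrees on a known rate;
 * report unknown for mixed rates or when any child is unknown.
 */
static uint16_t
mirror_rotation_rate(const uint16_t *rates, size_t n)
{
	assert(rates != NULL || n == 0);
	if (n == 0)
		return (EX_RR_UNKNOWN);
	for (size_t i = 0; i < n; i++) {
		if (rates[i] == EX_RR_UNKNOWN || rates[i] != rates[0])
			return (EX_RR_UNKNOWN);
	}
	return (rates[0]);
}
```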
MFC after: 2 weeks
Sponsored by: Axcient
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D39458
As in commit 2f1cfb7f63 ("gmirror: Pre-allocate the timeout event
structure"), graid3 must avoid M_WAITOK allocations in callout handlers.
Reported by: graid3 regression tests
MFC after: 2 weeks
Pointer addresses are always >= 0. Assert that the value is >= 0
instead.
PR: 207855, 207856
Reviewed by: imp
Reported by: David Binderman
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D37677
The "canonical" EBR partition names like `ada0s4+00002081` are not
particularly meaningful. The "compat" aliases share the same namespace
as the parent MBR, resulting in user-friendly names like `ada0s6`.
These names are consistent with the way Linux names EBR partitions.
We previously provided a sysctl kern.features.geom_part_ebr_compat
(enabled by default) to control the "compat" names. Remove the sysctl
and always create the aliases.
Relnotes: yes
Reviewed by: cem, imp
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D38812
`hdr_entries` and `hdr_entsz` are both uint32_t as defined in the UEFI
spec. The current spec does not set an upper limit on the number of
partition entries or the size of a partition entry, so a malicious or
corrupted GPT header read from an untrusted source may contain a large
entry count or entry size.
PR: 266548
Reviewed by: oshogbo, cem, imp, markj
Approved by: kp (mentor)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D36709
This can sometimes happen with broken HDDs.
MFC after: 2 weeks
Sponsored by: Axcient
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D37313
The current specification does not set an upper limit on the number of
partition entries or the size of a partition entry. In 799eac8c3d,
Andrey V. Elsukov limited the maximum number of GPT entries to 4k, but
only for the write routine (gpart create). When attaching disks whose
number of GPT entries exceeds the limit, or disks with a large partition
entry size, it is still possible to exhaust kernel memory.
1. Reuse the limit of the maximum number of partition entries.
2. Limit the maximum size of GPT entry to 1k.
In the current specification (2.10) the size of a GPT entry is 128 *
2^n where n >= 0, and the remaining size - 128 bytes are reserved. 1k
should be sufficient for the foreseeable future.
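
The two limits can be sketched as a header sanity check. The constant names and gpt_table_size_ok() are hypothetical; reading "4k" as 4096 entries and also enforcing the 128 * 2^n form are assumptions of this sketch, not a statement about what the commit checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EX_MAX_ENTRIES	4096u	/* "4k" entries, assumed to mean 4096 */
#define EX_MIN_ENTSZ	128u	/* smallest entry size: 128 * 2^0 */
#define EX_MAX_ENTSZ	1024u	/* "1k" bytes */

/* Reject headers whose entry count or entry size exceeds the limits. */
static bool
gpt_table_size_ok(uint32_t hdr_entries, uint32_t hdr_entsz)
{
	if (hdr_entries > EX_MAX_ENTRIES)
		return (false);
	if (hdr_entsz < EX_MIN_ENTSZ || hdr_entsz > EX_MAX_ENTSZ)
		return (false);
	/* 128 * 2^n is a power of two that is at least 128. */
	return ((hdr_entsz & (hdr_entsz - 1)) == 0);
}
```

Bounding both fields before allocating keeps the table size at or below 4096 * 1024 bytes in this model.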
PR: 266548
Discussed with: imp
Reviewed by: markj
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D36717