It reports the value of the g_eli_alloc_sz variable. Allocations of
this size or less will use UMA. Larger allocations will use malloc.
Since malloc is slower, it is useful for users to know this variable so
they can avoid such allocations. For example, ZFS users can set
vfs.zfs.vdev.aggregation_limit to this value.
MFC after: 1 week
Sponsored by: Axcient
Reviewed by: markj, imp
Differential Revision: https://reviews.freebsd.org/D44904
Commit 33cb9b3c3a replaced a g_raid3_destroy_device() call with a
g_raid3_free_device() call, which was incorrect and could lead to a
panic if a RAID3 GEOM failed to start (e.g., due to missing disks).
Reported by: graid3 tests
Fixes: 33cb9b3c3a ("graid3: Fix teardown races")
MFC after: 3 days
Sponsored by: Klara, Inc.
Fix a problem in graid implementation of Promise RAID1 created with 4+ disks.
Such an array generally works fine until reboot only due to a bug
in metadata writing code. Before the fix, next taste erronously created
RAID1E (kind of RAID10) instead of RAID1, hence graid used wrong offsets
for I/O operations.
The bug did not affect Promise RAID1 arrays with 2 or 3 disks only.
Reviewed by: mav
MFC after: 3 days
Despite the name, req->serror is used in some cases to copy non-error
messages to userspace. So, report errors when copying out so long as
they don't clobber an earlier error.
Reviewed by: mav, imp
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D43146
Apply the following automated changes to try to eliminate
no-longer-needed sys/cdefs.h includes as well as now-empty
blank lines in a row.
Remove /^#if.*\n#endif.*\n#include\s+<sys/cdefs.h>.*\n/
Remove /\n+#include\s+<sys/cdefs.h>.*\n+#if.*\n#endif.*\n+/
Remove /\n+#if.*\n#endif.*\n+/
Remove /^#if.*\n#endif.*\n/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/types.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/param.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/capsicum.h>/
Sponsored by: Netflix
Remove ancient SCCS tags from the tree, automated scripting, with two
minor fixup to keep things compiling. All the common forms in the tree
were removed with a perl script.
Sponsored by: Netflix
Ensure they are all panic/debugger safe.
Most handlers for this event are for disk drivers/geom modules. There
are a mix of checks being used here (or not), so let's standardize on
checking the presence of the RB_NOSYNC flag.
This flag is set whenever:
1. The kernel has panicked and kern.sync_on_panic=0*
2. We reboot from within the kernel debugger (the "reset" command)
3. Userspace requested it, e.g. by 'reboot -n'
Name the functions consistently.
*This sysctl is tuned to zero by default, but its existence means that
these handlers can be executed after a panic, at the user's discretion.
IMO this use-case is implicitly understood to be risky, and we'd be
better off eliminating it altogether.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D42337
This is not exhaustive, just done ahead of some upcoming changes to
these files.
Don't include sys/cdefs.h explicitly. No functional change intended.
Reviewed by: imp, jhb
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D42335
Port commit dc399583ba from g_mirror, which has an effectively
identical startup sequence.
This fixes a race that was occasionally causing panics during GEOM test
suite runs on riscv.
MFC after: 1 month
When we're recoverying a damangae GPT, or when we're restoring a backed
up partition tables, don't enforce the 4k alignment for start/end LBAs.
This is useful for 512e/4kn drives when we're creating a new partition
table or partition. However, when we're trying to fix / restore an old
partition, we shouldn't force this alignment, since in that case it's
more important to use the partition table as is than to optimize
performance by rounding (which isn't required by the standard).
MFC After: 1 week
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D42359
When building a kernel with clang 17 and KTR enabled, such as with the
LINT configurations, a -Werror warning is emitted:
sys/geom/geom_io.c:145:31: error: use of logical '&&' with constant operand [-Werror,-Wconstant-logical-operand]
145 | if ((KTR_COMPILE & KTR_GEOM) && (ktr_mask & KTR_GEOM)) {
| ~~~~~~~~~~~~~~~~~~~~~~~~ ^
sys/geom/geom_io.c:145:31: note: use '&' for a bitwise operation
145 | if ((KTR_COMPILE & KTR_GEOM) && (ktr_mask & KTR_GEOM)) {
| ^~
| &
sys/geom/geom_io.c:145:31: note: remove constant to silence this warning
Replace the multiple uses of the expression with one macro, and in this
macro use "!= 0" to get a logical operand instead of a bitwise one.
Reviewed by: jhb
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D41823
The LVM label is stored on any of the first four sectors, and the
PV (physical volume) header is stored within the same sector following
the LVM label. The current implementation does not fully check the
offset of PV header, when attaching a bad formatted LVM PV the kernel
may crash due to out-of-bounds memory read.
PR: 266562
Reviewed by: jhb
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D36773
Previously a debug kernel would trigger an assertion failure if an I/O
request attempted to read off the end of a concat volume, but a
non-debug kernel would use an invalid sub-disk to try to complete the
request eventually resulting in some sort of fault in the kernel.
Instead, turn the assertions into explicit checks that fail requests
beyond the end of the volume with EIO. For requests which run over
the end of the volume, return a short request.
PR: 257838
Reported by: Robert Morris <rtm@lcs.mit.edu>
Reviewed by: emaste
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D41222
The removal of the sparc64 support in February 2020 obsoleted the
VTOC8 partitioning scheme as no other FreeBSD platform makes use
of it. Moreover, the code is bitrotting as nothing defines e. g.
LOADER_VTOC8_SUPPORT any more and, thus, should go now, too. With
this change, the following commits are reverted as far as VTOC8
is concerned and parts haven't already previously been deleted
along with prior sparc64 removals:
094fcb157da7d366e958ba8d50d08b
The alignment example d9711c28ef
added to the VTOC8 section of gpart.8 is folded into the MBR one.
This should finally conclude the deorbit of sparc64-specific bits.
We had joy, we had fun
we ran Unix on a Sun.
But that source and the song
of FreeBSD have all gone.
Credits to Michael Bueker for the original "Unix on a Sun" and Rod
McKuen for the "Seasons in the Sun" lyrics.
When a storage device reports that it does not support cache flush, the
GEOM disk layer by default returns ENOTSUPP in response to a BIO_FLUSH
command.
On AWS, local volumes do not advertise themselves as having write-cache
enabled. When they are selected for L3 on all HDD nodes, the L3
subsystem may inadvertently kick these L3 devices if a BIO_FLUSH command
fails with an ENOTSUPP return code. The fix is to make GEOM disk return
success (0) when this condition occurs and add a sysctl to make this
error handling config-driven
Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/710
The SPDX folks have obsoleted the BSD-2-Clause-NetBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.
Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix
The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.
Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix
A one-bit wide bit-field can take only the values 0 and -1. Clang 16
introduced a warning that "implicit truncation from 'int' to a one-bit
wide bit-field changes value from 1 to -1". Fix by using c99 bool.
Reported by: Clang, via dim
Reviewed by: dim
Sponsored by: The FreeBSD Foundation
If all of the mirror's children have the same rotation rate, report
that. But if they have mixed rotation rates, or if any child has an
unknown rotation rate, report "Unknown".
MFC after: 2 weeks
Sponsored by: Axcient
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D39458
As in commit 2f1cfb7f63 ("gmirror: Pre-allocate the timeout event
structure"), graid3 must avoid M_WAITOK allocations in callout handlers.
Reported by: graid3 regression tests
MFC after 2 weeks
Pointer addresses are always >= 0. Assert that the value is >= 0
instead.
PR: 207855, 207856
Reviewed by: imp
Reported by: David Binderman
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D37677
The "canonical" EBR partition names like `ada0s4+00002081` are not
particularly meaningful. The "compat" aliases share the same namespace
as the parent MBR, resulting in user-friendly names like `ada0s6`.
These names are consistent with the way Linux names EBR partitions.
We previously provided a sysctl kern.features.geom_part_ebr_compat
(enabled by default) to control the "compat" names. Remove the sysctl
and always create the aliases.
Relnotes: yes
Reviewed by: cem, imp
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D38812
`hdr_entries` and `hdr_entsz` are both uint32_t as defined in UEFI spec.
Current spec does not have upper limit of the number of partition
entries and the size of partition entry, it is potential that malicious
or corrupted GPT header read from untrusted source contains large size of
entry number or size.
PR: 266548
Reviewed by: oshogbo, cem, imp, markj
Approved by: kp (mentor)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D36709
This can sometimes happen with broken HDDs.
MFC after: 2 weeks
Sponsored by: Axcient
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D37313
Current specification does not have upper limit of the number of
partition entries and the size of partition entry. In
799eac8c3d Andrey V. Elsukov introduced a
limit maximum number of GPT entries to 4k, but that is for write routine
(gpart create) only. When attaching disks that have large number of GPT
entries exceeding the limit, or disks with large size of partition
entry, it is still possible to exhaust kernel memory.
1. Reuse the limit of the maximum number of partition entries.
2. Limit the maximum size of GPT entry to 1k.
In current specification (2.10) the size of GPT entry is 128 *
2^n while n >= 0, and the size - 128 is reserved. 1k should be
sufficient enough for foreseen future.
PR: 266548
Discussed with: imp
Reviewed by: markj
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D36717
Explicitly pass the struct thread argument.
Move the function prototype from sys/systm.h to geom/geom.h, we do not
need almost each kernel source to see the prototype, it is now used
only by kern/vfs_mountroot.c outside geom/geom_event.c, where the
function is defined.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D35888
Make most AST handlers dynamically registered. This allows to have
subsystem-specific handler source located in the subsystem files,
instead of making subr_trap.c aware of it. For instance, signal
delivery code on return to userspace is now moved to kern_sig.c.
Also, it allows to have some handlers designated as the cleanup (kclear)
type, which are called both at AST and on thread/process exit. For
instance, ast(), exit1(), and NFS server no longer need to be aware
about UFS softdep processing.
The dynamic registration also allows third-party modules to register AST
handlers if needed. There is one caveat with loadable modules: the
code does not make any effort to ensure that the module is not unloaded
before all threads processed through AST handler in it. In fact, this
is already present behavior for hwpmc.ko and ufs.ko. I do not think it
is worth the efforts and the runtime overhead to try to fix it.
Reviewed by: markj
Tested by: emaste (arm64), pho
Discussed with: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D35888
Rather than trying to shoehorn flags into the requested superblock
address, create a separate flags parameter to the ffs_sbget()
function in sys/ufs/ffs/ffs_subr.c. The ffs_sbget() function is
used both in the kernel and in user-level utilities through export
to the sbget() function in the libufs(3) library (see sbget(3)
for details). The kernel uses ffs_sbget() when mounting UFS
filesystems, in the glabel(8) and gjournal(8) GEOM utilities,
and in the standalone library used when booting the system
from a UFS root filesystem.
The ffs_sbget() function reads the superblock located at the byte
offset specified by its sblockloc parameter. The value UFS_STDSB
may be specified for sblockloc to request that the standard
location for the superblock be read.
The two existing options are now flags:
UFS_NOHASHFAIL will note if the check hash is wrong but will still
return the superblock. This is used by the bootstrap code to
give the system a chance to come up so that fsck can be run to
correct the problem.
UFS_NOMSG indicates that superblock inconsistency error messages
should not be printed. It is used by programs like fsck that
want to print their own error message and programs like glabel(8)
that just want to know if a UFS filesystem exists on a partition.
One additional flag is added:
UFS_NOCSUM causes only the superblock itself to be returned, but does
not read in any auxiliary data structures like the cylinder group
summary information. It is used by clients like glabel(8) that
just want to check for possible filesystem types. Using UFS_NOCSUM
skips the superblock checks for csum data which allows superblocks
that have corrupted csum data to be read and used.
The validate_sblock() function checks that the superblock has not
been corrupted in a way that can crash or hang the system. Unless
the UFS_NOMSG flag is specified, it will print out any errors that
it finds. Prior to this commit, validate_sblock() returned as soon
as it found an inconsistency so would print at most one message.
It now does all its checks so when UFS_NOMSG has not been specified
will print out everything that it finds inconsistent.
Sponsored by: The FreeBSD Foundation
With clang 15, the following -Werror warning is produced:
sys/geom/geom_subr.c:484:16: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
g_wither_washer()
^
void
This is because g_wither_washer() is declared with a (void) argument
list, but defined with an empty argument list. Make the definition match
the declaration.
MFC after: 3 days
With clang 15, the following -Werror warning is produced:
sys/geom/geom_io.c:272:10: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
g_io_init()
^
void
This is because g_io_init() is declared with a (void) argument list, but
defined with an empty argument list. Make the definition match the
declaration.
MFC after: 3 days
With clang 15, the following -Werror warnings are produced:
sys/geom/geom_event.c:261:13: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
g_run_events()
^
void
sys/geom/geom_event.c:405:12: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
g_do_wither()
^
void
sys/geom/geom_event.c:449:13: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
g_event_init()
^
void
This is because g_run_events(), g_do_wither(), and g_event_init() are
declared with (void) argument lists, but defined with empty argument
lists. Make the definitions match the declarations.
MFC after: 3 days
Historically, GEOM utilities (gpart(8), gstripe(8), gmirror(8),
etc) used the gctl_error() routine to report errors. If they called
gctl_error() they would exit with EXIT_FAILURE, otherwise they would
return with EXIT_SUCCESS. If they used gctl_error() to output an
informational message, for example when run with the -v (verbose)
option, they would mistakenly exit with EXIT_FAILURE. A further
limitation of the gctl_error() function was that it could only be
called once. Messages from any additional calls to gctl_error()
would be silently discarded.
To resolve these problems a new function, gctl_msg() has been added.
It can be called multiple times to output multiple messages. It
also has an additional errno argument which should be zero if it is
an informational message or an errno value (EINVAL, EBUSY, etc) if
it is an error. When done the gctl_post_messages() function should
be called to indicate that all messages have been posted. If any
of the messages had a non-zero errno, the utility will EXIT_FAILURE.
If only informational messages (with zero errno) were posted, the
utility will EXIT_SUCCESS.
Tested by: Peter Holm
PR: 265184
MFC after: 1 week
Before this patch CAM periph drivers called both disk_alloc() and
disk_create() same time on periph creation. But then prevented disks
from opening until the periph probe completion with cam_periph_hold().
As result, especially if disk misbehaves during the probe, GEOM event
thread, triggered to taste the disk, got blocked on open attempt,
potentially for a long time, unable to process other events.
This patch moves disk_create() call from periph creation to the end of
the probe. To allow disk_create() calls from non-sleepable CAM contexts
some of its duties requiring memory allocations are moved either back
to disk_alloc() or forward to g_disk_create(), so now disk_alloc() and
disk_add_alias() are the only disk methods that require sleeping. If
disk fails during the probe disk_create() may just be skipped, going
directly to disk_destroy(). Other method calls during that time are
just ignored. Since GEOM may now see the disks after CAM bus scan is
already completed, introduce per-periph boot hold functions. Enclosure
driver already had such mechanism, so just generalize it.
Reviewed by: imp
MFC after: 1 month
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D35784
SES allows element descriptors to contain characters like spaces and
quotes that devfs does not allow to appear in device aliases. Since SES
element descriptors are outside of the kernel's control, we should
gracefully handle a failure to create a device physical path alias.
PR: 264513
Reported by: Yuri <yuri@aetern.org>
Reviewed by: imp, mav
Sponsored by: Axcient
MFC after: 2 weeks