This is not exhaustive, just done ahead of some upcoming changes to
these files.
Don't include sys/cdefs.h explicitly. No functional change intended.
Reviewed by: imp, jhb
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D42335
Port commit dc399583ba from g_mirror, which has an effectively
identical startup sequence.
This fixes a race that was occasionally causing panics during GEOM test
suite runs on riscv.
MFC after: 1 month
When we're recoverying a damangae GPT, or when we're restoring a backed
up partition tables, don't enforce the 4k alignment for start/end LBAs.
This is useful for 512e/4kn drives when we're creating a new partition
table or partition. However, when we're trying to fix / restore an old
partition, we shouldn't force this alignment, since in that case it's
more important to use the partition table as is than to optimize
performance by rounding (which isn't required by the standard).
MFC After: 1 week
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D42359
When building a kernel with clang 17 and KTR enabled, such as with the
LINT configurations, a -Werror warning is emitted:
sys/geom/geom_io.c:145:31: error: use of logical '&&' with constant operand [-Werror,-Wconstant-logical-operand]
145 | if ((KTR_COMPILE & KTR_GEOM) && (ktr_mask & KTR_GEOM)) {
| ~~~~~~~~~~~~~~~~~~~~~~~~ ^
sys/geom/geom_io.c:145:31: note: use '&' for a bitwise operation
145 | if ((KTR_COMPILE & KTR_GEOM) && (ktr_mask & KTR_GEOM)) {
| ^~
| &
sys/geom/geom_io.c:145:31: note: remove constant to silence this warning
Replace the multiple uses of the expression with one macro, and in this
macro use "!= 0" to get a logical operand instead of a bitwise one.
Reviewed by: jhb
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D41823
The LVM label is stored on any of the first four sectors, and the
PV (physical volume) header is stored within the same sector following
the LVM label. The current implementation does not fully check the
offset of PV header, when attaching a bad formatted LVM PV the kernel
may crash due to out-of-bounds memory read.
PR: 266562
Reviewed by: jhb
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D36773
Previously a debug kernel would trigger an assertion failure if an I/O
request attempted to read off the end of a concat volume, but a
non-debug kernel would use an invalid sub-disk to try to complete the
request eventually resulting in some sort of fault in the kernel.
Instead, turn the assertions into explicit checks that fail requests
beyond the end of the volume with EIO. For requests which run over
the end of the volume, return a short request.
PR: 257838
Reported by: Robert Morris <rtm@lcs.mit.edu>
Reviewed by: emaste
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D41222
The removal of the sparc64 support in February 2020 obsoleted the
VTOC8 partitioning scheme as no other FreeBSD platform makes use
of it. Moreover, the code is bitrotting as nothing defines e. g.
LOADER_VTOC8_SUPPORT any more and, thus, should go now, too. With
this change, the following commits are reverted as far as VTOC8
is concerned and parts haven't already previously been deleted
along with prior sparc64 removals:
094fcb157da7d366e958ba8d50d08b
The alignment example d9711c28ef
added to the VTOC8 section of gpart.8 is folded into the MBR one.
This should finally conclude the deorbit of sparc64-specific bits.
We had joy, we had fun
we ran Unix on a Sun.
But that source and the song
of FreeBSD have all gone.
Credits to Michael Bueker for the original "Unix on a Sun" and Rod
McKuen for the "Seasons in the Sun" lyrics.
When a storage device reports that it does not support cache flush, the
GEOM disk layer by default returns ENOTSUPP in response to a BIO_FLUSH
command.
On AWS, local volumes do not advertise themselves as having write-cache
enabled. When they are selected for L3 on all HDD nodes, the L3
subsystem may inadvertently kick these L3 devices if a BIO_FLUSH command
fails with an ENOTSUPP return code. The fix is to make GEOM disk return
success (0) when this condition occurs and add a sysctl to make this
error handling config-driven
Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/710
The SPDX folks have obsoleted the BSD-2-Clause-NetBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.
Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix
The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.
Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix
A one-bit wide bit-field can take only the values 0 and -1. Clang 16
introduced a warning that "implicit truncation from 'int' to a one-bit
wide bit-field changes value from 1 to -1". Fix by using c99 bool.
Reported by: Clang, via dim
Reviewed by: dim
Sponsored by: The FreeBSD Foundation
If all of the mirror's children have the same rotation rate, report
that. But if they have mixed rotation rates, or if any child has an
unknown rotation rate, report "Unknown".
MFC after: 2 weeks
Sponsored by: Axcient
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D39458
As in commit 2f1cfb7f63 ("gmirror: Pre-allocate the timeout event
structure"), graid3 must avoid M_WAITOK allocations in callout handlers.
Reported by: graid3 regression tests
MFC after 2 weeks
Pointer addresses are always >= 0. Assert that the value is >= 0
instead.
PR: 207855, 207856
Reviewed by: imp
Reported by: David Binderman
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D37677
The "canonical" EBR partition names like `ada0s4+00002081` are not
particularly meaningful. The "compat" aliases share the same namespace
as the parent MBR, resulting in user-friendly names like `ada0s6`.
These names are consistent with the way Linux names EBR partitions.
We previously provided a sysctl kern.features.geom_part_ebr_compat
(enabled by default) to control the "compat" names. Remove the sysctl
and always create the aliases.
Relnotes: yes
Reviewed by: cem, imp
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D38812
`hdr_entries` and `hdr_entsz` are both uint32_t as defined in UEFI spec.
Current spec does not have upper limit of the number of partition
entries and the size of partition entry, it is potential that malicious
or corrupted GPT header read from untrusted source contains large size of
entry number or size.
PR: 266548
Reviewed by: oshogbo, cem, imp, markj
Approved by: kp (mentor)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D36709
This can sometimes happen with broken HDDs.
MFC after: 2 weeks
Sponsored by: Axcient
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D37313
Current specification does not have upper limit of the number of
partition entries and the size of partition entry. In
799eac8c3d Andrey V. Elsukov introduced a
limit maximum number of GPT entries to 4k, but that is for write routine
(gpart create) only. When attaching disks that have large number of GPT
entries exceeding the limit, or disks with large size of partition
entry, it is still possible to exhaust kernel memory.
1. Reuse the limit of the maximum number of partition entries.
2. Limit the maximum size of GPT entry to 1k.
In current specification (2.10) the size of GPT entry is 128 *
2^n while n >= 0, and the size - 128 is reserved. 1k should be
sufficient enough for foreseen future.
PR: 266548
Discussed with: imp
Reviewed by: markj
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D36717
Explicitly pass the struct thread argument.
Move the function prototype from sys/systm.h to geom/geom.h, we do not
need almost each kernel source to see the prototype, it is now used
only by kern/vfs_mountroot.c outside geom/geom_event.c, where the
function is defined.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D35888
Make most AST handlers dynamically registered. This allows to have
subsystem-specific handler source located in the subsystem files,
instead of making subr_trap.c aware of it. For instance, signal
delivery code on return to userspace is now moved to kern_sig.c.
Also, it allows to have some handlers designated as the cleanup (kclear)
type, which are called both at AST and on thread/process exit. For
instance, ast(), exit1(), and NFS server no longer need to be aware
about UFS softdep processing.
The dynamic registration also allows third-party modules to register AST
handlers if needed. There is one caveat with loadable modules: the
code does not make any effort to ensure that the module is not unloaded
before all threads processed through AST handler in it. In fact, this
is already present behavior for hwpmc.ko and ufs.ko. I do not think it
is worth the efforts and the runtime overhead to try to fix it.
Reviewed by: markj
Tested by: emaste (arm64), pho
Discussed with: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D35888
Rather than trying to shoehorn flags into the requested superblock
address, create a separate flags parameter to the ffs_sbget()
function in sys/ufs/ffs/ffs_subr.c. The ffs_sbget() function is
used both in the kernel and in user-level utilities through export
to the sbget() function in the libufs(3) library (see sbget(3)
for details). The kernel uses ffs_sbget() when mounting UFS
filesystems, in the glabel(8) and gjournal(8) GEOM utilities,
and in the standalone library used when booting the system
from a UFS root filesystem.
The ffs_sbget() function reads the superblock located at the byte
offset specified by its sblockloc parameter. The value UFS_STDSB
may be specified for sblockloc to request that the standard
location for the superblock be read.
The two existing options are now flags:
UFS_NOHASHFAIL will note if the check hash is wrong but will still
return the superblock. This is used by the bootstrap code to
give the system a chance to come up so that fsck can be run to
correct the problem.
UFS_NOMSG indicates that superblock inconsistency error messages
should not be printed. It is used by programs like fsck that
want to print their own error message and programs like glabel(8)
that just want to know if a UFS filesystem exists on a partition.
One additional flag is added:
UFS_NOCSUM causes only the superblock itself to be returned, but does
not read in any auxiliary data structures like the cylinder group
summary information. It is used by clients like glabel(8) that
just want to check for possible filesystem types. Using UFS_NOCSUM
skips the superblock checks for csum data which allows superblocks
that have corrupted csum data to be read and used.
The validate_sblock() function checks that the superblock has not
been corrupted in a way that can crash or hang the system. Unless
the UFS_NOMSG flag is specified, it will print out any errors that
it finds. Prior to this commit, validate_sblock() returned as soon
as it found an inconsistency so would print at most one message.
It now does all its checks so when UFS_NOMSG has not been specified
will print out everything that it finds inconsistent.
Sponsored by: The FreeBSD Foundation
With clang 15, the following -Werror warning is produced:
sys/geom/geom_subr.c:484:16: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
g_wither_washer()
^
void
This is because g_wither_washer() is declared with a (void) argument
list, but defined with an empty argument list. Make the definition match
the declaration.
MFC after: 3 days
With clang 15, the following -Werror warning is produced:
sys/geom/geom_io.c:272:10: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
g_io_init()
^
void
This is because g_io_init() is declared with a (void) argument list, but
defined with an empty argument list. Make the definition match the
declaration.
MFC after: 3 days
With clang 15, the following -Werror warnings are produced:
sys/geom/geom_event.c:261:13: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
g_run_events()
^
void
sys/geom/geom_event.c:405:12: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
g_do_wither()
^
void
sys/geom/geom_event.c:449:13: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
g_event_init()
^
void
This is because g_run_events(), g_do_wither(), and g_event_init() are
declared with (void) argument lists, but defined with empty argument
lists. Make the definitions match the declarations.
MFC after: 3 days
Historically, GEOM utilities (gpart(8), gstripe(8), gmirror(8),
etc) used the gctl_error() routine to report errors. If they called
gctl_error() they would exit with EXIT_FAILURE, otherwise they would
return with EXIT_SUCCESS. If they used gctl_error() to output an
informational message, for example when run with the -v (verbose)
option, they would mistakenly exit with EXIT_FAILURE. A further
limitation of the gctl_error() function was that it could only be
called once. Messages from any additional calls to gctl_error()
would be silently discarded.
To resolve these problems a new function, gctl_msg() has been added.
It can be called multiple times to output multiple messages. It
also has an additional errno argument which should be zero if it is
an informational message or an errno value (EINVAL, EBUSY, etc) if
it is an error. When done the gctl_post_messages() function should
be called to indicate that all messages have been posted. If any
of the messages had a non-zero errno, the utility will EXIT_FAILURE.
If only informational messages (with zero errno) were posted, the
utility will EXIT_SUCCESS.
Tested by: Peter Holm
PR: 265184
MFC after: 1 week
Before this patch CAM periph drivers called both disk_alloc() and
disk_create() same time on periph creation. But then prevented disks
from opening until the periph probe completion with cam_periph_hold().
As result, especially if disk misbehaves during the probe, GEOM event
thread, triggered to taste the disk, got blocked on open attempt,
potentially for a long time, unable to process other events.
This patch moves disk_create() call from periph creation to the end of
the probe. To allow disk_create() calls from non-sleepable CAM contexts
some of its duties requiring memory allocations are moved either back
to disk_alloc() or forward to g_disk_create(), so now disk_alloc() and
disk_add_alias() are the only disk methods that require sleeping. If
disk fails during the probe disk_create() may just be skipped, going
directly to disk_destroy(). Other method calls during that time are
just ignored. Since GEOM may now see the disks after CAM bus scan is
already completed, introduce per-periph boot hold functions. Enclosure
driver already had such mechanism, so just generalize it.
Reviewed by: imp
MFC after: 1 month
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D35784
SES allows element descriptors to contain characters like spaces and
quotes that devfs does not allow to appear in device aliases. Since SES
element descriptors are outside of the kernel's control, we should
gracefully handle a failure to create a device physical path alias.
PR: 264513
Reported by: Yuri <yuri@aetern.org>
Reviewed by: imp, mav
Sponsored by: Axcient
MFC after: 2 weeks
It is unused, especially now that the underlying d_dumper methods do not
accept the argument.
Reviewed by: markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D35174
The physical address argument is essentially ignored by every dumper
method. In addition, the dump routines don't actually pass a real
address; every call to dump_append() passes a value of zero for
physical.
Reviewed by: markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D35173
Add support for the following NOTE events:
NOTE_OPEN, NOTE_CLOSE, NOTE_CLOSE_WRITE, NOTE_READ, and NOTE_WRITE.
Differential Revision: https://reviews.freebsd.org/D34777
The contract with the lower layers is that once ENXIO is reported, all
further I/O to the device is not possible. This is reported when the
device departs for good or changes in some material manner out from
underneath the system. Since the lower layers terminate all pending I/O
when this is detected with ENXIO, reporting more than one provides no
extra value. ENXIO suppression done with atomics due to race described
in e8827f4094. It's on the error path and a rare event, so this won't affect
performance.
Sponsored by: Netflix
Reviewed by: mckusick, kib
Differential Revision: https://reviews.freebsd.org/D35034
On the 0 -> 1 transition of sc_enxio_active, report that we're doing
this. This is a rare, but interesting, event. Convert to using atomics
to set this field to prevent a rare race:
In CAM, when we invalidate a device, one thread (T1) will start the
process in error processing called from *dadone
(cam_periph_error). This routine will queue work to xpt_async_td
(T2) and indicate to *dadone to call biodone(ENXIO) for the bio. T2
wakes up and basically waits to acquire the periph lock. T2 will do
so when T1 drops the periph lock just before T1's call to
biodone. T2 acquires the lock and calls biodone(ENXIO) on all
pending bios. These two threads will race and we could lose the
printf or get two in rare cases. Since we only touch sc_enxio_active
in an error path that's infrequent, the extra atomic traffic will be
rare but will ensure robustness.
Sponsored by: Netflix
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D35037
We have a report of a panic in GELI that appears to go away when
unmapped I/O is disabled. Add a tunable to make such investigations
easier in the future. No functional change intended.
PR: 262894
Reviewed by: asomers
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34944
This code was marked gone_in(14), so it can now be removed.
The only consumer of this interface is dumpon(8). We do not maintain
strict backwards compatibility for this utility because a) it
can't/shouldn't be used from a jail or chroot and b) it is highly
specific interface unique to FreeBSD. The host's (presumably more
up-to-date) copy of dumpon(8) should be used to configure kernel dump
devices.
Reviewed by: markj, emaste
MFC after: never
Differential Revision: https://reviews.freebsd.org/D34914
This code was marked gone_in(13), so its time has passed.
The only consumer of this interface is dumpon(8). We do not maintain
strict backwards compatibility for this utility because a) it
can't/shouldn't be used from a jail or chroot and b) it is highly
specific interface unique to FreeBSD. The host's (presumably more
up-to-date) copy of dumpon(8) should be used to configure kernel dump
devices.
Reviewed by: markj, emaste
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D34913