Commit graph

2486 commits

Author SHA1 Message Date
Mateusz Guzik 8ad5b9498e Revert "geom_bde: plug set-but-not-used vars"
The commit at hand happens to break userspace build as the header ins
included by sbin/gbde/gbde.c and the __diagused macro is not provided to
userspace.

Revert until this gets sorted out.

This reverts commit 26e837e2d4.
2021-12-09 19:23:05 +00:00
Mateusz Guzik 2cc5a480a6 geom_raid3: plug set-but-not-unused vars
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-12-09 18:08:03 +00:00
Mateusz Guzik c904812018 geom_eli: mostly plug set-but-not-unused vars
The remaining case is an ignored error.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-12-09 18:05:06 +00:00
Mateusz Guzik 0d81fba680 geom_mirror: plug set-but-not-unused vars
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-12-09 18:00:27 +00:00
Mateusz Guzik 26e837e2d4 geom_bde: plug set-but-not-used vars
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-12-09 17:53:48 +00:00
Scott Long 2d5d242406 Fix "set but not used" for geom
Sponsored by: Rubicon Communications, LLC ("Netgate")
2021-12-03 23:40:24 -07:00
Mitchell Horne 0d2224733e Implement GET_STACK_USAGE on remaining archs
This definition enables callers to estimate remaining space on the
kstack, and take action on it. Notably, it enables optimizations in the
GEOM and netgraph subsystems to directly dispatch work items when there
is sufficient stack space, rather than queuing them for a worker thread.

Implement it for riscv, arm, and mips. Remove the #ifdefs, so it will
not go unimplemented elsewhere.

PR:		259157
Reviewed by:	mav, kib, markj (previous version)
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32580
2021-11-30 11:15:56 -04:00
Mateusz Guzik b74fdaaf1c geom_multipath: plug set-but-not-used vars
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-11-25 11:31:50 +00:00
Mateusz Guzik cb2bfd3ecb geom_journal: plug set-but-not-unused vars
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-11-24 21:21:59 +00:00
Alexander Motin 06bd74e1e3 GEOM: Switch g_io_deliver() locking from cp to pp.
Single provider may have multiple consumers, and locking one of consumers
is not sufficient to protect the provider.  Though the only part of the
provider this locking protects now is its statistics.

Reported by:	Arka Sharma <arka.sw1988@gmail.com>
MFC after:	2 weeks
2021-11-21 18:50:59 -05:00
Wuyang Chung 9cb485d18f geom: Remove g_class.config
g_class.config is write only, remove it.
2021-11-18 23:17:07 -07:00
Konstantin Belousov 4fdc5b8494 g_vfs_close(): vp is unused
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2021-11-18 05:02:59 +02:00
Konstantin Belousov c34a5148e8 ffs: fix newly introduced LOR between mntfs vnode lock and topology lock
The mntfs vnode lock should be before topology, as established in
ffs_mountfs().  Extend the locked region in ffs_unmount().

Reported and reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33013
2021-11-16 20:01:31 +02:00
Kirk McKusick 728f2c6131 Suppress UFS/FFS superblock check-hash failure messages when identifying
disk labels.

When the geom label subsystem is checking labels to discover if they
are UFS/FFS filesystems, do not print a kernel error message if a
superblock is found with a check-hash error. That issue is best
handled later if an attempt is made to actually use the filesystem.

Sponsored by: Netflix
2021-11-15 09:26:21 -08:00
Konstantin Belousov 8db7d16526 geom_vfs: lock devvp in g_vfs_close()
It is needed for g_vfs_close() invalidating the buffers.  We rely on the
vnode lock for correctness.

Reported and tested by:	pho
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32761
2021-11-13 01:00:13 +02:00
Gordon Bergling 9d2e51884e gjournal(8): Fix a typo in a source code comment
- s/writting/writing/

MFC after:	3 days
2021-11-03 17:14:00 +01:00
Warner Losh edfbbfd541 gpart: Move MBR efimedia reporting to a separate routine
Move the efimedia reporting to g_part_mbr_efimedia and use that from
g_part_mbr_dumpconf to report it.

Sponsored by:		Netflix
Reviewed by:		mav
Differential Revision:	https://reviews.freebsd.org/D32781
2021-11-02 17:09:17 -06:00
Warner Losh e3ab141fda gpart: Move GPT efimedia reporting to a separate routine
Move the efimedia reporting to g_part_gpt_efimedia and use that from
g_part_gpt_dumpconf to report it.

Sponsored by:		Netflix
Reviewed by:		mav
Differential Revision:	https://reviews.freebsd.org/D32780
2021-11-02 17:09:17 -06:00
Mateusz Guzik 627d5d1966 geli: eli data -> eli_data for consistency with other geom classes
PR:	259392
Reported by:	dewayne@heuristicsystems.com.au
MFC after:	1 week
2021-10-31 20:36:51 +00:00
Jessica Clarke 63d24336fd Fix off-by-one error in msdosfs FAT32 volume label copying
I dropped the + 1 from the other two instances in each file but failed
to do so for this one, resulting in a more egregious buffer overread
than the one I was fixing (since the read character ended up in the
output if there was space).

Reported by:	Jenkins
Fixes:	34fb1c133c ("Fix intra-object buffer overread for labeled msdosfs volumes")
2021-10-28 01:01:00 +01:00
Jessica Clarke 34fb1c133c Fix intra-object buffer overread for labeled msdosfs volumes
Volume labels, like directory entries, are padded with spaces and so
have no NUL terminator. Whilst the MIN for the dsize argument to strlcpy
ensures that the copy does not overflow the destination, strlcpy is
defined to return the number of characters in the source string,
regardless of the provided dsize, and so keeps reading until it finds a
NUL, which likely exists somewhere within the following fields, but On
CHERI with the subobject bounds enabled in the compiler this buffer
overread will be detected and trap with a bounds violation.

Found by:	CHERI
Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D32579
2021-10-27 18:38:37 +01:00
Mark Johnston f0a08fa9f5 geom_label: Add more validation for NTFS volume tasting
- Ensure that the computed MFT record size isn't negative or larger than
  maxphys before trying to read $Volume.
- Guard against truncated records in volume metadata.
- Ensure that the record length is large enough to contain the volume
  name.
- Verify that the (UTF-16-encoded) volume name's length is a multiple of
  two.

PR:		258833, 258914
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2021-10-04 18:15:06 -04:00
Gleb Smirnoff b984d153e0 Don't set GELI UMA zone as UMA_ZONE_NOFREE.
That fixes memory leak on last GELI provider destroyed, introduced
in 2dbc9a388e. This patch was originally developed late 2019 and
the flag was necessary to prevent zone drainage under memory pressure.
Today, with f09cbea31a the UMA is fixed not to drain into reserves.

Discussed with:	jtl, markj
Fixes:		2dbc9a388e
PR:		258787
2021-10-01 10:31:17 -07:00
Gleb Smirnoff 2dbc9a388e Fix memory deadlock when GELI partition is used for swap.
When we get low on memory, the VM system tries to free some by swapping
pages. However, if we are so low on free pages that GELI allocations block,
then the swapout operation cannot complete. This keeps the VM system from
being able to free enough memory so the allocation can complete.

To alleviate this, keep a UMA pool at the GELI layer which is used for data
buffer allocation in the fast path, and reserve some of that memory for swap
operations. If an IO operation is a swap, then use the reserved memory. If
the allocation still fails, return ENOMEM instead of blocking.

For non-swap allocations, change the default to using M_NOWAIT. In general,
this *should* be better, since it gives upper layers a signal of the memory
pressure and a chance to manage their failure strategy appropriately. However,
a user can set the kern.geom.eli.blocking_malloc sysctl/tunable to restore
the previous M_WAITOK strategy.

Submitted by:		jtl
Reviewed by:		imp
Differential Revision:	https://reviews.freebsd.org/D24400
2021-09-28 11:23:52 -07:00
Gleb Smirnoff c6213beff4 Add flag BIO_SWAP to mark IOs that are associated with swap.
Submitted by:		jtl
Reviewed by:		kib
Differential Revision:	https://reviews.freebsd.org/D24400
2021-09-28 11:23:51 -07:00
Mark Johnston 5402baa5b5 g_label: Handle small sector sizes when tasting
Make sure that the provider sector size is large enough to contain a
valid label before trying to read it.  We performed this check already
for most label types, but not for several filesystem labels.

Reported by:	syzbot+f52918174cdf193ae29c@syzkaller.appspotmail.com
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-09-07 11:19:29 -04:00
Mark Johnston 9e9ba9c73d graid: Avoid tasting devices with small sector sizes
The RAID metadata parsers effectively assume a sector size of 512 bytes
or larger, but md(4) devices can be created with a sector size that's
any power of 2.  Add some seatbelts to graid tasting routines to ensure
that the requested sector(s) are large enough for the device to
plausibly contain RAID metadata.

Reported by:	syzbot+f43583c9bf8357c8b56f@syzkaller.appspotmail.com
Reported by:	syzbot+537dd9f22b91b698e161@syzkaller.appspotmail.com
Reported by:	syzbot+51509dd48871c57c6e47@syzkaller.appspotmail.com
Reported by:	syzbot+c882a31037ea2a54ff63@syzkaller.appspotmail.com
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-08-31 17:09:52 -04:00
Gordon Bergling 5bdf58e196 Fix some common typos in source code comments
- s/priviledged/privileged/
- s/funtion/function/
- s/doens't/doesn't/
- s/sychronization/synchronization/

MFC after:	3 days
2021-08-28 18:57:23 +02:00
Mark Johnston 645b7efd49 geom_disk: Add KMSAN checks
- In g_disk_start(), verify that the data to be written is initialized
  according to KMSAN shadow state.
- In g_disk_done(), verify that the block driver updated shadow state as
  expected, so as to catch sources of false positives early.

Sponsored by:	The FreeBSD Foundation
2021-08-11 16:33:41 -04:00
Alexander Motin c2da954203 geom(4): Mark all sysctls as CTLFLAG_MPSAFE.
This code does not use Giant lock for very long time.

MFC after:	2 weeks
2021-08-10 20:18:46 -04:00
John Baldwin 419d406e4e geom_vfs: Pre-allocate event for g_vfs_destroy.
When an active g_vfs is orphaned due to an underlying disk going away
the destroy is deferred until the filesystem is unmounted in
g_vfs_done().  However, g_vfs_done() is invoked from a non-sleepable
context and cannot use M_WAITOK to allocate the event.  Instead,
allocate the event in g_vfs_orphan() and save it in the softc to be
retrieved by the last call to g_vfs_done().

Reported by:	Jithesh Arakkan @ Chelsio
Reviewed by:	imp
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D31354
2021-07-29 17:09:23 -07:00
John Baldwin 5b5d78897c Use a more specific type for geom_disk.d_event.
Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D31353
2021-07-29 16:34:46 -07:00
Warner Losh 47aeda7b70 geom_disk: use a preallocated geom_event for disk destruction.
Preallocate a geom_event (using the new geom_alloc_event) when we create
a disk. When we create the disk, we're going to be in a sleepable
context, so we can always allocate this extra bit of memory. Then use
this preallocated memory to free the disk. CAM can try to free the disk
from an unsleepable context if there was I/O outstanding when the disk
was destroyted (say because the SIM said it had gone away). The I/O
context isn't sleepable. Rather than trying to invent a retry mechanism
and making sure all the other geom_disk consumers did it properly,
preallocating the event ensure that the geom_disk will be properly torn
down, even when there's memory pressure when the disk departs.

Reviewd by:		jhb
Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D30544
2021-07-23 18:08:52 -06:00
Warner Losh 380710a5c8 geom: create an API to allocate events, and use that storage to send them
g_alloc_event will allocate storage for an opaque event. g_post_event_ep
can use memory returned by g_alloc_event to send an event from a context
that might not be able to allocate the event. Occasionally, we can
alloate memory when we create an object, but not while we're destroy
it. This allows one to allocate at creation time memory to use when
destorying the object.

Reviewed by:		jhb
Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D30544
2021-07-23 18:08:45 -06:00
Jessica Clarke e77ef47d36 geom_label: Partially reinstate old sysinstall(8) workaround
This partially reverts commit af433832f7.
Since such bogus disklabels still exist in the wild, we now probe for a
disklabel to decide whether to ignore the UFS partition or not; if there
is a label then we use the old behaviour, and if there isn't one then we
use the new behaviour.

Reviewed by:	cy, mckusick
Differential Revision:	https://reviews.freebsd.org/D31068
2021-07-21 02:51:25 +01:00
Mark Johnston 0fcafe8516 eli: Zero pad bytes that arise when certain auth algorithms are used
When authentication is configured, GELI ensures that the amount of data
per sector is a multiple of 16 bytes.  This is done in
eli_metadata_softc().  When the digest size is not a multiple of 16
bytes, this leaves some extra pad bytes at the end of every sector, and
they were not being zeroed before being written to disk.  In particular,
this happens with the HMAC/SHA1, HMAC/RIPEMD160 and HMAC/SHA384 data
authentication algorithms.

This change ensures that they are zeroed before being written to disk.

Reported by:	KMSAN
Reviewed by:	delphij, asomers
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31170
2021-07-15 12:23:04 -04:00
Mark Johnston 39552dff7b graid3: Zero the metadata block before writing
Ensure that string buffers and pad bytes are zero-filled before writing
graid3 metadata.

Reported by:	KMSAN
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2021-07-13 17:46:02 -04:00
Mark Johnston 0f09ab89cc gconcat: Zero the metadata block before writing
Ensure that string buffers and pad bytes are zero-filled before writing
gconcat metadata.  Also make sure to zero the full block buffer before
encoding the metadata and writing.

Fix some style bugs in g_concat_write_metadata() while here.

Reported by:	KMSAN
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2021-07-13 17:45:59 -04:00
Mark Johnston 7f053a44ae gmirror: Zero the metadata block before writing
The mirror metadata fields contain string buffers and pad bytes, neither
were being zeroed before metadata was written to disk.  Also, the
metadata structure is smaller than the sector size, and in one case
gmirror was failing to zero-fill the full buffer before writing.

Fix these problems by pre-zeroing the metadata structure and the sector
buffer.

Reported by:	KMSAN
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2021-07-13 17:45:57 -04:00
Jessica Clarke af433832f7 geom_label: Remove an old sysinstall(8) workaround
We removed sysinstall(8) back in 2011, so this workaround should be long
since unnecessary. This workaround can end up breaking cases that are
hit in the real world, such as dd'ing a small pre-built disk image to a
large partition that you intend to grow on first boot and uses a UFS
disk label for / in its /etc/fstab (as the only reliable thing a raw UFS
image can reference).

Reviewed by:	imp, mckusick
Differential Revision:	https://reviews.freebsd.org/D30825
2021-07-05 16:15:32 +01:00
Noah Bergbauer d575e81fbc gconcat: Implement new online append feature
Implement the "gconcat append" command which can be used
to append a disk to the end of an existing gconcat device
without unmounting.

If the gconcat device is using the "automatic" method, i.e.,
stores metadata on the devices, new metadata is written
to all existing components, as well as to the newly added one.

Pull Request:	https://github.com/freebsd/freebsd-src/pull/472
Reviewed by:	imp@
2021-06-14 11:42:03 -06:00
Noah Bergbauer 56fd97660a gconcat: Add new lock to allow modifications to the disk list in preparation for online append
In addition, rename existing sc_lock to sc_append_lock

Reviewed by:		imp@
Pull Request:		https://github.com/freebsd/freebsd-src/pull/447
Sponsored by:		Netflix
2021-06-02 15:59:25 -06:00
Noah Bergbauer e61e072f3b gconcat: Switch array to TAILQ to prepare for online append
Reviewed by:		imp@
Pull Request:		https://github.com/freebsd/freebsd-src/pull/447
Sponsored by:		Netflix
2021-06-02 15:50:27 -06:00
Alan Somers 420dbe763f gmultipath: make physpath distinct from the underlying providers'
zfsd uses a device's physical path attribute to automatically replace a
missing ZFS disk when a blank disk is inserted into the same physical
slot.  Currently gmultipath passes through its underlying providers'
physical path attribute.  That may cause zfsd to replace a missing
gmultipath provider with a newly arrived, single-path disk.  That would
be bad.

This commit fixes that problem by simply appending "/mp" to the
underlying providers' physical path, in a manner similar to what geli
already does.

Sponsored by:	Axcient
MFC after:	3 weeks
Differential Revision: https://reviews.freebsd.org/D29941
2021-05-06 12:32:27 -06:00
Mark Johnston 2f1cfb7f63 gmirror: Pre-allocate the timeout event structure
We can't call malloc(M_WAITOK) in a callout handler.

Reviewed by:	imp
Reported by:	pho
Tested by:	pho
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29223
2021-03-11 15:45:15 -05:00
Mark Johnston 68f6800ce0 opencrypto: Introduce crypto_dispatch_async()
Currently, OpenCrypto consumers can request asynchronous dispatch by
setting a flag in the cryptop.  (Currently only IPSec may do this.)   I
think this is a bit confusing: we (conditionally) set cryptop flags to
request async dispatch, and then crypto_dispatch() immediately examines
those flags to see if the consumer wants async dispatch. The flag names
are also confusing since they don't specify what "async" applies to:
dispatch or completion.

Add a new KPI, crypto_dispatch_async(), rather than encoding the
requested dispatch type in each cryptop. crypto_dispatch_async() falls
back to crypto_dispatch() if the session's driver provides asynchronous
dispatch. Get rid of CRYPTOP_ASYNC() and CRYPTOP_ASYNC_KEEPORDER().

Similarly, add crypto_dispatch_batch() to request processing of a tailq
of cryptops, rather than encoding the scheduling policy using cryptop
flags.  Convert GELI, the only user of this interface (disabled by
default) to use the new interface.

Add CRYPTO_SESS_SYNC(), which can be used by consumers to determine
whether crypto requests will be dispatched synchronously. This is just
a helper macro. Use it instead of looking at cap flags directly.

Fix style in crypto_done(). Also get rid of CRYPTO_RETW_EMPTY() and
just check the relevant queues directly. This could result in some
unnecessary wakeups but I think it's very uncommon to be using more than
one queue per worker in a given workload, so checking all three queues
is a waste of cycles.

Reviewed by:	jhb
Sponsored by:	Ampere Computing
Submitted by:	Klara, Inc.
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D28194
2021-02-08 09:19:19 -05:00
Edward Tomasz Napierala 123019739c geom(4): make g_newprovider_event() return if G_P_WITHER is set
This fixes a failed assertion in scenario where the provider
disappears, disk_gone() gets called, and at the exact same
time something else closes the device node triggering a retaste.

Reviewed By:	mav
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D27330
2020-12-29 14:29:59 +00:00
Konstantin Belousov cd85379104 Make MAXPHYS tunable. Bump MAXPHYS to 1M.
Replace MAXPHYS by runtime variable maxphys. It is initialized from
MAXPHYS by default, but can be also adjusted with the tunable kern.maxphys.

Make b_pages[] array in struct buf flexible.  Size b_pages[] for buffer
cache buffers exactly to atop(maxbcachebuf) (currently it is sized to
atop(MAXPHYS)), and b_pages[] for pbufs is sized to atop(maxphys) + 1.
The +1 for pbufs allow several pbuf consumers, among them vmapbuf(),
to use unaligned buffers still sized to maxphys, esp. when such
buffers come from userspace (*).  Overall, we save significant amount
of otherwise wasted memory in b_pages[] for buffer cache buffers,
while bumping MAXPHYS to desired high value.

Eliminate all direct uses of the MAXPHYS constant in kernel and driver
sources, except a place which initialize maxphys.  Some random (and
arguably weird) uses of MAXPHYS, e.g. in linuxolator, are converted
straight.  Some drivers, which use MAXPHYS to size embeded structures,
get private MAXPHYS-like constant; their convertion is out of scope
for this work.

Changes to cam/, dev/ahci, dev/ata, dev/mpr, dev/mpt, dev/mvs,
dev/siis, where either submitted by, or based on changes by mav.

Suggested by: mav (*)
Reviewed by:	imp, mav, imp, mckusick, scottl (intermediate versions)
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D27225
2020-11-28 12:12:51 +00:00
Mateusz Guzik d5127d1ae2 gbde: replace malloc_last_fail with a kludge
This facilitates removal of malloc_last_fail without really impacting
anything.
2020-11-12 20:20:57 +00:00
Mark Johnston f44994874b ffs: Clamp BIO_SPEEDUP length
On 32-bit platforms, the computed size of the BIO_SPEEDUP requested by
softdep_request_cleanup() may be negative when assigned to bp->b_bcount,
which has type "long".

Clamp the size to LONG_MAX.  Also convert the unused g_io_speedup() to
use an off_t for the magnitude of the shortage for consistency with
softdep_send_speedup().

Reviewed by:	chs, kib
Reported by:	pho
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27081
2020-11-11 13:48:07 +00:00
Warner Losh a3f4217ec0 Remove frontstuff
Nothing implements this in the tree. Remove the ioctl and the
conversion to the geom atttribute stuff.

This was introduced in r94287 in 2002 and was retired in r113390
2003. It appeared in FreeBSD 5.0, but no other releases. This is a
vestige that was missed at the time and overlooked until now. No
compat is provided for this reason.  And there's no implementation of
it today. And it was never part of a release from a stable branch.

Reviewed by: phk@
Differential Revision: https://reviews.freebsd.org/D26967
2020-10-27 06:43:24 +00:00
Alexander Motin 8b220f8915 Fix asymmetry in devstat(9) calls by GEOM.
Before this GEOM passed bio pointer to transaction start, but not end.
It was irrelevant until devstat(9) got DTrace hooks, that appeared to
provide bio pointer on I/O completion, but not on submission.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2020-10-24 21:07:10 +00:00
Robert Wing a2b559df1e geom_ctl.c: remove stale header files
- Remove "opt_geom.h", no kernel options are used.

- Remove <sys/sysctl.h>, no sysctl functionality is used here.

- Remove <sys/bio.h>, requirements for bio moved out in r112534.

- Remove <sys/lock.h> and <sys/mutex.h>, last used by DROP_GIANT() and
  PICKUP_GIANT(), which were removed in r115624.

- Remove <sys/disk.h> and <sys/kernel.h>, not used.

Reviewed by: phk, kevans (mentor)
Approved by: phk, kevans (mentor)
Differential Revision: https://reviews.freebsd.org/D26805
2020-10-20 20:59:13 +00:00
Edward Tomasz Napierala 3001e97deb Fix fallout from r366811.
PR:		250442
Reported by:	lwhsu
Reviewed by:	mav
MFC after:	2 weeks
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D26855
2020-10-19 20:26:37 +00:00
Edward Tomasz Napierala d22ff249d9 Make g_attach() return ENXIO for orphaned providers; update various
classes to add missing error checking.

Reviewed by:	imp
MFC after:	2 weeks
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D26658
2020-10-18 16:24:08 +00:00
Warner Losh bc683a89a3 Move kernel env global variables, etc to sys/kenv.h
The kernel globals for kenv are confined to 2 files that need them and
a few that likely shouldn't (but as written the code does). Move them
from sys/systm.h to sys/kenv.h. This removed a XXX from systm.h and
cleans it up a little bit...
2020-10-07 06:16:37 +00:00
Eugene Grosbein b2b5d4c07d geom_part: make it possible recovering broken GPT after some LBAs cut off
This is followup to r365477.

If pre-formatted device has GPT and a partition covering
last available LBAs and the device is attached using
a bridge reducing amount of LBAs, then it could be not enough
forcing GEOM to use primary GPT. Also, we should make it possible
to recover GPT and this requires either deleting or resizing the partition.

This change enables "gpart delete" and "gpart resize" commands
on corrupted GPT with following "gpart recover".

It still does not allow modifying corrupted GPT without
preliminary setting sysctl kern.geom.part.check_integrity=0

For example:

# gpart show da0
=>        34  3906963389  da0  GPT  (1.8T) [CORRUPT]
          34      262144    1  ms-reserved  (128M)
      262178        2014       - free -  (1.0M)
      264192  3906764943    2  freebsd-swap  (1.8T)
# gpart resize -i 2 -s 3900000000 da0
# gpart recover da0

Reported by:	Alex Korchmar
MFC after:	3 days
2020-09-17 04:39:39 +00:00
Warner Losh 0c97af56a7 We don't need the sc_ekeys_lock in standalone environment.
When we bring in geli into the boot loader, we are single threaded so
we don't have to worry about locking. We have no mutexes, and don't need
to use them, so comment it out.

MFC After: 3 days
2020-09-14 23:51:14 +00:00
Edward Tomasz Napierala 60f083efe2 Move TDP_GEOM check from userret() to ast(); this code path is quite
infrequent.

Reviewed by:	kib
No objections:	mav
Tested by:	pho
MFC after:	2 weeks
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D26374
2020-09-14 10:14:03 +00:00
Eugene Grosbein cea05ed9a9 geom_part: extend kern.geom.part.check_integrity to work on GPT
There are multiple USB/SATA bridges on the market that unconditionally
cut some LBAs off connected media. This could be a problem
for pre-partitioned drives so GEOM complains and does not create
devices in /dev for slices/partitions preventing access to existing data.

We have kern.geom.part.check_integrity that allows us to correct
partitioning if changed from default 1 to 0 but it works for MBR only.
If backup copy of GPT is unavailable due to decreases number of LBAs,
kernel still does not give access to partitions and prints to dmesg:

GEOM: md0: corrupt or invalid GPT detected.
GEOM: md0: GPT rejected -- may not be recoverable.

This change makes it work for GPT too, so it created partitions in /dev
and prints to dmesg this instead:

GEOM: md0: the secondary GPT table is corrupt or invalid.
GEOM: md0: using the primary only -- recovery suggested.

Then "gpart recover" re-created backup copy of GPT
and allows further manipulations with partitions.

This change is no-op for default configuration having
kern.geom.part.check_integrity=1

Reported by:	Alex Korchmar
MFC after:	3 days.
2020-09-08 22:23:53 +00:00
Mateusz Guzik d40bc60752 geom: clean up empty lines in .c and .h files 2020-09-01 22:14:09 +00:00
Warner Losh 887611b122 Retire devctl_notify_f()
devctl_notify_f isn't needed, so retire it. The flags argument is now
unused, so rather than keep it around, retire it. Convert all old
users of it to devctl_notify(). This path no longer sleeps, so is safe
to call from any context. Since it doesn't sleep, it doesn't need to
know if it is OK to sleep or not.

Reviewed by: markj@
Differential Revision: https://reviews.freebsd.org/D26140
2020-08-29 04:30:06 +00:00
Alan Somers 7d874f0f36 geli: use unmapped I/O
Use unmapped I/O for geli. Unlike most geom providers, geli needs to
manipulate data on every read or write. Previously it would always map bios.

On my 16-core, dual socket server using geli atop md(4) devices, with 512B
sectors, this change increases geli IOPs by about 3x.

Note that geli still can't use unmapped I/O when data integrity verification
is enabled (but it could, with a little more work).  And it can't use
unmapped I/O in combination with ZFS, because ZFS uses mapped bios.

Reviewed by:	markj, kib, jhb, mjg, mat, bcr (manpages)
MFC after:	1 week
Sponsored by:	Axcient
Differential Revision:	https://reviews.freebsd.org/D25671
2020-08-26 02:44:35 +00:00
Warner Losh 773e541e8d Use devctl.h instead of bus.h to reduce newbus pollution.
There's no need for these parts of the kernel to know about newbus,
so narrow what is included to devctl.h for device_notify_*.

Suggested by: kib@
2020-08-21 00:03:24 +00:00
Conrad Meyer cb1480f8d4 gpart(8): Recognize apple-zfs and solaris-reserved partition ids
Introduce G_PART_ALIAS_SOLARIS_RESERVED, GPT_ENT_TYPE_SOLARIS_RESERVED et al.,
to make gpart show output more convenient on systems with illumos/openindiana
disks visible.

Submitted by:	Juraj Lutter <otis AT sk.FreeBSD.org>
Reviewed by:	bcr(manpages), delphij, myself
Differential Revision:	https://reviews.freebsd.org/D26012
2020-08-17 17:07:05 +00:00
John Baldwin e2bbd168ad Fix indentation. 2020-07-27 16:31:21 +00:00
Xin LI a450ecfdbd gctl_get_geom: Skip validation of g_class.
The caller from kernel is expected to provide an valid g_class
pointer, instead of traversing the global g_class list, just
use that pointer directly instead.

Reviewed by:		mav
MFC after:		2 weeks
Differential Revision:	https://reviews.freebsd.org/D25811
2020-07-26 22:30:55 +00:00
Xin LI 178d88fa39 geom_map and geom_redboot: Remove unused ctlreq handler.
The two classes do not take any verbs and always gctl_error for
all requests, so don't bother to provide a ctlreq handler.

Reviewed by:		mav
MFC after:		2 weeks
Differential Revision:	https://reviews.freebsd.org/D25810
2020-07-26 22:30:01 +00:00
Xin LI 7201590bbf Use snprintf instead of sprintf.
MFC after:	2 weeks
2020-07-26 01:45:26 +00:00
Xin LI 795c5f365e geom_label: Make glabel labels more trivial by separating the tasting
routines out.

While there, also simplify the creation of label paths a little bit
by requiring the / suffix for label directory prefixes (ld_dir renamed
to ld_dirprefix to indicate the change) and stop defining macros for
these when they are only used once.

Reviewed by:		cem
MFC after:		2 weeks
Differential Revision:	https://reviews.freebsd.org/D25597
2020-07-26 00:44:59 +00:00
Xin LI fcf69f3dbc Consistently use gctl_get_provider instead of home-grown variants.
Reviewed by:		cem, imp
MFC after:		2 weeks
Differential revision:	https://reviews.freebsd.org/D25739
2020-07-22 02:15:21 +00:00
Xin LI 0ab851aac3 gctl_get_class, gctl_get_geom and gctl_get_provider: provide feedback
when the requested argument is missing.

Reviewed by:		cem
MFC after:		2 weeks
Differential revision:	https://reviews.freebsd.org/D25738
2020-07-22 02:14:27 +00:00
Alan Somers aafaa8b794 Fix geli's null cipher, and add a test case
PR:		247954
Submitted by:	jhb (sys), asomers (tests)
Reviewed by:	jhb (tests), asomers (sys)
MFC after:	2 weeks
Sponsored by:	Axcient
2020-07-21 19:18:29 +00:00
Xin LI 82b17c8e91 Fix indent for if clause.
MFC after:	2 weeks
2020-07-20 01:55:19 +00:00
Xin LI b23a7fbaab g_concat_find_device: trim /dev/ if it is present, like other GEOM
classes.

Reviewed by:	cem
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D25596
2020-07-09 08:00:46 +00:00
Xin LI 8510f61acd sys/geom: consistently use _PATH_DEV instead of hardcoding "/dev/".
Reviewed by:	cem
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D25565
2020-07-09 02:52:39 +00:00
Alan Somers 6f818c1fb0 geli: enable direct dispatch
geli does all of its crypto operations in a separate thread pool, so
g_eli_start, g_eli_read_done, and g_eli_write_done don't actually do very
much work. Enabling direct dispatch eliminates the g_up/g_down bottlenecks,
doubling IOPs on my system. This change does not affect the thread pool.

Reviewed by:	markj
MFC after:	2 weeks
Sponsored by:	Axcient
Differential Revision:	https://reviews.freebsd.org/D25587
2020-07-08 17:12:12 +00:00
Conrad Meyer 64612d4e44 geom(4): Kill GEOM_PART_EBR_COMPAT option
Take advantage of Warner's nice new real GEOM aliasing system and use it for
aliased partition names that actually work.

Our canonical EBR partition name is the weird, not-default-on-x86-prior-to-
this-revision "da1p4+00001234."  However, if compatibility mode (tunable
kern.geom.part.ebr.compat_aliases) is enabled (1, default), we continue to
provide the alias names like "da1p5" in addition to the weird canonical
names.

Naming partition providers was just one aspect of the COMPAT knob; in
addition it limited mutability, in part because it did not preserve existing
EBR header content aside from that of LBA 0.  This change saves the EBR
header for LBA 0, as well as for every EBR partition encountered.  That way,
when we write out the EBR partition table on modification, we can restore
any bootloader or other metadata in both LBA0 (the first data-containing EBR
may start after 0) as well as every logical EBR we read from the disk, and
only update the geometry metadata and linked list pointers that describe the
actual partitioning.

(This change does not add support for the 'bootcode' verb to EBR.)

PR:		232463
Reported by:	Manish Jain <bourne.identity AT hotmail.com>
Discussed with:	ae (no objection)
Relnotes:	maybe
Differential Revision:	https://reviews.freebsd.org/D24939
2020-07-01 02:16:36 +00:00
John Baldwin 6572e5ff66 Use explicit_bzero() instead of bzero() for sensitive data.
Reviewed by:	delphij
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D25441
2020-06-25 20:25:35 +00:00
John Baldwin b172f23dd7 Use zfree() instead of bzero() and free().
These bzero's should have been explicit_bzero's.

Reviewed by:	cem, delphij
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D25437
2020-06-25 20:20:22 +00:00
John Baldwin 4a711b8d04 Use zfree() instead of explicit_bzero() and free().
In addition to reducing lines of code, this also ensures that the full
allocation is always zeroed avoiding possible bugs with incorrect
lengths passed to explicit_bzero().

Suggested by:	cem
Reviewed by:	cem, delphij
Approved by:	csprng (cem)
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D25435
2020-06-25 20:17:34 +00:00
Kirk McKusick 9407f25df2 Optimize g_journal's superblock update by noting that the summary
information is neither read nor written so it need not be written
out when updating the superblock.

PR:           247425
Sponsored by: Netflix
2020-06-23 21:44:00 +00:00
Baptiste Daroussin 5b990a9463 Revert r362466
Such change should not have happen without prior discussion and review.

With hat:	transitioning core
2020-06-22 07:46:24 +00:00
Hans Petter Selasky 7747001b12 Improve wording to be more precise and clear.
No functional change intended.

s/Master Boot/Main Boot/ (also called MBR)

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-06-21 13:34:08 +00:00
Kirk McKusick 34816cb9ae Move the pointers stored in the superblock into a separate
fs_summary_info structure. This change was originally done
by the CheriBSD project as they need larger pointers that
do not fit in the existing superblock.

This cleanup of the superblock eases the task of the commit
that immediately follows this one.

Suggested by: brooks
Reviewed by:  kib
PR:           246983
Sponsored by: Netflix
2020-06-19 01:02:53 +00:00
John Baldwin a3d565a118 Add a crypto capability flag for accelerated software drivers.
Use this in GELI to print out a different message when accelerated
software such as AESNI is used vs plain software crypto.

While here, simplify the logic in GELI a bit for determing which type
of crypto driver was chosen the first time by examining the
capabilities of the matched driver after a single call to
crypto_newsession rather than making separate calls with different
flags.

Reviewed by:	delphij
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D25126
2020-06-09 22:26:07 +00:00
Conrad Meyer a9ca503b52 Revert r361838
Reported by:	delphij
2020-06-06 14:19:16 +00:00
Conrad Meyer 5b9b571cb3 geom_label: Use provider aliasing to alias upstream geoms
For synthetic aliases (just pseudonyms inferred from metadata like GPT or
UFS labels, GPT UUIDs, etc), use the GEOM provider aliasing system to create
a symlink to the real device instead of creating an independent device.
This makes it more clear which labels and devices correspond, and we can
safely have multiple labels to a single device accessed at once.

The confusingly named geom_label on-disk construct continues to behave
identically to how it did before.

This requires teaching GEOM's provider aliasing about the possibility
that aliases might be added later in time, and GEOM's devfs interaction
layer not to worry about existing aliases during retaste.

Discussed with:	imp
Relnotes:	sure, if we don't end up reverting it
Differential Revision:	https://reviews.freebsd.org/D24968
2020-06-05 16:12:21 +00:00
Conrad Meyer c726a670df geom: Don't re-add duplicate aliases
Reviewed by:	imp (informal +1; extracted from phab 24968)
2020-06-05 16:05:09 +00:00
Conrad Meyer b71dc87559 geom_part: Dispatch to partitions to create providers and aliases
This allows partitions to create additional aliases of their own.  The
default method implementations preserve the existing behavior.

No functional change.

Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D24938
2020-05-29 19:44:18 +00:00
Alan Somers 2a2306099d geli: fix a livelock during panic
During any kind of shutdown, kern_reboot calls geli's pre_sync event hook,
which tries to destroy all unused geli devices. But during a panic, geli
can't destroy any devices, because the scheduler is stopped, so it can't
switch threads. A livelock results, and the system never dumps core.

This commit fixes the problem by refusing to destroy any devices during
panic, used or otherwise.

PR:		246207
Reviewed by:	jhb
MFC after:	2 weeks
Sponsored by:	Axcient
Differential Revision:	https://reviews.freebsd.org/D24697
2020-05-27 19:13:26 +00:00
Chuck Silvers d79ff54b5c This commit enables a UFS filesystem to do a forcible unmount when
the underlying media fails or becomes inaccessible. For example
when a USB flash memory card hosting a UFS filesystem is unplugged.

The strategy for handling disk I/O errors when soft updates are
enabled is to stop writing to the disk of the affected file system
but continue to accept I/O requests and report that all future
writes by the file system to that disk actually succeed. Then
initiate an asynchronous forced unmount of the affected file system.

There are two cases for disk I/O errors:

   - ENXIO, which means that this disk is gone and the lower layers
     of the storage stack already guarantee that no future I/O to
     this disk will succeed.

   - EIO (or most other errors), which means that this particular
     I/O request has failed but subsequent I/O requests to this
     disk might still succeed.

For ENXIO, we can just clear the error and continue, because we
know that the file system cannot affect the on-disk state after we
see this error. For EIO or other errors, we arrange for the geom_vfs
layer to reject all future I/O requests with ENXIO just like is
done when the geom_vfs is orphaned. In both cases, the file system
code can just clear the error and proceed with the forcible unmount.

This new treatment of I/O errors is needed for writes of any buffer
that is involved in a dependency. Most dependencies are described
by a structure attached to the buffer's b_dep field. But some are
created and processed as a result of the completion of the dependencies
attached to the buffer.

Clearing of some dependencies require a read. For example if there
is a dependency that requires an inode to be written, the disk block
containing that inode must be read, the updated inode copied into
place in that buffer, and the buffer then written back to disk.

Often the needed buffer is already in memory and can be used. But
if it needs to be read from the disk, the read will fail, so we
fabricate a buffer full of zeroes and pretend that the read succeeded.
This zero'ed buffer can be updated and written back to disk.

The only case where a buffer full of zeros causes the code to do
the wrong thing is when reading an inode buffer containing an inode
that still has an inode dependency in memory that will reinitialize
the effective link count (i_effnlink) based on the actual link count
(i_nlink) that we read. To handle this case we now store the i_nlink
value that we wrote in the inode dependency so that it can be
restored into the zero'ed buffer thus keeping the tracking of the
inode link count consistent.

Because applications depend on knowing when an attempt to write
their data to stable storage has failed, the fsync(2) and msync(2)
system calls need to return errors if data fails to be written to
stable storage. So these operations return ENXIO for every call
made on files in a file system where we have otherwise been ignoring
I/O errors.

Coauthered by: mckusick
Reviewed by:   kib
Tested by:     Peter Holm
Approved by:   mckusick (mentor)
Sponsored by:  Netflix
Differential Revision:  https://reviews.freebsd.org/D24088
2020-05-25 23:47:31 +00:00
John Baldwin 9c0e3d3a53 Add support for optional separate output buffers to in-kernel crypto.
Some crypto consumers such as GELI and KTLS for file-backed sendfile
need to store their output in a separate buffer from the input.
Currently these consumers copy the contents of the input buffer into
the output buffer and queue an in-place crypto operation on the output
buffer.  Using a separate output buffer avoids this copy.

- Create a new 'struct crypto_buffer' describing a crypto buffer
  containing a type and type-specific fields.  crp_ilen is gone,
  instead buffers that use a flat kernel buffer have a cb_buf_len
  field for their length.  The length of other buffer types is
  inferred from the backing store (e.g. uio_resid for a uio).
  Requests now have two such structures: crp_buf for the input buffer,
  and crp_obuf for the output buffer.

- Consumers now use helper functions (crypto_use_*,
  e.g. crypto_use_mbuf()) to configure the input buffer.  If an output
  buffer is not configured, the request still modifies the input
  buffer in-place.  A consumer uses a second set of helper functions
  (crypto_use_output_*) to configure an output buffer.

- Consumers must request support for separate output buffers when
  creating a crypto session via the CSP_F_SEPARATE_OUTPUT flag and are
  only permitted to queue a request with a separate output buffer on
  sessions with this flag set.  Existing drivers already reject
  sessions with unknown flags, so this permits drivers to be modified
  to support this extension without requiring all drivers to change.

- Several data-related functions now have matching versions that
  operate on an explicit buffer (e.g. crypto_apply_buf,
  crypto_contiguous_subsegment_buf, bus_dma_load_crp_buf).

- Most of the existing data-related functions operate on the input
  buffer.  However crypto_copyback always writes to the output buffer
  if a request uses a separate output buffer.

- For the regions in input/output buffers, the following conventions
  are followed:
  - AAD and IV are always present in input only and their
    fields are offsets into the input buffer.
  - payload is always present in both buffers.  If a request uses a
    separate output buffer, it must set a new crp_payload_start_output
    field to the offset of the payload in the output buffer.
  - digest is in the input buffer for verify operations, and in the
    output buffer for compute operations.  crp_digest_start is relative
    to the appropriate buffer.

- Add a crypto buffer cursor abstraction.  This is a more general form
  of some bits in the cryptosoft driver that tried to always use uio's.
  However, compared to the original code, this avoids rewalking the uio
  iovec array for requests with multiple vectors.  It also avoids
  allocate an iovec array for mbufs and populating it by instead walking
  the mbuf chain directly.

- Update the cryptosoft(4) driver to support separate output buffers
  making use of the cursor abstraction.

Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D24545
2020-05-25 22:12:04 +00:00
Warner Losh ae1cce524e Reimplement aliases in geom
The alias needs to be part of the provider instead of the geom to work
properly. To bind the DEV geom, we need to look at the provider's names and
aliases and create the dev entries from there. If this lives in the GEOM, then
it won't propigate down the tree properly. Remove it from geom, add it provider.

Update geli, gmountver, gnop, gpart, and guzip to use it, which handles the bulk
of the uses in FreeBSD. I think this is all the providers that create a new name
based on their parent's name.
2020-05-13 19:17:28 +00:00
Conrad Meyer 844b743d31 geom(4) mirror: Do not panic on gmirror(8) insert, resize
Geom_mirror initialization occurs in spurts and the present of a
non-destroyed g_mirror softc does not always indicate that the geom has
launched (i.e., has an sc_provider).

Some gmirror(8) commands (via g_mirror_ctl) depend on a g_mirror's
sc_provider (insert and resize).  For those commands, g_mirror_ctl is
modified to sleep-poll in an interruptible way until the target geom is
either launched or destroyed.

Reviewed by:	markj
Tested by:	markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D24780
2020-05-11 22:39:53 +00:00
Pawel Jakub Dawidek cefbc0d19b Add g_topology_locked() macro that returns true if we already hold the GEOM
topology lock.
2020-04-25 21:41:09 +00:00
John Baldwin bfe26b9707 Mark eli_metadata_crypto_supported inline.
This quiets warnings about it not being always used.

Reported by:	kevans
2020-04-15 18:27:28 +00:00
John Baldwin e2b9919398 Remove support for geli(4) algorithms deprecated in r348206.
This removes support for reading and writing volumes using the
following algorithms:

- Triple DES
- Blowfish
- MD5 HMAC integrity

In addition, this commit adds an explicit whitelist of supported
algorithms to give a better error message when an invalid or
unsupported algorithm is used by an existing volume.

Reviewed by:	cem
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D24343
2020-04-15 00:14:50 +00:00
Warner Losh 9cf738228d Now that we don't have special-case geom hacking defined in md_var.h, stop
including it. sparc64 was the last straggler here, but these weren't removed at
the time.
2020-04-07 22:23:22 +00:00
Mark Johnston c205ac921b geom_journal: Only stop the switcher process if one was started.
PR:		243196
MFC after:	1 week
2020-04-03 13:57:41 +00:00
John Baldwin c034143269 Refactor driver and consumer interfaces for OCF (in-kernel crypto).
- The linked list of cryptoini structures used in session
  initialization is replaced with a new flat structure: struct
  crypto_session_params.  This session includes a new mode to define
  how the other fields should be interpreted.  Available modes
  include:

  - COMPRESS (for compression/decompression)
  - CIPHER (for simply encryption/decryption)
  - DIGEST (computing and verifying digests)
  - AEAD (combined auth and encryption such as AES-GCM and AES-CCM)
  - ETA (combined auth and encryption using encrypt-then-authenticate)

  Additional modes could be added in the future (e.g. if we wanted to
  support TLS MtE for AES-CBC in the kernel we could add a new mode
  for that.  TLS modes might also affect how AAD is interpreted, etc.)

  The flat structure also includes the key lengths and algorithms as
  before.  However, code doesn't have to walk the linked list and
  switch on the algorithm to determine which key is the auth key vs
  encryption key.  The 'csp_auth_*' fields are always used for auth
  keys and settings and 'csp_cipher_*' for cipher.  (Compression
  algorithms are stored in csp_cipher_alg.)

- Drivers no longer register a list of supported algorithms.  This
  doesn't quite work when you factor in modes (e.g. a driver might
  support both AES-CBC and SHA2-256-HMAC separately but not combined
  for ETA).  Instead, a new 'crypto_probesession' method has been
  added to the kobj interface for symmteric crypto drivers.  This
  method returns a negative value on success (similar to how
  device_probe works) and the crypto framework uses this value to pick
  the "best" driver.  There are three constants for hardware
  (e.g. ccr), accelerated software (e.g. aesni), and plain software
  (cryptosoft) that give preference in that order.  One effect of this
  is that if you request only hardware when creating a new session,
  you will no longer get a session using accelerated software.
  Another effect is that the default setting to disallow software
  crypto via /dev/crypto now disables accelerated software.

  Once a driver is chosen, 'crypto_newsession' is invoked as before.

- Crypto operations are now solely described by the flat 'cryptop'
  structure.  The linked list of descriptors has been removed.

  A separate enum has been added to describe the type of data buffer
  in use instead of using CRYPTO_F_* flags to make it easier to add
  more types in the future if needed (e.g. wired userspace buffers for
  zero-copy).  It will also make it easier to re-introduce separate
  input and output buffers (in-kernel TLS would benefit from this).

  Try to make the flags related to IV handling less insane:

  - CRYPTO_F_IV_SEPARATE means that the IV is stored in the 'crp_iv'
    member of the operation structure.  If this flag is not set, the
    IV is stored in the data buffer at the 'crp_iv_start' offset.

  - CRYPTO_F_IV_GENERATE means that a random IV should be generated
    and stored into the data buffer.  This cannot be used with
    CRYPTO_F_IV_SEPARATE.

  If a consumer wants to deal with explicit vs implicit IVs, etc. it
  can always generate the IV however it needs and store partial IVs in
  the buffer and the full IV/nonce in crp_iv and set
  CRYPTO_F_IV_SEPARATE.

  The layout of the buffer is now described via fields in cryptop.
  crp_aad_start and crp_aad_length define the boundaries of any AAD.
  Previously with GCM and CCM you defined an auth crd with this range,
  but for ETA your auth crd had to span both the AAD and plaintext
  (and they had to be adjacent).

  crp_payload_start and crp_payload_length define the boundaries of
  the plaintext/ciphertext.  Modes that only do a single operation
  (COMPRESS, CIPHER, DIGEST) should only use this region and leave the
  AAD region empty.

  If a digest is present (or should be generated), it's starting
  location is marked by crp_digest_start.

  Instead of using the CRD_F_ENCRYPT flag to determine the direction
  of the operation, cryptop now includes an 'op' field defining the
  operation to perform.  For digests I've added a new VERIFY digest
  mode which assumes a digest is present in the input and fails the
  request with EBADMSG if it doesn't match the internally-computed
  digest.  GCM and CCM already assumed this, and the new AEAD mode
  requires this for decryption.  The new ETA mode now also requires
  this for decryption, so IPsec and GELI no longer do their own
  authentication verification.  Simple DIGEST operations can also do
  this, though there are no in-tree consumers.

  To eventually support some refcounting to close races, the session
  cookie is now passed to crypto_getop() and clients should no longer
  set crp_sesssion directly.

- Assymteric crypto operation structures should be allocated via
  crypto_getkreq() and freed via crypto_freekreq().  This permits the
  crypto layer to track open asym requests and close races with a
  driver trying to unregister while asym requests are in flight.

- crypto_copyback, crypto_copydata, crypto_apply, and
  crypto_contiguous_subsegment now accept the 'crp' object as the
  first parameter instead of individual members.  This makes it easier
  to deal with different buffer types in the future as well as
  separate input and output buffers.  It's also simpler for driver
  writers to use.

- bus_dmamap_load_crp() loads a DMA mapping for a crypto buffer.
  This understands the various types of buffers so that drivers that
  use DMA do not have to be aware of different buffer types.

- Helper routines now exist to build an auth context for HMAC IPAD
  and OPAD.  This reduces some duplicated work among drivers.

- Key buffers are now treated as const throughout the framework and in
  device drivers.  However, session key buffers provided when a session
  is created are expected to remain alive for the duration of the
  session.

- GCM and CCM sessions now only specify a cipher algorithm and a cipher
  key.  The redundant auth information is not needed or used.

- For cryptosoft, split up the code a bit such that the 'process'
  callback now invokes a function pointer in the session.  This
  function pointer is set based on the mode (in effect) though it
  simplifies a few edge cases that would otherwise be in the switch in
  'process'.

  It does split up GCM vs CCM which I think is more readable even if there
  is some duplication.

- I changed /dev/crypto to support GMAC requests using CRYPTO_AES_NIST_GMAC
  as an auth algorithm and updated cryptocheck to work with it.

- Combined cipher and auth sessions via /dev/crypto now always use ETA
  mode.  The COP_F_CIPHER_FIRST flag is now a no-op that is ignored.
  This was actually documented as being true in crypto(4) before, but
  the code had not implemented this before I added the CIPHER_FIRST
  flag.

- I have not yet updated /dev/crypto to be aware of explicit modes for
  sessions.  I will probably do that at some point in the future as well
  as teach it about IV/nonce and tag lengths for AEAD so we can support
  all of the NIST KAT tests for GCM and CCM.

- I've split up the exising crypto.9 manpage into several pages
  of which many are written from scratch.

- I have converted all drivers and consumers in the tree and verified
  that they compile, but I have not tested all of them.  I have tested
  the following drivers:

  - cryptosoft
  - aesni (AES only)
  - blake2
  - ccr

  and the following consumers:

  - cryptodev
  - IPsec
  - ktls_ocf
  - GELI (lightly)

  I have not tested the following:

  - ccp
  - aesni with sha
  - hifn
  - kgssapi_krb5
  - ubsec
  - padlock
  - safe
  - armv8_crypto (aarch64)
  - glxsb (i386)
  - sec (ppc)
  - cesa (armv7)
  - cryptocteon (mips64)
  - nlmsec (mips64)

Discussed with:	cem
Relnotes:	yes
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D23677
2020-03-27 18:25:23 +00:00
John Baldwin 47172feb8d Use the newer EINTEGRITY error when authentication fails.
GELI used to fail with EINVAL when a read request spanned a disk
sector whose contents did not match the sector's authentication tag.
The recently-added EINTEGRITY more closely matches to the error in
this case.

Reviewed by:	cem, mckusick
MFC after:	2 weeks
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D24131
2020-03-23 21:26:32 +00:00
Pawel Biernacki 7029da5c36 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE.  All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by:	kib (mentor, blanket)
Commented by:	kib, gallatin, melifaro
Differential Revision:	https://reviews.freebsd.org/D23718
2020-02-26 14:26:36 +00:00
Pawel Biernacki 53a6215c83 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (12 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Approved by:	kib (mentor, blanket)
Differential Revision:	https://reviews.freebsd.org/D23637
2020-02-24 10:42:56 +00:00
Kyle Evans c81929d343 geli taste: allow GELIBOOT tagged providers as well
Currently the installer will tag geliboot partitions with both BOOT and
GELIBOOT; the former allows the kernel to taste it at boot, while the latter
is what loaders keys off of.

However, it seems reasonable to assume that if a provider's been tagged with
GELIBOOT that the kernel should also take that as a hint to taste/attach at
boot. This would allow us to stop tagging GELIBOOT partitions with BOOT in
bsdinstall, but I'm not sure that there's a compelling reason to do so any
time soon.

Reviewed by:	oshogbo
Differential Revision:	https://reviews.freebsd.org/D23387
2020-02-07 21:36:14 +00:00
Warner Losh 9133f3d097 Supress not supported message
For the moment, supress the operation not supported messages at this level.  In
the fullness of time, we will have better error tracking so we can diagnose
issues in the future.

Reviewed by: scottl@
2020-02-07 17:47:08 +00:00
Pawel Jakub Dawidek 76b47dfb8f The error variable is not really needed. Remove it. 2020-02-01 10:15:23 +00:00
Konstantin Belousov fd99699d7e Fix aggregating geoms for BIO_SPEEDUP.
If the bio was split into several bios going down, completion computes
bio_completed of the original bio as sum of the bio_completes of the
splits.  For BIO_SETUP, bio_length means something different than the
length. it is the requested speedup amount, and is duplicated into the
splits, which is in fact reasonable, since we cannot know how the
previous activity was distributed among subordinate geoms.  Obviously,
the sum of n bio_length is greater than bio_length for n > 1, which
triggers assert that bio_length >= bio_completed for e.g. geom_stripe
and geom_raid3.

Fix this by reassigning bio_completed from bio_length for completed
BIO_SPEEDED, I do not think it really mattters what we return in
bio_completed.

Reported and tested by:	pho
Reviewed by:	imp
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D23380
2020-01-27 13:15:16 +00:00
Conrad Meyer 151e04b3fe GEOM label: strip leading/trailing space synthesizing devfs names
%20%20%20 is ugly and doesn't really help make human-readable devfs names.

PR:		243318
Reported by:	Peter Eriksson <pen AT lysator.liu.se>
Relnotes:	yes
2020-01-18 03:33:44 +00:00
Warner Losh 3cf5dd8401 Use buf to send speedup
It turns out there's a problem with using g_io to send the speedup. It leads to
a race when there's a resource shortage when a disk fails.

Instead, send BIO_SPEEDUP via struct buf. This is pretty straight forward,
except we need to transfer the bio_flags from b_ioflags for BIO_SPEEDUP commands
in g_vfs_strategy.

Reviewed by: kirk, chs
Differential Revision: https://reviews.freebsd.org/D23117
2020-01-17 01:16:19 +00:00
Warner Losh 8b522bdae6 Pass BIO_SPEEDUP through all the geom layers
While some geom layers pass unknown commands down, not all do. For the ones that
don't, pass BIO_SPEEDUP down to the providers that constittue the geom, as
applicable. No changes to vinum or virstor because I was unsure how to add this
support, and I'm also unsure how to test these. gvinum doesn't implement
BIO_FLUSH either, so it may just be poorly maintained. gvirstor is for testing
and not supportig BIO_SPEEDUP is fine.

Reviewed by: chs
Differential Revision: https://reviews.freebsd.org/D23183
2020-01-17 01:15:55 +00:00
Mateusz Guzik 879e0604ee Add KERNEL_PANICKED macro for use in place of direct panicstr tests 2020-01-12 06:07:54 +00:00
Mateusz Guzik c8b3463dd0 vfs: reimplement deferred inactive to use a dedicated flag (VI_DEFINACT)
The previous behavior of leaving VI_OWEINACT vnodes on the active list without
a hold count is eliminated. Hold count is kept and inactive processing gets
explicitly deferred by setting the VI_DEFINACT flag. The syncer is then
responsible for vdrop.

Reviewed by:	kib (previous version)
Tested by:	pho (in a larger patch, previous version)
Differential Revision:	https://reviews.freebsd.org/D23036
2020-01-07 15:56:24 +00:00
Alexander Motin 0aabbeff36 Remove extra check for provider being closed.
We already checked for that earlier, and since we hold topology lock
it could not change.

MFC after:	1 week
2020-01-02 20:30:53 +00:00
Alexander Motin 4aa1289a38 Avoid few memory accesses in g_disk_done(). 2019-12-31 03:43:13 +00:00
Alexander Motin 024932aae9 Use atomic for start_count in devstat_start_transaction().
Combined with earlier nstart/nend removal it allows to remove several locks
from request path of GEOM and few other places.  It would be cool if we had
more SMP-friendly statistics, but this helps too.

Sponsored by:	iXsystems, Inc.
2019-12-30 03:13:38 +00:00
Alexander Motin 9794a803fd Retire nstart/nend counters.
Those counters were abused for decade to workaround broken orphanization
process in different classes by delaying the call while there are active
requests.  But from one side it did not close all the races, while from
another was quite expensive on SMP due to trashing twice per request cache
lines of consumer and provider and requiring locks.  It lost its sense
after I manually went through all the GEOM classes in base and made
orphanization wait for either provider close or request completion.

Consumer counters are still used under INVARIANTS to detect premature
consumer close and detach.  Provider counters are removed completely.

Sponsored by:	iXsystems, Inc.
2019-12-30 00:46:10 +00:00
Alexander Motin 86c06ff886 Remove GEOM_SCHED class and gsched tool.
This code was not actively maintained since it was introduced 10 years ago.
It lacks support for many later GEOM features, such as direct dispatch,
unmapped I/O, stripesize/stripeoffset, resize, etc.  Plus it is the only
remaining use of GEOM nstart/nend request counters, used there to implement
live insertion/removal, questionable by itself.  Plus, as number of people
commented, GEOM is not the best place for I/O scheduler, since it has
limited information about layers both above and below it, required for
efficient scheduling.  Plus with the modern shift to SSDs there is just no
more significant need for this kind of scheduling.

Approved by:	imp, phk, luigi
Relnotes:	yes
2019-12-29 21:16:03 +00:00
Alexander Motin cfdb91850c Missed part of r356162.
If we postpone consumer destruction till close, then the close calls should
not be ignored.  Delay geom withering till the last close too.

MFC after:	2 weeks
X-MFC-with:	r356162
Sponsored by:	iXsystems, Inc.
2019-12-29 19:33:41 +00:00
Alexander Motin 1d301810d3 Fix GEOM_VIRSTOR orphanization.
Previous code closed and destroyed consumer even with I/O in progress.
This patch postpones the destruction till the last close.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2019-12-29 19:21:29 +00:00
Alexander Motin d2d5fee931 Fix GEOM_MOUNTVER orphanization.
Previous code closed and detached consumer even with I/O still in progress.
This patch adds locking and request counting to postpone the close till
the last of running requests completes.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2019-12-29 17:10:21 +00:00
Mariusz Zaborski 645532a448 gnop: change the "count until fail" option
Change the "count_until_fail" option of gnop, now it enables the failing
rating instead of setting them to 100%.

The original patch introduced the new flag, which sets the fail/rate to 100%
after N requests. In some cases, we don't want to have 100% of failure
probabilities. We want to start failing at some point.
For example, on the early stage, we may like to allow some read/writes requests
before having some requests delayed - when we try to mount the partition,
or when we are trying to import the pool.
Another case may be to check how scrub in ZFS will behave on different stages.

This allows us to cover more cases.
The previous behavior still may be configured.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D22632
2019-12-29 15:47:37 +00:00
Mariusz Zaborski 80e63e0a90 gnop: allow to change the name of created device
Thanks to this option we can create more then one gnop provider from
single provider. This may be useful for temporary labeling some data
on the disk.

Reviewed by:	markj, allanjude, bcr
Differential Revision:	https://reviews.freebsd.org/D22304
2019-12-29 15:40:02 +00:00
Alexander Motin 6a8eef35b5 Fix GEOM_SHSEC orphanization.
Previous code closed and destroyed consumer even with I/O in progress.
This patch postpones the destruction till the last close, identical to
GEOM_STRIPE, since they seem to have common origin.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2019-12-28 23:21:53 +00:00
Alexander Motin 351d0fa6df Fix GEOM_GATE orphanization.
Previous code closed and destroyed direct read consumer even with I/O still
in progress.  This patch adds locking and request counting to postpone the
close till the last of running requests completes.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2019-12-28 17:52:53 +00:00
Alexander Motin 2178f45b86 Fix GEOM_UZIP orphanization.
Previous code destroyed softc even with provider still open, that resulted
in panic under load.  This change postpones the free till the final close,
when we know for sure there will be no more I/O requests.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2019-12-27 21:44:13 +00:00
Alexander Motin a29df733fa Reimplement gvinum orphanization.
gvinum was the only GEOM class, using consumer nstart/nend fields. Making
it do its own accounting for orphanization purposes allows in perspective
to remove burden of that expensive for SMP accounting from GEOM.

Also the previous implementation spinned in a tight event loop, waiting
for all active BIOs to complete, while the new one knows exactly when it
is possible to close the consumer.

MFC after:	1 month
Sponsored by:	iXsystems, Inc.
2019-12-27 01:36:53 +00:00
Warner Losh b182c79211 Add BIO_SPEEDUP
Add BIO_SPEEDUP bio command and g_io_speedup wrapper. It tells the
lower layers that the upper layers are dealing with some shortage
(dirty pages and/or disk blocks). The lower layers should do what they
can to speed up anything that's been delayed.

The first use will be to tell the CAM I/O scheduler that any TRIM
shaping should be short-circuited because the system needs
blocks. We'll also call it when there's too many resources used by
UFS.

Reviewed by: kirk, kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D18351
2019-12-17 00:13:35 +00:00
Edward Tomasz Napierala 2006d590d6 Add kern.geom.part.separator tunable. This makes it possible
to specify an optional separator to insert before partition name;
eg if it's set to "c/", you'll get "ada0c/s1" instead of "ada0s1".
(It cannot be set to just “/“, since ada0 is a device node, not
a directory.)

Reviewed by:	imp
MFC after:	2 weeks
Sponsored by:	Klara Inc.
Differential Revision:	https://reviews.freebsd.org/D22193
2019-12-13 09:28:44 +00:00
Alexander Motin 5ccbeea1c5 Remove some branching from GEOM_DISK hot path.
pp->private just can not be NULL in those places.

In g_disk_start() and g_disk_ioctl() both dp != NULL and !dp->d_destroyed
should always be true if disk_gone() and disk_destroy() are used properly,
since GEOM does not send requests to errored providers.  If the protocol is
not followed, then no amount of additional checks here give real safety.

In g_disk_access() though the checks are useful, since GEOM blocks only
new opens for errored providers, but allows closes.  It should not happen
if disk_gone() and disk_destroy() are used properly, but may otherwise.

To improve cases when disk_gone() is not used, call it from disk_destroy().
It does not give full guaranties, but it errors the provider and makes
GEOM block unwanted requests at least after some race.

MFC after:	2 weeks
2019-12-06 16:48:36 +00:00
Alexander Motin 19cfcf253e Block ioctls for dying GEOM_DEV instances.
For normal I/Os consumer and provider statuses are checked by g_io_check().
But ioctl calls often do not go through it, being dispatched directly. This
change makes their semantics more alike, protecting lower levels.

MFC after:	2 weeks
2019-12-06 03:46:38 +00:00
Alexander Motin 6b3c68bf09 Make GEOM_DEV code slightly more compact.
Should be no functional change.

MFC after:	2 weeks
2019-12-06 03:18:37 +00:00
Alan Somers 67f72211dd gmultipath: add ATF tests
Add ATF tests for most gmultipath operations. Add some dtrace probes too,
primarily for configuration changes that happen in response to provider
errors.

PR:		178473
MFC after:	2 weeks
Sponsored by:	Axcient
Differential Revision:	https://reviews.freebsd.org/D22235
2019-12-06 00:12:14 +00:00
Alexander Motin c4c88d4718 Remove duplicate g_debugflags declaration.
While there, define G_F_FOOTSHOOTING instead of numeric constants.

MFC after:	13 days
X-MFX-with:	r355412
2019-12-05 15:07:32 +00:00
Alexander Motin 2efaef42e4 Wrap g_trace() into a macro to avoid unneeded calls.
In most cases with debug disabled this function does nothing, but argument
passing and the call still cost measurable time due to cache misses, etc.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2019-12-05 04:52:19 +00:00
Alexander Motin e61ed7983e Switch GEOM_DEV from make_dev_p() to make_dev_s().
It closes the race condition and so allows to remove few NULL checks.

Also while there, use dev->si_drv1 in addition to cp->private to store
softc pointer.  For calls coming from the dev side it gives reliable cache
hit instead of often miss before.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2019-12-05 04:03:08 +00:00
Alexander Motin 61322a0a8a Mark some more hot global variables with __read_mostly.
MFC after:	1 week
2019-12-04 21:26:03 +00:00
Warner Losh 283a5a3796 We don't even need Giant here. It isn't protecting anything internal
to geom, and nothing we call requires it to be held. It's left over
from a time when the latter wasn't the case. Retire it.

Reviewed in concept: scottl@
2019-11-23 23:44:00 +00:00
Edward Tomasz Napierala b5961be1ab Add GEOM attribute to report physical device name, and report it
via 'diskinfo -v'.  This avoids the need to track it down via CAM,
and should also work for disks that don't use CAM.  And since it's
inherited thru the GEOM hierarchy, in most cases one doesn't need
to walk the GEOM graph either, eg you can use it on a partition
instead of disk itself.

Reviewed by:	allanjude, imp
Sponsored by:	Klara Inc
Differential Revision:	https://reviews.freebsd.org/D22249
2019-11-09 17:30:19 +00:00
Chuck Silvers 30738a349d Make all the gnop parameters optional in the request from userland,
filling in the same defaults that the current userland module uses.
This allows an old geom_nop.so userland module to work with a new kernel.

Approved by:	imp (mentor)
Reviewed by:	cem
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D21972
2019-10-16 21:49:44 +00:00
Chuck Silvers 1e00bb458d Add a new gctl_get_paraml_opt() interface to extract optional parameters from
the request.  It is the same as gctl_get_paraml() except that the request
is not marked with an error if the parameter is not present.

Approved by:	imp (mentor)
Reviewed by:	cem
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D21972
2019-10-16 21:49:39 +00:00
Chuck Silvers 090a3ea3c2 Add a "count_until_fail" option to gnop, which says to start failing
I/O requests after the given number have been allowed though.

Approved by:    imp (mentor)
Reviewed by:    rpokala kib 0mp mckusick
Sponsored by:   Netflix
Differential Revision:  https://reviews.freebsd.org/D21593
2019-09-13 23:03:56 +00:00
Kyle Evans ef03f57dd2 Allow more nesting of GEOM partitioning schemes
GEOM is supposed to be topology-agnostic, but the GPT and BSD partition code
has arbitrary restrictions on nesting that are annoying in cases such as
running VMs on raw partitions (since the VM's partitioning scheme is not
visible to the host).

This patch adds sysctls to disable the restrictions except in the case of
BSD label (and similar) partitions with offset 0 (where we need to avoid
recursively recognizing the label).

Submitted by:	Andrew Gierth
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D21350
2019-09-03 20:57:20 +00:00
Conrad Meyer eefd8f96fb geom_uzip(4), mkuzip(8): Add Zstd image mode
The Zstd format bumps the CLOOP major number to 4 to avoid incompatibility
with older systems.  Support in geom_uzip(4) is conditional on the ZSTDIO
kernel option, which is enabled in amd64 GENERIC, but not all in-tree
configurations.

mkuzip(8) was modified slightly to always initialize the nblocks + 1'th
offset in the CLOOP file format.  Previously, it was only initialized in the
case where the final compressed block happened to be unaligned w.r.t.
DEV_BSIZE.  The "Fake" last+1 block change in r298619 means that the final
compressed block's 'blen' was never correct unless the compressed uzip image
happened to be BSIZE-aligned.  This happened in about 1 out of every 512
cases.  The zlib and lzma decompressors are probably tolerant of extra trash
following the frame they were told to decode, but Zstd complains that the
input size is incorrect.

Correspondingly, geom_uzip(4) was modified slightly to avoid trashing the
nblocks + 1'th offset when it is known to be initialized to a good value.
This corrects the calculated final real cluster compressed length to match
that printed by mkuzip(8).

mkuzip(8) was refactored somewhat to reduce code duplication and increase
ease of adding other compression formats.

  * Input block size validation was pulled out of individual compression
    init routines into main().

  * Init routines now validate a user-provided compression level or select
    an algorithm-specific default, if none was provided.

  * A new interface for calculating the maximal compressed size of an
    incompressible input block was added for each driver.  The generic code
    uses it to validate against MAXPHYS as well as to allocate compression
    result buffers in the generic code.

  * Algorithm selection is now driven by a table lookup, to increase ease of
    adding other formats in the future.

mkuzip(8) gained the ability to explicitly specify a compression level with
'-C'.  The prior defaults -- 9 for zlib and 6 for lzma -- are maintained.
The new zstd default is 9, to match zlib.

Rather than select lzma or zlib with '-L' or its absense, respectively, a
new argument '-A <algorithm>' is provided to select 'zlib', 'lzma', or
'zstd'.  '-L' is considered deprecated, but will probably never be removed.

All of the new features were documented in mkuzip.8; the page was also
cleaned up slightly.

Relnotes:	yes
2019-08-13 23:32:56 +00:00
Conrad Meyer ac8e5d02cf Remove deprecated GEOM classes
Follow-up on r322318 and r322319 and remove the deprecated modules.

Shift some now-unused kernel files into userspace utilities that incorporate
them.  Remove references to removed GEOM classes in userspace utilities.

Reviewed by:	imp (earlier version)
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D21249
2019-08-13 20:06:55 +00:00
Xin LI 2b0cabbdae Update geom_uzip to use new zlib:
- Use new zlib headers;
 - Removed z_alloc and z_free to use the common sys/dev/zlib version.
 - Replace z_compressBound with compressBound from zlib.

While there, limit LZMA CFLAGS to apply only for g_uzip_lzma.c.

PR:		229763
Submitted by:	Yoshihiro Ota <ota j email ne jp> (with changes,
		bugs are mine)
Differential Revision:	https://reviews.freebsd.org/D20271
2019-08-08 06:27:39 +00:00
Conrad Meyer ac03832ef3 GEOM: Reduce unnecessary log interleaving with sbufs
Similar to what was done for device_printfs in r347229.

Convert g_print_bio() to a thin shim around g_format_bio(), which acts on an
sbuf; documented in g_bio.9.

Reviewed by:	markj
Discussed with:	rlibby
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D21165
2019-08-07 19:28:35 +00:00
Kirk McKusick e9660daffb Ignore UFS/FFS superblock check hash failures so as to allow a higher
level in the filesystem stack to decide what to do about them.

Reported by:  Peter Holm
Tested by:    Peter Holm
Sponsored by: Netflix
2019-08-06 18:28:44 +00:00
Mariusz Zaborski 4d7486c30f gnop: style nits 2019-07-31 17:51:06 +00:00
Mariusz Zaborski 4f80c85519 gnop: Introduce requests delay.
This allows to simulated disk that is responding slowly to the IO requests.

Reviewed by:	markj, bcr, pjd (previous version)
Differential Revision:	https://reviews.freebsd.org/D21052
2019-07-31 17:47:12 +00:00