Commit graph

183 commits

Author SHA1 Message Date
Warner Losh fdafd315ad sys: Automated cleanup of cdefs and other formatting
Apply the following automated changes to try to eliminate
no-longer-needed sys/cdefs.h includes as well as now-empty
blank lines in a row.

Remove /^#if.*\n#endif.*\n#include\s+<sys/cdefs.h>.*\n/
Remove /\n+#include\s+<sys/cdefs.h>.*\n+#if.*\n#endif.*\n+/
Remove /\n+#if.*\n#endif.*\n+/
Remove /^#if.*\n#endif.*\n/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/types.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/param.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/capsicum.h>/

Sponsored by:		Netflix
2023-11-26 22:24:00 -07:00
Warner Losh 685dc743dc sys: Remove $FreeBSD$: one-line .c pattern
Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
2023-08-16 11:54:36 -06:00
Zhenlei Huang bd5d9037c5 GEOM: Remove redundant NULL pointer check before g_free()
Reviewed by:	melifaro, pjd, imp
Approved by:	kp (mentor)
Differential Revision:	https://reviews.freebsd.org/D37779
2022-12-28 23:34:09 +08:00
Alan Somers 5f438dd3ac ses: don't panic if disk elements have really weird descriptors
SES allows element descriptors to contain characters like spaces and
quotes that devfs does not allow to appear in device aliases.  Since SES
element descriptors are outside of the kernel's control, we should
gracefully handle a failure to create a device physical path alias.

PR:		264513
Reported by:	Yuri <yuri@aetern.org>
Reviewed by:	imp, mav
Sponsored by:	Axcient
MFC after:	2 weeks
2022-06-23 11:19:20 -06:00
Robert Wing 65c87a6c81 geom_dev: extend kevent support for geom dev
Add support for the following NOTE events:
    NOTE_OPEN, NOTE_CLOSE, NOTE_CLOSE_WRITE, NOTE_READ, and NOTE_WRITE.

Differential Revision:	https://reviews.freebsd.org/D34777
2022-04-28 08:40:13 -08:00
Mitchell Horne 0a90043e63 Remove 12.x ABI compat for kernel dump ioctls
This code was marked gone_in(14), so it can now be removed.

The only consumer of this interface is dumpon(8). We do not maintain
strict backwards compatibility for this utility because a) it
can't/shouldn't be used from a jail or chroot and b) it is highly
specific interface unique to FreeBSD. The host's (presumably more
up-to-date) copy of dumpon(8) should be used to configure kernel dump
devices.

Reviewed by:	markj, emaste
MFC after:	never
Differential Revision:	https://reviews.freebsd.org/D34914
2022-04-15 12:06:05 -03:00
Mitchell Horne 9c90bfcd31 Remove 11.x ABI compat for kernel dump ioctls
This code was marked gone_in(13), so its time has passed.

The only consumer of this interface is dumpon(8). We do not maintain
strict backwards compatibility for this utility because a) it
can't/shouldn't be used from a jail or chroot and b) it is highly
specific interface unique to FreeBSD. The host's (presumably more
up-to-date) copy of dumpon(8) should be used to configure kernel dump
devices.

Reviewed by:	markj, emaste
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D34913
2022-04-15 12:06:04 -03:00
Mark Johnston d91d2b513e geom: Handle partial I/O in g_{read,write,delete}_data()
These routines are used internally by GEOM to dispatch I/O requests to a
provider, typically for tasting or for updating GEOM class metadata
blocks.

These routines assumed that partial I/O did not occur without setting
BIO_ERROR, but this is possible in at least two cases:
- Some or all of the I/O range is beyond the provider's mediasize.
  In this scenario g_io_check() truncates the bounds of the request
  before it is handed to the target provider.
- A read from vnode-backed md(4) device returns EOF (the backing vnode
  is allowed to be smaller than the device itself) or partial vnode I/O
  occurs.
In these scenarios g_read_data() could return a partially uninitialized
buffer.  Many consumers are not affected by the first case, since the
offsets used for provider metadata or tasting are relative to the
provider's mediasize, but in some cases metadata is read at fixed
offsets, such as when searching for a UFS superblock using the offsets
defined by SBLOCKSEARCH.

Thus, modify the routines to explicitly check for a non-zero residual
and return EIO in that case.  Remove a related check from the
DIOCGDELETE ioctl handler, it is handled within g_delete_data() now.

Reviewed by:	mav, imp, kib
Reported by:	KMSAN
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31293
2022-01-20 08:29:39 -05:00
Robert Wing a50e92cc20 geom: add kqfilter support for geom dev
The only event hooked up is NOTE_ATTRIB, which is triggered when the
device is resized. Support for other NOTE_* events to follow.

Reviewed by:	kib, jhb
Differential Revision:	https://reviews.freebsd.org/D33402
2022-01-18 10:54:59 -09:00
Konstantin Belousov cd85379104 Make MAXPHYS tunable. Bump MAXPHYS to 1M.
Replace MAXPHYS by runtime variable maxphys. It is initialized from
MAXPHYS by default, but can be also adjusted with the tunable kern.maxphys.

Make b_pages[] array in struct buf flexible.  Size b_pages[] for buffer
cache buffers exactly to atop(maxbcachebuf) (currently it is sized to
atop(MAXPHYS)), and b_pages[] for pbufs is sized to atop(maxphys) + 1.
The +1 for pbufs allow several pbuf consumers, among them vmapbuf(),
to use unaligned buffers still sized to maxphys, esp. when such
buffers come from userspace (*).  Overall, we save significant amount
of otherwise wasted memory in b_pages[] for buffer cache buffers,
while bumping MAXPHYS to desired high value.

Eliminate all direct uses of the MAXPHYS constant in kernel and driver
sources, except a place which initialize maxphys.  Some random (and
arguably weird) uses of MAXPHYS, e.g. in linuxolator, are converted
straight.  Some drivers, which use MAXPHYS to size embeded structures,
get private MAXPHYS-like constant; their convertion is out of scope
for this work.

Changes to cam/, dev/ahci, dev/ata, dev/mpr, dev/mpt, dev/mvs,
dev/siis, where either submitted by, or based on changes by mav.

Suggested by: mav (*)
Reviewed by:	imp, mav, imp, mckusick, scottl (intermediate versions)
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D27225
2020-11-28 12:12:51 +00:00
Warner Losh a3f4217ec0 Remove frontstuff
Nothing implements this in the tree. Remove the ioctl and the
conversion to the geom atttribute stuff.

This was introduced in r94287 in 2002 and was retired in r113390
2003. It appeared in FreeBSD 5.0, but no other releases. This is a
vestige that was missed at the time and overlooked until now. No
compat is provided for this reason.  And there's no implementation of
it today. And it was never part of a release from a stable branch.

Reviewed by: phk@
Differential Revision: https://reviews.freebsd.org/D26967
2020-10-27 06:43:24 +00:00
Edward Tomasz Napierala 3001e97deb Fix fallout from r366811.
PR:		250442
Reported by:	lwhsu
Reviewed by:	mav
MFC after:	2 weeks
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D26855
2020-10-19 20:26:37 +00:00
Edward Tomasz Napierala d22ff249d9 Make g_attach() return ENXIO for orphaned providers; update various
classes to add missing error checking.

Reviewed by:	imp
MFC after:	2 weeks
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D26658
2020-10-18 16:24:08 +00:00
Warner Losh 887611b122 Retire devctl_notify_f()
devctl_notify_f isn't needed, so retire it. The flags argument is now
unused, so rather than keep it around, retire it. Convert all old
users of it to devctl_notify(). This path no longer sleeps, so is safe
to call from any context. Since it doesn't sleep, it doesn't need to
know if it is OK to sleep or not.

Reviewed by: markj@
Differential Revision: https://reviews.freebsd.org/D26140
2020-08-29 04:30:06 +00:00
Warner Losh 773e541e8d Use devctl.h instead of bus.h to reduce newbus pollution.
There's no need for these parts of the kernel to know about newbus,
so narrow what is included to devctl.h for device_notify_*.

Suggested by: kib@
2020-08-21 00:03:24 +00:00
Xin LI 8510f61acd sys/geom: consistently use _PATH_DEV instead of hardcoding "/dev/".
Reviewed by:	cem
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D25565
2020-07-09 02:52:39 +00:00
John Baldwin 4a711b8d04 Use zfree() instead of explicit_bzero() and free().
In addition to reducing lines of code, this also ensures that the full
allocation is always zeroed avoiding possible bugs with incorrect
lengths passed to explicit_bzero().

Suggested by:	cem
Reviewed by:	cem, delphij
Approved by:	csprng (cem)
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D25435
2020-06-25 20:17:34 +00:00
Conrad Meyer a9ca503b52 Revert r361838
Reported by:	delphij
2020-06-06 14:19:16 +00:00
Conrad Meyer 5b9b571cb3 geom_label: Use provider aliasing to alias upstream geoms
For synthetic aliases (just pseudonyms inferred from metadata like GPT or
UFS labels, GPT UUIDs, etc), use the GEOM provider aliasing system to create
a symlink to the real device instead of creating an independent device.
This makes it more clear which labels and devices correspond, and we can
safely have multiple labels to a single device accessed at once.

The confusingly named geom_label on-disk construct continues to behave
identically to how it did before.

This requires teaching GEOM's provider aliasing about the possibility
that aliases might be added later in time, and GEOM's devfs interaction
layer not to worry about existing aliases during retaste.

Discussed with:	imp
Relnotes:	sure, if we don't end up reverting it
Differential Revision:	https://reviews.freebsd.org/D24968
2020-06-05 16:12:21 +00:00
Warner Losh ae1cce524e Reimplement aliases in geom
The alias needs to be part of the provider instead of the geom to work
properly. To bind the DEV geom, we need to look at the provider's names and
aliases and create the dev entries from there. If this lives in the GEOM, then
it won't propigate down the tree properly. Remove it from geom, add it provider.

Update geli, gmountver, gnop, gpart, and guzip to use it, which handles the bulk
of the uses in FreeBSD. I think this is all the providers that create a new name
based on their parent's name.
2020-05-13 19:17:28 +00:00
Pawel Biernacki 7029da5c36 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE.  All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by:	kib (mentor, blanket)
Commented by:	kib, gallatin, melifaro
Differential Revision:	https://reviews.freebsd.org/D23718
2020-02-26 14:26:36 +00:00
Alexander Motin 19cfcf253e Block ioctls for dying GEOM_DEV instances.
For normal I/Os consumer and provider statuses are checked by g_io_check().
But ioctl calls often do not go through it, being dispatched directly. This
change makes their semantics more alike, protecting lower levels.

MFC after:	2 weeks
2019-12-06 03:46:38 +00:00
Alexander Motin 6b3c68bf09 Make GEOM_DEV code slightly more compact.
Should be no functional change.

MFC after:	2 weeks
2019-12-06 03:18:37 +00:00
Alexander Motin e61ed7983e Switch GEOM_DEV from make_dev_p() to make_dev_s().
It closes the race condition and so allows to remove few NULL checks.

Also while there, use dev->si_drv1 in addition to cp->private to store
softc pointer.  For calls coming from the dev side it gives reliable cache
hit instead of often miss before.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2019-12-05 04:03:08 +00:00
Conrad Meyer 6b6e2954dd List-ify kernel dump device configuration
Allow users to specify multiple dump configurations in a prioritized list.
This enables fallback to secondary device(s) if primary dump fails.  E.g.,
one might configure a preference for netdump, but fallback to disk dump as a
second choice if netdump is unavailable.

This change does not list-ify netdump configuration, which is tracked
separately from ordinary disk dumps internally; only one netdump
configuration can be made at a time, for now.  It also does not implement
IPv6 netdump.

savecore(8) is already capable of scanning and iterating multiple devices
from /etc/fstab or passed on the command line.

This change doesn't update the rc or loader variables 'dumpdev' in any way;
it can still be set to configure a single dump device, and rc.d/savecore
still uses it as a single device.  Only dumpon(8) is updated to be able to
configure the more complicated configurations for now.

As part of revving the ABI, unify netdump and disk dump configuration ioctl
/ structure, and leave room for ipv6 netdump as a future possibility.
Backwards-compatibility ioctls are added to smooth ABI transition,
especially for developers who may not keep kernel and userspace perfectly
synced.

Reviewed by:	markj, scottl (earlier version)
Relnotes:	maybe
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D19996
2019-05-06 18:24:07 +00:00
Alexander Motin 9c498bd5c3 Call delist_dev() before destroy_dev_sched_cb().
destroy_dev_sched_cb() is excessively asynchronous, and during media change
retaste new provider may appear sooner then device of the previous one get
destroyed.

MFC after:	1 week
Sponsored by:	iXsystems, Inc.
2019-04-24 19:56:02 +00:00
Pawel Jakub Dawidek e6b0d5eb9f Introduce new event SIZECHANGE within GEOM system to inform about GEOM
providers mediasize changes.

While here, use GEOM nomenclature to describe providers instead of calling
them device nodes.

Obtained from:	Fudo Security
Tested in:	AWS
2019-03-30 07:24:34 +00:00
Mark Johnston d4fbe32c65 Limit the number of entries allocated for a REPORT_ZONES command.
The DIOCGETZONE ioctl can be used to fetch the zone list of an SMR
drive, and the caller specifies the number of entries it wants to fetch.
Clamp the caller's request to a sane limit so that a user cannot attempt
large allocations. Callers already need to invoke the ioctl multiple
times to fetch the full list in general, so there's no harm in limiting
the number of entries returned.

Fix style while here.

admbug:		807
Reported by:	Ilja Van Sprundel <ivansprundel@ioactive.com>
Reviewed by:	asomers, ken
Tested by:	ken
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D19249
2019-02-19 21:33:02 +00:00
Alexander Motin 02a9923034 Switch from mutexes to atomics in GEOM_DEV I/O path.
Mutexes in I/O path there were used twice per I/O to atomically access
several variables to close and/or destroy the device on last request
completion.  I found the way to fit all required info into one integer,
suitable for atomic operations.  It opened race window on device close,
but addition of timeout to the msleep() there should cover it.

Profiling shows removal of significant spinning time on those mutexes
and IOPS increase from ~600K to >800K to NVMe on 72-core systems.

MFC after:	1 month
Sponsored by:	iXsystems, Inc.
2018-12-27 19:15:24 +00:00
Maxim Sobolev 9dcafe16d4 Another attempt to fix issue with the DIOCGDELETE ioctl(2) not
handling slightly out-of-bound requests properly (r340187).
Perform range check here rather then rely on g_delete_data() to DTRT.

The g_delete_data() would always return success for requests
starting just the next byte after providers media boundary.

MFC after:	4 weeks
2018-12-04 21:48:56 +00:00
Mark Johnston bd92e6b6f5 Refactor some of the MI kernel dump code in preparation for netdump.
- Add clear_dumper() to complement set_dumper().
- Drain netdump's preallocated mbuf pool when clearing the dumper.
- Don't do bounds checking for dumpers with mediasize 0.
- Add dumper callbacks for initialization for writing out headers.

Reviewed by:	sbruno
MFC after:	1 month
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D15252
2018-05-06 00:22:38 +00:00
Brooks Davis 6469bdcdb6 Move most of the contents of opt_compat.h to opt_global.h.
opt_compat.h is mentioned in nearly 180 files. In-progress network
driver compabibility improvements may add over 100 more so this is
closer to "just about everywhere" than "only some files" per the
guidance in sys/conf/options.

Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of
sys/compat/linux/*.c.  A fake _COMPAT_LINUX option ensure opt_compat.h
is created on all architectures.

Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the
set of compiled files.

Reviewed by:	kib, cem, jhb, jtl
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14941
2018-04-06 17:35:35 +00:00
Pedro F. Giffuni 3728855a0f sys/geom: adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
2017-11-27 15:17:37 +00:00
Mark Johnston 64a16434d8 Add support for compressed kernel dumps.
When using a kernel built with the GZIO config option, dumpon -z can be
used to configure gzip compression using the in-kernel copy of zlib.
This is useful on systems with large amounts of RAM, which require a
correspondingly large dump device. Recovery of compressed dumps is also
faster since fewer bytes need to be copied from the dump device.

Because we have no way of knowing the final size of a compressed dump
until it is written, the kernel will always attempt to dump when
compression is configured, regardless of the dump device size. If the
dump is aborted because we run out of space, an error is reported on
the console.

savecore(8) is modified to handle compressed dumps and save them to
vmcore.<index>.gz, as it does when given the -z option.

A new rc.conf variable, dumpon_flags, is added. Its value is added to
the boot-time dumpon(8) invocation that occurs when a dump device is
configured in rc.conf.

Reviewed by:	cem (earlier version)
Discussed with:	def, rgrimes
Relnotes:	yes
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D11723
2017-10-25 00:51:00 +00:00
Warner Losh 36d6e01474 Eliminate useless adjustments of aliased device.
No need to set any fields in the cloned device. devfs uses symlinks,
so the adev entries returned won't be presented to the drivers. Since
we don't save copies, nothing else will see them. This code came from
the old compat code, and it appears to be obsolete or never needed.

Submitted by: kib@
Differential Review: https://reviews.freebsd.org/D11919
2017-08-07 22:42:46 +00:00
Warner Losh c624eb2598 Add aliasing concept to geom.
Add an alias name list to geoms. Use them in geom_dev to create
aliases. Previously, geom_dev would create an device node for the name
of the geom. Now, additional nodes are created pointing back to the
primary node with make_dev_alias_p. Aliases must be in place on the
geom before any tasting occurs.

Differential Revision: https://reviews.freebsd.org/D11873
2017-08-07 21:12:28 +00:00
Alexander Motin 4d5832bc12 When chunking large DIOCGDELETE, do it on stripe edge.
MFC after:	2 weeks
2017-03-08 12:18:58 +00:00
Konrad Witaszczyk 480f31c214 Add support for encrypted kernel crash dumps.
Changes include modifications in kernel crash dump routines, dumpon(8) and
savecore(8). A new tool called decryptcore(8) was added.

A new DIOCSKERNELDUMP I/O control was added to send a kernel crash dump
configuration in the diocskerneldump_arg structure to the kernel.
The old DIOCSKERNELDUMP I/O control was renamed to DIOCSKERNELDUMP_FREEBSD11 for
backward ABI compatibility.

dumpon(8) generates an one-time random symmetric key and encrypts it using
an RSA public key in capability mode. Currently only AES-256-CBC is supported
but EKCD was designed to implement support for other algorithms in the future.
The public key is chosen using the -k flag. The dumpon rc(8) script can do this
automatically during startup using the dumppubkey rc.conf(5) variable.  Once the
keys are calculated dumpon sends them to the kernel via DIOCSKERNELDUMP I/O
control.

When the kernel receives the DIOCSKERNELDUMP I/O control it generates a random
IV and sets up the key schedule for the specified algorithm. Each time the
kernel tries to write a crash dump to the dump device, the IV is replaced by
a SHA-256 hash of the previous value. This is intended to make a possible
differential cryptanalysis harder since it is possible to write multiple crash
dumps without reboot by repeating the following commands:
# sysctl debug.kdb.enter=1
db> call doadump(0)
db> continue
# savecore

A kernel dump key consists of an algorithm identifier, an IV and an encrypted
symmetric key. The kernel dump key size is included in a kernel dump header.
The size is an unsigned 32-bit integer and it is aligned to a block size.
The header structure has 512 bytes to match the block size so it was required to
make a panic string 4 bytes shorter to add a new field to the header structure.
If the kernel dump key size in the header is nonzero it is assumed that the
kernel dump key is placed after the first header on the dump device and the core
dump is encrypted.

Separate functions were implemented to write the kernel dump header and the
kernel dump key as they need to be unencrypted. The dump_write function encrypts
data if the kernel was compiled with the EKCD option. Encrypted kernel textdumps
are not supported due to the way they are constructed which makes it impossible
to use the CBC mode for encryption. It should be also noted that textdumps don't
contain sensitive data by design as a user decides what information should be
dumped.

savecore(8) writes the kernel dump key to a key.# file if its size in the header
is nonzero. # is the number of the current core dump.

decryptcore(8) decrypts the core dump using a private RSA key and the kernel
dump key. This is performed by a child process in capability mode.
If the decryption was not successful the parent process removes a partially
decrypted core dump.

Description on how to encrypt crash dumps was added to the decryptcore(8),
dumpon(8), rc.conf(5) and savecore(8) manual pages.

EKCD was tested on amd64 using bhyve and i386, mipsel and sparc64 using QEMU.
The feature still has to be tested on arm and arm64 as it wasn't possible to run
FreeBSD due to the problems with QEMU emulation and lack of hardware.

Designed by:	def, pjd
Reviewed by:	cem, oshogbo, pjd
Partial review:	delphij, emaste, jhb, kib
Approved by:	pjd (mentor)
Differential Revision:	https://reviews.freebsd.org/D4712
2016-12-10 16:20:39 +00:00
Conrad Meyer 8532d381a9 Add BUF_TRACKING and FULL_BUF_TRACKING buffer debugging
Upstream the BUF_TRACKING and FULL_BUF_TRACKING buffer debugging code.
This can be handy in tracking down what code touched hung bios and bufs
last. The full history is especially useful, but adds enough bloat that
it shouldn't be enabled in release builds.

Function names (or arbitrary string constants) are tracked in a
fixed-size ring in bufs. Bios gain a pointer to the upper buf for
tracking. SCSI CCBs gain a pointer to the upper bio for tracking.

Reviewed by:	markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D8366
2016-10-31 23:09:52 +00:00
Alan Somers 151746b244 Avoid issuing spa config updates for physical path when not necessary
ZFS's configuration needs to be updated whenever the physical path for a
device changes, but not when a new device is introduced. This is because new
devices necessarily cause config updates, but only if they are actually
accepted into the pool.

sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
	Split vdev_geom_set_physpath out of vdev_geom_attrchanged.  When
	setting the vdev's physical path, only request a config update if
	the physical path has changed.  Don't request it when opening a
	device for the first time, because the config sync will happen
	anyway upstack.

sys/geom/geom_dev.c
	Split g_dev_set_physpath and g_dev_set_media out of
	g_dev_attrchanged

Submitted by:	will, asomers
MFC after:	4 weeks
Sponsored by:	Spectra Logic Corp
Differential Revision:	https://reviews.freebsd.org/D6428
2016-05-27 22:32:44 +00:00
Kenneth D. Merry 9a6844d55f Add support for managing Shingled Magnetic Recording (SMR) drives.
This change includes support for SCSI SMR drives (which conform to the
Zoned Block Commands or ZBC spec) and ATA SMR drives (which conform to
the Zoned ATA Command Set or ZAC spec) behind SAS expanders.

This includes full management support through the GEOM BIO interface, and
through a new userland utility, zonectl(8), and through camcontrol(8).

This is now ready for filesystems to use to detect and manage zoned drives.
(There is no work in progress that I know of to use this for ZFS or UFS, if
anyone is interested, let me know and I may have some suggestions.)

Also, improve ATA command passthrough and dispatch support, both via ATA
and ATA passthrough over SCSI.

Also, add support to camcontrol(8) for the ATA Extended Power Conditions
feature set.  You can now manage ATA device power states, and set various
idle time thresholds for a drive to enter lower power states.

Note that this change cannot be MFCed in full, because it depends on
changes to the struct bio API that break compatilibity.  In order to
avoid breaking the stable API, only changes that don't touch or depend on
the struct bio changes can be merged.  For example, the camcontrol(8)
changes don't depend on the new bio API, but zonectl(8) and the probe
changes to the da(4) and ada(4) drivers do depend on it.

Also note that the SMR changes have not yet been tested with an actual
SCSI ZBC device, or a SCSI to ATA translation layer (SAT) that supports
ZBC to ZAC translation.  I have not yet gotten a suitable drive or SAT
layer, so any testing help would be appreciated.  These changes have been
tested with Seagate Host Aware SATA drives attached to both SAS and SATA
controllers.  Also, I do not have any SATA Host Managed devices, and I
suspect that it may take additional (hopefully minor) changes to support
them.

Thanks to Seagate for supplying the test hardware and answering questions.

sbin/camcontrol/Makefile:
	Add epc.c and zone.c.

sbin/camcontrol/camcontrol.8:
	Document the zone and epc subcommands.

sbin/camcontrol/camcontrol.c:
	Add the zone and epc subcommands.

	Add auxiliary register support to build_ata_cmd().  Make sure to
	set the CAM_ATAIO_NEEDRESULT, CAM_ATAIO_DMA, and CAM_ATAIO_FPDMA
	flags as appropriate for ATA commands.

	Add a new get_ata_status() function to parse ATA result from SCSI
	sense descriptors (for ATA passthrough over SCSI) and ATA I/O
	requests.

sbin/camcontrol/camcontrol.h:
	Update the build_ata_cmd() prototype

	Add get_ata_status(), zone(), and epc().

sbin/camcontrol/epc.c:
	Support for ATA Extended Power Conditions features.  This includes
	support for all features documented in the ACS-4 Revision 12
	specification from t13.org (dated February 18, 2016).

	The EPC feature set allows putting a drive into a power power mode
	immediately, or setting timeouts so that the drive will
	automatically enter progressively lower power states after various
	idle times.

sbin/camcontrol/fwdownload.c:
	Update the firmware download code for the new build_ata_cmd()
	arguments.

sbin/camcontrol/zone.c:
	Implement support for Shingled Magnetic Recording (SMR) drives
	via SCSI Zoned Block Commands (ZBC) and ATA Zoned Device ATA
	Command Set (ZAC).

	These specs were developed in concert, and are functionally
	identical.  The primary differences are due to SCSI and ATA
	differences.  (SCSI is big endian, ATA is little endian, for
	example.)

	This includes support for all commands defined in the ZBC and
	ZAC specs.

sys/cam/ata/ata_all.c:
	Decode a number of additional ATA command names in ata_op_string().

	Add a new CCB building function, ata_read_log().

	Add ata_zac_mgmt_in() and ata_zac_mgmt_out() CCB building
	functions.  These support both DMA and NCQ encapsulation.

sys/cam/ata/ata_all.h:
	Add prototypes for ata_read_log(), ata_zac_mgmt_out(), and
	ata_zac_mgmt_in().

sys/cam/ata/ata_da.c:
	Revamp the ada(4) driver to support zoned devices.

	Add four new probe states to gather information needed for zone
	support.

	Add a new adasetflags() function to avoid duplication of large
	blocks of flag setting between the async handler and register
	functions.

	Add new sysctl variables that describe zone support and paramters.

	Add support for the new BIO_ZONE bio, and all of its subcommands:
	DISK_ZONE_OPEN, DISK_ZONE_CLOSE, DISK_ZONE_FINISH, DISK_ZONE_RWP,
	DISK_ZONE_REPORT_ZONES, and DISK_ZONE_GET_PARAMS.

sys/cam/scsi/scsi_all.c:
	Add command descriptions for the ZBC IN/OUT commands.

	Add descriptions for ZBC Host Managed devices.

	Add a new function, scsi_ata_pass() to do ATA passthrough over
	SCSI.  This will eventually replace scsi_ata_pass_16() -- it
	can create the 12, 16, and 32-byte variants of the ATA
	PASS-THROUGH command, and supports setting all of the
	registers defined as of SAT-4, Revision 5 (March 11, 2016).

	Change scsi_ata_identify() to use scsi_ata_pass() instead of
	scsi_ata_pass_16().

	Add a new scsi_ata_read_log() function to facilitate reading
	ATA logs via SCSI.

sys/cam/scsi/scsi_all.h:
	Add the new ATA PASS-THROUGH(32) command CDB.  Add extended and
	variable CDB opcodes.

	Add Zoned Block Device Characteristics VPD page.

	Add ATA Return SCSI sense descriptor.

	Add prototypes for scsi_ata_read_log() and scsi_ata_pass().

sys/cam/scsi/scsi_da.c:
	Revamp the da(4) driver to support zoned devices.

	Add five new probe states, four of which are needed for ATA
	devices.

	Add five new sysctl variables that describe zone support and
	parameters.

	The da(4) driver supports SCSI ZBC devices, as well as ATA ZAC
	devices when they are attached via a SCSI to ATA Translation (SAT)
	layer.  Since ZBC -> ZAC translation is a new feature in the T10
	SAT-4 spec, most SATA drives will be supported via ATA commands
	sent via the SCSI ATA PASS-THROUGH command.  The da(4) driver will
	prefer the ZBC interface, if it is available, for performance
	reasons, but will use the ATA PASS-THROUGH interface to the ZAC
	command set if the SAT layer doesn't support translation yet.
	As I mentioned above, ZBC command support is untested.

	Add support for the new BIO_ZONE bio, and all of its subcommands:
	DISK_ZONE_OPEN, DISK_ZONE_CLOSE, DISK_ZONE_FINISH, DISK_ZONE_RWP,
	DISK_ZONE_REPORT_ZONES, and DISK_ZONE_GET_PARAMS.

	Add scsi_zbc_in() and scsi_zbc_out() CCB building functions.

	Add scsi_ata_zac_mgmt_out() and scsi_ata_zac_mgmt_in() CCB/CDB
	building functions.  Note that these have return values, unlike
	almost all other CCB building functions in CAM.  The reason is
	that they can fail, depending upon the particular combination
	of input parameters.  The primary failure case is if the user
	wants NCQ, but fails to specify additional CDB storage.  NCQ
	requires using the 32-byte version of the SCSI ATA PASS-THROUGH
	command, and the current CAM CDB size is 16 bytes.

sys/cam/scsi/scsi_da.h:
	Add ZBC IN and ZBC OUT CDBs and opcodes.

	Add SCSI Report Zones data structures.

	Add scsi_zbc_in(), scsi_zbc_out(), scsi_ata_zac_mgmt_out(), and
	scsi_ata_zac_mgmt_in() prototypes.

sys/dev/ahci/ahci.c:
	Fix SEND / RECEIVE FPDMA QUEUED in the ahci(4) driver.

	ahci_setup_fis() previously set the top bits of the sector count
	register in the FIS to 0 for FPDMA commands.  This is okay for
	read and write, because the PRIO field is in the only thing in
	those bits, and we don't implement that further up the stack.

	But, for SEND and RECEIVE FPDMA QUEUED, the subcommand is in that
	byte, so it needs to be transmitted to the drive.

	In ahci_setup_fis(), always set the the top 8 bits of the
	sector count register.  We need it in both the standard
	and NCQ / FPDMA cases.

sys/geom/eli/g_eli.c:
	Pass BIO_ZONE commands through the GELI class.

sys/geom/geom.h:
	Add g_io_zonecmd() prototype.

sys/geom/geom_dev.c:
	Add new DIOCZONECMD ioctl, which allows sending zone commands to
	disks.

sys/geom/geom_disk.c:
	Add support for BIO_ZONE commands.

sys/geom/geom_disk.h:
	Add a new flag, DISKFLAG_CANZONE, that indicates that a given
	GEOM disk client can handle BIO_ZONE commands.

sys/geom/geom_io.c:
	Add a new function, g_io_zonecmd(), that handles execution of
	BIO_ZONE commands.

	Add permissions check for BIO_ZONE commands.

	Add command decoding for BIO_ZONE commands.

sys/geom/geom_subr.c:
	Add DDB command decoding for BIO_ZONE commands.

sys/kern/subr_devstat.c:
	Record statistics for REPORT ZONES commands.  Note that the
	number of bytes transferred for REPORT ZONES won't quite match
	what is received from the harware.  This is because we're
	necessarily counting bytes coming from the da(4) / ada(4) drivers,
	which are using the disk_zone.h interface to communicate up
	the stack.  The structure sizes it uses are slightly different
	than the SCSI and ATA structure sizes.

sys/sys/ata.h:
	Add many bit and structure definitions for ZAC, NCQ, and EPC
	command support.

sys/sys/bio.h:
	Convert the bio_cmd field to a straight enumeration.  This will
	yield more space for additional commands in the future.  After
	change r297955 and other related changes, this is now possible.
	Converting to an enumeration will also prevent use as a bitmask
	in the future.

sys/sys/disk.h:
	Define the DIOCZONECMD ioctl.

sys/sys/disk_zone.h:
	Add a new API for managing zoned disks.  This is very close to
	the SCSI ZBC and ATA ZAC standards, but uses integers in native
	byte order instead of big endian (SCSI) or little endian (ATA)
	byte arrays.

	This is intended to offer to the complete feature set of the ZBC
	and ZAC disk management without requiring the application developer
	to include SCSI or ATA headers.  We also use one set of headers
	for ioctl consumers and kernel bio-level consumers.

sys/sys/param.h:
	Bump __FreeBSD_version for sys/bio.h command changes, and inclusion
	of SMR support.

usr.sbin/Makefile:
	Add the zonectl utility.

usr.sbin/diskinfo/diskinfo.c
	Add disk zoning capability to the 'diskinfo -v' output.

usr.sbin/zonectl/Makefile:
	Add zonectl makefile.

usr.sbin/zonectl/zonectl.8
	zonectl(8) man page.

usr.sbin/zonectl/zonectl.c
	The zonectl(8) utility.  This allows managing SCSI or ATA zoned
	disks via the disk_zone.h API.  You can report zones, reset write
	pointers, get parameters, etc.

Sponsored by:	Spectra Logic
Differential Revision:	https://reviews.freebsd.org/D6147
Reviewed by:	wblock (documentation)
2016-05-19 14:08:36 +00:00
Pedro F. Giffuni e8d5712284 sys/geom: spelling fixes in comments.
No functional change.
2016-04-29 20:56:58 +00:00
Steven Hartland 86787e8d97 Fix early kernel dump via dumpdev env
Setting the dumpdev via env e.g. loader.conf provides the ability to
configure the kernel dump device during early boot. When using this
g_io_getattr was returning EPERM due to cp->acr == 0.

Fix this by calling g_access to ensure we're a read consumer prior
to calling g_dev_setdumpdev.

MFC after:	2 weeks
Sponsored by:	Multiplay
2015-11-17 20:55:50 +00:00
Alexander Motin 4a3760bae6 Remove compatibility shims for legacy ATA device names.
We got new ATA stack in FreeBSD 8.x, switched to it at 9.x, completely
removed old stack at 10.x, so at 11.x it is time to remove compat shims.
2015-10-11 13:01:51 +00:00
Conrad Meyer b4d7290796 geom_dev: Use kenv 'dumpdev' in the same way as rc/etc.d/dumpon
Skip a /dev/ prefix, if one is present, when checking for matching
device names for dump.

Suggested by:	avg
Reviewed by:	markj
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D3725
2015-09-23 21:08:52 +00:00
Edward Tomasz Napierala 72800098bf Fix panic triggered by code like this:
open("/dev/md0", O_EXEC);

Discussed with:	kib@, mav@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D3051
2015-08-04 10:40:08 +00:00
Edward Tomasz Napierala d6cc35b287 Fix panic that would happen on forcibly unmounting devfs (note that
as it is now, devfs ignores MNT_FORCE anyway, so it needs to be modified
to trigger the panic) with consumers still opened.

Note that this still results in a leak of r/w/e counters.  It seems
to be harmless, though.  If anyone knows a better way to approach
this - please tell.

Discussed with:	kib@, mav@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D3050
2015-08-03 16:35:18 +00:00
Alexander Motin 0ada3afc25 Remove sleeps from geom_up thread on device destruction.
MFC after:	3 days.
2015-04-09 13:09:05 +00:00
Edward Tomasz Napierala 01de1a0650 Add devd(8) notifications for creation and destruction of GEOM devices.
Differential Revision:	https://reviews.freebsd.org/D1211
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2015-01-14 11:15:57 +00:00
Pawel Jakub Dawidek 5ebb15b942 Add missing privilege check when setting the dump device. Before that change it
was possible for a regular user to setup the dump device if he had write access
to the given device. In theory it is a security issue as user might get access
to kernel's memory after provoking kernel crash, but in practise it is not
recommended to give regular users direct access to storage devices.

Rework the code so that we do privileges check within the set_dumper() function
to avoid similar problems in the future.

Discussed with:	secteam
2014-11-11 04:48:09 +00:00