Commit graph

2709 commits

Author SHA1 Message Date
Warner Losh b2b381d365 cam: Add human readable statuses for some CAM_ status values.
CAM_NVME_STATUS and CAM_REQ_SOFTTIMEOUT were missing, though the latter
hasn't been used yet. The former is being used and showing up in dmesg
output as Unknown 0x420.

Fixes: f564de00f7
Fixes: 774ab87cf2
Sponsored by: Netflix
2023-11-08 15:38:16 -07:00
Warner Losh 2ffd30f7ee cam: Remove left-over sys/cdefs.h in sys/cam
These weren't removed when $FreeBSD$ was removed. They aren't needed and
now are a style(9) nonconformity.

Sponsored by:		Netflix
2023-11-06 12:20:23 -07:00
Warner Losh 500196c5c7 cam: Add nvme error devctl publishing
Start reporting nvme errors from devices, like we report ata and scsi
errors.

Sponsored by:		Netflix
Reviewed by:		mav, jhb
Differential Revision:	https://reviews.freebsd.org/D41086
2023-11-06 12:10:42 -07:00
Warner Losh fd9a4a67d0 cam: Minor opt_cam.h cleanup
sys/cam/cam.h includes opt_cam.h, so none of the clients need to do
this. cam.h does all the right dancing to conditionally include
opt_cam.h only when it makes sense. It generally only matters when
cam_debug.h is included (it must be included before that). Many of the
stray opt_cam.h includes were after cam_debug.h which would be a problem
were it not included in cam/cam.h. The other users of CAM options that
aren't debug all already include cam/cam.h.

Also trim unneeded sys/cdefs.h files from the files touched.

Sponsored by:		Netflix
2023-11-06 10:47:15 -07:00
Warner Losh 70f2356d36 cam: Make cam_debug macros atomic
The CAM_DEBUG* macros use multiple printfs to dump the data. This is
suboptimal when tracing things that produce even a moderate amount since
it gets intertwingled. I can't even turn on tracing with a 24-disk HBA
on boot without it getting messed up. Add helper routines to work around
clang's over-use of the stack: that way we only pay the stack penalty
when a trace hits.

Sponsored by:		Netflix
Reviewed by:		ken, mav
Differential Revision:	https://reviews.freebsd.org/D42411
2023-11-03 10:15:38 -06:00
Zhenlei Huang d24729b2fd cam/ata: Postpone removal of two compat sysctls until 15
Prefer UNMAPPEDIO and ROTATING from flags sysctl. See
 1. aeab0812e6 (Add flags sysctl to ada)
 2. cf3ff63e55 (Convert unmappedio over to a flag)
 3. 96eb32bf0f (Convert rotating to a flag bit)

Reviewed by:	imp, ken, #cam
MFC after:	immediately (we want this in 14.0)
Differential Revision:	https://reviews.freebsd.org/D42402
2023-11-02 13:14:40 +08:00
John Baldwin e846a3e016 ctl: Use ctl_io_sbuf in ctl_process_done
This reduces a second copy of (mostly) the same code.

Reviewed by:	ken, imp
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D42210
2023-10-16 15:47:09 -07:00
John Baldwin fc8cf0a8de ctl: Make ctl_private.h more self-contained
Include <sys/sysctl.h> for sysctl context types.

Reviewed by:	ken, imp
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D42209
2023-10-16 15:46:43 -07:00
John Baldwin 10b1a66934 ctl: Make ctl_ha.h more self-contained
Include <sys/queue.h> for queue macros

Reviewed by:	ken, imp
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D42208
2023-10-16 15:46:11 -07:00
John Baldwin 4efebb3de3 ctl: Make ctl_io.h more self-contained
Include <cam/scsi/scsi_all.h> for struct scsi_sense_data.
Include <sys/queue.h> for queue macros.

Reviewed by:	ken, imp
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D42207
2023-10-16 15:45:43 -07:00
John Baldwin 55231cd180 ctl: Make ctl.h more self-contained
Make MALLOC_DECLARE conditional on <sys/malloc.h> and forward declare
several types.

Reviewed by:	ken, imp
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D42206
2023-10-16 15:45:15 -07:00
John Baldwin 2e539c6f5a cam: Make <cam/scsi/scsi_all.h> more self-contained
Include <sys/malloc.h> in the kernel for struct malloc_type.

Reviewed by:	ken, imp
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D42205
2023-10-16 15:44:46 -07:00
Zhenlei Huang e2ad7ce37b cam/scsi: Add sysctl flag CTLFLAG_TUN to loader tunable
The sysctl variable 'kern.cam.scsi_delay' is actually a loader tunable.
Add sysctl flag CTLFLAG_TUN to it so that `sysctl -T` will report it
correctly.

No functional change intended.

Reviewed by:	kib, imp (for #cam)
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D42113
2023-10-09 18:30:21 +08:00
Warner Losh c1944a82ae cam: Remove extra break
Sponsored by:		Netflix
2023-09-09 11:13:25 -06:00
Warner Losh d9fee1d021 cam/scsi_da: Bump deprecation one release.
These are still used in a quick poll that I've done, so we can't remove
them in 14. Reset the removal to FreeBSD 15.

Sponsored by:		Netflix
2023-08-23 22:34:41 -06:00
Warner Losh 031beb4e23 sys: Remove $FreeBSD$: one-line sh pattern
Remove /^\s*#[#!]?\s*\$FreeBSD\$.*$\n/
2023-08-16 11:54:58 -06:00
Warner Losh 685dc743dc sys: Remove $FreeBSD$: one-line .c pattern
Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
2023-08-16 11:54:36 -06:00
Warner Losh 71625ec9ad sys: Remove $FreeBSD$: one-line .c comment pattern
Remove /^/[*/]\s*\$FreeBSD\$.*\n/
2023-08-16 11:54:24 -06:00
Warner Losh 2ff63af9b8 sys: Remove $FreeBSD$: one-line .h pattern
Remove /^\s*\*+\s*\$FreeBSD\$.*$\n/
2023-08-16 11:54:18 -06:00
Warner Losh 95ee2897e9 sys: Remove $FreeBSD$: two-line .h pattern
Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/
2023-08-16 11:54:11 -06:00
Gordon Bergling 586eda6b24 cam(4): Fix a typo in a source code comment
- s/uppper/upper/

MFC after:	3 days
2023-08-02 11:14:53 +02:00
John Baldwin 83453b46e8 mmc_xpt: Update function name in debug trace
Reported by:	mav
Fixes:		7eb538974c cam mmc_xpt/nvme_xpt: Add _sbuf variants of {an,de}nounce xport and proto ops
2023-08-01 16:14:58 -07:00
John Baldwin b2c44f1fc1 cam: Remove non-sbuf announce/denounce proto and xport ops
Reviewed by:	mav, imp
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D41264
2023-08-01 15:25:38 -07:00
John Baldwin 0a57cdd971 cam_xpt: Reimplement xpt_*nounce_periph in terms of the _sbuf versions
Use an sbuf that drains to printf to avoid duplicating code in the two
versions of each function.

Reviewed by:	mav
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D41263
2023-08-01 15:24:36 -07:00
John Baldwin b82711764c cam_xpt: Remove fallbacks for non-sbuf protocol methods
This includes removing the kern.cam.announce_nosbuf sysctl.

Reviewed by:	mav, imp
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D41262
2023-08-01 15:24:10 -07:00
John Baldwin 7eb538974c cam mmc_xpt/nvme_xpt: Add _sbuf variants of {an,de}nounce xport and proto ops
Reviewed by:	imp
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D41261
2023-08-01 15:23:25 -07:00
John Baldwin 32734eb14c mmc_xpt: Remove dubious end of mmc_print_ident
The end of this function finishes the passed in sbuf, calls printf
manually on the contents, and then clears it.  The caller then tries
to print the resulting sbuf.  This works currently but will not work
for future callers that pass in an external sbuf to be appended to.

Reviewed by:	mav
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D41260
2023-08-01 15:20:53 -07:00
John Baldwin 76b2e3907c cam xpt_*nounce_periph*: Various fixes for periphs without a protocol
If the periph doesn't have a valid protocol, these routines emit
fallback messages.  However, the fallback messages duplicated the
periph name and unit number, and in the case of *denounce* included a
spurious newline.

Reviewed by:	mav, imp
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D41177
2023-08-01 15:20:25 -07:00
John Baldwin 0be27bde5f cam/nvme: Remove spurious newline during periph detach announcement
Other protocol denounce routines use a "short" variant of announce
that does not include a trailing newline.

Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D41176
2023-08-01 15:19:50 -07:00
Warner Losh cf0a543f1f cam: Log more error codes
Log CAM_DEV_NOT_THERE status CCBs because they get dropped if a drive
disappears and these requests timeout or are cancelled. It's useful to
know the outstanding commands for failure analysis. Log
CAM_NVME_STATUS_ERROR status CCBs to bring in NVMe errors (this will be
more important in future commits that expand the information logged).

Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D41168
2023-08-01 14:51:54 -06:00
Warner Losh e474a8e243 cam: Log errors from passthru commands
Since a30ecd42b8 we've logged almost all unexpected errors from
commands. However, some passthru commands were not logged via devctl. To
fix this, pass all requests through passerror (which calls
cam_periph_error), but flag those requests that didn't want error
recovery as SF_NO_RECOVERY, like we do for device probing. By doing this
we get identical behavior to the current code, but log these errors.

We have had hangs on drives that seems to show no error. Vendor analysis
of the drive found an illegal command that happen to hang the drive. In
verifying their analysis, we discovered that the pass through commands
from things like smartctl that encountered errors or timeouts weren't
logged.

Sponsored by:		Netflix
Reviewed by:		ken, mav
Differential Revision:	https://reviews.freebsd.org/D41167
2023-07-28 12:11:21 -06:00
Warner Losh 7872131605 cam: Fail 2/0 asc/ascq return code
This asc/ascq code 2/0 ("No seek complete") is a fatal error on modern
drives indicating a sensor failure. One of our vendors noticed we
retried 2/0 so many times in their failure analysis and asked why (no
other OS else does). They've indicated that this failures means the
track couldn't be located (something that's not going to change, except
if the environment changes significantly, which won't happen on a
timescale useful to retries).

Sponsored by:		Netflix
2023-07-27 22:28:01 -06:00
Warner Losh 73551d4f60 cam/mmc: Migrate to modern uintXX_t from u_intXX_t
As per https://lists.freebsd.org/archives/freebsd-scsi/2023-July/000257.html
move to the modern uintXX_t.

MFC After:	3 days
Sponsored by:	Netflix
2023-07-24 21:32:57 -06:00
Warner Losh 7f85b11c57 cam/nvme: Migrate to modern uintXX_t from u_intXX_t
As per https://lists.freebsd.org/archives/freebsd-scsi/2023-July/000257.html
move to the modern uintXX_t.

MFC After:	3 days
Sponsored by:	Netflix
2023-07-24 21:32:56 -06:00
Warner Losh a74530d9e2 cam/ctl: Migrate to modern uintXX_t from u_intXX_t
As per https://lists.freebsd.org/archives/freebsd-scsi/2023-July/000257.html
move to the modern uintXX_t.

MFC After:	3 days
Sponsored by:	Netflix
2023-07-24 21:32:56 -06:00
Warner Losh 7c5d20a6c2 cam/scsi: Migrate to modern uintXX_t from u_intXX_t
As per https://lists.freebsd.org/archives/freebsd-scsi/2023-July/000257.html
move to the modern uintXX_t.

MFC After:	3 days
Sponsored by:	Netflix
2023-07-24 21:32:56 -06:00
Warner Losh 9db2db6bf6 cam/ata: Migrate to modern uintXX_t from u_intXX_t
As per https://lists.freebsd.org/archives/freebsd-scsi/2023-July/000257.html
move to the modern uintXX_t.

MFC After:	3 days
Sponsored by:	Netflix
2023-07-24 21:32:56 -06:00
Warner Losh 7af2f2c801 cam: Migrate to modern uintXX_t from u_intXX_t
As per https://lists.freebsd.org/archives/freebsd-scsi/2023-July/000257.html
move to the modern uintXX_t.

MFC After:	3 days
Sponsored by:	Netflix
2023-07-24 21:32:56 -06:00
Warner Losh ff4633d9f8 cam_periph: Comment about why we need to reset cbfcnp
Just spent a few minutes puzzling out why we do this. Add a comment to
remind my future self (and other intersted folk) why we do the reset
here when we'd set it a few lines above.

Sponsored by:		Netflix
2023-07-21 10:11:37 -06:00
Warner Losh b4993704d6 cam_periph: Fix a comment
Add a couple of words so that this sentence makes sense.

Sponsored by:		Netflix
2023-07-21 10:07:13 -06:00
Warner Losh 774ab87cf2 cam: Add CAM_NVME_STATUS_ERROR error code
Add CAM_NVME_STATUS_ERROR error code. Flag all NVME commands that
completed with an error status as CAM_NVME_STATUS_ERROR (a new value)
instaead of CAM_REQ_CMP_ERR. This indicates to the upper layers of CAM
that the 'cpl' field for nvmeio CCBs is valid and can be examined for
error recovery, if desired.

No functional change. nda will still see these as errors, call
ndaerror() to get the error recovery action, etc. cam_periph_error will
select the same case as before (even w/o the change, though the change
makes it explicit).

Sponsored by:		Netflix
Reviewed by:		chuck, mav, jhb
Differential Revision:	https://reviews.freebsd.org/D41085
2023-07-20 22:32:31 -06:00
Warner Losh 0732617ec1 cam/nda: Remove impossible CAM codes
The NVME SIM does not generate these status values, so remove them.

Sponsored by:		Netflix
Reviewed by:		jhb
Differential Revision:	https://reviews.freebsd.org/D41084
2023-07-20 22:32:31 -06:00
Warner Losh 33734ddf2b cam: Be explict about CAM_SMP_STATUS_ERROR
This is normally caught by default:, but no harm in making it explicit
that we'll retry valid periphs.

Sponsored by:		Netflix
Reviewed by:		mav
Differential Revision:	https://reviews.freebsd.org/D41083
2023-07-20 22:32:31 -06:00
Warner Losh 367699ca7a cam/scsi: Better action for ASC/ASCQ 0x18/0x08
0x18/0x8 is another code to indicate that the data was recovered
successfully, so complete the command w/o an error rather than retry the
operation.

Sponsored by:		Netflix
Reviewed by:		mav, jhb
Differential Revision:	https://reviews.freebsd.org/D41082
2023-07-20 22:32:30 -06:00
Warner Losh 38e831a895 cam: Add comment about recovery ccbs
SS_START and higher actions (currently only SS_TUR) allocate a recovery
CCB to send a command to the periph. Add a quick comment about that here.

Sponsored by:		Netflix
Reviewed by:		mav
Differential Revision:	https://reviews.freebsd.org/D41081
2023-07-20 22:32:30 -06:00
John Baldwin c5312bd79e cam: Move bus_dmamap_load_ccb into cam.c.
This routine is specific to CAM and no longer assumes any internal
bus_dma knowledge as it is simple wrapper around bus_dmamap_load_mem.

Fixes:		60381fd1ee memdesc: Retire MEMDESC_CCB.
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D41058
2023-07-18 18:19:27 -07:00
John Baldwin 60381fd1ee memdesc: Retire MEMDESC_CCB.
Instead, change memdesc_ccb to examine the CCB and return a memdesc of
a more generic type describing the data buffer.

Reviewed by:	imp, markj
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D40880
2023-07-14 11:32:16 -07:00
John Baldwin af296130ea nvme_xpt: Tidy nvme_announce_periph for fabrics support.
- Read the version from cts.protocol_version.

- Only check xport_specific.nvme for PCI-e info for XPORT_NVME.

Reviewed by:	chuck, imp
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D40618
2023-06-26 20:37:43 -07:00
John Baldwin e932f0d2a3 cam_xpt: Properly fail if a sim uses an unsupported transport.
The default xport ops for a new bus is xport_default, not NULL, so
check for that when determining if a bus failed to find a suitable
transport.  In addition, the path needs to be freed with xpt_free_path
instead of a plain free so that the path's reference on the sim is
dropped; otherwise, cam_sim_free in the caller after xpt_bus_register
returns failure will hang forever.

Note that we have to exempt the xpt bus from this check as it uses
xport_default on purpose.

Reviewed by:	mav, imp
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D40617
2023-06-26 20:36:06 -07:00
Warner Losh e6f37dce96 scsi_all.c: Update to latest asc-num.txt at T10
This updates our table to Sat Mar 25 2023 at 04:30 of the T10
asc-num.txt. I added all the codes that weren't present in the tree,
corrected a couple of the 'alphabet' comments about where the ASC/ASCQ
was defined. I did not, however, make the transition that the
asc-num.txt file made (it deleted W between P and R and added Z after D
so the first few letters shifted a bit). I've not removed the 'W' nor
added the 'Z' at this time. I'm looking for some way to do this
automatically. Try to pick reasonable responses for new entries. When in
doubt, I selected SS_RDEF.

Sponsored by:		Netflix
Reviewed by:		mav
Differential Revision:	https://reviews.freebsd.org/D40718
2023-06-22 20:51:30 -06:00
Warner Losh 92c8803ce3 scsi_all.c: Minor formatting nits
Noticed the whitespace nits when updating for other reasons.

Sponsored by:		Netflix
2023-06-22 20:51:25 -06:00
Warner Losh a960d3c07b cam: Remove duplicate definition for READ_DEFECT_DATA_10
This isn't needed by all devices and is only used by the da device (in
camcontrol). All the other da specific da scsi opcodes are only in
scsi_da.h.

Sponsored by:		Netflix
Reviewed by:		mav
Differential Revision:	https://reviews.freebsd.org/D40527
2023-06-19 14:45:43 -06:00
Warner Losh 4d846d260e spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD
The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with:		pfg
MFC After:		3 days
Sponsored by:		Netflix
2023-05-12 10:44:03 -06:00
Warner Losh 214909d669 Revert "cam: fix up world compilation after previous"
This reverts commit 1d35493e46. It was the wrong fix. 757fc6666b has
the proper fix to include stdbool for userland.

Sponsored by:		Netflix
2023-04-15 18:25:55 -06:00
Warner Losh 757fc6666b cam: Include stdbool.h for userland
Sponsored by:		Netflix
2023-04-15 18:25:22 -06:00
Mateusz Guzik 1d35493e46 cam: fix up world compilation after previous
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2023-04-15 23:11:27 +00:00
Warner Losh fd02926a68 cam: Properly mask out the status bits to get completion code
ccb_h.status has two parts: the actual status and some addition bits to
indicate additional information. It must be masked before comparing
against completion codes. Add new inline function cam_ccb_success to
simplify this to test whether or not the request succeeded. Most of the
code already does this, but a few places don't (the rest likely should
be converted to use cam_ccb_status and/or cam_ccb_success, but that's
for another day). This caused at least one bug in recognizing devices
behind a SATA port multiplexer, though some of these checks were
fine with the special knowledge of the code paths involved.

PR:			270459
Sponsored by:		Netflix
MFC After:		1 week (and maybe a EN requst)
Reviewed by:		ken, mav
Differential Revision:	https://reviews.freebsd.org/D39572
2023-04-15 16:32:41 -06:00
Zhenlei Huang 69cb72b872 cam iosched: Use the existing CTLFLAG_RDTUN and CTLFLAG_RWTUN flag definitions
Use them when possible, instead of separated flags.

No functional change intended.

Reviewed by:	hselasky, erj
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D39466
2023-04-12 12:20:38 +08:00
Yuri Pankov 6aa5b10d0c nvme: fix resv commands with nda device
- passing I/O commands through nda requires nsid field to be set (it was
  unused when going through nvme_ns_ioctl())
- ccb's status can be OR'ed with the flags, use CAM_STATUS_MASK

Reviewed by:	imp (cam)
Differential Revision:	https://reviews.freebsd.org/D37696
2023-03-27 14:53:24 +02:00
Alexander Motin 7467a69536 CTL: Allow userland supply tags via ioctl frontend.
Before this ioctl frontend always replaced tags with sequential ones.
It was done for ctladm, that can not keep track of global tag list.
But in case of virtio-scsi in bhyve we can pass provided tags as-is.
It should be on virtio-scsi initiator to provide us valid tags.  It
should allow proper task management, error reporting, etc.  In case
of several virtio-scsi devices, they should use different CTL ports
or initiator IDs to avoid conflicts, but this is expected by design.

PR:	267539
2022-12-03 12:05:05 -05:00
Alexander Motin 0acc026dda CTL: Increase maximum SCSI tag size from 32 to 64 bits.
SAM-5 specification states maximum size of command identifier (tag),
defined by specific transports, should not be larger than 64 bits.
While most of supported transports use 32 bits or less, it was
reported that virtio-scsi uses 64 bits.  Truncation to 32 bits in
bhyve code caused false tag conflict errors reported and possibly
other issues.

This changes CTL ABI and HA protocol, so CTL_HA_VERSION is bumped.

While we make HA protocol incompatible, increase default maximum
number of ports in CTL from 256 to 1024, matching number of LUNs.
There are many reports from people who need many iSCSI targets with
only one LUN each.  Increased memory consumption should be less of
a problem these days.

PR:	267539
2022-12-03 10:23:29 -05:00
Warner Losh 891c69864e cam: Use FreeBSD standard copyright
For CAM, move to the FreeBSD standard copyright rather than the 'put it
at the front' variation. This variaiton has been flagged as potentially
problematic in other contexts. Since this variation wasn't a conscious
decision on our part, use the standard license from src/COPYRIGHT.
Also, remove the -FreeBSD suffix in SPDX-License-Identifier. It's
obsolete at SPDX and even the original text didn't match it.

MFC After:		3 days
Sponsored by:		Netflix
2022-10-07 23:37:46 -06:00
Mark Johnston 0cd631ee06 cam: Provide compatibility for CAMGETPASSTHRU for periph drivers
The CAM version bump 0x19 -> 0x1a changed the CAMGETPASSTHRU definition,
so applications using the old ioctl are broken.  However, that version
change did not affect anything relating to the ioctl implementation for
periphs.

Fixes:		8f9be1eed1 ("cam(4): Improve XPT_DEV_MATCH")
PR:		264709
Tested by:	andreas.mahling@googlemail.com
Reviewed by:	imp
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D36389
2022-09-29 13:14:57 -04:00
Alexander Motin 0586be48a9 CTL: Validate IOCTL parameters.
It was possible to cause kernel panic by passing too large args_len
or non-NULL result_nvl.

Though since the /dev/cam/ctl device is accessible only by root and
used only by limited number of tools it was not a big problem.

PR:	266115
PR:	266136
Reported by:	Robert Morris <rtm@lcs.mit.edu>
MFC after:	1 week
2022-09-06 21:58:27 -04:00
Alexander Motin 90bcc81bc3 Delay GEOM disk_create() until CAM periph probe completes.
Before this patch CAM periph drivers called both disk_alloc() and
disk_create() same time on periph creation.  But then prevented disks
from opening until the periph probe completion with cam_periph_hold().
As result, especially if disk misbehaves during the probe, GEOM event
thread, triggered to taste the disk, got blocked on open attempt,
potentially for a long time, unable to process other events.

This patch moves disk_create() call from periph creation to the end of
the probe. To allow disk_create() calls from non-sleepable CAM contexts
some of its duties requiring memory allocations are moved either back
to disk_alloc() or forward to g_disk_create(), so now disk_alloc() and
disk_add_alias() are the only disk methods that require sleeping.  If
disk fails during the probe disk_create() may just be skipped, going
directly to disk_destroy().  Other method calls during that time are
just ignored.  Since GEOM may now see the disks after CAM bus scan is
already completed, introduce per-periph boot hold functions. Enclosure
driver already had such mechanism, so just generalize it.

Reviewed by:	imp
MFC after:	1 month
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D35784
2022-07-14 16:17:36 -04:00
Alan Somers 5f438dd3ac ses: don't panic if disk elements have really weird descriptors
SES allows element descriptors to contain characters like spaces and
quotes that devfs does not allow to appear in device aliases.  Since SES
element descriptors are outside of the kernel's control, we should
gracefully handle a failure to create a device physical path alias.

PR:		264513
Reported by:	Yuri <yuri@aetern.org>
Reviewed by:	imp, mav
Sponsored by:	Axcient
MFC after:	2 weeks
2022-06-23 11:19:20 -06:00
Alexander Motin 3b0e3e8d2a CTL: Fix double command completions on HA failover.
I've found couple cases when CTL_FLAG_SENT_2OTHER_SC flags were not
cleared on commands return from active node or the send failure.  It
created races when ctl_failover_lun() call before ctl_process_done()
could cause second ctl_done() and ctl_process_done() calls, causing
all sorts of problems.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2022-06-14 00:24:39 -04:00
Dmitry Chagin 31d1b816fe sysent: Get rid of bogus sys/sysent.h include.
Where appropriate hide sysent.h under proper condition.

MFC after:	2 weeks
2022-05-28 20:52:17 +03:00
Mitchell Horne 489ba22236 kerneldump: remove physical argument from d_dumper
The physical address argument is essentially ignored by every dumper
method. In addition, the dump routines don't actually pass a real
address; every call to dump_append() passes a value of zero for
physical.

Reviewed by:	markj
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D35173
2022-05-13 10:42:48 -03:00
Alexander Motin 356155fe02 Fix bd82711aff serial number trailing spaces removal.
For devices returning 16 byte serial numbers with 8 leading spaces
this falsely removed everything due to looking at wrong offset.
2022-05-09 10:30:04 -04:00
Warner Losh a85fea31c5 iosched: remove stray debug
This printf was designed to catch misqueued bio requests. Prior to
supporting read_bias == 0, we couldn't get anything but reads and writes
in this queue. However, for read_bias == 0 we queue everything except
BIO_DELETE to this queue, so remove the printf. We don't need to update
any statistics.

Sponsored by:		Netflix
2022-05-04 20:28:00 -06:00
Warner Losh 1907e1c07c ada: Move comment
Move the comment about releasing ccb before periph to adaprobedone()
where it belongs.

Sponsored by:		Netflix
2022-05-04 16:54:38 -06:00
Warner Losh 6f78dae849 cam: Remove redunant static __inline forward decls
Sponsored by:		Netflix
2022-05-02 09:30:07 -06:00
Warner Losh 1599fc904d iosched: Move bio_next() inside of the CAM_IOSCHED_DYNAMIC ifdef
bio_next() is only used by the dynamic scheduler, so move it under that
ifdef.

Sponsored by:		Netflix
2022-05-01 16:54:15 -06:00
Warner Losh d095d6a34c cam_xpt: Prefer bool to int where it's a boolean
In the places where we set an integer to 0 or 1 and then use it like a
boolean, replace int with bool and 0/1 with false/true. Left alone
places where this is a function argument or return value. No functional
changes intended.

Sponsored by:		Netflix
2022-05-01 12:09:42 -06:00
Warner Losh d592c0db8b cam: add hw.cam.iosched.read_bias
Allow a global setting for the read_bias for the dynamic io
scheduler. This allows global policy to be set, in addition to the
existing per-drive policy. kern.cam.iosched.read_bias is a new tunable.

Sponsored by:		Netflix
Reviewed by:		chs
Differential Revision:	https://reviews.freebsd.org/D34365
2022-05-01 11:27:34 -06:00
Warner Losh b65803ba57 cam iosched: default to no read bias in dynamic ioscheduling
When we're doing dynamic I/O scheduling, don't default to a read bias of
100. Default it to 0 so turning on dynamic scheduling only does
scheduling tweaks that are requested. The other limiters are off by
default, and need no further adjustment.

Sponsored by:		Netflix
2022-05-01 11:27:34 -06:00
Warner Losh cc1572ddeb cam iosched: Remove write bias when read bias = 0
Change the meaning of read bias == 0 in the dynamic I/O scheduler. Prior
to this change, a read bias of 0 would mean prefer writes.  Now, when
read bias is 0, we queue all requests to the same queue removing the
bias. When it's non-zero, we still separate the queues we use so we can
bias reads vs writes for workloads that are read centric. These changes
restore the typical bias you get from disksort or ordered insertion at
the end of the list.

Sponsored by:		Netflix
2022-05-01 11:27:34 -06:00
Warner Losh 6c8ab086fe ada: Retry commands with retries left on CAM_SEL_TIMEOUT
The AHCI and ATA SIMs will return CAM_SEL_TIMEOUT when an underlying
device has stopped responding. This is usually seen after a timeouted
out command and can be a transient event. Rather than fail the
peripheral immediately after seeing this, queue a retry. For transient
events, this allows drives to continue to provide data, though with some
added latency, just like we do when we have some other kind of retriable
error. If the error isn't transient (the drive is truly gone), then
we'll discover that eventually and fail the transaction and invalidate
the drive like we do today.

This helps us avoid a panic at the end of camperiphfree when
CAM_PERIPH_NEW_DEV_FOUND is set. However, the deferred callback should
be queued to xpt_async_td instead of being made inline there. This issue
will be solved in a different patch that does that. PR 263703.

This also helps us avoid another bug where we can drop all references to
the device (causing us to go through camperiphfree and destroy the path)
while we have an I/O pending in the ata_da state machine (usually in
state ADA_STATE_RAHEAD with ATA_SETFEATURES ATA_SF_ENAB_RCACHE
command). It's not clear why the reference that we take out to do the
reprobe isn't effective at blocking this. By retrying this condition,
though we avoid this bug (at least more often, I don't have a good
reproduction test case, I just see this panic a few times a month at
work on systems that have transient disk errors on ahci connected SATA
SSDs). PR 263704. It's too soon to know how much this helps us avoid
this bug.

Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D34977
2022-05-01 11:08:56 -06:00
Warner Losh 9fb40baf60 cam_periph: Return ENXIO when peripheral is invalidated
When the peripheral is invalidated, no further I/O is possible. Signal
this up the stack with ENXIO now that upper layers of the stack
differentiate sometimes. In order for there to be further I/O, and new
open is required for any block device that a future periph might
instantiate for devices at this location that might return or otherwise
become available. The I/O scheduler flushes its I/O with the ENXIO error
for pending I/O that didn't make it to the device, so this makes the two
paths match.

MFC After:		3 days
Sponsored by:		Netflix
Reviewed by:		chs, mav
Differential Revision:	https://reviews.freebsd.org/D35093
2022-04-28 16:30:00 -06:00
Alexander Motin 404f001161 CAM: Keep periph_links when restoring CCB in camperiphdone().
While recovery command executed, some other commands from the periph
may complete, that may affect periph_links of this CCB.  So restoring
original CCB we must keep current periph_links as more up to date.

I've found this triggering assertions with debug kernel and suspect
some memory corruptions otherwise when spun down disk receives two
or sometimes more concurrent requests.

MFC after:	1 week
Sponsored by:	iXsystems, Inc.
2022-04-27 21:39:50 -04:00
Warner Losh e4b1ae2147 ndaasync: sync to SCSI's daasyncs cam_periph_async() calls
Sponsored by:		Netflix
Reviewed by:		mav
Differential Revision:	https://reviews.freebsd.org/D35059
2022-04-26 11:01:39 -06:00
Warner Losh ae1955cd67 adaasync: Harmonize with daasync
We should call cam_periph_async() always, like SCSI does. This routine
is supposed to be more of a catch-all.

cam_periph_async() only does actions for AC_LOST_DEVICE. It ignores all
other events (today), but this may not always be true. So this is a nop
change.

Drop in a 'break' so we don't fall through unnecessarily.

Sponsored by:		Netflix
Reviewed by:		mav
Differential Revision:	https://reviews.freebsd.org/D35057
2022-04-26 11:01:39 -06:00
Warner Losh ccaec73d0b ada: Eliminate dead code
We never use the cgd that we get from the XPT_GDEV_TYPE call. Prior to
9a6844d55f we used it to determine if READ AHEAD or WRITE CACHING was
supported. However, all that information was moved into adasetflags so
we no longer need to this since it's cached in the softc and updated
with the IDENTIFY data changes automatically.

Sponsored by:		Netflix
Reviewed by:		mav
Differential Revision:	https://reviews.freebsd.org/D35039
2022-04-25 12:55:04 -06:00
Warner Losh 9d899bbcb7 cam: Small reorg of ata xpt async code
Use a switch rather than a nested if to simplify the async event
processing code. No functional changes intended.

Sponsored by:		Netflix
Reviewed by:		mav
Differential Revision:	https://reviews.freebsd.org/D35038
2022-04-25 12:55:04 -06:00
Warner Losh b43cfe7171 ada/da: Borrow comment from nda about cleanup
Remove a XXX comment and replace it with a more accurate comment about
what happens to I/O queued to the hardware.

Sponsored by:		Netflix
2022-04-24 15:11:56 -06:00
Warner Losh 48ae3f4f64 ata/nvme: Add comment
Steal the comment from daonninvalidate about the call to disk_gone().

Sponsored by:		Netflix
2022-04-24 15:11:52 -06:00
Warner Losh c08ceddbf7 nda: Fix comment
Fix a comment that was left over from the orignial
implementation. Explain how pending transactions in hardware are
completed/aborted in the SIM prior to ndacleanup being called.

Sponsored by:		Netflix
2022-04-23 20:31:46 -06:00
Alexander Motin 38f8addaab CAM: Replicate e0ceec676d from da to ada and nda.
MFC after:	1 week
2022-04-23 20:15:17 -04:00
John Baldwin 7b02c1e8c6 iscsi: Fetch limits based on a socket rather than assuming global limits.
cxgbei needs the ability to return different limits based on the
connection (e.g. if the connection is over a T5 adapter or a T6
adapter as well as factoring in the MTU).

This change plumbs through the changes in the ioctls without changing
any of the backends.  The limits callback passed to icl_register now
accepts a second socket argument which holds the integer file
descriptor.  To support ABI compatiblity for old binaries, the
callback should return "global" values if the socket fd is zero.

The CTL_ISCSI_LIMITS argument used with CTL_ISCSI by ctld(8) now
accepts the socket fd in a field that was previously part of a
reserved spare field.  Old binaries zero this request which results in
passing a socket fd of 0 to the limits callback.

The ISCSIDREQUEST ioctl no longer returns limits.  Instead, iscsid(8)
invokes a new ISCSIDLIMITS ioctl after establishing the connection via
connect(2).  For ABI compat, if the old ISCSIDREQUEST is invoked, the
global limits are still fetched (with a socket fd of 0) and returned.

Reviewed by:	mav
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D34928
2022-04-18 12:53:28 -07:00
Gordon Bergling 456b2bb97e cam(4): Remove a double word in a source code comment
- s/this this/this/

MFC after:	3 days
2022-04-09 10:13:59 +02:00
Gordon Bergling 49dace1d46 cam: Fix typos in source code comments
- s/paniced/panicked/

MFC after:	3 days
2022-04-02 10:13:35 +02:00
Mateusz Guzik bb92cd7bcd vfs: NDFREE(&nd, NDF_ONLY_PNBUF) -> NDFREE_PNBUF(&nd) 2022-03-24 10:20:51 +00:00
Alexander Motin f00ced06da CTL: Rework 05c3e8e871 using %zu format.
MFC after:	2 days
2022-02-25 11:53:53 -05:00
Alexander Motin 05c3e8e871 Fix 32-bit build after 530d274c15.
MFC after:	3 days
2022-02-24 18:11:36 -05:00
Alexander Motin 530d274c15 CTL: Add length validation for incoming HA messages.
This should fix uninitialized memory reads when working with broken
HA peer, like one fixed in 1a8d8a3a90.  Instead print error message
and kill the HA link.

MFC after:	3 days
Sponsored by:	iXsystems, Inc.
2022-02-24 16:24:43 -05:00
Warner Losh e3d92d4cb8 cam iosched: Update comment for when we schedule writes.
Sponsored by:		Netflix
2022-02-23 16:21:27 -07:00
John Baldwin bd6e8729d6 ctl ramdisk: Free compare buffer after a compare I/O request.
For a compare request, the ramdisk backend allocates a temporary
buffer to hold the I/O data and then compares it against the LUN's
pages in ctl_backend_ramdisk_cmp after the data has been filled.
However, the tempory buffer was leaked when after the comparison was
complete.  Fix this by freeing the buffer after the comparison.

Reviewed by:	mav
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D34316
2022-02-18 15:20:14 -08:00
Kenneth D. Merry 3090d5045a Fix non-printable characters in NVMe model and serial numbers.
The NVMe 1.4 spec simply says that Model and Serial numbers are
ASCII strings.  Unlike SCSI, it doesn't prohibit non-printable
characters or say that the strings should be padded with spaces.

Since 2014, we have had cam_strvis_sbuf(), which gives additional
options for handling non-ASCII characters.  That behavior hasn't
been available for non-sbuf consumers, so users of cam_strvis()
were left with having octal ASCII codes inserted.

So, to avoid having garbage or octal chracters in the strings, use
cam_strvis_sbuf() to create a new function, cam_strvis_flag(), and
re-implement cam_strvis() using cam_strvis_flag().

Now, for the NVMe drives, we can use cam_strvis_flag with the
CAM_STRVIS_FLAG_NONASCII_SPC flag.  This transforms non-printable
characters into spaces.

sys/cam/cam.c:
	Add a new function, cam_strvis_flag(), that creates an sbuf
	on the stack with the user's destination buffer, and calls
	cam_strvis_sbuf() with the given flag argument.

	Re-implement cam_strvis() to call cam_strvis_flag with the
	CAM_STRVIS_FLAG_NONASCII_ESC argument.  This should be the
	equivalent of the old cam_strvis() function, except for the
	overhead of creating the sbuf and calling sbuf_putc/printf.

sys/cam/cam.h:
	Declaration for cam_strvis_flag.

sys/cam/nvme/nvme_all.c:
	In nvme_print_ident, use the NONASCII_SPC flag with
	cam_strvis_flag().

sys/cam/nvme/nvme_da.c:
	In ndaregister(), use cam_strvis_flag() with the
	NONASCII_SPC flag for the disk description and serial
	number we report to GEOM.

sys/cam/nvme/nvme_xpt.c:
	In nvme_probe_done(), use cam_strvis_flag with the
	NONASCII_SPC flag when storing the drive serial number
	in the CAM EDT.

MFC after:	1 week
Sponsored by:	Spectra Logic
Differential Revision:	https://reviews.freebsd.org/D33973
2022-02-09 17:09:25 -05:00
John Baldwin a3d71fffa7 cfiscsi_done: Free the dummy PDU earlier.
The dummy PDU needs to be freed before marking task abortion complete
as otherwise cfiscsi_session_terminate_tasks can return and destroy
the session in another thread before the PDU is freed.

Fixes:		2e8d1a5525 iscsi: Allocate a dummy PDU for the internal nexus reset task.
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D34176
2022-02-07 12:55:08 -08:00