Commit graph

205 commits

Author SHA1 Message Date
Mark Johnston d068ea16e3 cam: Let cam_periph_unmapmem() return an error
As of commit b059686a71, cam_periph_unmapmem() can legitimately fail
if the copyout() operation fails.  However, this failure was never
signaled to upper layers.  In practice it is unlikely to occur
since cap_periph_mapmem() would most likely fail in such
circumstances anyway, but an error is nonetheless possible.

However, some code reading revealed a few paths where the return value
of cam_periph_mapmem() is not checked, and this is definitely a bug.
Add error checking there and let cam_periph_unmapmem() return errors
from copyout().

Reviewed by:	dab, mav
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D43201
2023-12-28 12:36:15 -05:00
Alexander Motin 519b24f029 CAM: Replace random sbuf_printf() with cheaper cat/putc. 2023-11-22 18:04:05 -05:00
Warner Losh 2ffd30f7ee cam: Remove left-over sys/cdefs.h in sys/cam
These weren't removed when $FreeBSD$ was removed. They aren't needed and
now are a style(9) nonconformity.

Sponsored by:		Netflix
2023-11-06 12:20:23 -07:00
Warner Losh 500196c5c7 cam: Add nvme error devctl publishing
Start reporting nvme errors from devices, like we report ata and scsi
errors.

Sponsored by:		Netflix
Reviewed by:		mav, jhb
Differential Revision:	https://reviews.freebsd.org/D41086
2023-11-06 12:10:42 -07:00
Warner Losh 685dc743dc sys: Remove $FreeBSD$: one-line .c pattern
Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
2023-08-16 11:54:36 -06:00
Warner Losh cf0a543f1f cam: Log more error codes
Log CAM_DEV_NOT_THERE status CCBs because they get dropped if a drive
disappears and these requests timeout or are cancelled. It's useful to
know the outstanding commands for failure analysis. Log
CAM_NVME_STATUS_ERROR status CCBs to bring in NVMe errors (this will be
more important in future commits that expand the information logged).

Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D41168
2023-08-01 14:51:54 -06:00
Warner Losh 7af2f2c801 cam: Migrate to modern uintXX_t from u_intXX_t
As per https://lists.freebsd.org/archives/freebsd-scsi/2023-July/000257.html
move to the modern uintXX_t.

MFC After:	3 days
Sponsored by:	Netflix
2023-07-24 21:32:56 -06:00
Warner Losh ff4633d9f8 cam_periph: Comment about why we need to reset cbfcnp
Just spent a few minutes puzzling out why we do this. Add a comment to
remind my future self (and other intersted folk) why we do the reset
here when we'd set it a few lines above.

Sponsored by:		Netflix
2023-07-21 10:11:37 -06:00
Warner Losh b4993704d6 cam_periph: Fix a comment
Add a couple of words so that this sentence makes sense.

Sponsored by:		Netflix
2023-07-21 10:07:13 -06:00
Warner Losh 774ab87cf2 cam: Add CAM_NVME_STATUS_ERROR error code
Add CAM_NVME_STATUS_ERROR error code. Flag all NVME commands that
completed with an error status as CAM_NVME_STATUS_ERROR (a new value)
instaead of CAM_REQ_CMP_ERR. This indicates to the upper layers of CAM
that the 'cpl' field for nvmeio CCBs is valid and can be examined for
error recovery, if desired.

No functional change. nda will still see these as errors, call
ndaerror() to get the error recovery action, etc. cam_periph_error will
select the same case as before (even w/o the change, though the change
makes it explicit).

Sponsored by:		Netflix
Reviewed by:		chuck, mav, jhb
Differential Revision:	https://reviews.freebsd.org/D41085
2023-07-20 22:32:31 -06:00
Warner Losh 33734ddf2b cam: Be explict about CAM_SMP_STATUS_ERROR
This is normally caught by default:, but no harm in making it explicit
that we'll retry valid periphs.

Sponsored by:		Netflix
Reviewed by:		mav
Differential Revision:	https://reviews.freebsd.org/D41083
2023-07-20 22:32:31 -06:00
Warner Losh 4d846d260e spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD
The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with:		pfg
MFC After:		3 days
Sponsored by:		Netflix
2023-05-12 10:44:03 -06:00
Mark Johnston 0cd631ee06 cam: Provide compatibility for CAMGETPASSTHRU for periph drivers
The CAM version bump 0x19 -> 0x1a changed the CAMGETPASSTHRU definition,
so applications using the old ioctl are broken.  However, that version
change did not affect anything relating to the ioctl implementation for
periphs.

Fixes:		8f9be1eed1 ("cam(4): Improve XPT_DEV_MATCH")
PR:		264709
Tested by:	andreas.mahling@googlemail.com
Reviewed by:	imp
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D36389
2022-09-29 13:14:57 -04:00
Alexander Motin 90bcc81bc3 Delay GEOM disk_create() until CAM periph probe completes.
Before this patch CAM periph drivers called both disk_alloc() and
disk_create() same time on periph creation.  But then prevented disks
from opening until the periph probe completion with cam_periph_hold().
As result, especially if disk misbehaves during the probe, GEOM event
thread, triggered to taste the disk, got blocked on open attempt,
potentially for a long time, unable to process other events.

This patch moves disk_create() call from periph creation to the end of
the probe. To allow disk_create() calls from non-sleepable CAM contexts
some of its duties requiring memory allocations are moved either back
to disk_alloc() or forward to g_disk_create(), so now disk_alloc() and
disk_add_alias() are the only disk methods that require sleeping.  If
disk fails during the probe disk_create() may just be skipped, going
directly to disk_destroy().  Other method calls during that time are
just ignored.  Since GEOM may now see the disks after CAM bus scan is
already completed, introduce per-periph boot hold functions. Enclosure
driver already had such mechanism, so just generalize it.

Reviewed by:	imp
MFC after:	1 month
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D35784
2022-07-14 16:17:36 -04:00
Warner Losh 9fb40baf60 cam_periph: Return ENXIO when peripheral is invalidated
When the peripheral is invalidated, no further I/O is possible. Signal
this up the stack with ENXIO now that upper layers of the stack
differentiate sometimes. In order for there to be further I/O, and new
open is required for any block device that a future periph might
instantiate for devices at this location that might return or otherwise
become available. The I/O scheduler flushes its I/O with the ENXIO error
for pending I/O that didn't make it to the device, so this makes the two
paths match.

MFC After:		3 days
Sponsored by:		Netflix
Reviewed by:		chs, mav
Differential Revision:	https://reviews.freebsd.org/D35093
2022-04-28 16:30:00 -06:00
Alexander Motin 404f001161 CAM: Keep periph_links when restoring CCB in camperiphdone().
While recovery command executed, some other commands from the periph
may complete, that may affect periph_links of this CCB.  So restoring
original CCB we must keep current periph_links as more up to date.

I've found this triggering assertions with debug kernel and suspect
some memory corruptions otherwise when spun down disk receives two
or sometimes more concurrent requests.

MFC after:	1 week
Sponsored by:	iXsystems, Inc.
2022-04-27 21:39:50 -04:00
Warner Losh 272e4f5384 cam: Fix wiring fence post error
If the last matching device entry partially matched in camperiphunit,
but then hit a continue case, we'd mistakenly think we had a match on
that entry. This lead to a number of problems downstream (usually a
belief that we had a duplicate wiring hint because unit = 0 is the
default). Fix this by using a for loop that does the assignment before
the loop termination test.

Sponsored by:		Netflix
Reviewed by:		jhb
Differential Revision:	https://reviews.freebsd.org/D33873
2022-01-13 15:22:56 -07:00
Warner Losh 3846662dab cam: Initialize wired to false
As part of converting the code to a while loop, the unconditional
initialization of wired to false was lost.

Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D33163
2021-11-30 14:40:30 -07:00
Warner Losh d836c48e71 cam_periph: wired is really a bool, update it to a bool.
Sponsored by:		Netflix
Reviewed by:		scottl
Differential Revision:	https://reviews.freebsd.org/D32823
2021-11-05 08:56:48 -06:00
Warner Losh 577f9aa266 cam_periph: Add ability to wire units to a serial number
For scsi, ata and nvme, at least, we read a serial number from the
device (if the device supports it, some scsi drives do not) and record
it during the *_xpt probe device state machine before it posts the
AC_FOUND_DEVICE async event. For mmc, no serial number is ever
retrieved, so it's always NULL. Add the ability to match this serial
number during device wiring.

This mechanism is competely optional, and often times using a label
and/or some other attribute of the device is easier. However, other
times wiring a unit to a serial number simplifies management as most
monitoring tools require the *daX device and having it stable from boot
to boot helps with data continuity. It can be especially helpful for
nvme where no other means exists to reliably tie a ndaX device to an
underlying nvme drive and namespace.

A similar mechanism exists in Linux to mange device unit numbers with
udev.

Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D32683
2021-11-05 08:56:33 -06:00
Warner Losh 710a519ebb cam_periph: fix bug in camperiphunitnext logic
If we assigned just a lun as a wired unit (something that camperiphunit
will accept), we failed to properly skip over that unit when computing a
next unit number. Add lun so the code matches the comments that we have
to skip all the same criteria that camperiphunit uses to select wired
units for a driver.

Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D32682
2021-11-05 08:56:27 -06:00
Warner Losh bee0133fb9 cam_periph: switch from negative logic to positive logic
When scanning the resources that are wired for this driver, skip any
that whose number doesn't match newunit. They aren't relevant. Switch to
positive logic to break out of the loop (and thus go to the next unit)
if we find either a target resource or an at resource. This makes the
code easier to read and modify.

Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D32681
2021-11-05 08:56:22 -06:00
Warner Losh 00f79c97a4 cam_periph: Remove vestigial "scbus" comparison
The code in camperiphunit rejects "scbus" as an 'at' location that would
allow any other wiring to use that unit number. Yet in
camperiphunitnext, if we have a no target and the 'at' location of
'scbus' it would be excluded on the basis that it's a wiring
cadidate. This is improper and appears to be a hold-over of the
pre-hints / pre-newbus config system, so remove it.

Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D32680
2021-11-05 08:56:13 -06:00
Warner Losh dbfe5dd3f9 cam_periph: style change
wrap a long line at 80 columns

Sponsored by:		Netflix
Reviewed by:		chs
Differential Revision:	https://reviews.freebsd.org/D32679
2021-11-03 08:03:07 -06:00
Edward Tomasz Napierala 13aa56fcd5 cam(4): preserve alloc_flags when copying CCBs
Before UMA CCBs, all CCBs were of the same size, and could
be trivially copied using bcopy(9).  Now we have to preserve
alloc_flags, otherwise we might end up attempting to free
stack-allocated CCB to UMA; we also need to take CCB size
into account.

This fixes kernel panic which would occur when trying to access
a stopped (as in, SCSI START STOP, also "ctladm stop") SCSI device.

Reported By:	Gary Jennejohn <gljennjohn@gmail.com>
Tested By:	Gary Jennejohn <gljennjohn@gmail.com>
Reviewed By:	imp
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D31054
2021-07-06 09:27:22 +01:00
Edward Tomasz Napierala 076686fe07 cam: make sure to clear CCBs allocated on the stack
This is required for small CCBs support, where we need to track
whether the CCB was allocated from an UMA zone or not.  There are
no (intended) functional changes with the current source.

Reviewed By:	imp
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D29484
2021-03-30 19:15:43 +01:00
John Baldwin 447b3557a9 cam: Permit non-pollable sims.
Some CAM sim drivers do not support polling (notably iscsi(4)).
Rather than using a no-op poll routine that always times out requests,
permit a SIM to set a NULL poll callback.  cam_periph_runccb() will
fail polled requests non-pollable sims immediately as if they had
timed out.

Reviewed by:	scottl, mav (earlier version)
Reviewed by:	imp
MFC after:	2 weeks
Sponsored by:	Chelsio
Differential Revision:	https://reviews.freebsd.org/D28453
2021-02-11 13:52:12 -08:00
Alexander Motin 9093e27cc1 Remove alignment requirements for KVA buffer mapping.
After r368124 vmapbuf() should happily map misaligned maxphys-sized buffers
thanks to extra page added to pbuf_zone.
2020-11-29 00:49:14 +00:00
Konstantin Belousov cd85379104 Make MAXPHYS tunable. Bump MAXPHYS to 1M.
Replace MAXPHYS by runtime variable maxphys. It is initialized from
MAXPHYS by default, but can be also adjusted with the tunable kern.maxphys.

Make b_pages[] array in struct buf flexible.  Size b_pages[] for buffer
cache buffers exactly to atop(maxbcachebuf) (currently it is sized to
atop(MAXPHYS)), and b_pages[] for pbufs is sized to atop(maxphys) + 1.
The +1 for pbufs allow several pbuf consumers, among them vmapbuf(),
to use unaligned buffers still sized to maxphys, esp. when such
buffers come from userspace (*).  Overall, we save significant amount
of otherwise wasted memory in b_pages[] for buffer cache buffers,
while bumping MAXPHYS to desired high value.

Eliminate all direct uses of the MAXPHYS constant in kernel and driver
sources, except a place which initialize maxphys.  Some random (and
arguably weird) uses of MAXPHYS, e.g. in linuxolator, are converted
straight.  Some drivers, which use MAXPHYS to size embeded structures,
get private MAXPHYS-like constant; their convertion is out of scope
for this work.

Changes to cam/, dev/ahci, dev/ata, dev/mpr, dev/mpt, dev/mvs,
dev/siis, where either submitted by, or based on changes by mav.

Suggested by: mav (*)
Reviewed by:	imp, mav, imp, mckusick, scottl (intermediate versions)
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D27225
2020-11-28 12:12:51 +00:00
Brooks Davis 44ca4575ea vmapbuf: don't smuggle address or length in buf
Instead, add arguments to vmapbuf.  Since this argument is
always a pointer use a type of void * and cast to vm_offset_t in
vmapbuf.  (In CheriBSD we've altered vm_fault_quick_hold_pages to
take a pointer and check its bounds.)

In no other situtation does b_data contain a user pointer and vmapbuf
replaces b_data with the actual mapping.

Suggested by:	jhb
Reviewed by:	imp, jhb
Obtained from:	CheriBSD
MFC after:	1 week
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D26784
2020-10-21 16:00:15 +00:00
Mateusz Guzik 27dcd3d90b cam: clean up empty lines in .c and .h files 2020-09-01 22:13:48 +00:00
Warner Losh 773e541e8d Use devctl.h instead of bus.h to reduce newbus pollution.
There's no need for these parts of the kernel to know about newbus,
so narrow what is included to devctl.h for device_notify_*.

Suggested by: kib@
2020-08-21 00:03:24 +00:00
Warner Losh 450a2e2a12 Remove stale comment
There's no useracc here, and even if there was it shouldn't be here. vmapbuf is
sufficient and as the comment says, useracc is racy.
2020-04-13 21:03:30 +00:00
Warner Losh 81490eda60 Add comment about how the deferred callback for AC_FOUND_DEVICE we
generate for a race where a device goes away, we start to tear down
the periph state for the device, and then the device suddently
reappears. The key that makes it work is removal of periph from the
drv list before calling the deferred callback.

Hat tip to: mav@
2020-03-14 02:36:45 +00:00
Warner Losh 6fda2c54da Reword a comment to describe what's actually going on. We can call invalidate
several times potentially. We just don't do anything on the second and
subsequent calls.
2020-03-07 00:29:12 +00:00
Warner Losh 3750f5ff89 The KASSERT is too strict: revert r357897
It's valid for a periph to be removed with outstanding transactions on the
device. In CAM, multiple periphs attach to a single device. There's no interlock
to prevent one of these going away while other periphs have outstanding CCBs and
it's not an error either. Remove this overly agressive KASSERT to prevent
false-positive panics when devices depart.
2020-02-15 18:14:23 +00:00
Warner Losh 2100c6d00f Add a KASSERT that there's no outstanding CCBs when we call camperiphfree. We
know that if there are any outstanding CCBs, then when they dereference the path
that's freed at the bottom of camperiphfree there will be some flavor of
panic. This moves that eventual panic to a traceback of when we free the last
reference on the device, which is earlier but may not be early enough.
2020-02-14 00:13:23 +00:00
Alexander Motin c389a786dd Make pass(4) handle misaligned buffers of MAXPHYS size.
Since we are already using malloc()+copyin()/copyout() for smaller data
blocks, and since new asynchronous API does it always, I see no reason
to keep this ugly artificial size/alignment limitation in old API.

Tape applications suffer enough from the MAXPHYS limitations by itself,
and additional alignment requirement, often halving effectively usable
block size, does not help.

It would be good to use unmapped I/O here instead, but it require some
HBA drivers polishing first to support non-BIO unmapped buffers.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2019-12-23 20:41:55 +00:00
Alexander Motin bae3729be4 Do not retry long ready waits if previous gave nothing.
I have some disks reporting "Logical unit is in process of becoming ready"
for about half an hour before finally reporting failure.  During that time
CAM waits for the readiness during ~2 minutes for each request, that makes
system boot take very long time.

This change reduces wait times for the following requests to ~1 second if
previously long wait for that device has timed out.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2019-11-22 21:31:59 +00:00
Alexander Motin 07f7e4c8b0 Fix assumptions of only one device per SES slot.
It is typical to have one, but no longer true for multi-actuator HDDs
with separate LUN for each actuator.

MFC after:	4 days
Sponsored by:	iXsystems, Inc.
2019-09-11 03:25:30 +00:00
Alexander Motin 99bad9ca9a Unify SCSI_STATUS_BUSY retry handling with other cases.
- Do not retry if periph was invalidated.
 - Do not decrement retry_count if already zero.
 - Report action_string when applicable.

MFC after:	2 weeks
2019-04-02 14:46:10 +00:00
Alexander Motin b059686a71 Do not map small IOCTL buffers to KVA, but copy.
CAM IOCTL interfaces traditionally mapped user-space data buffers to KVA.
It was nice originally, but now it takes too much to handle respective
TLB shootdowns, while small kernel memory allocations up to 64KB backed
by UMA and accompanied by copyin()/copyout() can be much cheaper.

For large buffers mapping still may have sense, and unmapped I/O would
be even better, but the last unfortunately is more tricky, since unmapped
I/O API is too specific to struct bio now.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2019-03-28 20:41:02 +00:00
Gleb Smirnoff 756a541279 Allocate pager bufs from UMA instead of 80-ish mutex protected linked list.
o In vm_pager_bufferinit() create pbuf_zone and start accounting on how many
  pbufs are we going to have set.
  In various subsystems that are going to utilize pbufs create private zones
  via call to pbuf_zsecond_create(). The latter calls uma_zsecond_create(),
  and sets a limit on created zone. After startup preallocate pbufs according
  to requirements of all pbuf zones.

  Subsystems that used to have a private limit with old allocator now have
  private pbuf zones: md(4), fusefs, NFS client, smbfs, VFS cluster, FFS,
  swap, vnode pager.

  The following subsystems use shared pbuf zone: cam(4), nvme(4), physio(9),
  aio(4). They should have their private limits, but changing that is out of
  scope of this commit.

o Fetch tunable value of kern.nswbuf from init_param2() and while here move
  NSWBUF_MIN to opt_param.h and eliminate opt_swap.h, that was holding only
  this option.
  Default values aren't touched by this commit, but they probably should be
  reviewed wrt to modern hardware.

This change removes a tight bottleneck from sendfile(2) operation, that
uses pbufs in vnode pager. Other pagers also would benefit from faster
allocation.

Together with:	gallatin
Tested by:	pho
2019-01-15 01:02:16 +00:00
Warner Losh 9385e92b25 Add comments explaining what hold/unhold do
They act as a simple one-deep semaphore to keep open/close/probe from
running at the same time to avoid races that creates.
2018-11-01 21:51:41 +00:00
Alexander Motin 79fab7d48a Stop further SCSI recovery attempts after one has failed.
We've got a set of probably damaged hard disks, reporting 0x04,0x02
("Logical unit not ready, initializing command required") in response
to READ CAPACITY(16), where attempts to use START STOP UNIT for recovery
results in 0x44,0x00 ("Internal target failure") after ~1 second delay.
As result of all recovery retries, device open attempt took ~3 seconds
before finally reporting to GEOM that device is opened, but has no media.
If the open was for writing and since it hasn't formally failed, following
close triggered GEOM retaste, opening device few more times with respective
delays.

This change reduces whole time of this cycle from ~12 seconds to ~3 by
giving up on recovery after the first failure.

Reviewed by:	ken
MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2018-07-21 21:34:10 +00:00
Scott Long 7631477269 Add and fix comments for cam_periph_runccb()
Sponsored by:	Netflix
2018-05-01 17:48:50 +00:00
Warner Losh d38677d23c Create a sysctl kern.cam.{,a,n}da.X.invalidate
kern.cam.{,a,n}da.X.invalidate=1 forces *daX to detach by calling
cam_periph_invalidate on the underlying periph. This is for testing
purposes only. Include only with options CAM_TEST_FAILURE and rename
the former [AN]DA_TEST_FAILURE, and fix nda to compile with it set.
We're using it at work to harden geom and the buffer cache to be
resilient in the face of drive failure. Today, it far too often
results in a panic. While much work was done on SIM initiated removal
for the USB thumnb drive removal work, little has been done for periph
initiated removal. This simulates what *daerror() does for some errors
nicely: we get the same panics with it that we do with failing drives.

Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D14581
2018-03-14 17:53:37 +00:00
Warner Losh bc40691e40 Report the number of remaining retries when we have an error that
we're retrying.
2018-02-15 18:57:54 +00:00
Scott Long 99e7a4ad9e Return a C errno for cam_periph_acquire().
There's no compelling reason to return a cam_status type for this
function and doing so only creates confusion with normal C
coding practices. It's technically an API change, but the periph API
isn't widely used. No efffective change to operation.

Reviewed by:	imp, mav, ken
Sponsored by:	Netflix
Differential Revision:	D14063
2018-02-06 06:42:25 +00:00
Warner Losh 045f8bc8e4 When we crash, we'll stop the scheduler before we call the
shutdown_post_sync event.  For adashutdown, this causes problems
because we need to poll for completion of the commands, but we're not
yet officially dumping yet, so the code from r326964 assumed we could
use the interrupt-driven commands rather than the polled ones. This
lead to a hang. Prevent this by also checking to see if the scheduler
is stopped to do the polling.

Reported by: markj@
Sponsored by: Netflix
Differential Review: https://reviews.freebsd.org/D13845
2018-01-11 03:11:41 +00:00