When an iSCSI target session is terminated, an internal nexus reset
task is posted to abort existing tasks belonging to the session.
Previously, the ctl_io for this internal nexus reset stored a pointer
to the session in the slot that normally holds a pointer to the PDU
from the initiator that triggered the I/O request. The completion
handler then assumed that any nexus reset I/O was due to an internal
request and fetched the session pointer (instead of the PDU pointer)
from the ctl_io. However, it is possible to trigger a nexus reset via
an on-the-wire task management PDU. If such a PDU were sent to the
target, then the completion handler would incorrectly treat this
request as an internal request and treat the pointer to the received
PDU as a pointer to the session instead.
To fix, allocate a dummy PDU for the internal reset task and use an
invalid opcode to differentiate internal nexus resets from resets
requested by the initiator.
PR: 260449
Reported by: Robert Morris <rtm@lcs.mit.edu>
Reviewed by: mav
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D34055
It does not look like there is anything in mmc_da code that actually
requires protocol 2.0 or later. dev/mmc code also does not have such a
restriction.
Tested with a very old 2GB mini-SD card. Prior to this change mmc_da
would claim the card but would not expose it to GEOM.
Without MMCCAM:
mmc0: <MMC/SD bus> on sdhci_pci0
mmc0: Probing bus
mmc0: SD probe: OK (OCR: 0x00ff8000)
mmc0: Current OCR: 0x00ff8000
mmc0: CMD8 failed, RESULT: 1
mmc0: Probing cards
mmc0: New card detected (CID 1c53565344432020100002982e007600)
mmc0: New card detected (CSD 005e00325f5a83d02db7ffbf96800000)
mmc0: Card at relative address 0xb368 added:
mmc0: card: SD SDC 1.0 SN 0002982E MFG 06/2007 by 28 SV
mmc0: quirks: 0
mmc0: bus: 4bit, 50MHz (high speed timing)
mmc0: memory: 3998720 blocks, erase sector 256 blocks
mmc0: setting transfer rate to 50.000MHz (high speed timing)
GEOM: new disk mmcsd0
mmcsd0: 2GB <SD SDC 1.0 SN 0002982E MFG 06/2007 by 28 SV> at mmc0 50.0MHz/4bit/65535-block
mmc0: setting bus width to 4 bits high speed timing
With MMCCAM and this change:
sdda0 at sdhci_slot0 bus 0 scbus2 target 0 lun 0
sdda0: Relative addr: 0000b368
Card features: <Memory>
sdda0: Serial Number 0002982E
sdda0: SD SDC 1.0 SN 0002982E MFG 06/2007 by 28 SV
GEOM: new disk sdda0
Reviewed by: manu
MFC after: 3 weeks
* Prefer variables of small scope rather than large scope
* Remove a magic number
* style(9) for return statements
* Remove the get_enc_status method, which never did anything
* Fix a variable type in the handle_string method
* Proofread some comments
MFC after: 2 weeks
Sponsored by: Spectra Logic, Axcient
Reviewed by: ken, mav
Differential Revision: https://reviews.freebsd.org/D31686
sys/cam/scsi/scsi_sa.c:
Add comments explaining the priority order of the various
sources of timeout values. Also, explain that the probe
that pulls in drive recommended timeouts via the REPORT
SUPPORTED OPERATION CODES command is in a race with the
thread that creates the sysctl variables. Because of that
race, it is important that the sysctl thread not load any
timeout values from the kernel environment.
share/man/man4/sa.4:
Use the Sy macro to emphasize thousandths of a second
instead of capitalizing it.
Requested by: Warner Losh <imp@freebsd.org>
Requested by: Daniel Ebdrup Jensen <debdrup@freebsd.org>
Sponsored by: Spectra Logic
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D33883
Summary:
The sa(4) driver has historically used tape drive timeouts that
were one-size fits all, with compile-time options to adjust a few
of them.
LTO-9 drives (and presumably other tape drives in the future)
implement a tape characterization process that happens the first
time a tape is loaded. The characterization process formats the
tape to account for the temperature and humidity in the environment
it is being used in. The process for LTO-9 tapes can take from 20
minutes (I have observed 17-18 minutes) to 2 hours according to the
documentation.
As a result, LTO-9 drives have significantly longer recommended
load times than previous LTO generations.
To handle this, change the sa(4) driver over to using timeouts
supplied by the tape drive using the timeout descriptors obtained
through the REPORT SUPPORTED OPERATION CODES command. That command
was introduced in SPC-4. IBM tape drives going back to at least
LTO-5 report timeout values. Oracle/Sun/StorageTek tape drives
going back to at least the T10000C report timeout values. HP LTO-5
and newer drives report timeout values. The sa(4) driver only
queries drives that claim to support SPC-4.
This makes the timeout settings automatic and accurate for newer
tape drives.
Also, add loader tunable and sysctl support so that the user can
override individual command type timeouts for all tape drives in
the system, or only for specific drives.
The new global (these affect all tape drives) loader tunables are:
kern.cam.sa.timeout.erase
kern.cam.sa.timeout.load
kern.cam.sa.timeout.locate
kern.cam.sa.timeout.mode_select
kern.cam.sa.timeout.mode_sense
kern.cam.sa.timeout.prevent
kern.cam.sa.timeout.read
kern.cam.sa.timeout.read_position
kern.cam.sa.timeout.read_block_limits
kern.cam.sa.timeout.report_density
kern.cam.sa.timeout.reserve
kern.cam.sa.timeout.rewind
kern.cam.sa.timeout.space
kern.cam.sa.timeout.tur
kern.cam.sa.timeout.write
kern.cam.sa.timeout.write_filemarks
The new per-instance loader tunable / sysctl variables are:
kern.cam.sa.%d.timeout.erase
kern.cam.sa.%d.timeout.load
kern.cam.sa.%d.timeout.locate
kern.cam.sa.%d.timeout.mode_select
kern.cam.sa.%d.timeout.mode_sense
kern.cam.sa.%d.timeout.prevent
kern.cam.sa.%d.timeout.read
kern.cam.sa.%d.timeout.read_position
kern.cam.sa.%d.timeout.read_block_limits
kern.cam.sa.%d.timeout.report_density
kern.cam.sa.%d.timeout.reserve
kern.cam.sa.%d.timeout.rewind
kern.cam.sa.%d.timeout.space
kern.cam.sa.%d.timeout.tur
kern.cam.sa.%d.timeout.write
kern.cam.sa.%d.timeout.write_filemarks
The values are reported and set in units of thousandths of a
second.
share/man/man4/sa.4:
Document the new loader tunables in the sa(4) man page.
sys/cam/scsi/scsi_sa.c:
Add a new timeout_info array to the softc.
Add a default timeouts array, along with descriptions.
Add a new sysctl tree to the softc to handle the timeout
sysctl values.
Add a new function, saloadtotunables(), that will load
the global loader tunables first and then any per-instance
loader tunables second.
Add creation of the new timeout sysctl variables in
sasysctlinit().
Add a new, optional probe state to the sa(4) driver. We
previously didn't do any probing, but now we probe for
timeout descriptors if the drive claims to support SPC-4 or
later. In saregister(), we check the SCSI revision and
either launch the probe state machine, or announce the
device and become ready.
In sastart() and sadone(), add support for the new
SA_STATE_PROBE. If we're probing, we don't go through
saerror(), since that is currently only written to handle
I/O errors in the normal state.
Change every place in the sa(4) driver that fills in
timeout values in a CCB to use the new timeout_info[] array
in the softc.
Add a new saloadtimeouts() routine to parse the returned
timeout descriptors from a completed REPORT SUPPORTED
OPERATION CODES command, and set the values for the
commands we support.
MFC after: 1 week
Sponsored by: Spectra Logic
Test Plan:
Try this out with a variety of tape drives and make sure the timeouts that
result (sysctl kern.cam.sa to see them) are reasonable.
Reviewers: #manpages, #cam
Subscribers: imp
Differential Revision: https://reviews.freebsd.org/D33883
Since both scsi_xpt and ata_xpt use the same name for the softc, this
can lead to problems in gdb. Avoid the issue by renaming the ata
probe_softc to aprobe_softc as has been done for the aprobe in
0f280cbd0a. This was overlooked at the time.
Sponsored by: Netflix
MFC After: 2 weeks
Before this change on every open da(4) driver read all mode pages to
use only one bit. It was done so to not depend on the list of pages
supported by the disk. But I've found that at least for SATL of LSI/
Broadcom HBAs with WD HDDs Power Condition mode page reading may take
significant amount of time, much more than any other mode page, that
visibly increased disk retaste time by GEOM.
Address that by using data returned by the first MODE SENSE request
to limit the following ones to only one (the first for now) mode page.
With the change simultaneous retaste of 39 SATA disks takes about 2.5s
instead of more than 4s before, and I no longer see "dareprobe" status
on GEOM event thread.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
If the last matching device entry partially matched in camperiphunit,
but then hit a continue case, we'd mistakenly think we had a match on
that entry. This lead to a number of problems downstream (usually a
belief that we had a duplicate wiring hint because unit = 0 is the
default). Fix this by using a for loop that does the assignment before
the loop termination test.
Sponsored by: Netflix
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D33873
Weird side of SES specification is that some bits have different
meaning or semantics in status and control pages. This patch fixes
non-zero writes into reserved fields, that caused errors on some
enclosures when trying to control locate/fault LEDs, keeping other
bits unchanged.
MFC after: 2 weeks
Sposonred by: iXsystems, Inc.
If the UMA zones are not freed, we get warnings about re-using the
sysctl variables associated with the UMA zones, and we're leaking
the other memory associated with the zone structures. e.g.:
sysctl_warn_reuse: can't re-use a leaf (vm.uma.pass44.size)!
sysctl_warn_reuse: can't re-use a leaf (vm.uma.pass44.flags)!
sysctl_warn_reuse: can't re-use a leaf (vm.uma.pass44.bucket_size)!
sysctl_warn_reuse: can't re-use a leaf (vm.uma.pass44.bucket_size_max)!
sysctl_warn_reuse: can't re-use a leaf (vm.uma.pass44.keg.name)!
sysctl_warn_reuse: can't re-use a leaf (vm.uma.pass44.keg.rsize)!
sysctl_warn_reuse: can't re-use a leaf (vm.uma.pass44.keg.ppera)!
sysctl_warn_reuse: can't re-use a leaf (vm.uma.pass44.keg.ipers)!
Also, correctly clear the PASS_FLAG_ZONE_INPROG flag in
passcreatezone(). The way it was previously done, it would have
had set the flag and cleared all other flags that were set at
that point.
MFC after: 1 week
Sponsored by: Spectra Logic
sddadump has been derived from sddastart.
mmc_sim interface has grown a new method, cam_poll, to support polled
operation.
mmc_sim code has been changed to provide a sim_poll hook only if the
controller implements the new method. The hooks is implemented in terms
of the new mmc_sim_cam_poll method.
Additionally, in-progress CCB-s now have CAM_REQ_INPROG status to
satisfy xpt_pollwait().
mmc_sim_cam_poll method has been implemented in dwmmc host controller.
Reviewed by: manu, mav, imp
MFC after: 2 weeks
Relnotes: perhaps
Differential Revision: https://reviews.freebsd.org/D33843
Those can be returned by CHECK POWER MODE command (0xe5).
Note that some of the definitions duplicate definitions for Extended
Power Conditions.
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D33646
Tag more status return values as an error for the
I/O. CAM_SCSI_STATUS_ERROR is returned for medium errors, for example,
but the counts weren't increased. The added errors all indicate a
problem with the device request.
Sponsored by: Netflix
PR: 260257
Feedback from: ken
Reviewed by: asomers
Differential Revision: https://reviews.freebsd.org/D33783
On large systems even relatively rare callouts may fire many times
per second. This should allow them to aggregate better, since we do
not require any precision when polling for media change, etc.
MFC after: 2 weeks
This matches icl_conn_task_setup() which passes the PDU and avoids the
need for a layering violation in cxgbei to fetch the request PDU from
the ctl_io.
Reviewed by: mav
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D33746
Don't take the device_mtx lock in daasync() when handling an
AC_UNIT_ATTENTION. Instead, assert the lock is held before modifying the
periph's softc flags.
The device_mtx lock is taken in xptdevicetraverse() before daasync()
is eventually called in xpt_async_bcast().
PR: 240917, 226510, 226578
Reviewed by: imp
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D27735
For years CTL block backend limited I/O size to 1MB, splitting larger
requests into sequentially processed chunks. It is sufficient for
most of use cases, since typical initiators rarely use bigger I/Os.
One of known exceptions is VMWare VAAI offload, by default sending up
to 8 4MB EXTENDED COPY requests same time. CTL internally converted
those into 32 1MB READ/WRITE requests, that could overwhelm the block
backend, having finite number of processing threads and making more
important interactive I/Os to wait in its queue. Previously it was
partially covered by CTL core serializing sequential reads to help
ZFS speculative prefetcher, but that serialization was significantly
relaxed after recent ZFS improvements.
With the new settings block backend receives 8 4MB requests, that
should be easier for both CTL itself and the underlying storage.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
If a disk is already in STANDBY mode, then setting IDLE mode can
actually spin it up.
Reviewed by: mav
MFC after: 4 weeks
Differential Revision: https://reviews.freebsd.org/D33588
Don't send the SCSI comand SYNCHRONIZE CACHE on devices that are still
open when RB_NOSYNC is the reboot method. This may avoid recursive panics
when doadump is called due to a SCSI/CAM/USB error/bug.
Obtained from: Semihalf
Sponsored by: Stormshield
Reviewed by: imp
Differential revision: https://reviews.freebsd.org/D31549
To quote the manual:
The pointer passed in as name and type is saved rather than the data
it points to. The data pointed to must remain stable until the mutex
is destroyed.
It seems that the type is actually copied, but the name is stored as
a pointer indeed.
mmc_cam_sim_alloc used a name stored on stack.
So, a corrupt mutex name would be reported.
For example:
lock order reversal: (sleepable after non-sleepable)
1st 0xd7285b20 <8A><C0><C0>P@<C1><D0>P@<C1>^D^A (aw_mmc_sim, sleep mutex) @ sys/cam/cam_xpt.c:2804
This change moves the name to struct mmc_sim.
Also, that name is used as the sim name as well.
Unused mtx_name variable is removed too.
The name buffer is reduced to 16 characters.
Reviewed by: manu, bz
MFC after: 10 days
Differential Revision: https://reviews.freebsd.org/D33412
To quote the manual:
The pointer passed in as name and type is saved rather than the data
it points to. The data pointed to must remain stable until the mutex
is destroyed.
It seems that the type is actually copied, but the name is stored as
a pointer indeed.
mmc_cam_sim_alloc used a name stored on stack.
So, a corrupt mutex name would be reported.
For example:
lock order reversal: (sleepable after non-sleepable)
1st 0xd7285b20 <8A><C0><C0>P@<C1><D0>P@<C1>^D^A (aw_mmc_sim, sleep mutex) @ /usr/devel/git/orange/sys/cam/cam_xpt.c:2804
This change moves the name to struct mmc_sim.
Also, that name is used as the sim name as well.
Unused mtx_name variable is removed too.
Set the bucket size to be SBT_1US/50000 + 1 to be the first number >
20us. I had this uncommitted in my three when I pushed 2283206935
since kern.cam.iosched.bucket_base_us was reporting 19us.
Fixes: 2283206935
Sponsored by: Netflix
Add sysctls to publish the latency bucket size, number, and stride. Move
to putting all the iosched stuff under kern.cam.iosched as well and move
kern.cam.do_dynamic_iosched to kern.cam.iosched.dynamic. In addition, move
kern.cam.io_sched_alpha_bits to kern.cam.iosched.alpha_bits. Publish
kern.cam.iosched.bucket_base (the smallest bucket time), .bucket_ratio
(the geometric progression factor * 100), and .buckets (the total number
of buckets).
Move to publishing 20 buckets, starting at 20us. This allows us to get
better resolution on the short end and detect preformance degredation of
the NVMe drives I've tested on, even with the uncertainty of bucketing.
Sponsored by: Netflix
As part of converting the code to a while loop, the unconditional
initialization of wired to false was lost.
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D33163
Delete all the write only variables in CAM. At worst, the only behavior
change would be to prevent core dumps from chasing NULL pointers (though
I think in all these cases the pointers can't be NULL).
Sponsored by: Netflix
The SanDisk SD8SB8U1 and likely others pad their serial number with
spaces on the end rather than the start (at least when connected to a
SAS3008). This makes them difficult to wire unit numbers to with the
serial because you have to specify the trailing spaces. Instead, strip
out the trailing spaces.
We already strip leading spaces both here. In addition, when glabel
creates the devfs device nodes, leading and trailing spaces are removed
already (so there will be no change there either).
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D32684
For scsi, ata and nvme, at least, we read a serial number from the
device (if the device supports it, some scsi drives do not) and record
it during the *_xpt probe device state machine before it posts the
AC_FOUND_DEVICE async event. For mmc, no serial number is ever
retrieved, so it's always NULL. Add the ability to match this serial
number during device wiring.
This mechanism is competely optional, and often times using a label
and/or some other attribute of the device is easier. However, other
times wiring a unit to a serial number simplifies management as most
monitoring tools require the *daX device and having it stable from boot
to boot helps with data continuity. It can be especially helpful for
nvme where no other means exists to reliably tie a ndaX device to an
underlying nvme drive and namespace.
A similar mechanism exists in Linux to mange device unit numbers with
udev.
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D32683
If we assigned just a lun as a wired unit (something that camperiphunit
will accept), we failed to properly skip over that unit when computing a
next unit number. Add lun so the code matches the comments that we have
to skip all the same criteria that camperiphunit uses to select wired
units for a driver.
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D32682
When scanning the resources that are wired for this driver, skip any
that whose number doesn't match newunit. They aren't relevant. Switch to
positive logic to break out of the loop (and thus go to the next unit)
if we find either a target resource or an at resource. This makes the
code easier to read and modify.
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D32681
The code in camperiphunit rejects "scbus" as an 'at' location that would
allow any other wiring to use that unit number. Yet in
camperiphunitnext, if we have a no target and the 'at' location of
'scbus' it would be excluded on the basis that it's a wiring
cadidate. This is improper and appears to be a hold-over of the
pre-hints / pre-newbus config system, so remove it.
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D32680
1. During CD probing, we read the TOC header to find the number of
entries, then read the TOC itself. The header determines the number
of entries, which determines the amount of data to read from the
device into the softc in the CD_STATE_MEDIA_TOC_FULL state. We
hard-code a limit of 99 tracks (plus one for the lead-out) in the
softc, but were not validating that the size reported by the media
would fit in this hard-coded limit. Kernel memory corruption could
occur if not.[1] Add validation to check this, and refuse to cache
the TOC if it would not fit.
2. The CDIOCPLAYTRACKS ioctl uses caller provided track numbers to index
into the TOC, but we only validate the starting index. Add
validation of the ending index.
Also, raise the hard-coded limit from 100 tracks to 170, per a
suggestion from Ken.
Reported by: C Turt <ecturt@gmail.com> [1]
Reviewed by: ken, avg
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32803
In the ATA/ATAPI spec these are space-padded fixed-length strings with
no NUL-terminator (and byte swapped). When performing the identify we
call ata_param_fixup to swap the bytes back to be in order, strip any
leading/trailing spaces and coalesce consecutive spaces, padding with
NULs. However, if the input has no padding spaces, the fixed-up strings
are still not NUL-terminated. This causes two issues. The first is that
strlcpy will truncate the string by replacing the final byte with a NUL.
The second is that strlcpy will keep reading src until it finds a NUL in
order to calculate the return value, which is defined as the length of
src (so that callers can then compare it with the dsize input to see if
the input string was truncated), thereby reading past the end of the
buffer and into whatever adjacent fields are in the structure. In
practice there's a NUL byte somewhere in the structure, but on CHERI
with subobject bounds enabled in the compiler this overread will be
detected and trap as a bounds violation.
Note this matches ata_xpt's aprobedone, which does a bcopy to a
malloc'ed buffer and manually NUL-terminates it for the CAM path's
device's serial_num.
Found by: CHERI
Reviewed by: imp, scottl
Differential Revision: https://reviews.freebsd.org/D32567