system/freebsd-src

mirror of https://github.com/freebsd/freebsd-src synced 2024-09-16 06:52:27 +00:00

Author	SHA1	Message	Date
Chuck Tuffli	ce75bfcac9	nvme: Change namespace device name Changes the device name for NVMe and NVMe-oF namespaces from using "ns" to "n" to be more compatible with other operating systems. For example, a device which was previously /dev/nvme0ns1 is now /dev/nvme0n1. Preserves the existing functionality by creating alias from nvmeXnY to nvmeXnsY. Reviewed by: imp MFC after: 1 month Relnotes: yes Differential Revision: https://reviews.freebsd.org/D45414	2024-06-01 04:14:14 -07:00
Warner Losh	d09ee08f10	nvme: Count number of alignment splits When possible, we split up I/Os to NVMe drives that advertise a preferred alignment. Add a counter for this. Sponsored by: Netflix Reviewed by: chuck, mav Differential Revision: https://reviews.freebsd.org/D45311	2024-05-24 08:32:47 -06:00
Warner Losh	0dd84c3b11	nvme: Add comment about where tr->deadline is set It's easy to overlook the chain of events that lead to tr->deadline being updated. Add a comment here to explain what otherwise looks like an oversight w/o careful study. Sponsored by: Netflix	2024-05-13 16:14:04 -06:00
Warner Losh	c931cf6af0	nvme: Slight simplification We don't need to dereference qpair to get the ctrlr pointer each time, so use the cached value. It's not going to change. No change intended. Sponsored by: Netflix	2024-05-13 16:14:04 -06:00
Warner Losh	9db8ca92b9	nvme: Slightly rework this loop to match FreeBSD style Update the comment for the code, and slightly rework the code in the 'fast exit' paradigm that FreeBSD generally tries to follow. Sponsored by: Netflix	2024-05-13 16:14:04 -06:00
Warner Losh	5a178b831a	nvme: Add locking asserts nvme_qpair_complete_tracker and nvme_qpair_manual_complete_tracker have to be called without the qpair lock, so assert it's unowned. Sponsored by: Netflix	2024-05-13 16:14:03 -06:00
John Baldwin	da4230af3f	nvme/f: Use strlcpy instead of strncpy + manual string termination Reviewed by: dab, imp Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D45153	2024-05-13 12:04:03 -07:00
John Baldwin	01fc488381	nvme: Use strlcpy instead of strncpy to ensure termination Reviewed by: dab, imp Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D45152	2024-05-13 12:03:49 -07:00
Warner Losh	e84a75f936	nvme: Add telemetry page definitions Add definition for page types 7 and 8 for host initiated telemetry and controller initiated telemetry (they differ by one byte, but that byte that's defined in the host version is reserved in the controller version). Sponsored by: Netflix	2024-05-11 12:09:50 -06:00
John Baldwin	ebcfab998e	nvme: Explicitly align struct nvme_command on an 8 byte boundary This was already true for most architectures due to uint64_t structure members. However, i386 is special in that it only requires 4 byte alignment for uint64_t. As a result, casts from struct nvme_command to struct nvmf_fabric_cmd were raising a "cast increases alignment" warning on i386. Explicitly aligning struct nvme_command pacifies this warning on i386. Reported by: rscheff Sponsored by: Chelsio Communications	2024-05-08 16:05:39 -07:00
John Baldwin	29d7e39f56	nvme: Bump the alignment of struct nvme_health_information_page to 8 This ensures that embedded uint64_t values used for statistics counters are aligned when allocating a structure on the stack or as part of a containing structure. In particular this quiets -Waddress-of-packed-member warnings from GCC when compiling the code in nvmfd to update the stats. Reported by: GCC	2024-05-07 13:54:00 -07:00
John Baldwin	5e3e444230	nvme: Add constants for the Fused Operation (FUSE) field in commands Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D44845	2024-05-02 16:31:02 -07:00
John Baldwin	d86edc181a	nvmf.h: New header defining ioctls for NVMe over Fabrics This defines structures, ioctl commands, and related constants used for both the Fabrics host and controller. Reviewed by: imp Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D44706	2024-05-02 16:27:13 -07:00
Warner Losh	97b77de2d9	nvme: Eliminate intel_log_temp_stats_swapbytes We can't post an AER for this page, so there's no need to be able to swap it to host byte order. It's not one of the standard defined pages that can post via AER, and the vendor's public docs for this temperature page don't suggest it's possible to get over or under event changes. Since nvmecontrol no longer needs the swap routine, remove it since it's now unused. Sponsored by: Netflix Reviewed by: chuck Differential Revision: https://reviews.freebsd.org/D44659	2024-04-16 21:30:19 -06:00
Brooks Davis	6bb132ba1e	Reduce reliance on sys/sysproto.h pollution Add sys/errno.h, sys/malloc.h, sys/queue.h, and vm/uma.h as needed. sys/sysproto.h currently includes sys/acl.h which currently includes sys/param.h, sys/queue.h, and vm/uma.h which in turn bring in sys/errno.h sys/malloc.h. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D44465	2024-04-15 21:35:40 +01:00
Warner Losh	0b8f21e8d1	nvme: Add LPA bits Add all the bits from the NVMe 2.0 base specification: CMD_EFFECTS to indicate the commands and effects log page is supported, TELEMETRY to indicate that the telemetry log pages and protocols are supported, PERSISTENT_EVENTS to indicate the persistent event log is supported, LOG_PAGES_PAGE to indicate that various log pages related to log page and command support are supported: L0, L5, L12, and L13. and DA4_TELEMETRY to indicate that the DA4 area is supported for telemetry data. Sponsored by: Netflix	2024-04-05 16:53:47 -06:00
John Baldwin	21d3a84db4	nvme: Add NVMe over Fabrics fields to nvme_controller_data Reviewed by: imp Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D44448	2024-03-22 17:24:52 -07:00
John Baldwin	7fa8adb8c5	nvme: Add constants for the Controller Attributes field in cdata Reviewed by: imp Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D44447	2024-03-22 17:24:31 -07:00
John Baldwin	88ecf154c7	nvme: Add constants and types for the discovery log page This is used in NVMe over Fabrics to enumerate a list of available controllers. Reviewed by: imp Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D44446	2024-03-22 17:24:18 -07:00
John Baldwin	b354bb04cb	nvme: Add constants for fields in AER completion dword 0 Reviewed by: imp Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D44445	2024-03-22 17:24:06 -07:00
John Baldwin	cbda1886ab	nvme: Add constants for the extended data for Get Log Page command flag nvme(4) doesn't check this flag, but Fabrics implementations may need to set this flag in the log page attributes cdata field. Reviewed by: imp Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D44444	2024-03-22 17:23:46 -07:00
John Baldwin	b8cb8dd362	nvme: Add constants for the PSDT field in cdw0 This is not used in nvme(4) but is used in NVMe over Fabrics transports which use SGLs to describe buffers instead of PRPs. While here, adjust the shift value for the FUSE field to be relative to the 'fuse' member of 'struct nvme_command'. Reviewed by: imp Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D44443	2024-03-22 17:23:24 -07:00
John Baldwin	f21a54d190	nvme: Add SGL structure and constants for use in NVMe commands Fabrics capsules use an SGL structure instead of prp1/2 addresses to describe the data buffer used for a command. The SGL structure is added to a union with the existing prp1/2 fields. Reviewed by: imp Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D44442	2024-03-22 17:23:09 -07:00
John Baldwin	1931b75e00	nvme: Export constants for min and max queue sizes These are useful for NVMe over Fabrics. Reviewed by: imp Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D44441	2024-03-22 17:23:02 -07:00
Warner Losh	fe52c3384c	nvme_sim: Add comment about the is_failed test We only see a request with a failed controller while we're in the process of failing the controller. Add a comment to that effect. Sponsored by: Netflix	2024-03-07 12:05:28 -07:00
Warner Losh	2a2682ee53	nvme: Add SMART WARNING for persistent memory region NVME 2.0 added persistent memory regions, and this bit reports critical warnings / errors with those regions. Sponsored by: Netflix Reviewed by: mav Differential Revision: https://reviews.freebsd.org/D44213	2024-03-06 18:38:59 -07:00
Warner Losh	5cdedf676d	nvme: Log reset success or failure to devd We're logging when we start a reset, but not when we complete it, nor the result. Now log a success or timed_out event for the reset. Currently, the only detectable error we have from reset is 'failure to become ready in time,' though the code looks like it might be more generic. Log this and if we ever have other failure modes, change the logging to devd when that happens. Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D44211	2024-03-06 18:38:59 -07:00
Warner Losh	4f817fcf6a	nvme: Change devctl events for the controller Change the devctl events slightly for the controller. SMART errors will log the changed bits in the NVME SMART Critical Warning State as its event. Reset will now emit 'event=start'. Soon more. Sponsored by: Netflix Reviewed by: mav Differential Revision: https://reviews.freebsd.org/D44210	2024-03-06 18:38:59 -07:00
Warner Losh	fc3afe9395	nvme: split devctl out to its own function Split the devctl aspect of things out to its own function in nvme_ctrlr_devctl_log. In preparing to document this, and based on actual use, we want something different for the SMART errors, so this will facilitate that. Sponsored by: Netflix Reviewed by: chuck, mav Differential Revision: https://reviews.freebsd.org/D44209	2024-03-06 18:38:59 -07:00
Warner Losh	c5246cb7b0	nvme: Report only the unknown bits When we get a smart error that's unknown, report only the unknown (reserved) bits of the Critical Warning Bitfield. Sponsored by: Netflix	2024-03-01 16:04:27 -07:00
John Baldwin	7485926e09	nvme: Firmware revisions in the firmware slot info logpage are ASCII strings In particular, don't try to byteswap the values as 64-bit integers and always print a non-empty version as a string. Reviewed by: chuck, imp Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D44121	2024-03-01 14:18:43 -08:00
John Baldwin	5650bd3fe8	nvme: Use the NVMEF macro to construct fields Reviewed by: chuck, imp Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D43605	2024-01-29 11:01:13 -08:00
John Baldwin	3a477a9b70	nvme: Add NVMEF helper macro as the inverse of NVMEV This macro accepts a field name and a value for the field and constructs the shifted field value. Reviewed by: chuck Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D43604	2024-01-29 11:00:57 -08:00
John Baldwin	8488fc417f	nvme: Use the NVMEM macro instead of expanded versions A few of these omitted a shift of 0, but this is more consistent. Reviewed by: chuck Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D43602	2024-01-29 10:59:37 -08:00
John Baldwin	1dade1f255	nvme: Rename NVMEB helper macro to NVMEM The current macro always builds a full mask for a named field, so use the M suffix for mask. Reviewed by: chuck, imp Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D43601	2024-01-29 10:58:28 -08:00
John Baldwin	479680f235	nvme: Use the NVMEV macro instead of expanded versions Reviewed by: chuck Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D43595	2024-01-29 10:30:54 -08:00
Alexander Motin	b46c7b1ed4	nvme: Add some bits from NVMe 2.0c spec. MFC after: 1 week	2023-12-27 13:50:54 -05:00
Mark Johnston	d9b7301bb7	nvme: Initialize HMB entries before loading them into the controller struct nvme_hmb_desc contains a pad field which was not getting initialized before being synced. This doesn't have much consequence but triggers a report from KMSAN, which verifies that host-filled DMA memory is initialized before it is made visible to the device. So, let's just initialize it properly. Reported by: KMSAN Reviewed by: mav, imp MFC after: 1 week Sponsored by: Klara, Inc. Sponsored by: Juniper Networks, Inc. Differential Revision: https://reviews.freebsd.org/D43090	2023-12-18 17:45:24 -05:00
Warner Losh	fdafd315ad	sys: Automated cleanup of cdefs and other formatting Apply the following automated changes to try to eliminate no-longer-needed sys/cdefs.h includes as well as now-empty blank lines in a row. Remove /^#if.*\n#endif.*\n#include\s+<sys/cdefs.h>.*\n/ Remove /\n+#include\s+<sys/cdefs.h>.*\n+#if.*\n#endif.*\n+/ Remove /\n+#if.*\n#endif.*\n+/ Remove /^#if.*\n#endif.*\n/ Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/types.h>/ Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/param.h>/ Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/capsicum.h>/ Sponsored by: Netflix	2023-11-26 22:24:00 -07:00
Warner Losh	34a6ad848f	nvme: Don't use version to listen for events for ns and fw changes Instead, use the attribute bits from the identification data to determine if we should listen to namespace changes and firmware activation. Should have no functional change, though we may stop listening for events that will never happen. Sponsored by: Netflix	2023-11-17 21:25:57 -07:00
Warner Losh	fd9a4a67d0	cam: Minor opt_cam.h cleanup sys/cam/cam.h includes opt_cam.h, so none of the clients need to do this. cam.h does all the right dancing to conditionally include opt_cam.h only when it makes sense. It generally only matters when cam_debug.h is included (it must be included before that). Many of the stray opt_cam.h includes were after cam_debug.h which would be a problem were it not included in cam/cam.h. The other users of CAM options that aren't debug all already include cam/cam.h. Also trim unneeded sys/cdefs.h files from the files touched. Sponsored by: Netflix	2023-11-06 10:47:15 -07:00
Alexander Motin	8d6c0743e3	nvme: Introduce longer timeouts for admin queue KIOXIA CD8 SSDs routinely take ~25 seconds to delete non-empty namespace. In some cases like hot-plug it takes longer, triggering timeout and controller resets after just 30 seconds. Linux for many years has separate 60 seconds timeout for admin queue. This patch does the same. And it is good to be consistent. Sponsored by: iXsystems, Inc. Reviewed by: imp MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D42454	2023-11-06 11:05:48 -05:00
Warner Losh	afc3d49b17	nvme: Close a race in destroying qpair and timeouts While we should have cleared all the pending I/O prior to calling nvme_qpair_destroy, which should ensure that if the callout_drain causes a call to nvme_qpair_timeout(), it won't schedule any new timeout. However, it doesn't hurt to set timeout_pending to false in nvme_qpair_destroy() and have nvme_qpair_timeout() exit early if it sees it w/o scheduling a timeout. Since we don't otherwise stop the timeout until we're about to destroy the qpair, this ensures we fail safe. The lock/unlock also ensures the callout_drain will either remove the callout, or wait for it to run with the early bailout. We can likely further improve this by using callout_stop() inside the pending lock. I'll investigate that for future refinement. Sponsored by: Netflix Suggestions by: jhb Reviewed by: gallatin Differential Revision: https://reviews.freebsd.org/D42065	2023-10-10 16:13:57 -06:00
Warner Losh	9cd7b62473	nvme: Eliminate RECOVERY_FAILED state While it seemed like a good idea to have this state, we can do everything we wanted with the state by checking ctrlr->is_failed since that's set before we start failing the qpairs. Add some comments about racing when we're failing the controller, though in practice I'm not sure that kind of race could even be lost. Sponsored by: Netflix Reviewed by: chuck, gallatin, jhb Differential Revision: https://reviews.freebsd.org/D42051	2023-10-10 16:13:57 -06:00
Warner Losh	6b2a6e9cb0	nvme: Remove stale comment After `da8324a925`, the pre/post hooks are gone. So remove a comment about why we don't call them in this case. Sponsored by: Netflix Reviewed by: chuck, jhb Differential Revision: https://reviews.freebsd.org/D42050	2023-10-10 16:13:56 -06:00
Warner Losh	4026128983	nvme: Really remove NVME_2X_RESET `da8324a925` removed one of the two instances of NVME_2X_RESET. It failed to snag the other one, and remove it from the options file. Remove from both of those here. Sponsored by: Netflix Reviewed by: chuck, gallatin, jhb Differential Revision: https://reviews.freebsd.org/D42049	2023-10-10 16:13:56 -06:00
Warner Losh	bc85cd303c	nvme: gc nvme_ctrlr_post_failed_request and related task stuff In `4b977e6dda` we removed the call to nvme_ctrlr_post_failed_request because we can now directly fail requests in this context since we're in the reset task already. No need to queue it. I left it in place against future need, but it's been two years and no panics have resulted. Since the static analysis (code checking) and the dynamic analysis (surviving in the field for 2 years, including at $WORK where we know we've gone through this path when we've failed drives) both signal that it's not really needed, go ahead and GC it. If we discover at a later date a flaw in this analysis, we can add it back easily enough by reverting this and `4b977e6dda`. Sponsored by: Netflix Reviewed by: chuck, gallatin, jhb Differential Revision: https://reviews.freebsd.org/D42048	2023-10-10 16:13:56 -06:00
David Sloan	7ea866eb14	nvme: Fix memory leak in pt ioctl commands When running nvme passthrough commands through the ioctl interface memory is mapped with vmapbuf() but not unmapped. This results in leaked memory whenever a process executes an nvme passthrough command with a data buffer. This can be replicated with a simple C function (error checks skipped for brevity): void leak_memory(int nvme_ns_fd, uint16_t nblocks) { struct nvme_pt_command pt = { .cmd = { .opc = NVME_OPC_READ, .cdw12 = nblocks - 1, }, .len = nblocks * 512, // Assumes devices with 512 byte lba .is_read = 1, // Reads and writes should both trigger leak } void *buf; posix_memalign(&buf, nblocks * 512); pt.buf = buf; ioctl(nvme_ns_fd, NVME_PASSTHROUGH_COMMAND, &pt); free(buf); } Signed-off-by: David Sloan <david.sloan@eideticom.com> PR: 273626 Reviewed by: imp, markj MFC after: 1 week	2023-10-02 11:50:14 -04:00
Warner Losh	1d6021cd72	nvme: Suppress noise messages When we're suspending, we get messages about waiting for the controller to reset. These are in error: we're not waiting for it to reset. We put the recovery state as part of suspending, so we should suppress these as a false positive. Also remove a stray debug that's left over from earlier versions of the recovery code that no longer makes sense. Sponsored by: Netflix	2023-09-25 22:21:58 -06:00
Warner Losh	da8324a925	nvme: Fix locking protocol violation to fix suspend / resume Currently, when we suspend, we need to tear down all the qpairs. We call nvme_admin_qpair_abort_aers with the admin qpair lock held, but the tracker it will call for the pending AER also locks it (recursively) hitting an assert. This routine is called without the qpair lock held when we destroy the device entirely in a number of places. Add an assert to this effect and drop the qpair lock before calling it. nvme_admin_qpair_abort_aers then locks the qpair lock to traverse the list, dropping it around calls to nvme_qpair_complete_tracker, and restarting the list scan after picking it back up. Note: If interrupts are still running, there's a tiny window for these AERs: If one fires just an instant after we manually complete it, then we'll be fine: we set the state of the queue to 'waiting' and we ignore interrupts while 'waiting'. We know we'll destroy all the queue state with these pending interrupts before looking at them again and we know all the TRs will have been completed or rescheduled. So either way we're covered. Also, tidy up the failure case as well: failing a queue is a superset of disabling it, so no need to call disable first. This solves some locking issues with recursion since we don't need to recurse. Set the qpair state of failed queues to RECOVERY_FAILED and stop scheduling the watchdog. Assert we're not failed when we're enabling a qpair, since failure currently is one-way. Make failure a little less verbose. Next, kill the pre/post reset stuff. It's completely bogus since we disable the qpairs, we don't need to also hold the lock through the reset: disabling will cause the ISR to return early. This keeps us from recursing on the recovery lock when resuming. We only need the recovery lock to avoid a specific race between the timer and the ISR. Finally, kill NVME_RESET_2X. It's been a major release since we put it in and nobody has used it as far as I can tell. And it was a motivator for the pre/post uglification. These are all interrelated, so need to be done at the same time. Sponsored by: Netflix Reviewed by: jhb Tested by: jhb (made sure suspend / resume worked) MFC After: 3 days Differential Revision: https://reviews.freebsd.org/D41866	2023-09-24 07:17:18 -06:00


433 commits
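
The NVMEV / NVMEM / NVMEF commits listed above (479680f235, 1dade1f255, 3a477a9b70) describe three helper macros: one extracts a named field from a value, one builds the full mask for a field, and one constructs the shifted field value. A minimal sketch of how such macros could fit together is shown below. It assumes each field is described by a `<name>_SHIFT` / `<name>_MASK` pair; the macro bodies are an illustration and may differ from the definitions in the tree, and `EXAMPLE_FIELD` and `example_roundtrip_check` are hypothetical names that do not appear in the source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 4-bit field occupying bits 8..11 of a 32-bit word. */
#define EXAMPLE_FIELD_SHIFT	8
#define EXAMPLE_FIELD_MASK	0xFu

/* Sketch only: extract a named field from a value. */
#define NVMEV(name, data)	(((data) >> name##_SHIFT) & name##_MASK)
/* Sketch only: build the full in-place mask for a named field. */
#define NVMEM(name)		(name##_MASK << name##_SHIFT)
/* Sketch only: construct the shifted field value from a raw value. */
#define NVMEF(name, val)	(((val) & name##_MASK) << name##_SHIFT)

static inline void
example_roundtrip_check(void)
{
	uint32_t word = 0xABCD1234u;

	/* Clear the field in place, then install the value 0x5. */
	word = (word & ~NVMEM(EXAMPLE_FIELD)) | NVMEF(EXAMPLE_FIELD, 0x5u);

	/* Reading the field back yields what was written. */
	assert(NVMEV(EXAMPLE_FIELD, word) == 0x5u);
	/* Bits outside the field are untouched. */
	assert((word & ~NVMEM(EXAMPLE_FIELD)) == 0xABCD1034u);
}
```

Keeping the shift and mask behind a single field name means a caller cannot pair the wrong shift with the wrong mask, which is presumably why the commits replace the hand-expanded versions.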