nvmf_tcp: Add a TCP transport for NVMe over Fabrics

Structurally this is very similar to the TCP transport for iSCSI
(icl_soft.c).  One key difference is that NVMeoF transports use a more
abstract interface working with NVMe commands rather than transport
PDUs.  Thus, the data transfer for a given command is managed entirely
in the transport backend.
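What follows is a minimal userland C sketch of what it means for the backend to own the data transfer. The upper layer passes in a whole command, and the transport alone decides how the data moves. Every name in it is hypothetical and does not reproduce the real nvmf transport interface: struct capsule, struct transport_ops, struct tcp_qpair, and tcp_transmit_capsule. The sketch assumes the NVMe over Fabrics sizing rule: a command capsule is IOCCSZ * 16 bytes, and the first 64 of those bytes hold the submission queue entry. Host-to-controller data that fits in the remainder can travel in-capsule. Anything larger has to be solicited by the controller with R2T.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical command capsule; not the structure used in sys/dev/nvmf. */
struct capsule {
	uint8_t	opcode;		/* NVMe command opcode */
	size_t	data_len;	/* bytes the host must send to the controller */
};

/* How the backend chose to move the data for one command. */
enum xfer_method { XFER_NONE, XFER_IN_CAPSULE, XFER_R2T };

/*
 * Hypothetical operations table.  The upper layer hands over a whole
 * command; it never sees PDUs or decides how data is transferred.
 */
struct transport_ops {
	const char *name;
	enum xfer_method (*transmit_capsule)(void *qp, const struct capsule *nc);
};

/* Hypothetical per-queue-pair state for the TCP backend. */
struct tcp_qpair {
	uint32_t ioccsz;	/* I/O command capsule size, in 16-byte units */
};

static enum xfer_method
tcp_transmit_capsule(void *arg, const struct capsule *nc)
{
	const struct tcp_qpair *qp = arg;
	size_t max_icd;

	/* A capsule always holds the 64-byte SQE; the rest may carry data. */
	assert(qp->ioccsz >= 4);
	max_icd = (size_t)qp->ioccsz * 16 - 64;

	if (nc->data_len == 0)
		return (XFER_NONE);
	if (nc->data_len <= max_icd)
		return (XFER_IN_CAPSULE);	/* data travels in the CapsuleCmd PDU */
	return (XFER_R2T);			/* controller solicits data with R2T */
}

static const struct transport_ops tcp_ops = {
	.name = "tcp",
	.transmit_capsule = tcp_transmit_capsule,
};
```

With ioccsz = 8 (a 128-byte capsule), a 64-byte write fits in-capsule, while a 65-byte write falls back to R2T. The caller sees only the table and never observes the difference.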

Similar to icl_soft.c, separate kthreads are used to handle transmit
and receive for each queue pair.  On the transmit side, when a capsule
is transmitted by an upper layer, it is placed on a queue for
processing by the transmit thread.  The transmit thread converts
command and response capsules into suitable TCP PDUs, each described
by an mbuf chain that is then queued to the backing socket's send
buffer.  Command capsules can also embed data (in-capsule data) along
with the NVMe command.
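The transmit path is a producer/consumer queue: the upper layer enqueues capsules, and a dedicated thread drains them in order. The sketch below is a userland analogue built on POSIX threads. It assumes the kernel version uses kernel threads and kernel locking primitives instead. All of its names are hypothetical: struct txcap, struct txqueue, txq_enqueue, and tx_thread. It also records capsule ids rather than building real PDUs or mbuf chains.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical queued capsule; only an id stands in for the NVMe data. */
struct txcap {
	int id;
	struct txcap *next;
};

/* Hypothetical per-queue-pair transmit state. */
struct txqueue {
	pthread_mutex_t lock;
	pthread_cond_t cv;
	struct txcap *head, **tailp;	/* FIFO of pending capsules */
	bool shutdown;
	int sent[16];			/* ids in the order they were "sent" */
	int nsent;			/* updated only by the tx thread once running */
};

static void
txq_init(struct txqueue *q)
{
	pthread_mutex_init(&q->lock, NULL);
	pthread_cond_init(&q->cv, NULL);
	q->head = NULL;
	q->tailp = &q->head;
	q->shutdown = false;
	q->nsent = 0;
}

/* Upper layer: append a capsule and wake the transmit thread. */
static void
txq_enqueue(struct txqueue *q, struct txcap *c)
{
	c->next = NULL;
	pthread_mutex_lock(&q->lock);
	*q->tailp = c;
	q->tailp = &c->next;
	pthread_cond_signal(&q->cv);
	pthread_mutex_unlock(&q->lock);
}

/* Ask the transmit thread to exit once the queue is drained. */
static void
txq_shutdown(struct txqueue *q)
{
	pthread_mutex_lock(&q->lock);
	q->shutdown = true;
	pthread_cond_signal(&q->cv);
	pthread_mutex_unlock(&q->lock);
}

/* Transmit thread: pop capsules in FIFO order and "convert" each one. */
static void *
tx_thread(void *arg)
{
	struct txqueue *q = arg;
	struct txcap *c;

	pthread_mutex_lock(&q->lock);
	for (;;) {
		while (q->head == NULL && !q->shutdown)
			pthread_cond_wait(&q->cv, &q->lock);
		if (q->head == NULL)
			break;			/* shut down with nothing pending */
		c = q->head;
		q->head = c->next;
		if (q->head == NULL)
			q->tailp = &q->head;
		pthread_mutex_unlock(&q->lock);

		/* Stand-in for building a PDU and appending it to the socket. */
		assert(q->nsent < 16);
		q->sent[q->nsent++] = c->id;

		pthread_mutex_lock(&q->lock);
	}
	pthread_mutex_unlock(&q->lock);
	return (NULL);
}
```

The lock is dropped while a capsule is converted. This lets the upper layer keep enqueueing while the thread does the comparatively slow work of building and queueing a PDU.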

On the receive side, a socket upcall notifies the receive kthread when
more data arrives.  Once enough data has arrived for a PDU, the PDU is
handled synchronously in the kthread.  Data-related PDUs such as R2T,
H2C_DATA, and C2H_DATA are handled internally, with callbacks invoked
if a data transfer encounters an error or once it has completed.
Received capsule PDUs invoke the upper layer's capsule_received
callback.
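"Enough data has arrived for a PDU" can be decided from the 8-byte NVMe/TCP common header. That header holds the PDU type, flags, HLEN, PDO, and a little-endian 4-byte PLEN giving the length of the whole PDU. The sketch below encodes that check, along with the internal-versus-upper-layer split described above. The PDU type codes follow the NVMe/TCP transport specification. rx_classify and enum rx_action are hypothetical. Digests, ICReq/ICResp, and termination PDUs are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* PDU type codes from the NVMe/TCP transport specification. */
enum pdu_type {
	PDU_IC_REQ = 0x00,	 PDU_IC_RESP = 0x01,
	PDU_H2C_TERM_REQ = 0x02, PDU_C2H_TERM_REQ = 0x03,
	PDU_CAPSULE_CMD = 0x04,	 PDU_CAPSULE_RESP = 0x05,
	PDU_H2C_DATA = 0x06,	 PDU_C2H_DATA = 0x07,
	PDU_R2T = 0x09,
};

/* Common header: PDU-type, FLAGS, HLEN, PDO, then a 4-byte PLEN. */
#define PDU_CH_LEN	8

/* Hypothetical dispatch decisions for the receive thread. */
enum rx_action { RX_NEED_MORE, RX_INTERNAL, RX_UPPER_LAYER, RX_ERROR };

/* PLEN is little-endian and counts the entire PDU, header included. */
static uint32_t
pdu_plen(const uint8_t *ch)
{
	return ((uint32_t)ch[4] | (uint32_t)ch[5] << 8 |
	    (uint32_t)ch[6] << 16 | (uint32_t)ch[7] << 24);
}

static enum rx_action
rx_classify(const uint8_t *buf, size_t avail)
{
	uint32_t plen;

	if (avail < PDU_CH_LEN)
		return (RX_NEED_MORE);		/* header still incomplete */
	plen = pdu_plen(buf);
	if (plen < PDU_CH_LEN || plen < buf[2])
		return (RX_ERROR);		/* PLEN shorter than the header */
	if (avail < plen)
		return (RX_NEED_MORE);		/* sleep until the next upcall */

	switch (buf[0]) {
	case PDU_CAPSULE_CMD:
	case PDU_CAPSULE_RESP:
		return (RX_UPPER_LAYER);	/* hand to capsule_received */
	case PDU_H2C_DATA:
	case PDU_C2H_DATA:
	case PDU_R2T:
		return (RX_INTERNAL);		/* consumed by the transport */
	default:
		return (RX_ERROR);		/* IC/TermReq handling omitted */
	}
}
```

A 24-byte CapsuleResp, for example, stays in RX_NEED_MORE until all 24 bytes are buffered, at which point it is passed up to the capsule_received path.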

struct nvmf_tcp_command_buffer manages a TCP command buffer for data
transfers that do not use in-capsule data, as described in the NVMeoF
spec.  Data-related PDUs such as R2T, C2H_DATA, and H2C_DATA are
associated with a command buffer, except in the case of the
send_controller_data transport method, which simply constructs one or
more C2H_DATA PDUs from the caller's mbuf chain.
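Building "one or more C2H PDUs" means cutting the payload into pieces no larger than the kern.nvmf.tcp.max_c2hdata limit documented in the nvmf_tcp(4) page added by this commit, whose default is 256 kilobytes. The sketch below shows only the offset and length arithmetic. It works on plain byte offsets rather than an mbuf chain. split_c2h and struct c2h_chunk are hypothetical, and the `last` field marks the final chunk of one call. It is an assumption, not something this commit shows, that the driver sets the LAST_PDU flag exactly there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for one C2H_DATA PDU. */
struct c2h_chunk {
	uint32_t datao;	/* DATAO: offset of this chunk in the command's data */
	uint32_t datal;	/* DATAL: payload bytes carried by this PDU */
	bool last;	/* final chunk produced by this call (assumed LAST_PDU) */
};

/*
 * Split `len` bytes starting at `data_offset` into chunks of at most
 * `max_c2hdata` bytes.  Returns the number of chunks written to `out`,
 * or 0 if `len` is 0 or `out` has fewer than the needed `nout` slots.
 */
static size_t
split_c2h(uint32_t data_offset, uint32_t len, uint32_t max_c2hdata,
    struct c2h_chunk *out, size_t nout)
{
	size_t n = 0;

	assert(max_c2hdata > 0);
	while (len > 0) {
		uint32_t todo = len < max_c2hdata ? len : max_c2hdata;

		if (n == nout)
			return (0);		/* caller's array is too small */
		out[n].datao = data_offset;
		out[n].datal = todo;
		out[n].last = (todo == len);
		data_offset += todo;
		len -= todo;
		n++;
	}
	return (n);
}
```

With the default limit, a 600 KiB read is sent as two full 256 KiB PDUs followed by an 88 KiB PDU. The DATAO of each piece advances by the size of the previous one.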

Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D44712

Author: John Baldwin, 2024-05-02 16:28:47 -07:00
Parent: aa1207ea4f
Commit: 59144db3fc
7 changed files with 1937 additions and 1 deletion


@@ -408,6 +408,7 @@ MAN= aac.4 \
 nvd.4 \
 ${_nvdimm.4} \
 nvme.4 \
+nvmf_tcp.4 \
 ${_nvram.4} \
 oce.4 \
 ocs_fc.4\

share/man/man4/nvmf_tcp.4 Normal file

@@ -0,0 +1,57 @@
.\"
.\" SPDX-License-Identifier: BSD-2-Clause
.\"
.\" Copyright (c) 2024 Chelsio Communications, Inc.
.\"
.Dd May 2, 2024
.Dt NVMF_TCP 4
.Os
.Sh NAME
.Nm nvmf_tcp
.Nd "TCP transport for NVM Express over Fabrics"
.Sh SYNOPSIS
To compile the module into the kernel,
place the following line in the
kernel configuration file:
.Bd -ragged -offset indent
.Cd "device nvmf_tcp"
.Ed
.Pp
Alternatively, to load the
module at boot time, place the following line in
.Xr loader.conf 5 :
.Bd -literal -offset indent
nvmf_tcp_load="YES"
.Ed
.Sh DESCRIPTION
The
.Nm
module implements the software TCP/IP transport for NVM Express over Fabrics.
It can be used by either the in-kernel NVMeoF host driver or controller.
.Sh SYSCTL VARIABLES
The following variables are available as both
.Xr sysctl 8
variables and
.Xr loader 8
tunables:
.Bl -tag -width indent
.It Va kern.nvmf.tcp.max_c2hdata
The maximum data payload size of a
.Va C2H_DATA
PDU sent by the controller to a remote host.
The default size is 256 kilobytes.
.El
.Sh SEE ALSO
.Xr nvmf 4 ,
.Xr nvmft 4
.Sh HISTORY
The
.Nm
module first appeared in
.Fx 15.0 .
.Sh AUTHORS
The
.Nm
module was developed by
.An John Baldwin Aq Mt jhb@FreeBSD.org
under sponsorship from Chelsio Communications, Inc.


@@ -1676,11 +1676,13 @@ device mrsas # LSI/Avago MegaRAID SAS/SATA, 6Gb/s and 12Gb/s
 # NVM Express
 #
 # nvme: PCI-express NVM Express host controllers
+# nvmf_tcp: TCP transport for NVM Express over Fabrics
 # nda: CAM NVMe disk driver
 # nvd: non-CAM NVMe disk driver
 device nvme # base NVMe driver
 options NVME_USE_NVD=1 # Use nvd(4) instead of the CAM nda(4) driver
+device nvmf_tcp # NVMeoF TCP transport
 device nda # NVMe direct access devices (aka disks)
 device nvd # expose NVMe namespaces as disks, depends on nvme


@@ -2533,6 +2533,7 @@ dev/nvme/nvme_test.c optional nvme
 dev/nvme/nvme_util.c optional nvme
 dev/nvmem/nvmem.c optional nvmem fdt
 dev/nvmem/nvmem_if.m optional nvmem
+dev/nvmf/nvmf_tcp.c optional nvmf_tcp
 dev/oce/oce_hw.c optional oce pci
 dev/oce/oce_if.c optional oce pci
 dev/oce/oce_mbox.c optional oce pci

sys/dev/nvmf/nvmf_tcp.c Normal file, 1867 additions (diff suppressed because it is too large)


@@ -1,3 +1,4 @@
-SUBDIR= nvmf_transport
+SUBDIR= nvmf_tcp \
+	nvmf_transport
 .include <bsd.subdir.mk>


@@ -0,0 +1,7 @@
.PATH: ${SRCTOP}/sys/dev/nvmf
KMOD= nvmf_tcp
SRCS= nvmf_tcp.c
.include <bsd.kmod.mk>