This includes static inline functions to serve as getters/setters for
fields shared between SCSI and NVMe I/O requests to manage data
buffers.
Reviewed by: ken, imp
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D44717
Currently, this pattern is commonly used to assert that a union ctl_io
is a SCSI request. In the future it will be used to assert other
types.
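A minimal sketch of the assertion pattern, assuming a tagged union whose
members all begin with a common header. The names IO_ASSERT, union io,
and the IO_* constants are hypothetical and are not taken from CTL:

```c
#include <assert.h>

/* Hypothetical request types; the real CTL constants may differ. */
enum io_type { IO_SCSI, IO_NVME };

struct io_hdr { enum io_type io_type; };
struct scsi_io { struct io_hdr io_hdr; int tag; };
struct nvme_io { struct io_hdr io_hdr; int cid; };

union io {
	struct io_hdr io_hdr;
	struct scsi_io scsiio;
	struct nvme_io nvmeio;
};

/* Assert that a request has the expected type before using a member. */
#define	IO_ASSERT(io, type)	assert((io)->io_hdr.io_type == IO_##type)

static int
scsi_tag(union io *io)
{
	IO_ASSERT(io, SCSI);
	return (io->scsiio.tag);
}
```

Wrapping the check in one macro means a later assertion for another
request type only needs a different second argument.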
Suggested by: imp
Reviewed by: imp
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D44844
Change the first argument of ctl_scsi_path_string to be the embedded
header structure instead of the union. Currently union ctl_io and
struct ctl_scsiio have the same alignment, but this changes on i386 if
a new union member is added that contains a uint64_t member (such as
an embedded struct nvme_command for NVMeoF). In that case, union
ctl_io requires stronger alignment, so the upcast from struct
ctl_scsiio to union ctl_io in ctl_scsi_sense_sbuf raises an increasing
alignment warning on i386.
Avoid the warning by passing struct ctl_io_hdr as the first argument
to ctl_scsi_path_string instead.
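The shape of the change can be sketched as follows. The types and
function names here are simplified stand-ins (assumptions), not the CTL
definitions; the point is only that the callee takes the embedded header,
so the caller never casts up to the more strictly aligned union:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

struct io_hdr { uint32_t nexus_id; };
struct scsi_io { struct io_hdr io_hdr; uint8_t cdb[16]; };

/* Taking the header avoids casting struct scsi_io * to the union type. */
static int
path_string(const struct io_hdr *hdr, char *buf, size_t len)
{
	assert(hdr != NULL && buf != NULL);
	return (snprintf(buf, len, "(%u): ", (unsigned)hdr->nexus_id));
}

static int
sense_string(const struct scsi_io *ctsio, char *buf, size_t len)
{
	/* No upcast: pass the embedded header directly. */
	return (path_string(&ctsio->io_hdr, buf, len));
}
```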
Reviewed by: imp
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D44716
- discover: Connects to a remote Discovery controller, fetches its
Discovery Log Page, and enumerates the remote controllers described
in the log page.
The -v option can be used to display the Identify Controller data
structure for the Discovery controller. This is only really useful
for debugging.
- connect: Connects to a remote I/O controller and establishes an
association of an admin queue and a single I/O queue. The
association is handed off to the in-kernel host to create a new
nvmeX device.
- connect-all: Connects to a Discovery controller and attempts to
create an association with each I/O controller enumerated in the
Discovery controller's Discovery Log Page.
- reconnect: Establishes a new association with a remote I/O
controller for an existing nvmeX device. This can be used to
restore access to a remote I/O controller after the loss of a prior
association due to a transport error, controller reboot, etc.
- disconnect: Deletes one or more nvmeX devices after detaching their
namespaces and terminating any active associations. The devices to
delete can be identified by either an nvmeX device name or the NQN of
the remote controller.
- disconnect-all: Deletes all active associations with remote
controllers.
Reviewed by: imp
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D44715
This is the client (initiator in SCSI terms) for NVMe over Fabrics.
Userland is responsible for creating a set of queue pairs and then
handing them off via an ioctl to this driver, e.g. via the 'connect'
command from nvmecontrol(8). An nvmeX new-bus device is created
at the top level to represent the remote controller, similar to PCI
nvmeX devices for PCI-express controllers.
As with nvme(4), namespace devices named /dev/nvmeXnsY are created and
passthrough commands can be submitted to either the namespace devices
or the controller device. For example, 'nvmecontrol identify nvmeX'
works for a remote Fabrics controller the same as for a PCI-express
controller.
nvmf exports remote namespaces via nda(4) devices using the new NVMF
CAM transport. nvmf does not support nvd(4), only nda(4).
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D44714
Structurally this is very similar to the TCP transport for iSCSI
(icl_soft.c). One key difference is that NVMeoF transports use a more
abstract interface working with NVMe commands rather than transport
PDUs. Thus, the data transfer for a given command is managed entirely
in the transport backend.
Similar to icl_soft.c, separate kthreads are used to handle transmit
and receive for each queue pair. On the transmit side, when a capsule
is transmitted by an upper layer, it is placed on a queue for
processing by the transmit thread. The transmit thread converts
command and response capsules into suitable TCP PDUs where each PDU is
described by an mbuf chain that is then queued to the backing socket's
send buffer. Command capsules can embed data along with the NVMe
command.
On the receive side, a socket upcall notifies the receive kthread when
more data arrives. Once enough data has arrived for a PDU, the PDU is
handled synchronously in the kthread. PDUs such as R2T or
data-related PDUs are handled internally, with callbacks invoked if a data
transfer encounters an error, or once the data transfer has completed.
Received capsule PDUs invoke the upper layer's capsule_received
callback.
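The "enough data has arrived for a PDU" test can be sketched as below.
This assumes the NVMe/TCP common header layout (8 bytes, with the total
PDU length PLEN stored little-endian in bytes 4..7); the function names
are hypothetical and this is not the driver's code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	PDU_CH_LEN	8	/* size of the NVMe/TCP common header */

/* Decode the little-endian PLEN field at bytes 4..7 of the header. */
static uint32_t
pdu_plen(const uint8_t *ch)
{
	return ((uint32_t)ch[4] | (uint32_t)ch[5] << 8 |
	    (uint32_t)ch[6] << 16 | (uint32_t)ch[7] << 24);
}

/*
 * Return the length of the first complete PDU in buf, or 0 if more
 * data is needed.  A real receiver would treat a PLEN smaller than
 * the common header as a protocol error instead of returning 0.
 */
static size_t
pdu_ready(const uint8_t *buf, size_t avail)
{
	uint32_t plen;

	assert(buf != NULL);
	if (avail < PDU_CH_LEN)
		return (0);
	plen = pdu_plen(buf);
	if (plen < PDU_CH_LEN || avail < plen)
		return (0);
	return (plen);
}
```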
struct nvmf_tcp_command_buffer manages a TCP command buffer for data
transfers that do not use in-capsule data as described in the NVMeoF
spec. Data-related PDUs such as R2T, C2H, and H2C are associated with
a command buffer except in the case of the send_controller_data
transport method which simply constructs one or more C2H PDUs from the
caller's mbuf chain.
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D44712
nvmf_transport.ko provides routines for managing NVMeoF queue pairs
and capsules. It provides a glue layer between transports (such as
TCP or RDMA) and an NVMeoF host (initiator) and controller (target).
Unlike the synchronous API exposed to the host and controller by
libnvmf, the kernel's transport layer uses an asynchronous API built
on callbacks. Upper layers provide callbacks on queue pairs that are
invoked for transport errors (error_cb) or anytime a capsule is
received (receive_cb).
Data transfers for a command are usually associated with a callback
that is invoked once a transfer has finished either due to an error
or successful completion.
For an upper layer that is a host, command capsules are allocated and
populated with an NVMe SQE by calling nvmf_allocate_command. A data
buffer (described by a struct memdesc) can be associated with a
command capsule before it is transmitted via nvmf_capsule_append_data.
This function accepts a direction (send vs receive) as well as the
data transfer callback. The host then transmits the command via
nvmf_transmit_capsule. The host must ensure that the data buffer
described by the 'struct memdesc' remains valid until the data
transfer callback is called. The queue pair's receive_cb callback
should match received response capsules up with previously transmitted
commands.
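One way a host's receive callback might match responses to commands is a
table indexed by command ID (CID). This is a self-contained sketch with
hypothetical names; it does not use the nvmf_transport API, and a real
host would also need locking:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	MAX_CMDS	4

typedef void cmd_done_t(void *arg, uint16_t status);

struct pending_cmd {
	cmd_done_t *done;	/* NULL when the slot is free */
	void *arg;
};

static struct pending_cmd pending[MAX_CMDS];

/* Record a command before transmit; returns its CID or -1 if full. */
static int
cmd_submit(cmd_done_t *done, void *arg)
{
	assert(done != NULL);
	for (int cid = 0; cid < MAX_CMDS; cid++) {
		if (pending[cid].done == NULL) {
			pending[cid].done = done;
			pending[cid].arg = arg;
			return (cid);
		}
	}
	return (-1);
}

/* Called from the receive callback with the CID and status of a CQE. */
static int
cmd_complete(uint16_t cid, uint16_t status)
{
	cmd_done_t *done;

	if (cid >= MAX_CMDS || pending[cid].done == NULL)
		return (-1);	/* unknown or duplicate completion */
	done = pending[cid].done;
	pending[cid].done = NULL;
	done(pending[cid].arg, status);
	return (0);
}

/* Example completion handler: stores the status in *arg. */
static void
store_status(void *arg, uint16_t status)
{
	*(uint16_t *)arg = status;
}
```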
For the controller, incoming commands are received via the queue
pair's receive_cb callback. nvmf_receive_controller_data is used to
retrieve any data from a command (e.g. the data for a WRITE command).
It can be called multiple times to split the data transfer into
smaller sizes. This function accepts an I/O completion callback that
is invoked once the data transfer has completed.
nvmf_send_controller_data is used to send data to a remote host in
response to a command. In this case a callback function is not used
but the status is returned synchronously. Finally, the controller can
allocate a response capsule via nvmf_allocate_response populated with
a supplied CQE and send the response via nvmf_transmit_capsule.
Reviewed by: imp
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D44711
libnvmf provides APIs for transmitting and receiving Command and
Response capsules along with data associated with NVMe commands.
Capsules are represented by 'struct nvmf_capsule' objects.
Capsules are transmitted and received on queue pairs represented by
'struct nvmf_qpair' objects.
Queue pairs belong to an association represented by a 'struct
nvmf_association' object.
libnvmf provides additional helper APIs to assist with constructing
command capsules for a host, response capsules for a controller,
connecting queue pairs to a remote controller and optionally
offloading connected queues to an in-kernel host, accepting queue pair
connections from remote hosts and optionally offloading connected
queues to an in-kernel controller, constructing controller data
structures for local controllers, etc.
libnvmf also includes an internal transport abstraction as well as an
implementation of a userspace TCP transport.
libnvmf is primarily intended for ease of use and low-traffic use cases
such as establishing connections that are handed off to the kernel.
As such, it uses a simple API built on blocking I/O.
For a host, a consumer first populates a 'struct
nvmf_association_params' with a set of parameters shared by all queue
pairs for a single association such as whether or not to use SQ flow
control and header and data digests and creates a 'struct
nvmf_association' object. The consumer is responsible for
establishing a TCP socket for each queue pair. This socket is
included in the 'struct nvmf_qpair_params' passed to 'nvmf_connect' to
complete transport-specific negotiation, send a Fabrics Connect
command, and wait for the Connect reply. Upon success, a new 'struct
nvmf_qpair' object is returned. This queue pair can then be used to
send and receive capsules. A command capsule is allocated, populated
with an SQE and optional data buffer, and transmitted via
nvmf_host_transmit_command. The consumer can then wait for a reply
via nvmf_host_wait_for_response. The library also provides some
wrapper functions such as nvmf_read_property and nvmf_write_property
which send a command and wait for a response synchronously.
For a controller, a consumer uses a single association for a set of
incoming connections. A consumer can choose to use multiple
associations (e.g. a separate association for connections to a
discovery controller listening on a different port than I/O
controllers). The consumer is responsible for accepting TCP sockets
directly, but once a socket has been accepted it is passed to
nvmf_accept to perform transport-specific negotiation and wait for the
Connect command. Similar to nvmf_connect, nvmf_accept returns a newly
constructed nvmf_qpair. However, in contrast to nvmf_connect,
nvmf_accept does not complete the Fabrics negotiation. The consumer
must explicitly send a response capsule before waiting for additional
command capsules to arrive. In particular, in the kernel offload
case, the Connect command and data are provided to the kernel
controller and the Connect response capsule is sent by the kernel once
it is ready to handle the new queue pair.
For userspace controller command handling, the consumer uses
nvmf_controller_receive_capsule to wait for a command capsule.
nvmf_receive_controller_data is used to retrieve any data from a
command (e.g. the data for a WRITE command). It can be called
multiple times to split the data transfer into smaller sizes.
nvmf_send_controller_data is used to send data to a remote host in
response to a command. It also sends a response capsule indicating
success, or an error if an internal error occurs. nvmf_send_response
is used to send a response without associated data. There are also
several convenience wrappers such as nvmf_send_success and
nvmf_send_generic_error.
Reviewed by: imp
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D44710
This includes functions to validate NVMe Qualified Names, compute an
initial value of the CAP property, validate changes to the CC
property, and populate the Identify Controller data structure for an
I/O controller.
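As an illustration of the kind of check involved, here is a deliberately
simplified NQN test. It assumes only the 223-byte length limit and the
"nqn.yyyy-mm." prefix; it is not the validation routine added by this
change, and nqn_looks_valid is a hypothetical name:

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

#define	NQN_MAX_LEN	223	/* maximum NQN length in bytes */

/* Simplified check: "nqn.yyyy-mm." prefix and length limit only. */
static bool
nqn_looks_valid(const char *nqn)
{
	size_t len;

	assert(nqn != NULL);
	len = strlen(nqn);
	if (len > NQN_MAX_LEN || len < strlen("nqn.yyyy-mm.x"))
		return (false);
	if (strncmp(nqn, "nqn.", 4) != 0)
		return (false);
	/* Bytes 4..10 must look like "yyyy-mm". */
	for (int i = 4; i < 11; i++) {
		if (i == 8) {
			if (nqn[i] != '-')
				return (false);
		} else if (!isdigit((unsigned char)nqn[i]))
			return (false);
	}
	return (nqn[11] == '.');
}
```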
Reviewed by: imp
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D44709
- Helper macros for specific SGL types used with the TCP transport
- An inline function which validates various fields in TCP PDUs
Reviewed by: imp
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D44708
This defines structures, ioctl commands, and related constants used
for both the Fabrics host and controller.
Reviewed by: imp
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D44706
- Add opcode, command structure, and new error code for Disconnect
fabrics opcode.
- Add a generic struct nvmf_fabric_command.
- Add constants for special controller ID values.
- Add constants for the cattr field in the Connect command and the
default value for the kato field in the Connect command.
- Add constants for the offset of controller properties (Fabrics
version of controller registers).
Reviewed by: imp
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D44705
- Replace SPDK_STATIC_ASSERT with _Static_assert.
- Remove SPDK_ and spdk_ prefixes from types and constants.
- Switch to using FreeBSD headers, e.g. <dev/nvme/nvme.h> in place of
"spdk/nvme_spec.h".
- Add a definition of NVME_NQN_FIELD_SIZE (from SPDK's nvme_spec.h).
- Remove constant for the fabrics opcode as this is already present in
<dev/nvme/nvme.h>.
- Use types from <dev/nvme/nvme.h> for NVMe structures including
struct nvme_sgl_descriptor, struct nvme_command, and
struct nvme_completion.
- Use plain uint16_t in place of struct spdk_nvme_status.
Reviewed by: imp
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D44704
This is a copy of spdk/include/spdk/nvmf_spec.h as of commit
470e851852bb948334a272c9f8de495020fa082f from Intel's SPDK.
Subsequent commits will modify it to be a suitable header for the
kernel, but importing the stock file first makes it easier to see
how the resulting header is derived from the original.
Reviewed by: imp
Obtained from: SPDK (https://github.com/spdk/spdk.git)
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D44703
Document how latency buckets are actually computed: by default they
double from 20us to 10.485s, but they start at
kern.cam.iosched.bucket_base_us and increase by a ratio of
kern.cam.iosched.bucket_ratio / 100 from one to the next.
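The arithmetic can be sketched as below, assuming a base of 20us and a
ratio of 200 (that is, doubling) as the defaults implied above. The
function name is hypothetical and any rounding the kernel applies is not
modeled:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Boundary in microseconds of bucket n (0-based), given the base and
 * a ratio expressed in percent (200 means doubling).
 */
static uint64_t
bucket_bound_us(uint64_t base_us, unsigned ratio, unsigned n)
{
	uint64_t bound = base_us;

	assert(ratio > 100);
	for (unsigned i = 0; i < n; i++)
		bound = bound * ratio / 100;
	return (bound);
}
```

With these inputs, bucket 0 is 20us and bucket 19 is 20 * 2^19 =
10485760us, which matches the 10.485s upper end.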
Sponsored by: Netflix
nvmecontrol operates on devices. Allow a user to specify the /dev/
prefix if they want. Any device name that starts with / will be treated
as a full path for maximum flexibility.
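The rule can be sketched as follows. dev_path and DEV_PREFIX are
hypothetical names for illustration, not the nvmecontrol code:

```c
#include <assert.h>
#include <stdio.h>

#define	DEV_PREFIX	"/dev/"

/* Build the path to open: names starting with '/' are used verbatim. */
static int
dev_path(const char *name, char *buf, size_t len)
{
	assert(name != NULL && buf != NULL);
	if (name[0] == '/')
		return (snprintf(buf, len, "%s", name));
	return (snprintf(buf, len, "%s%s", DEV_PREFIX, name));
}
```

Under this sketch both "nvme0" and "/dev/nvme0" resolve to the same
path, while any other absolute path is passed through unchanged.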
Sponsored by: Netflix
Clear the list before returning so that sysctl_ctx_free() can be called
more than once on the same list without side effects. This simplifies
error handling in drivers; previously, drivers would have to be careful
to call sysctl_ctx_free() at most once to avoid a use-after-free.
While here, use TAILQ_FOREACH_SAFE in the loop which unregisters OIDs.
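The idea can be illustrated with a small stand-alone list. This is a
sketch with hypothetical types, using a hand-rolled list rather than
<sys/queue.h>; it is not the sysctl implementation:

```c
#include <assert.h>
#include <stdlib.h>

struct entry { struct entry *next; };
struct ctx { struct entry *head; int freed; };

static void
ctx_add(struct ctx *c)
{
	struct entry *e = malloc(sizeof(*e));

	assert(e != NULL);
	e->next = c->head;
	c->head = e;
}

/* Free every entry and leave the list empty so a repeat call is a no-op. */
static void
ctx_free(struct ctx *c)
{
	struct entry *e, *next;

	/* Fetch the successor before freeing, as TAILQ_FOREACH_SAFE does. */
	for (e = c->head; e != NULL; e = next) {
		next = e->next;
		free(e);
		c->freed++;
	}
	c->head = NULL;
}
```

Because the head is reset, an error path can call the free routine
unconditionally without tracking whether it has already run.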
Reviewed by: thj, emaste
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D45041
Link with --no-undefined-version by default. This will detect and
prevent the accidental removal of symbols from versioned libraries.
Reviewed by: arichardson, kib, dim, emaste
Differential Revision: https://reviews.freebsd.org/D44216
The only element of in6_addr that is specified in RFC 3493 or
in POSIX.1-2017 is s6_addr, implemented via a #define to a union
member. However, FreeBSD and other BSD systems have additional
definitions for the other union members, s6_addr{8,16,32} which
are defined for the kernel and loader. Some Linux applications
also use them, and they seem to be allowed by the RFC and POSIX.
Remove the current ifdefs, exposing the additional fields to user
level, and replace them with #if __BSD_VISIBLE. Add an explanatory
comment expanding on the previous "nonstandard" comment.
MFC after: 1 week
Reviewed by: bz
Differential Revision: https://reviews.freebsd.org/D44979
In the error path of in_pcb allocation, the credentials associated
with the new struct have their reference count increased early on,
but the count is not decremented when the allocation fails.
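The leak and its fix follow a common pattern, sketched here with
hypothetical types and helpers (not the in_pcb code). The fail argument
simulates a later initialization step failing:

```c
#include <assert.h>
#include <stdlib.h>

struct cred { int refs; };
struct pcb { struct cred *cred; };

static struct cred *
cred_hold(struct cred *cr)
{
	cr->refs++;
	return (cr);
}

static void
cred_free(struct cred *cr)
{
	assert(cr->refs > 0);
	cr->refs--;
}

static struct pcb *
pcb_alloc(struct cred *cr, int fail)
{
	struct pcb *p = malloc(sizeof(*p));

	if (p == NULL)
		return (NULL);
	p->cred = cred_hold(cr);	/* reference taken early */
	if (fail) {
		cred_free(p->cred);	/* drop it on the error path */
		free(p);
		return (NULL);
	}
	return (p);
}
```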
Reported by: cmiller_netapp.com
MFC after: 3 days
Reviewed by: jhb, tuexen
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D45033
Mergemaster has been deprecated for some time, and will be retired.
Reviewed by: kevans
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D41799
Convert the existing FreeBSD vmware_hvcall function to take channel
and parameter arguments.
Add vmware_guestrpc_cmd() to send GuestRPC commands to the VMware
hypervisor. The sbuf argument is used for both the command to send
and to store the data to return to the caller.
The following KPIs can be used to get and set FreeBSD-specific guest
information in key/value pairs:
* vmware_guestrpc_set_guestinfo
- set a value into the guestinfo.fbsd.<keyword> key
* vmware_guestrpc_get_guestinfo
- get the value stored in the guestinfo.fbsd.<keyword> key
Add VMware devices to x86 NOTES.
Reviewed by: jhb
Obtained from: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D44528
When the VM image building code was updated to support building
non-UFS images, the vm-images-stage target was not updated to
install those newly built images to the FTP site. As a result, we
have been sending weekly snapshot announcements since August claiming
that ZFS VM images are available when they are not in fact present
anywhere publicly accessible.
Fixes: 32ae9a6b39 "release: Build UFS and ZFS VM images"
Reported by: Michael Dexter
MFC after: 5 days
Currently, the uart lock in bhyve lives in the frontend. This causes
some problems:
1. If every frontend needs a lock, it might as well live in the
backend, since they all share the same uart_softc.
2. If the backend needs to modify uart state after initialization, it
cannot do so safely because it has no access to the lock. For
example, to implement telnet support for uart in the backend, the
backend would wait for a connection during initialization. After a
remote process connects, it needs to modify rfd and wfd in the
backend.
So move the lock to the backend.
Reviewed by: corvink, jhb, markj
Differential Revision: https://reviews.freebsd.org/D44947
For now, we enumerate disk devices before network devices. This is to
work around a problem wherein u-boot remaps BARs during boot in a way
that bhyve does not handle. Some discussion and experiments suggest
that this can be handled by having bhyve not map BARs during boot on
arm64; until a solution is implemented, however, this workaround is
sufficient for simple usage and doesn't have any real downsides.
The console and bootrom are specified slightly differently versus amd64,
and a few of vmrun.sh's command-line options are amd64-only.
Reviewed by: corvink, jhb
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D44933
For now this implementation doesn't provide any machine dependent
functionality on arm64, but it's enough to be able to reset and destroy
VMs.
Reviewed by: jhb
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D44932
Move MD code into a separate directory and add a simple interface which
lets the MD bits register options and handle them.
No functional change intended.
Reviewed by: jhb
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D44932
It turns out that the only conversion issue was in fattime2timespec, where
multiplying the number of seconds in a day by the number of days overflowed
32-bit unsigned int for dates beyond 2106-02-07 06:28:15.
Casting one of the multiplicands to time_t forces a 64-bit multiplication on
systems where time_t is 64 bits and produces no binary changes on the one
remaining system with 32-bit time_t (namely i386).
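The overflow can be reproduced in isolation. The helper names below are
hypothetical and int64_t stands in for a 64-bit time_t; this is a sketch
of the arithmetic, not the fattime2timespec code:

```c
#include <assert.h>
#include <stdint.h>

#define	SECS_PER_DAY	86400u

/* 32-bit multiplication: wraps once days * 86400 exceeds UINT32_MAX. */
static uint32_t
days_to_secs32(uint32_t days)
{
	return (days * SECS_PER_DAY);
}

/* Widening one operand first forces a 64-bit multiplication. */
static int64_t
days_to_secs64(uint32_t days)
{
	/* Any FAT date is fewer than 2^16 days after 1970. */
	assert(days < 65536);
	return ((int64_t)days * SECS_PER_DAY);
}
```

Day 49710 after the epoch (2106-02-07) still fits in 32 bits, while day
49711 wraps to 63104 in the 32-bit version.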
Since the code is now tested & fixed, this change removes the fixme comments.
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D44755
On systems that have a 64-bit time_t, the test code now exercises the whole
range of fattime. To do this, this commit...
1. replaces the call to random() with two calls to arc4random() to
generate a 33-bit number of seconds in order to cover the entire range of
fattime [1970,2107]. (32 bits stop just short, in February 2106.)
On systems with 32-bit time_t, the extra bits are discarded and only the
time_t expressible range is tested.
2. casts time_t values passed to printf as longs and changes the format
string to match.
Now, the test code builds, runs, and exercises what it can (i.e., the whole
fattime range or the 32-bit time_t subset of it) on both 32-bit and 64-bit
time_t systems.
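One way to build a 33-bit value from two 32-bit random words is sketched
below. The helper takes the words as arguments so it stays portable; a
caller on FreeBSD could pass two arc4random() results. The name rand33
is hypothetical and the test code may combine the words differently:

```c
#include <assert.h>
#include <stdint.h>

/* Combine two 32-bit random words into a 33-bit value. */
static uint64_t
rand33(uint32_t hi, uint32_t lo)
{
	/* Keep one bit of hi as bit 32 and all of lo as bits 0..31. */
	uint64_t v = ((uint64_t)(hi & 1) << 32) | lo;

	assert(v < (UINT64_C(1) << 33));
	return (v);
}
```

Assigning the result to a 32-bit time_t simply discards bit 32, which is
the truncation described in item 1.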
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D44754
This change...
1. replaces calls to timet2fattime/fattime2timet with calls to
timespec2fattime/fattime2timespec. The functions got renamed shortly
after they landed in the kernel but the test code wasn't updated (see
7ea93e912b).
2. adds a utc_offset stub.
With this, the test code builds and runs as a 32-bit binary (cc -Wall -O2
-m32 subr_fattime.c).
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D44753
This affects TOE operation when multiple rx c-channels are in use for
offload, which is an unusual configuration.
MFC after: 1 week
Sponsored by: Chelsio Communications
It is the equivalent of tx_chan but for receive so rx_chan is a better
name. Initialize both using helper functions and make sure both are
displayed in the sysctl MIB.
MFC after: 1 week
Sponsored by: Chelsio Communications
Invert KeepEmptyLinesAtTheStartOfBlocks. We used to require an empty
line at the beginning of functions with no local variables, which I
believe is the reason for this setting. Now it is discouraged in new
code.
Tell clang-format to align consecutive macros, since we tend to do that.
clang-format's output isn't quite what we want here. Typically we have
a tab after a #define for some reason, and clang-format doesn't appear
to have an option for that. clang-format will also use a mix of tabs
and spaces to minimize indentation, which is also against our
convention. However, the result looks better with this setting than
without.
Reviewed by: emaste
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D29870