For a multiple block write operation, each block begins with a multi
write start token. Unlike the SD mode that the multiple block write
ends when receiving a STOP_TRAN command (CMD12), a special stop tran
token is used to signal the card.
Emulating this by manually sending a CMD12 to the SD card core, to
bring it out of the receiving data state.
Signed-off-by: Bin Meng <bin.meng@windriver.com>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20210128063035.15674-7-bmeng.cn@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Add 2 more states for the block write operation. The SPI host needs
to send a data start token to start the transfer, and the data block
written to the card will be acknowledged by a data response token.
Signed-off-by: Bin Meng <bin.meng@windriver.com>
[PMD: Change VMState version id 6 -> 7]
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20210128063035.15674-6-bmeng.cn@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
At present there is a data_ready() callback for the SD data read
path. Let's add a receive_ready() for the SD data write path.
Signed-off-by: Bin Meng <bin.meng@windriver.com>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20210128063035.15674-5-bmeng.cn@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
At present the single/multiple block write in SPI mode is blocked
by sd_normal_command(). Remove the limitation.
Signed-off-by: Bin Meng <bin.meng@windriver.com>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20210128063035.15674-4-bmeng.cn@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
The single block read (CMD17) codes are the same as the multiple
block read (CMD18). Merge them into one. The same applies to single
block write (CMD24) and multiple block write (CMD25).
Signed-off-by: Bin Meng <bin.meng@windriver.com>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20210128063035.15674-3-bmeng.cn@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
In the case of a multiple block read operation every transferred
block has its suffix of CRC16. Update the state machine logic to
handle multiple block read.
Signed-off-by: Bin Meng <bin.meng@windriver.com>
[PMD: Change VMState version id 5 -> 6]
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20210128063035.15674-2-bmeng.cn@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
When viewing/debugging memory regions it is sometimes hard to figure
out which PCI device something belongs to. Make the names unique by
including the vdev name in the name string.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20210213130325.14781-2-alex.bennee@linaro.org>
Without that wireshark complains about invalid control setup data
for non-control transfers.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-Id: <20210216144939.841873-1-kraxel@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
In order to keep track of the alternate setting that should be used for
a given interface, the USBDevice struct keeps an array of alternate
setting values, which is indexed by the interface number. In
usb_host_set_interface, when this array is updated, usb_host_ep_update
is called as a result. However, when usb_host_ep_update accesses the
active libusb_config_descriptor, it indexes udev->altsetting with the
loop variable, rather than the interface number.
With the simple trace backend enable, this behavior can be seen:
[...]
usb_xhci_xfer_start 0.440 pid=1215 xfer=0x5596a4b85930 slotid=0x1 epid=0x1 streamid=0x0
usb_packet_state_change 1.703 pid=1215 bus=0x1 port=b'1' ep=0x0 p=0x5596a4b85938 o=b'undef' n=b'setup'
usb_host_req_control 2.269 pid=1215 bus=0x1 addr=0x5 p=0x5596a4b85938 req=0x10b value=0x1 index=0xd
usb_host_set_interface 0.449 pid=1215 bus=0x1 addr=0x5 interface=0xd alt=0x1
usb_host_parse_config 2542.648 pid=1215 bus=0x1 addr=0x5 value=0x2 active=0x1
usb_host_parse_interface 1.804 pid=1215 bus=0x1 addr=0x5 num=0xc alt=0x0 active=0x1
usb_host_parse_endpoint 2.012 pid=1215 bus=0x1 addr=0x5 ep=0x2 dir=b'in' type=b'int' active=0x1
usb_host_parse_interface 1.598 pid=1215 bus=0x1 addr=0x5 num=0xd alt=0x0 active=0x1
usb_host_req_emulated 3.593 pid=1215 bus=0x1 addr=0x5 p=0x5596a4b85938 status=0x0
usb_packet_state_change 2.550 pid=1215 bus=0x1 port=b'1' ep=0x0 p=0x5596a4b85938 o=b'setup' n=b'complete'
usb_xhci_xfer_success 4.298 pid=1215 xfer=0x5596a4b85930 bytes=0x0
[...]
In particular, it is seen that although usb_host_set_interface sets the
alternate setting of interface 0xd to 0x1, usb_host_ep_update uses 0x0
as the alternate setting due to using the incorrect index to
udev->altsetting.
Fix this problem by getting the interface number from the active
libusb_config_descriptor, and then using that as the index to
udev->altsetting.
Signed-off-by: Nick Rosbrook <rosbrookn@ainfosec.com>
Message-Id: <20210201213021.500277-1-rosbrookn@ainfosec.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
When SEV-ES is enabled, it is not possible modify the guests register
state after it has been initially created, encrypted and measured.
Normally, an INIT-SIPI-SIPI request is used to boot the AP. However, the
hypervisor cannot emulate this because it cannot update the AP register
state. For the very first boot by an AP, the reset vector CS segment
value and the EIP value must be programmed before the register has been
encrypted and measured. Search the guest firmware for the guest for a
specific GUID that tells Qemu the value of the reset vector to use.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <22db2bfb4d6551aed661a9ae95b4fdbef613ca21.1611682609.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
OVMF is developing a mechanism for depositing a GUIDed table just
below the known location of the reset vector. The table goes
backwards in memory so all entries are of the form
<data>|len|<GUID>
Where <data> is arbtrary size and type, <len> is a uint16_t and
describes the entire length of the entry from the beginning of the
data to the end of the guid.
The foot of the table is of this form and <len> for this case
describes the entire size of the table. The table foot GUID is
defined by OVMF as 96b582de-1fb2-45f7-baea-a366c55a082d and if the
table is present this GUID is just below the reset vector, 48 bytes
before the end of the firmware file.
Add a parser for the ovmf reset block which takes a copy of the block,
if the table foot guid is found, minus the footer and a function for
later traversal to return the data area of any specified GUIDs.
Signed-off-by: James Bottomley <jejb@linux.ibm.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20210204193939.16617-2-jejb@linux.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This patch implements the FIFO mode of the SMBus module. In FIFO, the
user transmits or receives at most 16 bytes at a time. The FIFO mode
allows the module to transmit large amount of data faster than single
byte mode.
Since we only added the device in a patch that is only a few commits
away in the same patch set. We do not increase the VMstate version
number in this special case.
Reviewed-by: Doug Evans<dje@google.com>
Reviewed-by: Tyrong Ting<kfting@nuvoton.com>
Signed-off-by: Hao Wu <wuhaotsh@google.com>
Reviewed-by: Corey Minyard <cminyard@mvista.com>
Message-id: 20210210220426.3577804-6-wuhaotsh@google.com
Acked-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This commit implements the single-byte mode of the SMBus.
Each Nuvoton SoC has 16 System Management Bus (SMBus). These buses
compliant with SMBus and I2C protocol.
This patch implements the single-byte mode of the SMBus. In this mode,
the user sends or receives a byte each time. The SMBus device transmits
it to the underlying i2c device and sends an interrupt back to the QEMU
guest.
Reviewed-by: Doug Evans<dje@google.com>
Reviewed-by: Tyrong Ting<kfting@nuvoton.com>
Signed-off-by: Hao Wu <wuhaotsh@google.com>
Reviewed-by: Corey Minyard <cminyard@mvista.com>
Message-id: 20210210220426.3577804-2-wuhaotsh@google.com
Acked-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Linux blkfront expects both "discard-granularity" and
"discard-alignment" present on xenbus in order to properly enable the
feature, not exposing "discard-alignment" left some Linux blkfront
versions with a broken discard setup. This has also been addressed in
Linux with:
https://lore.kernel.org/lkml/20210118151528.81668-1-roger.pau@citrix.com/T/#u
Fix QEMU to report a "discard-alignment" of 0, in order for it to work
with older Linux frontends.
Reported-by: Arthur Borsboom <arthurborsboom@gmail.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Message-Id: <20210118153330.82324-1-roger.pau@citrix.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
cmd_fis is mapped as DMA_DIRECTION_FROM_DEVICE, however, it is read
from, and not written to anywhere. Fix the DMA_DIRECTION and mark
cmd_fis as read-only in the code.
Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
Message-Id: <20210119164051.89268-1-alxndr@bu.edu>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Copy bootinfo.h and bootinfo-mac.h from arch/m68k/include/uapi/asm/
to include/standard-headers/asm-m68k/
Imported from linux v5.9 but didn't change since v4.14 (header update)
and since v4.10 (content update).
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20201220112615.933036-2-laurent@vivier.eu>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Use nr_apu_cpus in favor of hard coding 2.
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Luc Michel <luc@lmichel.fr>
Message-id: 20210210142048.3125878-2-edgar.iglesias@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
nvme_ns_realize passes errp to nvme_register_namespaces, but then try to
prepend errp with local_err.
Just remove the local_err and use errp directly.
Fixes: 15d024d4aa ("hw/block/nvme: split setup and register for namespace")
Cc: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Current QEMU HEAD nvme.c does not compile with the default GCC 5.4
on a Ubuntu 16.04 host:
hw/block/nvme.c:3242:9: error: ‘result’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
trace_pci_nvme_getfeat_vwcache(result ? "enabled" : "disabled");
^
hw/block/nvme.c:3150:14: note: ‘result’ was declared here
uint32_t result;
^
Explicitly initialize the result to fix it.
Fixes: aa5e55e3b0 ("hw/block/nvme: open code for volatile write cache")
Fixes: Coverity CID 1446371
Signed-off-by: Bin Meng <bin.meng@windriver.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Moving namespace registration to the nvme-ns realization function had
the unintended side-effect of breaking legacy namespace registration.
Fix this.
Fixes: 15d024d4aa ("hw/block/nvme: split setup and register for namespace")
Reported-by: Alexander Graf <agraf@csgraf.de>
Cc: Minwoo Im <minwoo.im.dev@gmail.com>
Tested-by: Alexander Graf <agraf@csgraf.de>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Update infocenter.arm.com URLs for various pieces of Arm
documentation to the new developer.arm.com equivalents. (There is a
redirection in place from the old URLs, but we might as well update
our comments in case the redirect ever disappears in future.)
This patch covers all the URLs which are not MPS2/SSE-200/IoTKit
related (those are dealt with in a different patch).
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20210205171456.19939-1-peter.maydell@linaro.org
NPCM7XX GPIO devices have been implemented in hw/gpio/npcm7xx-gpio.c. So
we removed them from the unimplemented devices list.
Reviewed-by: Doug Evans<dje@google.com>
Reviewed-by: Tyrong Ting<kfting@nuvoton.com>
Signed-off-by: Hao Wu<wuhaotsh@google.com>
Message-id: 20210129005845.416272-2-wuhaotsh@google.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Perform device reset in the remote process when QEMU performs
device reset. This is required to reset the internal state
(like registers, etc...) of emulated devices
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 7cb220a51f565dc0817bd76e2f540e89c2d2b850.1611938319.git.jag.raman@oracle.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Retrieve PCI configuration info about the remote device and
configure the Proxy PCI object based on the returned information
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 85ee367bbb993aa23699b44cfedd83b4ea6d5221.1611938319.git.jag.raman@oracle.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
IOHUB object is added to manage PCI IRQs. It uses KVM_IRQFD
ioctl to create irqfd to injecting PCI interrupts to the guest.
IOHUB object forwards the irqfd to the remote process. Remote process
uses this fd to directly send interrupts to the guest, bypassing QEMU.
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 51d5c3d54e28a68b002e3875c59599c9f5a424a1.1611938319.git.jag.raman@oracle.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Add ProxyMemoryListener object which is used to keep the view of the RAM
in sync between QEMU and remote process.
A MemoryListener is registered for system-memory AddressSpace. The
listener sends SYNC_SYSMEM message to the remote process when memory
listener commits the changes to memory, the remote process receives
the message and processes it in the handler for SYNC_SYSMEM message.
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 04fe4e6a9ca90d4f11ab6f59be7652f5b086a071.1611938319.git.jag.raman@oracle.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Proxy device object implements handler for PCI BAR writes and reads.
The handler uses BAR_WRITE/BAR_READ message to communicate to the
remote process with the BAR address and value to be written/read.
The remote process implements handler for BAR_WRITE/BAR_READ
message.
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: a8b76714a9688be5552c4c92d089bc9e8a4707ff.1611938319.git.jag.raman@oracle.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The Proxy Object sends the PCI config space accesses as messages
to the remote process over the communication channel
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: d3c94f4618813234655356c60e6f0d0362ff42d6.1611938319.git.jag.raman@oracle.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: d54edb4176361eed86b903e8f27058363b6c83b3.1611938319.git.jag.raman@oracle.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Defines a PCI Device proxy object as a child of TYPE_PCI_DEVICE.
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: b5186ebfedf8e557044d09a768846c59230ad3a7.1611938319.git.jag.raman@oracle.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
SyncSysMemMsg message format is defined. It is used to send
file descriptors of the RAM regions to remote device.
RAM on the remote device is configured with a set of file descriptors.
Old RAM regions are deleted and new regions, each with an fd, is
added to the RAM.
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 7d2d1831d812e85f681e7a8ab99e032cf4704689.1611938319.git.jag.raman@oracle.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Associate the file descriptor for a PCIDevice in remote process with
DeviceState object.
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: f405a2ed5d7518b87bea7c59cfdf334d67e5ee51.1611938319.git.jag.raman@oracle.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Initializes the message handler function in the remote process. It is
called whenever there's an event pending on QIOChannel that registers
this function.
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 99d38d8b93753a6409ac2340e858858cda59ab1b.1611938319.git.jag.raman@oracle.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Defines MPQemuMsg, which is the message that is sent to the remote
process. This message is sent over QIOChannel and is used to
command the remote process to perform various tasks.
Define transmission functions used by proxy and by remote.
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 56ca8bcf95195b2b195b08f6b9565b6d7410bce5.1611938319.git.jag.raman@oracle.com
[Replace struct iovec send[2] = {0} with {} to make clang happy as
suggested by Peter Maydell <peter.maydell@linaro.org>.
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
x-remote-machine object sets up various subsystems of the remote
device process. Instantiate PCI host bridge object and initialize RAM, IO &
PCI memory regions.
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: c537f38d17f90453ca610c6b70cf3480274e0ba1.1611938319.git.jag.raman@oracle.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
PCI host bridge is setup for the remote device process. It is
implemented using remote-pcihost object. It is an extension of the PCI
host bridge setup by QEMU.
Remote-pcihost configures a PCI bus which could be used by the remote
PCI device to latch on to.
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 0871ba857abb2eafacde07e7fe66a3f12415bfb2.1611938319.git.jag.raman@oracle.com
[Added PCI_EXPRESS condition in hw/remote/Kconfig since remote-pcihost
needs PCIe. This solves "make check" failure on s390x. Fix suggested by
Philippe Mathieu-Daudé <philmd@redhat.com> and Thomas Huth
<thuth@redhat.com>.
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Per MPC8548ERM [1] chapter 14.5.3.4.1:
When RCTRL.RSF is 1, frames less than 64 bytes are accepted upon
a DA match. But currently QEMU does the opposite. This commit
reverses the RCTRL.RSF testing logic to match the manual.
Due to the reverse of the logic, certain guests may potentially
break if they don't program eTSEC to have RCTRL.RSF bit set.
When RCTRL.RSF is 0, short frames are silently dropped, however
as of today both slirp and tap networking do not pad short frames
(e.g.: an ARP packet) to the minimum frame size of 60 bytes. So
ARP requests will be dropped, preventing the guest from becoming
visible on the network.
The same issue was reported on e1000 and vmxenet3 before, see:
commit 78aeb23ede ("e1000: Pad short frames to minimum size (60 bytes)")
commit 40a87c6c9b ("vmxnet3: Pad short frames to minimum size (60 bytes)")
[1] https://www.nxp.com/docs/en/reference-manual/MPC8548ERM.pdf
Fixes: eb1e7c3e51 ("Add Enhanced Three-Speed Ethernet Controller (eTSEC)")
Signed-off-by: Bin Meng <bin.meng@windriver.com>
Message-Id: <1612923021-19746-1-git-send-email-bmeng.cn@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
At present the <clock-frequency> property of the serial node is
populated with value zero. U-Boot's ns16550 driver is not happy
about this, so let's fill in a meaningful value.
Signed-off-by: Bin Meng <bin.meng@windriver.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <1612362288-22216-2-git-send-email-bmeng.cn@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
At present the platform clock frequency is using a magic number.
Convert it to a macro and use it everywhere.
Signed-off-by: Bin Meng <bin.meng@windriver.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <1612362288-22216-1-git-send-email-bmeng.cn@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The current logic for calculating 'maxdomain' making it a sum of
numa_state->num_nodes with spapr->gpu_numa_id. spapr->gpu_numa_id is
used as a index to determine the next available NUMA id that a
given NVGPU can use.
The problem is that the initial value of gpu_numa_id, for any topology
that has more than one NUMA node, is equal to numa_state->num_nodes.
This means that our maxdomain will always be, at least, twice the
amount of existing NUMA nodes. This means that a guest with 4 NUMA
nodes will end up with the following max-associativity-domains:
rtas/ibm,max-associativity-domains
00000004 00000008 00000008 00000008 00000008
This overtuning of maxdomains doesn't go unnoticed in the guest, being
detected in SLUB during boot:
dmesg | grep SLUB
[ 0.000000] SLUB: HWalign=128, Order=0-3, MinObjects=0, CPUs=4, Nodes=8
SLUB is detecting 8 total nodes, with 4 nodes being online.
This patch fixes ibm,max-associativity-domains by considering the amount
of NVGPUs NUMA nodes presented in the guest, instead of just
spapr->gpu_numa_id.
Reported-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210128174213.1349181-4-danielhb413@gmail.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
We'll need to check the initial value given to spapr->gpu_numa_id when
building the rtas DT, so put it in a helper for easier access and to
avoid repetition.
Tested-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210128174213.1349181-3-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This function is used only in spapr_numa.c.
Tested-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210128174213.1349181-2-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This to map the PNOR from the machine init handler directly and finish
the cleanup of the LPC model.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20210126171059.307867-8-clg@kaod.org>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
On PowerNV systems, the BMC is in charge of mapping the PNOR contents
on the LPC FW address space using the HIOMAP protocol. Under QEMU, we
emulate this behavior and we also add an extra control on the flash
accesses by letting the HIOMAP command handler decide whether the
memory region is accessible or not depending on the firmware requests.
However, this behavior is not compatible with hostboot like firmwares
which need this mapping to be always available. For this reason, the
PNOR memory region is initially disabled for skiboot mode only.
This is badly placed under the LPC model and requires the use of the
machine. Since it doesn't add much, simply remove the initial setting.
The extra control in the HIOMAP command handler will still be performed.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20210126171059.307867-7-clg@kaod.org>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The PowerNV machine can be run with an external IPMI BMC device
connected to a remote QEMU machine acting as BMC, using these options :
-chardev socket,id=ipmi0,host=localhost,port=9002,reconnect=10 \
-device ipmi-bmc-extern,id=bmc0,chardev=ipmi0 \
-device isa-ipmi-bt,bmc=bmc0,irq=10 \
-nodefaults
In that case, some aspects of the BMC initialization should be
skipped, since they rely on the simulator interface.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20210126171059.307867-6-clg@kaod.org>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
and reuse pnv_bmc_set_pnor() to share the setting of the PNOR.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20210126171059.307867-5-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The current settings are useful to load large kernels (with debug) but
it moves the initrd image in a memory region not protected by
skiboot. If skiboot is compiled with DEBUG=1, memory poisoning will
corrupt the initrd.
Cc: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20210126171059.307867-4-clg@kaod.org>
Reviewed-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
ENDs allocated by OPAL for the HW thread VPs are tagged as owned by FW.
Dump the state in 'info pic'.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20210126171059.307867-3-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
On POWER9 systems, PHB controllers signal the XIVE interrupt controller
of a source interrupt notification using a store on a MMIO region. Add
traces for such events.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20210126171059.307867-2-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
It is currently not possible to perform a strict boot from USB storage:
$ qemu-system-ppc64 -accel kvm -nodefaults -nographic -serial stdio \
-boot strict=on \
-device qemu-xhci \
-device usb-storage,drive=disk,bootindex=0 \
-blockdev driver=file,node-name=disk,filename=fedora-ppc64le.qcow2
SLOF **********************************************************************
QEMU Starting
Build Date = Jul 17 2020 11:15:24
FW Version = git-e18ddad8516ff2cf
Press "s" to enter Open Firmware.
Populating /vdevice methods
Populating /vdevice/vty@71000000
Populating /vdevice/nvram@71000001
Populating /pci@800000020000000
00 0000 (D) : 1b36 000d serial bus [ usb-xhci ]
No NVRAM common partition, re-initializing...
Scanning USB
XHCI: Initializing
USB Storage
SCSI: Looking for devices
101000000000000 DISK : "QEMU QEMU HARDDISK 2.5+"
Using default console: /vdevice/vty@71000000
Welcome to Open Firmware
Copyright (c) 2004, 2017 IBM Corporation All rights reserved.
This program and the accompanying materials are made available
under the terms of the BSD License available at
http://www.opensource.org/licenses/bsd-license.php
Trying to load: from: /pci@800000020000000/usb@0/storage@1/disk@101000000000000 ...
E3405: No such device
E3407: Load failed
Type 'boot' and press return to continue booting the system.
Type 'reset-all' and press return to reboot the system.
Ready!
0 >
The device tree handed over by QEMU to SLOF indeed contains:
qemu,boot-list =
"/pci@800000020000000/usb@0/storage@1/disk@101000000000000 HALT";
but the device node is named usb-xhci@0, not usb@0.
This happens because the firmware names of PCI devices returned
by get_boot_devices_list() come from pcibus_get_fw_dev_path(),
while the sPAPR PHB code uses a different naming scheme for
device nodes. This inconsistency has always been there but it was
hidden for a long time because SLOF used to rename USB device
nodes, until this commit, merged in QEMU 4.2.0 :
commit 85164ad4ed
Author: Alexey Kardashevskiy <aik@ozlabs.ru>
Date: Wed Sep 11 16:24:32 2019 +1000
pseries: Update SLOF firmware image
This fixes USB host bus adapter name in the device tree to match QEMU's
one.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Fortunately, sPAPR implements the firmware path provider interface.
This provides a way to override the default firmware paths.
Just factor out the sPAPR PHB naming logic from spapr_dt_pci_device()
to a helper, and use it in the sPAPR firmware path provider hook.
Fixes: 85164ad4ed ("pseries: Update SLOF firmware image")
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20210122170157.246374-1-groug@kaod.org>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
In the CPU hotunplug bug [1] the guest kernel throws a scary
message in dmesg:
pseries-hotplug-cpu: Failed to offline CPU <NULL>, rc: -16
The reason isn't related to the bug though. This happens because the
kernel file arch/powerpc/platform/pseries/hotplug-cpu.c, function
dlpar_cpu_remove(), is not finding the device_node.name of the offending
CPU.
We're not populating the 'name' property for hotplugged CPUs. Since the
kernel relies on device_node.name for identifying CPU nodes, and the
CPUs that are coldplugged has the 'name' property filled by SLOF, this
is creating an unneeded inconsistency between hotplug and coldplug CPUs
in the kernel.
Let's fill the 'name' property for hotplugged CPUs as well. This will
make the guest dmesg throws a less intimidating message when we try to
unplug the last online CPU:
pseries-hotplug-cpu: Failed to offline CPU PowerPC,POWER9@1, rc: -16
[1] https://bugzilla.redhat.com/1911414
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210120232305.241521-3-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Next patch will use the 'nodename' string in spapr_core_dt_populate()
after the point it's being freed today.
Instead of moving 'g_free(nodename)' around, let's do a QoL change in
both CPU DT functions where 'nodename' is being freed, and use
g_autofree to avoid the 'g_free()' call altogether.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210120232305.241521-2-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Add configuration options to enable or disable multiprocess QEMU code
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 6cc37253e35418ebd7b675a31a3df6e3c7a12dc1.1611938319.git.jag.raman@oracle.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Allow RAM MemoryRegion to be created from an offset in a file, instead
of allocating at offset of 0 by default. This is needed to synchronize
RAM between QEMU & remote process.
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 609996697ad8617e3b01df38accc5c208c24d74e.1611938319.git.jag.raman@oracle.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Refactor the zone write check logic such that the most "meaningful"
error is returned first. That is, first, if the zone is not writable,
return an appropriate status code for that. Then, make sure we are
actually writing at the write pointer and finally check that we do not
cross the zone write boundary. This aligns with the "priority" of status
codes for zone read checks.
Also add a couple of additional descriptive trace events and remove an
always true assert.
Cc: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Tested-by: Niklas Cassel <niklas.cassel@wdc.com>
Tested-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
When a zone append is processed the controller checks that validity of
the write before assigning the LBA to the append command. This causes
the boundary check to be wrong.
Fix this by checking the write *after* assigning the LBA. Remove the
append special case from the nvme_check_zone_write and open code it in
nvme_do_write, assigning the slba when basic sanity checks have been
performed. Then check the validity of the resulting write like any other
write command.
In the process, also fix a missing endianness conversion for the zone
append ALBA.
Reported-by: Niklas Cassel <Niklas.Cassel@wdc.com>
Cc: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Tested-by: Niklas Cassel <niklas.cassel@wdc.com>
Tested-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
The actual parameter name is 'cross_read' rather than 'cross_zone_read'.
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Change status checks to align with the existing style and remove the
explicit check against NVME_SUCCESS.
Cc: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Currently, no features are saveable, so the current check is not wrong,
but add a check against the feature capabilities to make sure this will
not regress if saveable features are added later.
Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Only enable DULBE if the namespace supports it.
Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
If a user assigns a backing device with less capacity than the size of a
single zone, the namespace capacity will be reported as zero and the
kernel will silently fail to allocate the namespace.
This patch errors out in case that the backing device cannot accomodate
at least a single zone.
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
[k.jensen: small fixup in the error and commit message]
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
The controller now implements v1.4 and we can lift the restrictions on
CMB Data Pointer and Command Independent Locations Support (CDPCILS) and
CMB Data Pointer Mixed Locations Support (CDPMLS) since the device
really does not care about mixed host/cmb pointers in those cases.
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
With the new CMB logic in place, bump the implemented specification
version to v1.4 by default.
This requires adding the setting the CNTRLTYPE field and modifying the
VWC field since 0x00 is no longer a valid value for bits 2:1.
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Implement v1.4 logic for configuring the Controller Memory Buffer. By
default, the v1.4 scheme will be used (CMB must be explicitly enabled by
the host), so drivers that only support v1.3 will not be able to use the
CMB anymore.
To retain the v1.3 behavior, set the boolean 'legacy-cmb' nvme device
parameter.
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Padmakar Kalghatgi <p.kalghatgi@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Add support for the PMRMSCL and PMRMSCU MMIO registers. This allows
adding RDS/WDS support for PMR as well.
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Naveen Nagar <naveen.n1@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
The PMR should not be enabled at boot up. Disable the PMR MemoryRegion
initially and implement MMIO for PMRCTL, allowing the host to enable the
PMR explicitly.
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
The controller registers are initially zero. Remove the redundant
zeroing.
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Use the correct field names.
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
With BAR 4 now free to use, allow PMR and CMB to be enabled
simultaneously.
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
In the interest of supporting both CMB and PMR to be enabled on the same
device, move the MSI-X table and pending bit array out of BAR 4 and into
BAR 0.
This is a simplified version of the patch contributed by Andrzej
Jakowski (see [1]). Leaving the CMB at offset 0 removes the need for
changes to CMB address mapping code.
[1]: https://lore.kernel.org/qemu-devel/20200729220107.37758-3-andrzej.jakowski@linux.intel.com/
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Tested-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
This patch sets CMBS bit in controller capabilities register when user
configures NVMe driver with CMB support, so capabilites are correctly
reported to guest OS.
Signed-off-by: Andrzej Jakowski <andrzej.jakowski@linux.intel.com>
Reviewed-by: Maxim Levitsky <mlevitsky@gmail.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
64 bit registers like ASQ and ACQ should be writable by both a hi/lo 32
bit write combination as well as a plain 64 bit write. The spec does not
define ordering on the hi/lo split, but the code currently assumes that
the low order bits are written first. Additionally, the code does not
consider that another address might already have been written into the
register, causing the OR'ing to result in a bad address.
Fix this by explicitly overwriting only the low or high order bits for
32 bit writes.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Add the size of the mmio read/write to the trace event.
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
During smart critical warning injection by setting property from QMP
command, also try to trigger asynchronous event.
Suggested by Keith, if a event has already been raised, there is no
need to enqueue the duplicate event any more.
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
[k.jensen: fix typo in commit message]
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
There is a very low probability that hitting physical NVMe disk
hardware critical warning case, it's hard to write & test a monitor
agent service.
For debugging purposes, add a new 'smart_critical_warning' property
to emulate this situation.
The orignal version of this change is implemented by adding a fixed
property which could be initialized by QEMU command line. Suggested
by Philippe & Klaus, rework like current version.
Test with this patch:
1, change smart_critical_warning property for a running VM:
#virsh qemu-monitor-command nvme-upstream '{ "execute": "qom-set",
"arguments": { "path": "/machine/peripheral-anon/device[0]",
"property": "smart_critical_warning", "value":16 } }'
2, run smartctl in guest
#smartctl -H -l error /dev/nvme0n1
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
- volatile memory backup device has failed
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
The zone write pointer is unconditionally advanced, even for write
faults. Make sure that the zone is always transitioned to Full if the
write pointer reaches zone capacity.
Cc: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
nvme_ns_setup() finally does not have nothing to do with NvmeCtrl
instance.
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
In NVMe, namespace is being attached to process I/O. We register NVMe
namespace to a controller via nvme_register_namespace() during
nvme_ns_setup(). This is main reason of receiving NvmeCtrl object
instance to this function to map the namespace to a controller.
To make namespace instance more independent, it should be split into two
parts: setup and register. This patch split them into two differnt
parts, and finally nvme_ns_setup() does not have nothing to do with
NvmeCtrl instance at all.
This patch is a former patch to introduce NVMe subsystem scheme to the
existing design especially for multi-path. In that case, it should be
split into two to make namespace independent from a controller.
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Removed no longer used aregument NvmeCtrl object in nvme_ns_init_blk().
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Volatile Write Cache(VWC) feature is set in nvme_ns_setup() in the
initial time. This feature is related to block device backed, but this
feature is controlled in controller level via Set/Get Features command.
This patch removed dependency between nvme and nvme-ns to manage the VWC
flag value. Also, it open coded the Get Features for VWC to check all
namespaces attached to the controller, and if false detected, return
directly false.
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
[k.jensen: report write cache preset if present on ANY namespace]
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
nvme_ns_init_zoned() has no use for given NvmeCtrl object.
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
TP 4053 says (in section 2.3.1.1) -
... if a Zone Append command specifies a ZSLBA that is not the lowest
logical block address in that zone, then the controller shall abort
that command with a status code of Invalid Field In Command.
In the code, Zone Invalid Write is returned instead, fix this.
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
nvme_io_cmd already checks if the namespace supports the Zone Append
command, so the removed check is dead code.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Tested-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Add missing string representations for a couple of new commands.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Tested-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
The zoned command set specification states that "All logical blocks in a
zone *shall* be marked as deallocated when [the zone is reset]". Since
the device guarantees 0x00 to be read from deallocated blocks we have to
issue a pwrite_zeroes since we cannot be sure that a discard will do
anything. But typically, this will be achieved with an efficient
unmap/discard operation.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Tested-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Align with existing style and use a typedef for header-file enums.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Tested-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Implicitly and explicitly opended zones are always bulk processed
together, so merge the two processing masks.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Tested-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
A shutdown is only about flushing stuff. It is the host that should
delete any queues, so do not perform a reset here.
Also, on shutdown, make sure that the PMR is flushed if in use.
Fixes: 368f4e752cf9 ("hw/block/nvme: Process controller reset and shutdown differently")
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Tested-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
The device uses the BDRV_BLOCK_ZERO flag to determine the "deallocated"
status of logical blocks. Since the zoned namespaces command set
specification defines that logical blocks SHALL be marked as deallocated
when the zone is in the Empty or Offline states, DULBE can only be
supported if the zone size is a multiple of the calculated deallocation
granularity (reported in NPDG) which depends on the underlying block
device cluster size (if applicable) or the configured
discard_granularity.
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Commit 1c0c2163aa ("hw/block/nvme: verify msix_init_exclusive_bar()
return value") had the unintended effect of breaking support on
several platforms not supporting MSI-X.
Still check for errors, but only report that MSI-X is unsupported
instead of bailing out.
Fixes: 1c0c2163aa ("hw/block/nvme: verify msix_init_exclusive_bar() return value")
Fixes: fbf2e5375e ("hw/block/nvme: Verify msix_vector_use() returned value")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>