linux

mirror of https://github.com/torvalds/linux synced 2024-10-21 02:39:51 +00:00

Author	SHA1	Message	Date
Jens Axboe	9abd68ef45	nvme: add quirk to force medium priority for SQ creation Some P3100 drives have a bug where they think WRRU (weighted round robin) is always enabled, even though the host doesn't set it. Since they think it's enabled, they also look at the submission queue creation priority. We used to set that to MEDIUM by default, but that was removed in commit `81c1cd9835`. This causes various issues on that drive. Add a quirk to still set MEDIUM priority for that controller. Fixes: `81c1cd9835` ("nvme/pci: Don't set reserved SQ create flags") Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Keith Busch <keith.busch@intel.com>	2018-05-11 13:37:14 -06:00
Charles Machalow	4e50d9ebae	nvme: Fix sync controller reset return If a controller reset is requested while the device has no namespaces, we were incorrectly returning ENETRESET. This patch adds the check for ADMIN_ONLY controller state to indicate a successful reset. Fixes: `8000d1fdb0` ("nvme-rdma: fix sysfs invoked reset_ctrl error flow ") Cc: <stable@vger.kernel.org> Signed-off-by: Charles Machalow <charles.machalow@intel.com> [changelog] Signed-off-by: Keith Busch <keith.busch@intel.com>	2018-05-11 10:51:45 -06:00
Jianchao Wang	12d9f07022	nvme: fix use-after-free in nvme_free_ns_head Currently only nvme_ctrl will take a reference counter of nvme_subsystem, nvme_ns_head also needs it. Otherwise nvme_free_ns_head will access the nvme_subsystem.ns_ida which has been freed by __nvme_release_subsystem after all the reference of nvme_subsystem have been released by nvme_free_ctrl. This could cause memory corruption. BUG: KASAN: use-after-free in radix_tree_next_chunk+0x9f/0x4b0 Read of size 8 at addr ffff88036494d2e8 by task fio/1815 CPU: 1 PID: 1815 Comm: fio Kdump: loaded Tainted: G W 4.17.0-rc1+ #18 Hardware name: LENOVO 10MLS0E339/3106, BIOS M1AKT22A 06/27/2017 Call Trace: dump_stack+0x91/0xeb print_address_description+0x6b/0x290 kasan_report+0x261/0x360 radix_tree_next_chunk+0x9f/0x4b0 ida_remove+0x8b/0x180 ida_simple_remove+0x26/0x40 nvme_free_ns_head+0x58/0xc0 __blkdev_put+0x30a/0x3a0 blkdev_close+0x44/0x50 __fput+0x184/0x380 task_work_run+0xaf/0xe0 do_exit+0x501/0x1440 do_group_exit+0x89/0x140 __x64_sys_exit_group+0x28/0x30 do_syscall_64+0x72/0x230 Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com>	2018-05-07 08:25:08 -06:00
Linus Torvalds	eb4f959b26	First pull request for 4.17-rc - Various build fixes (USER_ACCESS=m and ADDR_TRANS turned off) - SPDX license tag cleanups (new tag Linux-OpenIB) - RoCE GID fixes related to default GIDs - Various fixes to: cxgb4, uverbs, cma, iwpm, rxe, hns (big batch), mlx4, mlx5, and hfi1 (medium batch) -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJa7JXPAAoJELgmozMOVy/dc0AP/0i7EajAmgl1ihka6BYVj2pa DV8iSrVMDPulh9AVnAtwLJSbdwmgN/HeVzLzcutHyMYk6tAf8RCs6TsyoB36XiOL oUh5+V2GyNnyh9veWPwyGTgZKCpPJc3uQFV6502lZVDYwArMfGatumApBgQVKiJ+ YdPEXEQZPNIs6YZB1WXkNYV/ra9u0aBByQvUrxwVZ2AND+srJYO82tqZit2wBtjK UXrhmZbWXGWMFg8K3/lpfUkQhkG3Arj+tMKkCfqsVzC7wUPhlTKBHR9NmvdLIiy9 5Vhv7Xp78udcxZKtUeTFsbhaMqqK7x7sKHnpKAs7hOZNZ/Eg47BrMwMrZVLOFuDF nBLUL1H+nJ1mASZoMWH5xzOpVew+e9X0cot09pVDBIvsOIh97wCG7hgptQ2Z5xig fcDiMmg6tuakMsaiD0dzC9JI5HR6Z7+6oR1tBkQFDxQ+XkkcoFabdmkJaIRRwOj7 CUhXRgcm0UgVd03Jdta6CtYXsjSODirWg4AvSSMt9lUFpjYf9WZP00/YojcBbBEH UlVrPbsKGyncgrm3FUP6kXmScESfdTljTPDLiY9cO9+bhhPGo1OHf005EfAp178B jGp6hbKlt+rNs9cdXrPSPhjds+QF8HyfSlwyYVWKw8VWlh/5DG8uyGYjF05hYO0q xhjIS6/EZjcTAh5e4LzR =PI8v -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull rdma fixes from Doug Ledford: "This is our first pull request of the rc cycle. It's not that it's been overly quiet, we were just waiting on a few things before sending this off. For instance, the 6 patch series from Intel for the hfi1 driver had actually been pulled in on Tuesday for a Wednesday pull request, only to have Jason notice something I missed, so we held off for some testing, and then on Thursday had to respin the series because the very first patch needed a minor fix (unnecessary cast is all). There is a sizable hns patch series in here, as well as a reasonably largish hfi1 patch series, then all of the lines of uapi updates are just the change to the new official Linux-OpenIB SPDX tag (a bunch of our files had what amounts to a BSD-2-Clause + MIT Warranty statement as their license as a result of the initial code submission years ago, and the SPDX folks decided it was unique enough to warrant a unique tag), then the typical mlx4 and mlx5 updates, and finally some cxgb4 and core/cache/cma updates to round out the bunch. None of it was overly large by itself, but in the 2 1/2 weeks we've been collecting patches, it has added up :-/. As best I can tell, it's been through 0day (I got a notice about my last for-next push, but not for my for-rc push, but Jason seems to think that failure messages are prioritized and success messages not so much). It's also been through linux-next. And yes, we did notice in the context portion of the CMA query gid fix patch that there is a dubious BUG_ON() in the code, and have plans to audit our BUG_ON usage and remove it anywhere we can. Summary: - Various build fixes (USER_ACCESS=m and ADDR_TRANS turned off) - SPDX license tag cleanups (new tag Linux-OpenIB) - RoCE GID fixes related to default GIDs - Various fixes to: cxgb4, uverbs, cma, iwpm, rxe, hns (big batch), mlx4, mlx5, and hfi1 (medium batch)" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (52 commits) RDMA/cma: Do not query GID during QP state transition to RTR IB/mlx4: Fix integer overflow when calculating optimal MTT size IB/hfi1: Fix memory leak in exception path in get_irq_affinity() IB/{hfi1, rdmavt}: Fix memory leak in hfi1_alloc_devdata() upon failure IB/hfi1: Fix NULL pointer dereference when invalid num_vls is used IB/hfi1: Fix loss of BECN with AHG IB/hfi1 Use correct type for num_user_context IB/hfi1: Fix handling of FECN marked multicast packet IB/core: Make ib_mad_client_id atomic iw_cxgb4: Atomically flush per QP HW CQEs IB/uverbs: Fix kernel crash during MR deregistration flow IB/uverbs: Prevent reregistration of DM_MR to regular MR RDMA/mlx4: Add missed RSS hash inner header flag RDMA/hns: Fix a couple misspellings RDMA/hns: Submit bad wr RDMA/hns: Update assignment method for owner field of send wqe RDMA/hns: Adjust the order of cleanup hem table RDMA/hns: Only assign dqpn if IB_QP_PATH_DEST_QPN bit is set RDMA/hns: Remove some unnecessary attr_mask judgement RDMA/hns: Only assign mtu if IB_QP_PATH_MTU bit is set ...	2018-05-04 20:51:10 -10:00
Johannes Thumshirn	8bfc3b4c6f	nvmet: switch loopback target state to connecting when resetting After commit `bb06ec3145` ("nvme: expand nvmf_check_if_ready checks") resetting of the loopback nvme target failed as we forgot to switch it's state to NVME_CTRL_CONNECTING before we reconnect the admin queues. Therefore the checks in nvmf_check_if_ready() choose to go to the reject_io case and thus we couldn't sent out an identify controller command to reconnect. Change the controller state to NVME_CTRL_CONNECTING after tearing down the old connection and before re-establishing the connection. Fixes: `bb06ec3145` ("nvme: expand nvmf_check_if_ready checks") Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-05-03 09:37:50 -06:00
Keith Busch	a785dbccd9	nvme/multipath: Fix multipath disabled naming collisions When CONFIG_NVME_MULTIPATH is set, but we're not using nvme to multipath, namespaces with multiple paths were not creating unique names due to reusing the same instance number from the namespace's head. This patch fixes this by falling back to the non-multipath naming method when the parameter disabled using multipath. Reported-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-05-03 09:37:50 -06:00
Keith Busch	5cadde8019	nvme/multipath: Disable runtime writable enabling parameter We can't allow the user to change multipath settings at runtime, as this will create naming conflicts due to the different naming schemes used for each mode. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-05-03 09:37:50 -06:00
Keith Busch	f31a21103c	nvme: Set integrity flag for user passthrough commands If the command a separate metadata buffer attached, the request needs to have the integrity flag set so the driver knows to map it. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-05-03 09:37:50 -06:00
Chengguang Xu	59a2f3f00f	nvme: fix potential memory leak in option parsing When specifying same string type option several times, current option parsing may cause memory leak. Hence, call kfree for previous one in this case. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-05-03 09:37:50 -06:00
Greg Thelen	d6fc6a22fc	nvmet-rdma: depend on INFINIBAND_ADDR_TRANS NVME_TARGET_RDMA code depends on INFINIBAND_ADDR_TRANS provided symbols. So declare the kconfig dependency. This is necessary to allow for enabling INFINIBAND without INFINIBAND_ADDR_TRANS. Signed-off-by: Greg Thelen <gthelen@google.com> Cc: Tarick Bedeir <tarick@google.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2018-04-27 11:15:43 -04:00
Greg Thelen	3af7a156bd	nvme: depend on INFINIBAND_ADDR_TRANS NVME_RDMA code depends on INFINIBAND_ADDR_TRANS provided symbols. So declare the kconfig dependency. This is necessary to allow for enabling INFINIBAND without INFINIBAND_ADDR_TRANS. Signed-off-by: Greg Thelen <gthelen@google.com> Cc: Tarick Bedeir <tarick@google.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2018-04-27 11:15:43 -04:00
James Smart	bb06ec3145	nvme: expand nvmf_check_if_ready checks The nvmf_check_if_ready() checks that were added are very simplistic. As such, the routine allows a lot of cases to fail ios during windows of reset or re-connection. In cases where there are not multi-path options present, the error goes back to the callee - the filesystem or application. Not good. The common routine was rewritten and calling syntax slightly expanded so that per-transport is_ready routines don't need to be present. The transports now call the routine directly. The routine is now a fabrics routine rather than an inline function. The routine now looks at controller state to decide the action to take. Some states mandate io failure. Others define the condition where a command can be accepted. When the decision is unclear, a generic queue-or-reject check is made to look for failfast or multipath ios and only fails the io if it is so marked. Otherwise, the io will be queued and wait for the controller state to resolve. Admin commands issued via ioctl share a live admin queue with commands from the transport for controller init. The ioctls could be intermixed with the initialization commands. It's possible for the ioctl cmd to be issued prior to the controller being enabled. To block this, the ioctl admin commands need to be distinguished from admin commands used for controller init. Added a USERCMD nvme_req(req)->rq_flags bit to reflect this division and set it on ioctls requests. As the nvmf_check_if_ready() routine is called prior to nvme_setup_cmd(), ensure that commands allocated by the ioctl path (actually anything in core.c) preps the nvme_req(req) before starting the io. This will preserve the USERCMD flag during execution and/or retry. Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.e> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-04-12 09:58:27 -06:00
Keith Busch	62843c2e42	nvme: Use admin command effects for admin commands Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-04-12 09:58:27 -06:00
Daniel Verkamp	c739969849	nvmet: fix space padding in serial number Commit `42de82a8b5` previously attempted to fix this, and it did correctly pad the MN and FR fields with spaces, but the SN field still contains 0 bytes. The current code fills out the first 16 bytes with hex2bin, leaving the last 4 bytes zeroed. Rather than adding a lot of error-prone math to avoid overwriting SN twice, just set the whole thing to spaces up front (it's only 20 bytes). Fixes: `42de82a8b5` ("nvmet: don't report 0-bytes in serial number") Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com> Reviewed-by: Martin Wilck <mwilck@suse.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-04-12 09:58:27 -06:00
Max Gurtovoy	fd92c77f58	nvme: check return value of init_srcu_struct function Also add error flow in case srcu initialization function fails. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-04-12 09:58:27 -06:00
Rodrigo R. Galvao	543c09c89f	nvmet: Fix nvmet_execute_write_zeroes sector count We have to increment the number of logical blocks to a 1's based value in the native format prior to converting to 512b units. Signed-off-by: Rodrigo R. Galvao <rosattig@linux.vnet.ibm.com> [changelog] Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-04-12 09:58:27 -06:00
Keith Busch	22b5560195	nvme-pci: Separate IO and admin queue IRQ vectors The admin and first IO queues shared the first irq vector, which has an affinity mask including cpu0. If a system allows cpu0 to be offlined, the admin queue may not be usable if no other CPUs in the affinity mask are online. This is a problem since unlike IO queues, there is only one admin queue that always needs to be usable. To fix, this patch allocates one pre_vector for the admin queue that is assigned all CPUs, so will always be accessible. The IO queues are assigned the remaining managed vectors. In case a controller has only one interrupt vector available, the admin and IO queues will share the pre_vector with all CPUs assigned. Cc: Jianchao Wang <jianchao.w.wang@oracle.com> Cc: Ming Lei <ming.lei@redhat.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-04-12 09:58:27 -06:00
Keith Busch	a6ff7262c2	nvme-pci: Remove unused queue parameter All the queue memory is allocated up front. We don't take the node into consideration when creating queues anymore, so removing the unused parameter. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-04-12 09:58:27 -06:00
Keith Busch	64ee0ac052	nvme-pci: Skip queue deletion if there are no queues User reported controller always retains CSTS.RDY to 1, which fails controller disabling when resetting the controller. This is also before the admin queue is allocated, and trying to disable an unallocated queue results in a NULL dereference. Reported-by: Alex Gagniuc <Alex_Gagniuc@Dellteam.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-04-12 09:58:27 -06:00
Arnd Bergmann	6038aa532a	nvme: target: fix buffer overflow nvmet_execute_get_disc_log_page() passes a fixed-length string into nvmet_format_discovery_entry(), which then does a longer memcpy() on it, as pointed out by gcc-8: In function 'nvmet_format_discovery_entry', inlined from 'nvmet_execute_get_disc_log_page' at drivers/nvme/target/discovery.c:126:4: drivers/nvme/target/discovery.c:62:2: error: 'memcpy' forming offset [38, 223] is out of the bounds [0, 37] [-Werror=array-bounds] memcpy(e->subnqn, subsys_nqn, NVMF_NQN_SIZE); Using strncpy() will make this well-defined, filling the rest of the buffer with zeroes, under the assumption that the input is either a NUL-terminated string, or a byte sequence containing no zeroes. If the input is a string that is longer than NVMF_NQN_SIZE, we continue to have no NUL-termination in the output. Fixes: `a07b4970f4` ("nvmet: add a generic NVMe target") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-04-12 09:58:27 -06:00
Johannes Thumshirn	74c6c71530	nvme: don't send keep-alives to the discovery controller NVMe over Fabrics 1.0 Section 5.2 "Discovery Controller Properties and Command Support" Figure 31 "Discovery Controller – Admin Commands" explicitly listst all commands but "Get Log Page" and "Identify" as reserved, but NetApp report the Linux host is sending Keep Alive commands to the discovery controller, which is a violation of the Spec. We're already checking for discovery controllers when configuring the keep alive timeout but when creating a discovery controller we're not hard wiring the keep alive timeout to 0 and thus remain on NVME_DEFAULT_KATO for the discovery controller. This can be easily remproduced when issuing a direct connect to the discovery susbsystem using: 'nvme connect [...] --nqn=nqn.2014-08.org.nvmexpress.discovery' Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Fixes: `07bfcd09a2` ("nvme-fabrics: add a generic NVMe over Fabrics library") Reported-by: Martin George <marting@netapp.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-04-12 09:58:27 -06:00
Johannes Thumshirn	00b683dbab	nvme: unexport nvme_start_keep_alive nvme_start_keep_alive() isn't used outside core.c so unexport it and make it static. Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-04-12 09:58:27 -06:00
Ming Lei	11d9ea6f2c	nvme-loop: fix kernel oops in case of unhandled command When nvmet_req_init() fails, __nvmet_req_complete() is called to handle the target request via .queue_response(), so nvme_loop_queue_response() shouldn't be called again for handling the failure. This patch fixes this case by the following way: - move blk_mq_start_request() before nvmet_req_init(), so nvme_loop_queue_response() may work well to complete this host request - don't call nvme_cleanup_cmd() which is done in nvme_loop_complete_rq() - don't call nvme_loop_queue_response() which is done via .queue_response() Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> [trimmed changelog] Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-04-12 09:58:27 -06:00
Matias Bjørling	7ec6074ff0	nvme: enforce 64bit offset for nvme_get_log_ext fn Compiling on 32 bits system produces a warning for the shift width when shifting 32 bit integer with 64bit integer. Make sure that offset always is 64bit, and use macros for retrieving lower and upper bits of the offset. Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-04-12 09:58:27 -06:00
Linus Torvalds	3526dd0c78	for-4.17/block-20180402 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABCAAGBQJawr05AAoJEPfTWPspceCmT2UP/1uuaqwzyl4VjFNb/k7KS7UM +Cs/1HBlGomgMA8orDTGqtWqLRdR3z4RSh0+MvXTzQ78HpFVYz7CbDc9itHm+G9M X0ypD4kF/JGCFb5cxk+x6qv28uO2nv4DP3+0hHqJWLH4UVJBWDY6bs4BPShsf9QB I6XjioNMhoqylXgdOITLODJZz+TcChlJMDAqwhpJwh9TH1wjobleAZ6AdmCPfgi5 h0UCKMUKzcVJlNZwQUrzrs2cxcx9Uhunnbz7HK0ZV4n/FKFtDpGynFpQQ71pZxKe Be0ZOBPCQvC3ykOM/egCIvC/e5y7FgrjORD6jxyu1PTwAugI5E1VYSMxHkXvgPAx zOo9A7RT4GPO2tDQv+DbzNFpqeSAclTgSmr+/y1wmheBs8DiSt7MPVBiNM4zdCNv NLk9z7IEjFhdmluSB/LbTb1aokypMb/q7QTLouPHdwGn80k7yrhFyLHgdjpNTQ2K UHfHZvGxkOX6SmFhBNOtIFUkuSceenh64a0RkRle7filx+ImpbCVm2/GYi9zZNCu EtctgzLbLmz40zMiyDaZS2bxBgGzfn6yf4xd9LsaAJPMhvZnmXogT0D9ctWXB0WU mMaS7sOkLnNjnGkzF1fHkeiZ/oigrstJbe+CA7BtOdwxpWn6MZBgKEoFQ6iA2b3X 5J1axMgVH5LAsIEcEQVq =RVhK -----END PGP SIGNATURE----- Merge tag 'for-4.17/block-20180402' of git://git.kernel.dk/linux-block Pull block layer updates from Jens Axboe: "It's a pretty quiet round this time, which is nice. This contains: - series from Bart, cleaning up the way we set/test/clear atomic queue flags. - series from Bart, fixing races between gendisk and queue registration and removal. - set of bcache fixes and improvements from various folks, by way of Michael Lyle. - set of lightnvm updates from Matias, most of it being the 1.2 to 2.0 transition. - removal of unused DIO flags from Nikolay. - blk-mq/sbitmap memory ordering fixes from Omar. - divide-by-zero fix for BFQ from Paolo. - minor documentation patches from Randy. - timeout fix from Tejun. - Alpha "can't write a char atomically" fix from Mikulas. - set of NVMe fixes by way of Keith. - bsg and bsg-lib improvements from Christoph. - a few sed-opal fixes from Jonas. - cdrom check-disk-change deadlock fix from Maurizio. - various little fixes, comment fixes, etc from various folks" * tag 'for-4.17/block-20180402' of git://git.kernel.dk/linux-block: (139 commits) blk-mq: Directly schedule q->timeout_work when aborting a request blktrace: fix comment in blktrace_api.h lightnvm: remove function name in strings lightnvm: pblk: remove some unnecessary NULL checks lightnvm: pblk: don't recover unwritten lines lightnvm: pblk: implement 2.0 support lightnvm: pblk: implement get log report chunk lightnvm: pblk: rename ppaf* to addrf* lightnvm: pblk: check for supported version lightnvm: implement get log report chunk helpers lightnvm: make address conversions depend on generic device lightnvm: add support for 2.0 address format lightnvm: normalize geometry nomenclature lightnvm: complete geo structure with maxoc* lightnvm: add shorten OCSSD version in geo lightnvm: add minor version to generic geometry lightnvm: simplify geometry structure lightnvm: pblk: refactor init/exit sequences lightnvm: Avoid validation of default op value lightnvm: centralize permission check for lightnvm ioctl ...	2018-04-05 14:27:02 -07:00
Matias Bjørling	b65125fa57	lightnvm: remove function name in strings For the sysfs functions, the function names are embedded into their error strings. If the function name later changes, the string may not be updated accordingly. Update the strings to use __func__ to avoid this. Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-03-29 17:29:09 -06:00
Javier González	a294c19945	lightnvm: implement get log report chunk helpers The 2.0 spec provides a report chunk log page that can be retrieved using the stangard nvme get log page. This replaces the dedicated get/put bad block table in 1.2. This patch implements the helper functions to allow targets retrieve the chunk metadata using get log page. It makes nvme_get_log_ext available outside of nvme core so that we can use it form lightnvm. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-03-29 17:29:09 -06:00
Javier González	a40afad90b	lightnvm: normalize geometry nomenclature Normalize nomenclature for naming channels, luns, chunks, planes and sectors as well as derivations in order to improve readability. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-03-29 17:29:09 -06:00
Javier González	3f48021bad	lightnvm: complete geo structure with maxoc* Complete the generic geometry structure with the maxoc and maxocpu felds, present in the 2.0 spec. Also, expose them through sysfs. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-03-29 17:29:09 -06:00
Javier González	f1d4e8121f	lightnvm: add shorten OCSSD version in geo Create a shorten version to use in the generic geometry. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-03-29 17:29:09 -06:00
Javier González	3cb98f84d3	lightnvm: add minor version to generic geometry Separate the version between major and minor on the generic geometry and represent it through sysfs in the 2.0 path. The 1.2 path only shows the major version to preserve the existing user space interface. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-03-29 17:29:09 -06:00
Javier González	e46f4e4822	lightnvm: simplify geometry structure Currently, the device geometry is stored redundantly in the nvm_id and nvm_geo structures at a device level. Moreover, when instantiating targets on a specific number of LUNs, these structures are replicated and manually modified to fit the instance channel and LUN partitioning. Instead, create a generic geometry around nvm_geo, which can be used by (i) the underlying device to describe the geometry of the whole device, and (ii) instances to describe their geometry independently. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-03-29 17:29:09 -06:00
Matias Bjørling	96257a8a7f	nvme: lightnvm: add late setup of block size and metadata The nvme driver sets up the size of the nvme namespace in two steps. First it initializes the device with standard logical block and metadata sizes, and then sets the correct logical block and metadata size. Due to the OCSSD 2.0 specification relies on the namespace to expose these sizes for correct initialization, let it be updated appropriately on the LightNVM side as well. Signed-off-by: Matias Bjørling <mb@lightnvm.io> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-03-29 17:29:09 -06:00
Matias Bjørling	89a09c5643	lightnvm: remove nvm_dev_ops->max_phys_sect The value of max_phys_sect is always static. Instead of defining it in the nvm_dev_ops structure, declare it as a global value. Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-03-29 17:29:09 -06:00
Matias Bjørling	62771fe0aa	lightnvm: add 2.0 geometry identification Implement the geometry data structures for 2.0 and enable a drive to be identified as one, including exposing the appropriate 2.0 sysfs entries. Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-03-29 17:29:09 -06:00
Matias Bjørling	c6ac3f35d4	lightnvm: flatten nvm_id_group into nvm_id There are no groups in the 2.0 specification, make sure that the nvm_id structure is flattened before 2.0 data structures are added. Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-03-29 17:29:09 -06:00
Matias Bjørling	a04e0cf93a	lightnvm: make 1.2 data structures explicit Make the 1.2 data structures explicit, so it will be easy to identify the 2.0 data structures. Also fix the order of which the nvme_nvm_* are declared, such that they follow the nvme_nvm_command order. Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-03-29 17:29:09 -06:00
Matias Bjørling	ff12581ec7	lightnvm: remove multiple groups in 1.2 data structure Only one id group from the 1.2 specification is supported. Make sure that only the first group is accessible. Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-03-29 17:29:09 -06:00
Matias Bjørling	d8a39caee0	lightnvm: remove mlc pairs structure The known implementations of the 1.2 specification, and upcoming 2.0 implementation all expose a sequential list of pages to write. Remove the data structure, as it is no longer needed. Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-03-29 17:29:09 -06:00
Matias Bjørling	8f37d1913f	lightnvm: remove chnl_offset in nvme_nvm_identity The identity structure is initialized to zero in the beginning of the nvme_nvm_identity function. The chnl_offset is separately set to zero. Since both the variable and assignment is never changed, remove them. Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-03-29 17:29:09 -06:00
Keith Busch	f23f5bece6	blk-mq: Allow PCI vector offset for mapping queues The PCI interrupt vectors intended to be associated with a queue may not start at 0; a driver may allocate pre_vectors for special use. This patch adds an offset parameter so blk-mq may find the intended affinity mask and updates all drivers using this API accordingly. Cc: Don Brace <don.brace@microsemi.com> Cc: <qla2xxx-upstream@qlogic.com> Cc: <linux-scsi@vger.kernel.org> Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-03-27 21:25:36 -06:00
Matias Bjørling	d558fb51ad	nvme: make nvme_get_log_ext non-static Enable the lightnvm integration to use the nvme_get_log_ext() function. Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-03-26 08:53:43 -06:00
Christoph Hellwig	e929f06d9e	nvmet: constify struct nvmet_fabrics_ops Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-03-26 08:53:43 -06:00
Christoph Hellwig	a5d1861229	nvmet: refactor configfs transport type handling Have a common table of mappings from numerical transport ids to names, and zero the transport specific area in common code in nvmet_addr_trtype_store. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-03-26 08:53:43 -06:00
Max Gurtovoy	f871749a9f	nvmet: move device_uuid configfs attr definition to suitable place Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-03-26 08:53:43 -06:00
Nitzan Carmi	b435ecea2a	nvme: Add .stop_ctrl to nvme ctrl ops For consistancy reasons, any fabric-specific works (e.g error recovery/reconnect) should be canceled in nvme_stop_ctrl, as for all other NVMe pending works (e.g. scan, keep alive). The patch aims to simplify the logic of the code, as we now only rely on a vague demand from any fabric to flush its private workqueues at the beginning of .delete_ctrl op. Signed-off-by: Nitzan Carmi <nitzanc@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-03-26 08:53:43 -06:00
Nitzan Carmi	187c0832ee	nvme-rdma: Allow DELETING state change failure in error_recovery While error recovery is ongoing, it is OK to move ctrl to DELETING state (from concurrent delete_work). Thus we don't need a warning for that case. Signed-off-by: Nitzan Carmi <nitzanc@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-03-26 08:53:43 -06:00
Keith Busch	2079699c10	nvme: Skip checking heads without namespaces If a task is holding a reference to a namespace on a removed controller, the head will not be released. If the same controller is added again later, its namespaces may not be successfully added. Instead, the user will see kernel message "Duplicate IDs for nsid <X>". This patch fixes that by skipping heads that don't have namespaces when considering if a new namespace is safe to add. Reported-by: Alex Gagniuc <Alex_Gagniuc@Dellteam.com> Cc: stable@vger.kernel.org Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-03-26 08:53:43 -06:00
Max Gurtovoy	9bad0404ec	nvme-rdma: Don't flush delete_wq by default during remove_one The .remove_one function is called for any ib_device removal. In case the removed device has no reference in our driver, there is no need to flush the work queue. Reviewed-by: Israel Rukshin <israelr@mellanox.com> Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-03-26 08:53:43 -06:00
Max Gurtovoy	a3dd7d0022	nvmet-rdma: Don't flush system_wq by default during remove_one The .remove_one function is called for any ib_device removal. In case the removed device has no reference in our driver, there is no need to flush the system work queue. Reviewed-by: Israel Rukshin <israelr@mellanox.com> Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-03-26 08:53:43 -06:00

1 2 3 4 5 ...

1026 commits