linux/block
Gabriel Krisman Bertazi d1b1cea1e5 blk-mq: Fix failed allocation path when mapping queues
In blk_mq_map_swqueue, there is a memory optimization that frees the
tags of a queue that has gone unmapped.  Later, if that hctx is remapped
after another topology change, the tags need to be reallocated.

If this allocation fails, a simple WARN_ON triggers, but the block layer
ends up with an active hctx without any corresponding set of tags.
Then, any income IO to that hctx can trigger an Oops.

I can reproduce it consistently by running IO, flipping CPUs on and off
and eventually injecting a memory allocation failure in that path.

In the fix below, if the system experiences a failed allocation of any
hctx's tags, we remap all the ctxs of that queue to the hctx_0, which
should always keep it's tags.  There is a minor performance hit, since
our mapping just got worse after the error path, but this is
the simplest solution to handle this error path.  The performance hit
will disappear after another successful remap.

I considered dropping the memory optimization all together, but it
seemed a bad trade-off to handle this very specific error case.

This should apply cleanly on top of Jens' for-next branch.

The Oops is the one below:

SP (3fff935ce4d0) is in userspace
1:mon> e
cpu 0x1: Vector: 300 (Data Access) at [c000000fe99eb110]
    pc: c0000000005e868c: __sbitmap_queue_get+0x2c/0x180
    lr: c000000000575328: __bt_get+0x48/0xd0
    sp: c000000fe99eb390
   msr: 900000010280b033
   dar: 28
 dsisr: 40000000
  current = 0xc000000fe9966800
  paca    = 0xc000000007e80300   softe: 0        irq_happened: 0x01
    pid   = 11035, comm = aio-stress
Linux version 4.8.0-rc6+ (root@bean) (gcc version 5.4.0 20160609
(Ubuntu/IBM 5.4.0-6ubuntu1~16.04.2) ) #3 SMP Mon Oct 10 20:16:53 CDT 2016
1:mon> s
[c000000fe99eb3d0] c000000000575328 __bt_get+0x48/0xd0
[c000000fe99eb400] c000000000575838 bt_get.isra.1+0x78/0x2d0
[c000000fe99eb480] c000000000575cb4 blk_mq_get_tag+0x44/0x100
[c000000fe99eb4b0] c00000000056f6f4 __blk_mq_alloc_request+0x44/0x220
[c000000fe99eb500] c000000000570050 blk_mq_map_request+0x100/0x1f0
[c000000fe99eb580] c000000000574650 blk_mq_make_request+0xf0/0x540
[c000000fe99eb640] c000000000561c44 generic_make_request+0x144/0x230
[c000000fe99eb690] c000000000561e00 submit_bio+0xd0/0x200
[c000000fe99eb740] c0000000003ef740 ext4_io_submit+0x90/0xb0
[c000000fe99eb770] c0000000003e95d8 ext4_writepages+0x588/0xdd0
[c000000fe99eb910] c00000000025a9f0 do_writepages+0x60/0xc0
[c000000fe99eb940] c000000000246c88 __filemap_fdatawrite_range+0xf8/0x180
[c000000fe99eb9e0] c000000000246f90 filemap_write_and_wait_range+0x70/0xf0
[c000000fe99eba20] c0000000003dd844 ext4_sync_file+0x214/0x540
[c000000fe99eba80] c000000000364718 vfs_fsync_range+0x78/0x130
[c000000fe99ebad0] c0000000003dd46c ext4_file_write_iter+0x35c/0x430
[c000000fe99ebb90] c00000000038c280 aio_run_iocb+0x3b0/0x450
[c000000fe99ebce0] c00000000038dc28 do_io_submit+0x368/0x730
[c000000fe99ebe30] c000000000009404 system_call+0x38/0xec

Signed-off-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
Cc: Brian King <brking@linux.vnet.ibm.com>
Cc: Douglas Miller <dougmill@linux.vnet.ibm.com>
Cc: linux-block@vger.kernel.org
Cc: linux-scsi@vger.kernel.org
Reviewed-by: Douglas Miller <dougmill@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-12-14 13:57:47 -07:00
..
partitions block: atari: Return early for unsupported sector size 2016-07-13 09:31:44 -07:00
badblocks.c badblocks: badblocks_set/clear update unacked_exist 2016-10-21 15:45:47 -06:00
bio-integrity.c block: remove bio_is_rw 2016-10-28 08:45:17 -06:00
bio.c block: improve handling of the magic discard payload 2016-12-09 08:30:51 -07:00
blk-cgroup.c block,blkcg: use __GFP_NOWARN for best-effort allocations in blkcg 2016-11-22 08:59:49 -07:00
blk-core.c Merge branch 'for-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata 2016-12-13 13:26:24 -08:00
blk-exec.c block: split out request-only flags into a new namespace 2016-10-28 08:45:17 -06:00
blk-flush.c Merge branch 'for-4.10/block' of git://git.kernel.dk/linux-block 2016-12-13 10:19:16 -08:00
blk-integrity.c block, libnvdimm, nvme: provide a built-in blk_integrity nop profile 2015-10-21 14:43:45 -06:00
blk-ioc.c mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd 2015-11-06 17:50:42 -08:00
blk-lib.c block: improve handling of the magic discard payload 2016-12-09 08:30:51 -07:00
blk-map.c Merge branch 'for-4.10/block' of git://git.kernel.dk/linux-block 2016-12-13 10:19:16 -08:00
blk-merge.c block: improve handling of the magic discard payload 2016-12-09 08:30:51 -07:00
blk-mq-cpumap.c blk-mq: allow the driver to pass in a queue mapping 2016-09-15 08:42:03 -06:00
blk-mq-pci.c blk_mq: linux/blk-mq.h does not include all the headers it depends on 2016-09-19 08:21:51 -06:00
blk-mq-sysfs.c block: add scalable completion tracking of requests 2016-11-10 13:53:26 -07:00
blk-mq-tag.c Merge branch 'for-4.9/block-irq' of git://git.kernel.dk/linux-block 2016-10-09 17:29:33 -07:00
blk-mq-tag.h Merge branch 'for-4.9/block-irq' of git://git.kernel.dk/linux-block 2016-10-09 17:29:33 -07:00
blk-mq.c blk-mq: Fix failed allocation path when mapping queues 2016-12-14 13:57:47 -07:00
blk-mq.h blk-mq: abstract out blk_mq_dispatch_rq_list() helper 2016-12-09 09:03:02 -07:00
blk-settings.c Merge branch 'for-4.10/block' of git://git.kernel.dk/linux-block 2016-12-13 10:19:16 -08:00
blk-softirq.c This adds a new gcc plugin named "latent_entropy". It is designed to 2016-10-15 10:03:15 -07:00
blk-stat.c blk-stat: fix a few cases of missing batch flushing 2016-12-09 13:08:35 -07:00
blk-stat.h block: add scalable completion tracking of requests 2016-11-10 13:53:26 -07:00
blk-sysfs.c Merge branch 'for-4.10/block' of git://git.kernel.dk/linux-block 2016-12-13 10:19:16 -08:00
blk-tag.c block: split out request-only flags into a new namespace 2016-10-28 08:45:17 -06:00
blk-throttle.c block: replace REQ_THROTTLED with a bio flag 2016-10-28 08:45:17 -06:00
blk-timeout.c block: remove REQ_NO_TIMEOUT flag 2015-12-22 09:38:34 -07:00
blk-wbt.c blk-wbt: don't throttle discard or write zeroes 2016-12-09 08:29:35 -07:00
blk-wbt.h blk-wbt: allow wbt to be enabled always through sysfs 2016-11-28 10:27:03 -07:00
blk-zoned.c block: zoned: fix harmless maybe-uninitialized warning 2016-10-24 20:51:22 -06:00
blk.h blk-mq: implement hybrid poll mode for sync O_DIRECT 2016-11-17 13:34:51 -07:00
bounce.c Merge branch 'for-linus' of git://git.kernel.dk/linux-block 2015-09-19 18:57:09 -07:00
bsg-lib.c bsg: Add sparse annotations to bsg_request_fn() 2016-11-14 09:57:03 -07:00
bsg.c block: drop q argument from bsg_validate_sgv4_hdr 2016-11-03 07:56:14 -06:00
cfq-iosched.c blk-wbt: cleanup disable-by-default for CFQ 2016-11-28 10:27:03 -07:00
cmdline-parser.c block: remove unrelated header files and export symbol 2014-01-21 20:18:26 -08:00
compat_ioctl.c mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros 2016-04-04 10:41:08 -07:00
deadline-iosched.c block: do not merge requests without consulting with io scheduler 2016-07-20 21:35:12 -06:00
elevator.c elevator: make the rqhash helpers exported 2016-12-09 09:03:02 -07:00
genhd.c block: fix bdi vs gendisk lifetime mismatch 2016-08-04 14:19:16 -06:00
ioctl.c blk-zoned: implement ioctls 2016-10-18 10:05:42 -06:00
ioprio.c block: fix use-after-free in sys_ioprio_get() 2016-07-01 08:39:24 -06:00
Kconfig block: hook up writeback throttling 2016-11-10 13:53:40 -07:00
Kconfig.iosched blkcg: make CONFIG_BLK_CGROUP bool 2012-03-06 21:27:21 +01:00
Makefile blk-wbt: add general throttling mechanism 2016-11-10 13:53:32 -07:00
noop-iosched.c elevator: use list_{first,prev,next}_entry 2015-11-16 15:21:48 -07:00
partition-generic.c block: Check partition alignment on zoned block devices 2016-12-01 07:56:53 -07:00
scsi_ioctl.c mm, page_alloc: rename __GFP_WAIT to __GFP_RECLAIM 2015-11-06 17:50:42 -08:00
t10-pi.c block: Consolidate static integrity profile properties 2015-10-21 14:42:38 -06:00