linux/drivers
Guoqing Jiang e820d55cb9 md: fix raid10 hang issue caused by barrier
When regular IO and resync IO happen at the same time and the
regular IO also needs to be split, tasks can hang on the
barrier.

1. resync thread
[ 1463.757205] INFO: task md1_resync:5215 blocked for more than 480 seconds.
[ 1463.757207]       Not tainted 4.19.5-1-default #1
[ 1463.757209] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1463.757212] md1_resync      D    0  5215      2 0x80000000
[ 1463.757216] Call Trace:
[ 1463.757223]  ? __schedule+0x29a/0x880
[ 1463.757231]  ? raise_barrier+0x8d/0x140 [raid10]
[ 1463.757236]  schedule+0x78/0x110
[ 1463.757243]  raise_barrier+0x8d/0x140 [raid10]
[ 1463.757248]  ? wait_woken+0x80/0x80
[ 1463.757257]  raid10_sync_request+0x1f6/0x1e30 [raid10]
[ 1463.757265]  ? _raw_spin_unlock_irq+0x22/0x40
[ 1463.757284]  ? is_mddev_idle+0x125/0x137 [md_mod]
[ 1463.757302]  md_do_sync.cold.78+0x404/0x969 [md_mod]
[ 1463.757311]  ? wait_woken+0x80/0x80
[ 1463.757336]  ? md_rdev_init+0xb0/0xb0 [md_mod]
[ 1463.757351]  md_thread+0xe9/0x140 [md_mod]
[ 1463.757358]  ? _raw_spin_unlock_irqrestore+0x2e/0x60
[ 1463.757364]  ? __kthread_parkme+0x4c/0x70
[ 1463.757369]  kthread+0x112/0x130
[ 1463.757374]  ? kthread_create_worker_on_cpu+0x40/0x40
[ 1463.757380]  ret_from_fork+0x3a/0x50

2. regular IO
[ 1463.760679] INFO: task kworker/0:8:5367 blocked for more than 480 seconds.
[ 1463.760683]       Not tainted 4.19.5-1-default #1
[ 1463.760684] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1463.760687] kworker/0:8     D    0  5367      2 0x80000000
[ 1463.760718] Workqueue: md submit_flushes [md_mod]
[ 1463.760721] Call Trace:
[ 1463.760731]  ? __schedule+0x29a/0x880
[ 1463.760741]  ? wait_barrier+0xdd/0x170 [raid10]
[ 1463.760746]  schedule+0x78/0x110
[ 1463.760753]  wait_barrier+0xdd/0x170 [raid10]
[ 1463.760761]  ? wait_woken+0x80/0x80
[ 1463.760768]  raid10_write_request+0xf2/0x900 [raid10]
[ 1463.760774]  ? wait_woken+0x80/0x80
[ 1463.760778]  ? mempool_alloc+0x55/0x160
[ 1463.760795]  ? md_write_start+0xa9/0x270 [md_mod]
[ 1463.760801]  ? try_to_wake_up+0x44/0x470
[ 1463.760810]  raid10_make_request+0xc1/0x120 [raid10]
[ 1463.760816]  ? wait_woken+0x80/0x80
[ 1463.760831]  md_handle_request+0x121/0x190 [md_mod]
[ 1463.760851]  md_make_request+0x78/0x190 [md_mod]
[ 1463.760860]  generic_make_request+0x1c6/0x470
[ 1463.760870]  raid10_write_request+0x77a/0x900 [raid10]
[ 1463.760875]  ? wait_woken+0x80/0x80
[ 1463.760879]  ? mempool_alloc+0x55/0x160
[ 1463.760895]  ? md_write_start+0xa9/0x270 [md_mod]
[ 1463.760904]  raid10_make_request+0xc1/0x120 [raid10]
[ 1463.760910]  ? wait_woken+0x80/0x80
[ 1463.760926]  md_handle_request+0x121/0x190 [md_mod]
[ 1463.760931]  ? _raw_spin_unlock_irq+0x22/0x40
[ 1463.760936]  ? finish_task_switch+0x74/0x260
[ 1463.760954]  submit_flushes+0x21/0x40 [md_mod]

So the resync IO is waiting for the regular write IO to complete
and decrease nr_pending (conf->barrier++ is called before waiting).
The regular write IO splits off another bio after calling
wait_barrier, which does nr_pending++. The split bio then continues
with raid10_write_request -> wait_barrier, so it has to wait for
barrier to drop to zero, and the deadlock happens as follows.

	resync io		regular io

	raise_barrier
				wait_barrier
				generic_make_request
				wait_barrier
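The cycle in the diagram can be replayed with a small userspace model of the two counters involved. This is a hypothetical, simplified sketch, not the raid10 code: `struct model`, `try_wait_barrier`, `raise_barrier_begin`, `resync_can_proceed` and `replay` are illustrative names, nothing blocks (each helper only reports whether the caller could proceed), and the model ignores nr_waiting and the other conditions the real functions check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of the two r10conf counters involved. */
struct model {
	int barrier;    /* raised by resync; blocks new regular IO */
	int nr_pending; /* regular IO that has passed wait_barrier */
};

/* Regular IO may pass only while no barrier is raised; it then counts as pending. */
static bool try_wait_barrier(struct model *m)
{
	if (m->barrier)
		return false; /* the real wait_barrier would sleep here */
	m->nr_pending++;
	return true;
}

/* Regular IO drops its pending reference. */
static void allow_barrier(struct model *m)
{
	assert(m->nr_pending > 0);
	m->nr_pending--;
}

/* Resync does barrier++ first, then waits for nr_pending to reach zero. */
static void raise_barrier_begin(struct model *m)
{
	m->barrier++;
}

static bool resync_can_proceed(const struct model *m)
{
	return m->nr_pending == 0;
}

/* Replays the interleaving from the diagram; returns true on deadlock.
 * 'fixed' drops the outer reference before the nested submission. */
static bool replay(bool fixed)
{
	struct model m = {0, 0};

	(void)try_wait_barrier(&m); /* regular IO: first wait_barrier passes */
	raise_barrier_begin(&m);    /* resync: barrier++, then waits */
	if (fixed)
		allow_barrier(&m);  /* drop nr_pending before generic_make_request */

	/* generic_make_request -> nested raid10_write_request -> wait_barrier */
	bool split_ok = try_wait_barrier(&m);
	bool resync_ok = resync_can_proceed(&m);

	/* Deadlock: neither side can make progress. */
	return !split_ok && !resync_ok;
}
```

With the original ordering, `replay(false)` reports a deadlock: the nested wait_barrier waits for barrier == 0 while resync waits for nr_pending == 0, which the outer request still holds. With the reference dropped first, `replay(true)` reports none; the nested bio still waits, but resync can finish and lower the barrier.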

To resolve the issue, call allow_barrier to decrease nr_pending
before generic_make_request, since the regular IO has not yet been
issued to the underlying devices, and call wait_barrier again
afterwards to ensure no internal IO is happening.
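The ordering of the fix can be illustrated with userspace stand-ins. This is a hypothetical sketch, not the kernel code: `struct r10conf` and `struct bio` here are minimal stand-ins, the stub `wait_barrier` only counts instead of sleeping, and `nested_submit`, `submit_remainder` and `pending_seen_at_submit` are illustrative names (`nested_submit` plays the role of generic_make_request recursing into raid10_write_request).

```c
#include <assert.h>

/* Hypothetical userspace stand-ins; not the kernel's definitions. */
struct r10conf {
	int barrier;
	int nr_pending;
};
struct bio {
	int sectors;
};

/* nr_pending as seen by the nested submission (for illustration only). */
static int pending_seen_at_submit = -1;

static void wait_barrier(struct r10conf *conf)
{
	/* The real function sleeps while conf->barrier is raised; this stub only counts. */
	conf->nr_pending++;
}

static void allow_barrier(struct r10conf *conf)
{
	assert(conf->nr_pending > 0);
	conf->nr_pending--;
}

/* Stand-in for generic_make_request recursing into raid10_write_request,
 * which takes and later drops its own pending reference. */
static void nested_submit(struct r10conf *conf, struct bio *bio)
{
	(void)bio;
	pending_seen_at_submit = conf->nr_pending;
	wait_barrier(conf);
	allow_barrier(conf);
}

/* Fixed ordering: drop this request's reference before resubmitting the
 * remainder, then take it again before continuing with the split part. */
static void submit_remainder(struct r10conf *conf, struct bio *remainder)
{
	allow_barrier(conf);
	nested_submit(conf, remainder);
	wait_barrier(conf);
}
```

Driving it with one outer request (wait_barrier, then submit_remainder) shows that nr_pending is 0 while the nested submission runs, so a resync waiting for nr_pending to reach zero is no longer held up by the outer request, and the outer request again holds exactly one reference afterwards.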

Fixes: fc9977dd06 ("md/raid10: simplify the splitting of requests.")
Reported-and-tested-by: Siniša Bandin <sinisa@4net.rs>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
2018-12-20 08:53:24 -08:00
accessibility
acpi libnvdimm fixes 4.20-rc6 2018-12-09 09:46:54 -08:00
amba
android binder: fix race that allows malicious free of live buffer 2018-11-26 20:01:47 +01:00
ata libata: whitelist all SAMSUNG MZ7KM* solid-state disks 2018-12-03 12:54:39 -07:00
atm firestream: fix spelling mistake: "Inititing" -> "Initializing" 2018-11-27 15:32:06 -08:00
auxdisplay The Compiler Attributes series 2018-11-01 18:34:46 -07:00
base devres: Align data[] to ARCH_KMALLOC_MINALIGN 2018-11-11 11:40:04 -08:00
bcma
block for-linus-20181115 2018-11-16 09:31:59 -06:00
bluetooth Merge branch 'work.tty-ioctl' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-10-24 14:43:41 +01:00
bus ARM: SoC driver updates for 4.17 2018-10-29 15:16:01 -07:00
cdrom gdrom: fix mistake in assignment of error 2018-10-25 11:17:40 -06:00
char RTC for 4.20 2018-10-27 09:24:24 -07:00
clk clk: qcom: qcs404: Fix gpll0_out_main parent 2018-12-10 11:31:30 -08:00
clocksource Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-11-11 16:41:50 -06:00
connector
cpufreq cpufreq: ti-cpufreq: Only register platform_device when supported 2018-11-19 11:26:06 +01:00
cpuidle ARM: cpuidle: Convert to use cpuidle_register|unregister() 2018-11-08 18:53:00 +01:00
crypto crypto/chelsio/chtls: send/recv window update 2018-12-14 13:40:42 -08:00
dax
dca
devfreq
dio
dma dmaengine: dw: Fix FIFO size for Intel Merrifield 2018-12-06 22:53:05 +05:30
dma-buf udmabuf: set read/write flag when exporting 2018-11-16 08:50:53 +01:00
edac * skx_edac: Address translation for NVDIMMs (Tony Luck and Qiuxu Zhuo) 2018-11-02 11:17:22 -07:00
eisa
extcon
firewire
firmware efi: Prevent GICv3 WARN() by mapping the memreserve table before first use 2018-11-27 13:50:20 +01:00
fmc
fpga fpga: add devm_fpga_region_create 2018-10-16 11:13:50 +02:00
fsi fsi: fsi-scom.c: Remove duplicate header 2018-11-26 10:13:04 +11:00
gnss gnss: sirf: fix activation retry handling 2018-12-06 17:22:23 +01:00
gpio ARM: SoC fixes 2018-12-02 12:19:44 -08:00
gpu Merge branch 'vmwgfx-fixes-4.20' of git://people.freedesktop.org/~thomash/linux into drm-fixes 2018-12-14 05:37:44 +10:00
hid Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid 2018-12-10 11:04:41 -08:00
hsi
hv hyperv-fixes-4.20-rc6 2018-12-14 15:36:56 +01:00
hwmon hwmon: (w83795) temp4_type has writable permission 2018-11-18 14:34:56 -08:00
hwspinlock
hwtracing stm class: Use memcat_p() 2018-10-11 12:12:55 +02:00
i2c i2c: uniphier-f: fix violation of tLOW requirement for Fast-mode 2018-12-06 23:14:59 +01:00
ide ide: Change to use DEFINE_SHOW_ATTRIBUTE macro 2018-12-02 22:09:09 -08:00
idle Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-10-23 13:32:18 +01:00
iio iio/hid-sensors: Fix IIO_CHAN_INFO_RAW returning wrong values for signed numbers 2018-11-16 11:42:12 +00:00
infiniband IB/core: Fix oops in netdev_next_upper_dev_rcu() 2018-12-12 12:14:49 -05:00
input Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input 2018-12-04 08:47:04 -08:00
iommu iommu/vt-d: Use memunmap to free memremap 2018-11-22 17:02:21 +01:00
ipack
irqchip irqchip/irq-mvebu-sei: Fix a NULL vs IS_ERR() bug in probe function 2018-11-01 12:38:48 +01:00
isdn Merge branch 'work.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-11-01 19:58:52 -07:00
leds LED fixes for 4.20-rc2 2018-11-08 17:49:04 -06:00
lightnvm lightnvm: pblk: guarantee that backpointer is respected on writer stall 2018-10-09 08:25:08 -06:00
macintosh memblock: stop using implicit alignment to SMP_CACHE_BYTES 2018-10-31 08:54:16 -07:00
mailbox - Convert print users to use the %pOFn format specifier 2018-10-29 10:30:44 -07:00
mcb
md md: fix raid10 hang issue caused by barrier 2018-12-20 08:53:24 -08:00
media media: Add a Kconfig option for the Request API 2018-12-05 13:07:43 -05:00
memory
memstick
message
mfd Revert "mfd: cros_ec: Use devm_kzalloc for private data" 2018-12-05 09:59:38 +00:00
misc misc: mic/scif: fix copy-paste error in scif_create_remote_lookup 2018-11-27 09:00:38 +01:00
mmc mmc: core: Use a minimum 1600ms timeout when enabling CACHE ctrl 2018-12-17 08:59:42 +01:00
mtd mtd: nand: Fix memory allocation in nanddev_bbt_init() 2018-11-28 15:41:50 +01:00
mux This is the bulk of GPIO changes for the v4.20 series: 2018-10-23 08:45:05 +01:00
net net: mvpp2: fix the phylink mode validation 2018-12-19 16:38:35 -08:00
nfc NFC: nfcmrvl_uart: fix OF child-node lookup 2018-10-23 13:28:53 -05:00
ntb ntb: idt: Alter the driver info comments 2018-11-01 10:33:12 -04:00
nubus
nvdimm libnvdimm, pfn: Pad pfn namespaces relative to other regions 2018-12-05 14:16:12 -08:00
nvme nvmet-rdma: fix response use after free 2018-12-07 07:11:11 -08:00
nvmem nvmem: core: fix regression in of_nvmem_cell_get() 2018-11-11 09:15:29 -08:00
of Devicetree fixes for 4.20-rc: 2018-11-09 16:41:58 -06:00
opp OPP: Fix parsing of multiple phandles in "operating-points-v2" property 2018-11-23 10:47:21 +05:30
oprofile
parisc parisc: Add alternative coding infrastructure 2018-10-17 17:22:26 +02:00
parport
pci PCI/AER: Queue one GHES event, not several uninitialized ones 2018-12-14 11:29:37 -06:00
pcmcia powerpc updates for 4.20 2018-10-26 14:36:21 -07:00
perf arm64 updates for 4.20: 2018-10-22 17:30:06 +01:00
phy phy: qcom-qusb2: Fix HSTX_TRIM tuning with fused value for SDM845 2018-11-21 13:13:58 +05:30
pinctrl pinctrl: sunxi: a83t: Fix IRQ offset typo for PH11 2018-12-07 13:32:19 +01:00
platform platform-drivers-x86 for v4.20-1 2018-11-01 08:42:21 -07:00
pnp
power Devicetree updates for 4.20: 2018-10-26 12:09:58 -07:00
powercap Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-10-23 13:32:18 +01:00
pps
ps3
ptp ptp: drop redundant kasprintf() to create worker name 2018-10-28 19:20:06 -07:00
pwm pwm: lpss: Only set update bit if we are actually changing the settings 2018-10-16 13:16:15 +02:00
rapidio
ras
regulator regulator: Regulator updates for next release 2018-10-23 01:54:44 +01:00
remoteproc remoteproc: qcom: q6v5-mss: Register segments/dumpfn for coredump 2018-10-19 12:54:03 -07:00
reset ARM: SoC driver updates for 4.17 2018-10-29 15:16:01 -07:00
rpmsg rpmsg: glink: smem: Support rx peak for size less than 4 bytes 2018-10-03 17:04:32 -07:00
rtc Staging and IIO driver fixes for 4.20-rc5 2018-11-30 12:23:44 -08:00
s390 virtio/s390: fix race in ccw_io_helper() 2018-12-06 14:22:35 -05:00
sbus drivers/sbus/char: add of_node_put() 2018-12-02 20:55:23 -08:00
scsi SCSI fixes on 20181218 2018-12-18 09:38:34 -08:00
sfi mm: remove include/linux/bootmem.h 2018-10-31 08:54:16 -07:00
sh
siox
slimbus slimbus: ngd: remove unnecessary check 2018-11-07 14:59:28 +01:00
sn
soc soc: ti: QMSS: Fix usage of irq_set_affinity_hint 2018-11-02 11:22:09 -07:00
soundwire
spi spi: Fixes for v4.20 2018-11-28 08:33:55 -08:00
spmi
ssb ssb: chipcommon: fix fall-through annotation 2018-10-05 11:37:20 +03:00
staging media fixes for v4.20-rc7 2018-12-12 18:24:32 -08:00
target scsi: target/core: Avoid that a kernel oops is triggered when COMPARE AND WRITE fails 2018-11-05 22:16:00 -05:00
tc TC: Set DMA masks for devices 2018-10-11 09:16:44 -07:00
tee
thermal thermal: stm32: Fix stm_thermal_read_factory_settings 2018-12-10 20:15:28 -08:00
thunderbolt thunderbolt: Prevent root port runtime suspend during NVM upgrade 2018-11-26 20:38:49 +01:00
tty Revert "serial: 8250: Fix clearing FIFOs in RS485 mode again" 2018-12-17 16:18:29 +01:00
uio uio_hv_generic: set callbacks on open 2018-12-11 14:23:17 +01:00
usb USB: xhci: fix 'broken_suspend' placement in struct xchi_hcd 2018-12-17 16:01:02 +01:00
uwb
vfio VFIO updates for v4.20 2018-10-31 11:01:38 -07:00
vhost Revert "net: vhost: lock the vqs one by one" 2018-12-12 21:56:20 -08:00
video backlight: pwm_bl: Fix brightness levels for non-DT case. 2018-12-10 15:37:47 +00:00
virt
virtio virtio-balloon: VIRTIO_BALLOON_F_PAGE_POISON 2018-10-24 20:57:55 -04:00
visorbus
vlynq
vme
w1 w1: IAD Register is yet readable trough iad sys file. Fix snprintf (%u for unsigned, count for max size). 2018-10-15 20:50:32 +02:00
watchdog watchdog: ts4800: release syscon device node in ts4800_wdt_probe() 2018-10-22 10:16:28 +02:00
xen xen: fixes for 4.20-rc5 2018-12-02 12:15:55 -08:00
zorro
Kconfig
Makefile