linux/drivers/scsi
Hugh Dickins 164fc5dcd6 scsi: fix sense_slab/bio swapping livelock
Since 2.6.25-rc7, I've been seeing an occasional livelock on one x86_64
machine, copying kernel trees to tmpfs, paging out to swap.

Signature: 6000 pages under writeback but never getting written; most
tasks of interest trying to reclaim, but each get_swap_bio waiting for a
bio in mempool_alloc's io_schedule_timeout(5*HZ); every five seconds an
atomic page allocation failure report from kblockd failing to allocate a
sense_buffer in __scsi_get_command.

__scsi_get_command has a (one item) free_list to protect against this,
but rc1's "[SCSI] use dynamically allocated sense buffer" (de25deb180)
upset that slightly.  When it fails to allocate from the separate
sense_slab, instead of giving up, it must fall back to the command
free_list, which is sure to have a sense_buffer attached.
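
The fallback described above can be modelled in userspace.  What follows
is a minimal sketch, not the kernel code: struct host, struct cmd,
get_command(), put_command() and the two failure switches are hypothetical
stand-ins for the Scsi_Host, the scsi_cmnd, __scsi_get_command and a
failing atomic allocation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define SENSE_SIZE 96	/* assumed size, for illustration only */

struct cmd {
	unsigned char *sense_buffer;
};

struct host {
	struct cmd *free_cmd;	/* one-item reserve, sense buffer attached */
};

/* Hypothetical switches that simulate allocation failure under pressure. */
static bool cmd_alloc_fails;
static bool sense_alloc_fails;

static struct cmd *alloc_cmd(void)
{
	return cmd_alloc_fails ? NULL : calloc(1, sizeof(struct cmd));
}

static unsigned char *alloc_sense(void)
{
	return sense_alloc_fails ? NULL : calloc(1, SENSE_SIZE);
}

/* Set up the reserve while memory is still plentiful. */
static int host_init(struct host *h)
{
	h->free_cmd = calloc(1, sizeof(struct cmd));
	if (!h->free_cmd)
		return -1;
	h->free_cmd->sense_buffer = calloc(1, SENSE_SIZE);
	if (!h->free_cmd->sense_buffer) {
		free(h->free_cmd);
		h->free_cmd = NULL;
		return -1;
	}
	return 0;
}

static struct cmd *get_command(struct host *h)
{
	struct cmd *c = alloc_cmd();

	if (c) {
		c->sense_buffer = alloc_sense();
		if (!c->sense_buffer) {
			/* Do not give up here: drop the half-built
			 * command and fall through to the reserve. */
			free(c);
			c = NULL;
		}
	}
	if (!c && h->free_cmd) {
		c = h->free_cmd;	/* already has a sense buffer */
		h->free_cmd = NULL;
	}
	return c;
}

static void put_command(struct host *h, struct cmd *c)
{
	if (!h->free_cmd) {
		h->free_cmd = c;	/* refill the reserve first */
		return;
	}
	free(c->sense_buffer);
	free(c);
}
```

With the reserve in place, a host can always make progress one command at
a time even while every sense buffer allocation fails, which is what
breaks the livelock.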

Either my earlier -rc testing missed this, or there's some recent
contributory factor.  One very significant factor is SLUB, which merges
slab caches when it can, and on 64-bit happens to merge both bio cache
and sense_slab cache into kmalloc's 128-byte cache: so that under this
swapping load, bios above are liable to gobble up all the slots needed
for scsi_cmnd sense_buffers below.
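
As a toy illustration of why the merge matters, assume each cache has a
fixed number of free slots and cannot grow under memory pressure.  The
names slot_pool, pool_alloc() and alloc_bios() are hypothetical; this
models only the slot accounting, not SLUB itself.

```c
#include <assert.h>
#include <stdbool.h>

/* A cache reduced to a count of free object slots. */
struct slot_pool {
	int free_slots;
};

/* Take one slot; fail when none is left (no new pages available). */
static bool pool_alloc(struct slot_pool *p)
{
	if (p->free_slots == 0)
		return false;
	p->free_slots--;
	return true;
}

/* Swap-out load: grab up to `want` bios, return how many were obtained. */
static int alloc_bios(struct slot_pool *p, int want)
{
	int got = 0;

	while (got < want && pool_alloc(p))
		got++;
	return got;
}
```

When bios and sense buffers draw from one merged pool, alloc_bios() can
drain it so that the next pool_alloc() for a sense buffer fails; with two
separate pools, the bio load cannot consume the sense buffer slots.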

That's disturbing behaviour, and I tried a few things to fix it.  Adding
a no-op constructor to the sense_slab inhibits SLUB from merging it, and
stops all the allocation failures I was seeing; but it's rather a hack,
and perhaps in different configurations we have other caches on the
swapout path which are ill-merged.

Another alternative is to revert the separate sense_slab, using
cache-line-aligned sense_buffer allocated beyond scsi_cmnd from the one
kmem_cache; but that might waste more memory, and is only a way of
diverting around the known problem.

While I don't like seeing the allocation failures, and hate the idea of
all those bios piled up above a scsi host working one by one, it does
seem to emerge fairly soon with the livelock fix.  So lacking better
ideas, stick with that one clear fix for now.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Peter Zijlstra <a.p.ziljstra@chello.nl>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-06 16:10:08 -07:00
..
aacraid [SCSI] aacraid: informational sysfs value corrections 2008-02-11 10:20:54 -06:00
aic7xxx Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6 2008-02-23 12:29:16 -08:00
aic7xxx_old
aic94xx [SCSI] aic94xx: fix TMF ascb handling to prevent sequencer panic 2008-02-24 00:40:57 -06:00
arcmsr [SCSI] arcmsr: fix iounmap error for Type B adapter 2008-03-14 15:25:26 -05:00
arm [SCSI] fas216: fix up the previous fas216 commit 2008-02-18 08:57:16 -06:00
dpt
ibmvscsi [SCSI] tgt: fix build errors when dprintk is defined 2008-03-03 13:19:52 -06:00
libsas [SCSI] libsas: Warn if ATA device detected but CONFIG_SCSI_SAS_ATA not set 2008-03-27 15:12:16 -07:00
lpfc [SCSI] lpfc: Balance locking 2008-02-22 17:15:35 -06:00
megaraid [SCSI] MegaRAID driver management char device moved to misc 2008-02-11 10:20:53 -06:00
pcmcia
qla2xxx [SCSI] qla2xxx: Update version number to 8.02.00-k9. 2008-03-03 13:11:50 -06:00
qla4xxx [SCSI] qla4xxx: regression - add start scan callout 2008-03-05 12:03:54 -06:00
sym53c8xx_2 [SCSI] sym53c8xx: fix resid calculation 2008-02-07 18:02:34 -06:00
.gitignore
3w-9xxx.c
3w-9xxx.h
3w-xxxx.c
3w-xxxx.h
53c700.c
53c700.h
53c700.scr
53c700_d.h_shipped
a100u2w.c [SCSI] a100u2w: fix bitmap lookup routine 2008-03-20 09:19:25 -05:00
a100u2w.h
a2091.c cleanup after APUS removal 2008-02-06 10:41:01 -08:00
a2091.h
a3000.c cleanup after APUS removal 2008-02-06 10:41:01 -08:00
a3000.h
a4000t.c
advansys.c [SCSI] advansys: Fix bug in AdvLoadMicrocode 2008-03-07 10:05:43 -06:00
aha152x.c
aha152x.h
aha1542.c
aha1542.h
aha1740.c
aha1740.h
aic7xxx_old.c Remove pointless casts from void pointers 2008-02-06 10:41:01 -08:00
atari_dma_emul.c
atari_NCR5380.c
atari_scsi.c
atari_scsi.h
atp870u.c
atp870u.h
BusLogic.c
BusLogic.h
bvme6000_scsi.c
ch.c
constants.c
dc395x.c [SCSI] dc395x: fix uninitialized var warning 2008-02-07 18:02:43 -06:00
dc395x.h
dmx3191d.c
dpt_i2o.c
dpti.h
dtc.c
dtc.h
eata.c
eata_generic.h
eata_pio.c
eata_pio.h
esp_scsi.c
esp_scsi.h
fd_mcs.c
fdomain.c
fdomain.h
FlashPoint.c
g_NCR5380.c
g_NCR5380.h
g_NCR5380_mmio.c
gdth.c [SCSI] gdth: Allocate sense_buffer to prevent NULL pointer dereference 2008-03-14 20:31:18 -05:00
gdth.h [SCSI] gdth: fix to internal commands execution 2008-02-27 15:54:26 -08:00
gdth_ioctl.h
gdth_proc.c [SCSI] gdth: don't call pci_free_consistent under spinlock 2008-02-18 09:02:25 -06:00
gdth_proc.h
gvp11.c cleanup after APUS removal 2008-02-06 10:41:01 -08:00
gvp11.h
hosts.c [SCSI] hosts.c: fixes for "no error" reported after error scenarios 2008-03-27 15:09:54 -07:00
hptiop.c
hptiop.h
ibmmca.c
ide-scsi.c ide: add ide_read_[alt]status() inline helpers 2008-02-06 02:57:51 +01:00
imm.c
imm.h
in2000.c
in2000.h
initio.c
initio.h
ipr.c libata: eliminate the home grown dma padding in favour of 2008-02-19 11:36:56 +01:00
ipr.h
ips.c [SCSI] ips: fix data buffer accessors conversion bug 2008-02-19 10:49:27 -06:00
ips.h
iscsi_tcp.c [SCSI] iscsi: fix up iscsi printk prefix 2008-02-07 18:02:37 -06:00
iscsi_tcp.h
jazz_esp.c
Kconfig [SCSI] Fix dependency problems in SCSI drivers 2008-03-08 18:30:19 -06:00
lasi700.c
libiscsi.c [SCSI] iscsi regression: check for zero max session cmds 2008-02-27 15:52:46 -08:00
libsrp.c
mac53c94.c
mac53c94.h
mac_scsi.c
mac_scsi.h
Makefile [SCSI] mvsas: Add Marvell 6440 SAS/SATA driver 2008-02-23 07:29:31 -06:00
megaraid.c [SCSI] megaraid: outb_p extermination 2008-02-18 08:57:16 -06:00
megaraid.h
mesh.c PM: Introduce PM_EVENT_HIBERNATE callback state 2008-02-23 10:40:04 -08:00
mesh.h
mvme16x_scsi.c
mvme147.c
mvme147.h
mvsas.c [SCSI] mvsas: check subsystem id 2008-03-28 12:32:22 -05:00
ncr53c8xx.c
ncr53c8xx.h
NCR53c406a.c
NCR5380.c
NCR5380.h
NCR_D700.c
NCR_D700.h
NCR_Q720.c
NCR_Q720.h
nsp32.c
nsp32.h
nsp32_debug.c
nsp32_io.h
osst.c
osst.h
osst_detect.h
osst_options.h
pas16.c
pas16.h
ppa.c
ppa.h
ps3rom.c [SCSI] ps3rom: disable clustering 2008-03-03 13:08:13 -06:00
ql1040_fw.h
ql1280_fw.h
ql12160_fw.h
qla1280.c
qla1280.h
qlogicfas.c
qlogicfas408.c
qlogicfas408.h
qlogicpti.c [SCSI] qlogicpt: section fixes 2008-02-23 09:07:32 -06:00
qlogicpti.h
qlogicpti_asm.c
raid_class.c
script_asm.pl
scsi.c scsi: fix sense_slab/bio swapping livelock 2008-04-06 16:10:08 -07:00
scsi.h
scsi_debug.c [SCSI] scsi_debug: disable clustering 2008-02-18 08:57:16 -06:00
scsi_debug.h
scsi_devinfo.c
scsi_error.c
scsi_ioctl.c
scsi_lib.c [SCSI] fix media change events for polled devices 2008-03-19 11:51:28 -05:00
scsi_lib_dma.c
scsi_logging.h
scsi_module.c
scsi_netlink.c
scsi_priv.h
scsi_proc.c
scsi_sas_internal.h
scsi_scan.c [SCSI] docbook: fix scsi source file 2008-03-03 13:17:14 -06:00
scsi_sysctl.c
scsi_sysfs.c Revert "[SCSI] fix bsg queue oops with iscsi logout" 2008-03-26 09:09:19 -07:00
scsi_tgt_if.c
scsi_tgt_lib.c [SCSI] tgt: set the data length properly 2008-03-03 13:19:35 -06:00
scsi_tgt_priv.h
scsi_transport_api.h
scsi_transport_fc.c
scsi_transport_fc_internal.h
scsi_transport_iscsi.c [SCSI] iscsi class: regression - fix races with state manipulation and blocking/unblocking 2008-03-05 12:04:09 -06:00
scsi_transport_sas.c
scsi_transport_spi.c
scsi_transport_srp.c
scsi_transport_srp_internal.h
scsi_typedefs.h
scsi_wait_scan.c
scsicam.c
sd.c [SCSI] sd, sr: do not emit change event at device add 2008-03-19 11:28:56 -05:00
ses.c [SCSI] ses: fix data corruption 2008-02-18 08:57:15 -06:00
sg.c Convert SG from nopage to fault. 2008-02-07 19:09:22 -08:00
sgiwd93.c
sim710.c
sni_53c710.c
sr.c [SCSI] sd, sr: do not emit change event at device add 2008-03-19 11:28:56 -05:00
sr.h [SCSI] sr: fix test unit ready responses 2008-02-07 18:02:44 -06:00
sr_ioctl.c [SCSI] sr: fix test unit ready responses 2008-02-07 18:02:44 -06:00
sr_vendor.c
st.c [SCSI] st: compile fix when DEBUG set to one 2008-02-22 17:21:37 -06:00
st.h [SCSI] st: compile fix when DEBUG set to one 2008-02-22 17:21:37 -06:00
st_options.h
stex.c [SCSI] stex: stex_internal_copy should be called with sg_count in struct st_ccb 2008-02-22 17:20:59 -06:00
sun3_NCR5380.c
sun3_scsi.c
sun3_scsi.h
sun3_scsi_vme.c
sun3x_esp.c [SCSI] sun3x_esp: convert to esp_scsi 2008-02-07 18:02:33 -06:00
sun_esp.c
sym53c416.c [SCSI] sym53c416: fix module parameters 2008-02-12 15:24:58 -06:00
sym53c416.h
t128.c
t128.h
tmscsim.c
tmscsim.h
u14-34f.c [SCSI] u14-34f: fix data direction bug 2008-02-07 18:02:44 -06:00
ultrastor.c
ultrastor.h
wd33c93.c
wd33c93.h
wd7000.c
zalon.c
zorro7xx.c