linux/include
NeilBrown dfc7064500 md: restart recovery cleanly after device failure.
When we get any IO error during a recovery (rebuilding a spare), we abort
the recovery and restart it.

For RAID6 (and multi-drive RAID1) it may not be best to restart at the
beginning: when multiple failures can be tolerated, the recovery may be
able to continue and re-doing all that has already been done doesn't make
sense.

We already have the infrastructure to record where a recovery is up to
and restart from there, but it is not being used properly.
This is because:
  - We sometimes abort with MD_RECOVERY_ERR rather than just MD_RECOVERY_INTR,
    which causes the recovery not be be checkpointed.
  - We remove spares and then re-added them which loses important state
    information.

The distinction between MD_RECOVERY_ERR and MD_RECOVERY_INTR really isn't
needed.  If there is an error, the relevant drive will be marked as
Faulty, and that is enough to ensure correct handling of the error.  So we
first remove MD_RECOVERY_ERR, changing some of the uses of it to
MD_RECOVERY_INTR.

Then we cause the attempt to remove a non-faulty device from an array to
fail (unless recovery is impossible as the array is too degraded).  Then
when remove_and_add_spares attempts to remove the devices on which
recovery can continue, it will fail, they will remain in place, and
recovery will continue on them as desired.

Issue:  If we are halfway through rebuilding a spare and another drive
fails, and a new spare is immediately available,  do we want to:
 1/ complete the current rebuild, then go back and rebuild the new spare or
 2/ restart the rebuild from the start and rebuild both devices in
    parallel.

Both options can be argued for.  The code currently takes option 2 as
  a/ this requires least code change
  b/ this results in a minimally-degraded array in minimal time.

Cc: "Eivind Sarto" <ivan@kasenna.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-24 09:56:10 -07:00
..
acpi
asm-alpha asm-{alpha,h8300,um,v850,xtensa}/param.h: unbreak HZ for userspace 2008-05-14 19:11:14 -07:00
asm-arm Merge branch 'omap-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6 2008-05-17 22:56:29 +01:00
asm-avr32
asm-blackfin Blackfin serial driver: add extra IRQ flag for 8250 serial driver 2008-05-17 18:21:57 +08:00
asm-cris
asm-frv read_barrier_depends arch fixlets 2008-05-14 10:05:18 -07:00
asm-generic
asm-h8300 asm-{alpha,h8300,um,v850,xtensa}/param.h: unbreak HZ for userspace 2008-05-14 19:11:14 -07:00
asm-ia64 KVM: ia64: Set KVM_IOAPIC_NUM_PINS to 48 2008-05-18 14:34:16 +03:00
asm-m32r
asm-m68k m68k: Prefix ISA type with ISA_TYPE_ 2008-05-18 13:28:50 -07:00
asm-m68knommu
asm-mips fix parenthesis in include/asm-mips/mach-au1x00/au1000.h 2008-05-24 09:56:08 -07:00
asm-mn10300 MN10300: Make cpu_relax() invoke barrier() 2008-05-08 10:49:39 -07:00
asm-parisc parisc: use conditional macro for 64-bit wide ops 2008-05-15 11:03:43 -04:00
asm-powerpc [POWERPC] mpic: Fix use of uninitialized variable 2008-05-23 16:15:37 +10:00
asm-ppc [POWERPC] ppc: More compile fixes 2008-05-12 22:57:51 +10:00
asm-s390 [S390] s390dbf: Use const char * for dbf name. 2008-05-15 16:52:39 +02:00
asm-sh sh: use the common ascii hex helpers 2008-05-16 15:09:08 +09:00
asm-sparc sparc: remove CVS keywords 2008-05-20 00:33:44 -07:00
asm-sparc64 sparc64: Add global register dumping facility. 2008-05-20 00:33:45 -07:00
asm-um asm-{alpha,h8300,um,v850,xtensa}/param.h: unbreak HZ for userspace 2008-05-14 19:11:14 -07:00
asm-v850 asm-{alpha,h8300,um,v850,xtensa}/param.h: unbreak HZ for userspace 2008-05-14 19:11:14 -07:00
asm-x86 x86: strengthen 64-bit p?d_bad() 2008-05-20 08:14:45 -07:00
asm-xtensa asm-{alpha,h8300,um,v850,xtensa}/param.h: unbreak HZ for userspace 2008-05-14 19:11:14 -07:00
crypto
keys
linux md: restart recovery cleanly after device failure. 2008-05-24 09:56:10 -07:00
math-emu
media Fix a deadlock in the bttv driver 2008-05-20 10:12:26 -07:00
mtd
net Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/linville/wireless-2.6 2008-05-19 16:29:40 -07:00
pcmcia
rdma
rxrpc
scsi
sound [ALSA] ASoC: build fix for snd_soc_info_bool_ext 2008-05-13 14:47:44 +02:00
video
xen
Kbuild