linux/drivers
Linas Vepstas 4c4bd5a97a spidernet: Cure RX ram full bug
This patch fixes a rare deadlock that can occur when the kernel
is not able to empty out the RX ring quickly enough. Below follows
a detailed description of the bug and the fix.

As long as the OS can empty out the RX buffers at a rate faster than
the hardware can fill them, there is no problem. If, for some reason,
the OS fails to empty the RX ring fast enough, the hardware GDACTDPA
pointer will catch up to the head, notice the not-empty condition,
and stop. However, RX packets may still continue arriving on the wire.
The spidernet chip can save some limited number of these in local RAM.
When this local RAM fills up, the spider chip will issue an interrupt
indicating this (GHIINT0STS will show ERRINT, and the GRMFLLINT bit
will be set in GHIINT1STS).  When the RX RAM full condition occurs,
a certain bug/feature is triggered that has to be specially handled.
This section describes the special handling for this condition.

When the OS finally has a chance to run, it will empty out the RX ring.
In particular, it will clear the descriptor on which the hardware had
stopped. However, once the hardware has decided that a certain
descriptor is invalid, it will not restart at that descriptor; instead
it will restart at the next descr. This potentially will lead to a
deadlock condition, as the tail pointer will be pointing at this descr,
which, from the OS point of view, is empty; the OS will be waiting for
this descr to be filled. However, the hardware has skipped this descr,
and is filling the next descrs. Since the OS doesn't see this, there
is a potential deadlock, with the OS waiting for one descr to fill,
while the hardware is waiting for a different set of descrs to become
empty.

A call to show_rx_chain() at this point indicates the nature of the
problem. A typical print when the network is hung shows the following:

net eth1: Spider RX RAM full, incoming packets might be discarded!
net eth1: Total number of descrs=256
net eth1: Chain tail located at descr=255
net eth1: Chain head is at 255
net eth1: HW curr desc (GDACTDPA) is at 0
net eth1: Have 1 descrs with stat=xa0800000
net eth1: HW next desc (GDACNEXTDA) is at 1
net eth1: Have 127 descrs with stat=x40800101
net eth1: Have 1 descrs with stat=x40800001
net eth1: Have 126 descrs with stat=x40800101
net eth1: Last 1 descrs with stat=xa0800000

Both the tail and head pointers are pointing at descr 255, which is
marked xa... which is "empty". Thus, from the OS point of view, there
is nothing to be done. In particular, there is the implicit assumption
that everything in front of the "empty" descr must surely also be empty,
as explained in the last section. The OS is waiting for descr 255 to
become non-empty, which, in this case, will never happen.

The HW pointer is at descr 0. This descr is marked 0x4.. or "full".
Since it's already full, the hardware can do nothing more, and thus has
halted processing. Notice that descrs 0 through 253 are all marked
"full", while descr 254 and 255 are empty. (The "Last 1 descrs" is
descr 254, since tail was at 255.) Thus, the system is deadlocked,
and there can be no forward progress; the OS thinks there's nothing
to do, and the hardware has nowhere to put incoming data.
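
The dump above can be replayed in a few lines of C to see where the full
and empty descrs sit. This is only an illustrative sketch, not driver code:
classify(), build_ring_from_dump() and count_state() are hypothetical names,
and it assumes that the top nibble of the status word alone tells "empty"
(0xa) from "full" (0x4), and that the dump lists runs in ring order starting
at the tail (descr 255).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 256

/* Assumption: top nibble 0xa means "empty", 0x4 means "full". */
enum descr_state { DESCR_EMPTY, DESCR_FULL, DESCR_OTHER };

static enum descr_state classify(uint32_t stat)
{
	switch (stat >> 28) {
	case 0xa: return DESCR_EMPTY;
	case 0x4: return DESCR_FULL;
	default:  return DESCR_OTHER;
	}
}

/* Rebuild the ring from the runs printed in the dump, starting at tail. */
static void build_ring_from_dump(uint32_t ring[RING_SIZE])
{
	static const struct { int count; uint32_t stat; } runs[] = {
		{   1, 0xa0800000u },	/* descr 255 (tail and head) */
		{ 127, 0x40800101u },	/* descrs 0..126 */
		{   1, 0x40800001u },	/* descr 127 */
		{ 126, 0x40800101u },	/* descrs 128..253 */
		{   1, 0xa0800000u },	/* descr 254 */
	};
	int idx = 255;	/* tail position reported by the dump */
	int total = 0;

	for (size_t r = 0; r < sizeof(runs) / sizeof(runs[0]); r++) {
		for (int n = 0; n < runs[r].count; n++) {
			ring[idx] = runs[r].stat;
			idx = (idx + 1) % RING_SIZE;
			total++;
		}
	}
	assert(total == RING_SIZE);	/* the runs must cover the whole ring */
}

/* Count how many descrs in the ring are in the given state. */
static int count_state(const uint32_t ring[RING_SIZE], enum descr_state s)
{
	int n = 0;

	for (int i = 0; i < RING_SIZE; i++)
		if (classify(ring[i]) == s)
			n++;
	return n;
}
```

Filling a 256-entry array with build_ring_from_dump() and counting states
gives 254 full descrs and 2 empty ones, with the empty ones at 254 and 255.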

This bug/feature is worked around with the spider_net_resync_head_ptr()
routine. When the driver receives RX interrupts, but an examination
of the RX chain seems to show it is empty, then it is probable that
the hardware has skipped a descr or two (sometimes dozens under heavy
network conditions). The spider_net_resync_head_ptr() subroutine will
search the ring for the next full descr, and the driver will resume
operations there.  Since this will leave "holes" in the ring, there
is also a spider_net_resync_tail_ptr() that will skip over such holes.
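
The resync idea can be sketched on a plain array of status words. This is
a simplified model under stated assumptions, not the code from the patch:
resync_head() and resync_tail() are hypothetical stand-ins for the two
driver routines, the ring is an array rather than a linked descr chain,
and "full" is taken to mean a status word whose top nibble is 0x4.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 256

/* Assumption: a status word whose top nibble is 0x4 marks a full descr. */
static int descr_is_full(uint32_t stat)
{
	return (stat >> 28) == 0x4;
}

/*
 * Search forward from head for the next full descr.
 * Returns its index, or -1 if the whole ring is empty.
 */
static int resync_head(const uint32_t ring[RING_SIZE], int head)
{
	assert(head >= 0 && head < RING_SIZE);

	for (int step = 0; step < RING_SIZE; step++) {
		int idx = (head + step) % RING_SIZE;

		if (descr_is_full(ring[idx]))
			return idx;
	}
	return -1;
}

/*
 * Advance tail over empty "holes" until it reaches a full descr
 * or catches up with head. Returns the new tail index.
 */
static int resync_tail(const uint32_t ring[RING_SIZE], int tail, int head)
{
	assert(tail >= 0 && tail < RING_SIZE);
	assert(head >= 0 && head < RING_SIZE);

	while (tail != head && !descr_is_full(ring[tail]))
		tail = (tail + 1) % RING_SIZE;
	return tail;
}
```

With the ring state from the hung-network dump (descrs 254 and 255 empty,
everything else full, head and tail at 255), resync_head() moves the head
to descr 0, and resync_tail() then moves the tail there as well.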

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-06-20 19:09:32 -04:00
..
acorn
acpi toshiba_acpi: fix section mismatch in allyesconfig 2007-06-16 13:16:15 -07:00
amba
ata libata: limit post SRST nsect/lbal wait to ~100ms 2007-06-11 00:52:53 -04:00
atm [ATM]: Fix warning. 2007-06-03 18:08:44 -07:00
auxdisplay cfag12864bfb: Use sys_ instead of cfb_ framebuffer accessors 2007-06-01 08:18:28 -07:00
base firmware: remove orphaned Email 2007-06-08 12:41:08 -07:00
block loop: preallocate eight loop devices 2007-06-08 17:23:32 -07:00
bluetooth [Bluetooth] Always send HCI_Reset for Broadcom devices 2007-05-24 14:26:15 +02:00
cdrom potential parse error in ifdef part 3 2007-06-08 17:23:33 -07:00
char random: fix output buffer folding 2007-06-16 13:16:16 -07:00
clocksource
connector
cpufreq Add suspend-related notifications for CPU hotplug 2007-05-09 12:30:56 -07:00
crypto [CRYPTO] geode: Fix in-place operations and set key 2007-05-24 21:23:24 +10:00
dio
dma [S390] Kconfig: unwanted menus for s390. 2007-05-10 15:46:07 +02:00
edac [S390] Kconfig: menus with depends on HAS_IOMEM. 2007-05-10 15:46:07 +02:00
eisa virtual_eisa_root_init() should be __init 2007-05-08 11:15:02 -07:00
fc4
firewire firewire: Only set client->iso_context if allocation was successful. 2007-06-21 00:09:41 +02:00
firmware
hid USB HID: hiddev - fix race between hiddev_send_event() and hiddev_release() 2007-05-10 08:45:56 +02:00
hwmon hwmon/applesmc: Handle name file creation error and deletion 2007-05-27 22:17:43 +02:00
i2c [ARM] 4403/1: Make the PXA-I2C driver work with lockdep validator 2007-05-26 10:09:39 +01:00
ide Resume from RAM on HPC nx6325 broken 2007-06-16 02:24:43 +02:00
ieee1394 ieee1394: fix to ether1394_tx in ether1394.c 2007-06-16 12:43:20 +02:00
infiniband IB/mlx4: Make sure inline data segments don't cross a 64 byte boundary 2007-06-18 09:23:47 -07:00
input x86_64: Quieten Atari keyboard warnings in Kconfig 2007-06-20 14:27:26 -07:00
isdn isdn/diva: fix section mismatch 2007-06-08 17:23:33 -07:00
kvm KVM: Prevent guest fpu state from leaking into the host 2007-06-15 12:30:59 +03:00
leds [S390] Kconfig: menus with depends on HAS_IOMEM. 2007-05-10 15:46:07 +02:00
macintosh x86: Only make Macintosh drivers default on Macs 2007-06-20 14:27:26 -07:00
mca mca: add integrated device bus matching 2007-05-09 12:30:49 -07:00
md md: fix bug in error handling during raid1 repair 2007-06-16 13:16:15 -07:00
media V4L/DVB (5751): Ivtv: fix ia64 printk format warnings. 2007-06-08 08:54:41 -03:00
message [SCSI] fusion: fix for BZ 8426 - massive slowdown on SCSI CD/DVD drive 2007-06-05 11:04:56 -05:00
mfd Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input 2007-06-04 13:27:33 -07:00
misc Pull now into release branch 2007-06-02 00:48:48 -04:00
mmc mmc: get back read-only switch function 2007-06-13 19:11:20 +02:00
mtd Merge git://git.infradead.org/mtd-2.6 2007-06-04 17:54:09 -07:00
net spidernet: Cure RX ram full bug 2007-06-20 19:09:32 -04:00
nubus
oprofile Detach sched.h from mm.h 2007-05-21 09:18:19 -07:00
parisc [PARISC] fix section mismatch in superio serial drivers 2007-05-27 15:01:19 -04:00
parport [PARISC] fix section mismatch in parport_gsc 2007-05-27 12:13:53 -04:00
pci msi: mask the msix vector before we unmap it 2007-06-01 08:18:27 -07:00
pcmcia at91: fix enable/disable_irq_wake symmetry in pcmcia driver 2007-05-31 07:58:13 -07:00
pnp [S390] Kconfig: menus with depends on HAS_IOMEM. 2007-05-10 15:46:07 +02:00
ps3 Merge branch 'linux-2.6' 2007-05-08 13:37:51 +10:00
rapidio
rtc RTC: use fallback IRQ if PNP tables don't provide one 2007-06-01 08:18:29 -07:00
s390 [S390] Fix zfcpdump header 2007-06-19 13:10:18 +02:00
sbus [SPARC]: Missing #include <linux/mm.h> in drivers/sbus/char/flash.c 2007-05-31 01:52:53 -07:00
scsi Merge master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6 2007-06-18 10:38:09 -07:00
serial Blackfin serial driver: decouple PARODD and CMSPAR checking from PARENB 2007-06-11 16:16:45 +08:00
sh
sn
spi Blackfin SPI driver: fix bug SPI DMA incomplete transmission 2007-06-11 17:34:17 +08:00
tc potential parse error in ifdef part 3 2007-06-08 17:23:33 -07:00
telephony [S390] Kconfig: menus with depends on HAS_IOMEM. 2007-05-10 15:46:07 +02:00
usb OHCI: Fix machine check in ohci_hub_status_data 2007-06-08 16:24:31 -07:00
video Merge master.kernel.org:/pub/scm/linux/kernel/git/kyle/parisc-2.6 2007-06-14 18:36:21 -07:00
w1 [S390] Kconfig: menus with depends on HAS_IOMEM. 2007-05-10 15:46:07 +02:00
zorro Amiga Zorro bus: kill resource_size_t warnings 2007-05-04 17:59:08 -07:00
Kconfig
Makefile Merge branch 'juju' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6 2007-05-10 13:30:08 -07:00