Commit graph

45043 commits

Author SHA1 Message Date
Stephen Hemminger 25d82d7a2f sky2: dual port NAPI problem
Shutting down port 0 disables the NAPI poll used by both ports.
The long term fix will be to separate NAPI object from net device
until then just reenable if needed.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2006-12-26 16:36:42 -05:00
Randy Dunlap ce9f7fe3c3 via-velocity uses INET interfaces
via-velocity doesn't build when CONFIG_INET=n:

drivers/built-in.o: In function `velocity_unregister_notifier':
via-velocity.c:(.text+0xe9b46): undefined reference to `unregister_inetaddr_notifier'
drivers/built-in.o: In function `velocity_init_module':
via-velocity.c:(.init.text+0xa027): undefined reference to `register_inetaddr_notifier'

I wanted to make this change in drivers/net/Kconfig, but
this isn't legal kconfig language:

 config VIA_VELOCITY
        tristate "VIA Velocity support"
        depends on NET_PCI && PCI
+       depends on INET if PM
        select CRC32
        select CRC_CCITT
        select MII

so fix it in via-velocity.c instead.
Builds with all 4 combinations of CONFIG_NET & CONFIG_PM.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2006-12-26 16:28:20 -05:00
Herbert Xu 683a2aa339 e1000: Do not truncate TSO TCP header with 82544 workaround
The e1000 driver has a workaround for 82544 on PCI-X where if the
terminating byte of a buffer is at addresses 0-3 mod 8, then 4 bytes
are shaved off it and defered to a new segment.  This is due to an
erratum that could otherwise cause TX hangs.

Unfortunately this breaks TSO because it may cause the TCP header to
be split over two segments which itself causes TX hangs.  The solution
is to pull 4 bytes of data up from the next segment rather than pushing
4 bytes off.  This ensures the TCP header remains in one piece and
works around the PCI-X hang.

This patch is based on one from Jesse Brandeburg.

This bug has been trigered by both CONFIG_DEBUG_SLAB as well as Xen.

Note that the only reason we don't see this normally is because the
TCP stack starts writing from the end, i.e., it writes the TCP header
first then slaps on the IP header, etc.  So the end of the TCP header
(skb->tail - 1 here) is always aligned correctly.

Had we made the start of the IP header (e.g., IPv6) 8-byte aligned
instead, this would happen for normal TCP traffic as well.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2006-12-26 16:28:20 -05:00
Brice Goglin 1a63e846a4 myri10ge: handle failures in suspend and resume
On suspend, handle pci_set_power_state errors, and on resume
handle failures in pci_resume_state().

Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2006-12-26 16:28:20 -05:00
Brice Goglin 83f6e15245 myri10ge: no need to save MSI and PCIe state in the driver
The PCI MSI and express state are already saved and restored by the
current versions of pci_save_state/pci_restore_state.
Therefore it is no longer necessary for the driver to do it.

Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2006-12-26 16:28:20 -05:00
Brice Goglin 3621cec5b5 myri10ge: make msi configurable at runtime through sysfs
Now that IRQ are requested is called on open() and freed on close(),
we can safely switch from/to MSI without unloading the module.

We are guaranteed to correctly free IRQ even if the sysfs file got
written in the meantime since the MSI initialization is stored in
mgp->msi_enabled.

Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2006-12-26 16:28:20 -05:00
Brice Goglin df30a740e4 myri10ge: move request_irq to myri10ge_open
Request IRQ in myri10ge_open() and free in close() instead of probe()
and remove() to eliminate potential race between the watchdog and the
interrupt handler. Additionaly, the interrupt handler won't get called
on shared irq anymore when the interface is down.

Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2006-12-26 16:28:20 -05:00
Brice Goglin 7adda30c82 myri10ge: match number of save_state and restore
Since pci_save_state() pushes MSI and PCIe states on a kind of stack,
myri10ge saving the state in advance for parity recovery will push the
state again on the stack on suspend. This leads to some memory leak.
We add a couple additional calls to save_state and restore_state so
that we don't leak anymore.

For the future, we are thinking of a better way to recover from parity
error without using pci_save_state().

Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2006-12-26 16:28:20 -05:00
Francois Romieu a27993f3d9 r8169: use the broken_parity_status field in pci_dev
The former option is removed and platform code can now specify the
expected behavior.

Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2006-12-26 16:24:11 -05:00
Francois Romieu d15e9c4d9a netpoll: drivers must not enable IRQ unconditionally in their NAPI handler
net/core/netpoll.c::netpoll_send_skb() calls the poll handler when
it is available. As netconsole can be used from almost any context,
IRQ must not be enabled blindly in the NAPI handler of a driver which
supports netpoll.

b57bd06655 fixed the issue for the
8139too.c driver.

Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2006-12-26 16:24:11 -05:00
Jesse Brandeburg 79f3d3996f [PATCH] e1000: No-delay link detection at interface up
Currently after an interface up, the link state is detected 2 seconds later
when the first watchdog timer runs. This patch changes that by triggering
the hardware to generate a link-change interrupt from the up() function
instead. This has the result that the link state gets detected immediately
and without races. This has the potential to speed up booting since a normal
distribution boot process waits for a link before DHCP is attempted.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2006-12-26 15:51:30 -05:00
Jeff Garzik 15e376b4ee e1000: 3 new driver stats for managability testing
Add 3 extra packet redirect counters for tracking purposes to make sure
we can test that all packets arrive properly.

Originally from Jesse Brandeburg <jesse.brandeburg@intel.com>,
rewritten to use feature flags by me.

Signed-off-by: Jeff Garzik <jeff@garzik.org>
2006-12-26 15:51:30 -05:00
Jesse Brandeburg 1f753861d2 [PATCH] e1000: Make the copybreak value a module parameter
Allow the user to vary the size that copybreak works. Currently cb is enabled
for packets < 256 bytes, but various tests indicate that this should be
configurable for specific use cases. In addition, this parameter allows us
to force never/always during testing to get full and predictable coverage of
both code paths.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2006-12-26 15:51:30 -05:00
Bruce Allan 018ea44ef1 [PATCH] e1000: Fix PBA allocation calculations
Assign the PBA to be large enough to contain at least 2 jumbo frames on
all adapters. This dramatically increases performance on several adapters
and fixes TX performance degradation issues where the PBA was misallocated
in the old algorithm.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2006-12-26 15:51:30 -05:00
Jesse Brandeburg d89b6c6750 [PATCH] e1000: narrow down the scope of the tipg timer tweak
the driver has (ancient) code for messing with TIPG from the 82542 days.
Unfortunately this code was running on our current adapters and setting
TIPG for fiber to be +1 over the copper value.  This caused 1.45Mpps
to be sent instead of 1.487Mpps.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2006-12-26 15:51:30 -05:00
Jeff Kirsher c3813ae661 [PATCH] e1000: fix ethtool reported bus type for older adapters
For older adapters we know that they are of the PCI bus type, so we can
just set this.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2006-12-26 15:51:30 -05:00
Bruce Allan 83cd827977 [PATCH] e1000: fix to set the new max frame size before resetting the adapter
This bugfix makes sure that the driver data reflects the full new situation
before the adapter is reinitialized.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2006-12-26 15:51:30 -05:00
Jeff Garzik bb8e3311ef e1000: workaround for the ESB2 NIC RX unit issue
In rare occasions, ESB2 systems would end up started without the RX
unit being turned on. Add a check that runs post-init to work around
this issue.

Originally from Jesse Brandeburg <jesse.brandeburg@intel.com>,
rewritten to use feature flags by me.

Signed-off-by: Jeff Garzik <jeff@garzik.org>
2006-12-26 15:51:30 -05:00
Jesse Brandeburg 72f3ab7462 [PATCH] e1000: disable TSO on the 82544 with slab debugging
CONFIG_DEBUG_SLAB changes alignments of the data structures the slab
allocators return. These break certain workarounds for TSO on the 82544.
Since DEBUG_SLAB is relatively rare and not used for performance sensitive
cases, the simplest fix is to disable TSO in this special situation.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2006-12-26 15:51:29 -05:00
Jesse Brandeburg 3d5460a0ba [PATCH] e1000: Fix Wake-on-Lan with forced gigabit speed
If the user has forced gigabit speed, phy power management must be disabled;
otherwise the NIC would try to negotiate to a linkspeed of 10/100 mbit on
shutdown, which would lead to a total loss of link. This loss of link breaks
Wake-on-Lan and IPMI.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2006-12-26 15:51:29 -05:00
Jeff Garzik 0fccd0e9e3 e1000: consolidate managability enabling/disabling
Several bugs existed in how we handle manageability issues all
over the driver.  This patch consolidates all the managability
release and init code in two single functions and call them from
appropriate locations. This fixes several BMC packet redirect issues
and powerup/down hiccups.

Originally from Jesse Brandeburg <jesse.brandeburg@intel.com>, rewritten
to use feature flags by me.

Signed-off-by: Jeff Garzik <jeff@garzik.org>
2006-12-26 15:51:29 -05:00
Jeff Garzik 167fb28416 e1000: omit stats for broken counter in 82543
The 82543 chip does not count tx_carrier_errors properly in FD mode;
report zeros instead of garbage.

Originally from Jesse Brandeburg <jesse.brandeburg@intel.com>, rewritten
to use feature flags by me.

Signed-off-by: Jeff Garzik <jeff@garzik.org>
2006-12-26 15:51:28 -05:00
Jeff Garzik bd2371ebcc e1000: For sanity, reformat e1000_set_mac_type(), struct e1000_hw[_stats]
Makes future changes a bit more readable.

Signed-off-by: Jeff Garzik <jeff@garzik.org>
2006-12-26 15:51:28 -05:00
Jesse Brandeburg 2b65326e67 [PATCH] e1000: dynamic itr: take TSO and jumbo into account
The dynamic interrupt rate control patches omitted proper counting
for jumbo's and TSO resulting in suboptimal interrupt mitigation strategies.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2006-12-26 15:51:28 -05:00
Jesse Brandeburg 7d16e65ba5 [PATCH] e1000: The user-supplied itr setting needs the lower 2 bits masked off
The lower 2 bits of a user-supplied itr setting (via ethtool) need to be
masked off: These lower two bits are used as control bits.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2006-12-26 15:51:28 -05:00
Linus Torvalds 3bf8ba38f3 Linux 2.6.20-rc2 2006-12-23 20:00:32 -08:00
Linus Torvalds cb876f4514 Fix up CIFS for "test_clear_page_dirty()" removal
This also adds he required page "writeback" flag handling, that cifs
hasn't been doing and that the page dirty flag changes made obvious.

Acked-by: Steve French <smfltc@us.ibm.com>
Acked-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-23 16:19:07 -08:00
OGAWA Hirofumi 8d1c481960 [PATCH] arch/i386/pci/mmconfig.c tlb flush fix
We use the fixmap for accessing pci config space in pci_mmcfg_read/write().
The problem is in pci_exp_set_dev_base(). It is caching a last
accessed address to avoid calling set_fixmap_nocache() whenever
pci_mmcfg_read/write() is used.

  static inline void pci_exp_set_dev_base(int bus, int devfn)
  {
	u32 dev_base = base | (bus << 20) | (devfn << 12);
	if (dev_base != mmcfg_last_accessed_device) {
		mmcfg_last_accessed_device = dev_base;
		set_fixmap_nocache(FIX_PCIE_MCFG, dev_base);
	}
  }

            cpu0                                        cpu1
  ---------------------------------------------------------------------------
    pci_mmcfg_read("device-A")
        pci_exp_set_dev_base()
            set_fixmap_nocache()
                                              pci_mmcfg_read("device-B")
                                                  pci_exp_set_dev_base()
                                                      set_fixmap_nocache()
    pci_mmcfg_read("device-B")
        pci_exp_set_dev_base()
            /* doesn't flush tlb */

But if cpus accessed the above order, the second pci_mmcfg_read() on
cpu0 doesn't flush the TLB, because "mmcfg_last_accessed_device" is
device-B.  So, second pci_mmcfg_read() on cpu0 accesses a device-A via
a previous TLB cache. This problem became the cause of several strange
behavior.

This patches fixes this situation by adds "mmcfg_last_accessed_cpu" check.

[ Alternatively, we could make a per-cpu mapping area or something. Not
  that it's probably worth it, but if we wanted to avoid all locking and
  instead just disable preemption, that would be the way to go. --Linus ]

Signed-off-by: OGAWA Hirofumi <hogawa@miraclelinux.com>
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-23 14:06:33 -08:00
Ingo Molnar e1d9fd2e3d [PATCH] suspend: fix suspend on single-CPU systems
Clark Williams reported that suspend doesnt work on his laptop on
2.6.20-rc1-rt kernels. The bug was introduced by the following cleanup
commit:

 commit 112cecb2cc
 Author: Siddha, Suresh B <suresh.b.siddha@intel.com>
 Date:   Wed Dec 6 20:34:31 2006 -0800

    [PATCH] suspend: don't change cpus_allowed for task initiating the suspend

because with this change 'error' is not initialized to 0 anymore, if
there are no other online CPUs. (i.e. if the system is single-CPU).

the fix is the initialize it to 0. The really weird thing is that my
version of gcc does not warn about this non-initialized variable
situation ...

(also fix the kernel printk in the error branch, it was missing a
 newline)

Reported-by: Clark Williams <williams@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-23 13:59:33 -08:00
Linus Torvalds ffaa82008f Fix reiserfs after "test_clear_page_dirty()" removal
Thanks to Len Brown for testing this fix, since while they have in the
past, none of my machines run reiserfs at the moment.

Cc: Vladimir V. Saveliev <vs@namesys.com>
Acked-by: Len Brown <lenb@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-23 09:32:45 -08:00
Linus Torvalds 8368e328df Clean up and export cancel_dirty_page() to modules
Make cancel_dirty_page() act more like all the other dirty and writeback
accounting functions: test for "mapping" being NULL, and do the
NR_FILE_DIRY accounting purely based on mapping_cap_account_dirty()).

Also, add it to the exports, so that modular filesystems can use it.

Acked-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-23 09:25:04 -08:00
Linus Torvalds 18ed1c0513 Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (68 commits)
  ACPI: replace kmalloc+memset with kzalloc
  ACPI: Add support for acpi_load_table/acpi_unload_table_id
  fbdev: update after backlight argument change
  ACPI: video: Add dev argument for backlight_device_register
  ACPI: Implement acpi_video_get_next_level()
  ACPI: Kconfig - depend on PM rather than selecting it
  ACPI: fix NULL check in drivers/acpi/osl.c
  ACPI: make drivers/acpi/ec.c:ec_ecdt static
  ACPI: prevent processor module from loading on failures
  ACPI: fix single linked list manipulation
  ACPI: ibm_acpi: allow clean removal
  ACPI: fix git automerge failure
  ACPI: ibm_acpi: respond to workqueue update
  ACPI: dock: add uevent to indicate change in device status
  ACPI: ec: Lindent once again
  ACPI: ec: Change #define to enums there possible.
  ACPI: ec: Style changes.
  ACPI: ec: Acquire Global Lock under EC mutex.
  ACPI: ec: Drop udelay() from poll mode. Loop by reading status field instead.
  ACPI: ec: Rename gpe_bit to gpe
  ...
2006-12-22 18:46:56 -08:00
Marcel Holtmann dab6df6308 [PATCH] Call init_timer() for ISDN PPP CCP reset state timer
The function isdn_ppp_ccp_reset_alloc_state() sets ->timer.function
and ->timer.data and later on calls add_timer() with no init_timer()
ever done.

Noted by Al Viro.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Karsten Keil <kkeil@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-22 14:31:24 -08:00
Linus Torvalds f2a67a5769 Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
  [UDP]: Fix reversed logic in udp_get_port().
  [IPV6]: Dumb typo in generic csum_ipv6_magic()
  [SCTP]: make 2 functions static
  [SCTP]: Fix typo adaption -> adaptation as per the latest API draft.
  [SCTP]: Don't export include/linux/sctp.h to userspace.
  [TCP]: Fix ambiguity in the `before' relation.
  [ATM] drivers/atm/fore200e.c: Cleanups.
  [ATM]: Remove dead ATM_TNETA1570 option.
  NetLabel: correctly fill in unused CIPSOv4 level and category mappings
  NetLabel: perform input validation earlier on CIPSOv4 DOI add ops
2006-12-22 14:14:17 -08:00
Jens Axboe 719d34027e [PATCH] cfq-iosched: tighten allow merge criteria
The logic in cfq_allow_merge() wasn't clear enough - basically allow
merging for the same queues only.  Do a fast check for 'rq and bio both
sync/async' before doing the cfqq hash lookup.

This is verified to work with the fixed elv_try_merge() from commit
bb4067e341.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-22 14:13:08 -08:00
David S. Miller 5c668704b7 [UDP]: Fix reversed logic in udp_get_port().
When this code was converted to use sk_for_each() the
logic for the "best hash chain length" code was reversed,
breaking everything.

The original code was of the form:

			size = 0;
			do {
				if (++size >= best_size_so_far)
					goto next;
			} while ((sk = sk->next) != NULL);
			best_size_so_far = size;
			best = result;
		next:;

and this got converted into:

			sk_for_each(sk2, node, head)
				if (++size < best_size_so_far) {
					best_size_so_far = size;
					best = result;
				}

Which does something very very different from the original.

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-22 11:42:26 -08:00
Al Viro b23e353666 [IPV6]: Dumb typo in generic csum_ipv6_magic()
... duh

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-22 11:12:07 -08:00
Adrian Bunk 24123186fa [SCTP]: make 2 functions static
This patch makes the following needlessly global functions static:
- ipv6.c: sctp_inet6addr_event()
- protocol.c: sctp_inetaddr_event()

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-22 11:12:05 -08:00
Ivan Skytte Jorgensen 0f3fffd8ab [SCTP]: Fix typo adaption -> adaptation as per the latest API draft.
Signed-off-by: Ivan Skytte Jorgensen <isj-sctp@i1.dk>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-22 11:12:04 -08:00
Sridhar Samudrala a3f7f142f7 [SCTP]: Don't export include/linux/sctp.h to userspace.
This file contains protocol definitions and there are no SCTP apps
that use this file.

Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-22 11:12:02 -08:00
Gerrit Renker 9a036b9c33 [TCP]: Fix ambiguity in the `before' relation.
While looking at DCCP sequence numbers, I stumbled over a problem with
the following definition of before in tcp.h:

static inline int before(__u32 seq1, __u32 seq2)
{
        return (__s32)(seq1-seq2) < 0;
}

Problem: This definition suffers from an an ambiguity, i.e. always

           before(a, (a + 2^31) % 2^32)) = 1
           before((a + 2^31) % 2^32), a) = 1

         In text: when the difference between a and b amounts to 2^31,
         a is always considered `before' b, the function can not decide.
         The reason is that implicitly 0 is `before' 1 ... 2^31-1 ... 2^31

Solution: There is a simple fix, by defining before in such a way that
          0 is no longer `before' 2^31, i.e. 0 `before' 1 ... 2^31-1
          By not using the middle between 0 and 2^32, before can be made
          unambiguous.
          This is achieved by testing whether seq2-seq1 > 0 (using signed
          32-bit arithmetic).

I attach a patch to codify this. Also the `after' relation is basically
a redefinition of `before', it is now defined as a macro after before.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-22 11:12:01 -08:00
Adrian Bunk 1f8a5fb80e [ATM] drivers/atm/fore200e.c: Cleanups.
This patch contains the following transformations from custom functions
to standard kernel version:
- fore200e_kmalloc() -> kzalloc()
- fore200e_kfree() -> kfree()
- fore200e_swap() -> cpu_to_be32()

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-22 11:12:00 -08:00
Adrian Bunk 52a9107130 [ATM]: Remove dead ATM_TNETA1570 option.
This patch removes the unconverted ATM_TNETA1570 option that also lacks
any code in the kernel.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-22 11:11:59 -08:00
Paul Moore caff5b6a6b NetLabel: correctly fill in unused CIPSOv4 level and category mappings
Back when the original NetLabel patches were being changed to use Netlink
attributes correctly some code was accidentially dropped which set all of the
undefined CIPSOv4 level and category mappings to a sentinel value.  The result
is the mappings data in the kernel contains bogus mappings which always map to
zero.  This patch restores the old/correct behavior by initializing the mapping
data to the correct sentinel value.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: James Morris <jmorris@namei.org>
2006-12-22 11:11:58 -08:00
Paul Moore 1fd2a25b77 NetLabel: perform input validation earlier on CIPSOv4 DOI add ops
There are a couple of cases where the user input for a CIPSOv4 DOI add
operation was not being done soon enough; the result was unexpected behavior
which was resulting in oops/panics/lockups on some platforms.  This patch moves
the existing input validation code earlier in the code path to protect against
bogus user input.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: James Morris <jmorris@namei.org>
2006-12-22 11:11:56 -08:00
Peter Zijlstra c2fda5fed8 [PATCH] Fix up page_mkclean_one(): virtual caches, s390
- add flush_cache_page() for all those virtual indexed cache
   architectures.

 - handle s390.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-22 10:39:35 -08:00
Peter Korsgaard e21654a756 [PATCH] serial/uartlite: Only enable port if request_port succeeded
The uartlite driver used to always enable the port even if request_port
failed causing havoc. This patch fixes it.

Signed-off-by: Peter Korsgaard <jacmet@sunsite.dk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-22 09:58:48 -08:00
Eric W. Biederman b2b2cbc4b2 [PATCH] Fix reparenting to the same thread group. (take 2)
This patch fixes the case when we reparent to a different thread in the
same thread group.  This modifies the code so that we do not send
signals and do not change the signal to send to SIGCHLD unless we have
change the thread group of our parents.  It also suppresses sending
pdeath_sig in this cas as well since the result of geppid doesn't
change.

Thanks to Oleg for spotting my bug of only fixing this for non-ptraced
tasks.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Albert Cahalan <acahalan@gmail.com>
Cc: Andrew Morton <akpm@osdl.org>
Cc: Roland McGrath <roland@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Coywolf Qi Hunt <qiyong@fc-cn.com>
Acked-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-22 09:03:41 -08:00
Andrew Morton ef129412b4 [PATCH] build compile.h earlier
compile.h is created super-late in the build.  But proc_misc.c want to include
it, and it's generally not sane to have a header file in include/linux be
created at the end of the build: it's either not present or, worse, wrong for
most of the build.

So the patch arranges for compile.h to be built at the start of the build
process.  It also consolidates the compile.h rules with those for version.h
and utsname.h, so they all get built together.

I hope.  My chances of having got this right are about 2%.

Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-22 08:55:51 -08:00
Ingo Molnar 0888f06ac9 [PATCH] sched: fix bad missed wakeups in the i386, x86_64, ia64, ACPI and APM idle code
Fernando Lopez-Lezcano reported frequent scheduling latencies and audio
xruns starting at the 2.6.18-rt kernel, and those problems persisted all
until current -rt kernels. The latencies were serious and unjustified by
system load, often in the milliseconds range.

After a patient and heroic multi-month effort of Fernando, where he
tested dozens of kernels, tried various configs, boot options,
test-patches of mine and provided latency traces of those incidents, the
following 'smoking gun' trace was captured by him:

                 _------=> CPU#
                / _-----=> irqs-off
               | / _----=> need-resched
               || / _---=> hardirq/softirq
               ||| / _--=> preempt-depth
               |||| /
               |||||     delay
   cmd     pid ||||| time  |   caller
      \   /    |||||   \   |   /
  IRQ_19-1479  1D..1    0us : __trace_start_sched_wakeup (try_to_wake_up)
  IRQ_19-1479  1D..1    0us : __trace_start_sched_wakeup <<...>-5856> (37 0)
  IRQ_19-1479  1D..1    0us : __trace_start_sched_wakeup (c01262ba 0 0)
  IRQ_19-1479  1D..1    0us : resched_task (try_to_wake_up)
  IRQ_19-1479  1D..1    0us : __spin_unlock_irqrestore (try_to_wake_up)
  ...
  <idle>-0     1...1   11us!: default_idle (cpu_idle)
  ...
  <idle>-0     0Dn.1  602us : smp_apic_timer_interrupt (c0103baf 1 0)
  ...
   <...>-5856  0D..2  618us : __switch_to (__schedule)
   <...>-5856  0D..2  618us : __schedule <<idle>-0> (20 162)
   <...>-5856  0D..2  619us : __spin_unlock_irq (__schedule)
   <...>-5856  0...1  619us : trace_stop_sched_switched (__schedule)
   <...>-5856  0D..1  619us : trace_stop_sched_switched <<...>-5856> (37 0)

what is visible in this trace is that CPU#1 ran try_to_wake_up() for
PID:5856, it placed PID:5856 on CPU#0's runqueue and ran resched_task()
for CPU#0. But it decided to not send an IPI that no CPU - due to
TS_POLLING. But CPU#0 never woke up after its NEED_RESCHED bit was set,
and only rescheduled to PID:5856 upon the next lapic timer IRQ. The
result was a 600+ usecs latency and a missed wakeup!

the bug turned out to be an idle-wakeup bug introduced into the mainline
kernel this summer via an optimization in the x86_64 tree:

    commit 495ab9c045
    Author: Andi Kleen <ak@suse.de>
    Date:   Mon Jun 26 13:59:11 2006 +0200

    [PATCH] i386/x86-64/ia64: Move polling flag into thread_info_status

    During some profiling I noticed that default_idle causes a lot of
    memory traffic. I think that is caused by the atomic operations
    to clear/set the polling flag in thread_info. There is actually
    no reason to make this atomic - only the idle thread does it
    to itself, other CPUs only read it. So I moved it into ti->status.

the problem is this type of change:

        if (!hlt_counter && boot_cpu_data.hlt_works_ok) {
-               clear_thread_flag(TIF_POLLING_NRFLAG);
+               current_thread_info()->status &= ~TS_POLLING;
                smp_mb__after_clear_bit();
                while (!need_resched()) {
                        local_irq_disable();

this changes clear_thread_flag() to an explicit clearing of TS_POLLING.
clear_thread_flag() is defined as:

        clear_bit(flag, &ti->flags);

and clear_bit() is a LOCK-ed atomic instruction on all x86 platforms:

  static inline void clear_bit(int nr, volatile unsigned long * addr)
  {
          __asm__ __volatile__( LOCK_PREFIX
                  "btrl %1,%0"

hence smp_mb__after_clear_bit() is defined as a simple compile barrier:

  #define smp_mb__after_clear_bit()       barrier()

but the explicit TS_POLLING clearing introduced by the patch:

+               current_thread_info()->status &= ~TS_POLLING;

is not an atomic op! So the clearing of the TS_POLLING bit is freely
reorderable with the reading of the NEED_RESCHED bit - and both now
reside in different memory addresses.

CPU idle wakeup very much depends on ordered memory ops, the clearing of
the TS_POLLING flag must always be done before we test need_resched()
and hit the idle instruction(s). [Symmetrically, the wakeup code needs
to set NEED_RESCHED before it tests the TS_POLLING flag, so memory
ordering is paramount.]

Fernando's dual-core Athlon64 system has a sufficiently advanced memory
ordering model so that it triggered this scenario very often.

( And it also turned out that the reason why these latencies never
  triggered on my testsystems is that i routinely use idle=poll, which
  was the only idle variant not affected by this bug. )

The fix is to change the smp_mb__after_clear_bit() to an smp_mb(), to
act as an absolute barrier between the TS_POLLING write and the
NEED_RESCHED read. This affects almost all idling methods (default,
ACPI, APM), on all 3 x86 architectures: i386, x86_64, ia64.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Tested-by: Fernando Lopez-Lezcano <nando@ccrma.Stanford.EDU>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-22 08:55:51 -08:00