If the current retran_path is the only active one, it should
update it to the the next inactive one.
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This bug is able to corrupt fackets_out in very rare cases.
In order for this to cause corruption:
1) DSACK in the middle of previous SACK block must be generated.
2) In order to take that particular branch, part or all of the
DSACKed segment must already be SACKed so that we have that
in cache in the first place.
3) The new info must be top enough so that fackets_out will be
updated on this iteration.
...then fack_count is updated while skb wasn't, then we walk again
that particular segment thus updating fack_count twice for
a single skb and finally that value is assigned to fackets_out
by tcp_sacktag_one.
It is safe to call tcp_sacktag_one just once for a segment (at
DSACK), no need to call again for plain SACK.
Potential problem of the miscount are limited to premature entry
to recovery and to inflated reordering metric (which could even
cancel each other out in the most the luckiest scenarios :-)).
Both are quite insignificant in worst case too and there exists
also code to reset them (fackets_out once sacked_out becomes zero
and reordering metric on RTO).
This has been reported by a number of people, because it occurred
quite rarely, it has been very evasive. Andy Furniss was able to
get it to occur couple of times so that a bit more info was
collected about the problem using a debug patch, though it still
required lot of checking around. Thanks also to others who have
tried to help here.
This is listed as Bugzilla #10346. The bug was introduced by
me in commit 68f8353b48 ([TCP]: Rewrite SACK block processing &
sack_recv_cache use), I probably thought back then that there's
need to scan that entry twice or didn't dare to make it go
through it just once there. Going through twice would have
required restoring fack_count after the walk but as noted above,
I chose to drop the additional walk step altogether here.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make use of local_irq_save and local_irq_restore rather then the
deprecated save_and_cli and restore_flags calls.
Signed-off-by: Mark Asselstine <mark.asselstine@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes the usage of RIPEMD-160 in xfrm_algo which in turn
allows hmac(rmd160) to be used as authentication mechanism in IPsec
ESP and AH (see RFC 2857).
Signed-off-by: Adrian-Ken Rueegsegger <rueegsegger@swiss-it.ch>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
IPv6 UDP sockets wth IPv4 mapped address use udp_sendmsg to send the data
actually. In this case ip_flush_pending_frames should be called instead
of ip6_flush_pending_frames.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
It is not allowed to change underlying protocol for
int fd = socket(PF_INET6, SOCK_RAW, IPPROTO_UDP);
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
The outgoing interface index (ipi6_ifindex) in IPV6_PKTINFO
ancillary data, is not checked if the source address (ipi6_addr)
is unspecified. If the ipi6_ifindex is the not-exist interface,
it should be fail.
Based on patch from Shan Wei <shanwei@cn.fujitsu.com> and
Brian Haley <brian.haley@hp.com>.
Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
If get destination options with length which is not enough for that
option,getsockopt() will still return the real length of the option,
which is larger then the buffer space.
This is because ipv6_getsockopt_sticky() returns the real length of
the option.
This patch fix this problem.
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
If we pass NULL data buffer to getsockopt(), it will return 0,
and the option length is set to -EFAULT:
getsockopt(sk, IPPROTO_IPV6, IPV6_DSTOPTS, NULL, &len);
This is because ipv6_getsockopt_sticky() will return -EFAULT or
-EINVAL if some error occur.
This patch fix this problem.
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
- Allow longer lifetimes (>= 0x7fffffff/HZ) on 64bit archs
by using unsigned long.
- Shadow this arithmetic overflow workaround by introducing
helper functions: addrconf_timeout_fixup() and
addrconf_finite_timeout().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
I discover a strange behavior in [ipv4 in ipv6] tunnel. When IPv6 tunnel
payload is less than 40(0x28), packet can be sent to network, received in
physical interface, but not seen in IP tunnel interface. No counter increase
in tunnel interface.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
As of now, the prefix length is not vaildated when adding or deleting
addresses. The value is passed directly into the inet6_ifaddr structure
and later passed on to memcmp() as length indicator which relies on
the value never to exceed 128 (bits).
Due to the missing check, the currently code allows for any 8 bit
value to be passed on as prefix length while using the netlink
interface, and any 32 bit value while using the ioctl interface.
[Use unsigned int instead to generate better code - yoshfuji]
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
ip6_sk_dst_lookup returns held dst entry. It should be released
on all paths beyond this point. Add missed release when up->pending
is set.
Bug report and initial patch by Denis V. Lunev <den@openvz.org>.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Acked-by: Denis V. Lunev <den@openvz.org>
Commit 7cbca67c07 ("[IPV6]: Support
Source Address Selection API (RFC5014)") introduced NULL dereference
of asoc to sctp_v6_get_saddr in net/sctp/ipv6.c.
Pointed out by Johann Felix Soden <johfel@users.sourceforge.net>.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
It is possible that this skip path causes TCP to end up into an
invalid state where ca_state was left to CA_Open while some
segments already came into sacked_out. If next valid ACK doesn't
contain new SACK information TCP fails to enter into
tcp_fastretrans_alert(). Thus at least high_seq is set
incorrectly to a too high seqno because some new data segments
could be sent in between (and also, limited transmit is not
being correctly invoked there). Reordering in both directions
can easily cause this situation to occur.
I guess we would want to use tcp_moderate_cwnd(tp) there as well
as it may be possible to use this to trigger oversized burst to
network by sending an old ACK with huge amount of SACK info, but
I'm a bit unsure about its effects (mainly to FlightSize), so to
be on the safe side I just currently fixed it minimally to keep
TCP's state consistent (obviously, such nasty ACKs have been
possible this far). Though it seems that FlightSize is already
underestimated by some amount, so probably on the long term we
might want to trigger recovery there too, if appropriate, to make
FlightSize calculation to resemble reality at the time when the
losses where discovered (but such change scares me too much now
and requires some more thinking anyway how to do that as it
likely involves some code shuffling).
This bug was found by Brian Vowell while running my TCP debug
patch to find cause of another TCP issue (fackets_out
miscount).
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 64e9159f5d ("serial_core:
uart_set_ldisc infrastructure") introduced the ability for low-level
serial drivers to be informed when the tty ldisc changes.
However, the actual tty-layer function that does this callback for
serial devices was declared with the wrong type, having a spurious and
unused 'ldisc' argument.
This fixed the resulting compiler warning by just removing it.
Acked-by: Blithering Idiot <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[ 63.531438] =================================
[ 63.531520] [ INFO: inconsistent lock state ]
[ 63.531520] 2.6.26-rc4 #7
[ 63.531520] ---------------------------------
[ 63.531520] inconsistent {softirq-on-W} -> {in-softirq-W} usage.
[ 63.531520] tcpsic6/3864 [HC0[0]:SC1[1]:HE1:SE0] takes:
[ 63.531520] (&q->lock#2){-+..}, at: [<c07175b0>] ipv6_frag_rcv+0xd0/0xbd0
[ 63.531520] {softirq-on-W} state was registered at:
[ 63.531520] [<c0143bba>] __lock_acquire+0x3aa/0x1080
[ 63.531520] [<c0144906>] lock_acquire+0x76/0xa0
[ 63.531520] [<c07a8f0b>] _spin_lock+0x2b/0x40
[ 63.531520] [<c0727636>] nf_ct_frag6_gather+0x3f6/0x910
...
According to this and another similar lockdep report inet_fragment
locks are taken from nf_ct_frag6_gather() with softirqs enabled, but
these locks are mainly used in softirq context, so disabling BHs is
necessary.
Reported-and-tested-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
In xt_connlimit match module, the counter of an IP is decreased when
the TCP packet is go through the chain with ip_conntrack state TW.
Well, it's very natural that the server and client close the socket
with FIN packet. But when the client/server close the socket with RST
packet(using so_linger), the counter for this connection still exsit.
The following patch can fix it which is based on linux-2.6.25.4
Signed-off-by: Dong Wei <dwei.zh@gmail.com>
Acked-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip:
x86, fpu: fix CONFIG_PREEMPT=y corruption of application's FPU stack
suspend-vs-iommu: prevent suspend if we could not resume
x86: section mismatch fix
x86: fix Xorg crash with xf86MapVidMem error
x86: fix pointer type warning in arch/x86/mm/init_64.c:early_memtest
x86: fix bad pmd ffff810000207xxx(9090909090909090)
x86: ioremap fix failing nesting check
x86: fix broken math-emu with lazy allocation of fpu area
x86: enable preemption in delay
x86: disable preemption in native_smp_prepare_cpus
x86: fix APIC warning on 32bit v2
The d_instantiate hook for Smack can hang on the root inode of a
filesystem if the file system code has not really done all the set-up.
Fuse is known to encounter this problem.
This change detects an attempt to instantiate a root inode and addresses
it early in the processing, before any attempt is made to do something
that might hang.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Tested-by: Luiz Fernando N. Capitulino <lcapitulino@mandriva.com.br>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
libata-sff: Fix oops reported in kerneloops.org for pnp devices with no ctl
libata: kill unused constants
sata_mv: PHY_MODE4 cleanups
[libata] ata_piix: more acer short cable quirks
[libata] ACPI: Properly handle bay devices in dock stations
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6:
[SCSI] qla2xxx: Update version number to 8.02.01-k4.
[SCSI] qla2xxx: Correct handling of AENs postings for vports.
[SCSI] qla2xxx: Revert "qla2xxx: Use proper HA during asynchronous event handling."
[SCSI] ibmvscsi: Non SCSI error status fixup
[SCSI] fusion mpt: fix target missing after resetting external raid
[SCSI] fix intermittent oops in scsi_bus_uevent
[SCSI] qla2xxx: Update version number to 8.02.01-k3.
[SCSI] qla2xxx: Revert "qla2xxx: Validate mid-layer 'underflow' during check-condition handling."
[SCSI] qla2xxx: Disable local-interrupts while polling for RISC status.
[SCSI] qla2xxx: Extend the 'fw_dump' SYSFS node the ability to initiate a firmware dump.
[SCSI] qla2xxx: Don't depend on mailbox return values while enabling FCE tracing.
[SCSI] qla2xxx: Convert vport_sem to a mutex
[SCSI] qla2xxx: firmware semaphore to mutex
[SCSI] qla2xxx: Correct locking within MSI-X interrupt handlers.
[SCSI] qla2xxx: Display driver version at module init-time.
[SCSI] qla2xxx: Return correct port_type to FC-transport for Vports.
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb:
kgdbts: Use HW breakpoints with CONFIG_DEBUG_RODATA
kgdb: use common ascii helpers and put_unaligned_be32 helper
__le16 fields used as host-endian.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Steve French <smfrench@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* wMaxPacketSize is le16; copying it to a field of local structure and then
using that field as host-endian (size of object to be allocated) is broken.
* bMaxPacketSize0 is 8-bit; feeding it to le16_to_cpu() is bogus and since the
result is used as host-endian, it's not even misspelled cpu_to_le16().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jürgen Mell reported an FPU state corruption bug under CONFIG_PREEMPT,
and bisected it to commit v2.6.19-1363-gacc2076, "i386: add sleazy FPU
optimization".
Add tsk_used_math() checks to prevent calling math_state_restore()
which can sleep in the case of !tsk_used_math(). This prevents
making a blocking call in __switch_to().
Apparently "fpu_counter > 5" check is not enough, as in some signal handling
and fork/exec scenarios, fpu_counter > 5 and !tsk_used_math() is possible.
It's a side effect though. This is the failing scenario:
process 'A' in save_i387_ia32() just after clear_used_math()
Got an interrupt and pre-empted out.
At the next context switch to process 'A' again, kernel tries to restore
the math state proactively and sees a fpu_counter > 0 and !tsk_used_math()
This results in init_fpu() during the __switch_to()'s math_state_restore()
And resulting in fpu corruption which will be saved/restored
(save_i387_fxsave and restore_i387_fxsave) during the remaining
part of the signal handling after the context switch.
Bisected-by: Jürgen Mell <j.mell@t-online.de>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Tested-by: Jürgen Mell <j.mell@t-online.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@kernel.org
iommu/gart support misses suspend/resume code, which can do bad stuff,
including memory corruption on resume. Prevent system suspend in case we
would be unable to resume.
Signed-off-by: Pavel Machek <pavel@suse.cz>
Tested-by: Patrick <ragamuffin@datacomm.ch>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix this:
WARNING: vmlinux.o(.text+0x114bb): Section mismatch in reference from
the function nopat() to the function .cpuinit.text:pat_disable()
The function nopat() references
the function __cpuinit pat_disable().
This is often because nopat lacks a __cpuinit
annotation or the annotation of pat_disable is wrong.
Reported-by: "Fabio Comolli" <fabio.comolli@gmail.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Clarify the usage of mtrr_lookup() in PAT code, and to make PAT code
resilient to mtrr lookup problems.
Specifically, pat_x_mtrr_type() is restructured to highlight, under what
conditions we look for mtrr hint. pat_x_mtrr_type() uses a default type
when there are any errors in mtrr lookup (still maintaining the pat
consistency). And, reserve_memtype() highlights its usage ot mtrr_lookup
for request type of '-1' and also defaults in a sane way on any mtrr
lookup failure.
pat.c looks at mtrr type of a range to get a hint on what mapping type
to request when user/API: (1) hasn't specified any type (/dev/mem
mapping) and we do not want to take performance hit by always mapping
UC_MINUS. This will be the case for /dev/mem mappings used to map BIOS
area or ACPI region which are WB'able. In this case, as long as MTRR is
not WB, PAT will request UC_MINUS for such mappings.
(2) user/API requests WB mapping while in reality MTRR may have UC or
WC. In this case, PAT can map as WB (without checking MTRR) and still
effective type will be UC or WC. But, a subsequent request to map same
region as UC or WC may fail, as the region will get trackked as WB in
PAT list. Looking at MTRR hint helps us to track based on effective type
rather than what user requested. Again, here mtrr_lookup is only used as
hint and we fallback to WB mapping (as requested by user) as default.
In both cases, after using the mtrr hint, we still go through the
memtype list to make sure there are no inconsistencies among multiple
users.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Tested-by: Rufus & Azrael <rufus-azrael@numericable.fr>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Changed the call to find_e820_area_size to pass u64 instead of unsigned long.
Signed-off-by: Kevin Winchester <kjwinchester@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
OGAWA Hirofumi and Fede have reported rare pmd_ERROR messages:
mm/memory.c:127: bad pmd ffff810000207xxx(9090909090909090).
Initialization's cleanup_highmap was leaving alignment filler
behind in the pmd for MODULES_VADDR: when vmalloc's guard page
would occupy a new page table, it's not allocated, and then
module unload's vfree hits the bad 9090 pmd entry left over.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Mika Kukkonen noticed that the nesting check in early_iounmap() is not
actually done.
Reported-by: Mika Kukkonen <mikukkon@srv1-m700-lanp.koti>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: torvalds@linux-foundation.org
Cc: arjan@linux.intel.com
Cc: mikukkon@iki.fi
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Fix the math emulation that got broken with the recent lazy allocation of FPU
area. init_fpu() need to be added for the math-emulation path aswell
for the FPU area allocation.
math emulation enabled kernel booted fine with this, in the presence
of "no387 nofxsr" boot param.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: hpa@zytor.com
Cc: mingo@elte.hu
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The RT team has been searching for a nasty latency. This latency shows
up out of the blue and has been seen to be as big as 5ms!
Using ftrace I found the cause of the latency.
pcscd-2995 3dNh1 52360300us : irq_exit (smp_apic_timer_interrupt)
pcscd-2995 3dN.2 52360301us : idle_cpu (irq_exit)
pcscd-2995 3dN.2 52360301us : rcu_irq_exit (irq_exit)
pcscd-2995 3dN.1 52360771us : smp_apic_timer_interrupt (apic_timer_interrupt
)
pcscd-2995 3dN.1 52360771us : exit_idle (smp_apic_timer_interrupt)
Here's an example of a 400 us latency. pcscd took a timer interrupt and
returned with "need resched" enabled, but did not reschedule until after
the next interrupt came in at 52360771us 400us later!
At first I thought we somehow missed a preemption check in entry.S. But
I also noticed that this always seemed to happen during a __delay call.
pcscd-2995 3dN.2 52360836us : rcu_irq_exit (irq_exit)
pcscd-2995 3.N.. 52361265us : preempt_schedule (__delay)
Looking at the x86 delay, I found my problem.
In git commit 35d5d08a08, Andrew Morton
placed preempt_disable around the entire delay due to TSC's not working
nicely on SMP. Unfortunately for those that care about latencies this
is devastating! Especially when we have callers to mdelay(8).
Here I enable preemption during the loop and account for anytime the task
migrates to a new CPU. The delay asked for may be extended a bit by
the migration, but delay only guarantees that it will delay for that minimum
time. Delaying longer should not be an issue.
[
Thanks to Thomas Gleixner for spotting that cpu wasn't updated,
and to place the rep_nop between preempt_enabled/disable.
]
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Cc: akpm@osdl.org
Cc: Clark Williams <clark.williams@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Luis Claudio R. Goncalves" <lclaudio@uudg.org>
Cc: Gregory Haskins <ghaskins@novell.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andi Kleen <andi-suse@firstfloor.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>