Commit graph

31568 commits

Author SHA1 Message Date
Lorenzo Colitti bf439b3154 net: ipv6: ping: Use socket mark in routing lookup
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-27 16:08:46 -05:00
John W. Linville 8e2a89c515 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem 2014-02-27 15:05:51 -05:00
Johannes Berg cb66498160 mac80211: fix association to 20/40 MHz VHT networks
When a VHT network uses 20 or 40 MHz as per the HT operation
information, the channel center frequency segment 0 field in
the VHT operation information is reserved, so ignore it.

This fixes association with such networks when the AP puts 0
into the field, previously we'd disconnect due to an invalid
channel with the message
wlan0: AP VHT information is invalid, disable VHT

Cc: stable@vger.kernel.org
Fixes: f2d9d270c1 ("mac80211: support VHT association")
Reported-by: Tim Nelson <tim.l.nelson@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-02-27 20:53:01 +01:00
Hiroaki SHIMODA 724b9e1d75 sch_tbf: Fix potential memory leak in tbf_change().
The allocated child qdisc is not freed in error conditions.
Defer the allocation after user configuration turns out to be
valid and acceptable.

Fixes: cc106e441a ("net: sched: tbf: fix the calculation of max_size")
Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-27 12:53:50 -05:00
Eric Dumazet 9a9bfd032f net: tcp: use NET_INC_STATS()
While LINUX_MIB_TCPSPURIOUS_RTX_HOSTQUEUES can only be incremented
in tcp_transmit_skb() from softirq (incoming message or timer
activation), it is better to use NET_INC_STATS() instead of
NET_INC_STATS_BH() as tcp_transmit_skb() can be called from process
context.

This will avoid copy/paste confusion when/if we want to add
other SNMP counters in tcp_transmit_skb()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-26 15:19:47 -05:00
Steffen Klassert 3a9016f97f xfrm: Fix unlink race when policies are deleted.
When a policy is unlinked from the lists in thread context,
the xfrm timer can fire before we can mark this policy as dead.
So reinitialize the bydst hlist, then hlist_unhashed() will
notice that this policy is not linked and will avoid a
doulble unlink of that policy.

Reported-by: Xianpeng Zhao <673321875@qq.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-02-26 09:52:02 +01:00
Mike Pecovnik 46833a86f7 net: Fix permission check in netlink_connect()
netlink_sendmsg() was changed to prevent non-root processes from sending
messages with dst_pid != 0.
netlink_connect() however still only checks if nladdr->nl_groups is set.
This patch modifies netlink_connect() to check for the same condition.

Signed-off-by: Mike Pecovnik <mike.pecovnik@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-25 18:35:14 -05:00
Hannes Frederic Sowa 91a48a2e85 ipv4: ipv6: better estimate tunnel header cut for correct ufo handling
Currently the UFO fragmentation process does not correctly handle inner
UDP frames.

(The following tcpdumps are captured on the parent interface with ufo
disabled while tunnel has ufo enabled, 2000 bytes payload, mtu 1280,
both sit device):

IPv6:
16:39:10.031613 IP (tos 0x0, ttl 64, id 3208, offset 0, flags [DF], proto IPv6 (41), length 1300)
    192.168.122.151 > 1.1.1.1: IP6 (hlim 64, next-header Fragment (44) payload length: 1240) 2001::1 > 2001::8: frag (0x00000001:0|1232) 44883 > distinct: UDP, length 2000
16:39:10.031709 IP (tos 0x0, ttl 64, id 3209, offset 0, flags [DF], proto IPv6 (41), length 844)
    192.168.122.151 > 1.1.1.1: IP6 (hlim 64, next-header Fragment (44) payload length: 784) 2001::1 > 2001::8: frag (0x00000001:0|776) 58979 > 46366: UDP, length 5471

We can see that fragmentation header offset is not correctly updated.
(fragmentation id handling is corrected by 916e4cf46d ("ipv6: reuse
ip6_frag_id from ip6_ufo_append_data")).

IPv4:
16:39:57.737761 IP (tos 0x0, ttl 64, id 3209, offset 0, flags [DF], proto IPIP (4), length 1296)
    192.168.122.151 > 1.1.1.1: IP (tos 0x0, ttl 64, id 57034, offset 0, flags [none], proto UDP (17), length 1276)
    192.168.99.1.35961 > 192.168.99.2.distinct: UDP, length 2000
16:39:57.738028 IP (tos 0x0, ttl 64, id 3210, offset 0, flags [DF], proto IPIP (4), length 792)
    192.168.122.151 > 1.1.1.1: IP (tos 0x0, ttl 64, id 57035, offset 0, flags [none], proto UDP (17), length 772)
    192.168.99.1.13531 > 192.168.99.2.20653: UDP, length 51109

In this case fragmentation id is incremented and offset is not updated.

First, I aligned inet_gso_segment and ipv6_gso_segment:
* align naming of flags
* ipv6_gso_segment: setting skb->encapsulation is unnecessary, as we
  always ensure that the state of this flag is left untouched when
  returning from upper gso segmenation function
* ipv6_gso_segment: move skb_reset_inner_headers below updating the
  fragmentation header data, we don't care for updating fragmentation
  header data
* remove currently unneeded comment indicating skb->encapsulation might
  get changed by upper gso_segment callback (gre and udp-tunnel reset
  encapsulation after segmentation on each fragment)

If we encounter an IPIP or SIT gso skb we now check for the protocol ==
IPPROTO_UDP and that we at least have already traversed another ip(6)
protocol header.

The reason why we have to special case GSO_IPIP and GSO_SIT is that
we reset skb->encapsulation to 0 while skb_mac_gso_segment the inner
protocol of GSO_UDP_TUNNEL or GSO_GRE packets.

Reported-by: Wolfgang Walter <linux@stwm.de>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-25 18:27:06 -05:00
Janusz Dziedzic 092008abee cfg80211: regulatory: reset regdomain in case of error
Reset regdomain to world regdomain in case
of errors in set_regdom() function.

This will fix a problem with such scenario:
- iw reg set US
- iw reg set 00
- iw reg set US
The last step always fail and we get deadlock
in kernel regulatory code. Next setting new
regulatory wasn't possible due to:

Pending regulatory request, waiting for it to be processed...

Signed-off-by: Janusz Dziedzic <janusz.dziedzic@tieto.com>
Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-02-25 16:27:04 +01:00
Eric Dumazet d10473d4e3 tcp: reduce the bloat caused by tcp_is_cwnd_limited()
tcp_is_cwnd_limited() allows GSO/TSO enabled flows to increase
their cwnd to allow a full size (64KB) TSO packet to be sent.

Non GSO flows only allow an extra room of 3 MSS.

For most flows with a BDP below 10 MSS, this results in a bloat
of cwnd reaching 90, and an inflate of RTT.

Thanks to TSO auto sizing, we can restrict the bloat to the number
of MSS contained in a TSO packet (tp->xmit_size_goal_segs), to keep
original intent without performance impact.

Because we keep cwnd small, it helps to keep TSO packet size to their
optimal value.

Example for a 10Mbit flow, with low TCP Small queue limits (no more than
2 skb in qdisc/device tx ring)

Before patch :

lpk51:~# ./ss -i dst lpk52:44862 | grep cwnd
         cubic wscale:6,6 rto:215 rtt:15.875/2.5 mss:1448 cwnd:96
ssthresh:96
send 70.1Mbps unacked:14 rcv_space:29200

After patch :

lpk51:~# ./ss -i dst lpk52:52916 | grep cwnd
         cubic wscale:6,6 rto:206 rtt:5.206/0.036 mss:1448 cwnd:15
ssthresh:14
send 33.4Mbps unacked:4 rcv_space:29200

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Cc: Van Jacobson <vanj@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-24 19:13:38 -05:00
Johannes Berg 963a1852fb mac80211: don't validate unchanged AP bandwidth while tracking
The MLME code in mac80211 must track whether or not the AP changed
bandwidth, but if there's no change while tracking it shouldn't do
anything, otherwise regulatory updates can make it impossible to
connect to certain APs if the regulatory database doesn't match the
information from the AP. See the precise scenario described in the
code.

This still leaves some possible problems with CSA or if the AP
actually changed bandwidth, but those cases are less common and
won't completely prevent using it.

This fixes https://bugzilla.kernel.org/show_bug.cgi?id=70881

Cc: stable@vger.kernel.org
Reported-and-tested-by: Nate Carlson <kernel@natecarlson.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-02-24 10:16:40 +01:00
Amitkumar Karwar 44a589ca2d NFC: NCI: Fix NULL pointer dereference
The check should be for setup function pointer.

This patch fixes NULL pointer dereference issue for NCI
based NFC driver which doesn't define setup handler.

Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-02-23 23:14:45 +01:00
Hannes Frederic Sowa 916e4cf46d ipv6: reuse ip6_frag_id from ip6_ufo_append_data
Currently we generate a new fragmentation id on UFO segmentation. It
is pretty hairy to identify the correct net namespace and dst there.
Especially tunnels use IFF_XMIT_DST_RELEASE and thus have no skb_dst
available at all.

This causes unreliable or very predictable ipv6 fragmentation id
generation while segmentation.

Luckily we already have pregenerated the ip6_frag_id in
ip6_ufo_append_data and can use it here.

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-22 00:28:21 -05:00
Daniel Borkmann 4c47af4d5e net: sctp: rework multihoming retransmission path selection to rfc4960
Problem statement: 1) both paths (primary path1 and alternate
path2) are up after the association has been established i.e.,
HB packets are normally exchanged, 2) path2 gets inactive after
path_max_retrans * max_rto timed out (i.e. path2 is down completely),
3) now, if a transmission times out on the only surviving/active
path1 (any ~1sec network service impact could cause this like
a channel bonding failover), then the retransmitted packets are
sent over the inactive path2; this happens with partial failover
and without it.

Besides not being optimal in the above scenario, a small failure
or timeout in the only existing path has the potential to cause
long delays in the retransmission (depending on RTO_MAX) until
the still active path is reselected. Further, when the T3-timeout
occurs, we have active_patch == retrans_path, and even though the
timeout occurred on the initial transmission of data, not a
retransmit, we end up updating retransmit path.

RFC4960, section 6.4. "Multi-Homed SCTP Endpoints" states under
6.4.1. "Failover from an Inactive Destination Address" the
following:

  Some of the transport addresses of a multi-homed SCTP endpoint
  may become inactive due to either the occurrence of certain
  error conditions (see Section 8.2) or adjustments from the
  SCTP user.

  When there is outbound data to send and the primary path
  becomes inactive (e.g., due to failures), or where the SCTP
  user explicitly requests to send data to an inactive
  destination transport address, before reporting an error to
  its ULP, the SCTP endpoint should try to send the data to an
  alternate __active__ destination transport address if one
  exists.

  When retransmitting data that timed out, if the endpoint is
  multihomed, it should consider each source-destination address
  pair in its retransmission selection policy. When retransmitting
  timed-out data, the endpoint should attempt to pick the most
  divergent source-destination pair from the original
  source-destination pair to which the packet was transmitted.

  Note: Rules for picking the most divergent source-destination
  pair are an implementation decision and are not specified
  within this document.

So, we should first reconsider to take the current active
retransmission transport if we cannot find an alternative
active one. If all of that fails, we can still round robin
through unkown, partial failover, and inactive ones in the
hope to find something still suitable.

Commit 4141ddc02a ("sctp: retran_path update bug fix") broke
that behaviour by selecting the next inactive transport when
no other active transport was found besides the current assoc's
peer.retran_path. Before commit 4141ddc02a, we would have
traversed through the list until we reach our peer.retran_path
again, and in case that is still in state SCTP_ACTIVE, we would
take it and return. Only if that is not the case either, we
take the next inactive transport.

Besides all that, another issue is that transports in state
SCTP_UNKNOWN could be preferred over transports in state
SCTP_ACTIVE in case a SCTP_ACTIVE transport appears after
SCTP_UNKNOWN in the transport list yielding a weaker transport
state to be used in retransmission.

This patch mostly reverts 4141ddc02a, but also rewrites
this function to introduce more clarity and strictness into
the code. A strict priority of transport states is enforced
in this patch, hence selection is active > unkown > partial
failover > inactive.

Fixes: 4141ddc02a ("sctp: retran_path update bug fix")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Acked-by: Vlad Yasevich <yasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-22 00:26:05 -05:00
Jiri Pirko b194c1f1db neigh: fix setting of default gc_* values
This patch fixes bug introduced by:
commit 1d4c8c2984
"neigh: restore old behaviour of default parms values"

The thing is that in neigh_sysctl_register, extra1 and extra2 which were
previously set for NEIGH_VAR_GC_* are overwritten. That leads to
nonsense int limits for gc_* variables. So fix this by not touching
extra* fields for gc_* variables.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-22 00:08:10 -05:00
Eric Dumazet f5ddcbbb40 net-tcp: fastopen: fix high order allocations
This patch fixes two bugs in fastopen :

1) The tcp_sendmsg(...,  @size) argument was ignored.

   Code was relying on user not fooling the kernel with iovec mismatches

2) When MTU is about 64KB, tcp_send_syn_data() attempts order-5
allocations, which are likely to fail when memory gets fragmented.

Fixes: 783237e8da ("net-tcp: Fast Open client - sending SYN-data")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Tested-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-22 00:05:21 -05:00
Ying Xue 970122fdf4 tipc: make bearer set up in module insertion stage
Accidentally a side effect is involved by commit 6e967adf7(tipc:
relocate common functions from media to bearer). Now tipc stack
handler of receiving packets from netdevices as well as netdevice
notification handler are registered when bearer is enabled rather
than tipc module initialization stage, but the two handlers are
both unregistered in tipc module exit phase. If tipc module is
inserted and then immediately removed, the following warning
message will appear:

"dev_remove_pack: ffffffffa0380940 not found"

This is because in module insertion stage tipc stack packet handler
is not registered at all, but in module exit phase dev_remove_pack()
needs to remove it. Of course, dev_remove_pack() cannot find tipc
protocol handler from the kernel protocol handler list so that the
warning message is printed out.

But if registering the two handlers is adjusted from enabling bearer
phase into inserting module stage, the warning message will be
eliminated. Due to this change, tipc_core_start_net() and
tipc_core_stop_net() can be deleted as well.

Reported-by: Wang Weidong <wangweidong1@huawei.com>
Cc: Jon Maloy <jon.maloy@ericsson.com>
Cc: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-22 00:00:15 -05:00
Ying Xue 9fe7ed4749 tipc: remove all enabled flags from all tipc components
When tipc module is inserted, many tipc components are initialized
one by one. During the initialization period, if one of them is
failed, tipc_core_stop() will be called to stop all components
whatever corresponding components are created or not. To avoid to
release uncreated ones, relevant components have to add necessary
enabled flags indicating whether they are created or not.

But in the initialization stage, if one component is unsuccessfully
created, we will just destroy successfully created components before
the failed component instead of all components. All enabled flags
defined in components, in turn, become redundant. Additionally it's
also unnecessary to identify whether table.types is NULL in
tipc_nametbl_stop() because name stable has been definitely created
successfully when tipc_nametbl_stop() is called.

Cc: Jon Maloy <jon.maloy@ericsson.com>
Cc: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-22 00:00:15 -05:00
Matija Glavinic Pecotic 7cce3b7568 net: sctp: Potentially-Failed state should not be reached from unconfirmed state
In current implementation it is possible to reach PF state from unconfirmed.
We can interpret sctp-failover-02 in a way that PF state is meant to be reached
only from active state, in the end, this is when entering PF state makes sense.
Here are few quotes from sctp-failover-02, but regardless of these, same
understanding can be reached from whole section 5:

Section 5.1, quickfailover guide:
    "The PF state is an intermediate state between Active and Failed states."

    "Each time the T3-rtx timer expires on an active or idle
    destination, the error counter of that destination address will
    be incremented.  When the value in the error counter exceeds
    PFMR, the endpoint should mark the destination transport address as PF."

There are several concrete reasons for such interpretation. For start, rfc4960
does not take into concern quickfailover algorithm. Therefore, quickfailover
must comply to 4960. Point where this compliance can be argued is following
behavior:
When PF is entered, association overall error counter is incremented for each
missed HB. This is contradictory to rfc4960, as address, while in unconfirmed
state, is subjected to probing, and while it is probed, it should not increment
association overall error counter. This has as a consequence that we might end
up in situation in which we drop association due path failure on unconfirmed
address, in case we have wrong configuration in a way:
Association.Max.Retrans == Path.Max.Retrans.

Another reason is that entering PF from unconfirmed will cause a loss of address
confirmed event when address is once (if) confirmed. This is fine from failover
guide point of view, but it is not consistent with behavior preceding failover
implementation and recommendation from 4960:

5.4.  Path Verification
   Whenever a path is confirmed, an indication MAY be given to the upper
   layer.

Signed-off-by: Matija Glavinic Pecotic <matija.glavinic-pecotic.ext@nsn.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-20 13:24:56 -05:00
Nicolas Dichtel cf71d2bc0b sit: fix panic with route cache in ip tunnels
Bug introduced by commit 7d442fab0a ("ipv4: Cache dst in tunnels").

Because sit code does not call ip_tunnel_init(), the dst_cache was not
initialized.

CC: Tom Herbert <therbert@google.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-20 13:13:50 -05:00
Steffen Klassert ee5c23176f xfrm: Clone states properly on migration
We loose a lot of information of the original state if we
clone it with xfrm_state_clone(). In particular, there is
no crypto algorithm attached if the original state uses
an aead algorithm. This patch add the missing information
to the clone state.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-02-20 14:30:10 +01:00
Steffen Klassert 8c0cba22e1 xfrm: Take xfrm_state_lock in xfrm_migrate_state_find
A comment on xfrm_migrate_state_find() says that xfrm_state_lock
is held. This is apparently not the case, but we need it to
traverse through the state lists.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-02-20 14:30:04 +01:00
Steffen Klassert 35ea790d78 xfrm: Fix NULL pointer dereference on sub policy usage
xfrm_state_sort() takes the unsorted states from the src array
and stores them into the dst array. We try to get the namespace
from the dst array which is empty at this time, so take the
namespace from the src array instead.

Fixes: 283bc9f35b ("xfrm: Namespacify xfrm state/policy locks")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-02-20 14:29:58 +01:00
Steffen Klassert 876fc03aaa ip6_vti: Fix build when NET_IP_TUNNEL is not set.
Since commit 469bdcefdc ip6_vti uses ip_tunnel_get_stats64(),
so we need to select NET_IP_TUNNEL to have this function available.

Fixes: 469bdcefdc ("ipv6: fix the use of pcpu_tstats in ip6_vti.c")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-02-20 14:29:49 +01:00
Johannes Berg e3685e03b4 mac80211: fix station wakeup powersave race
Consider the following (relatively unlikely) scenario:
 1) station goes to sleep while frames are buffered in driver
 2) driver blocks wakeup (until no more frames are buffered)
 3) station wakes up again
 4) driver unblocks wakeup

In this case, the current mac80211 code will do the following:
 1) WLAN_STA_PS_STA set
 2) WLAN_STA_PS_DRIVER set
 3) - nothing -
 4) WLAN_STA_PS_DRIVER cleared

As a result, no frames will be delivered to the client, even
though it is awake, until it sends another frame to us that
triggers ieee80211_sta_ps_deliver_wakeup() in sta_ps_end().

Since we now take the PS spinlock, we can fix this while at
the same time removing the complexity with the pending skb
queue function. This was broken since my commit 50a9432dae
("mac80211: fix powersaving clients races") due to removing
the clearing of WLAN_STA_PS_STA in the RX path.

While at it, fix a cleanup path issue when a station is
removed while the driver is still blocking its wakeup.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-02-20 11:54:09 +01:00
Johannes Berg 5108ca8280 mac80211: insert stations before adding to driver
There's a race condition in mac80211 because we add stations
to the internal lists after adding them to the driver, which
means that (for example) the following can happen:
 1. a station connects and is added
 2. first, it is added to the driver
 3. then, it is added to the mac80211 lists

If the station goes to sleep between steps 2 and 3, and the
firmware/hardware records it as being asleep, mac80211 will
never instruct the driver to wake it up again as it never
realized it went to sleep since the RX path discarded the
frame as a "spurious class 3 frame", no station entry was
present yet.

Fix this by adding the station in software first, and only
then adding it to the driver. That way, any state that the
driver changes will be reflected properly in mac80211's
station state. The problematic part is the roll-back if the
driver fails to add the station, in that case a bit more is
needed. To not make that overly complex prevent starting BA
sessions in the meantime.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-02-20 10:34:33 +01:00
Emmanuel Grumbach 1d147bfa64 mac80211: fix AP powersave TX vs. wakeup race
There is a race between the TX path and the STA wakeup: while
a station is sleeping, mac80211 buffers frames until it wakes
up, then the frames are transmitted. However, the RX and TX
path are concurrent, so the packet indicating wakeup can be
processed while a packet is being transmitted.

This can lead to a situation where the buffered frames list
is emptied on the one side, while a frame is being added on
the other side, as the station is still seen as sleeping in
the TX path.

As a result, the newly added frame will not be send anytime
soon. It might be sent much later (and out of order) when the
station goes to sleep and wakes up the next time.

Additionally, it can lead to the crash below.

Fix all this by synchronising both paths with a new lock.
Both path are not fastpath since they handle PS situations.

In a later patch we'll remove the extra skb queue locks to
reduce locking overhead.

BUG: unable to handle kernel
NULL pointer dereference at 000000b0
IP: [<ff6f1791>] ieee80211_report_used_skb+0x11/0x3e0 [mac80211]
*pde = 00000000
Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
EIP: 0060:[<ff6f1791>] EFLAGS: 00210282 CPU: 1
EIP is at ieee80211_report_used_skb+0x11/0x3e0 [mac80211]
EAX: e5900da0 EBX: 00000000 ECX: 00000001 EDX: 00000000
ESI: e41d00c0 EDI: e5900da0 EBP: ebe458e4 ESP: ebe458b0
 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
CR0: 8005003b CR2: 000000b0 CR3: 25a78000 CR4: 000407d0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
Process iperf (pid: 3934, ti=ebe44000 task=e757c0b0 task.ti=ebe44000)
iwlwifi 0000:02:00.0: I iwl_pcie_enqueue_hcmd Sending command LQ_CMD (#4e), seq: 0x0903, 92 bytes at 3[3]:9
Stack:
 e403b32c ebe458c4 00200002 00200286 e403b338 ebe458cc c10960bb e5900da0
 ff76a6ec ebe458d8 00000000 e41d00c0 e5900da0 ebe458f0 ff6f1b75 e403b210
 ebe4598c ff723dc1 00000000 ff76a6ec e597c978 e403b758 00000002 00000002
Call Trace:
 [<ff6f1b75>] ieee80211_free_txskb+0x15/0x20 [mac80211]
 [<ff723dc1>] invoke_tx_handlers+0x1661/0x1780 [mac80211]
 [<ff7248a5>] ieee80211_tx+0x75/0x100 [mac80211]
 [<ff7249bf>] ieee80211_xmit+0x8f/0xc0 [mac80211]
 [<ff72550e>] ieee80211_subif_start_xmit+0x4fe/0xe20 [mac80211]
 [<c149ef70>] dev_hard_start_xmit+0x450/0x950
 [<c14b9aa9>] sch_direct_xmit+0xa9/0x250
 [<c14b9c9b>] __qdisc_run+0x4b/0x150
 [<c149f732>] dev_queue_xmit+0x2c2/0xca0

Cc: stable@vger.kernel.org
Reported-by: Yaara Rozenblum <yaara.rozenblum@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Reviewed-by: Stanislaw Gruszka <sgruszka@redhat.com>
[reword commit log, use a separate lock]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-02-20 10:32:29 +01:00
David S. Miller ebe44f350e ip_tunnel: Move ip_tunnel_get_stats64 into ip_tunnel_core.c
net/built-in.o:(.rodata+0x1707c): undefined reference to `ip_tunnel_get_stats64'

Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-20 02:14:23 -05:00
Linus Torvalds e95003c3f9 NFS client bugfixes for Linux 3.14
Highlights include stable fixes for the following bugs:
 
 - General performance regression due to NFS_INO_INVALID_LABEL being set
   when the server doesn't support labeled NFS
 - Hang in the RPC code due to a socket out-of-buffer race
 - Infinite loop when trying to establish the NFSv4 lease
 - Use-after-free bug in the RPCSEC gss code.
 - nfs4_select_rw_stateid is returning with a non-zero error value on success
 
 Other bug fixes:
 
 - Potential memory scribble in the RPC bi-directional RPC code
 - Pipe version reference leak
 - Use the correct net namespace in the new NFSv4 migration code
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJTBMBlAAoJEGcL54qWCgDyOakP+gKDh0VhKw8GziJFfY+6kHHI
 wej86M/coNnRPUv8n7s3N5TXMoV36qisYpxbIG/WQbOn6MfR3qjto6WoP7+vsrEq
 iNomtLivgEsWJePydTuIyAR/TK0du/zqP4zoPEDgdLDenucEVvkCzGIkqzg8Mddc
 duknEhIq918BkXIe3hRBWuxl+pRjwZur+TY0h/OR11oodqTYHxrE37f5PvREWOmB
 08hhjOFBFYlyEnCjD3I1SFmcXQxkKzvACavvbhTyF6u/37oL/QC1/DZKL5mSdOJ6
 novO8sv9gIpn/RhsEMOdaeYMYM5QTvkYIJQyLpKAYyaLZ42EMbRkczwNE7C0ZWyi
 F9MizDMNTie+DvsSHZPYwABTDOOQOuWPa9PO3Lyo8UtWxfmTOjYr2Rre1wyG0/+0
 ywb3JiKQCtVDPnmSHhqxFfVp9XvS7D2/vz0udgKjrCLKyDC2OMMGLVa/acnrtZmz
 s94QpPiqhjnRqIuKo251HtbK3AVaLBNQxBCPszieYwPm9aZ04P7mjsGg1WuhhB2v
 +eSa9UkicGJwKWJWtIBr54qIAOlEXu2bXY+vio5UfbDb+5qfBHe0TmNrz5QNJ53A
 x0eUBocth9VpW1cv/Rf+o30tJyZGy6Jtv8kn2hfXkJXNL3N1Gn25rD3tpb+5TtdY
 b7gtHy13/oe8yi0tK91f
 =tll2
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.14-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
 "Highlights include stable fixes for the following bugs:

   - General performance regression due to NFS_INO_INVALID_LABEL being
     set when the server doesn't support labeled NFS
   - Hang in the RPC code due to a socket out-of-buffer race
   - Infinite loop when trying to establish the NFSv4 lease
   - Use-after-free bug in the RPCSEC gss code.
   - nfs4_select_rw_stateid is returning with a non-zero error value on
     success

  Other bug fixes:

  - Potential memory scribble in the RPC bi-directional RPC code
  - Pipe version reference leak
  - Use the correct net namespace in the new NFSv4 migration code"

* tag 'nfs-for-3.14-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFS fix error return in nfs4_select_rw_stateid
  NFSv4: Use the correct net namespace in nfs4_update_server
  SUNRPC: Fix a pipe_version reference leak
  SUNRPC: Ensure that gss_auth isn't freed before its upcall messages
  SUNRPC: Fix potential memory scribble in xprt_free_bc_request()
  SUNRPC: Fix races in xs_nospace()
  SUNRPC: Don't create a gss auth cache unless rpc.gssd is running
  NFS: Do not set NFS_INO_INVALID_LABEL unless server supports labeled NFS
2014-02-19 12:13:02 -08:00
David S. Miller 2e99c07fbe Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for your net tree,
they are:

* Fix nf_trace in nftables if XT_TRACE=n, from Florian Westphal.

* Don't use the fast payload operation in nf_tables if the length is
  not power of 2 or it is not aligned, from Nikolay Aleksandrov.

* Fix missing break statement the inet flavour of nft_reject, which
  results in evaluating IPv4 packets with the IPv6 evaluation routine,
  from Patrick McHardy.

* Fix wrong kconfig symbol in nft_meta to match the routing realm,
  from Paul Bolle.

* Allocate the NAT null binding when creating new conntracks via
  ctnetlink to avoid that several packets race at initializing the
  the conntrack NAT extension, original patch from Florian Westphal,
  revisited version from me.

* Fix DNAT handling in the snmp NAT helper, the same handling was being
  done for SNAT and DNAT and 2.4 already contains that fix, from
  Francois-Xavier Le Bail.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-19 13:12:53 -05:00
Inbal Hacohen 50c11eb998 cfg80211: bugfix in regulatory user hint process
After processing hint_user, we would want to schedule the
timeout work only if we are actually waiting to CRDA. This happens
when the status is not "IGNORE" nor "ALREADY_SET".

Signed-off-by: Inbal Hacohen <Inbal.Hacohen@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-02-19 11:56:48 +01:00
Linus Torvalds 525b870974 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid
Pull HID update from Jiri Kosina:

 - fixes for several bugs in incorrect allocations of buffers by David
   Herrmann and Benjamin Tissoires.

 - support for a few new device IDs by Archana Patni, Benjamin
   Tissoires, Huei-Horng Yo, Reyad Attiyat and Yufeng Shen

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
  HID: hyperv: make sure input buffer is big enough
  HID: Bluetooth: hidp: make sure input buffers are big enough
  HID: hid-sensor-hub: quirk for STM Sensor hub
  HID: apple: add Apple wireless keyboard 2011 JIS model support
  HID: fix buffer allocations
  HID: multitouch: add FocalTech FTxxxx support
  HID: microsoft: Add ID's for Surface Type/Touch Cover 2
  HID: usbhid: quirk for CY-TM75 75 inch Touch Overlay
2014-02-18 16:29:46 -08:00
Dan Carpenter d7cf0c34af af_packet: remove a stray tab in packet_set_ring()
At first glance it looks like there is a missing curly brace but
actually the code works the same either way.  I have adjusted the
indenting but left the code the same.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-18 18:02:25 -05:00
Daniel Borkmann ffd5939381 net: sctp: fix sctp_connectx abi for ia32 emulation/compat mode
SCTP's sctp_connectx() abi breaks for 64bit kernels compiled with 32bit
emulation (e.g. ia32 emulation or x86_x32). Due to internal usage of
'struct sctp_getaddrs_old' which includes a struct sockaddr pointer,
sizeof(param) check will always fail in kernel as the structure in
64bit kernel space is 4bytes larger than for user binaries compiled
in 32bit mode. Thus, applications making use of sctp_connectx() won't
be able to run under such circumstances.

Introduce a compat interface in the kernel to deal with such
situations by using a 'struct compat_sctp_getaddrs_old' structure
where user data is copied into it, and then sucessively transformed
into a 'struct sctp_getaddrs_old' structure with the help of
compat_ptr(). That fixes sctp_connectx() abi without any changes
needed in user space, and lets the SCTP test suite pass when compiled
in 32bit and run on 64bit kernels.

Fixes: f9c67811eb ("sctp: Fix regression introduced by new sctp_connectx api")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-18 16:06:48 -05:00
David S. Miller 7ffb0d317d Included changes:
- fix soft-interface MTU computation
 - fix bogus pointer mangling when parsing the TT-TVLV
   container. This bug led to a wrong memory access.
 - fix memory leak by properly releasing the VLAN object
   after CRC check
 - properly check pskb_may_pull() return value
 - avoid potential race condition while adding new neighbour
 - fix potential memory leak by removing all the references
   to the orig_node object in case of initialization failure
 - fix the TT CRC computation by ensuring that every node uses
   the same byte order when hosts with different endianess are
   part of the same network
 - fix severe memory leak by freeing skb after a successful
   TVLV parsing
 - avoid potential double free when orig_node initialization
   fails
 - fix potential kernel paging error caused by the usage of
   the old value of skb->data after skb reallocation
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJTAj4LAAoJEEKTMo6mOh1VXoQP/2WVjuIrB7rd4mpq5MSXjkWm
 qCRRmuU9MVSbwBPvKcNAT4sDb9KliodqMu7jtUNKJ118afTK5VIh1EmbFGIm2vA+
 QowpfvSOFaDVrd6pB1bKlPlX5Xi9OF+hj82LalfMRWvsdvQUN00fkCMjyrxPivhR
 zq7ucyff1YTft/mSmD+X0gqNK1L99om2xNcWzPjl+CZ0LOBFe411/sWf8Ujldgl0
 F6jTPXckNBToukmYO8wwmtG8PFrIWNBRUEfpY/P+VNp+Cg7GF9KOts4mdym9PviI
 //PkonRNylfeTvBlztmCdTQB9vHhlT3e/9KTd/lXBQ669Mz/eQ6H1MascDZ8e0Ib
 1IeqL6cyOaEDIOh8Bgr2WcRTH/JCx0F0cy+PISJx0DEVYKLWZedm8ECIU9eXWMr6
 hnTcBue51IoVbDE5SJ0apoDmQOZZF2euaYBPXtRrziZBzcHubt69rQKOqQ/A5atR
 m5kuA7E14NR7F/FOTdKsfLyAVqx9j5mw7NQYAhlbXex0Lp+qQQ9YMtHBv4pgzYA3
 UYE9pnuMkr3EXOQ9wAt/ldq+hWBkXDFkg5nd3bzY8aKw5QLBPHZdrTgFtOmVO1RP
 Fa7fJSwt2ImCa50w59u4f22U870QK7AYK7xvHeLHbvIzthTDgA71OKRePcRu9EU3
 yN6J5h/+A4X7fGgz0Z/X
 =6IO8
 -----END PGP SIGNATURE-----

Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge

Included changes:
- fix soft-interface MTU computation
- fix bogus pointer mangling when parsing the TT-TVLV
  container. This bug led to a wrong memory access.
- fix memory leak by properly releasing the VLAN object
  after CRC check
- properly check pskb_may_pull() return value
- avoid potential race condition while adding new neighbour
- fix potential memory leak by removing all the references
  to the orig_node object in case of initialization failure
- fix the TT CRC computation by ensuring that every node uses
  the same byte order when hosts with different endianess are
  part of the same network
- fix severe memory leak by freeing skb after a successful
  TVLV parsing
- avoid potential double free when orig_node initialization
  fails
- fix potential kernel paging error caused by the usage of
  the old value of skb->data after skb reallocation

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-18 15:40:50 -05:00
Pablo Neira Ayuso 0eba801b64 netfilter: ctnetlink: force null nat binding on insert
Quoting Andrey Vagin:
  When a conntrack is created  by kernel, it is initialized (sets
  IPS_{DST,SRC}_NAT_DONE_BIT bits in nf_nat_setup_info) and only then it
  is added in hashes (__nf_conntrack_hash_insert), so one conntract
  can't be initialized from a few threads concurrently.

  ctnetlink can add an uninitialized conntrack (w/o
  IPS_{DST,SRC}_NAT_DONE_BIT) in hashes, then a few threads can look up
  this conntrack and start initialize it concurrently. It's dangerous,
  because BUG can be triggered from nf_nat_setup_info.

Fix this race by always setting up nat, even if no CTA_NAT_ attribute
was requested before inserting the ct into the hash table. In absence
of CTA_NAT_ attribute, a null binding is created.

This alters current behaviour: Before this patch, the first packet
matching the newly injected conntrack would be run through the nat
table since nf_nat_initialized() returns false.  IOW, this forces
ctnetlink users to specify the desired nat transformation on ct
creation time.

Thanks for Florian Westphal, this patch is based on his original
patch to address this problem, including this patch description.

Reported-By: Andrey Vagin <avagin@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Florian Westphal <fw@strlen.de>
2014-02-18 00:13:51 +01:00
Duan Jiong a6254864c0 ipv4: fix counter in_slow_tot
since commit 89aef8921bf("ipv4: Delete routing cache."), the counter
in_slow_tot can't work correctly.

The counter in_slow_tot increase by one when fib_lookup() return successfully
in ip_route_input_slow(), but actually the dst struct maybe not be created and
cached, so we can increase in_slow_tot after the dst struct is created.

Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-17 16:54:42 -05:00
David Herrmann a4b1b5877b HID: Bluetooth: hidp: make sure input buffers are big enough
HID core expects the input buffers to be at least of size 4096
(HID_MAX_BUFFER_SIZE). Other sizes will result in buffer-overflows if an
input-report is smaller than advertised. We could, like i2c, compute the
biggest report-size instead of using HID_MAX_BUFFER_SIZE, but this will
blow up if report-descriptors are changed after ->start() has been called.
So lets be safe and just use the biggest buffer we have.

Note that this adds an additional copy to the HIDP input path. If there is
a way to make sure the skb-buf is big enough, we should use that instead.

The best way would be to make hid-core honor the @size argument, though,
that sounds easier than it is. So lets just fix the buffer-overflows for
now and afterwards look for a faster way for all transport drivers.

Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2014-02-17 21:17:55 +01:00
Nicolas Dichtel 08b44656c0 gre: add link local route when local addr is any
This bug was reported by Steinar H. Gunderson and was introduced by commit
f7cb888633 ("sit/gre6: don't try to add the same route two times").

root@morgental:~# ip tunnel add foo mode gre remote 1.2.3.4 ttl 64
root@morgental:~# ip link set foo up mtu 1468
root@morgental:~# ip -6 route show dev foo
fe80::/64  proto kernel  metric 256

but after the above commit, no such route shows up.

There is no link local route because dev->dev_addr is 0 (because local ipv4
address is 0), hence no link local address is configured.

In this scenario, the link local address is added manually: 'ip -6 addr add
fe80::1 dev foo' and because prefix is /128, no link local route is added by the
kernel.

Even if the right things to do is to add the link local address with a /64
prefix, we need to restore the previous behavior to avoid breaking userpace.

Reported-by: Steinar H. Gunderson <sesse@samfundet.no>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-17 14:08:26 -05:00
Antonio Quartulli 70b271a78b batman-adv: fix potential kernel paging error for unicast transmissions
batadv_send_skb_prepare_unicast(_4addr) might reallocate the
skb's data. If it does then our ethhdr pointer is not valid
anymore in batadv_send_skb_unicast(), resulting in a kernel
paging error.

Fixing this by refetching the ethhdr pointer after the
potential reallocation.

Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2014-02-17 17:17:02 +01:00
Antonio Quartulli a5a5cb8cab batman-adv: avoid double free when orig_node initialization fails
In the failure path of the orig_node initialization routine
the orig_node->bat_iv.bcast_own field is free'd twice: first
in batadv_iv_ogm_orig_get() and then later in
batadv_orig_node_free_rcu().

Fix it by removing the kfree in batadv_iv_ogm_orig_get().

Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2014-02-17 17:17:02 +01:00
Antonio Quartulli 05c3c8a636 batman-adv: free skb on TVLV parsing success
When the TVLV parsing routine succeed the skb is left
untouched thus leading to a memory leak.

Fix this by consuming the skb in case of success.

Introduced by ef26157747
("batman-adv: tvlv - basic infrastructure")

Reported-by: Russel Senior <russell@personaltelco.net>
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Tested-by: Russell Senior <russell@personaltelco.net>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2014-02-17 17:17:02 +01:00
Antonio Quartulli a30e22ca84 batman-adv: fix TT CRC computation by ensuring byte order
When computing the CRC on a 2byte variable the order of
the bytes obviously alters the final result. This means
that computing the CRC over the same value on two archs
having different endianess leads to different numbers.

The global and local translation table CRC computation
routine makes this mistake while processing the clients
VIDs. The result is a continuous CRC mismatching between
nodes having different endianess.

Fix this by converting the VID to Network Order before
processing it. This guarantees that every node uses the same
byte order.

Introduced by 7ea7b4a142
("batman-adv: make the TT CRC logic VLAN specific")

Reported-by: Russel Senior <russell@personaltelco.net>
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Tested-by: Russell Senior <russell@personaltelco.net>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2014-02-17 17:17:02 +01:00
Simon Wunderlich b2262df7fc batman-adv: fix potential orig_node reference leak
Since batadv_orig_node_new() sets the refcount to two, assuming that
the calling function will use a reference for putting the orig_node into
a hash or similar, both references must be freed if initialization of
the orig_node fails. Otherwise that object may be leaked in that error
case.

Reported-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2014-02-17 17:17:01 +01:00
Antonio Quartulli 08bf0ed29c batman-adv: avoid potential race condition when adding a new neighbour
When adding a new neighbour it is important to atomically
perform the following:
- check if the neighbour already exists
- append the neighbour to the proper list

If the two operations are not performed in an atomic context
it is possible that two concurrent insertions add the same
neighbour twice.

Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2014-02-17 17:17:01 +01:00
Antonio Quartulli f1791425cf batman-adv: properly check pskb_may_pull return value
pskb_may_pull() returns 1 on success and 0 in case of failure,
therefore checking for the return value being negative does
not make sense at all.

This way if the function fails we will probably read beyond the current
skb data buffer. Fix this by doing the proper check.

Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2014-02-17 17:17:01 +01:00
Antonio Quartulli 91c2b1a9f6 batman-adv: release vlan object after checking the CRC
There is a refcounter unbalance in the CRC checking routine
invoked on OGM reception. A vlan object is retrieved (thus
its refcounter is increased by one) but it is never properly
released. This leads to a memleak because the vlan object
will never be free'd.

Fix this by releasing the vlan object after having read the
CRC.

Reported-by: Russell Senior <russell@personaltelco.net>
Reported-by: Daniel <daniel@makrotopia.org>
Reported-by: cmsv <cmsv@wirelesspt.net>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2014-02-17 17:17:00 +01:00
Antonio Quartulli e889241f45 batman-adv: fix TT-TVLV parsing on OGM reception
When accessing a TT-TVLV container in the OGM RX path
the variable pointing to the list of changes to apply is
altered by mistake.

This makes the TT component read data at the wrong position
in the OGM packet buffer.

Fix it by removing the bogus pointer alteration.

Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2014-02-17 17:17:00 +01:00
Antonio Quartulli 930cd6e46e batman-adv: fix soft-interface MTU computation
The current MTU computation always returns a value
smaller than 1500bytes even if the real interfaces
have an MTU large enough to compensate the batman-adv
overhead.

Fix the computation by properly returning the highest
admitted value.

Introduced by a19d3d85e1
("batman-adv: limit local translation table max size")

Reported-by: Russell Senior <russell@personaltelco.net>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2014-02-17 17:17:00 +01:00
Nikolay Aleksandrov f627ed91d8 netfilter: nf_tables: check if payload length is a power of 2
Add a check if payload's length is a power of 2 when selecting ops.
The fast ops were meant for well aligned loads, also this fixes a
small bug when using a length of 3 with some offsets which causes
only 1 byte to be loaded because the fast ops are chosen.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-02-17 11:21:17 +01:00
Florian Westphal 478b360a47 netfilter: nf_tables: fix nf_trace always-on with XT_TRACE=n
When using nftables with CONFIG_NETFILTER_XT_TARGET_TRACE=n, we get
lots of "TRACE: filter:output:policy:1 IN=..." warnings as several
places will leave skb->nf_trace uninitialised.

Unlike iptables tracing functionality is not conditional in nftables,
so always copy/zero nf_trace setting when nftables is enabled.

Move this into __nf_copy() helper.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-02-17 11:20:12 +01:00
Daniel Borkmann 0fd5d57ba3 packet: check for ndo_select_queue during queue selection
Mathias reported that on an AMD Geode LX embedded board (ALiX)
with ath9k driver PACKET_QDISC_BYPASS, introduced in commit
d346a3fae3 ("packet: introduce PACKET_QDISC_BYPASS socket
option"), triggers a WARN_ON() coming from the driver itself
via 066dae93bd ("ath9k: rework tx queue selection and fix
queue stopping/waking").

The reason why this happened is that ndo_select_queue() call
is not invoked from direct xmit path i.e. for ieee80211 subsystem
that sets queue and TID (similar to 802.1d tag) which is being
put into the frame through 802.11e (WMM, QoS). If that is not
set, pending frame counter for e.g. ath9k can get messed up.

So the WARN_ON() in ath9k is absolutely legitimate. Generally,
the hw queue selection in ieee80211 depends on the type of
traffic, and priorities are set according to ieee80211_ac_numbers
mapping; working in a similar way as DiffServ only on a lower
layer, so that the AP can favour frames that have "real-time"
requirements like voice or video data frames.

Therefore, check for presence of ndo_select_queue() in netdev
ops and, if available, invoke it with a fallback handler to
__packet_pick_tx_queue(), so that driver such as bnx2x, ixgbe,
or mlx4 can still select a hw queue for transmission in
relation to the current CPU while e.g. ieee80211 subsystem
can make their own choices.

Reported-by: Mathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-17 00:36:34 -05:00
Daniel Borkmann b9507bdaf4 netdevice: move netdev_cap_txqueue for shared usage to header
In order to allow users to invoke netdev_cap_txqueue, it needs to
be moved into netdevice.h header file. While at it, also add kernel
doc header to document the API.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-17 00:36:34 -05:00
Daniel Borkmann 99932d4fc0 netdevice: add queue selection fallback handler for ndo_select_queue
Add a new argument for ndo_select_queue() callback that passes a
fallback handler. This gets invoked through netdev_pick_tx();
fallback handler is currently __netdev_pick_tx() as most drivers
invoke this function within their customized implementation in
case for skbs that don't need any special handling. This fallback
handler can then be replaced on other call-sites with different
queue selection methods (e.g. in packet sockets, pktgen etc).

This also has the nice side-effect that __netdev_pick_tx() is
then only invoked from netdev_pick_tx() and export of that
function to modules can be undone.

Suggested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-17 00:36:34 -05:00
Matija Glavinic Pecotic ef2820a735 net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer
Implementation of (a)rwnd calculation might lead to severe performance issues
and associations completely stalling. These problems are described and solution
is proposed which improves lksctp's robustness in congestion state.

1) Sudden drop of a_rwnd and incomplete window recovery afterwards

Data accounted in sctp_assoc_rwnd_decrease takes only payload size (sctp data),
but size of sk_buff, which is blamed against receiver buffer, is not accounted
in rwnd. Theoretically, this should not be the problem as actual size of buffer
is double the amount requested on the socket (SO_RECVBUF). Problem here is
that this will have bad scaling for data which is less then sizeof sk_buff.
E.g. in 4G (LTE) networks, link interfacing radio side will have a large portion
of traffic of this size (less then 100B).

An example of sudden drop and incomplete window recovery is given below. Node B
exhibits problematic behavior. Node A initiates association and B is configured
to advertise rwnd of 10000. A sends messages of size 43B (size of typical sctp
message in 4G (LTE) network). On B data is left in buffer by not reading socket
in userspace.

Lets examine when we will hit pressure state and declare rwnd to be 0 for
scenario with above stated parameters (rwnd == 10000, chunk size == 43, each
chunk is sent in separate sctp packet)

Logic is implemented in sctp_assoc_rwnd_decrease:

socket_buffer (see below) is maximum size which can be held in socket buffer
(sk_rcvbuf). current_alloced is amount of data currently allocated (rx_count)

A simple expression is given for which it will be examined after how many
packets for above stated parameters we enter pressure state:

We start by condition which has to be met in order to enter pressure state:

	socket_buffer < currently_alloced;

currently_alloced is represented as size of sctp packets received so far and not
yet delivered to userspace. x is the number of chunks/packets (since there is no
bundling, and each chunk is delivered in separate packet, we can observe each
chunk also as sctp packet, and what is important here, having its own sk_buff):

	socket_buffer < x*each_sctp_packet;

each_sctp_packet is sctp chunk size + sizeof(struct sk_buff). socket_buffer is
twice the amount of initially requested size of socket buffer, which is in case
of sctp, twice the a_rwnd requested:

	2*rwnd < x*(payload+sizeof(struc sk_buff));

sizeof(struct sk_buff) is 190 (3.13.0-rc4+). Above is stated that rwnd is 10000
and each payload size is 43

	20000 < x(43+190);

	x > 20000/233;

	x ~> 84;

After ~84 messages, pressure state is entered and 0 rwnd is advertised while
received 84*43B ~= 3612B sctp data. This is why external observer notices sudden
drop from 6474 to 0, as it will be now shown in example:

IP A.34340 > B.12345: sctp (1) [INIT] [init tag: 1875509148] [rwnd: 81920] [OS: 10] [MIS: 65535] [init TSN: 1096057017]
IP B.12345 > A.34340: sctp (1) [INIT ACK] [init tag: 3198966556] [rwnd: 10000] [OS: 10] [MIS: 10] [init TSN: 902132839]
IP A.34340 > B.12345: sctp (1) [COOKIE ECHO]
IP B.12345 > A.34340: sctp (1) [COOKIE ACK]
IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057017] [SID: 0] [SSEQ 0] [PPID 0x18]
IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057017] [a_rwnd 9957] [#gap acks 0] [#dup tsns 0]
IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057018] [SID: 0] [SSEQ 1] [PPID 0x18]
IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057018] [a_rwnd 9957] [#gap acks 0] [#dup tsns 0]
IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057019] [SID: 0] [SSEQ 2] [PPID 0x18]
IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057019] [a_rwnd 9914] [#gap acks 0] [#dup tsns 0]
<...>
IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057098] [SID: 0] [SSEQ 81] [PPID 0x18]
IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057098] [a_rwnd 6517] [#gap acks 0] [#dup tsns 0]
IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057099] [SID: 0] [SSEQ 82] [PPID 0x18]
IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057099] [a_rwnd 6474] [#gap acks 0] [#dup tsns 0]
IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057100] [SID: 0] [SSEQ 83] [PPID 0x18]

--> Sudden drop

IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057100] [a_rwnd 0] [#gap acks 0] [#dup tsns 0]

At this point, rwnd_press stores current rwnd value so it can be later restored
in sctp_assoc_rwnd_increase. This however doesn't happen as condition to start
slowly increasing rwnd until rwnd_press is returned to rwnd is never met. This
condition is not met since rwnd, after it hit 0, must first reach rwnd_press by
adding amount which is read from userspace. Let us observe values in above
example. Initial a_rwnd is 10000, pressure was hit when rwnd was ~6500 and the
amount of actual sctp data currently waiting to be delivered to userspace
is ~3500. When userspace starts to read, sctp_assoc_rwnd_increase will be blamed
only for sctp data, which is ~3500. Condition is never met, and when userspace
reads all data, rwnd stays on 3569.

IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057100] [a_rwnd 1505] [#gap acks 0] [#dup tsns 0]
IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057100] [a_rwnd 3010] [#gap acks 0] [#dup tsns 0]
IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057101] [SID: 0] [SSEQ 84] [PPID 0x18]
IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057101] [a_rwnd 3569] [#gap acks 0] [#dup tsns 0]

--> At this point userspace read everything, rwnd recovered only to 3569

IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057102] [SID: 0] [SSEQ 85] [PPID 0x18]
IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057102] [a_rwnd 3569] [#gap acks 0] [#dup tsns 0]

Reproduction is straight forward, it is enough for sender to send packets of
size less then sizeof(struct sk_buff) and receiver keeping them in its buffers.

2) Minute size window for associations sharing the same socket buffer

In case multiple associations share the same socket, and same socket buffer
(sctp.rcvbuf_policy == 0), different scenarios exist in which congestion on one
of the associations can permanently drop rwnd of other association(s).

Situation will be typically observed as one association suddenly having rwnd
dropped to size of last packet received and never recovering beyond that point.
Different scenarios will lead to it, but all have in common that one of the
associations (let it be association from 1)) nearly depleted socket buffer, and
the other association blames socket buffer just for the amount enough to start
the pressure. This association will enter pressure state, set rwnd_press and
announce 0 rwnd.
When data is read by userspace, similar situation as in 1) will occur, rwnd will
increase just for the size read by userspace but rwnd_press will be high enough
so that association doesn't have enough credit to reach rwnd_press and restore
to previous state. This case is special case of 1), being worse as there is, in
the worst case, only one packet in buffer for which size rwnd will be increased.
Consequence is association which has very low maximum rwnd ('minute size', in
our case down to 43B - size of packet which caused pressure) and as such
unusable.

Scenario happened in the field and labs frequently after congestion state (link
breaks, different probabilities of packet drop, packet reordering) and with
scenario 1) preceding. Here is given a deterministic scenario for reproduction:

>From node A establish two associations on the same socket, with rcvbuf_policy
being set to share one common buffer (sctp.rcvbuf_policy == 0). On association 1
repeat scenario from 1), that is, bring it down to 0 and restore up. Observe
scenario 1). Use small payload size (here we use 43). Once rwnd is 'recovered',
bring it down close to 0, as in just one more packet would close it. This has as
a consequence that association number 2 is able to receive (at least) one more
packet which will bring it in pressure state. E.g. if association 2 had rwnd of
10000, packet received was 43, and we enter at this point into pressure,
rwnd_press will have 9957. Once payload is delivered to userspace, rwnd will
increase for 43, but conditions to restore rwnd to original state, just as in
1), will never be satisfied.

--> Association 1, between A.y and B.12345

IP A.55915 > B.12345: sctp (1) [INIT] [init tag: 836880897] [rwnd: 10000] [OS: 10] [MIS: 65535] [init TSN: 4032536569]
IP B.12345 > A.55915: sctp (1) [INIT ACK] [init tag: 2873310749] [rwnd: 81920] [OS: 10] [MIS: 10] [init TSN: 3799315613]
IP A.55915 > B.12345: sctp (1) [COOKIE ECHO]
IP B.12345 > A.55915: sctp (1) [COOKIE ACK]

--> Association 2, between A.z and B.12346

IP A.55915 > B.12346: sctp (1) [INIT] [init tag: 534798321] [rwnd: 10000] [OS: 10] [MIS: 65535] [init TSN: 2099285173]
IP B.12346 > A.55915: sctp (1) [INIT ACK] [init tag: 516668823] [rwnd: 81920] [OS: 10] [MIS: 10] [init TSN: 3676403240]
IP A.55915 > B.12346: sctp (1) [COOKIE ECHO]
IP B.12346 > A.55915: sctp (1) [COOKIE ACK]

--> Deplete socket buffer by sending messages of size 43B over association 1

IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315613] [SID: 0] [SSEQ 0] [PPID 0x18]
IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315613] [a_rwnd 9957] [#gap acks 0] [#dup tsns 0]

<...>

IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315696] [a_rwnd 6388] [#gap acks 0] [#dup tsns 0]
IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315697] [SID: 0] [SSEQ 84] [PPID 0x18]
IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315697] [a_rwnd 6345] [#gap acks 0] [#dup tsns 0]

--> Sudden drop on 1

IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315698] [SID: 0] [SSEQ 85] [PPID 0x18]
IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315698] [a_rwnd 0] [#gap acks 0] [#dup tsns 0]

--> Here userspace read, rwnd 'recovered' to 3698, now deplete again using
    association 1 so there is place in buffer for only one more packet

IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315799] [SID: 0] [SSEQ 186] [PPID 0x18]
IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315799] [a_rwnd 86] [#gap acks 0] [#dup tsns 0]
IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315800] [SID: 0] [SSEQ 187] [PPID 0x18]
IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315800] [a_rwnd 43] [#gap acks 0] [#dup tsns 0]

--> Socket buffer is almost depleted, but there is space for one more packet,
    send them over association 2, size 43B

IP B.12346 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3676403240] [SID: 0] [SSEQ 0] [PPID 0x18]
IP A.55915 > B.12346: sctp (1) [SACK] [cum ack 3676403240] [a_rwnd 0] [#gap acks 0] [#dup tsns 0]

--> Immediate drop

IP A.60995 > B.12346: sctp (1) [SACK] [cum ack 387491510] [a_rwnd 0] [#gap acks 0] [#dup tsns 0]

--> Read everything from the socket, both association recover up to maximum rwnd
    they are capable of reaching, note that association 1 recovered up to 3698,
    and association 2 recovered only to 43

IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315800] [a_rwnd 1548] [#gap acks 0] [#dup tsns 0]
IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315800] [a_rwnd 3053] [#gap acks 0] [#dup tsns 0]
IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315801] [SID: 0] [SSEQ 188] [PPID 0x18]
IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315801] [a_rwnd 3698] [#gap acks 0] [#dup tsns 0]
IP B.12346 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3676403241] [SID: 0] [SSEQ 1] [PPID 0x18]
IP A.55915 > B.12346: sctp (1) [SACK] [cum ack 3676403241] [a_rwnd 43] [#gap acks 0] [#dup tsns 0]

A careful reader might wonder why it is necessary to reproduce 1) prior
reproduction of 2). It is simply easier to observe when to send packet over
association 2 which will push association into the pressure state.

Proposed solution:

Both problems share the same root cause, and that is improper scaling of socket
buffer with rwnd. Solution in which sizeof(sk_buff) is taken into concern while
calculating rwnd is not possible due to fact that there is no linear
relationship between amount of data blamed in increase/decrease with IP packet
in which payload arrived. Even in case such solution would be followed,
complexity of the code would increase. Due to nature of current rwnd handling,
slow increase (in sctp_assoc_rwnd_increase) of rwnd after pressure state is
entered is rationale, but it gives false representation to the sender of current
buffer space. Furthermore, it implements additional congestion control mechanism
which is defined on implementation, and not on standard basis.

Proposed solution simplifies whole algorithm having on mind definition from rfc:

o  Receiver Window (rwnd): This gives the sender an indication of the space
   available in the receiver's inbound buffer.

Core of the proposed solution is given with these lines:

sctp_assoc_rwnd_update:
	if ((asoc->base.sk->sk_rcvbuf - rx_count) > 0)
		asoc->rwnd = (asoc->base.sk->sk_rcvbuf - rx_count) >> 1;
	else
		asoc->rwnd = 0;

We advertise to sender (half of) actual space we have. Half is in the braces
depending whether you would like to observe size of socket buffer as SO_RECVBUF
or twice the amount, i.e. size is the one visible from userspace, that is,
from kernelspace.
In this way sender is given with good approximation of our buffer space,
regardless of the buffer policy - we always advertise what we have. Proposed
solution fixes described problems and removes necessity for rwnd restoration
algorithm. Finally, as proposed solution is simplification, some lines of code,
along with some bytes in struct sctp_association are saved.

Version 2 of the patch addressed comments from Vlad. Name of the function is set
to be more descriptive, and two parts of code are changed, in one removing the
superfluous call to sctp_assoc_rwnd_update since call would not result in update
of rwnd, and the other being reordering of the code in a way that call to
sctp_assoc_rwnd_update updates rwnd. Version 3 corrected change introduced in v2
in a way that existing function is not reordered/copied in line, but it is
correctly called. Thanks Vlad for suggesting.

Signed-off-by: Matija Glavinic Pecotic <matija.glavinic-pecotic.ext@nsn.com>
Reviewed-by: Alexander Sverdlin <alexander.sverdlin@nsn.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-17 00:16:56 -05:00
Duan Jiong cd0f0b95fd ipv4: distinguish EHOSTUNREACH from the ENETUNREACH
since commit 251da413("ipv4: Cache ip_error() routes even when not forwarding."),
the counter IPSTATS_MIB_INADDRERRORS can't work correctly, because the value of
err was always set to ENETUNREACH.

Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-16 23:45:31 -05:00
Gerrit Renker 09db308053 dccp: re-enable debug macro
dccp tfrc: revert

This reverts 6aee49c558 ("dccp: make local variable static") since
the variable tfrc_debug is referenced by the tfrc_pr_debug(fmt, ...)
macro when TFRC debugging is enabled. If it is enabled, use of the
macro produces a compilation error.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-16 23:45:00 -05:00
Trond Myklebust e9776d0f4a SUNRPC: Fix a pipe_version reference leak
In gss_alloc_msg(), if the call to gss_encode_v1_msg() fails, we
want to release the reference to the pipe_version that was obtained
earlier in the function.

Fixes: 9d3a2260f0 (SUNRPC: Fix buffer overflow checking in...)
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-02-16 13:28:01 -05:00
Trond Myklebust 9eb2ddb48c SUNRPC: Ensure that gss_auth isn't freed before its upcall messages
Fix a race in which the RPC client is shutting down while the
gss daemon is processing a downcall. If the RPC client manages to
shut down before the gss daemon is done, then the struct gss_auth
used in gss_release_msg() may have already been freed.

Link: http://lkml.kernel.org/r/1392494917.71728.YahooMailNeo@web140002.mail.bf1.yahoo.com
Reported-by: John <da_audiophile@yahoo.com>
Reported-by: Borislav Petkov <bp@alien8.de>
Cc: stable@vger.kernel.org # 3.12+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-02-16 13:06:06 -05:00
FX Le Bail 2b7a79bae2 netfilter: nf_nat_snmp_basic: fix duplicates in if/else branches
The solution was found by Patrick in 2.4 kernel sources.

Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Francois-Xavier Le Bail <fx.lebail@yahoo.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-02-14 11:37:36 +01:00
Paul Bolle 06efbd6d56 netfilter: nft_meta: fix typo "CONFIG_NET_CLS_ROUTE"
There are two checks for CONFIG_NET_CLS_ROUTE, but the corresponding
Kconfig symbol was dropped in v2.6.39. Since the code guards access to
dst_entry.tclassid it seems CONFIG_IP_ROUTE_CLASSID should be used
instead.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-02-14 11:37:34 +01:00
Patrick McHardy ce898ecb5a netfilter: nft_reject_inet: fix unintended fall-through in switch-statatement
For IPv4 packets, we call both IPv4 and IPv6 reject.

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-02-14 11:37:33 +01:00
FX Le Bail 357137a422 ipv4: ipconfig.c: add parentheses in an if statement
Even if the 'time_before' macro expand with parentheses, the look is bad.

Signed-off-by: Francois-Xavier Le Bail <fx.lebail@yahoo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-14 00:14:23 -05:00
Vijay Subramanian 219e288e89 net: sched: Cleanup PIE comments
Fix incorrect comment reported by Norbert Kiesel. Edit another comment to add
more details. Also add references to algorithm (IETF draft and paper) to top of
file.

Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
CC: Mythili Prabhu <mysuryan@cisco.com>
CC: Norbert Kiesel <nkiesel@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-13 18:29:58 -05:00
Florian Westphal fe6cc55f3a net: ip, ipv6: handle gso skbs in forwarding path
Marcelo Ricardo Leitner reported problems when the forwarding link path
has a lower mtu than the incoming one if the inbound interface supports GRO.

Given:
Host <mtu1500> R1 <mtu1200> R2

Host sends tcp stream which is routed via R1 and R2.  R1 performs GRO.

In this case, the kernel will fail to send ICMP fragmentation needed
messages (or pkt too big for ipv6), as GSO packets currently bypass dstmtu
checks in forward path. Instead, Linux tries to send out packets exceeding
the mtu.

When locking route MTU on Host (i.e., no ipv4 DF bit set), R1 does
not fragment the packets when forwarding, and again tries to send out
packets exceeding R1-R2 link mtu.

This alters the forwarding dstmtu checks to take the individual gso
segment lengths into account.

For ipv6, we send out pkt too big error for gso if the individual
segments are too big.

For ipv4, we either send icmp fragmentation needed, or, if the DF bit
is not set, perform software segmentation and let the output path
create fragments when the packet is leaving the machine.
It is not 100% correct as the error message will contain the headers of
the GRO skb instead of the original/segmented one, but it seems to
work fine in my (limited) tests.

Eric Dumazet suggested to simply shrink mss via ->gso_size to avoid
sofware segmentation.

However it turns out that skb_segment() assumes skb nr_frags is related
to mss size so we would BUG there.  I don't want to mess with it considering
Herbert and Eric disagree on what the correct behavior should be.

Hannes Frederic Sowa notes that when we would shrink gso_size
skb_segment would then also need to deal with the case where
SKB_MAX_FRAGS would be exceeded.

This uses sofware segmentation in the forward path when we hit ipv4
non-DF packets and the outgoing link mtu is too small.  Its not perfect,
but given the lack of bug reports wrt. GRO fwd being broken this is a
rare case anyway.  Also its not like this could not be improved later
once the dust settles.

Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Reported-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-13 17:17:02 -05:00
Florian Westphal d206940319 net: core: introduce netif_skb_dev_features
Will be used by upcoming ipv4 forward path change that needs to
determine feature mask using skb->dst->dev instead of skb->dev.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-13 17:17:02 -05:00
wangweidong efb842c45e sctp: optimize the sctp_sysctl_net_register
Here, when the net is init_net, we needn't to kmemdup the ctl_table
again. So add a check for net. Also we can save some memory.

Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-13 17:08:29 -05:00
wangweidong 22a1f5140e sctp: fix a missed .data initialization
As commit 3c68198e75111a90("sctp: Make hmac algorithm selection for
 cookie generation dynamic"), we miss the .data initialization.
If we don't use the net_namespace, the problem that parts of the
sysctl configuration won't be isolation and won't occur.

In sctp_sysctl_net_register(), we register the sysctl for each
net, in the for(), we use the 'table[i].data' as check condition, so
when the 'i' is the index of sctp_hmac_alg, the data is NULL, then
break. So add the .data initialization.

Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-13 17:08:29 -05:00
Cong Wang 0e0eee2465 net: correct error path in rtnl_newlink()
I saw the following BUG when ->newlink() fails in rtnl_newlink():

[   40.240058] kernel BUG at net/core/dev.c:6438!

this is due to free_netdev() is not supposed to be called before
netdev is completely unregistered, therefore it is not correct
to call free_netdev() here, at least for ops->newlink!=NULL case,
many drivers call it in ->destructor so that rtnl_unlock() will
take care of it, we probably don't need to do anything here.

Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-13 17:08:29 -05:00
Erik Hugne 64380a04de tipc: fix message corruption bug for deferred packets
If a packet received on a link is out-of-sequence, it will be
placed on a deferred queue and later reinserted in the receive
path once the preceding packets have been processed. The problem
with this is that it will be subject to the buffer adjustment from
link_recv_buf_validate twice. The second adjustment for 20 bytes
header space will corrupt the packet.

We solve this by tagging the deferred packets and bail out from
receive buffer validation for packets that have already been
subjected to this.

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-13 16:35:05 -05:00
Felix Fietkau 1bf4bbb402 mac80211: send control port protocol frames to the VO queue
Improves reliability of wifi connections with WPA, since authentication
frames are prioritized over normal traffic and also typically exempt
from aggregation.

Cc: stable@vger.kernel.org
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-02-12 11:26:43 +01:00
Linus Torvalds 16e5a2ed59 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking updates from David Miller:

 1) Fix flexcan build on big endian, from Arnd Bergmann

 2) Correctly attach cpsw to GPIO bitbang MDIO drive, from Stefan Roese

 3) udp_add_offload has to use GFP_ATOMIC since it can be invoked from
    non-sleepable contexts.  From Or Gerlitz

 4) vxlan_gro_receive() does not iterate over all possible flows
    properly, fix also from Or Gerlitz

 5) CAN core doesn't use a proper SKB destructor when it hooks up
    sockets to SKBs.  Fix from Oliver Hartkopp

 6) ip_tunnel_xmit() can use an uninitialized route pointer, fix from
    Eric Dumazet

 7) Fix address family assignment in IPVS, from Michal Kubecek

 8) Fix ath9k build on ARM, from Sujith Manoharan

 9) Make sure fail_over_mac only applies for the correct bonding modes,
    from Ding Tianhong

10) The udp offload code doesn't use RCU correctly, from Shlomo Pongratz

11) Handle gigabit features properly in generic PHY code, from Florian
    Fainelli

12) Don't blindly invoke link operations in
    rtnl_link_get_slave_info_data_size, they are optional.  Fix from
    Fernando Luis Vazquez Cao

13) Add USB IDs for Netgear Aircard 340U, from Bjørn Mork

14) Handle netlink packet padding properly in openvswitch, from Thomas
    Graf

15) Fix oops when deleting chains in nf_tables, from Patrick McHardy

16) Fix RX stalls in xen-netback driver, from Zoltan Kiss

17) Fix deadlock in mac80211 stack, from Emmanuel Grumbach

18) inet_nlmsg_size() forgets to consider ifa_cacheinfo, fix from Geert
    Uytterhoeven

19) tg3_change_mtu() can deadlock, fix from Nithin Sujir

20) Fix regression in setting SCTP local source addresses on accepted
    sockets, caused by some generic ipv6 socket changes.  Fix from
    Matija Glavinic Pecotic

21) IPPROTO_* must be pure defines, otherwise module aliases don't get
    constructed properly.  Fix from Jan Moskyto

22) IPV6 netconsole setup doesn't work properly unless an explicit
    source address is specified, fix from Sabrina Dubroca

23) Use __GFP_NORETRY for high order skb page allocations in
    sock_alloc_send_pskb and skb_page_frag_refill.  From Eric Dumazet

24) Fix a regression added in netconsole over bridging, from Cong Wang

25) TCP uses an artificial offset of 1ms for SRTT, but this doesn't jive
    well with TCP pacing which needs the SRTT to be accurate.  Fix from
    Eric Dumazet

26) Several cases of missing header file includes from Rashika Kheria

27) Add ZTE MF667 device ID to qmi_wwan driver, from Raymond Wanyoike

28) TCP Small Queues doesn't handle nonagle properly in some corner
    cases, fix from Eric Dumazet

29) Remove extraneous read_unlock in bond_enslave, whoops.  From Ding
    Tianhong

30) Fix 9p trans_virtio handling of vmalloc buffers, from Richard Yao

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (136 commits)
  6lowpan: fix lockdep splats
  alx: add missing stats_lock spinlock init
  9p/trans_virtio.c: Fix broken zero-copy on vmalloc() buffers
  bonding: remove unwanted bond lock for enslave processing
  USB2NET : SR9800 : One chip USB2.0 USB2NET SR9800 Device Driver Support
  tcp: tsq: fix nonagle handling
  bridge: Prevent possible race condition in br_fdb_change_mac_address
  bridge: Properly check if local fdb entry can be deleted when deleting vlan
  bridge: Properly check if local fdb entry can be deleted in br_fdb_delete_by_port
  bridge: Properly check if local fdb entry can be deleted in br_fdb_change_mac_address
  bridge: Fix the way to check if a local fdb entry can be deleted
  bridge: Change local fdb entries whenever mac address of bridge device changes
  bridge: Fix the way to find old local fdb entries in br_fdb_change_mac_address
  bridge: Fix the way to insert new local fdb entries in br_fdb_changeaddr
  bridge: Fix the way to find old local fdb entries in br_fdb_changeaddr
  tcp: correct code comment stating 3 min timeout for FIN_WAIT2, we only do 1 min
  net: vxge: Remove unused device pointer
  net: qmi_wwan: add ZTE MF667
  3c59x: Remove unused pointer in vortex_eisa_cleanup()
  net: fix 'ip rule' iif/oif device rename
  ...
2014-02-11 12:05:55 -08:00
Trond Myklebust 628356791b SUNRPC: Fix potential memory scribble in xprt_free_bc_request()
The call to xprt_free_allocation() will call list_del() on
req->rq_bc_pa_list, which is not attached to a list.
This patch moves the list_del() out of xprt_free_allocation()
and into those callers that need it.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-02-11 13:56:54 -05:00
Trond Myklebust 06ea0bfe6e SUNRPC: Fix races in xs_nospace()
When a send failure occurs due to the socket being out of buffer space,
we call xs_nospace() in order to have the RPC task wait until the
socket has drained enough to make it worth while trying again.
The current patch fixes a race in which the socket is drained before
we get round to setting up the machinery in xs_nospace(), and which
is reported to cause hangs.

Link: http://lkml.kernel.org/r/20140210170315.33dfc621@notabene.brown
Fixes: a9a6b52ee1 (SUNRPC: Don't start the retransmission timer...)
Reported-by: Neil Brown <neilb@suse.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-02-11 09:34:30 -05:00
Eytan Lifshitz c368ddaa9a mac80211: fix memory leak
In case ieee80211_prep_connection() fails to dereference
sdata->vif.chanctx_conf, the function returns and doesn't
free new_sta. fixed.

Signed-off-by: Eytan Lifshitz <eytan.lifshitz@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-02-11 12:59:36 +01:00
Arik Nemtsov 32769814d5 mac80211: fix sched_scan restart on recovery
In case we were not suspended, the reconfig function returns without
configuring the scheduled scan.

Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-02-11 12:59:12 +01:00
Eric Dumazet 20e7c4e80d 6lowpan: fix lockdep splats
When a device ndo_start_xmit() calls again dev_queue_xmit(),
lockdep can complain because dev_queue_xmit() is re-entered and the
spinlocks protecting tx queues share a common lockdep class.

Same issue was fixed for bonding/l2tp/ppp in commits

0daa230302 ("[PATCH] bonding: lockdep annotation")
49ee49202b ("bonding: set qdisc_tx_busylock to avoid LOCKDEP splat")
23d3b8bfb8 ("net: qdisc busylock needs lockdep annotations ")
303c07db48 ("ppp: set qdisc_tx_busylock to avoid LOCKDEP splat ")

Reported-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-10 17:51:29 -08:00
Richard Yao b6f52ae2f0 9p/trans_virtio.c: Fix broken zero-copy on vmalloc() buffers
The 9p-virtio transport does zero copy on things larger than 1024 bytes
in size. It accomplishes this by returning the physical addresses of
pages to the virtio-pci device. At present, the translation is usually a
bit shift.

That approach produces an invalid page address when we read/write to
vmalloc buffers, such as those used for Linux kernel modules. Any
attempt to load a Linux kernel module from 9p-virtio produces the
following stack.

[<ffffffff814878ce>] p9_virtio_zc_request+0x45e/0x510
[<ffffffff814814ed>] p9_client_zc_rpc.constprop.16+0xfd/0x4f0
[<ffffffff814839dd>] p9_client_read+0x15d/0x240
[<ffffffff811c8440>] v9fs_fid_readn+0x50/0xa0
[<ffffffff811c84a0>] v9fs_file_readn+0x10/0x20
[<ffffffff811c84e7>] v9fs_file_read+0x37/0x70
[<ffffffff8114e3fb>] vfs_read+0x9b/0x160
[<ffffffff81153571>] kernel_read+0x41/0x60
[<ffffffff810c83ab>] copy_module_from_fd.isra.34+0xfb/0x180

Subsequently, QEMU will die printing:

qemu-system-x86_64: virtio: trying to map MMIO memory

This patch enables 9p-virtio to correctly handle this case. This not
only enables us to load Linux kernel modules off virtfs, but also
enables ZFS file-based vdevs on virtfs to be used without killing QEMU.

Special thanks to both Avi Kivity and Alexander Graf for their
interpretation of QEMU backtraces. Without their guidence, tracking down
this bug would have taken much longer. Also, special thanks to Linus
Torvalds for his insightful explanation of why this should use
is_vmalloc_addr() instead of is_vmalloc_or_module_addr():

https://lkml.org/lkml/2014/2/8/272

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-10 17:48:54 -08:00
John Ogness bf06200e73 tcp: tsq: fix nonagle handling
Commit 46d3ceabd8 ("tcp: TCP Small Queues") introduced a possible
regression for applications using TCP_NODELAY.

If TCP session is throttled because of tsq, we should consult
tp->nonagle when TX completion is done and allow us to send additional
segment, especially if this segment is not a full MSS.
Otherwise this segment is sent after an RTO.

[edumazet] : Cooked the changelog, added another fix about testing
sk_wmem_alloc twice because TX completion can happen right before
setting TSQ_THROTTLED bit.

This problem is particularly visible with recent auto corking,
but might also be triggered with low tcp_limit_output_bytes
values or NIC drivers delaying TX completion by hundred of usec,
and very low rtt.

Thomas Glanzmann for example reported an iscsi regression, caused
by tcp auto corking making this bug quite visible.

Fixes: 46d3ceabd8 ("tcp: TCP Small Queues")
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Thomas Glanzmann <thomas@glanzmann.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-10 15:23:39 -08:00
Toshiaki Makita ac4c886883 bridge: Prevent possible race condition in br_fdb_change_mac_address
br_fdb_change_mac_address() calls fdb_insert()/fdb_delete() without
br->hash_lock.

These hash list updates are racy with br_fdb_update()/br_fdb_cleanup().

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Acked-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-10 14:34:34 -08:00
Toshiaki Makita 424bb9c97c bridge: Properly check if local fdb entry can be deleted when deleting vlan
Vlan codes unconditionally delete local fdb entries.
We should consider the possibility that other ports have the same
address and vlan.

Example of problematic case:
  ip link set eth0 address 12:34:56:78:90:ab
  ip link set eth1 address aa:bb:cc:dd:ee:ff
  brctl addif br0 eth0
  brctl addif br0 eth1 # br0 will have mac address 12:34:56:78:90:ab
  bridge vlan add dev eth0 vid 10
  bridge vlan add dev eth1 vid 10
  bridge vlan add dev br0 vid 10 self
We will have fdb entry such that f->dst == eth0, f->vlan_id == 10 and
f->addr == 12:34:56:78:90:ab at this time.
Next, delete eth0 vlan 10.
  bridge vlan del dev eth0 vid 10
In this case, we still need the entry for br0, but it will be deleted.

Note that br0 needs the entry even though its mac address is not set
manually. To delete the entry with proper condition checking,
fdb_delete_local() is suitable to use.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Acked-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-10 14:34:34 -08:00
Toshiaki Makita a778e6d1a5 bridge: Properly check if local fdb entry can be deleted in br_fdb_delete_by_port
br_fdb_delete_by_port() doesn't care about vlan and mac address of the
bridge device.

As the check is almost the same as mac address changing, slightly modify
fdb_delete_local() and use it.

Note that we can always set added_by_user to 0 in fdb_delete_local() because
- br_fdb_delete_by_port() calls fdb_delete_local() for local entries
  regardless of its added_by_user. In this case, we have to check if another
  port has the same address and vlan, and if found, we have to create the
  entry (by changing dst). This is kernel-added entry, not user-added.
- br_fdb_changeaddr() doesn't call fdb_delete_local() for user-added entry.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Acked-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-10 14:34:34 -08:00
Toshiaki Makita 960b589f86 bridge: Properly check if local fdb entry can be deleted in br_fdb_change_mac_address
br_fdb_change_mac_address() doesn't check if the local entry has the
same address as any of bridge ports.
Although I'm not sure when it is beneficial, current implementation allow
the bridge device to receive any mac address of its ports.
To preserve this behavior, we have to check if the mac address of the
entry being deleted is identical to that of any port.

As this check is almost the same as that in br_fdb_changeaddr(), create
a common function fdb_delete_local() and call it from
br_fdb_changeadddr() and br_fdb_change_mac_address().

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Acked-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-10 14:34:33 -08:00
Toshiaki Makita 2b292fb4a5 bridge: Fix the way to check if a local fdb entry can be deleted
We should take into account the followings when deleting a local fdb
entry.

- nbp_vlan_find() can be used only when vid != 0 to check if an entry is
  deletable, because a fdb entry with vid 0 can exist at any time while
  nbp_vlan_find() always return false with vid 0.

  Example of problematic case:
    ip link set eth0 address 12:34:56:78:90:ab
    ip link set eth1 address 12:34:56:78:90:ab
    brctl addif br0 eth0
    brctl addif br0 eth1
    ip link set eth0 address aa:bb:cc:dd:ee:ff
  Then, the fdb entry 12:34:56:78:90:ab will be deleted even though the
  bridge port eth1 still has that address.

- The port to which the bridge device is attached might needs a local entry
  if its mac address is set manually.

  Example of problematic case:
    ip link set eth0 address 12:34:56:78:90:ab
    brctl addif br0 eth0
    ip link set br0 address 12:34:56:78:90:ab
    ip link set eth0 address aa:bb:cc:dd:ee:ff
  Then, the fdb still must have the entry 12:34:56:78:90:ab, but it will be
  deleted.

We can use br->dev->addr_assign_type to check if the address is manually
set or not, but I propose another approach.

Since we delete and insert local entries whenever changing mac address
of the bridge device, we can change dst of the entry to NULL regardless of
addr_assign_type when deleting an entry associated with a certain port,
and if it is found to be unnecessary later, then delete it.
That is, if changing mac address of a port, the entry might be changed
to its dst being NULL first, but is eventually deleted when recalculating
and changing bridge id.

This approach is especially useful when we want to share the code with
deleting vlan in which the bridge device might want such an entry regardless
of addr_assign_type, and makes things easy because we don't have to consider
if mac address of the bridge device will be changed or not at the time we
delete a local entry of a port, which means fdb code will not be bothered
even if the bridge id calculating logic is changed in the future.

Also, this change reduces inconsistent state, where frames whose dst is the
mac address of the bridge, can't reach the bridge because of premature fdb
entry deletion. This change reduces the possibility that the bridge device
replies unreachable mac address to arp requests, which could occur during
the short window between calling del_nbp() and br_stp_recalculate_bridge_id()
in br_del_if(). This will effective after br_fdb_delete_by_port() starts to
use the same code by following patch.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Acked-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-10 14:34:33 -08:00
Toshiaki Makita a4b816d8ba bridge: Change local fdb entries whenever mac address of bridge device changes
Vlan code may need fdb change when changing mac address of bridge device
even if it is caused by the mac address changing of a bridge port.

Example configuration:
  ip link set eth0 address 12:34:56:78:90:ab
  ip link set eth1 address aa:bb:cc:dd:ee:ff
  brctl addif br0 eth0
  brctl addif br0 eth1 # br0 will have mac address 12:34:56:78:90:ab
  bridge vlan add dev br0 vid 10 self
  bridge vlan add dev eth0 vid 10
We will have fdb entry such that f->dst == NULL, f->vlan_id == 10 and
f->addr == 12:34:56:78:90:ab at this time.
Next, change the mac address of eth0 to greater value.
  ip link set eth0 address ee:ff:12:34:56:78
Then, mac address of br0 will be recalculated and set to aa:bb:cc:dd:ee:ff.
However, an entry aa:bb:cc:dd:ee:ff will not be created and we will be not
able to communicate using br0 on vlan 10.

Address this issue by deleting and adding local entries whenever
changing the mac address of the bridge device.

If there already exists an entry that has the same address, for example,
in case that br_fdb_changeaddr() has already inserted it,
br_fdb_change_mac_address() will simply fail to insert it and no
duplicated entry will be made, as it was.

This approach also needs br_add_if() to call br_fdb_insert() before
br_stp_recalculate_bridge_id() so that we don't create an entry whose
dst == NULL in this function to preserve previous behavior.

Note that this is a slight change in behavior where the bridge device can
receive the traffic to the new address before calling
br_stp_recalculate_bridge_id() in br_add_if().
However, it is not a problem because we have already the address on the
new port and such a way to insert new one before recalculating bridge id
is taken in br_device_event() as well.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Acked-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-10 14:34:33 -08:00
Toshiaki Makita a3ebb7efe7 bridge: Fix the way to find old local fdb entries in br_fdb_change_mac_address
We have been always failed to delete the old entry at
br_fdb_change_mac_address() because br_set_mac_address() updates
dev->dev_addr before calling br_fdb_change_mac_address() and
br_fdb_change_mac_address() uses dev->dev_addr to find the old entry.

That update of dev_addr is completely unnecessary because the same work
is done in br_stp_change_bridge_id() which is called right away after
calling br_fdb_change_mac_address().

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Acked-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-10 14:34:33 -08:00
Toshiaki Makita 2836882fe0 bridge: Fix the way to insert new local fdb entries in br_fdb_changeaddr
Since commit bc9a25d21e ("bridge: Add vlan support for local fdb entries"),
br_fdb_changeaddr() has inserted a new local fdb entry only if it can
find old one. But if we have two ports where they have the same address
or user has deleted a local entry, there will be no entry for one of the
ports.

Example of problematic case:
  ip link set eth0 address aa:bb:cc:dd:ee:ff
  ip link set eth1 address aa:bb:cc:dd:ee:ff
  brctl addif br0 eth0
  brctl addif br0 eth1 # eth1 will not have a local entry due to dup.
  ip link set eth1 address 12:34:56:78:90:ab
Then, the new entry for the address 12:34:56:78:90:ab will not be
created, and the bridge device will not be able to communicate.

Insert new entries regardless of whether we can find old entries or not.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Acked-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-10 14:34:33 -08:00
Toshiaki Makita a5642ab474 bridge: Fix the way to find old local fdb entries in br_fdb_changeaddr
br_fdb_changeaddr() assumes that there is at most one local entry per port
per vlan. It used to be true, but since commit 36fd2b63e3 ("bridge: allow
creating/deleting fdb entries via netlink"), it has not been so.
Therefore, the function might fail to search a correct previous address
to be deleted and delete an arbitrary local entry if user has added local
entries manually.

Example of problematic case:
  ip link set eth0 address ee:ff:12:34:56:78
  brctl addif br0 eth0
  bridge fdb add 12:34:56:78:90:ab dev eth0 master
  ip link set eth0 address aa:bb:cc:dd:ee:ff
Then, the address 12:34:56:78:90:ab might be deleted instead of
ee:ff:12:34:56:78, the original mac address of eth0.

Address this issue by introducing a new flag, added_by_user, to struct
net_bridge_fdb_entry.

Note that br_fdb_delete_by_port() has to set added_by_user to 0 in cases
like:
  ip link set eth0 address 12:34:56:78:90:ab
  ip link set eth1 address aa:bb:cc:dd:ee:ff
  brctl addif br0 eth0
  bridge fdb add aa:bb:cc:dd:ee:ff dev eth0 master
  brctl addif br0 eth1
  brctl delif br0 eth0
In this case, kernel should delete the user-added entry aa:bb:cc:dd:ee:ff,
but it also should have been added by "brctl addif br0 eth1" originally,
so we don't delete it and treat it a new kernel-created entry.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-10 14:34:33 -08:00
Trond Myklebust a699d65ec4 SUNRPC: Don't create a gss auth cache unless rpc.gssd is running
An infinite loop is caused when nfs4_establish_lease() fails
with -EACCES. This causes nfs4_handle_reclaim_lease_error()
to sleep a bit and resets the NFS4CLNT_LEASE_EXPIRED bit.
This in turn causes nfs4_state_manager() to try and
reestablished the lease, again, again, again...

The problem is a valid RPCSEC_GSS client is being created when
rpc.gssd is not running.

Link: http://lkml.kernel.org/r/1392066375-16502-1-git-send-email-steved@redhat.com
Fixes: 0ea9de0ea6 (sunrpc: turn warn_gssd() log message into a dprintk())
Reported-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-02-10 16:49:17 -05:00
Jesper Juhl b10bd54c05 tcp: correct code comment stating 3 min timeout for FIN_WAIT2, we only do 1 min
As far as I can tell we have used a default of 60 seconds for
FIN_WAIT2 timeout for ages (since 2.x times??).

In any case, the timeout these days is 60 seconds, so the 3 min
comment is wrong (and cost me a few minutes of my life when I was
debugging a FIN_WAIT2 related problem in a userspace application and
checked the kernel source for details).

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-09 19:14:23 -08:00
Maciej Żenczykowski 946c032e5a net: fix 'ip rule' iif/oif device rename
ip rules with iif/oif references do not update:
(detach/attach) across interface renames.

Signed-off-by: Maciej Żenczykowski <maze@google.com>
CC: Willem de Bruijn <willemb@google.com>
CC: Eric Dumazet <edumazet@google.com>
CC: Chris Davis <chrismd@google.com>
CC: Carlo Contavalli <ccontavalli@google.com>

Google-Bug-Id: 12936021
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-09 19:02:52 -08:00
Christian Engelmayer 310f6fd0ca 6lowpan: Remove unused pointer in lowpan_header_create()
Commit 8df8c56a (6lowpan: Moving generic compression code into 6lowpan_iphc.c)
left pointer 'hdr' unused - remove it.

Detected by Coverity: CID 1164868.

Signed-off-by: Christian Engelmayer <cengelma@gmx.at>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-09 18:38:10 -08:00
FX Le Bail d94c1f92bb ipv6: icmp6_send: fix Oops when pinging a not set up IPv6 peer on a sit tunnel
The patch 446fab5933 ("ipv6: enable anycast addresses
as source addresses in ICMPv6 error messages") causes an Oops when pinging a not
set up IPv6 peer on a sit tunnel.

The problem is that ipv6_anycast_destination() uses unconditionally skb_dst(skb),
which is NULL in this case.

The solution is to use instead the ipv6_chk_acast_addr_src() function.

Here are the steps to reproduce it:
modprobe sit
ip link add sit1 type sit remote 10.16.0.121 local 10.16.0.249
ip l s sit1 up
ip -6 a a dev sit1 2001🔢:123 remote 2001🔢:121
ping6 2001🔢:121

Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Tested-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Francois-Xavier Le Bail <fx.lebail@yahoo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-09 18:12:40 -08:00
Rashika Kheria e1d83ee673 net: Mark functions as static in net/sunrpc/svc_xprt.c
Mark functions as static in net/sunrpc/svc_xprt.c because they are not
used outside this file.

This eliminates the following warning in net/sunrpc/svc_xprt.c:
net/sunrpc/svc_xprt.c:574:5: warning: no previous prototype for ‘svc_alloc_arg’ [-Wmissing-prototypes]
net/sunrpc/svc_xprt.c:615:18: warning: no previous prototype for ‘svc_get_next_xprt’ [-Wmissing-prototypes]
net/sunrpc/svc_xprt.c:694:6: warning: no previous prototype for ‘svc_add_new_temp_xprt’ [-Wmissing-prototypes]

Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-09 17:32:50 -08:00
Rashika Kheria bd76ed36ba net: Include appropriate header file in netfilter/nft_lookup.c
Include appropriate header file net/netfilter/nf_tables_core.h in
net/netfilter/nft_lookup.c because it has prototype declaration of
functions defined in net/netfilter/nft_lookup.c.

This eliminates the following warning in net/netfilter/nft_lookup.c:
net/netfilter/nft_lookup.c:133:12: warning: no previous prototype for ‘nft_lookup_module_init’ [-Wmissing-prototypes]
net/netfilter/nft_lookup.c:138:6: warning: no previous prototype for ‘nft_lookup_module_exit’ [-Wmissing-prototypes]

Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-09 17:32:50 -08:00
Rashika Kheria 535d3ae9c8 net: Move prototype declaration to header file include/net/net_namespace.h from net/ipx/af_ipx.c
Move prototype declaration of function to header file
include/net/net_namespace.h from net/ipx/af_ipx.c because they are used
by more than one file.

This eliminates the following warning in net/ipx/sysctl_net_ipx.c:
net/ipx/sysctl_net_ipx.c:33:6: warning: no previous prototype for ‘ipx_register_sysctl’ [-Wmissing-prototypes]
net/ipx/sysctl_net_ipx.c:38:6: warning: no previous prototype for ‘ipx_unregister_sysctl’ [-Wmissing-prototypes]

Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-09 17:32:50 -08:00
Rashika Kheria 7780d8ae4a net: Move prototype declaration to header file include/net/datalink.h from net/ipx/af_ipx.c
Move prototype declarations of function to header file
include/net/datalink.h from net/ipx/af_ipx.c because they are used by
more than one file.

This eliminates the following warning in net/ipx/pe2.c:
net/ipx/pe2.c:20:24: warning: no previous prototype for ‘make_EII_client’ [-Wmissing-prototypes]
net/ipx/pe2.c:32:6: warning: no previous prototype for ‘destroy_EII_client’ [-Wmissing-prototypes]

Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-09 17:32:50 -08:00
Rashika Kheria 578efbc19f net: Move prototype declaration to header file include/net/ipx.h from net/ipx/af_ipx.c
Move prototype declaration of functions to header file include/net/ipx.h
from net/ipx/af_ipx.c because they are used by more than one file.

This eliminates the following warning in
net/ipx/ipx_route.c:33:19: warning: no previous prototype for ‘ipxrtr_lookup’ [-Wmissing-prototypes]
net/ipx/ipx_route.c:52:5: warning: no previous prototype for ‘ipxrtr_add_route’ [-Wmissing-prototypes]
net/ipx/ipx_route.c:94:6: warning: no previous prototype for ‘ipxrtr_del_routes’ [-Wmissing-prototypes]
net/ipx/ipx_route.c:149:5: warning: no previous prototype for ‘ipxrtr_route_skb’ [-Wmissing-prototypes]
net/ipx/ipx_route.c:171:5: warning: no previous prototype for ‘ipxrtr_route_packet’ [-Wmissing-prototypes]
net/ipx/ipx_route.c:261:5: warning: no previous prototype for ‘ipxrtr_ioctl’ [-Wmissing-prototypes]

Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-09 17:32:50 -08:00
Rashika Kheria 493cc5e5ba net: Move prototype declaration to include/net/ipx.h from net/ipx/ipx_route.c
Move prototype definition of function to header file include/net/ipx.h
from net/ipx/ipx_route.c because they are used by more than one file.

This eliminates the following warning from net/ipx/af_ipx.c:
net/ipx/af_ipx.c:193:23: warning: no previous prototype for ‘ipxitf_find_using_net’ [-Wmissing-prototypes]
net/ipx/af_ipx.c:577:5: warning: no previous prototype for ‘ipxitf_send’ [-Wmissing-prototypes]
net/ipx/af_ipx.c:1219:8: warning: no previous prototype for ‘ipx_cksum’ [-Wmissing-prototypes]

Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-09 17:32:49 -08:00
Rashika Kheria ab3301bd96 net: Move prototype declaration to header file include/net/dn.h from net/decnet/af_decnet.c
Move prototype declaration of functions to header file include/net/dn.h
from net/decnet/af_decnet.c because they are used by more than one file.

This eliminates the following warning in net/decnet/af_decnet.c:
net/decnet/sysctl_net_decnet.c:354:6: warning: no previous prototype for ‘dn_register_sysctl’ [-Wmissing-prototypes]
net/decnet/sysctl_net_decnet.c:359:6: warning: no previous prototype for ‘dn_unregister_sysctl’ [-Wmissing-prototypes]

Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-09 17:32:49 -08:00