Commit graph

33968 commits

Author SHA1 Message Date
Ian Morris cc24becae3 ipv6: White-space cleansing : Structure layouts
This patch makes no changes to the logic of the code but simply addresses
coding style issues as detected by checkpatch.

Both objdump and diff -w show no differences.

This patch addresses structure definitions, specifically it cleanses the brace
placement and replaces spaces with tabs in a few places.

Signed-off-by: Ian Morris <ipm@chirality.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-24 22:37:52 -07:00
Ian Morris 67ba4152e8 ipv6: White-space cleansing : Line Layouts
This patch makes no changes to the logic of the code but simply addresses
coding style issues as detected by checkpatch.

Both objdump and diff -w show no differences.

A number of items are addressed in this patch:
* Multiple spaces converted to tabs
* Spaces before tabs removed.
* Spaces in pointer typing cleansed (char *)foo etc.
* Remove space after sizeof
* Ensure spacing around comparators such as if statements.

Signed-off-by: Ian Morris <ipm@chirality.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-24 22:37:52 -07:00
Tom Herbert 48a5fc7731 gre: When GRE csum is present count as encap layer wrt csum
In GRE demux if the GRE checksum pop rcv encapsulation so that any
encapsulated checksums are treated as tunnel checksums.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-24 18:09:24 -07:00
Tom Herbert 57c67ff4bd udp: additional GRO support
Implement GRO for UDPv6. Add UDP checksum verification in gro_receive
for both UDP4 and UDP6 calling skb_gro_checksum_validate_zero_check.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-24 18:09:24 -07:00
Tom Herbert 149d0774a7 tcp: Call skb_gro_checksum_validate
In tcp[64]_gro_receive call skb_gro_checksum_validate to validate TCP
checksum in the gro context.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-24 18:09:24 -07:00
Tom Herbert 758f75d1ff gre: call skb_gro_checksum_simple_validate
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-24 18:09:23 -07:00
Tom Herbert 573e8fca25 net: skb_gro_checksum_* functions
Add skb_gro_checksum_validate, skb_gro_checksum_validate_zero_check,
and skb_gro_checksum_simple_validate, and __skb_gro_checksum_complete.
These are the cognates of the normal checksum functions but are used
in the gro_receive path and operate on GRO related fields in sk_buffs.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-24 18:09:23 -07:00
Daniel Borkmann 8fc54f6891 net: use reciprocal_scale() helper
Replace open codings of (((u64) <x> * <y>) >> 32) with reciprocal_scale().

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23 12:21:21 -07:00
David S. Miller 690e36e726 net: Allow raw buffers to be passed into the flow dissector.
Drivers, and perhaps other entities we have not yet considered,
sometimes want to know how deep the protocol headers go before
deciding how large of an SKB to allocate and how much of the packet to
place into the linear SKB area.

For example, consider a driver which has a device which DMAs into
pools of pages and then tells the driver where the data went in the
DMA descriptor(s).  The driver can then build an SKB and reference
most of the data via SKB fragments (which are page/offset/length
triplets).

However at least some of the front of the packet should be placed into
the linear SKB area, which comes before the fragments, so that packet
processing can get at the headers efficiently.  The first thing each
protocol layer is going to do is a "pskb_may_pull()" so we might as
well aggregate as much of this as possible while we're building the
SKB in the driver.

Part of supporting this is that we don't have an SKB yet, so we want
to be able to let the flow dissector operate on a raw buffer in order
to compute the offset of the end of the headers.

So now we have a __skb_flow_dissect() which takes an explicit data
pointer and length.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23 12:13:41 -07:00
Jon Paul Maloy 301bae56f2 tipc: merge struct tipc_port into struct tipc_sock
We complete the merging of the port and socket layer by aggregating
the fields of struct tipc_port directly into struct tipc_sock, and
moving the combined structure into socket.c.

We also move all functions and macros that are not any longer
exposed to the rest of the stack into socket.c, and rename them
accordingly.

Despite the size of this commit, there are no functional changes.
We have only made such changes that are necessary due of the removal
of struct tipc_port.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23 11:18:35 -07:00
Jon Paul Maloy 808d90f9c5 tipc: remove files ref.h and ref.c
The reference table is now 'socket aware' instead of being generic,
and has in reality become a socket internal table. In order to be
able to minimize the API exposed by the socket layer towards the rest
of the stack, we now move the reference table definitions and functions
into the file socket.c, and rename the functions accordingly.

There are no functional changes in this commit.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23 11:18:35 -07:00
Jon Paul Maloy 2e84c60b77 tipc: remove include file port.h
We move the inline functions in the file port.h to socket.c, and modify
their names accordingly.

We move struct tipc_port and some macros to socket.h.

Finally, we remove the file port.h.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23 11:18:35 -07:00
Jon Paul Maloy 0fc87aaebd tipc: remove source file port.c
In this commit, we move the remaining functions in port.c to
socket.c, and give them new names that correspond to their new
location. We then remove the file port.c.

There are only cosmetic changes to the moved functions.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23 11:18:35 -07:00
Jon Paul Maloy 6c9808ce09 tipc: remove port_lock
In previous commits we have reduced usage of port_lock to a minimum,
and complemented it with usage of bh_lock_sock() at the remaining
locations. The purpose has been to remove this lock altogether, since
it largely duplicates the role of bh_lock_sock. We are now ready to do
this.

However, we still need to protect the BH callers from inadvertent
release of the socket while they hold a reference to it. We do this by
replacing port_lock by a combination of a rw-lock protecting the
reference table as such, and updating the socket reference counter while
the socket is referenced from BH. This technique is more standard and
comprehensible than the previous approach, and turns out to have a
positive effect on overall performance.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23 11:18:34 -07:00
Jon Paul Maloy 9b50fd087a tipc: replace port pointer with socket pointer in registry
In order to make tipc_sock the only entity referencable from other
parts of the stack, we add a tipc_sock pointer instead of a tipc_port
pointer to the registry. As a consequence, we also let the function
tipc_port_lock() return a pointer to a tipc_sock instead  of a tipc_port.
We keep the function's name for now, since the lock still is owned by
the port.

This is another step in the direction of eliminating port_lock, replacing
its usage with lock_sock() and bh_lock_sock().

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23 11:18:34 -07:00
Jon Paul Maloy 5a9ee0be33 tipc: use registry when scanning sockets
The functions tipc_port_get_ports() and tipc_port_reinit() scan over
all sockets/ports to access each of them. This is done by using a
dedicated linked list, 'tipc_socks' where all sockets are members. The
list is in turn protected by a spinlock, 'port_list_lock', while each
socket is locked by using port_lock at the moment of access.

In order to reduce complexity and risk of deadlock, we want to get
rid of the linked list and the accompanying spinlock.

This is what we do in this commit. Instead of the linked list, we use
the port registry to scan across the sockets. We also add usage of
bh_lock_sock() inside the scope of port_lock in both functions, as a
preparation for the complete removal of port_lock.

Finally, we move the functions from port.c to socket.c, and rename them
to tipc_sk_sock_show() and tipc_sk_reinit() repectively.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23 11:18:34 -07:00
Jon Paul Maloy 5b8fa7ce82 tipc: eliminate functions tipc_port_init and tipc_port_destroy
After the latest changes to the socket/port layer the existence of
the functions tipc_port_init() and tipc_port_destroy() cannot be
justified. They are both called only once, from tipc_sk_create() and
tipc_sk_delete() respectively, and their functionality can better be
merged into the latter two functions.

This also entails that all remaining references to port_lock now are
made from inside socket.c, something that will make it easier to remove
this lock.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23 11:18:34 -07:00
Jon Paul Maloy 739f5e4efc tipc: redefine message acknowledge function
The function tipc_acknowledge() is a remnant from the obsolete native
API. Currently, it grabs port_lock, before building an acknowledge
message and sending it to the peer.

Since all access to socket members now is protected by the socket lock,
it has become unnecessary to grab port_lock here.

In this commit, we remove the usage of port_lock, simplify the
function, and move it to socket.c, renaming it to tipc_sk_send_ack().

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23 11:18:34 -07:00
Jon Paul Maloy dadebc0029 tipc: eliminate port_connect()/port_disconnect() functions
tipc_port_connect()/tipc_port_disconnect() are remnants of the obsolete
native API. Their only task is to grab port_lock and call the functions
__tipc_port_connect()/__tipc_port_disconnect() respectively, which will
perform the actual state change.

Since socket/port exection now is single-threaded the use of port_lock
is not needed any more, so we can safely replace the two functions with
their lock-free counterparts.

In this commit, we remove the two functions. Furthermore, the contents
of __tipc_port_disconnect() is so trivial that we choose to eliminate
that function too, expanding its functionality into tipc_shutdown().
__tipc_port_connect() is simplified, moved to socket.c, and given the
more correct name tipc_sk_finish_conn(). Finally, we eliminate the
function auto_connect(), and expand its contents into filter_connect().

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23 11:18:34 -07:00
Jon Paul Maloy 80e44c2225 tipc: eliminate function tipc_port_shutdown()
tipc_port_shutdown() is a remnant from the now obsolete native
interface. As such it grabs port_lock in order to protect itself
from concurrent BH processing.

However, after the recent changes to the port/socket upcalls, sockets
are now basically single-threaded, and all execution, except the read-only
tipc_sk_timer(), is executing within the protection of lock_sock(). So
the use of port_lock is not needed here.

In this commit we eliminate the whole function, and merge it into its
only caller, tipc_shutdown().

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23 11:18:34 -07:00
Jon Paul Maloy 5728901581 tipc: clean up socket timer function
The last remaining BH upcall to the socket, apart for the message
reception function tipc_sk_rcv(), is the timer function.

We prefer to let this function continue executing in BH, since it only
does read-acces to semi-permanent data, but we make three changes to it:

1) We introduce a bh_lock_sock()/bh_unlock_sock() inside the scope
   of port_lock.  This is a preparation for replacing port_lock with
   bh_lock_sock() at the locations where it is still used.

2) We move the function from port.c to socket.c, as a further step
   of eliminating the port code level altogether.

3) We let it make use of the newly introduced tipc_msg_create()
   function. This enables us to get rid of three context specific
   functions (port_create_self_abort_msg() etc.) in port.c

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23 11:18:33 -07:00
Jon Paul Maloy 02be61a981 tipc: use message to abort connections when losing contact to node
In the current implementation, each 'struct tipc_node' instance keeps
a linked list of those ports/sockets that are connected to the node
represented by that struct. The purpose of this is to let the node
object know which sockets to alert when it loses contact with its peer
node, i.e., which sockets need to have their connections aborted.

This entails an unwanted direct reference from the node structure
back to the port/socket structure, and a need to grab port_lock
when we have to make an upcall to the port. We want to get rid of
this unecessary BH entry point into the socket, and also eliminate
its use of port_lock.

In this commit, we instead let the node struct keep list of "connected
socket" structs, which each represents a connected socket, but is
allocated independently by the node at the moment of connection. If
the node loses contact with its peer node, the list is traversed, and
a "connection abort" message is created for each entry in the list. The
message is sent to it respective connected socket using the ordinary
data path, and the receiving socket aborts its connections upon reception
of the message.

This enables us to get rid of the direct reference from 'struct node' to
´struct port', and another unwanted BH access point to the latter.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23 11:18:33 -07:00
Jon Paul Maloy 50100a5e39 tipc: use pseudo message to wake up sockets after link congestion
The current link implementation keeps a linked list of blocked ports/
sockets that is populated when there is link congestion. The purpose
of this is to let the link know which users to wake up when the
congestion abates.

This adds unnecessary complexity to the data structure and the code,
since it forces us to involve the link each time we want to delete
a socket. It also forces us to grab the spinlock port_lock within
the scope of node_lock. We want to get rid of this direct dependence,
as well as the deadlock hazard resulting from the usage of port_lock.

In this commit, we instead let the link keep list of a "wakeup" pseudo
messages for use in such situations. Those messages are sent to the
pending sockets via the ordinary message reception path, and wake up
the socket's owner when they are received.

This enables us to get rid of the 'waiting_ports' linked lists in struct
tipc_port that manifest this direct reference. As a consequence, we can
eliminate another BH entry into the socket, and hence the need to grab
port_lock. This is a further step in our effort to remove port_lock
altogether.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23 11:18:33 -07:00
Jon Paul Maloy 1dd0bd2b14 tipc: introduce new function tipc_msg_create()
The function tipc_msg_init() has turned out to be of limited value
in many cases. It take too few parameters to be usable for creating
a complete message, it makes too many assumptions about what the
message should be used for, and it does not allocate any buffer to
be returned to the caller.

Therefore, we now introduce the new function tipc_msg_create(), which
takes all the parameters needed to create a full message, and returns
a buffer of the requested size. The new function will be very useful
for the changes we will be doing in later commits in this series.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23 11:18:33 -07:00
David S. Miller f9474ddfaa Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pulling to get some TIPC fixes that a net-next series depends
upon.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23 11:12:08 -07:00
Yuchung Cheng 989e04c5bc tcp: improve undo on timeout
Upon timeout, undo (via both timestamps/Eifel and DSACKs) was
disabled if any retransmits were still in flight.  The concern was
perhaps that spurious retransmission sent in a previous recovery
episode may trigger DSACKs to falsely undo the current recovery.

However, this inadvertently misses undo opportunities (using either
TCP timestamps or DSACKs) when timeout occurs during a loss episode,
i.e.  recurring timeouts or timeout during fast recovery. In these
cases some retransmissions will be in flight but we should allow
undo. Furthermore, we should only reset undo_marker and undo_retrans
upon timeout if we are starting a new recovery episode. Finally,
when we do reset our undo state, we now do so in a manner similar
to tcp_enter_recovery(), so that we require a DSACK for each of
the outstsanding retransmissions. This will achieve the original
goal by requiring that we receive the same number of DSACKs as
retransmissions.

This patch increases the undo events by 50% on Google servers.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-22 21:28:02 -07:00
Eric Dumazet 884cf705c7 net: remove dead code after sk_data_ready change
As a followup to commit 676d23690f ("net: Fix use after free by
removing length arg from sk_data_ready callbacks"), we can remove
some useless code in sock_queue_rcv_skb() and rxrpc_queue_rcv_skb()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-22 21:08:50 -07:00
Eric Dumazet d2de875c6d net: use ktime_get_ns() and ktime_get_real_ns() helpers
ktime_get_ns() replaces ktime_to_ns(ktime_get())

ktime_get_real_ns() replaces ktime_to_ns(ktime_get_real())

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-22 19:57:23 -07:00
Michal Kazior 47e4df94d1 mac80211: fix channel switch for chanctx-based drivers
The new_ctx pointer is set only for non-chanctx drivers.  This yielded a
crash for chanctx-based drivers during channel switch finalization:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
  IP: ieee80211_vif_use_reserved_switch+0x71c/0xb00 [mac80211]

Use an adequate chanctx pointer to fix this.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-22 14:45:49 -07:00
Himangi Saraogi c0b802367b af_decnet: Use time_after_eq
The functions time_before, time_before_eq, time_after, and time_after_eq
are more robust for comparing jiffies against other values.

A simplified version of the Coccinelle semantic patch making this change
is as follows:

@change@
expression E1,E2,E3;
@@
- jiffies - E1 >= (E2*E3)
+ time_after_eq(jiffies, E1+E2*E3)

Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-22 12:23:11 -07:00
Himangi Saraogi 8b1b1eb521 decnet: Use time_after_eq
The functions time_before, time_before_eq, time_after, and time_after_eq
are more robust for comparing jiffies against other values.

A simplified version of the Coccinelle semantic patch making this change
is as follows:

@change@
expression E1,E2;
@@
- (jiffies - E1) >= E2
+ time_after_eq(jiffies, E1+E2)

Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-22 12:23:11 -07:00
Himangi Saraogi c72c95a064 ipconfig: Use time_before
The functions time_before, time_before_eq, time_after, and time_after_eq
are more robust for comparing jiffies against other values.

A simplified version of the Coccinelle semantic patch making this change
is as follows:

@change@
expression E1,E2;
@@
- jiffies - E1 < E2
+ time_before(jiffies, E1+E2)

Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-22 12:23:11 -07:00
Himangi Saraogi b5c5c36d36 dn_dev: Use time_before
The functions time_before, time_before_eq, time_after, and time_after_eq
are more robust for comparing jiffies against other values.

A simplified version of the Coccinelle semantic patch making this change
is as follows:

@change@
expression E1,E2;
@@

(
- (jiffies - E1) < E2
+ time_before(jiffies, E1+E2)
)

Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-22 12:23:11 -07:00
Andreea-Cristina Bernat 0932997e34 br_multicast: Replace rcu_assign_pointer() with RCU_INIT_POINTER()
The use of "rcu_assign_pointer()" is NULLing out the pointer.
According to RCU_INIT_POINTER()'s block comment:
"1.   This use of RCU_INIT_POINTER() is NULLing out the pointer"
it is better to use it instead of rcu_assign_pointer() because it has a
smaller overhead.

The following Coccinelle semantic patch was used:
@@
@@

- rcu_assign_pointer
+ RCU_INIT_POINTER
  (..., NULL)

Signed-off-by: Andreea-Cristina Bernat <bernat.ada@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-22 12:23:11 -07:00
Andreea-Cristina Bernat 8c6b00c816 net/openvswitch/flow.c: Replace rcu_dereference() with rcu_access_pointer()
The "rcu_dereference()" call is used directly in a condition.
Since its return value is never dereferenced it is recommended to use
"rcu_access_pointer()" instead of "rcu_dereference()".
Therefore, this patch makes the replacement.

The following Coccinelle semantic patch was used:
@@
@@

(
 if(
 (<+...
- rcu_dereference
+ rcu_access_pointer
  (...)
  ...+>)) {...}
|
 while(
 (<+...
- rcu_dereference
+ rcu_access_pointer
  (...)
  ...+>)) {...}
)

Signed-off-by: Andreea-Cristina Bernat <bernat.ada@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-22 12:23:10 -07:00
Andreea-Cristina Bernat e6b688838e net/ipv4/igmp.c: Replace rcu_dereference() with rcu_access_pointer()
The "rcu_dereference()" call is used directly in a condition.
Since its return value is never dereferenced it is recommended to use
"rcu_access_pointer()" instead of "rcu_dereference()".
Therefore, this patch makes the replacement.

The following Coccinelle semantic patch was used:
@@
@@

(
 if(
 (<+...
- rcu_dereference
+ rcu_access_pointer
  (...)
  ...+>)) {...}
|
 while(
 (<+...
- rcu_dereference
+ rcu_access_pointer
  (...)
  ...+>)) {...}
)

Signed-off-by: Andreea-Cristina Bernat <bernat.ada@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-22 12:23:10 -07:00
Sébastien Barré 1dced6a854 ipv4: Restore accept_local behaviour in fib_validate_source()
Commit 7a9bc9b81a ("ipv4: Elide fib_validate_source() completely when possible.")
introduced a short-circuit to avoid calling fib_validate_source when not
needed. That change took rp_filter into account, but not accept_local.
This resulted in a change of behaviour: with rp_filter and accept_local
off, incoming packets with a local address in the source field should be
dropped.

Here is how to reproduce the change pre/post 7a9bc9b81a commit:
-configure the same IPv4 address on hosts A and B.
-try to send an ARP request from B to A.
-The ARP request will be dropped before that commit, but accepted and answered
after that commit.

This adds a check for ACCEPT_LOCAL, to maintain full
fib validation in case it is 0. We also leave __fib_validate_source() earlier
when possible, based on the same check as fib_validate_source(), once the
accept_local stuff is verified.

Cc: Gregory Detal <gregory.detal@uclouvain.be>
Cc: Christoph Paasch <christoph.paasch@uclouvain.be>
Cc: Hannes Frederic Sowa <hannes@redhat.com>
Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: Sébastien Barré <sebastien.barre@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-22 12:23:10 -07:00
Daniel Borkmann aa4a83ee8b net: sctp: fix suboptimal edge-case on non-active active/retrans path selection
In SCTP, selection of active (T.ACT) and retransmission (T.RET)
transports is being done whenever transport control operations
(UP, DOWN, PF, ...) are engaged through sctp_assoc_control_transport().

Commits 4c47af4d5e ("net: sctp: rework multihoming retransmission
path selection to rfc4960") and a7288c4dd5 ("net: sctp: improve
sctp_select_active_and_retran_path selection") have both improved
it towards a more fine-grained and optimal path selection.

Currently, the selection algorithm for T.ACT and T.RET is as follows:

1) Elect the two most recently used ACTIVE transports T1, T2 for
   T.ACT, T.RET, where T.ACT<-T1 and T1 is most recently used
2) In case primary path T.PRI not in {T1, T2} but ACTIVE, set
   T.ACT<-T.PRI and T.RET<-T1
3) If only T1 is ACTIVE from the set, set T.ACT<-T1 and T.RET<-T1
4) If none is ACTIVE, set T.ACT<-best(T.PRI, T.RET, T3) where
   T3 is the most recently used (if avail) in PF, set T.RET<-T.PRI

Prior to above commits, 4) was simply a camp on T.ACT<-T.PRI and
T.RET<-T.PRI, ignoring possible paths in PF. Camping on T.PRI is
still slightly suboptimal as it can lead to the following scenario:

Setup:
        <A>                                <B>
    T1: p1p1 (10.0.10.10) <==>  .'`)  <==> p1p1 (10.0.10.12)  <= T.PRI
    T2: p1p2 (10.0.10.20) <==> (_ . ) <==> p1p2 (10.0.10.22)

    net.sctp.rto_min = 1000
    net.sctp.path_max_retrans = 2
    net.sctp.pf_retrans = 0
    net.sctp.hb_interval = 1000

T.PRI is permanently down, T2 is put briefly into PF state (e.g. due to
link flapping). Here, the first time transmission is sent over PF path
T2 as it's the only non-INACTIVE path, but the retransmitted data-chunks
are sent over the INACTIVE path T1 (T.PRI), which is not good.

After the patch, it's choosing better transports in both cases by
modifying step 4):

4) If none is ACTIVE, set T.ACT_new<-best(T.ACT_old, T3) where T3 is
   the most recently used (if avail) in PF, set T.RET<-T.ACT_new

This will still select a best possible path in PF if available (which
can also include T.PRI/T.RET), and set both T.ACT/T.RET to it.

In case sctp_assoc_control_transport() *just* put T.ACT_old into INACTIVE
as it transitioned from ACTIVE->PF->INACTIVE and stays in INACTIVE just
for a very short while before going back ACTIVE, it will guarantee that
this path will be reselected for T.ACT/T.RET since T3 (PF) is not
available.

Previously, this was not possible, as we would only select between T.PRI
and T.RET, and a possible T3 would be NULL due to the fact that we have
just transitioned T3 in sctp_assoc_control_transport() from PF->INACTIVE
and would select a suboptimal path when T.PRI/T.RET have worse properties.

In the case that T.ACT_old permanently went to INACTIVE during this
transition and there's no PF path available, plus T.PRI and T.RET are
INACTIVE as well, we would now camp on T.ACT_old, but if everything is
being INACTIVE there's really not much we can do except hoping for a
successful HB to bring one of the transports back up again and, thus
cause a new selection through sctp_assoc_control_transport().

Now both tests work fine:

Case 1:

 1. T1 S(ACTIVE) T.ACT
    T2 S(ACTIVE) T.RET

 2. T1 S(ACTIVE) T.ACT, T.RET
    T2 S(PF)

 3. T1 S(ACTIVE) T.ACT, T.RET
    T2 S(INACTIVE)

 5. T1 S(PF) T.ACT, T.RET
    T2 S(INACTIVE)

[ 5.1 T1 S(INACTIVE) T.ACT, T.RET
      T2 S(INACTIVE) ]

 6. T1 S(ACTIVE) T.ACT, T.RET
    T2 S(INACTIVE)

 7. T1 S(ACTIVE) T.ACT
    T2 S(ACTIVE) T.RET

Case 2:

 1. T1 S(ACTIVE) T.ACT
    T2 S(ACTIVE) T.RET

 2. T1 S(PF)
    T2 S(ACTIVE) T.ACT, T.RET

 3. T1 S(INACTIVE)
    T2 S(ACTIVE) T.ACT, T.RET

 5. T1 S(INACTIVE)
    T2 S(PF) T.ACT, T.RET

[ 5.1 T1 S(INACTIVE)
      T2 S(INACTIVE) T.ACT, T.RET ]

 6. T1 S(INACTIVE)
    T2 S(ACTIVE) T.ACT, T.RET

 7. T1 S(ACTIVE) T.ACT
    T2 S(ACTIVE) T.RET

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-22 11:31:30 -07:00
Daniel Borkmann ea4f19c1f8 net: sctp: spare unnecessary comparison in sctp_trans_elect_best
When both transports are the same, we don't have to go down that
road only to realize that we will return the very same transport.
We are guaranteed that curr is always non-NULL. Therefore, just
short-circuit this special case.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-22 11:31:30 -07:00
Jiri Benc 2ba5af42a7 openvswitch: fix panic with multiple vlan headers
When there are multiple vlan headers present in a received frame, the first
one is put into vlan_tci and protocol is set to ETH_P_8021Q. Anything in the
skb beyond the VLAN TPID may be still non-linear, including the inner TCI
and ethertype. While ovs_flow_extract takes care of IP and IPv6 headers, it
does nothing with ETH_P_8021Q. Later, if OVS_ACTION_ATTR_POP_VLAN is
executed, __pop_vlan_tci pulls the next vlan header into vlan_tci.

This leads to two things:

1. Part of the resulting ethernet header is in the non-linear part of the
   skb. When eth_type_trans is called later as the result of
   OVS_ACTION_ATTR_OUTPUT, kernel BUGs in __skb_pull. Also, __pop_vlan_tci
   is in fact accessing random data when it reads past the TPID.

2. network_header points into the ethernet header instead of behind it.
   mac_len is set to a wrong value (10), too.

Reported-by: Yulong Pei <ypei@redhat.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-22 11:24:04 -07:00
Benjamin Block 793c3b4000 net: ipv6: fib: don't sleep inside atomic lock
The function fib6_commit_metrics() allocates a piece of memory in mode
GFP_KERNEL while holding an atomic lock from higher up in the stack, in
the function __ip6_ins_rt(). This produces the following BUG:

> BUG: sleeping function called from invalid context at mm/slub.c:1250
> in_atomic(): 1, irqs_disabled(): 0, pid: 2909, name: dhcpcd
> 2 locks held by dhcpcd/2909:
>  #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff81978e67>] rtnl_lock+0x17/0x20
>  #1:  (&tb->tb6_lock){++--+.}, at: [<ffffffff81a6951a>] ip6_route_add+0x65a/0x800
> CPU: 1 PID: 2909 Comm: dhcpcd Not tainted 3.17.0-rc1 #1
> Hardware name: ASUS All Series/Q87T, BIOS 0216 10/16/2013
>  0000000000000008 ffff8800c8f13858 ffffffff81af135a 0000000000000000
>  ffff880212202430 ffff8800c8f13878 ffffffff810f8d3a ffff880212202c98
>  0000000000000010 ffff8800c8f138c8 ffffffff8121ad0e 0000000000000001
> Call Trace:
>  [<ffffffff81af135a>] dump_stack+0x4e/0x68
>  [<ffffffff810f8d3a>] __might_sleep+0x10a/0x120
>  [<ffffffff8121ad0e>] kmem_cache_alloc_trace+0x4e/0x190
>  [<ffffffff81a6bcd6>] ? fib6_commit_metrics+0x66/0x110
>  [<ffffffff81a6bcd6>] fib6_commit_metrics+0x66/0x110
>  [<ffffffff81a6cbf3>] fib6_add+0x883/0xa80
>  [<ffffffff81a6951a>] ? ip6_route_add+0x65a/0x800
>  [<ffffffff81a69535>] ip6_route_add+0x675/0x800
>  [<ffffffff81a68f2a>] ? ip6_route_add+0x6a/0x800
>  [<ffffffff81a6990c>] inet6_rtm_newroute+0x5c/0x80
>  [<ffffffff8197cf01>] rtnetlink_rcv_msg+0x211/0x260
>  [<ffffffff81978e67>] ? rtnl_lock+0x17/0x20
>  [<ffffffff81119708>] ? lock_release_holdtime+0x28/0x180
>  [<ffffffff81978e67>] ? rtnl_lock+0x17/0x20
>  [<ffffffff8197ccf0>] ? __rtnl_unlock+0x20/0x20
>  [<ffffffff819a989e>] netlink_rcv_skb+0x6e/0xd0
>  [<ffffffff81978ee5>] rtnetlink_rcv+0x25/0x40
>  [<ffffffff819a8e59>] netlink_unicast+0xd9/0x180
>  [<ffffffff819a9600>] netlink_sendmsg+0x700/0x770
>  [<ffffffff81103735>] ? local_clock+0x25/0x30
>  [<ffffffff8194e83c>] sock_sendmsg+0x6c/0x90
>  [<ffffffff811f98e3>] ? might_fault+0xa3/0xb0
>  [<ffffffff8195ca6d>] ? verify_iovec+0x7d/0xf0
>  [<ffffffff8194ec3e>] ___sys_sendmsg+0x37e/0x3b0
>  [<ffffffff8111ef15>] ? trace_hardirqs_on_caller+0x185/0x220
>  [<ffffffff81af979e>] ? mutex_unlock+0xe/0x10
>  [<ffffffff819a55ec>] ? netlink_insert+0xbc/0xe0
>  [<ffffffff819a65e5>] ? netlink_autobind.isra.30+0x125/0x150
>  [<ffffffff819a6520>] ? netlink_autobind.isra.30+0x60/0x150
>  [<ffffffff819a84f9>] ? netlink_bind+0x159/0x230
>  [<ffffffff811f989a>] ? might_fault+0x5a/0xb0
>  [<ffffffff8194f25e>] ? SYSC_bind+0x7e/0xd0
>  [<ffffffff8194f8cd>] __sys_sendmsg+0x4d/0x80
>  [<ffffffff8194f912>] SyS_sendmsg+0x12/0x20
>  [<ffffffff81afc692>] system_call_fastpath+0x16/0x1b

Fixing this by replacing the mode GFP_KERNEL with GFP_ATOMIC.

Signed-off-by: Benjamin Block <bebl@mageta.org>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-22 10:54:49 -07:00
zhuyj 061079ac0b sctp: not send SCTP_PEER_ADDR_CHANGE notifications with failed probe
Since the transport has always been in state SCTP_UNCONFIRMED, it
therefore wasn't active before and hasn't been used before, and it
always has been, so it is unnecessary to bug the user with a
notification.

Reported-by: Deepak Khandelwal <khandelwal.deepak.1987@gmail.com>
Suggested-by: Vlad Yasevich <vyasevich@gmail.com>
Suggested-by: Michael Tuexen <tuexen@fh-muenster.de>
Suggested-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Zhu Yanjun <Yanjun.Zhu@windriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-21 21:33:17 -07:00
Eric Dumazet dc808110bb packet: handle too big packets for PACKET_V3
af_packet can currently overwrite kernel memory by out of bound
accesses, because it assumed a [new] block can always hold one frame.

This is not generally the case, even if most existing tools do it right.

This patch clamps too long frames as API permits, and issue a one time
error on syslog.

[  394.357639] tpacket_rcv: packet too big, clamped from 5042 to 3966. macoff=82

In this example, packet header tp_snaplen was set to 3966,
and tp_len was set to 5042 (skb->len)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: f6fb8f100b ("af-packet: TPACKET_V3 flexible buffer implementation.")
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-21 16:44:28 -07:00
chas williams - CONTRACTOR 6df378d2d1 lec: Use rtnl lock/unlock when updating MTU
The LECS response contains the MTU that should be used.  Correctly
synchronize with other layers when updating.

Signed-off-by: Chas Williams - CONTRACTOR <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-21 16:31:23 -07:00
David S. Miller 02784f1b05 tipc: Fix build.
Missing semicolon in range check fix.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-19 11:16:38 -07:00
Vasily Averin 7201c1ddf7 cbq: now_rt removal
Now q->now_rt is identical to q->now and is not required anymore.

Signed-off-by: Vasily Averin <vvs@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-19 10:58:44 -07:00
Vasily Averin 73d0f37ac4 cbq: incorrectly low bandwidth setting blocks limited traffic
Mainstream commit f0f6ee1f70 ("cbq: incorrect processing of high limits")
have side effect: if cbq bandwidth setting is less than real interface
throughput non-limited traffic can delay limited traffic for a very long time.

This happen because of q->now changes incorrectly in cbq_dequeue():
in described scenario L2T is much greater than real time delay,
and q->now gets an extra boost for each transmitted packet.

Accumulated boost prevents update q->now, and blocked class can wait
very long time until (q->now >= cl->undertime) will be true again.

To fix the problem the patch updates q->now on each cbq_update() call.
L2T-related pre-modification q->now was moved to cbq_update().

My testing confirmed that it fixes the problem and did not discover
any side-effects

Fixes: f0f6ee1f70 ("cbq: incorrect processing of high limits")

Signed-off-by: Vasily Averin <vvs@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-19 10:58:44 -07:00
Erik Hugne ac32c7f705 tipc: fix message importance range check
Commit 3b4f302d85 ("tipc: eliminate
redundant locking") introduced a bug by removing the sanity check
for message importance, allowing programs to assign any value to
the msg_user field. This will mess up the packet reception logic
and may cause random link resets.

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-16 20:17:34 -07:00
Sven Eckelmann e050dbeb0d batman-adv: Fix parameter order of hlist_add_behind
1d023284c3 ("list: fix order of arguments for
hlist_add_after(_rcu)") was incorrectly rebased on top of
d9124268d8 ("batman-adv: Fix out-of-order
fragmentation support"). The parameter order change of the rebased patch was
not re-applied as expected. This causes a memory leak and can cause crashes
when out-of-order packets are received. hlist_add_behind will try to access the
uninitalized list pointers of frag_entry_new to find the previous/next entry
and may modify/read random memory locations.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-16 19:19:08 -07:00
Thomas Graf 9ce12eb16f netlink: Annotate RCU locking for seq_file walker
Silences the following sparse warnings:
net/netlink/af_netlink.c:2926:21: warning: context imbalance in 'netlink_seq_start' - wrong count at exit
net/netlink/af_netlink.c:2972:13: warning: context imbalance in 'netlink_seq_stop' - unexpected unlock

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-14 15:13:40 -07:00