Commit graph

42181 commits

Author SHA1 Message Date
Joe Perches c37a2dfa67 netfilter: Convert FWINV<[foo]> macros and uses to NF_INVF
netfilter uses multiple FWINV #defines with identical form that hide a
specific structure variable and dereference it with a invflags member.

$ git grep "#define FWINV"
include/linux/netfilter_bridge/ebtables.h:#define FWINV(bool,invflg) ((bool) ^ !!(info->invflags & invflg))
net/bridge/netfilter/ebtables.c:#define FWINV2(bool, invflg) ((bool) ^ !!(e->invflags & invflg))
net/ipv4/netfilter/arp_tables.c:#define FWINV(bool, invflg) ((bool) ^ !!(arpinfo->invflags & (invflg)))
net/ipv4/netfilter/ip_tables.c:#define FWINV(bool, invflg) ((bool) ^ !!(ipinfo->invflags & (invflg)))
net/ipv6/netfilter/ip6_tables.c:#define FWINV(bool, invflg) ((bool) ^ !!(ip6info->invflags & (invflg)))
net/netfilter/xt_tcpudp.c:#define FWINVTCP(bool, invflg) ((bool) ^ !!(tcpinfo->invflags & (invflg)))

Consolidate these macros into a single NF_INVF macro.

Miscellanea:

o Neaten the alignment around these uses
o A few lines are > 80 columns for intelligibility

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-07-03 10:55:07 +02:00
Moritz Sichert f1504307b9 netfilter: Remove references to obsolete CONFIG_IP_ROUTE_FWMARK
This option was removed in commit 47dcf0cb10 ("[NET]: Rethink mark field
in struct flowi").

Signed-off-by: Moritz Sichert <moritz+linux@sichert.me>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-07-01 16:37:07 +02:00
Joe Perches 4ae89ad924 etherdevice.h & bridge: netfilter: Add and use ether_addr_equal_masked
There are code duplications of a masked ethernet address comparison here
so make it a separate function instead.

Miscellanea:

o Neaten alignment of FWINV macro uses to make it clearer for the reader

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-07-01 16:37:06 +02:00
Pablo Neira Ayuso 468b021b94 netfilter: x_tables: simplify ip{6}table_mangle_hook()
No need for a special case to handle NF_INET_POST_ROUTING, this is
basically the same handling as for prerouting, input, forward.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-07-01 16:37:02 +02:00
Arturo Borrero 0071e184a5 netfilter: nf_tables: add support for inverted logic in nft_lookup
Introduce a new configuration option for this expression, which allows users
to invert the logic of set lookups.

In _init() we will now return EINVAL if NFT_LOOKUP_F_INV is in anyway
related to a map lookup.

The code in the _eval() function has been untangled and updated to sopport the
XOR of options, as we should consider 4 cases:
 * lookup false, invert false -> NFT_BREAK
 * lookup false, invert true -> return w/o NFT_BREAK
 * lookup true, invert false -> return w/o NFT_BREAK
 * lookup true, invert true -> NFT_BREAK

Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-06-24 11:03:29 +02:00
Pablo Neira Ayuso 82bec71d46 netfilter: nf_tables: get rid of NFT_BASECHAIN_DISABLED
This flag was introduced to restore rulesets from the new netdev
family, but since 5ebe0b0eec ("netfilter: nf_tables: destroy
basechain and rules on netdevice removal") the ruleset is released
once the netdev is gone.

This also removes nft_register_basechain() and
nft_unregister_basechain() since they have no clients anymore after
this rework.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-06-24 11:03:28 +02:00
Florian Westphal 3183ab8997 netfilter: conntrack: allow increasing bucket size via sysctl too
No need to restrict this to module parameter.

We export a copy of the real hash size -- when user alters the value we
allocate the new table, copy entries etc before we update the real size
to the requested one.

This is also needed because the real size is used by concurrent readers
and cannot be changed without synchronizing the conntrack generation
seqcnt.

We only allow changing this value from the initial net namespace.

Tested using http-client-benchmark vs. httpterm with concurrent

while true;do
 echo $RANDOM > /proc/sys/net/netfilter/nf_conntrack_buckets
done

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-06-24 11:03:28 +02:00
Pablo Neira Ayuso 8eee54be73 netfilter: nft_hash: support deletion of inactive elements
New elements are inactive in the preparation phase, and its
NFT_SET_ELEM_BUSY_MASK flag is set on.

This busy flag doesn't allow us to delete it from the same transaction,
following a sequence like:

	begin transaction
	add element X
	delete element X
	end transaction

This sequence is valid and may be triggered by robots. To resolve this
problem, allow deactivating elements that are active in the current
generation (ie. those that has been just added in this batch).

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-06-24 11:03:27 +02:00
Pablo Neira Ayuso 4e5001651f netfilter: nft_rbtree: check for next generation when deactivating elements
set->ops->deactivate() is invoked from nft_del_setelem() that happens
from the transaction path, so we have to check if the object is active
in the next generation, not the current.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-06-24 11:03:26 +02:00
Pablo Neira Ayuso 37a9cc5255 netfilter: nf_tables: add generation mask to sets
Similar to ("netfilter: nf_tables: add generation mask to tables").

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-06-24 11:03:26 +02:00
Pablo Neira Ayuso 664b0f8cd8 netfilter: nf_tables: add generation mask to chains
Similar to ("netfilter: nf_tables: add generation mask to tables").

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-06-24 11:03:25 +02:00
Pablo Neira Ayuso f2a6d76676 netfilter: nf_tables: add generation mask to tables
This patch addresses two problems:

1) The netlink dump is inconsistent when interfering with an ongoing
   transaction update for several reasons:

1.a) We don't honor the internal NFT_TABLE_INACTIVE flag, and we should
     be skipping these inactive objects in the dump.

1.b) We perform speculative deletion during the preparation phase, that
     may result in skipping active objects.

1.c) The listing order changes, which generates noise when tracking
     incremental ruleset update via tools like git or our own
     testsuite.

2) We don't allow to add and to update the object in the same batch,
   eg. add table x; add table x { flags dormant\; }.

In order to resolve these problems:

1) If the user requests a deletion, the object becomes inactive in the
   next generation. Then, ignore objects that scheduled to be deleted
   from the lookup path, as they will be effectively removed in the
   next generation.

2) From the get/dump path, if the object is not currently active, we
   skip it.

3) Support 'add X -> update X' sequence from a transaction.

After this update, we obtain a consistent list as long as we stay
in the same generation. The userspace side can detect interferences
through the generation counter so it can restart the dumping.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-06-24 11:03:24 +02:00
Pablo Neira Ayuso 889f7ee7c6 netfilter: nf_tables: add generic macros to check for generation mask
Thus, we can reuse these to check the genmask of any object type, not
only rules. This is required now that tables, chain and sets will get a
generation mask field too in follow up patches.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-06-24 11:03:24 +02:00
Vishwanath Pai 7643507fe8 netfilter: xt_NFLOG: nflog-range does not truncate packets
li->u.ulog.copy_len is currently ignored by the kernel, we should truncate
the packet to either li->u.ulog.copy_len (if set) or copy_range before
sending it to userspace. 0 is a valid input for copy_len, so add a new
flag to indicate whether this was option was specified by the user or not.

Add two flags to indicate whether nflog-size/copy_len was set or not.
XT_NFLOG_F_COPY_LEN is for XT_NFLOG and NFLOG_F_COPY_LEN for nfnetlink_log

On the userspace side, this was initially represented by the option
nflog-range, this will be replaced by --nflog-size now. --nflog-range would
still exist but does not do anything.

Reported-by: Joe Dollard <jdollard@akamai.com>
Reviewed-by: Josh Hunt <johunt@akamai.com>
Signed-off-by: Vishwanath Pai <vpai@akamai.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-06-24 11:03:23 +02:00
Liping Zhang e1dbbc5907 netfilter: nf_reject_ipv4: don't send tcp RST if the packet is non-TCP
In iptables, if the user add a rule to send tcp RST and specify the
non-TCP protocol, such as UDP, kernel will reject this request. But
in nftables, this validity check only occurs in nft tool, i.e. only
in userspace.

This means that user can add such a rule like follows via nfnetlink:
  "nft add rule filter forward ip protocol udp reject with tcp reset"

This will generate some confusing tcp RST packets. So we should send
tcp RST only when it is TCP packet.

Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-06-24 11:03:22 +02:00
Eric W. Biederman 9847371a84 netfilter: Allow xt_owner in any user namespace
Making this work is a little tricky as it really isn't kosher to
change the xt_owner_match_info in a check function.

Without changing xt_owner_match_info we need to know the user
namespace the uids and gids are specified in.  In the common case
net->user_ns == current_user_ns().  Verify net->user_ns ==
current_user_ns() in owner_check so we can later assume it in
owner_mt.

In owner_check also verify that all of the uids and gids specified are
in net->user_ns and that the expected min/max relationship exists
between the uids and gids in xt_owner_match_info.

In owner_mt get the network namespace from the outgoing socket, as this
must be the same network namespace as the netfilter rules, and use that
network namespace to find the user namespace the uids and gids in
xt_match_owner_info are encoded in.  Then convert from their encoded
from into the kernel internal format for uids and gids and perform the
owner match.

Similar to ping_group_range, this code does not try to detect
noncontiguous UID/GID ranges.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Kevin Cernekee <cernekee@chromium.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-06-23 13:58:55 +02:00
Florian Westphal 6c8dee9842 netfilter: move zone info into struct nf_conn
Curently we store zone information as a conntrack extension.
This has one drawback: for every lookup we need to fetch the zone data
from the extension area.

This change place the zone data directly into the main conntrack object
structure and then removes the zone conntrack extension.

The zone data is just 4 bytes, it fits into a padding hole before
the tuplehash info, so we do not even increase the nf_conn structure size.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-06-23 13:33:12 +02:00
Shivani Bhardwaj 7e53e7f8ca netfilter: nf_log: Remove NULL check
If 'logger' was NULL, there would be a direct jump to the label 'out',
since it has already been checked for NULL, remove this unnecessary
check.

Signed-off-by: Shivani Bhardwaj <shivanib134@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-06-23 13:32:43 +02:00
Florian Westphal 5a75cdebab netfilter: conntrack: align nf_conn on cacheline boundary
increases struct size by 32 bytes (288 -> 320), but it is the right thing,
else any attempt to (re-)arrange nf_conn members by cacheline won't work.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-06-23 13:31:54 +02:00
Liping Zhang 36f959c491 netfilter: xt_TRACE: add explicitly nf_logger_find_get call
Consider such situation, if nf_log_ipv4 kernel module is not installed,
and the user add a following iptables rule:
  # iptables -t raw -I PREROUTING -j TRACE

There will be no trace log generated until the user install nf_log_ipv4
module manully. So we should add request related nf_log module
appropriately here.

Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-06-23 13:26:49 +02:00
Liping Zhang f3bb53338e netfilter: nf_log: handle NFPROTO_INET properly in nf_logger_[find_get|put]
When we request NFPROTO_INET, it means both NFPROTO_IPV4 and NFPROTO_IPV6.

Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-06-23 13:24:42 +02:00
Xiubo Li a6d0bae148 netfilter: x_tables: fix possible ZERO_SIZE_PTR pointer dereferencing error.
Since we cannot make sure that the 'hook_mask' will always be none
zero here. If it equals to zero, the num_hooks will be zero too,
and then kmalloc() will return ZERO_SIZE_PTR, which is (void *)16.

Then the following error check will fails:
  ops = kmalloc(sizeof(*ops) * num_hooks, GFP_KERNEL);
  if (ops == NULL)
          return ERR_PTR(-ENOMEM);

So this patch will fix this with just doing the zero check before
kmalloc() is called.

Maybe the case above will never happen here, but in theory.

Signed-off-by: Xiubo Li <lixiubo@cmss.chinamobile.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-06-23 12:13:06 +02:00
Florian Westphal 436a850dd9 netfilter: helper: avoid extra expectation iterations on unregister
The expectation table is not duplicated per net namespace anymore, so we can move
the expectation table and conntrack table iteration out of the per-net loop.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-06-07 17:26:51 +02:00
Tobin C Harding 402f9030cb bridge: netfilter: checkpatch data type fixes
checkpatch produces data type 'checks'.

This patch amends them by changing, for example:
uint8_t -> u8

Signed-off-by: Tobin C Harding <me@tobin.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-06-07 17:15:29 +02:00
Michal Kubeček 30759219f5 net: disable fragment reassembly if high_thresh is zero
Before commit 6d7b857d54 ("net: use lib/percpu_counter API for
fragmentation mem accounting"), setting the reassembly high threshold
to 0 prevented fragment reassembly as first fragment would be always
evicted before second could be added to the queue. While inefficient,
some users apparently relied on this method.

Since the commit mentioned above, a percpu counter is used for
reassembly memory accounting and high batch size avoids taking slow path
in most common scenarios. As a result, a whole full sized packet can be
reassembled without the percpu counter's main counter changing its value
so that even with high_thresh set to 0, fragmented packets can be still
reassembled and processed.

Add explicit check preventing reassembly if high threshold is zero.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-05 22:56:42 -04:00
Andrew Lunn 83c0afaec7 net: dsa: Add new binding implementation
The existing DSA binding has a number of limitations and problems. The
main problem is that it cannot represent a switch as a linux device,
hanging off some bus. It is limited to one CPU port. The DSA platform
device is artificial, and does not really represent hardware.

Implement a new binding which can be embedded into any type of node on
a bus to represent one switch device, and its links to other switches.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-04 14:29:55 -07:00
Andrew Lunn e755e49eb3 net: dsa: Make mdio bus optional
The switch may want to instantiate its own MDIO bus. Only do it
centrally if the switch has not already created one, and the read op
is implemented.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-04 14:29:54 -07:00
Andrew Lunn 39a7f2a4eb net: dsa: Refactor selection of tag ops into a function
Replace the two switch statements with an array lookup, and store the
result in the dsa tree structure. The drivers no longer need to know
the selected tag protocol, so remove it from the dsa switch structure.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-04 14:29:54 -07:00
Andrew Lunn 9b8e895c4e net: dsa: Split up creating/destroying of DSA and CPU ports
Refactor the code to setup a single DSA/CPU port into a function of
its own, and export it, so it can be used by the new binding.

Similarly, refactor the destroy code into a function.  When destroying
the ports, don't put the of node. They should be released at the end
along with the normal ports.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-04 14:29:53 -07:00
Andrew Lunn 66472fc04e net: dsa: Copy the routing table into the switch structure
The new binding will not have a chip data structure, it will place the
routing directly into the switch structure. To enable backwards
compatibility, copy the routing from the chip data into the switch
structure.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-04 14:29:53 -07:00
Andrew Lunn 4a7704ffa8 net: dsa: Remove dynamic allocate of routing table
With a maximum of four switches, the size of the routing table is the
same as the pointer to it. Removing it makes the code simpler.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-04 14:29:53 -07:00
Andrew Lunn 189b0d93ec net: dsa: Move port device node into port structure
Move the port device node structure into the port structure, from the
chip data. This information is needed in the next step of implementing
the new binding.

The chip data structure is used while parsing the whole old binding,
before the individual switch structures exist. With the new bindings,
this is reversed, the switches exist first, and the interconnections
between the switches is derived from the individual switch
bindings. Thus this chip data structure becomes unneeded.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
eviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-04 14:29:53 -07:00
Andrew Lunn c8b098086b net: dsa: Add a ports structure and use it in the switch structure
There are going to be more per-port members added to the switch
structure. So add a port structure and move the netdev into it.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-04 14:29:53 -07:00
Andrew Lunn 149cafd790 net: dsa: tag_{e}dsa.c: Remove dependency on platform data
The platform data nr_chips is used when validating a received packet,
to ensure it comes from a know switch chip. The number of possible
switches is limited to DSA_MAX_SWITCHES, so use this as the first
validation step. The new binding allows holes in the dst->ds[] array,
so also ensure ensure there is a valid dsa_switch for this packet.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-04 14:29:53 -07:00
Andrew Lunn 6e8e862ded net: dsa: slave: Remove MDIO address from switch MDIO bus name
The DSA layer should no longer assume the switch is connected to an
MDIO bus. As a result, we cannot use the address on the MDIO bus when
forming the name of the switches internal MDIO bus for its builtin and
possibly external PHYs. The switch index is sufficient to make the
name unique, so drop the MDIO address.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-04 14:29:52 -07:00
Andrew Lunn 0e57604408 net: dsa: slave: chip data is optional, don't dereference NULL
The new binding does not make use of dsa_chip_data, a.k.a cd.  When
retrieving the size of the EEPROM attached to a switch, don't assume
there is a cd attached to the switch structure.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-04 14:29:52 -07:00
David S. Miller 76f21b9900 net: Add docbook description for 'mtu' arg to skb_gso_validate_mtu()
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-03 22:56:28 -07:00
David S. Miller 3b55a537d0 sctp: Fix warning in sctp_packet_transmit_chunk()
size_t objects should be printed with %Z printf format.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-03 22:53:26 -07:00
Joe Perches 9b6d53985f rxrpc: Use pr_<level> and pr_fmt, reduce object size a few KB
Use the more common kernel logging style and reduce object size.

The logging message prefix changes from a mixture of
"RxRPC:" and "RXRPC:" to "af_rxrpc: ".

$ size net/rxrpc/built-in.o*
   text	   data	    bss	    dec	    hex	filename
  64172	   1972	   8304	  74448	  122d0	net/rxrpc/built-in.o.new
  67512	   1972	   8304	  77788	  12fdc	net/rxrpc/built-in.o.old

Miscellanea:

o Consolidate the ASSERT macros to use a single pr_err call with
  decimal and hexadecimal output and a stringified #OP argument

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-03 19:41:31 -04:00
Marcelo Ricardo Leitner 942b3235bf sctp: improve debug message to also log curr pkt and new chunk size
This is useful for debugging packet sizes.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Tested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-03 19:37:22 -04:00
Marcelo Ricardo Leitner 90017accff sctp: Add GSO support
SCTP has this pecualiarity that its packets cannot be just segmented to
(P)MTU. Its chunks must be contained in IP segments, padding respected.
So we can't just generate a big skb, set gso_size to the fragmentation
point and deliver it to IP layer.

This patch takes a different approach. SCTP will now build a skb as it
would be if it was received using GRO. That is, there will be a cover
skb with protocol headers and children ones containing the actual
segments, already segmented to a way that respects SCTP RFCs.

With that, we can tell skb_segment() to just split based on frag_list,
trusting its sizes are already in accordance.

This way SCTP can benefit from GSO and instead of passing several
packets through the stack, it can pass a single large packet.

v2:
- Added support for receiving GSO frames, as requested by Dave Miller.
- Clear skb->cb if packet is GSO (otherwise it's not used by SCTP)
- Added heuristics similar to what we have in TCP for not generating
  single GSO packets that fills cwnd.
v3:
- consider sctphdr size in skb_gso_transport_seglen()
- rebased due to 5c7cdf339a ("gso: Remove arbitrary checks for
  unsupported GSO")

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Tested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-03 19:37:21 -04:00
Marcelo Ricardo Leitner 3acb50c18d sctp: delay as much as possible skb_linearize
This patch is a preparation for the GSO one. In order to successfully
handle GSO packets on rx path we must not call skb_linearize, otherwise
it defeats any gain GSO may have had.

This patch thus delays as much as possible the call to skb_linearize,
leaving it to sctp_inq_pop() moment. For that the sanity checks
performed now know how to deal with fragments.

One positive side-effect of this is that if the socket is backlogged it
will have the chance of doing it on backlog processing instead of
during softirq.

With this move, it's evident that a check for non-linearity in
sctp_inq_pop was ineffective and is now removed. Note that a similar
check is performed a bit below this one.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Tested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-03 19:37:21 -04:00
Marcelo Ricardo Leitner ae7ef81ef0 skbuff: introduce skb_gso_validate_mtu
skb_gso_network_seglen is not enough for checking fragment sizes if
skb is using GSO_BY_FRAGS as we have to check frag per frag.

This patch introduces skb_gso_validate_mtu, based on the former, which
will wrap the use case inside it as all calls to skb_gso_network_seglen
were to validate if it fits on a given TMU, and improve the check.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Tested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-03 19:37:21 -04:00
Marcelo Ricardo Leitner 3953c46c3a sk_buff: allow segmenting based on frag sizes
This patch allows segmenting a skb based on its frags sizes instead of
based on a fixed value.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Tested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-03 19:37:21 -04:00
Marcelo Ricardo Leitner 57c0565039 skbuff: export skb_gro_receive
sctp GSO requires it and sctp can be compiled as a module, so we need to
export this function.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Tested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-03 19:37:21 -04:00
Zhang Shengju 684ff4ef5e ovs: set name assign type of internal port
Set name_assign_type of internal port to NET_NAME_USER.

Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-02 18:05:47 -04:00
Linus Torvalds 6b15d6650c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix negative error code usage in ATM layer, from Stefan Hajnoczi.

 2) If CONFIG_SYSCTL is disabled, the default TTL is not initialized
    properly.  From Ezequiel Garcia.

 3) Missing spinlock init in mvneta driver, from Gregory CLEMENT.

 4) Missing unlocks in hwmb error paths, also from Gregory CLEMENT.

 5) Fix deadlock on team->lock when propagating features, from Ivan
    Vecera.

 6) Work around buffer offset hw bug in alx chips, from Feng Tang.

 7) Fix double listing of SCTP entries in sctp_diag dumps, from Xin
    Long.

 8) Various statistics bug fixes in mlx4 from Eric Dumazet.

 9) Fix some randconfig build errors wrt fou ipv6 from Arnd Bergmann.

10) All of l2tp was namespace aware, but the ipv6 support code was not
    doing so.  From Shmulik Ladkani.

11) Handle on-stack hrtimers properly in pktgen, from Guenter Roeck.

12) Propagate MAC changes properly through VLAN devices, from Mike
    Manning.

13) Fix memory leak in bnx2x_init_one(), from Vitaly Kuznetsov.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (62 commits)
  sfc: Track RPS flow IDs per channel instead of per function
  usbnet: smsc95xx: fix link detection for disabled autonegotiation
  virtio_net: fix virtnet_open and virtnet_probe competing for try_fill_recv
  bnx2x: avoid leaking memory on bnx2x_init_one() failures
  fou: fix IPv6 Kconfig options
  openvswitch: update checksum in {push,pop}_mpls
  sctp: sctp_diag should dump sctp socket type
  net: fec: update dirty_tx even if no skb
  vlan: Propagate MAC address to VLANs
  atm: iphase: off by one in rx_pkt()
  atm: firestream: add more reserved strings
  vxlan: Accept user specified MTU value when create new vxlan link
  net: pktgen: Call destroy_hrtimer_on_stack()
  timer: Export destroy_hrtimer_on_stack()
  net: l2tp: Make l2tp_ip6 namespace aware
  Documentation: ip-sysctl.txt: clarify secure_redirects
  sfc: use flow dissector helpers for aRFS
  ieee802154: fix logic error in ieee802154_llsec_parse_dev_addr
  net: nps_enet: Disable interrupts before napi reschedule
  net/lapb: tuse %*ph to dump buffers
  ...
2016-05-31 22:28:28 -07:00
Arnd Bergmann 95e4daa820 fou: fix IPv6 Kconfig options
The Kconfig options I added to work around broken compilation ended
up screwing up things more, as I used the wrong symbol to control
compilation of the file, resulting in IPv6 fou support to never be built
into the kernel.

Changing CONFIG_NET_FOU_IPV6_TUNNELS to CONFIG_IPV6_FOU fixes that
problem, I had renamed the symbol in one location but not the other,
and as the file is never being used by other kernel code, this did not
lead to a build failure that I would have caught.

After that fix, another issue with the same patch becomes obvious, as we
'select INET6_TUNNEL', which is related to IPV6_TUNNEL, but not the same,
and this can still cause the original build failure when IPV6_TUNNEL is
not built-in but IPV6_FOU is. The fix is equally trivial, we just need
to select the right symbol.

I have successfully build 350 randconfig kernels with this patch
and verified that the driver is now being built.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reported-by: Valentin Rothberg <valentinrothberg@gmail.com>
Fixes: fabb13db44 ("fou: add Kconfig options for IPv6 support")
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-31 14:07:49 -07:00
Simon Horman bc7cc5999f openvswitch: update checksum in {push,pop}_mpls
In the case of CHECKSUM_COMPLETE the skb checksum should be updated in
{push,pop}_mpls() as they the type in the ethernet header.

As suggested by Pravin Shelar.

Cc: Pravin Shelar <pshelar@nicira.com>
Fixes: 25cd9ba0ab ("openvswitch: Add basic MPLS support to kernel")
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-31 13:51:42 -07:00
Xin Long 40eb90e9cc sctp: sctp_diag should dump sctp socket type
Now we cannot distinguish that one sk is a udp or sctp style when
we use ss to dump sctp_info. it's necessary to dump it as well.

For sctp_diag, ss support is not officially available, thus there
are no official users of this yet, so we can add this field in the
middle of sctp_info without breaking user API.

v1->v2:
  - move 'sctpi_s_type' field to the end of struct sctp_info, so
    that it won't cause incompatibility with applications already
    built.
  - add __reserved3 in sctp_info to make sure sctp_info is 8-byte
    alignment.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-31 11:59:06 -07:00