Commit graph

4641 commits

Author SHA1 Message Date
Eric Dumazet 97e219b7c1 gro_cells: move to net/core/gro_cells.c
We have many gro cells users, so lets move the code to avoid
duplication.

This creates a CONFIG_GRO_CELLS option.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-08 14:38:18 -05:00
David S. Miller 3efa70d78f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
The conflict was an interaction between a bug fix in the
netvsc driver in 'net' and an optimization of the RX path
in 'net-next'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07 16:29:30 -05:00
Julian Anastasov 51ce8bd4d1 net: pending_confirm is not used anymore
When same struct dst_entry can be used for many different
neighbours we can not use it for pending confirmations.
As last step, we can remove the pending_confirm flag.

Reported-by: YueHaibing <yuehaibing@huawei.com>
Fixes: 5110effee8 ("net: Do delayed neigh confirmation.")
Fixes: f2bb4bedf3 ("ipv4: Cache output routes in fib_info nexthops.")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07 13:07:47 -05:00
Julian Anastasov 9b8805a325 sock: add sk_dst_pending_confirm flag
Add new sock flag to allow sockets to confirm neighbour.
When same struct dst_entry can be used for many different
neighbours we can not use it for pending confirmations.
As not all call paths lock the socket use full word for
the flag.

Add sk_dst_confirm as replacement for dst_confirm when
called for received packets.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07 13:07:46 -05:00
Eric Dumazet 69629464e0 udp: properly cope with csum errors
Dmitry reported that UDP sockets being destroyed would trigger the
WARN_ON(atomic_read(&sk->sk_rmem_alloc)); in inet_sock_destruct()

It turns out we do not properly destroy skb(s) that have wrong UDP
checksum.

Thanks again to syzkaller team.

Fixes : 7c13f97ffd ("udp: do fwd memory scheduling on dequeue")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07 11:19:00 -05:00
Ido Schimmel a8eca32615 net: remove ndo_neigh_{construct, destroy} from stacked devices
In commit 18bfb924f0 ("net: introduce default neigh_construct/destroy
ndo calls for L2 upper devices") we added these ndos to stacked devices
such as team and bond, so that calls will be propagated to mlxsw.

However, previous commit removed the reliance on these ndos and no new
users of these ndos have appeared since above mentioned commit. We can
therefore safely remove this dead code.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 11:25:57 -05:00
Eric Dumazet 02c1602ee7 net: remove __napi_complete()
All __napi_complete() callers have been converted to
use the more standard napi_complete_done(),
we can now remove this NAPI method for good.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-05 16:11:57 -05:00
Eric Dumazet 6e7bc478c9 net: skb_needs_check() accepts CHECKSUM_NONE for tx
My recent change missed fact that UFO would perform a complete
UDP checksum before segmenting in frags.

In this case skb->ip_summed is set to CHECKSUM_NONE.

We need to add this valid case to skb_needs_check()

Fixes: b2504a5dbe ("net: reduce skb_warn_bad_offload() noise")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-03 17:33:01 -05:00
Eric Dumazet 79e7fff47b net: remove support for per driver ndo_busy_poll()
We added generic support for busy polling in NAPI layer in linux-4.5

No network driver uses ndo_busy_poll() anymore, we can get rid
of the pointer in struct net_device_ops, and its use in sk_busy_loop()

Saves NETIF_F_BUSY_POLL features bit.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-03 17:28:29 -05:00
David S. Miller 52e01b84a2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for your net-next
tree, they are:

1) Stash ctinfo 3-bit field into pointer to nf_conntrack object from
   sk_buff so we only access one single cacheline in the conntrack
   hotpath. Patchset from Florian Westphal.

2) Don't leak pointer to internal structures when exporting x_tables
   ruleset back to userspace, from Willem DeBruijn. This includes new
   helper functions to copy data to userspace such as xt_data_to_user()
   as well as conversions of our ip_tables, ip6_tables and arp_tables
   clients to use it. Not surprinsingly, ebtables requires an ad-hoc
   update. There is also a new field in x_tables extensions to indicate
   the amount of bytes that we copy to userspace.

3) Add nf_log_all_netns sysctl: This new knob allows you to enable
   logging via nf_log infrastructure for all existing netnamespaces.
   Given the effort to provide pernet syslog has been discontinued,
   let's provide a way to restore logging using netfilter kernel logging
   facilities in trusted environments. Patch from Michal Kubecek.

4) Validate SCTP checksum from conntrack helper, from Davide Caratti.

5) Merge UDPlite conntrack and NAT helpers into UDP, this was mostly
   a copy&paste from the original helper, from Florian Westphal.

6) Reset netfilter state when duplicating packets, also from Florian.

7) Remove unnecessary check for broadcast in IPv6 in pkttype match and
   nft_meta, from Liping Zhang.

8) Add missing code to deal with loopback packets from nft_meta when
   used by the netdev family, also from Liping.

9) Several cleanups on nf_tables, one to remove unnecessary check from
   the netlink control plane path to add table, set and stateful objects
   and code consolidation when unregister chain hooks, from Gao Feng.

10) Fix harmless reference counter underflow in IPVS that, however,
    results in problems with the introduction of the new refcount_t
    type, from David Windsor.

11) Enable LIBCRC32C from nf_ct_sctp instead of nf_nat_sctp,
    from Davide Caratti.

12) Missing documentation on nf_tables uapi header, from Liping Zhang.

13) Use rb_entry() helper in xt_connlimit, from Geliang Tang.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-03 16:58:20 -05:00
Eric Dumazet 5fa8bbda38 net: use a work queue to defer net_disable_timestamp() work
Dmitry reported a warning [1] showing that we were calling
net_disable_timestamp() -> static_key_slow_dec() from a non
process context.

Grabbing a mutex while holding a spinlock or rcu_read_lock()
is not allowed.

As Cong suggested, we now use a work queue.

It is possible netstamp_clear() exits while netstamp_needed_deferred
is not zero, but it is probably not worth trying to do better than that.

netstamp_needed_deferred atomic tracks the exact number of deferred
decrements.

[1]
[ INFO: suspicious RCU usage. ]
4.10.0-rc5+ #192 Not tainted
-------------------------------
./include/linux/rcupdate.h:561 Illegal context switch in RCU read-side
critical section!

other info that might help us debug this:

rcu_scheduler_active = 2, debug_locks = 0
2 locks held by syz-executor14/23111:
 #0:  (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff83a35c35>] lock_sock
include/net/sock.h:1454 [inline]
 #0:  (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff83a35c35>]
rawv6_sendmsg+0x1e65/0x3ec0 net/ipv6/raw.c:919
 #1:  (rcu_read_lock){......}, at: [<ffffffff83ae2678>] nf_hook
include/linux/netfilter.h:201 [inline]
 #1:  (rcu_read_lock){......}, at: [<ffffffff83ae2678>]
__ip6_local_out+0x258/0x840 net/ipv6/output_core.c:160

stack backtrace:
CPU: 2 PID: 23111 Comm: syz-executor14 Not tainted 4.10.0-rc5+ #192
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:15 [inline]
 dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
 lockdep_rcu_suspicious+0x139/0x180 kernel/locking/lockdep.c:4452
 rcu_preempt_sleep_check include/linux/rcupdate.h:560 [inline]
 ___might_sleep+0x560/0x650 kernel/sched/core.c:7748
 __might_sleep+0x95/0x1a0 kernel/sched/core.c:7739
 mutex_lock_nested+0x24f/0x1730 kernel/locking/mutex.c:752
 atomic_dec_and_mutex_lock+0x119/0x160 kernel/locking/mutex.c:1060
 __static_key_slow_dec+0x7a/0x1e0 kernel/jump_label.c:149
 static_key_slow_dec+0x51/0x90 kernel/jump_label.c:174
 net_disable_timestamp+0x3b/0x50 net/core/dev.c:1728
 sock_disable_timestamp+0x98/0xc0 net/core/sock.c:403
 __sk_destruct+0x27d/0x6b0 net/core/sock.c:1441
 sk_destruct+0x47/0x80 net/core/sock.c:1460
 __sk_free+0x57/0x230 net/core/sock.c:1468
 sock_wfree+0xae/0x120 net/core/sock.c:1645
 skb_release_head_state+0xfc/0x200 net/core/skbuff.c:655
 skb_release_all+0x15/0x60 net/core/skbuff.c:668
 __kfree_skb+0x15/0x20 net/core/skbuff.c:684
 kfree_skb+0x16e/0x4c0 net/core/skbuff.c:705
 inet_frag_destroy+0x121/0x290 net/ipv4/inet_fragment.c:304
 inet_frag_put include/net/inet_frag.h:133 [inline]
 nf_ct_frag6_gather+0x1106/0x3840
net/ipv6/netfilter/nf_conntrack_reasm.c:617
 ipv6_defrag+0x1be/0x2b0 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68
 nf_hook_entry_hookfn include/linux/netfilter.h:102 [inline]
 nf_hook_slow+0xc3/0x290 net/netfilter/core.c:310
 nf_hook include/linux/netfilter.h:212 [inline]
 __ip6_local_out+0x489/0x840 net/ipv6/output_core.c:160
 ip6_local_out+0x2d/0x170 net/ipv6/output_core.c:170
 ip6_send_skb+0xa1/0x340 net/ipv6/ip6_output.c:1722
 ip6_push_pending_frames+0xb3/0xe0 net/ipv6/ip6_output.c:1742
 rawv6_push_pending_frames net/ipv6/raw.c:613 [inline]
 rawv6_sendmsg+0x2d1a/0x3ec0 net/ipv6/raw.c:927
 inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744
 sock_sendmsg_nosec net/socket.c:635 [inline]
 sock_sendmsg+0xca/0x110 net/socket.c:645
 sock_write_iter+0x326/0x600 net/socket.c:848
 do_iter_readv_writev+0x2e3/0x5b0 fs/read_write.c:695
 do_readv_writev+0x42c/0x9b0 fs/read_write.c:872
 vfs_writev+0x87/0xc0 fs/read_write.c:911
 do_writev+0x110/0x2c0 fs/read_write.c:944
 SYSC_writev fs/read_write.c:1017 [inline]
 SyS_writev+0x27/0x30 fs/read_write.c:1014
 entry_SYSCALL_64_fastpath+0x1f/0xc2
RIP: 0033:0x445559
RSP: 002b:00007f6f46fceb58 EFLAGS: 00000292 ORIG_RAX: 0000000000000014
RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 0000000000445559
RDX: 0000000000000001 RSI: 0000000020f1eff0 RDI: 0000000000000005
RBP: 00000000006e19c0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000292 R12: 0000000000700000
R13: 0000000020f59000 R14: 0000000000000015 R15: 0000000000020400
BUG: sleeping function called from invalid context at
kernel/locking/mutex.c:752
in_atomic(): 1, irqs_disabled(): 0, pid: 23111, name: syz-executor14
INFO: lockdep is turned off.
CPU: 2 PID: 23111 Comm: syz-executor14 Not tainted 4.10.0-rc5+ #192
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:15 [inline]
 dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
 ___might_sleep+0x47e/0x650 kernel/sched/core.c:7780
 __might_sleep+0x95/0x1a0 kernel/sched/core.c:7739
 mutex_lock_nested+0x24f/0x1730 kernel/locking/mutex.c:752
 atomic_dec_and_mutex_lock+0x119/0x160 kernel/locking/mutex.c:1060
 __static_key_slow_dec+0x7a/0x1e0 kernel/jump_label.c:149
 static_key_slow_dec+0x51/0x90 kernel/jump_label.c:174
 net_disable_timestamp+0x3b/0x50 net/core/dev.c:1728
 sock_disable_timestamp+0x98/0xc0 net/core/sock.c:403
 __sk_destruct+0x27d/0x6b0 net/core/sock.c:1441
 sk_destruct+0x47/0x80 net/core/sock.c:1460
 __sk_free+0x57/0x230 net/core/sock.c:1468
 sock_wfree+0xae/0x120 net/core/sock.c:1645
 skb_release_head_state+0xfc/0x200 net/core/skbuff.c:655
 skb_release_all+0x15/0x60 net/core/skbuff.c:668
 __kfree_skb+0x15/0x20 net/core/skbuff.c:684
 kfree_skb+0x16e/0x4c0 net/core/skbuff.c:705
 inet_frag_destroy+0x121/0x290 net/ipv4/inet_fragment.c:304
 inet_frag_put include/net/inet_frag.h:133 [inline]
 nf_ct_frag6_gather+0x1106/0x3840
net/ipv6/netfilter/nf_conntrack_reasm.c:617
 ipv6_defrag+0x1be/0x2b0 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68
 nf_hook_entry_hookfn include/linux/netfilter.h:102 [inline]
 nf_hook_slow+0xc3/0x290 net/netfilter/core.c:310
 nf_hook include/linux/netfilter.h:212 [inline]
 __ip6_local_out+0x489/0x840 net/ipv6/output_core.c:160
 ip6_local_out+0x2d/0x170 net/ipv6/output_core.c:170
 ip6_send_skb+0xa1/0x340 net/ipv6/ip6_output.c:1722
 ip6_push_pending_frames+0xb3/0xe0 net/ipv6/ip6_output.c:1742
 rawv6_push_pending_frames net/ipv6/raw.c:613 [inline]
 rawv6_sendmsg+0x2d1a/0x3ec0 net/ipv6/raw.c:927
 inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744
 sock_sendmsg_nosec net/socket.c:635 [inline]
 sock_sendmsg+0xca/0x110 net/socket.c:645
 sock_write_iter+0x326/0x600 net/socket.c:848
 do_iter_readv_writev+0x2e3/0x5b0 fs/read_write.c:695
 do_readv_writev+0x42c/0x9b0 fs/read_write.c:872
 vfs_writev+0x87/0xc0 fs/read_write.c:911
 do_writev+0x110/0x2c0 fs/read_write.c:944
 SYSC_writev fs/read_write.c:1017 [inline]
 SyS_writev+0x27/0x30 fs/read_write.c:1014
 entry_SYSCALL_64_fastpath+0x1f/0xc2
RIP: 0033:0x445559

Fixes: b90e5794c5 ("net: dont call jump_label_dec from irq context")
Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-03 16:11:07 -05:00
Stanislaw Gruszka 3808d34838 ethtool: do not vzalloc(0) on registers dump
If ->get_regs_len() callback return 0, we allocate 0 bytes of memory,
what print ugly warning in dmesg, which can be found further below.

This happen on mac80211 devices where ieee80211_get_regs_len() just
return 0 and driver only fills ethtool_regs structure and actually
do not provide any dump. However I assume this can happen on other
drivers i.e. when for some devices driver provide regs dump and for
others do not. Hence preventing to to print warning in ethtool code
seems to be reasonable.

ethtool: vmalloc: allocation failure: 0 bytes, mode:0x24080c2(GFP_KERNEL|__GFP_HIGHMEM|__GFP_ZERO)
<snip>
Call Trace:
[<ffffffff813bde47>] dump_stack+0x63/0x8c
[<ffffffff811b0a1f>] warn_alloc+0x13f/0x170
[<ffffffff811f0476>] __vmalloc_node_range+0x1e6/0x2c0
[<ffffffff811f0874>] vzalloc+0x54/0x60
[<ffffffff8169986c>] dev_ethtool+0xb4c/0x1b30
[<ffffffff816adbb1>] dev_ioctl+0x181/0x520
[<ffffffff816714d2>] sock_do_ioctl+0x42/0x50
<snip>
Mem-Info:
active_anon:435809 inactive_anon:173951 isolated_anon:0
 active_file:835822 inactive_file:196932 isolated_file:0
 unevictable:0 dirty:8 writeback:0 unstable:0
 slab_reclaimable:157732 slab_unreclaimable:10022
 mapped:83042 shmem:306356 pagetables:9507 bounce:0
 free:130041 free_pcp:1080 free_cma:0
Node 0 active_anon:1743236kB inactive_anon:695804kB active_file:3343288kB inactive_file:787728kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:332168kB dirty:32kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 1225424kB writeback_tmp:0kB unstable:0kB pages_scanned:0 all_unreclaimable? no
Node 0 DMA free:15900kB min:136kB low:168kB high:200kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15984kB managed:15900kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 3187 7643 7643
Node 0 DMA32 free:419732kB min:28124kB low:35152kB high:42180kB active_anon:541180kB inactive_anon:248988kB active_file:1466388kB inactive_file:389632kB unevictable:0kB writepending:0kB present:3370280kB managed:3290932kB mlocked:0kB slab_reclaimable:217184kB slab_unreclaimable:4180kB kernel_stack:160kB pagetables:984kB bounce:0kB free_pcp:2236kB local_pcp:660kB free_cma:0kB
lowmem_reserve[]: 0 0 4456 4456

Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-03 11:13:00 -05:00
Eric Dumazet 8fe809a992 net: add LINUX_MIB_PFMEMALLOCDROP counter
Debugging issues caused by pfmemalloc is often tedious.

Add a new SNMP counter to more easily diagnose these problems.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Josef Bacik <jbacik@fb.com>
Acked-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-02 23:34:19 -05:00
Eric Dumazet b9ea2a7be7 net: remove useless pfmemalloc setting
When __alloc_skb() allocates an skb from fast clone cache,
setting pfmemalloc on the clone is not needed.

Clone will be properly initialized later at skb_clone() time,
including pfmemalloc field, as it is included in the
headers_start/headers_end section which is fully copied.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-02 23:03:05 -05:00
Florian Westphal cb9c68363e skbuff: add and use skb_nfct helper
Followup patch renames skb->nfct and changes its type so add a helper to
avoid intrusive rename change later.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-02-02 14:31:53 +01:00
Eric Dumazet b2504a5dbe net: reduce skb_warn_bad_offload() noise
Dmitry reported warnings occurring in __skb_gso_segment() [1]

All SKB_GSO_DODGY producers can allow user space to feed
packets that trigger the current check.

We could prevent them from doing so, rejecting packets, but
this might add regressions to existing programs.

It turns out our SKB_GSO_DODGY handlers properly set up checksum
information that is needed anyway when packets needs to be segmented.

By checking again skb_needs_check() after skb_mac_gso_segment(),
we should remove these pesky warnings, at a very minor cost.

With help from Willem de Bruijn

[1]
WARNING: CPU: 1 PID: 6768 at net/core/dev.c:2439 skb_warn_bad_offload+0x2af/0x390 net/core/dev.c:2434
lo: caps=(0x000000a2803b7c69, 0x0000000000000000) len=138 data_len=0 gso_size=15883 gso_type=4 ip_summed=0
Kernel panic - not syncing: panic_on_warn set ...

CPU: 1 PID: 6768 Comm: syz-executor1 Not tainted 4.9.0 #5
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
 ffff8801c063ecd8 ffffffff82346bdf ffffffff00000001 1ffff100380c7d2e
 ffffed00380c7d26 0000000041b58ab3 ffffffff84b37e38 ffffffff823468f1
 ffffffff84820740 ffffffff84f289c0 dffffc0000000000 ffff8801c063ee20
Call Trace:
 [<ffffffff82346bdf>] __dump_stack lib/dump_stack.c:15 [inline]
 [<ffffffff82346bdf>] dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
 [<ffffffff81827e34>] panic+0x1fb/0x412 kernel/panic.c:179
 [<ffffffff8141f704>] __warn+0x1c4/0x1e0 kernel/panic.c:542
 [<ffffffff8141f7e5>] warn_slowpath_fmt+0xc5/0x100 kernel/panic.c:565
 [<ffffffff8356cbaf>] skb_warn_bad_offload+0x2af/0x390 net/core/dev.c:2434
 [<ffffffff83585cd2>] __skb_gso_segment+0x482/0x780 net/core/dev.c:2706
 [<ffffffff83586f19>] skb_gso_segment include/linux/netdevice.h:3985 [inline]
 [<ffffffff83586f19>] validate_xmit_skb+0x5c9/0xc20 net/core/dev.c:2969
 [<ffffffff835892bb>] __dev_queue_xmit+0xe6b/0x1e70 net/core/dev.c:3383
 [<ffffffff8358a2d7>] dev_queue_xmit+0x17/0x20 net/core/dev.c:3424
 [<ffffffff83ad161d>] packet_snd net/packet/af_packet.c:2930 [inline]
 [<ffffffff83ad161d>] packet_sendmsg+0x32ed/0x4d30 net/packet/af_packet.c:2955
 [<ffffffff834f0aaa>] sock_sendmsg_nosec net/socket.c:621 [inline]
 [<ffffffff834f0aaa>] sock_sendmsg+0xca/0x110 net/socket.c:631
 [<ffffffff834f329a>] ___sys_sendmsg+0x8fa/0x9f0 net/socket.c:1954
 [<ffffffff834f5e58>] __sys_sendmsg+0x138/0x300 net/socket.c:1988
 [<ffffffff834f604d>] SYSC_sendmsg net/socket.c:1999 [inline]
 [<ffffffff834f604d>] SyS_sendmsg+0x2d/0x50 net/socket.c:1995
 [<ffffffff84371941>] entry_SYSCALL_64_fastpath+0x1f/0xc2

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov  <dvyukov@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-01 12:02:48 -05:00
Theuns Verwoerd 160ca01424 rtnetlink: Handle IFLA_MASTER parameter when processing rtnl_newlink
Allow a master interface to be specified as one of the parameters when
creating a new interface via rtnl_newlink.  Previously this would
require invoking interface creation, waiting for it to complete, and
then separately binding that new interface to a master.

In particular, this is used when creating a macvlan child interface for
VRRP in a VRF configuration, allowing the interface creator to specify
directly what master interface should be inherited by the child,
without having to deal with asynchronous complications and potential
race conditions.

Signed-off-by: Theuns Verwoerd <theuns.verwoerd@alliedtelesis.co.nz>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-01 11:53:23 -05:00
David S. Miller 04cdf13e34 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2017-02-01

1) Some typo fixes, from Alexander Alemayhu.

2) Don't acquire state lock in get_mtu functions.
   The only rece against a dead state does not matter.
   From Florian Westphal.

3) Remove xfrm4_state_fini, it is unused for more than
   10 years. From Florian Westphal.

4) Various rcu usage improvements. From Florian Westphal.

5) Properly handle crypto arrors in ah4/ah6.
   From Gilad Ben-Yossef.

6) Try to avoid skb linearization in esp4 and esp6.

7) The esp trailer is now set up in different places,
   add a helper for this.

8) With the upcomming usage of gro_cells in IPsec,
   a gro merged skb can have a secpath. Drop it
   before freeing or reusing the skb.

9) Add a xfrm dummy network device for napi. With
   this we can use gro_cells from within xfrm,
   it allows IPsec GRO without impact on the generic
   networking code.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-01 11:22:38 -05:00
Alexei Starovoitov 4d1ceea851 net: ethtool: convert large order kmalloc allocations to vzalloc
under memory pressure 'ethtool -S' command may warn:
[ 2374.385195] ethtool: page allocation failure: order:4, mode:0x242c0c0
[ 2374.405573] CPU: 12 PID: 40211 Comm: ethtool Not tainted
[ 2374.423071] Call Trace:
[ 2374.423076]  [<ffffffff8148cb29>] dump_stack+0x4d/0x64
[ 2374.423080]  [<ffffffff811667cb>] warn_alloc_failed+0xeb/0x150
[ 2374.423082]  [<ffffffff81169cd3>] ? __alloc_pages_direct_compact+0x43/0xf0
[ 2374.423084]  [<ffffffff8116a25c>] __alloc_pages_nodemask+0x4dc/0xbf0
[ 2374.423091]  [<ffffffffa0023dc2>] ? cmd_exec+0x722/0xcd0 [mlx5_core]
[ 2374.423095]  [<ffffffff811b3dcc>] alloc_pages_current+0x8c/0x110
[ 2374.423097]  [<ffffffff81168859>] alloc_kmem_pages+0x19/0x90
[ 2374.423099]  [<ffffffff81186e5e>] kmalloc_order_trace+0x2e/0xe0
[ 2374.423101]  [<ffffffff811c0084>] __kmalloc+0x204/0x220
[ 2374.423105]  [<ffffffff816c269e>] dev_ethtool+0xe4e/0x1f80
[ 2374.423106]  [<ffffffff816b967e>] ? dev_get_by_name_rcu+0x5e/0x80
[ 2374.423108]  [<ffffffff816d6926>] dev_ioctl+0x156/0x560
[ 2374.423111]  [<ffffffff811d4c68>] ? mem_cgroup_commit_charge+0x78/0x3c0
[ 2374.423117]  [<ffffffff8169d542>] sock_do_ioctl+0x42/0x50
[ 2374.423119]  [<ffffffff8169d9c3>] sock_ioctl+0x1b3/0x250
[ 2374.423121]  [<ffffffff811f0f42>] do_vfs_ioctl+0x92/0x580
[ 2374.423123]  [<ffffffff8100222b>] ? do_audit_syscall_entry+0x4b/0x70
[ 2374.423124]  [<ffffffff8100287c>] ? syscall_trace_enter_phase1+0xfc/0x120
[ 2374.423126]  [<ffffffff811f14a9>] SyS_ioctl+0x79/0x90
[ 2374.423127]  [<ffffffff81002bb0>] do_syscall_64+0x50/0xa0
[ 2374.423129]  [<ffffffff817e19bc>] entry_SYSCALL64_slow_path+0x25/0x25

~1160 mlx5 counters ~= order 4 allocation which is unlikely to succeed
under memory pressure. Convert them to vzalloc() as ethtool_get_regs() does.
Also take care of drivers without counters similar to
commit 67ae7cf1ee ("ethtool: Allow zero-length register dumps again")
and reduce warn_on to warn_on_once.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-31 13:28:06 -05:00
David Ahern 30357d7d8a lwtunnel: remove device arg to lwtunnel_build_state
Nothing about lwt state requires a device reference, so remove the
input argument.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-30 15:14:22 -05:00
Steffen Klassert f991bb9da1 net: Drop secpath on free after gro merge.
With a followup patch, a gro merged skb can have a secpath.
So drop it before freeing or reusing the skb.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-01-30 06:45:38 +01:00
David S. Miller 4e8f2fc1a5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Two trivial overlapping changes conflicts in MPLS and mlx5.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-28 10:33:06 -05:00
Eric Dumazet 158f323b98 net: adjust skb->truesize in pskb_expand_head()
Slava Shwartsman reported a warning in skb_try_coalesce(), when we
detect skb->truesize is completely wrong.

In his case, issue came from IPv6 reassembly coping with malicious
datagrams, that forced various pskb_may_pull() to reallocate a bigger
skb->head than the one allocated by NIC driver before entering GRO
layer.

Current code does not change skb->truesize, leaving this burden to
callers if they care enough.

Blindly changing skb->truesize in pskb_expand_head() is not
easy, as some producers might track skb->truesize, for example
in xmit path for back pressure feedback (sk->sk_wmem_alloc)

We can detect the cases where it should be safe to change
skb->truesize :

1) skb is not attached to a socket.
2) If it is attached to a socket, destructor is sock_edemux()

My audit gave only two callers doing their own skb->truesize
manipulation.

I had to remove skb parameter in sock_edemux macro when
CONFIG_INET is not set to avoid a compile error.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Slava Shwartsman <slavash@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-27 12:03:29 -05:00
Robert Shearman 85c814016c lwtunnel: Fix oops on state free after encap module unload
When attempting to free lwtunnel state after the module for the encap
has been unloaded an oops occurs:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: lwtstate_free+0x18/0x40
[..]
task: ffff88003e372380 task.stack: ffffc900001fc000
RIP: 0010:lwtstate_free+0x18/0x40
RSP: 0018:ffff88003fd83e88 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88002bbb3380 RCX: ffff88000c91a300
[..]
Call Trace:
 <IRQ>
 free_fib_info_rcu+0x195/0x1a0
 ? rt_fibinfo_free+0x50/0x50
 rcu_process_callbacks+0x2d3/0x850
 ? rcu_process_callbacks+0x296/0x850
 __do_softirq+0xe4/0x4cb
 irq_exit+0xb0/0xc0
 smp_apic_timer_interrupt+0x3d/0x50
 apic_timer_interrupt+0x93/0xa0
[..]
Code: e8 6e c6 fc ff 89 d8 5b 5d c3 bb de ff ff ff eb f4 66 90 66 66 66 66 90 55 48 89 e5 53 0f b7 07 48 89 fb 48 8b 04 c5 00 81 d5 81 <48> 8b 40 08 48 85 c0 74 13 ff d0 48 8d 7b 20 be 20 00 00 00 e8

The problem is after the module for the encap can be unloaded the
corresponding ops is removed and is thus NULL here.

Modules implementing lwtunnel ops should not be allowed to unload
while there is state alive using those ops, so grab the module
reference for the ops on creating lwtunnel state and of course release
the reference when freeing the state.

Fixes: 1104d9ba44 ("lwtunnel: Add destroy state operation")
Signed-off-by: Robert Shearman <rshearma@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-24 16:21:36 -05:00
Robert Shearman 88ff7334f2 net: Specify the owning module for lwtunnel ops
Modules implementing lwtunnel ops should not be allowed to unload
while there is state alive using those ops, so specify the owning
module for all lwtunnel ops.

Signed-off-by: Robert Shearman <rshearma@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-24 16:21:36 -05:00
Daniel Borkmann d1b662adcd bpf: allow option for setting bpf_l4_csum_replace from scratch
When programs need to calculate the csum from scratch for small UDP
packets and use bpf_l4_csum_replace() to feed the result from helpers
like bpf_csum_diff(), then we need a flag besides BPF_F_MARK_MANGLED_0
that would ignore the case of current csum being 0, and which would
still allow for the helper to set the csum and transform when needed
to CSUM_MANGLED_0.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-24 14:46:06 -05:00
Daniel Borkmann 2492d3b867 bpf: enable load bytes helper for filter/reuseport progs
BPF_PROG_TYPE_SOCKET_FILTER are used in various facilities such as
for SO_REUSEPORT and packet fanout demuxing, packet filtering, kcm,
etc, and yet the only facility they can use is BPF_LD with {BPF_ABS,
BPF_IND} for single byte/half/word access.

Direct packet access is only restricted to tc programs right now,
but we can still facilitate usage by allowing skb_load_bytes() helper
added back then in 05c74e5e53 ("bpf: add bpf_skb_load_bytes helper")
that calls skb_header_pointer() similarly to bpf_load_pointer(), but
for stack buffers with larger access size.

Name the previous sk_filter_func_proto() as bpf_base_func_proto()
since this is used everywhere else as well, similarly for the ctx
converter, that is, bpf_convert_ctx_access().

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-24 14:46:05 -05:00
Daniel Borkmann 4faf940dd8 bpf: simplify __is_valid_access test on cb
The __is_valid_access() test for cb[] from 62c7989b24 ("bpf: allow
b/h/w/dw access for bpf's cb in ctx") was done unnecessarily complex,
we can just simplify it the same way as recent fix from 2d071c643f
("bpf, trace: make ctx access checks more robust") did. Overflow can
never happen as size is 1/2/4/8 depending on access.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-24 14:46:05 -05:00
Mahesh Bandewar 1b7cd0044e net: remove duplicate code.
netdev_rx_handler_register() checks to see if the handler is already
busy which was recently separated into netdev_is_rx_handler_busy(). So
use the same function inside register() to avoid code duplication.
Essentially this change should be a no-op

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-20 12:22:25 -05:00
Phil Sutter 9af15c3825 device: Implement a bus agnostic dev_num_vf routine
Now that pci_bus_type has num_vf callback set, dev_num_vf can be
implemented in a bus type independent way and the check for whether a
PCI device is being handled in rtnetlink can be dropped.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-20 11:43:17 -05:00
David Ahern 9ed59592e3 lwtunnel: fix autoload of lwt modules
Trying to add an mpls encap route when the MPLS modules are not loaded
hangs. For example:

    CONFIG_MPLS=y
    CONFIG_NET_MPLS_GSO=m
    CONFIG_MPLS_ROUTING=m
    CONFIG_MPLS_IPTUNNEL=m

    $ ip route add 10.10.10.10/32 encap mpls 100 via inet 10.100.1.2

The ip command hangs:
root       880   826  0 21:25 pts/0    00:00:00 ip route add 10.10.10.10/32 encap mpls 100 via inet 10.100.1.2

    $ cat /proc/880/stack
    [<ffffffff81065a9b>] call_usermodehelper_exec+0xd6/0x134
    [<ffffffff81065efc>] __request_module+0x27b/0x30a
    [<ffffffff814542f6>] lwtunnel_build_state+0xe4/0x178
    [<ffffffff814aa1e4>] fib_create_info+0x47f/0xdd4
    [<ffffffff814ae451>] fib_table_insert+0x90/0x41f
    [<ffffffff814a8010>] inet_rtm_newroute+0x4b/0x52
    ...

modprobe is trying to load rtnl-lwt-MPLS:

root       881     5  0 21:25 ?        00:00:00 /sbin/modprobe -q -- rtnl-lwt-MPLS

and it hangs after loading mpls_router:

    $ cat /proc/881/stack
    [<ffffffff81441537>] rtnl_lock+0x12/0x14
    [<ffffffff8142ca2a>] register_netdevice_notifier+0x16/0x179
    [<ffffffffa0033025>] mpls_init+0x25/0x1000 [mpls_router]
    [<ffffffff81000471>] do_one_initcall+0x8e/0x13f
    [<ffffffff81119961>] do_init_module+0x5a/0x1e5
    [<ffffffff810bd070>] load_module+0x13bd/0x17d6
    ...

The problem is that lwtunnel_build_state is called with rtnl lock
held preventing mpls_init from registering.

Given the potential references held by the time lwtunnel_build_state it
can not drop the rtnl lock to the load module. So, extract the module
loading code from lwtunnel_build_state into a new function to validate
the encap type. The new function is called while converting the user
request into a fib_config which is well before any table, device or
fib entries are examined.

Fixes: 745041e2aa ("lwtunnel: autoload of lwt modules")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-18 17:07:14 -05:00
Eric Dumazet 7be2c82cfd net: fix harmonize_features() vs NETIF_F_HIGHDMA
Ashizuka reported a highmem oddity and sent a patch for freescale
fec driver.

But the problem root cause is that core networking stack
must ensure no skb with highmem fragment is ever sent through
a device that does not assert NETIF_F_HIGHDMA in its features.

We need to call illegal_highdma() from harmonize_features()
regardless of CSUM checks.

Fixes: ec5f061564 ("net: Kill link between CSUM and SG features.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Pravin Shelar <pshelar@ovn.org>
Reported-by: "Ashizuka, Yuusuke" <ashiduka@jp.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-18 15:24:27 -05:00
Eran Ben Elisha 31a86d1372 net: ethtool: Initialize buffer when querying device channel settings
Ethtool channels respond struct was uninitialized when querying device
channel boundaries settings. As a result, unreported fields by the driver
hold garbage.  This may cause sending unsupported params to driver.

Fixes: 8bf3686204 ('ethtool: ensure channel counts are within bounds ...')
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
CC: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-18 14:58:23 -05:00
Robert Shearman aefb4d4ad8 net: AF-specific RTM_GETSTATS attributes
Add the functionality for including address-family-specific per-link
stats in RTM_GETSTATS messages. This is done through adding a new
IFLA_STATS_AF_SPEC attribute under which address family attributes are
nested and then the AF-specific attributes can be further nested. This
follows the model of IFLA_AF_SPEC on RTM_*LINK messages and it has the
advantage of presenting an easily extended hierarchy. The rtnl_af_ops
structure is extended to provide AFs with the opportunity to fill and
provide the size of their stats attributes.

One alternative would have been to provide AFs with the ability to add
attributes directly into the RTM_GETSTATS message without a nested
hierarchy. I discounted this approach as it increases the rate at
which the 32 attribute number space is used up and it makes
implementation a little more tricky for stats dump resuming (at the
moment the order in which attributes are added to the message has to
match the numeric order of the attributes).

Another alternative would have been to register per-AF RTM_GETSTATS
handlers. I discounted this approach as I perceived a common use-case
to be getting all the stats for an interface and this approach would
necessitate multiple requests/dumps to retrieve them all.

Signed-off-by: Robert Shearman <rshearma@brocade.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-17 14:38:43 -05:00
Colin Ian King 57b68ec2a7 flow dissector: check if arp_eth is null rather than arp
arp is being checked instead of arp_eth to see if the call to
__skb_header_pointer failed. Fix this by checking arp_eth is
null instead of arp.   Also fix to use length hlen rather than
hlen - sizeof(_arp); thanks to Eric Dumazet for spotting
this latter issue.

CoverityScan CID#1396428 ("Logically dead code") on 2nd
arp comparison (which should be arp_eth instead).

Fixes: commit 55733350e5 ("flow disector: ARP support")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-16 13:48:48 -05:00
Eric Dumazet c1ce1560a1 secure_seq: fix sparse errors
Fixes following warnings :

net/core/secure_seq.c:125:28: warning: incorrect type in argument 1
(different base types)
net/core/secure_seq.c:125:28:    expected unsigned int const [unsigned]
[usertype] a
net/core/secure_seq.c:125:28:    got restricted __be32 [usertype] saddr
net/core/secure_seq.c:125:35: warning: incorrect type in argument 2
(different base types)
net/core/secure_seq.c:125:35:    expected unsigned int const [unsigned]
[usertype] b
net/core/secure_seq.c:125:35:    got restricted __be32 [usertype] daddr
net/core/secure_seq.c:125:43: warning: cast from restricted __be16
net/core/secure_seq.c:125:61: warning: restricted __be16 degrades to
integer

Fixes: 7cd23e5300 ("secure_seq: use SipHash in place of MD5")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-12 15:57:10 -05:00
Wei Yongjun 79471b10d6 lwt_bpf: bpf_lwt_prog_cmp() can be static
Fixes the following sparse warning:

net/core/lwt_bpf.c:355:5: warning:
 symbol 'bpf_lwt_prog_cmp' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-12 10:04:40 -05:00
Daniel Borkmann 62c7989b24 bpf: allow b/h/w/dw access for bpf's cb in ctx
When structs are used to store temporary state in cb[] buffer that is
used with programs and among tail calls, then the generated code will
not always access the buffer in bpf_w chunks. We can ease programming
of it and let this act more natural by allowing for aligned b/h/w/dw
sized access for cb[] ctx member. Various test cases are attached as
well for the selftest suite. Potentially, this can also be reused for
other program types to pass data around.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-12 10:00:31 -05:00
Daniel Borkmann 6b8cc1d11e bpf: pass original insn directly to convert_ctx_access
Currently, when calling convert_ctx_access() callback for the various
program types, we pass in insn->dst_reg, insn->src_reg, insn->off from
the original instruction. This information is needed to rewrite the
instruction that is based on the user ctx structure into a kernel
representation for the ctx. As we'd like to allow access size beyond
just BPF_W, we'd need also insn->code for that in order to decode the
original access size. Given that, lets just pass insn directly to the
convert_ctx_access() callback and work on that to not clutter the
callback with even more arguments we need to pass when everything is
already contained in insn. So lets go through that once, no functional
change.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-12 10:00:31 -05:00
Ursula Braun 526735ddc0 net: fix AF_SMC related typo
When introducing the new socket family AF_SMC in
commit ac7138746e ("smc: establish new socket family"),
a typo in af_family_clock_key_strings has slipped in.
This patch repairs it.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Fixes: ac7138746e ("smc: establish new socket family")
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-12 09:47:01 -05:00
Florian Fainelli 738b35ccee net: core: Make netif_wake_subqueue a wrapper
netif_wake_subqueue() is duplicating the same thing that netif_tx_wake_queue()
does, so make it call it directly after looking up the queue from the index.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-12 09:18:05 -05:00
David S. Miller 02ac5d1487 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Two AF_* families adding entries to the lockdep tables
at the same time.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-11 14:43:39 -05:00
Linus Torvalds ba836a6f5a Merge branch 'akpm' (patches from Andrew)
Merge fixes from Andrew Morton:
 "27 fixes.

  There are three patches that aren't actually fixes. They're simple
  function renamings which are nice-to-have in mainline as ongoing net
  development depends on them."

* akpm: (27 commits)
  timerfd: export defines to userspace
  mm/hugetlb.c: fix reservation race when freeing surplus pages
  mm/slab.c: fix SLAB freelist randomization duplicate entries
  zram: support BDI_CAP_STABLE_WRITES
  zram: revalidate disk under init_lock
  mm: support anonymous stable page
  mm: add documentation for page fragment APIs
  mm: rename __page_frag functions to __page_frag_cache, drop order from drain
  mm: rename __alloc_page_frag to page_frag_alloc and __free_page_frag to page_frag_free
  mm, memcg: fix the active list aging for lowmem requests when memcg is enabled
  mm: don't dereference struct page fields of invalid pages
  mailmap: add codeaurora.org names for nameless email commits
  signal: protect SIGNAL_UNKILLABLE from unintentional clearing.
  mm: pmd dirty emulation in page fault handler
  ipc/sem.c: fix incorrect sem_lock pairing
  lib/Kconfig.debug: fix frv build failure
  mm: get rid of __GFP_OTHER_NODE
  mm: fix remote numa hits statistics
  mm: fix devm_memremap_pages crash, use mem_hotplug_{begin, done}
  ocfs2: fix crash caused by stale lvb with fsdlm plugin
  ...
2017-01-11 11:15:15 -08:00
Simon Horman 55733350e5 flow disector: ARP support
Allow dissection of (R)ARP operation hardware and protocol addresses
for Ethernet hardware and IPv4 protocol addresses.

There are currently no users of FLOW_DISSECTOR_KEY_ARP.
A follow-up patch will allow FLOW_DISSECTOR_KEY_ARP to be used by the
flower classifier.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-11 11:02:47 -05:00
Eric Dumazet 7cfd5fd5a9 gro: use min_t() in skb_gro_reset_offset()
On 32bit arches, (skb->end - skb->data) is not 'unsigned int',
so we shall use min_t() instead of min() to avoid a compiler error.

Fixes: 1272ce87fa ("gro: Enter slow-path if there is no tailroom")
Reported-by: kernel test robot <fengguang.wu@intel.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-11 08:15:40 -05:00
Alexander Duyck 8c2dd3e4a4 mm: rename __alloc_page_frag to page_frag_alloc and __free_page_frag to page_frag_free
Patch series "Page fragment updates", v4.

This patch series takes care of a few cleanups for the page fragments
API.

First we do some renames so that things are much more consistent.  First
we move the page_frag_ portion of the name to the front of the functions
names.  Secondly we split out the cache specific functions from the
other page fragment functions by adding the word "cache" to the name.

Finally I added a bit of documentation that will hopefully help to
explain some of this.  I plan to revisit this later as we get things
more ironed out in the near future with the changes planned for the DMA
setup to support eXpress Data Path.

This patch (of 3):

This patch renames the page frag functions to be more consistent with
other APIs.  Specifically we place the name page_frag first in the name
and then have either an alloc or free call name that we append as the
suffix.  This makes it a bit clearer in terms of naming.

In addition we drop the leading double underscores since we are
technically no longer a backing interface and instead the front end that
is called from the networking APIs.

Link: http://lkml.kernel.org/r/20170104023854.13451.67390.stgit@localhost.localdomain
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-10 18:31:55 -08:00
Herbert Xu 1272ce87fa gro: Enter slow-path if there is no tailroom
The GRO path has a fast-path where we avoid calling pskb_may_pull
and pskb_expand by directly accessing frag0.  However, this should
only be done if we have enough tailroom in the skb as otherwise
we'll have to expand it later anyway.

This patch adds the check by capping frag0_len with the skb tailroom.

Fixes: cb18978cbf ("gro: Open-code final pskb_may_pull")
Reported-by: Slava Shwartsman <slavash@mellanox.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-10 21:26:12 -05:00
Anna, Suman 5d722b3024 net: add the AF_QIPCRTR entries to family name tables
Commit bdabad3e36 ("net: Add Qualcomm IPC router") introduced a
new address family. Update the family name tables accordingly so
that the lockdep initialization can use the proper names for this
family.

Cc: Courtney Cavin <courtney.cavin@sonymobile.com>
Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Suman Anna <s-anna@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-10 20:50:59 -05:00
Eric Dumazet d9584d8ccc net: skb_flow_get_be16() can be static
Removes following sparse complain :

net/core/flow_dissector.c:70:8: warning: symbol 'skb_flow_get_be16'
was not declared. Should it be static?

Fixes: 972d3876fa ("flow dissector: ICMP support")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-10 13:30:13 -05:00
Alexei Starovoitov 39f19ebbf5 bpf: rename ARG_PTR_TO_STACK
since ARG_PTR_TO_STACK is no longer just pointer to stack
rename it to ARG_PTR_TO_MEM and adjust comment.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 16:56:27 -05:00