Commit graph

782062 commits

Author SHA1 Message Date
Cong Wang
cc4dfb7f70 rds: fix two RCU related problems
When a rds sock is bound, it is inserted into the bind_hash_table
which is protected by RCU. But when releasing rds sock, after it
is removed from this hash table, it is freed immediately without
respecting RCU grace period. This could cause some use-after-free
as reported by syzbot.

Mark the rds sock with SOCK_RCU_FREE before inserting it into the
bind_hash_table, so that it would be always freed after a RCU grace
period.

The other problem is in rds_find_bound(), the rds sock could be
freed in between rhashtable_lookup_fast() and rds_sock_addref(),
so we need to extend RCU read lock protection in rds_find_bound()
to close this race condition.

Reported-and-tested-by: syzbot+8967084bcac563795dc6@syzkaller.appspotmail.com
Reported-by: syzbot+93a5839deb355537440f@syzkaller.appspotmail.com
Cc: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Cc: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Cc: rds-devel@oss.oracle.com
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oarcle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-12 00:09:19 -07:00
Kai-Heng Feng
6ad5690199 r8169: Clear RTL_FLAG_TASK_*_PENDING when clearing RTL_FLAG_TASK_ENABLED
After system suspend, sometimes the r8169 doesn't work when ethernet
cable gets pluggued.

This issue happens because rtl_reset_work() doesn't get called from
rtl8169_runtime_resume(), after system suspend.

In rtl_task(), RTL_FLAG_TASK_* only gets cleared if this condition is
met:
if (!netif_running(dev) ||
    !test_bit(RTL_FLAG_TASK_ENABLED, tp->wk.flags))
    ...

If RTL_FLAG_TASK_ENABLED was cleared during system suspend while
RTL_FLAG_TASK_RESET_PENDING was set, the next rtl_schedule_task() won't
schedule task as the flag is still there.

So in addition to clearing RTL_FLAG_TASK_ENABLED, also clears other
flags.

Cc: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-12 00:05:33 -07:00
Haishuang Yan
51dc63e391 erspan: fix error handling for erspan tunnel
When processing icmp unreachable message for erspan tunnel, tunnel id
should be erspan_net_id instead of ipgre_net_id.

Fixes: 84e54fe0a5 ("gre: introduce native tunnel support for ERSPAN")
Cc: William Tu <u9012063@gmail.com>
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-11 23:50:54 -07:00
Haishuang Yan
5a64506b5c erspan: return PACKET_REJECT when the appropriate tunnel is not found
If erspan tunnel hasn't been established, we'd better send icmp port
unreachable message after receive erspan packets.

Fixes: 84e54fe0a5 ("gre: introduce native tunnel support for ERSPAN")
Cc: William Tu <u9012063@gmail.com>
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-11 23:50:53 -07:00
Willem de Bruijn
0297c1c2ea tcp: rate limit synflood warnings further
Convert pr_info to net_info_ratelimited to limit the total number of
synflood warnings.

Commit 946cedccbd ("tcp: Change possible SYN flooding messages")
rate limits synflood warnings to one per listener.

Workloads that open many listener sockets can still see a high rate of
log messages. Syzkaller is one frequent example.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-11 23:34:20 -07:00
Hauke Mehrtens
2d946e5bcd MIPS: lantiq: dma: add dev pointer
dma_zalloc_coherent() now crashes if no dev pointer is given.
Add a dev pointer to the ltq_dma_channel structure and fill it in the
driver using it.

This fixes a bug introduced in kernel 4.19.

Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-11 23:33:19 -07:00
David S. Miller
4ecdf77091 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for you net tree:

1) Remove duplicated include at the end of UDP conntrack, from Yue Haibing.

2) Restore conntrack dependency on xt_cluster, from Martin Willi.

3) Fix splat with GSO skbs from the checksum target, from Florian Westphal.

4) Rework ct timeout support, the template strategy to attach custom timeouts
   is not correct since it will not work in conjunction with conntrack zones
   and we have a possible free after use when removing the rule due to missing
   refcounting. To fix these problems, do not use conntrack template at all
   and set custom timeout on the already valid conntrack object. This
   fix comes with a preparation patch to simplify timeout adjustment by
   initializating the first position of the timeout array for all of the
   existing trackers. Patchset from Florian Westphal.

5) Fix missing dependency on from IPv4 chain NAT type, from Florian.

6) Release chain reference counter from the flush path, from Taehee Yoo.

7) After flushing an iptables ruleset, conntrack hooks are unregistered
   and entries are left stale to be cleaned up by the timeout garbage
   collector. No TCP tracking is done on established flows by this time.
   If ruleset is reloaded, then hooks are registered again and TCP
   tracking is restored, which considers packets to be invalid. Clear
   window tracking to exercise TCP flow pickup from the middle given that
   history is lost for us. Again from Florian.

8) Fix crash from netlink interface with CONFIG_NF_CONNTRACK_TIMEOUT=y
   and CONFIG_NF_CT_NETLINK_TIMEOUT=n.

9) Broken CT target due to returning incorrect type from
   ctnl_timeout_find_get().

10) Solve conntrack clash on NF_REPEAT verdicts too, from Michal Vaner.

11) Missing conversion of hashlimit sysctl interface to new API, from
    Cong Wang.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-11 21:17:30 -07:00
Cong Wang
1286df269f netfilter: xt_hashlimit: use s->file instead of s->private
After switching to the new procfs API, it is supposed to
retrieve the private pointer from PDE_DATA(file_inode(s->file)),
s->private is no longer referred.

Fixes: 1cd6718272 ("netfilter/x_tables: switch to proc_create_seq_private")
Reported-by: Sami Farin <hvtaifwkbgefbaei@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Tested-by: Sami Farin <hvtaifwkbgefbaei@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-11 01:35:32 +02:00
Michal 'vorner' Vaner
ad18d7bf68 netfilter: nfnetlink_queue: Solve the NFQUEUE/conntrack clash for NF_REPEAT
NF_REPEAT places the packet at the beginning of the iptables chain
instead of accepting or rejecting it right away. The packet however will
reach the end of the chain and continue to the end of iptables
eventually, so it needs the same handling as NF_ACCEPT and NF_DROP.

Fixes: 368982cd7d ("netfilter: nfnetlink_queue: resolve clash for unconfirmed conntracks")
Signed-off-by: Michal 'vorner' Vaner <michal.vaner@avast.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-11 01:31:47 +02:00
Pablo Neira Ayuso
99e25d071f netfilter: cttimeout: ctnl_timeout_find_get() returns incorrect pointer to type
Compiler did not catch incorrect typing in the rcu hook assignment.

 % nfct add timeout test-tcp inet tcp established 100 close 10 close_wait 10
 % iptables -I OUTPUT -t raw -p tcp -j CT --timeout test-tcp
 dmesg - xt_CT: Timeout policy `test-tcp' can only be used by L3 protocol number 25000

The CT target bails out with incorrect layer 3 protocol number.

Fixes: 6c1fd7dc48 ("netfilter: cttimeout: decouple timeout policy from nfnetlink_cttimeout object")
Reported-by: Harsha Sharma <harshasharmaiitr@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-11 01:31:10 +02:00
Pablo Neira Ayuso
a874752a10 netfilter: conntrack: timeout interface depend on CONFIG_NF_CONNTRACK_TIMEOUT
Now that cttimeout support for nft_ct is in place, these should depend
on CONFIG_NF_CONNTRACK_TIMEOUT otherwise we can crash when dumping the
policy if this option is not enabled.

[   71.600121] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[...]
[   71.600141] CPU: 3 PID: 7612 Comm: nft Not tainted 4.18.0+ #246
[...]
[   71.600188] Call Trace:
[   71.600201]  ? nft_ct_timeout_obj_dump+0xc6/0xf0 [nft_ct]

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-11 01:30:25 +02:00
Florian Westphal
f94e63801a netfilter: conntrack: reset tcp maxwin on re-register
Doug Smythies says:
  Sometimes it is desirable to temporarily disable, or clear,
  the iptables rule set on a computer being controlled via a
  secure shell session (SSH). While unwise on an internet facing
  computer, I also do it often on non-internet accessible computers
  while testing. Recently, this has become problematic, with the
  SSH session being dropped upon re-load of the rule set.

The problem is that when all rules are deleted, conntrack hooks get
unregistered.

In case the rules are re-added later, its possible that tcp window
has moved far enough so that all packets are considered invalid (out of
window) until entry expires (which can take forever, default
established timeout is 5 days).

Fix this by clearing maxwin of existing tcp connections on register.

v2: don't touch entries on hook removal.
v3: remove obsolete expiry check.

Reported-by: Doug Smythies <dsmythies@telus.net>
Fixes: 4d3a57f23d ("netfilter: conntrack: do not enable connection tracking unless needed")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-11 01:29:24 +02:00
Kristian Evensen
7c5cca3588 qmi_wwan: Support dynamic config on Quectel EP06
Quectel EP06 (and EM06/EG06) supports dynamic configuration of USB
interfaces, without the device changing VID/PID or configuration number.
When the configuration is updated and interfaces are added/removed, the
interface numbers change. This means that the current code for matching
EP06 does not work.

This patch removes the current EP06 interface number match, and replaces
it with a match on class, subclass and protocol. Unfortunately, matching
on those three alone is not enough, as the diag interface exports the
same values as QMI. The other serial interfaces + adb export different
values and do not match.

The diag interface only has two endpoints, while the QMI interface has
three. I have therefore added a check for number of interfaces, and we
ignore the interface if the number of endpoints equals two.

Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
Acked-by: Bjørn Mork <bjorn@mork.no>
Acked-by: Dan Williams <dcbw@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-10 10:48:54 -07:00
Kuninori Morimoto
3ebb17446b ethernet: renesas: convert to SPDX identifiers
This patch updates license to use SPDX-License-Identifier
instead of verbose license text.

Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-10 10:11:53 -07:00
Taehee Yoo
5d407b071d ip: frags: fix crash in ip_do_fragment()
A kernel crash occurrs when defragmented packet is fragmented
in ip_do_fragment().
In defragment routine, skb_orphan() is called and
skb->ip_defrag_offset is set. but skb->sk and
skb->ip_defrag_offset are same union member. so that
frag->sk is not NULL.
Hence crash occurrs in skb->sk check routine in ip_do_fragment() when
defragmented packet is fragmented.

test commands:
   %iptables -t nat -I POSTROUTING -j MASQUERADE
   %hping3 192.168.4.2 -s 1000 -p 2000 -d 60000

splat looks like:
[  261.069429] kernel BUG at net/ipv4/ip_output.c:636!
[  261.075753] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[  261.083854] CPU: 1 PID: 1349 Comm: hping3 Not tainted 4.19.0-rc2+ #3
[  261.100977] RIP: 0010:ip_do_fragment+0x1613/0x2600
[  261.106945] Code: e8 e2 38 e3 fe 4c 8b 44 24 18 48 8b 74 24 08 e9 92 f6 ff ff 80 3c 02 00 0f 85 da 07 00 00 48 8b b5 d0 00 00 00 e9 25 f6 ff ff <0f> 0b 0f 0b 44 8b 54 24 58 4c 8b 4c 24 18 4c 8b 5c 24 60 4c 8b 6c
[  261.127015] RSP: 0018:ffff8801031cf2c0 EFLAGS: 00010202
[  261.134156] RAX: 1ffff1002297537b RBX: ffffed0020639e6e RCX: 0000000000000004
[  261.142156] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880114ba9bd8
[  261.150157] RBP: ffff880114ba8a40 R08: ffffed0022975395 R09: ffffed0022975395
[  261.158157] R10: 0000000000000001 R11: ffffed0022975394 R12: ffff880114ba9ca4
[  261.166159] R13: 0000000000000010 R14: ffff880114ba9bc0 R15: dffffc0000000000
[  261.174169] FS:  00007fbae2199700(0000) GS:ffff88011b400000(0000) knlGS:0000000000000000
[  261.183012] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  261.189013] CR2: 00005579244fe000 CR3: 0000000119bf4000 CR4: 00000000001006e0
[  261.198158] Call Trace:
[  261.199018]  ? dst_output+0x180/0x180
[  261.205011]  ? save_trace+0x300/0x300
[  261.209018]  ? ip_copy_metadata+0xb00/0xb00
[  261.213034]  ? sched_clock_local+0xd4/0x140
[  261.218158]  ? kill_l4proto+0x120/0x120 [nf_conntrack]
[  261.223014]  ? rt_cpu_seq_stop+0x10/0x10
[  261.227014]  ? find_held_lock+0x39/0x1c0
[  261.233008]  ip_finish_output+0x51d/0xb50
[  261.237006]  ? ip_fragment.constprop.56+0x220/0x220
[  261.243011]  ? nf_ct_l4proto_register_one+0x5b0/0x5b0 [nf_conntrack]
[  261.250152]  ? rcu_is_watching+0x77/0x120
[  261.255010]  ? nf_nat_ipv4_out+0x1e/0x2b0 [nf_nat_ipv4]
[  261.261033]  ? nf_hook_slow+0xb1/0x160
[  261.265007]  ip_output+0x1c7/0x710
[  261.269005]  ? ip_mc_output+0x13f0/0x13f0
[  261.273002]  ? __local_bh_enable_ip+0xe9/0x1b0
[  261.278152]  ? ip_fragment.constprop.56+0x220/0x220
[  261.282996]  ? nf_hook_slow+0xb1/0x160
[  261.287007]  raw_sendmsg+0x21f9/0x4420
[  261.291008]  ? dst_output+0x180/0x180
[  261.297003]  ? sched_clock_cpu+0x126/0x170
[  261.301003]  ? find_held_lock+0x39/0x1c0
[  261.306155]  ? stop_critical_timings+0x420/0x420
[  261.311004]  ? check_flags.part.36+0x450/0x450
[  261.315005]  ? _raw_spin_unlock_irq+0x29/0x40
[  261.320995]  ? _raw_spin_unlock_irq+0x29/0x40
[  261.326142]  ? cyc2ns_read_end+0x10/0x10
[  261.330139]  ? raw_bind+0x280/0x280
[  261.334138]  ? sched_clock_cpu+0x126/0x170
[  261.338995]  ? check_flags.part.36+0x450/0x450
[  261.342991]  ? __lock_acquire+0x4500/0x4500
[  261.348994]  ? inet_sendmsg+0x11c/0x500
[  261.352989]  ? dst_output+0x180/0x180
[  261.357012]  inet_sendmsg+0x11c/0x500
[ ... ]

v2:
 - clear skb->sk at reassembly routine.(Eric Dumarzet)

Fixes: fa0f527358 ("ip: use rb trees for IP frag queue.")
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-09 14:50:56 -07:00
Vakul Garg
52ea992cfa net/tls: Set count of SG entries if sk_alloc_sg returns -ENOSPC
tls_sw_sendmsg() allocates plaintext and encrypted SG entries using
function sk_alloc_sg(). In case the number of SG entries hit
MAX_SKB_FRAGS, sk_alloc_sg() returns -ENOSPC and sets the variable for
current SG index to '0'. This leads to calling of function
tls_push_record() with 'sg_encrypted_num_elem = 0' and later causes
kernel crash. To fix this, set the number of SG elements to the number
of elements in plaintext/encrypted SG arrays in case sk_alloc_sg()
returns -ENOSPC.

Fixes: 3c4d755915 ("tls: kernel TLS support")
Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-09 08:10:01 -07:00
David S. Miller
0e1f4c76be Merge branch 'ena-fixes'
Netanel Belgazal says:

====================
bug fixes for ENA Ethernet driver
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-09 07:59:56 -07:00
Netanel Belgazal
37dff155dc net: ena: fix incorrect usage of memory barriers
Added memory barriers where they were missing to support multiple
architectures, and removed redundant ones.

As part of removing the redundant memory barriers and improving
performance, we moved to more relaxed versions of memory barriers,
as well as to the more relaxed version of writel - writel_relaxed,
while maintaining correctness.

Signed-off-by: Netanel Belgazal <netanel@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-09 07:59:56 -07:00
Netanel Belgazal
28abf4e9c9 net: ena: fix missing calls to READ_ONCE
Add READ_ONCE calls where necessary (for example when iterating
over a memory field that gets updated by the hardware).

Signed-off-by: Netanel Belgazal <netanel@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-09 07:59:56 -07:00
Netanel Belgazal
944b28aa29 net: ena: fix missing lock during device destruction
acquire the rtnl_lock during device destruction to avoid
using partially destroyed device.

ena_remove() shares almost the same logic as ena_destroy_device(),
so use ena_destroy_device() and avoid duplications.

Signed-off-by: Netanel Belgazal <netanel@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-09 07:59:56 -07:00
Netanel Belgazal
fe870c77ef net: ena: fix potential double ena_destroy_device()
ena_destroy_device() can potentially be called twice.
To avoid this, check that the device is running and
only then proceed destroying it.

Signed-off-by: Netanel Belgazal <netanel@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-09 07:59:55 -07:00
Netanel Belgazal
cfa324a514 net: ena: fix device destruction to gracefully free resources
When ena_destroy_device() is called from ena_suspend(), the device is
still reachable from the driver. Therefore, the driver can send a command
to the device to free all resources.
However, in all other cases of calling ena_destroy_device(), the device is
potentially in an error state and unreachable from the driver. In these
cases the driver must not send commands to the device.

The current implementation does not request resource freeing from the
device even when possible. We add the graceful parameter to
ena_destroy_device() to enable resource freeing when possible, and
use it in ena_suspend().

Signed-off-by: Netanel Belgazal <netanel@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-09 07:59:55 -07:00
Netanel Belgazal
ef5b0771d2 net: ena: fix driver when PAGE_SIZE == 64kB
The buffer length field in the ena rx descriptor is 16 bit, and the
current driver passes a full page in each ena rx descriptor.
When PAGE_SIZE equals 64kB or more, the buffer length field becomes
zero.
To solve this issue, limit the ena Rx descriptor to use 16kB even
when allocating 64kB kernel pages. This change would not impact ena
device functionality, as 16kB is still larger than maximum MTU.

Signed-off-by: Netanel Belgazal <netanel@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-09 07:59:55 -07:00
Netanel Belgazal
772ed869f5 net: ena: fix surprise unplug NULL dereference kernel crash
Starting with driver version 1.5.0, in case of a surprise device
unplug, there is a race caused by invoking ena_destroy_device()
from two different places. As a result, the readless register might
be accessed after it was destroyed.

Signed-off-by: Netanel Belgazal <netanel@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-09 07:59:55 -07:00
Vincent Whitchurch
5cf4a8532c tcp: really ignore MSG_ZEROCOPY if no SO_ZEROCOPY
According to the documentation in msg_zerocopy.rst, the SO_ZEROCOPY
flag was introduced because send(2) ignores unknown message flags and
any legacy application which was accidentally passing the equivalent of
MSG_ZEROCOPY earlier should not see any new behaviour.

Before commit f214f915e7 ("tcp: enable MSG_ZEROCOPY"), a send(2) call
which passed the equivalent of MSG_ZEROCOPY without setting SO_ZEROCOPY
would succeed.  However, after that commit, it fails with -ENOBUFS.  So
it appears that the SO_ZEROCOPY flag fails to fulfill its intended
purpose.  Fix it.

Fixes: f214f915e7 ("tcp: enable MSG_ZEROCOPY")
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-07 23:11:06 -07:00
Cong Wang
a162c35114 net_sched: properly cancel netlink dump on failure
When nla_put*() fails after nla_nest_start(), we need
to call nla_nest_cancel() to cancel the message, otherwise
we end up calling nla_nest_end() like a success.

Fixes: 0ed5269f9e ("net/sched: add tunnel option support to act_tunnel_key")
Cc: Davide Caratti <dcaratti@redhat.com>
Cc: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-07 23:05:07 -07:00
Juergen Gross
8edfe2e992 xen/netfront: fix waiting for xenbus state change
Commit 822fb18a82 ("xen-netfront: wait xenbus state change when load
module manually") added a new wait queue to wait on for a state change
when the module is loaded manually. Unfortunately there is no wakeup
anywhere to stop that waiting.

Instead of introducing a new wait queue rename the existing
module_unload_q to module_wq and use it for both purposes (loading and
unloading).

As any state change of the backend might be intended to stop waiting
do the wake_up_all() in any case when netback_changed() is called.

Fixes: 822fb18a82 ("xen-netfront: wait xenbus state change when load module manually")
Cc: <stable@vger.kernel.org> #4.18
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-07 23:03:16 -07:00
Maciej S. Szmigiero
f74dd480cf r8169: set TxConfig register after TX / RX is enabled, just like RxConfig
Commit 3559d81e76 ("r8169: simplify rtl_hw_start_8169") changed order of
two register writes:
1) Caused RxConfig to be written before TX / RX is enabled,
2) Caused TxConfig to be written before TX / RX is enabled.

At least on XIDs 10000000 ("RTL8169sb/8110sb") and
18000000 ("RTL8169sc/8110sc") such writes are ignored by the chip, leaving
values in these registers intact.

Change 1) was reverted by
commit 05212ba813 ("r8169: set RxConfig after tx/rx is enabled for RTL8169sb/8110sb devices"),
however change 2) wasn't.

In practice, this caused TxConfig's "InterFrameGap time" and "Max DMA Burst
Size per Tx DMA Burst" bits to be zero dramatically reducing TX performance
(in my tests it dropped from around 500Mbps to around 50Mbps).

This patch fixes the issue by moving TxConfig register write a bit later in
the code so it happens after TX / RX is already enabled.

Fixes: 05212ba813 ("r8169: set RxConfig after tx/rx is enabled for RTL8169sb/8110sb devices")
Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-07 14:52:23 -07:00
Cong Wang
8f5c5fcf35 tipc: call start and done ops directly in __tipc_nl_compat_dumpit()
__tipc_nl_compat_dumpit() uses a netlink_callback on stack,
so the only way to align it with other ->dumpit() call path
is calling tipc_dump_start() and tipc_dump_done() directly
inside it. Otherwise ->dumpit() would always get NULL from
cb->args[].

But tipc_dump_start() uses sock_net(cb->skb->sk) to retrieve
net pointer, the cb->skb here doesn't set skb->sk, the net pointer
is saved in msg->net instead, so introduce a helper function
__tipc_dump_start() to pass in msg->net.

Ying pointed out cb->args[0...3] are already used by other
callbacks on this call path, so we can't use cb->args[0] any
more, use cb->args[4] instead.

Fixes: 9a07efa9ae ("tipc: switch to rhashtable iterator")
Reported-and-tested-by: syzbot+e93a2c41f91b8e2c7d9b@syzkaller.appspotmail.com
Cc: Jon Maloy <jon.maloy@ericsson.com>
Cc: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-06 21:49:18 -07:00
David S. Miller
6da410d97f mlx5e-fixes-2018-09-05
-----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJbkHDZAAoJEEg/ir3gV/o+X8MH/2STl828uBTkYNwxWOz9szWm
 xkMs9pEQzTNjPGoVNhTVStA3TdeyPOQccRh8JBykJstSmFD37OAaIsvGmmRfetKi
 wb0JFi9QzNt47hJKX8DLbif3oUh1KmFpVStT4D9eAWNK0ke5Rn4qiUm43T6WejDk
 1r7W8j5ZNUbhfdEj5jMYc3H8sZFNcyTxiFGmZSlovnV6JWMZdIOnwmd+VghPxbSE
 27dN7cpVRxh+wpEmZVVRyQULe0WgXE/CQ/xU9efkroabwGpjs6ivU10QpBOFJ8Js
 c/+muwKNMPG6n4wxblUSwXshLmKXLiKXPnC6iNNf9+Q77F8gL37DcsVHKGPYrho=
 =NhLj
 -----END PGP SIGNATURE-----

Merge tag 'mlx5e-fixes-2018-09-05' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
Mellanox, mlx5 fixes 2018-09-05

This pull request contains some fixes for mlx5 etherent netdevice and
core driver.

Please pull and let me know if there's any problem.

For -stable v4.9:
('net/mlx5: Fix debugfs cleanup in the device init/remove flow')

For -stable v4.12:
("net/mlx5: E-Switch, Fix memory leak when creating switchdev mode FDB tables")

For -stable v4.13:
("net/mlx5: Fix use-after-free in self-healing flow")

For -stable v4.14:
("net/mlx5: Check for error in mlx5_attach_interface")

For -stable v4.15:
("net/mlx5: Fix not releasing read lock when adding flow rules")

For -stable v4.17:
("net/mlx5: Fix possible deadlock from lockdep when adding fte to fg")

For -stable v4.18:
("net/mlx5: Use u16 for Work Queue buffer fragment size")
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-06 07:56:04 -07:00
David S. Miller
fce471e3c1 Merge branch 'iucv-fixes'
Julian Wiedmann says:

====================
net/iucv: fixes 2018-09-05

please apply three straight-forward fixes for iucv. One that prevents
leaking the skb on malformed inbound packets, one to fix the error
handling on transmit error, and one to get rid of a compile warning.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-05 22:32:22 -07:00
Julian Wiedmann
b7f4156554 net/iucv: declare iucv_path_table_empty() as static
Fixes a compile warning.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-05 22:32:22 -07:00
Julian Wiedmann
b2f543949a net/af_iucv: fix skb handling on HiperTransport xmit error
When sending an skb, afiucv_hs_send() bails out on various error
conditions. But currently the caller has no way of telling whether the
skb was freed or not - resulting in potentially either
a) leaked skbs from iucv_send_ctrl(), or
b) double-free's from iucv_sock_sendmsg().

As dev_queue_xmit() will always consume the skb (even on error), be
consistent and also free the skb from all other error paths. This way
callers no longer need to care about managing the skb.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-05 22:32:22 -07:00
Julian Wiedmann
222440996d net/af_iucv: drop inbound packets with invalid flags
Inbound packets may have any combination of flag bits set in their iucv
header. If we don't know how to handle a specific combination, drop the
skb instead of leaking it.

To clarify what error is returned in this case, replace the hard-coded
0 with the corresponding macro.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-05 22:32:21 -07:00
Davide Caratti
ee28bb56ac net/sched: fix memory leak in act_tunnel_key_init()
If users try to install act_tunnel_key 'set' rules with duplicate values
of 'index', the tunnel metadata are allocated, but never released. Then,
kmemleak complains as follows:

 # tc a a a tunnel_key set src_ip 1.1.1.1 dst_ip 2.2.2.2 id 42 index 111
 # echo clear > /sys/kernel/debug/kmemleak
 # tc a a a tunnel_key set src_ip 1.1.1.1 dst_ip 2.2.2.2 id 42 index 111
 Error: TC IDR already exists.
 We have an error talking to the kernel
 # echo scan > /sys/kernel/debug/kmemleak
 # cat /sys/kernel/debug/kmemleak
 unreferenced object 0xffff8800574e6c80 (size 256):
   comm "tc", pid 5617, jiffies 4298118009 (age 57.990s)
   hex dump (first 32 bytes):
     00 00 00 00 00 00 00 00 00 1c e8 b0 ff ff ff ff  ................
     81 24 c2 ad ff ff ff ff 00 00 00 00 00 00 00 00  .$..............
   backtrace:
     [<00000000b7afbf4e>] tunnel_key_init+0x8a5/0x1800 [act_tunnel_key]
     [<000000007d98fccd>] tcf_action_init_1+0x698/0xac0
     [<0000000099b8f7cc>] tcf_action_init+0x15c/0x590
     [<00000000dc60eebe>] tc_ctl_action+0x336/0x5c2
     [<000000002f5a2f7d>] rtnetlink_rcv_msg+0x357/0x8e0
     [<000000000bfe7575>] netlink_rcv_skb+0x124/0x350
     [<00000000edab656f>] netlink_unicast+0x40f/0x5d0
     [<00000000b322cdcb>] netlink_sendmsg+0x6e8/0xba0
     [<0000000063d9d490>] sock_sendmsg+0xb3/0xf0
     [<00000000f0d3315a>] ___sys_sendmsg+0x654/0x960
     [<00000000c06cbd42>] __sys_sendmsg+0xd3/0x170
     [<00000000ce72e4b0>] do_syscall_64+0xa5/0x470
     [<000000005caa2d97>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
     [<00000000fac1b476>] 0xffffffffffffffff

This problem theoretically happens also in case users attempt to setup a
geneve rule having wrong configuration data, or when the kernel fails to
allocate 'params_new'. Ensure that tunnel_key_init() releases the tunnel
metadata also in the above conditions.

Addresses-Coverity-ID: 1373974 ("Resource leak")
Fixes: d0f6dd8a91 ("net/sched: Introduce act_tunnel_key")
Fixes: 0ed5269f9e ("net/sched: add tunnel option support to act_tunnel_key")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-05 22:18:54 -07:00
Cong Wang
0a3b8b2b21 tipc: orphan sock in tipc_release()
Before we unlock the sock in tipc_release(), we have to
detach sk->sk_socket from sk, otherwise a parallel
tipc_sk_fill_sock_diag() could stil read it after we
free this socket.

Fixes: c30b70deb5 ("tipc: implement socket diagnostics for AF_TIPC")
Reported-and-tested-by: syzbot+48804b87c16588ad491d@syzkaller.appspotmail.com
Cc: Jon Maloy <jon.maloy@ericsson.com>
Cc: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-05 22:14:00 -07:00
Roi Dayan
ad9421e36a net/mlx5: Fix possible deadlock from lockdep when adding fte to fg
This is a false positive report due to incorrect nested lock
annotations as we lock multiple fgs with the same subclass.
Instead of locking all fgs only lock the one being used as was
done before.

Fixes: bd71b08ec2 ("net/mlx5: Support multiple updates of steering rules in parallel")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-09-05 17:08:34 -07:00
Saeed Mahameed
fc433829f9 net/mlx5e: Ethtool steering, fix udp source port value
Copy and paste bug was introduced in the offending patch.
We need to write udp source port value into the headers value and not
headers criteria "mask".

Fixes: 142644f8a1 ("net/mlx5e: Ethtool steering flow parsing refactoring")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-09-05 17:08:33 -07:00
Huy Nguyen
47bc94b822 net/mlx5: Check for error in mlx5_attach_interface
Currently, mlx5_attach_interface does not check for error
after calling intf->attach or intf->add. When these two calls
fails, the client is not initialized and will cause issues such as
kernel panic on invalid address in the teardown path (mlx5_detach_interface)

Fixes: 737a234bb6 ("net/mlx5: Introduce attach/detach to interface API")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-09-05 17:08:33 -07:00
Daniel Jurgens
df7ddb2396 net/mlx5: Consider PCI domain in search for next dev
The PCI BDF is not unique. PCI domain must also be considered when
searching for the next physical device during lag setup. Example below:

mlx5_core 0000:01:00.0: MLX5E: StrdRq(1) RqSz(8) StrdSz(128) RxCqeCmprss(0)
mlx5_core 0000:01:00.1: MLX5E: StrdRq(1) RqSz(8) StrdSz(128) RxCqeCmprss(0)
mlx5_core 0001:01:00.0: MLX5E: StrdRq(1) RqSz(8) StrdSz(128) RxCqeCmprss(0)
mlx5_core 0001:01:00.1: MLX5E: StrdRq(1) RqSz(8) StrdSz(128) RxCqeCmprss(0)

Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-09-05 17:08:33 -07:00
Roi Dayan
071304772f net/mlx5: Fix not releasing read lock when adding flow rules
If building match list fg fails and we never jumped to
search_again_locked label then the function returned without
unlocking the read lock.

Fixes: bd71b08ec2 ("net/mlx5: Support multiple updates of steering rules in parallel")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-09-05 17:08:33 -07:00
Raed Salem
c88a026e01 net/mlx5: E-Switch, Fix memory leak when creating switchdev mode FDB tables
The memory allocated for the slow path table flow group input structure
was not freed upon successful return, fix that.

Fixes: 1967ce6ea5 ("net/mlx5: E-Switch, Refactor fast path FDB table creation in switchdev mode")
Signed-off-by: Raed Salem <raeds@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-09-05 17:08:33 -07:00
Tariq Toukan
a090362210 net/mlx5: Use u16 for Work Queue buffer strides offset
Minimal stride size is 16.
Hence, the number of strides in a fragment (of PAGE_SIZE)
is <= PAGE_SIZE / 16 <= 4K.

u16 is sufficient to represent this.

Fixes: d7037ad73d ("net/mlx5: Fix QP fragmented buffer allocation")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-09-05 17:08:33 -07:00
Tariq Toukan
8d71e81850 net/mlx5: Use u16 for Work Queue buffer fragment size
Minimal stride size is 16.
Hence, the number of strides in a fragment (of PAGE_SIZE)
is <= PAGE_SIZE / 16 <= 4K.

u16 is sufficient to represent this.

Fixes: 388ca8be00 ("IB/mlx5: Implement fragmented completion queue (CQ)")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-09-05 17:08:33 -07:00
Jack Morgenstein
5df816e7f4 net/mlx5: Fix debugfs cleanup in the device init/remove flow
When initializing the device (procedure init_one), the driver
calls mlx5_pci_init to perform pci initialization. As part of this
initialization, mlx5_pci_init creates a debugfs directory.
If this creation fails, init_one aborts, returning failure to
the caller (which is the probe method caller).

The main reason for such a failure to occur is if the debugfs
directory already exists. This can happen if the last time
mlx5_pci_close was called, debugfs_remove (silently) failed due
to the debugfs directory not being empty.

Guarantee that such a debugfs_remove failure will not occur by
instead calling debugfs_remove_recursive in procedure mlx5_pci_close.

Fixes: 59211bd3b6 ("net/mlx5: Split the load/unload flow into hardware and software flows")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-09-05 17:08:33 -07:00
Jack Morgenstein
76d5581c87 net/mlx5: Fix use-after-free in self-healing flow
When the mlx5 health mechanism detects a problem while the driver
is in the middle of init_one or remove_one, the driver needs to prevent
the health mechanism from scheduling future work; if future work
is scheduled, there is a problem with use-after-free: the system WQ
tries to run the work item (which has been freed) at the scheduled
future time.

Prevent this by disabling work item scheduling in the health mechanism
when the driver is in the middle of init_one() or remove_one().

Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Reviewed-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-09-05 17:08:33 -07:00
Stefan Wahren
e65a9e480e net: qca_spi: Fix race condition in spi transfers
With performance optimization the spi transfer and messages of basic
register operations like qcaspi_read_register moved into the private
driver structure. But they weren't protected against mutual access
(e.g. between driver kthread and ethtool). So dumping the QCA7000
registers via ethtool during network traffic could make spi_sync
hang forever, because the completion in spi_message is overwritten.

So revert the optimization completely.

Fixes: 291ab06ecf ("net: qualcomm: new Ethernet over SPI driver for QCA700")
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-05 08:09:35 -07:00
Petr Oros
9d7f19dc46 be2net: Fix memory leak in be_cmd_get_profile_config()
DMA allocated memory is lost in be_cmd_get_profile_config() when we
call it with non-NULL port_res parameter.

Signed-off-by: Petr Oros <poros@redhat.com>
Reviewed-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-05 08:07:55 -07:00
Petr Machata
3a3539cd36 mlxsw: spectrum_buffers: Set up a dedicated pool for BUM traffic
MC-aware mode was recently enabled by mlxsw on Spectrum switches in
commit 7b81953066 ("mlxsw: spectrum: Configure MC-aware mode on mlxsw
ports"). Unfortunately, testing has shown that the fix is incomplete and
in the presented form actually makes the problem even worse, because any
amount of MC traffic causes UC disruption.

The reason for this is that currently, mlxsw configures the MC-specific
TCs (8..15) to map to pool 0. It also configures a maximum buffer size
of 0, but for MC traffic that maximum is disregarded and not part of the
quota. Therefore MC traffic is always admitted to the egress buffer.

Fix the configuration by directing the MC TCs into pool 15, which is
dedicated to MC traffic and recognized as such by the silicon.

Fixes: 7b81953066 ("mlxsw: spectrum: Configure MC-aware mode on mlxsw ports")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-05 07:55:43 -07:00
Linus Torvalds
28619527b8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Must perform TXQ teardown before unregistering interfaces in
    mac80211, from Toke Høiland-Jørgensen.

 2) Don't allow creating mac80211_hwsim with less than one channel, from
    Johannes Berg.

 3) Division by zero in cfg80211, fix from Johannes Berg.

 4) Fix endian issue in tipc, from Haiqing Bai.

 5) BPF sockmap use-after-free fixes from Daniel Borkmann.

 6) Spectre-v1 in mac80211_hwsim, from Jinbum Park.

 7) Missing rhashtable_walk_exit() in tipc, from Cong Wang.

 8) Revert kvzalloc() conversion of AF_PACKET, it breaks mmap() when
    kvzalloc() tries to use kmalloc() pages. From Eric Dumazet.

 9) Fix deadlock in hv_netvsc, from Dexuan Cui.

10) Do not restart timewait timer on RST, from Florian Westphal.

11) Fix double lwstate refcount grab in ipv6, from Alexey Kodanev.

12) Unsolicit report count handling is off-by-one, fix from Hangbin Liu.

13) Sleep-in-atomic in cadence driver, from Jia-Ju Bai.

14) Respect ttl-inherit in ip6 tunnel driver, from Hangbin Liu.

15) Use-after-free in act_ife, fix from Cong Wang.

16) Missing hold to meta module in act_ife, from Vlad Buslov.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (91 commits)
  net: phy: sfp: Handle unimplemented hwmon limits and alarms
  net: sched: action_ife: take reference to meta module
  act_ife: fix a potential use-after-free
  net/mlx5: Fix SQ offset in QPs with small RQ
  tipc: correct spelling errors for tipc_topsrv_queue_evt() comments
  tipc: correct spelling errors for struct tipc_bc_base's comment
  bnxt_en: Do not adjust max_cp_rings by the ones used by RDMA.
  bnxt_en: Clean up unused functions.
  bnxt_en: Fix firmware signaled resource change logic in open.
  sctp: not traverse asoc trans list if non-ipv6 trans exists for ipv6_flowlabel
  sctp: fix invalid reference to the index variable of the iterator
  net/ibm/emac: wrong emac_calc_base call was used by typo
  net: sched: null actions array pointer before releasing action
  vhost: fix VHOST_GET_BACKEND_FEATURES ioctl request definition
  r8169: add support for NCube 8168 network card
  ip6_tunnel: respect ttl inherit for ip6tnl
  mac80211: shorten the IBSS debug messages
  mac80211: don't Tx a deauth frame if the AP forbade Tx
  mac80211: Fix station bandwidth setting after channel switch
  mac80211: fix a race between restart and CSA flows
  ...
2018-09-04 12:45:11 -07:00