linux

mirror of https://github.com/torvalds/linux synced 2024-09-21 03:28:37 +00:00

Author	SHA1	Message	Date
Martin Varghese	350e7ab92d	net: Removed the device type check to add mpls support for devices MPLS has no dependency with the device type of underlying devices. Hence the device type check to add mpls support for devices can be avoided. Signed-off-by: Martin Varghese <martin.varghese@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-27 11:40:47 -07:00
Ido Schimmel	73cb11933c	ipmr: Copy option to correct variable Cited commit mistakenly copied provided option to 'val' instead of to 'mfc': ``` - if (copy_from_user(&mfc, optval, sizeof(mfc))) { + if (copy_from_sockptr(&val, optval, sizeof(val))) { ``` Fix this by copying the option to 'mfc'. selftest router_multicast.sh before: $ ./router_multicast.sh smcroutectl: Unknown or malformed IPC message 'a' from client. smcroutectl: failed removing multicast route, does not exist. TEST: mcast IPv4 [FAIL] Multicast not received on first host TEST: mcast IPv6 [ OK ] smcroutectl: Unknown or malformed IPC message 'a' from client. smcroutectl: failed removing multicast route, does not exist. TEST: RPF IPv4 [FAIL] Multicast not received on first host TEST: RPF IPv6 [ OK ] selftest router_multicast.sh after: $ ./router_multicast.sh TEST: mcast IPv4 [ OK ] TEST: mcast IPv6 [ OK ] TEST: RPF IPv4 [ OK ] TEST: RPF IPv6 [ OK ] Fixes: `01ccb5b48f` ("net/ipv4: switch ip_mroute_setsockopt to sockptr_t") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-27 11:39:55 -07:00
Karsten Graul	72b7f6c487	net/smc: unique reason code for exceeded max dmb count When the maximum dmb buffer limit for an ism device is reached no more dmb buffers can be registered. When this happens the reason code is set to SMC_CLC_DECL_MEM indicating out-of-memory. This is the same reason code that is used when no memory could be allocated for the new dmb buffer. This is confusing for users when they see this error but there is more memory available. To solve this set a separate new reason code when the maximum dmb limit exceeded. Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-27 10:30:01 -07:00
Andrii Nakryiko	e8407fdeb9	bpf, xdp: Remove XDP_QUERY_PROG and XDP_QUERY_PROG_HW XDP commands Now that BPF program/link management is centralized in generic net_device code, kernel code never queries program id from drivers, so XDP_QUERY_PROG/XDP_QUERY_PROG_HW commands are unnecessary. This patch removes all the implementations of those commands in kernel, along the xdp_attachment_query(). This patch was compile-tested on allyesconfig. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200722064603.3350758-10-andriin@fb.com	2020-07-25 20:37:02 -07:00
Andrii Nakryiko	c1931c9784	bpf: Implement BPF XDP link-specific introspection APIs Implement XDP link-specific show_fdinfo and link_info to emit ifindex. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200722064603.3350758-7-andriin@fb.com	2020-07-25 20:37:02 -07:00
Andrii Nakryiko	026a4c28e1	bpf, xdp: Implement LINK_UPDATE for BPF XDP link Add support for LINK_UPDATE command for BPF XDP link to enable reliable replacement of underlying BPF program. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200722064603.3350758-6-andriin@fb.com	2020-07-25 20:37:02 -07:00
Andrii Nakryiko	aa8d3a716b	bpf, xdp: Add bpf_link-based XDP attachment API Add bpf_link-based API (bpf_xdp_link) to attach BPF XDP program through BPF_LINK_CREATE command. bpf_xdp_link is mutually exclusive with direct BPF program attachment, previous BPF program should be detached prior to attempting to create a new bpf_xdp_link attachment (for a given XDP mode). Once BPF link is attached, it can't be replaced by other BPF program attachment or link attachment. It will be detached only when the last BPF link FD is closed. bpf_xdp_link will be auto-detached when net_device is shutdown, similarly to how other BPF links behave (cgroup, flow_dissector). At that point bpf_link will become defunct, but won't be destroyed until last FD is closed. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200722064603.3350758-5-andriin@fb.com	2020-07-25 20:37:02 -07:00
Andrii Nakryiko	d4baa9368a	bpf, xdp: Extract common XDP program attachment logic Further refactor XDP attachment code. dev_change_xdp_fd() is split into two parts: getting bpf_progs from FDs and attachment logic, working with bpf_progs. This makes attachment logic a bit more straightforward and prepares code for bpf_xdp_link inclusion, which will share the common logic. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200722064603.3350758-4-andriin@fb.com	2020-07-25 20:37:02 -07:00
Andrii Nakryiko	7f0a838254	bpf, xdp: Maintain info on attached XDP BPF programs in net_device Instead of delegating to drivers, maintain information about which BPF programs are attached in which XDP modes (generic/skb, driver, or hardware) locally in net_device. This effectively obsoletes XDP_QUERY_PROG command. Such re-organization simplifies existing code already. But it also allows to further add bpf_link-based XDP attachments without drivers having to know about any of this at all, which seems like a good setup. XDP_SETUP_PROG/XDP_SETUP_PROG_HW are just low-level commands to driver to install/uninstall active BPF program. All the higher-level concerns about prog/link interaction will be contained within generic driver-agnostic logic. All the XDP_QUERY_PROG calls to driver in dev_xdp_uninstall() were removed. It's not clear for me why dev_xdp_uninstall() were passing previous prog_flags when resetting installed programs. That seems unnecessary, plus most drivers don't populate prog_flags anyways. Having XDP_SETUP_PROG vs XDP_SETUP_PROG_HW should be enough of an indicator of what is required of driver to correctly reset active BPF program. dev_xdp_uninstall() is also generalized as an iteration over all three supported mode. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200722064603.3350758-3-andriin@fb.com	2020-07-25 20:37:02 -07:00
Yonghong Song	5ce6e77c7e	bpf: Implement bpf iterator for sock local storage map The bpf iterator for bpf sock local storage map is implemented. User space interacts with sock local storage map with fd as a key and storage value. In kernel, passing fd to the bpf program does not really make sense. In this case, the sock itself is passed to bpf program. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200723184116.590602-1-yhs@fb.com	2020-07-25 20:16:33 -07:00
Yonghong Song	f9c7927295	bpf: Refactor to provide aux info to bpf_iter_init_seq_priv_t This patch refactored target bpf_iter_init_seq_priv_t callback function to accept additional information. This will be needed in later patches for map element targets since a particular map should be passed to traverse elements for that particular map. In the future, other information may be passed to target as well, e.g., pid, cgroup id, etc. to customize the iterator. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200723184110.590156-1-yhs@fb.com	2020-07-25 20:16:32 -07:00
Yonghong Song	14fc6bd6b7	bpf: Refactor bpf_iter_reg to have separate seq_info member There is no functionality change for this patch. Struct bpf_iter_reg is used to register a bpf_iter target, which includes information for both prog_load, link_create and seq_file creation. This patch puts fields related seq_file creation into a different structure. This will be useful for map elements iterator where one iterator covers different map types and different map types may have different seq_ops, init/fini private_data function and private_data size. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200723184109.590030-1-yhs@fb.com	2020-07-25 20:16:32 -07:00
Jakub Sitnicki	c8a2983c4d	udp: Don't discard reuseport selection when group has connections When BPF socket lookup prog selects a socket that belongs to a reuseport group, and the reuseport group has connected sockets in it, the socket selected by reuseport will be discarded, and socket returned by BPF socket lookup will be used instead. Modify this behavior so that the socket selected by reuseport running after BPF socket lookup always gets used. Ignore the fact that the reuseport group might have connections because it is only relevant when scoring sockets during regular hashtable-based lookup. Fixes: `72f7e9440e` ("udp: Run SK_LOOKUP BPF program on socket lookup") Fixes: `6d4201b138` ("udp6: Run SK_LOOKUP BPF program on socket lookup") Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Link: https://lore.kernel.org/bpf/20200722161720.940831-2-jakub@cloudflare.com	2020-07-25 20:16:01 -07:00
David S. Miller	a57066b1a0	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net The UDP reuseport conflict was a little bit tricky. The net-next code, via bpf-next, extracted the reuseport handling into a helper so that the BPF sk lookup code could invoke it. At the same time, the logic for reuseport handling of unconnected sockets changed via commit `efc6b6f6c3` which changed the logic to carry on the reuseport result into the rest of the lookup loop if we do not return immediately. This requires moving the reuseport_has_conns() logic into the callers. While we are here, get rid of inline directives as they do not belong in foo.c files. The other changes were cases of more straightforward overlapping modifications. Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-25 17:49:04 -07:00
Linus Torvalds	1b64b2e244	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net into master Pull networking fixes from David Miller: 1) Fix RCU locaking in iwlwifi, from Johannes Berg. 2) mt76 can access uninitialized NAPI struct, from Felix Fietkau. 3) Fix race in updating pause settings in bnxt_en, from Vasundhara Volam. 4) Propagate error return properly during unbind failures in ax88172a, from George Kennedy. 5) Fix memleak in adf7242_probe, from Liu Jian. 6) smc_drv_probe() can leak, from Wang Hai. 7) Don't muck with the carrier state if register_netdevice() fails in the bonding driver, from Taehee Yoo. 8) Fix memleak in dpaa_eth_probe, from Liu Jian. 9) Need to check skb_put_padto() return value in hsr_fill_tag(), from Murali Karicheri. 10) Don't lose ionic RSS hash settings across FW update, from Shannon Nelson. 11) Fix clobbered SKB control block in act_ct, from Wen Xu. 12) Missing newlink in "tx_timeout" sysfs output, from Xiongfeng Wang. 13) IS_UDPLITE cleanup a long time ago, incorrectly handled transformations involving UDPLITE_RECV_CC. From Miaohe Lin. 14) Unbalanced locking in netdevsim, from Taehee Yoo. 15) Suppress false-positive error messages in qed driver, from Alexander Lobakin. 16) Out of bounds read in ax25_connect and ax25_sendmsg, from Peilin Ye. 17) Missing SKB release in cxgb4's uld_send(), from Navid Emamdoost. 18) Uninitialized value in geneve_changelink(), from Cong Wang. 19) Fix deadlock in xen-netfront, from Andera Righi. 19) flush_backlog() frees skbs with IRQs disabled, so should use dev_kfree_skb_irq() instead of kfree_skb(). From Subash Abhinov Kasiviswanathan. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (111 commits) drivers/net/wan: lapb: Corrected the usage of skb_cow dev: Defer free of skbs in flush_backlog qrtr: orphan socket in qrtr_release() xen-netfront: fix potential deadlock in xennet_remove() flow_offload: Move rhashtable inclusion to the source file geneve: fix an uninitialized value in geneve_changelink() bonding: check return value of register_netdevice() in bond_newlink() tcp: allow at most one TLP probe per flight AX.25: Prevent integer overflows in connect and sendmsg cxgb4: add missing release on skb in uld_send() net: atlantic: fix PTP on AQC10X AX.25: Prevent out-of-bounds read in ax25_sendmsg() sctp: shrink stream outq when fails to do addstream reconf sctp: shrink stream outq only when new outcnt < old outcnt AX.25: Fix out-of-bounds read in ax25_connect() enetc: Remove the mdio bus on PF probe bailout net: ethernet: ti: add NETIF_F_HW_TC hw feature flag for taprio offload net: ethernet: ave: Fix error returns in ave_init drivers/net/wan/x25_asy: Fix to make it work ipvs: fix the connection sync failed in some cases ...	2020-07-25 11:50:59 -07:00
Subash Abhinov Kasiviswanathan	7df5cb75cf	dev: Defer free of skbs in flush_backlog IRQs are disabled when freeing skbs in input queue. Use the IRQ safe variant to free skbs here. Fixes: `145dd5f9c8` ("net: flush the softnet backlog in process context") Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 19:59:22 -07:00
Cong Wang	af9f691f0f	qrtr: orphan socket in qrtr_release() We have to detach sock from socket in qrtr_release(), otherwise skb->sk may still reference to this socket when the skb is released in tun->queue, particularly sk->sk_wq still points to &sock->wq, which leads to a UAF. Reported-and-tested-by: syzbot+6720d64f31c081c2f708@syzkaller.appspotmail.com Fixes: `28fb4e59a4` ("net: qrtr: Expose tunneling endpoint to user space") Cc: Bjorn Andersson <bjorn.andersson@linaro.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 17:29:52 -07:00
Tom Parkin	ab6934e084	l2tp: WARN_ON rather than BUG_ON in l2tp_session_free l2tp_session_free called BUG_ON if the tunnel magic feather value wasn't correct. The intent of this was to catch lifetime bugs; for example early tunnel free due to incorrect use of reference counts. Since the tunnel magic feather being wrong indicates either early free or structure corruption, we can avoid doing more damage by simply leaving the tunnel structure alone. If the tunnel refcount isn't dropped when it should be, the tunnel instance will remain in the kernel, resulting in the tunnel structure and socket leaking. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 17:19:14 -07:00
Tom Parkin	0dd62f69d8	l2tp: remove BUG_ON refcount value in l2tp_session_free l2tp_session_free is only called by l2tp_session_dec_refcount when the reference count reaches zero, so it's of limited value to validate the reference count value in l2tp_session_free itself. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 17:19:14 -07:00
Tom Parkin	493048f5df	l2tp: WARN_ON rather than BUG_ON in l2tp_session_queue_purge l2tp_session_queue_purge is used during session shutdown to drop any skbs queued for reordering purposes according to L2TP dataplane rules. The BUG_ON in this function checks the session magic feather in an attempt to catch lifetime bugs. Rather than crashing the kernel with a BUG_ON, we can simply WARN_ON and refuse to do anything more -- in the worst case this could result in a leak. However this is highly unlikely given that the session purge only occurs from codepaths which have obtained the session by means of a lookup via. the parent tunnel and which check the session "dead" flag to protect against shutdown races. While we're here, have l2tp_session_queue_purge return void rather than an integer, since neither of the callsites checked the return value. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 17:19:14 -07:00
Tom Parkin	ebb4f5e6e4	l2tp: don't BUG_ON seqfile checks in l2tp_ppp checkpatch advises that WARN_ON and recovery code are preferred over BUG_ON which crashes the kernel. l2tp_ppp has a BUG_ON check of struct seq_file's private pointer in pppol2tp_seq_start prior to accessing data through that pointer. Rather than crashing, we can simply bail out early and return NULL in order to terminate the seq file processing in much the same way as we do when reaching the end of tunnel/session instances to render. Retain a WARN_ON to help trace possible bugs in this area. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 17:19:14 -07:00
Tom Parkin	1aa646ac71	l2tp: don't BUG_ON session magic checks in l2tp_ppp checkpatch advises that WARN_ON and recovery code are preferred over BUG_ON which crashes the kernel. l2tp_ppp.c's BUG_ON checks of the l2tp session structure's "magic" field occur in code paths where it's reasonably easy to recover: * In the case of pppol2tp_sock_to_session, we can return NULL and the caller will bail out appropriately. There is no change required to any of the callsites of this function since they already handle pppol2tp_sock_to_session returning NULL. * In the case of pppol2tp_session_destruct we can just avoid decrementing the reference count on the suspect session structure. In the worst case scenario this results in a memory leak, which is preferable to a crash. Convert these uses of BUG_ON to WARN_ON accordingly. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 17:19:14 -07:00
Tom Parkin	cd3e29b333	l2tp: remove BUG_ON in l2tp_tunnel_closeall l2tp_tunnel_closeall is only called from l2tp_core.c, and it's easy to statically analyse the code path calling it to validate that it should never be passed a NULL tunnel pointer. Having a BUG_ON checking the tunnel pointer triggers a checkpatch warning. Since the BUG_ON is of no value, remove it to avoid the warning. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 17:19:14 -07:00
Tom Parkin	ce2f86ae25	l2tp: remove BUG_ON in l2tp_session_queue_purge l2tp_session_queue_purge is only called from l2tp_core.c, and it's easy to statically analyse the code paths calling it to validate that it should never be passed a NULL session pointer. Having a BUG_ON checking the session pointer triggers a checkpatch warning. Since the BUG_ON is of no value, remove it to avoid the warning. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 17:19:14 -07:00
Tom Parkin	7a379558c2	l2tp: WARN_ON rather than BUG_ON in l2tp_dfs_seq_start l2tp_dfs_seq_start had a BUG_ON to catch a possible programming error in l2tp_dfs_seq_open. Since we can easily bail out of l2tp_dfs_seq_start, prefer to do that and flag the error with a WARN_ON rather than crashing the kernel. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 17:19:14 -07:00
Tom Parkin	95075150d0	l2tp: avoid multiple assignments checkpatch warns about multiple assignments. Update l2tp accordingly. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 17:19:14 -07:00
Willem de Bruijn	01370434df	icmp6: support rfc 4884 Extend the rfc 4884 read interface introduced for ipv4 in commit `eba75c587e` ("icmp: support rfc 4884") to ipv6. Add socket option SOL_IPV6/IPV6_RECVERR_RFC4884. Changes v1->v2: - make ipv6_icmp_error_rfc4884 static (file scope) Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 17:12:41 -07:00
Willem de Bruijn	178c49d9f9	icmp: prepare rfc 4884 for ipv6 The RFC 4884 spec is largely the same between IPv4 and IPv6. Factor out the IPv4 specific parts in preparation for IPv6 support: - icmp types supported - icmp header size, and thus offset to original datagram start - datagram length field offset in icmp(6)hdr. - datagram length field word size: 4B for IPv4, 8B for IPv6. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 17:12:41 -07:00
Willem de Bruijn	c4e9e09f55	icmp: revise rfc4884 tests 1) Only accept packets with original datagram len field >= header len. The extension header must start after the original datagram headers. The embedded datagram len field is compared against the 128B minimum stipulated by RFC 4884. It is unlikely that headers extend beyond this. But as we know the exact header length, check explicitly. 2) Remove the check that datagram length must be <= 576B. This is a send constraint. There is no value in testing this on rx. Within private networks it may be known safe to send larger packets. Process these packets. This test was also too lax. It compared original datagram length rather than entire icmp packet length. The stand-alone fix would be: - if (hlen + skb->len > 576) + if (-skb_network_offset(skb) + skb->len > 576) Fixes: `eba75c587e` ("icmp: support rfc 4884") Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 17:12:41 -07:00
Colin Ian King	623b57bec7	sctp: remove redundant initialization of variable status The variable status is being initialized with a value that is never read and it is being updated later with a new value. The initialization is redundant and can be removed. Also put the variable declarations into reverse christmas tree order. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 17:10:13 -07:00
Eelco Chaudron	a65878d6f0	net: openvswitch: fixes potential deadlock in dp cleanup code The previous patch introduced a deadlock, this patch fixes it by making sure the work is canceled without holding the global ovs lock. This is done by moving the reorder processing one layer up to the netns level. Fixes: `eac87c413b` ("net: openvswitch: reorder masks array based on usage") Reported-by: syzbot+2c4ff3614695f75ce26c@syzkaller.appspotmail.com Reported-by: syzbot+bad6507e5db05017b008@syzkaller.appspotmail.com Reviewed-by: Paolo <pabeni@redhat.com> Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 16:58:38 -07:00
Christoph Hellwig	dfd3d5266d	sctp: fix slab-out-of-bounds in SCTP_DELAYED_SACK processing This sockopt accepts two kinds of parameters, using struct sctp_sack_info and struct sctp_assoc_value. The mentioned commit didn't notice an implicit cast from the smaller (latter) struct to the bigger one (former) when copying the data from the user space, which now leads to an attempt to write beyond the buffer (because it assumes the storing buffer is bigger than the parameter itself). Fix it by allocating a sctp_sack_info on stack and filling it out based on the small struct for the compat case. Changelog stole from an earlier patch from Marcelo Ricardo Leitner. Fixes: `ebb25defdc` ("sctp: pass a kernel pointer to sctp_setsockopt_delayed_ack") Reported-by: syzbot+0e4699d000d8b874d8dc@syzkaller.appspotmail.com Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 16:40:36 -07:00
Christoph Hellwig	6d04fe15f7	net: optimize the sockptr_t for unified kernel/user address spaces For architectures like x86 and arm64 we don't need the separate bit to indicate that a pointer is a kernel pointer as the address spaces are unified. That way the sockptr_t can be reduced to a union of two pointers, which leads to nicer calling conventions. The only caveat is that we need to check that users don't pass in kernel address and thus gain access to kernel memory. Thus the USER_SOCKPTR helper is replaced with a init_user_sockptr function that does this check and returns an error if it fails. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 15:41:54 -07:00
Christoph Hellwig	a7b75c5a8c	net: pass a sockptr_t into ->setsockopt Rework the remaining setsockopt code to pass a sockptr_t instead of a plain user pointer. This removes the last remaining set_fs(KERNEL_DS) outside of architecture specific code. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Stefan Schmidt <stefan@datenfreihafen.org> [ieee802154] Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 15:41:54 -07:00
Christoph Hellwig	d38d2b00ba	net/tcp: switch do_tcp_setsockopt to sockptr_t Pass a sockptr_t to prepare for set_fs-less handling of the kernel pointer from bpf-cgroup. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 15:41:54 -07:00
Christoph Hellwig	d4c19c4914	net/tcp: switch ->md5_parse to sockptr_t Pass a sockptr_t to prepare for set_fs-less handling of the kernel pointer from bpf-cgroup. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 15:41:54 -07:00
Christoph Hellwig	91ac1ccaff	net/udp: switch udp_lib_setsockopt to sockptr_t Pass a sockptr_t to prepare for set_fs-less handling of the kernel pointer from bpf-cgroup. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 15:41:54 -07:00
Christoph Hellwig	894cfbc0cf	net/ipv6: switch do_ipv6_setsockopt to sockptr_t Pass a sockptr_t to prepare for set_fs-less handling of the kernel pointer from bpf-cgroup. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 15:41:54 -07:00
Christoph Hellwig	b84d2b73af	net/ipv6: factor out a ipv6_set_opt_hdr helper Factour out a helper to set the IPv6 option headers from do_ipv6_setsockopt. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 15:41:54 -07:00
Christoph Hellwig	86298285c9	net/ipv6: switch ipv6_flowlabel_opt to sockptr_t Pass a sockptr_t to prepare for set_fs-less handling of the kernel pointer from bpf-cgroup. Note that the get case is pretty weird in that it actually copies data back to userspace from setsockopt. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 15:41:54 -07:00
Christoph Hellwig	ff6a4cf214	net/ipv6: split up ipv6_flowlabel_opt Split ipv6_flowlabel_opt into a subfunction for each action and a small wrapper. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 15:41:54 -07:00
Christoph Hellwig	b43c615313	net/ipv6: switch ip6_mroute_setsockopt to sockptr_t Pass a sockptr_t to prepare for set_fs-less handling of the kernel pointer from bpf-cgroup. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 15:41:54 -07:00
Christoph Hellwig	89654c5fcd	net/ipv4: switch do_ip_setsockopt to sockptr_t Pass a sockptr_t to prepare for set_fs-less handling of the kernel pointer from bpf-cgroup. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 15:41:54 -07:00
Christoph Hellwig	de40a3e883	net/ipv4: merge ip_options_get and ip_options_get_from_user Use the sockptr_t type to merge the versions. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 15:41:54 -07:00
Christoph Hellwig	01ccb5b48f	net/ipv4: switch ip_mroute_setsockopt to sockptr_t Pass a sockptr_t to prepare for set_fs-less handling of the kernel pointer from bpf-cgroup. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 15:41:54 -07:00
Christoph Hellwig	b03afaa82e	bpfilter: switch bpfilter_ip_set_sockopt to sockptr_t This is mostly to prepare for cleaning up the callers, as bpfilter by design can't handle kernel pointers. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 15:41:54 -07:00
Christoph Hellwig	c2f12630c6	netfilter: switch nf_setsockopt to sockptr_t Pass a sockptr_t to prepare for set_fs-less handling of the kernel pointer from bpf-cgroup. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 15:41:54 -07:00
Christoph Hellwig	ab214d1bf8	netfilter: switch xt_copy_counters to sockptr_t Pass a sockptr_t to prepare for set_fs-less handling of the kernel pointer from bpf-cgroup. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 15:41:53 -07:00
Christoph Hellwig	7e4b9dbabb	netfilter: remove the unused user argument to do_update_counters Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 15:41:53 -07:00
Christoph Hellwig	c6d1b26a8f	net/xfrm: switch xfrm_user_policy to sockptr_t Pass a sockptr_t to prepare for set_fs-less handling of the kernel pointer from bpf-cgroup. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 15:41:53 -07:00
Christoph Hellwig	c8c1bbb6eb	net: switch sock_set_timeout to sockptr_t Pass a sockptr_t to prepare for set_fs-less handling of the kernel pointer from bpf-cgroup. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 15:41:53 -07:00
Christoph Hellwig	c34645ac25	net: switch sock_set_timeout to sockptr_t Pass a sockptr_t to prepare for set_fs-less handling of the kernel pointer from bpf-cgroup. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 15:41:53 -07:00
Christoph Hellwig	5790642b47	net: switch sock_setbindtodevice to sockptr_t Pass a sockptr_t to prepare for set_fs-less handling of the kernel pointer from bpf-cgroup. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 15:41:53 -07:00
Christoph Hellwig	b1ea9ff6af	net: switch copy_bpf_fprog_from_user to sockptr_t Pass a sockptr_t to prepare for set_fs-less handling of the kernel pointer from bpf-cgroup. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 15:41:53 -07:00
Christoph Hellwig	d200cf624c	bpfilter: reject kernel addresses The bpfilter user mode helper processes the optval address using process_vm_readv. Don't send it kernel addresses fed under set_fs(KERNEL_DS) as that won't work. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 15:41:53 -07:00
Christoph Hellwig	c9ffebdde8	net/bpfilter: split __bpfilter_process_sockopt Split __bpfilter_process_sockopt into a low-level send request routine and the actual setsockopt hook to split the init time ping from the actual setsockopt processing. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 15:41:53 -07:00
Christoph Hellwig	e024e00818	bpfilter: fix up a sparse annotation The __user doesn't make sense when casting to an integer type, just switch to a uintptr_t cast which also removes the need for the __force. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 15:41:53 -07:00
Ariel Levkovich	5923b8f7fa	net/sched: cls_flower: Add hash info to flow classification Adding new cls flower keys for hash value and hash mask and dissect the hash info from the skb into the flow key towards flow classication. Signed-off-by: Ariel Levkovich <lariel@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 15:23:31 -07:00
Ariel Levkovich	0cb09aff9d	net/flow_dissector: add packet hash dissection Retreive a hash value from the SKB and store it in the dissector key for future matching. Signed-off-by: Ariel Levkovich <lariel@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 15:23:31 -07:00
Herbert Xu	c2b69f24eb	flow_offload: Move rhashtable inclusion to the source file I noticed that touching linux/rhashtable.h causes lib/vsprintf.c to be rebuilt. This dependency came through a bogus inclusion in the file net/flow_offload.h. This patch moves it to the right place. This patch also removes a lingering rhashtable inclusion in cls_api created by the same commit. Fixes: `4e481908c5` ("flow_offload: move tc indirect block to...") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 15:17:22 -07:00
David S. Miller	8e8135862c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso says: ==================== Netfilter/IPVS fixes for net The following patchset contains Netfilter/IPVS fixes for net: 1) Fix NAT hook deletion when table is dormant, from Florian Westphal. 2) Fix IPVS sync stalls, from guodeqing. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-23 17:22:09 -07:00
Vladimir Oltean	5df5661a13	net: dsa: stop overriding master's ndo_get_phys_port_name The purpose of this override is to give the user an indication of what the number of the CPU port is (in DSA, the CPU port is a hardware implementation detail and not a network interface capable of traffic). However, it has always failed (by design) at providing this information to the user in a reliable fashion. Prior to commit `3369afba1e` ("net: Call into DSA netdevice_ops wrappers"), the behavior was to only override this callback if it was not provided by the DSA master. That was its first failure: if the DSA master itself was a DSA port or a switchdev, then the user would not see the number of the CPU port in /sys/class/net/eth0/phys_port_name, but the number of the DSA master port within its respective physical switch. But that was actually ok in a way. The commit mentioned above changed that behavior, and now overrides the master's ndo_get_phys_port_name unconditionally. That comes with problems of its own, which are worse in a way. The idea is that it's typical for switchdev users to have udev rules for consistent interface naming. These are based, among other things, on the phys_port_name attribute. If we let the DSA switch at the bottom to start randomly overriding ndo_get_phys_port_name with its own CPU port, we basically lose any predictability in interface naming, or even uniqueness, for that matter. So, there are reasons to let DSA override the master's callback (to provide a consistent interface, a number which has a clear meaning and must not be interpreted according to context), and there are reasons to not let DSA override it (it breaks udev matching for the DSA master). But, there is an alternative method for users to retrieve the number of the CPU port of each DSA switch in the system: $ devlink port pci/0000:00:00.5/0: type eth netdev swp0 flavour physical port 0 pci/0000:00:00.5/2: type eth netdev swp2 flavour physical port 2 pci/0000:00:00.5/4: type notset flavour cpu port 4 spi/spi2.0/0: type eth netdev sw0p0 flavour physical port 0 spi/spi2.0/1: type eth netdev sw0p1 flavour physical port 1 spi/spi2.0/2: type eth netdev sw0p2 flavour physical port 2 spi/spi2.0/4: type notset flavour cpu port 4 spi/spi2.1/0: type eth netdev sw1p0 flavour physical port 0 spi/spi2.1/1: type eth netdev sw1p1 flavour physical port 1 spi/spi2.1/2: type eth netdev sw1p2 flavour physical port 2 spi/spi2.1/3: type eth netdev sw1p3 flavour physical port 3 spi/spi2.1/4: type notset flavour cpu port 4 So remove this duplicated, unreliable and troublesome method. From this patch on, the phys_port_name attribute of the DSA master will only contain information about itself (if at all). If the users need reliable information about the CPU port they're probably using devlink anyway. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Acked-by: florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-23 15:14:58 -07:00
Yuchung Cheng	76be93fc07	tcp: allow at most one TLP probe per flight Previously TLP may send multiple probes of new data in one flight. This happens when the sender is cwnd limited. After the initial TLP containing new data is sent, the sender receives another ACK that acks partial inflight. It may re-arm another TLP timer to send more, if no further ACK returns before the next TLP timeout (PTO) expires. The sender may send in theory a large amount of TLP until send queue is depleted. This only happens if the sender sees such irregular uncommon ACK pattern. But it is generally undesirable behavior during congestion especially. The original TLP design restrict only one TLP probe per inflight as published in "Reducing Web Latency: the Virtue of Gentle Aggression", SIGCOMM 2013. This patch changes TLP to send at most one probe per inflight. Note that if the sender is app-limited, TLP retransmits old data and did not have this issue. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-23 12:23:32 -07:00
Dan Carpenter	17ad73e941	AX.25: Prevent integer overflows in connect and sendmsg We recently added some bounds checking in ax25_connect() and ax25_sendmsg() and we so we removed the AX25_MAX_DIGIS checks because they were no longer required. Unfortunately, I believe they are required to prevent integer overflows so I have added them back. Fixes: `8885bb0621` ("AX.25: Prevent out-of-bounds read in ax25_sendmsg()") Fixes: `2f2a7ffad5` ("AX.25: Fix out-of-bounds read in ax25_connect()") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-23 12:09:57 -07:00
Tom Parkin	70c05bfa4a	l2tp: cleanup kzalloc calls Passing "sizeof(struct blah)" in kzalloc calls is less readable, potentially prone to future bugs if the type of the pointer is changed, and triggers checkpatch warnings. Tweak the kzalloc calls in l2tp which use this form to avoid the warning. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-23 11:54:40 -07:00
Tom Parkin	0787840dad	l2tp: cleanup netlink tunnel create address handling When creating an L2TP tunnel using the netlink API, userspace must either pass a socket FD for the tunnel to use (for managed tunnels), or specify the tunnel source/destination address (for unmanaged tunnels). Since source/destination addresses may be AF_INET or AF_INET6, the l2tp netlink code has conditionally compiled blocks to support IPv6. Rather than embedding these directly into l2tp_nl_cmd_tunnel_create (where it makes the code difficult to read and confuses checkpatch to boot) split the handling of address-related attributes into a separate function. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-23 11:54:40 -07:00
Tom Parkin	584ca31f46	l2tp: cleanup netlink send of tunnel address information l2tp_nl_tunnel_send has conditionally compiled code to support AF_INET6, which makes the code difficult to follow and triggers checkpatch warnings. Split the code out into functions to handle the AF_INET v.s. AF_INET6 cases, which both improves readability and resolves the checkpatch warnings. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-23 11:54:40 -07:00
Tom Parkin	26d9a27106	l2tp: check socket address type in l2tp_dfs_seq_tunnel_show checkpatch warns about indentation and brace balancing around the conditionally compiled code for AF_INET6 support in l2tp_dfs_seq_tunnel_show. By adding another check on the socket address type we can make the code more readable while removing the checkpatch warning. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-23 11:54:40 -07:00
Tom Parkin	6c0ec37b82	l2tp: cleanup unnecessary braces in if statements These checks are all simple and don't benefit from extra braces to clarify intent. Remove them for easier-reading code. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-23 11:54:40 -07:00
Tom Parkin	0febc7b3cd	l2tp: cleanup comparisons to NULL checkpatch warns about comparisons to NULL, e.g. CHECK: Comparison to NULL could be written "!rt" #474: FILE: net/l2tp/l2tp_ip.c:474: + if (rt == NULL) { These sort of comparisons are generally clearer and more readable the way checkpatch suggests, so update l2tp accordingly. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-23 11:54:40 -07:00
Miaohe Lin	49b0aa1b65	net/ncsi: use eth_zero_addr() to clear mac address Use eth_zero_addr() to clear mac address insetad of memset(). Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-23 11:49:41 -07:00
Paolo Abeni	4cf8b7e48a	subflow: introduce and use mptcp_can_accept_new_subflow() So that we can easily perform some basic PM-related adimission checks before creating the child socket. Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Tested-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-23 11:47:25 -07:00
Paolo Abeni	97e617518c	subflow: use rsk_ops->send_reset() tcp_send_active_reset() is more prone to transient errors (memory allocation or xmit queue full): in stress conditions the kernel may drop the egress packet, and the client will be stuck. Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Tested-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-23 11:47:25 -07:00
Paolo Abeni	b7514694ed	subflow: explicitly check for plain tcp rsk When syncookie are in use, the TCP stack may feed into subflow_syn_recv_sock() plain TCP request sockets. We can't access mptcp_subflow_request_sock-specific fields on such sockets. Explicitly check the rsk ops to do safe accesses. Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Tested-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-23 11:47:25 -07:00
Paolo Abeni	fa25e815d9	mptcp: cleanup subflow_finish_connect() The mentioned function has several unneeded branches, handle each case - MP_CAPABLE, MP_JOIN, fallback - under a single conditional and drop quite a bit of duplicate code. Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Tested-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-23 11:47:24 -07:00
Paolo Abeni	b93df08ccd	mptcp: explicitly track the fully established status Currently accepted msk sockets become established only after accept() returns the new sk to user-space. As MP_JOIN request are refused as per RFC spec on non fully established socket, the above causes mp_join self-tests instabilities. This change lets the msk entering the established status as soon as it receives the 3rd ack and propagates the first subflow fully established status on the msk socket. Finally we can change the subflow acceptance condition to take in account both the sock state and the msk fully established flag. Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Tested-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-23 11:47:24 -07:00
Paolo Abeni	0235d075a5	mptcp: mark as fallback even early ones In the unlikely event of a failure at connect time, we currently clear the request_mptcp flag - so that the MPC handshake is not started at all, but the msk is not explicitly marked as fallback. This would lead to later insertion of wrong DSS options in the xmitted packets, in violation of RFC specs and possibly fooling the peer. Fixes: `e1ff9e82e2` ("net: mptcp: improve fallback to TCP") Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Tested-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-23 11:47:24 -07:00
Paolo Abeni	53eb4c383d	mptcp: avoid data corruption on reinsert When updating a partially acked data fragment, we actually corrupt it. This is irrelevant till we send data on a single subflow, as retransmitted data, if any are discarded by the peer as duplicate, but it will cause data corruption as soon as we will start creating non backup subflows. Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Tested-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-23 11:47:24 -07:00
Paolo Abeni	b0977bb268	subflow: always init 'rel_write_seq' Currently we do not init the subflow write sequence for MP_JOIN subflows. This will cause bad mapping being generated as soon as we will use non backup subflow. Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Tested-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-23 11:47:24 -07:00
Tom Parkin	efcd8c8540	l2tp: avoid precidence issues in L2TP_SKB_CB macro checkpatch warned about the L2TP_SKB_CB macro's use of its argument: add braces to avoid the problem. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-22 18:08:39 -07:00
Tom Parkin	c0235fb39b	l2tp: line-break long function prototypes In l2tp_core.c both l2tp_tunnel_create and l2tp_session_create take quite a number of arguments and have a correspondingly long prototype. This is both quite difficult to scan visually, and triggers checkpatch warnings. Add a line break to make these function prototypes more readable. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-22 18:08:39 -07:00
Tom Parkin	bdf9866e4b	l2tp: prefer seq_puts for unformatted output checkpatch warns about use of seq_printf where seq_puts would do. Modify l2tp_debugfs accordingly. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-22 18:08:39 -07:00
Tom Parkin	dbf82f3fac	l2tp: prefer using BIT macro Use BIT(x) rather than (1<<x), reported by checkpatch.pl. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-22 18:08:39 -07:00
Tom Parkin	bef04d162c	l2tp: add identifier name in function pointer prototype Reported by checkpatch: "WARNING: function definition argument 'struct sock *' should also have an identifier name" Add an identifier name to help document the prototype. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-22 18:08:39 -07:00
Tom Parkin	0864e331fd	l2tp: cleanup suspect code indent l2tp_core has conditionally compiled code in l2tp_xmit_skb for IPv6 support. The structure of this code triggered a checkpatch warning due to incorrect indentation. Fix up the indentation to address the checkpatch warning. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-22 18:08:39 -07:00
Tom Parkin	8ce9825a59	l2tp: cleanup wonky alignment of line-broken function calls Arguments should be aligned with the function call open parenthesis as per checkpatch. Tweak some function calls which were not aligned correctly. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-22 18:08:39 -07:00
Tom Parkin	9f7da9a0e3	l2tp: cleanup difficult-to-read line breaks Some l2tp code had line breaks which made the code more difficult to read. These were originally motivated by the 80-character line width coding guidelines, but were actually a negative from the perspective of trying to follow the code. Remove these linebreaks for clearer code, even if we do exceed 80 characters in width in some places. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-22 18:08:39 -07:00
Tom Parkin	20dcb1107a	l2tp: cleanup comments Modify some l2tp comments to better adhere to kernel coding style, as reported by checkpatch.pl. Add descriptive comments for the l2tp per-net spinlocks to document their use. Fix an incorrect comment in l2tp_recv_common: RFC2661 section 5.4 states that: "The LNS controls enabling and disabling of sequence numbers by sending a data message with or without sequence numbers present at any time during the life of a session." l2tp handles this correctly in l2tp_recv_common, but the comment around the code was incorrect and confusing. Fix up the comment accordingly. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-22 18:08:39 -07:00
Tom Parkin	b71a61ccfe	l2tp: cleanup whitespace use Fix up various whitespace issues as reported by checkpatch.pl: * remove spaces around operators where appropriate, * add missing blank lines following declarations, * remove multiple blank lines, or trailing blank lines at the end of functions. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-22 18:08:39 -07:00
Peilin Ye	8885bb0621	AX.25: Prevent out-of-bounds read in ax25_sendmsg() Checks on `addr_len` and `usax->sax25_ndigis` are insufficient. ax25_sendmsg() can go out of bounds when `usax->sax25_ndigis` equals to 7 or 8. Fix it. It is safe to remove `usax->sax25_ndigis > AX25_MAX_DIGIS`, since `addr_len` is guaranteed to be less than or equal to `sizeof(struct full_sockaddr_ax25)` Signed-off-by: Peilin Ye <yepeilin.cs@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-22 18:06:49 -07:00
Parav Pandit	637989b5d7	devlink: Always use user_ptr[0] for devlink and simplify post_doit Currently devlink instance is searched on all doit() operations. But it is optionally stored into user_ptr[0]. This requires rediscovering devlink again doing post_doit(). Few devlink commands related to port shared buffers needs 3 pointers (devlink, devlink_port, and devlink_sb) while executing doit commands. Though devlink pointer can be derived from the devlink_port during post_doit() operation when doit() callback has acquired devlink instance lock, relying on such scheme to access devlik pointer makes code very fragile. Hence, to avoid ambiguity in post_doit() and to avoid searching devlink instance again, simplify code by always storing devlink instance in user_ptr[0] and derive devlink_sb pointer in their respective callback routines. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-22 18:06:08 -07:00
Xin Long	3ecdda3e9a	sctp: shrink stream outq when fails to do addstream reconf When adding a stream with stream reconf, the new stream firstly is in CLOSED state but new out chunks can still be enqueued. Then once gets the confirmation from the peer, the state will change to OPEN. However, if the peer denies, it needs to roll back the stream. But when doing that, it only sets the stream outcnt back, and the chunks already in the new stream don't get purged. It caused these chunks can still be dequeued in sctp_outq_dequeue_data(). As its stream is still in CLOSE, the chunk will be enqueued to the head again by sctp_outq_head_data(). This chunk will never be sent out, and the chunks after it can never be dequeued. The assoc will be 'hung' in a dead loop of sending this chunk. To fix it, this patch is to purge these chunks already in the new stream by calling sctp_stream_shrink_out() when failing to do the addstream reconf. Fixes: `11ae76e67a` ("sctp: implement receiver-side procedures for the Reconf Response Parameter") Reported-by: Ying Xu <yinxu@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-22 18:00:12 -07:00
Xin Long	8f13399db2	sctp: shrink stream outq only when new outcnt < old outcnt It's not necessary to go list_for_each for outq->out_chunk_list when new outcnt >= old outcnt, as no chunk with higher sid than new (outcnt - 1) exists in the outqueue. While at it, also move the list_for_each code in a new function sctp_stream_shrink_out(), which will be used in the next patch. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-22 18:00:12 -07:00
Paolo Abeni	6ab301c98f	mptcp: zero token hash at creation time. Otherwise the 'chain_len' filed will carry random values, some token creation calls will fail due to excessive chain length, causing unexpected fallback to TCP. Fixes: `2c5ebd001d` ("mptcp: refactor token container") Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Tested-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-22 17:57:37 -07:00
Peilin Ye	2f2a7ffad5	AX.25: Fix out-of-bounds read in ax25_connect() Checks on `addr_len` and `fsa->fsa_ax25.sax25_ndigis` are insufficient. ax25_connect() can go out of bounds when `fsa->fsa_ax25.sax25_ndigis` equals to 7 or 8. Fix it. This issue has been reported as a KMSAN uninit-value bug, because in such a case, ax25_connect() reaches into the uninitialized portion of the `struct sockaddr_storage` statically allocated in __sys_connect(). It is safe to remove `fsa->fsa_ax25.sax25_ndigis > AX25_MAX_DIGIS` because `addr_len` is guaranteed to be less than or equal to `sizeof(struct full_sockaddr_ax25)`. Reported-by: syzbot+c82752228ed975b0a623@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?id=55ef9d629f3b3d7d70b69558015b63b48d01af66 Signed-off-by: Peilin Ye <yepeilin.cs@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-22 17:56:40 -07:00
Richard Sailer	749c08f820	net: dccp: Add SIOCOUTQ IOCTL support (send buffer fill) This adds support for the SIOCOUTQ IOCTL to get the send buffer fill of a DCCP socket, like UDP and TCP sockets already have. Regarding the used data field: DCCP uses per packet sequence numbers, not per byte, so sequence numbers can't be used like in TCP. sk_wmem_queued is not used by DCCP and always 0, even in test on highly congested paths. Therefore this uses sk_wmem_alloc like in UDP. Signed-off-by: Richard Sailer <richard_siegfried@systemli.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-22 17:00:37 -07:00
Kurt Kanzenbach	85e05d263e	net: dsa: of: Allow ethernet-ports as encapsulating node Due to unified Ethernet Switch Device Tree Bindings allow for ethernet-ports as encapsulating node as well. Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-22 16:56:43 -07:00
Christoph Hellwig	a6c0d0934f	net: explicitly include <linux/compat.h> in net/core/sock.c The buildbot found a config where the header isn't already implicitly pulled in, so add an explicit include as well. Fixes: `8c918ffbba` ("net: remove compat_sock_common_{get,set}sockopt") Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-22 13:01:10 -07:00
David S. Miller	dee72f8a0c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Alexei Starovoitov says: ==================== pull-request: bpf-next 2020-07-21 The following pull-request contains BPF updates for your net-next tree. We've added 46 non-merge commits during the last 6 day(s) which contain a total of 68 files changed, 4929 insertions(+), 526 deletions(-). The main changes are: 1) Run BPF program on socket lookup, from Jakub. 2) Introduce cpumap, from Lorenzo. 3) s390 JIT fixes, from Ilya. 4) teach riscv JIT to emit compressed insns, from Luke. 5) use build time computed BTF ids in bpf iter, from Yonghong. ==================== Purely independent overlapping changes in both filter.h and xdp.h Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-22 12:35:33 -07:00
Mark Salyzyn	37bd22420f	af_key: pfkey_dump needs parameter validation In pfkey_dump() dplen and splen can both be specified to access the xfrm_address_t structure out of bounds in__xfrm_state_filter_match() when it calls addr_match() with the indexes. Return EINVAL if either are out of range. Signed-off-by: Mark Salyzyn <salyzyn@android.com> Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: kernel-team@android.com Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2020-07-22 13:33:22 +02:00
Peter Zijlstra	015dc08918	Merge branch 'sched/urgent'	2020-07-22 10:22:02 +02:00
Florian Westphal	c1d069e3bf	mptcp: move helper to where its used Only used in token.c. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-21 16:22:18 -07:00
guodeqing	8210e344cc	ipvs: fix the connection sync failed in some cases The sync_thread_backup only checks sk_receive_queue is empty or not, there is a situation which cannot sync the connection entries when sk_receive_queue is empty and sk_rmem_alloc is larger than sk_rcvbuf, the sync packets are dropped in __udp_enqueue_schedule_skb, this is because the packets in reader_queue is not read, so the rmem is not reclaimed. Here I add the check of whether the reader_queue of the udp sock is empty or not to solve this problem. Fixes: `2276f58ac5` ("udp: use a separate rx queue for packet reception") Reported-by: zhouxudong <zhouxudong8@huawei.com> Signed-off-by: guodeqing <geffrey.guo@huawei.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-07-22 01:21:34 +02:00
Gustavo A. R. Silva	954d82979b	netfilter: Use fallthrough pseudo-keyword Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/latest/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-07-22 01:18:05 +02:00
Andrew Sy Kim	35dfb01314	ipvs: queue delayed work to expire no destination connections if expire_nodest_conn=1 When expire_nodest_conn=1 and a destination is deleted, IPVS does not expire the existing connections until the next matching incoming packet. If there are many connection entries from a single client to a single destination, many packets may get dropped before all the connections are expired (more likely with lots of UDP traffic). An optimization can be made where upon deletion of a destination, IPVS queues up delayed work to immediately expire any connections with a deleted destination. This ensures any reused source ports from a client (within the IPVS timeouts) are scheduled to new real servers instead of silently dropped. Signed-off-by: Andrew Sy Kim <kim.andrewsy@gmail.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-07-22 01:17:59 +02:00
Parav Pandit	eac5f8a95a	devlink: Constify devlink instance pointer Constify devlink instance pointer while checking if reload operation is supported or not. This helps to review the scope of checks done in reload. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-21 16:14:58 -07:00
Parav Pandit	9232a3e67b	devlink: Avoid duplicate check for reload enabled flag Reload operation is enabled or not is already checked by devlink_reload(). Hence, remove the duplicate check from devlink_nl_cmd_reload(). Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-21 16:14:58 -07:00
Parav Pandit	6553e561ca	devlink: Do not hold devlink mutex when initializing devlink fields There is no need to hold a device global lock when initializing devlink device fields of a devlink instance which is not yet part of the devices list. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-21 16:14:58 -07:00
Miaohe Lin	b0a422772f	net: udp: Fix wrong clean up for IS_UDPLITE macro We can't use IS_UDPLITE to replace udp_sk->pcflag when UDPLITE_RECV_CC is checked. Fixes: `b2bf1e2659` ("[UDP]: Clean up for IS_UDPLITE macro") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-21 15:41:49 -07:00
Xiongfeng Wang	9bb5fbea59	net-sysfs: add a newline when printing 'tx_timeout' by sysfs When I cat 'tx_timeout' by sysfs, it displays as follows. It's better to add a newline for easy reading. root@syzkaller:~# cat /sys/devices/virtual/net/lo/queues/tx-0/tx_timeout 0root@syzkaller:~# Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-21 15:35:58 -07:00
Kuniyuki Iwashima	efc6b6f6c3	udp: Improve load balancing for SO_REUSEPORT. Currently, SO_REUSEPORT does not work well if connected sockets are in a UDP reuseport group. Then reuseport_has_conns() returns true and the result of reuseport_select_sock() is discarded. Also, unconnected sockets have the same score, hence only does the first unconnected socket in udp_hslot always receive all packets sent to unconnected sockets. So, the result of reuseport_select_sock() should be used for load balancing. The noteworthy point is that the unconnected sockets placed after connected sockets in sock_reuseport.socks will receive more packets than others because of the algorithm in reuseport_select_sock(). index \| connected \| reciprocal_scale \| result --------------------------------------------- 0 \| no \| 20% \| 40% 1 \| no \| 20% \| 20% 2 \| yes \| 20% \| 0% 3 \| no \| 20% \| 40% 4 \| yes \| 20% \| 0% If most of the sockets are connected, this can be a problem, but it still works better than now. Fixes: `acdcecc612` ("udp: correct reuseport selection with connected sockets") CC: Willem de Bruijn <willemb@google.com> Reviewed-by: Benjamin Herrenschmidt <benh@amazon.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-21 15:31:02 -07:00
Kuniyuki Iwashima	f2b2c55e51	udp: Copy has_conns in reuseport_grow(). If an unconnected socket in a UDP reuseport group connect()s, has_conns is set to 1. Then, when a packet is received, udp[46]_lib_lookup2() scans all sockets in udp_hslot looking for the connected socket with the highest score. However, when the number of sockets bound to the port exceeds max_socks, reuseport_grow() resets has_conns to 0. It can cause udp[46]_lib_lookup2() to return without scanning all sockets, resulting in that packets sent to connected sockets may be distributed to unconnected sockets. Therefore, reuseport_grow() should copy has_conns. Fixes: `acdcecc612` ("udp: correct reuseport selection with connected sockets") CC: Willem de Bruijn <willemb@google.com> Reviewed-by: Benjamin Herrenschmidt <benh@amazon.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-21 15:31:02 -07:00
Yonghong Song	951cf368bc	bpf: net: Use precomputed btf_id for bpf iterators One additional field btf_id is added to struct bpf_ctx_arg_aux to store the precomputed btf_ids. The btf_id is computed at build time with BTF_ID_LIST or BTF_ID_LIST_GLOBAL macro definitions. All existing bpf iterators are changed to used pre-compute btf_ids. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200720163403.1393551-1-yhs@fb.com	2020-07-21 13:26:26 -07:00
Yonghong Song	fce557bcef	bpf: Make btf_sock_ids global tcp and udp bpf_iter can reuse some socket ids in btf_sock_ids, so make it global. I put the extern definition in btf_ids.h as a central place so it can be easily discovered by developers. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200720163402.1393427-1-yhs@fb.com	2020-07-21 13:26:26 -07:00
Yonghong Song	bc4f0548f6	bpf: Compute bpf_skc_to_() helper socket btf ids at build time Currently, socket types (struct tcp_sock, udp_sock, etc.) used by bpf_skc_to_() helpers are computed when vmlinux_btf is first built in the kernel. Commit `5a2798ab32` ("bpf: Add BTF_ID_LIST/BTF_ID/BTF_ID_UNUSED macros") implemented a mechanism to compute btf_ids at kernel build time which can simplify kernel implementation and reduce runtime overhead by removing in-kernel btf_id calculation. This patch did exactly this, removing in-kernel btf_id computation and utilizing build-time btf_id computation. If CONFIG_DEBUG_INFO_BTF is not defined, BTF_ID_LIST will define an array with size of 5, which is not enough for btf_sock_ids. So define its own static array if CONFIG_DEBUG_INFO_BTF is not defined. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200720163358.1393023-1-yhs@fb.com	2020-07-21 13:26:26 -07:00
Steffen Klassert	b328ecc468	xfrm: Make the policy hold queue work with VTI. We forgot to support the xfrm policy hold queue when VTI was implemented. This patch adds everything we need so that we can use the policy hold queue together with VTI interfaces. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2020-07-21 08:34:44 +02:00
Tung Nguyen	6ef9dcb780	tipc: allow to build NACK message in link timeout function Commit `02288248b0` ("tipc: eliminate gap indicator from ACK messages") eliminated sending of the 'gap' indicator in regular ACK messages and only allowed to build NACK message with enabled probe/probe_reply. However, necessary correction for building NACK message was missed in tipc_link_timeout() function. This leads to significant delay and link reset (due to retransmission failure) in lossy environment. This commit fixes it by setting the 'probe' flag to 'true' when the receive deferred queue is not empty. As a result, NACK message will be built to send back to another peer. Fixes: `02288248b0` ("tipc: eliminate gap indicator from ACK messages") Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Tung Nguyen <tung.q.nguyen@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-20 20:11:22 -07:00
wenxu	ae372cb175	net/sched: act_ct: fix restore the qdisc_skb_cb after defrag The fragment packets do defrag in tcf_ct_handle_fragments will clear the skb->cb which make the qdisc_skb_cb clear too. So the qdsic_skb_cb should be store before defrag and restore after that. It also update the pkt_len after all the fragments finish the defrag to one packet and make the following actions counter correct. Fixes: `b57dc7c13e` ("net/sched: Introduce action ct") Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-20 18:36:37 -07:00
Vladimir Oltean	71d4364abd	net: dsa: use the ETH_MIN_MTU and ETH_DATA_LEN default values Now that DSA supports MTU configuration, undo the effects of commit `8b1efc0f83` ("net: remove MTU limits on a few ether_setup callers") and let DSA interfaces use the default min_mtu and max_mtu specified by ether_setup(). This is more important for min_mtu: since DSA is Ethernet, the minimum MTU is the same as of any other Ethernet interface, and definitely not zero. For the max_mtu, we have a callback through which drivers can override that, if they want to. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-20 18:35:04 -07:00
Wang Hai	2b96692bcf	net: hsr: remove redundant null check Because kfree_skb already checked NULL skb parameter, so the additional checks are unnecessary, just remove them. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Hai <wanghai38@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-20 18:33:32 -07:00
Murali Karicheri	5d93518e06	net: hsr: check for return value of skb_put_padto() skb_put_padto() can fail. So check for return type and return NULL for skb. Caller checks for skb and acts correctly if it is NULL. Fixes: `6d6148bc78` ("net: hsr: fix incorrect lsdu size in the tag of HSR frames for small frames") Signed-off-by: Murali Karicheri <m-karicheri2@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-20 18:02:28 -07:00
Karsten Graul	a9e4450295	net/smc: fix dmb buffer shortage There is a current limit of 1920 registered dmb buffers per ISM device for smc-d. One link group can contain 255 connections, each connection is using one dmb buffer. When the connection is closed then the registered buffer is held in a queue and is reused by the next connection. When a link group is 'full' then another link group is created and uses an own buffer pool. The link groups are added to a list using list_add() which puts a new link group to the first position in the list. In the situation that many connections are opened (>1920) and a few of them stay open while others are closed quickly we end up with at least 8 link groups. For a new connection a matching link group is looked up, iterating over the list of link groups. The trailing 7 link groups all have registered dmb buffers which could be reused, while the first link group has only a few dmb buffers and then hit the 1920 limit. Because the first link group is not full (255 connection limit not reached) it is chosen and finally the connection falls back to TCP because there is no dmb buffer available in this link group. There are multiple ways to fix that: using list_add_tail() allows to scan older link groups first for free buffers which ensures that buffers are reused first. This fixes the problem for smc-r link groups as well. For smc-d there is an even better way to address this problem because smc-d does not have the 255 connections per link group limit. So fix the problem for smc-d by allowing large link groups. Fixes: `c6ba7c9ba4` ("net/smc: add base infrastructure for SMC-D and ISM") Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-20 17:52:25 -07:00
Karsten Graul	2bced6aefa	net/smc: put slot when connection is killed To get a send slot smc_wr_tx_get_free_slot() is called, which might wait for a free slot. When smc_wr_tx_get_free_slot() returns there is a check if the connection was killed in the meantime. In that case don't only return an error, but also put back the free slot. Fixes: `b290098092` ("net/smc: cancel send and receive for terminated socket") Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-20 17:52:25 -07:00
David Howells	639f181f0e	rxrpc: Fix sendmsg() returning EPIPE due to recvmsg() returning ENODATA rxrpc_sendmsg() returns EPIPE if there's an outstanding error, such as if rxrpc_recvmsg() indicating ENODATA if there's nothing for it to read. Change rxrpc_recvmsg() to return EAGAIN instead if there's nothing to read as this particular error doesn't get stored in ->sk_err by the networking core. Also change rxrpc_sendmsg() so that it doesn't fail with delayed receive errors (there's no way for it to report which call, if any, the error was caused by). Fixes: `17926a7932` ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both") Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-20 17:47:10 -07:00
Jiri Pirko	a8b7b2d0b3	sched: sch_api: add missing rcu read lock to silence the warning In case the qdisc_match_from_root function() is called from non-rcu path with rtnl mutex held, a suspiciout rcu usage warning appears: [ 241.504354] ============================= [ 241.504358] WARNING: suspicious RCU usage [ 241.504366] 5.8.0-rc4-custom-01521-g72a7c7d549c3 #32 Not tainted [ 241.504370] ----------------------------- [ 241.504378] net/sched/sch_api.c:270 RCU-list traversed in non-reader section!! [ 241.504382] other info that might help us debug this: [ 241.504388] rcu_scheduler_active = 2, debug_locks = 1 [ 241.504394] 1 lock held by tc/1391: [ 241.504398] #0: ffffffff85a27850 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x49a/0xbd0 [ 241.504431] stack backtrace: [ 241.504440] CPU: 0 PID: 1391 Comm: tc Not tainted 5.8.0-rc4-custom-01521-g72a7c7d549c3 #32 [ 241.504446] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014 [ 241.504453] Call Trace: [ 241.504465] dump_stack+0x100/0x184 [ 241.504482] lockdep_rcu_suspicious+0x153/0x15d [ 241.504499] qdisc_match_from_root+0x293/0x350 Fix this by passing the rtnl held lockdep condition down to hlist_for_each_entry_rcu() Reported-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-20 17:00:02 -07:00
Florian Fainelli	9c0c7014f3	net: dsa: Setup dsa_netdev_ops Now that we have all the infrastructure in place for calling into the dsa_ptr->netdev_ops function pointers, install them when we configure the DSA CPU/management interface and tear them down. The flow is unchanged from before, but now we preserve equality of tests when network device drivers do tests like dev->netdev_ops == &foo_ops which was not the case before since we were allocating an entirely new structure. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-20 16:48:22 -07:00
Florian Fainelli	3369afba1e	net: Call into DSA netdevice_ops wrappers Make the core net_device code call into our ndo_do_ioctl() and ndo_get_phys_port_name() functions via the wrappers defined previously Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-20 16:48:22 -07:00
Florian Fainelli	aad74d849d	net: Wrap ndo_do_ioctl() to prepare for DSA stacked ops In preparation for adding another layer of call into a DSA stacked ops singleton, wrap the ndo_do_ioctl() call into dev_do_ioctl(). Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-20 16:48:22 -07:00
Willem de Bruijn	eba75c587e	icmp: support rfc 4884 Add setsockopt SOL_IP/IP_RECVERR_4884 to return the offset to an extension struct if present. ICMP messages may include an extension structure after the original datagram. RFC 4884 standardized this behavior. It stores the offset in words to the extension header in u8 icmphdr.un.reserved[1]. The field is valid only for ICMP types destination unreachable, time exceeded and parameter problem, if length is at least 128 bytes and entire packet does not exceed 576 bytes. Return the offset to the start of the extension struct when reading an ICMP error from the error queue, if it matches the above constraints. Do not return the raw u8 field. Return the offset from the start of the user buffer, in bytes. The kernel does not return the network and transport headers, so subtract those. Also validate the headers. Return the offset regardless of validation, as an invalid extension must still not be misinterpreted as part of the original datagram. Note that !invalid does not imply valid. If the extension version does not match, no validation can take place, for instance. For backward compatibility, make this optional, set by setsockopt SOL_IP/IP_RECVERR_RFC4884. For API example and feature test, see github.com/wdebruij/kerneltools/blob/master/tests/recv_icmp_v2.c For forward compatibility, reserve only setsockopt value 1, leaving other bits for additional icmp extensions. Changes v1->v2: - convert word offset to byte offset from start of user buffer - return in ee_data as u8 may be insufficient - define extension struct and object header structs - return len only if constraints met - if returning len, also validate Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 19:20:22 -07:00
Christoph Hellwig	6c8983a606	sctp: remove the out_nounlock label in sctp_setsockopt This is just used once, and a direct return for the redirect to the AF case is much easier to follow than jumping to the end of a very long function. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:26:44 -07:00
Christoph Hellwig	26feba8090	sctp: pass a kernel pointer to sctp_setsockopt_pf_expose Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:26:44 -07:00
Christoph Hellwig	92c4f17255	sctp: pass a kernel pointer to sctp_setsockopt_ecn_supported Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:26:44 -07:00
Christoph Hellwig	963855a938	sctp: pass a kernel pointer to sctp_setsockopt_auth_supported Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:26:44 -07:00
Christoph Hellwig	9263ac97af	sctp: pass a kernel pointer to sctp_setsockopt_event Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:26:44 -07:00
Christoph Hellwig	565059cb9b	sctp: pass a kernel pointer to sctp_setsockopt_event Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:26:44 -07:00
Christoph Hellwig	a42624669e	sctp: pass a kernel pointer to sctp_setsockopt_reuse_port Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:26:44 -07:00
Christoph Hellwig	5b8d3b2446	sctp: pass a kernel pointer to sctp_setsockopt_interleaving_supported Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:26:44 -07:00
Christoph Hellwig	d636e7f31f	sctp: pass a kernel pointer to sctp_setsockopt_scheduler_value Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:26:43 -07:00
Christoph Hellwig	4d2fba3a7e	sctp: pass a kernel pointer to sctp_setsockopt_scheduler Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:26:43 -07:00
Christoph Hellwig	4d6fb26062	sctp: pass a kernel pointer to sctp_setsockopt_add_streams Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:26:43 -07:00
Christoph Hellwig	b97d20ce53	sctp: pass a kernel pointer to sctp_setsockopt_reset_assoc Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:26:43 -07:00
Christoph Hellwig	d492243435	sctp: pass a kernel pointer to sctp_setsockopt_reset_streams Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:26:43 -07:00
Christoph Hellwig	356dc6f16a	sctp: pass a kernel pointer to sctp_setsockopt_enable_strreset Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:26:43 -07:00
Christoph Hellwig	3f49f72035	sctp: pass a kernel pointer to sctp_setsockopt_reconfig_supported Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:26:43 -07:00
Christoph Hellwig	ac37435bfe	sctp: pass a kernel pointer to sctp_setsockopt_default_prinfo Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:26:43 -07:00
Christoph Hellwig	4a97fa4f09	sctp: pass a kernel pointer to sctp_setsockopt_pr_supported Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:26:43 -07:00
Christoph Hellwig	cfa6fde266	sctp: pass a kernel pointer to sctp_setsockopt_recvnxtinfo Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:26:43 -07:00
Christoph Hellwig	a98af7c84a	sctp: pass a kernel pointer to sctp_setsockopt_recvrcvinfo Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:26:43 -07:00
Christoph Hellwig	b0ac3bb894	sctp: pass a kernel pointer to sctp_setsockopt_paddr_thresholds Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:26:43 -07:00
Christoph Hellwig	c9abc2c1c2	sctp: pass a kernel pointer to sctp_setsockopt_auto_asconf Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:26:43 -07:00
Christoph Hellwig	76b3d0c445	sctp: pass a kernel pointer to sctp_setsockopt_deactivate_key Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:26:43 -07:00
Christoph Hellwig	97dc9f2e3e	sctp: pass a kernel pointer to sctp_setsockopt_del_key Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:26:43 -07:00
Christoph Hellwig	dcab0a7a57	sctp: pass a kernel pointer to sctp_setsockopt_active_key Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:26:43 -07:00
Christoph Hellwig	534d13d07e	sctp: pass a kernel pointer to sctp_setsockopt_auth_key Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Adapt sctp_setsockopt to use a kzfree for this case. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:26:42 -07:00
Christoph Hellwig	89fae01eef	sctp: switch sctp_setsockopt_auth_key to use memzero_explicit Switch from kzfree to sctp_setsockopt_auth_key + kfree to prepare for moving the kfree to common code. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:26:42 -07:00
Christoph Hellwig	3564ef442a	sctp: pass a kernel pointer to sctp_setsockopt_hmac_ident Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:26:42 -07:00
Christoph Hellwig	88266d31b8	sctp: pass a kernel pointer to sctp_setsockopt_auth_chunk Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:26:42 -07:00
Christoph Hellwig	f5bee0adb1	sctp: pass a kernel pointer to sctp_setsockopt_maxburst Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:26:42 -07:00
Christoph Hellwig	1031cea001	sctp: pass a kernel pointer to sctp_setsockopt_fragment_interleave Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:26:42 -07:00
Christoph Hellwig	722eca9eca	sctp: pass a kernel pointer to sctp_setsockopt_context Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:26:42 -07:00
Christoph Hellwig	07e5035c6f	sctp: pass a kernel pointer to sctp_setsockopt_adaptation_layer Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:26:42 -07:00
Christoph Hellwig	dcd0357580	sctp: pass a kernel pointer to sctp_setsockopt_maxseg Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:26:42 -07:00
Christoph Hellwig	ffc08f086a	sctp: pass a kernel pointer to sctp_setsockopt_mappedv4 Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:26:42 -07:00
Christoph Hellwig	5b864c8dab	sctp: pass a kernel pointer to sctp_setsockopt_associnfo Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:26:42 -07:00
Christoph Hellwig	af5ae60e42	sctp: pass a kernel pointer to sctp_setsockopt_rtoinfo Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:26:42 -07:00
Christoph Hellwig	f87ddbc0c0	sctp: pass a kernel pointer to sctp_setsockopt_nodelay Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:26:42 -07:00
Christoph Hellwig	46a0ae9de3	sctp: pass a kernel pointer to sctp_setsockopt_peer_primary_addr Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:26:42 -07:00
Christoph Hellwig	1eec695804	sctp: pass a kernel pointer to sctp_setsockopt_primary_addr Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:26:42 -07:00
Christoph Hellwig	8a2409d356	sctp: pass a kernel pointer to sctp_setsockopt_default_sndinfo Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:26:42 -07:00
Christoph Hellwig	c23ad6d2b7	sctp: pass a kernel pointer to sctp_setsockopt_default_send_param Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:26:42 -07:00
Christoph Hellwig	9dfa6f0494	sctp: pass a kernel pointer to sctp_setsockopt_initmsg Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:26:41 -07:00
Christoph Hellwig	bb13d647d9	sctp: pass a kernel pointer to sctp_setsockopt_partial_delivery_point Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:26:41 -07:00
Christoph Hellwig	ebb25defdc	sctp: pass a kernel pointer to sctp_setsockopt_delayed_ack Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:26:41 -07:00
Christoph Hellwig	9b7b0d1a39	sctp: pass a kernel pointer to sctp_setsockopt_peer_addr_params Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:26:41 -07:00
Christoph Hellwig	0b49a65c77	sctp: pass a kernel pointer to sctp_setsockopt_autoclose Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:26:41 -07:00
Christoph Hellwig	a98d21a173	sctp: pass a kernel pointer to sctp_setsockopt_events Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:26:41 -07:00
Christoph Hellwig	1083582558	sctp: pass a kernel pointer to sctp_setsockopt_disable_fragments Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:26:41 -07:00
Christoph Hellwig	ce5b2f8929	sctp: pass a kernel pointer to __sctp_setsockopt_connectx Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:26:41 -07:00
Christoph Hellwig	8c7517f54c	sctp: pass a kernel pointer to sctp_setsockopt_bindx Rename sctp_setsockopt_bindx_kernel back to sctp_setsockopt_bindx, and use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer in the old sctp_setsockopt_bindx. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:26:41 -07:00
Christoph Hellwig	ca84bd058d	sctp: copy the optval from user space in sctp_setsockopt Prepare for for moving the copy_from_user from the individual sockopts to the main setsockopt helper. As of this commit the kopt variable is not used yet, but the following commits will start using it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:26:41 -07:00
Christoph Hellwig	a44d9e7210	net: make ->{get,set}sockopt in proto_ops optional Just check for a NULL method instead of wiring up sock_no_{get,set}sockopt. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:16:41 -07:00
Christoph Hellwig	3021ad5299	net/ipv6: remove compat_ipv6_{get,set}sockopt Handle the few cases that need special treatment in-line using in_compat_syscall(). This also removes all the now unused compat_{get,set}sockopt methods. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:16:41 -07:00
Christoph Hellwig	fdf5bdd87c	net/ipv6: factor out mcast join/leave setsockopt helpers Factor out one helper each for setting the native and compat version of the MCAST_MSFILTER option. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:16:41 -07:00
Christoph Hellwig	ca0e65eb29	net/ipv6: factor out MCAST_MSFILTER setsockopt helpers Factor out one helper each for setting the native and compat version of the MCAST_MSFILTER option. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:16:41 -07:00
Christoph Hellwig	d5541e85cd	net/ipv6: factor out MCAST_MSFILTER getsockopt helpers Factor out one helper each for getting the native and compat version of the MCAST_MSFILTER option. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:16:41 -07:00
Christoph Hellwig	b6238c04c0	net/ipv4: remove compat_ip_{get,set}sockopt Handle the few cases that need special treatment in-line using in_compat_syscall(). Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:16:41 -07:00
Christoph Hellwig	02caad7cc0	net/ipv4: factor out mcast join/leave setsockopt helpers Factor out one helper each for setting the native and compat version of the MCAST_MSFILTER option. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:16:41 -07:00
Christoph Hellwig	d62c38f6a1	net/ipv4: factor out MCAST_MSFILTER setsockopt helpers Factor out one helper each for setting the native and compat version of the MCAST_MSFILTER option. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:16:41 -07:00
Christoph Hellwig	49e74c24f3	net/ipv4: factor out MCAST_MSFILTER getsockopt helpers Factor out one helper each for getting the native and compat version of the MCAST_MSFILTER option. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:16:41 -07:00
Christoph Hellwig	657e4c34a2	netfilter: split nf_sockopt Split nf_sockopt into a getsockopt and setsockopt side as they share very little code. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:16:40 -07:00
Christoph Hellwig	c34bc10d25	netfilter: remove the compat argument to xt_copy_counters_from_user Lift the in_compat_syscall() from the callers instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:16:40 -07:00
Christoph Hellwig	77d4df41d5	netfilter: remove the compat_{get,set} methods All instances handle compat sockopts via in_compat_syscall() now, so remove the compat_{get,set} methods as well as the compat_nf_{get,set}sockopt wrappers. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:16:40 -07:00
Christoph Hellwig	fc66de8e16	netfilter/ebtables: clean up compat {get, set}sockopt handling Merge the native and compat {get,set}sockopt handlers using in_compat_syscall(). Note that this required moving a fair amout of code around to be done sanely. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:16:40 -07:00
Christoph Hellwig	f415e76fd7	netfilter/ip6_tables: clean up compat {get, set}sockopt handling Merge the native and compat {get,set}sockopt handlers using in_compat_syscall(). Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:16:40 -07:00
Christoph Hellwig	89c53c14e4	netfilter/ip_tables: clean up compat {get,set}sockopt handling Merge the native and compat {get,set}sockopt handlers using in_compat_syscall(). Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:16:40 -07:00
Christoph Hellwig	983094b4fc	netfilter/arp_tables: clean up compat {get, set}sockopt handling Merge the native and compat {get,set}sockopt handlers using in_compat_syscall(). Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:16:40 -07:00
Christoph Hellwig	55db9c0e85	net: remove compat_sys_{get,set}sockopt Now that the ->compat_{get,set}sockopt proto_ops methods are gone there is no good reason left to keep the compat syscalls separate. This fixes the odd use of unsigned int for the compat_setsockopt optlen and the missing sock_use_custom_sol_socket. It would also easily allow running the eBPF hooks for the compat syscalls, but such a large change in behavior does not belong into a consolidation patch like this one. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:16:40 -07:00
Christoph Hellwig	8c918ffbba	net: remove compat_sock_common_{get,set}sockopt Add the compat handling to sock_common_{get,set}sockopt instead, keyed of in_compat_syscall(). This allow to remove the now unused ->compat_{get,set}sockopt methods from struct proto_ops. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net> Acked-by: Stefan Schmidt <stefan@datenfreihafen.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:16:40 -07:00
Christoph Hellwig	4d295e5461	net: simplify cBPF setsockopt compat handling Add a helper that copies either a native or compat bpf_fprog from userspace after verifying the length, and remove the compat setsockopt handlers that now aren't required. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:16:40 -07:00
Christoph Hellwig	d8a9b38f83	net: streamline __sys_getsockopt Return early when sockfd_lookup_light fails to reduce a level of indentation for most of the function body. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:16:40 -07:00
Christoph Hellwig	4a3672993f	net: streamline __sys_setsockopt Return early when sockfd_lookup_light fails to reduce a level of indentation for most of the function body. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:16:40 -07:00
Christoph Hellwig	a06d30ae7a	net/atm: remove the atmdev_ops {get, set}sockopt methods All implementations of these two methods are dummies that always return -EINVAL. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:16:40 -07:00
Randy Dunlap	089377b7e8	net: rds: rdma_transport.h: delete duplicated word Delete the doubled word "be" in a comment. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Santosh Shilimkar <santosh.shilimkar@oracle.com> Cc: netdev@vger.kernel.org Cc: linux-rdma@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:14:51 -07:00
Randy Dunlap	dfd5ec1ba6	net: atm: lec_arpc.h: delete duplicated word Delete the doubled word "the" in a comment. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:14:21 -07:00
Karsten Graul	1ad2405833	net/smc: fix restoring of fallback changes When a listen socket is closed then all non-accepted sockets in its accept queue are to be released. Inside __smc_release() the helper smc_restore_fallback_changes() restores the changes done to the socket without to check if the clcsocket has a file set. This can result in a crash. Fix this by checking the file pointer first. Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Fixes: `f536dffc0b` ("net/smc: fix closing of fallback SMC sockets") Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 15:30:23 -07:00
Karsten Graul	fd7f3a7465	net/smc: remove freed buffer from list Two buffers are allocated for each SMC connection. Each buffer is added to a buffer list after creation. When the second buffer allocation fails, the first buffer is freed but not deleted from the list. This might result in crashes when another connection picks up the freed buffer later and starts to work with it. Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Fixes: `6511aad3f0` ("net/smc: change smc_buf_free function parameters") Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 15:30:23 -07:00
Karsten Graul	741a49a4dc	net/smc: do not call dma sync for unmapped memory The dma related ...sync_sg... functions check the link state before the dma function is actually called. But the check in smc_link_usable() allows links in ACTIVATING state which are not yet mapped to dma memory. Under high load it may happen that the sync_sg functions are called for such a link which results in an debug output like DMA-API: mlx5_core 0002:00:00.0: device driver tries to sync DMA memory it has not allocated [device address=0x0000000103370000] [size=65536 bytes] To fix that introduce a helper to check for the link state ACTIVE and use it where appropriate. And move the link state update to ACTIVATING to the end of smcr_link_init() when most initial setup is done. Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Fixes: `d854fcbfae` ("net/smc: add new link state and related helpers") Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 15:30:22 -07:00
Karsten Graul	b9979c2e83	net/smc: fix handling of delete link requests As smc client the delete link requests are assigned to the flow when _any_ flow is active. This may break other flows that do not expect delete link requests during their handling. Fix that by assigning the request only when an add link flow is active. With that fix the code for smc client and smc server is the same, so remove the separate handling. Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Fixes: `9ec6bf19ec` ("net/smc: llc_del_link_work and use the LLC flow for delete link") Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 15:30:22 -07:00
Karsten Graul	c48254fa48	net/smc: move add link processing for new device into llc layer When a new ib device is up smc will send an add link invitation to the peer if needed. This is currently done with rudimentary flow control. Under high workload these add link invitations can disturb other llc flows because they arrive unexpected. Fix this by integrating the invitations into the normal llc event flow and handle them as a flow. While at it, check for already assigned requests in the flow before the new add link request is assigned. Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Fixes: `1f90a05d9f` ("net/smc: add smcr_port_add() and smcr_link_up() processing") Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 15:30:22 -07:00
Karsten Graul	2ff0867851	net/smc: drop out-of-flow llc response messages To be save from unexpected or late llc response messages check if the arrived message fits to the current flow type and drop out-of-flow messages. And drop it when there is already a response assigned to the flow. Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Fixes: `ef79d439cd` ("net/smc: process llc responses in tasklet context") Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 15:30:22 -07:00
Karsten Graul	63673597cc	net/smc: protect smc ib device initialization Before an smc ib device is used the first time for an smc link it is lazily initialized. When there are 2 active link groups and a new ib device is brought online then it might happen that 2 link creations run in parallel and enter smc_ib_setup_per_ibdev(). Both allocate new send and receive completion queues on the device, but only one set of them keeps assigned and the other leaks. Fix that by protecting the setup and cleanup code using a mutex. Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Fixes: `f3c1deddb2` ("net/smc: separate function for link initialization") Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 15:30:22 -07:00
Karsten Graul	7df8bcb560	net/smc: fix link lookup for new rdma connections For new rdma connections the SMC server assigns the link and sends the link data in the clc accept message. To match the correct link use not only the qp_num but also the gid and the mac of the links. If there are equal qp_nums for different links the wrong link would be chosen. Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Fixes: `0fb0b02bd6` ("net/smc: adapt SMC client code to use the LLC flow") Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 15:30:22 -07:00
Karsten Graul	68fd894203	net/smc: clear link during SMC client link down processing In a link-down condition we notify the SMC server and expect that the server will finally trigger the link clear processing on the client side. This could fail when anything along this notification path goes wrong. Clear the link as part of SMC client link-down processing to prevent dangling links. Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Fixes: `541afa10c1` ("net/smc: add smcr_port_err() and smcr_link_down() processing") Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 15:30:22 -07:00
Karsten Graul	a35fffbf98	net/smc: handle unexpected response types for confirm link A delete link could arrive during confirm link processing. Handle this situation directly in smc_llc_srv_conf_link() rather than using the logic in smc_llc_wait() to avoid the unexpected message handling there. Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Fixes: `1551c95b61` ("net/smc: final part of add link processing as SMC server") Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 15:30:22 -07:00
Wang Hai	74d6a5d566	9p/trans_fd: Fix concurrency del of req_list in p9_fd_cancelled/p9_read_work p9_read_work and p9_fd_cancelled may be called concurrently. In some cases, req->req_list may be deleted by both p9_read_work and p9_fd_cancelled. We can fix it by ignoring replies associated with a cancelled request and ignoring cancelled request if message has been received before lock. Link: http://lkml.kernel.org/r/20200612090833.36149-1-wanghai38@huawei.com Fixes: `60ff779c4a` ("9p: client: remove unused code and any reference to "cancelled" function") Cc: <stable@vger.kernel.org> # v3.12+ Reported-by: syzbot+77a25acfa0382e06ab23@syzkaller.appspotmail.com Signed-off-by: Wang Hai <wanghai38@huawei.com> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>	2020-07-19 14:58:47 +02:00
Christoph Hellwig	a39c46067c	net/9p: validate fds in p9_fd_open p9_fd_open just fgets file descriptors passed in from userspace, but doesn't verify that they are valid for read or writing. This gets cought down in the VFS when actually attempting a read or write, but a new warning added in linux-next upsets syzcaller. Fix this by just verifying the fds early on. Link: http://lkml.kernel.org/r/20200710085722.435850-1-hch@lst.de Reported-by: syzbot+e6f77e16ff68b2434a2c@syzkaller.appspotmail.com Signed-off-by: Christoph Hellwig <hch@lst.de> [Dominique: amend goto as per Doug Nazar's review] Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>	2020-07-19 14:58:29 +02:00
Jakub Sitnicki	6d4201b138	udp6: Run SK_LOOKUP BPF program on socket lookup Same as for udp4, let BPF program override the socket lookup result, by selecting a receiving socket of its choice or failing the lookup, if no connected UDP socket matched packet 4-tuple. Suggested-by: Marek Majkowski <marek@cloudflare.com> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200717103536.397595-11-jakub@cloudflare.com	2020-07-17 20:18:17 -07:00
Jakub Sitnicki	2a08748cd3	udp6: Extract helper for selecting socket from reuseport group Prepare for calling into reuseport from __udp6_lib_lookup as well. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200717103536.397595-10-jakub@cloudflare.com	2020-07-17 20:18:17 -07:00
Jakub Sitnicki	72f7e9440e	udp: Run SK_LOOKUP BPF program on socket lookup Following INET/TCP socket lookup changes, modify UDP socket lookup to let BPF program select a receiving socket before searching for a socket by destination address and port as usual. Lookup of connected sockets that match packet 4-tuple is unaffected by this change. BPF program runs, and potentially overrides the lookup result, only if a 4-tuple match was not found. Suggested-by: Marek Majkowski <marek@cloudflare.com> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200717103536.397595-9-jakub@cloudflare.com	2020-07-17 20:18:17 -07:00
Jakub Sitnicki	7629c73a14	udp: Extract helper for selecting socket from reuseport group Prepare for calling into reuseport from __udp4_lib_lookup as well. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200717103536.397595-8-jakub@cloudflare.com	2020-07-17 20:18:17 -07:00
Jakub Sitnicki	1122702f02	inet6: Run SK_LOOKUP BPF program on socket lookup Following ipv4 stack changes, run a BPF program attached to netns before looking up a listening socket. Program can return a listening socket to use as result of socket lookup, fail the lookup, or take no action. Suggested-by: Marek Majkowski <marek@cloudflare.com> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200717103536.397595-7-jakub@cloudflare.com	2020-07-17 20:18:17 -07:00
Jakub Sitnicki	5df6531292	inet6: Extract helper for selecting socket from reuseport group Prepare for calling into reuseport from inet6_lookup_listener as well. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200717103536.397595-6-jakub@cloudflare.com	2020-07-17 20:18:17 -07:00
Jakub Sitnicki	1559b4aa1d	inet: Run SK_LOOKUP BPF program on socket lookup Run a BPF program before looking up a listening socket on the receive path. Program selects a listening socket to yield as result of socket lookup by calling bpf_sk_assign() helper and returning SK_PASS code. Program can revert its decision by assigning a NULL socket with bpf_sk_assign(). Alternatively, BPF program can also fail the lookup by returning with SK_DROP, or let the lookup continue as usual with SK_PASS on return, when no socket has been selected with bpf_sk_assign(). This lets the user match packets with listening sockets freely at the last possible point on the receive path, where we know that packets are destined for local delivery after undergoing policing, filtering, and routing. With BPF code selecting the socket, directing packets destined to an IP range or to a port range to a single socket becomes possible. In case multiple programs are attached, they are run in series in the order in which they were attached. The end result is determined from return codes of all the programs according to following rules: 1. If any program returned SK_PASS and selected a valid socket, the socket is used as result of socket lookup. 2. If more than one program returned SK_PASS and selected a socket, last selection takes effect. 3. If any program returned SK_DROP, and no program returned SK_PASS and selected a socket, socket lookup fails with -ECONNREFUSED. 4. If all programs returned SK_PASS and none of them selected a socket, socket lookup continues to htable-based lookup. Suggested-by: Marek Majkowski <marek@cloudflare.com> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200717103536.397595-5-jakub@cloudflare.com	2020-07-17 20:18:16 -07:00
Jakub Sitnicki	80b373f74f	inet: Extract helper for selecting socket from reuseport group Prepare for calling into reuseport from __inet_lookup_listener as well. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200717103536.397595-4-jakub@cloudflare.com	2020-07-17 20:18:16 -07:00
Jakub Sitnicki	e9ddbb7707	bpf: Introduce SK_LOOKUP program type with a dedicated attach point Add a new program type BPF_PROG_TYPE_SK_LOOKUP with a dedicated attach type BPF_SK_LOOKUP. The new program kind is to be invoked by the transport layer when looking up a listening socket for a new connection request for connection oriented protocols, or when looking up an unconnected socket for a packet for connection-less protocols. When called, SK_LOOKUP BPF program can select a socket that will receive the packet. This serves as a mechanism to overcome the limits of what bind() API allows to express. Two use-cases driving this work are: (1) steer packets destined to an IP range, on fixed port to a socket 192.0.2.0/24, port 80 -> NGINX socket (2) steer packets destined to an IP address, on any port to a socket 198.51.100.1, any port -> L7 proxy socket In its run-time context program receives information about the packet that triggered the socket lookup. Namely IP version, L4 protocol identifier, and address 4-tuple. Context can be further extended to include ingress interface identifier. To select a socket BPF program fetches it from a map holding socket references, like SOCKMAP or SOCKHASH, and calls bpf_sk_assign(ctx, sk, ...) helper to record the selection. Transport layer then uses the selected socket as a result of socket lookup. In its basic form, SK_LOOKUP acts as a filter and hence must return either SK_PASS or SK_DROP. If the program returns with SK_PASS, transport should look for a socket to receive the packet, or use the one selected by the program if available, while SK_DROP informs the transport layer that the lookup should fail. This patch only enables the user to attach an SK_LOOKUP program to a network namespace. Subsequent patches hook it up to run on local delivery path in ipv4 and ipv6 stacks. Suggested-by: Marek Majkowski <marek@cloudflare.com> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200717103536.397595-3-jakub@cloudflare.com	2020-07-17 20:18:16 -07:00
Murali Karicheri	eea9f73e1f	net: hsr: validate address B before copying to skb Validate MAC address before copying the same to outgoing frame skb destination address. Since a node can have zero mac address for Link B until a valid frame is received over that link, this fix address the issue of a zero MAC address being in the packet. Signed-off-by: Murali Karicheri <m-karicheri2@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-17 18:54:37 -07:00
Murali Karicheri	6d6148bc78	net: hsr: fix incorrect lsdu size in the tag of HSR frames for small frames For small Ethernet frames with size less than minimum size 66 for HSR vs 60 for regular Ethernet frames, hsr driver currently doesn't pad the frame to make it minimum size. This results in incorrect LSDU size being populated in the HSR tag for these frames. Fix this by padding the frame to the minimum size applicable for HSR. Signed-off-by: Murali Karicheri <m-karicheri2@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-17 18:54:37 -07:00
Wang Hai	0b4a66a389	nfc: nci: add missed destroy_workqueue in nci_register_device When nfc_register_device fails in nci_register_device, destroy_workqueue() shouled be called to destroy ndev->tx_wq. Fixes: `3c1c0f5dc8` ("NFC: NCI: Fix nci_register_device init sequence") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Hai <wanghai38@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-17 13:08:08 -07:00
Suraj Upadhyay	e0c3f4c4fd	net: decnet: af_decnet: Simplify goto loop. Replace goto loop with while loop. Signed-off-by: Suraj Upadhyay <usuraj35@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-17 12:56:11 -07:00
Priyaranjan Jha	e3a5a1e8b6	tcp: add SNMP counter for no. of duplicate segments reported by DSACK There are two existing SNMP counters, TCPDSACKRecv and TCPDSACKOfoRecv, which are incremented depending on whether the DSACKed range is below the cumulative ACK sequence number or not. Unfortunately, these both implicitly assume each DSACK covers only one segment. This makes these counters unusable for estimating spurious retransmit rates, or real/non-spurious loss rate. This patch introduces a new SNMP counter, TCPDSACKRecvSegs, which tracks the estimated number of duplicate segments based on: (DSACKed sequence range) / MSS. This counter is usable for estimating spurious retransmit rates, or real/non-spurious loss rate. Signed-off-by: Priyaranjan Jha <priyarjha@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-17 12:54:30 -07:00
Priyaranjan Jha	a71d77e6be	tcp: fix segment accounting when DSACK range covers multiple segments Currently, while processing DSACK, we assume DSACK covers only one segment. This leads to significant underestimation of DSACKs with LRO/GRO. This patch fixes segment accounting with DSACK by estimating segment count from DSACK sequence range / MSS. Signed-off-by: Priyaranjan Jha <priyarjha@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Yousuk Seung <ysseung@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-17 12:54:30 -07:00
Davide Caratti	8c72894048	mptcp: silence warning in subflow_data_ready() since commit `d47a721520` ("mptcp: fix race in subflow_data_ready()"), it is possible to observe a regression in MP_JOIN kselftests. For sockets in TCP_CLOSE state, it's not sufficient to just wake up the main socket: we also need to ensure that received data are made available to the reader. Silence the WARN_ON_ONCE() in these cases: it preserves the syzkaller fix and restores kselftests when they are ran as follows: # while true; do > make KBUILD_OUTPUT=/tmp/kselftest TARGETS=net/mptcp kselftest > done Reported-by: Florian Westphal <fw@strlen.de> Fixes: `d47a721520` ("mptcp: fix race in subflow_data_ready()") Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/47 Signed-off-by: Davide Caratti <dcaratti@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-17 12:47:00 -07:00
Weilong Chen	cebb69754f	rtnetlink: Fix memory(net_device) leak when ->newlink fails When vlan_newlink call register_vlan_dev fails, it might return error with dev->reg_state = NETREG_UNREGISTERED. The rtnl_newlink should free the memory. But currently rtnl_newlink only free the memory which state is NETREG_UNINITIALIZED. BUG: memory leak unreferenced object 0xffff8881051de000 (size 4096): comm "syz-executor139", pid 560, jiffies 4294745346 (age 32.445s) hex dump (first 32 bytes): 76 6c 61 6e 32 00 00 00 00 00 00 00 00 00 00 00 vlan2........... 00 45 28 03 81 88 ff ff 00 00 00 00 00 00 00 00 .E(............. backtrace: [<0000000047527e31>] kmalloc_node include/linux/slab.h:578 [inline] [<0000000047527e31>] kvmalloc_node+0x33/0xd0 mm/util.c:574 [<000000002b59e3bc>] kvmalloc include/linux/mm.h:753 [inline] [<000000002b59e3bc>] kvzalloc include/linux/mm.h:761 [inline] [<000000002b59e3bc>] alloc_netdev_mqs+0x83/0xd90 net/core/dev.c:9929 [<000000006076752a>] rtnl_create_link+0x2c0/0xa20 net/core/rtnetlink.c:3067 [<00000000572b3be5>] __rtnl_newlink+0xc9c/0x1330 net/core/rtnetlink.c:3329 [<00000000e84ea553>] rtnl_newlink+0x66/0x90 net/core/rtnetlink.c:3397 [<0000000052c7c0a9>] rtnetlink_rcv_msg+0x540/0x990 net/core/rtnetlink.c:5460 [<000000004b5cb379>] netlink_rcv_skb+0x12b/0x3a0 net/netlink/af_netlink.c:2469 [<00000000c71c20d3>] netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline] [<00000000c71c20d3>] netlink_unicast+0x4c6/0x690 net/netlink/af_netlink.c:1329 [<00000000cca72fa9>] netlink_sendmsg+0x735/0xcc0 net/netlink/af_netlink.c:1918 [<000000009221ebf7>] sock_sendmsg_nosec net/socket.c:652 [inline] [<000000009221ebf7>] sock_sendmsg+0x109/0x140 net/socket.c:672 [<000000001c30ffe4>] ____sys_sendmsg+0x5f5/0x780 net/socket.c:2352 [<00000000b71ca6f3>] ___sys_sendmsg+0x11d/0x1a0 net/socket.c:2406 [<0000000007297384>] __sys_sendmsg+0xeb/0x1b0 net/socket.c:2439 [<000000000eb29b11>] do_syscall_64+0x56/0xa0 arch/x86/entry/common.c:359 [<000000006839b4d0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: `cb626bf566` ("net-sysfs: Fix reference count leak") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Weilong Chen <chenweilong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-17 12:33:18 -07:00
Eelco Chaudron	eac87c413b	net: openvswitch: reorder masks array based on usage This patch reorders the masks array every 4 seconds based on their usage count. This greatly reduces the masks per packet hit, and hence the overall performance. Especially in the OVS/OVN case for OpenShift. Here are some results from the OVS/OVN OpenShift test, which use 8 pods, each pod having 512 uperf connections, each connection sends a 64-byte request and gets a 1024-byte response (TCP). All uperf clients are on 1 worker node while all uperf servers are on the other worker node. Kernel without this patch : 7.71 Gbps Kernel with this patch applied: 14.52 Gbps We also run some tests to verify the rebalance activity does not lower the flow insertion rate, which does not. Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Tested-by: Andrew Theurer <atheurer@redhat.com> Reviewed-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-17 10:36:50 -07:00
Xin Long	96a2082950	ip6_vti: use IS_REACHABLE to avoid some compile errors Naresh reported some compile errors: arm build failed due this error on linux-next 20200713 and 20200713 net/ipv6/ip6_vti.o: In function `vti6_rcv_tunnel': ip6_vti.c:(.text+0x1d20): undefined reference to `xfrm6_tunnel_spi_lookup' This happened when set CONFIG_IPV6_VTI=y and CONFIG_INET6_TUNNEL=m. We don't really want ip6_vti to depend inet6_tunnel completely, but only to disable the tunnel code when inet6_tunnel is not seen. So instead of adding "select INET6_TUNNEL" for IPV6_VTI, this patch is only to change to IS_REACHABLE to avoid these compile error. Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Fixes: `08622869ed` ("ip6_vti: support IP6IP6 tunnel processing with .cb_handler") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2020-07-17 10:41:27 +02:00
Xin Long	0a0d93b943	xfrm: interface: use IS_REACHABLE to avoid some compile errors kernel test robot reported some compile errors: ia64-linux-ld: net/xfrm/xfrm_interface.o: in function `xfrmi4_fini': net/xfrm/xfrm_interface.c:900: undefined reference to `xfrm4_tunnel_deregister' ia64-linux-ld: net/xfrm/xfrm_interface.c:901: undefined reference to `xfrm4_tunnel_deregister' ia64-linux-ld: net/xfrm/xfrm_interface.o: in function `xfrmi4_init': net/xfrm/xfrm_interface.c:873: undefined reference to `xfrm4_tunnel_register' ia64-linux-ld: net/xfrm/xfrm_interface.c:876: undefined reference to `xfrm4_tunnel_register' ia64-linux-ld: net/xfrm/xfrm_interface.c:885: undefined reference to `xfrm4_tunnel_deregister' This happened when set CONFIG_XFRM_INTERFACE=y and CONFIG_INET_TUNNEL=m. We don't really want xfrm_interface to depend inet_tunnel completely, but only to disable the tunnel code when inet_tunnel is not seen. So instead of adding "select INET_TUNNEL" for XFRM_INTERFACE, this patch is only to change to IS_REACHABLE to avoid these compile error. Reported-by: kernel test robot <lkp@intel.com> Fixes: `da9bbf0598` ("xfrm: interface: support IPIP and IPIP6 tunnels processing with .cb_handler") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2020-07-17 10:40:54 +02:00
Sabrina Dubroca	95a35b42bc	xfrm: policy: fix IPv6-only espintcp compilation In case we're compiling espintcp support only for IPv6, we should still initialize the common code. Fixes: `26333c37fc` ("xfrm: add IPv6 support for espintcp") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2020-07-17 10:22:22 +02:00
Sabrina Dubroca	e229c877cd	espintcp: recv() should return 0 when the peer socket is closed man 2 recv says: RETURN VALUE When a stream socket peer has performed an orderly shutdown, the return value will be 0 (the traditional "end-of-file" return). Currently, this works for blocking reads, but non-blocking reads will return -EAGAIN. This patch overwrites that return value when the peer won't send us any more data. Fixes: `e27cca96cd` ("xfrm: add espintcp (RFC 8229)") Reported-by: Andrew Cagney <cagney@libreswan.org> Tested-by: Andrew Cagney <cagney@libreswan.org> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2020-07-17 10:21:54 +02:00
Sabrina Dubroca	ac1321efb1	espintcp: support non-blocking sends Currently, non-blocking sends from userspace result in EOPNOTSUPP. To support this, we need to tell espintcp_sendskb_locked() and espintcp_sendskmsg_locked() that non-blocking operation was requested from espintcp_sendmsg(). Fixes: `e27cca96cd` ("xfrm: add espintcp (RFC 8229)") Reported-by: Andrew Cagney <cagney@libreswan.org> Tested-by: Andrew Cagney <cagney@libreswan.org> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2020-07-17 10:21:03 +02:00
Petr Machata	ac5c66f261	Revert "net: sched: Pass root lock to Qdisc_ops.enqueue" This reverts commit `aebe4426cc`. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-07-16 16:48:34 -07:00
Petr Machata	55f656cdb8	net: sched: Do not drop root lock in tcf_qevent_handle() Mirred currently does not mix well with blocks executed after the qdisc root lock is taken. This includes classification blocks (such as in PRIO, ETS, DRR qdiscs) and qevents. The locking caused by the packet mirrored by mirred can cause deadlocks: either when the thread of execution attempts to take the lock a second time, or when two threads end up waiting on each other's locks. The qevent patchset attempted to not introduce further badness of this sort, and dropped the lock before executing the qevent block. However this lead to too little locking and races between qdisc configuration and packet enqueue in the RED qdisc. Before the deadlock issues are solved in a way that can be applied across many qdiscs reasonably easily, do for qevents what is done for the classification blocks and just keep holding the root lock. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-07-16 16:48:34 -07:00
Kees Cook	3f649ab728	treewide: Remove uninitialized_var() usage Using uninitialized_var() is dangerous as it papers over real bugs[1] (or can in the future), and suppresses unrelated compiler warnings (e.g. "unused variable"). If the compiler thinks it is uninitialized, either simply initialize the variable or make compiler changes. In preparation for removing[2] the[3] macro[4], remove all remaining needless uses with the following script: git grep '\buninitialized_var\b' \| cut -d: -f1 \| sort -u \| \ xargs perl -pi -e \ 's/\buninitialized_var$([^$]+)\)/\1/g; s:\s/\ (GCC be quiet\|to make compiler happy) \*/$::g;' drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid pathological white-space. No outstanding warnings were found building allmodconfig with GCC 9.3.0 for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64, alpha, and m68k. [1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/ [2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/ [3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/ [4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/ Reviewed-by: Leon Romanovsky <leonro@mellanox.com> # drivers/infiniband and mlx4/mlx5 Acked-by: Jason Gunthorpe <jgg@mellanox.com> # IB Acked-by: Kalle Valo <kvalo@codeaurora.org> # wireless drivers Reviewed-by: Chao Yu <yuchao0@huawei.com> # erofs Signed-off-by: Kees Cook <keescook@chromium.org>	2020-07-16 12:35:15 -07:00
John Ogness	632ca50f2c	af_packet: TPACKET_V3: replace busy-wait loop A busy-wait loop is used to implement waiting for bits to be copied from the skb to the kernel buffer before retiring a block. This is a problem on PREEMPT_RT because the copying task could be preempted by the busy-waiting task and thus live lock in the busy-wait loop. Replace the busy-wait logic with an rwlock_t. This provides lockdep coverage and makes the code RT ready. Signed-off-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-07-16 11:40:39 -07:00
Lorenzo Bianconi	9216477449	bpf: cpumap: Add the possibility to attach an eBPF program to cpumap Introduce the capability to attach an eBPF program to cpumap entries. The idea behind this feature is to add the possibility to define on which CPU run the eBPF program if the underlying hw does not support RSS. Current supported verdicts are XDP_DROP and XDP_PASS. This patch has been tested on Marvell ESPRESSObin using xdp_redirect_cpu sample available in the kernel tree to identify possible performance regressions. Results show there are no observable differences in packet-per-second: $./xdp_redirect_cpu --progname xdp_cpu_map0 --dev eth0 --cpu 1 rx: 354.8 Kpps rx: 356.0 Kpps rx: 356.8 Kpps rx: 356.3 Kpps rx: 356.6 Kpps rx: 356.6 Kpps rx: 356.7 Kpps rx: 355.8 Kpps rx: 356.8 Kpps rx: 356.8 Kpps Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Link: https://lore.kernel.org/bpf/5c9febdf903d810b3415732e5cd98491d7d9067a.1594734381.git.lorenzo@kernel.org	2020-07-16 17:00:32 +02:00
Eric Biggers	5a7a0d9400	mptcp: use sha256() instead of open coding Now that there's a function that calculates the SHA-256 digest of a buffer in one step, use it instead of sha256_init() + sha256_update() + sha256_final(). Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net> Cc: mptcp@lists.01.org Cc: Mat Martineau <mathew.j.martineau@linux.intel.com> Cc: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2020-07-16 21:49:05 +10:00
Suraj Upadhyay	514d09529d	decnet: dn_dev: Remove an unnecessary label. Remove the unnecessary label from dn_dev_ioctl() and make its error handling simpler to read. Signed-off-by: Suraj Upadhyay <usuraj35@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-07-15 18:03:28 -07:00
Stefano Garzarella	f961134a61	vsock/virtio: annotate 'the_virtio_vsock' RCU pointer Commit `0deab087b1` ("vsock/virtio: use RCU to avoid use-after-free on the_virtio_vsock") starts to use RCU to protect 'the_virtio_vsock' pointer, but we forgot to annotate it. This patch adds the annotation to fix the following sparse errors: net/vmw_vsock/virtio_transport.c:73:17: error: incompatible types in comparison expression (different address spaces): net/vmw_vsock/virtio_transport.c:73:17: struct virtio_vsock [noderef] __rcu * net/vmw_vsock/virtio_transport.c:73:17: struct virtio_vsock * net/vmw_vsock/virtio_transport.c:171:17: error: incompatible types in comparison expression (different address spaces): net/vmw_vsock/virtio_transport.c:171:17: struct virtio_vsock [noderef] __rcu * net/vmw_vsock/virtio_transport.c:171:17: struct virtio_vsock * net/vmw_vsock/virtio_transport.c:207:17: error: incompatible types in comparison expression (different address spaces): net/vmw_vsock/virtio_transport.c:207:17: struct virtio_vsock [noderef] __rcu * net/vmw_vsock/virtio_transport.c:207:17: struct virtio_vsock * net/vmw_vsock/virtio_transport.c:561:13: error: incompatible types in comparison expression (different address spaces): net/vmw_vsock/virtio_transport.c:561:13: struct virtio_vsock [noderef] __rcu * net/vmw_vsock/virtio_transport.c:561:13: struct virtio_vsock * net/vmw_vsock/virtio_transport.c:612:9: error: incompatible types in comparison expression (different address spaces): net/vmw_vsock/virtio_transport.c:612:9: struct virtio_vsock [noderef] __rcu * net/vmw_vsock/virtio_transport.c:612:9: struct virtio_vsock * net/vmw_vsock/virtio_transport.c:631:9: error: incompatible types in comparison expression (different address spaces): net/vmw_vsock/virtio_transport.c:631:9: struct virtio_vsock [noderef] __rcu * net/vmw_vsock/virtio_transport.c:631:9: struct virtio_vsock * Fixes: `0deab087b1` ("vsock/virtio: use RCU to avoid use-after-free on the_virtio_vsock") Reported-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-07-15 17:47:15 -07:00
Dan Carpenter	336f531ab1	netfilter: nf_tables: Fix a use after free in nft_immediate_destroy() The nf_tables_rule_release() function frees "rule" so we have to use the _safe() version of list_for_each_entry(). Fixes: `d0e2c7de92` ("netfilter: nf_tables: add NFT_CHAIN_BINDING") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-07-15 20:15:19 +02:00
Florian Westphal	1e9451cbda	netfilter: nf_tables: fix nat hook table deletion sybot came up with following transaction: add table ip syz0 add chain ip syz0 syz2 { type nat hook prerouting priority 0; policy accept; } add table ip syz0 { flags dormant; } delete chain ip syz0 syz2 delete table ip syz0 which yields: hook not found, pf 2 num 0 WARNING: CPU: 0 PID: 6775 at net/netfilter/core.c:413 __nf_unregister_net_hook+0x3e6/0x4a0 net/netfilter/core.c:413 [..] nft_unregister_basechain_hooks net/netfilter/nf_tables_api.c:206 [inline] nft_table_disable net/netfilter/nf_tables_api.c:835 [inline] nf_tables_table_disable net/netfilter/nf_tables_api.c:868 [inline] nf_tables_commit+0x32d3/0x4d70 net/netfilter/nf_tables_api.c:7550 nfnetlink_rcv_batch net/netfilter/nfnetlink.c:486 [inline] nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:544 [inline] nfnetlink_rcv+0x14a5/0x1e50 net/netfilter/nfnetlink.c:562 netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline] Problem is that when I added ability to override base hook registration to make nat basechains register with the nat core instead of netfilter core, I forgot to update nft_table_disable() to use that instead of the 'raw' hook register interface. In syzbot transaction, the basechain is of 'nat' type. Its registered with the nat core. The switch to 'dormant mode' attempts to delete from netfilter core instead. After updating nft_table_disable/enable to use the correct helper, nft_(un)register_basechain_hooks can be folded into the only remaining caller. Because nft_trans_table_enable() won't do anything when the DORMANT flag is set, remove the flag first, then re-add it in case re-enablement fails, else this patch breaks sequence: add table ip x { flags dormant; } /* add base chains */ add table ip x The last 'add' will remove the dormant flags, but won't have any other effect -- base chains are not registered. Then, next 'set dormant flag' will create another 'hook not found' splat. Reported-by: syzbot+2570f2c036e3da5db176@syzkaller.appspotmail.com Fixes: `4e25ceb80b` ("netfilter: nf_tables: allow chain type to override hook register") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-07-15 20:02:28 +02:00
Colin Ian King	912288442c	xprtrdma: fix incorrect header size calculations Currently the header size calculations are using an assignment operator instead of a += operator when accumulating the header size leading to incorrect sizes. Fix this by using the correct operator. Addresses-Coverity: ("Unused value") Fixes: `302d3deb20` ("xprtrdma: Prevent inline overflow") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2020-07-15 13:01:01 -04:00

... 3 4 5 6 7 ...

61713 commits