Commit graph

795442 commits

Author SHA1 Message Date
Jaime Caamaño Ruiz 46ebe2834b openvswitch: Fix push/pop ethernet validation
When there are both pop and push ethernet header actions among the
actions to be applied to a packet, an unexpected EINVAL (Invalid
argument) error is obtained. This is due to mac_proto not being reset
correctly when those actions are validated.

Reported-at:
https://mail.openvswitch.org/pipermail/ovs-discuss/2018-October/047554.html
Fixes: 91820da6ae ("openvswitch: add Ethernet push and pop actions")
Signed-off-by: Jaime Caamaño Ruiz <jcaamano@suse.com>
Tested-by: Greg Rose <gvrose8192@gmail.com>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-31 18:37:16 -07:00
Niklas Cassel 30549aab14 net: stmmac: Fix stmmac_mdio_reset() when building stmmac as modules
When building stmmac, it is only possible to select CONFIG_DWMAC_GENERIC,
or any of the glue drivers, when CONFIG_STMMAC_PLATFORM is set.
The only exception is CONFIG_STMMAC_PCI.

When calling of_mdiobus_register(), it will call our ->reset()
callback, which is set to stmmac_mdio_reset().

Most of the code in stmmac_mdio_reset() is protected by a
"#if defined(CONFIG_STMMAC_PLATFORM)", which will evaluate
to false when CONFIG_STMMAC_PLATFORM=m.

Because of this, the phy reset gpio will only be pulled when
stmmac is built as built-in, but not when built as modules.

Fix this by using "#if IS_ENABLED()" instead of "#if defined()".

Signed-off-by: Niklas Cassel <niklas.cassel@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-31 18:35:58 -07:00
David S. Miller 4d3163cf87 Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue
Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2018-10-31

This series contains a various collection of fixes.

Miroslav Lichvar from Red Hat or should I say IBM now?  Updates the PHC
timecounter interval for igb so that it gets updated at least once
every 550 seconds.

Ngai-Mint provides a fix for fm10k to prevent a soft lockup or system
crash by adding a new condition to determine if the SM mailbox is in the
correct state before proceeding.

Jake provides several fm10k fixes, first one marks complier aborts as
non-fatal since on some platforms trigger machine check errors when the
compile aborts.  Added missing device ids to the in-kernel driver.  Due
to the recent fixes, bumped the driver version.

I (Jeff Kirsher) fixed a XFRM_ALGO dependency for both ixgbe and
ixgbevf.  This fix was based on the original work from Arnd Bergmann,
which only fixed ixgbe.

Mitch provides a fix for i40e/avf to update the status codes, which
resolves an issue between a mis-match between i40e and the iavf driver,
which also supports the ice LAN driver.

Radoslaw fixes the ixgbe where the driver is logging a message about
spoofed packets detected when the VF is re-started with a different MAC
address.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-31 18:21:37 -07:00
David S. Miller df975da4e5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2018-11-01

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Fix tcp_bpf_recvmsg() to return -EAGAIN instead of 0 in non-blocking
   case when no data is available yet, from John.

2) Fix a compilation error in libbpf_attach_type_by_name() when compiled
   with clang 3.8, from Andrey.

3) Fix a partial copy of map pointer on scalar alu and remove id
   generation for RET_PTR_TO_MAP_VALUE return types, from Daniel.

4) Add unlimited memlock limit for kernel selftest's flow_dissector_load
   program, from Yonghong.

5) Fix ping for some BPF shell based kselftests where distro does not
   ship "ping -6" anymore, from Li.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-31 17:34:08 -07:00
Alexei Starovoitov dfeb8f4c96 Merge branch 'verifier-fixes'
Daniel Borkmann says:

====================
The series contains two fixes in BPF core and test cases. For details
please see individual patches. Thanks!
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-31 16:53:18 -07:00
Daniel Borkmann 832c6f2c29 bpf: test make sure to run unpriv test cases in test_verifier
Right now unprivileged tests are never executed as a BPF test run,
only loaded. Allow for running them as well so that we can check
the outcome and probe for regressions.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-31 16:53:17 -07:00
Daniel Borkmann 2683f4128c bpf: add various test cases to test_verifier
Add some more map related test cases to test_verifier kselftest
to improve test coverage. Summary: 1012 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-31 16:53:17 -07:00
Daniel Borkmann 4d31f30148 bpf: don't set id on after map lookup with ptr_to_map_val return
In the verifier there is no such semantics where registers with
PTR_TO_MAP_VALUE type have an id assigned to them. This is only
used in PTR_TO_MAP_VALUE_OR_NULL and later on nullified once the
test against NULL has been pattern matched and type transformed
into PTR_TO_MAP_VALUE.

Fixes: 3e6a4b3e02 ("bpf/verifier: introduce BPF_PTR_TO_MAP_VALUE")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Roman Gushchin <guro@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-31 16:53:17 -07:00
Daniel Borkmann 0962590e55 bpf: fix partial copy of map_ptr when dst is scalar
ALU operations on pointers such as scalar_reg += map_value_ptr are
handled in adjust_ptr_min_max_vals(). Problem is however that map_ptr
and range in the register state share a union, so transferring state
through dst_reg->range = ptr_reg->range is just buggy as any new
map_ptr in the dst_reg is then truncated (or null) for subsequent
checks. Fix this by adding a raw member and use it for copying state
over to dst_reg.

Fixes: f1174f77b5 ("bpf/verifier: rework value tracking")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Edward Cree <ecree@solarflare.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-31 16:53:17 -07:00
Andrey Ignatov 3615353218 libbpf: Fix compile error in libbpf_attach_type_by_name
Arnaldo Carvalho de Melo reported build error in libbpf when clang
version 3.8.1-24 (tags/RELEASE_381/final) is used:

libbpf.c:2201:36: error: comparison of constant -22 with expression of
type 'const enum bpf_attach_type' is always false
[-Werror,-Wtautological-constant-out-of-range-compare]
                if (section_names[i].attach_type == -EINVAL)
                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^  ~~~~~~~
1 error generated.

Fix the error by keeping "is_attachable" property of a program in a
separate struct field instead of trying to use attach_type itself.

Fixes: 956b620fcf ("libbpf: Introduce libbpf_attach_type_by_name")
Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-31 23:06:17 +01:00
Li Zhijian deee2cae27 kselftests/bpf: use ping6 as the default ipv6 ping binary if it exists
ping binary on some distros doesn't support "ping -6" anymore.

Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-31 23:05:30 +01:00
David S. Miller e2acdddde0 Merge branch 'mlxsw-Enable-minimum-shaper-on-MC-TCs'
Ido Schimmel says:

====================
mlxsw: Enable minimum shaper on MC TCs

Petr says:

An MC-aware mode was introduced in commit 7b81953066 ("mlxsw:
spectrum: Configure MC-aware mode on mlxsw ports"). In MC-aware mode,
BUM traffic gets a special treatment by being assigned to a separate set
of traffic classes 8..15. Pairs of TCs 0 and 8, 1 and 9, etc., are then
configured to strictly prioritize the lower-numbered ones. The intention
is to prevent BUM traffic from flooding the switch and push out all UC
traffic, which would otherwise happen, and instead give UC traffic
precedence.

However strictly prioritizing UC traffic has the effect that UC overload
pushes out all BUM traffic, such as legitimate ARP queries. These
packets are kept in queues for a while, but under sustained UC overload,
their lifetime eventually expires and these packets are dropped. That is
detrimental to network performance as well.

In this patchset, MC TCs (8..15) are configured with minimum shaper of
200Mbps (a minimum permitted value) to allow a trickle of necessary
control traffic to get through.

First in patch #1, the QEEC register is extended with fields necessary
to configure the minimum shaper.

In patch #2, minimum shaper is enabled on TCs 8..15.

In patches #3 and #4, first the MC-awareness test is tweaked to support
the minimum shaper, and then a new test is introduced to test that MC
traffic behaves well under UC overload.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-31 12:56:59 -07:00
Petr Machata a5ee171d08 selftests: mlxsw: qos_mc_aware: Add a test for UC awareness
In a previous patch, mlxsw was updated to configure a minimum bandwidth
allowance on MC TCs. Test that this indeed fixes the problem of UC
traffic overload pushing out all MC traffic.

Fixes: b5638d46c9 ("selftests: mlxsw: Add a test for UC behavior under MC flood")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-31 12:56:59 -07:00
Petr Machata 8f3f09358c selftests: mlxsw: qos_mc_aware: Tweak for min shaper
Since the minimum shaper is now being enabled for MC TCs, it's
unreasonable to expect no UC traffic loss. Minimal min shaper value is
200Mbps, which is 20% of the 1Gbps that this test configures on egress.
To cover for glitches, tolerate up to 25% UC degradation under MC
overload.

Fixes: b5638d46c9 ("selftests: mlxsw: Add a test for UC behavior under MC flood")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-31 12:56:59 -07:00
Petr Machata 0fe6402316 mlxsw: spectrum: Set minimum shaper on MC TCs
An MC-aware mode was introduced in commit 7b81953066 ("mlxsw:
spectrum: Configure MC-aware mode on mlxsw ports"). In MC-aware mode,
BUM traffic gets a special treatment by being assigned to a separate set
of traffic classes 8..15. Pairs of TCs 0 and 8, 1 and 9, etc., are then
configured to strictly prioritize the lower-numbered ones. The intention
is to prevent BUM traffic from flooding the switch and push out all UC
traffic, which would otherwise happen, and instead give UC traffic
precedence.

However strictly prioritizing UC traffic has the effect that UC overload
pushes out all BUM traffic, such as legitimate ARP queries. These
packets are kept in queues for a while, but under sustained UC overload,
their lifetime eventually expires and these packets are dropped. That is
detrimental to network performance as well.

Therefore configure the MC TCs (8..15) with minimum shaper of 200Mbps (a
minimum permitted value) to allow a trickle of necessary control traffic
to get through.

Fixes: 7b81953066 ("mlxsw: spectrum: Configure MC-aware mode on mlxsw ports")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-31 12:56:58 -07:00
Petr Machata 8b931821aa mlxsw: reg: QEEC: Add minimum shaper fields
Add QEEC.mise (minimum shaper enable) and QEEC.min_shaper_rate to enable
configuration of minimum shaper.

Increase the QEEC length to 0x20 as well: that's the length that the
register has had for a long time now, but with the configurations that
mlxsw typically exercises, the firmware tolerated 0x1C-sized packets.
With mise=true however, FW rejects packets unless they have the full
required length.

Fixes: b9b7cee405 ("mlxsw: reg: Add QoS ETS Element Configuration register")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-31 12:56:58 -07:00
David S. Miller c4d63c7147 Merge branch 'hns3-fixes'
Huazhong Tan says:

====================
Bugfix for the HNS3 driver

This patch series include bugfix for the HNS3 ethernet
controller driver.

Change log:
V4->V5:
	Fixes comments from Joe Perches & Sergei Shtylyov
V3->V4:
	Fixes comments from Sergei Shtylyov
V2->V3:
	Fixes comments from Sergei Shtylyov
V1->V2:
	Fixes the compilation break reported by kbuild test robot
	http://patchwork.ozlabs.org/patch/989818/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-31 12:42:39 -07:00
Huazhong Tan 29118ab962 net: hns3: bugfix for rtnl_lock's range in the hclgevf_reset()
Since hclgevf_reset_wait() is used to wait for the hardware to complete
the reset, it is not necessary to hold the rtnl_lock during
hclgevf_reset_wait(). So this patch releases the lock for the duration
of hclgevf_reset_wait().

Fixes: 6988eb2a9b ("net: hns3: Add support to reset the enet/ring mgmt layer")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-31 12:42:38 -07:00
Huazhong Tan a963052e53 net: hns3: bugfix for rtnl_lock's range in the hclge_reset()
Since hclge_reset_wait() is used to wait for the hardware to complete
the reset, it is not necessary to hold the rtnl_lock during
hclge_reset_wait(). So this patch releases the lock for the duration
of hclge_reset_wait().

Fixes: 6d4fab3953 ("net: hns3: Reset net device with rtnl_lock")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-31 12:42:38 -07:00
Huazhong Tan 3c88ed1d79 net: hns3: bugfix for handling mailbox while the command queue reinitialized
In a multi-core machine, the mailbox service and reset service
will be executed at the same time. The reset service will re-initialize
the command queue, before that, the mailbox handler can only get some
invalid messages.

The HCLGE_STATE_CMD_DISABLE flag means that the command queue is not
available and needs to be reinitialized. Therefore, when the mailbox
handler recognizes this flag, it should not process the command.

Fixes: dde1a86e93 ("net: hns3: Add mailbox support to PF driver")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-31 12:42:38 -07:00
Huazhong Tan 7fa6be4fd2 net: hns3: fix incorrect return value/type of some functions
There are some functions that, when they fail to send the command,
need to return the corresponding error value to its caller.

Fixes: 46a3df9f97 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support")
Fixes: 681ec3999b ("net: hns3: fix for vlan table lost problem when resetting")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-31 12:42:38 -07:00
Huazhong Tan 1c12493809 net: hns3: bugfix for hclge_mdio_write and hclge_mdio_read
When there is a PHY, the driver needs to complete some operations through
MDIO during reset reinitialization, so HCLGE_STATE_CMD_DISABLE is more
suitable than HCLGE_STATE_RST_HANDLING to prevent the MDIO operation from
being sent during the hardware reset.

Fixes: b50ae26c57 ("net: hns3: never send command queue message to IMP when reset)
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-31 12:42:38 -07:00
Huazhong Tan 6d71ec6cbf net: hns3: bugfix for is_valid_csq_clean_head()
The HEAD pointer of the hardware command queue maybe equal to the command
queue's next_to_use in the driver, so that does not belong to the invalid
HEAD pointer, since the hardware may not process the command in time,
causing the HEAD pointer to be too late to update. The variables' name
in this function is unreadable, so give them a more readable one.

Fixes: 3ff504908f ("net: hns3: fix a dead loop in hclge_cmd_csq_clean")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-31 12:42:38 -07:00
Huazhong Tan 5faaf0752a net: hns3: remove unnecessary queue reset in the hns3_uninit_all_ring()
It is not necessary to reset the queue in the hns3_uninit_all_ring(),
since the queue is stopped in the down operation, and will be reset
in the up operation. And the judgment of the HCLGE_STATE_RST_HANDLING
flag in the hclge_reset_tqp() is not correct, because we need to reset
tqp during pf reset, otherwise it may cause queue not being reset to
working state problem.

Fixes: 76ad4f0ee7 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-31 12:42:38 -07:00
Huazhong Tan b2f74dbaf1 net: hns3: bugfix for the initialization of command queue's spin lock
The spin lock of the command queue only need to be initialized once
when the driver initializes the command queue. It is not necessary to
initialize the spin lock when resetting. At the same time, the
modification of the queue member should be performed after acquiring
the lock.

Fixes: 3efb960f05 ("net: hns3: Refactor the initialization of command queue")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-31 12:42:38 -07:00
Huazhong Tan 0d4411408a net: hns3: bugfix for reporting unknown vector0 interrupt repeatly problem
The current driver supports handling two vector0 interrupts, reset and
mailbox. When the hardware reports an interrupt of another type of
interrupt source, if the driver does not process the interrupt, but
enables the interrupt, the hardware will repeatedly report the unknown
interrupt.

Therefore, the driver enables the vector0 interrupt after clearing the
known type of interrupt source. Other conditions are not enabled.

Fixes: cd8c5c269b ("net: hns3: Fix for hclge_reset running repeatly problem")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-31 12:42:38 -07:00
Huazhong Tan 73b907a083 net: hns3: bugfix for buffer not free problem during resetting
When hns3_get_ring_config()/hns3_queue_to_ring()/
hns3_get_vector_ring_chain() failed during resetting, the allocated
memory has not been freed before these three functions return. So
this patch adds error handler in these functions to fix it.

Fixes: 76ad4f0ee7 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-31 12:42:38 -07:00
Huazhong Tan ece4bf46e9 net: hns3: add error handler for hns3_nic_init_vector_data()
When hns3_nic_init_vector_data() fails to map ring to vector,
it should cancel the netif_napi_add() that has been successfully
done and then exits.

Fixes: 76ad4f0ee7 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-31 12:42:38 -07:00
Eric Dumazet d48051c5b8 net/mlx5e: fix csum adjustments caused by RXFCS
As shown by Dmitris, we need to use csum_block_add() instead of csum_add()
when adding the FCS contribution to skb csum.

Before 4.18 (more exactly commit 88078d98d1 "net: pskb_trim_rcsum()
and CHECKSUM_COMPLETE are friends"), the whole skb csum was thrown away,
so RXFCS changes were ignored.

Then before commit d55bef5059 ("net: fix pskb_trim_rcsum_slow() with
odd trim offset") both mlx5 and pskb_trim_rcsum_slow() bugs were canceling
each other.

Now we fixed pskb_trim_rcsum_slow() we need to fix mlx5.

Note that this patch also rewrites mlx5e_get_fcs() to :

- Use skb_header_pointer() instead of reinventing it.
- Use __get_unaligned_cpu32() to avoid possible non aligned accesses
  as Dmitris pointed out.

Fixes: 902a545904 ("net/mlx5e: When RXFCS is set, add FCS data into checksum calculation")
Reported-by: Paweł Staszewski <pstaszewski@itcare.pl>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Eran Ben Elisha <eranbe@mellanox.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Cc: Dimitris Michailidis <dmichail@google.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Paweł Staszewski <pstaszewski@itcare.pl>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Tested-By: Maria Pasechnik <mariap@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-31 12:40:52 -07:00
Jason Wang ff002269a4 vhost: Fix Spectre V1 vulnerability
The idx in vhost_vring_ioctl() was controlled by userspace, hence a
potential exploitation of the Spectre variant 1 vulnerability.

Fixing this by sanitizing idx before using it to index d->vqs.

Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-31 12:39:15 -07:00
Bo YU b1c234441e net: drop a space before tabs
Fix a warning from checkpatch.pl:'please no space before tabs'
in include/net/af_unix.h

Signed-off-by: Bo YU <tsu.yubo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-31 12:37:12 -07:00
Bo YU c4147beabe net: add an identifier name for 'struct sock *'
Fix a warning from checkpatch:
function definition argument 'struct sock *' should also have an
identifier name in include/net/af_unix.h.

Signed-off-by: Bo YU <tsu.yubo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-31 12:37:12 -07:00
Colin Ian King e7611088f0 net: hns3: fix spelling mistake "intrerrupt" -> "interrupt"
Trivial fix to spelling mistake in dev_err message

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-31 12:33:10 -07:00
Radoslaw Tyl 6702185c1f ixgbe: fix MAC anti-spoofing filter after VFLR
This change resolves a driver bug where the driver is logging a
message that says "Spoofed packets detected". This can occur on the PF
(host) when a VF has VLAN+MACVLAN enabled and is re-started with a
different MAC address.

MAC and VLAN anti-spoofing filters are to be enabled together.

Signed-off-by: Radoslaw Tyl <radoslawx.tyl@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Acked-by: Piotr Skajewski <piotrx.skajewski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-10-31 11:05:51 -07:00
Mitch Williams bb58fd7eef i40e: Update status codes
Add a few new status code which will be used by the ice driver, and
rename a few to make them more consistent. Error code are mapped to
similar values as in i40e_status.h, so as to be compatible with older
VF drivers not using this status enum.

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-10-31 10:57:43 -07:00
Jeff Kirsher 48e01e001d ixgbe/ixgbevf: fix XFRM_ALGO dependency
Based on the original work from Arnd Bergmann.

When XFRM_ALGO is not enabled, the new ixgbe IPsec code produces a
link error:

drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.o: In function `ixgbe_ipsec_vf_add_sa':
ixgbe_ipsec.c:(.text+0x1266): undefined reference to `xfrm_aead_get_byname'

Simply selecting XFRM_ALGO from here causes circular dependencies, so
to fix it, we probably want this slightly more complex solution that is
similar to what other drivers with XFRM offload do:

A separate Kconfig symbol now controls whether we include the IPsec
offload code. To keep the old behavior, this is left as 'default y'. The
dependency in XFRM_OFFLOAD still causes a circular dependency but is
not actually needed because this symbol is not user visible, so removing
that dependency on top makes it all work.

CC: Arnd Bergmann <arnd@arndb.de>
CC: Shannon Nelson <shannon.nelson@oracle.com>
Fixes: eda0333ac2 ("ixgbe: add VF IPsec management")
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
2018-10-31 10:53:15 -07:00
Jacob Keller 35ae5414e7 fm10k: bump driver version to match out-of-tree release
The upstream and out-of-tree drivers are once again at comparable
functionality. It's been a while since we updated the upstream driver
version, so bump it now.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-10-31 10:49:15 -07:00
Jacob Keller 9a1fe1e2bb fm10k: add missing device IDs to the upstream driver
The device IDs for the Ethernet SDI Adapter devices were never added to
the upstream driver. The IDs are already in the pci.ids database, and
are supported by the out-of-tree driver.

Add the device IDs now, so that the upstream driver can recognize and
load these devices.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-10-31 10:43:13 -07:00
Jacob Keller e330af7889 fm10k: ensure completer aborts are marked as non-fatal after a resume
VF drivers can trigger PCIe completer aborts any time they read a queue
that they don't own. Even in nominal circumstances, it is not possible
to prevent the VF driver from reading queues it doesn't own. VF drivers
may attempt to read queues it previously owned, but which it no longer
does due to a PF reset.

Normally these completer aborts aren't an issue. However, on some
platforms these trigger machine check errors. This is true even if we
lower their severity from fatal to non-fatal. Indeed, we already have
code for lowering the severity.

We could attempt to mask these errors conditionally around resets, which
is the most common time they would occur. However this would essentially
be a race between the PF and VF drivers, and we may still occasionally
see machine check exceptions on these strictly configured platforms.

Instead, mask the errors entirely any time we resume VFs. By doing so,
we prevent the completer aborts from being sent to the parent PCIe
device, and thus these strict platforms will not upgrade them into
machine check errors.

Additionally, we don't lose any information by masking these errors,
because we'll still report VFs which attempt to access queues via the
FUM_BAD_VF_QACCESS errors.

Without this change, on platforms where completer aborts cause machine
check exceptions, the VF reading queues it doesn't own could crash the
host system. Masking the completer abort prevents this, so we should
mask it for good, and not just around a PCIe reset. Otherwise malicious
or misconfigured VFs could cause the host system to crash.

Because we are masking the error entirely, there is little reason to
also keep setting the severity bit, so that code is also removed.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-10-31 10:37:32 -07:00
Ngai-Mint Kwan e69e40c806 fm10k: fix SM mailbox full condition
Current condition will always incorrectly report a full SM mailbox if an
IES API application is not running. Due to this, the
"fm10k_service_task" will be infinitely queued into the driver's
workqueue. This, in turn, will cause a "kworker" thread to report 100%
CPU utilization and might cause "soft lockup" events or system crashes.

To fix this issue, a new condition is added to determine if the SM
mailbox is in the correct state of FM10K_STATE_OPEN before proceeding.
In other words, an instance of the IES API must be running. If there is,
the remainder of the flow stays the same which is to determine if the SM
mailbox capacity has been exceeded or not and take appropriate action.

Signed-off-by: Ngai-Mint Kwan <ngai-mint.kwan@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-10-31 10:31:24 -07:00
Miroslav Lichvar 094bf4d0e9 igb: shorten maximum PHC timecounter update interval
The timecounter needs to be updated at least once per ~550 seconds in
order to avoid a 40-bit SYSTIM timestamp to be misinterpreted as an old
timestamp.

Since commit 500462a9d ("timers: Switch to a non-cascading wheel"),
scheduling of delayed work seems to be less accurate and a requested
delay of 540 seconds may actually be longer than 550 seconds. Shorten
the delay to 480 seconds to be sure the timecounter is updated in time.

This fixes an issue with HW timestamps on 82580/I350/I354 being off by
~1100 seconds for few seconds every ~9 minutes.

Cc: Jacob Keller <jacob.e.keller@intel.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-10-31 10:24:41 -07:00
John Fastabend 27b31e68bc bpf: tcp_bpf_recvmsg should return EAGAIN when nonblocking and no data
We return 0 in the case of a nonblocking socket that has no data
available. However, this is incorrect and may confuse applications.
After this patch we do the correct thing and return the error
EAGAIN.

Quoting return codes from recvmsg manpage,

EAGAIN or EWOULDBLOCK
 The socket is marked nonblocking and the receive operation would
 block, or a receive timeout had been set and the timeout expired
 before data was received.

Fixes: 604326b41a ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-30 23:31:22 +01:00
Yonghong Song b31d30d9be tools/bpf: add unlimited rlimit for flow_dissector_load
On our test machine, bpf selftest test_flow_dissector.sh failed
with the following error:
  # ./test_flow_dissector.sh
  bpffs not mounted. Mounting...
  libbpf: failed to create map (name: 'jmp_table'): Operation not permitted
  libbpf: failed to load object 'bpf_flow.o'
  ./flow_dissector_load: bpf_prog_load bpf_flow.o
  selftests: test_flow_dissector [FAILED]

Let us increase the rlimit to remove the above map
creation failure.

Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-30 23:31:21 +01:00
Marc Zyngier a6b3a3fa04 net: mvpp2: Fix affinity hint allocation
The mvpp2 driver has the curious behaviour of passing a stack variable
to irq_set_affinity_hint(), which results in the kernel exploding
the first time anyone accesses this information. News flash: userspace
does, and irqbalance will happily take the machine down. Great stuff.

An easy fix is to track the mask within the queue_vector structure,
and to make sure it has the same lifetime as the interrupt itself.

Fixes: e531f76757 ("net: mvpp2: handle cases where more CPUs are available than s/w threads")
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-30 11:34:41 -07:00
Eric Dumazet 3aa8029e1a net/mlx4_en: add a missing <net/ip.h> include
Abdul Haleem reported a build error on ppc :

drivers/net/ethernet/mellanox/mlx4/en_rx.c:582:18: warning: `struct
iphdr` declared inside parameter list [enabled by default]
           struct iphdr *iph)
                  ^
drivers/net/ethernet/mellanox/mlx4/en_rx.c:582:18: warning: its scope is
only this definition or declaration, which is probably not what you want
[enabled by default]
drivers/net/ethernet/mellanox/mlx4/en_rx.c: In function
get_fixed_ipv4_csum:
drivers/net/ethernet/mellanox/mlx4/en_rx.c:586:20: error: dereferencing
pointer to incomplete type
  __u8 ipproto = iph->protocol;
                    ^

Fixes: 55469bc6b5 ("drivers: net: remove <net/busy_poll.h> inclusion when not needed")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-30 11:18:59 -07:00
Ido Schimmel da71577545 rtnetlink: Disallow FDB configuration for non-Ethernet device
When an FDB entry is configured, the address is validated to have the
length of an Ethernet address, but the device for which the address is
configured can be of any type.

The above can result in the use of uninitialized memory when the address
is later compared against existing addresses since 'dev->addr_len' is
used and it may be greater than ETH_ALEN, as with ip6tnl devices.

Fix this by making sure that FDB entries are only configured for
Ethernet devices.

BUG: KMSAN: uninit-value in memcmp+0x11d/0x180 lib/string.c:863
CPU: 1 PID: 4318 Comm: syz-executor998 Not tainted 4.19.0-rc3+ #49
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x14b/0x190 lib/dump_stack.c:113
  kmsan_report+0x183/0x2b0 mm/kmsan/kmsan.c:956
  __msan_warning+0x70/0xc0 mm/kmsan/kmsan_instr.c:645
  memcmp+0x11d/0x180 lib/string.c:863
  dev_uc_add_excl+0x165/0x7b0 net/core/dev_addr_lists.c:464
  ndo_dflt_fdb_add net/core/rtnetlink.c:3463 [inline]
  rtnl_fdb_add+0x1081/0x1270 net/core/rtnetlink.c:3558
  rtnetlink_rcv_msg+0xa0b/0x1530 net/core/rtnetlink.c:4715
  netlink_rcv_skb+0x36e/0x5f0 net/netlink/af_netlink.c:2454
  rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:4733
  netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
  netlink_unicast+0x1638/0x1720 net/netlink/af_netlink.c:1343
  netlink_sendmsg+0x1205/0x1290 net/netlink/af_netlink.c:1908
  sock_sendmsg_nosec net/socket.c:621 [inline]
  sock_sendmsg net/socket.c:631 [inline]
  ___sys_sendmsg+0xe70/0x1290 net/socket.c:2114
  __sys_sendmsg net/socket.c:2152 [inline]
  __do_sys_sendmsg net/socket.c:2161 [inline]
  __se_sys_sendmsg+0x2a3/0x3d0 net/socket.c:2159
  __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2159
  do_syscall_64+0xb8/0x100 arch/x86/entry/common.c:291
  entry_SYSCALL_64_after_hwframe+0x63/0xe7
RIP: 0033:0x440ee9
Code: e8 cc ab 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
ff 0f 83 bb 0a fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fff6a93b518 EFLAGS: 00000213 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000440ee9
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000003
RBP: 0000000000000000 R08: 00000000004002c8 R09: 00000000004002c8
R10: 00000000004002c8 R11: 0000000000000213 R12: 000000000000b4b0
R13: 0000000000401ec0 R14: 0000000000000000 R15: 0000000000000000

Uninit was created at:
  kmsan_save_stack_with_flags mm/kmsan/kmsan.c:256 [inline]
  kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:181
  kmsan_kmalloc+0x98/0x100 mm/kmsan/kmsan_hooks.c:91
  kmsan_slab_alloc+0x10/0x20 mm/kmsan/kmsan_hooks.c:100
  slab_post_alloc_hook mm/slab.h:446 [inline]
  slab_alloc_node mm/slub.c:2718 [inline]
  __kmalloc_node_track_caller+0x9e7/0x1160 mm/slub.c:4351
  __kmalloc_reserve net/core/skbuff.c:138 [inline]
  __alloc_skb+0x2f5/0x9e0 net/core/skbuff.c:206
  alloc_skb include/linux/skbuff.h:996 [inline]
  netlink_alloc_large_skb net/netlink/af_netlink.c:1189 [inline]
  netlink_sendmsg+0xb49/0x1290 net/netlink/af_netlink.c:1883
  sock_sendmsg_nosec net/socket.c:621 [inline]
  sock_sendmsg net/socket.c:631 [inline]
  ___sys_sendmsg+0xe70/0x1290 net/socket.c:2114
  __sys_sendmsg net/socket.c:2152 [inline]
  __do_sys_sendmsg net/socket.c:2161 [inline]
  __se_sys_sendmsg+0x2a3/0x3d0 net/socket.c:2159
  __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2159
  do_syscall_64+0xb8/0x100 arch/x86/entry/common.c:291
  entry_SYSCALL_64_after_hwframe+0x63/0xe7

v2:
* Make error message more specific (David)

Fixes: 090096bf3d ("net: generic fdb support for drivers without ndo_fdb_<op>")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-and-tested-by: syzbot+3a288d5f5530b901310e@syzkaller.appspotmail.com
Reported-and-tested-by: syzbot+d53ab4e92a1db04110ff@syzkaller.appspotmail.com
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-29 20:52:35 -07:00
Xin Long 7133583693 sctp: check policy more carefully when getting pr status
When getting pr_assocstatus and pr_streamstatus by sctp_getsockopt,
it doesn't correctly process the case when policy is set with
SCTP_PR_SCTP_ALL | SCTP_PR_SCTP_MASK. It even causes a
slab-out-of-bounds in sctp_getsockopt_pr_streamstatus().

This patch fixes it by return -EINVAL for this case.

Fixes: 0ac1077e3a ("sctp: get pr_assoc and pr_stream all status with SCTP_PR_SCTP_ALL")
Reported-by: syzbot+5da0d0a72a9e7d791748@syzkaller.appspotmail.com
Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-29 20:50:41 -07:00
Xin Long df132eff46 sctp: clear the transport of some out_chunk_list chunks in sctp_assoc_rm_peer
If a transport is removed by asconf but there still are some chunks with
this transport queuing on out_chunk_list, later an use-after-free issue
will be caused when accessing this transport from these chunks in
sctp_outq_flush().

This is an old bug, we fix it by clearing the transport of these chunks
in out_chunk_list when removing a transport in sctp_assoc_rm_peer().

Reported-by: syzbot+56a40ceee5fb35932f4d@syzkaller.appspotmail.com
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-29 20:49:44 -07:00
David S. Miller 2b0ab72799 Merge branch 'mlxsw-Couple-of-fixes'
Ido Schimmel says:

====================
mlxsw: Couple of fixes

First patch makes sure mlxsw does not ignore user requests to delete FDB
entries that were learned by the device.

Second patch fixes a use-after-free that can be triggered by requesting
a reload via devlink when the previous reload failed.

Please consider both patches for stable. They apply cleanly to both
4.18.y and 4.19.y.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-29 20:48:00 -07:00
Shalom Toledo a22712a962 mlxsw: core: Fix devlink unregister flow
After a failed reload, the driver is still registered to devlink, its
devlink instance is still allocated and the 'reload_fail' flag is set.
Then, in the next reload try, the driver's allocated devlink instance will
be freed without unregistering from devlink and its components (e.g,
resources). This scenario can cause a use-after-free if the user tries to
execute command via devlink user-space tool.

Fix by not freeing the devlink instance during reload (failed or not).

Fixes: 24cc68ad6c ("mlxsw: core: Add support for reload")
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-29 20:48:00 -07:00