Commit graph

1059105 commits

Author SHA1 Message Date
Jakub Kicinski 6efcdadc15 Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
bpf 2021-12-08

We've added 12 non-merge commits during the last 22 day(s) which contain
a total of 29 files changed, 659 insertions(+), 80 deletions(-).

The main changes are:

1) Fix an off-by-two error in packet range markings and also add a batch of
   new tests for coverage of these corner cases, from Maxim Mikityanskiy.

2) Fix a compilation issue on MIPS JIT for R10000 CPUs, from Johan Almbladh.

3) Fix two functional regressions and a build warning related to BTF kfunc
   for modules, from Kumar Kartikeya Dwivedi.

4) Fix outdated code and docs regarding BPF's migrate_disable() use on non-
   PREEMPT_RT kernels, from Sebastian Andrzej Siewior.

5) Add missing includes in order to be able to detangle cgroup vs bpf header
   dependencies, from Jakub Kicinski.

6) Fix regression in BPF sockmap tests caused by missing detachment of progs
   from sockets when they are removed from the map, from John Fastabend.

7) Fix a missing "no previous prototype" warning in x86 JIT caused by BPF
   dispatcher, from Björn Töpel.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf: Add selftests to cover packet access corner cases
  bpf: Fix the off-by-two error in range markings
  treewide: Add missing includes masked by cgroup -> bpf dependency
  tools/resolve_btfids: Skip unresolved symbol warning for empty BTF sets
  bpf: Fix bpf_check_mod_kfunc_call for built-in modules
  bpf: Make CONFIG_DEBUG_INFO_BTF depend upon CONFIG_BPF_SYSCALL
  mips, bpf: Fix reference to non-existing Kconfig symbol
  bpf: Make sure bpf_disable_instrumentation() is safe vs preemption.
  Documentation/locking/locktypes: Update migrate_disable() bits.
  bpf, sockmap: Re-evaluate proto ops when psock is removed from sockmap
  bpf, sockmap: Attach map progs to psock early for feature probes
  bpf, x86: Fix "no previous prototype" warning
====================

Link: https://lore.kernel.org/r/20211208155125.11826-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-08 16:06:44 -08:00
Russell King (Oracle) 2b29cb9e3f net: dsa: mv88e6xxx: fix "don't use PHY_DETECT on internal PHY's"
This commit fixes a misunderstanding in commit 4a3e0aeddf ("net: dsa:
mv88e6xxx: don't use PHY_DETECT on internal PHY's").

For Marvell DSA switches with the PHY_DETECT bit (for non-6250 family
devices), controls whether the PPU polls the PHY to retrieve the link,
speed, duplex and pause status to update the port configuration. This
applies for both internal and external PHYs.

For some switches such as 88E6352 and 88E6390X, PHY_DETECT has an
additional function of enabling auto-media mode between the internal
PHY and SERDES blocks depending on which first gains link.

The original intention of commit 5d5b231da7 (net: dsa: mv88e6xxx: use
PHY_DETECT in mac_link_up/mac_link_down) was to allow this bit to be
used to detect when this propagation is enabled, and allow software to
update the port configuration. This has found to be necessary for some
switches which do not automatically propagate status from the SERDES to
the port, which includes the 88E6390. However, commit 4a3e0aeddf
("net: dsa: mv88e6xxx: don't use PHY_DETECT on internal PHY's") breaks
this assumption.

Maarten Zanders has confirmed that the issue he was addressing was for
an 88E6250 switch, which does not have a PHY_DETECT bit in bit 12, but
instead a link status bit. Therefore, mv88e6xxx_port_ppu_updates() does
not report correctly.

This patch resolves the above issues by reverting Maarten's change and
instead making mv88e6xxx_port_ppu_updates() indicate whether the port
is internal for the 88E6250 family of switches.

  Yes, you're right, I'm targeting the 6250 family. And yes, your
  suggestion would solve my case and is a better implementation for
  the other devices (as far as I can see).

Fixes: 4a3e0aeddf ("net: dsa: mv88e6xxx: don't use PHY_DETECT on internal PHY's")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Tested-by: Maarten Zanders <maarten.zanders@mind.be>
Link: https://lore.kernel.org/r/E1muXm7-00EwJB-7n@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-08 14:35:13 -08:00
Maxim Mikityanskiy b560b21f71 bpf: Add selftests to cover packet access corner cases
This commit adds BPF verifier selftests that cover all corner cases by
packet boundary checks. Specifically, 8-byte packet reads are tested at
the beginning of data and at the beginning of data_meta, using all kinds
of boundary checks (all comparison operators: <, >, <=, >=; both
permutations of operands: data + length compared to end, end compared to
data + length). For each case there are three tests:

1. Length is just enough for an 8-byte read. Length is either 7 or 8,
   depending on the comparison.

2. Length is increased by 1 - should still pass the verifier. These
   cases are useful, because they failed before commit 2fa7d94afc
   ("bpf: Fix the off-by-two error in range markings").

3. Length is decreased by 1 - should be rejected by the verifier.

Some existing tests are just renamed to avoid duplication.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20211207081521.41923-1-maximmi@nvidia.com
2021-12-08 15:42:26 +01:00
Joakim Zhang b5bd95d171 net: fec: only clear interrupt of handling queue in fec_enet_rx_queue()
Background:
We have a customer is running a Profinet stack on the 8MM which receives and
responds PNIO packets every 4ms and PNIO-CM packets every 40ms. However, from
time to time the received PNIO-CM package is "stock" and is only handled when
receiving a new PNIO-CM or DCERPC-Ping packet (tcpdump shows the PNIO-CM and
the DCERPC-Ping packet at the same time but the PNIO-CM HW timestamp is from
the expected 40 ms and not the 2s delay of the DCERPC-Ping).

After debugging, we noticed PNIO, PNIO-CM and DCERPC-Ping packets would
be handled by different RX queues.

The root cause should be driver ack all queues' interrupt when handle a
specific queue in fec_enet_rx_queue(). The blamed patch is introduced to
receive as much packets as possible once to avoid interrupt flooding.
But it's unreasonable to clear other queues'interrupt when handling one
queue, this patch tries to fix it.

Fixes: ed63f1dcd5 (net: fec: clear receive interrupts before processing a packet)
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Reported-by: Nicolas Diaz <nicolas.diaz@nxp.com>
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Link: https://lore.kernel.org/r/20211206135457.15946-1-qiangqing.zhang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 21:39:39 -08:00
Jakub Kicinski 65af674a59 Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2021-12-06

This series contains updates to iavf and i40e drivers.

Mitch adds restoration of MSI state during reset for iavf.

Michal fixes checking and reporting of descriptor count changes to
communicate changes and/or issues for iavf.

Karen resolves an issue with failed handling of VF requests while a VF
reset is occurring for i40e.

Mateusz removes clearing of VF requested queue count when configuring
VF ADQ for i40e.

Norbert fixes a NULL pointer dereference that can occur when getting VSI
descriptors for i40e.
====================

Link: https://lore.kernel.org/r/20211206183519.2733180-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 21:33:12 -08:00
Jakub Kicinski 9e8926888c Merge branch 'net-phy-fix-doc-build-warning'
Yanteng Si says:

====================
net: phy: Fix doc build warnings
====================

Link: https://lore.kernel.org/r/cover.1638776933.git.siyanteng@loongson.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 21:26:22 -08:00
Yanteng Si c35e8de704 net: phy: Add the missing blank line in the phylink_suspend comment
Fix warning as:

Documentation/networking/kapi:147: ./drivers/net/phy/phylink.c:1657: WARNING: Unexpected indentation.
Documentation/networking/kapi:147: ./drivers/net/phy/phylink.c:1658: WARNING: Block quote ends without a blank line; unexpected unindent.

Signed-off-by: Yanteng Si <siyanteng@loongson.cn>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 21:26:22 -08:00
Yanteng Si a97770cc40 net: phy: Remove unnecessary indentation in the comments of phy_device
Fix warning as:

linux-next/Documentation/networking/kapi:122: ./include/linux/phy.h:543: WARNING: Unexpected indentation.
linux-next/Documentation/networking/kapi:122: ./include/linux/phy.h:544: WARNING: Block quote ends without a blank line; unexpected unindent.
linux-next/Documentation/networking/kapi:122: ./include/linux/phy.h:546: WARNING: Unexpected indentation.

Suggested-by: Akira Yokosawa <akiyks@gmail.com>
Signed-off-by: Yanteng Si <siyanteng@loongson.cn>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 21:26:06 -08:00
Ameer Hamza e6f60c51f0 gve: fix for null pointer dereference.
Avoid passing NULL skb to __skb_put() function call if
napi_alloc_skb() returns NULL.

Fixes: 37149e9374 ("gve: Implement packet continuation for RX.")
Signed-off-by: Ameer Hamza <amhamza.mgc@gmail.com>
Link: https://lore.kernel.org/r/20211205183810.8299-1-amhamza.mgc@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 20:57:17 -08:00
Petr Machata 6ebe4b3508 MAINTAINERS: net: mlxsw: Remove Jiri as a maintainer, add myself
Jiri has moved on and will not carry out the mlxsw maintainership duty any
longer. Add myself as a co-maintainer instead.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/45b54312cdebaf65c5d110b15a5dd2df795bf2be.1638807297.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 20:35:43 -08:00
Jakub Kicinski 56a271be06 Merge branch 'net-tls-cover-all-ciphers-with-tests'
Vadim Fedorenko says:

====================
net: tls: cover all ciphers with tests

Recent patches to Kernel TLS showed that some ciphers are not covered
with tests. Let's cover missed.
====================

Link: https://lore.kernel.org/r/20211206213932.7508-1-vfedorenko@novek.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 20:18:10 -08:00
Vadim Fedorenko 13bf99ab21 selftests: tls: add missing AES256-GCM cipher
Add tests for TLSv1.2 and TLSv1.3 with AES256-GCM cipher

Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 20:18:07 -08:00
Vadim Fedorenko d76c51f976 selftests: tls: add missing AES-CCM cipher tests
Add tests for TLSv1.2 and TLSv1.3 with AES-CCM cipher.

Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 20:18:07 -08:00
Dan Carpenter d17b9737c2 net/qla3xxx: fix an error code in ql_adapter_up()
The ql_wait_for_drvr_lock() fails and returns false, then this
function should return an error code instead of returning success.

The other problem is that the success path prints an error message
netdev_err(ndev, "Releasing driver lock\n");  Delete that and
re-order the code a little to make it more clear.

Fixes: 5a4faa8737 ("[PATCH] qla3xxx NIC driver")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/20211207082416.GA16110@kili
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 10:37:10 -08:00
Jakub Kicinski 2a62df3692 linux-can-fixes-for-5.16-20211207
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEEK3kIWJt9yTYMP3ehqclaivrt76kFAmGvM24THG1rbEBwZW5n
 dXRyb25peC5kZQAKCRCpyVqK+u3vqZL8CACmYXCIKdLxumCmfa7z61r6Y839grFl
 weBo1dgrg/FIIyxro0EEmGi3ZRGrVcQNyQOQKl9xtt8FFmkrl5yCioQ5N+ib/Qt7
 6BhMN0kdaWQDzgyIT5BT8Ba/13S4Hpjb7baDBf+Rqw7WemeX2hni8Dx4WxflfMbo
 lpxuRyDtUvndUHVzATkbB8TLsmB50wdTinzZkY3IV8bAhLcznQ2vYvV4HblbeNlA
 BtEEtAjsR7zFfyqDmxOIOdqMD4m4vjUnaOoT6KQznNIy1EYxFgX7VFNAp3DeGaX3
 bM4CTEbVm980hPgm8tjFL0p6BExWWR9q6lR/x41O/P2cQ2PKNDSuBnuo
 =4e0K
 -----END PGP SIGNATURE-----

Merge tag 'linux-can-fixes-for-5.16-20211207' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
can 2021-12-07

The 1st patch is by Vincent Mailhol and fixes a use after free in the
pch_can driver.

Dan Carpenter fixes a use after free in the ems_pcmcia sja1000 driver.

The remaining 7 patches target the m_can driver. Brian Silverman
contributes a patch to disable and ignore the ELO interrupt, which is
currently not handled in the driver and may lead to an interrupt
storm. Vincent Mailhol's patch fixes a memory leak in the error path
of the m_can_read_fifo() function. The remaining patches are
contributed by Matthias Schiffer, first a iomap_read_fifo() and
iomap_write_fifo() functions are fixed in the PCI glue driver, then
the clock rate for the Intel Ekhart Lake platform is fixed, the last 3
patches add support for the custom bit timings on the Elkhart Lake
platform.

* tag 'linux-can-fixes-for-5.16-20211207' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
  can: m_can: pci: use custom bit timings for Elkhart Lake
  can: m_can: make custom bittiming fields const
  Revert "can: m_can: remove support for custom bit timing"
  can: m_can: pci: fix incorrect reference clock rate
  can: m_can: pci: fix iomap_read_fifo() and iomap_write_fifo()
  can: m_can: m_can_read_fifo: fix memory leak in error branch
  can: m_can: Disable and ignore ELO interrupt
  can: sja1000: fix use after free in ems_pcmcia_add_card()
  can: pch_can: pch_can_rx_normal: fix use after free
====================

Link: https://lore.kernel.org/r/20211207102420.120131-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 10:32:05 -08:00
Matthias Schiffer ea4c178768 can: m_can: pci: use custom bit timings for Elkhart Lake
The relevant datasheet [1] specifies nonstandard limits for the bit timing
parameters. While it is unclear what the exact effect of violating these
limits is, it seems like a good idea to adhere to the documentation.

[1] Intel Atom® x6000E Series, and Intel® Pentium® and Celeron® N and J
    Series Processors for IoT Applications Datasheet,
    Volume 2 (Book 3 of 3), July 2021, Revision 001

Fixes: cab7ffc032 ("can: m_can: add PCI glue driver for Intel Elkhart Lake")
Link: https://lore.kernel.org/all/9eba5d7c05a48ead4024ffa6e5926f191d8c6b38.1636967198.git.matthias.schiffer@ew.tq-group.com
Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-12-07 09:51:41 +01:00
Matthias Schiffer ea22ba40de can: m_can: make custom bittiming fields const
The assigned timing structs will be defined a const anyway, so we can
avoid a few casts by declaring the struct fields as const as well.

Link: https://lore.kernel.org/all/4508fa4e639164b2584c49a065d90c78a91fa568.1636967198.git.matthias.schiffer@ew.tq-group.com
Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-12-07 09:51:41 +01:00
Matthias Schiffer ea768b2ffe Revert "can: m_can: remove support for custom bit timing"
The timing limits specified by the Elkhart Lake CPU datasheets do not
match the defaults. Let's reintroduce the support for custom bit timings.

This reverts commit 0ddd83fbeb.

Link: https://lore.kernel.org/all/00c9e2596b1a548906921a574d4ef7a03c0dace0.1636967198.git.matthias.schiffer@ew.tq-group.com
Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-12-07 09:51:41 +01:00
Matthias Schiffer 8c03b8bff7 can: m_can: pci: fix incorrect reference clock rate
When testing the CAN controller on our Ekhart Lake hardware, we
determined that all communication was running with twice the configured
bitrate. Changing the reference clock rate from 100MHz to 200MHz fixed
this. Intel's support has confirmed to us that 200MHz is indeed the
correct clock rate.

Fixes: cab7ffc032 ("can: m_can: add PCI glue driver for Intel Elkhart Lake")
Link: https://lore.kernel.org/all/c9cf3995f45c363e432b3ae8eb1275e54f009fc8.1636967198.git.matthias.schiffer@ew.tq-group.com
Cc: stable@vger.kernel.org
Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com>
Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Reviewed-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-12-07 09:51:41 +01:00
Matthias Schiffer d737de2d7c can: m_can: pci: fix iomap_read_fifo() and iomap_write_fifo()
The same fix that was previously done in m_can_platform in commit
99d173fbe8 ("can: m_can: fix iomap_read_fifo() and iomap_write_fifo()")
is required in m_can_pci as well to make iomap_read_fifo() and
iomap_write_fifo() work for val_count > 1.

Fixes: 812270e544 ("can: m_can: Batch FIFO writes during CAN transmit")
Fixes: 1aa6772f64 ("can: m_can: Batch FIFO reads during CAN receive")
Link: https://lore.kernel.org/all/20211118144011.10921-1-matthias.schiffer@ew.tq-group.com
Cc: stable@vger.kernel.org
Cc: Matt Kline <matt@bitbashing.io>
Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com>
Tested-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-12-07 09:51:36 +01:00
Vincent Mailhol 31cb32a590 can: m_can: m_can_read_fifo: fix memory leak in error branch
In m_can_read_fifo(), if the second call to m_can_fifo_read() fails,
the function jump to the out_fail label and returns without calling
m_can_receive_skb(). This means that the skb previously allocated by
alloc_can_skb() is not freed. In other terms, this is a memory leak.

This patch adds a goto label to destroy the skb if an error occurs.

Issue was found with GCC -fanalyzer, please follow the link below for
details.

Fixes: e39381770e ("can: m_can: Disable IRQs on FIFO bus errors")
Link: https://lore.kernel.org/all/20211107050755.70655-1-mailhol.vincent@wanadoo.fr
Cc: stable@vger.kernel.org
Cc: Matt Kline <matt@bitbashing.io>
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-12-07 09:09:06 +01:00
Brian Silverman f58ac1adc7 can: m_can: Disable and ignore ELO interrupt
With the design of this driver, this condition is often triggered.
However, the counter that this interrupt indicates an overflow is never
read either, so overflowing is harmless.

On my system, when a CAN bus starts flapping up and down, this locks up
the whole system with lots of interrupts and printks.

Specifically, this interrupt indicates the CEL field of ECR has
overflowed. All reads of ECR mask out CEL.

Fixes: e0d1f4816f ("can: m_can: add Bosch M_CAN controller support")
Link: https://lore.kernel.org/all/20211129222628.7490-1-brian.silverman@bluerivertech.com
Cc: stable@vger.kernel.org
Signed-off-by: Brian Silverman <brian.silverman@bluerivertech.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-12-07 09:09:02 +01:00
Dan Carpenter 3ec6ca6b1a can: sja1000: fix use after free in ems_pcmcia_add_card()
If the last channel is not available then "dev" is freed.  Fortunately,
we can just use "pdev->irq" instead.

Also we should check if at least one channel was set up.

Fixes: fd734c6f25 ("can/sja1000: add driver for EMS PCMCIA card")
Link: https://lore.kernel.org/all/20211124145041.GB13656@kili
Cc: stable@vger.kernel.org
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Tested-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-12-07 09:08:58 +01:00
Vincent Mailhol 94cddf1e92 can: pch_can: pch_can_rx_normal: fix use after free
After calling netif_receive_skb(skb), dereferencing skb is unsafe.
Especially, the can_frame cf which aliases skb memory is dereferenced
just after the call netif_receive_skb(skb).

Reordering the lines solves the issue.

Fixes: b21d18b51b ("can: Topcliff: Add PCH_CAN driver.")
Link: https://lore.kernel.org/all/20211123111654.621610-1-mailhol.vincent@wanadoo.fr
Cc: stable@vger.kernel.org
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-12-07 09:08:36 +01:00
Eric Dumazet 4dbb0dad8e devlink: fix netns refcount leak in devlink_nl_cmd_reload()
While preparing my patch series adding netns refcount tracking,
I spotted bugs in devlink_nl_cmd_reload()

Some error paths forgot to release a refcount on a netns.

To fix this, we can reduce the scope of get_net()/put_net()
section around the call to devlink_reload().

Fixes: ccdf07219d ("devlink: Add reload action option to devlink reload command")
Fixes: dc64cc7c63 ("devlink: Add devlink reload limit option")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Moshe Shemesh <moshe@mellanox.com>
Cc: Jacob Keller <jacob.e.keller@intel.com>
Cc: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20211205192822.1741045-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-06 16:56:32 -08:00
Antoine Tenart dde91ccfa2 ethtool: do not perform operations on net devices being unregistered
There is a short period between a net device starts to be unregistered
and when it is actually gone. In that time frame ethtool operations
could still be performed, which might end up in unwanted or undefined
behaviours[1].

Do not allow ethtool operations after a net device starts its
unregistration. This patch targets the netlink part as the ioctl one
isn't affected: the reference to the net device is taken and the
operation is executed within an rtnl lock section and the net device
won't be found after unregister.

[1] For example adding Tx queues after unregister ends up in NULL
    pointer exceptions and UaFs, such as:

      BUG: KASAN: use-after-free in kobject_get+0x14/0x90
      Read of size 1 at addr ffff88801961248c by task ethtool/755

      CPU: 0 PID: 755 Comm: ethtool Not tainted 5.15.0-rc6+ #778
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-4.fc34 04/014
      Call Trace:
       dump_stack_lvl+0x57/0x72
       print_address_description.constprop.0+0x1f/0x140
       kasan_report.cold+0x7f/0x11b
       kobject_get+0x14/0x90
       kobject_add_internal+0x3d1/0x450
       kobject_init_and_add+0xba/0xf0
       netdev_queue_update_kobjects+0xcf/0x200
       netif_set_real_num_tx_queues+0xb4/0x310
       veth_set_channels+0x1c3/0x550
       ethnl_set_channels+0x524/0x610

Fixes: 041b1c5d4a ("ethtool: helper functions for netlink interface")
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Link: https://lore.kernel.org/r/20211203101318.435618-1-atenart@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-06 16:53:32 -08:00
Norbert Zulinski 23ec111bf3 i40e: Fix NULL pointer dereference in i40e_dbg_dump_desc
When trying to dump VFs VSI RX/TX descriptors
using debugfs there was a crash
due to NULL pointer dereference in i40e_dbg_dump_desc.
Added a check to i40e_dbg_dump_desc that checks if
VSI type is correct for dumping RX/TX descriptors.

Fixes: 02e9c29081 ("i40e: debugfs interface")
Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
Signed-off-by: Norbert Zulinski <norbertx.zulinski@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-06 09:44:56 -08:00
Mateusz Palczewski 8aa55ab422 i40e: Fix pre-set max number of queues for VF
After setting pre-set combined to 16 queues and reserving 16 queues by
tc qdisc, pre-set maximum combined queues returned to default value
after VF reset being 4 and this generated errors during removing tc.
Fixed by removing clear num_req_queues before reset VF.

Fixes: e284fc2804 (i40e: Add and delete cloud filter)
Signed-off-by: Grzegorz Szczurek <grzegorzx.szczurek@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Bindushree P <Bindushree.p@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-06 09:44:56 -08:00
Karen Sornek 61125b8be8 i40e: Fix failed opcode appearing if handling messages from VF
Fix failed operation code appearing if handling messages from VF.
Implemented by waiting for VF appropriate state if request starts
handle while VF reset.
Without this patch the message handling request while VF is in
a reset state ends with error -5 (I40E_ERR_PARAM).

Fixes: 5c3c48ac6b ("i40e: implement virtual device interface")
Signed-off-by: Grzegorz Szczurek <grzegorzx.szczurek@intel.com>
Signed-off-by: Karen Sornek <karen.sornek@intel.com>
Tested-by: Tony Brelinski <tony.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-06 09:44:56 -08:00
Michal Maloszewski 1a1aa356dd iavf: Fix reporting when setting descriptor count
iavf_set_ringparams doesn't communicate to the user that

1. The user requested descriptor count is out of range. Instead it
   just quietly sets descriptors to the "clamped" value and calls it
   done. This makes it look an invalid value was successfully set as
   the descriptor count when this isn't actually true.

2. The user provided descriptor count needs to be inflated for alignment
   reasons.

This behavior is confusing. The ice driver has already addressed this
by rejecting invalid values for descriptor count and
messaging for alignment adjustments.
Do the same thing here by adding the error and info messages.

Fixes: fbb7ddfef2 ("i40evf: core ethtool functionality")
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Signed-off-by: Michal Maloszewski <michal.maloszewski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-06 08:27:50 -08:00
Lee Jones 2be6d4d16a net: cdc_ncm: Allow for dwNtbOutMaxSize to be unset or zero
Currently, due to the sequential use of min_t() and clamp_t() macros,
in cdc_ncm_check_tx_max(), if dwNtbOutMaxSize is not set, the logic
sets tx_max to 0.  This is then used to allocate the data area of the
SKB requested later in cdc_ncm_fill_tx_frame().

This does not cause an issue presently because when memory is
allocated during initialisation phase of SKB creation, more memory
(512b) is allocated than is required for the SKB headers alone (320b),
leaving some space (512b - 320b = 192b) for CDC data (172b).

However, if more elements (for example 3 x u64 = [24b]) were added to
one of the SKB header structs, say 'struct skb_shared_info',
increasing its original size (320b [320b aligned]) to something larger
(344b [384b aligned]), then suddenly the CDC data (172b) no longer
fits in the spare SKB data area (512b - 384b = 128b).

Consequently the SKB bounds checking semantics fails and panics:

  skbuff: skb_over_panic: text:ffffffff830a5b5f len:184 put:172   \
     head:ffff888119227c00 data:ffff888119227c00 tail:0xb8 end:0x80 dev:<NULL>

  ------------[ cut here ]------------
  kernel BUG at net/core/skbuff.c:110!
  RIP: 0010:skb_panic+0x14f/0x160 net/core/skbuff.c:106
  <snip>
  Call Trace:
   <IRQ>
   skb_over_panic+0x2c/0x30 net/core/skbuff.c:115
   skb_put+0x205/0x210 net/core/skbuff.c:1877
   skb_put_zero include/linux/skbuff.h:2270 [inline]
   cdc_ncm_ndp16 drivers/net/usb/cdc_ncm.c:1116 [inline]
   cdc_ncm_fill_tx_frame+0x127f/0x3d50 drivers/net/usb/cdc_ncm.c:1293
   cdc_ncm_tx_fixup+0x98/0xf0 drivers/net/usb/cdc_ncm.c:1514

By overriding the max value with the default CDC_NCM_NTB_MAX_SIZE_TX
when not offered through the system provided params, we ensure enough
data space is allocated to handle the CDC data, meaning no crash will
occur.

Cc: Oliver Neukum <oliver@neukum.org>
Fixes: 289507d336 ("net: cdc_ncm: use sysfs for rx/tx aggregation tuning")
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Reviewed-by: Bjørn Mork <bjorn@mork.no>
Link: https://lore.kernel.org/r/20211202143437.1411410-1-lee.jones@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-03 16:56:02 -08:00
Manish Chopra 8e227b198a qede: validate non LSO skb length
Although it is unlikely that stack could transmit a non LSO
skb with length > MTU, however in some cases or environment such
occurrences actually resulted into firmware asserts due to packet
length being greater than the max supported by the device (~9700B).

This patch adds the safeguard for such odd cases to avoid firmware
asserts.

v2: Added "Fixes" tag with one of the initial driver commit
    which enabled the TX traffic actually (as this was probably
    day1 issue which was discovered recently by some customer
    environment)

Fixes: a2ec6172d2 ("qede: Add support for link")
Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: Alok Prasad <palok@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Link: https://lore.kernel.org/r/20211203174413.13090-1-manishc@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-03 16:53:37 -08:00
Maxim Mikityanskiy 2fa7d94afc bpf: Fix the off-by-two error in range markings
The first commit cited below attempts to fix the off-by-one error that
appeared in some comparisons with an open range. Due to this error,
arithmetically equivalent pieces of code could get different verdicts
from the verifier, for example (pseudocode):

  // 1. Passes the verifier:
  if (data + 8 > data_end)
      return early
  read *(u64 *)data, i.e. [data; data+7]

  // 2. Rejected by the verifier (should still pass):
  if (data + 7 >= data_end)
      return early
  read *(u64 *)data, i.e. [data; data+7]

The attempted fix, however, shifts the range by one in a wrong
direction, so the bug not only remains, but also such piece of code
starts failing in the verifier:

  // 3. Rejected by the verifier, but the check is stricter than in #1.
  if (data + 8 >= data_end)
      return early
  read *(u64 *)data, i.e. [data; data+7]

The change performed by that fix converted an off-by-one bug into
off-by-two. The second commit cited below added the BPF selftests
written to ensure than code chunks like #3 are rejected, however,
they should be accepted.

This commit fixes the off-by-two error by adjusting new_range in the
right direction and fixes the tests by changing the range into the
one that should actually fail.

Fixes: fb2a311a31 ("bpf: fix off by one for range markings with L{T, E} patterns")
Fixes: b37242c773 ("bpf: add test cases to bpf selftests to cover all access tests")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20211130181607.593149-1-maximmi@nvidia.com
2021-12-03 21:44:42 +01:00
Jakub Kicinski 8581fd402a treewide: Add missing includes masked by cgroup -> bpf dependency
cgroup.h (therefore swap.h, therefore half of the universe)
includes bpf.h which in turn includes module.h and slab.h.
Since we're about to get rid of that dependency we need
to clean things up.

v2: drop the cpu.h include from cacheinfo.h, it's not necessary
and it makes riscv sensitive to ordering of include files.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Krzysztof Wilczyński <kw@linux.com>
Acked-by: Peter Chen <peter.chen@kernel.org>
Acked-by: SeongJae Park <sj@kernel.org>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/all/20211120035253.72074-1-kuba@kernel.org/  # v1
Link: https://lore.kernel.org/all/20211120165528.197359-1-kuba@kernel.org/ # cacheinfo discussion
Link: https://lore.kernel.org/bpf/20211202203400.1208663-1-kuba@kernel.org
2021-12-03 10:58:13 -08:00
Dan Carpenter badd7857f5 net: altera: set a couple error code in probe()
There are two error paths which accidentally return success instead of
a negative error code.

Fixes: bbd2190ce9 ("Altera TSE: Add main and header file for Altera Ethernet Driver")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-03 14:23:11 +00:00
Jiasheng Jiang 128f6ec95a net: bcm4908: Handle dma_set_coherent_mask error codes
The return value of dma_set_coherent_mask() is not always 0.
To catch the exception in case that dma is not support the mask.

Fixes: 9d61d138ab ("net: broadcom: rename BCM4908 driver & update DT binding")
Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-03 14:18:53 +00:00
Li Zhijian 0f8a3b48f9 selftests: net/fcnal-test.sh: add exit code
Previously, the selftest framework always treats it as *ok* even though
some of them are failed actually. That's because the script always
returns 0.

It supports PASS/FAIL/SKIP exit code now.

CC: Philip Li <philip.li@intel.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Li Zhijian <zhijianx.li@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-03 14:17:34 +00:00
Eric Dumazet dac8e00fb6 bonding: make tx_rebalance_counter an atomic
KCSAN reported a data-race [1] around tx_rebalance_counter
which can be accessed from different contexts, without
the protection of a lock/mutex.

[1]
BUG: KCSAN: data-race in bond_alb_init_slave / bond_alb_monitor

write to 0xffff888157e8ca24 of 4 bytes by task 7075 on cpu 0:
 bond_alb_init_slave+0x713/0x860 drivers/net/bonding/bond_alb.c:1613
 bond_enslave+0xd94/0x3010 drivers/net/bonding/bond_main.c:1949
 do_set_master net/core/rtnetlink.c:2521 [inline]
 __rtnl_newlink net/core/rtnetlink.c:3475 [inline]
 rtnl_newlink+0x1298/0x13b0 net/core/rtnetlink.c:3506
 rtnetlink_rcv_msg+0x745/0x7e0 net/core/rtnetlink.c:5571
 netlink_rcv_skb+0x14e/0x250 net/netlink/af_netlink.c:2491
 rtnetlink_rcv+0x18/0x20 net/core/rtnetlink.c:5589
 netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
 netlink_unicast+0x5fc/0x6c0 net/netlink/af_netlink.c:1345
 netlink_sendmsg+0x6e1/0x7d0 net/netlink/af_netlink.c:1916
 sock_sendmsg_nosec net/socket.c:704 [inline]
 sock_sendmsg net/socket.c:724 [inline]
 ____sys_sendmsg+0x39a/0x510 net/socket.c:2409
 ___sys_sendmsg net/socket.c:2463 [inline]
 __sys_sendmsg+0x195/0x230 net/socket.c:2492
 __do_sys_sendmsg net/socket.c:2501 [inline]
 __se_sys_sendmsg net/socket.c:2499 [inline]
 __x64_sys_sendmsg+0x42/0x50 net/socket.c:2499
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

read to 0xffff888157e8ca24 of 4 bytes by task 1082 on cpu 1:
 bond_alb_monitor+0x8f/0xc00 drivers/net/bonding/bond_alb.c:1511
 process_one_work+0x3fc/0x980 kernel/workqueue.c:2298
 worker_thread+0x616/0xa70 kernel/workqueue.c:2445
 kthread+0x2c7/0x2e0 kernel/kthread.c:327
 ret_from_fork+0x1f/0x30

value changed: 0x00000001 -> 0x00000064

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 1082 Comm: kworker/u4:3 Not tainted 5.16.0-rc3-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: bond1 bond_alb_monitor

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-03 14:16:48 +00:00
Eric Dumazet 03cfda4fa6 tcp: fix another uninit-value (sk_rx_queue_mapping)
KMSAN is still not happy [1].

I missed that passive connections do not inherit their
sk_rx_queue_mapping values from the request socket,
but instead tcp_child_process() is calling
sk_mark_napi_id(child, skb)

We have many sk_mark_napi_id() callers, so I am providing
a new helper, forcing the setting sk_rx_queue_mapping
and sk_napi_id.

Note that we had no KMSAN report for sk_napi_id because
passive connections got a copy of this field from the listener.
sk_rx_queue_mapping in the other hand is inside the
sk_dontcopy_begin/sk_dontcopy_end so sk_clone_lock()
leaves this field uninitialized.

We might remove dead code populating req->sk_rx_queue_mapping
in the future.

[1]

BUG: KMSAN: uninit-value in __sk_rx_queue_set include/net/sock.h:1924 [inline]
BUG: KMSAN: uninit-value in sk_rx_queue_update include/net/sock.h:1938 [inline]
BUG: KMSAN: uninit-value in sk_mark_napi_id include/net/busy_poll.h:136 [inline]
BUG: KMSAN: uninit-value in tcp_child_process+0xb42/0x1050 net/ipv4/tcp_minisocks.c:833
 __sk_rx_queue_set include/net/sock.h:1924 [inline]
 sk_rx_queue_update include/net/sock.h:1938 [inline]
 sk_mark_napi_id include/net/busy_poll.h:136 [inline]
 tcp_child_process+0xb42/0x1050 net/ipv4/tcp_minisocks.c:833
 tcp_v4_rcv+0x3d83/0x4ed0 net/ipv4/tcp_ipv4.c:2066
 ip_protocol_deliver_rcu+0x760/0x10b0 net/ipv4/ip_input.c:204
 ip_local_deliver_finish net/ipv4/ip_input.c:231 [inline]
 NF_HOOK include/linux/netfilter.h:307 [inline]
 ip_local_deliver+0x584/0x8c0 net/ipv4/ip_input.c:252
 dst_input include/net/dst.h:460 [inline]
 ip_sublist_rcv_finish net/ipv4/ip_input.c:551 [inline]
 ip_list_rcv_finish net/ipv4/ip_input.c:601 [inline]
 ip_sublist_rcv+0x11fd/0x1520 net/ipv4/ip_input.c:609
 ip_list_rcv+0x95f/0x9a0 net/ipv4/ip_input.c:644
 __netif_receive_skb_list_ptype net/core/dev.c:5505 [inline]
 __netif_receive_skb_list_core+0xe34/0x1240 net/core/dev.c:5553
 __netif_receive_skb_list+0x7fc/0x960 net/core/dev.c:5605
 netif_receive_skb_list_internal+0x868/0xde0 net/core/dev.c:5696
 gro_normal_list net/core/dev.c:5850 [inline]
 napi_complete_done+0x579/0xdd0 net/core/dev.c:6587
 virtqueue_napi_complete drivers/net/virtio_net.c:339 [inline]
 virtnet_poll+0x17b6/0x2350 drivers/net/virtio_net.c:1557
 __napi_poll+0x14e/0xbc0 net/core/dev.c:7020
 napi_poll net/core/dev.c:7087 [inline]
 net_rx_action+0x824/0x1880 net/core/dev.c:7174
 __do_softirq+0x1fe/0x7eb kernel/softirq.c:558
 run_ksoftirqd+0x33/0x50 kernel/softirq.c:920
 smpboot_thread_fn+0x616/0xbf0 kernel/smpboot.c:164
 kthread+0x721/0x850 kernel/kthread.c:327
 ret_from_fork+0x1f/0x30

Uninit was created at:
 __alloc_pages+0xbc7/0x10a0 mm/page_alloc.c:5409
 alloc_pages+0x8a5/0xb80
 alloc_slab_page mm/slub.c:1810 [inline]
 allocate_slab+0x287/0x1c20 mm/slub.c:1947
 new_slab mm/slub.c:2010 [inline]
 ___slab_alloc+0xbdf/0x1e90 mm/slub.c:3039
 __slab_alloc mm/slub.c:3126 [inline]
 slab_alloc_node mm/slub.c:3217 [inline]
 slab_alloc mm/slub.c:3259 [inline]
 kmem_cache_alloc+0xbb3/0x11c0 mm/slub.c:3264
 sk_prot_alloc+0xeb/0x570 net/core/sock.c:1914
 sk_clone_lock+0xd6/0x1940 net/core/sock.c:2118
 inet_csk_clone_lock+0x8d/0x6a0 net/ipv4/inet_connection_sock.c:956
 tcp_create_openreq_child+0xb1/0x1ef0 net/ipv4/tcp_minisocks.c:453
 tcp_v4_syn_recv_sock+0x268/0x2710 net/ipv4/tcp_ipv4.c:1563
 tcp_check_req+0x207c/0x2a30 net/ipv4/tcp_minisocks.c:765
 tcp_v4_rcv+0x36f5/0x4ed0 net/ipv4/tcp_ipv4.c:2047
 ip_protocol_deliver_rcu+0x760/0x10b0 net/ipv4/ip_input.c:204
 ip_local_deliver_finish net/ipv4/ip_input.c:231 [inline]
 NF_HOOK include/linux/netfilter.h:307 [inline]
 ip_local_deliver+0x584/0x8c0 net/ipv4/ip_input.c:252
 dst_input include/net/dst.h:460 [inline]
 ip_sublist_rcv_finish net/ipv4/ip_input.c:551 [inline]
 ip_list_rcv_finish net/ipv4/ip_input.c:601 [inline]
 ip_sublist_rcv+0x11fd/0x1520 net/ipv4/ip_input.c:609
 ip_list_rcv+0x95f/0x9a0 net/ipv4/ip_input.c:644
 __netif_receive_skb_list_ptype net/core/dev.c:5505 [inline]
 __netif_receive_skb_list_core+0xe34/0x1240 net/core/dev.c:5553
 __netif_receive_skb_list+0x7fc/0x960 net/core/dev.c:5605
 netif_receive_skb_list_internal+0x868/0xde0 net/core/dev.c:5696
 gro_normal_list net/core/dev.c:5850 [inline]
 napi_complete_done+0x579/0xdd0 net/core/dev.c:6587
 virtqueue_napi_complete drivers/net/virtio_net.c:339 [inline]
 virtnet_poll+0x17b6/0x2350 drivers/net/virtio_net.c:1557
 __napi_poll+0x14e/0xbc0 net/core/dev.c:7020
 napi_poll net/core/dev.c:7087 [inline]
 net_rx_action+0x824/0x1880 net/core/dev.c:7174
 __do_softirq+0x1fe/0x7eb kernel/softirq.c:558

Fixes: 342159ee39 ("net: avoid dirtying sk->sk_rx_queue_mapping")
Fixes: a37a0ee4d2 ("net: avoid uninit-value from tcp_conn_request")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Tested-by: Alexander Potapenko <glider@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-03 14:15:49 +00:00
Eric Dumazet a941892455 inet: use #ifdef CONFIG_SOCK_RX_QUEUE_MAPPING consistently
Since commit 4e1beecc3b ("net/sock: Add kernel config
SOCK_RX_QUEUE_MAPPING"),
sk_rx_queue_mapping access is guarded by CONFIG_SOCK_RX_QUEUE_MAPPING.

Fixes: 54b92e8419 ("tcp: Migrate TCP_ESTABLISHED/TCP_SYN_RECV sockets in accept queues.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Tariq Toukan <tariqt@nvidia.com>
Acked-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-03 14:15:07 +00:00
Li Zhijian db925bca33 selftests/tc-testing: Fix cannot create /sys/bus/netdevsim/new_device: Directory nonexistent
Install netdevsim to provide /sys/bus/netdevsim/new_device interface.

It helps to fix:
 # ok 97 9a7d - Change ETS strict band without quantum # skipped - skipped - previous setup failed 11 ce7d
 #
 #
 # -----> prepare stage *** Could not execute: "echo "1 1 4" > /sys/bus/netdevsim/new_device"
 #
 # -----> prepare stage *** Error message: "/bin/sh: 1: cannot create /sys/bus/netdevsim/new_device: Directory nonexistent
 # "
 #
 # -----> prepare stage *** Aborting test run.
 #
 #
 # <_io.BufferedReader name=5> *** stdout ***
 #

Signed-off-by: Li Zhijian <zhijianx.li@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-03 11:46:41 +00:00
Li Zhijian a8c9505c53 selftests/tc-testing: add missing config
qdiscs/fq_pie requires CONFIG_NET_SCH_FQ_PIE, otherwise tc will fail
to create a fq_pie qdisc.

It fixes following issue:
 # not ok 57 83be - Create FQ-PIE with invalid number of flows
 #       Command exited with 2, expected 0
 # Error: Specified qdisc not found.

Signed-off-by: Li Zhijian <zhijianx.li@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-03 11:46:41 +00:00
Li Zhijian 96f3896780 selftests/tc-testing: add exit code
Mark the summary result as FAIL to prevent from confusing the selftest
framework if some of them are failed.

Previously, the selftest framework always treats it as *ok* even though
some of them are failed actually. That's because the script tdc.sh always
return 0.

 # All test results:
 #
 # 1..97
 # ok 1 83be - Create FQ-PIE with invalid number of flows
 # ok 2 8b6e - Create RED with no flags
[...snip]
 # ok 6 5f15 - Create RED with flags ECN, harddrop
 # ok 7 53e8 - Create RED with flags ECN, nodrop
 # ok 8 d091 - Fail to create RED with only nodrop flag
 # ok 9 af8e - Create RED with flags ECN, nodrop, harddrop
 # not ok 10 ce7d - Add mq Qdisc to multi-queue device (4 queues)
 #       Could not match regex pattern. Verify command output:
 # qdisc mq 1: root
 # qdisc fq_codel 0: parent 1:4 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
 # qdisc fq_codel 0: parent 1:3 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
[...snip]
 # ok 96 6979 - Change quantum of a strict ETS band
 # ok 97 9a7d - Change ETS strict band without quantum
 #
 #
 #
 #
 ok 1 selftests: tc-testing: tdc.sh <<< summary result

CC: Philip Li <philip.li@intel.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Li Zhijian <zhijianx.li@intel.com>
Acked-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-03 11:46:41 +00:00
Peilin Ye f6071e5e39 selftests/fib_tests: Rework fib_rp_filter_test()
Currently rp_filter tests in fib_tests.sh:fib_rp_filter_test() are
failing.  ping sockets are bound to dummy1 using the "-I" option
(SO_BINDTODEVICE), but socket lookup is failing when receiving ping
replies, since the routing table thinks they belong to dummy0.

For example, suppose ping is using a SOCK_RAW socket for ICMP messages.
When receiving ping replies, in __raw_v4_lookup(), sk->sk_bound_dev_if
is 3 (dummy1), but dif (skb_rtable(skb)->rt_iif) says 2 (dummy0), so the
raw_sk_bound_dev_eq() check fails.  Similar things happen in
ping_lookup() for SOCK_DGRAM sockets.

These tests used to pass due to a bug [1] in iputils, where "ping -I"
actually did not bind ICMP message sockets to device.  The bug has been
fixed by iputils commit f455fee41c07 ("ping: also bind the ICMP socket
to the specific device") in 2016, which is why our rp_filter tests
started to fail.  See [2] .

Fixing the tests while keeping everything in one netns turns out to be
nontrivial.  Rework the tests and build the following topology:

 ┌─────────────────────────────┐    ┌─────────────────────────────┐
 │  network namespace 1 (ns1)  │    │  network namespace 2 (ns2)  │
 │                             │    │                             │
 │  ┌────┐     ┌─────┐         │    │  ┌─────┐            ┌────┐  │
 │  │ lo │<───>│veth1│<────────┼────┼─>│veth2│<──────────>│ lo │  │
 │  └────┘     ├─────┴──────┐  │    │  ├─────┴──────┐     └────┘  │
 │             │192.0.2.1/24│  │    │  │192.0.2.1/24│             │
 │             └────────────┘  │    │  └────────────┘             │
 └─────────────────────────────┘    └─────────────────────────────┘

Consider sending an ICMP_ECHO packet A in ns2.  Both source and
destination IP addresses are 192.0.2.1, and we use strict mode rp_filter
in both ns1 and ns2:

  1. A is routed to lo since its destination IP address is one of ns2's
     local addresses (veth2);
  2. A is redirected from lo's egress to veth2's egress using mirred;
  3. A arrives at veth1's ingress in ns1;
  4. A is redirected from veth1's ingress to lo's ingress, again, using
     mirred;
  5. In __fib_validate_source(), fib_info_nh_uses_dev() returns false,
     since A was received on lo, but reverse path lookup says veth1;
  6. However A is not dropped since we have relaxed this check for lo in
     commit 66f8209547 ("fib: relax source validation check for loopback
     packets");

Making sure A is not dropped here in this corner case is the whole point
of having this test.

  7. As A reaches the ICMP layer, an ICMP_ECHOREPLY packet, B, is
     generated;
  8. Similarly, B is redirected from lo's egress to veth1's egress (in
     ns1), then redirected once again from veth2's ingress to lo's
     ingress (in ns2), using mirred.

Also test "ping 127.0.0.1" from ns2.  It does not trigger the relaxed
check in __fib_validate_source(), but just to make sure the topology
works with loopback addresses.

Tested with ping from iputils 20210722-41-gf9fb573:

$ ./fib_tests.sh -t rp_filter

IPv4 rp_filter tests
    TEST: rp_filter passes local packets		[ OK ]
    TEST: rp_filter passes loopback packets		[ OK ]

[1] https://github.com/iputils/iputils/issues/55
[2] f455fee41c

Reported-by: Hangbin Liu <liuhangbin@gmail.com>
Fixes: adb701d6cf ("selftests: add a test case for rp_filter")
Reviewed-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Acked-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20211201004720.6357-1-yepeilin.cs@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-02 17:59:34 -08:00
Andrii Nakryiko 7d9b3ad424 Merge branch 'Fixes for kfunc-mod regressions and warnings'
Kumar Kartikeya says:

====================

This set includes fixes for two regressions and one build warning introduced by
the kfunc for modules series.

Changelog:
----------

v1 -> v2:
v1: https://lore.kernel.org/bpf/20211115191840.496263-1-memxor@gmail.com

 * Instead of demoting resolve_btfids warning to debug, only skip in case of
   set->cnt == 0.
====================

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2021-12-02 13:39:47 -08:00
Kumar Kartikeya Dwivedi 3345193f6f tools/resolve_btfids: Skip unresolved symbol warning for empty BTF sets
resolve_btfids prints a warning when it finds an unresolved symbol,
(id == 0) in id_patch. This can be the case for BTF sets that are empty
(due to disabled config options), hence printing warnings for certain
builds, most recently seen in [0].

The reason behind this is because id->cnt aliases id->id in btf_id
struct, leading to empty set showing up as ID 0 when we get to id_patch,
which triggers the warning. Since sets are an exception here, accomodate
by reusing hole in btf_id for bool is_set member, setting it to true for
BTF set when setting id->cnt, and use that to skip extraneous warning.

  [0]: https://lore.kernel.org/all/1b99ae14-abb4-d18f-cc6a-d7e523b25542@gmail.com

Before:

; ./tools/bpf/resolve_btfids/resolve_btfids -v -b vmlinux net/ipv4/tcp_cubic.ko
adding symbol tcp_cubic_kfunc_ids
WARN: resolve_btfids: unresolved symbol tcp_cubic_kfunc_ids
patching addr     0: ID       0 [tcp_cubic_kfunc_ids]
sorting  addr     4: cnt      0 [tcp_cubic_kfunc_ids]
update ok for net/ipv4/tcp_cubic.ko

After:

; ./tools/bpf/resolve_btfids/resolve_btfids -v -b vmlinux net/ipv4/tcp_cubic.ko
adding symbol tcp_cubic_kfunc_ids
patching addr     0: ID       0 [tcp_cubic_kfunc_ids]
sorting  addr     4: cnt      0 [tcp_cubic_kfunc_ids]
update ok for net/ipv4/tcp_cubic.ko

Fixes: 0e32dfc80b ("bpf: Enable TCP congestion control kfunc from modules")
Reported-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20211122144742.477787-4-memxor@gmail.com
2021-12-02 13:39:46 -08:00
Kumar Kartikeya Dwivedi b12f031043 bpf: Fix bpf_check_mod_kfunc_call for built-in modules
When module registering its set is built-in, THIS_MODULE will be NULL,
hence we cannot return early in case owner is NULL.

Fixes: 14f267d95f ("bpf: btf: Introduce helpers for dynamic BTF set registration")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20211122144742.477787-3-memxor@gmail.com
2021-12-02 13:39:46 -08:00
Kumar Kartikeya Dwivedi d9847eb8be bpf: Make CONFIG_DEBUG_INFO_BTF depend upon CONFIG_BPF_SYSCALL
Vinicius Costa Gomes reported [0] that build fails when
CONFIG_DEBUG_INFO_BTF is enabled and CONFIG_BPF_SYSCALL is disabled.
This leads to btf.c not being compiled, and then no symbol being present
in vmlinux for the declarations in btf.h. Since BTF is not useful
without enabling BPF subsystem, disallow this combination.

However, theoretically disabling both now could still fail, as the
symbol for kfunc_btf_id_list variables is not available. This isn't a
problem as the compiler usually optimizes the whole register/unregister
call, but at lower optimization levels it can fail the build in linking
stage.

Fix that by adding dummy variables so that modules taking address of
them still work, but the whole thing is a noop.

  [0]: https://lore.kernel.org/bpf/20211110205418.332403-1-vinicius.gomes@intel.com

Fixes: 14f267d95f ("bpf: btf: Introduce helpers for dynamic BTF set registration")
Reported-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20211122144742.477787-2-memxor@gmail.com
2021-12-02 13:39:46 -08:00
Linus Torvalds a51e3ac43d Networking fixes for 5.16-rc4, including fixes from wireless,
and wireguard.
 
 Current release - regressions:
 
  - smc: keep smc_close_final()'s error code during active close
 
 Current release - new code bugs:
 
  - iwlwifi: various static checker fixes (int overflow, leaks, missing
    error codes)
 
  - rtw89: fix size of firmware header before transfer, avoid crash
 
  - mt76: fix timestamp check in tx_status; fix pktid leak;
 
  - mscc: ocelot: fix missing unlock on error in ocelot_hwstamp_set()
 
 Previous releases - regressions:
 
  - smc: fix list corruption in smc_lgr_cleanup_early
 
  - ipv4: convert fib_num_tclassid_users to atomic_t
 
 Previous releases - always broken:
 
  - tls: fix authentication failure in CCM mode
 
  - vrf: reset IPCB/IP6CB when processing outbound pkts, prevent
    incorrect processing
 
  - dsa: mv88e6xxx: fixes for various device errata
 
  - rds: correct socket tunable error in rds_tcp_tune()
 
  - ipv6: fix memory leak in fib6_rule_suppress
 
  - wireguard: reset peer src endpoint when netns exits
 
  - wireguard: improve resilience to DoS around incoming handshakes
 
  - tcp: fix page frag corruption on page fault which involves TCP
 
  - mpls: fix missing attributes in delete notifications
 
  - mt7915: fix NULL pointer dereference with ad-hoc mode
 
 Misc:
 
  - rt2x00: be more lenient about EPROTO errors during start
 
  - mlx4_en: update reported link modes for 1/10G
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmGo6KMACgkQMUZtbf5S
 Iruq+BAAhRMTcL+X4eRIL9lIEvWEKHMKLCA/pUaQWNlSxsbEeydJWRNSc37Cs3pv
 z0rYIEhfieOz8+QXS1Kq+yZwJVXjA8Jvgld2qw9V9Y5w+N15Mj8RUtG8NaUw+o4E
 U8PCAbaamnbzyPdlCYcVHschd8MD0BCXm5+jAGeIyCP+KQCnhEpFZv+bvHaWzQR8
 FZLYrhXTR9W0DFsrKG9+haqFwFBR3+VDqTGILhaHPE+r2o6wKQQ5yJMhd8fq0SaC
 nne8zDkGuFEeW3cxj0VbhdRMyrV97eMK+P4dZ2P0Z7xcrsed9/2XJkNQNJGtuRnj
 GGJV6utupJRAY+lnJNUkifqS4Wt7KirfZsSsyaKKa4plyoVgtGhiqEYFTQVLagC0
 CF4Qe+3qks6rESbRu6PEFN4oWSkMEhRzdcDpg7vBDURUKcrRs9fgtNUJUCi8nKFA
 A/F/K+7IHBoBZyQYZbYmnGdNsNauKbF3rUY3hwMGBfQZIr/wsql9+jhtLsmZX77m
 V/L7KzT2jhhNc5gDzuLps25K3P7snKuV19qQSsY2LeuGj1x3gmWZ+ibN6ynhB+Gt
 KBnfHDMTI/4aciZBIbwJmwfeRhCF8tOfw0WZdUP7FRIXukbfVuDBoznWLz4BKKgf
 GSYSTNDs/PHZQo5vCQ/onvTwUK5aN6zoPNy5ih7lp9YZBYtN2TI=
 =r0Jh
 -----END PGP SIGNATURE-----

Merge tag 'net-5.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from wireless, and wireguard.

  Mostly scattered driver changes this week, with one big clump in
  mv88e6xxx. Nothing of note, really.

  Current release - regressions:

   - smc: keep smc_close_final()'s error code during active close

  Current release - new code bugs:

   - iwlwifi: various static checker fixes (int overflow, leaks, missing
     error codes)

   - rtw89: fix size of firmware header before transfer, avoid crash

   - mt76: fix timestamp check in tx_status; fix pktid leak;

   - mscc: ocelot: fix missing unlock on error in ocelot_hwstamp_set()

  Previous releases - regressions:

   - smc: fix list corruption in smc_lgr_cleanup_early

   - ipv4: convert fib_num_tclassid_users to atomic_t

  Previous releases - always broken:

   - tls: fix authentication failure in CCM mode

   - vrf: reset IPCB/IP6CB when processing outbound pkts, prevent
     incorrect processing

   - dsa: mv88e6xxx: fixes for various device errata

   - rds: correct socket tunable error in rds_tcp_tune()

   - ipv6: fix memory leak in fib6_rule_suppress

   - wireguard: reset peer src endpoint when netns exits

   - wireguard: improve resilience to DoS around incoming handshakes

   - tcp: fix page frag corruption on page fault which involves TCP

   - mpls: fix missing attributes in delete notifications

   - mt7915: fix NULL pointer dereference with ad-hoc mode

  Misc:

   - rt2x00: be more lenient about EPROTO errors during start

   - mlx4_en: update reported link modes for 1/10G"

* tag 'net-5.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (85 commits)
  net: dsa: b53: Add SPI ID table
  gro: Fix inconsistent indenting
  selftests: net: Correct case name
  net/rds: correct socket tunable error in rds_tcp_tune()
  mctp: Don't let RTM_DELROUTE delete local routes
  net/smc: Keep smc_close_final rc during active close
  ibmvnic: drop bad optimization in reuse_tx_pools()
  ibmvnic: drop bad optimization in reuse_rx_pools()
  net/smc: fix wrong list_del in smc_lgr_cleanup_early
  Fix Comment of ETH_P_802_3_MIN
  ethernet: aquantia: Try MAC address from device tree
  ipv4: convert fib_num_tclassid_users to atomic_t
  net: avoid uninit-value from tcp_conn_request
  net: annotate data-races on txq->xmit_lock_owner
  octeontx2-af: Fix a memleak bug in rvu_mbox_init()
  net/mlx4_en: Fix an use-after-free bug in mlx4_en_try_alloc_resources()
  vrf: Reset IPCB/IP6CB when processing outbound pkts in vrf dev xmit
  net: qlogic: qlcnic: Fix a NULL pointer dereference in qlcnic_83xx_add_rings()
  net: dsa: mv88e6xxx: Link in pcs_get_state() if AN is bypassed
  net: dsa: mv88e6xxx: Fix inband AN for 2500base-x on 88E6393X family
  ...
2021-12-02 11:22:06 -08:00
Linus Torvalds 2b2c0f24ba Three tracing fixes:
- Allow compares of strings when using signed and unsigned characters
 
 - Fix kmemleak false positive for histogram entries.
 
 - Handle negative numbers for user defined kretprobe data sizes
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYagqyxQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qlCuAP45eI+DN2P+HxlnuWq7bLt/HYcOucit
 nALTZ4OIux8kqgEAss4wSTcUIefOQHi3PiMXgJmyXheTbeBUk/ecInDoZAQ=
 =P6C/
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
 "Three tracing fixes:

   - Allow compares of strings when using signed and unsigned characters

   - Fix kmemleak false positive for histogram entries

   - Handle negative numbers for user defined kretprobe data sizes"

* tag 'trace-v5.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  kprobes: Limit max data_size of the kretprobe instances
  tracing: Fix a kmemleak false positive in tracing_map
  tracing/histograms: String compares should not care about signed values
2021-12-02 11:07:41 -08:00