Commit graph

1154354 commits

Author SHA1 Message Date
Steen Hegelund 95fa74148d net: microchip: sparx5: Reset VCAP counter for new rules
When a rule counter is external to the VCAP such as the Sparx5 IS2 counters
are, then this counter must be reset when a new rule is created.

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-16 13:45:16 +00:00
Steen Hegelund 6573f71ae7 net: microchip: vcap api: Erase VCAP cache before encoding rule
For consistency the VCAP cache area is erased just before the new rule is
being encoded.

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-16 13:45:16 +00:00
Heiner Kallweit ce870af395 r8169: reset bus if NIC isn't accessible after tx timeout
ASPM issues may result in the NIC not being accessible any longer.
In this case disabling ASPM may not work. Therefore detect this case
by checking whether register reads return ~0, and try to make the
NIC accessible again by resetting the secondary bus.

v2:
- add exception handling for the case that pci_reset_bus() fails

Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-16 13:38:28 +00:00
Kurt Kanzenbach 9627c981ac net: dsa: mv88e6xxx: Enable PTP receive for mv88e6390
The switch receives management traffic such as STP and LLDP. However, PTP
messages are not received, only transmitted.

Ideally, the switch would trap all PTP messages to the management CPU. This
particular switch has a PTP block which identifies PTP messages and traps them
to a dedicated port. There is a register to program this destination. This is
not used at the moment.

Therefore, program it to the same port as the MGMT traffic is trapped to. This
allows to receive PTP messages as soon as timestamping is enabled.

In addition, the datasheet mentions that this register is not valid e.g., for
6190 variants. So, add a new PTP operation which is added for the 6390 and 6290
devices.

Tested simply like this on Marvell 88E6390, revision 1:

|/ # ptp4l -2 -i lan4 --tx_timestamp_timeout=40 -m
|[...]
|ptp4l[147.450]: master offset         56 s2 freq   +1262 path delay       413
|ptp4l[148.450]: master offset         22 s2 freq   +1244 path delay       434
|ptp4l[149.450]: master offset          5 s2 freq   +1234 path delay       446
|ptp4l[150.451]: master offset          3 s2 freq   +1233 path delay       451
|ptp4l[151.451]: master offset          1 s2 freq   +1232 path delay       451
|ptp4l[152.451]: master offset         -3 s2 freq   +1229 path delay       451
|ptp4l[153.451]: master offset          9 s2 freq   +1240 path delay       451

Link: https://lore.kernel.org/r/CAFSKS=PJBpvtRJxrR4sG1hyxpnUnQpiHg4SrUNzAhkWnyt9ivg@mail.gmail.com
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-16 13:36:57 +00:00
Bobby Eshleman 71dc9ec9ac virtio/vsock: replace virtio_vsock_pkt with sk_buff
This commit changes virtio/vsock to use sk_buff instead of
virtio_vsock_pkt. Beyond better conforming to other net code, using
sk_buff allows vsock to use sk_buff-dependent features in the future
(such as sockmap) and improves throughput.

This patch introduces the following performance changes:

Tool: Uperf
Env: Phys Host + L1 Guest
Payload: 64k
Threads: 16
Test Runs: 10
Type: SOCK_STREAM
Before: commit b7bfaa761d ("Linux 6.2-rc3")

Before
------
g2h: 16.77Gb/s
h2g: 10.56Gb/s

After
-----
g2h: 21.04Gb/s
h2g: 10.76Gb/s

Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-16 13:26:33 +00:00
David S. Miller 5ef2702ab4 Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2023-01-13 (ixgbe)

This series contains updates to ixgbe driver only.

Jesse resolves warning for RCU pointer by no longer restoring old
pointer.

Sebastian adds waiting for updating of link info on devices utilizing
crosstalk fix to avoid false link state.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-16 13:21:11 +00:00
Andrew Lunn 7b3c4c370c
regmap: Rework regmap_mdio_c45_{read|write} for new C45 API.
The MDIO subsystem is getting rid of MII_ADDR_C45 and thus also
encoding associated encoding of the C45 device address and register
address into one value. regmap-mdio also uses this encoding for the
C45 bus.

Move to the new C45 helpers for MDIO access and provide regmap-mdio
helper macros.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Michael Walle <michael@walle.cc>
Link: https://lore.kernel.org/r/20230116111509.4086236-1-michael@walle.cc
Signed-off-by: Mark Brown <broonie@kernel.org>
2023-01-16 13:16:09 +00:00
Kirill Tkhai b27401a30e unix: Improve locking scheme in unix_show_fdinfo()
After switching to TCP_ESTABLISHED or TCP_LISTEN sk_state, alive SOCK_STREAM
and SOCK_SEQPACKET sockets can't change it anymore (since commit 3ff8bff704
"unix: Fix race in SOCK_SEQPACKET's unix_dgram_sendmsg()").

Thus, we do not need to take lock here.

Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-16 11:21:11 +00:00
David S. Miller 40ea3ee2ce Merge branch 'virtio-net-xdp-multi-buffer'
Heng Qi says:

====================
virtio-net: support multi buffer xdp

Changes since PATCH v4:
- Make netdev_warn() in [PATCH 2/10] independent from [PATCH 3/10].

Changes since PATCH v3:
- Separate fix patch [2/10] for MTU calculation of single buffer xdp.
  Note that this patch needs to be backported to the stable branch.

Changes since PATCH v2:
- Even if single buffer xdp has a hole mechanism, there will be no
  problem (limiting mtu and turning off GUEST GSO), so there is no
  need to backport "[PATCH 1/9]";
- Modify calculation of MTU for single buffer xdp in virtnet_xdp_set();
- Make truesize in mergeable mode return to literal meaning;
- Add some comments for legibility;

Changes since RFC:
- Using headroom instead of vi->xdp_enabled to avoid re-reading
  in add_recvbuf_mergeable();
- Disable GRO_HW and keep linearization for single buffer xdp;
- Renamed to virtnet_build_xdp_buff_mrg();
- pr_debug() to netdev_dbg();
- Adjusted the order of the patch series.

Currently, virtio net only supports xdp for single-buffer packets
or linearized multi-buffer packets. This patchset supports xdp for
multi-buffer packets, then larger MTU can be used if xdp sets the
xdp.frags. This does not affect single buffer handling.

In order to build multi-buffer xdp neatly, we integrated the code
into virtnet_build_xdp_buff_mrg() for xdp. The first buffer is used
for prepared xdp buff, and the rest of the buffers are added to
its skb_shared_info structure. This structure can also be
conveniently converted during XDP_PASS to get the corresponding skb.

Since virtio net uses comp pages, and bpf_xdp_frags_increase_tail()
is based on the assumption of the page pool,
(rxq->frag_size - skb_frag_size(frag) - skb_frag_off(frag))
is negative in most cases. So we didn't set xdp_rxq->frag_size in
virtnet_open() to disable the tail increase.

====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-16 11:15:49 +00:00
Heng Qi fab89bafa9 virtio-net: support multi-buffer xdp
Driver can pass the skb to stack by build_skb_from_xdp_buff().

Driver forwards multi-buffer packets using the send queue
when XDP_TX and XDP_REDIRECT, and clears the reference of multi
pages when XDP_DROP.

Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-16 11:15:48 +00:00
Heng Qi 18117a842a virtio-net: remove xdp related info from page_to_skb()
For the clear construction of xdp_buff, we remove the xdp processing
interleaved with page_to_skb(). Now, the logic of xdp and building
skb from xdp are separate and independent.

Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-16 11:15:48 +00:00
Heng Qi b26aa481b4 virtio-net: build skb from multi-buffer xdp
This converts the xdp_buff directly to a skb, including
multi-buffer and single buffer xdp. We'll isolate the
construction of skb based on xdp from page_to_skb().

Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-16 11:15:48 +00:00
Heng Qi 97717e8dbd virtio-net: transmit the multi-buffer xdp
This serves as the basis for XDP_TX and XDP_REDIRECT
to send a multi-buffer xdp_frame.

Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-16 11:15:48 +00:00
Heng Qi 22174f79a4 virtio-net: construct multi-buffer xdp in mergeable
Build multi-buffer xdp using virtnet_build_xdp_buff_mrg().

For the prefilled buffer before xdp is set, we will probably use
vq reset in the future. At the same time, virtio net currently
uses comp pages, and bpf_xdp_frags_increase_tail() needs to calculate
the tailroom of the last frag, which will involve the offset of the
corresponding page and cause a negative value, so we disable tail
increase by not setting xdp_rxq->frag_size.

Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-16 11:15:48 +00:00
Heng Qi ef75cb51f1 virtio-net: build xdp_buff with multi buffers
Support xdp for multi buffer packets in mergeable mode.

Putting the first buffer as the linear part for xdp_buff,
and the rest of the buffers as non-linear fragments to struct
skb_shared_info in the tailroom belonging to xdp_buff.

Let 'truesize' return to its literal meaning, that is, when
xdp is set, it includes the length of headroom and tailroom.

Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-16 11:15:48 +00:00
Heng Qi 50bd14bc98 virtio-net: update bytes calculation for xdp_frame
Update relative record value for xdp_frame as basis
for multi-buffer xdp transmission.

Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-16 11:15:48 +00:00
Heng Qi 8d9bc36de5 virtio-net: set up xdp for multi buffer packets
When the xdp program sets xdp.frags, which means it can process
multi-buffer packets over larger MTU, so we continue to support xdp.

Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-16 11:15:48 +00:00
Heng Qi e814b958ad virtio-net: fix calculation of MTU for single-buffer xdp
When single-buffer xdp is loaded, the size of the buffer filled each time
is 'sz = (PAGE_SIZE - headroom - tailroom)', which is the maximum packet
length that the driver allows the device to pass in. Otherwise, the packet
with a length greater than sz will come in, so num_buf will be greater than
or equal to 2, and xdp_linearize_page() will be performed and the packet
will be dropped because the total length is greater than PAGE_SIZE. So the
maximum value of MTU for single-buffer xdp is 'max_sz = sz - ETH_HLEN'.

Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-16 11:15:48 +00:00
Heng Qi 484beac2ff virtio-net: disable the hole mechanism for xdp
XDP core assumes that the frame_size of xdp_buff and the length of
the frag are PAGE_SIZE. The hole may cause the processing of xdp to
fail, so we disable the hole mechanism when xdp is set.

Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-16 11:15:48 +00:00
Srujana Challa 5129bd8e88 octeontx2-af: update CPT inbound inline IPsec config mailbox
Updates CPT inbound inline IPsec configure mailbox to take
CPT credit, opcode, credit_th and bpid from VF.
This patch also adds a mailbox to read inbound IPsec
configuration.

Signed-off-by: Srujana Challa <schalla@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-16 10:11:48 +00:00
Jakub Kicinski 298bfe27d1 Merge branch 'mlxbf_gige-add-bluefield-3-support'
David Thompson says:

====================
mlxbf_gige: add BlueField-3 support

This patch series adds driver logic to the "mlxbf_gige"
Ethernet driver in order to support the third generation
BlueField SoC (BF3).  The existing "mlxbf_gige" driver is
extended with BF3-specific logic and run-time decisions
are made by the driver depending on the SoC generation
(BF2 vs. BF3).

The BF3 SoC is similar to BF2 SoC with regards to transmit
and receive packet processing:
       * Driver rings usage; consumer & producer indices
       * Single queue for receive and transmit
       * DMA operation

The differences between BF3 and BF2 SoC are:
       * In addition to supporting 1Gbps interface speed, the BF3 SoC
         adds support for 10Mbps and 100Mbps interface speeds
       * BF3 requires SerDes config logic to support its SGMII interface
       * BF3 adds support for "ethtool -s" for interface speed config
       * BF3 utilizes different MDIO logic for accessing the
         board-level PHY device

Testing
  - Successful build of kernel for ARM64, ARM32, X86_64
  - Tested ARM64 build on FastModels, Palladium, SoC
====================

Link: https://lore.kernel.org/r/20230112202609.21331-1-davthompson@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-13 21:59:12 -08:00
David Thompson e1cc8ce462 mlxbf_gige: fix white space in mlxbf_gige_eth_ioctl
This patch fixes the white space issue raised by checkpatch:
CHECK: Alignment should match open parenthesis
+static int mlxbf_gige_eth_ioctl(struct net_device *netdev,
+                              struct ifreq *ifr, int cmd)

Signed-off-by: David Thompson <davthompson@nvidia.com>
Signed-off-by: Asmaa Mnebhi <asmaa@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-13 21:59:09 -08:00
David Thompson cedd97737a mlxbf_gige: add "set_link_ksettings" ethtool callback
This patch extends the "ethtool_ops" data structure to
include the "set_link_ksettings" callback. This change
enables configuration of the various interface speeds
that the BlueField-3 supports (10Mbps, 100Mbps, and 1Gbps).

Signed-off-by: David Thompson <davthompson@nvidia.com>
Signed-off-by: Asmaa Mnebhi <asmaa@nvidia.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-13 21:59:09 -08:00
David Thompson 20d03d4d94 mlxbf_gige: support 10M/100M/1G speeds on BlueField-3
The BlueField-3 OOB interface supports 10Mbps, 100Mbps, and 1Gbps speeds.
The external PHY is responsible for autonegotiating the speed with the
link partner. Once the autonegotiation is done, the BlueField PLU needs
to be configured accordingly.

This patch does two things:
1) Initialize the advertised control flow/duplex/speed in the probe
   based on the BlueField SoC generation (2 or 3)
2) Adjust the PLU speed config in the PHY interrupt handler

Signed-off-by: David Thompson <davthompson@nvidia.com>
Signed-off-by: Asmaa Mnebhi <asmaa@nvidia.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-13 21:59:09 -08:00
David Thompson 2321d69f92 mlxbf_gige: add MDIO support for BlueField-3
This patch adds initial MDIO support for the BlueField-3
SoC. Separate header files for the BlueField-2 and the
BlueField-3 SoCs have been created.  These header files
hold the SoC-specific MDIO macros since the register
offsets and bit fields have changed.  Also, in BlueField-3
there is a separate register for writing and reading the
MDIO data.  Finally, instead of having "if" statements
everywhere to differentiate between SoC-specific logic,
a mlxbf_gige_mdio_gw_t struct was created for this purpose.

Signed-off-by: David Thompson <davthompson@nvidia.com>
Signed-off-by: Asmaa Mnebhi <asmaa@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-13 21:59:09 -08:00
Russell King (Oracle) e2a9575025 net: pcs: pcs-lynx: use phylink_get_link_timer_ns() helper
Use the phylink_get_link_timer_ns() helper to get the period for the
link timer.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/E1pFyhW-0067jq-Fh@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-13 21:52:50 -08:00
Piergiorgio Beruto 28dbf774bc plca.c: fix obvious mistake in checking retval
Revert a wrong fix that was done during the review process. The
intention was to substitute "if(ret < 0)" with "if(ret)".
Unfortunately, the intended fix did not meet the code.
Besides, after additional review, it was decided that "if(ret < 0)"
was actually the right thing to do.

Fixes: 8580e16c28 ("net/ethtool: add netlink interface for the PLCA RS")
Signed-off-by: Piergiorgio Beruto <piergiorgio.beruto@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/f2277af8951a51cfee2fb905af8d7a812b7beaf4.1673616357.git.piergiorgio.beruto@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-13 21:42:53 -08:00
Jakub Kicinski da1b0b5c1b Merge branch 'net-mdio-continue-separating-c22-and-c45'
Michael Walle says:

====================
net: mdio: Continue separating C22 and C45

I've picked this older series from Andrew up and rebased it onto
the latest net-next.

This is the second patch set in the series which separates the C22
and C45 MDIO bus transactions at the API level to the MDIO bus drivers.
====================

Link: https://lore.kernel.org/r/20230112-net-next-c45-seperation-part-2-v1-0-5eeaae931526@walle.cc
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-13 21:40:56 -08:00
Andrew Lunn 80e87442e6 enetc: Separate C22 and C45 transactions
The enetc MDIO bus driver can perform both C22 and C45 transfers.
Create separate functions for each and register the C45 versions using
the new API calls where appropriate.

This driver is shared with the Felix DSA switch, so update that at the
same time.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-13 21:40:54 -08:00
Andrew Lunn 3c7826d0b1 net: stmmac: Separate C22 and C45 transactions for xgmac
The stmmac MDIO bus driver in variant gmac4 can perform both C22 and
C45 transfers. Create separate functions for each and register the
C45 versions using the new API calls where appropriate.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-13 21:40:54 -08:00
Andrew Lunn 5b0a447eff net: stmmac: Separate C22 and C45 transactions for xgmac2
The stmicro stmmac xgmac2 MDIO bus driver can perform both C22 and C45
transfers. Create separate functions for each and register the C45
versions using the new API calls where appropriate.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-13 21:40:54 -08:00
Andrew Lunn 3d90c03cb4 net: lan743x: Separate C22 and C45 transactions
The microchip lan743x MDIO bus driver can perform both C22 and C45
transfers in some variants. Create separate functions for each and
register the C45 versions using the new API calls where appropriate.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-13 21:40:54 -08:00
Andrew Lunn 900888374e net: ethernet: mtk_eth_soc: Separate C22 and C45 transactions
The mediatek bus driver can perform both C22 and C45 transfers.
Create separate functions for each and register the C45 versions using
the new API calls.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-13 21:40:54 -08:00
Andrew Lunn c58e39942a net: mdio: ipq4019: Separate C22 and C45 transactions
The ipq4019 driver can perform both C22 and C45 transfers.  Create
separate functions for each and register the C45 versions using the
new driver API calls.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-13 21:40:54 -08:00
Andrew Lunn c3c497eb8b net: mdio: aspeed: Separate C22 and C45 transactions
The aspeed MDIO bus driver can perform both C22 and C45 transfers.
Modify the existing C45 functions to take the devad as a parameter,
and remove the wrappers so there are individual C22 and C45 functions. Add
the C45 functions to the new API calls.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-13 21:40:54 -08:00
Andrew Lunn d544a25930 net: mdio: mux-bcm-iproc: Separate C22 and C45 transactions
The MDIO mux broadcom iproc can perform both C22 and C45 transfers.
Create separate functions for each and register the C45 versions using
the new API calls.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-13 21:40:54 -08:00
Andrew Lunn 87e3bee0f2 net: mdio: i2c: Separate C22 and C45 transactions
The MDIO over I2C bus driver can perform both C22 and C45 transfers.
Create separate functions for each and register the C45 versions using
the new API calls.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-13 21:40:54 -08:00
Andrew Lunn 93641ecbaa net: mdio: cavium: Separate C22 and C45 transactions
The cavium IP can perform both C22 and C45 transfers.  Create separate
functions for each and register the C45 versions in both the octeon
and thunder bus driver.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-13 21:40:53 -08:00
Bin Chen 9b7fe8046d nfp: add DCB IEEE support
Add basic DCB IEEE support. This includes support for ETS, max-rate,
and DSCP to user priority mapping.

DCB may be configured using iproute2's dcb command.
Example usage:
  dcb ets set dev $dev tc-tsa 0:ets 1:ets 2:ets 3:ets 4:ets 5:ets \
    6:ets 7:ets tc-bw 0:0 1:80 2:0 3:0 4:0 5:0 6:20 7:0
  dcb maxrate set dev $dev tc-maxrate 1:1000bit

And DCB configuration can be shown using:
  dcb ets show dev $dev
  dcb maxrate show dev $dev

Signed-off-by: Bin Chen <bin.chen@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230112121102.469739-1-simon.horman@corigine.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-13 21:40:01 -08:00
Lorenzo Bianconi bf20ce9f30 net: ethernet: mtk_wed: get rid of queue lock for tx queue
Similar to MTK Wireless Ethernet Dispatcher (WED) MCU rx queue,
we do not need to protect WED MCU tx queue with a spin lock since
the tx queue is accessed in the two following routines:
- mtk_wed_wo_queue_tx_skb():
  it is run at initialization and during mt7915 normal operation.
  Moreover MCU messages are serialized through MCU mutex.
- mtk_wed_wo_queue_tx_clean():
  it runs just at mt7915 driver module unload when no more messages
  are sent to the MCU.

Remove tx queue spinlock.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/7bd0337b2a13ab1a63673b7c03fd35206b3b284e.1673515140.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-13 21:35:16 -08:00
Jon Maxwell af6d10345c ipv6: remove max_size check inline with ipv4
In ip6_dst_gc() replace:

  if (entries > gc_thresh)

With:

  if (entries > ops->gc_thresh)

Sending Ipv6 packets in a loop via a raw socket triggers an issue where a
route is cloned by ip6_rt_cache_alloc() for each packet sent. This quickly
consumes the Ipv6 max_size threshold which defaults to 4096 resulting in
these warnings:

[1]   99.187805] dst_alloc: 7728 callbacks suppressed
[2] Route cache is full: consider increasing sysctl net.ipv6.route.max_size.
.
.
[300] Route cache is full: consider increasing sysctl net.ipv6.route.max_size.

When this happens the packet is dropped and sendto() gets a network is
unreachable error:

remaining pkt 200557 errno 101
remaining pkt 196462 errno 101
.
.
remaining pkt 126821 errno 101

Implement David Aherns suggestion to remove max_size check seeing that Ipv6
has a GC to manage memory usage. Ipv4 already does not check max_size.

Here are some memory comparisons for Ipv4 vs Ipv6 with the patch:

Test by running 5 instances of a program that sends UDP packets to a raw
socket 5000000 times. Compare Ipv4 and Ipv6 performance with a similar
program.

Ipv4:

Before test:

MemFree:        29427108 kB
Slab:             237612 kB

ip6_dst_cache       1912   2528    256   32    2 : tunables    0    0    0
xfrm_dst_cache         0      0    320   25    2 : tunables    0    0    0
ip_dst_cache        2881   3990    192   42    2 : tunables    0    0    0

During test:

MemFree:        29417608 kB
Slab:             247712 kB

ip6_dst_cache       1912   2528    256   32    2 : tunables    0    0    0
xfrm_dst_cache         0      0    320   25    2 : tunables    0    0    0
ip_dst_cache       44394  44394    192   42    2 : tunables    0    0    0

After test:

MemFree:        29422308 kB
Slab:             238104 kB

ip6_dst_cache       1912   2528    256   32    2 : tunables    0    0    0
xfrm_dst_cache         0      0    320   25    2 : tunables    0    0    0
ip_dst_cache        3048   4116    192   42    2 : tunables    0    0    0

Ipv6 with patch:

Errno 101 errors are not observed anymore with the patch.

Before test:

MemFree:        29422308 kB
Slab:             238104 kB

ip6_dst_cache       1912   2528    256   32    2 : tunables    0    0    0
xfrm_dst_cache         0      0    320   25    2 : tunables    0    0    0
ip_dst_cache        3048   4116    192   42    2 : tunables    0    0    0

During Test:

MemFree:        29431516 kB
Slab:             240940 kB

ip6_dst_cache      11980  12064    256   32    2 : tunables    0    0    0
xfrm_dst_cache         0      0    320   25    2 : tunables    0    0    0
ip_dst_cache        3048   4116    192   42    2 : tunables    0    0    0

After Test:

MemFree:        29441816 kB
Slab:             238132 kB

ip6_dst_cache       1902   2432    256   32    2 : tunables    0    0    0
xfrm_dst_cache         0      0    320   25    2 : tunables    0    0    0
ip_dst_cache        3048   4116    192   42    2 : tunables    0    0    0

Tested-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20230112012532.311021-1-jmaxwell37@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-13 20:59:14 -08:00
Keith Busch c191751410 caif: don't assume iov_iter type
The details of the iov_iter types are appropriately abstracted, so
there's no need to check for specific type fields. Just let the
abstractions handle it.

This is preparing for io_uring/net's io_send to utilize the more
efficient ITER_UBUF.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/20230111184245.3784393-1-kbusch@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-13 20:44:20 -08:00
Anand Moon e471d83e1f dt-bindings: net: rockchip-dwmac: fix rv1126 compatible warning
Fix compatible string for RV1126 gmac, and constrain it to
be compatible with Synopsys dwmac 4.20a.

fix below warning
$ make CHECK_DTBS=y rv1126-edgeble-neu2-io.dtb
arch/arm/boot/dts/rv1126-edgeble-neu2-io.dtb: ethernet@ffc40000:
		 compatible: 'oneOf' conditional failed, one must be fixed:
        ['rockchip,rv1126-gmac', 'snps,dwmac-4.20a'] is too long
        'rockchip,rv1126-gmac' is not one of ['rockchip,rk3568-gmac', 'rockchip,rk3588-gmac']

Fixes: b36fe2f436 ("dt-bindings: net: rockchip-dwmac: add rv1126 compatible")
Reviewed-by: Jagan Teki <jagan@edgeble.ai>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Anand Moon <anand@edgeble.ai>
Link: https://lore.kernel.org/r/20230111172437.5295-1-anand@edgeble.ai
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-13 20:15:39 -08:00
Sebastian Czapla 6f8179c192 ixgbe: Filter out spurious link up indication
Add delayed link state recheck to filter false link up indication
caused by transceiver with no fiber cable attached.

Signed-off-by: Sebastian Czapla <sebastianx.czapla@intel.com>
Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-01-13 09:42:15 -08:00
Jesse Brandeburg 3fe1d0a48d ixgbe: XDP: fix checker warning from rcu pointer
The ixgbe driver uses an older style failure mode when initializing the
XDP program and the queues. It causes some warnings when running C=2
checking builds (and it's the last one in the ethernet/intel tree).

$ make W=1 C=2 M=`pwd`/drivers/net/ethernet/intel modules
.../ixgbe_main.c:10301:25: error: incompatible types in comparison expression (different address spaces):
.../ixgbe_main.c:10301:25:    struct bpf_prog [noderef] __rcu *
.../ixgbe_main.c:10301:25:    struct bpf_prog *

Fix the problem by removing the line that tried to re-xchg "the old_prog
pointer" if there was an error, to make this driver act like the other
drivers which return the error code without "pointer restoration."

Also, update the "copy the pointer" logic to use WRITE_ONCE as many/all
the other drivers do, which required making a change in two separate
functions that write the xdp_prog variable in the ring.

The code here was modeled after the code in i40e/i40e_xdp_setup().

NOTE: Compile-tested only.

CC: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
CC: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-01-13 09:42:15 -08:00
Yunhui Cui 6e6eda44b9 sock: add tracepoint for send recv length
Add 2 tracepoints to monitor the tcp/udp traffic
of per process and per cgroup.

Regarding monitoring the tcp/udp traffic of each process, there are two
existing solutions, the first one is https://www.atoptool.nl/netatop.php.
The second is via kprobe/kretprobe.

Netatop solution is implemented by registering the hook function at the
hook point provided by the netfilter framework.

These hook functions may be in the soft interrupt context and cannot
directly obtain the pid. Some data structures are added to bind packets
and processes. For example, struct taskinfobucket, struct taskinfo ...

Every time the process sends and receives packets it needs multiple
hashmaps,resulting in low performance and it has the problem fo inaccurate
tcp/udp traffic statistics(for example: multiple threads share sockets).

We can obtain the information with kretprobe, but as we know, kprobe gets
the result by trappig in an exception, which loses performance compared
to tracepoint.

We compared the performance of tracepoints with the above two methods, and
the results are as follows:

ab -n 1000000 -c 1000 -r http://127.0.0.1/index.html
without trace:
Time per request: 39.660 [ms] (mean)
Time per request: 0.040 [ms] (mean, across all concurrent requests)

netatop:
Time per request: 50.717 [ms] (mean)
Time per request: 0.051 [ms] (mean, across all concurrent requests)

kr:
Time per request: 43.168 [ms] (mean)
Time per request: 0.043 [ms] (mean, across all concurrent requests)

tracepoint:
Time per request: 41.004 [ms] (mean)
Time per request: 0.041 [ms] (mean, across all concurrent requests

It can be seen that tracepoint has better performance.

Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
Signed-off-by: Xiongchun Duan <duanxiongchun@bytedance.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-13 10:25:10 +00:00
David S. Miller 8e8b6c63cc Merge branch 'rmnet-tx-pkt-aggregation'
Daniele Palmas says:

====================
net: add tx packets aggregation to ethtool and rmnet

Hello maintainers and all,

this patchset implements tx qmap packets aggregation in rmnet and generic
ethtool support for that.

Some low-cat Thread-x based modems are not capable of properly reaching the maximum
allowed throughput both in tx and rx during a bidirectional test if tx packets
aggregation is not enabled.

I verified this problem with rmnet + qmi_wwan by using a MDM9207 Cat. 4 based modem
(50Mbps/150Mbps max throughput). What is actually happening is pictured at
https://drive.google.com/file/d/1gSbozrtd9h0X63i6vdkNpN68d-9sg8f9/view

Testing with iperf TCP, when rx and tx flows are tested singularly there's no issue
in tx and minor issues in rx (not able to reach max throughput). When there are concurrent
tx and rx flows, tx throughput has an huge drop. rx a minor one, but still present.

The same scenario with tx aggregation enabled is pictured at
https://drive.google.com/file/d/1jcVIKNZD7K3lHtwKE5W02mpaloudYYih/view
showing a regular graph.

This issue does not happen with high-cat modems (e.g. SDX20), or at least it
does not happen at the throughputs I'm able to test currently: maybe the same
could happen when moving close to the maximum rates supported by those modems.
Anyway, having the tx aggregation enabled should not hurt.

The first attempt to solve this issue was in qmi_wwan qmap implementation,
see the discussion at https://lore.kernel.org/netdev/20221019132503.6783-1-dnlplm@gmail.com/

However, it turned out that rmnet was a better candidate for the implementation.

Moreover, Greg and Jakub suggested also to use ethtool for the configuration:
not sure if I got their advice right, but this patchset add also generic ethtool
support for tx aggregation.

The patches have been tested mainly against an MDM9207 based modem through USB
and SDX55 through PCI (MHI).

v2 should address the comments highlighted in the review: the implementation is
still in rmnet, due to Subash's request of keeping tx aggregation there.

v3 fixes ethtool-netlink.rst content out of table bounds and a W=1 build warning
for patch 2.

v4 solves a race related to egress_agg_params.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-13 10:23:52 +00:00
Daniele Palmas db8a563a9d net: qualcomm: rmnet: add ethtool support for configuring tx aggregation
Add support for ETHTOOL_COALESCE_TX_AGGR for configuring the tx
aggregation settings.

Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-13 10:23:52 +00:00
Daniele Palmas 64b5d1f8f2 net: qualcomm: rmnet: add tx packets aggregation
Add tx packets aggregation.

Bidirectional TCP throughput tests through iperf with low-cat
Thread-x based modems revelead performance issues both in tx
and rx.

The Windows driver does not show this issue: inspecting USB
packets revealed that the only notable change is the driver
enabling tx packets aggregation.

Tx packets aggregation is by default disabled and can be enabled
by increasing the value of ETHTOOL_A_COALESCE_TX_MAX_AGGR_FRAMES.

The maximum aggregated size is by default set to a reasonably low
value in order to support the majority of modems.

This implementation is based on patches available in Code Aurora
repositories (msm kernel) whose main authors are

Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Sean Tranchetti <stranche@codeaurora.org>

Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
Reviewed-by: Subash Abhinov Kasiviswanathan <quic_subashab@quicinc.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-13 10:23:52 +00:00
Daniele Palmas 31de284239 ethtool: add tx aggregation parameters
Add the following ethtool tx aggregation parameters:

ETHTOOL_A_COALESCE_TX_AGGR_MAX_BYTES
Maximum size in bytes of a tx aggregated block of frames.

ETHTOOL_A_COALESCE_TX_AGGR_MAX_FRAMES
Maximum number of frames that can be aggregated into a block.

ETHTOOL_A_COALESCE_TX_AGGR_TIME_USECS
Time in usecs after the first packet arrival in an aggregated
block for the block to be sent.

Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-13 10:23:52 +00:00