Commit graph

1130163 commits

Author SHA1 Message Date
Russell King fd580c9830 net: sfp: augment SFP parsing with phy_interface_t bitmap
We currently parse the SFP EEPROM to a bitmap of ethtool link modes,
and then attempt to convert the link modes to a PHY interface mode.
While this works at present, there are cases where this is sub-optimal.
For example, where a module can operate with several different PHY
interface modes.

To start addressing this, arrange for the SFP EEPROM parsing to also
provide a bitmap of the possible PHY interface modes.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-03 11:08:32 +01:00
Russell King (Oracle) 1645f44dd5 net: phylink: add ability to validate a set of interface modes
Rather than having the ability to validate all supported interface
modes or a single interface mode, introduce the ability to validate
a subset of supported modes.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
[ rebased on current net-next ]
Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-03 11:08:32 +01:00
Barnabás Pőcze 8d05fc0394 platform/x86: use PLATFORM_DEVID_NONE instead of -1
Use the `PLATFORM_DEVID_NONE` constant instead of
hard-coding -1 when creating a platform device.

No functional changes are intended.

Signed-off-by: Barnabás Pőcze <pobrn@protonmail.com>
Link: https://lore.kernel.org/r/20220930104857.2796923-1-pobrn@protonmail.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2022-10-03 09:40:04 +02:00
Mario Limonciello c928df03bd platform/x86/amd: pmc: Dump idle mask during "check" stage instead
The idle mask is dumped during the "prepare" and "restore" stage
right now, which helps to demonstrate issues only related to the
first s2idle entry.

If the system has entered s2idle once, but was woken up never
breaking the s2idle loop but also never went back to sleep we
might still have another issue to deal with however.

Move the dynamic debugging message here so that we'll catch it on
each iteration.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=216516
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20220929215042.745-1-mario.limonciello@amd.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2022-10-03 09:39:55 +02:00
Nathan Huckleberry 73ea735073 net: sparx5: Fix return type of sparx5_port_xmit_impl
The ndo_start_xmit field in net_device_ops is expected to be of type
netdev_tx_t (*ndo_start_xmit)(struct sk_buff *skb, struct net_device *dev).

The mismatched return type breaks forward edge kCFI since the underlying
function definition does not match the function hook definition.

The return type of sparx5_port_xmit_impl should be changed from int to
netdev_tx_t.

Reported-by: Dan Carpenter <error27@gmail.com>
Link: https://github.com/ClangBuiltLinux/linux/issues/1703
Cc: llvm@lists.linux.dev
Signed-off-by: Nathan Huckleberry <nhuck@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-03 08:07:40 +01:00
Kuniyuki Iwashima 7a62ed6136 af_unix: Fix memory leaks of the whole sk due to OOB skb.
syzbot reported a sequence of memory leaks, and one of them indicated we
failed to free a whole sk:

  unreferenced object 0xffff8880126e0000 (size 1088):
    comm "syz-executor419", pid 326, jiffies 4294773607 (age 12.609s)
    hex dump (first 32 bytes):
      00 00 00 00 00 00 00 00 7d 00 00 00 00 00 00 00  ........}.......
      01 00 07 40 00 00 00 00 00 00 00 00 00 00 00 00  ...@............
    backtrace:
      [<000000006fefe750>] sk_prot_alloc+0x64/0x2a0 net/core/sock.c:1970
      [<0000000074006db5>] sk_alloc+0x3b/0x800 net/core/sock.c:2029
      [<00000000728cd434>] unix_create1+0xaf/0x920 net/unix/af_unix.c:928
      [<00000000a279a139>] unix_create+0x113/0x1d0 net/unix/af_unix.c:997
      [<0000000068259812>] __sock_create+0x2ab/0x550 net/socket.c:1516
      [<00000000da1521e1>] sock_create net/socket.c:1566 [inline]
      [<00000000da1521e1>] __sys_socketpair+0x1a8/0x550 net/socket.c:1698
      [<000000007ab259e1>] __do_sys_socketpair net/socket.c:1751 [inline]
      [<000000007ab259e1>] __se_sys_socketpair net/socket.c:1748 [inline]
      [<000000007ab259e1>] __x64_sys_socketpair+0x97/0x100 net/socket.c:1748
      [<000000007dedddc1>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      [<000000007dedddc1>] do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80
      [<000000009456679f>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

We can reproduce this issue by creating two AF_UNIX SOCK_STREAM sockets,
send()ing an OOB skb to each other, and close()ing them without consuming
the OOB skbs.

  int skpair[2];

  socketpair(AF_UNIX, SOCK_STREAM, 0, skpair);

  send(skpair[0], "x", 1, MSG_OOB);
  send(skpair[1], "x", 1, MSG_OOB);

  close(skpair[0]);
  close(skpair[1]);

Currently, we free an OOB skb in unix_sock_destructor() which is called via
__sk_free(), but it's too late because the receiver's unix_sk(sk)->oob_skb
is accounted against the sender's sk->sk_wmem_alloc and __sk_free() is
called only when sk->sk_wmem_alloc is 0.

In the repro sequences, we do not consume the OOB skb, so both two sk's
sock_put() never reach __sk_free() due to the positive sk->sk_wmem_alloc.
Then, no one can consume the OOB skb nor call __sk_free(), and we finally
leak the two whole sk.

Thus, we must free the unconsumed OOB skb earlier when close()ing the
socket.

Fixes: 314001f0bf ("af_unix: Add OOB support")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-03 08:00:50 +01:00
David S. Miller 3735264d72 Merge branch 'ip_tunnel-netlink-parms'
Liu Jian says:

====================
Add helper functions to parse netlink msg of ip_tunnel

v1->v2: Move the implementation of the helper function to ip_tunnel_core.c
v2->v3: Change EXPORT_SYMBOL to EXPORT_SYMBOL_GPL
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-03 07:59:06 +01:00
Liu Jian b86fca800a net: Add helper function to parse netlink msg of ip_tunnel_parm
Add ip_tunnel_netlink_parms to parse netlink msg of ip_tunnel_parm.
Reduces duplicate code, no actual functional changes.

Signed-off-by: Liu Jian <liujian56@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-03 07:59:06 +01:00
Liu Jian 537dd2d9fb net: Add helper function to parse netlink msg of ip_tunnel_encap
Add ip_tunnel_netlink_encap_parms to parse netlink msg of ip_tunnel_encap.
Reduces duplicate code, no actual functional changes.

Signed-off-by: Liu Jian <liujian56@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-03 07:59:06 +01:00
Tetsuo Handa a91b750fd6 net: rds: don't hold sock lock when cancelling work from rds_tcp_reset_callbacks()
syzbot is reporting lockdep warning at rds_tcp_reset_callbacks() [1], for
commit ac3615e7f3 ("RDS: TCP: Reduce code duplication in
rds_tcp_reset_callbacks()") added cancel_delayed_work_sync() into a section
protected by lock_sock() without realizing that rds_send_xmit() might call
lock_sock().

We don't need to protect cancel_delayed_work_sync() using lock_sock(), for
even if rds_{send,recv}_worker() re-queued this work while __flush_work()
 from cancel_delayed_work_sync() was waiting for this work to complete,
retried rds_{send,recv}_worker() is no-op due to the absence of RDS_CONN_UP
bit.

Link: https://syzkaller.appspot.com/bug?extid=78c55c7bc6f66e53dce2 [1]
Reported-by: syzbot <syzbot+78c55c7bc6f66e53dce2@syzkaller.appspotmail.com>
Co-developed-by: Hillf Danton <hdanton@sina.com>
Signed-off-by: Hillf Danton <hdanton@sina.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Tested-by: syzbot <syzbot+78c55c7bc6f66e53dce2@syzkaller.appspotmail.com>
Fixes: ac3615e7f3 ("RDS: TCP: Reduce code duplication in rds_tcp_reset_callbacks()")
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-03 07:56:02 +01:00
David S. Miller 42e8e6d906 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
1) Refactor selftests to use an array of structs in xfrm_fill_key().
   From Gautam Menghani.

2) Drop an unused argument from xfrm_policy_match.
   From Hongbin Wang.

3) Support collect metadata mode for xfrm interfaces.
   From Eyal Birger.

4) Add netlink extack support to xfrm.
   From Sabrina Dubroca.

Please note, there is a merge conflict in:

include/net/dst_metadata.h

between commit:

0a28bfd497 ("net/macsec: Add MACsec skb_metadata_dst Tx Data path support")

from the net-next tree and commit:

5182a5d48c ("net: allow storing xfrm interface metadata in metadata_dst")

from the ipsec-next tree.

Can be solved as done in linux-next.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-03 07:52:13 +01:00
Takashi Iwai 02f2e785c4 Merge branch 'for-next' into for-linus 2022-10-03 08:48:26 +02:00
Wilken Gottwalt 0cf46a653b hwmon: (corsair-psu) add USB id of new revision of the HX1000i psu
Also updates the documentation accordingly.

Signed-off-by: Wilken Gottwalt <wilken.gottwalt@posteo.net>
Link: https://lore.kernel.org/r/YznOUQ7Pijedu0NW@monster.localdomain
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2022-10-02 14:38:55 -07:00
Linus Torvalds 4fe89d07dc Linux 6.0 2022-10-02 14:09:07 -07:00
Darrick J. Wong adc9c2e5a7 iomap: add a tracepoint for mappings returned by map_blocks
Add a new tracepoint so we can see what mapping the filesystem returns
to writeback a dirty page.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-10-02 11:42:19 -07:00
Darrick J. Wong 3d5f3ba1ac iomap: iomap: fix memory corruption when recording errors during writeback
Every now and then I see this crash on arm64:

Unable to handle kernel NULL pointer dereference at virtual address 00000000000000f8
Buffer I/O error on dev dm-0, logical block 8733687, async page read
Mem abort info:
  ESR = 0x0000000096000006
  EC = 0x25: DABT (current EL), IL = 32 bits
  SET = 0, FnV = 0
  EA = 0, S1PTW = 0
  FSC = 0x06: level 2 translation fault
Data abort info:
  ISV = 0, ISS = 0x00000006
  CM = 0, WnR = 0
user pgtable: 64k pages, 42-bit VAs, pgdp=0000000139750000
[00000000000000f8] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000, pmd=0000000000000000
Internal error: Oops: 96000006 [#1] PREEMPT SMP
Buffer I/O error on dev dm-0, logical block 8733688, async page read
Dumping ftrace buffer:
Buffer I/O error on dev dm-0, logical block 8733689, async page read
   (ftrace buffer empty)
XFS (dm-0): log I/O error -5
Modules linked in: dm_thin_pool dm_persistent_data
XFS (dm-0): Metadata I/O Error (0x1) detected at xfs_trans_read_buf_map+0x1ec/0x590 [xfs] (fs/xfs/xfs_trans_buf.c:296).
 dm_bio_prison
XFS (dm-0): Please unmount the filesystem and rectify the problem(s)
XFS (dm-0): xfs_imap_lookup: xfs_ialloc_read_agi() returned error -5, agno 0
 dm_bufio dm_log_writes xfs nft_chain_nat xt_REDIRECT nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip6t_REJECT
potentially unexpected fatal signal 6.
 nf_reject_ipv6
potentially unexpected fatal signal 6.
 ipt_REJECT nf_reject_ipv4
CPU: 1 PID: 122166 Comm: fsstress Tainted: G        W          6.0.0-rc5-djwa #rc5 3004c9f1de887ebae86015f2677638ce51ee7
 rpcsec_gss_krb5 auth_rpcgss xt_tcpudp ip_set_hash_ip ip_set_hash_net xt_set nft_compat ip_set_hash_mac ip_set nf_tables
Hardware name: QEMU KVM Virtual Machine, BIOS 1.5.1 06/16/2021
pstate: 60001000 (nZCv daif -PAN -UAO -TCO -DIT +SSBS BTYPE=--)
 ip_tables
pc : 000003fd6d7df200
 x_tables
lr : 000003fd6d7df1ec
 overlay nfsv4
CPU: 0 PID: 54031 Comm: u4:3 Tainted: G        W          6.0.0-rc5-djwa #rc5 3004c9f1de887ebae86015f2677638ce51ee7405
Hardware name: QEMU KVM Virtual Machine, BIOS 1.5.1 06/16/2021
Workqueue: writeback wb_workfn
sp : 000003ffd9522fd0
 (flush-253:0)
pstate: 60401005 (nZCv daif +PAN -UAO -TCO -DIT +SSBS BTYPE=--)
pc : errseq_set+0x1c/0x100
x29: 000003ffd9522fd0 x28: 0000000000000023 x27: 000002acefeb6780
x26: 0000000000000005 x25: 0000000000000001 x24: 0000000000000000
x23: 00000000ffffffff x22: 0000000000000005
lr : __filemap_set_wb_err+0x24/0xe0
 x21: 0000000000000006
sp : fffffe000f80f760
x29: fffffe000f80f760 x28: 0000000000000003 x27: fffffe000f80f9f8
x26: 0000000002523000 x25: 00000000fffffffb x24: fffffe000f80f868
x23: fffffe000f80fbb0 x22: fffffc0180c26a78 x21: 0000000002530000
x20: 0000000000000000 x19: 0000000000000000 x18: 0000000000000000

x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
x14: 0000000000000001 x13: 0000000000470af3 x12: fffffc0058f70000
x11: 0000000000000040 x10: 0000000000001b20 x9 : fffffe000836b288
x8 : fffffc00eb9fd480 x7 : 0000000000f83659 x6 : 0000000000000000
x5 : 0000000000000869 x4 : 0000000000000005 x3 : 00000000000000f8
x20: 000003fd6d740020 x19: 000000000001dd36 x18: 0000000000000001
x17: 000003fd6d78704c x16: 0000000000000001 x15: 000002acfac87668
x2 : 0000000000000ffa x1 : 00000000fffffffb x0 : 00000000000000f8
Call trace:
 errseq_set+0x1c/0x100
 __filemap_set_wb_err+0x24/0xe0
 iomap_do_writepage+0x5e4/0xd5c
 write_cache_pages+0x208/0x674
 iomap_writepages+0x34/0x60
 xfs_vm_writepages+0x8c/0xcc [xfs 7a861f39c43631f15d3a5884246ba5035d4ca78b]
x14: 0000000000000000 x13: 2064656e72757465 x12: 0000000000002180
x11: 000003fd6d8a82d0 x10: 0000000000000000 x9 : 000003fd6d8ae288
x8 : 0000000000000083 x7 : 00000000ffffffff x6 : 00000000ffffffee
x5 : 00000000fbad2887 x4 : 000003fd6d9abb58 x3 : 000003fd6d740020
x2 : 0000000000000006 x1 : 000000000001dd36 x0 : 0000000000000000
CPU: 1 PID: 122167 Comm: fsstress Tainted: G        W          6.0.0-rc5-djwa #rc5 3004c9f1de887ebae86015f2677638ce51ee7
 do_writepages+0x90/0x1c4
 __writeback_single_inode+0x4c/0x4ac
Hardware name: QEMU KVM Virtual Machine, BIOS 1.5.1 06/16/2021
 writeback_sb_inodes+0x214/0x4ac
 wb_writeback+0xf4/0x3b0
pstate: 60001000 (nZCv daif -PAN -UAO -TCO -DIT +SSBS BTYPE=--)
 wb_workfn+0xfc/0x580
 process_one_work+0x1e8/0x480
pc : 000003fd6d7df200
 worker_thread+0x78/0x430

This crash is a result of iomap_writepage_map encountering some sort of
error during writeback and wanting to set that error code in the file
mapping so that fsync will report it.  Unfortunately, the code
dereferences folio->mapping after unlocking the folio, which means that
another thread could have removed the page from the page cache
(writeback doesn't hold the invalidation lock) and give it to somebody
else.

At best we crash the system like above; at worst, we corrupt memory or
set an error on some other unsuspecting file while failing to record the
problems with *this* file.  Regardless, fix the problem by reporting the
error to the inode mapping.

NOTE: Commit 598ecfbaa7 lifted the XFS writeback code to iomap, so
this fix should be backported to XFS in the 4.6-5.4 kernels in addition
to iomap in the 5.5-5.19 kernels.

Fixes: e735c00794 ("iomap: Convert iomap_add_to_ioend() to take a folio") # 5.17 onward
Fixes: 598ecfbaa7 ("iomap: lift the xfs writeback code to iomap") # 5.5-5.16, needs backporting
Fixes: 150d5be09c ("xfs: remove xfs_cancel_ioend") # 4.6-5.4, needs backporting
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-10-02 11:42:19 -07:00
Linus Torvalds a962b54e16 Add missing DT bindings for STM32 and a resource leak fix for DaVinci
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEOZGx6rniZ1Gk92RdFA3kzBSgKbYFAmM5dQwACgkQFA3kzBSg
 KbZaLw/+KM4ITFAzE9JqCp51Mf/KIKI9LpnGie5yOB6aXogx8DHS9tpUAjS8AWql
 u8GDlyV0P2lek16pyJF0f/7bdMur5qKthLg8VY/cUDcejs+6BIkDasffH1UpMzM4
 ZnDq5FpuRlHjrFA8vAqTFJfd8kClRNgt8J6DvXx+n3jvM4zZdq/FuzIjbuTuDGMF
 yCaKBQoKAi0dlKz6lT25hrEsrgVgD7QXwwWKsAERV1GTfBVGY/MGXGBotTxuRRbw
 Dlc0Bb/SUlFJg6dXvY5Khh2obo4B676nx07M1Hh5iL3aYfcdeoKMsHDM0fAayWiL
 xdQvTsUxVrn7dSqPL1uWMfOohacmOhbi90llO/2isG2mu/+jW4rctk9ovYORgb8Q
 auK4xqHBI+XLin3+R3UFIO5csMZ7POKYUPljChEvKIUsMv0RdS8od+TCRFll8NFl
 OeJtlHZwt5HFwj7Jl7JZfHMXNFLWz0KM+QbggWOX6H2MpDWx19C1h3WhExae6Jfg
 omeQHE5EtIdzDK20vZcnEF3Clvz035PnHlD48TPQM+4cmw7Axq2R0HBklW0O18O+
 S6KTFbUVvxo6qIHr/rexirosPWc98cpSILxjesH9y+syuZPOiqMGuxtpEqqmxBWI
 gEYQy3sStnMCkJMHm2sojasMX7ce2UCExZpybY1Im0j3z+ipU7U=
 =k3of
 -----END PGP SIGNATURE-----

Merge tag 'i2c-for-6.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
 "Add missing DT bindings for STM32 and a resource leak fix for DaVinci"

* tag 'i2c-for-6.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: davinci: fix PM disable depth imbalance in davinci_i2c_probe
  dt-bindings: i2c: st,stm32-i2c: Document wakeup-source property
  dt-bindings: i2c: st,stm32-i2c: Document interrupt-names property
2022-10-02 10:00:45 -07:00
Linus Torvalds febae48afe Misc fixes:
- Fix a PMU enumeration/initialization bug on Intel Alder Lake CPUs.
  - Fix KVM guest PEBS register handling.
  - Fix race/reentry bug in perf_output_read_group() reading of PMU counters.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmM5bd8RHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1gMmw/+Nudwq3g/YcarTfoiBOTV0Ey9b8KEDjzU
 BKk/k/W8+3Qd4lU2u6DzXIVMcIyJM2SpgdagsvVdWAPjx/qgu//zuQKP8ai8uRww
 ipBDB+PU39hDPyJwOy3YLVEdnPqiMBvzaWcfb35R5p/ZA+Y7p/ituw9HwZ/jql5d
 C1rEcu9vjleY8Cs5dVLuvlz57VPq8VuHcYsnMGODo2WYdjX3CRNnfjWQyJFBQYJk
 f/4WYGLqcDeFHWZ92X527mxsKHBFCZFx8zxLHyhjfckPPGLOophAQkimg0X0TWrH
 HU2iNVQWV6BlCvirWnovR9jcPvmEjabl1BWd/1KCdR+L+AYTveYxd10nXCbQHIT2
 fT0T6m7TgPb4Resl8Jk33VuKFNaeNmdPrN0iKeEfFeIbT/p6+TAshhzBDbbhsrCB
 JVQx8Ri4kfbUSdiCsWLlreczslSYncfDvDrVB9WW8ngnv2VDwwKzJja/3Q2/hZH8
 RDd9DVLfT7l4zdUvBmOIU/j4vPPTKf3bJVn+CVcxtztC10cdJC5Xsfseh5N4nyzT
 BjPxDDo7nX/fx53iNEb5aSfPz68KBUMmPdDLupPY+2olO9EOixS2tfA9AsU6tmIf
 4y3RbnkEcKYknYRxJjmipKGsYL9AEeG9O7E+4aOJU4zfNbXJkoN0CwfRXY7+u/Ov
 nCLZTs2iWDM=
 =JFXZ
 -----END PGP SIGNATURE-----

Merge tag 'perf-urgent-2022-10-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull misc perf fixes from Ingo Molnar:

 - Fix a PMU enumeration/initialization bug on Intel Alder Lake CPUs

 - Fix KVM guest PEBS register handling

 - Fix race/reentry bug in perf_output_read_group() reading of PMU
   counters

* tag 'perf-urgent-2022-10-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/core: Fix reentry problem in perf_output_read_group()
  perf/x86/core: Completely disable guest PEBS via guest's global_ctrl
  perf/x86/intel: Fix unchecked MSR access error for Alder Lake N
2022-10-02 09:41:27 -07:00
Linus Torvalds 534b0abc62 - Add the respective UP last level cache mask accessors in order not to
cause segfaults when lscpu accesses their representation in sysfs
 
 - Fix for a race in the alternatives batch patching machinery when
 kprobes are set
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmM5ZDkACgkQEsHwGGHe
 VUrkaA//dXhnPu2AM9x/v7JMZw0BO2peKMNCmO7b6z4+xIXlxNGNYeO766ZqpjSd
 eFJj5Hv9ESOZw4UG5cvPA1Vj14nSa6/03Lo9JBFthl2KLOZEgVrD+GNQEJMqxPi/
 9s1+764NXYi8iILHj7N4epQmz+oIbCUlnHLWZRkmG5ys40cPPI/d5li/rKBK8yIQ
 W89f+WgbqCmpn9Ha8PFYy5uuLxQJnN/McDVZyW2d4MSxJ/FukRl4x1agrfnJq1fb
 xz9Y/ZpVRPQCc4fJbQcTTffyFyg42AAqC0O0jJ5ZsOJDjZoQS7WvkcKYO33FiwKv
 /wo61B+7SxbNMcZYhQGP8BxaBeSPlXmMKaifW+xZDS6RN4zfCq/M1+ziVB45GdUq
 S5hN699vhImciXM5t18wPw6mrpoBBkQYBv+xKkC9ykUw2vxEZ32DeFzwxrybdcGC
 hWKZJAVTQpvzr1FlrUAbBtQnhUTxSAB6EAdTtIuHQ+ts+OcraR8JNe59GCsEdCVI
 as+mfqMKB8lwoSyDwomkeMcx5yL9XYy+STLPsPTHLrYFjqwTBOZgWRGrVZzt0EBo
 0z12tqxpaFc7RI48Vi0qifkeX2Fi63HSBI/Ba+i11a2jM6NT2d2EcO26rDpO6R2S
 6K0N7cD3o0wO+QK2hwxBgGnX8e2aRUE8tjYmW40aclfxl4nh/08=
 =MiiB
 -----END PGP SIGNATURE-----

Merge tag 'x86_urgent_for_v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

 - Add the respective UP last level cache mask accessors in order not to
   cause segfaults when lscpu accesses their representation in sysfs

 - Fix for a race in the alternatives batch patching machinery when
   kprobes are set

* tag 'x86_urgent_for_v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cacheinfo: Add a cpu_llc_shared_mask() UP variant
  x86/alternative: Fix race in try_get_desc()
2022-10-02 09:30:35 -07:00
David S. Miller 9d43507319 Merge branch 'tc-bind_class-hook'
Zhengchao Shao says:

====================
refactor duplicate codes in bind_class hook function

All the bind_class callback duplicate the same logic, so we can refactor
them. First, ensure n arg not empty before call bind_class hook function.
Then, add tc_cls_bind_class() helper. Last, use tc_cls_bind_class() in
filter.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-02 16:07:17 +01:00
Zhengchao Shao cc9039a134 net: sched: use tc_cls_bind_class() in filter
Use tc_cls_bind_class() in filter.

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-02 16:07:17 +01:00
Zhengchao Shao 402963e34a net: sched: cls_api: introduce tc_cls_bind_class() helper
All the bind_class callback duplicate the same logic, this patch
introduces tc_cls_bind_class() helper for common usage.

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-02 16:07:17 +01:00
Zhengchao Shao 4e6263ec8b net: sched: ensure n arg not empty before call bind_class
All bind_class callbacks are directly returned when n arg is empty.
Therefore, bind_class is invoked only when n arg is not empty.

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-02 16:07:17 +01:00
Oleksandr Shamray 525dd5aed6 hwmon: (pmbus/mp2888) Fix sensors readouts for MPS Multi-phase mp2888 controller
Fix scale factors for reading MPS Multi-phase mp2888 controller.
Fixed sensors:
    - PIN/POUT: based on vendor documentation, set bscale factor 0.5W/LSB
    - IOUT: based on vendor documentation, set scale factor 0.25 A/LSB

Fixes: e4db7719d0 ("hwmon: (pmbus) Add support for MPS Multi-phase mp2888 controller")
Signed-off-by: Oleksandr Shamray <oleksandrs@nvidia.com>
Reviewed-by: Vadim Pasternak <vadimp@nvidia.com>
Link: https://lore.kernel.org/r/20220929121642.63051-1-oleksandrs@nvidia.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2022-10-02 08:07:11 -07:00
Colin Ian King 1793bed346 dt-bindings: hwmon: sensirion,shtc1: Clean up spelling mistakes and grammar
The yaml text contains some minor spelling mistakes and grammar issues,
clean these up.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20220928213139.63819-1-colin.i.king@gmail.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2022-10-02 08:02:24 -07:00
Zeng Heng 1e4aa3e18d hwmon: (nct6683) remove unused variable in nct6683_create_attr_group
When enable 'unused-but-set-variable' compile
warning option, it would raise warning as below:

drivers/hwmon/nct6683.c:415:9:
warning: variable 'j' set but not used [-Wunused-but-set-variable]

Variable 'j' in nct6683_create_attr_group is unused,
so remove it and simplify the 'for' loop.

Signed-off-by: Zeng Heng <zengheng4@huawei.com>
Link: https://lore.kernel.org/r/20220927114352.2498079-1-zengheng4@huawei.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2022-10-02 08:01:18 -07:00
Wolfram Sang 228336f507 i2c: pci1xxxx: prevent signed integer overflow
Some constants need 'UL' markings, otherwise they are shifted into the
sign bit.

Fixes: 3616936972 ("i2c: microchip: pci1xxxx: Add driver for I2C host controller in multifunction endpoint of pci1xxxx switch")
Signed-off-by: Wolfram Sang <wsa@kernel.org>
2022-10-02 00:59:24 +02:00
Gustavo A. R. Silva 492baeb958 i2c: acpi: Replace zero-length array with DECLARE_FLEX_ARRAY() helper
Zero-length arrays are deprecated and we are moving towards adopting
C99 flexible-array members, instead. So, replace zero-length arrays
declarations in anonymous union with the new DECLARE_FLEX_ARRAY()
helper macro.

This helper allows for flexible-array members in unions.

Link: https://github.com/KSPP/linux/issues/193
Link: https://github.com/KSPP/linux/issues/218
Link: https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
2022-10-02 00:58:21 +02:00
Mani Milani 342530f7fe i2c: i801: Prefer async probe
This i801 driver probe can take more than ~190ms in some devices, since
the "i2c_register_spd()" call was added inside
"i801_probe_optional_slaves()".

Prefer async probe so that other drivers can be probed and boot can
continue in parallel while this driver loads, to reduce boot time. There is
no reason to block other drivers from probing while this driver is
loading.

Signed-off-by: Mani Milani <mani@chromium.org>
Tested-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Reviewed-by: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
2022-10-02 00:56:15 +02:00
Zhang Qilong e2062df704 i2c: davinci: fix PM disable depth imbalance in davinci_i2c_probe
The pm_runtime_enable will increase power disable depth. Thus a
pairing decrement is needed on the error handling path to keep
it balanced according to context.

Fixes: 17f88151ff ("i2c: davinci: Add PM Runtime Support")
Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com>
Reviewed-by: Bartosz Golaszewski <brgl@bgdev.pl>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
2022-10-02 00:46:42 +02:00
Andy Shevchenko fe682780d5 i2c: designware-pci: Use standard pattern for memory allocation
The pattern
	foo = kmalloc(sizeof(*foo), GFP_KERNEL);
has an advantage when foo type is changed. Since we are planning a such,
better to be prepared by using standard pattern for memory allocation.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
2022-10-02 00:43:11 +02:00
Andy Shevchenko 65769162ae i2c: designware-pci: Group AMD NAVI quirk parts together
The code is ogranized in a way that all related parts
to the certain platform quirk go together. This is not
the case for AMD NAVI. Shuffle code to make it happen.

While at it, drop the frequency definition and use
hard coded value as it's done for other platforms and
add a comment to the PCI ID list.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
2022-10-02 00:42:55 +02:00
Marek Vasut 367d4c887a dt-bindings: i2c: st,stm32-i2c: Document wakeup-source property
Document wakeup-source property. This fixes dtbs_check warnings
when building current Linux DTs:

"
arch/arm/boot/dts/stm32mp153c-dhcom-drc02.dtb: i2c@40015000: Unevaluated properties are not allowed ('wakeup-source' was unexpected)
"

Signed-off-by: Marek Vasut <marex@denx.de>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
2022-10-02 00:34:28 +02:00
Marek Vasut f938a5295c dt-bindings: i2c: st,stm32-i2c: Document interrupt-names property
Document interrupt-names property with "event" and "error" interrupt names.
This fixes dtbs_check warnings when building current Linux DTs:

"
arch/arm/boot/dts/stm32mp153c-dhcom-drc02.dtb: i2c@40015000: Unevaluated properties are not allowed ('interrupt-names' was unexpected)
"

Signed-off-by: Marek Vasut <marex@denx.de>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
2022-10-02 00:34:16 +02:00
Jakub Kicinski bc37b24ee0 Merge branch 'mlx5-xsk-updates-part3-2022-09-30'
Saeed Mahameed says:

====================
mlx5 xsk updates part3 2022-09-30

The gist of this 4 part series is in this patchset's last patch

This series contains performance optimizations. XSK starts using the
batching allocator, and XSK data path gets separated from the regular
RX, allowing to drop some branches not relevant for non-XSK use cases.
Some minor optimizations for indirect calls and need_wakeup are also
included.

Other than that, this series adds a few features to the mlx5e
implementation of XSK:

1. XDP metadata support on XSK RQs.

2. RSS contexts support for XSK RQs.

3. Some other optimizations

4. Last but not least, change the queuing scheme, so that XSK RQs no longer
use higher indices, but replace the regular RQs.

Maxim Says:
==========

In the initial implementation of XSK in mlx5e, XSK RQs coexisted with
regular RQs in the same channel. The main idea was to allow RSS work the
same for regular traffic, without need to reconfigure RSS to exclude XSK
queues.

However, this scheme didn't prove to be beneficial, mainly because of
incompatibility with other vendors. Some tools don't properly support
using higher indices for XSK queues, some tools get confused with the
double amount of RQs exposed in sysfs. Some use cases are purely XSK,
and allocating the same amount of unused regular RQs is a waste of
resources.

This commit changes the queuing scheme to the standard one, where XSK
RQs replace regular RQs on the channels where XSK sockets are open. Two
RQs still exist in the channel to allow failsafe disable of XSK, but
only one is exposed at a time. The next commit will achieve the desired
memory save by flushing the buffers when the regular RQ is unused.

As the result of this transition:

1. It's possible to use RSS contexts over XSK RQs.

2. It's possible to dedicate all queues to XSK.

3. When XSK RQs coexist with regular RQs, the admin should make sure no
unwanted traffic goes into XSK RQs by either excluding them from RSS or
settings up the XDP program to return XDP_PASS for non-XSK traffic.

4. When using a mixed fleet of mlx5e devices and other netdevs, the same
configuration can be applied. If the application supports the fallback
to copy mode on unsupported drivers, it will work too.

==========

Part 4 will include some final xsk optimizations and minor improvements

part 1: https://lore.kernel.org/netdev/20220927203611.244301-1-saeed@kernel.org/
part 2: https://lore.kernel.org/netdev/20220929072156.93299-1-saeed@kernel.org/
====================

Link: https://lore.kernel.org/r/20220930162903.62262-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-01 13:30:26 -07:00
Maxim Mikityanskiy 3db4c85cde net/mlx5e: xsk: Use queue indices starting from 0 for XSK queues
In the initial implementation of XSK in mlx5e, XSK RQs coexisted with
regular RQs in the same channel. The main idea was to allow RSS work the
same for regular traffic, without need to reconfigure RSS to exclude XSK
queues.

However, this scheme didn't prove to be beneficial, mainly because of
incompatibility with other vendors. Some tools don't properly support
using higher indices for XSK queues, some tools get confused with the
double amount of RQs exposed in sysfs. Some use cases are purely XSK,
and allocating the same amount of unused regular RQs is a waste of
resources.

This commit changes the queuing scheme to the standard one, where XSK
RQs replace regular RQs on the channels where XSK sockets are open. Two
RQs still exist in the channel to allow failsafe disable of XSK, but
only one is exposed at a time. The next commit will achieve the desired
memory save by flushing the buffers when the regular RQ is unused.

As the result of this transition:

1. It's possible to use RSS contexts over XSK RQs.

2. It's possible to dedicate all queues to XSK.

3. When XSK RQs coexist with regular RQs, the admin should make sure no
unwanted traffic goes into XSK RQs by either excluding them from RSS or
settings up the XDP program to return XDP_PASS for non-XSK traffic.

4. When using a mixed fleet of mlx5e devices and other netdevs, the same
configuration can be applied. If the application supports the fallback
to copy mode on unsupported drivers, it will work too.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-01 13:30:21 -07:00
Maxim Mikityanskiy d9ba64deb2 net/mlx5e: Introduce the mlx5e_flush_rq function
Add a function to flush an RQ: clean up descriptors, release pages and
reset the RQ. This procedure is used by the recovery flow, and it will
also be used in a following commit to free some memory when switching a
channel to the XSK mode.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-01 13:30:21 -07:00
Maxim Mikityanskiy a752b2edb5 net/mlx5e: xsk: Support XDP metadata on XSK RQs
Add support for XDP metadata on XSK RQs for cross-program
communication. The driver no longer calls xdp_set_data_meta_invalid and
copies the metadata to a newly allocated SKB on XDP_PASS.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-01 13:30:21 -07:00
Maxim Mikityanskiy ddb7afeee2 net/mlx5e: Optimize RQ page deallocation
mlx5e_free_rx_mpwqe loops over all pages of a MPWQE, calling
mlx5e_page_release for ones that are not scheduled for XDP_TX or
XDP_REDIRECT; and mlx5e_page_release checks whether it's an XSK RQ or a
regular one for each page/XSK frame. This check can be moved outside the
loop to reduce the number of branches.

mlx5e_free_rx_wqe loops over all fragments, calling mlx5e_page_release
for the ones that are last in a page; and mlx5e_page_release checks
whether it's an XSK RQ or a regular one for each fragment. Using the
fact that XSK doesn't support multiple fragments, it can be optimized
for both XSK and regular usages:

1. Make an early check for XSK and call its deallocator directly, saving
3 branches (loop condition, frag->last_in_page and selection of
deallocator).

2. Call the regular deallocator directly in the non-XSK case, saving a
branch per fragment, except the first one.

After the changes, mlx5e_page_release is removed, as there are no
callers left.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-01 13:30:21 -07:00
Maxim Mikityanskiy 96d37d861a net/mlx5e: Call mlx5e_page_release_dynamic directly where possible
mlx5e_page_release calls the appropriate deallocator depending on
whether it's an XSK RQ or a regular one. Some flows that call this
function are not compatible with XSK, so they can call the non-XSK
deallocator directly to save a branch.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-01 13:30:21 -07:00
Maxim Mikityanskiy 132857d912 net/mlx5e: Use non-XSK page allocator in SHAMPO
The SHAMPO flow is not compatible with XSK, it can call the page pool
allocator directly to save a branch.

mlx5e_page_alloc is removed, as it's no longer used in any flow.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-01 13:30:20 -07:00
Maxim Mikityanskiy cf544517c4 net/mlx5e: xsk: Use xsk_buff_alloc_batch on striding RQ
XSK provides a function to allocate frames in batches for more efficient
processing. This commit starts using this function on striding RQ and
creates an optimized flow for XSK. A side effect is an opportunity to
optimize the regular RX flow by dropping branching for XSK cases.

Performance improvement is up to 6.4% in the aligned mode and up to 7.5%
in the unaligned mode.

Aligned mode, 2048-byte frames: 12.9 Mpps -> 13.8 Mpps
Aligned mode, 4096-byte frames: 11.8 Mpps -> 12.5 Mpps
Unaligned mode, 2048-byte frames: 11.9 Mpps -> 12.8 Mpps
Unaligned mode, 3072-byte frames: 11.4 Mpps -> 12.1 Mpps
Unaligned mode, 4096-byte frames: 11.0 Mpps -> 11.2 Mpps

CPU: Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-01 13:30:20 -07:00
Maxim Mikityanskiy 259bbc6436 net/mlx5e: xsk: Use xsk_buff_alloc_batch on legacy RQ
XSK provides a function to allocate frames in batches for more efficient
processing. This commit starts using this function on legacy RQ, adding
a special case for XSK. The new branch introduced basically replaces the
branch that was removed from the same place a few commits before.

A check is made that DMA sync is not needed, because the batching
allocator falls back to returning one frame when DMA sync is needed, and
this is best handled by the loop in the standard case.

Performance improvement is up to 8% in the aligned mode and up to 9% in
the unaligned mode.

Aligned mode, 2048-byte frames: 12.8 Mpps -> 13.5 Mpps
Aligned mode, 4096-byte frames: 11.5 Mpps -> 12.4 Mpps
Unaligned mode, 2048-byte frames: 12.2 Mpps -> 13.4 Mpps
Unaligned mode, 3072-byte frames: 11.6 Mpps -> 12.5 Mpps
Unaligned mode, 4096-byte frames: 11.2 Mpps -> 12.2 Mpps

CPU: Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-01 13:30:20 -07:00
Maxim Mikityanskiy a2e5ba242c net/mlx5e: xsk: Split out WQE allocation for legacy XSK RQ
Allocation of XSK frames on legacy RQ may be made more efficient with a
specialized routine that relies on certain assumptions, such as there is
only one fragment, allocation units (XSK frames) are not shared among
multiple packets. It reduces the number of branches both in the XSK code
and in the regular RQ, because with this approach there is only a single
check whether it's an XSK or regular RQ.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-01 13:30:20 -07:00
Maxim Mikityanskiy 0b48223237 net/mlx5e: Remove the outer loop when allocating legacy RQ WQEs
Legacy RQ WQEs are allocated in a loop in small batches (8 WQEs). As
partial batches are allowed, there is no point to have a loop in a loop,
so the outer loop is removed, and the batch size is increased up to the
total number of WQEs to allocate, still not smaller than 8.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-01 13:30:20 -07:00
Maxim Mikityanskiy 3f5fe0b2e6 net/mlx5e: xsk: Use partial batches in legacy RQ with XSK
The previous commit allowed allocating WQE batches in legacy RQ
partially, however, XSK still checks whether there are enough frames in
the fill ring. Remove this check to allow to allocate batches partially
also with XSK.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-01 13:30:19 -07:00
Maxim Mikityanskiy 42847fed55 net/mlx5e: Use partial batches in legacy RQ
Legacy RQ allocates WQEs in batches. If the batch allocation fails, the
pages of the allocated part are released. This commit changes this
behavior to allow to use the pages that have been already allocated.

After this change, we need to be careful about indexing rq->wqe.frags[].
The WQ size is a power of two that divides by wqe_bulk (8), and the old
code used whole bulks, which allowed to use indices [8*K; 8*K+7] without
overflowing. Now that the bulks may be partial, the range can start at
any location (not only at 8*K), so we need to wrap them around to avoid
out-of-bounds array access.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-01 13:30:19 -07:00
Maxim Mikityanskiy 5758c3145b net/mlx5e: Make the wqe_index_mask calculation more exact
The old calculation of wqe_index_mask may give false positives, i.e.
request bulking of pairs of WQEs when not strictly needed, for example,
when the first fragment size is equal to the PAGE_SIZE, bulking is not
needed, even if the number of fragments is odd.

Make the calculation more exact to cut false positives.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-01 13:30:19 -07:00
Maxim Mikityanskiy a064c60984 net/mlx5e: Introduce wqe_index_mask for legacy RQ
When fragments of different WQEs share the same page, mlx5e_post_rx_wqes
must wait until the old WQE stops using the page, only then the new WQE
can allocate the new page. Essentially, it means that if WQE index i is
still in use, the allocation must stop before `i % bulk`, where bulk is
the number of WQEs that may share the same page.

As bulk is always a power of two, `i % bulk = i & (bulk - 1)`, and the
new wqe_index_mask field will be equal to `bulk - 1`.

At the same time, wqe_bulk remains for optimization purposes and stores
`max(bulk, 8)`, which allows to skip the allocation until we have at
least 8 WQEs free.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-01 13:30:19 -07:00
Maxim Mikityanskiy 8cbcafcee1 net/mlx5e: xsk: Drop the check for XSK state in mlx5e_xsk_wakeup
The MLX5E_CHANNEL_STATE_XSK flag checked in mlx5e_xsk_wakeup indicates
that XSK queues are open, but not necessarily activated. This check is
not very useful, because:

0. Both XSK setup and netdev state transitions take the same state_lock
mutex, so they can't happen at the same time.

1. If the netdev is up, xsk_is_bound can return true only when
MLX5E_CHANNEL_STATE_XSK is set on the corresponding channel.
mlx5e_xsk_wakeup is only called when xsk_is_bound is true.

2. If the XSK socket is bound, and the netdev is going up or down,
mlx5e_xsk_wakeup can take one of two branches, depending on the return
value of napi_if_scheduled_mark_missed:

2.1. True means one of two things: either NAPI was enabled at this
point, which means MLX5E_CHANNEL_STATE_XSK was also set; or NAPI was
disabled, and nothing really happened.

2.2. False means that NAPI was enabled by this point, which also implies
MLX5E_CHANNEL_STATE_XSK was set. Additionally, mlx5e_xsk_wakeup contains
a following check for MLX5E_SQ_STATE_ENABLED on async_icosq, and this
flag implies MLX5E_CHANNEL_STATE_XSK too on XSK channels.

As checking this flag doesn't cut any flows, remove the check.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-01 13:30:19 -07:00