Commit graph

17430 commits

Author SHA1 Message Date
Miri Korenblit d73fbaf24c wifi: mac80211: make associated BSS pointer visible to the driver
Some drivers need the data in it, so move it to the link conf,
which is exposed to the driver.

Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20240206164849.6fe9782b87b4.Ifbffef638f07ca7f5c2b27f40d2cf2942d21de0b@changeid
[remove bss pointer from internal struct, update docs]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-02-21 15:19:03 +01:00
Aditya Kumar Singh 6030b3a469 wifi: mac80211: check beacon countdown is complete on per link basis
Currently, function to check if beacon countdown is complete uses deflink
to fetch the beacon and check the counter. However, with MLO, there is
a need to check the counter for the beacon in a particular link.

Add support to use link_id in order to fetch the beacon from a particular
link data.

Signed-off-by: Aditya Kumar Singh <quic_adisi@quicinc.com>
Link: https://msgid.link/20240216144621.514385-2-quic_adisi@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-02-21 15:19:03 +01:00
David S. Miller e199c4ba82 wireless-next patches for v6.9
The second "new features" pull request for v6.9.  Lots of iwlwifi and
 stack changes this time. And naturally smaller changes to other drivers.
 
 We also twice merged wireless into wireless-next to avoid conflicts
 between the trees.
 
 Major changes:
 
 stack
 
 * mac80211: negotiated TTLM request support
 
 * SPP A-MSDU support
 
 * mac80211: wider bandwidth OFDMA config support
 
 iwlwifi
 
 * kunit tests
 
 * bump FW API to 89 for AX/BZ/SC devices
 
 * enable SPP A-MSDUs
 
 * support for new devices
 
 ath12k
 
 * refactoring in preparation for Multi-Link Operation (MLO) support
 
 * 1024 Block Ack window size support
 
 * provide firmware wmi logs via a trace event
 
 ath11k
 
 * 36 bit DMA mask support
 
 * support 6 GHz station power modes: Low Power Indoor (LPI), Standard
   Power) SP and Very Low Power (VLP)
 
 rtl8xxxu
 
 * TP-Link TL-WN823N V2 support
 -----BEGIN PGP SIGNATURE-----
 
 iQFFBAABCgAvFiEEiBjanGPFTz4PRfLobhckVSbrbZsFAmXU2PgRHGt2YWxvQGtl
 cm5lbC5vcmcACgkQbhckVSbrbZuzZAf+NsvOkkhIoMG3rYmqli9ELEgupBIEoTwo
 2favVGBbLOPIlvUJab3ZZ8Bsntpk3deRmISN27whNm5B3+36c7DKn3aYauVwUNs2
 Qb99f3HXkGZQJ8DdKLZMviXXMgKfXzpVISwzD7HdV/GhkVX4LZ/MFzv1zrvLAC/J
 LN5K6xKUqbgRJ1kAWbEoJpRCzNtKwx9GHAsO1vhL69yjBAqKkHivV9LE+BNjoXEz
 g/LD0z05JqWDyxJ7yud3+DiBlZtvpmK9oa9gpWnuF8sdvkywyBdP/ipfDDLgbCzY
 vKF1IUy5GNJSt5+AQS+zO0a8HrwzHR+XG8w5sCEKpjh3Nj0cxtFJ5w==
 =Bnyy
 -----END PGP SIGNATURE-----

Merge tag 'wireless-next-2024-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next

Kalle Valo says:

====================
wireless-next patches for v6.9

The second "new features" pull request for v6.9.  Lots of iwlwifi and
stack changes this time. And naturally smaller changes to other drivers.

We also twice merged wireless into wireless-next to avoid conflicts
between the trees.

Major changes:

stack

* mac80211: negotiated TTLM request support

* SPP A-MSDU support

* mac80211: wider bandwidth OFDMA config support

iwlwifi

* kunit tests

* bump FW API to 89 for AX/BZ/SC devices

* enable SPP A-MSDUs

* support for new devices

ath12k

* refactoring in preparation for Multi-Link Operation (MLO) support

* 1024 Block Ack window size support

* provide firmware wmi logs via a trace event

ath11k

* 36 bit DMA mask support

* support 6 GHz station power modes: Low Power Indoor (LPI), Standard
  Power) SP and Very Low Power (VLP)

rtl8xxxu

* TP-Link TL-WN823N V2 support
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-21 11:48:20 +00:00
Florian Westphal 3f80196888 netfilter: move nf_reinject into nfnetlink_queue modules
No need to keep this in the core, move it to the nfnetlink_queue module.
nf_reroute is moved too, there were no other callers.

Signed-off-by: Florian Westphal <fw@strlen.de>
2024-02-21 12:03:22 +01:00
Eric Dumazet 5d4cc87414 net: reorganize "struct sock" fields
Last major reorg happened in commit 9115e8cd2a ("net: reorganize
struct sock for better data locality")

Since then, many changes have been done.

Before SO_PEEK_OFF support is added to TCP, we need
to move sk_peek_off to a better location.

It is time to make another pass, and add six groups,
without explicit alignment.

- sock_write_rx (following sk_refcnt) read-write fields in rx path.
- sock_read_rx read-mostly fields in rx path.
- sock_read_rxtx read-mostly fields in both rx and tx paths.
- sock_write_rxtx read-write fields in both rx and tx paths.
- sock_write_tx read-write fields in tx paths.
- sock_read_tx read-mostly fields in tx paths.

Results on TCP_RR benchmarks seem to show a gain (4 to 5 %).

It is possible UDP needs a change, because sk_peek_off
shares a cache line with sk_receive_queue.
If this the case, we can exchange roles of sk->sk_receive
and up->reader_queue queues.

After this change, we have the following layout:

struct sock {
	struct sock_common         __sk_common;          /*     0  0x88 */
	/* --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- */
	__u8                       __cacheline_group_begin__sock_write_rx[0]; /*  0x88     0 */
	atomic_t                   sk_drops;             /*  0x88   0x4 */
	__s32                      sk_peek_off;          /*  0x8c   0x4 */
	struct sk_buff_head        sk_error_queue;       /*  0x90  0x18 */
	struct sk_buff_head        sk_receive_queue;     /*  0xa8  0x18 */
	/* --- cacheline 3 boundary (192 bytes) --- */
	struct {
		atomic_t           rmem_alloc;           /*  0xc0   0x4 */
		int                len;                  /*  0xc4   0x4 */
		struct sk_buff *   head;                 /*  0xc8   0x8 */
		struct sk_buff *   tail;                 /*  0xd0   0x8 */
	} sk_backlog;                                    /*  0xc0  0x18 */
	struct {
		atomic_t                   rmem_alloc;           /*     0   0x4 */
		int                        len;                  /*   0x4   0x4 */
		struct sk_buff *           head;                 /*   0x8   0x8 */
		struct sk_buff *           tail;                 /*  0x10   0x8 */

		/* size: 24, cachelines: 1, members: 4 */
		/* last cacheline: 24 bytes */
	};

	__u8                       __cacheline_group_end__sock_write_rx[0]; /*  0xd8     0 */
	__u8                       __cacheline_group_begin__sock_read_rx[0]; /*  0xd8     0 */
	rcu *                      sk_rx_dst;            /*  0xd8   0x8 */
	int                        sk_rx_dst_ifindex;    /*  0xe0   0x4 */
	u32                        sk_rx_dst_cookie;     /*  0xe4   0x4 */
	unsigned int               sk_ll_usec;           /*  0xe8   0x4 */
	unsigned int               sk_napi_id;           /*  0xec   0x4 */
	u16                        sk_busy_poll_budget;  /*  0xf0   0x2 */
	u8                         sk_prefer_busy_poll;  /*  0xf2   0x1 */
	u8                         sk_userlocks;         /*  0xf3   0x1 */
	int                        sk_rcvbuf;            /*  0xf4   0x4 */
	rcu *                      sk_filter;            /*  0xf8   0x8 */
	/* --- cacheline 4 boundary (256 bytes) --- */
	union {
		rcu *              sk_wq;                /* 0x100   0x8 */
		struct socket_wq * sk_wq_raw;            /* 0x100   0x8 */
	};                                               /* 0x100   0x8 */
	union {
		rcu *                      sk_wq;                /*     0   0x8 */
		struct socket_wq *         sk_wq_raw;            /*     0   0x8 */
	};

	void                       (*sk_data_ready)(struct sock *); /* 0x108   0x8 */
	long                       sk_rcvtimeo;          /* 0x110   0x8 */
	int                        sk_rcvlowat;          /* 0x118   0x4 */
	__u8                       __cacheline_group_end__sock_read_rx[0]; /* 0x11c     0 */
	__u8                       __cacheline_group_begin__sock_read_rxtx[0]; /* 0x11c     0 */
	int                        sk_err;               /* 0x11c   0x4 */
	struct socket *            sk_socket;            /* 0x120   0x8 */
	struct mem_cgroup *        sk_memcg;             /* 0x128   0x8 */
	rcu *                      sk_policy[2];         /* 0x130  0x10 */
	/* --- cacheline 5 boundary (320 bytes) --- */
	__u8                       __cacheline_group_end__sock_read_rxtx[0]; /* 0x140     0 */
	__u8                       __cacheline_group_begin__sock_write_rxtx[0]; /* 0x140     0 */
	socket_lock_t              sk_lock;              /* 0x140  0x20 */
	u32                        sk_reserved_mem;      /* 0x160   0x4 */
	int                        sk_forward_alloc;     /* 0x164   0x4 */
	u32                        sk_tsflags;           /* 0x168   0x4 */
	__u8                       __cacheline_group_end__sock_write_rxtx[0]; /* 0x16c     0 */
	__u8                       __cacheline_group_begin__sock_write_tx[0]; /* 0x16c     0 */
	int                        sk_write_pending;     /* 0x16c   0x4 */
	atomic_t                   sk_omem_alloc;        /* 0x170   0x4 */
	int                        sk_sndbuf;            /* 0x174   0x4 */
	int                        sk_wmem_queued;       /* 0x178   0x4 */
	refcount_t                 sk_wmem_alloc;        /* 0x17c   0x4 */
	/* --- cacheline 6 boundary (384 bytes) --- */
	unsigned long              sk_tsq_flags;         /* 0x180   0x8 */
	union {
		struct sk_buff *   sk_send_head;         /* 0x188   0x8 */
		struct rb_root     tcp_rtx_queue;        /* 0x188   0x8 */
	};                                               /* 0x188   0x8 */
	union {
		struct sk_buff *           sk_send_head;         /*     0   0x8 */
		struct rb_root             tcp_rtx_queue;        /*     0   0x8 */
	};

	struct sk_buff_head        sk_write_queue;       /* 0x190  0x18 */
	u32                        sk_dst_pending_confirm; /* 0x1a8   0x4 */
	u32                        sk_pacing_status;     /* 0x1ac   0x4 */
	struct page_frag           sk_frag;              /* 0x1b0  0x10 */
	/* --- cacheline 7 boundary (448 bytes) --- */
	struct timer_list          sk_timer;             /* 0x1c0  0x28 */

	/* XXX last struct has 4 bytes of padding */

	unsigned long              sk_pacing_rate;       /* 0x1e8   0x8 */
	atomic_t                   sk_zckey;             /* 0x1f0   0x4 */
	atomic_t                   sk_tskey;             /* 0x1f4   0x4 */
	__u8                       __cacheline_group_end__sock_write_tx[0]; /* 0x1f8     0 */
	__u8                       __cacheline_group_begin__sock_read_tx[0]; /* 0x1f8     0 */
	unsigned long              sk_max_pacing_rate;   /* 0x1f8   0x8 */
	/* --- cacheline 8 boundary (512 bytes) --- */
	long                       sk_sndtimeo;          /* 0x200   0x8 */
	u32                        sk_priority;          /* 0x208   0x4 */
	u32                        sk_mark;              /* 0x20c   0x4 */
	rcu *                      sk_dst_cache;         /* 0x210   0x8 */
	netdev_features_t          sk_route_caps;        /* 0x218   0x8 */
	u16                        sk_gso_type;          /* 0x220   0x2 */
	u16                        sk_gso_max_segs;      /* 0x222   0x2 */
	unsigned int               sk_gso_max_size;      /* 0x224   0x4 */
	gfp_t                      sk_allocation;        /* 0x228   0x4 */
	u32                        sk_txhash;            /* 0x22c   0x4 */
	u8                         sk_pacing_shift;      /* 0x230   0x1 */
	bool                       sk_use_task_frag;     /* 0x231   0x1 */
	__u8                       __cacheline_group_end__sock_read_tx[0]; /* 0x232     0 */
	u8                         sk_gso_disabled:1;    /* 0x232: 0 0x1 */
	u8                         sk_kern_sock:1;       /* 0x232:0x1 0x1 */
	u8                         sk_no_check_tx:1;     /* 0x232:0x2 0x1 */
	u8                         sk_no_check_rx:1;     /* 0x232:0x3 0x1 */

	/* XXX 4 bits hole, try to pack */

	u8                         sk_shutdown;          /* 0x233   0x1 */
	u16                        sk_type;              /* 0x234   0x2 */
	u16                        sk_protocol;          /* 0x236   0x2 */
	unsigned long              sk_lingertime;        /* 0x238   0x8 */
	/* --- cacheline 9 boundary (576 bytes) --- */
	struct proto *             sk_prot_creator;      /* 0x240   0x8 */
	rwlock_t                   sk_callback_lock;     /* 0x248   0x8 */
	int                        sk_err_soft;          /* 0x250   0x4 */
	u32                        sk_ack_backlog;       /* 0x254   0x4 */
	u32                        sk_max_ack_backlog;   /* 0x258   0x4 */
	kuid_t                     sk_uid;               /* 0x25c   0x4 */
	spinlock_t                 sk_peer_lock;         /* 0x260   0x4 */
	int                        sk_bind_phc;          /* 0x264   0x4 */
	struct pid *               sk_peer_pid;          /* 0x268   0x8 */
	const struct cred  *       sk_peer_cred;         /* 0x270   0x8 */
	ktime_t                    sk_stamp;             /* 0x278   0x8 */
	/* --- cacheline 10 boundary (640 bytes) --- */
	int                        sk_disconnects;       /* 0x280   0x4 */
	u8                         sk_txrehash;          /* 0x284   0x1 */
	u8                         sk_clockid;           /* 0x285   0x1 */
	u8                         sk_txtime_deadline_mode:1; /* 0x286: 0 0x1 */
	u8                         sk_txtime_report_errors:1; /* 0x286:0x1 0x1 */
	u8                         sk_txtime_unused:6;   /* 0x286:0x2 0x1 */

	/* XXX 1 byte hole, try to pack */

	void *                     sk_user_data;         /* 0x288   0x8 */
	void *                     sk_security;          /* 0x290   0x8 */
	struct sock_cgroup_data    sk_cgrp_data;         /* 0x298   0x8 */
	void                       (*sk_state_change)(struct sock *); /* 0x2a0   0x8 */
	void                       (*sk_write_space)(struct sock *); /* 0x2a8   0x8 */
	void                       (*sk_error_report)(struct sock *); /* 0x2b0   0x8 */
	int                        (*sk_backlog_rcv)(struct sock *, struct sk_buff *); /* 0x2b8   0x8 */
	/* --- cacheline 11 boundary (704 bytes) --- */
	void                       (*sk_destruct)(struct sock *); /* 0x2c0   0x8 */
	rcu *                      sk_reuseport_cb;      /* 0x2c8   0x8 */
	rcu *                      sk_bpf_storage;       /* 0x2d0   0x8 */
	struct callback_head       sk_rcu __attribute__((__aligned__(8))); /* 0x2d8  0x10 */
	netns_tracker              ns_tracker;           /* 0x2e8   0x8 */

	/* size: 752, cachelines: 12, members: 105 */
	/* sum members: 749, holes: 1, sum holes: 1 */
	/* sum bitfield members: 12 bits, bit holes: 1, sum bit holes: 4 bits */
	/* paddings: 1, sum paddings: 4 */
	/* forced alignments: 1 */
	/* last cacheline: 48 bytes */
};

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20240216162006.2342759-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-20 12:01:45 +01:00
Mina Almasry 18ddbf5cf0 net: introduce abstraction for network memory
Add the netmem_ref type, an abstraction for network memory.

To add support for new memory types to the net stack, we must first
abstract the current memory type. Currently parts of the net stack
use struct page directly:

- page_pool
- drivers
- skb_frag_t

Originally the plan was to reuse struct page* for the new memory types,
and to set the LSB on the page* to indicate it's not really a page.
However, for compiler type checking we need to introduce a new type.

netmem_ref is introduced to abstract the underlying memory type.
Currently it's a no-op abstraction that is always a struct page
underneath. In parallel there is an undergoing effort to add support
for devmem to the net stack:

https://lore.kernel.org/netdev/20231208005250.2910004-1-almasrymina@google.com/

netmem_ref can be pointers to different underlying memory types, and the
low bits are set to indicate the memory type. Helpers are provided
to convert netmem pointers to the underlying memory type (currently only
struct page). In the devmem series helpers are provided so that calling
code can use netmem without worrying about the underlying memory type
unless absolutely necessary.

Reviewed-by: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-20 09:22:58 +01:00
Lorenzo Bianconi f853fa5c54 net: page_pool: fix recycle stats for system page_pool allocator
Use global percpu page_pool_recycle_stats counter for system page_pool
allocator instead of allocating a separate percpu variable for each
(also percpu) page pool instance.

Reviewed-by: Toke Hoiland-Jorgensen <toke@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://lore.kernel.org/r/87f572425e98faea3da45f76c3c68815c01a20ee.1708075412.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-19 12:30:27 -08:00
Alexander Lobakin 56ef27e3ab page_pool: disable direct recycling based on pool->cpuid on destroy
Now that direct recycling is performed basing on pool->cpuid when set,
memory leaks are possible:

1. A pool is destroyed.
2. Alloc cache is emptied (it's done only once).
3. pool->cpuid is still set.
4. napi_pp_put_page() does direct recycling basing on pool->cpuid.
5. Now alloc cache is not empty, but it won't ever be freed.

In order to avoid that, rewrite pool->cpuid to -1 when unlinking NAPI to
make sure no direct recycling will be possible after emptying the cache.
This involves a bit of overhead as pool->cpuid now must be accessed
via READ_ONCE() to avoid partial reads.
Rename page_pool_unlink_napi() -> page_pool_disable_direct_recycling()
to reflect what it actually does and unexport it.

Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20240215113905.96817-1-aleksander.lobakin@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-19 11:48:00 -08:00
Paolo Abeni b8adb69a7d mptcp: fix lockless access in subflow ULP diag
Since the introduction of the subflow ULP diag interface, the
dump callback accessed all the subflow data with lockless.

We need either to annotate all the read and write operation accordingly,
or acquire the subflow socket lock. Let's do latter, even if slower, to
avoid a diffstat havoc.

Fixes: 5147dfb508 ("mptcp: allow dumping subflow context to userspace")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-18 10:25:00 +00:00
Tobias Waldekranz dc489f8625 net: bridge: switchdev: Skip MDB replays of deferred events on offload
Before this change, generation of the list of MDB events to replay
would race against the creation of new group memberships, either from
the IGMP/MLD snooping logic or from user configuration.

While new memberships are immediately visible to walkers of
br->mdb_list, the notification of their existence to switchdev event
subscribers is deferred until a later point in time. So if a replay
list was generated during a time that overlapped with such a window,
it would also contain a replay of the not-yet-delivered event.

The driver would thus receive two copies of what the bridge internally
considered to be one single event. On destruction of the bridge, only
a single membership deletion event was therefore sent. As a
consequence of this, drivers which reference count memberships (at
least DSA), would be left with orphan groups in their hardware
database when the bridge was destroyed.

This is only an issue when replaying additions. While deletion events
may still be pending on the deferred queue, they will already have
been removed from br->mdb_list, so no duplicates can be generated in
that scenario.

To a user this meant that old group memberships, from a bridge in
which a port was previously attached, could be reanimated (in
hardware) when the port joined a new bridge, without the new bridge's
knowledge.

For example, on an mv88e6xxx system, create a snooping bridge and
immediately add a port to it:

    root@infix-06-0b-00:~$ ip link add dev br0 up type bridge mcast_snooping 1 && \
    > ip link set dev x3 up master br0

And then destroy the bridge:

    root@infix-06-0b-00:~$ ip link del dev br0
    root@infix-06-0b-00:~$ mvls atu
    ADDRESS             FID  STATE      Q  F  0  1  2  3  4  5  6  7  8  9  a
    DEV:0 Marvell 88E6393X
    33:33:00:00:00:6a     1  static     -  -  0  .  .  .  .  .  .  .  .  .  .
    33:33:ff:87:e4:3f     1  static     -  -  0  .  .  .  .  .  .  .  .  .  .
    ff:ff:ff:ff:ff:ff     1  static     -  -  0  1  2  3  4  5  6  7  8  9  a
    root@infix-06-0b-00:~$

The two IPv6 groups remain in the hardware database because the
port (x3) is notified of the host's membership twice: once via the
original event and once via a replay. Since only a single delete
notification is sent, the count remains at 1 when the bridge is
destroyed.

Then add the same port (or another port belonging to the same hardware
domain) to a new bridge, this time with snooping disabled:

    root@infix-06-0b-00:~$ ip link add dev br1 up type bridge mcast_snooping 0 && \
    > ip link set dev x3 up master br1

All multicast, including the two IPv6 groups from br0, should now be
flooded, according to the policy of br1. But instead the old
memberships are still active in the hardware database, causing the
switch to only forward traffic to those groups towards the CPU (port
0).

Eliminate the race in two steps:

1. Grab the write-side lock of the MDB while generating the replay
   list.

This prevents new memberships from showing up while we are generating
the replay list. But it leaves the scenario in which a deferred event
was already generated, but not delivered, before we grabbed the
lock. Therefore:

2. Make sure that no deferred version of a replay event is already
   enqueued to the switchdev deferred queue, before adding it to the
   replay list, when replaying additions.

Fixes: 4f2673b3a2 ("net: bridge: add helper to replay port and host-joined mdb entries")
Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-16 09:36:37 +00:00
Jakub Kicinski fc906e7922 Merge branch 'for-thermal-genetlink-family-bind-unbind-callbacks'
Stanislaw Gruszka says:

====================
thermal/netlink/intel_hfi: Enable HFI feature only when required

The patchset introduces a new genetlink family bind/unbind callbacks
and thermal/netlink notifications, which allow drivers to send netlink
multicast events based on the presence of actual user-space consumers.
This functionality optimizes resource usage by allowing disabling
of features when not needed.

v1: https://lore.kernel.org/linux-pm/20240131120535.933424-1-stanislaw.gruszka@linux.intel.com//
v2: https://lore.kernel.org/linux-pm/20240206133605.1518373-1-stanislaw.gruszka@linux.intel.com/
v3: https://lore.kernel.org/linux-pm/20240209120625.1775017-1-stanislaw.gruszka@linux.intel.com/
====================

Link: https://lore.kernel.org/r/20240212161615.161935-1-stanislaw.gruszka@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-15 17:49:24 -08:00
Stanislaw Gruszka 3de21a8990 genetlink: Add per family bind/unbind callbacks
Add genetlink family bind()/unbind() callbacks when adding/removing
multicast group to/from netlink client socket via setsockopt() or
bind() syscall.

They can be used to track if consumers of netlink multicast messages
emerge or disappear. Thus, a client implementing callbacks, can now
send events only when there are active consumers, preventing unnecessary
work when none exist.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20240212161615.161935-2-stanislaw.gruszka@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-15 17:49:16 -08:00
Jakub Kicinski 73be9a3aab Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

No conflicts.

Adjacent changes:

net/core/dev.c
  9f30831390 ("net: add rcu safety to rtnl_prop_list_size()")
  723de3ebef ("net: free altname using an RCU callback")

net/unix/garbage.c
  11498715f2 ("af_unix: Remove io_uring code for GC.")
  25236c91b5 ("af_unix: Fix task hung while purging oob_skb in GC.")

drivers/net/ethernet/renesas/ravb_main.c
  ed4adc0720 ("net: ravb: Count packets instead of descriptors in GbEth RX path"
)
  c2da940857 ("ravb: Add Rx checksum offload support for GbEth")

net/mptcp/protocol.c
  bdd70eb689 ("mptcp: drop the push_pending field")
  28e5c13805 ("mptcp: annotate lockless accesses around read-mostly fields")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-15 16:20:04 -08:00
Alex Henrie a5fcea2d2f net: ipv6/addrconf: introduce a regen_min_advance sysctl
In RFC 8981, REGEN_ADVANCE cannot be less than 2 seconds, and the RFC
does not permit the creation of temporary addresses with lifetimes
shorter than that:

> When processing a Router Advertisement with a
> Prefix Information option carrying a prefix for the purposes of
> address autoconfiguration (i.e., the A bit is set), the host MUST
> perform the following steps:

> 5.  A temporary address is created only if this calculated preferred
>     lifetime is greater than REGEN_ADVANCE time units.

However, some users want to change their IPv6 address as frequently as
possible regardless of the RFC's arbitrary minimum lifetime. For the
benefit of those users, add a regen_min_advance sysctl parameter that
can be set to below or above 2 seconds.

Link: https://datatracker.ietf.org/doc/html/rfc8981
Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-15 15:34:40 +01:00
Johannes Berg 414532d8aa wifi: cfg80211: use IEEE80211_MAX_MESH_ID_LEN appropriately
Even if that's the same as IEEE80211_MAX_SSID_LEN, we really
should just use IEEE80211_MAX_MESH_ID_LEN for mesh, rather
than having the BUILD_BUG_ON()s.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-02-15 10:59:08 +01:00
Ingo Molnar 4589f199eb Merge branch 'x86/bugs' into x86/core, to pick up pending changes before dependent patches
Merge in pending alternatives patching infrastructure changes, before
applying more patches.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-02-14 10:49:37 +01:00
Lorenzo Bianconi 2b0cfa6e49 net: add generic percpu page_pool allocator
Introduce generic percpu page_pools allocator.
Moreover add page_pool_create_percpu() and cpuid filed in page_pool struct
in order to recycle the page in the page_pool "hot" cache if
napi_pp_put_page() is running on the same cpu.
This is a preliminary patch to add xdp multi-buff support for xdp running
in generic mode.

Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Reviewed-by: Toke Hoiland-Jorgensen <toke@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/80bc4285228b6f4220cd03de1999d86e46e3fcbd.1707729884.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-13 19:22:30 -08:00
Guillaume Nault a3522a2edb ipv4: Set the routing scope properly in ip_route_output_ports().
Set scope automatically in ip_route_output_ports() (using the socket
SOCK_LOCALROUTE flag). This way, callers don't have to overload the
tos with the RTO_ONLINK flag, like RT_CONN_FLAGS() does.

For callers that don't pass a struct sock, this doesn't change anything
as the scope is still set to RT_SCOPE_UNIVERSE when sk is NULL.

Callers that passed a struct sock and used RT_CONN_FLAGS(sk) or
RT_CONN_FLAGS_TOS(sk, tos) for the tos are modified to use
ip_sock_tos(sk) and RT_TOS(tos) respectively, as overloading tos with
the RTO_ONLINK flag now becomes unnecessary.

In drivers/net/amt.c, all ip_route_output_ports() calls use a 0 tos
parameter, ignoring the SOCK_LOCALROUTE flag of the socket. But the sk
parameter is a kernel socket, which doesn't have any configuration path
for setting SOCK_LOCALROUTE anyway. Therefore, ip_route_output_ports()
will continue to initialise scope with RT_SCOPE_UNIVERSE and amt.c
doesn't need to be modified.

Also, remove RT_CONN_FLAGS() and RT_CONN_FLAGS_TOS() from route.h as
these macros are now unused.

The objective is to eventually remove RTO_ONLINK entirely to allow
converting ->flowi4_tos to dscp_t. This will ensure proper isolation
between the DSCP and ECN bits, thus minimising the risk of introducing
bugs where TOS values interfere with ECN.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/dacfd2ab40685e20959ab7b53c427595ba229e7d.1707496938.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-12 17:33:05 -08:00
Shaul Triebitz a64be8296e wifi: cfg80211: report unprotected deauth/disassoc in wowlan
Add to cfg80211_wowlan_wakeup another wakeup reason -
unprot_deauth_disassoc.
To be set to true if the woke up was due to an
unprotected deauth or disassoc frame in MFP.
In that case report WOWLAN_TRIG_UNPROTECTED_DEAUTH_DISASSOC.

Signed-off-by: Shaul Triebitz <shaul.triebitz@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20240206164849.a3d739850d03.I8f52a21c4f36d1af1f8068bed79e2f9cbf8289ef@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-02-12 21:22:48 +01:00
Johannes Berg a110a3b791 wifi: cfg80211: optionally support monitor on disabled channels
If the hardware supports a disabled channel, it may in
some cases be possible to use monitor mode (without any
transmit) on it when it's otherwise disabled. Add a new
channel flag IEEE80211_CHAN_CAN_MONITOR that makes it
possible for a driver to indicate such a thing.

Make it per channel so drivers could have a choice with
it, perhaps it's only possible on some channels, perhaps
some channels are not supported at all, but still there
and marked disabled.

In _nl80211_parse_chandef() simplify the code and check
only for an unknown channel, _cfg80211_chandef_usable()
will later check for IEEE80211_CHAN_DISABLED anyway.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20240206164849.87fad3a21a09.I9116b2fdc2e2c9fd59a9273a64db7fcb41fc0328@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-02-12 21:22:48 +01:00
Johannes Berg 7b5e25b8ba wifi: cfg80211: rename UHB to 6 GHz
UHB stands for "Ultra High Band", but this term doesn't really
exist in the spec. Rename all occurrences to "6 GHz", but keep
a few defines for userspace API compatibility.

Link: https://msgid.link/20240206164849.c9cfb9400839.I153db3b951934a1d84409c17fbe1f1d1782543fa@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-02-12 21:22:46 +01:00
Aditya Kumar Singh f6ca96aa51 wifi: cfg80211: add support for link id attribute in NL80211_CMD_DEL_STATION
Currently whenever NL80211_CMD_DEL_STATION command is called without any
MAC address, all stations present on that interface are flushed.
However with MLO there is a need to flush such stations only which are
using at least a particular link from the AP MLD interface.

For example - 2 GHz and 5 GHz are part of an AP MLD.
To this interface, following stations are connected -
   1. One non-EHT STA on 2 GHz link.
   2. One non-EHT STA on 5 GHz link.
   3. One Multi-Link STA having 2 GHz and 5 GHz as active links.

Now if currently, NL80211_CMD_DEL_STATION is issued by the 2 GHz link
without any MAC address, it would flush all station entries. However,
flushing of station entry #2 at least is not desireable since it
is connected to 5 GHz link alone.

Hence, add an option to pass link ID as well in the command so that if link
ID is passed, stations using that passed link ID alone would be flushed
and others will not.

So after this, station entries #1 and #3 alone would be flushed and #2 will
remain as it is.

Signed-off-by: Aditya Kumar Singh <quic_adisi@quicinc.com>
Link: https://msgid.link/20240205162952.1697646-2-quic_adisi@quicinc.com
[clarify documentation]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-02-12 21:11:24 +01:00
Lorenzo Bianconi 6f656131f6 wifi: mac80211: remove gfp parameter from ieee80211_obss_color_collision_notify
Get rid of gfp parameter from ieee80211_obss_color_collision_notify
since it is no longer used.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Acked-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Link: https://msgid.link/f91e1c78896408ac556586ba8c99e4e389aeba02.1707389901.git.lorenzo@kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-02-12 21:07:30 +01:00
Kui-Feng Lee 5eb902b8e7 net/ipv6: Remove expired routes with a separated list of routes.
FIB6 GC walks trees of fib6_tables to remove expired routes. Walking a tree
can be expensive if the number of routes in a table is big, even if most of
them are permanent. Checking routes in a separated list of routes having
expiration will avoid this potential issue.

Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-12 10:24:12 +00:00
Kui-Feng Lee 129e406e18 net/ipv6: set expires in rt6_add_dflt_router().
Pass the duration of a lifetime (in seconds) to the function
rt6_add_dflt_router() so that it can properly set the expiration time.

The function ndisc_router_discovery() is the only one that calls
rt6_add_dflt_router(), and it will later set the expiration time for the
route created by rt6_add_dflt_router(). However, there is a gap of time
between calling rt6_add_dflt_router() and setting the expiration time in
ndisc_router_discovery(). During this period, there is a possibility that a
new route may be removed from the routing table. By setting the correct
expiration time in rt6_add_dflt_router(), we can prevent this from
happening. The reason for setting RTF_EXPIRES in rt6_add_dflt_router() is
to start the Garbage Collection (GC) timer, as it only activates when a
route with RTF_EXPIRES is added to a table.

Suggested-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-12 10:24:12 +00:00
Jakub Kicinski aec7961916 tls: fix race between async notify and socket close
The submitting thread (one which called recvmsg/sendmsg)
may exit as soon as the async crypto handler calls complete()
so any code past that point risks touching already freed data.

Try to avoid the locking and extra flags altogether.
Have the main thread hold an extra reference, this way
we can depend solely on the atomic ref counter for
synchronization.

Don't futz with reiniting the completion, either, we are now
tightly controlling when completion fires.

Reported-by: valis <sec@valis.email>
Fixes: 0cada33241 ("net/tls: fix race condition causing kernel panic")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-10 21:38:19 +00:00
Jakub Kicinski f6ce9a1f6a Merge branch 'for-io_uring-add-napi-busy-polling-support'
Merge netdev bits of io_uring busy polling support.

Jens Axboe says:

====================
io_uring: add napi busy polling support

I finally got around to testing this patchset in its current form, and
results look fine to me. It Works. Using the basic ping/pong test that's
part of the liburing addition, without enabling NAPI I get:

Stock settings, no NAPI, 100k packets:

 rtt(us) min/avg/max/mdev = 31.730/37.006/87.960/0.497

 and with -t10 -b enabled:

 rtt(us) min/avg/max/mdev = 23.250/29.795/63.511/1.203

In short, this patchset enables per io_uring NAPI enablement, rather
than need to enable that globally. This allows targeted NAPI usage with
io_uring.

Here's Stefan's v15 posting, which predates this one:

https://lore.kernel.org/io-uring/20230608163839.2891748-1-shr@devkernel.io/
====================

Link: https://lore.kernel.org/r/20240206163422.646218-1-axboe@kernel.dk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-09 10:35:34 -08:00
Stefan Roesch b4e8ae5c8c net: add napi_busy_loop_rcu()
This adds the napi_busy_loop_rcu() function. This function assumes that
the calling function is already holding the rcu read lock and
napi_busy_loop() does not need to take the rcu read lock. Add a
NAPI_F_NO_SCHED flag, which tells __napi_busy_loop() to abort if we
need to reschedule rather than drop the RCU read lock and reschedule.

Signed-off-by: Stefan Roesch <shr@devkernel.io>
Link: https://lore.kernel.org/r/20230608163839.2891748-3-shr@devkernel.io
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-09 10:01:09 -08:00
Johannes Berg d4655db0a1 wifi: cfg80211: fix kernel-doc for cfg80211_chandef_primary
This was still referring to cfg80211_chandef_primary_freq(),
fix it.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: b82730bf57 ("wifi: cfg80211/mac80211: move puncturing into chandef")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-02-09 08:04:59 +01:00
Jakub Kicinski 3be042cf46 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

No conflicts.

Adjacent changes:

drivers/net/ethernet/stmicro/stmmac/common.h
  38cc3c6dcc ("net: stmmac: protect updates of 64-bit statistics counters")
  fd5a6a7131 ("net: stmmac: est: Per Tx-queue error count for HLBF")
  c5c3e1bfc9 ("net: stmmac: Offload queueMaxSDU from tc-taprio")

drivers/net/wireless/microchip/wilc1000/netdev.c
  c901388028 ("wifi: fill in MODULE_DESCRIPTION()s for wilc1000")
  328efda22a ("wifi: wilc1000: do not realloc workqueue everytime an interface is added")

net/unix/garbage.c
  11498715f2 ("af_unix: Remove io_uring code for GC.")
  1279f9d9de ("af_unix: Call kfree_skb() for dead unix_(sk)->oob_skb in GC.")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-08 15:30:33 -08:00
Aditya Kumar Singh 04ada8599c wifi: mac80211: add support to call csa_finish on a link
Currently ieee80211_csa_finish() function finalizes CSA by scheduling a
finalizing worker using the deflink. With MLO, there is a need to do it
on a given link basis.

Pass link ID of the link on which CSA needs to be finalized.

Signed-off-by: Aditya Kumar Singh <quic_adisi@quicinc.com>
Link: https://msgid.link/20240130140918.1172387-6-quic_adisi@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-02-08 15:00:45 +01:00
Aditya Kumar Singh 480e7048aa wifi: mac80211: update beacon counters per link basis
Currently, function to update beacon counter uses deflink to fetch
the beacon and then update the counter. However, with MLO, there is
a need to update the counter for the beacon in a particular link.

Add support to use link_id in order to fetch the beacon from a particular
link data during beacon update counter.

Signed-off-by: Aditya Kumar Singh <quic_adisi@quicinc.com>
Link: https://msgid.link/20240130140918.1172387-3-quic_adisi@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-02-08 15:00:45 +01:00
Aditya Kumar Singh 4ace04c0bd wifi: cfg80211: send link id in channel_switch ops
Currently, during channel switch, no link id information is passed down.
In order to support channel switch during Multi Link Operation, it is
required to pass link id as well.

Add changes to pass link id in the channel_switch cfg80211_ops.

Signed-off-by: Aditya Kumar Singh <quic_adisi@quicinc.com>
Link: https://msgid.link/20240130140918.1172387-2-quic_adisi@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-02-08 15:00:45 +01:00
Michael-CY Lee 68de13028b wifi: cfg80211: Add utility for converting op_class into chandef
This utility is used in STA CSA handling. The op_class in the ECSA
Element can be converted into chandef.

Co-developed-by: Money Wang <money.wang@mediatek.com>
Signed-off-by: Michael-CY Lee <michael-cy.lee@mediatek.com>
Link: https://msgid.link/20231222010914.6521-2-michael-cy.lee@mediatek.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-02-08 15:00:44 +01:00
Johannes Berg b82730bf57 wifi: cfg80211/mac80211: move puncturing into chandef
Aloka originally suggested that puncturing should be part of
the chandef, so that it's treated correctly. At the time, I
disagreed and it ended up not part of the chandef, but I've
now realized that this was wrong. Even for clients, the RX,
and perhaps more importantly, CCA configuration needs to take
puncturing into account.

Move puncturing into the chandef, and adjust all the code
accordingly. Also add a few tests for puncturing in chandef
compatibility checking.

Link: https://lore.kernel.org/linux-wireless/20220214223051.3610-1-quic_alokad@quicinc.com/
Suggested-by: Aloka Dixit <quic_alokad@quicinc.com>
Link: https://msgid.link/20240129194108.307183a5d2e5.I4d7fe2f126b2366c1312010e2900dfb2abffa0f6@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-02-08 15:00:39 +01:00
Johannes Berg 8f251a0a15 wifi: cfg80211: simplify cfg80211_chandef_compatible()
Simplify cfg80211_chandef_compatible() a bit by switching
c1 and c2 around so that c1 is always the narrower one
(once they're not identical or narrow/S1G). Then we can
just check the various primary channels and exit with the
wider one (c2), or NULL.

Also refactor the primary 40/80/160 function to not have
all the calculations hard-coded, and use a wrapper around
it to check primary 40/80/160 compatibility.

While at it, add some kunit tests for this functionality.

Also expose the new cfg80211_chandef_primary_freq() to
drivers, mac80211 will use it.

Link: https://msgid.link/20240129194108.be3e6eccaba3.I8399c2ff1435d7378e5837794cb5aa6dd2ee1416@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-02-08 13:07:38 +01:00
Johannes Berg 761748f001 wifi: mac80211: support wider bandwidth OFDMA config
EHT requires that stations are able to participate in
wider bandwidth OFDMA, i.e. parse downlink OFDMA and
uplink OFDMA triggers when they're not capable of (or
not connected at) the (wider) bandwidth that the AP
is using. This requires hardware configuration, since
the entity responsible for parsing (possibly hardware)
needs to know the AP bandwidth.

To support this, change the channel request to have
the AP's bandwidth for clients, and track that in the
channel context in mac80211. This means that the same
chandef might need to be split up into two different
contexts, if the APs are different. Interfaces other
than client are not participating in OFDMA the same
way, so they don't request any AP setting.

Note that this doesn't introduce any API to split a
channel context, so that there are cases where this
might lead to a disconnect, e.g. if there are two
client interfaces using the same channel context, e.g.
both 160 MHz connected to different 320 MHz APs, and
one of the APs switches to 160 MHz.

Note also there are possible cases where this can be
optimised, e.g. when using the upper or lower 160 Mhz,
but I haven't been able to really fully understand the
spec and/or hardware limitations.

If, for some reason, there are no hardware limits on
this because the OFDMA (downlink/trigger) parsing is
done in firmware and can take the transmitter into
account, then drivers can set the new flag
IEEE80211_VIF_IGNORE_OFDMA_WIDER_BW on interfaces to
not have them request any AP bandwidth in the channel
context and ignore this issue entirely. The bss_conf
still contains the AP configuration (if any, i.e. EHT)
in the chanreq.

Link: https://msgid.link/20240129194108.d3d5b35dd783.I939d04674f4ff06f39934b1591c8d36a30ce74c2@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-02-08 13:07:37 +01:00
Johannes Berg 6092077ad0 wifi: mac80211: introduce 'channel request'
For channel contexts, mac80211 currently uses the cfg80211
chandef struct (control channel, center freq(s), width) to
define towards drivers and internally how these behave. In
fact, there are _two_ such structs used, where the min_def
can reduce bandwidth according to the stations connected.

Unfortunately,  with EHT this is longer be sufficient,  at
least not for all hardware.  EHT requires that non-AP STAs
that are connected to an AP with a lower bandwidth than it
(the AP) advertises (e.g. 160 MHz STA connected to 320 MHz
AP) still be able to receive downlink OFDMA and respond to
trigger frames for uplink OFDMA  that specify the position
and bandwidth  for the non-AP STA  relative to the channel
the AP is using.  Therefore, they need to be aware of this,
and at least for some hardware (e.g. Intel) this awareness
is in the hardware. As a result, use of the "same" channel
may need to be split over  two channel contexts where they
differ by the AP being used.

As a first step,  introduce a concept of a channel request
('chanreq') for each interface,  to control the context it
requests.   This step does nothing but reorganise the code,
so that later the AP's chandef can be added to the request
in order to handle the EHT case described above.

Link: https://msgid.link/20240129194108.2e88e48bd2e9.I4256183debe975c5ed71621611206fdbb69ba330@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-02-08 13:07:34 +01:00
Johannes Berg 0a44dfc070 wifi: mac80211: simplify non-chanctx drivers
There are still surprisingly many non-chanctx drivers, but in
mac80211 that code is a bit awkward. Simplify this by having
those drivers assign 'emulated' ops, so that the mac80211 code
can be more unified between non-chanctx/chanctx drivers. This
cuts the number of places caring about it by about 15, which
are scattered across - now they're fewer and no longer in the
channel context handling.

Link: https://msgid.link/20240129194108.6d0ead50f5cf.I60d093b2fc81ca1853925a4d0ac3a2337d5baa5b@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-02-08 12:58:32 +01:00
Paolo Abeni 63e4b9d693 netfilter pull request 24-02-08
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEN9lkrMBJgcdVAPub1V2XiooUIOQFAmXEuicACgkQ1V2XiooU
 IOSbvA/9F2BC9TYKAh23/0EFbD4jOl4e26YE4E+Eu8AteoQ/nD+oI+mtWgw2hVXg
 zXvm1vfIc02jGuGfcPZ+EIv/dkznnDqqUpUGa4ixtgvRw2bKkb2kKMlrFsjzsihj
 yabXydwhxYE9b4Ch2AmRyApTLRMocte1IJ3ci4YUXwf68wZlOe2bIG5wyzGkFpjF
 QZN/Rr14UKjC57EYNdUG9UdybWSqSKD23LPZSaLvi6wxoZd8cIcIkng5K4N0WVKF
 lNskuNFY+j+bJz2Yn3mWIlCoM3R1N2B04t7wRkYnKWkSuwymG3O7JC3RUQaZDBZw
 8AogEbvXaIY3nxyN4lHZ/jzM/QzNB1WHlPx6RjWKHoNhnas+xuBYrjCdJZwtEu8g
 xs27Tjk3QtCIuaMuhN0RFqiq93MqZD/qx++kwMwJA0Wrg76MLPpf8yEWwVGYcAEG
 0EWa61UfPezbcVkW8XveW6lgDfcOIOpBevxDQ3Nf7JB0AcbVBks7oDpGwDc5Pdz5
 6y7WQIilxUtu9bHODUxrshxgTBwsocVkXUTIogCihUC+SgSZF+/G796c9Iy5/kPq
 BtmSNJOJyCbnivkqKTLF0Pv0BplOv7W1sx2/fo+IfRXYTHoXVjHe1BYP0Ck3WEtS
 9EPsFlI5f4AOtnPF3JrTPec9PvuHyVN+8aOPi82wlKiayJcXy1I=
 =Rh2n
 -----END PGP SIGNATURE-----

Merge tag 'nf-24-02-08' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

1) Narrow down target/match revision to u8 in nft_compat.

2) Bail out with unused flags in nft_compat.

3) Restrict layer 4 protocol to u16 in nft_compat.

4) Remove static in pipapo get command that slipped through when
   reducing set memory footprint.

5) Follow up incremental fix for the ipset performance regression,
   this includes the missing gc cancellation, from Jozsef Kadlecsik.

6) Allow to filter by zone 0 in ctnetlink, do not interpret zone 0
   as no filtering, from Felix Huettner.

7) Reject direction for NFT_CT_ID.

8) Use timestamp to check for set element expiration while transaction
   is handled to prevent garbage collection from removing set elements
   that were just added by this transaction. Packet path and netlink
   dump/get path still use current time to check for expiration.

9) Restore NF_REPEAT in nfnetlink_queue, from Florian Westphal.

10) map_index needs to be percpu and per-set, not just percpu.
    At this time its possible for a pipapo set to fill the all-zero part
    with ones and take the 'might have bits set' as 'start-from-zero' area.
    From Florian Westphal. This includes three patches:

    - Change scratchpad area to a structure that provides space for a
      per-set-and-cpu toggle and uses it of the percpu one.

    - Add a new free helper to prepare for the next patch.

    - Remove the scratch_aligned pointer and makes AVX2 implementation
      use the exact same memory addresses for read/store of the matching
      state.

netfilter pull request 24-02-08

* tag 'nf-24-02-08' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nft_set_pipapo: remove scratch_aligned pointer
  netfilter: nft_set_pipapo: add helper to release pcpu scratch area
  netfilter: nft_set_pipapo: store index in scratch maps
  netfilter: nft_set_rbtree: skip end interval element from gc
  netfilter: nfnetlink_queue: un-break NF_REPEAT
  netfilter: nf_tables: use timestamp to check for set element timeout
  netfilter: nft_ct: reject direction for ct id
  netfilter: ctnetlink: fix filtering for zone 0
  netfilter: ipset: Missing gc cancellations fixed
  netfilter: nft_set_pipapo: remove static in nft_pipapo_get()
  netfilter: nft_compat: restrict match/target protocol to u16
  netfilter: nft_compat: reject unused compat flag
  netfilter: nft_compat: narrow down revision to unsigned 8-bits
====================

Link: https://lore.kernel.org/r/20240208112834.1433-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-08 12:56:40 +01:00
Pablo Neira Ayuso 7395dfacff netfilter: nf_tables: use timestamp to check for set element timeout
Add a timestamp field at the beginning of the transaction, store it
in the nftables per-netns area.

Update set backend .insert, .deactivate and sync gc path to use the
timestamp, this avoids that an element expires while control plane
transaction is still unfinished.

.lookup and .update, which are used from packet path, still use the
current time to check if the element has expired. And .get path and dump
also since this runs lockless under rcu read size lock. Then, there is
async gc which also needs to check the current time since it runs
asynchronously from a workqueue.

Fixes: c3e1b005ed ("netfilter: nf_tables: add set element timeout support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-02-08 12:10:19 +01:00
Johannes Berg af4acac7ca Merge wireless into wireless-next
There are some changes coming to wireless-next that will
otherwise cause conflicts, pull wireless in first to be
able to resolve that when applying the individual changes
rather than having to do merge resolution later.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-02-08 09:58:25 +01:00
Eric Dumazet 9b5b36374e ip_tunnel: use exit_batch_rtnl() method
exit_batch_rtnl() is called while RTNL is held,
and devices to be unregistered can be queued in the dev_kill_list.

This saves one rtnl_lock()/rtnl_unlock() pair
and one unregister_netdevice_many() call.

This patch takes care of ipip, ip_vti, and ip_gre tunnels.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Antoine Tenart <atenart@kernel.org>
Link: https://lore.kernel.org/r/20240206144313.2050392-15-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-07 18:55:12 -08:00
Eric Dumazet 70f16ea2e4 ipv4: add __unregister_nexthop_notifier()
unregister_nexthop_notifier() assumes the caller does not hold rtnl.

We need in the following patch to use it from a context
already holding rtnl.

Add __unregister_nexthop_notifier().

unregister_nexthop_notifier() becomes a wrapper.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Antoine Tenart <atenart@kernel.org>
Link: https://lore.kernel.org/r/20240206144313.2050392-9-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-07 18:55:11 -08:00
Eric Dumazet fd4f101edb net: add exit_batch_rtnl() method
Many (struct pernet_operations)->exit_batch() methods have
to acquire rtnl.

In presence of rtnl mutex pressure, this makes cleanup_net()
very slow.

This patch adds a new exit_batch_rtnl() method to reduce
number of rtnl acquisitions from cleanup_net().

exit_batch_rtnl() handlers are called while rtnl is locked,
and devices to be killed can be queued in a list provided
as their second argument.

A single unregister_netdevice_many() is called right
before rtnl is released.

exit_batch_rtnl() handlers are called before ->exit() and
->exit_batch() handlers.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Antoine Tenart <atenart@kernel.org>
Link: https://lore.kernel.org/r/20240206144313.2050392-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-07 18:55:10 -08:00
Jakub Kicinski 006e89649f mlx5-updates-2024-02-01
1) IPSec global stats for xfrm and mlx5
 2) XSK memory improvements for non-linear SKBs
 3) Software steering debug dump to use seq_file ops
 4) Various code clean-ups
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmXBgUEACgkQSD+KveBX
 +j4qmQgAr5pNBUkxyylEKU0cB8kh21LaOzimVjpGrKCzPtb/Fz8h6E1h668icpFy
 2G4iuyJ2/doA0Bn4FJC730i6Xk2sqkXiY85yclvMPObS7b0PckD72d6iH9UfzfYB
 Vof1lg3HdVvTq1KhJ1T/sCEB2skBLzEGV/fZWmL39zPaxM9JfHA2o5xDxdW7/GhT
 0LPg84MinikoKbfw9m45l2zp52yTRrLzTUbDubwXFC4Ltg/yM4uAsbTArQzA8VJ9
 q95XGZ0ZmZDFsVbjp9suCJFaTVrKmh2yy16a3BpKpm72L+K8yciUTqcqDd5vH2wW
 CQ0eKMqBtB8c1Jr16oKfQ42NFF0FIQ==
 =0IjR
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2024-02-01' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2024-02-01

1) IPSec global stats for xfrm and mlx5
2) XSK memory improvements for non-linear SKBs
3) Software steering debug dump to use seq_file ops
4) Various code clean-ups

* tag 'mlx5-updates-2024-02-01' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5e: XDP, Exclude headroom and tailroom from memory calculations
  net/mlx5e: XSK, Exclude tailroom from non-linear SKBs memory calculations
  net/mlx5: DR, Change SWS usage to debug fs seq_file interface
  net/mlx5: Change missing SyncE capability print to debug
  net/mlx5: Remove initial segmentation duplicate definitions
  net/mlx5: Return specific error code for timeout on wait_fw_init
  net/mlx5: SF, Stop waiting for FW as teardown was called
  net/mlx5: remove fw reporter dump option for non PF
  net/mlx5: remove fw_fatal reporter dump option for non PF
  net/mlx5: Rename mlx5_sf_dev_remove
  Documentation: Fix counter name of mlx5 vnic reporter
  net/mlx5e: Delete obsolete IPsec code
  net/mlx5e: Connect mlx5 IPsec statistics with XFRM core
  xfrm: get global statistics from the offloaded device
  xfrm: generalize xdo_dev_state_update_curlft to allow statistics update
====================

Link: https://lore.kernel.org/r/20240206005527.1353368-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-07 18:34:34 -08:00
Jakub Kicinski 335bac1daa wireless fixes for v6.8-rc4
This time we have unusually large wireless pull request. Several
 functionality fixes to both stack and iwlwifi. Lots of fixes to
 warnings, especially to MODULE_DESCRIPTION().
 -----BEGIN PGP SIGNATURE-----
 
 iQFFBAABCgAvFiEEiBjanGPFTz4PRfLobhckVSbrbZsFAmXCAewRHGt2YWxvQGtl
 cm5lbC5vcmcACgkQbhckVSbrbZtbOAf+PFH5X248YVlVgeamCDVYNsFO0bgsyBN1
 xHDWxxxLFv5uKuRvjx5oa4C4tzRs7kH8h3u5SUqTkIvMcHbzpVe6oBNMDknOtS96
 lQ+lmfuIlKMvfmdG3QK3GyAcn57detHZ0InLsM5OmOdcUexmyJa7qc8qbqKZvX3d
 jux3ISQbyHyisnWHGHwWgKIY0UtZiesAuf/Ug09Ah0WW6KMLgNSoDc9/mQlKoGu7
 gZpmB1xsJaPBPB0Tb1kD5XrGouGnDqN9obS4HH9fCDZwPDBTLDjuYtrtQU9cqtk5
 4EUhuoeefTuep8ZJQgxn05j4D5NWQj9VCEvhaRwoF+Iz+t2/HLHuKQ==
 =4UCw
 -----END PGP SIGNATURE-----

Merge tag 'wireless-2024-02-06' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless

Kalle Valo says:

====================
wireless fixes for v6.8-rc4

This time we have unusually large wireless pull request. Several
functionality fixes to both stack and iwlwifi. Lots of fixes to
warnings, especially to MODULE_DESCRIPTION().

* tag 'wireless-2024-02-06' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: (31 commits)
  wifi: mt76: mt7996: fix fortify warning
  wifi: brcmfmac: Adjust n_channels usage for __counted_by
  wifi: iwlwifi: do not announce EPCS support
  wifi: iwlwifi: exit eSR only after the FW does
  wifi: iwlwifi: mvm: fix a battery life regression
  wifi: mac80211: accept broadcast probe responses on 6 GHz
  wifi: mac80211: adding missing drv_mgd_complete_tx() call
  wifi: mac80211: fix waiting for beacons logic
  wifi: mac80211: fix unsolicited broadcast probe config
  wifi: mac80211: initialize SMPS mode correctly
  wifi: mac80211: fix driver debugfs for vif type change
  wifi: mac80211: set station RX-NSS on reconfig
  wifi: mac80211: fix RCU use in TDLS fast-xmit
  wifi: mac80211: improve CSA/ECSA connection refusal
  wifi: cfg80211: detect stuck ECSA element in probe resp
  wifi: iwlwifi: remove extra kernel-doc
  wifi: fill in MODULE_DESCRIPTION()s for mt76 drivers
  wifi: fill in MODULE_DESCRIPTION()s for wilc1000
  wifi: fill in MODULE_DESCRIPTION()s for wl18xx
  wifi: fill in MODULE_DESCRIPTION()s for p54spi
  ...
====================

Link: https://lore.kernel.org/r/20240206095722.CD9D2C433F1@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-07 10:34:51 -08:00
George Guo 23c5ae6d46 netlabel: cleanup struct netlbl_lsm_catmap
Simplify the code from macro NETLBL_CATMAP_MAPTYPE to u64, and fix
warning "Macros with complex values should be enclosed in parentheses"
on "#define NETLBL_CATMAP_BIT (NETLBL_CATMAP_MAPTYPE)0x01", which is
modified to "#define NETLBL_CATMAP_BIT ((u64)0x01)".

Signed-off-by: George Guo <guodongtai@kylinos.cn>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-07 12:38:30 +00:00
Aahil Awatramani 240fd40552 bonding: Add independent control state machine
Add support for the independent control state machine per IEEE
802.1AX-2008 5.4.15 in addition to the existing implementation of the
coupled control state machine.

Introduces two new states, AD_MUX_COLLECTING and AD_MUX_DISTRIBUTING in
the LACP MUX state machine for separated handling of an initial
Collecting state before the Collecting and Distributing state. This
enables a port to be in a state where it can receive incoming packets
while not still distributing. This is useful for reducing packet loss when
a port begins distributing before its partner is able to collect.

Added new functions such as bond_set_slave_tx_disabled_flags and
bond_set_slave_rx_enabled_flags to precisely manage the port's collecting
and distributing states. Previously, there was no dedicated method to
disable TX while keeping RX enabled, which this patch addresses.

Note that the regular flow process in the kernel's bonding driver remains
unaffected by this patch. The extension requires explicit opt-in by the
user (in order to ensure no disruptions for existing setups) via netlink
support using the new bonding parameter coupled_control. The default value
for coupled_control is set to 1 so as to preserve existing behaviour.

Signed-off-by: Aahil Awatramani <aahila@google.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://lore.kernel.org/r/20240202175858.1573852-1-aahila@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-06 13:17:54 +01:00
Sebastian Andrzej Siewior 03ba6dc035 net: dst: Make dst_destroy() static and return void.
Since commit 52df157f17 ("xfrm: take refcnt of dst when creating
struct xfrm_dst bundle") dst_destroy() returns only NULL and no caller
cares about the return value.
There are no in in-tree users of dst_destroy() outside of the file.

Make dst_destroy() static and return void.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240202163746.2489150-1-bigeasy@linutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-06 11:45:53 +01:00
Leon Romanovsky f9f221c98f xfrm: get global statistics from the offloaded device
Iterate over all SAs in order to fill global IPsec statistics.

Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-02-05 16:45:49 -08:00
Leon Romanovsky fd2bc4195d xfrm: generalize xdo_dev_state_update_curlft to allow statistics update
In order to allow drivers to fill all statistics, change the name
of xdo_dev_state_update_curlft to be xdo_dev_state_update_stats.

Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-02-05 16:45:49 -08:00
Eric Dumazet 89304f91bf sctp: preserve const qualifier in sctp_sk()
We can change sctp_sk() to propagate its argument const qualifier,
thanks to container_of_const().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Cc: Xin Long <lucien.xin@gmail.com>
Acked-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-05 11:08:06 +00:00
Eric Dumazet ffabe98cb5 net: make dev_unreg_count global
We can use a global dev_unreg_count counter instead
of a per netns one.

As a bonus we can factorize the changes done on it
for bulk device removals.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-04 16:08:21 +00:00
Michal Koutný b26577001a net/sched: Add helper macros with module names
The macros are preparation for adding module aliases en mass in a
separate commit.
Although it would be tempting to create aliases like cls-foo for name
cls_foo, this could not be used because modprobe utilities treat '-' and
'_' interchangeably.
In the end, the naming follows pattern of proto modules in linux/net.h.

Signed-off-by: Michal Koutný <mkoutny@suse.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20240201130943.19536-2-mkoutny@suse.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-02 10:57:55 -08:00
Johannes Berg 177fbbcb4e wifi: cfg80211: detect stuck ECSA element in probe resp
We recently added some validation that we don't try to
connect to an AP that is currently in a channel switch
process, since that might want the channel to be quiet
or we might not be able to connect in time to hear the
switching in a beacon. This was in commit c09c4f3199
("wifi: mac80211: don't connect to an AP while it's in
a CSA process").

However, we promptly got a report that this caused new
connection failures, and it turns out that the AP that
we now cannot connect to is permanently advertising an
extended channel switch announcement, even with quiet.
The AP in question was an Asus RT-AC53, with firmware
3.0.0.4.380_10760-g21a5898.

As a first step, attempt to detect that we're dealing
with such a situation, so mac80211 can use this later.

Reported-by: coldolt <andypalmadi@gmail.com>
Closes: https://lore.kernel.org/linux-wireless/CAJvGw+DQhBk_mHXeu6RTOds5iramMW2FbMB01VbKRA4YbHHDTA@mail.gmail.com/
Fixes: c09c4f3199 ("wifi: mac80211: don't connect to an AP while it's in a CSA process")
Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20240129131413.246972c8775e.Ibf834d7f52f9951a353b6872383da710a7358338@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-02-02 13:08:58 +01:00
Jakub Kicinski cf244463a2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

No conflicts or adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-01 15:12:37 -08:00
Jakub Kicinski 93561eefbf netfilter pull request 24-01-31
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEN9lkrMBJgcdVAPub1V2XiooUIOQFAmW6zq0ACgkQ1V2XiooU
 IOTTbQ//SnsPif56pzc+CzOhb6v3Gih1o+4dyH1GX5JHozwmuvFx4Iw3jz5DgMsL
 vzQBb89htlGqdj3tXnuX0mVZEGDBr/KO8jNT7Qnxrms3oRfhu+UKBl8y51f73LRe
 79vqKaMmmaUen/fzyhTqxyqQFYzguEAzXq1Rs/arTuNjhnKLf4/R255sQRTRUe0v
 08KEk/7D3FnmJ0pUtoh3E7mukodl2e5w/UH9CrerTtJ04jaRi17I+QmW3TmVFjx4
 8k9+echgiF7iKPLlPJRWhHgzXPKEwUupu8rDemIg+QXNsicp+2T8HNCmJjE3Y8Qx
 jcEpqZOMxmlnvBeoelrGqnJ1GVAWTgOhi2R2hLLYEyRJ8LpIxhcZLI6ovQTO2u3v
 HIuPdAMmcsLI1wX9Lq/ybF9TqrCH1l2M9LqVerz139tXgDgOerMZvjsHp8Xv7nQz
 L2HDA39EI+ZPEEh3SCdomrydM2gp3r3d1hp4SUZLMFUZYR/CEUYt8DlX+iHWGgn0
 GaVKgmgf/PqokyhYxpihB4m/ctmghPcxkscPaEDJas4uKYqDRRB1c1btA2f7IVnr
 8sMt6gSG0HQgK6OoD6Sl+nACt3EoPdg/NaivrJ1rmWmQpb6EhIqJdILMZ0+s+lyx
 WBJnI16uJQUPVifGUb30qJdm/Go0cix5fOVJRMAaIUU0IFgezkE=
 =OdAk
 -----END PGP SIGNATURE-----

Merge tag 'nf-24-01-31' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

1) TCP conntrack now only evaluates window negotiation for packets in
   the REPLY direction, from Ryan Schaefer. Otherwise SYN retransmissions
   trigger incorrect window scale negotiation. From Ryan Schaefer.

2) Restrict tunnel objects to NFPROTO_NETDEV which is where it makes sense
   to use this object type.

3) Fix conntrack pick up from the middle of SCTP_CID_SHUTDOWN_ACK packets.
   From Xin Long.

4) Another attempt from Jozsef Kadlecsik to address the slow down of the
   swap command in ipset.

5) Replace a BUG_ON by WARN_ON_ONCE in nf_log, and consolidate check for
   the case that the logger is NULL from the read side lock section.

6) Address lack of sanitization for custom expectations. Restrict layer 3
   and 4 families to what it is supported by userspace.

* tag 'nf-24-01-31' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nft_ct: sanitize layer 3 and 4 protocol number in custom expectations
  netfilter: nf_log: replace BUG_ON by WARN_ON_ONCE when putting logger
  netfilter: ipset: fix performance regression in swap operation
  netfilter: conntrack: check SCTP_CID_SHUTDOWN_ACK for vtag setting in sctp_new
  netfilter: nf_tables: restrict tunnel object to NFPROTO_NETDEV
  netfilter: conntrack: correct window scaling with retransmitted SYN
====================

Link: https://lore.kernel.org/r/20240131225943.7536-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-01 09:14:13 -08:00
Eric Dumazet 4d322dce82 af_unix: fix lockdep positive in sk_diag_dump_icons()
syzbot reported a lockdep splat [1].

Blamed commit hinted about the possible lockdep
violation, and code used unix_state_lock_nested()
in an attempt to silence lockdep.

It is not sufficient, because unix_state_lock_nested()
is already used from unix_state_double_lock().

We need to use a separate subclass.

This patch adds a distinct enumeration to make things
more explicit.

Also use swap() in unix_state_double_lock() as a clean up.

v2: add a missing inline keyword to unix_state_lock_nested()

[1]
WARNING: possible circular locking dependency detected
6.8.0-rc1-syzkaller-00356-g8a696a29c690 #0 Not tainted

syz-executor.1/2542 is trying to acquire lock:
 ffff88808b5df9e8 (rlock-AF_UNIX){+.+.}-{2:2}, at: skb_queue_tail+0x36/0x120 net/core/skbuff.c:3863

but task is already holding lock:
 ffff88808b5dfe70 (&u->lock/1){+.+.}-{2:2}, at: unix_dgram_sendmsg+0xfc7/0x2200 net/unix/af_unix.c:2089

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&u->lock/1){+.+.}-{2:2}:
        lock_acquire+0x1e3/0x530 kernel/locking/lockdep.c:5754
        _raw_spin_lock_nested+0x31/0x40 kernel/locking/spinlock.c:378
        sk_diag_dump_icons net/unix/diag.c:87 [inline]
        sk_diag_fill+0x6ea/0xfe0 net/unix/diag.c:157
        sk_diag_dump net/unix/diag.c:196 [inline]
        unix_diag_dump+0x3e9/0x630 net/unix/diag.c:220
        netlink_dump+0x5c1/0xcd0 net/netlink/af_netlink.c:2264
        __netlink_dump_start+0x5d7/0x780 net/netlink/af_netlink.c:2370
        netlink_dump_start include/linux/netlink.h:338 [inline]
        unix_diag_handler_dump+0x1c3/0x8f0 net/unix/diag.c:319
       sock_diag_rcv_msg+0xe3/0x400
        netlink_rcv_skb+0x1df/0x430 net/netlink/af_netlink.c:2543
        sock_diag_rcv+0x2a/0x40 net/core/sock_diag.c:280
        netlink_unicast_kernel net/netlink/af_netlink.c:1341 [inline]
        netlink_unicast+0x7e6/0x980 net/netlink/af_netlink.c:1367
        netlink_sendmsg+0xa37/0xd70 net/netlink/af_netlink.c:1908
        sock_sendmsg_nosec net/socket.c:730 [inline]
        __sock_sendmsg net/socket.c:745 [inline]
        sock_write_iter+0x39a/0x520 net/socket.c:1160
        call_write_iter include/linux/fs.h:2085 [inline]
        new_sync_write fs/read_write.c:497 [inline]
        vfs_write+0xa74/0xca0 fs/read_write.c:590
        ksys_write+0x1a0/0x2c0 fs/read_write.c:643
        do_syscall_x64 arch/x86/entry/common.c:52 [inline]
        do_syscall_64+0xf5/0x230 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x63/0x6b

-> #0 (rlock-AF_UNIX){+.+.}-{2:2}:
        check_prev_add kernel/locking/lockdep.c:3134 [inline]
        check_prevs_add kernel/locking/lockdep.c:3253 [inline]
        validate_chain+0x1909/0x5ab0 kernel/locking/lockdep.c:3869
        __lock_acquire+0x1345/0x1fd0 kernel/locking/lockdep.c:5137
        lock_acquire+0x1e3/0x530 kernel/locking/lockdep.c:5754
        __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
        _raw_spin_lock_irqsave+0xd5/0x120 kernel/locking/spinlock.c:162
        skb_queue_tail+0x36/0x120 net/core/skbuff.c:3863
        unix_dgram_sendmsg+0x15d9/0x2200 net/unix/af_unix.c:2112
        sock_sendmsg_nosec net/socket.c:730 [inline]
        __sock_sendmsg net/socket.c:745 [inline]
        ____sys_sendmsg+0x592/0x890 net/socket.c:2584
        ___sys_sendmsg net/socket.c:2638 [inline]
        __sys_sendmmsg+0x3b2/0x730 net/socket.c:2724
        __do_sys_sendmmsg net/socket.c:2753 [inline]
        __se_sys_sendmmsg net/socket.c:2750 [inline]
        __x64_sys_sendmmsg+0xa0/0xb0 net/socket.c:2750
        do_syscall_x64 arch/x86/entry/common.c:52 [inline]
        do_syscall_64+0xf5/0x230 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x63/0x6b

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&u->lock/1);
                               lock(rlock-AF_UNIX);
                               lock(&u->lock/1);
  lock(rlock-AF_UNIX);

 *** DEADLOCK ***

1 lock held by syz-executor.1/2542:
  #0: ffff88808b5dfe70 (&u->lock/1){+.+.}-{2:2}, at: unix_dgram_sendmsg+0xfc7/0x2200 net/unix/af_unix.c:2089

stack backtrace:
CPU: 1 PID: 2542 Comm: syz-executor.1 Not tainted 6.8.0-rc1-syzkaller-00356-g8a696a29c690 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023
Call Trace:
 <TASK>
  __dump_stack lib/dump_stack.c:88 [inline]
  dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
  check_noncircular+0x366/0x490 kernel/locking/lockdep.c:2187
  check_prev_add kernel/locking/lockdep.c:3134 [inline]
  check_prevs_add kernel/locking/lockdep.c:3253 [inline]
  validate_chain+0x1909/0x5ab0 kernel/locking/lockdep.c:3869
  __lock_acquire+0x1345/0x1fd0 kernel/locking/lockdep.c:5137
  lock_acquire+0x1e3/0x530 kernel/locking/lockdep.c:5754
  __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
  _raw_spin_lock_irqsave+0xd5/0x120 kernel/locking/spinlock.c:162
  skb_queue_tail+0x36/0x120 net/core/skbuff.c:3863
  unix_dgram_sendmsg+0x15d9/0x2200 net/unix/af_unix.c:2112
  sock_sendmsg_nosec net/socket.c:730 [inline]
  __sock_sendmsg net/socket.c:745 [inline]
  ____sys_sendmsg+0x592/0x890 net/socket.c:2584
  ___sys_sendmsg net/socket.c:2638 [inline]
  __sys_sendmmsg+0x3b2/0x730 net/socket.c:2724
  __do_sys_sendmmsg net/socket.c:2753 [inline]
  __se_sys_sendmmsg net/socket.c:2750 [inline]
  __x64_sys_sendmmsg+0xa0/0xb0 net/socket.c:2750
  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
  do_syscall_64+0xf5/0x230 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x63/0x6b
RIP: 0033:0x7f26d887cda9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 20 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f26d95a60c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000133
RAX: ffffffffffffffda RBX: 00007f26d89abf80 RCX: 00007f26d887cda9
RDX: 000000000000003e RSI: 00000000200bd000 RDI: 0000000000000004
RBP: 00007f26d88c947a R08: 0000000000000000 R09: 0000000000000000
R10: 00000000000008c0 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000000000b R14: 00007f26d89abf80 R15: 00007ffcfe081a68

Fixes: 2aac7a2cb0 ("unix_diag: Pending connections IDs NLA")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20240130184235.1620738-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-31 17:51:55 -08:00
Kuniyuki Iwashima 99a7a5b994 af_unix: Remove CONFIG_UNIX_SCM.
Originally, the code related to garbage collection was all in garbage.c.

Commit f4e65870e5 ("net: split out functions related to registering
inflight socket files") moved some functions to scm.c for io_uring and
added CONFIG_UNIX_SCM just in case AF_UNIX was built as module.

However, since commit 97154bcf4d ("af_unix: Kconfig: make CONFIG_UNIX
bool"), AF_UNIX is no longer built separately.  Also, io_uring does not
support SCM_RIGHTS now.

Let's move the functions back to garbage.c

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/20240129190435.57228-4-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-31 16:41:16 -08:00
Kuniyuki Iwashima 11498715f2 af_unix: Remove io_uring code for GC.
Since commit 705318a99a ("io_uring/af_unix: disable sending
io_uring over sockets"), io_uring's unix socket cannot be passed
via SCM_RIGHTS, so it does not contribute to cyclic reference and
no longer be candidate for garbage collection.

Also, commit 6e5e6d2749 ("io_uring: drop any code related to
SCM_RIGHTS") cleaned up SCM_RIGHTS code in io_uring.

Let's do it in AF_UNIX as well by reverting commit 0091bfc817
("io_uring/af_unix: defer registered files gc to io_uring release")
and commit 1036908045 ("net: reclaim skb->scm_io_uring bit").

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/20240129190435.57228-3-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-31 16:41:16 -08:00
Pablo Neira Ayuso 776d451648 netfilter: nf_tables: restrict tunnel object to NFPROTO_NETDEV
Bail out on using the tunnel dst template from other than netdev family.
Add the infrastructure to check for the family in objects.

Fixes: af308b94a2 ("netfilter: nf_tables: add tunnel support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-01-31 23:07:04 +01:00
David S. Miller 84fc2408cf nf-next pr 2024-01-29
-----BEGIN PGP SIGNATURE-----
 
 iQJBBAABCAArFiEEgKkgxbID4Gn1hq6fcJGo2a1f9gAFAmW3ugcNHGZ3QHN0cmxl
 bi5kZQAKCRBwkajZrV/2AP+IEADdlinxL+a5Rqx0W3I0gR4LiOrnHdl2SQesCjEE
 iBm8Fgx7pQh6jQpjsEl+dg85CFbqI4iVxgLV/uAVCOvRFELH5aR/WHjAdoXQjrTS
 55bexDCG9q9KBYCm721h2mSUTdmmx+aKfndFYMhEULzQPfDy+cS2lIh4epQPnlFH
 Idc1zXuMNWM/QY0vvwkAxsZ6TMG61GIYDAH4PtEtfCUVksdkLRPG8qWs5tJJgKFp
 SIyqKSB3Ab4LqY9e/HG0FwcrMwrSmNhcbO4CwpDfIrHEuIUtMKCqOp6X4lU1ekeb
 xVTuQ7fU64KmO+a/sS4QH8rPfDgT31GnxaVfeL7AM9pQsiLhJGMTlfFqgItJjZrS
 uch7Jtx0iWMDfuP7OgIYnS46FYD2wXShuz4wIbHI8RSEkln7GBJ2KGpnvyoF07Tf
 V6ZrGQk0TnAr7MAEXHe8rd0WEVvbZuBiVHo1xpSxKI9rGJYDdgSRz16wMdBowhIW
 Q++nacicTs8ak64vlAsigI4bnDYTNXsHQO2S84tXTikaq88m1/f9EqIVr/V2uMoR
 xTQcAaob2TqaGirS/bx/9twEuiwB/gg/nbqmVHni285SO2JbdNQ/iglopc/+EMYS
 ES3wibdQzfPL9h61KyHMGUbZke3w72Gn5X5Fp3lnoi7+ZSLMMRTBoMFv4T+DLzqJ
 dyouYw==
 =iDKQ
 -----END PGP SIGNATURE-----

Merge tag 'nf-next-24-01-29' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next
Florian Westphal says:

====================
nf-next pr 2024-01-29

This batch contains updates for your *next* tree.

First three changes, from Phil Sutter, allow userspace to define
a table that is exclusively owned by a daemon (via netlink socket
aliveness) without auto-removing this table when the userspace program
exits.  Such table gets marked as orphaned and a restarting management
daemon may re-attach/reassume ownership.

Next patch, from Pablo, passes already-validated flags variable around
rather than having called code re-fetch it from netlnik message.

Patches 5 and 6 update ipvs and nf_conncount to use the recently
introduced KMEM_CACHE() macro.

Last three patches, from myself, tweak kconfig logic a little to
permit a kernel configuration that can run iptables-over-nftables
but not classic (setsockopt) iptables.

Such builds lack the builtin-filter/mangle/raw/nat/security tables,
the set/getsockopt interface and the "old blob format"
interpreter/traverser.  For now, this is 'oldconfig friendly', users
need to manually deselect existing config options for this.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-31 15:13:26 +00:00
Heiner Kallweit d80a523353 ethtool: replace struct ethtool_eee with a new struct ethtool_keee on kernel side
In order to pass EEE link modes beyond bit 32 to userspace we have to
complement the 32 bit bitmaps in struct ethtool_eee with linkmode
bitmaps. Therefore, similar to ethtool_link_settings and
ethtool_link_ksettings, add a struct ethtool_keee. In a first step
it's an identical copy of ethtool_eee. This patch simply does a
s/ethtool_eee/ethtool_keee/g for all users.
No functional change intended.

Suggested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-31 12:30:47 +00:00
Phil Sutter 31bf508be6 netfilter: nf_tables: Implement table adoption support
Allow a new process to take ownership of a previously owned table,
useful mostly for firewall management services restarting or suspending
when idle.

By extending __NFT_TABLE_F_UPDATE, the on/off/on check in
nf_tables_updtable() also covers table adoption, although it is actually
not needed: Table adoption is irreversible because nf_tables_updtable()
rejects attempts to drop NFT_TABLE_F_OWNER so table->nlpid setting can
happen just once within the transaction.

If the transaction commences, table's nlpid and flags fields are already
set and no further action is required. If it aborts, the table returns
to orphaned state.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Florian Westphal <fw@strlen.de>
2024-01-29 15:43:20 +01:00
Jakub Kicinski 92046e83c0 bpf-next-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZbQV+gAKCRDbK58LschI
 g2OeAP0VvhZS9SPiS+/AMAFuw2W1BkMrFNbfBTc3nzRnyJSmNAD+NG4CLLJvsKI9
 olu7VC20B8pLTGLUGIUSwqnjOC+Kkgc=
 =wVMl
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2024-01-26

We've added 107 non-merge commits during the last 4 day(s) which contain
a total of 101 files changed, 6009 insertions(+), 1260 deletions(-).

The main changes are:

1) Add BPF token support to delegate a subset of BPF subsystem
   functionality from privileged system-wide daemons such as systemd
   through special mount options for userns-bound BPF fs to a trusted
   & unprivileged application. With addressed changes from Christian
   and Linus' reviews, from Andrii Nakryiko.

2) Support registration of struct_ops types from modules which helps
   projects like fuse-bpf that seeks to implement a new struct_ops type,
   from Kui-Feng Lee.

3) Add support for retrieval of cookies for perf/kprobe multi links,
   from Jiri Olsa.

4) Bigger batch of prep-work for the BPF verifier to eventually support
   preserving boundaries and tracking scalars on narrowing fills,
   from Maxim Mikityanskiy.

5) Extend the tc BPF flavor to support arbitrary TCP SYN cookies to help
   with the scenario of SYN floods, from Kuniyuki Iwashima.

6) Add code generation to inline the bpf_kptr_xchg() helper which
   improves performance when stashing/popping the allocated BPF objects,
   from Hou Tao.

7) Extend BPF verifier to track aligned ST stores as imprecise spilled
   registers, from Yonghong Song.

8) Several fixes to BPF selftests around inline asm constraints and
   unsupported VLA code generation, from Jose E. Marchesi.

9) Various updates to the BPF IETF instruction set draft document such
   as the introduction of conformance groups for instructions,
   from Dave Thaler.

10) Fix BPF verifier to make infinite loop detection in is_state_visited()
    exact to catch some too lax spill/fill corner cases,
    from Eduard Zingerman.

11) Refactor the BPF verifier pointer ALU check to allow ALU explicitly
    instead of implicitly for various register types, from Hao Sun.

12) Fix the flaky tc_redirect_dtime BPF selftest due to slowness
    in neighbor advertisement at setup time, from Martin KaFai Lau.

13) Change BPF selftests to skip callback tests for the case when the
    JIT is disabled, from Tiezhu Yang.

14) Add a small extension to libbpf which allows to auto create
    a map-in-map's inner map, from Andrey Grafin.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (107 commits)
  selftests/bpf: Add missing line break in test_verifier
  bpf, docs: Clarify definitions of various instructions
  bpf: Fix error checks against bpf_get_btf_vmlinux().
  bpf: One more maintainer for libbpf and BPF selftests
  selftests/bpf: Incorporate LSM policy to token-based tests
  selftests/bpf: Add tests for LIBBPF_BPF_TOKEN_PATH envvar
  libbpf: Support BPF token path setting through LIBBPF_BPF_TOKEN_PATH envvar
  selftests/bpf: Add tests for BPF object load with implicit token
  selftests/bpf: Add BPF object loading tests with explicit token passing
  libbpf: Wire up BPF token support at BPF object level
  libbpf: Wire up token_fd into feature probing logic
  libbpf: Move feature detection code into its own file
  libbpf: Further decouple feature checking logic from bpf_object
  libbpf: Split feature detectors definitions from cached results
  selftests/bpf: Utilize string values for delegate_xxx mount options
  bpf: Support symbolic BPF FS delegation mount options
  bpf: Fail BPF_TOKEN_CREATE if no delegation option was set on BPF FS
  bpf,selinux: Allocate bpf_security_struct per BPF token
  selftests/bpf: Add BPF token-enabled tests
  libbpf: Add BPF token support to bpf_prog_load() API
  ...
====================

Link: https://lore.kernel.org/r/20240126215710.19855-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-26 21:08:22 -08:00
Nicolas Dichtel e622502c31 ipmr: fix kernel panic when forwarding mcast packets
The stacktrace was:
[   86.305548] BUG: kernel NULL pointer dereference, address: 0000000000000092
[   86.306815] #PF: supervisor read access in kernel mode
[   86.307717] #PF: error_code(0x0000) - not-present page
[   86.308624] PGD 0 P4D 0
[   86.309091] Oops: 0000 [#1] PREEMPT SMP NOPTI
[   86.309883] CPU: 2 PID: 3139 Comm: pimd Tainted: G     U             6.8.0-6wind-knet #1
[   86.311027] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.1-0-g0551a4be2c-prebuilt.qemu-project.org 04/01/2014
[   86.312728] RIP: 0010:ip_mr_forward (/build/work/knet/net/ipv4/ipmr.c:1985)
[ 86.313399] Code: f9 1f 0f 87 85 03 00 00 48 8d 04 5b 48 8d 04 83 49 8d 44 c5 00 48 8b 40 70 48 39 c2 0f 84 d9 00 00 00 49 8b 46 58 48 83 e0 fe <80> b8 92 00 00 00 00 0f 84 55 ff ff ff 49 83 47 38 01 45 85 e4 0f
[   86.316565] RSP: 0018:ffffad21c0583ae0 EFLAGS: 00010246
[   86.317497] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[   86.318596] RDX: ffff9559cb46c000 RSI: 0000000000000000 RDI: 0000000000000000
[   86.319627] RBP: ffffad21c0583b30 R08: 0000000000000000 R09: 0000000000000000
[   86.320650] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[   86.321672] R13: ffff9559c093a000 R14: ffff9559cc00b800 R15: ffff9559c09c1d80
[   86.322873] FS:  00007f85db661980(0000) GS:ffff955a79d00000(0000) knlGS:0000000000000000
[   86.324291] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   86.325314] CR2: 0000000000000092 CR3: 000000002f13a000 CR4: 0000000000350ef0
[   86.326589] Call Trace:
[   86.327036]  <TASK>
[   86.327434] ? show_regs (/build/work/knet/arch/x86/kernel/dumpstack.c:479)
[   86.328049] ? __die (/build/work/knet/arch/x86/kernel/dumpstack.c:421 /build/work/knet/arch/x86/kernel/dumpstack.c:434)
[   86.328508] ? page_fault_oops (/build/work/knet/arch/x86/mm/fault.c:707)
[   86.329107] ? do_user_addr_fault (/build/work/knet/arch/x86/mm/fault.c:1264)
[   86.329756] ? srso_return_thunk (/build/work/knet/arch/x86/lib/retpoline.S:223)
[   86.330350] ? __irq_work_queue_local (/build/work/knet/kernel/irq_work.c:111 (discriminator 1))
[   86.331013] ? exc_page_fault (/build/work/knet/./arch/x86/include/asm/paravirt.h:693 /build/work/knet/arch/x86/mm/fault.c:1515 /build/work/knet/arch/x86/mm/fault.c:1563)
[   86.331702] ? asm_exc_page_fault (/build/work/knet/./arch/x86/include/asm/idtentry.h:570)
[   86.332468] ? ip_mr_forward (/build/work/knet/net/ipv4/ipmr.c:1985)
[   86.333183] ? srso_return_thunk (/build/work/knet/arch/x86/lib/retpoline.S:223)
[   86.333920] ipmr_mfc_add (/build/work/knet/./include/linux/rcupdate.h:782 /build/work/knet/net/ipv4/ipmr.c:1009 /build/work/knet/net/ipv4/ipmr.c:1273)
[   86.334583] ? __pfx_ipmr_hash_cmp (/build/work/knet/net/ipv4/ipmr.c:363)
[   86.335357] ip_mroute_setsockopt (/build/work/knet/net/ipv4/ipmr.c:1470)
[   86.336135] ? srso_return_thunk (/build/work/knet/arch/x86/lib/retpoline.S:223)
[   86.336854] ? ip_mroute_setsockopt (/build/work/knet/net/ipv4/ipmr.c:1470)
[   86.337679] do_ip_setsockopt (/build/work/knet/net/ipv4/ip_sockglue.c:944)
[   86.338408] ? __pfx_unix_stream_read_actor (/build/work/knet/net/unix/af_unix.c:2862)
[   86.339232] ? srso_return_thunk (/build/work/knet/arch/x86/lib/retpoline.S:223)
[   86.339809] ? aa_sk_perm (/build/work/knet/security/apparmor/include/cred.h:153 /build/work/knet/security/apparmor/net.c:181)
[   86.340342] ip_setsockopt (/build/work/knet/net/ipv4/ip_sockglue.c:1415)
[   86.340859] raw_setsockopt (/build/work/knet/net/ipv4/raw.c:836)
[   86.341408] ? security_socket_setsockopt (/build/work/knet/security/security.c:4561 (discriminator 13))
[   86.342116] sock_common_setsockopt (/build/work/knet/net/core/sock.c:3716)
[   86.342747] do_sock_setsockopt (/build/work/knet/net/socket.c:2313)
[   86.343363] __sys_setsockopt (/build/work/knet/./include/linux/file.h:32 /build/work/knet/net/socket.c:2336)
[   86.344020] __x64_sys_setsockopt (/build/work/knet/net/socket.c:2340)
[   86.344766] do_syscall_64 (/build/work/knet/arch/x86/entry/common.c:52 /build/work/knet/arch/x86/entry/common.c:83)
[   86.345433] ? srso_return_thunk (/build/work/knet/arch/x86/lib/retpoline.S:223)
[   86.346161] ? syscall_exit_work (/build/work/knet/./include/linux/audit.h:357 /build/work/knet/kernel/entry/common.c:160)
[   86.346938] ? srso_return_thunk (/build/work/knet/arch/x86/lib/retpoline.S:223)
[   86.347657] ? syscall_exit_to_user_mode (/build/work/knet/kernel/entry/common.c:215)
[   86.348538] ? srso_return_thunk (/build/work/knet/arch/x86/lib/retpoline.S:223)
[   86.349262] ? do_syscall_64 (/build/work/knet/./arch/x86/include/asm/cpufeature.h:171 /build/work/knet/arch/x86/entry/common.c:98)
[   86.349971] entry_SYSCALL_64_after_hwframe (/build/work/knet/arch/x86/entry/entry_64.S:129)

The original packet in ipmr_cache_report() may be queued and then forwarded
with ip_mr_forward(). This last function has the assumption that the skb
dst is set.

After the below commit, the skb dst is dropped by ipv4_pktinfo_prepare(),
which causes the oops.

Fixes: bb7403655b ("ipmr: support IP_PKTINFO on cache report IGMP msg")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240125141847.1931933-1-nicolas.dichtel@6wind.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-26 21:05:26 -08:00
Kuniyuki Iwashima d9f21b3613 af_unix: Try to run GC async.
If more than 16000 inflight AF_UNIX sockets exist and the garbage
collector is not running, unix_(dgram|stream)_sendmsg() call unix_gc().
Also, they wait for unix_gc() to complete.

In unix_gc(), all inflight AF_UNIX sockets are traversed at least once,
and more if they are the GC candidate.  Thus, sendmsg() significantly
slows down with too many inflight AF_UNIX sockets.

However, if a process sends data with no AF_UNIX FD, the sendmsg() call
does not need to wait for GC.  After this change, only the process that
meets the condition below will be blocked under such a situation.

  1) cmsg contains AF_UNIX socket
  2) more than 32 AF_UNIX sent by the same user are still inflight

Note that even a sendmsg() call that does not meet the condition but has
AF_UNIX FD will be blocked later in unix_scm_to_skb() by the spinlock,
but we allow that as a bonus for sane users.

The results below are the time spent in unix_dgram_sendmsg() sending 1
byte of data with no FD 4096 times on a host where 32K inflight AF_UNIX
sockets exist.

Without series: the sane sendmsg() needs to wait gc unreasonably.

  $ sudo /usr/share/bcc/tools/funclatency -p 11165 unix_dgram_sendmsg
  Tracing 1 functions for "unix_dgram_sendmsg"... Hit Ctrl-C to end.
  ^C
       nsecs               : count     distribution
  [...]
      524288 -> 1048575    : 0        |                                        |
     1048576 -> 2097151    : 3881     |****************************************|
     2097152 -> 4194303    : 214      |**                                      |
     4194304 -> 8388607    : 1        |                                        |

  avg = 1825567 nsecs, total: 7477526027 nsecs, count: 4096

With series: the sane sendmsg() can finish much faster.

  $ sudo /usr/share/bcc/tools/funclatency -p 8702  unix_dgram_sendmsg
  Tracing 1 functions for "unix_dgram_sendmsg"... Hit Ctrl-C to end.
  ^C
       nsecs               : count     distribution
  [...]
         128 -> 255        : 0        |                                        |
         256 -> 511        : 4092     |****************************************|
         512 -> 1023       : 2        |                                        |
        1024 -> 2047       : 0        |                                        |
        2048 -> 4095       : 0        |                                        |
        4096 -> 8191       : 1        |                                        |
        8192 -> 16383      : 1        |                                        |

  avg = 410 nsecs, total: 1680510 nsecs, count: 4096

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20240123170856.41348-6-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-26 20:34:25 -08:00
Kuniyuki Iwashima 5b17307bd0 af_unix: Return struct unix_sock from unix_get_socket().
Currently, unix_get_socket() returns struct sock, but after calling
it, we always cast it to unix_sk().

Let's return struct unix_sock from unix_get_socket().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240123170856.41348-4-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-26 20:34:25 -08:00
Kuniyuki Iwashima 97af84a6bb af_unix: Do not use atomic ops for unix_sk(sk)->inflight.
When touching unix_sk(sk)->inflight, we are always under
spin_lock(&unix_gc_lock).

Let's convert unix_sk(sk)->inflight to the normal unsigned long.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240123170856.41348-3-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-26 20:34:24 -08:00
Jeff Johnson a923ff876f Revert "nl80211/cfg80211: Specify band specific min RSSI thresholds with sched scan"
This *mostly* reverts commit 1e1b11b6a1 ("nl80211/cfg80211: Specify
band specific min RSSI thresholds with sched scan").

During the review of a new patch [1] it was observed that the
functionality being modified was not actually being used by any
in-tree driver. Further research determined that the functionality was
originally introduced to support a new Android interface, but that
interface was subsequently abandoned. Since the functionality has
apparently never been used, remove it. However, to mantain the
sanctity of the UABI, keep the nl80211.h assignments, but clearly mark
them as obsolete.

Cc: Lin Ma <linma@zju.edu.cn>
Cc: Vamsi Krishna <quic_vamsin@quicinc.com>
Link: https://lore.kernel.org/linux-wireless/20240119151201.8670-1-linma@zju.edu.cn/ [1]
Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Link: https://msgid.link/20240125-for-next-v1-1-fd79e01c6c09@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-01-26 09:49:46 +01:00
Johannes Berg 3b220ed8b2 wifi: mac80211: add support for SPP A-MSDUs
If software crypto is used, simply add support for SPP A-MSDUs
(and use it whenever enabled as required by the cfg80211 API).

If hardware crypto is used, leave it up to the driver to set
the NL80211_EXT_FEATURE_SPP_AMSDU_SUPPORT flag and then check
sta->spp_amsdu or the IEEE80211_KEY_FLAG_SPP_AMSDU key flag.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Daniel Gabay <daniel.gabay@intel.com>
Reviewed-by: Gregory Greenman <gregory.greenman@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20240102213313.b8ada4514e2b.I1ac25d5f158165b5a88062a5a5e4c4fbeecf9a5d@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-01-26 09:39:49 +01:00
Johannes Berg 2518e89d5b wifi: cfg80211: add support for SPP A-MSDUs
Add SPP (signaling and payload protected) AMSDU support.

Since userspace has to build the RSNX element, add an extended
feature flag to indicate that this is supported.

In order to avoid downgrade/mismatch attacks, add a flag to the assoc
command on the station side, so that we can be sure that the value of
the flag comes from the same RSNX element that will be validated by
the supplicant against the 4-way-handshake. If we just pulled the
data out of a beacon/probe response, we could theoretically look an
RSNX element from a different frame, with a different value for this
flag, than the supplicant is using to validate in the
4-way-handshake.

Note that this patch is only geared towards software crypto
implementations or hardware ones that can perfectly implement SPP
A-MSDUs, i.e. are able to switch the AAD construction on the fly for
each TX/RX frame.

For more limited hardware implementations, more capability
advertisement  would be required, e.g. if the hardware has no way
to switch this on the fly but has only a global configuration that
must apply to all stations.

The driver could of course *reject* mismatches, but the supplicant
must know so it can do things like not negotiating SPP A-MSDUs on
a T-DLS link when connected to an AP that doesn't support it, or
similar.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Daniel Gabay <daniel.gabay@intel.com>
Reviewed-by: Gregory Greenman <gregory.greenman@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20240102213313.fadac8df7030.I9240aebcba1be49636a73c647ed0af862713fc6f@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-01-26 09:39:49 +01:00
Ayala Beker f7660b3f58 wifi: mac80211: add support for negotiated TTLM request
Update neg_ttlm and active_links according to the new mapping,
and send a negotiated TID-to-link map request with the new mapping.

Signed-off-by: Ayala Beker <ayala.beker@intel.com>
Reviewed-by: Gregory Greenman <gregory.greenman@intel.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20240102213313.eeb385d771df.I2a5441c14421de884dbd93d1624ce7bb2c944833@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-01-26 09:39:49 +01:00
Ayala Beker 8f500fbc6c wifi: mac80211: process and save negotiated TID to Link mapping request
An MLD may send TID-to-Link mapping request frame to negotiate
TID to link mapping with a peer MLD.
Support handling negotiated TID-to-Link mapping request frame
by parsing the frame, asking the driver whether it supports the
received mapping or not, and sending a TID-to-Link mapping response
to the AP MLD.
Theoretically, links that became inactive due to the received TID-to-Link
mapping request, can be selected to be activated but this would require
tearing down the negotiated TID-to-Link mapping, which is still not
supported.

Signed-off-by: Ayala Beker <ayala.beker@intel.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Gregory Greenman <gregory.greenman@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20240102213313.0bc1a24fcc9d.Ie72e47dc6f8c77d4a2f0947b775ef6367fe0edac@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-01-26 09:39:48 +01:00
Jakub Kicinski 06f609b311 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

No conflicts or adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-25 14:20:08 -08:00
Paolo Abeni fdf8e6d18c bpf-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZbIcmAAKCRDbK58LschI
 gw7OAP4xKTkfcHSZUWcIOWgmdCX41B1zUPdyH2w3woHVH8kKRgEApWisrUKZq30Q
 7seUBxVNK1HXBS+egqqRT7Rgs4iFhQs=
 =iBo/
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2024-01-25

The following pull-request contains BPF updates for your *net* tree.

We've added 12 non-merge commits during the last 2 day(s) which contain
a total of 13 files changed, 190 insertions(+), 91 deletions(-).

The main changes are:

1) Fix bpf_xdp_adjust_tail() in context of XSK zero-copy drivers which
   support XDP multi-buffer. The former triggered a NULL pointer
   dereference upon shrinking, from Maciej Fijalkowski & Tirthendu Sarkar.

2) Fix a bug in riscv64 BPF JIT which emitted a wrong prologue and
   epilogue for struct_ops programs, from Pu Lehui.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  i40e: update xdp_rxq_info::frag_size for ZC enabled Rx queue
  i40e: set xdp_rxq_info::frag_size
  xdp: reflect tail increase for MEM_TYPE_XSK_BUFF_POOL
  ice: update xdp_rxq_info::frag_size for ZC enabled Rx queue
  intel: xsk: initialize skb_frag_t::bv_offset in ZC drivers
  ice: remove redundant xdp_rxq_info registration
  i40e: handle multi-buffer packets that are shrunk by xdp prog
  ice: work on pre-XDP prog frag count
  xsk: fix usage of multi-buffer BPF helpers for ZC XDP
  xsk: make xsk_buff_pool responsible for clearing xdp_buff::flags
  xsk: recycle buffer in case Rx queue was full
  riscv, bpf: Fix unpredictable kernel crash about RV64 struct_ops
====================

Link: https://lore.kernel.org/r/20240125084416.10876-1-daniel@iogearbox.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-25 11:30:31 +01:00
Maciej Fijalkowski c5114710c8 xsk: fix usage of multi-buffer BPF helpers for ZC XDP
Currently when packet is shrunk via bpf_xdp_adjust_tail() and memory
type is set to MEM_TYPE_XSK_BUFF_POOL, null ptr dereference happens:

[1136314.192256] BUG: kernel NULL pointer dereference, address:
0000000000000034
[1136314.203943] #PF: supervisor read access in kernel mode
[1136314.213768] #PF: error_code(0x0000) - not-present page
[1136314.223550] PGD 0 P4D 0
[1136314.230684] Oops: 0000 [#1] PREEMPT SMP NOPTI
[1136314.239621] CPU: 8 PID: 54203 Comm: xdpsock Not tainted 6.6.0+ #257
[1136314.250469] Hardware name: Intel Corporation S2600WFT/S2600WFT,
BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
[1136314.265615] RIP: 0010:__xdp_return+0x6c/0x210
[1136314.274653] Code: ad 00 48 8b 47 08 49 89 f8 a8 01 0f 85 9b 01 00 00 0f 1f 44 00 00 f0 41 ff 48 34 75 32 4c 89 c7 e9 79 cd 80 ff 83 fe 03 75 17 <f6> 41 34 01 0f 85 02 01 00 00 48 89 cf e9 22 cc 1e 00 e9 3d d2 86
[1136314.302907] RSP: 0018:ffffc900089f8db0 EFLAGS: 00010246
[1136314.312967] RAX: ffffc9003168aed0 RBX: ffff8881c3300000 RCX:
0000000000000000
[1136314.324953] RDX: 0000000000000000 RSI: 0000000000000003 RDI:
ffffc9003168c000
[1136314.336929] RBP: 0000000000000ae0 R08: 0000000000000002 R09:
0000000000010000
[1136314.348844] R10: ffffc9000e495000 R11: 0000000000000040 R12:
0000000000000001
[1136314.360706] R13: 0000000000000524 R14: ffffc9003168aec0 R15:
0000000000000001
[1136314.373298] FS:  00007f8df8bbcb80(0000) GS:ffff8897e0e00000(0000)
knlGS:0000000000000000
[1136314.386105] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1136314.396532] CR2: 0000000000000034 CR3: 00000001aa912002 CR4:
00000000007706f0
[1136314.408377] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[1136314.420173] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[1136314.431890] PKRU: 55555554
[1136314.439143] Call Trace:
[1136314.446058]  <IRQ>
[1136314.452465]  ? __die+0x20/0x70
[1136314.459881]  ? page_fault_oops+0x15b/0x440
[1136314.468305]  ? exc_page_fault+0x6a/0x150
[1136314.476491]  ? asm_exc_page_fault+0x22/0x30
[1136314.484927]  ? __xdp_return+0x6c/0x210
[1136314.492863]  bpf_xdp_adjust_tail+0x155/0x1d0
[1136314.501269]  bpf_prog_ccc47ae29d3b6570_xdp_sock_prog+0x15/0x60
[1136314.511263]  ice_clean_rx_irq_zc+0x206/0xc60 [ice]
[1136314.520222]  ? ice_xmit_zc+0x6e/0x150 [ice]
[1136314.528506]  ice_napi_poll+0x467/0x670 [ice]
[1136314.536858]  ? ttwu_do_activate.constprop.0+0x8f/0x1a0
[1136314.546010]  __napi_poll+0x29/0x1b0
[1136314.553462]  net_rx_action+0x133/0x270
[1136314.561619]  __do_softirq+0xbe/0x28e
[1136314.569303]  do_softirq+0x3f/0x60

This comes from __xdp_return() call with xdp_buff argument passed as
NULL which is supposed to be consumed by xsk_buff_free() call.

To address this properly, in ZC case, a node that represents the frag
being removed has to be pulled out of xskb_list. Introduce
appropriate xsk helpers to do such node operation and use them
accordingly within bpf_xdp_adjust_tail().

Fixes: 24ea50127e ("xsk: support mbuf on ZC RX")
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> # For the xsk header part
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20240124191602.566724-4-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-24 16:24:06 -08:00
Maciej Fijalkowski f7f6aa8e24 xsk: make xsk_buff_pool responsible for clearing xdp_buff::flags
XDP multi-buffer support introduced XDP_FLAGS_HAS_FRAGS flag that is
used by drivers to notify data path whether xdp_buff contains fragments
or not. Data path looks up mentioned flag on first buffer that occupies
the linear part of xdp_buff, so drivers only modify it there. This is
sufficient for SKB and XDP_DRV modes as usually xdp_buff is allocated on
stack or it resides within struct representing driver's queue and
fragments are carried via skb_frag_t structs. IOW, we are dealing with
only one xdp_buff.

ZC mode though relies on list of xdp_buff structs that is carried via
xsk_buff_pool::xskb_list, so ZC data path has to make sure that
fragments do *not* have XDP_FLAGS_HAS_FRAGS set. Otherwise,
xsk_buff_free() could misbehave if it would be executed against xdp_buff
that carries a frag with XDP_FLAGS_HAS_FRAGS flag set. Such scenario can
take place when within supplied XDP program bpf_xdp_adjust_tail() is
used with negative offset that would in turn release the tail fragment
from multi-buffer frame.

Calling xsk_buff_free() on tail fragment with XDP_FLAGS_HAS_FRAGS would
result in releasing all the nodes from xskb_list that were produced by
driver before XDP program execution, which is not what is intended -
only tail fragment should be deleted from xskb_list and then it should
be put onto xsk_buff_pool::free_list. Such multi-buffer frame will never
make it up to user space, so from AF_XDP application POV there would be
no traffic running, however due to free_list getting constantly new
nodes, driver will be able to feed HW Rx queue with recycled buffers.
Bottom line is that instead of traffic being redirected to user space,
it would be continuously dropped.

To fix this, let us clear the mentioned flag on xsk_buff_pool side
during xdp_buff initialization, which is what should have been done
right from the start of XSK multi-buffer support.

Fixes: 1bbc04de60 ("ice: xsk: add RX multi-buffer support")
Fixes: 1c9ba9c146 ("i40e: xsk: add RX multi-buffer support")
Fixes: 24ea50127e ("xsk: support mbuf on ZC RX")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20240124191602.566724-3-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-24 16:24:06 -08:00
George Guo b253d87fd7 netfilter: nf_tables: cleanup documentation
- Correct comments for nlpid, family, udlen and udata in struct nft_table,
  and afinfo is no longer a member of enum nft_set_class.

- Add comment for data in struct nft_set_elem.

- Add comment for flags in struct nft_ctx.

- Add comments for timeout in struct nft_set_iter, and flags is not a
  member of struct nft_set_iter, remove the comment for it.

- Add comments for commit, abort, estimate and gc_init in struct
  nft_set_ops.

- Add comments for pending_update, num_exprs, exprs and catchall_list
  in struct nft_set.

- Add comment for ext_len in struct nft_set_ext_tmpl.

- Add comment for inner_ops in struct nft_expr_type.

- Add comments for clone, destroy_clone, reduce, gc, offload,
  offload_action, offload_stats in struct nft_expr_ops.

- Add comments for blob_gen_0, blob_gen_1, bound, genmask, udlen, udata,
  blob_next in struct nft_chain.

- Add comment for flags in struct nft_base_chain.

- Add comments for udlen, udata in struct nft_object.

- Add comment for type in struct nft_object_ops.

- Add comment for hook_list in struct nft_flowtable, and remove comments
  for dev_name and ops which are not members of struct nft_flowtable.

Signed-off-by: George Guo <guodongtai@kylinos.cn>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-01-24 19:50:20 +01:00
Ido Schimmel 32f2a0afa9 net/sched: flower: Fix chain template offload
When a qdisc is deleted from a net device the stack instructs the
underlying driver to remove its flow offload callback from the
associated filter block using the 'FLOW_BLOCK_UNBIND' command. The stack
then continues to replay the removal of the filters in the block for
this driver by iterating over the chains in the block and invoking the
'reoffload' operation of the classifier being used. In turn, the
classifier in its 'reoffload' operation prepares and emits a
'FLOW_CLS_DESTROY' command for each filter.

However, the stack does not do the same for chain templates and the
underlying driver never receives a 'FLOW_CLS_TMPLT_DESTROY' command when
a qdisc is deleted. This results in a memory leak [1] which can be
reproduced using [2].

Fix by introducing a 'tmplt_reoffload' operation and have the stack
invoke it with the appropriate arguments as part of the replay.
Implement the operation in the sole classifier that supports chain
templates (flower) by emitting the 'FLOW_CLS_TMPLT_{CREATE,DESTROY}'
command based on whether a flow offload callback is being bound to a
filter block or being unbound from one.

As far as I can tell, the issue happens since cited commit which
reordered tcf_block_offload_unbind() before tcf_block_flush_all_chains()
in __tcf_block_put(). The order cannot be reversed as the filter block
is expected to be freed after flushing all the chains.

[1]
unreferenced object 0xffff888107e28800 (size 2048):
  comm "tc", pid 1079, jiffies 4294958525 (age 3074.287s)
  hex dump (first 32 bytes):
    b1 a6 7c 11 81 88 ff ff e0 5b b3 10 81 88 ff ff  ..|......[......
    01 00 00 00 00 00 00 00 e0 aa b0 84 ff ff ff ff  ................
  backtrace:
    [<ffffffff81c06a68>] __kmem_cache_alloc_node+0x1e8/0x320
    [<ffffffff81ab374e>] __kmalloc+0x4e/0x90
    [<ffffffff832aec6d>] mlxsw_sp_acl_ruleset_get+0x34d/0x7a0
    [<ffffffff832bc195>] mlxsw_sp_flower_tmplt_create+0x145/0x180
    [<ffffffff832b2e1a>] mlxsw_sp_flow_block_cb+0x1ea/0x280
    [<ffffffff83a10613>] tc_setup_cb_call+0x183/0x340
    [<ffffffff83a9f85a>] fl_tmplt_create+0x3da/0x4c0
    [<ffffffff83a22435>] tc_ctl_chain+0xa15/0x1170
    [<ffffffff838a863c>] rtnetlink_rcv_msg+0x3cc/0xed0
    [<ffffffff83ac87f0>] netlink_rcv_skb+0x170/0x440
    [<ffffffff83ac6270>] netlink_unicast+0x540/0x820
    [<ffffffff83ac6e28>] netlink_sendmsg+0x8d8/0xda0
    [<ffffffff83793def>] ____sys_sendmsg+0x30f/0xa80
    [<ffffffff8379d29a>] ___sys_sendmsg+0x13a/0x1e0
    [<ffffffff8379d50c>] __sys_sendmsg+0x11c/0x1f0
    [<ffffffff843b9ce0>] do_syscall_64+0x40/0xe0
unreferenced object 0xffff88816d2c0400 (size 1024):
  comm "tc", pid 1079, jiffies 4294958525 (age 3074.287s)
  hex dump (first 32 bytes):
    40 00 00 00 00 00 00 00 57 f6 38 be 00 00 00 00  @.......W.8.....
    10 04 2c 6d 81 88 ff ff 10 04 2c 6d 81 88 ff ff  ..,m......,m....
  backtrace:
    [<ffffffff81c06a68>] __kmem_cache_alloc_node+0x1e8/0x320
    [<ffffffff81ab36c1>] __kmalloc_node+0x51/0x90
    [<ffffffff81a8ed96>] kvmalloc_node+0xa6/0x1f0
    [<ffffffff82827d03>] bucket_table_alloc.isra.0+0x83/0x460
    [<ffffffff82828d2b>] rhashtable_init+0x43b/0x7c0
    [<ffffffff832aed48>] mlxsw_sp_acl_ruleset_get+0x428/0x7a0
    [<ffffffff832bc195>] mlxsw_sp_flower_tmplt_create+0x145/0x180
    [<ffffffff832b2e1a>] mlxsw_sp_flow_block_cb+0x1ea/0x280
    [<ffffffff83a10613>] tc_setup_cb_call+0x183/0x340
    [<ffffffff83a9f85a>] fl_tmplt_create+0x3da/0x4c0
    [<ffffffff83a22435>] tc_ctl_chain+0xa15/0x1170
    [<ffffffff838a863c>] rtnetlink_rcv_msg+0x3cc/0xed0
    [<ffffffff83ac87f0>] netlink_rcv_skb+0x170/0x440
    [<ffffffff83ac6270>] netlink_unicast+0x540/0x820
    [<ffffffff83ac6e28>] netlink_sendmsg+0x8d8/0xda0
    [<ffffffff83793def>] ____sys_sendmsg+0x30f/0xa80

[2]
 # tc qdisc add dev swp1 clsact
 # tc chain add dev swp1 ingress proto ip chain 1 flower dst_ip 0.0.0.0/32
 # tc qdisc del dev swp1 clsact
 # devlink dev reload pci/0000:06:00.0

Fixes: bbf73830cd ("net: sched: traverse chains in block with tcf_get_next_chain()")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-24 01:33:59 +00:00
Breno Leitao 20df28fb5b net/ipv6: resolve warning in ip6_fib.c
In some configurations, the 'iter' variable in function
fib6_repair_tree() is unused, resulting the following warning when
compiled with W=1.

    net/ipv6/ip6_fib.c:1781:6: warning: variable 'iter' set but not used [-Wunused-but-set-variable]
     1781 |         int iter = 0;
	  |             ^

It is unclear what is the advantage of this RT6_TRACE() macro[1], since
users can control pr_debug() in runtime, which is better than at
compilation time. pr_debug() has no overhead when disabled.

Remove the RT6_TRACE() in favor of simple pr_debug() helpers.

[1] Link: https://lore.kernel.org/all/ZZwSEJv2HgI0cD4J@gmail.com/
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240122181955.2391676-2-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-23 17:22:23 -08:00
Kuniyuki Iwashima b3f086a7a1 bpf: Define struct bpf_tcp_req_attrs when CONFIG_SYN_COOKIES=n.
kernel test robot reported the warning below:

  >> net/core/filter.c:11842:13: warning: declaration of 'struct bpf_tcp_req_attrs' will not be visible outside of this function [-Wvisibility]
      11842 |                                         struct bpf_tcp_req_attrs *attrs, int attrs__sz)
            |                                                ^
     1 warning generated.

struct bpf_tcp_req_attrs is defined under CONFIG_SYN_COOKIES
but used in kfunc without the config.

Let's move struct bpf_tcp_req_attrs definition outside of
CONFIG_SYN_COOKIES guard.

Fixes: e472f88891 ("bpf: tcp: Support arbitrary SYN Cookie.")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202401180418.CUVc0hxF-lkp@intel.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20240118211751.25790-1-kuniyu@amazon.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-23 15:08:03 -08:00
Kuniyuki Iwashima e472f88891 bpf: tcp: Support arbitrary SYN Cookie.
This patch adds a new kfunc available at TC hook to support arbitrary
SYN Cookie.

The basic usage is as follows:

    struct bpf_tcp_req_attrs attrs = {
        .mss = mss,
        .wscale_ok = wscale_ok,
        .rcv_wscale = rcv_wscale, /* Server's WScale < 15 */
        .snd_wscale = snd_wscale, /* Client's WScale < 15 */
        .tstamp_ok = tstamp_ok,
        .rcv_tsval = tsval,
        .rcv_tsecr = tsecr, /* Server's Initial TSval */
        .usec_ts_ok = usec_ts_ok,
        .sack_ok = sack_ok,
        .ecn_ok = ecn_ok,
    }

    skc = bpf_skc_lookup_tcp(...);
    sk = (struct sock *)bpf_skc_to_tcp_sock(skc);
    bpf_sk_assign_tcp_reqsk(skb, sk, attrs, sizeof(attrs));
    bpf_sk_release(skc);

bpf_sk_assign_tcp_reqsk() takes skb, a listener sk, and struct
bpf_tcp_req_attrs and allocates reqsk and configures it.  Then,
bpf_sk_assign_tcp_reqsk() links reqsk with skb and the listener.

The notable thing here is that we do not hold refcnt for both reqsk
and listener.  To differentiate that, we mark reqsk->syncookie, which
is only used in TX for now.  So, if reqsk->syncookie is 1 in RX, it
means that the reqsk is allocated by kfunc.

When skb is freed, sock_pfree() checks if reqsk->syncookie is 1,
and in that case, we set NULL to reqsk->rsk_listener before calling
reqsk_free() as reqsk does not hold a refcnt of the listener.

When the TCP stack looks up a socket from the skb, we steal the
listener from the reqsk in skb_steal_sock() and create a full sk
in cookie_v[46]_check().

The refcnt of reqsk will finally be set to 1 in tcp_get_cookie_sock()
after creating a full sk.

Note that we can extend struct bpf_tcp_req_attrs in the future when
we add a new attribute that is determined in 3WHS.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20240115205514.68364-6-kuniyu@amazon.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-23 14:40:24 -08:00
Kuniyuki Iwashima 695751e31a bpf: tcp: Handle BPF SYN Cookie in cookie_v[46]_check().
We will support arbitrary SYN Cookie with BPF in the following
patch.

If BPF prog validates ACK and kfunc allocates a reqsk, it will
be carried to cookie_[46]_check() as skb->sk.  If skb->sk is not
NULL, we call cookie_bpf_check().

Then, we clear skb->sk and skb->destructor, which are needed not
to hold refcnt for reqsk and the listener.  See the following patch
for details.

After that, we finish initialisation for the remaining fields with
cookie_tcp_reqsk_init().

Note that the server side WScale is set only for non-BPF SYN Cookie.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20240115205514.68364-5-kuniyu@amazon.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-23 14:40:24 -08:00
Kuniyuki Iwashima 8b5ac68fb5 bpf: tcp: Handle BPF SYN Cookie in skb_steal_sock().
We will support arbitrary SYN Cookie with BPF.

If BPF prog validates ACK and kfunc allocates a reqsk, it will
be carried to TCP stack as skb->sk with req->syncookie 1.  Also,
the reqsk has its listener as req->rsk_listener with no refcnt
taken.

When the TCP stack looks up a socket from the skb, we steal
inet_reqsk(skb->sk)->rsk_listener in skb_steal_sock() so that
the skb will be processed in cookie_v[46]_check() with the
listener.

Note that we do not clear skb->sk and skb->destructor so that we
can carry the reqsk to cookie_v[46]_check().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20240115205514.68364-4-kuniyu@amazon.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-23 14:40:24 -08:00
Kuniyuki Iwashima 95e752b529 tcp: Move skb_steal_sock() to request_sock.h
We will support arbitrary SYN Cookie with BPF.

If BPF prog validates ACK and kfunc allocates a reqsk, it will
be carried to TCP stack as skb->sk with req->syncookie 1.

In skb_steal_sock(), we need to check inet_reqsk(sk)->syncookie
to see if the reqsk is created by kfunc.  However, inet_reqsk()
is not available in sock.h.

Let's move skb_steal_sock() to request_sock.h.

While at it, we refactor skb_steal_sock() so it returns early if
skb->sk is NULL to minimise the following patch.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240115205514.68364-3-kuniyu@amazon.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-23 14:40:23 -08:00
Kuniyuki Iwashima b18afb6f42 tcp: Move tcp_ns_to_ts() to tcp.h
We will support arbitrary SYN Cookie with BPF.

When BPF prog validates ACK and kfunc allocates a reqsk, we need
to call tcp_ns_to_ts() to calculate an offset of TSval for later
use:

  time
  t0 : Send SYN+ACK
       -> tsval = Initial TSval (Random Number)

  t1 : Recv ACK of 3WHS
       -> tsoff = TSecr - tcp_ns_to_ts(usec_ts_ok, tcp_clock_ns())
                = Initial TSval - t1

  t2 : Send ACK
       -> tsval = t2 + tsoff
                = Initial TSval + (t2 - t1)
                = Initial TSval + Time Delta (x)

  (x) Note that the time delta does not include the initial RTT
      from t0 to t1.

Let's move tcp_ns_to_ts() to tcp.h.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240115205514.68364-2-kuniyu@amazon.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-23 14:40:23 -08:00
Eric Dumazet a54d51fb2d udp: fix busy polling
Generic sk_busy_loop_end() only looks at sk->sk_receive_queue
for presence of packets.

Problem is that for UDP sockets after blamed commit, some packets
could be present in another queue: udp_sk(sk)->reader_queue

In some cases, a busy poller could spin until timeout expiration,
even if some packets are available in udp_sk(sk)->reader_queue.

v3: - make sk_busy_loop_end() nicer (Willem)

v2: - add a READ_ONCE(sk->sk_family) in sk_is_inet() to avoid KCSAN splats.
    - add a sk_is_inet() check in sk_is_udp() (Willem feedback)
    - add a sk_is_inet() check in sk_is_tcp().

Fixes: 2276f58ac5 ("udp: use a separate rx queue for packet reception")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-21 18:09:30 +00:00
Kuniyuki Iwashima e3f9bed9be llc: Drop support for ETH_P_TR_802_2.
syzbot reported an uninit-value bug below. [0]

llc supports ETH_P_802_2 (0x0004) and used to support ETH_P_TR_802_2
(0x0011), and syzbot abused the latter to trigger the bug.

  write$tun(r0, &(0x7f0000000040)={@val={0x0, 0x11}, @val, @mpls={[], @llc={@snap={0xaa, 0x1, ')', "90e5dd"}}}}, 0x16)

llc_conn_handler() initialises local variables {saddr,daddr}.mac
based on skb in llc_pdu_decode_sa()/llc_pdu_decode_da() and passes
them to __llc_lookup().

However, the initialisation is done only when skb->protocol is
htons(ETH_P_802_2), otherwise, __llc_lookup_established() and
__llc_lookup_listener() will read garbage.

The missing initialisation existed prior to commit 211ed86510
("net: delete all instances of special processing for token ring").

It removed the part to kick out the token ring stuff but forgot to
close the door allowing ETH_P_TR_802_2 packets to sneak into llc_rcv().

Let's remove llc_tr_packet_type and complete the deprecation.

[0]:
BUG: KMSAN: uninit-value in __llc_lookup_established+0xe9d/0xf90
 __llc_lookup_established+0xe9d/0xf90
 __llc_lookup net/llc/llc_conn.c:611 [inline]
 llc_conn_handler+0x4bd/0x1360 net/llc/llc_conn.c:791
 llc_rcv+0xfbb/0x14a0 net/llc/llc_input.c:206
 __netif_receive_skb_one_core net/core/dev.c:5527 [inline]
 __netif_receive_skb+0x1a6/0x5a0 net/core/dev.c:5641
 netif_receive_skb_internal net/core/dev.c:5727 [inline]
 netif_receive_skb+0x58/0x660 net/core/dev.c:5786
 tun_rx_batched+0x3ee/0x980 drivers/net/tun.c:1555
 tun_get_user+0x53af/0x66d0 drivers/net/tun.c:2002
 tun_chr_write_iter+0x3af/0x5d0 drivers/net/tun.c:2048
 call_write_iter include/linux/fs.h:2020 [inline]
 new_sync_write fs/read_write.c:491 [inline]
 vfs_write+0x8ef/0x1490 fs/read_write.c:584
 ksys_write+0x20f/0x4c0 fs/read_write.c:637
 __do_sys_write fs/read_write.c:649 [inline]
 __se_sys_write fs/read_write.c:646 [inline]
 __x64_sys_write+0x93/0xd0 fs/read_write.c:646
 do_syscall_x64 arch/x86/entry/common.c:51 [inline]
 do_syscall_64+0x44/0x110 arch/x86/entry/common.c:82
 entry_SYSCALL_64_after_hwframe+0x63/0x6b

Local variable daddr created at:
 llc_conn_handler+0x53/0x1360 net/llc/llc_conn.c:783
 llc_rcv+0xfbb/0x14a0 net/llc/llc_input.c:206

CPU: 1 PID: 5004 Comm: syz-executor994 Not tainted 6.6.0-syzkaller-14500-g1c41041124bd #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/09/2023

Fixes: 211ed86510 ("net: delete all instances of special processing for token ring")
Reported-by: syzbot+b5ad66046b913bc04c6f@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=b5ad66046b913bc04c6f
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240119015515.61898-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-19 21:30:09 -08:00
Zhengchao Shao 198bc90e0e tcp: make sure init the accept_queue's spinlocks once
When I run syz's reproduction C program locally, it causes the following
issue:
pvqspinlock: lock 0xffff9d181cd5c660 has corrupted value 0x0!
WARNING: CPU: 19 PID: 21160 at __pv_queued_spin_unlock_slowpath (kernel/locking/qspinlock_paravirt.h:508)
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
RIP: 0010:__pv_queued_spin_unlock_slowpath (kernel/locking/qspinlock_paravirt.h:508)
Code: 73 56 3a ff 90 c3 cc cc cc cc 8b 05 bb 1f 48 01 85 c0 74 05 c3 cc cc cc cc 8b 17 48 89 fe 48 c7 c7
30 20 ce 8f e8 ad 56 42 ff <0f> 0b c3 cc cc cc cc 0f 0b 0f 1f 40 00 90 90 90 90 90 90 90 90 90
RSP: 0018:ffffa8d200604cb8 EFLAGS: 00010282
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff9d1ef60e0908
RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff9d1ef60e0900
RBP: ffff9d181cd5c280 R08: 0000000000000000 R09: 00000000ffff7fff
R10: ffffa8d200604b68 R11: ffffffff907dcdc8 R12: 0000000000000000
R13: ffff9d181cd5c660 R14: ffff9d1813a3f330 R15: 0000000000001000
FS:  00007fa110184640(0000) GS:ffff9d1ef60c0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020000000 CR3: 000000011f65e000 CR4: 00000000000006f0
Call Trace:
<IRQ>
  _raw_spin_unlock (kernel/locking/spinlock.c:186)
  inet_csk_reqsk_queue_add (net/ipv4/inet_connection_sock.c:1321)
  inet_csk_complete_hashdance (net/ipv4/inet_connection_sock.c:1358)
  tcp_check_req (net/ipv4/tcp_minisocks.c:868)
  tcp_v4_rcv (net/ipv4/tcp_ipv4.c:2260)
  ip_protocol_deliver_rcu (net/ipv4/ip_input.c:205)
  ip_local_deliver_finish (net/ipv4/ip_input.c:234)
  __netif_receive_skb_one_core (net/core/dev.c:5529)
  process_backlog (./include/linux/rcupdate.h:779)
  __napi_poll (net/core/dev.c:6533)
  net_rx_action (net/core/dev.c:6604)
  __do_softirq (./arch/x86/include/asm/jump_label.h:27)
  do_softirq (kernel/softirq.c:454 kernel/softirq.c:441)
</IRQ>
<TASK>
  __local_bh_enable_ip (kernel/softirq.c:381)
  __dev_queue_xmit (net/core/dev.c:4374)
  ip_finish_output2 (./include/net/neighbour.h:540 net/ipv4/ip_output.c:235)
  __ip_queue_xmit (net/ipv4/ip_output.c:535)
  __tcp_transmit_skb (net/ipv4/tcp_output.c:1462)
  tcp_rcv_synsent_state_process (net/ipv4/tcp_input.c:6469)
  tcp_rcv_state_process (net/ipv4/tcp_input.c:6657)
  tcp_v4_do_rcv (net/ipv4/tcp_ipv4.c:1929)
  __release_sock (./include/net/sock.h:1121 net/core/sock.c:2968)
  release_sock (net/core/sock.c:3536)
  inet_wait_for_connect (net/ipv4/af_inet.c:609)
  __inet_stream_connect (net/ipv4/af_inet.c:702)
  inet_stream_connect (net/ipv4/af_inet.c:748)
  __sys_connect (./include/linux/file.h:45 net/socket.c:2064)
  __x64_sys_connect (net/socket.c:2073 net/socket.c:2070 net/socket.c:2070)
  do_syscall_64 (arch/x86/entry/common.c:51 arch/x86/entry/common.c:82)
  entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129)
  RIP: 0033:0x7fa10ff05a3d
  Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89
  c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ab a3 0e 00 f7 d8 64 89 01 48
  RSP: 002b:00007fa110183de8 EFLAGS: 00000202 ORIG_RAX: 000000000000002a
  RAX: ffffffffffffffda RBX: 0000000020000054 RCX: 00007fa10ff05a3d
  RDX: 000000000000001c RSI: 0000000020000040 RDI: 0000000000000003
  RBP: 00007fa110183e20 R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000202 R12: 00007fa110184640
  R13: 0000000000000000 R14: 00007fa10fe8b060 R15: 00007fff73e23b20
</TASK>

The issue triggering process is analyzed as follows:
Thread A                                       Thread B
tcp_v4_rcv	//receive ack TCP packet       inet_shutdown
  tcp_check_req                                  tcp_disconnect //disconnect sock
  ...                                              tcp_set_state(sk, TCP_CLOSE)
    inet_csk_complete_hashdance                ...
      inet_csk_reqsk_queue_add                 inet_listen  //start listen
        spin_lock(&queue->rskq_lock)             inet_csk_listen_start
        ...                                        reqsk_queue_alloc
        ...                                          spin_lock_init
        spin_unlock(&queue->rskq_lock)	//warning

When the socket receives the ACK packet during the three-way handshake,
it will hold spinlock. And then the user actively shutdowns the socket
and listens to the socket immediately, the spinlock will be initialized.
When the socket is going to release the spinlock, a warning is generated.
Also the same issue to fastopenq.lock.

Move init spinlock to inet_create and inet_accept to make sure init the
accept_queue's spinlocks once.

Fixes: fff1f3001c ("tcp: add a spinlock to protect struct request_sock_queue")
Fixes: 168a8f5805 ("tcp: TCP Fast Open Server - main code path")
Reported-by: Ming Shu <sming56@aliyun.com>
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240118012019.1751966-1-shaozhengchao@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-19 21:13:25 -08:00
Linus Torvalds 736b5545d3 Including fixes from bpf and netfilter.
Previous releases - regressions:
 
  - Revert "net: rtnetlink: Enslave device before bringing it up",
    breaks the case inverse to the one it was trying to fix
 
  - net: dsa: fix oob access in DSA's netdevice event handler
    dereference netdev_priv() before check its a DSA port
 
  - sched: track device in tcf_block_get/put_ext() only for clsact
    binder types
 
  - net: tls, fix WARNING in __sk_msg_free when record becomes full
    during splice and MORE hint set
 
  - sfp-bus: fix SFP mode detect from bitrate
 
  - drv: stmmac: prevent DSA tags from breaking COE
 
 Previous releases - always broken:
 
  - bpf: fix no forward progress in in bpf_iter_udp if output
    buffer is too small
 
  - bpf: reject variable offset alu on registers with a type
    of PTR_TO_FLOW_KEYS to prevent oob access
 
  - netfilter: tighten input validation
 
  - net: add more sanity check in virtio_net_hdr_to_skb()
 
  - rxrpc: fix use of Don't Fragment flag on RESPONSE packets,
    avoid infinite loop
 
  - amt: do not use the portion of skb->cb area which may get clobbered
 
  - mptcp: improve validation of the MPTCPOPT_MP_JOIN MCTCP option
 
 Misc:
 
  - spring cleanup of inactive maintainers
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmWpnvoACgkQMUZtbf5S
 Irvskg/+Or5tETxOmpQXxnj6ECZyrSp0Jcyd7+TIcos/7JfPdn3Kebl004SG4h/s
 bwKDOIIP1iSjQ+0NFsPjyYIVd6wFuCElSB7npV5uQAT6ptXx7A4Ym68/rVxodI8T
 6hiYV/mlPuZF8JjRhtp/VJL8sY1qnG7RIUB4oH3y9HQNfwZX0lIWChuUilHuWfbq
 zQ2Iu97tMkoIBjXrkIT3Qaj0aFxYbjCOrg9zy+FZ69a7Rmrswr//7amlCH6saNTx
 Ku7Wl8FXhe7O23OiM6GSl7AechSM1aJ5kOS3orseej0+aSp9eH3ekYGmbsQr6sjz
 ix/eZ7V7SUkJK3bEH5haeymk4TDV3lHE8SziMbosK4wVbHOyPwEmqCxppADYJLZs
 WycHZKcTBluFBOxknAofH7m5Hh0ToXkeTfpptSSGtRB4WncAOMsMapr3yS4WXg/q
 AnOo/tzCBgMrnSJtD/kjqgUiCk8vYoLc8lBR9K74l0zqI1sf13OfuTHvEgqIS6z1
 Ir/ewlAV6fCH8gQbyzjKUVlyjZS4+vFv19xg/2GgLf+LdyzcCOxUZkND3/DE6+OA
 Dgf9gtABYU4hGXMUfTfml3KCBTF65QmY8dIh17zraNylYUHEJ2lI4D+sdiqWUrXb
 mXPBJh4nOPwIV5t2gT80skNwF3aWPr6l4ieY2codSbP04rO74S8=
 =YhQQ
 -----END PGP SIGNATURE-----

Merge tag 'net-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from bpf and netfilter.

  Previous releases - regressions:

   - Revert "net: rtnetlink: Enslave device before bringing it up",
     breaks the case inverse to the one it was trying to fix

   - net: dsa: fix oob access in DSA's netdevice event handler
     dereference netdev_priv() before check its a DSA port

   - sched: track device in tcf_block_get/put_ext() only for clsact
     binder types

   - net: tls, fix WARNING in __sk_msg_free when record becomes full
     during splice and MORE hint set

   - sfp-bus: fix SFP mode detect from bitrate

   - drv: stmmac: prevent DSA tags from breaking COE

  Previous releases - always broken:

   - bpf: fix no forward progress in in bpf_iter_udp if output buffer is
     too small

   - bpf: reject variable offset alu on registers with a type of
     PTR_TO_FLOW_KEYS to prevent oob access

   - netfilter: tighten input validation

   - net: add more sanity check in virtio_net_hdr_to_skb()

   - rxrpc: fix use of Don't Fragment flag on RESPONSE packets, avoid
     infinite loop

   - amt: do not use the portion of skb->cb area which may get clobbered

   - mptcp: improve validation of the MPTCPOPT_MP_JOIN MCTCP option

  Misc:

   - spring cleanup of inactive maintainers"

* tag 'net-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (88 commits)
  i40e: Include types.h to some headers
  ipv6: mcast: fix data-race in ipv6_mc_down / mld_ifc_work
  selftests: mlxsw: qos_pfc: Adjust the test to support 8 lanes
  selftests: mlxsw: qos_pfc: Remove wrong description
  mlxsw: spectrum_router: Register netdevice notifier before nexthop
  mlxsw: spectrum_acl_tcam: Fix stack corruption
  mlxsw: spectrum_acl_tcam: Fix NULL pointer dereference in error path
  mlxsw: spectrum_acl_erp: Fix error flow of pool allocation failure
  ethtool: netlink: Add missing ethnl_ops_begin/complete
  selftests: bonding: Add more missing config options
  selftests: netdevsim: add a config file
  libbpf: warn on unexpected __arg_ctx type when rewriting BTF
  selftests/bpf: add tests confirming type logic in kernel for __arg_ctx
  bpf: enforce types for __arg_ctx-tagged arguments in global subprogs
  bpf: extract bpf_ctx_convert_map logic and make it more reusable
  libbpf: feature-detect arg:ctx tag support in kernel
  ipvs: avoid stat macros calls from preemptible context
  netfilter: nf_tables: reject NFT_SET_CONCAT with not field length description
  netfilter: nf_tables: skip dead set elements in netlink dump
  netfilter: nf_tables: do not allow mismatch field size and set key length
  ...
2024-01-18 17:33:50 -08:00
Marc Kleine-Budde 894d750831 net: netdev_queue: netdev_txq_completed_mb(): fix wake condition
netif_txq_try_stop() uses "get_desc >= start_thrs" as the check for
the call to netif_tx_start_queue().

Use ">=" i netdev_txq_completed_mb(), too.

Fixes: c91c46de6b ("net: provide macros for commonly copied lockless queue stop/wake code")
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-13 18:26:23 +00:00
Linus Torvalds bf9ca811bb RDMA v6.8 merge window
Small cycle, with some typical driver updates
 
  - General code tidying in siw, hfi1, idrdma, usnic, hns rtrs and bnxt_re
 
  - Many small siw cleanups without an overeaching theme
 
  - Debugfs stats for hns
 
  - Fix a TX queue timeout in IPoIB and missed locking of the mcast list
 
  - Support more features of P7 devices in bnxt_re including a new work
    submission protocol
 
  - CQ interrupts for MANA
 
  - netlink stats for erdma
 
  - EFA multipath PCI support
 
  - Fix Incorrect MR invalidation in iser
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCZaCQDQAKCRCFwuHvBreF
 YemEAQCTGebv0k2hbocDOmKml5awt8j9aDJX3aO7Zpfi0AYUtwEAzk+kgN4yAo+B
 Vinvpu171zry+QvmGJsXv2mtZkXH6QY=
 =HT3p
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "Small cycle, with some typical driver updates:

   - General code tidying in siw, hfi1, idrdma, usnic, hns rtrs and
     bnxt_re

   - Many small siw cleanups without an overeaching theme

   - Debugfs stats for hns

   - Fix a TX queue timeout in IPoIB and missed locking of the mcast
     list

   - Support more features of P7 devices in bnxt_re including a new work
     submission protocol

   - CQ interrupts for MANA

   - netlink stats for erdma

   - EFA multipath PCI support

   - Fix Incorrect MR invalidation in iser"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (66 commits)
  RDMA/bnxt_re: Fix error code in bnxt_re_create_cq()
  RDMA/efa: Add EFA query MR support
  IB/iser: Prevent invalidating wrong MR
  RDMA/erdma: Add hardware statistics support
  RDMA/erdma: Introduce dma pool for hardware responses of CMDQ requests
  IB/iser: iscsi_iser.h: fix kernel-doc warning and spellos
  RDMA/mana_ib: Add CQ interrupt support for RAW QP
  RDMA/mana_ib: query device capabilities
  RDMA/mana_ib: register RDMA device with GDMA
  RDMA/bnxt_re: Fix the sparse warnings
  RDMA/bnxt_re: Fix the offset for GenP7 adapters for user applications
  RDMA/bnxt_re: Share a page to expose per CQ info with userspace
  RDMA/bnxt_re: Add UAPI to share a page with user space
  IB/ipoib: Fix mcast list locking
  RDMA/mlx5: Expose register c0 for RDMA device
  net/mlx5: E-Switch, expose eswitch manager vport
  net/mlx5: Manage ICM type of SW encap
  RDMA/mlx5: Support handling of SW encap ICM area
  net/mlx5: Introduce indirect-sw-encap ICM properties
  RDMA/bnxt_re: Adds MSN table capability for Gen P7 adapters
  ...
2024-01-12 13:52:21 -08:00
Linus Torvalds 3e7aeb78ab Networking changes for 6.8.
Core & protocols
 ----------------
 
  - Analyze and reorganize core networking structs (socks, netdev,
    netns, mibs) to optimize cacheline consumption and set up
    build time warnings to safeguard against future header changes.
    This improves TCP performances with many concurrent connections
    up to 40%.
 
  - Add page-pool netlink-based introspection, exposing the
    memory usage and recycling stats. This helps indentify
    bad PP users and possible leaks.
 
  - Refine TCP/DCCP source port selection to no longer favor even
    source port at connect() time when IP_LOCAL_PORT_RANGE is set.
    This lowers the time taken by connect() for hosts having
    many active connections to the same destination.
 
  - Refactor the TCP bind conflict code, shrinking related socket
    structs.
 
  - Refactor TCP SYN-Cookie handling, as a preparation step to
    allow arbitrary SYN-Cookie processing via eBPF.
 
  - Tune optmem_max for 0-copy usage, increasing the default value
    to 128KB and namespecifying it.
 
  - Allow coalescing for cloned skbs coming from page pools, improving
    RX performances with some common configurations.
 
  - Reduce extension header parsing overhead at GRO time.
 
  - Add bridge MDB bulk deletion support, allowing user-space to
    request the deletion of matching entries.
 
  - Reorder nftables struct members, to keep data accessed by the
    datapath first.
 
  - Introduce TC block ports tracking and use. This allows supporting
    multicast-like behavior at the TC layer.
 
  - Remove UAPI support for retired TC qdiscs (dsmark, CBQ and ATM) and
    classifiers (RSVP and tcindex).
 
  - More data-race annotations.
 
  - Extend the diag interface to dump TCP bound-only sockets.
 
  - Conditional notification of events for TC qdisc class and actions.
 
  - Support for WPAN dynamic associations with nearby devices, to form
    a sub-network using a specific PAN ID.
 
  - Implement SMCv2.1 virtual ISM device support.
 
  - Add support for Batman-avd mulicast packet type.
 
 BPF
 ---
 
  - Tons of verifier improvements:
    - BPF register bounds logic and range support along with a large
      test suite
    - log improvements
    - complete precision tracking support for register spills
    - track aligned STACK_ZERO cases as imprecise spilled registers. It
      improves the verifier "instructions processed" metric from single
      digit to 50-60% for some programs
    - support for user's global BPF subprogram arguments with few
      commonly requested annotations for a better developer experience
    - support tracking of BPF_JNE which helps cases when the compiler
      transforms (unsigned) "a > 0" into "if a == 0 goto xxx" and the
      like
    - several fixes
 
  - Add initial TX metadata implementation for AF_XDP with support in
    mlx5 and stmmac drivers. Two types of offloads are supported right
    now, that is, TX timestamp and TX checksum offload.
 
  - Fix kCFI bugs in BPF all forms of indirect calls from BPF into
    kernel and from kernel into BPF work with CFI enabled. This allows
    BPF to work with CONFIG_FINEIBT=y.
 
  - Change BPF verifier logic to validate global subprograms lazily
    instead of unconditionally before the main program, so they can be
    guarded using BPF CO-RE techniques.
 
  - Support uid/gid options when mounting bpffs.
 
  - Add a new kfunc which acquires the associated cgroup of a task
    within a specific cgroup v1 hierarchy where the latter is identified
    by its id.
 
  - Extend verifier to allow bpf_refcount_acquire() of a map value field
    obtained via direct load which is a use-case needed in sched_ext.
 
  - Add BPF link_info support for uprobe multi link along with bpftool
    integration for the latter.
 
  - Support for VLAN tag in XDP hints.
 
  - Remove deprecated bpfilter kernel leftovers given the project
    is developed in user-space (https://github.com/facebook/bpfilter).
 
 Misc
 ----
 
  - Support for parellel TC self-tests execution.
 
  - Increase MPTCP self-tests coverage.
 
  - Updated the bridge documentation, including several so-far
    undocumented features.
 
  - Convert all the net self-tests to run in unique netns, to
    avoid random failures due to conflict and allow concurrent
    runs.
 
  - Add TCP-AO self-tests.
 
  - Add kunit tests for both cfg80211 and mac80211.
 
  - Autogenerate Netlink families documentation from YAML spec.
 
  - Add yml-gen support for fixed headers and recursive nests, the
    tool can now generate user-space code for all genetlink families
    for which we have specs.
 
  - A bunch of additional module descriptions fixes.
 
  - Catch incorrect freeing of pages belonging to a page pool.
 
 Driver API
 ----------
 
  - Rust abstractions for network PHY drivers; do not cover yet the
    full C API, but already allow implementing functional PHY drivers
    in rust.
 
  - Introduce queue and NAPI support in the netdev Netlink interface,
    allowing complete access to the device <> NAPIs <> queues
    relationship.
 
  - Introduce notifications filtering for devlink to allow control
    application scale to thousands of instances.
 
  - Improve PHY validation, requesting rate matching information for
    each ethtool link mode supported by both the PHY and host.
 
  - Add support for ethtool symmetric-xor RSS hash.
 
  - ACPI based Wifi band RFI (WBRF) mitigation feature for the AMD
    platform.
 
  - Expose pin fractional frequency offset value over new DPLL generic
    netlink attribute.
 
  - Convert older drivers to platform remove callback returning void.
 
  - Add support for PHY package MMD read/write.
 
 New hardware / drivers
 ----------------------
 
  - Ethernet:
    - Octeon CN10K devices
    - Broadcom 5760X P7
    - Qualcomm SM8550 SoC
    - Texas Instrument DP83TG720S PHY
 
  - Bluetooth:
    - IMC Networks Bluetooth radio
 
 Removed
 -------
 
  - WiFi:
    - libertas 16-bit PCMCIA support
    - Atmel at76c50x drivers
    - HostAP ISA/PCMCIA style 802.11b driver
    - zd1201 802.11b USB dongles
    - Orinoco ISA/PCMCIA 802.11b driver
    - Aviator/Raytheon driver
    - Planet WL3501 driver
    - RNDIS USB 802.11b driver
 
 Drivers
 -------
 
  - Ethernet high-speed NICs:
    - Intel (100G, ice, idpf):
      - allow one by one port representors creation and removal
      - add temperature and clock information reporting
      - add get/set for ethtool's header split ringparam
      - add again FW logging
      - adds support switchdev hardware packet mirroring
      - iavf: implement symmetric-xor RSS hash
      - igc: add support for concurrent physical and free-running timers
      - i40e: increase the allowable descriptors
    - nVidia/Mellanox:
      - Preparation for Socket-Direct multi-dev netdev. That will allow
        in future releases combining multiple PFs devices attached to
        different NUMA nodes under the same netdev
    - Broadcom (bnxt):
      - TX completion handling improvements
      - add basic ntuple filter support
      - reduce MSIX vectors usage for MQPRIO offload
      - add VXLAN support, USO offload and TX coalesce completion for P7
    - Marvell Octeon EP:
      - xmit-more support
      - add PF-VF mailbox support and use it for FW notifications for VFs
    - Wangxun (ngbe/txgbe):
      - implement ethtool functions to operate pause param, ring param,
        coalesce channel number and msglevel
    - Netronome/Corigine (nfp):
      - add flow-steering support
      - support UDP segmentation offload
 
  - Ethernet NICs embedded, slower, virtual:
    - Xilinx AXI: remove duplicate DMA code adopting the dma engine driver
    - stmmac: add support for HW-accelerated VLAN stripping
    - TI AM654x sw: add mqprio, frame preemption & coalescing
    - gve: add support for non-4k page sizes.
    - virtio-net: support dynamic coalescing moderation
 
  - nVidia/Mellanox Ethernet datacenter switches:
    - allow firmware upgrade without a reboot
    - more flexible support for bridge flooding via the compressed
      FID flooding mode
 
  - Ethernet embedded switches:
    - Microchip:
      - fine-tune flow control and speed configurations in KSZ8xxx
      - KSZ88X3: enable setting rmii reference
    - Renesas:
      - add jumbo frames support
    - Marvell:
      - 88E6xxx: add "eth-mac" and "rmon" stats support
 
  - Ethernet PHYs:
    - aquantia: add firmware load support
    - at803x: refactor the driver to simplify adding support for more
      chip variants
    - NXP C45 TJA11xx: Add MACsec offload support
 
  - Wifi:
    - MediaTek (mt76):
      - NVMEM EEPROM improvements
      - mt7996 Extremely High Throughput (EHT) improvements
      - mt7996 Wireless Ethernet Dispatcher (WED) support
      - mt7996 36-bit DMA support
    - Qualcomm (ath12k):
      - support for a single MSI vector
      - WCN7850: support AP mode
    - Intel (iwlwifi):
      - new debugfs file fw_dbg_clear
      - allow concurrent P2P operation on DFS channels
 
  - Bluetooth:
    - QCA2066: support HFP offload
    - ISO: more broadcast-related improvements
    - NXP: better recovery in case receiver/transmitter get out of sync
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmWdamsSHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkGC4P/2xjLzdw22ckSssuE9ORbGko9SNjnqHk
 PQh1E+26BHiCg5KB8VvzMsL78E79MRNXEattSW+1g7dhCvln3oi+Vd0WkdRkgt35
 98Iv18zLbbwFAJeyKvmLAPAkQkMLtVj19QILBBRrugF+egEZgVSE3JBcTAiKv2ZQ
 HzkabA171Ri6LpCcEEtY5XuaKvimGnGzF8YMFf8rX0wtqd2p5kbY9aMe47WAGxvU
 Vf9548XvH+A5yVH2/4/gujtUOpA/RHuhuCMb+oo0cZ+VCC1x9MGzoXzj6r87OTkf
 k2W1whNzcGoin92f+9Lk1JYMuiGKBH4QVaDdNXJnYFSJWPTE7RvRsPzYTSD4/GzK
 yEZbzSJXpy/2vDQm16NoAxl7evRs8Sorzkw4LQRviZHI/5SAkK2ZQiCK5CO8QSYy
 C1LELcV5kn6Foe24xWnrWLjAGug9oJnYoGPMU5gvPmFJMvUMXqm5rmbBgUWL5Rxw
 q1M6gVzabCyWUy6z2G2vaqW2ZntNVvCkdsLtIX0XZkcTzNoP0MA+TuhyGz4wbiuo
 PeyQp/mbGnDgCYggqKIA0YWrTVxkhFrKN520cbO8qXBQytV9oFbM/0/+C0/r/5WX
 pL1JVzLrh6l5ME7EIQfha8UOF9j8q4ueSwb40P3AR2NaZiDABM0zfUZ6+sx+91WF
 ucqPEcZB5cRE
 =1bW6
 -----END PGP SIGNATURE-----

Merge tag 'net-next-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Paolo Abeni:
 "The most interesting thing is probably the networking structs
  reorganization and a significant amount of changes is around
  self-tests.

  Core & protocols:

   - Analyze and reorganize core networking structs (socks, netdev,
     netns, mibs) to optimize cacheline consumption and set up build
     time warnings to safeguard against future header changes

     This improves TCP performances with many concurrent connections up
     to 40%

   - Add page-pool netlink-based introspection, exposing the memory
     usage and recycling stats. This helps indentify bad PP users and
     possible leaks

   - Refine TCP/DCCP source port selection to no longer favor even
     source port at connect() time when IP_LOCAL_PORT_RANGE is set. This
     lowers the time taken by connect() for hosts having many active
     connections to the same destination

   - Refactor the TCP bind conflict code, shrinking related socket
     structs

   - Refactor TCP SYN-Cookie handling, as a preparation step to allow
     arbitrary SYN-Cookie processing via eBPF

   - Tune optmem_max for 0-copy usage, increasing the default value to
     128KB and namespecifying it

   - Allow coalescing for cloned skbs coming from page pools, improving
     RX performances with some common configurations

   - Reduce extension header parsing overhead at GRO time

   - Add bridge MDB bulk deletion support, allowing user-space to
     request the deletion of matching entries

   - Reorder nftables struct members, to keep data accessed by the
     datapath first

   - Introduce TC block ports tracking and use. This allows supporting
     multicast-like behavior at the TC layer

   - Remove UAPI support for retired TC qdiscs (dsmark, CBQ and ATM) and
     classifiers (RSVP and tcindex)

   - More data-race annotations

   - Extend the diag interface to dump TCP bound-only sockets

   - Conditional notification of events for TC qdisc class and actions

   - Support for WPAN dynamic associations with nearby devices, to form
     a sub-network using a specific PAN ID

   - Implement SMCv2.1 virtual ISM device support

   - Add support for Batman-avd mulicast packet type

  BPF:

   - Tons of verifier improvements:
       - BPF register bounds logic and range support along with a large
         test suite
       - log improvements
       - complete precision tracking support for register spills
       - track aligned STACK_ZERO cases as imprecise spilled registers.
         This improves the verifier "instructions processed" metric from
         single digit to 50-60% for some programs
       - support for user's global BPF subprogram arguments with few
         commonly requested annotations for a better developer
         experience
       - support tracking of BPF_JNE which helps cases when the compiler
         transforms (unsigned) "a > 0" into "if a == 0 goto xxx" and the
         like
       - several fixes

   - Add initial TX metadata implementation for AF_XDP with support in
     mlx5 and stmmac drivers. Two types of offloads are supported right
     now, that is, TX timestamp and TX checksum offload

   - Fix kCFI bugs in BPF all forms of indirect calls from BPF into
     kernel and from kernel into BPF work with CFI enabled. This allows
     BPF to work with CONFIG_FINEIBT=y

   - Change BPF verifier logic to validate global subprograms lazily
     instead of unconditionally before the main program, so they can be
     guarded using BPF CO-RE techniques

   - Support uid/gid options when mounting bpffs

   - Add a new kfunc which acquires the associated cgroup of a task
     within a specific cgroup v1 hierarchy where the latter is
     identified by its id

   - Extend verifier to allow bpf_refcount_acquire() of a map value
     field obtained via direct load which is a use-case needed in
     sched_ext

   - Add BPF link_info support for uprobe multi link along with bpftool
     integration for the latter

   - Support for VLAN tag in XDP hints

   - Remove deprecated bpfilter kernel leftovers given the project is
     developed in user-space (https://github.com/facebook/bpfilter)

  Misc:

   - Support for parellel TC self-tests execution

   - Increase MPTCP self-tests coverage

   - Updated the bridge documentation, including several so-far
     undocumented features

   - Convert all the net self-tests to run in unique netns, to avoid
     random failures due to conflict and allow concurrent runs

   - Add TCP-AO self-tests

   - Add kunit tests for both cfg80211 and mac80211

   - Autogenerate Netlink families documentation from YAML spec

   - Add yml-gen support for fixed headers and recursive nests, the tool
     can now generate user-space code for all genetlink families for
     which we have specs

   - A bunch of additional module descriptions fixes

   - Catch incorrect freeing of pages belonging to a page pool

  Driver API:

   - Rust abstractions for network PHY drivers; do not cover yet the
     full C API, but already allow implementing functional PHY drivers
     in rust

   - Introduce queue and NAPI support in the netdev Netlink interface,
     allowing complete access to the device <> NAPIs <> queues
     relationship

   - Introduce notifications filtering for devlink to allow control
     application scale to thousands of instances

   - Improve PHY validation, requesting rate matching information for
     each ethtool link mode supported by both the PHY and host

   - Add support for ethtool symmetric-xor RSS hash

   - ACPI based Wifi band RFI (WBRF) mitigation feature for the AMD
     platform

   - Expose pin fractional frequency offset value over new DPLL generic
     netlink attribute

   - Convert older drivers to platform remove callback returning void

   - Add support for PHY package MMD read/write

  New hardware / drivers:

   - Ethernet:
       - Octeon CN10K devices
       - Broadcom 5760X P7
       - Qualcomm SM8550 SoC
       - Texas Instrument DP83TG720S PHY

   - Bluetooth:
       - IMC Networks Bluetooth radio

  Removed:

   - WiFi:
       - libertas 16-bit PCMCIA support
       - Atmel at76c50x drivers
       - HostAP ISA/PCMCIA style 802.11b driver
       - zd1201 802.11b USB dongles
       - Orinoco ISA/PCMCIA 802.11b driver
       - Aviator/Raytheon driver
       - Planet WL3501 driver
       - RNDIS USB 802.11b driver

  Driver updates:

   - Ethernet high-speed NICs:
       - Intel (100G, ice, idpf):
          - allow one by one port representors creation and removal
          - add temperature and clock information reporting
          - add get/set for ethtool's header split ringparam
          - add again FW logging
          - adds support switchdev hardware packet mirroring
          - iavf: implement symmetric-xor RSS hash
          - igc: add support for concurrent physical and free-running
            timers
          - i40e: increase the allowable descriptors
       - nVidia/Mellanox:
          - Preparation for Socket-Direct multi-dev netdev. That will
            allow in future releases combining multiple PFs devices
            attached to different NUMA nodes under the same netdev
       - Broadcom (bnxt):
          - TX completion handling improvements
          - add basic ntuple filter support
          - reduce MSIX vectors usage for MQPRIO offload
          - add VXLAN support, USO offload and TX coalesce completion
            for P7
       - Marvell Octeon EP:
          - xmit-more support
          - add PF-VF mailbox support and use it for FW notifications
            for VFs
       - Wangxun (ngbe/txgbe):
          - implement ethtool functions to operate pause param, ring
            param, coalesce channel number and msglevel
       - Netronome/Corigine (nfp):
          - add flow-steering support
          - support UDP segmentation offload

   - Ethernet NICs embedded, slower, virtual:
       - Xilinx AXI: remove duplicate DMA code adopting the dma engine
         driver
       - stmmac: add support for HW-accelerated VLAN stripping
       - TI AM654x sw: add mqprio, frame preemption & coalescing
       - gve: add support for non-4k page sizes.
       - virtio-net: support dynamic coalescing moderation

   - nVidia/Mellanox Ethernet datacenter switches:
       - allow firmware upgrade without a reboot
       - more flexible support for bridge flooding via the compressed
         FID flooding mode

   - Ethernet embedded switches:
       - Microchip:
          - fine-tune flow control and speed configurations in KSZ8xxx
          - KSZ88X3: enable setting rmii reference
       - Renesas:
          - add jumbo frames support
       - Marvell:
          - 88E6xxx: add "eth-mac" and "rmon" stats support

   - Ethernet PHYs:
       - aquantia: add firmware load support
       - at803x: refactor the driver to simplify adding support for more
         chip variants
       - NXP C45 TJA11xx: Add MACsec offload support

   - Wifi:
       - MediaTek (mt76):
          - NVMEM EEPROM improvements
          - mt7996 Extremely High Throughput (EHT) improvements
          - mt7996 Wireless Ethernet Dispatcher (WED) support
          - mt7996 36-bit DMA support
       - Qualcomm (ath12k):
          - support for a single MSI vector
          - WCN7850: support AP mode
       - Intel (iwlwifi):
          - new debugfs file fw_dbg_clear
          - allow concurrent P2P operation on DFS channels

   - Bluetooth:
       - QCA2066: support HFP offload
       - ISO: more broadcast-related improvements
       - NXP: better recovery in case receiver/transmitter get out of sync"

* tag 'net-next-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1714 commits)
  lan78xx: remove redundant statement in lan78xx_get_eee
  lan743x: remove redundant statement in lan743x_ethtool_get_eee
  bnxt_en: Fix RCU locking for ntuple filters in bnxt_rx_flow_steer()
  bnxt_en: Fix RCU locking for ntuple filters in bnxt_srxclsrldel()
  bnxt_en: Remove unneeded variable in bnxt_hwrm_clear_vnic_filter()
  tcp: Revert no longer abort SYN_SENT when receiving some ICMP
  Revert "mlx5 updates 2023-12-20"
  Revert "net: stmmac: Enable Per DMA Channel interrupt"
  ipvlan: Remove usage of the deprecated ida_simple_xx() API
  ipvlan: Fix a typo in a comment
  net/sched: Remove ipt action tests
  net: stmmac: Use interrupt mode INTM=1 for per channel irq
  net: stmmac: Add support for TX/RX channel interrupt
  net: stmmac: Make MSI interrupt routine generic
  dt-bindings: net: snps,dwmac: per channel irq
  net: phy: at803x: make read_status more generic
  net: phy: at803x: add support for cdt cross short test for qca808x
  net: phy: at803x: refactor qca808x cable test get status function
  net: phy: at803x: generalize cdt fault length function
  net: ethernet: cortina: Drop TSO support
  ...
2024-01-11 10:07:29 -08:00
Linus Torvalds 0c59ae1290 AFS fileserver rotation fix
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEqG5UsNXhtOCrfGQP+7dXa6fLC2sFAmWYJ6kACgkQ+7dXa6fL
 C2v6YBAAkDdqgWN96h2KOcd+El13Uxa3WNDjTHtzc0ZhjEDkzkU42sSF2yE0nerS
 6kX18vibXC+TPnbBn1gOSGrVoFIC1kh/vUjrz/UQYfxXN19P8LE2wSdl+bC4nPT1
 Qkrxkr+q4GSSJoYg9QUUAu0Hh2PvXMeDE/XyED6XiAkuDUbISO9yDeu+wo3wZM5L
 1e8vRlg/2EQl2v1Crh5nC0tgJZbGULc2mCqi/rU5A9wdlKHFzwjU+2PTsbQNKE0m
 0ueLblFeFRwBZpOfAUNNVAt3bwaSfhYpqUiiSldrU/JXhnx5CgY1kHzI3OPVedQt
 WMfp/epwO848i3qVM8dHJXc93NJeC3gTBK7gYRrH07MuK3Of1KRH3D8YBsE0/r0q
 NVcDQ6/eoni06CA8VMfSIEQ2+Q0m4xxUzAQURsOxRPY/FktzCKXMfpYTDZqbQfow
 SXrKmsPnMZe4DUnvdcTSU8B3+vybJH/JgEnZXRtCPOYNDSyMcPhKPG2ioOz4UV+M
 amQmpYfG4hzi1VmRrH57dwlXejBX16+zc9pLdZC5c0/phk3caYrJVMA8pwCOP4HM
 AvB5Yl6gH2aGj1kKjffL7nWnQ2QbD7VWUn98TqLPezOX7DwQHMMKvlfPnv6R87sy
 0HMmj9VxCgOvGLOf1JdQoTxtb49ndM4Y5fPvKYK2awW5FkAacLM=
 =bHoG
 -----END PGP SIGNATURE-----

Merge tag 'afs-fix-rotation-20240105' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

Pull afs updates from David Howells:
 "The majority of the patches are aimed at fixing and improving the AFS
  filesystem's rotation over server IP addresses, but there are also
  some fixes from Oleg Nesterov for the use of read_seqbegin_or_lock().

   - Fix fileserver probe handling so that the next round of probes
     doesn't break ongoing server/address rotation by clearing all the
     probe result tracking. This could occasionally cause the rotation
     algorithm to drop straight through, give a 'successful' result
     without actually emitting any RPC calls, leaving the reply buffer
     in an undefined state.

     Instead, detach the probe results into a separate struct and
     allocate a new one each time we start probing and update the
     pointer to it. Probes are also sent in order of address preference
     to try and improve the chance that the preferred one will complete
     first.

   - Fix server rotation so that it uses configurable address
     preferences across on the probes that have completed so far than
     ranking them by RTT as the latter doesn't necessarily give the best
     route. The preference list can be altered by writing into
     /proc/net/afs/addr_prefs.

   - Fix the handling of Read-Only (and Backup) volume callbacks as
     there is one per volume, not one per file, so if someone performs a
     command that, say, offlines the volume but doesn't change it, when
     it comes back online we don't spam the server with a status fetch
     for every vnode we're using. Instead, check the Creation timestamp
     in the VolSync record when prompted by a callback break.

   - Handle volume regression (ie. a RW volume being restored from a
     backup) by scrubbing all cache data for that volume. This is
     detected from the VolSync creation timestamp.

   - Adjust abort handling and abort -> error mapping to match better
     with what other AFS clients do.

   - Fix offline and busy volume state handling as they only apply to
     individual server instances and not entire volumes and the rotation
     algorithm should go and look at other servers if available. Also
     make it sleep briefly before each retry if all the volume instances
     are unavailable"

* tag 'afs-fix-rotation-20240105' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: (40 commits)
  afs: trace: Log afs_make_call(), including server address
  afs: Fix offline and busy message emission
  afs: Fix fileserver rotation
  afs: Overhaul invalidation handling to better support RO volumes
  afs: Parse the VolSync record in the reply of a number of RPC ops
  afs: Don't leave DONTUSE/NEWREPSITE servers out of server list
  afs: Fix comment in afs_do_lookup()
  afs: Apply server breaks to mmap'd files in the call processor
  afs: Move the vnode/volume validity checking code into its own file
  afs: Defer volume record destruction to a workqueue
  afs: Make it possible to find the volumes that are using a server
  afs: Combine the endpoint state bools into a bitmask
  afs: Keep a record of the current fileserver endpoint state
  afs: Dispatch vlserver probes in priority order
  afs: Dispatch fileserver probes in priority order
  afs: Mark address lists with configured priorities
  afs: Provide a way to configure address priorities
  afs: Remove the unimplemented afs_cmp_addr_list()
  afs: Add some more info to /proc/net/afs/servers
  rxrpc: Create a procfile to display outstanding client conn bundles
  ...
2024-01-10 10:11:01 -08:00
Breno Leitao aefb2f2e61 x86/bugs: Rename CONFIG_RETPOLINE => CONFIG_MITIGATION_RETPOLINE
Step 5/10 of the namespace unification of CPU mitigations related Kconfig options.

[ mingo: Converted a few more uses in comments/messages as well. ]

Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ariel Miculas <amiculas@cisco.com>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231121160740.1249350-6-leitao@debian.org
2024-01-10 10:52:28 +01:00
Linus Torvalds c604110e66 vfs-6.8.misc
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZZUxRQAKCRCRxhvAZXjc
 ov/QAQDzvge3oQ9MEymmOiyzzcF+HhAXBr+9oEsYJjFc1p0TsgEA61gXjZo7F1jY
 KBqd6znOZCR+Waj0kIVJRAo/ISRBqQc=
 =0bRl
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.8.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull misc vfs updates from Christian Brauner:
 "This contains the usual miscellaneous features, cleanups, and fixes
  for vfs and individual fses.

  Features:

   - Add Jan Kara as VFS reviewer

   - Show correct device and inode numbers in proc/<pid>/maps for vma
     files on stacked filesystems. This is now easily doable thanks to
     the backing file work from the last cycles. This comes with
     selftests

  Cleanups:

   - Remove a redundant might_sleep() from wait_on_inode()

   - Initialize pointer with NULL, not 0

   - Clarify comment on access_override_creds()

   - Rework and simplify eventfd_signal() and eventfd_signal_mask()
     helpers

   - Process aio completions in batches to avoid needless wakeups

   - Completely decouple struct mnt_idmap from namespaces. We now only
     keep the actual idmapping around and don't stash references to
     namespaces

   - Reformat maintainer entries to indicate that a given subsystem
     belongs to fs/

   - Simplify fput() for files that were never opened

   - Get rid of various pointless file helpers

   - Rename various file helpers

   - Rename struct file members after SLAB_TYPESAFE_BY_RCU switch from
     last cycle

   - Make relatime_need_update() return bool

   - Use GFP_KERNEL instead of GFP_USER when allocating superblocks

   - Replace deprecated ida_simple_*() calls with their current ida_*()
     counterparts

  Fixes:

   - Fix comments on user namespace id mapping helpers. They aren't
     kernel doc comments so they shouldn't be using /**

   - s/Retuns/Returns/g in various places

   - Add missing parameter documentation on can_move_mount_beneath()

   - Rename i_mapping->private_data to i_mapping->i_private_data

   - Fix a false-positive lockdep warning in pipe_write() for watch
     queues

   - Improve __fget_files_rcu() code generation to improve performance

   - Only notify writer that pipe resizing has finished after setting
     pipe->max_usage otherwise writers are never notified that the pipe
     has been resized and hang

   - Fix some kernel docs in hfsplus

   - s/passs/pass/g in various places

   - Fix kernel docs in ntfs

   - Fix kcalloc() arguments order reported by gcc 14

   - Fix uninitialized value in reiserfs"

* tag 'vfs-6.8.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (36 commits)
  reiserfs: fix uninit-value in comp_keys
  watch_queue: fix kcalloc() arguments order
  ntfs: dir.c: fix kernel-doc function parameter warnings
  fs: fix doc comment typo fs tree wide
  selftests/overlayfs: verify device and inode numbers in /proc/pid/maps
  fs/proc: show correct device and inode numbers in /proc/pid/maps
  eventfd: Remove usage of the deprecated ida_simple_xx() API
  fs: super: use GFP_KERNEL instead of GFP_USER for super block allocation
  fs/hfsplus: wrapper.c: fix kernel-doc warnings
  fs: add Jan Kara as reviewer
  fs/inode: Make relatime_need_update return bool
  pipe: wakeup wr_wait after setting max_usage
  file: remove __receive_fd()
  file: stop exposing receive_fd_user()
  fs: replace f_rcuhead with f_task_work
  file: remove pointless wrapper
  file: s/close_fd_get_file()/file_close_fd()/g
  Improve __fget_files_rcu() code generation (and thus __fget_light())
  file: massage cleanup of files that failed to open
  fs/pipe: Fix lockdep false-positive in watchqueue pipe_write()
  ...
2024-01-08 10:26:08 -08:00
Pedro Tammela 405cd9fc6f net/sched: simplify tc_action_load_ops parameters
Instead of using two bools derived from a flags passed as arguments to
the parent function of tc_action_load_ops, just pass the flags itself
to tc_action_load_ops to simplify its parameters.

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-07 14:58:26 +00:00
Jakub Kicinski e63c1822ac Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

Conflicts:

drivers/net/ethernet/broadcom/bnxt/bnxt.c
  e009b2efb7 ("bnxt_en: Remove mis-applied code from bnxt_cfg_ntp_filters()")
  0f2b214779 ("bnxt_en: Fix compile error without CONFIG_RFS_ACCEL")
https://lore.kernel.org/all/20240105115509.225aa8a2@canb.auug.org.au/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-04 18:06:46 -08:00
Jakub Kicinski 172b3fccf5 This pull request mainly brings support for dynamic associations in the
WPAN world. Thanks to the recent improvements it was possible to
 discover nearby devices, it is now also possible to associate with them
 to form a sub-network using a specific PAN ID. The support includes
 several functions, such as:
 * Requesting an association to a coordinator, waiting for the response
 * Sending a disassociation notification to a coordinator
 * Receiving an association request when we are coordinator, answering
   the request (for now all devices are accepted up to a limit, to be
   refined)
 * Sending a disassociation notification to a child
 * Users may request the list of associated devices (the parent and the
   children).
 
 Here are a few example of userspace calls that can be made:
 iwpan dev <dev> associate pan_id 2 coord $COORD
 iwpan dev <dev> list_associations
 iwpan dev <dev> disassociate ext_addr $COORD
 
 There are as well two patches from Uwe turning remove callbacks into
 void functions.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEE9HuaYnbmDhq/XIDIJWrqGEe9VoQFAmWCqiYACgkQJWrqGEe9
 VoQxEwf+KzU7esLoTgtIRG+wIQyMaBm8qZ/YNdWwgEiD+kwbybdy258etavumIMN
 HSKxRypXqJwR+R4b9l6yhrLmwn83HNR0L1oq6sHooBD+Zr90Ac8NNgONcF30iRQS
 jSiQ3A7H4kTqEaSsZe3olLotVnhTrLvL/TCXt83FPst551Fj0VSDyRnrbETEmq5F
 s+WJcB8rGYWY6avv5CbWcUDOPPq7yiZlpgcA+qTIDFMUfD90p+xqUZQ2hZKlscMB
 Ia/FPW3gUufWFGd16d1WBaHD38DoLobfo/6Ou/Sufp+ErW17ZOGl4wNnViXlOpwD
 0BUEO6vhtAx+83VYAC522PABWHkxyw==
 =THkg
 -----END PGP SIGNATURE-----

Merge tag 'ieee802154-for-net-next-2023-12-20' of gitolite.kernel.org:pub/scm/linux/kernel/git/wpan/wpan-next

Miquel Raynal says:

====================
This pull request mainly brings support for dynamic associations in
the WPAN world. Thanks to the recent improvements it was possible to
discover nearby devices, it is now also possible to associate with them
to form a sub-network using a specific PAN ID. The support includes
several functions, such as:

* Requesting an association to a coordinator, waiting for the response
* Sending a disassociation notification to a coordinator
* Receiving an association request when we are coordinator, answering
  the request (for now all devices are accepted up to a limit, to be
  refined)
* Sending a disassociation notification to a child
* Users may request the list of associated devices (the parent and the
  children).

Here are a few example of userspace calls that can be made:
 # iwpan dev <dev> associate pan_id 2 coord $COORD
 # iwpan dev <dev> list_associations
 # iwpan dev <dev> disassociate ext_addr $COORD

There are as well two patches from Uwe turning remove callbacks into
void functions.

* tag 'ieee802154-for-net-next-2023-12-20' of gitolite.kernel.org:pub/scm/linux/kernel/git/wpan/wpan-next:
  mac802154: Avoid new associations while disassociating
  ieee802154: Avoid confusing changes after associating
  mac802154: Only allow PAN controllers to process association requests
  mac802154: Use the PAN coordinator parameter when stamping packets
  mac80254: Provide real PAN coordinator info in beacons
  ieee802154: Give the user the association list
  mac802154: Handle disassociation notifications from peers
  mac802154: Follow the number of associated devices
  ieee802154: Add support for limiting the number of associated devices
  mac802154: Handle association requests from peers
  mac802154: Handle disassociations
  ieee802154: Add support for user disassociation requests
  mac802154: Handle associating
  ieee802154: Add support for user association requests
  ieee802154: Internal PAN management
  ieee802154: Let PAN IDs be reset
  ieee802154: hwsim: Convert to platform remove callback returning void
  ieee802154: fakelb: Convert to platform remove callback returning void
====================

Link: https://lore.kernel.org/r/20231220095556.4d9cef91@xps-13
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-04 14:20:14 -08:00
Dmitry Safonov 4c8530dc7d net/tcp: Only produce AO/MD5 logs if there are any keys
User won't care about inproper hash options in the TCP header if they
don't use neither TCP-AO nor TCP-MD5. Yet, those logs can add up in
syslog, while not being a real concern to the host admin:
> kernel: TCP: TCP segment has incorrect auth options set for XX.20.239.12.54681->XX.XX.90.103.80 [S]

Keep silent and avoid logging when there aren't any keys in the system.

Side-note: I also defined static_branch_tcp_*() helpers to avoid more
ifdeffery, going to remove more ifdeffery further with their help.

Reported-by: Christian Kujau <lists@nerdbynature.de>
Closes: https://lore.kernel.org/all/f6b59324-1417-566f-a976-ff2402718a8d@nerdbynature.de/
Signed-off-by: Dmitry Safonov <dima@arista.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Fixes: 2717b5adea ("net/tcp: Add tcp_hash_fail() ratelimited logs")
Link: https://lore.kernel.org/r/20240104-tcp_hash_fail-logs-v1-1-ff3e1f6f9e72@arista.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-04 09:07:04 -08:00
Pedro Tammela c2a67de9bb net/sched: introduce ACT_P_BOUND return code
Bound actions always return '0' and as of today we rely on '0'
being returned in order to properly skip bound actions in
tcf_idr_insert_many. In order to further improve maintainability,
introduce the ACT_P_BOUND return code.

Actions are updated to return 'ACT_P_BOUND' instead of plain '0'.
tcf_idr_insert_many is then updated to check for 'ACT_P_BOUND'.

Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20231229132642.1489088-1-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-03 18:36:24 -08:00
Zhengchao Shao b4c1d4d973 fib: remove unnecessary input parameters in fib_default_rule_add
When fib_default_rule_add is invoked, the value of the input parameter
'flags' is always 0. Rules uses kzalloc to allocate memory, so 'flags' has
been initialized to 0. Therefore, remove the input parameter 'flags' in
fib_default_rule_add.

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240102071519.3781384-1-shaozhengchao@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-03 16:42:48 -08:00
Vladimir Oltean 8dc4c41000 xsk: make struct xsk_cb_desc available outside CONFIG_XDP_SOCKETS
The ice driver fails to build when CONFIG_XDP_SOCKETS is disabled.

drivers/net/ethernet/intel/ice/ice_base.c:533:21: error:
variable has incomplete type 'struct xsk_cb_desc'
        struct xsk_cb_desc desc = {};
                           ^
include/net/xsk_buff_pool.h:15:8: note:
forward declaration of 'struct xsk_cb_desc'
struct xsk_cb_desc;
       ^

Fixes: d68d707dcb ("ice: Support XDP hints in AF_XDP ZC mode")
Closes: https://lore.kernel.org/netdev/8b76dad3-8847-475b-aa17-613c9c978f7a@infradead.org/
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Larysa Zaremba <larysa.zaremba@intel.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Link: https://lore.kernel.org/r/20231219110205.1289506-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-02 15:27:35 -08:00
David S. Miller 8a48a2dc24 bluetooth-next pull request for net-next:
- btnxpuart: Fix recv_buf return value
  - L2CAP: Fix responding with multiple rejects
  - Fix atomicity violation in {min,max}_key_size_set
  - ISO: Allow binding a PA sync socket
  - ISO: Reassociate a socket with an active BIS
  - ISO: Avoid creating child socket if PA sync is terminating
  - Add device 13d3:3572 IMC Networks Bluetooth Radio
  - Don't suspend when there are connections
  - Remove le_restart_scan work
  - Fix bogus check for re-auth not supported with non-ssp
  - lib: Add documentation to exported functions
  - Support HFP offload for QCA2066
 -----BEGIN PGP SIGNATURE-----
 
 iQJNBAABCAA3FiEE7E6oRXp8w05ovYr/9JCA4xAyCykFAmWF2P4ZHGx1aXoudm9u
 LmRlbnR6QGludGVsLmNvbQAKCRD0kIDjEDILKZ3eEACgIk7eorrf+isIBZjq+qGL
 0ESqGe2ZnRupnJs6fRFBY2L8jWkfZbURzUKXVpt0fVrQAb8qzrEzwD/ucApT38SY
 LO1lzLhPLfUpwXS4jW5pFvA+tLDS6sb9fBd/tp1ikaNavU3FTfWFeGJmJH8nubHc
 qZqOkBblm34yrdlEZwxiJxviVmbAsHHSu1AWXZ9yTO5Z5HRNUN0jxBYjOPVHkrdx
 SdP8J+icTMQ7ywoFEOLAnDaJi+bKMPwyGlz6opx75L0LkdbqjFwdu+1X6ZtjLsP3
 c51RJtdJgBJY8TX5MmwMEakHQSbcGBUPeEMhZqrrzY1A/2dsb85T4acXgU1iV6Ws
 SkYFFeEWiBhV/WJn4a/mZzMASRzpyIfTEeVfbl2Qn23wHoBwacYhFGm/be60L0ar
 jwO/Fk5qaUTWBV5DQAF8gBK+pNxe4phGFNdWwu6v43IG9DfVdvLVjiO9dBqsvTnS
 1StqENzmnhqCYJ5+3hyU2uqAYW4DopvuYD5EgMGEfaJ3jzzlWF34Gs4OVCWUBL02
 qh1Zy/4VrduuY0FRRt9foj8Ug4IvM2+a84bx4ZhW6N6BhkHOsrKaXPGf0Ng0SRUj
 vH7xHelsruiE3HGc9pYZyDavXitRbvJxdF2tHD8cxpiGkUch9hTNqSaV+xkHxck6
 4bAeyoVlAGKa/wyWvrdcTg==
 =wVdT
 -----END PGP SIGNATURE-----

Merge tag 'for-net-next-2023-12-22' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next

Luiz Augusto von Dentz says:

====================
bluetooth-next pull request for net-next:

 - btnxpuart: Fix recv_buf return value
 - L2CAP: Fix responding with multiple rejects
 - Fix atomicity violation in {min,max}_key_size_set
 - ISO: Allow binding a PA sync socket
 - ISO: Reassociate a socket with an active BIS
 - ISO: Avoid creating child socket if PA sync is terminating
 - Add device 13d3:3572 IMC Networks Bluetooth Radio
 - Don't suspend when there are connections
 - Remove le_restart_scan work
 - Fix bogus check for re-auth not supported with non-ssp
 - lib: Add documentation to exported functions
 - Support HFP offload for QCA2066
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-02 13:43:23 +00:00
David S. Miller a27359abc8 wireless-next patches for v6.8
The third "new features" pull request for v6.8. This is a smaller one
 to clear up our tree before the break and nothing really noteworthy
 this time.
 
 Major changes:
 
 stack
 
 * cfg80211: introduce cfg80211_ssid_eq() for SSID matching
 
 * cfg80211: support P2P operation on DFS channels
 
 * mac80211: allow 64-bit radiotap timestamps
 
 iwlwifi
 
 * AX210: allow concurrent P2P operation on DFS channels
 -----BEGIN PGP SIGNATURE-----
 
 iQFFBAABCgAvFiEEiBjanGPFTz4PRfLobhckVSbrbZsFAmWFbnIRHGt2YWxvQGtl
 cm5lbC5vcmcACgkQbhckVSbrbZs5hQf/aCvvTjqeRoMkmO+ZPFMSO+YquZNCJi1M
 TP8Fce2ALKj7woPad8vdJNNStMa9k4bu2NvShMXhoYM3xOA/4o0P9yeb5OfyYkTk
 Y6JF+SoBGzABtB3m/a3i5J19F+oC+6yKN6/OY8byfK4jqZdrAprc3qXwodC5zb9n
 blC16KKlldjoj5AWe/b6Vn/LI9P7mVhZIaWxI9IaktK0eIgfsfcgIZLuuMJPq5DJ
 NjvhmK++qCcTQrJo/4TMVoWmcZZKR3XzcSs++HYRELNCwcM2q9s07R4KkV81aB0t
 RpCaCWa2KVUCrKdk3FlnG5pS7A6US5KGP4g6sSQRnq8t3IYDUbDo5Q==
 =Bte8
 -----END PGP SIGNATURE-----

Merge tag 'wireless-next-2023-12-22' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next

Kalle Valo says:

====================
wireless-next patches for v6.8

The third "new features" pull request for v6.8. This is a smaller one
to clear up our tree before the break and nothing really noteworthy
this time.

Major changes:

stack

* cfg80211: introduce cfg80211_ssid_eq() for SSID matching

* cfg80211: support P2P operation on DFS channels

* mac80211: allow 64-bit radiotap timestamps

iwlwifi

* AX210: allow concurrent P2P operation on DFS channels
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-02 12:46:10 +00:00
Jamal Hadi Salim ba24ea1291 net/sched: Retire ipt action
The tc ipt action was intended to run all netfilter/iptables target.
Unfortunately it has not benefitted over the years from proper updates when
netfilter changes, and for that reason it has remained rudimentary.
Pinging a bunch of people that i was aware were using this indicates that
removing it wont affect them.
Retire it to reduce maintenance efforts. Buh-bye.

Reviewed-by: Victor Noguiera <victor@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-02 12:41:16 +00:00
David S. Miller 109bf4cfe1 netfilter pull request 23-12-22
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEN9lkrMBJgcdVAPub1V2XiooUIOQFAmWFd9oACgkQ1V2XiooU
 IOTBog//auEj3LR5v2b5et6CLsuMESsCFkVko1AFdzzpu94sJ8TTkqlrW7SPSfPW
 CMNYz1EYFaRGK1GjfH0a49EUofeU/kPD/TmffmiYwuJlGyEIXp17ZIUQnGDrM6Mz
 UBHX8VrW73McJYuZJbxM4HX6Q3aFyshKy8iOPpffdFAQcCD5XtkYbW24Mzw6V4sI
 8fBz6C/iT+5Jq1wGtvM2xmGkNYuMLJxbf9qOpinRZCXQV/z5IWda/bFnv3kpJ9mR
 x2ZYXHnzFEr2gVFIhZ7G6FvMgQAkkKGVQE+n9zVZ9V78czqBk9depdeLaq/RYGyx
 8VYqJyaXHKQm7FqsBwSSxU964aQhi/zNrjF9SeJSfcRNhTmXgzdERndVWVjUPe9O
 rFLyQVoQ6JyKy5++1lL3+63cd6ZeGw8gUw8MCw9DX8rwamnNFunUsixKv4SLzQHA
 jfFgmAUq+HVWHXI4xmj+SDO8g+s7O3BpbZM09E3si6lwy4/LiQqHhV4sNgEVN/It
 hf76t/U6GRYZ0Hwy2dwQl8SZg1BxMSGFmzB7vyEZyGXaqxRZb1Jym4h9slOMKXWl
 gS5y4y0ZVTmzRCqJR+jUrMAKPCw/R964LrslDYwQpsQeP7nuUPeU5jH1VWD6kAbS
 Vc36uDRnfojoNiRcmAfuhvZIseM1+hI6UXCIsqYR6cekCFlBad8=
 =+Uvv
 -----END PGP SIGNATURE-----

Merge tag 'nf-next-23-12-22' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next

Pablo Neira Ayuso says:

====================
netfilter pull request 23-12-22

The following patchset contains Netfilter updates for net-next:

1) Add locking for NFT_MSG_GETSETELEM_RESET requests, to address a
   race scenario with two concurrent processes running a dump-and-reset
   which exposes negative counters to userspace, from Phil Sutter.

2) Use GFP_KERNEL in pipapo GC, from Florian Westphal.

3) Reorder nf_flowtable struct members, place the read-mostly parts
   accessed by the datapath first. From Florian Westphal.

4) Set on dead flag for NFT_MSG_NEWSET in abort path,
   from Florian Westphal.

5) Support filtering zone in ctnetlink, from Felix Huettner.

6) Bail out if user tries to redefine an existing chain with different
   type in nf_tables.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-01 16:15:40 +00:00
Ido Schimmel cd4d7263d5 genetlink: Use internal flags for multicast groups
As explained in commit e03781879a ("drop_monitor: Require
'CAP_SYS_ADMIN' when joining "events" group"), the "flags" field in the
multicast group structure reuses uAPI flags despite the field not being
exposed to user space. This makes it impossible to extend its use
without adding new uAPI flags, which is inappropriate for internal
kernel checks.

Solve this by adding internal flags (i.e., "GENL_MCAST_*") and convert
the existing users to use them instead of the uAPI flags.

Tested using the reproducers in commit 44ec98ea5e ("psample: Require
'CAP_NET_ADMIN' when joining "packets" group") and commit e03781879a
("drop_monitor: Require 'CAP_SYS_ADMIN' when joining "events" group").

No functional changes intended.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-29 08:43:59 +00:00
David S. Miller a4255b2e5c netfilter pull request 23-12-20
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEN9lkrMBJgcdVAPub1V2XiooUIOQFAmWC/3kACgkQ1V2XiooU
 IOTDcg/+OJ4UwLrNF7GPjx3Bf76ntgkzqL8GCPYP2IMNgE7F9JBhv5t648tg0XOJ
 Pf3NAHtS0Trb8bCCwN9SWMKP2Zx/ntPlebrzp+SSlnkqzBRAl2550s+e8tcYKc9y
 S2XaQiAMOvcamMOxDbKQD2GcWqi05gEpE8w+ov5L1iXMhgFcHtPAm79H8XvqyDaj
 HKQ2B9b/4XxIexqiCnTWH4RLFq4+w3q1axUcv5GRkEFO/w3fouQ8f5FynjQOcSgp
 qD3KBVh6tJVTYj5OwhcvIi3BV/n+suiK9tcd0IarDlmUXY2MI0748W+9FLmHbMU6
 cl2IhIrVEoyOrBoThlmV6Fq2qVRZlYq/mHfTqEfWLYaqJ2iZ1f+I5nG2Gx+oq45p
 7cxSuvHN72QBhzLh1ry0tJItWGNfejnWzf4/71/eSL21wCxijoI2v2TOc8myONLZ
 qdiSyaU3Kz4blGqnRIMhMNArAkXohqEdfXrFfDSLi6lXBABgh/JmE0eJWXUgV/xU
 /PBrt+SM07NqUP02J63rvgehlfn5DEYsPt+b15Lnqu0BQNuTJYDRbu2TFNLx9TrR
 yASWXuqOB/f5mAos0xQT9wG6BTQvBTxgzvuAd9fC0oaAvAEa5JbojPLcNSXoecJO
 K5priJ0coMme/HAZNgfiw8d+hPWFGScIYFyz89meokIlCpEKdT0=
 =GlJr
 -----END PGP SIGNATURE-----

Merge tag 'nf-23-12-20' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Pablu Neira Syuso says:

====================
netfilter pull request 23-12-20

The following patchset contains Netfilter fixes for net:

1) Skip set commit for deleted/destroyed sets, this might trigger
   double deactivation of expired elements.

2) Fix packet mangling from egress, set transport offset from
   mac header for netdev/egress.

Both fixes address bugs already present in several releases.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-29 07:57:59 +00:00
Greg Kroah-Hartman f732ba4ac9 iucv: make iucv_bus const
Now that the driver core can properly handle constant struct bus_type,
move the iucv_bus variable to be a constant structure as well, placing
it into read-only memory which can not be modified at runtime.

Cc: Wenjia Zhang <wenjia@linux.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: linux-s390@vger.kernel.org
Cc: netdev@vger.kernel.org
Acked-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-29 07:46:38 +00:00
Jonathan Corbet 144377c340 net: sock: remove excess structure-member documentation
Remove a couple of kerneldoc entries for struct members that do not exist,
addressing these warnings:

  ./include/net/sock.h:548: warning: Excess struct member '__sk_flags_offset' description in 'sock'
  ./include/net/sock.h:548: warning: Excess struct member 'sk_padding' description in 'sock'

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-29 01:21:14 +00:00
Radu Pirea (NXP OSS) a73d8779d6 net: macsec: introduce mdo_insert_tx_tag
Offloading MACsec in PHYs requires inserting the SecTAG and the ICV in
the ethernet frame. This operation will increase the frame size with up
to 32 bytes. If the frames are sent at line rate, the PHY will not have
enough room to insert the SecTAG and the ICV.

Some PHYs use a hardware buffer to store a number of ethernet frames and,
if it fills up, a pause frame is sent to the MAC to control the flow.
This HW implementation does not need any modification in the stack.

Other PHYs might offer to use a specific ethertype with some padding
bytes present in the ethernet frame. This ethertype and its associated
bytes will be replaced by the SecTAG and ICV.

mdo_insert_tx_tag allows the PHY drivers to add any specific tag in the
skb.

Signed-off-by: Radu Pirea (NXP OSS) <radu-nicolae.pirea@oss.nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-27 13:08:10 +00:00
Radu Pirea (NXP OSS) eb97b9bd38 net: macsec: documentation for macsec_context and macsec_ops
Add description for fields of struct macsec_context and struct
macsec_ops.

Signed-off-by: Radu Pirea (NXP OSS) <radu-nicolae.pirea@oss.nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-27 13:08:09 +00:00
Radu Pirea (NXP OSS) b1c036e835 net: macsec: move sci_to_cpu to macsec header
Move sci_to_cpu to the MACsec header to use it in drivers.

Signed-off-by: Radu Pirea (NXP OSS) <radu-nicolae.pirea@oss.nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-27 13:08:09 +00:00
Victor Nogueira 42f39036cd net/sched: act_mirred: Allow mirred to block
So far the mirred action has dealt with syntax that handles
mirror/redirection for netdev. A matching packet is redirected or mirrored
to a target netdev.

In this patch we enable mirred to mirror to a tc block as well.
IOW, the new syntax looks as follows:
... mirred <ingress | egress> <mirror | redirect> [index INDEX] < <blockid BLOCKID> | <dev <devname>> >

Examples of mirroring or redirecting to a tc block:
$ tc filter add block 22 protocol ip pref 25 \
  flower dst_ip 192.168.0.0/16 action mirred egress mirror blockid 22

$ tc filter add block 22 protocol ip pref 25 \
  flower dst_ip 10.10.10.10/32 action mirred egress redirect blockid 22

Co-developed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Co-developed-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-26 21:20:09 +00:00
Victor Nogueira a7042cf8f2 net/sched: cls_api: Expose tc block to the datapath
The datapath can now find the block of the port in which the packet arrived
at.

In the next patch we show a possible usage of this patch in a new
version of mirred that multicasts to all ports except for the port in
which the packet arrived on.

Co-developed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Co-developed-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-26 21:20:08 +00:00
Victor Nogueira 913b47d342 net/sched: Introduce tc block netdev tracking infra
This commit makes tc blocks track which ports have been added to them.
And, with that, we'll be able to use this new information to send
packets to the block's ports. Which will be done in the patch #3 of this
series.

Suggested-by: Jiri Pirko <jiri@nvidia.com>
Co-developed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Co-developed-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-26 21:20:08 +00:00
Denis Kirjanov b1dffcf0da net: remove SOCK_DEBUG macro
Since there are no more users of the macro let's finally
burn it

Signed-off-by: Denis Kirjanov <dkirjanov@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-26 20:31:01 +00:00
Wen Gu b3bf76024f net/smc: manage system EID in SMC stack instead of ISM driver
The System EID (SEID) is an internal EID that is used by the SMCv2
software stack that has a predefined and constant value representing
the s390 physical machine that the OS is executing on. So it should
be managed by SMC stack instead of ISM driver and be consistent for
all ISMv2 device (including virtual ISM devices) on s390 architecture.

Suggested-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Reviewed-and-tested-by: Wenjia Zhang <wenjia@linux.ibm.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-26 20:24:33 +00:00
Wen Gu b40584d145 net/smc: compatible with 128-bits extended GID of virtual ISM device
According to virtual ISM support feature defined by SMCv2.1, GIDs of
virtual ISM device are UUIDs defined by RFC4122, which are 128-bits
long. So some adaptation work is required. And note that the GIDs of
existing platform firmware ISM devices still remain 64-bits long.

Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-26 20:24:33 +00:00
David Howells 72904d7b9b rxrpc, afs: Allow afs to pin rxrpc_peer objects
Change rxrpc's API such that:

 (1) A new function, rxrpc_kernel_lookup_peer(), is provided to look up an
     rxrpc_peer record for a remote address and a corresponding function,
     rxrpc_kernel_put_peer(), is provided to dispose of it again.

 (2) When setting up a call, the rxrpc_peer object used during a call is
     now passed in rather than being set up by rxrpc_connect_call().  For
     afs, this meenat passing it to rxrpc_kernel_begin_call() rather than
     the full address (the service ID then has to be passed in as a
     separate parameter).

 (3) A new function, rxrpc_kernel_remote_addr(), is added so that afs can
     get a pointer to the transport address for display purposed, and
     another, rxrpc_kernel_remote_srx(), to gain a pointer to the full
     rxrpc address.

 (4) The function to retrieve the RTT from a call, rxrpc_kernel_get_srtt(),
     is then altered to take a peer.  This now returns the RTT or -1 if
     there are insufficient samples.

 (5) Rename rxrpc_kernel_get_peer() to rxrpc_kernel_call_get_peer().

 (6) Provide a new function, rxrpc_kernel_get_peer(), to get a ref on a
     peer the caller already has.

This allows the afs filesystem to pin the rxrpc_peer records that it is
using, allowing faster lookups and pointer comparisons rather than
comparing sockaddr_rxrpc contents.  It also makes it easier to get hold of
the RTT.  The following changes are made to afs:

 (1) The addr_list struct's addrs[] elements now hold a peer struct pointer
     and a service ID rather than a sockaddr_rxrpc.

 (2) When displaying the transport address, rxrpc_kernel_remote_addr() is
     used.

 (3) The port arg is removed from afs_alloc_addrlist() since it's always
     overridden.

 (4) afs_merge_fs_addr4() and afs_merge_fs_addr6() do peer lookup and may
     now return an error that must be handled.

 (5) afs_find_server() now takes a peer pointer to specify the address.

 (6) afs_find_server(), afs_compare_fs_alists() and afs_merge_fs_addr[46]{}
     now do peer pointer comparison rather than address comparison.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2023-12-24 15:22:50 +00:00
Kuniyuki Iwashima 8191792c18 tcp: Remove dead code and fields for bhash2.
Now all sockets including TIME_WAIT are linked to bhash2 using
sock_common.skc_bind_node.

We no longer use inet_bind2_bucket.deathrow, sock.sk_bind2_node,
and inet_timewait_sock.tw_bind2_node.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-22 22:15:35 +00:00
Kuniyuki Iwashima 770041d337 tcp: Link sk and twsk to tb2->owners using skc_bind_node.
Now we can use sk_bind_node/tw_bind_node for bhash2, which means
we need not link TIME_WAIT sockets separately.

The dead code and sk_bind2_node will be removed in the next patch.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-22 22:15:35 +00:00
Kuniyuki Iwashima b2cb9f9ef2 tcp: Unlink sk from bhash.
Now we do not use tb->owners and can unlink sockets from bhash.

sk_bind_node/tw_bind_node are available for bhash2 and will be
used in the following patch.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-22 22:15:35 +00:00
Kuniyuki Iwashima 822fb91fc7 tcp: Link bhash2 to bhash.
bhash2 added a new member sk_bind2_node in struct sock to link
sockets to bhash2 in addition to bhash.

bhash is still needed to search conflicting sockets efficiently
from a port for the wildcard address.  However, bhash itself need
not have sockets.

If we link each bhash2 bucket to the corresponding bhash bucket,
we can iterate the same set of the sockets from bhash2 via bhash.

This patch links bhash2 to bhash only, and the actual use will be
in the later patches.  Finally, we will remove sk_bind2_node.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-22 22:15:34 +00:00
Kuniyuki Iwashima 5a22bba13d tcp: Save address type in inet_bind2_bucket.
inet_bind2_bucket_addr_match() and inet_bind2_bucket_match_addr_any()
are called for each bhash2 bucket to check conflicts.  Thus, we call
ipv6_addr_any() and ipv6_addr_v4mapped() over and over during bind().

Let's avoid calling them by saving the address type in inet_bind2_bucket.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-22 22:15:34 +00:00
Kuniyuki Iwashima 06a8c04f89 tcp: Save v4 address as v4-mapped-v6 in inet_bind2_bucket.v6_rcv_saddr.
In bhash2, IPv4/IPv6 addresses are saved in two union members,
which complicate address checks in inet_bind2_bucket_addr_match()
and inet_bind2_bucket_match_addr_any() considering uninitialised
memory and v4-mapped-v6 conflicts.

Let's simplify that by saving IPv4 address as v4-mapped-v6 address
and defining tb2.rcv_saddr as tb2.v6_rcv_saddr.s6_addr32[3].

Then, we can compare v6 address as is, and after checking v4-mapped-v6,
we can compare v4 address easily.  Also, we can remove tb2->family.

Note these functions will be further refactored in the next patch.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-22 22:15:34 +00:00
Luiz Augusto von Dentz d03376c185 Bluetooth: Fix bogus check for re-auth no supported with non-ssp
This reverts 19f8def031
"Bluetooth: Fix auth_complete_evt for legacy units" which seems to be
working around a bug on a broken controller rather then any limitation
imposed by the Bluetooth spec, in fact if there ws not possible to
re-auth the command shall fail not succeed.

Fixes: 19f8def031 ("Bluetooth: Fix auth_complete_evt for legacy units")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-12-22 12:56:21 -05:00
Luiz Augusto von Dentz 78db544b5d Bluetooth: hci_core: Remove le_restart_scan work
This removes le_restart_scan work and instead just disables controller
duplicate filtering when discovery result_filtering is enabled and
HCI_QUIRK_STRICT_DUPLICATE_FILTER is set.

Link: https://github.com/bluez/bluez/issues/573
Link: https://github.com/bluez/bluez/issues/572
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-12-22 12:55:17 -05:00
Iulia Tanasescu fa224d0c09 Bluetooth: ISO: Reassociate a socket with an active BIS
For ISO Broadcast, all BISes from a BIG have the same lifespan - they
cannot be created or terminated independently from each other.

This links together all BIS hcons that are part of the same BIG, so all
hcons are kept alive as long as the BIG is active.

If multiple BIS sockets are opened for a BIG handle, and only part of
them are closed at some point, the associated hcons will be marked as
open. If new sockets will later be opened for the same BIG, they will
be reassociated with the open BIS hcons.

All BIS hcons will be cleaned up and the BIG will be terminated when
the last BIS socket is closed from userspace.

Signed-off-by: Iulia Tanasescu <iulia.tanasescu@nxp.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-12-22 12:53:29 -05:00
Florian Westphal 3fde94b6e9 netfilter: flowtable: reorder nf_flowtable struct members
Place the read-mostly parts accessed by the datapath first.

In particular, we do access ->flags member (to see if HW offload
is enabled) for every single packet, but this is placed in the 5th
cacheline.

priority could stay where it is, but move it too to cover a hole.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-12-22 12:08:38 +01:00
Paolo Abeni 56794e5358 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

Adjacent changes:

drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
  23c93c3b62 ("bnxt_en: do not map packet buffers twice")
  6d1add9553 ("bnxt_en: Modify TX ring indexing logic.")

tools/testing/selftests/net/Makefile
  2258b66648 ("selftests: add vlan hw filter tests")
  a0bc96c0cd ("selftests: net: verify fq per-band packet limit")

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-21 22:17:23 +01:00
David Ahern 5a78a8121c net/ipv6: Remove gc_link warn on in fib6_info_release
A revert of
   3dec89b14d ("net/ipv6: Remove expired routes with a separated list of routes")
was sent for net-next. Revert the remainder of 5a08d0065a
which added a warn on if a fib entry is still on the gc_link list
to avoid compile failures when net is merged to net-next

Signed-off-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20231219030742.25715-1-dsahern@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-21 22:16:47 +01:00
Miri Korenblit e993af2ed2 wifi: mac80211: add a driver callback to check active_links
During ieee80211_set_active_links() we do (among the others):
1. Call drv_change_vif_links() with both old_active and new_active
2. Unassign the chanctx for the removed link(s) (if any)
3. Assign chanctx to the added link(s) (if any)
4. Call drv_change_vif_links() with the new_active links bitmap

The problem here is that during step #1 the driver doesn't know whether
we will activate multiple links simultaneously or are just doing a link
switch, so it can't check there if multiple links are supported/enabled.
(Some of the drivers might enable/disable this option dynamically)

And during step #3, in which the driver already knows that,
returning an error code (for example when multiple links are not
supported or disabled), will cause a warning, and we will still complete
the transition to the new_active links.
(It is hard to undo things in that stage, since we released channels etc.)

Therefore add a driver callback to check if the desired new_active links
will be supported by the driver or not. This callback will be called
in the beginning of ieee80211_set_active_links() so we won't do anything
before we are sure it is supported.

Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Reviewed-by: Gregory Greenman <gregory.greenman@intel.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Link: https://msgid.link/20231220133549.64c4d70b33b8.I79708619be76b8ecd4ef3975205b8f903e24a2cd@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-12-21 20:35:15 +01:00
Johannes Berg e62c0fcc0e wifi: mac80211: allow 64-bit radiotap timestamps
When reporting the radiotap timestamp, the mactime field is
usually unused, we take the data from the device_timestamp.
However, there can be cases where the radiotap timestamp is
better reported as a 64-bit value, so since the mactime is
free, add a flag to support using the mactime as a 64-bit
radiotap timestamp.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Gregory Greenman <gregory.greenman@intel.com>
Reviewed-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20231220133549.00c8b9234f0c.Ie3ce5eae33cce88fa01178e7aea94661ded1ac24@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-12-21 20:35:15 +01:00
Johannes Berg d5b6f6d595 wifi: mac80211: rework RX timestamp flags
We only have a single flag free, and before using that for
another mactime flag, instead refactor the mactime flags
to use a 2-bit field.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Gregory Greenman <gregory.greenman@intel.com>
Reviewed-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20231220133549.d0e664832d14.I20c8900106f9bf81316bed778b1e3ce145785274@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-12-21 20:35:15 +01:00
Mukesh Sisodiya 645f3d8512 wifi: cfg80211: handle UHB AP and STA power type
UHB AP send supported power type(LPI, SP, VLP)
in beacon and probe response IE and STA should
connect to these AP only if their regulatory support
the AP power type.

Beacon/Probe response are reported to userspace
with reason "STA regulatory not supporting to connect to AP
based on transmitted power type" and it should
not connect to AP.

Signed-off-by: Mukesh Sisodiya <mukesh.sisodiya@intel.com>
Reviewed-by: Gregory Greenman <gregory.greenman@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20231220133549.cbfbef9170a9.I432f78438de18aa9f5c9006be12e41dc34cc47c5@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-12-21 20:35:14 +01:00
Andrei Otcheretianski 9be61558de wifi: cfg80211: Schedule regulatory check on BSS STA channel change
Due to different relaxation policies it may be needed to re-check
channels after a BSS station interface is disconnected or performed a
channel switch.

Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
Reviewed-by: Gregory Greenman <gregory.greenman@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20231220133549.1f2f8475bcf1.I1879d259d8d756159c8060f61f4bce172e6d323e@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-12-21 20:35:14 +01:00
Andrei Otcheretianski 41a313d875 wifi: cfg80211: reg: Support P2P operation on DFS channels
FCC-594280 D01 Section B.3 allows peer-to-peer and ad hoc devices to
operate on DFS channels while they operate under the control of a
concurrent DFS master. For example, it is possible to have a P2P GO on a
DFS channel as long as BSS connection is active on the same channel.
Allow such operation by adding additional regulatory flags to indicate
DFS concurrent channels and capable devices. Add the required
relaxations in DFS regulatory checks.

Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
Reviewed-by: Gregory Greenman <gregory.greenman@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20231220133549.bdfb8a9c7c54.I973563562969a27fea8ec5685b96a3a47afe142f@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-12-21 20:35:14 +01:00
Jonathan Corbet 756df98534 wifi: mac80211: address some kerneldoc warnings
include/net/mac80111.h contains a number of either excess or incorrect
kerneldoc entries for structure members, leading to these warnings:

  ./include/net/mac80211.h:491: warning: Excess struct member 'rssi' description in 'ieee80211_event'
  ./include/net/mac80211.h:491: warning: Excess struct member 'mlme' description in 'ieee80211_event'
  ./include/net/mac80211.h:491: warning: Excess struct member 'ba' description in 'ieee80211_event'
  ./include/net/mac80211.h:777: warning: Excess struct member 'ack_enabled' description in 'ieee80211_bss_conf'
  ./include/net/mac80211.h:1222: warning: Excess struct member 'ampdu_ack_len' description in 'ieee80211_tx_info'
  ./include/net/mac80211.h:1222: warning: Excess struct member 'ampdu_len' description in 'ieee80211_tx_info'
  ./include/net/mac80211.h:1222: warning: Excess struct member 'ack_signal' description in 'ieee80211_tx_info'
  ./include/net/mac80211.h:2920: warning: Excess struct member 'radiotap_he' description in 'ieee80211_hw'

Fix or remove the entries as needed.  This change removes 208 warnings from
a "make htmldocs" build.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://msgid.link/87zfy4bhxo.fsf@meer.lwn.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-12-21 20:35:14 +01:00
Jonathan Corbet 3361597890 wifi: cfg80211: address several kerneldoc warnings
include/net/cfg80211.h includes a number of kerneldoc entries for struct
members that do not exist, leading to these warnings:

  ./include/net/cfg80211.h:3192: warning: Excess struct member 'band_pref' description in 'cfg80211_bss_selection'
  ./include/net/cfg80211.h:3192: warning: Excess struct member 'adjust' description in 'cfg80211_bss_selection'
  ./include/net/cfg80211.h:6181: warning: Excess struct member 'bssid' description in 'wireless_dev'
  ./include/net/cfg80211.h:6181: warning: Excess struct member 'beacon_interval' description in 'wireless_dev'
  ./include/net/cfg80211.h:7299: warning: Excess struct member 'bss' description in 'cfg80211_rx_assoc_resp_data'

Remove and/or repair each entry to address the warnings and ensure a proper
docs build for the affected structures.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://msgid.link/87plz1g2sc.fsf@meer.lwn.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-12-21 20:35:14 +01:00
Paolo Abeni 74769d810e bpf-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZYQVjgAKCRDbK58LschI
 g/NfAP9xMBCASd22+KPu44FtPPO5DKcdG7hATXZMpb/cygF8GQEAojcZ4jztx42S
 F1+4RPEoxrn31oVYdtEGFY9q85ruzgA=
 =2XhN
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2023-12-21

Hi David, hi Jakub, hi Paolo, hi Eric,

The following pull-request contains BPF updates for your *net* tree.

We've added 3 non-merge commits during the last 5 day(s) which contain
a total of 4 files changed, 45 insertions(+).

The main changes are:

1) Fix a syzkaller splat which triggered an oob issue in bpf_link_show_fdinfo(),
   from Jiri Olsa.

2) Fix another syzkaller-found issue which triggered a NULL pointer dereference
   in BPF sockmap for unconnected unix sockets, from John Fastabend.

bpf-for-netdev

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf: Add missing BPF_LINK_TYPE invocations
  bpf: sockmap, test for unconnected af_unix sock
  bpf: syzkaller found null ptr deref in unix_bpf proto add
====================

Link: https://lore.kernel.org/r/20231221104844.1374-1-daniel@iogearbox.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-21 12:27:29 +01:00
David Ahern dade3f6a1e net/ipv6: Revert remove expired routes with a separated list of routes
This reverts commit 3dec89b14d.

The commit has some race conditions given how expires is managed on a
fib6_info in relation to gc start, adding the entry to the gc list and
setting the timer value leading to UAF. Revert the commit and try again
in a later release.

Fixes: 3dec89b14d ("net/ipv6: Remove expired routes with a separated list of routes")
Cc: Kui-Feng Lee <thinker.li@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20231219030243.25687-1-dsahern@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-21 09:01:30 +01:00
Victor Nogueira 4cf24dc893 net: sched: Add initial TC error skb drop reasons
Continue expanding Daniel's patch by adding new skb drop reasons that
are idiosyncratic to TC.

More specifically:

- SKB_DROP_REASON_TC_COOKIE_ERROR: An error occurred whilst
  processing a tc ext cookie.

- SKB_DROP_REASON_TC_CHAIN_NOTFOUND: tc chain lookup failed.

- SKB_DROP_REASON_TC_RECLASSIFY_LOOP: tc exceeded max reclassify loop
  iterations

Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-20 11:50:13 +00:00
Victor Nogueira b6a3c6066a net: sched: Make tc-related drop reason more flexible for remaining qdiscs
Incrementing on Daniel's patch[1], make tc-related drop reason more
flexible for remaining qdiscs - that is, all qdiscs aside from clsact.
In essence, the drop reason will be set by cls_api and act_api in case
any error occurred in the data path. With that, we can give the user more
detailed information so that they can distinguish between a policy drop
or an error drop.

[1] https://lore.kernel.org/all/20231009092655.22025-1-daniel@iogearbox.net

Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-20 11:50:13 +00:00
Victor Nogueira fb2780721c net: sched: Move drop_reason to struct tc_skb_cb
Move drop_reason from struct tcf_result to skb cb - more specifically to
struct tc_skb_cb. With that, we'll be able to also set the drop reason for
the remaining qdiscs (aside from clsact) that do not have access to
tcf_result when time comes to set the skb drop reason.

Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-20 11:50:13 +00:00
Pablo Neira Ayuso 0ae8e4cca7 netfilter: nf_tables: set transport offset from mac header for netdev/egress
Before this patch, transport offset (pkt->thoff) provides an offset
relative to the network header. This is fine for the inet families
because skb->data points to the network header in such case. However,
from netdev/egress, skb->data points to the mac header (if available),
thus, pkt->thoff is missing the mac header length.

Add skb_network_offset() to the transport offset (pkt->thoff) for
netdev, so transport header mangling works as expected. Adjust payload
fast eval function to use skb->data now that pkt->thoff provides an
absolute offset. This explains why users report that matching on
egress/netdev works but payload mangling does not.

This patch implicitly fixes payload mangling for IPv4 packets in
netdev/egress given skb_store_bits() requires an offset from skb->data
to reach the transport header.

I suspect that nft_exthdr and the trace infra were also broken from
netdev/egress because they also take skb->data as start, and pkt->thoff
was not correct.

Note that IPv6 is fine because ipv6_find_hdr() already provides a
transport offset starting from skb->data, which includes
skb_network_offset().

The bridge family also uses nft_set_pktinfo_ipv4_validate(), but there
skb_network_offset() is zero, so the update in this patch does not alter
the existing behaviour.

Fixes: 42df6e1d22 ("netfilter: Introduce egress hook")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-12-20 10:43:21 +01:00
Long Li 2c20e20b22 RDMA/mana_ib: query device capabilities
With RDMA device registered, use it to query on hardware capabilities and
cache this information for future query requests to the driver.

Signed-off-by: Long Li <longli@microsoft.com>
Link: https://lore.kernel.org/r/1702692255-23640-3-git-send-email-longli@linuxonhyperv.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-12-20 10:25:40 +02:00