Commit graph

74478 commits

Author SHA1 Message Date
Jiri Pirko a9fd44b15f devlink: push dpipe related code into separate file
Cut out another chunk from leftover.c and put dpipe related code
into a separate file.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20230828061657.300667-6-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-28 08:02:22 -07:00
Jiri Pirko 2475ed158c devlink: move and rename devlink_dpipe_send_and_alloc_skb() helper
Since both the dpipe and resource code use this helper, move the
devlink_dpipe_send_and_alloc_skb() helper into netlink.c in preparation
for splitting the code into separate files. Rename it along the way.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20230828061657.300667-5-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-28 08:02:22 -07:00
Jiri Pirko 2b4d8bb088 devlink: push shared buffer related code into separate file
Cut out another chunk from leftover.c and put sb related code
into a separate file.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20230828061657.300667-4-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-28 08:02:22 -07:00
Jiri Pirko eec1e5ea1d devlink: push port related code into separate file
Cut out another chunk from leftover.c and put port related code
into a separate file.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20230828061657.300667-3-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-28 08:02:22 -07:00
Jiri Pirko 56e6531283 devlink: push object register/unregister notifications into separate helpers
In preparation for splitting leftover.c into individual files, avoid the
need to expose object structures in devl_internal.h and allow them to be
maintained in the object files.

The register/unregister notifications need to know the structures in
order to iterate the lists. To avoid this, introduce per-object
register/unregister notification helpers and use them.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20230828061657.300667-2-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-28 08:02:22 -07:00
Eric Dumazet 8be6f88b9d inet: fix IP_TRANSPARENT error handling
My recent patch forgot to change the error handling for the IP_TRANSPARENT
socket option.

WARNING: bad unlock balance detected!
6.5.0-rc7-syzkaller-01717-g59da9885767a #0 Not tainted
-------------------------------------
syz-executor151/5028 is trying to release lock (sk_lock-AF_INET) at:
[<ffffffff88213983>] sockopt_release_sock+0x53/0x70 net/core/sock.c:1073
but there are no more locks to release!

other info that might help us debug this:
1 lock held by syz-executor151/5028:

stack backtrace:
CPU: 0 PID: 5028 Comm: syz-executor151 Not tainted 6.5.0-rc7-syzkaller-01717-g59da9885767a #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xd9/0x1b0 lib/dump_stack.c:106
__lock_release kernel/locking/lockdep.c:5438 [inline]
lock_release+0x4b5/0x680 kernel/locking/lockdep.c:5781
sock_release_ownership include/net/sock.h:1824 [inline]
release_sock+0x175/0x1b0 net/core/sock.c:3527
sockopt_release_sock+0x53/0x70 net/core/sock.c:1073
do_ip_setsockopt+0x12c1/0x3640 net/ipv4/ip_sockglue.c:1364
ip_setsockopt+0x59/0xe0 net/ipv4/ip_sockglue.c:1419
raw_setsockopt+0x218/0x290 net/ipv4/raw.c:833
__sys_setsockopt+0x2cd/0x5b0 net/socket.c:2305
__do_sys_setsockopt net/socket.c:2316 [inline]
__se_sys_setsockopt net/socket.c:2313 [inline]

Fixes: 4bd0623f04 ("inet: move inet->transparent to inet->inet_flags")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Simon Horman <horms@kernel.org>
Cc: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-28 10:27:03 +01:00
Daniel Borkmann 3a1e2f4398 net: Make consumed action consistent in sch_handle_egress
In the TC_ACT_* handling, TC_ACT_CONSUMED is only handled in
sch_handle_ingress, not in sch_handle_egress. It was added via cd11b16407
("net/tc: introduce TC_ACT_REINSERT.") and e5cf1baf92 ("act_mirred: use
TC_ACT_REINSERT when possible") and later renamed to TC_ACT_CONSUMED
via 720f22fed8 ("net: sched: refactor reinsert action").

The initial work was targeted at OVS back then and was only needed on
ingress, and the mirred action module also restricts it to ingress.
However, given that it is an API contract, it still makes sense to make
egress consistent with sch_handle_ingress and handle it in the same way,
that is, set the return code to "success" and return NULL to the caller.
Otherwise, an action module sitting on egress that returns TC_ACT_CONSUMED
could lead to a UAF if left unhandled.
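The egress contract described above can be modelled in user space. This is a minimal sketch under stated assumptions: struct pkt, handle_egress(), and the ACT_* and XMIT_* enumerators are hypothetical stand-ins whose names and values do not mirror the kernel's. It only shows that a consumed packet yields a success code and a NULL return, so the caller never touches it again.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; names and values are not the kernel's. */
enum act { ACT_OK, ACT_CONSUMED };
enum xmit { XMIT_SUCCESS, XMIT_DROP };

struct pkt {
    int id;
};

/*
 * Returns the packet if the caller should keep processing it, or NULL if
 * it must not be touched again. For ACT_CONSUMED the action has taken
 * ownership, so report success and return NULL, as on the ingress side.
 */
static struct pkt *handle_egress(struct pkt *p, enum act verdict,
                                 enum xmit *ret)
{
    assert(ret != NULL);
    switch (verdict) {
    case ACT_CONSUMED:
        *ret = XMIT_SUCCESS;
        return NULL;
    case ACT_OK:
    default:
        return p;
    }
}
```

Without the ACT_CONSUMED case, the default branch would hand the pointer back to a caller that no longer owns the packet, which is the use-after-free risk the message describes.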

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-28 10:18:03 +01:00
Daniel Borkmann 28d18b673f net: Fix skb consume leak in sch_handle_egress
Fix a memory leak for the tc egress path with TC_ACT_{STOLEN,QUEUED,TRAP}:

  [...]
  unreferenced object 0xffff88818bcb4f00 (size 232):
  comm "softirq", pid 0, jiffies 4299085078 (age 134.028s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 80 70 61 81 88 ff ff 00 41 31 14 81 88 ff ff  ..pa.....A1.....
  backtrace:
    [<ffffffff9991b938>] kmem_cache_alloc_node+0x268/0x400
    [<ffffffff9b3d9231>] __alloc_skb+0x211/0x2c0
    [<ffffffff9b3f0c7e>] alloc_skb_with_frags+0xbe/0x6b0
    [<ffffffff9b3bf9a9>] sock_alloc_send_pskb+0x6a9/0x870
    [<ffffffff9b6b3f00>] __ip_append_data+0x14d0/0x3bf0
    [<ffffffff9b6ba24e>] ip_append_data+0xee/0x190
    [<ffffffff9b7e1496>] icmp_push_reply+0xa6/0x470
    [<ffffffff9b7e4030>] icmp_reply+0x900/0xa00
    [<ffffffff9b7e42e3>] icmp_echo.part.0+0x1a3/0x230
    [<ffffffff9b7e444d>] icmp_echo+0xcd/0x190
    [<ffffffff9b7e9566>] icmp_rcv+0x806/0xe10
    [<ffffffff9b699bd1>] ip_protocol_deliver_rcu+0x351/0x3d0
    [<ffffffff9b699f14>] ip_local_deliver_finish+0x2b4/0x450
    [<ffffffff9b69a234>] ip_local_deliver+0x174/0x1f0
    [<ffffffff9b69a4b2>] ip_sublist_rcv_finish+0x1f2/0x420
    [<ffffffff9b69ab56>] ip_sublist_rcv+0x466/0x920
  [...]

I was able to reproduce this via:

  ip link add dev dummy0 type dummy
  ip link set dev dummy0 up
  tc qdisc add dev eth0 clsact
  tc filter add dev eth0 egress protocol ip prio 1 u32 match ip protocol 1 0xff action mirred egress redirect dev dummy0
  ping 1.1.1.1
  <stolen>

After the fix, there are no kmemleak reports with the reproducer. This is
in line with what is already done on the ingress side. Debugging
skb_unref(skb) on the dummy xmit side and in sch_handle_egress() shows
that these are two different skbs, with skb_unref(skb) returning true for
both. There are two skbs because mirred does an skb_clone() internally, as
use_reinsert is false in tcf_mirred_act() for egress. This was initially
reported by Gal.
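The ownership rule behind the leak fix can be shown with a toy reference count. A minimal sketch, assuming a hypothetical refcounted struct pkt in place of an skb; pkt_alloc(), pkt_consume(), handle_egress() and the ACT_* values are illustrative names, not kernel APIs. On STOLEN/QUEUED/TRAP the hook still holds its own reference to the original packet (mirred works on a clone for egress), so it has to drop that reference before returning NULL.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted packet standing in for an skb. */
struct pkt {
    int refs;
};

static int freed; /* counts packets actually released, for the demo */

static struct pkt *pkt_alloc(void)
{
    struct pkt *p = malloc(sizeof(*p));
    if (p)
        p->refs = 1;
    return p;
}

/* Drop one reference; free the packet when the last one goes away. */
static void pkt_consume(struct pkt *p)
{
    assert(p->refs > 0);
    if (--p->refs == 0) {
        free(p);
        freed++;
    }
}

enum act { ACT_OK, ACT_STOLEN, ACT_QUEUED, ACT_TRAP };

/*
 * For STOLEN/QUEUED/TRAP the hook's reference to the original packet is
 * still outstanding; returning NULL without pkt_consume() is the leak.
 */
static struct pkt *handle_egress(struct pkt *p, enum act verdict)
{
    switch (verdict) {
    case ACT_STOLEN:
    case ACT_QUEUED:
    case ACT_TRAP:
        pkt_consume(p);
        return NULL;
    default:
        return p;
    }
}
```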

Fixes: e420bed025 ("bpf: Add fd-based tcx multi-prog infra with link support")
Reported-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/bdfc2640-8f65-5b56-4472-db8e2b161aab@nvidia.com
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-28 10:18:03 +01:00
Jann Horn 977ad86c2a dccp: Fix out of bounds access in DCCP error handler
There was a previous attempt to fix an out-of-bounds access in the DCCP
error handlers, but that fix assumed that the error handlers only want
to access the first 8 bytes of the DCCP header. Actually, they also look
at the DCCP sequence number, which is stored beyond 8 bytes, so an
explicit pskb_may_pull() is required.
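Why 8 bytes are not enough follows from the generic DCCP header layout in RFC 4340, section 5.1: bytes 0..7 hold the ports, data offset, CCVal/CsCov and checksum; byte 8 carries the type and the X bit; the sequence number comes after that (24 bits in bytes 9..11 when X=0, or a reserved byte plus 48 bits in bytes 10..15 when X=1). The user-space sketch below checks the length before every read; dccp_read_seq() is a hypothetical helper over a flat buffer, not the kernel code, which instead uses pskb_may_pull() on the skb.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Read the DCCP sequence number from a flat header buffer.
 * Returns 0 on success, -1 if the buffer is too short to contain it.
 */
static int dccp_read_seq(const uint8_t *buf, size_t len, uint64_t *seq)
{
    assert(seq != NULL);
    if (len < 9)
        return -1;                  /* need byte 8 to see the X bit */
    if (buf[8] & 0x01) {            /* X = 1: 48-bit sequence number */
        if (len < 16)
            return -1;
        *seq = 0;
        for (int i = 10; i < 16; i++)
            *seq = (*seq << 8) | buf[i];
    } else {                        /* X = 0: 24-bit sequence number */
        if (len < 12)
            return -1;
        *seq = ((uint64_t)buf[9] << 16) | ((uint64_t)buf[10] << 8) | buf[11];
    }
    return 0;
}
```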

Fixes: 6706a97fec ("dccp: fix out of bound access in dccp_v4_err()")
Fixes: 1aa9d1a0e7 ("ipv6: dccp: fix out of bound access in dccp_v6_err()")
Cc: stable@vger.kernel.org
Signed-off-by: Jann Horn <jannh@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-28 10:15:56 +01:00
Kuniyuki Iwashima c2f8fd7949 netrom: Deny concurrent connect().
syzkaller reported null-ptr-deref [0] related to AF_NETROM.
This is another self-accept issue from the strace log. [1]

syz-executor creates an AF_NETROM socket and calls connect(), which
is blocked at that time.  Then, sk->sk_state is TCP_SYN_SENT and
sock->state is SS_CONNECTING.

  [pid  5059] socket(AF_NETROM, SOCK_SEQPACKET, 0) = 4
  [pid  5059] connect(4, {sa_family=AF_NETROM, sa_data="..." <unfinished ...>

Another thread calls connect() concurrently, which finally fails
with -EINVAL.  However, the problem here is that the socket state is
reset even while the first connect() is blocked.

  [pid  5060] connect(4, NULL, 0 <unfinished ...>
  [pid  5060] <... connect resumed>)      = -1 EINVAL (Invalid argument)

As sk->sk_state is TCP_CLOSE and sock->state is SS_UNCONNECTED, the
following listen() succeeds.  Then, the first connect() finds itself
as a listener and queues an skb whose skb->sk is the socket itself.
As a result, the next accept() returns another FD, 3, for the same
socket, and the first connect() finishes.

  [pid  5060] listen(4, 0 <unfinished ...>
  [pid  5060] <... listen resumed>)       = 0
  [pid  5060] accept(4, NULL, NULL <unfinished ...>
  [pid  5060] <... accept resumed>)       = 3
  [pid  5059] <... connect resumed>)      = 0

Then, accept4() is called but blocks, which later causes the general
protection fault.

  [pid  5059] accept4(4, NULL, 0x20000400, SOCK_NONBLOCK <unfinished ...>

After that, another self-accept occurs by accept() and writev().

  [pid  5060] accept(4, NULL, NULL <unfinished ...>
  [pid  5061] writev(3, [{iov_base=...}] <unfinished ...>
  [pid  5061] <... writev resumed>)       = 99
  [pid  5060] <... accept resumed>)       = 6

Finally, the leader thread close()s all FDs.  Since the three FDs
reference the same socket, nr_release() does the cleanup for it
three times, and the remaining accept4() causes the following fault.

  [pid  5058] close(3)                    = 0
  [pid  5058] close(4)                    = 0
  [pid  5058] close(5)                    = -1 EBADF (Bad file descriptor)
  [pid  5058] close(6)                    = 0
  [pid  5058] <... exit_group resumed>)   = ?
  [   83.456055][ T5059] general protection fault, probably for non-canonical address 0xdffffc0000000003: 0000 [#1] PREEMPT SMP KASAN

To avoid the issue, we need to return an error for connect() if
another connect() is in progress, as done in __inet_stream_connect().
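The rule can be reduced to a small state check. A minimal single-threaded sketch, assuming hypothetical names (struct model_sock, model_connect(), MS_* states); the -EALREADY and -EISCONN codes are assumed here by analogy with how __inet_stream_connect() treats a connect that is already in progress or already done, and are not taken from the netrom patch itself.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical socket states for this model. */
enum model_state { MS_UNCONNECTED, MS_CONNECTING, MS_CONNECTED };

struct model_sock {
    enum model_state state;
};

/*
 * Begin a connect. A second connect() arriving while the first is still
 * in progress fails without modifying the state, so the blocked caller
 * never sees its socket reset to MS_UNCONNECTED behind its back.
 */
static int model_connect(struct model_sock *s)
{
    assert(s != NULL);
    if (s->state == MS_CONNECTING)
        return -EALREADY;
    if (s->state == MS_CONNECTED)
        return -EISCONN;
    s->state = MS_CONNECTING;   /* the real call would now block */
    return 0;
}
```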

[0]:
general protection fault, probably for non-canonical address 0xdffffc0000000003: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f]
CPU: 0 PID: 5059 Comm: syz-executor.0 Not tainted 6.5.0-rc5-syzkaller-00194-gace0ab3a4b54 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023
RIP: 0010:__lock_acquire+0x109/0x5de0 kernel/locking/lockdep.c:5012
Code: 45 85 c9 0f 84 cc 0e 00 00 44 8b 05 11 6e 23 0b 45 85 c0 0f 84 be 0d 00 00 48 ba 00 00 00 00 00 fc ff df 4c 89 d1 48 c1 e9 03 <80> 3c 11 00 0f 85 e8 40 00 00 49 81 3a a0 69 48 90 0f 84 96 0d 00
RSP: 0018:ffffc90003d6f9e0 EFLAGS: 00010006
RAX: ffff8880244c8000 RBX: 1ffff920007adf6c RCX: 0000000000000003
RDX: dffffc0000000000 RSI: 0000000000000000 RDI: 0000000000000018
RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000001
R10: 0000000000000018 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS:  00007f51d519a6c0(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f51d5158d58 CR3: 000000002943f000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 lock_acquire kernel/locking/lockdep.c:5761 [inline]
 lock_acquire+0x1ae/0x510 kernel/locking/lockdep.c:5726
 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
 _raw_spin_lock_irqsave+0x3a/0x50 kernel/locking/spinlock.c:162
 prepare_to_wait+0x47/0x380 kernel/sched/wait.c:269
 nr_accept+0x20d/0x650 net/netrom/af_netrom.c:798
 do_accept+0x3a6/0x570 net/socket.c:1872
 __sys_accept4_file net/socket.c:1913 [inline]
 __sys_accept4+0x99/0x120 net/socket.c:1943
 __do_sys_accept4 net/socket.c:1954 [inline]
 __se_sys_accept4 net/socket.c:1951 [inline]
 __x64_sys_accept4+0x96/0x100 net/socket.c:1951
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f51d447cae9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 20 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f51d519a0c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000120
RAX: ffffffffffffffda RBX: 00007f51d459bf80 RCX: 00007f51d447cae9
RDX: 0000000020000400 RSI: 0000000000000000 RDI: 0000000000000004
RBP: 00007f51d44c847a R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000800 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000000000b R14: 00007f51d459bf80 R15: 00007ffc25c34e48
 </TASK>

Link: https://syzkaller.appspot.com/text?tag=CrashLog&x=152cdb63a80000 [1]
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reported-by: syzbot+666c97e4686410e79649@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=666c97e4686410e79649
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-28 06:58:46 +01:00
Sabrina Dubroca f3e444e31f tls: get cipher_name from cipher_desc in tls_set_sw_offload
tls_cipher_desc also contains the algorithm name needed by
crypto_alloc_aead; use it.

Finally, use get_cipher_desc to check if the cipher_type coming from
userspace is valid, and remove the cipher_type switch.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/53d021d80138aa125a9cef4468aa5ce531975a7b.1692977948.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-27 17:17:42 -07:00
Sabrina Dubroca 48dfad27fd tls: use tls_cipher_desc to access per-cipher crypto_info in tls_set_sw_offload
The crypto_info_* helpers allow us to fetch pointers into the
per-cipher crypto_info's data.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/c23af110caf0af6b68de2f86c58064913e2e902a.1692977948.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-27 17:17:42 -07:00
Sabrina Dubroca d9a6ca1a97 tls: use tls_cipher_desc to get per-cipher sizes in tls_set_sw_offload
We can get rid of some local variables, but we have to keep nonce_size
because tls1.3 uses nonce_size = 0 for all ciphers.

We can also drop the runtime sanity checks on iv/rec_seq/tag size,
since we have compile time checks on those values.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/deed9c4430a62c31751a72b8c03ad66ffe710717.1692977948.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-27 17:17:42 -07:00
Sabrina Dubroca 077e05d135 tls: use tls_cipher_desc to simplify do_tls_getsockopt_conf
Every cipher uses the same code to update its crypto_info struct based
on the values contained in the cctx, with only the struct type and
size/offset changing. We can get those from tls_cipher_desc, and use
a single pair of memcpy and final copy_to_user.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/c21a904b91e972bdbbf9d1c6d2731ccfa1eedf72.1692977948.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-27 17:17:42 -07:00
Sabrina Dubroca 5f309ade49 tls: get crypto_info size from tls_cipher_desc in do_tls_setsockopt_conf
We can simplify do_tls_setsockopt_conf using tls_cipher_desc. Also use
get_cipher_desc's result to check if the cipher_type coming from
userspace is valid.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/e97658eb4c6a5832f8ba20a06c4f36a77763c59e.1692977948.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-27 17:17:42 -07:00
Sabrina Dubroca e907277aeb tls: expand use of tls_cipher_desc in tls_sw_fallback_init
tls_sw_fallback_init already gets the key and tag size from
tls_cipher_desc. We can now also check that the cipher type is valid,
and stop hard-coding the algorithm name passed to crypto_alloc_aead.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/c8c94b8fcafbfb558e09589c1f1ad48dbdf92f76.1692977948.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-27 17:17:42 -07:00
Sabrina Dubroca d2322cf5ed tls: allocate the fallback aead after checking that the cipher is valid
No need to allocate the aead if we're going to fail afterwards.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/335e32511ed55a0b30f3f81a78fa8f323b3bdf8f.1692977948.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-27 17:17:42 -07:00
Sabrina Dubroca 3524dd4d5f tls: expand use of tls_cipher_desc in tls_set_device_offload
tls_set_device_offload is already getting iv and rec_seq sizes from
tls_cipher_desc. We can now also check if the cipher_type coming from
userspace is valid and can be offloaded.

We can also remove the runtime check on rec_seq, since we validate it
at compile time.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/8ab71b8eca856c7aaf981a45fe91ac649eb0e2e9.1692977948.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-27 17:17:41 -07:00
Sabrina Dubroca 0d98cc0202 tls: validate cipher descriptions at compile time
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/b38fb8cf60e099e82ae9979c3c9c92421042417c.1692977948.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-27 17:17:41 -07:00
Sabrina Dubroca 176a3f50bc tls: extend tls_cipher_desc to fully describe the ciphers
- add nonce, usually equal to iv_size but not for chacha
- add offsets into the crypto_info for each field
- add algorithm name
- add offloadable flag

Also add helpers to access each field of a crypto_info struct
described by a tls_cipher_desc.
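The offset-based accessors can be sketched in plain C. This assumes hypothetical types modelled on the AES-GCM-128 layout (struct crypto_info_hdr, struct crypto_info_gcm128, struct cipher_desc) and illustrative helpers crypto_info_iv()/crypto_info_key(); the kernel's real struct names and helper signatures may differ. The static_assert lines illustrate the kind of compile-time size validation the neighbouring commit mentions, and "gcm(aes)" is the crypto API algorithm name for AES-GCM.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins modelled on the AES-GCM-128 crypto_info layout. */
struct crypto_info_hdr {
    unsigned short version;
    unsigned short cipher_type;
};

struct crypto_info_gcm128 {
    struct crypto_info_hdr info;   /* must stay the first member */
    unsigned char iv[8];
    unsigned char key[16];
    unsigned char salt[4];
    unsigned char rec_seq[8];
};

#define GCM128_IV_SIZE      8
#define GCM128_KEY_SIZE     16
#define GCM128_SALT_SIZE    4
#define GCM128_TAG_SIZE     16
#define GCM128_REC_SEQ_SIZE 8

/* Compile-time checks: the size constants must match the struct fields. */
#define FIELD_SIZE(type, member) sizeof(((type *)0)->member)
static_assert(FIELD_SIZE(struct crypto_info_gcm128, iv) == GCM128_IV_SIZE, "iv");
static_assert(FIELD_SIZE(struct crypto_info_gcm128, key) == GCM128_KEY_SIZE, "key");
static_assert(FIELD_SIZE(struct crypto_info_gcm128, salt) == GCM128_SALT_SIZE, "salt");
static_assert(FIELD_SIZE(struct crypto_info_gcm128, rec_seq) == GCM128_REC_SEQ_SIZE, "rec_seq");

/* Descriptor: field sizes, their offsets inside crypto_info, and the
 * crypto API algorithm name. */
struct cipher_desc {
    unsigned int iv, key, salt, tag, rec_seq;
    unsigned int iv_offset, key_offset, salt_offset, rec_seq_offset;
    const char *cipher_name;
};

static const struct cipher_desc gcm128_desc = {
    .iv = GCM128_IV_SIZE, .key = GCM128_KEY_SIZE, .salt = GCM128_SALT_SIZE,
    .tag = GCM128_TAG_SIZE, .rec_seq = GCM128_REC_SEQ_SIZE,
    .iv_offset = offsetof(struct crypto_info_gcm128, iv),
    .key_offset = offsetof(struct crypto_info_gcm128, key),
    .salt_offset = offsetof(struct crypto_info_gcm128, salt),
    .rec_seq_offset = offsetof(struct crypto_info_gcm128, rec_seq),
    .cipher_name = "gcm(aes)",
};

/* Generic accessors: work for any cipher described by a cipher_desc. */
static unsigned char *crypto_info_iv(struct crypto_info_hdr *info,
                                     const struct cipher_desc *d)
{
    return (unsigned char *)info + d->iv_offset;
}

static unsigned char *crypto_info_key(struct crypto_info_hdr *info,
                                      const struct cipher_desc *d)
{
    return (unsigned char *)info + d->key_offset;
}
```

Because every cipher-specific struct starts with the common header, one pair of accessors plus a per-cipher descriptor replaces a switch over cipher types.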

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/39d5f476d63c171097764e8d38f6f158b7c109ae.1692977948.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-27 17:17:41 -07:00
Sabrina Dubroca 8db44ab26b tls: rename tls_cipher_size_desc to tls_cipher_desc
We're going to add other fields to it to fully describe a cipher, so
the "_size" name won't match the contents.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/76ca6c7686bd6d1534dfa188fb0f1f6fabebc791.1692977948.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-27 17:17:41 -07:00
Sabrina Dubroca 037303d676 tls: reduce size of tls_cipher_size_desc
tls_cipher_size_desc indexes ciphers by their type, but we're not
using indices 0..50 of the array. Each struct tls_cipher_size_desc is
20B, so that's a lot of unused memory. We can reindex the array
starting at the lowest used cipher_type.

Introduce the get_cipher_size_desc helper to find the right item and
avoid out-of-bounds accesses, and make tls_cipher_size_desc's size
explicit so that gcc reminds us to update TLS_CIPHER_MIN/MAX when we
add a new cipher.
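The reindexing and the bounds-checked lookup can be sketched as follows. The TLS_CIPHER_* values are those of the uapi linux/tls.h header (51 through 58); the array, struct cipher_size_desc and get_cipher_size_desc() here are illustrative and only fill in two ciphers. The explicit array length is what turns a designated initializer beyond TLS_CIPHER_MAX into a compiler diagnostic.

```c
#include <assert.h>
#include <stddef.h>

/* cipher_type values as defined in the uapi <linux/tls.h> header. */
#define TLS_CIPHER_AES_GCM_128        51
#define TLS_CIPHER_AES_GCM_256        52
#define TLS_CIPHER_AES_CCM_128        53
#define TLS_CIPHER_CHACHA20_POLY1305  54
#define TLS_CIPHER_SM4_GCM            55
#define TLS_CIPHER_SM4_CCM            56
#define TLS_CIPHER_ARIA_GCM_128       57
#define TLS_CIPHER_ARIA_GCM_256       58

#define TLS_CIPHER_MIN TLS_CIPHER_AES_GCM_128
#define TLS_CIPHER_MAX TLS_CIPHER_ARIA_GCM_256

static_assert(TLS_CIPHER_MAX - TLS_CIPHER_MIN + 1 == 8, "eight ciphers");

struct cipher_size_desc {
    unsigned int iv, key, salt, tag, rec_seq;  /* 20 bytes with 4-byte ints */
};

/* Rebased array: index 0 is TLS_CIPHER_MIN, so slots 0..50 are not wasted.
 * The explicit length makes an out-of-range designated initializer (a new
 * cipher above TLS_CIPHER_MAX) a compiler diagnostic. */
static const struct cipher_size_desc
cipher_size_desc[TLS_CIPHER_MAX - TLS_CIPHER_MIN + 1] = {
    [TLS_CIPHER_AES_GCM_128 - TLS_CIPHER_MIN] =
        { .iv = 8, .key = 16, .salt = 4, .tag = 16, .rec_seq = 8 },
    [TLS_CIPHER_CHACHA20_POLY1305 - TLS_CIPHER_MIN] =
        { .iv = 12, .key = 32, .salt = 0, .tag = 16, .rec_seq = 8 },
    /* Remaining ciphers omitted from this sketch. */
};

/* Bounds-checked lookup: unknown types return NULL instead of reading
 * outside the array. */
static const struct cipher_size_desc *get_cipher_size_desc(unsigned short type)
{
    if (type < TLS_CIPHER_MIN || type > TLS_CIPHER_MAX)
        return NULL;
    return &cipher_size_desc[type - TLS_CIPHER_MIN];
}
```

In this sketch the rebased table holds 8 entries (160 bytes with 4-byte unsigned int), whereas indexing directly by cipher_type up to 58 would need 59 entries (1180 bytes).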

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/5e054e370e240247a5d37881a1cd93a67c15f4ca.1692977948.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-27 17:17:41 -07:00
Sabrina Dubroca 200e231651 tls: add TLS_CIPHER_ARIA_GCM_* to tls_cipher_size_desc
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/b2e0fb79e6d0a4478be9bf33781dc9c9281c9d56.1692977948.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-27 17:17:41 -07:00
Sabrina Dubroca fd0fc6fdd8 tls: move tls_cipher_size_desc to net/tls/tls.h
It's only used in net/tls/*, no need to bloat include/net/tls.h.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/dd9fad80415e5b3575b41f56b331871038362eab.1692977948.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-27 17:17:41 -07:00
Dima Chumak 390a24cbc3 devlink: Expose port function commands to control IPsec packet offloads
Expose port function commands to enable/disable IPsec packet offloads;
they are used to control the port's IPsec capabilities.

When IPsec packet offload is disabled for a function of the port (the
default), the function cannot offload IPsec packet operations
(encapsulation and XFRM policy offload). When it is enabled, IPsec packet
operations can be offloaded by the function of the port, including crypto
operations (encrypt/decrypt), IPsec encapsulation, and XFRM state and
policy offload.

Example of a PCI VF port which supports IPsec packet offloads:

$ devlink port show pci/0000:06:00.0/1
    pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0
        function:
        hw_addr 00:00:00:00:00:00 roce enable ipsec_packet disable

$ devlink port function set pci/0000:06:00.0/1 ipsec_packet enable

$ devlink port show pci/0000:06:00.0/1
    pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0
        function:
        hw_addr 00:00:00:00:00:00 roce enable ipsec_packet enable

Signed-off-by: Dima Chumak <dchumak@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20230825062836.103744-3-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-27 17:08:45 -07:00
Dima Chumak 62b6442c58 devlink: Expose port function commands to control IPsec crypto offloads
Expose port function commands to enable/disable IPsec crypto offloads;
they are used to control the port's IPsec capabilities.

When IPsec crypto offload is disabled for a function of the port (the
default), the function cannot offload any IPsec crypto operations
(encrypt/decrypt and XFRM state offloading). When it is enabled, IPsec
crypto operations can be offloaded by the function of the port.

Example of a PCI VF port which supports IPsec crypto offloads:

$ devlink port show pci/0000:06:00.0/1
    pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0
        function:
        hw_addr 00:00:00:00:00:00 roce enable ipsec_crypto disable

$ devlink port function set pci/0000:06:00.0/1 ipsec_crypto enable

$ devlink port show pci/0000:06:00.0/1
    pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0
        function:
        hw_addr 00:00:00:00:00:00 roce enable ipsec_crypto enable

Signed-off-by: Dima Chumak <dchumak@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20230825062836.103744-2-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-27 17:08:44 -07:00
Budimir Markovic b3d26c5702 net/sched: sch_hfsc: Ensure inner classes have fsc curve
HFSC assumes that inner classes have an fsc curve, but it is currently
possible for classes without an fsc curve to become parents. This leads
to bugs including a use-after-free.

Don't allow non-root classes without HFSC_FSC to become parents.
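The check can be modelled as a predicate on the would-be parent. A minimal sketch, assuming hypothetical names (struct model_class, may_be_parent(), attach_child(), and CL_RSC/CL_FSC flag values chosen for illustration); it is not the sch_hfsc.c code, only the rule stated above: the root may always have children, any other class only if it has a link-sharing (fsc) curve.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical curve flags for this model. */
#define CL_RSC 0x1   /* real-time service curve */
#define CL_FSC 0x2   /* link-sharing (fair) service curve */

struct model_class {
    unsigned int flags;
    struct model_class *parent;    /* NULL only for the root class */
};

/* HFSC assumes inner classes have a link-sharing curve, so a non-root
 * class without one must not become a parent. */
static bool may_be_parent(const struct model_class *cl)
{
    assert(cl != NULL);
    return cl->parent == NULL || (cl->flags & CL_FSC) != 0;
}

static int attach_child(struct model_class *child, struct model_class *parent)
{
    if (!may_be_parent(parent))
        return -EINVAL;
    child->parent = parent;
    return 0;
}
```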

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reported-by: Budimir Markovic <markovicbudimir@gmail.com>
Signed-off-by: Budimir Markovic <markovicbudimir@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20230824084905.422-1-markovicbudimir@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-25 18:57:54 -07:00
Jakub Kicinski bebfbf07c7 bpf-next-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZOjkTAAKCRDbK58LschI
 gx32AP9gaaHFBtOYBfoenKTJfMgv1WhtQHIBas+WN9ItmBx9MAEA4gm/VyQ6oD7O
 EBjJKJQ2CZ/QKw7cNacXw+l5jF7/+Q0=
 =8P7g
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2023-08-25

We've added 87 non-merge commits during the last 8 day(s) which contain
a total of 104 files changed, 3719 insertions(+), 4212 deletions(-).

The main changes are:

1) Add multi uprobe BPF links for attaching multiple uprobes
   and usdt probes, which is significantly faster and saves extra fds,
   from Jiri Olsa.

2) Add support for BPF cpu v4 instructions to the arm64 JIT compiler,
   from Xu Kuohai.

3) Add support for BPF cpu v4 instructions to the riscv64 JIT compiler,
   from Pu Lehui.

4) Fix LWT BPF xmit hooks wrt their return values where propagating
   the result from skb_do_redirect() would trigger a use-after-free,
   from Yan Zhai.

5) Fix a BPF verifier issue related to bpf_kptr_xchg() with local kptr
   where the map's value kptr type and locally allocated obj type
   mismatch, from Yonghong Song.

6) Fix BPF verifier's check_func_arg_reg_off() function wrt graph
   root/node which bypassed reg->off == 0 enforcement,
   from Kumar Kartikeya Dwivedi.

7) Lift the BPF verifier restriction in networking BPF programs so that
   comparisons of packet pointers are not treated as a pointer leak,
   from Yafang Shao.

8) Remove the unmaintained in-tree XDP BPF samples, as they are maintained
   out of tree in the xdp-tools repository, from Toke Høiland-Jørgensen.

9) Batch of fixes for the tracing programs from BPF samples in order
   to make them more libbpf-aware, from Daniel T. Lee.

10) Fix a libbpf signedness determination bug in the CO-RE relocation
    handling logic, from Andrii Nakryiko.

11) Extend libbpf to support CO-RE kfunc relocations. Also follow-up
    fixes for bpf_refcount shared ownership implementation,
    both from Dave Marchevsky.

12) Add a new bpf_object__unpin() API function to libbpf,
    from Daniel Xu.

13) Fix a memory leak in libbpf to also free btf_vmlinux
    when the bpf_object gets closed, from Hao Luo.

14) Small error output improvements to test_bpf module, from Helge Deller.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (87 commits)
  selftests/bpf: Add tests for rbtree API interaction in sleepable progs
  bpf: Allow bpf_spin_{lock,unlock} in sleepable progs
  bpf: Consider non-owning refs to refcounted nodes RCU protected
  bpf: Reenable bpf_refcount_acquire
  bpf: Use bpf_mem_free_rcu when bpf_obj_dropping refcounted nodes
  bpf: Consider non-owning refs trusted
  bpf: Ensure kptr_struct_meta is non-NULL for collection insert and refcount_acquire
  selftests/bpf: Enable cpu v4 tests for RV64
  riscv, bpf: Support unconditional bswap insn
  riscv, bpf: Support signed div/mod insns
  riscv, bpf: Support 32-bit offset jmp insn
  riscv, bpf: Support sign-extension mov insns
  riscv, bpf: Support sign-extension load insns
  riscv, bpf: Fix missing exception handling and redundant zext for LDX_B/H/W
  samples/bpf: Add note to README about the XDP utilities moved to xdp-tools
  samples/bpf: Cleanup .gitignore
  samples/bpf: Remove the xdp_sample_pkts utility
  samples/bpf: Remove the xdp1 and xdp2 utilities
  samples/bpf: Remove the xdp_rxq_info utility
  samples/bpf: Remove the xdp_redirect* utilities
  ...
====================

Link: https://lore.kernel.org/r/20230825194319.12727-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-25 18:40:15 -07:00
Jakub Kicinski 1fa6ffad12 wireless-next patches for v6.6
 -----BEGIN PGP SIGNATURE-----
 
 iQFFBAABCgAvFiEEiBjanGPFTz4PRfLobhckVSbrbZsFAmToqosRHGt2YWxvQGtl
 cm5lbC5vcmcACgkQbhckVSbrbZv9XQf9HDq9smbuWLvwzNjbbS31hHFLmnfhN8Zp
 +Zzn47gpMCle9ahGLQyw8lcfNPWCMyqOu4sGQ6hyyuH+YXoxZryuq9QDwWo9L/b1
 5Cpm4IaBYBMm0ZoOkWw2lQSzGyNrXgvCEKRVC+pYQMvr5V2aEWxT/kT4guiou9D5
 OXPRFN2iqZP0Q3TKcfKWRnWn3S0Ok3kZCFuXcWkL0sgwjqP/wbAPO1XNI1IImKNM
 xUd0zT4vK/layYq7i20y8blglI5kcp/aKCFEwYpQC2WPeZ3Wtl1G9PQ8eze5Gc2Q
 NTw3xfr6tENIcAmYoLdBdKbUq6e6pwLwXlojlZ2beR6s7LHM30AinQ==
 =2Hja
 -----END PGP SIGNATURE-----

Merge tag 'wireless-next-2023-08-25' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next

Kalle Valo says:

====================
wireless-next patches for v6.6

The second pull request for v6.6, this time with both stack and driver
changes. Unusually, we have only one major new feature but lots of
small cleanups all over; I guess this is because people have been on
vacation over the last month.

Major changes:

rtw89
 - Introduce Time Averaged SAR (TAS) support

* tag 'wireless-next-2023-08-25' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (114 commits)
  wifi: rtlwifi: rtl8723: Remove unused function rtl8723_cmd_send_packet()
  wifi: rtw88: usb: kill and free rx urbs on probe failure
  wifi: rtw89: Fix clang -Wimplicit-fallthrough in rtw89_query_sar()
  wifi: rtw89: phy: modify register setting of ENV_MNTR, PHYSTS and DIG
  wifi: rtw89: phy: add phy_gen_def::cr_base to support WiFi 7 chips
  wifi: rtw89: mac: define register address of rx_filter to generalize code
  wifi: rtw89: mac: define internal memory address for WiFi 7 chip
  wifi: rtw89: mac: generalize code to indirectly access WiFi internal memory
  wifi: rtw89: mac: add mac_gen_def::band1_offset to map MAC band1 register address
  wifi: wlcore: sdio: Use module_sdio_driver macro to simplify the code
  wifi: rtw89: initialize multi-channel handling
  wifi: rtw89: provide functions to configure NoA for beacon update
  wifi: rtw89: call rtw89_chan_get() by vif chanctx if aware of vif
  wifi: rtw89: sar: let caller decide the center frequency to query
  wifi: rtw89: refine rtw89_correct_cck_chan() by rtw89_hw_to_nl80211_band()
  wifi: rtw89: add function prototype for coex request duration
  Fix nomenclature for USB and PCI wireless devices
  wifi: ath: Use is_multicast_ether_addr() to check multicast Ether address
  wifi: ath12k: Remove unused declarations
  wifi: ath12k: add check max message length while scanning with extraie
  ...
====================

Link: https://lore.kernel.org/r/20230825132230.A0833C433C8@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-25 18:35:09 -07:00
Matthew Wilcox (Oracle) f9bff0e318 minmax: add in_range() macro
Patch series "New page table range API", v6.

This patchset changes the API used by the MM to set up page table entries.
The four APIs are:

    set_ptes(mm, addr, ptep, pte, nr)
    update_mmu_cache_range(vma, addr, ptep, nr)
    flush_dcache_folio(folio) 
    flush_icache_pages(vma, page, nr)

flush_dcache_folio() isn't technically new, but no architecture
implemented it, so I've done that for them.  The old APIs remain around
but are mostly implemented by calling the new interfaces.

The new APIs are based around setting up N page table entries at once. 
The N entries belong to the same PMD, the same folio and the same VMA, so
ptep++ is a legitimate operation, and locking is taken care of for you. 
Some architectures can do a better job of it than just a loop, but I have
hesitated to make too deep a change to architectures I don't understand
well.
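The "N entries at once" idea can be modelled in user space. This sketch assumes a made-up PTE encoding (a page frame number shifted left by a hypothetical MODEL_PFN_SHIFT of 12, with permission bits below it) and an illustrative model_set_ptes(); it shows the simple loop form an architecture without a better implementation might use, not any architecture's actual code. Because all nr entries sit in the same page table and map consecutive pages of one folio, ptep++ is valid and each next entry is the previous one advanced by one PFN.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical PTE encoding: PFN << MODEL_PFN_SHIFT | low permission bits. */
#define MODEL_PFN_SHIFT 12

typedef uint64_t model_pte_t;

/* Write nr consecutive entries starting at ptep, mapping consecutive PFNs. */
static void model_set_ptes(model_pte_t *ptep, model_pte_t pte, unsigned int nr)
{
    assert(nr > 0);
    for (;;) {
        *ptep = pte;
        if (--nr == 0)
            break;
        ptep++;
        pte += (model_pte_t)1 << MODEL_PFN_SHIFT;  /* next page frame */
    }
}
```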

One thing I have changed in every architecture is that PG_arch_1 is now a
per-folio bit instead of a per-page bit when used for dcache clean/dirty
tracking.  This was something that would have to happen eventually, and it
makes sense to do it now rather than iterate over every page involved in a
cache flush and figure out if it needs to happen.

The point of all this is better performance, and Fengwei Yin has measured
improvement on x86.  I suspect you'll see improvement on your architecture
too.  Try the new will-it-scale test mentioned here:
https://lore.kernel.org/linux-mm/20230206140639.538867-5-fengwei.yin@intel.com/
You'll need to run it on an XFS filesystem and have
CONFIG_TRANSPARENT_HUGEPAGE set.

This patchset is the basis for much of the anonymous large folio work
being done by Ryan, so it's received quite a lot of testing over the last
few months.


This patch (of 38):

Determine if a value lies within a range more efficiently (subtraction +
comparison vs two comparisons and an AND).  It also has useful (under some
circumstances) behaviour if the range exceeds the maximum value of the
type.  Convert all the conflicting definitions of in_range() within the
kernel; some can use the generic definition while others need their own
definition.
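The subtraction trick can be sketched in plain C for 64-bit operands. This is an illustrative model, not the kernel's `in_range()` definition itself, and the `range_contains` names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * One subtraction and one unsigned comparison. If val < start, the
 * difference wraps to a huge value and fails the test. A range whose end
 * would pass UINT64_MAX simply wraps around to 0 and continues there.
 */
static inline bool range_contains(uint64_t val, uint64_t start, uint64_t len)
{
	return (val - start) < len;
}

/* The naive two-comparison form, for contrast: start + len may overflow. */
static inline bool range_contains_naive(uint64_t val, uint64_t start, uint64_t len)
{
	return val >= start && val < start + len;
}
```

For `start = UINT64_MAX - 1, len = 10`, the naive form rejects `UINT64_MAX` because `start + len` wraps to 8. The subtraction form accepts it, and also accepts the wrapped values 0 to 7.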

Link: https://lkml.kernel.org/r/20230802151406.3735276-1-willy@infradead.org
Link: https://lkml.kernel.org/r/20230802151406.3735276-2-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24 16:20:18 -07:00
Luiz Augusto von Dentz 253f3399f4 Bluetooth: HCI: Introduce HCI_QUIRK_BROKEN_LE_CODED
This introduces HCI_QUIRK_BROKEN_LE_CODED which is used to indicate
that LE Coded PHY shall not be used. It is then set for some Intel
models that claim to support it but cause many problems when it is used.

Cc: stable@vger.kernel.org # 6.4.y+
Link: https://github.com/bluez/bluez/issues/577
Link: https://github.com/bluez/bluez/issues/582
Link: https://lore.kernel.org/linux-bluetooth/CABBYNZKco-v7wkjHHexxQbgwwSz-S=GZ=dZKbRE1qxT1h4fFbQ@mail.gmail.com/T/#
Fixes: 288c90224e ("Bluetooth: Enable all supported LE PHY by default")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-08-24 12:23:46 -07:00
Claudia Draghicescu 9c0826310b Bluetooth: ISO: Add support for periodic adv reports processing
In the case of a Periodic Synchronized Receiver,
the PA report received from a Broadcaster contains the BASE,
which has information about codec and other parameters of a BIG.
This information is stored and the application can retrieve it
using getsockopt(BT_ISO_BASE).

Signed-off-by: Claudia Draghicescu <claudia.rosu@nxp.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-08-24 12:22:56 -07:00
Pauli Virtanen 3344d31833 Bluetooth: hci_conn: fail SCO/ISO via hci_conn_failed if ACL gone early
Not calling hci_(dis)connect_cfm before deleting conn referred to by a
socket generally results in a use-after-free.

When cleaning up SCO connections when the parent ACL is deleted too
early, use hci_conn_failed to do the connection cleanup properly.

We also need to clean up ISO connections in a similar situation when
connecting has started but LE Create CIS is not yet sent, so do it too
here.

Fixes: ca1fd42e7d ("Bluetooth: Fix potential double free caused by hci_conn_unlink")
Reported-by: syzbot+cf54c1da6574b6c1b049@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-bluetooth/00000000000013b93805fbbadc50@google.com/
Signed-off-by: Pauli Virtanen <pav@iki.fi>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-08-24 12:22:35 -07:00
Luiz Augusto von Dentz db08722fc7 Bluetooth: hci_core: Fix missing instances using HCI_MAX_AD_LENGTH
There are a few instances still using HCI_MAX_AD_LENGTH instead of using
max_adv_len, which takes care of detecting the actual maximum
length depending on whether the controller supports EA or not.
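Here is a hedged sketch that derives the limit from the controller's capabilities. The struct and helpers are hypothetical stand-ins. `HCI_MAX_EXT_AD_LENGTH` and the values 31 and 251 are assumed to match the kernel's legacy and extended advertising data limits.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed legacy vs extended advertising data limits, in bytes. */
#define HCI_MAX_AD_LENGTH     31
#define HCI_MAX_EXT_AD_LENGTH 251

/* Hypothetical stand-in for the controller state. */
struct toy_hci_dev {
	bool ext_adv_capable;	/* controller supports extended advertising (EA) */
};

/* Pick the limit from the controller's capabilities instead of hard-coding 31. */
static uint8_t toy_max_adv_len(const struct toy_hci_dev *hdev)
{
	return hdev->ext_adv_capable ? HCI_MAX_EXT_AD_LENGTH : HCI_MAX_AD_LENGTH;
}

/* Example caller: validate an advertising payload length. */
static bool toy_adv_data_fits(const struct toy_hci_dev *hdev, size_t len)
{
	return len <= toy_max_adv_len(hdev);
}
```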

Fixes: 112b5090c2 ("Bluetooth: MGMT: Fix always using HCI_MAX_AD_LENGTH")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-08-24 12:22:05 -07:00
Iulia Tanasescu fbdc4bc472 Bluetooth: ISO: Use defer setup to separate PA sync and BIG sync
This commit implements defer setup support for the Broadcast Sink
scenario: By setting defer setup on a broadcast socket before calling
listen, the user is able to trigger the PA sync and BIG sync procedures
separately.

This is useful if the user first wants to synchronize to the periodic
advertising transmitted by a Broadcast Source, and trigger the BIG sync
procedure later on.

If defer setup is set, once a PA sync established event arrives, a new
hcon is created and notified to the ISO layer. A child socket associated
with the PA sync connection will be added to the accept queue of the
listening socket.

Once the accept call returns the fd for the PA sync child socket, the
user should call read on that fd. This will trigger the BIG create sync
procedure, and the PA sync socket will become a listening socket itself.

When the BIG sync established event is notified to the ISO layer, the
bis connections will be added to the accept queue of the PA sync parent.
The user should call accept on the PA sync socket to get the final bis
connections.

Signed-off-by: Iulia Tanasescu <iulia.tanasescu@nxp.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-08-24 12:21:35 -07:00
Luiz Augusto von Dentz 3a15324fd4 Bluetooth: hci_conn: Fix sending BT_HCI_CMD_LE_CREATE_CONN_CANCEL
This fixes sending BT_HCI_CMD_LE_CREATE_CONN_CANCEL when
hci_le_create_conn_sync has not been called because HCI_CONN_SCANNING
has been cleared too early, before its cmd_sync callback has run.

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-08-24 12:20:16 -07:00
Luiz Augusto von Dentz 94d9ba9f98 Bluetooth: hci_sync: Fix UAF in hci_disconnect_all_sync
Use-after-free can occur in hci_disconnect_all_sync if a connection is
deleted by concurrent processing of a controller event.

To prevent this, the code now iterates over the list backwards
to ensure the links are cleaned up before their parents. It also no longer
relies on a cursor; instead it always uses the last element, since
hci_abort_conn_sync is guaranteed to call hci_conn_del.
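Below is a minimal single-threaded model of this iteration scheme, using a hypothetical list type instead of the kernel's `list_head`. No cursor survives across iterations, and each step removes the current tail. The model does not reproduce the concurrent event processing that triggered the bug. It only shows why the loop stays safe when every abort deletes the element it is handed.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical connection object; parent is the link this one depends on. */
struct toy_conn {
	struct toy_conn *prev, *next;
	struct toy_conn *parent;
	int id;
};

struct toy_conn_list {
	struct toy_conn *head, *tail;
	int deleted[16];	/* ids in deletion order, for inspection */
	int ndeleted;
};

/* Append a new connection at the tail; children are added after parents. */
static struct toy_conn *toy_conn_add(struct toy_conn_list *l, int id,
				     struct toy_conn *parent)
{
	struct toy_conn *c = calloc(1, sizeof(*c));

	if (!c)
		return NULL;
	c->id = id;
	c->parent = parent;
	c->prev = l->tail;
	if (l->tail)
		l->tail->next = c;
	else
		l->head = c;
	l->tail = c;
	return c;
}

/* Unlink and free one connection (stands in for hci_conn_del()). */
static void toy_conn_del(struct toy_conn_list *l, struct toy_conn *c)
{
	if (c->prev)
		c->prev->next = c->next;
	else
		l->head = c->next;
	if (c->next)
		c->next->prev = c->prev;
	else
		l->tail = c->prev;
	if (l->ndeleted < 16)
		l->deleted[l->ndeleted++] = c->id;
	free(c);
}

/*
 * Always take the current tail: links go before the parents they depend on,
 * and because each step deletes the element it was given, the loop always
 * makes progress and never touches a freed node.
 */
static void toy_disconnect_all(struct toy_conn_list *l)
{
	while (l->tail)
		toy_conn_del(l, l->tail);
}
```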

UAF crash log:
==================================================================
BUG: KASAN: slab-use-after-free in hci_set_powered_sync
(net/bluetooth/hci_sync.c:5424) [bluetooth]
Read of size 8 at addr ffff888009d9c000 by task kworker/u9:0/124

CPU: 0 PID: 124 Comm: kworker/u9:0 Tainted: G        W
6.5.0-rc1+ #10
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
1.16.2-1.fc38 04/01/2014
Workqueue: hci0 hci_cmd_sync_work [bluetooth]
Call Trace:
 <TASK>
 dump_stack_lvl+0x5b/0x90
 print_report+0xcf/0x670
 ? __virt_addr_valid+0xdd/0x160
 ? hci_set_powered_sync+0x2c9/0x4a0 [bluetooth]
 kasan_report+0xa6/0xe0
 ? hci_set_powered_sync+0x2c9/0x4a0 [bluetooth]
 ? __pfx_set_powered_sync+0x10/0x10 [bluetooth]
 hci_set_powered_sync+0x2c9/0x4a0 [bluetooth]
 ? __pfx_hci_set_powered_sync+0x10/0x10 [bluetooth]
 ? __pfx_lock_release+0x10/0x10
 ? __pfx_set_powered_sync+0x10/0x10 [bluetooth]
 hci_cmd_sync_work+0x137/0x220 [bluetooth]
 process_one_work+0x526/0x9d0
 ? __pfx_process_one_work+0x10/0x10
 ? __pfx_do_raw_spin_lock+0x10/0x10
 ? mark_held_locks+0x1a/0x90
 worker_thread+0x92/0x630
 ? __pfx_worker_thread+0x10/0x10
 kthread+0x196/0x1e0
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x2c/0x50
 </TASK>

Allocated by task 1782:
 kasan_save_stack+0x33/0x60
 kasan_set_track+0x25/0x30
 __kasan_kmalloc+0x8f/0xa0
 hci_conn_add+0xa5/0xa80 [bluetooth]
 hci_bind_cis+0x881/0x9b0 [bluetooth]
 iso_connect_cis+0x121/0x520 [bluetooth]
 iso_sock_connect+0x3f6/0x790 [bluetooth]
 __sys_connect+0x109/0x130
 __x64_sys_connect+0x40/0x50
 do_syscall_64+0x60/0x90
 entry_SYSCALL_64_after_hwframe+0x6e/0xd8

Freed by task 695:
 kasan_save_stack+0x33/0x60
 kasan_set_track+0x25/0x30
 kasan_save_free_info+0x2b/0x50
 __kasan_slab_free+0x10a/0x180
 __kmem_cache_free+0x14d/0x2e0
 device_release+0x5d/0xf0
 kobject_put+0xdf/0x270
 hci_disconn_complete_evt+0x274/0x3a0 [bluetooth]
 hci_event_packet+0x579/0x7e0 [bluetooth]
 hci_rx_work+0x287/0xaa0 [bluetooth]
 process_one_work+0x526/0x9d0
 worker_thread+0x92/0x630
 kthread+0x196/0x1e0
 ret_from_fork+0x2c/0x50
==================================================================

Fixes: 182ee45da0 ("Bluetooth: hci_sync: Rework hci_suspend_notifier")
Signed-off-by: Pauli Virtanen <pav@iki.fi>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-08-24 12:19:55 -07:00
Thomas Weißschuh 5d21d0a65b net: generalize calculation of skb extensions length
Remove the necessity to modify skb_ext_total_length() when new extension
types are added.
Also reduces the line count a bit.

With optimizations enabled the function is folded down to the same
constant value as before during compilation.
This has been validated on x86 with GCC 6.5.0 and 13.2.1.
Also a similar construct has been validated on godbolt.org with GCC 5.1.
In any case the compiler has to be able to evaluate the construct at
compile-time for the BUILD_BUG_ON() in skb_extensions_init().

Even if not evaluated at compile-time, this function would only ever
be executed once at run-time, so the overhead would be negligible.
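A table-driven sum illustrates the generalization. The extension ids and chunk sizes below are invented for the example. Adding a type only requires a new table entry, and a loop over a `const` array is still something an optimizing compiler can fold to a constant.

```c
#include <stddef.h>

/* Hypothetical extension types; adding one only touches the enum and table. */
enum toy_ext_id {
	TOY_EXT_BRIDGE,
	TOY_EXT_SEC_PATH,
	TOY_EXT_MPTCP,
	TOY_EXT_NUM,	/* must stay last */
};

/* Per-type payload sizes in chunks (values are made up). */
static const unsigned char toy_ext_type_len[TOY_EXT_NUM] = {
	[TOY_EXT_BRIDGE]   = 4,
	[TOY_EXT_SEC_PATH] = 3,
	[TOY_EXT_MPTCP]    = 2,
};

#define TOY_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Header chunks plus the sum over every registered extension type. */
static inline unsigned int toy_ext_total_length(unsigned int header_chunks)
{
	unsigned int l = header_chunks;
	size_t i;

	for (i = 0; i < TOY_ARRAY_SIZE(toy_ext_type_len); i++)
		l += toy_ext_type_len[i];
	return l;
}
```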

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230823-skb_ext-simplify-v2-1-66e26cd66860@weissschuh.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-24 11:24:30 -07:00
Jakub Kicinski 57ce6427e0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

Conflicts:

include/net/inet_sock.h
  f866fbc842 ("ipv4: fix data-races around inet->inet_id")
  c274af2242 ("inet: introduce inet->inet_flags")
https://lore.kernel.org/all/679ddff6-db6e-4ff6-b177-574e90d0103d@tessares.net/

Adjacent changes:

drivers/net/bonding/bond_alb.c
  e74216b8de ("bonding: fix macvlan over alb bond support")
  f11e5bd159 ("bonding: support balance-alb with openvswitch")

drivers/net/ethernet/broadcom/bgmac.c
  d6499f0b7c ("net: bgmac: Return PTR_ERR() for fixed_phy_register()")
  23a14488ea ("net: bgmac: Fix return value check for fixed_phy_register()")

drivers/net/ethernet/broadcom/genet/bcmmii.c
  32bbe64a13 ("net: bcmgenet: Fix return value check for fixed_phy_register()")
  acf50d1adb ("net: bcmgenet: Return PTR_ERR() for fixed_phy_register()")

net/sctp/socket.c
  f866fbc842 ("ipv4: fix data-races around inet->inet_id")
  b09bde5c35 ("inet: move inet->mc_loop to inet->inet_frags")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-24 10:51:39 -07:00
Trond Myklebust cd18f24085 SUNRPC: Don't override connect timeouts in rpc_clnt_add_xprt()
If the caller specifies the connect timeouts in the arguments to
rpc_clnt_add_xprt(), then we shouldn't override them.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2023-08-24 13:24:15 -04:00
Trond Myklebust d2ee413884 SUNRPC: Allow specification of TCP client connect timeout at setup
When we create a TCP transport, the connect timeout parameters are
currently fixed to be 90s. This is problematic in the pNFS flexfiles
case, where we may have multiple mirrors, and we would like to fail over
quickly to the next mirror if a data server is down.

This patch adds the ability to specify the connection parameters at RPC
client creation time.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2023-08-24 13:24:15 -04:00
Trond Myklebust 3e6ff89d2e SUNRPC: Refactor and simplify connect timeout
Instead of requiring the requests to redrive the connection several
times, just let the TCP connect code manage it now that we've adjusted
the TCP_SYNCNT value.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2023-08-24 13:24:15 -04:00
Trond Myklebust 3a107f0740 SUNRPC: Set the TCP_SYNCNT to match the socket timeout
Set the TCP SYN count so that we abort the connection attempt at around
the expected timeout value.
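The commit does not spell out the arithmetic, so the following is a hypothetical sketch, not the SUNRPC code. It assumes a pure exponential backoff: an initial retransmission timeout `init_rto_s` that doubles after every SYN and is capped at `max_rto_s`. Under that schedule, n retries give up after init_rto_s * (2^(n+1) - 1) seconds until the cap applies. The sketch picks the smallest retry count that covers the requested timeout. Real TCP stacks may deviate from this schedule.

```c
/*
 * Hypothetical helper: smallest SYN retry count whose total give-up time
 * reaches timeout_s, bounded by max_retries.
 */
static unsigned int toy_syn_retries_for_timeout(unsigned int timeout_s,
						unsigned int init_rto_s,
						unsigned int max_rto_s,
						unsigned int max_retries)
{
	unsigned int retries = 0;
	unsigned long rto = init_rto_s;
	unsigned long total = init_rto_s;	/* wait after the first SYN */

	while (total < timeout_s && retries < max_retries) {
		rto = rto * 2 > max_rto_s ? max_rto_s : rto * 2;	/* back off */
		total += rto;
		retries++;
	}
	return retries;
}
```

With a 1 s initial RTO and a 120 s cap, a 90 s timeout yields 6 retries (1 + 2 + ... + 64 = 127 s). Five retries would only reach 63 s.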

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2023-08-24 13:24:15 -04:00
Linus Torvalds b5cc3833f1 Networking fixes for 6.5-rc8, including fixes from wifi, can
and netfilter
 
 Fixes to fixes:
 
   - nf_tables:
     - GC transaction race with abort path
     - defer gc run if previous batch is still pending
 
 Previous releases - regressions:
 
   - ipv4: fix data-races around inet->inet_id
 
   - phy: fix deadlocking in phy_error() invocation
 
   - mdio: fix C45 read/write protocol
 
   - ipvlan: fix a reference count leak warning in ipvlan_ns_exit()
 
   - ice: fix NULL pointer deref during VF reset
 
   - i40e: fix potential NULL pointer dereferencing of pf->vf i40e_sync_vsi_filters()
 
   - tg3: use slab_build_skb() when needed
 
   - mtk_eth_soc: fix NULL pointer on hw reset
 
 Previous releases - always broken:
 
   - core: validate veth and vxcan peer ifindexes
 
   - sched: fix a qdisc modification with ambiguous command request
 
   - devlink: add missing unregister linecard notification
 
   - wifi: mac80211: limit reorder_buf_filtered to avoid UBSAN warning
 
   - batman:
     - do not get eth header before batadv_check_management_packet
     - fix batadv_v_ogm_aggr_send memory leak
 
   - bonding: fix macvlan over alb bond support
 
   - mlxsw: set time stamp fields also when its type is MIRROR_UTC
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmTnJIQSHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkt7kP/jy6HOMwSOMFbtxQD2m89EImr6ZlLUPg
 H09seQzC5nwRbgZrdzukmM27HDKEkYe1sPyxhpS8E4iAslFaefEvnWqOY0oiQSpH
 OuF4mP/cS9QKb62NwKVrau3SCARS9arLmOF0mcJNdDOWwucE+SoFaebxSMitAU/w
 k8hHVsLwc5dwZAYznOl2/qsmPBnIUsxfymNJE/RuFqj1nHccGybh9mJKpAxc0knj
 QEjqno//PgAXPV/X3mH/wG0fcsXs0OlAnBS9yA95GNzuR2yWrh7bD/et99En/elS
 8paUio+O3P6Y6WaewgDYFm44pf/x+hFb18Irtab82BkdRw+lgFyF23g8IH7ToJAE
 mEaxwdS7AQ4XEunNyJsjwiffWUG1nFaoIhaGb0Lo1qmgLHDo+rrNhkrBWvZxSf0Q
 8QlMnCXopJ1c5Qltz5QNVaWPErpCcanxV3cpNlG+lTpfamWBrUpuv/EhHCUF/fr3
 hlgJEm+WoFTvexO+QC3CyJDz2JYLLMaaYaoUZ1aJS2dtTTc3tfUjEL8VcopfXI87
 2FXJ3qEtCkvfdtfFjhofw97qHDvGrTXa9r2JSh1Pp8v15pKdM2P/lMYxd4B0cSEw
 9udW/3bWkvHZayzBWvqDEiz3UTID1+uX0/qpBWY40QzTdIXo6sBrCCk93tjJUdcA
 kXjw9HkSqW6H
 =WKil
 -----END PGP SIGNATURE-----

Merge tag 'net-6.5-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from wifi, can and netfilter.

  Fixes to fixes:

   - nf_tables:
       - GC transaction race with abort path
       - defer gc run if previous batch is still pending

  Previous releases - regressions:

   - ipv4: fix data-races around inet->inet_id

   - phy: fix deadlocking in phy_error() invocation

   - mdio: fix C45 read/write protocol

   - ipvlan: fix a reference count leak warning in ipvlan_ns_exit()

   - ice: fix NULL pointer deref during VF reset

   - i40e: fix potential NULL pointer dereferencing of pf->vf in
     i40e_sync_vsi_filters()

   - tg3: use slab_build_skb() when needed

   - mtk_eth_soc: fix NULL pointer on hw reset

  Previous releases - always broken:

   - core: validate veth and vxcan peer ifindexes

   - sched: fix a qdisc modification with ambiguous command request

   - devlink: add missing unregister linecard notification

   - wifi: mac80211: limit reorder_buf_filtered to avoid UBSAN warning

   - batman:
      - do not get eth header before batadv_check_management_packet
      - fix batadv_v_ogm_aggr_send memory leak

   - bonding: fix macvlan over alb bond support

   - mlxsw: set time stamp fields also when its type is MIRROR_UTC"

* tag 'net-6.5-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (54 commits)
  selftests: bonding: add macvlan over bond testing
  selftest: bond: add new topo bond_topo_2d1c.sh
  bonding: fix macvlan over alb bond support
  rtnetlink: Reject negative ifindexes in RTM_NEWLINK
  netfilter: nf_tables: defer gc run if previous batch is still pending
  netfilter: nf_tables: fix out of memory error handling
  netfilter: nf_tables: use correct lock to protect gc_list
  netfilter: nf_tables: GC transaction race with abort path
  netfilter: nf_tables: flush pending destroy work before netlink notifier
  netfilter: nf_tables: validate all pending tables
  ibmveth: Use dcbf rather than dcbfl
  i40e: fix potential NULL pointer dereferencing of pf->vf i40e_sync_vsi_filters()
  net/sched: fix a qdisc modification with ambiguous command request
  igc: Fix the typo in the PTM Control macro
  batman-adv: Hold rtnl lock during MTU update via netlink
  igb: Avoid starting unnecessary workqueues
  can: raw: add missing refcount for memory leak fix
  can: isotp: fix support for transmission of SF without flow control
  bnx2x: new flag for track HW resource allocation
  sfc: allocate a big enough SKB for loopback selftest packet
  ...
2023-08-24 08:23:13 -07:00
Herbert Xu e6a28d6303 libceph: do not include crypto/algapi.h
The header file crypto/algapi.h is for internal use only.  Use the
header file crypto/utils.h instead.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2023-08-24 11:24:37 +02:00
Jeff Layton 4e8c4c2355 libceph: allow ceph_osdc_new_request to accept a multi-op read
Currently we have some special-casing for multi-op writes, but in the
case of a read, we can't really handle it. All of the current multi-op
callers call it with CEPH_OSD_FLAG_WRITE set.

Have ceph_osdc_new_request check for CEPH_OSD_FLAG_READ and if it's set,
allocate multiple reply ops instead of multiple request ops. If neither
flag is set, return -EINVAL.
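The decision can be sketched as follows. The flag constants, struct, and function are hypothetical stand-ins chosen for illustration, not libceph's definitions. Only the write/read/-EINVAL split follows the commit message.

```c
#include <errno.h>

/* Illustrative flag values (hypothetical, not taken from libceph). */
#define TOY_OSD_FLAG_READ  0x0010
#define TOY_OSD_FLAG_WRITE 0x0020

/* Hypothetical result of sizing the message buffers for a request. */
struct toy_op_counts {
	int num_req_ops;	/* ops carried in the request message */
	int num_rep_ops;	/* ops expected back in the reply message */
};

/* Multi-op writes need request ops, multi-op reads need reply ops. */
static int toy_size_multi_op(unsigned int flags, int num_ops,
			     struct toy_op_counts *out)
{
	if (flags & TOY_OSD_FLAG_WRITE) {
		out->num_req_ops = num_ops;
		out->num_rep_ops = 0;
	} else if (flags & TOY_OSD_FLAG_READ) {
		out->num_req_ops = 0;
		out->num_rep_ops = num_ops;
	} else {
		return -EINVAL;	/* neither flag set */
	}
	return 0;
}
```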

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de>
Reviewed-by: Milind Changire <mchangir@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2023-08-24 11:24:35 +02:00
Jeff Layton 69dd3b3930 libceph: add CEPH_OSD_OP_ASSERT_VER support
...and record the user_version in the reply in a new field in
ceph_osd_request, so we can populate the assert_ver appropriately.
Shuffle the fields a bit too so that the new field fits in an
existing hole on x86_64.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de>
Reviewed-by: Milind Changire <mchangir@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2023-08-24 11:24:35 +02:00
Paolo Abeni 8938fc0c7e netfilter pull request 2023-08-23
-----BEGIN PGP SIGNATURE-----
 
 iQJBBAABCAArFiEEgKkgxbID4Gn1hq6fcJGo2a1f9gAFAmTmI1cNHGZ3QHN0cmxl
 bi5kZQAKCRBwkajZrV/2AKBEEACACRkBNJ38IZoNhRdDWWVpoGiBL08BBZ/9Fdhh
 Cc/iZ0d/XWcAS8qmPlABk82rwZ7EwW0l+9VGai4easY37S6SC0qLKZQYScZj5Fpl
 hUMRiEn/Hd1fYjgGPCPG7dCFHYmh0JzXDFDDrBE9eRJmo7JdU/M9amLxYa2q1La7
 vvC6f9MO7+zUeCl5KLOpCBl3/kLDadHSA0FBaPIWP3K+Pd1wR2QJpNoy8U7XzZJP
 0+oS6kqqaOhAKImCzct2de1xfY4djnMzYYxAqxAUdd60/2dLiT+NJK03LA+FMKFX
 7bZY/CnoqWZzXbWcMAC/fg7nbj7zSS1HIgOft3zbj1sGZrhZmINC3hTjiIeSwyZV
 /n0fbV3IQaGCWx3dAGUQpuuCk3FwpIsw4NyRM8v43mnbFeaon/dBtMycXsWP+xiH
 VMc0j+BJl5zWNynZVTF1PYuNwkX9uubhDVrgtkqZZD+9RzE8i6DiRf7deOBLsI3N
 XlJpuc34hgGKe3s+Wn1FOY7jMO4FG6OEjB67t0tpjgAxg4mnuxGncXPV+dbTDq9k
 fgwntbo5RAL9R4itb2Qfy0cg4NiFF1Nqjyzxo+bBMMByst1hlsrAX/V7LInKF9Hi
 VI4X8YRdV2b8cQVFpqBigJS/k7wRUH7pdgd7YA6QSDVrBSp5mLf49+L7gaGOTJ6i
 hag4pg==
 =EVaB
 -----END PGP SIGNATURE-----

Merge tag 'nf-23-08-23' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Florian Westphal says:

====================
netfilter updates for net

This PR contains nf_tables updates for your *net* tree.

First patch fixes table validation; I broke this in 6.4 when tracking
validation state per table. Reported by Pablo, fixup from myself.

Second patch makes sure objects waiting for memory release have been
released; this was broken in 6.1. Patch from Pablo Neira Ayuso.

Patch three is a fix-for-fix from the previous PR: in case a transaction
gets aborted, the gc sequence counter needs to be incremented so pending
gc requests are invalidated, from Pablo.

Same for patch 4: the gc list needs to use the gc list lock, not the
destroy lock, also from Pablo.

Patch 5 fixes a UaF in a set backend, but this should only occur when
failslab is enabled for GFP_KERNEL allocations; broken since the feature
was added in 5.6, from myself.

Patch 6 fixes a double-free bug that was also added via the previous PR:
We must not schedule gc work if the previous batch is still queued.

netfilter pull request 2023-08-23

* tag 'nf-23-08-23' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nf_tables: defer gc run if previous batch is still pending
  netfilter: nf_tables: fix out of memory error handling
  netfilter: nf_tables: use correct lock to protect gc_list
  netfilter: nf_tables: GC transaction race with abort path
  netfilter: nf_tables: flush pending destroy work before netlink notifier
  netfilter: nf_tables: validate all pending tables
====================

Link: https://lore.kernel.org/r/20230823152711.15279-1-fw@strlen.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-08-24 10:33:22 +02:00
Ido Schimmel 30188bd783 rtnetlink: Reject negative ifindexes in RTM_NEWLINK
Negative ifindexes are illegal, but the kernel does not validate the
ifindex in the ancillary header of RTM_NEWLINK messages, resulting in
the kernel generating a warning [1] when such an ifindex is specified.

Fix by rejecting negative ifindexes.

[1]
WARNING: CPU: 0 PID: 5031 at net/core/dev.c:9593 dev_index_reserve+0x1a2/0x1c0 net/core/dev.c:9593
[...]
Call Trace:
 <TASK>
 register_netdevice+0x69a/0x1490 net/core/dev.c:10081
 br_dev_newlink+0x27/0x110 net/bridge/br_netlink.c:1552
 rtnl_newlink_create net/core/rtnetlink.c:3471 [inline]
 __rtnl_newlink+0x115e/0x18c0 net/core/rtnetlink.c:3688
 rtnl_newlink+0x67/0xa0 net/core/rtnetlink.c:3701
 rtnetlink_rcv_msg+0x439/0xd30 net/core/rtnetlink.c:6427
 netlink_rcv_skb+0x16b/0x440 net/netlink/af_netlink.c:2545
 netlink_unicast_kernel net/netlink/af_netlink.c:1342 [inline]
 netlink_unicast+0x536/0x810 net/netlink/af_netlink.c:1368
 netlink_sendmsg+0x93c/0xe40 net/netlink/af_netlink.c:1910
 sock_sendmsg_nosec net/socket.c:728 [inline]
 sock_sendmsg+0xd9/0x180 net/socket.c:751
 ____sys_sendmsg+0x6ac/0x940 net/socket.c:2538
 ___sys_sendmsg+0x135/0x1d0 net/socket.c:2592
 __sys_sendmsg+0x117/0x1e0 net/socket.c:2621
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

Fixes: 38f7b870d4 ("[RTNETLINK]: Link creation API")
Reported-by: syzbot+5ba06978f34abb058571@syzkaller.appspotmail.com
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230823064348.2252280-1-idosch@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-08-24 09:45:52 +02:00
Herbert Xu 8da1985ff7 wifi: mac80211: Do not include crypto/algapi.h
The header file crypto/algapi.h is for internal use only.  Use the
header file crypto/utils.h instead.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Link: https://lore.kernel.org/r/E1qYlA0-006vFr-Ts@formenos.hmeau.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-08-24 08:42:36 +02:00
Yue Haibing f9597ba887 xprtrdma: Remove unused function declaration rpcrdma_bc_post_recv()
rpcrdma_bc_post_recv() has never been implemented since its introduction in
commit f531a5dbc4 ("xprtrdma: Pre-allocate backward rpc_rqst and send/receive buffers").

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2023-08-23 15:58:47 -04:00
Anna Schumaker 61182c796d SUNRPC: kmap() the xdr pages during decode
If the pages are in HIGHMEM then we need to make sure they're mapped
before trying to read data off of them, otherwise we could end up with a
NULL pointer dereference.

The downside to this is that we need an extra cleanup step at the end of
decode to kunmap() the last page. I introduced an xdr_finish_decode()
function to do this. Right now this function only calls the
unmap_current_page() function, but other generic cleanup steps could be
added in the future if we come across anything else.

Reported-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2023-08-23 15:58:47 -04:00
Florian Westphal 8e51830e29 netfilter: nf_tables: defer gc run if previous batch is still pending
Don't queue more gc work, else we may queue the same elements multiple
times.

If an element is flagged as dead, this can mean that either the previous
gc request was invalidated/discarded by a transaction or that the previous
request is still pending in the system work queue.

The latter will happen if the gc interval is set to a very low value,
e.g. 1ms, and system work queue is backlogged.

The set's refcount is 1 if no previous gc requests are queued, so add
a helper for this and skip the gc run if old requests are pending.
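Here is a hedged model of the refcount check. The set holds one reference of its own, and each queued gc batch is assumed to hold one more, so any count other than 1 means an earlier batch has not finished. The `toy_` names are hypothetical. The check-then-increment is only safe because the sketch assumes a single gc worker calls `toy_set_gc_run()`.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical set object; each queued gc batch holds one extra reference. */
struct toy_set {
	atomic_int refs;	/* 1 == only the set's own reference */
};

/* A previous gc batch is still pending if anyone else holds a reference. */
static bool toy_set_gc_is_pending(struct toy_set *s)
{
	return atomic_load(&s->refs) != 1;
}

/* Periodic gc entry point: skip this run instead of queueing duplicates. */
static bool toy_set_gc_run(struct toy_set *s)
{
	if (toy_set_gc_is_pending(s))
		return false;			/* deferred: earlier batch not done */
	atomic_fetch_add(&s->refs, 1);		/* reference owned by the new batch */
	/* ... collect expired elements and queue the batch here ... */
	return true;
}

/* Called once the queued batch has been processed. */
static void toy_set_gc_done(struct toy_set *s)
{
	atomic_fetch_sub(&s->refs, 1);
}
```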

Fixes: f6c383b8c3 ("netfilter: nf_tables: adapt set backend to use GC transaction API")
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-08-23 16:12:59 +02:00
Florian Westphal 5e1be4cdc9 netfilter: nf_tables: fix out of memory error handling
Several instances of pipapo_resize() don't propagate allocation failures,
which causes a crash when fault injection is enabled for GFP_KERNEL slabs.

Fixes: 3c4287f620 ("nf_tables: Add set type for arbitrary concatenation of ranges")
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
2023-08-23 16:12:10 +02:00
Pablo Neira Ayuso 8357bc946a netfilter: nf_tables: use correct lock to protect gc_list
Use nf_tables_gc_list_lock spinlock, not nf_tables_destroy_list_lock to
protect the gc list.

Fixes: 5f68718b34 ("netfilter: nf_tables: GC transaction API to avoid race with control plane")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-08-23 16:10:01 +02:00
Pablo Neira Ayuso 720344340f netfilter: nf_tables: GC transaction race with abort path
The abort path is missing a synchronization point with GC transactions. Add
a GC sequence number so that any GC transaction losing the race will be
discarded.
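The sequence-number scheme can be sketched like this. The sketch assumes, without showing it, that the abort path and the gc commit are serialized by the same lock. All names are hypothetical. A gc transaction snapshots the counter when it starts, an abort bumps the counter, and a gc transaction whose snapshot no longer matches is discarded.

```c
#include <stdbool.h>

/* Hypothetical per-netns state: bumped whenever a transaction aborts. */
struct toy_net {
	unsigned int gc_seq;
};

/* A gc transaction remembers the sequence number it started under. */
struct toy_gc_trans {
	unsigned int seq;
};

static void toy_gc_begin(const struct toy_net *net, struct toy_gc_trans *gc)
{
	gc->seq = net->gc_seq;
}

/* Abort path: invalidate every gc transaction started before this point. */
static void toy_abort(struct toy_net *net)
{
	net->gc_seq++;
}

/* Commit only if no abort happened since the gc transaction began. */
static bool toy_gc_commit(const struct toy_net *net, const struct toy_gc_trans *gc)
{
	return gc->seq == net->gc_seq;	/* false: lost the race, discard */
}
```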

Fixes: 5f68718b34 ("netfilter: nf_tables: GC transaction API to avoid race with control plane")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-08-23 16:10:01 +02:00
Pablo Neira Ayuso 2c9f029328 netfilter: nf_tables: flush pending destroy work before netlink notifier
Destroy work waits for the RCU grace period, then it releases the objects
with no mutex held. All released objects follow this path for
transactions; therefore, order is guaranteed and references to top-level
objects in the hierarchy remain valid.

However, the netlink notifier might interfere with pending destroy work.
rcu_barrier() is not correct because objects are not released via an RCU
callback. Flush destroy work before releasing objects from the netlink
notifier path.

Fixes: d4bc8271db ("netfilter: nf_tables: netlink notifier might race to release objects")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-08-23 16:10:01 +02:00
Florian Westphal 4b80ced971 netfilter: nf_tables: validate all pending tables
We have to validate all tables in the transaction that are in
VALIDATE_DO state, the blamed commit below did not move the break
statement to its right location so we only validate one table.

Moreover, we can't init table->validate to _SKIP when a table object
is allocated.

If we do, then when a transaction creates a new table and then
fails, nfnetlink will loop and nft will hang until the
user cancels the command.

Add back the pernet state as a place to stash the last state encountered.
This is either _DO (we hit an error during commit validation) or _SKIP
(transaction passed all checks).
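The loop shape the fix calls for can be modelled as below. The enum and struct are hypothetical simplifications of the per-table validation state. The sketch validates every table in the DO state and stops early only on an error. It leaves out the per-netns stash of the last state.

```c
#include <stddef.h>

/* Hypothetical validation states, loosely mirroring the names above. */
enum toy_validate_state { TOY_VALIDATE_SKIP, TOY_VALIDATE_NEED, TOY_VALIDATE_DO };

struct toy_table {
	enum toy_validate_state validate_state;
	int validate_result;	/* 0 on success, negative errno on failure */
	int validated;		/* how many times validation ran (for inspection) */
};

static int toy_validate_table(struct toy_table *t)
{
	t->validated++;
	return t->validate_result;
}

/* No unconditional break after the first DO table: all of them are checked. */
static int toy_validate_all(struct toy_table *tables, size_t n)
{
	size_t i;
	int err;

	for (i = 0; i < n; i++) {
		if (tables[i].validate_state != TOY_VALIDATE_DO)
			continue;
		err = toy_validate_table(&tables[i]);
		if (err < 0)
			return err;
	}
	return 0;
}
```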

Fixes: 00c320f9b7 ("netfilter: nf_tables: make validation state per table")
Reported-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-08-23 16:10:01 +02:00
Jamal Hadi Salim da71714e35 net/sched: fix a qdisc modification with ambiguous command request
When replacing an existing root qdisc with one of the same kind, the
request essentially boils down to a parameterization change, i.e. not one that
requires allocation and grafting of a new qdisc. syzbot was able to create a
scenario in which a taprio qdisc replaced an existing taprio qdisc
with a combination of NLM_F_CREATE, NLM_F_REPLACE and NLM_F_EXCL, leading to a
create and graft scenario.
The fix ensures that a create and graft is allowed only when the qdisc kinds
differ; otherwise the request goes into the "change" codepath.

While at it, fix the code and comments to improve readability.

While syzbot was able to create the issue, it did not zone in on the root cause.
Analysis from Vladimir Oltean <vladimir.oltean@nxp.com> helped narrow it down.

v1->V2 changes:
- remove "inline" function definition (Vladimir)
- remove extraneous braces in branches (Vladimir)
- change inline function names (Pedro)
- Run tdc tests (Victor)
v2->v3 changes:
- don't break else/if (Simon)

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reported-by: syzbot+a3618a167af2021433cd@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/20230816225759.g25x76kmgzya2gei@skbuf/T/
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-23 09:44:48 +01:00
Jordan Rife 0bdf399342 net: Avoid address overwrite in kernel_connect
BPF programs that run on connect can rewrite the connect address. For
the connect system call this isn't a problem, because a copy of the address
is made when it is moved into kernel space. However, kernel_connect
simply passes through the address it is given, so the caller may observe
its address value unexpectedly change.

A practical example where this is problematic is where NFS is combined
with a system such as Cilium which implements BPF-based load balancing.
A common pattern in software-defined storage systems is to have an NFS
mount that connects to a persistent virtual IP which in turn maps to an
ephemeral server IP. This is usually done to achieve high availability:
if your server goes down you can quickly spin up a replacement and remap
the virtual IP to that endpoint. With BPF-based load balancing, mounts
will forget the virtual IP address when the address rewrite occurs
because a pointer to the only copy of that address is passed down the
stack. Server failover then breaks, because clients have forgotten the
virtual IP address. Reconnects fail and mounts remain broken. This patch
was tested by setting up a scenario like this and ensuring that NFS
reconnects worked after applying the patch.
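The fix amounts to making a private copy of the address before handing it down. In this user-space sketch, `toy_lb_rewrite()` is a hypothetical stand-in for a BPF connect program that rewrites a virtual IP to a backend address (10.0.0.2 is made up). `toy_kernel_connect()` models the copying wrapper.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical connect hook type: may rewrite the address in place. */
typedef int (*toy_connect_fn)(struct sockaddr *addr, socklen_t addrlen);

/* Stand-in for a load-balancing rewrite: virtual IP -> backend IP. */
static int toy_lb_rewrite(struct sockaddr *addr, socklen_t addrlen)
{
	struct sockaddr_in *sin = (struct sockaddr_in *)addr;

	if (addrlen < (socklen_t)sizeof(*sin) || sin->sin_family != AF_INET)
		return -EINVAL;
	sin->sin_addr.s_addr = htonl(0x0a000002);	/* 10.0.0.2 */
	return 0;
}

/*
 * Copy the caller's address into local storage first, so any rewrite lands
 * in the copy and the caller keeps its original address for reconnects.
 */
static int toy_kernel_connect(toy_connect_fn connect_fn,
			      const struct sockaddr *addr, socklen_t addrlen)
{
	struct sockaddr_storage address;

	if (addrlen > sizeof(address))
		return -EINVAL;
	memcpy(&address, addr, addrlen);
	return connect_fn((struct sockaddr *)&address, addrlen);
}
```

Calling `toy_lb_rewrite()` directly on the caller's `sockaddr_in` would overwrite the virtual IP. Going through `toy_kernel_connect()` leaves it intact for the next reconnect.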

Signed-off-by: Jordan Rife <jrife@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-23 09:42:05 +01:00
Jakub Kicinski 6176b8c4a1 nf-next pull request 2023-08-22
-----BEGIN PGP SIGNATURE-----
 
 iQJBBAABCAArFiEEgKkgxbID4Gn1hq6fcJGo2a1f9gAFAmTku5sNHGZ3QHN0cmxl
 bi5kZQAKCRBwkajZrV/2ANYQEAC+Ub5YNzQ7tcABozPWRpno4i3rHxBHCdp1faDu
 9ISdxwq62k4ynhrGb4UyVrw8PJDxlFKOtnmx1GnB7/FTwFpbIfqc4D/I0sO6RIn8
 z4G7ph9afen1Qme9Y03/5XE/C+HYDBy8bK4efHUUWyiAQJcTQnrdwC6EokXHnsRK
 zARvfyTD91IsIFZLkArqVe2VvvThhJL44Xci+vfPkTXQHI30nnYRGFn/gWnEbU2j
 jTi4rHm58oAJbYuEt2YCn6O9TwtcnbvxT0VcIb7viiWeJ+dHhGhsx89Sy1Qd37ko
 m3qZ7ZxR2+oEHWWpgnXrI6kMrN3ZH5DjR/pMlFnoiwnfgjfsnludMwzneRDszi9Q
 97/e5EP7WqSr0VRAge7HmgCDapbFSdIRLa4ZpCyX7CdIY1nIHajk7PJNnjq+xJ2X
 YHyjDY14HHi436nMTwKXzPECiqVgaOpqx9PgIlGmssTzfOYGO8+Q/bGy2cuLOz65
 a++iHM9hcAQV6VJfOB45CtFQmIKC4rWf1eC7Ba/oFDRvbfiaLZ1t5vawoBugyDeY
 5RbGWJobjlo3V0BnzFS56wBYNgdOqO7pfXzrvzpKZJyLdRFMIdwrioCCeerbNF+M
 vEh7RuEiKW6ydN7jlBD9ZxxgeGxwPYML8H8Ru8BW4NcjOzxrzq0MkRz03+HYBqhj
 akfj8g==
 =OSUE
 -----END PGP SIGNATURE-----

Merge tag 'nf-next-23-08-22' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next

Florian Westphal says:

====================
netfilter updates for net-next

First patch resolves a fortify warning by wrapping the to-be-copied
members via struct_group.

Second patch replaces array[0] with array[] in ebtables uapi.
Both changes from GONG Ruiqi.

The largest chunk is replacement of strncpy with strscpy_pad()
in netfilter, from Justin Stitt.

Last patch, from myself, aborts ruleset validation if a fatal
signal is pending, this speeds up process exit.

* tag 'nf-next-23-08-22' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
  netfilter: nf_tables: allow loop termination for pending fatal signal
  netfilter: xtables: refactor deprecated strncpy
  netfilter: x_tables: refactor deprecated strncpy
  netfilter: nft_meta: refactor deprecated strncpy
  netfilter: nft_osf: refactor deprecated strncpy
  netfilter: nf_tables: refactor deprecated strncpy
  netfilter: nf_tables: refactor deprecated strncpy
  netfilter: ipset: refactor deprecated strncpy
  netfilter: ebtables: replace zero-length array members
  netfilter: ebtables: fix fortify warnings in size_entry_mwt()
====================

Link: https://lore.kernel.org/r/20230822154336.12888-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22 18:47:35 -07:00
Geliang Tang ed1ad86b85 mptcp: register default scheduler
This patch defines the default packet scheduler mptcp_sched_default.
Register it in mptcp_sched_init(), which is invoked in mptcp_proto_init().
Skip deleting this default scheduler in mptcp_unregister_scheduler().

Set msk->sched to the default scheduler when the input parameter of
mptcp_init_sched() is NULL.

Invoke mptcp_sched_default_get_subflow in get_send() and get_retrans()
if the default scheduler is set or msk->sched is NULL.

Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20230821-upstream-net-next-20230818-v1-10-0c860fb256a8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22 17:31:19 -07:00
Geliang Tang ee2708aeda mptcp: use get_retrans wrapper
This patch adds multiple-subflow support to __mptcp_retrans(). Use the
get_retrans() wrapper instead of mptcp_subflow_get_retrans() in it.

Check the subflow scheduled flags to test which subflow or subflows are
picked by the scheduler, and use them to send data.

Move msk_owned_by_me() and fallback checks into get_retrans() wrapper
from mptcp_subflow_get_retrans().

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20230821-upstream-net-next-20230818-v1-9-0c860fb256a8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22 17:31:19 -07:00
Geliang Tang 0fa1b3783a mptcp: use get_send wrapper
This patch adds multiple-subflow support to __mptcp_push_pending()
and __mptcp_subflow_push_pending(). Use the get_send() wrapper instead of
mptcp_subflow_get_send() in them.

Check the subflow scheduled flags to test which subflow or subflows are
picked by the scheduler, and use them to send data.

Move msk_owned_by_me() and fallback checks into get_send() wrapper from
mptcp_subflow_get_send().

This commit allows the scheduler to set the subflow->scheduled bit in
multiple subflows, but it does not allow for sending redundant data.
Multiple scheduled subflows will send sequential data on each subflow.

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20230821-upstream-net-next-20230818-v1-8-0c860fb256a8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22 17:31:19 -07:00
Geliang Tang 07336a87fe mptcp: add scheduler wrappers
This patch defines two packet scheduler wrappers, mptcp_sched_get_send()
and mptcp_sched_get_retrans(), which invoke get_subflow() of msk->sched.

Set data->reinject to true in mptcp_sched_get_retrans() and to false in
mptcp_sched_get_send().

If msk->sched is NULL, use default functions mptcp_subflow_get_send()
and mptcp_subflow_get_retrans() to send data.
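The sketch below models the wrapper logic described above in plain C. The struct layouts, the integer subflow ids, `example_sched`, and the `default_get_send()`/`default_get_retrans()` stand-ins are assumptions made for illustration. The real wrappers operate on struct mptcp_sock and struct mptcp_subflow_context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sched_data {
	bool reinject;   /* true when picking a subflow for retransmission */
};

struct sched_ops {
	/* returns the id of the chosen subflow, or -1 if none */
	int (*get_subflow)(const struct sched_data *data);
};

struct msk_model {
	const struct sched_ops *sched; /* may be NULL */
};

/* Hypothetical fallbacks standing in for mptcp_subflow_get_send()
 * and mptcp_subflow_get_retrans(). */
static int default_get_send(void)    { return 0; }
static int default_get_retrans(void) { return 1; }

/* Example scheduler: subflow 7 for new data, subflow 9 for reinjections. */
static int example_get_subflow(const struct sched_data *data)
{
	return data->reinject ? 9 : 7;
}

static const struct sched_ops example_sched = {
	.get_subflow = example_get_subflow,
};

static int sched_get_send(struct msk_model *msk)
{
	struct sched_data data = { .reinject = false };

	if (!msk->sched)
		return default_get_send();
	return msk->sched->get_subflow(&data);
}

static int sched_get_retrans(struct msk_model *msk)
{
	struct sched_data data = { .reinject = true };

	if (!msk->sched)
		return default_get_retrans();
	return msk->sched->get_subflow(&data);
}
```

The only difference between the two wrappers is the reinject flag. A single get_subflow() callback can therefore serve both the send and the retransmit paths.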

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20230821-upstream-net-next-20230818-v1-7-0c860fb256a8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22 17:31:19 -07:00
Geliang Tang fce68b0308 mptcp: add scheduled in mptcp_subflow_context
This patch adds a new member scheduled in struct mptcp_subflow_context,
which will be set in the MPTCP scheduler context when the scheduler
picks this subflow to send data.

Add a new helper mptcp_subflow_set_scheduled() to set this flag using
WRITE_ONCE().

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20230821-upstream-net-next-20230818-v1-6-0c860fb256a8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22 17:31:19 -07:00
Geliang Tang 1730b2b2c5 mptcp: add sched in mptcp_sock
This patch adds a new struct member, sched, to struct mptcp_sock, and
two helpers, mptcp_init_sched() and mptcp_release_sched(), to
initialize and release it.

Init it with the sysctl scheduler in mptcp_init_sock(), copy the
scheduler from the parent in mptcp_sk_clone(), and release it in
__mptcp_destroy_sock().

Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20230821-upstream-net-next-20230818-v1-5-0c860fb256a8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22 17:31:19 -07:00
Geliang Tang e3b2870b6d mptcp: add a new sysctl scheduler
This patch adds a new sysctl, named scheduler, to support selection
of different schedulers. Export the mptcp_get_scheduler helper to read
this sysctl.

Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20230821-upstream-net-next-20230818-v1-4-0c860fb256a8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22 17:31:18 -07:00
Geliang Tang 740ebe35bd mptcp: add struct mptcp_sched_ops
This patch defines struct mptcp_sched_ops, which has three struct members,
name, owner and list, and three function pointers: init(), release() and
get_subflow().

The scheduler function get_subflow() takes a struct mptcp_sched_data
parameter, which contains a reinject flag (set for retransmissions), the
number of subflows and an mptcp_subflow_context array.

Add the scheduler registering, unregistering and finding functions to add,
delete and find a packet scheduler on the global list mptcp_sched_list.
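The register, unregister and find trio can be pictured as operations on a singly linked list keyed by name. This is a hypothetical userspace sketch. The kernel version uses struct list_head, RCU and a spinlock. The names here (`sched_register()`, `sched_find()`, `dummy_get_subflow()`, `SCHED_NAME_MAX`) only loosely mirror the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SCHED_NAME_MAX 16

struct sched_data;

struct sched_ops {
	int (*get_subflow)(struct sched_data *data);
	void (*init)(void *msk);
	void (*release)(void *msk);
	char name[SCHED_NAME_MAX];
	struct sched_ops *next;   /* stand-in for struct list_head list */
};

static struct sched_ops *sched_list;

/* Placeholder callback so example schedulers are complete. */
static int dummy_get_subflow(struct sched_data *data)
{
	(void)data;
	return 0;
}

static struct sched_ops *sched_find(const char *name)
{
	for (struct sched_ops *s = sched_list; s; s = s->next)
		if (strncmp(s->name, name, SCHED_NAME_MAX) == 0)
			return s;
	return NULL;
}

/* Returns 0 on success, -1 if get_subflow is missing or the name is taken. */
static int sched_register(struct sched_ops *s)
{
	if (!s->get_subflow || sched_find(s->name))
		return -1;
	s->next = sched_list;
	sched_list = s;
	return 0;
}

static void sched_unregister(struct sched_ops *s)
{
	for (struct sched_ops **p = &sched_list; *p; p = &(*p)->next) {
		if (*p == s) {
			*p = s->next;
			return;
		}
	}
}
```

Rejecting duplicate names keeps sched_find() unambiguous, which is what a name-based selector such as a sysctl needs.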

Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20230821-upstream-net-next-20230818-v1-3-0c860fb256a8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22 17:31:18 -07:00
Geliang Tang ebc1e08f01 mptcp: drop last_snd and MPTCP_RESET_SCHEDULER
Since the burst check conditions have moved out of the function
mptcp_subflow_get_send(), all uses of msk->last_snd are now useless.
This patch drops them, as well as the macro MPTCP_RESET_SCHEDULER.

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20230821-upstream-net-next-20230818-v1-2-0c860fb256a8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22 17:31:18 -07:00
Geliang Tang c5b4297dee mptcp: refactor push_pending logic
To support redundant packet schedulers more easily, this patch refactors
__mptcp_push_pending() logic from:

For each dfrag:
	While sends succeed:
		Call the scheduler (selects subflow and msk->snd_burst)
		Update subflow locks (push/release/acquire as needed)
		Send the dfrag data with mptcp_sendmsg_frag()
		Update already_sent, snd_nxt, snd_burst
	Update msk->first_pending
Push/release on final subflow

->

While first_pending isn't empty:
	Call the scheduler (selects subflow and msk->snd_burst)
	Update subflow locks (push/release/acquire as needed)
	For each pending dfrag:
		While sends succeed:
			Send the dfrag data with mptcp_sendmsg_frag()
			Update already_sent, snd_nxt, snd_burst
		Update msk->first_pending
		Break if required by msk->snd_burst / etc
	Push/release on final subflow

Refactors __mptcp_subflow_push_pending logic from:

For each dfrag:
	While sends succeed:
		Call the scheduler (selects subflow and msk->snd_burst)
		Send the dfrag data with mptcp_subflow_delegate(), break
		Send the dfrag data with mptcp_sendmsg_frag()
		Update dfrag->already_sent, msk->snd_nxt, msk->snd_burst
	Update msk->first_pending

->

While first_pending isn't empty:
	Call the scheduler (selects subflow and msk->snd_burst)
	Send the dfrag data with mptcp_subflow_delegate(), break
	Send the dfrag data with mptcp_sendmsg_frag()
	For each pending dfrag:
		While sends succeed:
			Send the dfrag data with mptcp_sendmsg_frag()
			Update already_sent, snd_nxt, snd_burst
		Update msk->first_pending
		Break if required by msk->snd_burst / etc

Move the duplicate code from __mptcp_push_pending() and
__mptcp_subflow_push_pending() into a new helper function, named
__subflow_push_pending(). Simplify __mptcp_push_pending() and
__mptcp_subflow_push_pending() by invoking this helper.

Also move the burst check conditions out of the function
mptcp_subflow_get_send(), check them in __subflow_push_pending() in
the inner "for each pending dfrag" loop.
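The new loop shape can be simulated in a few lines of C. Everything here is a hypothetical simplification. Dfrags are plain byte counts, `schedule()` round-robins over subflows and grants a fixed burst, and sending always succeeds. The sketch only shows the scheduler being called once per burst, outside the per-dfrag loop, with the burst check inside that loop.

```c
#include <assert.h>
#include <stddef.h>

#define NR_SUBFLOWS 2
#define BURST       4   /* bytes a subflow may send per scheduler call */

struct dfrag {
	size_t len;
	size_t already_sent;
};

/* Hypothetical round-robin scheduler: picks a subflow and sets the burst. */
static int schedule(int *next, size_t *snd_burst)
{
	int ssk = *next;

	*next = (*next + 1) % NR_SUBFLOWS;
	*snd_burst = BURST;
	return ssk;
}

/* Sends all pending dfrags; sent_per_subflow[] records bytes per subflow. */
static void push_pending(struct dfrag *frags, size_t nr,
			 size_t sent_per_subflow[NR_SUBFLOWS])
{
	size_t first_pending = 0;
	int rr = 0;

	while (first_pending < nr) {
		size_t snd_burst;
		int ssk = schedule(&rr, &snd_burst);

		/* inner "for each pending dfrag" loop with the burst check */
		while (first_pending < nr && snd_burst > 0) {
			struct dfrag *d = &frags[first_pending];
			size_t left = d->len - d->already_sent;
			size_t chunk = left < snd_burst ? left : snd_burst;

			d->already_sent += chunk;        /* "send" the data */
			sent_per_subflow[ssk] += chunk;
			snd_burst -= chunk;
			if (d->already_sent == d->len)
				first_pending++;         /* dfrag fully sent */
		}
	}
}
```

With dfrags of 5 and 6 bytes and a 4-byte burst, subflow 0 sends 4 bytes, subflow 1 sends 4, and subflow 0 sends the final 3.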

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20230821-upstream-net-next-20230818-v1-1-0c860fb256a8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22 17:31:18 -07:00
Sven Eckelmann 987aae75fc batman-adv: Hold rtnl lock during MTU update via netlink
The automatic recalculation of the maximum allowed MTU is usually triggered
by code sections which are already rtnl lock protected by callers outside
of batman-adv. But when the fragmentation setting is changed via
batman-adv's own batadv genl family, then the rtnl lock is not yet taken.

But dev_set_mtu requires that the caller holds the rtnl lock because it
uses netdevice notifiers. And this code will then fail the check for this
lock:

  RTNL: assertion failed at net/core/dev.c (1953)
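The locking rule can be mimicked in a single-threaded model where a flag stands in for the RTNL mutex. The setter asserts that the lock is held, as ASSERT_RTNL() does. The netlink-style entry point, which starts without the lock, takes it around the update. All names here (`model_rtnl_lock()`, `model_set_mtu()`, `netlink_set_fragmentation()`) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Single-threaded model: a flag stands in for the RTNL mutex. */
static bool rtnl_held;
static int mtu = 1500;

static void model_rtnl_lock(void)   { assert(!rtnl_held); rtnl_held = true; }
static void model_rtnl_unlock(void) { assert(rtnl_held); rtnl_held = false; }

/* Like dev_set_mtu(): the caller must already hold the lock. */
static void model_set_mtu(int new_mtu)
{
	assert(rtnl_held); /* analogue of ASSERT_RTNL() */
	mtu = new_mtu;
}

/* Shared helper used by callers that already hold the lock. */
static void update_mtu_locked(int new_mtu)
{
	model_set_mtu(new_mtu);
}

/* Netlink path: starts without the lock, so take it around the update. */
static void netlink_set_fragmentation(int new_mtu)
{
	model_rtnl_lock();
	update_mtu_locked(new_mtu);
	model_rtnl_unlock();
}
```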

Cc: stable@vger.kernel.org
Reported-by: syzbot+f8812454d9b3ac00d282@syzkaller.appspotmail.com
Fixes: c6a953cce8 ("batman-adv: Trigger events for auto adjusted MTU")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230821-batadv-missing-mtu-rtnl-lock-v1-1-1c5a7bfe861e@narfation.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22 17:25:10 -07:00
Oliver Hartkopp c275a176e4 can: raw: add missing refcount for memory leak fix
Commit ee8b94c851 ("can: raw: fix receiver memory leak") introduced
a new reference to the CAN netdevice that has assigned CAN filters.
But this new ro->dev reference did not maintain its own refcount, which
led to another KASAN use-after-free splat found by Eric Dumazet.

This patch ensures a proper refcount for the CAN netdevice.
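The rule behind the fix is that a cached pointer to a shared object is only safe if its holder also owns a reference. The minimal sketch below illustrates this with a hypothetical refcounted `struct netdev_model` in place of struct net_device. `netdev_hold()` and `netdev_put()` loosely mirror the dev_hold()/dev_put() semantics.

```c
#include <assert.h>
#include <stdlib.h>

struct netdev_model {
	int refcnt;
};

static struct netdev_model *netdev_alloc(void)
{
	struct netdev_model *dev = calloc(1, sizeof(*dev));

	if (dev)
		dev->refcnt = 1; /* reference owned by the creator */
	return dev;
}

static void netdev_hold(struct netdev_model *dev) { dev->refcnt++; }

/* Drops one reference; returns 1 if this put freed the object. */
static int netdev_put(struct netdev_model *dev)
{
	if (--dev->refcnt == 0) {
		free(dev);
		return 1;
	}
	return 0;
}

struct raw_sock_model {
	struct netdev_model *dev; /* cached pointer, like ro->dev */
};

static void bind_dev(struct raw_sock_model *ro, struct netdev_model *dev)
{
	netdev_hold(dev); /* the cached pointer owns its own reference */
	ro->dev = dev;
}

static void release_dev(struct raw_sock_model *ro)
{
	if (ro->dev) {
		netdev_put(ro->dev);
		ro->dev = NULL;
	}
}
```

Without the netdev_hold() in bind_dev(), the creator's final put would free the object while ro->dev still pointed at it.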

Fixes: ee8b94c851 ("can: raw: fix receiver memory leak")
Reported-by: Eric Dumazet <edumazet@google.com>
Cc: Ziyang Xuan <william.xuanziyang@huawei.com>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://lore.kernel.org/r/20230821144547.6658-3-socketcan@hartkopp.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22 17:18:50 -07:00
Oliver Hartkopp 0bfe711592 can: isotp: fix support for transmission of SF without flow control
The original implementation handled single frame transmissions very
simply: it just sent the single frame without any timeout handling.

With the new echo frame handling the echo frame was also introduced for
single frames but the former exception ('simple without timers') has been
maintained by accident. This leads to a 1 second timeout when closing the
socket and to an -ECOMM error when CAN_ISOTP_WAIT_TX_DONE is selected.

As the echo handling is always active (also for single frames) remove the
wrong extra condition for single frames.

Fixes: 9f39d36530 ("can: isotp: add support for transmission without flow control")
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://lore.kernel.org/r/20230821144547.6658-2-socketcan@hartkopp.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22 17:18:50 -07:00
Yue Haibing 9265f78b69 wifi: nl80211: Remove unused declaration nl80211_pmsr_dump_results()
nl80211_pmsr_dump_results() has never been implemented since it was added
in commit 9bb7e0f24e ("cfg80211: add peer measurement with FTM initiator
API").

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230729121651.36836-1-yuehaibing@huawei.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-08-22 21:40:40 +02:00
Yue Haibing a3d9c4f7c4 wifi: mac80211: mesh: Remove unused function declaration mesh_ids_set_default()
Commit ccf80ddfe4 ("mac80211: mesh function and data structures
definitions") introduced this declaration but never implemented it.

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230731140712.1204-1-yuehaibing@huawei.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-08-22 21:40:40 +02:00
Yue Haibing f14cef0045 wifi: mac80211: Remove unused function declarations
Commit 685429623f ("mac80211: Fix circular locking dependency in
ARP filter handling") left the ieee80211_set_arp_filter() declaration
unused.
And commit 164eb02d07 ("mac80211: add radar detection command/event")
introduced the ieee80211_dfs_cac_timer() declaration but never implemented
it.

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230801134337.24452-1-yuehaibing@huawei.com
[reword commit message]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-08-22 21:40:40 +02:00
Lin Ma 218d690c49 wifi: nl80211/cfg80211: add forgotten nla_policy for BSS color attribute
The previous commit dd3e4fc75b ("nl80211/cfg80211: add BSS color to
NDP ranging parameters") adds a parameter for NDP ranging by introducing
a new attribute type named NL80211_PMSR_FTM_REQ_ATTR_BSS_COLOR.

However, the author forgot to also describe the nla_policy at
nl80211_pmsr_ftm_req_attr_policy (net/wireless/nl80211.c). Add the
missing entry so that a malformed attribute cannot cause an
out-of-attribute access.
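A toy attribute validator shows why a missing policy entry matters. An entry with no declared type is accepted without any length check, so a consumer that later reads a u8 from a zero-length payload reads past its end. The table, the `REQ_ATTR_*` values and `validate_attr()` are hypothetical and far simpler than the kernel's struct nla_policy handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum attr_kind { ATTR_UNSPEC = 0, ATTR_U8, ATTR_U32 };

struct attr_policy {
	enum attr_kind kind; /* ATTR_UNSPEC means "no constraints" */
};

enum { REQ_ATTR_CHANNEL = 1, REQ_ATTR_BSS_COLOR = 2, REQ_ATTR_MAX = 2 };

/* Policy with the formerly missing BSS color entry present. */
static const struct attr_policy policy[REQ_ATTR_MAX + 1] = {
	[REQ_ATTR_CHANNEL]   = { .kind = ATTR_U32 },
	[REQ_ATTR_BSS_COLOR] = { .kind = ATTR_U8 },
};

/* Returns true if an attribute of this type and payload length is accepted. */
static bool validate_attr(const struct attr_policy *pol, int type, size_t len)
{
	if (type < 0 || type > REQ_ATTR_MAX)
		return false;
	switch (pol[type].kind) {
	case ATTR_U8:
		return len >= 1;
	case ATTR_U32:
		return len >= 4;
	default:
		return true; /* unchecked: the consumer may read past the payload */
	}
}
```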

Fixes: dd3e4fc75b ("nl80211/cfg80211: add BSS color to NDP ranging parameters")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230809033151.768910-1-linma@zju.edu.cn
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-08-22 21:40:40 +02:00
Johannes Berg 927521170c wifi: mac80211: fix puncturing bitmap handling in CSA
Code inspection reveals that we switch the puncturing bitmap
before the real channel switch, since that happens only in
the second round of the worker after the channel context is
switched by ieee80211_link_use_reserved_context().

Fixes: 2cc25e4b2a ("wifi: mac80211: configure puncturing bitmap")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-08-22 21:40:39 +02:00
Johannes Berg 67dfa589aa wifi: mac80211: check for station first in client probe
When probing a client, first check if we have it, and then
check for the channel context, otherwise you can trigger
the warning there easily by probing when the AP isn't even
started yet. Since a client existing means the AP is also
operating, we can then keep the warning.

Also simplify the moved code a bit.

Reported-by: syzbot+999fac712d84878a7379@syzkaller.appspotmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-08-22 21:40:39 +02:00
Johannes Berg abc76cf552 wifi: cfg80211: ocb: don't leave if not joined
If there's no OCB state, don't ask the driver/mac80211 to
leave, since that's just confusing. Since we set/clear the
chandef state, that's a simple check.

Reported-by: syzbot+09d1cd2f71e6dd3bfd2c@syzkaller.appspotmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-08-22 21:40:39 +02:00
Johannes Berg 5d4e04bf3a wifi: cfg80211: reject auth/assoc to AP with our address
If the AP uses our own address as its MLD address or BSSID, then
clearly something's wrong. Reject such connections so we don't
try and fail later.
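The check amounts to comparing 6-byte MAC addresses before starting authentication or association. Below is a hypothetical sketch. `should_reject_ap()`, `mac_equal()` and their parameters are illustrative only; kernel code would typically compare with ether_addr_equal().

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define ETH_ALEN 6

static bool mac_equal(const unsigned char *a, const unsigned char *b)
{
	return memcmp(a, b, ETH_ALEN) == 0;
}

/* ap_mld_addr may be NULL for a non-MLD AP. */
static bool should_reject_ap(const unsigned char *own_addr,
			     const unsigned char *bssid,
			     const unsigned char *ap_mld_addr)
{
	if (mac_equal(own_addr, bssid))
		return true;
	if (ap_mld_addr && mac_equal(own_addr, ap_mld_addr))
		return true;
	return false;
}
```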

Reported-by: syzbot+2676771ed06a6df166ad@syzkaller.appspotmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-08-22 21:40:10 +02:00
Jakub Kicinski 1a8660546b Two fixes:
  * reorder buffer filter checks can cause bad shift/UBSAN
    warning with newer HW, avoid the check (mac80211)
  * add Kconfig dependency for iwlwifi for PTP clock usage
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEpeA8sTs3M8SN2hR410qiO8sPaAAFAmTkrKMACgkQ10qiO8sP
 aAD6/A//bR8/98qCxJqeZurTw83JwWDo7J5Twthk4ik4cpi35s+7bqVYc34a1vIT
 poIiIxffZRUSsHOoMXpTP6xd7gLP9Hz6Ba8Jd7X9NG+/lfdpJlWGLDFri3JpNREi
 Jjqq8XOzB7+c+TwrK85j7nxY1JXLqLzXOxetAtuIZEqVrmC+T6+nnAw1ITLfCVL/
 fVaAFx3mDxTLi/qbAYnYPBKY5Kq1KH0F3j389vfIVPzTXBmLcYppZ17Lz38io0pZ
 MC0NO3W0U4MFX3i5P2Q/yjlUsuwJJROqJmnAOcy9+McQU9nFHrIAV3Na9G1mZbyS
 xmdcbjokhcE95ht8JI9yHESAbtACaSM4jq4W03zlgvvj3TrVt+1bDjjXLexqb6kc
 YinRoVc3Dn+Nvw8DrDa/1PMuO1YAlVg3ZYXRyjlL4dcLbz0SOiEHFyTIjIKflZYZ
 TbstNDbygIxBBtmVx/aJBZZoFo3G6e5FGQrVZ0uvDPqVaaFvs3ESv+ooXek9gl06
 OlzngnrTJO52Ky4quCTtmR16+J3GeAUjZNUsQKqNHu28zzBsF+CP8j+OCUTPQTe0
 YZuqBqnSDPUcllBxXEDp2pPm102q0DvHeKXg7cugdY2zyzvjgrxBSBwh5xwUA4sT
 RMI55Gok6gcdzQ1GHHRzLK6UkKzgmAbTjMypJkVV6Qh7dG6RNJo=
 =edA4
 -----END PGP SIGNATURE-----

Merge tag 'wireless-2023-08-22' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless

Johannes Berg says:

====================
Two fixes:
 - reorder buffer filter checks can cause bad shift/UBSAN
   warning with newer HW, avoid the check (mac80211)
 - add Kconfig dependency for iwlwifi for PTP clock usage

* tag 'wireless-2023-08-22' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
  wifi: mac80211: limit reorder_buf_filtered to avoid UBSAN warning
  wifi: iwlwifi: mvm: add dependency for PTP clock
====================

Link: https://lore.kernel.org/r/20230822124206.43926-2-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22 11:04:01 -07:00
Linus Torvalds 53663f4103 NFS client fixes for Linux 6.5
Highlights include:
 
 Stable fixes
  - NFS: Fix a use after free in nfs_direct_join_group()
 
 Bugfixes
  - NFS: Fix a sysfs server name memory leak
  - NFS: Fix a lock recovery hang in NFSv4.0
  - NFS: Fix page free in the error path for nfs42_proc_getxattr
  - NFS: Fix page free in the error path for __nfs4_get_acl_uncached
  - SUNRPC/rdma: Fix receive buffer dma-mapping after a server disconnect
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAmTjgEEACgkQZwvnipYK
 APItFA//WzGcKbujlMXpiRdvUg6k6CfG/ikBRB1UwQEyZjK/tVZ96qt6UuHGNMbz
 b8GaGls7NRYJKezAcMSW9QMMPYVyG0PLwxOW6BPwsZS61Zn6HMeM1YRboaZEid7f
 JrUNhbUXHl6bVWrBNEtcr3IN/5ERU4sGCAa4A3uWdNxGyffD/avrK06/bfmE/SJi
 +7LVPp0M9rM5X5Z1c407TbWfg+L81Q9t0tTz7II3Ba9i2BzQ0uhQhyVUQAGF767u
 Vua4XWTRoqG1es+tA4iuwZ3KtaqXoaMRDWPLGTkmBrY+pAo+u4IPzY5LCwfUu6kI
 vttkZU5b0b05+UomJ1d+Muzr8uEjRmBhIHZsP6lgVVmuNzqkDb0gCGkfix87J+RO
 0QmDZ9D0ftJxsb8fSdp8iy8NqmqJ6X4FhsylRtANEuCrf8+zrkUlBJi47CCwpYDD
 8gq6SoTfA8MmiSgzrBuYkJe2HSx7c2csDl3xp5KrJX2IHODjbzlHC05fNadTWc6W
 0jQvq1cJ2xBYDNSxkG0Trsd3lTTao3rZC4M7imVVjTTOHS8X1LNCLkbZ7LVnA8rn
 0F+lp/h1qs/daXSp0aMG5wyvZNkx5rsJ23o+InNCjiCh3cDvoi9mg6DN5bQK8Foy
 Iqd2MTgxrMaF/FUbdGLdnFX4GQkgFPng8TpdX8sqqm1JHUprpqg=
 =nd41
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-6.5-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client fixes from Trond Myklebust:

 - fix a use after free in nfs_direct_join_group() (Cc: stable)

 - fix sysfs server name memory leak

 - fix lock recovery hang in NFSv4.0

 - fix page free in the error path for nfs42_proc_getxattr() and
   __nfs4_get_acl_uncached()

 - SUNRPC/rdma: fix receive buffer dma-mapping after a server disconnect

* tag 'nfs-for-6.5-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  xprtrdma: Remap Receive buffers after a reconnect
  NFSv4: fix out path in __nfs4_get_acl_uncached
  NFSv4.2: fix error handling in nfs42_proc_getxattr
  NFS: Fix sysfs server name memory leak
  NFS: Fix a use after free in nfs_direct_join_group()
  NFSv4: Fix dropped lock for racing OPEN and delegation return
2023-08-22 10:50:17 -07:00
Florian Westphal 169384fbe8 netfilter: nf_tables: allow loop termination for pending fatal signal
Abort early so the task can exit faster if a fatal signal is pending;
there is no need to continue validation in that case.
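The pattern is a cheap check at the top of a potentially long loop. The userspace sketch below assumes a `volatile sig_atomic_t` flag, set by a signal handler, as a stand-in for fatal_signal_pending(current). `validate_chain()` is a hypothetical per-chain step, and the -EINTR return value is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>

static volatile sig_atomic_t fatal_pending;

static void on_fatal(int sig)
{
	(void)sig;
	fatal_pending = 1;
}

/* Hypothetical per-chain validation; always succeeds here. */
static int validate_chain(size_t i)
{
	(void)i;
	return 0;
}

/* Returns 0, a validation error, or -EINTR if aborted early;
 * *done reports how many chains were validated. */
static int validate_ruleset(size_t nr_chains, size_t *done)
{
	for (*done = 0; *done < nr_chains; (*done)++) {
		int err;

		if (fatal_pending)
			return -EINTR; /* the task is exiting: stop here */
		err = validate_chain(*done);
		if (err)
			return err;
	}
	return 0;
}
```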

Signed-off-by: Florian Westphal <fw@strlen.de>
2023-08-22 15:14:32 +02:00
Justin Stitt aa222dd190 netfilter: xtables: refactor deprecated strncpy
Prefer `strscpy_pad` as it's a more robust interface whilst maintaining
zero-padding behavior.

There may have existed a bug here due to both `tbl->repl.name` and
`info->name` having a size of 32 as defined below:
|  #define XT_TABLE_MAXNAMELEN 32

This may lead to buffer overreads in some situations -- `strscpy` solves
this by guaranteeing NUL-termination of the dest buffer.
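A small userspace re-implementation makes the relied-on behavior concrete. `strscpy_pad_model()` is hypothetical and only mirrors the documented semantics of strscpy_pad(). It copies at most size - 1 bytes, always NUL-terminates, zeroes the rest of the destination, and returns -E2BIG on truncation (`long` is used here in place of ssize_t). By contrast, strncpy() leaves the destination unterminated when the source fills it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

static long strscpy_pad_model(char *dest, const char *src, size_t size)
{
	size_t len = 0;

	if (size == 0)
		return -E2BIG;
	while (len < size && src[len] != '\0')
		len++;
	if (len == size) {                  /* source too long: truncate */
		memcpy(dest, src, size - 1);
		dest[size - 1] = '\0';      /* always NUL-terminated */
		return -E2BIG;
	}
	memcpy(dest, src, len);
	memset(dest + len, 0, size - len);  /* NUL plus zero padding */
	return (long)len;
}
```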

Signed-off-by: Justin Stitt <justinstitt@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-08-22 15:13:21 +02:00
Justin Stitt 06f7d3c3f8 netfilter: x_tables: refactor deprecated strncpy
Prefer `strscpy_pad` to `strncpy`.

Signed-off-by: Justin Stitt <justinstitt@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-08-22 15:13:21 +02:00
Justin Stitt ad156c23d6 netfilter: nft_meta: refactor deprecated strncpy
Prefer `strscpy_pad` to `strncpy`.

Signed-off-by: Justin Stitt <justinstitt@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-08-22 15:13:21 +02:00
Justin Stitt 6d87a4eae8 netfilter: nft_osf: refactor deprecated strncpy
Use `strscpy_pad` over `strncpy` for NUL-terminated strings.

We can also drop the + 1 from `NFT_OSF_MAXGENRELEN + 1` since `strscpy`
will guarantee NUL-termination.

Signed-off-by: Justin Stitt <justinstitt@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-08-22 15:13:21 +02:00
Justin Stitt 7457af8bf9 netfilter: nf_tables: refactor deprecated strncpy
Prefer `strscpy_pad` over `strncpy`.

Signed-off-by: Justin Stitt <justinstitt@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-08-22 15:13:21 +02:00
Justin Stitt 6cdd75a4a6 netfilter: nf_tables: refactor deprecated strncpy
Prefer `strscpy_pad` over `strncpy`.

Signed-off-by: Justin Stitt <justinstitt@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-08-22 15:13:21 +02:00
Justin Stitt e53314034b netfilter: ipset: refactor deprecated strncpy
Use `strscpy_pad` instead of `strncpy`.

Link: https://github.com/KSPP/linux/issues/90
Cc: linux-hardening@vger.kernel.org
Signed-off-by: Justin Stitt <justinstitt@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-08-22 15:13:20 +02:00
GONG, Ruiqi a7ed3465da netfilter: ebtables: fix fortify warnings in size_entry_mwt()
When compiling with gcc 13 and CONFIG_FORTIFY_SOURCE=y, the following
warning appears:

In function ‘fortify_memcpy_chk’,
    inlined from ‘size_entry_mwt’ at net/bridge/netfilter/ebtables.c:2118:2:
./include/linux/fortify-string.h:592:25: error: call to ‘__read_overflow2_field’
declared with attribute warning: detected read beyond size of field (2nd parameter);
maybe use struct_group()? [-Werror=attribute-warning]
  592 |                         __read_overflow2_field(q_size_field, size);
      |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The compiler is complaining:

memcpy(&offsets[1], &entry->watchers_offset,
                       sizeof(offsets) - sizeof(offsets[0]));

where memcpy reads beyond &entry->watchers_offset to copy
{watchers,target,next}_offset altogether into offsets[]. Silence the
warning by wrapping these three up via struct_group().
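The idea behind struct_group() is to wrap the members in a union of an anonymous struct and an identically laid-out named struct. The members keep their old names, and the group gains a name and a size that memcpy() and FORTIFY_SOURCE can see. Below is a simplified userspace sketch of that idea. The field names follow the ebtables code quoted above, but `struct entry_model`, `offs` and `fill_offsets()` are hypothetical, and the real kernel macro does more than this.

```c
#include <assert.h>
#include <string.h>

struct entry_model {
	unsigned int bitmask;
	/* Hand-rolled equivalent of struct_group(offs, ...). */
	union {
		struct {
			unsigned int watchers_offset;
			unsigned int target_offset;
			unsigned int next_offset;
		};
		struct {
			unsigned int watchers_offset;
			unsigned int target_offset;
			unsigned int next_offset;
		} offs;
	};
};

/* Copy the three offsets into offsets[1..3] through the named group, so
 * the copy size matches the size of the object being read. */
static void fill_offsets(unsigned int offsets[4], const struct entry_model *e)
{
	offsets[0] = sizeof(*e); /* header size */
	memcpy(&offsets[1], &e->offs, sizeof(e->offs));
}
```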

Signed-off-by: GONG, Ruiqi <gongruiqi1@huawei.com>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-08-22 15:13:20 +02:00
Johannes Berg 19e4a47ee7 wifi: mac80211: check S1G action frame size
Before checking the action code, check that it even
exists in the frame.
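The fix follows the usual pattern of proving a field lies inside the buffer before reading it. Below is a hypothetical sketch with a simplified frame layout: a 24-byte header, then a category byte, then the action code byte. The offsets and `get_action_code()` are assumptions and do not reproduce mac80211's struct ieee80211_mgmt.

```c
#include <assert.h>
#include <stddef.h>

#define HDR_LEN         24            /* simplified management header */
#define ACTION_CODE_OFF (HDR_LEN + 1) /* after the category byte */

/* Returns the action code, or -1 if the frame is too short to contain it. */
static int get_action_code(const unsigned char *frame, size_t len)
{
	if (len < ACTION_CODE_OFF + 1)
		return -1;
	return frame[ACTION_CODE_OFF];
}
```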

Reported-by: syzbot+be9c824e6f269d608288@syzkaller.appspotmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-08-22 14:01:18 +02:00
Zhengchao Shao 43c2817225 net: remove unnecessary input parameter 'how' in ifdown function
When the ifdown function in the dst_ops structure is referenced, the input
parameter 'how' is always true. In the current implementation of the
ifdown interface, ip6_dst_ifdown does not use the input parameter 'how',
while xfrm6_dst_ifdown and xfrm4_dst_ifdown receive it as 'unregister'.
Because 'how' is always true, the test of 'unregister' in xfrm6_dst_ifdown
and xfrm4_dst_ifdown always sees it as true and is redundant, so remove the
input parameter 'how' from the ifdown function.

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230821084104.3812233-1-shaozhengchao@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-08-22 13:19:02 +02:00
Kees Cook 7b6d708703 wifi: cfg80211: Annotate struct cfg80211_rnr_elems with __counted_by
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).

As found with Coccinelle[1], add __counted_by for struct cfg80211_rnr_elems.
Additionally, since the element count member must be set before accessing
the annotated flexible array member, move its initialization earlier.
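The annotation and the "set the count first" rule are easier to see in code. In the sketch below, `COUNTED_BY()` is a hypothetical stand-in for the kernel's __counted_by(). It expands to the compiler attribute when `__has_attribute` reports support and to nothing otherwise. The struct and `elems_create()` only loosely resemble the cfg80211 element structs.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for __counted_by(): use the attribute when available. */
#if defined(__has_attribute)
#  if __has_attribute(__counted_by__)
#    define COUNTED_BY(m) __attribute__((__counted_by__(m)))
#  endif
#endif
#ifndef COUNTED_BY
#  define COUNTED_BY(m)
#endif

struct elem_model {
	const unsigned char *data;
	size_t len;
};

struct elems_model {
	int cnt;                                  /* entries in elem[] */
	struct elem_model elem[] COUNTED_BY(cnt); /* bound known to compiler */
};

/* cnt is assigned before elem[] is touched, so any bounds check on
 * elem[i] already sees the correct limit. */
static struct elems_model *elems_create(int n, const unsigned char *data)
{
	struct elems_model *e;

	if (n < 0)
		return NULL;
	e = malloc(sizeof(*e) + (size_t)n * sizeof(e->elem[0]));
	if (!e)
		return NULL;
	e->cnt = n;
	for (int i = 0; i < n; i++) {
		e->elem[i].data = data + i;
		e->elem[i].len = 1;
	}
	return e;
}
```

If the assignment to cnt came after the loop, an instrumented build could flag the writes to elem[i] because the count would still be uninitialized.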

[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci

Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: linux-wireless@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Link: https://lore.kernel.org/r/20230817211531.4193219-5-keescook@chromium.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-08-22 13:18:16 +02:00
Kees Cook 342bc7c9e8 wifi: cfg80211: Annotate struct cfg80211_pmsr_request with __counted_by
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).

As found with Coccinelle[1], add __counted_by for struct cfg80211_pmsr_request.
Additionally, since the element count member must be set before accessing
the annotated flexible array member, move its initialization earlier.

[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci

Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: linux-wireless@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Link: https://lore.kernel.org/r/20230817211531.4193219-4-keescook@chromium.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-08-22 13:18:16 +02:00
Kees Cook c14679d700 wifi: cfg80211: Annotate struct cfg80211_mbssid_elems with __counted_by
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).

As found with Coccinelle[1], add __counted_by for struct cfg80211_mbssid_elems.
Additionally, since the element count member must be set before accessing
the annotated flexible array member, move its initialization earlier.

[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci

Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: linux-wireless@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Link: https://lore.kernel.org/r/20230817211531.4193219-3-keescook@chromium.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-08-22 13:18:16 +02:00
Kees Cook d4d3aaf25a wifi: cfg80211: Annotate struct cfg80211_cqm_config with __counted_by
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).

As found with Coccinelle[1], add __counted_by for struct cfg80211_cqm_config.

[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci

Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: linux-wireless@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Link: https://lore.kernel.org/r/20230817211531.4193219-2-keescook@chromium.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-08-22 13:18:16 +02:00
Kees Cook 93ca82447c wifi: cfg80211: Annotate struct cfg80211_acl_data with __counted_by
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).

As found with Coccinelle[1], add __counted_by for struct cfg80211_acl_data.
Additionally, since the element count member must be set before accessing
the annotated flexible array member, move its initialization earlier.
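To illustrate the ordering rule, here is a minimal userspace sketch, not the cfg80211 code. COUNTED_BY, struct acl_data and acl_data_alloc() are hypothetical names, and the macro expands to nothing on compilers without the counted_by attribute. The count member is assigned before any element of the flexible array is written, so a bounds check based on it sees the right size.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's __counted_by(): use the compiler
 * attribute when available, otherwise expand to nothing. */
#if defined(__has_attribute)
#  if __has_attribute(__counted_by__)
#    define COUNTED_BY(member) __attribute__((__counted_by__(member)))
#  endif
#endif
#ifndef COUNTED_BY
#  define COUNTED_BY(member)
#endif

struct mac_addr {
	unsigned char addr[6];
};

/* Hypothetical ACL-like structure with an annotated flexible array. */
struct acl_data {
	int policy;
	int n_entries;					/* element count */
	struct mac_addr entries[] COUNTED_BY(n_entries);	/* bounded by n_entries */
};

static struct acl_data *acl_data_alloc(int n)
{
	struct acl_data *acl;

	acl = calloc(1, sizeof(*acl) + (size_t)n * sizeof(acl->entries[0]));
	if (!acl)
		return NULL;
	/* Set the count first: with the annotation, checked accesses to
	 * entries[] are bounded by the current value of n_entries. */
	acl->n_entries = n;
	for (int i = 0; i < n; i++)
		memset(acl->entries[i].addr, i, sizeof(acl->entries[i].addr));
	return acl;
}
```

Assigning n_entries after the loop instead would, with a checking compiler, make every write to entries[] look out of bounds, since the count would still be zero.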

[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci

Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: linux-wireless@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Justin Stitt <justinstitt@google.com>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Link: https://lore.kernel.org/r/20230817211531.4193219-1-keescook@chromium.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-08-22 13:18:15 +02:00
Jeff Layton dee0c5f834 libceph: add new iov_iter-based ceph_msg_data_type and ceph_osd_data_type
Add an iov_iter to the unions in ceph_msg_data and ceph_msg_data_cursor.
Instead of requiring a list of pages or bvecs, we can just use an
iov_iter directly, and avoid extra allocations.

We assume that the pages represented by the iter are pinned such that
they shouldn't incur page faults, which is the case for the iov_iters
created by netfs.

While working on this, Al Viro informed me that he was going to change
iov_iter_get_pages to auto-advance the iterator as that pattern is more
or less required for ITER_PIPE anyway. We emulate that here for now by
advancing in the _next op and tracking that amount in the "lastlen"
field.

In the event that _next is called twice without an intervening
_advance, we revert the iov_iter by the remaining lastlen before
calling iov_iter_get_pages.
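The lastlen bookkeeping described above can be sketched with a toy iterator that only tracks a position. struct toy_iter, cursor_next() and cursor_advance() are hypothetical stand-ins for the iov_iter and the libceph cursor ops, assumed here to behave as the text describes: _next advances the iterator immediately and records lastlen, _advance only moves the iterator for bytes beyond lastlen, and a repeated _next first reverts whatever lastlen remains.

```c
#include <stddef.h>

/* Toy stand-in for an iov_iter: just a position and a remaining count. */
struct toy_iter {
	size_t pos;
	size_t count;
};

static void toy_iter_advance(struct toy_iter *it, size_t n) { it->pos += n; it->count -= n; }
static void toy_iter_revert(struct toy_iter *it, size_t n) { it->pos -= n; it->count += n; }

/* Models an auto-advancing "get pages": returns up to maxlen bytes
 * starting at *start and moves the iterator past them. */
static size_t toy_iter_get(struct toy_iter *it, size_t maxlen, size_t *start)
{
	size_t len = it->count < maxlen ? it->count : maxlen;

	*start = it->pos;
	toy_iter_advance(it, len);
	return len;
}

struct toy_cursor {
	struct toy_iter iter;
	size_t lastlen;	/* bytes handed out by _next but not yet consumed */
};

static size_t cursor_next(struct toy_cursor *c, size_t maxlen, size_t *start)
{
	/* _next called again without a full _advance: undo the unconsumed part */
	if (c->lastlen)
		toy_iter_revert(&c->iter, c->lastlen);
	c->lastlen = toy_iter_get(&c->iter, maxlen, start);
	return c->lastlen;
}

static void cursor_advance(struct toy_cursor *c, size_t bytes)
{
	/* The iterator already moved by lastlen inside _next. */
	if (bytes < c->lastlen) {
		c->lastlen -= bytes;
	} else {
		toy_iter_advance(&c->iter, bytes - c->lastlen);
		c->lastlen = 0;
	}
}
```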

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de>
Reviewed-by: Milind Changire <mchangir@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2023-08-22 09:01:48 +02:00
Jeff Layton f628d79997 libceph: add sparse read support to OSD client
Have get_reply check for the presence of sparse read ops in the
request and set the sparse_read boolean in the msg. That will cue the
messenger layer to use the sparse read codepath instead of the normal
data receive.

Add a new sparse_read operation for the OSD client, driven by its own
state machine. The messenger will repeatedly call the sparse_read
operation, and it will pass back the necessary info to set up to read
the next extent of data, while zero-filling the sparse regions.

The state machine will stop at the end of the last extent, and will
attach the extent map buffer to the ceph_osd_req_op so that the caller
can use it.
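As a simplified illustration of the end result (the real code is an incremental state machine driven by the messenger), the sketch below expands extent data into a dense buffer and zero-fills the holes. struct sparse_extent and sparse_to_dense() are hypothetical, and the extents are assumed to be sorted, non-overlapping, and relative to the start of the read.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical extent record: offset/length relative to the start of the read. */
struct sparse_extent {
	uint64_t off;
	uint64_t len;
};

/* Expand sparse reply data into a dense buffer of size dst_len.
 * 'data' holds the extents' bytes back to back, in extent order.
 * Returns 0 on success, -1 if an extent is unsorted, overlapping or
 * outside dst. */
static int sparse_to_dense(uint8_t *dst, size_t dst_len,
			   const struct sparse_extent *ext, size_t n_ext,
			   const uint8_t *data)
{
	uint64_t cur = 0;	/* next byte of dst to fill */

	for (size_t i = 0; i < n_ext; i++) {
		if (ext[i].off < cur || ext[i].off + ext[i].len > dst_len)
			return -1;
		memset(dst + cur, 0, ext[i].off - cur);		/* hole before extent */
		memcpy(dst + ext[i].off, data, ext[i].len);	/* extent payload */
		data += ext[i].len;
		cur = ext[i].off + ext[i].len;
	}
	memset(dst + cur, 0, dst_len - cur);			/* trailing hole */
	return 0;
}
```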

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de>
Reviewed-by: Milind Changire <mchangir@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2023-08-22 09:01:47 +02:00
Jeff Layton d396f89db3 libceph: add sparse read support to msgr1
Add 2 new fields to ceph_connection_v1_info to track the necessary info
in sparse reads. Skip initializing the cursor for a sparse read.

Break out read_partial_message_section into a wrapper around a new
read_partial_message_chunk function that doesn't zero out the crc first.

Add new helper functions to drive receiving into the destinations
provided by the sparse_read state machine.
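The split between a CRC-resetting section reader and a CRC-preserving chunk reader can be sketched as follows. The names are hypothetical and the "socket" is just a source buffer. The bitwise CRC32C is applied without pre/post inversion so that partial results chain, presumably the property that lets data received in several pieces accumulate into one running CRC.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC32C (Castagnoli, reflected polynomial 0x82F63B78) without
 * pre/post inversion, so partial results can be chained. */
static uint32_t crc32c_update(uint32_t crc, const uint8_t *p, size_t len)
{
	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}

/* Chunk reader: copies bytes and folds them into the running CRC
 * without resetting it. Hypothetical name. */
static void read_chunk(uint8_t *dst, const uint8_t *src, size_t len, uint32_t *crc)
{
	for (size_t i = 0; i < len; i++)
		dst[i] = src[i];
	*crc = crc32c_update(*crc, src, len);
}

/* Section reader: the old behaviour, i.e. reset the CRC, then read. */
static void read_section(uint8_t *dst, const uint8_t *src, size_t len, uint32_t *crc)
{
	*crc = 0;
	read_chunk(dst, src, len, crc);
}
```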

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de>
Reviewed-by: Milind Changire <mchangir@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2023-08-22 09:01:47 +02:00
Jeff Layton f36217e35c libceph: support sparse reads on msgr2 secure codepath
Add a new init_sgs_pages helper that populates the scatterlist from
an arbitrary point in an array of pages.

Change setup_message_sgs to take an optional pointer to an array of
pages. If that's set, then the scatterlist will be set using that
array instead of the cursor.

When given a sparse read on a secure connection, decrypt the data
in-place rather than into the final destination, by passing it the
in_enc_pages array.

After decrypting, run the sparse_read state machine in a loop, copying
data from the decrypted pages until it's complete.
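Populating a scatterlist from an arbitrary point in an array of pages mostly comes down to splitting a byte range at page boundaries. A toy sketch, assuming 4 KiB pages; struct toy_sg and toy_init_sgs_pages() are hypothetical and only record (page index, offset, length) instead of building a real struct scatterlist.

```c
#include <stddef.h>

#define TOY_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/* Hypothetical scatterlist entry: which page, where in it, how many bytes. */
struct toy_sg {
	size_t page_idx;
	size_t offset;
	size_t length;
};

/* Describe 'len' bytes that start 'off' bytes into an array of pages.
 * Returns the number of entries written (at most max_sg). */
static size_t toy_init_sgs_pages(struct toy_sg *sg, size_t max_sg,
				 size_t off, size_t len)
{
	size_t n = 0;
	size_t idx = off / TOY_PAGE_SIZE;
	size_t pg_off = off % TOY_PAGE_SIZE;

	while (len && n < max_sg) {
		size_t chunk = TOY_PAGE_SIZE - pg_off;

		if (chunk > len)
			chunk = len;
		sg[n++] = (struct toy_sg){ idx, pg_off, chunk };
		len -= chunk;
		idx++;
		pg_off = 0;	/* later pages are used from their start */
	}
	return n;
}
```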

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de>
Reviewed-by: Milind Changire <mchangir@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2023-08-22 09:01:47 +02:00
Jeff Layton ec3bc567ea libceph: new sparse_read op, support sparse reads on msgr2 crc codepath
Add support for a new sparse_read ceph_connection operation. The idea is
that the client driver can define this operation and use it to do special
handling for incoming reads.

The alloc_msg routine will look at the request and determine whether the
reply is expected to be sparse. If it is, then we'll dispatch to a
different set of state machine states that will repeatedly call the
driver's sparse_read op to get length and placement info for reading the
extent map, and the extents themselves.

This necessitates adding some new fields to some other structs:

- The msg gets a new bool to track whether it's a sparse_read request.

- A new field is added to the cursor to track the amount remaining in the
current extent. This is used to cap the read from the socket into the
msg_data.

- Handling a revoke with all of this is particularly difficult, so I've
added a new data_len_remain field to the v2 connection info, and then
use that to skip that much on a revoke. We may want to expand the use of
that to the normal read path as well, just for consistency's sake.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de>
Reviewed-by: Milind Changire <mchangir@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2023-08-22 09:01:47 +02:00
Jeff Layton a679e50f72 libceph: define struct ceph_sparse_extent and add some helpers
When the OSD sends back a sparse read reply, it contains an array of
these structures. Define the structure and add a couple of helpers for
dealing with them.

Also add a place in struct ceph_osd_req_op to store the extent buffer,
and code to free it if it's populated when the req is torn down.
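A hypothetical sketch of the kind of helpers meant here, not the libceph definitions: an offset/length extent record, a request op that owns the extent buffer, helpers for the end of the extent map and the total data length, and a teardown that frees the buffer when it is populated.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical extent record: offset and length. */
struct sparse_extent {
	uint64_t off;
	uint64_t len;
};

/* Hypothetical request op holding the extent buffer from the reply. */
struct toy_req_op {
	struct sparse_extent *extents;
	uint32_t nr_extents;
};

/* End offset of the last extent (0 if the map is empty). */
static uint64_t sparse_map_end(const struct toy_req_op *op)
{
	if (!op->nr_extents)
		return 0;
	return op->extents[op->nr_extents - 1].off +
	       op->extents[op->nr_extents - 1].len;
}

/* Total number of data bytes carried by the extents. */
static uint64_t sparse_map_data_len(const struct toy_req_op *op)
{
	uint64_t total = 0;

	for (uint32_t i = 0; i < op->nr_extents; i++)
		total += op->extents[i].len;
	return total;
}

/* Teardown: release the extent buffer; free(NULL) is a no-op, so an
 * unpopulated op is handled too. */
static void toy_req_op_release(struct toy_req_op *op)
{
	free(op->extents);
	op->extents = NULL;
	op->nr_extents = 0;
}
```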

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de>
Reviewed-by: Milind Changire <mchangir@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2023-08-22 09:01:47 +02:00
Jeff Layton 08b8a0440e libceph: add spinlock around osd->o_requests
In a later patch, we're going to need to search for a request in
the rbtree, but taking the o_mutex is inconvenient as we already
hold the con mutex at the point where we need it.

Add a new spinlock that we take when inserting and erasing entries from
the o_requests tree. Search of the rbtree can be done with either the
mutex or the spinlock, but insertion and removal require both.
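The locking rule (search under either lock, insert and erase under both) can be sketched in userspace as below. This is a hypothetical model: a second pthread mutex stands in for the kernel spinlock and a sorted linked list stands in for the rbtree. Because writers hold both locks, a reader holding only one of them cannot observe a half-done insertion or removal.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

struct toy_req {
	uint64_t tid;
	struct toy_req *next;
};

/* Hypothetical OSD-like object: outer mutex plus an inner lock that
 * guards the request set. */
struct toy_osd {
	pthread_mutex_t lock;		/* outer lock (like the OSD mutex) */
	pthread_mutex_t req_lock;	/* inner lock (stands in for the spinlock) */
	struct toy_req *requests;	/* sorted by tid */
};

/* Lookup: caller must hold EITHER lock. */
static struct toy_req *lookup_req(struct toy_osd *osd, uint64_t tid)
{
	for (struct toy_req *r = osd->requests; r && r->tid <= tid; r = r->next)
		if (r->tid == tid)
			return r;
	return NULL;
}

/* Insert: caller holds osd->lock; req_lock is taken here, so both are held. */
static void insert_req(struct toy_osd *osd, struct toy_req *req)
{
	struct toy_req **p = &osd->requests;

	pthread_mutex_lock(&osd->req_lock);
	while (*p && (*p)->tid < req->tid)
		p = &(*p)->next;
	req->next = *p;
	*p = req;
	pthread_mutex_unlock(&osd->req_lock);
}

/* Erase: same rule as insert. */
static void erase_req(struct toy_osd *osd, struct toy_req *req)
{
	struct toy_req **p = &osd->requests;

	pthread_mutex_lock(&osd->req_lock);
	while (*p && *p != req)
		p = &(*p)->next;
	if (*p)
		*p = req->next;
	pthread_mutex_unlock(&osd->req_lock);
}
```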

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de>
Reviewed-by: Milind Changire <mchangir@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2023-08-22 09:01:47 +02:00
Ping-Ke Shih b98c16107c wifi: mac80211: limit reorder_buf_filtered to avoid UBSAN warning
The commit 06470f7468 ("mac80211: add API to allow filtering frames in BA sessions")
added reorder_buf_filtered to mark frames filtered by firmware, and it
can only work correctly if hw.max_rx_aggregation_subframes <= 64 since
it stores the bitmap in a u64 variable.

However, new HE or EHT devices can support BlockAck numbers up to 256 or
1024, and using a higher subframe index then leads to a UBSAN warning:

 UBSAN: shift-out-of-bounds in net/mac80211/rx.c:1129:39
 shift exponent 215 is too large for 64-bit type 'long long unsigned int'
 Call Trace:
  <IRQ>
  dump_stack_lvl+0x48/0x70
  dump_stack+0x10/0x20
  __ubsan_handle_shift_out_of_bounds+0x1ac/0x360
  ieee80211_release_reorder_frame.constprop.0.cold+0x64/0x69 [mac80211]
  ieee80211_sta_reorder_release+0x9c/0x400 [mac80211]
  ieee80211_prepare_and_rx_handle+0x1234/0x1420 [mac80211]
  ieee80211_rx_list+0xaef/0xf60 [mac80211]
  ieee80211_rx_napi+0x53/0xd0 [mac80211]

Since only old hardware that supports <=64 BlockAck uses
ieee80211_mark_rx_ba_filtered_frames(), keep its use limited to such
hardware: add a WARN_ONCE() and a comment noting that this function
should not be used if the hardware capability is not suitable.
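The underlying constraint is that shifting a 64-bit value by 64 or more bits is undefined behaviour in C, which is what UBSAN reports above. A minimal sketch of a guarded bitmap update; mark_filtered() is a hypothetical helper and the one-time fprintf() stands in for WARN_ONCE().

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: mark subframe 'idx' as filtered in a 64-bit bitmap.
 * Indices that do not fit are rejected (with a one-time warning) instead
 * of performing an out-of-range shift. */
static int mark_filtered(uint64_t *bitmap, unsigned int idx)
{
	static int warned;

	if (idx >= 64) {
		if (!warned) {
			warned = 1;
			fprintf(stderr, "filtered-frame bitmap only covers 64 subframes\n");
		}
		return -1;
	}
	*bitmap |= UINT64_C(1) << idx;
	return 0;
}
```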

Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Link: https://lore.kernel.org/r/20230818014004.16177-1-pkshih@realtek.com
[edit commit message]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-08-21 09:49:54 +02:00
Eric Dumazet bc1fb82ae1 net: annotate data-races around sk->sk_lingertime
sk_getsockopt() runs locklessly. This means sk->sk_lingertime
can be read while other threads are changing its value.

Other reads also happen without socket lock being held,
and must be annotated.

Remove preprocessor logic using BITS_PER_LONG; compilers
are smart enough to figure this out by themselves.
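A userspace sketch of the resulting pattern, assuming HZ = 1000 and using LONG_MAX in place of MAX_SCHEDULE_TIMEOUT; relaxed C11 atomics play the role of READ_ONCE()/WRITE_ONCE(). The names are hypothetical. Widening the value to unsigned long before the comparison keeps one code path for 32-bit and 64-bit builds.

```c
#include <limits.h>
#include <stdatomic.h>

#define TOY_HZ		1000UL		/* assumption: 1000 ticks per second */
#define TOY_MAX_TIMEOUT	LONG_MAX	/* stands in for MAX_SCHEDULE_TIMEOUT */

/* Shared field that other threads read without a lock. */
static _Atomic long lingertime;

static void set_linger(unsigned int secs)
{
	unsigned long t_sec = secs;

	/* No #if BITS_PER_LONG: with a 64-bit long this test can never be
	 * true for an unsigned int input; with a 32-bit long it prevents
	 * overflow of t_sec * TOY_HZ. */
	if (t_sec >= (unsigned long)TOY_MAX_TIMEOUT / TOY_HZ)
		atomic_store_explicit(&lingertime, TOY_MAX_TIMEOUT, memory_order_relaxed);
	else
		atomic_store_explicit(&lingertime, (long)(t_sec * TOY_HZ), memory_order_relaxed);
}

static long get_linger(void)
{
	return atomic_load_explicit(&lingertime, memory_order_relaxed);
}
```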

v2: fixed a clang W=1 (-Wtautological-constant-out-of-range-compare) warning
    (Jakub)

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-21 07:41:57 +01:00
Hangbin Liu b4672c7337 IPv4: add extack info for IPv4 address add/delete
Add extack info for IPv4 address add/delete, which would be useful for
users to understand the problem without having to read kernel code.

No extack message is added for the ifa_local check in __inet_insert_ifa(),
as it has already been checked in find_matching_ifa().

Suggested-by: Ido Schimmel <idosch@idosch.org>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-21 07:35:59 +01:00
Hangbin Liu b358f57f7d ipv6: do not match device when remove source route
After deleting an IPv6 address on an interface and cleaning up the
related preferred source entries, it is important to ensure that all
routes associated with the deleted address are properly cleared. The
current implementation of rt6_remove_prefsrc() only checks the preferred
source addresses bound to the current device. However, there may be
routes that are bound to other devices but still utilize the same
preferred source address.

To address this issue, it is necessary to also delete entries that are
bound to other interfaces but share the same source address with the
current device. Failure to delete these entries would leave routes that
are bound to the deleted address uncleared. Here is an example reproducer
(I have omitted unrelated routes):

+ ip link add dummy1 type dummy
+ ip link add dummy2 type dummy
+ ip link set dummy1 up
+ ip link set dummy2 up
+ ip addr add 1:2:3:4::5/64 dev dummy1
+ ip route add 7:7:7:0::1 dev dummy1 src 1:2:3:4::5
+ ip route add 7:7:7:0::2 dev dummy2 src 1:2:3:4::5
+ ip -6 route show
1:2:3:4::/64 dev dummy1 proto kernel metric 256 pref medium
7:7:7::1 dev dummy1 src 1:2:3:4::5 metric 1024 pref medium
7:7:7::2 dev dummy2 src 1:2:3:4::5 metric 1024 pref medium
+ ip addr del 1:2:3:4::5/64 dev dummy1
+ ip -6 route show
7:7:7::1 dev dummy1 metric 1024 pref medium
7:7:7::2 dev dummy2 src 1:2:3:4::5 metric 1024 pref medium

As Ido pointed out, in IPv6 the preferred source address is looked up in
the same VRF as the first nexthop device, which differs from IPv4.
So, while removing the device check, we also need to add an
ipv6_chk_addr() check to make sure the address does not exist on the other
devices of the rt nexthop device's VRF.

After fix:
+ ip addr del 1:2:3:4::5/64 dev dummy1
+ ip -6 route show
7:7:7::1 dev dummy1 metric 1024 pref medium
7:7:7::2 dev dummy2 metric 1024 pref medium

Reported-by: Thomas Haller <thaller@redhat.com>
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2170513
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-20 15:27:21 +01:00
Patrick Rohr 5cb249686e net: release reference to inet6_dev pointer
addrconf_prefix_rcv returned early without releasing the inet6_dev
pointer when the PIO lifetime is less than accept_ra_min_lft.
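A common way to avoid this class of leak is to route every exit after the reference is taken through a single release point. A hypothetical sketch; idev_get(), idev_put() and handle_prefix() are illustrative, not the addrconf code.

```c
/* Hypothetical refcounted device object. */
struct toy_idev {
	int refcnt;
};

static struct toy_idev *idev_get(struct toy_idev *d) { d->refcnt++; return d; }
static void idev_put(struct toy_idev *d) { d->refcnt--; }

/* Every exit after idev_get() goes through the 'put' label, so an early
 * "lifetime too short" return cannot leak the reference. */
static int handle_prefix(struct toy_idev *dev, unsigned int lifetime,
			 unsigned int min_lft)
{
	struct toy_idev *idev = idev_get(dev);
	int ret = 0;

	if (lifetime < min_lft)
		goto put;	/* ignore the prefix, but still drop the reference */

	ret = 1;		/* prefix processing would happen here */
put:
	idev_put(idev);
	return ret;
}
```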

Fixes: 5027d54a9c ("net: change accept_ra_min_rtr_lft to affect all RA lifetimes")
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: David Ahern <dsahern@kernel.org>
Cc: Simon Horman <horms@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: Patrick Rohr <prohr@google.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-20 15:23:57 +01:00
Eric Dumazet 0f158b32a9 net: selectively purge error queue in IP_RECVERR / IPV6_RECVERR
Setting IP_RECVERR and IPV6_RECVERR options to zero currently
purges the socket error queue, which was probably not expected
for zerocopy and tx_timestamp users.

I discovered this issue while preparing commit 6b5f43ea08
("inet: move inet->recverr to inet->inet_flags"). I presume this
change does not need to be backported to stable kernels.

Add skb_errqueue_purge() helper to purge error messages only.
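A sketch of what "purge error messages only" could look like, assuming that zerocopy completions and TX timestamps are the entries to keep, since those are the users named above. The list type and the ORIGIN_* values are hypothetical stand-ins modeled on the SO_EE_ORIGIN_* constants, not the sk_buff_head based helper.

```c
#include <stdlib.h>

/* Hypothetical error-queue entry; origins modeled on SO_EE_ORIGIN_*. */
enum toy_origin { ORIGIN_LOCAL, ORIGIN_ICMP, ORIGIN_ZEROCOPY, ORIGIN_TIMESTAMPING };

struct toy_err {
	enum toy_origin origin;
	struct toy_err *next;
};

/* Drop genuine error reports but keep zerocopy completions and TX
 * timestamps. Returns the number of entries freed. */
static int errqueue_purge(struct toy_err **head)
{
	struct toy_err **p = head;
	int freed = 0;

	while (*p) {
		struct toy_err *e = *p;

		if (e->origin == ORIGIN_ZEROCOPY || e->origin == ORIGIN_TIMESTAMPING) {
			p = &e->next;	/* keep */
			continue;
		}
		*p = e->next;		/* unlink and free */
		free(e);
		freed++;
	}
	return freed;
}
```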

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-20 15:17:47 +01:00
Greg Kroah-Hartman 642073c306 Merge commit b320441c04 ("Merge tag 'tty-6.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty") into tty-next
We need the serial-core fixes in here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-20 14:29:37 +02:00
Eric Dumazet f866fbc842 ipv4: fix data-races around inet->inet_id
UDP sendmsg() is lockless, so ip_select_ident_segs()
can very well be run from multiple cpus [1]

Convert inet->inet_id to an atomic_t, but implement
a dedicated path for TCP, avoiding the cost of a locked
instruction (atomic_add_return()).
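A userspace sketch of the two paths, using C11 atomics; struct toy_sock and select_ident() are hypothetical. Callers serialized by a lock do a plain load and store, while lockless callers use an atomic fetch-and-add so that concurrent senders never receive the same ID. Whether the kernel stores the old or the new counter value in the header is not modeled; this sketch returns the first reserved ID.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical socket with an IP ID generator held in an atomic. */
struct toy_sock {
	atomic_uint inet_id;
	int is_tcp;	/* TCP callers hold the socket lock */
};

/* Reserve 'segs' consecutive IDs and return the first one (16 bits). */
static uint16_t select_ident(struct toy_sock *sk, unsigned int segs)
{
	unsigned int val;

	if (sk->is_tcp) {
		/* Serialized by the socket lock: a plain read + write is
		 * enough and avoids a locked read-modify-write. */
		val = atomic_load_explicit(&sk->inet_id, memory_order_relaxed);
		atomic_store_explicit(&sk->inet_id, val + segs, memory_order_relaxed);
	} else {
		/* Lockless senders (e.g. UDP) may race: use an atomic add. */
		val = atomic_fetch_add(&sk->inet_id, segs);
	}
	return (uint16_t)val;
}
```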

Note that this patch will cause a trivial merge conflict
because we added inet->flags in net-next tree.

v2: added missing change in
drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_cm.c
(David Ahern)

[1]

BUG: KCSAN: data-race in __ip_make_skb / __ip_make_skb

read-write to 0xffff888145af952a of 2 bytes by task 7803 on cpu 1:
ip_select_ident_segs include/net/ip.h:542 [inline]
ip_select_ident include/net/ip.h:556 [inline]
__ip_make_skb+0x844/0xc70 net/ipv4/ip_output.c:1446
ip_make_skb+0x233/0x2c0 net/ipv4/ip_output.c:1560
udp_sendmsg+0x1199/0x1250 net/ipv4/udp.c:1260
inet_sendmsg+0x63/0x80 net/ipv4/af_inet.c:830
sock_sendmsg_nosec net/socket.c:725 [inline]
sock_sendmsg net/socket.c:748 [inline]
____sys_sendmsg+0x37c/0x4d0 net/socket.c:2494
___sys_sendmsg net/socket.c:2548 [inline]
__sys_sendmmsg+0x269/0x500 net/socket.c:2634
__do_sys_sendmmsg net/socket.c:2663 [inline]
__se_sys_sendmmsg net/socket.c:2660 [inline]
__x64_sys_sendmmsg+0x57/0x60 net/socket.c:2660
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

read to 0xffff888145af952a of 2 bytes by task 7804 on cpu 0:
ip_select_ident_segs include/net/ip.h:541 [inline]
ip_select_ident include/net/ip.h:556 [inline]
__ip_make_skb+0x817/0xc70 net/ipv4/ip_output.c:1446
ip_make_skb+0x233/0x2c0 net/ipv4/ip_output.c:1560
udp_sendmsg+0x1199/0x1250 net/ipv4/udp.c:1260
inet_sendmsg+0x63/0x80 net/ipv4/af_inet.c:830
sock_sendmsg_nosec net/socket.c:725 [inline]
sock_sendmsg net/socket.c:748 [inline]
____sys_sendmsg+0x37c/0x4d0 net/socket.c:2494
___sys_sendmsg net/socket.c:2548 [inline]
__sys_sendmmsg+0x269/0x500 net/socket.c:2634
__do_sys_sendmmsg net/socket.c:2663 [inline]
__se_sys_sendmmsg net/socket.c:2660 [inline]
__x64_sys_sendmmsg+0x57/0x60 net/socket.c:2660
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

value changed: 0x184d -> 0x184e

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 7804 Comm: syz-executor.1 Not tainted 6.5.0-rc6-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023
==================================================================

Fixes: 23f57406b8 ("ipv4: avoid using shared IP generator for connected sockets")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-20 11:40:49 +01:00
Jakub Kicinski f534f6581e net: validate veth and vxcan peer ifindexes
veth and vxcan need to make sure the ifindexes of the peer
are not negative; core does not validate this.

Using iproute2 with user-space-level checking removed:

Before:

  # ./ip link add index 10 type veth peer index -1
  # ip link show
  1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
  2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
    link/ether 52:54:00:74:b2:03 brd ff:ff:ff:ff:ff:ff
  10: veth1@veth0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 8a:90:ff:57:6d:5d brd ff:ff:ff:ff:ff:ff
  -1: veth0@veth1: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether ae:ed:18:e6:fa:7f brd ff:ff:ff:ff:ff:ff

Now:

  $ ./ip link add index 10 type veth peer index -1
  Error: ifindex can't be negative.

This problem surfaced in net-next because an explicit WARN()
was added; the root cause is older.
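The check itself is small. A hypothetical sketch, assuming a zero ifindex keeps its usual rtnetlink meaning of "allocate one automatically" and only negative values are rejected, with the error text used in the example above:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical validation of a user-supplied peer ifindex.
 * 0 is accepted (automatic allocation); negative values are rejected. */
static int validate_peer_ifindex(int ifindex, const char **errmsg)
{
	if (ifindex < 0) {
		*errmsg = "ifindex can't be negative";
		return -EINVAL;
	}
	*errmsg = NULL;
	return 0;
}
```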

Fixes: e6f8f1a739 ("veth: Allow to create peer link with given ifindex")
Fixes: a8f820a380 ("can: add Virtual CAN Tunnel driver (vxcan)")
Reported-by: syzbot+5ba06978f34abb058571@syzkaller.appspotmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-20 11:40:03 +01:00
Eric Dumazet 4025d3e73a net: add skb_queue_purge_reason and __skb_queue_purge_reason
skb_queue_purge() and __skb_queue_purge() become wrappers
around the new generic functions.

New SKB_DROP_REASON_QUEUE_PURGE drop reason is added,
but users can start adding more specific reasons.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-19 15:30:15 +01:00
Chuck Lever 895cedc179 xprtrdma: Remap Receive buffers after a reconnect
On server-initiated disconnect, rpcrdma_xprt_disconnect() was DMA-
unmapping the Receive buffers, but rpcrdma_post_recvs() neglected
to remap them after a new connection had been established. The
result was immediate failure of the new connection with the Receives
flushing with LOCAL_PROT_ERR.

Fixes: 671c450b6f ("xprtrdma: Fix oops in Receive handler after device removal")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2023-08-19 10:26:29 -04:00
Guangguan Wang bbed596c74 net/smc: Extend SMCR v2 linkgroup netlink attribute
Add SMC_NLA_LGR_R_V2_MAX_CONNS and SMC_NLA_LGR_R_V2_MAX_LINKS
to the SMCR v2 linkgroup netlink attribute SMC_NLA_LGR_R_V2, so that
detailed linkgroup information can be shown.

Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Reviewed-by: Jan Karcher <jaka@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-19 12:46:53 +01:00
Guangguan Wang 69b888e3bb net/smc: support max links per lgr negotiation in clc handshake
Support max links per lgr negotiation in the clc handshake for SMCR v2.1,
which is one of the smc v2.1 features. The server decides the final
value of max links based on the client-preferred max links and its own
preferred max links, using the minimum of the client-preferred and
server-preferred values.

Client                                       Server
     Proposal(max links(client preferred))
     -------------------------------------->

     Accept(max links(accepted value))
accepted value=min(client preferred, server preferred)
     <-------------------------------------

      Confirm(max links(accepted value))
     ------------------------------------->
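The server's choice and a client-side sanity check on the Accept message can be sketched as below. Both functions are hypothetical; in particular, treating an accepted value of zero, or one larger than the client's proposal, as a reason to decline is an assumption of this sketch.

```c
#include <stdint.h>

/* Hypothetical server-side decision: accept the smaller preference. */
static uint8_t negotiate_max_links(uint8_t client_pref, uint8_t server_pref)
{
	return client_pref < server_pref ? client_pref : server_pref;
}

/* Hypothetical client-side check of the accepted value: it must be
 * non-zero and must not exceed what the client proposed. */
static int client_accepts(uint8_t proposed, uint8_t accepted)
{
	return accepted != 0 && accepted <= proposed;
}
```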

Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Reviewed-by: Jan Karcher <jaka@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-19 12:46:53 +01:00
Guangguan Wang 7f0620b994 net/smc: support max connections per lgr negotiation
Support max connections per lgr negotiation for SMCR v2.1,
which is one of the smc v2.1 features. The server decides
the final value of max conns based on the client-preferred
max conns and its own preferred max conns, using the minimum
of the client-preferred and server-preferred max conns.

Client                                     Server
     Proposal(max conns(client preferred))
     ------------------------------------>

     Accept(max conns(accepted value))
accepted value=min(client preferred, server preferred)
     <-----------------------------------

     Confirm(max conns(accepted value))
     ----------------------------------->

Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Reviewed-by: Jan Karcher <jaka@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-19 12:46:52 +01:00
Guangguan Wang 6ac1e6563f net/smc: support smc v2.x features validate
Support SMC v2.x feature validation for SMC v2.1. This is the framework
code for SMC v2.x feature validation, and it takes effect only when
the negotiated release version is v2.1 or later.

For the server, v2.x feature validation should be done in
smc_clc_srv_v2x_features_validate when receiving a v2.1 or later CLC
Proposal Message. For negotiated features such as max conns and max
links, the decision on the final value should be made in this function.
The final check for the server when receiving a v2.1 or later CLC
Confirm Message should be done in smc_clc_v2x_features_confirm_check.

For the client, v2.x feature validation should be done in
smc_clc_clnt_v2x_features_validate when receiving a v2.1 or later CLC
Accept Message; for example, the decision to accept the accepted value
or to decline should be made in this function.

Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Reviewed-by: Jan Karcher <jaka@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-19 12:46:52 +01:00
Guangguan Wang 7290178a82 net/smc: add vendor unique experimental options area in clc handshake
Add a vendor unique experimental options area in the clc handshake. In
clc accept and confirm msgs, vendor unique experimental options use the
16-byte reserved field, which was defined in struct smc_clc_fce_gid_ext
in the previous version. Because struct smc_clc_first_contact_ext
is widely used, and to limit the scope of modification, this patch moves
the 16-byte reserved field out of struct smc_clc_fce_gid_ext and places
it after struct smc_clc_first_contact_ext in a new struct named
struct smc_clc_first_contact_ext_v2x.

For SMC-R first connections, in the previous version, struct
smc_clc_first_contact_ext and the 16-byte reserved field were already
included in clc accept and confirm msgs. Thus, this patch's use of
struct smc_clc_first_contact_ext_v2x in place of struct
smc_clc_first_contact_ext plus the 16-byte reserved field in SMC-R clc
accept and confirm msgs is compatible with the previous version.

For SMC-D first connections, in the previous version, only struct
smc_clc_first_contact_ext is included in clc accept and confirm msgs,
and the 16-byte reserved field is not. Thus, when the negotiated smc
release version is earlier than v2.1, we still use struct
smc_clc_first_contact_ext for compatibility. If the negotiated smc
release version is v2.1 or later, struct smc_clc_first_contact_ext_v2x
is used instead.

Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Reviewed-by: Jan Karcher <jaka@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-19 12:46:52 +01:00
Guangguan Wang 1e700948c9 net/smc: support smc release version negotiation in clc handshake
Support smc release version negotiation in the clc handshake based on
SMC v2, which had a negotiation process for different versions but not
for different releases. The latest smc release version is updated to
v2.1, so there are currently two release versions of SMCv2: v2.0 and
v2.1. In release version negotiation, the client sends its preferred
release version in the CLC Proposal Message, and the server decides
which release version to use based on the client-preferred release
version and its own supported release versions (choosing the minimum of
the client-preferred and the server's latest supported release
version). The decision is returned to the client in the CLC Accept
Message, and the client confirms it in the CLC Confirm Message.

Client                                    Server
      Proposal(preferred release version)
     ------------------------------------>

      Accept(accepted release version)
 min(client preferred, server latest supported)
     <------------------------------------

      Confirm(accepted release version)
     ------------------------------------>
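The same minimum rule applied to release versions, as a hypothetical sketch: the TOY_RELEASE_* values and the assumption that the server's latest supported release is v2.1 are illustrative, and declining an Accept that names a release newer than the client proposed is an assumption of this sketch rather than something stated above.

```c
/* Hypothetical release identifiers. */
enum toy_release { TOY_RELEASE_V20 = 0, TOY_RELEASE_V21 = 1 };

#define TOY_SERVER_LATEST TOY_RELEASE_V21	/* assumption */

/* Server: min(client preferred, server latest supported). */
static enum toy_release server_pick_release(enum toy_release client_pref)
{
	return client_pref < TOY_SERVER_LATEST ? client_pref : TOY_SERVER_LATEST;
}

/* Client: return the release to echo in Confirm, or -1 to decline. */
static int client_confirm_release(enum toy_release proposed, enum toy_release accepted)
{
	return accepted <= proposed ? (int)accepted : -1;
}
```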

Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Reviewed-by: Jan Karcher <jaka@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-19 12:46:52 +01:00
Eric Dumazet cba3f17869 dccp: annotate data-races in dccp_poll()
We changed tcp_poll() over time, but never updated dccp.

Note that we also could remove dccp instead of maintaining it.

Fixes: 7c657876b6 ("[DCCP]: Initial implementation")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230818015820.2701595-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-18 19:30:24 -07:00
Eric Dumazet 726e9e8b94 tcp: refine skb->ooo_okay setting
Enabling BIG TCP on a low end platform apparently increased
chances of getting flows locked on one busy TX queue.

A similar problem was handled in commit 9b462d02d6
("tcp: TCP Small Queues and strange attractors"),
but that strategy only worked for bulk flows
or 'large enough' RPCs. BIG TCP changed how large
an RPC needed to be to enable the workaround:
if an RPC fits in a single skb, TSQ never triggers.

Root cause for the problem is a busy TX queue,
with delayed TX completions.

This patch changes how we set skb->ooo_okay to detect
the case where TX completion was not done yet, but the incoming
ACK was already processed and emptied the rtx queue.

Update the comment to explain the tricky details.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230817182353.2523746-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-18 19:29:36 -07:00
Eric Dumazet 76f33296d2 sock: annotate data-races around prot->memory_pressure
*prot->memory_pressure is read/written locklessly, so we need
to add proper annotations.

A recent commit added a new race, it is time to audit all accesses.

Fixes: 2d0c88e84e ("sock: Fix misuse of sk_under_memory_pressure()")
Fixes: 4d93df0abd ("[SCTP]: Rewrite of sctp buffer management code")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Abel Wu <wuyun.abel@bytedance.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Link: https://lore.kernel.org/r/20230818015132.2699348-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-18 19:23:30 -07:00
Jiri Pirko 2ebbc9752d devlink: add missing unregister linecard notification
The cited Fixes commit introduced linecard notifications for register,
but it did not add them for unregister. Fix that by adding them.

Fixes: c246f9b5fd ("devlink: add support to create line card and expose to user")
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230817125240.2144794-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-18 18:29:12 -07:00
Jakub Kicinski 04f28408a5 This cleanup patchset includes the following patches:
 - bump version strings, by Simon Wunderlich

 - Remove unused declarations, by Yue Haibing

 - Clean up MTU handling, by Sven Eckelmann (2 patches)

 - Clean up/remove (obsolete) functions, by Sven Eckelmann (3 patches)
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAmTc+z8WHHN3QHNpbW9u
 d3VuZGVybGljaC5kZQAKCRChK+OYQpKeoeEDD/44gdQmazggfYU2KXn42Pq8CLWR
 Nsr4ljmTXiCVh1a/Gw3pRvT7vjH12Jv+nlzoEXzv+iAHKnchVvq1zAguIMSQJymQ
 6MbZNHYMckwb6zBXxYGC3qaDI3Zf67ECMEmQkX6ZvwYbXYUiAR+/P7UPlzc145Ur
 jWS0wgm9kn8631ZQoWsETIapaN8X9N9Iqt1kDLm29bMlQoEmE6XdpGgpv4JWFpwp
 8lDqwcI+v4dt/QsNd07xka0+3lpnQB6bYeyYDjWY1ae04OME5B81PqbbehGNYigp
 VP2yepBtfWmWM2nCLqM9tK98ALw+vxtFN9K0M/9p7jd9TyVEWephD5SBrA9aJiKL
 ruOIu1YNe1nQ1CFIcc28MiK/1o8ljDUVBr8Mt4OhZ+Pn9fhOlWLy1ZqahiFmley/
 DcmwZfreyyNZ1BIrVSsLHGgvJ+X3j4llNFdtk7oB8LplBL3wMVpvrLSwDH34aZbS
 dNuWanCEjoqpBEnMLD3J6BgFhvfBjuANXDmpWR6fvXz4WNFCyirvInldG1aJFHKL
 HyUPG6ANlT7q6A+vgWHDVsEcxIuiCZlBuNlj/Z2Sg6DAq4WTPcdg+/BYnPusDe4t
 M0tAHnkn/dCOCkA0Iq4vPZREgj/xvip+yTfBp5R8hMNibL5TOGoYOLVk/q0X5Dah
 ON1TAvgsFbymf129Jg==
 =f1Uk
 -----END PGP SIGNATURE-----

Merge tag 'batadv-next-pullrequest-20230816' of git://git.open-mesh.org/linux-merge

Simon Wunderlich says:

====================
This cleanup patchset includes the following patches:

 - bump version strings, by Simon Wunderlich

 - Remove unused declarations, by Yue Haibing

 - Clean up MTU handling, by Sven Eckelmann (2 patches)

 - Clean up/remove (obsolete) functions, by Sven Eckelmann (3 patches)

* tag 'batadv-next-pullrequest-20230816' of git://git.open-mesh.org/linux-merge:
  batman-adv: Drop per algo GW section class code
  batman-adv: Keep batadv_netlink_notify_* static
  batman-adv: Drop unused function batadv_gw_bandwidth_set
  batman-adv: Check hardif MTU against runtime MTU
  batman-adv: Avoid magic value for minimum MTU
  batman-adv: Remove unused declarations
  batman-adv: Start new development cycle
====================

Link: https://lore.kernel.org/r/20230816164000.190884-1-sw@simonwunderlich.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-18 15:19:18 -07:00
Jakub Kicinski 7793a88e88 Here are some batman-adv bugfixes:
 - Fix issues with adjusted MTUs (2 patches), by Sven Eckelmann

 - Fix header access for memory reallocation case, by Remi Pommarel

 - Fix two memory leaks (2 patches), by Remi Pommarel
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAmTc+eYWHHN3QHNpbW9u
 d3VuZGVybGljaC5kZQAKCRChK+OYQpKeobgFD/9oBvQKj9rObbnHIgxe3ZZ4x4po
 FOln4eLv83YZwVP54BC0X8COymO+gd3tBbGg9U1s9kpn+hIOXi7zI8xnmS/jrKGB
 t8tDQ/1S9laCfanfDoHDdQ96ifJfQR6Mp7ZH1e64L22Ag5hKjVoGeQp2Mf5X2S+S
 7ZFdhofr/ZNi0Tz4Y+Jw9bh3W5TMnwSFfexSIfUJmh+06RGRRspOx2WbbgArMs12
 hP4UST0cIfIr0CinBQz+LiyT90GgC6r+xjkQrP3LgzYegC7eBW+bQxLgCtnk+Hic
 +t+aS3SnityZzFyaNJrULX7/u8WJumW4udu0jDl9raAWIJBUV5pNr7sNagQ45mvE
 NZ4/VnWGg6MnjdPC6CIuU6AuCLZYn1NiE6mp1vuFxMpqmiJUhMjwjTp8DaLpQZCV
 vDYca/bBuDMbTIl5LxQ965svbNVDiAS6gNHbrVs2k3bq3Ji7QS1M7MVR3npehGT0
 xInqQNO7QJ0c+/PaFMTZwi0LKk8qwvHLggsZyKXJ6i6YBH7YG1LTbEtqLT7rH3nv
 sHfTyPzw7b0oFh7/rSMzvf1P9yiMG1ZCY622uhU1M+CxA2Axr1Lcq1yc3umDP4Ds
 BM3JKilHsj/mM7g/HobBs7eg+BL9/KjNKFbj12Bc0xV7IXC4SNTlKSJUN0ZlGSsH
 RP7dNXOzS0LUErpARw==
 =EsR1
 -----END PGP SIGNATURE-----

Merge tag 'batadv-net-pullrequest-20230816' of git://git.open-mesh.org/linux-merge

Simon Wunderlich says:

====================
Here are some batman-adv bugfixes:

 - Fix issues with adjusted MTUs (2 patches), by Sven Eckelmann

 - Fix header access for memory reallocation case, by Remi Pommarel

 - Fix two memory leaks (2 patches), by Remi Pommarel

* tag 'batadv-net-pullrequest-20230816' of git://git.open-mesh.org/linux-merge:
  batman-adv: Fix batadv_v_ogm_aggr_send memory leak
  batman-adv: Fix TT global entry leak when client roamed back
  batman-adv: Do not get eth header before batadv_check_management_packet
  batman-adv: Don't increase MTU when set by user
  batman-adv: Trigger events for auto adjusted MTU
====================

Link: https://lore.kernel.org/r/20230816163318.189996-1-sw@simonwunderlich.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-18 15:16:05 -07:00
Jesper Dangaard Brouer 0a0643164d net: use SLAB_NO_MERGE for kmem_cache skbuff_head_cache
Since v6.5-rc1, the MM tree has been merged and contains a new flag,
SLAB_NO_MERGE, added in commit d0bf7d5759 ("mm/slab: introduce kmem_cache
flag SLAB_NO_MERGE"). Now is the time to use this flag for networking, as
proposed earlier (see link).

The SKB (sk_buff) kmem_cache slab is critical for network performance.
The network stack uses the kmem_cache_{alloc,free}_bulk APIs to gain
performance by amortising the alloc/free cost.

For the bulk API to perform efficiently, slab fragmentation needs to be
low. Especially for the SLUB allocator, the efficiency of the bulk free
API depends on objects belonging to the same slab (page).

When running different network performance microbenchmarks, I noticed
that performance was (slightly) reduced on machines with longer uptimes.
I believe the cause was that 'skbuff_head_cache' got aliased/merged into
the general SLUB cache for 256-byte objects (with my kernel config,
without CONFIG_HARDENED_USERCOPY).

The network stack has various other reasons for the SKB kmem_cache not
being merged, but they depend on the kernel config (e.g.
CONFIG_HARDENED_USERCOPY). We want to explicitly set SLAB_NO_MERGE for
this kmem_cache to get the most out of the kmem_cache_{alloc,free}_bulk
APIs.

When CONFIG_SLUB_TINY is configured the bulk APIs are essentially
disabled. Thus, for this case drop the SLAB_NO_MERGE flag.
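The merge-versus-no-merge policy described above can be illustrated with a toy user-space model. Everything here is hypothetical: TOY_SLAB_NO_MERGE, struct toy_cache and the single "same object size" merge rule are illustrative stand-ins, not the kernel's slab code, which checks many more conditions before aliasing two caches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit for this toy model only; it is not the kernel's
 * SLAB_NO_MERGE value. */
#define TOY_SLAB_NO_MERGE 0x1u

struct toy_cache {
    const char *name;
    size_t object_size;
    unsigned int flags;
};

/* Toy merge rule: same object size and neither cache opted out.
 * The real slab merge logic also considers alignment, constructors,
 * debug flags and usercopy regions. */
static bool toy_can_merge(const struct toy_cache *a, const struct toy_cache *b)
{
    assert(a != NULL && b != NULL);
    if ((a->flags | b->flags) & TOY_SLAB_NO_MERGE)
        return false;
    return a->object_size == b->object_size;
}

/* Policy from the commit message: opt out of merging, except with
 * CONFIG_SLUB_TINY, where the bulk APIs are essentially disabled. */
static unsigned int toy_skb_cache_flags(bool slub_tiny)
{
    return slub_tiny ? 0u : TOY_SLAB_NO_MERGE;
}
```

With the flag set, a 256-byte SKB cache is never aliased with a general 256-byte cache; in the tiny configuration it may be.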

Link: https://lore.kernel.org/all/167396280045.539803.7540459812377220500.stgit@firesoul/
Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Link: https://lore.kernel.org/r/169211265663.1491038.8580163757548985946.stgit@firesoul
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-18 14:38:58 -07:00
Jakub Kicinski 7ff57803d2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

Conflicts:

drivers/net/ethernet/sfc/tc.c
  fa165e1949 ("sfc: don't unregister flow_indr if it was never registered")
  3bf969e88a ("sfc: add MAE table machinery for conntrack table")
https://lore.kernel.org/all/20230818112159.7430e9b4@canb.auug.org.au/

No adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-18 12:44:56 -07:00
Matthew Wilcox (Oracle) 350f6bbca1 mm: allow per-VMA locks on file-backed VMAs
Remove the TCP layering violation by allowing per-VMA locks on all VMAs. 
The fault path will immediately fail in handle_mm_fault().  There may be a
small performance reduction from this patch as a little unnecessary work
will be done on each page fault.  See later patches for the improvement.

Link: https://lkml.kernel.org/r/20230724185410.1124082-3-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Cc: Arjun Roy <arjunroy@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Punit Agrawal <punit.agrawal@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18 10:12:51 -07:00
Yan Zhai a171fbec88 lwt: Check LWTUNNEL_XMIT_CONTINUE strictly
LWTUNNEL_XMIT_CONTINUE is implicitly assumed in ip(6)_finish_output2,
such that any positive return value from an xmit hook could cause
unexpected continue behavior, even though the related skb may already
have been freed. This is error-prone for future xmit hook ops. One
possible error is to return the statuses of dst_output directly.

To make the code safer, redefine the LWTUNNEL_XMIT_CONTINUE value to
distinguish it from dst_output statuses, and check the continue
condition explicitly.
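The difference between the implicit and the strict continue check can be sketched in user space. The NET_* values mirror include/linux/netdevice.h; the OLD_/NEW_ verdict names are local to this sketch, and the new CONTINUE value of 0x100 is an assumption about the patch rather than a quote from it.

```c
#include <assert.h>
#include <stdbool.h>

/* Status values mirroring include/linux/netdevice.h. */
#define NET_RX_DROP     1
#define NET_XMIT_CN     0x02
#define NETDEV_TX_BUSY  0x10

/* Old verdicts: CONTINUE == 1 collides with NET_RX_DROP. */
enum { OLD_XMIT_DONE = 0, OLD_XMIT_CONTINUE = 1 };

/* New verdicts: CONTINUE moved out of the status range (0x100 assumed). */
enum { NEW_XMIT_DONE = 0, NEW_XMIT_CONTINUE = 0x100 };

static_assert(NEW_XMIT_CONTINUE != NET_RX_DROP &&
              NEW_XMIT_CONTINUE != NET_XMIT_CN &&
              NEW_XMIT_CONTINUE != NETDEV_TX_BUSY,
              "CONTINUE must not collide with xmit statuses");

/* Implicit check: anything non-negative that is not DONE falls
 * through to the neighbour layer. */
static bool old_should_continue(int res)
{
    return !(res < 0 || res == OLD_XMIT_DONE);
}

/* Strict check: only the exact CONTINUE verdict falls through. */
static bool new_should_continue(int res)
{
    return res == NEW_XMIT_CONTINUE;
}
```

Under the old check, NET_RX_DROP and NET_XMIT_CN both continue transmission; under the strict check they are returned to the caller instead.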

Fixes: 3a0af8fd61 ("bpf: BPF for lightweight tunnel infrastructure")
Suggested-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Yan Zhai <yan@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/96b939b85eda00e8df4f7c080f770970a4c5f698.1692326837.git.yan@cloudflare.com
2023-08-18 16:05:26 +02:00
Yan Zhai 29b22badb7 lwt: Fix return values of BPF xmit ops
BPF encap ops can return different types of positive values, such as
NET_RX_DROP, NET_XMIT_CN, NETDEV_TX_BUSY, and so on, from the functions
skb_do_redirect and bpf_lwt_xmit_reroute. At the xmit hook, such return
values would be treated implicitly as LWTUNNEL_XMIT_CONTINUE in
ip(6)_finish_output2. When this happens, skbs that have already been freed
continue to the neighbor subsystem, causing use-after-free bugs and
kernel crashes.

To fix the incorrect behavior, the return values of skb_do_redirect can
simply be discarded, matching the tc-egress behavior. On the other hand,
bpf_lwt_xmit_reroute returns useful errors to local senders, e.g. PMTU
information. Thus, convert its return values to avoid the conflict with
LWTUNNEL_XMIT_CONTINUE.
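One way to "convert" reroute results is sketched below: negative errno values (for example PMTU errors) pass through unchanged, while positive xmit statuses are mapped the way the kernel's net_xmit_errno() macro maps them (NET_XMIT_CN to 0, anything else to -ENOBUFS). reroute_result() is a hypothetical helper, and whether the patch converts exactly this way is an assumption.

```c
#include <assert.h>
#include <errno.h>

/* Status values mirroring include/linux/netdevice.h. */
#define NET_XMIT_SUCCESS 0x00
#define NET_XMIT_DROP    0x01
#define NET_XMIT_CN      0x02

/* Hypothetical helper: never return a positive value, so the caller
 * cannot mistake a transmit status for "continue". */
static int reroute_result(int status)
{
    int ret;

    if (status <= 0)
        ret = status;                 /* success or an errno: keep it */
    else
        ret = (status == NET_XMIT_CN) ? 0 : -ENOBUFS;

    assert(ret <= 0);
    return ret;
}
```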

Fixes: 3a0af8fd61 ("bpf: BPF for lightweight tunnel infrastructure")
Reported-by: Jordan Griege <jgriege@cloudflare.com>
Suggested-by: Martin KaFai Lau <martin.lau@linux.dev>
Suggested-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Yan Zhai <yan@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/0d2b878186cfe215fec6b45769c1cd0591d3628d.1692326837.git.yan@cloudflare.com
2023-08-18 16:05:26 +02:00
Linus Torvalds 0e8860d212 Including fixes from ipsec and netfilter.
No known outstanding regressions.
 
 Fixes to fixes:
 
  - virtio-net: set queues after driver_ok, avoid a potential race
    added by recent fix
 
  - Revert "vlan: Fix VLAN 0 memory leak", it may lead to a warning
    when VLAN 0 is registered explicitly
 
  - nf_tables:
    - fix false-positive lockdep splat in recent fixes
    - don't fail inserts if duplicate has expired (fix test failures)
    - fix races between garbage collection and netns dismantle
 
 Current release - new code bugs:
 
  - mlx5: Fix mlx5_cmd_update_root_ft() error flow
 
 Previous releases - regressions:
 
  - phy: fix IRQ-based wake-on-lan over hibernate / power off
 
 Previous releases - always broken:
 
  - sock: fix misuse of sk_under_memory_pressure() preventing system
    from exiting global TCP memory pressure if a single cgroup is under
    pressure
 
  - fix the RTO timer retransmitting skb every 1ms if linear option
    is enabled
 
  - af_key: fix sadb_x_filter validation, amend netlink policy
 
  - ipsec: fix slab-use-after-free in decode_session6()
 
  - macb: in ZynqMP resume always configure PS GTR for non-wakeup source
 
 Misc:
 
  - netfilter: set default timeout to 3 secs for sctp shutdown send and
    recv state (from 300ms), align with protocol timers
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmTemA4ACgkQMUZtbf5S
 IrtCThAAj+t35QM5BgGZowmrx9U4yF+kacDkdPztxlT8a/b+famrTtnZJ8USW+PF
 VCk3Eu8JXheuyAOMArHyM84/crS6wim6mzGcXaucusA3981PFzoqdgCLLf9emAJ2
 j9vzKrnHBtdd5fj8Exwq70KN4CzXyrzRgqwr2EXBK9lH59HjX0+J7o+trbDxNmFK
 RZJE2oDCqf939iRGG3PhJryKYBmrQaMtdonNpSU5PiiRT0HnVYcEtdWcOXK7d53D
 onpoaPdawcsqsns5c5Qj01E1OdyM8X54BEGkl/S4FmSw5jF9Bp6btmTcxYYtdb7E
 M3CeYROZ0Kt8KcKKje/o1AzdGqWq8Hnxfwy+2WulZAHMucshg0JPm6Ev74WRondw
 NGYriKJSdORSO8idK9K/i7pnjZXYr9gU50lpPUFU+QzSdd+zv+U11arjAodwI9Wi
 pW+dFi3UR7J01LidaxclvHmWnZ7d5sSzE2khpqb0xd0+PagRGesl8qnKyoDJNS1P
 IHsOrRh9aXLzEZjud/rVG+sUobQvc1oiHW+hvbJ04GLKoli9U5poGT2fcaa4O67M
 T7JcN5oGDF+PIHJKgTEN7pfX2epY33gmofKUhbt/OPOqnvZOVbTu7/ojjuJZ8Lc5
 SF8AvTe+lECcX8Htjq30PoVfai+FT6AhnZzK0H9K4HMfUB9O32Q=
 =Ze13
 -----END PGP SIGNATURE-----

Merge tag 'net-6.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from ipsec and netfilter.

  No known outstanding regressions.

  Fixes to fixes:

   - virtio-net: set queues after driver_ok, avoid a potential race
     added by recent fix

   - Revert "vlan: Fix VLAN 0 memory leak", it may lead to a warning
     when VLAN 0 is registered explicitly

   - nf_tables:
      - fix false-positive lockdep splat in recent fixes
      - don't fail inserts if duplicate has expired (fix test failures)
      - fix races between garbage collection and netns dismantle

  Current release - new code bugs:

   - mlx5: Fix mlx5_cmd_update_root_ft() error flow

  Previous releases - regressions:

   - phy: fix IRQ-based wake-on-lan over hibernate / power off

  Previous releases - always broken:

   - sock: fix misuse of sk_under_memory_pressure() preventing system
     from exiting global TCP memory pressure if a single cgroup is under
     pressure

   - fix the RTO timer retransmitting skb every 1ms if linear option is
     enabled

   - af_key: fix sadb_x_filter validation, amend netlink policy

   - ipsec: fix slab-use-after-free in decode_session6()

   - macb: in ZynqMP resume always configure PS GTR for non-wakeup
     source

  Misc:

   - netfilter: set default timeout to 3 secs for sctp shutdown send and
     recv state (from 300ms), align with protocol timers"

* tag 'net-6.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (49 commits)
  ice: Block switchdev mode when ADQ is active and vice versa
  qede: fix firmware halt over suspend and resume
  net: do not allow gso_size to be set to GSO_BY_FRAGS
  sock: Fix misuse of sk_under_memory_pressure()
  sfc: don't fail probe if MAE/TC setup fails
  sfc: don't unregister flow_indr if it was never registered
  net: dsa: mv88e6xxx: Wait for EEPROM done before HW reset
  net/mlx5: Fix mlx5_cmd_update_root_ft() error flow
  net/mlx5e: XDP, Fix fifo overrun on XDP_REDIRECT
  i40e: fix misleading debug logs
  iavf: fix FDIR rule fields masks validation
  ipv6: fix indentation of a config attribute
  mailmap: add entries for Simon Horman
  broadcom: b44: Use b44_writephy() return value
  net: openvswitch: reject negative ifindex
  team: Fix incorrect deletion of ETH_P_8021AD protocol vid from slaves
  net: phy: broadcom: stub c45 read/write for 54810
  netfilter: nft_dynset: disallow object maps
  netfilter: nf_tables: GC transaction race with netns dismantle
  netfilter: nf_tables: fix GC transaction races with netns and netlink event exit path
  ...
2023-08-18 06:52:23 +02:00
François Michel 3cad70bc74 netem: use seeded PRNG for correlated loss events
Use prandom_u32_state() instead of get_random_u32() to generate
the correlated loss events of netem.

Signed-off-by: François Michel <francois.michel@uclouvain.be>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Link: https://lore.kernel.org/r/20230815092348.1449179-4-francois.michel@uclouvain.be
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-17 19:15:06 -07:00
François Michel 9c87b2aecc netem: use a seeded PRNG for generating random losses
Use prandom_u32_state() instead of get_random_u32() to generate
the random loss events of netem. The state of the prng is part
of the prng attribute of struct netem_sched_data.

Signed-off-by: François Michel <francois.michel@uclouvain.be>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Link: https://lore.kernel.org/r/20230815092348.1449179-3-francois.michel@uclouvain.be
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-17 19:15:05 -07:00
François Michel 4072d97ddc netem: add prng attribute to netem_sched_data
Add a prng attribute to struct netem_sched_data and
allow setting the seed of the PRNG through netlink
using the new TCA_NETEM_PRNG_SEED attribute.
The PRNG attribute is not actually used yet.

Signed-off-by: François Michel <francois.michel@uclouvain.be>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Link: https://lore.kernel.org/r/20230815092348.1449179-2-francois.michel@uclouvain.be
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-17 19:15:05 -07:00
Abel Wu 2d0c88e84e sock: Fix misuse of sk_under_memory_pressure()
The status of global socket memory pressure is updated when:

  a) __sk_mem_raise_allocated():

	enter: sk_memory_allocated(sk) >  sysctl_mem[1]
	leave: sk_memory_allocated(sk) <= sysctl_mem[0]

  b) __sk_mem_reduce_allocated():

	leave: sk_under_memory_pressure(sk) &&
		sk_memory_allocated(sk) < sysctl_mem[0]

So the conditions for leaving global pressure are inconsistent, which
may lead to a situation where one pressured net-memcg prevents the
global pressure from being cleared when there is in fact no global
pressure, so the global constraints unexpectedly remain in effect
for the other sockets.

This patch fixes this by ignoring the net-memcg's pressure when
deciding whether to leave global memory pressure.
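The enter/leave conditions listed in a) and b) can be modelled as a small hysteresis state machine. This toy model is hypothetical (struct toy_proto_mem and both functions are illustrative); it shows the leave condition after the fix, where only the global flag and the global allocation are consulted and the memcg pressure argument is deliberately ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one protocol's global memory accounting. */
struct toy_proto_mem {
    long limits[3];        /* stands in for sysctl_mem[0..2] (pages) */
    long allocated;        /* stands in for sk_memory_allocated() */
    bool global_pressure;  /* stands in for the global pressure flag */
};

/* Allocation path, case a): leave at or below mem[0], enter above mem[1]. */
static void toy_raise_allocated(struct toy_proto_mem *m, long pages)
{
    m->allocated += pages;
    if (m->allocated <= m->limits[0])
        m->global_pressure = false;
    else if (m->allocated > m->limits[1])
        m->global_pressure = true;
}

/* Release path, case b) after the fix: memcg pressure is ignored. */
static void toy_reduce_allocated(struct toy_proto_mem *m, long pages,
                                 bool memcg_pressure)
{
    (void)memcg_pressure;
    assert(pages <= m->allocated);
    m->allocated -= pages;
    if (m->global_pressure && m->allocated < m->limits[0])
        m->global_pressure = false;
}
```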

Fixes: e1aab161e0 ("socket: initial cgroup code.")
Signed-off-by: Abel Wu <wuyun.abel@bytedance.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Link: https://lore.kernel.org/r/20230816091226.1542-1-wuyun.abel@bytedance.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-17 11:34:36 -07:00
Jakub Kicinski f54a2a132a bpf-next-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZN0eNgAKCRDbK58LschI
 gwhhAQCwbrEgA3LslDlk22eqyfRH04D+9d7Kc3ISQssyjlr9swD+NfwfDvYqopwj
 Dp67QkHdluixf2/NMPTEvg/CA4mlmww=
 =4BwF
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2023-08-16

We've added 17 non-merge commits during the last 6 day(s) which contain
a total of 20 files changed, 1179 insertions(+), 37 deletions(-).

The main changes are:

1) Add a BPF hook in sys_socket() to change the protocol ID
   from IPPROTO_TCP to IPPROTO_MPTCP to cover migration for legacy
   applications, from Geliang Tang.

2) Follow-up/fallout fix from the SO_REUSEPORT + bpf_sk_assign work
   to fix a splat on non-fullsock sks in inet[6]_steal_sock,
   from Lorenz Bauer.

3) Improvements to struct_ops links to avoid forcing presence of
   update/validate callbacks. Also add bpf_struct_ops fields documentation,
   from David Vernet.

4) Ensure libbpf sets close-on-exec flag on gzopen, from Marco Vedovati.

5) Several new tcx selftest additions and bpftool link show support for
   tcx and xdp links, from Daniel Borkmann.

6) Fix a smatch warning on uninitialized symbol in
   bpf_perf_link_fill_kprobe, from Yafang Shao.

7) BPF selftest fixes e.g. misplaced break in kfunc_call test,
   from Yipeng Zou.

8) Small cleanup to remove unused declaration bpf_link_new_file,
   from Yue Haibing.

9) Small typo fix to bpftool's perf help message, from Daniel T. Lee.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
  selftests/bpf: Add mptcpify test
  selftests/bpf: Fix error checks of mptcp open_and_load
  selftests/bpf: Add two mptcp netns helpers
  bpf: Add update_socket_protocol hook
  bpftool: Implement link show support for xdp
  bpftool: Implement link show support for tcx
  selftests/bpf: Add selftest for fill_link_info
  bpf: Fix uninitialized symbol in bpf_perf_link_fill_kprobe()
  net: Fix slab-out-of-bounds in inet[6]_steal_sock
  bpf: Document struct bpf_struct_ops fields
  bpf: Support default .validate() and .update() behavior for struct_ops links
  selftests/bpf: Add various more tcx test cases
  selftests/bpf: Clean up fmod_ret in bench_rename test script
  selftests/bpf: Fix repeat option when kfunc_call verification fails
  libbpf: Set close-on-exec flag on gzopen
  bpftool: fix perf help message
  bpf: Remove unused declaration bpf_link_new_file()
====================

Link: https://lore.kernel.org/r/20230816212840.1539-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-16 20:09:43 -07:00
Geliang Tang 0dd061a6a1 bpf: Add update_socket_protocol hook
Add a hook named update_socket_protocol in __sys_socket(), for bpf
progs to attach to and update the socket protocol. One use case is to
force legacy TCP apps to create and use MPTCP sockets instead of
TCP ones.

Define a fmod_ret set named bpf_mptcp_fmodret_ids, add the hook
update_socket_protocol into this set, and register it in
bpf_mptcp_kfunc_init().
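The decision an fmod_ret program attached to update_socket_protocol could make is modelled below as plain user-space C, so it can be compiled and tested without a BPF toolchain. toy_update_socket_protocol() is hypothetical; a real program would be a BPF object attached to the hook, and socket type flags such as SOCK_NONBLOCK are not handled in this sketch.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>

#ifndef IPPROTO_MPTCP
#define IPPROTO_MPTCP 262   /* Linux value; older libc headers may lack it */
#endif

/* Return the protocol that socket() should use: upgrade plain IPv4/IPv6
 * TCP stream sockets to MPTCP, leave everything else untouched. */
static int toy_update_socket_protocol(int family, int type, int protocol)
{
    assert(protocol >= 0);  /* negative protocol numbers are invalid input */

    if ((family == AF_INET || family == AF_INET6) &&
        type == SOCK_STREAM &&
        (protocol == 0 || protocol == IPPROTO_TCP))
        return IPPROTO_MPTCP;
    return protocol;
}
```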

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/79
Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Link: https://lore.kernel.org/r/ac84be00f97072a46f8a72b4e2be46cbb7fa5053.1692147782.git.geliang.tang@suse.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-08-16 10:22:16 -07:00
Kui-Feng Lee 3dec89b14d net/ipv6: Remove expired routes with a separated list of routes.
FIB6 GC walks trees of fib6_tables to remove expired routes. Walking a tree
can be expensive if the number of routes in a table is big, even if most of
them are permanent. Keeping the routes that have an expiration time in a
separate list, and checking only that list, avoids this potential issue.
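The approach can be sketched with a singly linked list that holds only routes carrying an expiry time, so GC cost scales with the number of expiring routes rather than the table size. All toy_* names are hypothetical; the kernel's fib6 data structures are different.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_route {
    int id;
    bool has_expiry;
    long expires;               /* only meaningful when has_expiry */
    bool removed;
    struct toy_route *gc_next;  /* link in the expiring-routes list */
};

struct toy_table {
    struct toy_route *gc_list;  /* only routes with an expiry time */
};

static void toy_add_route(struct toy_table *t, struct toy_route *rt)
{
    rt->removed = false;
    rt->gc_next = NULL;
    if (rt->has_expiry) {       /* permanent routes never join the list */
        rt->gc_next = t->gc_list;
        t->gc_list = rt;
    }
}

/* Walk only the expiring list; return how many routes were visited. */
static size_t toy_gc(struct toy_table *t, long now)
{
    struct toy_route **pp = &t->gc_list;
    size_t visited = 0;

    while (*pp) {
        struct toy_route *rt = *pp;

        visited++;
        assert(rt->has_expiry);
        if (rt->expires <= now) {
            rt->removed = true;
            *pp = rt->gc_next;  /* unlink the expired route */
            rt->gc_next = NULL;
        } else {
            pp = &rt->gc_next;
        }
    }
    return visited;
}
```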

Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-16 12:26:43 +01:00
David S. Miller de4c5efeec nf pull request 2023-08-16
-----BEGIN PGP SIGNATURE-----
 
 iQJBBAABCAArFiEEgKkgxbID4Gn1hq6fcJGo2a1f9gAFAmTb9twNHGZ3QHN0cmxl
 bi5kZQAKCRBwkajZrV/2ANr7D/wN/XTDG3JxLl4VUtmSOBrD6y6QwpqfUYBD11Ev
 eAXpP69wxh2J9gPqtVsPAJwbc0F2eca38ziyJ9+4hmNWNBc3Hh1oXj/9e0IqdPUP
 9AEHu73jLmeb6bN0RU8guxmipwZq/a4Q6y/OYPhf+c0uULYEWdH6AAHs3WGRCWHI
 a0gedUU3ChKDueObHWfaZSqGuMVKS+eCfT57oKc/l2J2b2064JAksKMDglsDmsA/
 VQw8Ko+l1PO7t8mOswPufGYyg5tKUXpq8AJ3Dlg2l2Qzws29FEIfzLLbu89GHlvP
 FYSjgOuVdwVra/Kt7jQzUxeGrBXZC8MuaGnEOi/tMXidw0uc+N1y2Bg2N6eFkmxW
 AN5e4p0S3ddfbGaEVoDx5aS6kKTDCQAvgeaM+KIExmbMJQ77FEqnq/WqyDd/eHJL
 5Su1nPoIGzswSzcYC6eh6AEnesx01OdKCZtMfF1LxutopMLItsBe3UxPRglJfvvX
 XxuosrZe1aIOgCSQGTeP6DqpTVEOsvNThhxapKt1AeW+YfnzGwZD0hU6vOq9ZSHS
 K+wRWfhipS5nt8zTv8SbM2DgOmD0pPcsiueAXNQUW7gUUXleLOvP8tden91M/37C
 eGMuZqdWRAjJtU0q4QvhiLSvlI7Awh2dsr0Vgh4k6b1oyuk3UrwKJeoZW07g9MR4
 VLRIUg==
 =5ePi
 -----END PGP SIGNATURE-----

Merge tag 'nf-23-08-16' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Florian Westphal says:

====================
These are netfilter fixes for the *net* tree.

First patch resolves a false-positive lockdep splat:
rcu_dereference is used outside of rcu read lock.  Let lockdep
validate that the transaction mutex is locked.

Second patch fixes a kdoc warning added in previous PR.

Third patch fixes a memory leak:
The catchall element isn't disabled correctly, which allows
userspace to deactivate the element again. This results in a refcount
underflow, which in turn prevents memory release. This has been
broken since the feature was added in 5.13.

Patch 4 fixes an incorrect change in the previous pull request:
adding a duplicate key to a set should work if the duplicate key
has expired; restore this behaviour. All from myself.

Patch 5 resolves a historic artifact in sctp conntrack:
a 300ms timeout for shutdown_ack. Increase this to 3s. From Xin Long.

Patch 6 fixes a sysctl data race in ipvs where two threads can clobber
the sysctl value, from Sishuai Gong. This is a day-0 bug that predates
git history.

Patches 7, 8 and 9, from Pablo Neira Ayuso, are also followups
for the previous GC rework in nf_tables: the netlink notifier and the
netns exit path must both increment the gc worker seqcount, otherwise
the worker may encounter stale (freed) pointers.
================

Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-16 11:11:24 +01:00
Eric Dumazet 12af73269f inet: implement lockless IP_MINTTL
inet->min_ttl is already read with READ_ONCE().

Implementing IP_MINTTL socket option set/read
without holding the socket lock is easy.
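A lockless setter and reader can be approximated in user space with C11 relaxed atomics standing in for WRITE_ONCE()/READ_ONCE(). The analogy is approximate, and struct toy_inet_sock with its helpers is hypothetical. The setter validates first and then publishes with a single store, and the receive-side check reads the field exactly once.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

struct toy_inet_sock {
    _Atomic int min_ttl;   /* 0 disables the check */
};

/* setsockopt(IP_MINTTL) analogue: validate, then a single store. */
static int toy_set_minttl(struct toy_inet_sock *inet, int val)
{
    if (val < 0 || val > 255)
        return -EINVAL;
    atomic_store_explicit(&inet->min_ttl, val, memory_order_relaxed);
    return 0;
}

/* Receive-path analogue: one load, then compare with the packet TTL. */
static bool toy_ttl_too_low(struct toy_inet_sock *inet, int pkt_ttl)
{
    int min_ttl = atomic_load_explicit(&inet->min_ttl, memory_order_relaxed);

    assert(pkt_ttl >= 0 && pkt_ttl <= 255);
    return pkt_ttl < min_ttl;
}
```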

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-16 11:09:18 +01:00
Eric Dumazet 10f42426e5 inet: implement lockless IP_TTL
ip_select_ttl() is racy, because it reads inet->uc_ttl
without proper locking.

Add READ_ONCE()/WRITE_ONCE() annotations while
allowing IP_TTL socket option to be set/read without
holding the socket lock.
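The race in a select-TTL helper comes from reading the field more than once: a test of "is it -1?" and the later use could observe different values. The fix pattern is to load it once into a local. This is sketched below with relaxed C11 atomics as an approximate stand-in for READ_ONCE()/WRITE_ONCE(); the toy_* names and the route_default parameter are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

struct toy_inet_sock {
    _Atomic int uc_ttl;   /* -1 means "use the route's default" */
};

/* setsockopt(IP_TTL) analogue: -1 or 1..255 are accepted. */
static int toy_set_ttl(struct toy_inet_sock *inet, int val)
{
    if (val != -1 && (val < 1 || val > 255))
        return -EINVAL;
    atomic_store_explicit(&inet->uc_ttl, val, memory_order_relaxed);
    return 0;
}

/* Read uc_ttl exactly once, then decide using only the local copy. */
static int toy_select_ttl(struct toy_inet_sock *inet, int route_default)
{
    int ttl = atomic_load_explicit(&inet->uc_ttl, memory_order_relaxed);

    if (ttl < 0)
        ttl = route_default;
    assert(ttl >= 1 && ttl <= 255);
    return ttl;
}
```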

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-16 11:09:18 +01:00
Eric Dumazet 08e39c0dfa inet: move inet->defer_connect to inet->inet_flags
Make room in struct inet_sock by removing this bit field,
using one available bit in inet_flags instead.

Also move local_port_range to fill the resulting hole,
saving 8 bytes on 64bit arches.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-16 11:09:18 +01:00
Eric Dumazet ca571e2eb7 inet: move inet->bind_address_no_port to inet->inet_flags
IP_BIND_ADDRESS_NO_PORT socket option can now be set/read
without locking the socket.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-16 11:09:17 +01:00
Eric Dumazet f04b8d3478 inet: move inet->nodefrag to inet->inet_flags
IP_NODEFRAG socket option can now be set/read
without locking the socket.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-16 11:09:17 +01:00
Eric Dumazet b1c0356a58 inet: move inet->is_icsk to inet->inet_flags
We move single bit fields to inet->inet_flags to avoid races.
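Adjacent one-bit C bitfields share a storage unit, so setting one of them is a plain read-modify-write of the whole unit. Two writers updating different bits without a common lock can therefore lose an update. Keeping the bits in one word that is only modified with atomic bit operations avoids this. The sketch below uses C11 atomics as a user-space stand-in for kernel-style atomic bit helpers; the TOY_INET_FLAGS_* bit numbers and helper names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit numbers for this sketch only. */
enum toy_inet_flag {
    TOY_INET_FLAGS_NODEFRAG = 0,
    TOY_INET_FLAGS_BIND_ADDRESS_NO_PORT = 1,
    TOY_INET_FLAGS_DEFER_CONNECT = 2,
    TOY_INET_FLAGS_IS_ICSK = 3,
};

struct toy_inet_sock {
    atomic_ulong inet_flags;
};

/* Atomically set or clear one bit without disturbing its neighbours. */
static void toy_assign_bit(struct toy_inet_sock *inet,
                           enum toy_inet_flag nr, bool on)
{
    unsigned long mask = 1UL << nr;

    if (on)
        atomic_fetch_or_explicit(&inet->inet_flags, mask, memory_order_relaxed);
    else
        atomic_fetch_and_explicit(&inet->inet_flags, ~mask, memory_order_relaxed);
}

static bool toy_test_bit(struct toy_inet_sock *inet, enum toy_inet_flag nr)
{
    assert((unsigned int)nr < sizeof(unsigned long) * 8);
    return atomic_load_explicit(&inet->inet_flags, memory_order_relaxed) &
           (1UL << nr);
}
```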

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-16 11:09:17 +01:00