Commit graph

16549 commits

Author SHA1 Message Date
Eric Dumazet ef62c0ae6d net/ipv6: convert skip_notify_on_dev_down sysctl to u8
Save a bit a space, and could help future sysctls to
use the same pattern.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-02 22:55:43 -07:00
Eric Dumazet edf2e1d201 net/ipv6: fix bool/int mismatch for skip_notify_on_dev_down
skip_notify_on_dev_down ctl table expects this field
to be an int (4 bytes), not a bool (1 byte).

Because proc_dou8vec_minmax() was added in 5.13,
this patch converts skip_notify_on_dev_down to an int.

Following patch then converts the field to u8 and use proc_dou8vec_minmax().

Fixes: 7c6bb7d2fa ("net/ipv6: Add knob to skip DELROUTE message on device down")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-02 22:55:43 -07:00
Akihiro Suda e209fee411 net/ipv4: ping_group_range: allow GID from 2147483648 to 4294967294
With this commit, all the GIDs ("0 4294967294") can be written to the
"net.ipv4.ping_group_range" sysctl.

Note that 4294967295 (0xffffffff) is an invalid GID (see gid_valid() in
include/linux/uidgid.h), and an attempt to register this number will cause
-EINVAL.

Prior to this commit, only up to GID 2147483647 could be covered.
Documentation/networking/ip-sysctl.rst had "0 4294967295" as an example
value, but this example was wrong and causing -EINVAL.

Fixes: c319b4d76b ("net: ipv4: add IPPROTO_ICMP socket kind")
Co-developed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-02 09:55:22 +01:00
Qingfang DENG ed779fe4c9 neighbour: fix unaligned access to pneigh_entry
After the blamed commit, the member key is longer 4-byte aligned. On
platforms that do not support unaligned access, e.g., MIPS32R2 with
unaligned_action set to 1, this will trigger a crash when accessing
an IPv6 pneigh_entry, as the key is cast to an in6_addr pointer.

Change the type of the key to u32 to make it aligned.

Fixes: 62dd93181a ("[IPV6] NDISC: Set per-entry is_router flag in Proxy NA.")
Signed-off-by: Qingfang DENG <qingfang.deng@siflower.com.cn>
Link: https://lore.kernel.org/r/20230601015432.159066-1-dqfext@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-01 21:36:37 -07:00
fuyuanli 30c6f0bf95 tcp: fix mishandling when the sack compression is deferred.
In this patch, we mainly try to handle sending a compressed ack
correctly if it's deferred.

Here are more details in the old logic:
When sack compression is triggered in the tcp_compressed_ack_kick(),
if the sock is owned by user, it will set TCP_DELACK_TIMER_DEFERRED
and then defer to the release cb phrase. Later once user releases
the sock, tcp_delack_timer_handler() should send a ack as expected,
which, however, cannot happen due to lack of ICSK_ACK_TIMER flag.
Therefore, the receiver would not sent an ack until the sender's
retransmission timeout. It definitely increases unnecessary latency.

Fixes: 5d9f4262b7 ("tcp: add SACK compression")
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: fuyuanli <fuyuanli@didiglobal.com>
Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://lore.kernel.org/netdev/20230529113804.GA20300@didi-ThinkCentre-M920t-N000/
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230531080150.GA20424@didi-ThinkCentre-M920t-N000
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-06-01 13:15:12 +02:00
Haiyang Zhang 1919b39fc6 net: mana: Fix perf regression: remove rx_cqes, tx_cqes counters
The apc->eth_stats.rx_cqes is one per NIC (vport), and it's on the
frequent and parallel code path of all queues. So, r/w into this
single shared variable by many threads on different CPUs creates a
lot caching and memory overhead, hence perf regression. And, it's
not accurate due to the high volume concurrent r/w.

For example, a workload is iperf with 128 threads, and with RPS
enabled. We saw perf regression of 25% with the previous patch
adding the counters. And this patch eliminates the regression.

Since the error path of mana_poll_rx_cq() already has warnings, so
keeping the counter and convert it to a per-queue variable is not
necessary. So, just remove this counter from this high frequency
code path.

Also, remove the tx_cqes counter for the same reason. We have
warnings & other counters for errors on that path, and don't need
to count every normal cqe processing.

Cc: stable@vger.kernel.org
Fixes: bd7fc6e195 ("net: mana: Add new MANA VF performance counters for easier troubleshooting")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/1685115537-31675-1-git-send-email-haiyangz@microsoft.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-05-30 12:05:22 +02:00
Eric Dumazet 4faeee0cf8 tcp: deny tcp_disconnect() when threads are waiting
Historically connect(AF_UNSPEC) has been abused by syzkaller
and other fuzzers to trigger various bugs.

A recent one triggers a divide-by-zero [1], and Paolo Abeni
was able to diagnose the issue.

tcp_recvmsg_locked() has tests about sk_state being not TCP_LISTEN
and TCP REPAIR mode being not used.

Then later if socket lock is released in sk_wait_data(),
another thread can call connect(AF_UNSPEC), then make this
socket a TCP listener.

When recvmsg() is resumed, it can eventually call tcp_cleanup_rbuf()
and attempt a divide by 0 in tcp_rcv_space_adjust() [1]

This patch adds a new socket field, counting number of threads
blocked in sk_wait_event() and inet_wait_for_connect().

If this counter is not zero, tcp_disconnect() returns an error.

This patch adds code in blocking socket system calls, thus should
not hurt performance of non blocking ones.

Note that we probably could revert commit 499350a5a6 ("tcp:
initialize rcv_mss to TCP_MIN_MSS instead of 0") to restore
original tcpi_rcv_mss meaning (was 0 if no payload was ever
received on a socket)

[1]
divide error: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 13832 Comm: syz-executor.5 Not tainted 6.3.0-rc4-syzkaller-00224-g00c7b5f4ddc5 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/02/2023
RIP: 0010:tcp_rcv_space_adjust+0x36e/0x9d0 net/ipv4/tcp_input.c:740
Code: 00 00 00 00 fc ff df 4c 89 64 24 48 8b 44 24 04 44 89 f9 41 81 c7 80 03 00 00 c1 e1 04 44 29 f0 48 63 c9 48 01 e9 48 0f af c1 <49> f7 f6 48 8d 04 41 48 89 44 24 40 48 8b 44 24 30 48 c1 e8 03 48
RSP: 0018:ffffc900033af660 EFLAGS: 00010206
RAX: 4a66b76cbade2c48 RBX: ffff888076640cc0 RCX: 00000000c334e4ac
RDX: 0000000000000000 RSI: dffffc0000000000 RDI: 0000000000000001
RBP: 00000000c324e86c R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8880766417f8
R13: ffff888028fbb980 R14: 0000000000000000 R15: 0000000000010344
FS: 00007f5bffbfe700(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b32f25000 CR3: 000000007ced0000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
tcp_recvmsg_locked+0x100e/0x22e0 net/ipv4/tcp.c:2616
tcp_recvmsg+0x117/0x620 net/ipv4/tcp.c:2681
inet6_recvmsg+0x114/0x640 net/ipv6/af_inet6.c:670
sock_recvmsg_nosec net/socket.c:1017 [inline]
sock_recvmsg+0xe2/0x160 net/socket.c:1038
____sys_recvmsg+0x210/0x5a0 net/socket.c:2720
___sys_recvmsg+0xf2/0x180 net/socket.c:2762
do_recvmmsg+0x25e/0x6e0 net/socket.c:2856
__sys_recvmmsg net/socket.c:2935 [inline]
__do_sys_recvmmsg net/socket.c:2958 [inline]
__se_sys_recvmmsg net/socket.c:2951 [inline]
__x64_sys_recvmmsg+0x20f/0x260 net/socket.c:2951
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f5c0108c0f9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f5bffbfe168 EFLAGS: 00000246 ORIG_RAX: 000000000000012b
RAX: ffffffffffffffda RBX: 00007f5c011ac050 RCX: 00007f5c0108c0f9
RDX: 0000000000000001 RSI: 0000000020000bc0 RDI: 0000000000000003
RBP: 00007f5c010e7b39 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000122 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f5c012cfb1f R14: 00007f5bffbfe300 R15: 0000000000022000
</TASK>

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reported-by: syzbot <syzkaller@googlegroups.com>
Reported-by: Paolo Abeni <pabeni@redhat.com>
Diagnosed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20230526163458.2880232-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-29 22:03:48 -07:00
Chuck Lever 26fb5480a2 net/handshake: Enable the SNI extension to work properly
Enable the upper layer protocol to specify the SNI peername. This
avoids the need for tlshd to use a DNS lookup, which can return a
hostname that doesn't match the incoming certificate's SubjectName.

Fixes: 2fd5532044 ("net/handshake: Add a kernel API for requesting a TLSv1.3 handshake")
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-24 22:05:24 -07:00
Jakub Kicinski 0c615f1cc3 bpf-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZG4AiAAKCRDbK58LschI
 g+xlAQCmefGbDuwPckZLnomvt6gl4bkIjs7kc1ySbG9QBnaInwD/WyrJaQIPijuD
 qziHPAyx+MEgPseFU1b7Le35SZ66IwM=
 =s4R1
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2023-05-24

We've added 19 non-merge commits during the last 10 day(s) which contain
a total of 20 files changed, 738 insertions(+), 448 deletions(-).

The main changes are:

1) Batch of BPF sockmap fixes found when running against NGINX TCP tests,
   from John Fastabend.

2) Fix a memleak in the LRU{,_PERCPU} hash map when bucket locking fails,
   from Anton Protopopov.

3) Init the BPF offload table earlier than just late_initcall,
   from Jakub Kicinski.

4) Fix ctx access mask generation for 32-bit narrow loads of 64-bit fields,
   from Will Deacon.

5) Remove a now unsupported __fallthrough in BPF samples,
   from Andrii Nakryiko.

6) Fix a typo in pkg-config call for building sign-file,
   from Jeremy Sowden.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf, sockmap: Test progs verifier error with latest clang
  bpf, sockmap: Test FIONREAD returns correct bytes in rx buffer with drops
  bpf, sockmap: Test FIONREAD returns correct bytes in rx buffer
  bpf, sockmap: Test shutdown() correctly exits epoll and recv()=0
  bpf, sockmap: Build helper to create connected socket pair
  bpf, sockmap: Pull socket helpers out of listen test for general use
  bpf, sockmap: Incorrectly handling copied_seq
  bpf, sockmap: Wake up polling after data copy
  bpf, sockmap: TCP data stall on recv before accept
  bpf, sockmap: Handle fin correctly
  bpf, sockmap: Improved check for empty queue
  bpf, sockmap: Reschedule is now done through backlog
  bpf, sockmap: Convert schedule_work into delayed_work
  bpf, sockmap: Pass skb ownership through read_skb
  bpf: fix a memory leak in the LRU and LRU_PERCPU hash maps
  bpf: Fix mask generation for 32-bit narrow loads of 64-bit fields
  samples/bpf: Drop unnecessary fallthrough
  bpf: netdev: init the offload table earlier
  selftests/bpf: Fix pkg-config call building sign-file
====================

Link: https://lore.kernel.org/r/20230524170839.13905-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-24 21:57:57 -07:00
Yunsheng Lin 368d3cb406 page_pool: fix inconsistency for page_pool_ring_[un]lock()
page_pool_ring_[un]lock() use in_softirq() to decide which
spin lock variant to use, and when they are called in the
context with in_softirq() being false, spin_lock_bh() is
called in page_pool_ring_lock() while spin_unlock() is
called in page_pool_ring_unlock(), because spin_lock_bh()
has disabled the softirq in page_pool_ring_lock(), which
causes inconsistency for spin lock pair calling.

This patch fixes it by returning in_softirq state from
page_pool_producer_lock(), and use it to decide which
spin lock variant to use in page_pool_producer_unlock().

As pool->ring has both producer and consumer lock, so
rename it to page_pool_producer_[un]lock() to reflect
the actual usage. Also move them to page_pool.c as they
are only used there, and remove the 'inline' as the
compiler may have better idea to do inlining or not.

Fixes: 7886244736 ("net: page_pool: Add bulk support for ptr_ring")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Link: https://lore.kernel.org/r/20230522031714.5089-1-linyunsheng@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-23 20:25:13 -07:00
John Fastabend e5c6de5fa0 bpf, sockmap: Incorrectly handling copied_seq
The read_skb() logic is incrementing the tcp->copied_seq which is used for
among other things calculating how many outstanding bytes can be read by
the application. This results in application errors, if the application
does an ioctl(FIONREAD) we return zero because this is calculated from
the copied_seq value.

To fix this we move tcp->copied_seq accounting into the recv handler so
that we update these when the recvmsg() hook is called and data is in
fact copied into user buffers. This gives an accurate FIONREAD value
as expected and improves ACK handling. Before we were calling the
tcp_rcv_space_adjust() which would update 'number of bytes copied to
user in last RTT' which is wrong for programs returning SK_PASS. The
bytes are only copied to the user when recvmsg is handled.

Doing the fix for recvmsg is straightforward, but fixing redirect and
SK_DROP pkts is a bit tricker. Build a tcp_psock_eat() helper and then
call this from skmsg handlers. This fixes another issue where a broken
socket with a BPF program doing a resubmit could hang the receiver. This
happened because although read_skb() consumed the skb through sock_drop()
it did not update the copied_seq. Now if a single reccv socket is
redirecting to many sockets (for example for lb) the receiver sk will be
hung even though we might expect it to continue. The hang comes from
not updating the copied_seq numbers and memory pressure resulting from
that.

We have a slight layer problem of calling tcp_eat_skb even if its not
a TCP socket. To fix we could refactor and create per type receiver
handlers. I decided this is more work than we want in the fix and we
already have some small tweaks depending on caller that use the
helper skb_bpf_strparser(). So we extend that a bit and always set
the strparser bit when it is in use and then we can gate the
seq_copied updates on this.

Fixes: 04919bed94 ("tcp: Introduce tcp_read_skb()")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20230523025618.113937-9-john.fastabend@gmail.com
2023-05-23 16:10:42 +02:00
Nicolas Dichtel 3632679d9e ipv{4,6}/raw: fix output xfrm lookup wrt protocol
With a raw socket bound to IPPROTO_RAW (ie with hdrincl enabled), the
protocol field of the flow structure, build by raw_sendmsg() /
rawv6_sendmsg()),  is set to IPPROTO_RAW. This breaks the ipsec policy
lookup when some policies are defined with a protocol in the selector.

For ipv6, the sin6_port field from 'struct sockaddr_in6' could be used to
specify the protocol. Just accept all values for IPPROTO_RAW socket.

For ipv4, the sin_port field of 'struct sockaddr_in' could not be used
without breaking backward compatibility (the value of this field was never
checked). Let's add a new kind of control message, so that the userland
could specify which protocol is used.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
CC: stable@vger.kernel.org
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://lore.kernel.org/r/20230522120820.1319391-1-nicolas.dichtel@6wind.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-05-23 15:38:59 +02:00
Jakub Kicinski 67caf26d76 bluetooth pull request for net:
- Fix compiler warnings on btnxpuart
  - Fix potential double free on hci_conn_unlink
  - Fix UAF on hci_conn_hash_flush
 -----BEGIN PGP SIGNATURE-----
 
 iQJNBAABCAA3FiEE7E6oRXp8w05ovYr/9JCA4xAyCykFAmRoBgIZHGx1aXoudm9u
 LmRlbnR6QGludGVsLmNvbQAKCRD0kIDjEDILKc9RD/47PfJj6p/dn6WTthegoZ7K
 BK0cC4j2H2LBbt4UzGA7BWdGxiJnguahsap09jq3t1hB/9ggXPMbA/4674A58+RM
 iYt6u5/uZCPmVyp5P6YVyFIHv4xM/ruQ3wr8/32AscW066Jp/M8TJ0qTOLQpc+qp
 jkg2xdKMNj1sYzA3reR/oWiWXQR2qJVlHLMq5namZr04B4mj6+2HyaS8+hjVIwku
 VW9g/8fi9FfikdyRfYPNzHuV9C97sP6zvPqKbrxlEk5f1TLfbY9lFqRUgvAL7bco
 fT3ddDqpw/3cMP7b8MKbDB0aS+rt4fT6g83Qtfyeps1cauCMoCeHFIqAfmGQtNTx
 qO7q/fG7sOSFlfyiwbwHTYxYV8z4ABavrRF4lbvIqt9kh7Xuow87rcgrTBmX7nXE
 TUMzeCv/SSQP8k1+O4XNwzUf8XiL6XsbMYmgGRLB8IO+S8QXEDLUelhcCrahq9Pd
 FQ3v4Op1vY6YiTP12iaxzYb82OqdU8Th6PpOGjdjPw/7UQiJaItaX8YKDfw8h/d0
 XDxHbXvWIORegvMHGuXvjUYDVTTTmAvvuPs5nde9rXgMxTmvzN9e7Srbpb6I0nv4
 R5nKiZjnugVaZ9lFimLlj8Bh9/KavpT/41MnF9Oe/XzCfcTVnxthWYJ6kN5Zscof
 o8bd8Q4c1F3x6rW7DFMLTQ==
 =v45m
 -----END PGP SIGNATURE-----

Merge tag 'for-net-2023-05-19' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth

Luiz Augusto von Dentz says:

====================
bluetooth pull request for net:

 - Fix compiler warnings on btnxpuart
 - Fix potential double free on hci_conn_unlink
 - Fix UAF on hci_conn_hash_flush

* tag 'for-net-2023-05-19' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
  Bluetooth: btnxpuart: Fix compiler warnings
  Bluetooth: Unlink CISes when LE disconnects in hci_conn_del
  Bluetooth: Fix UAF in hci_conn_hash_flush again
  Bluetooth: Refcnt drop must be placed last in hci_conn_unlink
  Bluetooth: Fix potential double free caused by hci_conn_unlink
====================

Link: https://lore.kernel.org/r/20230519233056.2024340-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-19 22:48:26 -07:00
Taehee Yoo ae9b15fbe6 net: fix stack overflow when LRO is disabled for virtual interfaces
When the virtual interface's feature is updated, it synchronizes the
updated feature for its own lower interface.
This propagation logic should be worked as the iteration, not recursively.
But it works recursively due to the netdev notification unexpectedly.
This problem occurs when it disables LRO only for the team and bonding
interface type.

       team0
         |
  +------+------+-----+-----+
  |      |      |     |     |
team1  team2  team3  ...  team200

If team0's LRO feature is updated, it generates the NETDEV_FEAT_CHANGE
event to its own lower interfaces(team1 ~ team200).
It is worked by netdev_sync_lower_features().
So, the NETDEV_FEAT_CHANGE notification logic of each lower interface
work iteratively.
But generated NETDEV_FEAT_CHANGE event is also sent to the upper
interface too.
upper interface(team0) generates the NETDEV_FEAT_CHANGE event for its own
lower interfaces again.
lower and upper interfaces receive this event and generate this
event again and again.
So, the stack overflow occurs.

But it is not the infinite loop issue.
Because the netdev_sync_lower_features() updates features before
generating the NETDEV_FEAT_CHANGE event.
Already synchronized lower interfaces skip notification logic.
So, it is just the problem that iteration logic is changed to the
recursive unexpectedly due to the notification mechanism.

Reproducer:

ip link add team0 type team
ethtool -K team0 lro on
for i in {1..200}
do
        ip link add team$i master team0 type team
        ethtool -K team$i lro on
done

ethtool -K team0 lro off

In order to fix it, the notifier_ctx member of bonding/team is introduced.

Reported-by: syzbot+60748c96cf5c6df8e581@syzkaller.appspotmail.com
Fixes: fd867d51f8 ("net/core: generic support for disabling netdev features down stack")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/r/20230517143010.3596250-1-ap420073@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-19 22:46:37 -07:00
Ruihan Li a2ac591cb4 Bluetooth: Fix UAF in hci_conn_hash_flush again
Commit 06149746e7 ("Bluetooth: hci_conn: Add support for linking
multiple hcon") reintroduced a previously fixed bug [1] ("KASAN:
slab-use-after-free Read in hci_conn_hash_flush"). This bug was
originally fixed by commit 5dc7d23e16 ("Bluetooth: hci_conn: Fix
possible UAF").

The hci_conn_unlink function was added to avoid invalidating the link
traversal caused by successive hci_conn_del operations releasing extra
connections. However, currently hci_conn_unlink itself also releases
extra connections, resulted in the reintroduced bug.

This patch follows a more robust solution for cleaning up all
connections, by repeatedly removing the first connection until there are
none left. This approach does not rely on the inner workings of
hci_conn_del and ensures proper cleanup of all connections.

Meanwhile, we need to make sure that hci_conn_del never fails. Indeed it
doesn't, as it now always returns zero. To make this a bit clearer, this
patch also changes its return type to void.

Reported-by: syzbot+8bb72f86fc823817bc5d@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-bluetooth/000000000000aa920505f60d25ad@google.com/
Fixes: 06149746e7 ("Bluetooth: hci_conn: Add support for linking multiple hcon")
Signed-off-by: Ruihan Li <lrh2000@pku.edu.cn>
Co-developed-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-05-19 15:37:45 -07:00
Jakub Kicinski eca9bfafee tls: rx: strp: preserve decryption status of skbs when needed
When receive buffer is small we try to copy out the data from
TCP into a skb maintained by TLS to prevent connection from
stalling. Unfortunately if a single record is made up of a mix
of decrypted and non-decrypted skbs combining them into a single
skb leads to loss of decryption status, resulting in decryption
errors or data corruption.

Similarly when trying to use TCP receive queue directly we need
to make sure that all the skbs within the record have the same
status. If we don't the mixed status will be detected correctly
but we'll CoW the anchor, again collapsing it into a single paged
skb without decrypted status preserved. So the "fixup" code will
not know which parts of skb to re-encrypt.

Fixes: 84c61fe1a7 ("tls: rx: do not use the standard strparser")
Tested-by: Shai Amiram <samiram@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-19 08:37:37 +01:00
Eric Dumazet ef1148d448 ipv6: remove nexthop_fib6_nh_bh()
After blamed commit, nexthop_fib6_nh_bh() and nexthop_fib6_nh()
are the same.

Delete nexthop_fib6_nh_bh(), and convert /proc/net/ipv6_route
to standard rcu to avoid this splat:

[ 5723.180080] WARNING: suspicious RCU usage
[ 5723.180083] -----------------------------
[ 5723.180084] include/net/nexthop.h:516 suspicious rcu_dereference_check() usage!
[ 5723.180086]
other info that might help us debug this:

[ 5723.180087]
rcu_scheduler_active = 2, debug_locks = 1
[ 5723.180089] 2 locks held by cat/55856:
[ 5723.180091] #0: ffff9440a582afa8 (&p->lock){+.+.}-{3:3}, at: seq_read_iter (fs/seq_file.c:188)
[ 5723.180100] #1: ffffffffaac07040 (rcu_read_lock_bh){....}-{1:2}, at: rcu_lock_acquire (include/linux/rcupdate.h:326)
[ 5723.180109]
stack backtrace:
[ 5723.180111] CPU: 14 PID: 55856 Comm: cat Tainted: G S        I        6.3.0-dbx-DEV #528
[ 5723.180115] Call Trace:
[ 5723.180117]  <TASK>
[ 5723.180119] dump_stack_lvl (lib/dump_stack.c:107)
[ 5723.180124] dump_stack (lib/dump_stack.c:114)
[ 5723.180126] lockdep_rcu_suspicious (include/linux/context_tracking.h:122)
[ 5723.180132] ipv6_route_seq_show (include/net/nexthop.h:?)
[ 5723.180135] ? ipv6_route_seq_next (net/ipv6/ip6_fib.c:2605)
[ 5723.180140] seq_read_iter (fs/seq_file.c:272)
[ 5723.180145] seq_read (fs/seq_file.c:163)
[ 5723.180151] proc_reg_read (fs/proc/inode.c:316 fs/proc/inode.c:328)
[ 5723.180155] vfs_read (fs/read_write.c:468)
[ 5723.180160] ? up_read (kernel/locking/rwsem.c:1617)
[ 5723.180164] ksys_read (fs/read_write.c:613)
[ 5723.180168] __x64_sys_read (fs/read_write.c:621)
[ 5723.180170] do_syscall_64 (arch/x86/entry/common.c:?)
[ 5723.180174] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
[ 5723.180177] RIP: 0033:0x7fa455677d2a

Fixes: 09eed1192c ("neighbour: switch to standard rcu, instead of rcu_bh")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20230510154646.370659-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-11 18:07:05 -07:00
Hangbin Liu 9949e2efb5 bonding: fix send_peer_notif overflow
Bonding send_peer_notif was defined as u8. Since commit 07a4ddec3c
("bonding: add an option to specify a delay between peer notifications").
the bond->send_peer_notif will be num_peer_notif multiplied by
peer_notif_delay, which is u8 * u32. This would cause the send_peer_notif
overflow easily. e.g.

  ip link add bond0 type bond mode 1 miimon 100 num_grat_arp 30 peer_notify_delay 1000

To fix the overflow, let's set the send_peer_notif to u32 and limit
peer_notif_delay to 300s.

Reported-by: Liang Li <liali@redhat.com>
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2090053
Fixes: 07a4ddec3c ("bonding: add an option to specify a delay between peer notifications")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-10 09:27:20 +01:00
Kuniyuki Iwashima dfd9248c07 net: Fix load-tearing on sk->sk_stamp in sock_recv_cmsgs().
KCSAN found a data race in sock_recv_cmsgs() where the read access
to sk->sk_stamp needs READ_ONCE().

BUG: KCSAN: data-race in packet_recvmsg / packet_recvmsg

write (marked) to 0xffff88803c81f258 of 8 bytes by task 19171 on cpu 0:
 sock_write_timestamp include/net/sock.h:2670 [inline]
 sock_recv_cmsgs include/net/sock.h:2722 [inline]
 packet_recvmsg+0xb97/0xd00 net/packet/af_packet.c:3489
 sock_recvmsg_nosec net/socket.c:1019 [inline]
 sock_recvmsg+0x11a/0x130 net/socket.c:1040
 sock_read_iter+0x176/0x220 net/socket.c:1118
 call_read_iter include/linux/fs.h:1845 [inline]
 new_sync_read fs/read_write.c:389 [inline]
 vfs_read+0x5e0/0x630 fs/read_write.c:470
 ksys_read+0x163/0x1a0 fs/read_write.c:613
 __do_sys_read fs/read_write.c:623 [inline]
 __se_sys_read fs/read_write.c:621 [inline]
 __x64_sys_read+0x41/0x50 fs/read_write.c:621
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x72/0xdc

read to 0xffff88803c81f258 of 8 bytes by task 19183 on cpu 1:
 sock_recv_cmsgs include/net/sock.h:2721 [inline]
 packet_recvmsg+0xb64/0xd00 net/packet/af_packet.c:3489
 sock_recvmsg_nosec net/socket.c:1019 [inline]
 sock_recvmsg+0x11a/0x130 net/socket.c:1040
 sock_read_iter+0x176/0x220 net/socket.c:1118
 call_read_iter include/linux/fs.h:1845 [inline]
 new_sync_read fs/read_write.c:389 [inline]
 vfs_read+0x5e0/0x630 fs/read_write.c:470
 ksys_read+0x163/0x1a0 fs/read_write.c:613
 __do_sys_read fs/read_write.c:623 [inline]
 __se_sys_read fs/read_write.c:621 [inline]
 __x64_sys_read+0x41/0x50 fs/read_write.c:621
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x72/0xdc

value changed: 0xffffffffc4653600 -> 0x0000000000000000

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 19183 Comm: syz-executor.5 Not tainted 6.3.0-rc7-02330-gca6270c12e20 #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014

Fixes: 6c7c98bad4 ("sock: avoid dirtying sk_stamp, if possible")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230508175543.55756-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-09 20:09:49 -07:00
Linus Torvalds ed23734c23 Including fixes from netfilter.
Current release - regressions:
 
  - sched: act_pedit: free pedit keys on bail from offset check
 
 Current release - new code bugs:
 
  - pds_core:
   - Kconfig fixes (DEBUGFS and AUXILIARY_BUS)
   - fix mutex double unlock in error path
 
 Previous releases - regressions:
 
  - sched: cls_api: remove block_cb from driver_list before freeing
 
  - nf_tables: fix ct untracked match breakage
 
  - eth: mtk_eth_soc: drop generic vlan rx offload
 
  - sched: flower: fix error handler on replace
 
 Previous releases - always broken:
 
  - tcp: fix skb_copy_ubufs() vs BIG TCP
 
  - ipv6: fix skb hash for some RST packets
 
  - af_packet: don't send zero-byte data in packet_sendmsg_spkt()
 
  - rxrpc: timeout handling fixes after moving client call connection
    to the I/O thread
 
  - ixgbe: fix panic during XDP_TX with > 64 CPUs
 
  - igc: RMW the SRRCTL register to prevent losing timestamp config
 
  - dsa: mt7530: fix corrupt frames using TRGMII on 40 MHz XTAL MT7621
 
  - r8152:
    - fix flow control issue of RTL8156A
    - fix the poor throughput for 2.5G devices
    - move setting r8153b_rx_agg_chg_indicate() to fix coalescing
    - enable autosuspend
 
  - ncsi: clear Tx enable mode when handling a Config required AEN
 
  - octeontx2-pf: macsec: fixes for CN10KB ASIC rev
 
 Misc:
 
  - 9p: remove INET dependency
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmRVeUIACgkQMUZtbf5S
 IrtTug/9Hhg/L0PTSwrfuGh4W1/cjheMWppNLkwyWQUiKG7FcZQ9vu9PxceE3VRu
 2fTqHyvgDMZ8jACovXObeda8z1+g3s/tIPaXELephBIjVlF/h3kG2OaIzlU4jDb4
 A4vklwf8eLbfyVBG22QgKl/I70zVMtnmnOo6c6CPuIOTcMPzslndFO9tB0nCg99F
 DCgCM1BBP1tz+OUch2rLnSzYcqkWqS49BhRk6dhYSliawUFU/5+1tDGDjwWolkfm
 0jqP9DjBOSpZKO8m7SpsUNz7NFRIfYErWZ+YebWbggNxj/6TRJTP83MM0tGoK1rE
 /mz2xpuOki59frlwVOAD6gb/qefjHUp21P4NA7bnhizxFlQL5MHpCeGQ9yLHBSmY
 9Q4ArJkM4jXQ0oDA2nII/pz+cDZGEWFGQ14WW3kYUb7WFmISH4I9OiA9i0TBW6OL
 r1Y/rqzkUvtKWzh9RpiAF9lsdHAm3SX9ES5RfMxzv0x886VOZR4jaMmokRDdPRzq
 0r2Oyj75b62+X0r44Fe22Pl/kPS/uh3642xo9h85aAv/EvhT9JNzMvomJm9d6tkb
 966I085AVbwxPAy+rl5SWyAq60EWDExNTjZvPv0mSMlmSsQ9iK5//xOF2Saw2zai
 /44zQ27tVGkCC44Ou5KmfJN3u4OrKkhcuyxtcDr9QeoOdKZRkMg=
 =9xND
 -----END PGP SIGNATURE-----

Merge tag 'net-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from netfilter.

  Current release - regressions:

   - sched: act_pedit: free pedit keys on bail from offset check

  Current release - new code bugs:

   - pds_core:
      - Kconfig fixes (DEBUGFS and AUXILIARY_BUS)
      - fix mutex double unlock in error path

  Previous releases - regressions:

   - sched: cls_api: remove block_cb from driver_list before freeing

   - nf_tables: fix ct untracked match breakage

   - eth: mtk_eth_soc: drop generic vlan rx offload

   - sched: flower: fix error handler on replace

  Previous releases - always broken:

   - tcp: fix skb_copy_ubufs() vs BIG TCP

   - ipv6: fix skb hash for some RST packets

   - af_packet: don't send zero-byte data in packet_sendmsg_spkt()

   - rxrpc: timeout handling fixes after moving client call connection
     to the I/O thread

   - ixgbe: fix panic during XDP_TX with > 64 CPUs

   - igc: RMW the SRRCTL register to prevent losing timestamp config

   - dsa: mt7530: fix corrupt frames using TRGMII on 40 MHz XTAL MT7621

   - r8152:
      - fix flow control issue of RTL8156A
      - fix the poor throughput for 2.5G devices
      - move setting r8153b_rx_agg_chg_indicate() to fix coalescing
      - enable autosuspend

   - ncsi: clear Tx enable mode when handling a Config required AEN

   - octeontx2-pf: macsec: fixes for CN10KB ASIC rev

  Misc:

   - 9p: remove INET dependency"

* tag 'net-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (69 commits)
  net: bcmgenet: Remove phy_stop() from bcmgenet_netif_stop()
  pds_core: fix mutex double unlock in error path
  net/sched: flower: fix error handler on replace
  Revert "net/sched: flower: Fix wrong handle assignment during filter change"
  net/sched: flower: fix filter idr initialization
  net: fec: correct the counting of XDP sent frames
  bonding: add xdp_features support
  net: enetc: check the index of the SFI rather than the handle
  sfc: Add back mailing list
  virtio_net: suppress cpu stall when free_unused_bufs
  ice: block LAN in case of VF to VF offload
  net: dsa: mt7530: fix network connectivity with multiple CPU ports
  net: dsa: mt7530: fix corrupt frames using trgmii on 40 MHz XTAL MT7621
  9p: Remove INET dependency
  netfilter: nf_tables: fix ct untracked match breakage
  af_packet: Don't send zero-byte data in packet_sendmsg_spkt().
  igc: read before write to SRRCTL register
  pds_core: add AUXILIARY_BUS and NET_DEVLINK to Kconfig
  pds_core: remove CONFIG_DEBUG_FS from makefile
  ionic: catch failure from devlink_alloc
  ...
2023-05-05 19:12:01 -07:00
Jakub Kicinski 644bca1d48 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
There's a fix which landed in net-next, pull it in along
with the couple of minor cleanups.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-05 13:35:45 -07:00
Lorenzo Bianconi cb9e6e584d bonding: add xdp_features support
Introduce xdp_features support for bonding driver according to the slave
devices attached to the master one. xdp_features is required whenever we
want to xdp_redirect traffic into a bond device and then into selected
slaves attached to it.

Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Fixes: 66c0e13ad2 ("drivers: net: turn on XDP features")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Jussi Maki <joamaki@gmail.com>
Tested-by: Jussi Maki <joamaki@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-05 09:34:23 +01:00
Linus Torvalds 8e15605be8 9p patches for 6.4 merge window
This pull request includes a number of patches that didn't quite make
 the cut last merge window while we addressed some outstanding issues
 and review comments.  It includes some new caching modes for those that
 only want readahead caches and reworks how we do writeback caching so we
 are not keeping extra references around which both causes performance problems
 and uses lots of additional resources on the server.
 
 It also includes a new flag to force disabling of xattrs which can also
 cause major performance issues, particularly if the underlying filesystem
 on the server doesn't support them.
 
 Finally it adds a couple of additional mount options to better support
 directio and enabling caches when the server doesn't support qid.version.
 
 There was one late-breaking bug report that has also been included as its
 own patch where I forgot to propagate an embarassing bit-logic fix to the
 various variations of open.  Since that was only added to for-next a week
 ago, if you would like to not include it, I can include it in the first
 round of fixes for -rc2.
 
 Signed-off-by: Eric Van Hensbergen <ericvh@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEElpbw0ZalkJikytFRiP/V+0pf/5gFAmRS/LEACgkQiP/V+0pf
 /5i84RAAqFtTYPYxg0VMLvjix0BCCE7CYR+vr7UsoGFeuVRyFsKh7G6fhmRXUhpG
 2kf+1CK+xzQd9PEKiQnVmGhib9SCdeWqb9EUpCgLZEmrSci0qUenzjf3Jg5Qgwhx
 u6xu9tipzHeHMFBD6n0f+j2fZEsDAv5IzgL14F9YfJQueVsL+HUOifLncruWnUEn
 rJAf7omhE1x+FfWsNaB22AksvYUXtQfoG3MgCPWql/XKlo/6xeW/8/eprutIO06M
 LUp3ie4RbSRZiE63SfPhxFMJCZ7g+R1JSKe1J8i9bbbwzYsVO8dovwdMCw0tP7ZQ
 QjVq+Ng4x0Rn5Bmekj18ua7zRsbl96DaoqcnYWKUOHkRVJleTG9t31MeZQaRC+gh
 F3321XkqDBHMXPVVLVQ1Gb1Pxt8dk9iFwmTMxktTrM4n5Zwv8ldMDKQgOhDIv9WA
 dDTjZ7mYpk2f9atxA0oys5oebQlT3D0CM/3p6n/PsXXpk30tat99kgcKlpN8SrL+
 Fs0UkXTQuu0Vin8zaMarQ2TW2UGQVM8qRD2gTooRnOItdqItx17czaE2ALOdIuVs
 LCbDxOXNPP/YbzakNUxh5ldI3Z3eahbilVfGa8ILfvprKnH7MaMmSo0rkP4XHj1h
 JDUiN7JOCOW4dvfaXa4knSIjs62Y9oTS0MWrO9Ajq8bg4ku6U7c=
 =qXIe
 -----END PGP SIGNATURE-----

Merge tag '9p-6.4-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs

Pull 9p updates from Eric Van Hensbergen:
 "This includes a number of patches that didn't quite make the cut last
  merge window while we addressed some outstanding issues and review
  comments. It includes some new caching modes for those that only want
  readahead caches and reworks how we do writeback caching so we are not
  keeping extra references around which both causes performance problems
  and uses lots of additional resources on the server.

  It also includes a new flag to force disabling of xattrs which can
  also cause major performance issues, particularly if the underlying
  filesystem on the server doesn't support them.

  Finally it adds a couple of additional mount options to better support
  directio and enabling caches when the server doesn't support
  qid.version.

  There was one late-breaking bug report that has also been included as
  its own patch where I forgot to propagate an embarassing bit-logic fix
  to the various variations of open"

* tag '9p-6.4-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
  fs/9p: Fix bit operation logic error
  fs/9p: Rework cache modes and add new options to Documentation
  fs/9p: remove writeback fid and fix per-file modes
  fs/9p: Add new mount modes
  9p: Add additional debug flags and open modes
  fs/9p: allow disable of xattr support on mount
  fs/9p: Remove unnecessary superblock flags
  fs/9p: Consolidate file operations and add readahead and writeback
2023-05-04 14:37:53 -07:00
Pablo Neira Ayuso c1592a8994 netfilter: nf_tables: deactivate anonymous set from preparation phase
Toggle deleted anonymous sets as inactive in the next generation, so
users cannot perform any update on it. Clear the generation bitmask
in case the transaction is aborted.

The following KASAN splat shows a set element deletion for a bound
anonymous set that has been already removed in the same transaction.

[   64.921510] ==================================================================
[   64.923123] BUG: KASAN: wild-memory-access in nf_tables_commit+0xa24/0x1490 [nf_tables]
[   64.924745] Write of size 8 at addr dead000000000122 by task test/890
[   64.927903] CPU: 3 PID: 890 Comm: test Not tainted 6.3.0+ #253
[   64.931120] Call Trace:
[   64.932699]  <TASK>
[   64.934292]  dump_stack_lvl+0x33/0x50
[   64.935908]  ? nf_tables_commit+0xa24/0x1490 [nf_tables]
[   64.937551]  kasan_report+0xda/0x120
[   64.939186]  ? nf_tables_commit+0xa24/0x1490 [nf_tables]
[   64.940814]  nf_tables_commit+0xa24/0x1490 [nf_tables]
[   64.942452]  ? __kasan_slab_alloc+0x2d/0x60
[   64.944070]  ? nf_tables_setelem_notify+0x190/0x190 [nf_tables]
[   64.945710]  ? kasan_set_track+0x21/0x30
[   64.947323]  nfnetlink_rcv_batch+0x709/0xd90 [nfnetlink]
[   64.948898]  ? nfnetlink_rcv_msg+0x480/0x480 [nfnetlink]

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-05-03 08:24:32 +02:00
David Howells db099c625b rxrpc: Fix timeout of a call that hasn't yet been granted a channel
afs_make_call() calls rxrpc_kernel_begin_call() to begin a call (which may
get stalled in the background waiting for a connection to become
available); it then calls rxrpc_kernel_set_max_life() to set the timeouts -
but that starts the call timer so the call timer might then expire before
we get a connection assigned - leading to the following oops if the call
stalled:

	BUG: kernel NULL pointer dereference, address: 0000000000000000
	...
	CPU: 1 PID: 5111 Comm: krxrpcio/0 Not tainted 6.3.0-rc7-build3+ #701
	RIP: 0010:rxrpc_alloc_txbuf+0xc0/0x157
	...
	Call Trace:
	 <TASK>
	 rxrpc_send_ACK+0x50/0x13b
	 rxrpc_input_call_event+0x16a/0x67d
	 rxrpc_io_thread+0x1b6/0x45f
	 ? _raw_spin_unlock_irqrestore+0x1f/0x35
	 ? rxrpc_input_packet+0x519/0x519
	 kthread+0xe7/0xef
	 ? kthread_complete_and_exit+0x1b/0x1b
	 ret_from_fork+0x22/0x30

Fix this by noting the timeouts in struct rxrpc_call when the call is
created.  The timer will be started when the first packet is transmitted.

It shouldn't be possible to trigger this directly from userspace through
AF_RXRPC as sendmsg() will return EBUSY if the call is in the
waiting-for-conn state if it dropped out of the wait due to a signal.

Fixes: 9d35d880e0 ("rxrpc: Move client call connection to the I/O thread")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org
cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-01 07:43:19 +01:00
Chuck Lever 5e052dda12 SUNRPC: Recognize control messages in server-side TCP socket code
To support kTLS, the server-side TCP socket receive path needs to
watch for CMSGs.

Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-04-27 18:49:24 -04:00
Paolo Abeni c248b27cfc Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-04-26 10:17:46 +02:00
Jakub Kicinski ffcddcaed6 netfilter pull request 23-04-22
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEN9lkrMBJgcdVAPub1V2XiooUIOQFAmRDIGwACgkQ1V2XiooU
 IOTR2xAAgV2g59s0pPO0VnAs9WTKiMgm+9JZz2uLwOT1/nvVzdz8P7wXTJK47m57
 qziX//P0S37D6e5UkdSHpGBSfeIAMwciDd5WVJgsZYhpZ/QMS3y4GEUXVybMhV+p
 oqX86Kc1u91+kLbgFF1CdbMDvw+oNW43QrIYR1Ok2MqlnZ3dlDajkw5KLYWFvb6t
 9BCmkqZ1xheuN1l8CPZ0La3a9fwqTuC0P7TgSlO1BZF5nhADajcGt8lF8Td98zEZ
 Mrb4VyOrhT6UZgGd+UW1ZlNyduO2Cb6Tj7jx9Es5gX8fDjmnZyAJf1qW9Iinoo46
 4Yr+61oSNNGdE63m1lLm2ZQQJGDlUsuGCjUMW4hyVe8YIrqQg+cOXaTkCa40Xvos
 qvObjrsnAd9xHtBchk4IaQ84J56ifalXULAJs8IZGM5K2XUNwnF7kIOaVtV8Rq+K
 BSSprrM9Z7Lz5W5ucRLlBmDYSDd+ESUnhgJgIuf1CZ04mRBupF4IkqCU5INmVhJR
 472cSc9DKPHmsnFFdszidxBwM6mg00qQ9M4Qvl+dQIOs0a28Pr8nHPOSStgMaecd
 NAPGGEMdRLNaH5KYN1VbseUbbFhXQpXcuNCF1q8fdzpQYyo/+I/lu618sFEhy08r
 KN6+JIDdrcAdMyYgL3rQy57u58xYlwU2NLIE0SLHLqI4grtTxnw=
 =T5eS
 -----END PGP SIGNATURE-----

Merge tag 'nf-next-23-04-22' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next

Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

1) Reduce jumpstack footprint: Stash chain in last rule marker in blob for
   tracing. Remove last rule and chain from jumpstack. From Florian Westphal.

2) nf_tables validates all tables before committing the new rules.
   Unfortunately, this has two drawbacks:

   - Since addition of the transaction mutex pernet state gets written to
     outside of the locked section from the cleanup callback, this is
     wrong so do this cleanup directly after table has passed all checks.

   - Revalidate tables that saw no changes. This can be avoided by
     keeping the validation state per table, not per netns.

   From Florian Westphal.

3) Get rid of a few redundant pointers in the traceinfo structure.
   The three removed pointers are used in the expression evaluation loop,
   so gcc keeps them in registers. Passing them to the (inlined) helpers
   thus doesn't increase nft_do_chain text size, while stack is reduced
   by another 24 bytes on 64bit arches. From Florian Westphal.

4) IPVS cleanups in several ways without implementing any functional
   changes, aside from removing some debugging output:

   - Update width of source for ip_vs_sync_conn_options
     The operation is safe, use an annotation to describe it properly.

   - Consistently use array_size() in ip_vs_conn_init()
     It seems better to use helpers consistently.

   - Remove {Enter,Leave}Function. These seem to be well past their
     use-by date.

   - Correct spelling in comments.

   From Simon Horman.

5) Extended netlink error report for netdevice in flowtables and
   netdev/chains. Allow for incrementally add/delete devices to netdev
   basechain. Allow to create netdev chain without device.

* tag 'nf-next-23-04-22' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
  netfilter: nf_tables: allow to create netdev chain without device
  netfilter: nf_tables: support for deleting devices in an existing netdev chain
  netfilter: nf_tables: support for adding new devices to an existing netdev chain
  netfilter: nf_tables: rename function to destroy hook list
  netfilter: nf_tables: do not send complete notification of deletions
  netfilter: nf_tables: extended netlink error reporting for netdevice
  ipvs: Correct spelling in comments
  ipvs: Remove {Enter,Leave}Function
  ipvs: Consistently use array_size() in ip_vs_conn_init()
  ipvs: Update width of source for ip_vs_sync_conn_options
  netfilter: nf_tables: do not store rule in traceinfo structure
  netfilter: nf_tables: do not store verdict in traceinfo structure
  netfilter: nf_tables: do not store pktinfo in traceinfo structure
  netfilter: nf_tables: remove unneeded conditional
  netfilter: nf_tables: make validation state per table
  netfilter: nf_tables: don't write table validation state without mutex
  netfilter: nf_tables: don't store chain address on jump
  netfilter: nf_tables: don't store address of last rule on jump
  netfilter: nf_tables: merge nft_rules_old structure and end of ruleblob marker
====================

Link: https://lore.kernel.org/r/20230421235021.216950-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-24 15:37:36 -07:00
Luiz Augusto von Dentz d883a4669a Bluetooth: hci_sync: Only allow hci_cmd_sync_queue if running
This makes sure hci_cmd_sync_queue only queue new work if HCI_RUNNING
has been set otherwise there is a risk of commands being sent while
turning off.

Because hci_cmd_sync_queue can no longer queue work while HCI_RUNNING is
not set it cannot be used to power on adapters so instead
hci_cmd_sync_submit is introduced which bypass the HCI_RUNNING check, so
it behaves like the old implementation.

Link: https://lore.kernel.org/all/CAB4PzUpDMvdc8j2MdeSAy1KkAE-D3woprCwAdYWeOc-3v3c9Sw@mail.gmail.com/
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-04-23 22:07:43 -07:00
Raul Cheleguini 91b6d02ddc Bluetooth: Add new quirk for broken set random RPA timeout for ATS2851
The ATS2851 based controller advertises support for command "LE Set Random
Private Address Timeout" but does not actually implement it, impeding the
controller initialization.

Add the quirk HCI_QUIRK_BROKEN_SET_RPA_TIMEOUT to unblock the controller
initialization.

< HCI Command: LE Set Resolvable Private... (0x08|0x002e) plen 2
        Timeout: 900 seconds
> HCI Event: Command Status (0x0f) plen 4
      LE Set Resolvable Private Address Timeout (0x08|0x002e) ncmd 1
        Status: Unknown HCI Command (0x01)

Co-developed-by: imoc <wzj9912@gmail.com>
Signed-off-by: imoc <wzj9912@gmail.com>
Signed-off-by: Raul Cheleguini <raul.cheleguini@gmail.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-04-23 22:04:26 -07:00
Luiz Augusto von Dentz c09b80be6f Bluetooth: hci_conn: Fix not waiting for HCI_EVT_LE_CIS_ESTABLISHED
When submitting HCI_OP_LE_CREATE_CIS the code shall wait for
HCI_EVT_LE_CIS_ESTABLISHED thus enforcing the serialization of
HCI_OP_LE_CREATE_CIS as the Core spec does not allow to send them in
parallel:

  BLUETOOTH CORE SPECIFICATION Version 5.3 | Vol 4, Part E page 2566:

  If the Host issues this command before all the HCI_LE_CIS_Established
  events from the previous use of the command have been generated, the
  Controller shall return the error code Command Disallowed (0x0C).

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-04-23 22:04:01 -07:00
Luiz Augusto von Dentz c14516faed Bluetooth: hci_conn: Fix not matching by CIS ID
This fixes only matching CIS by address which prevents creating new hcon
if upper layer is requesting a specific CIS ID.

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-04-23 22:03:37 -07:00
Luiz Augusto von Dentz 06149746e7 Bluetooth: hci_conn: Add support for linking multiple hcon
Since it is required for some configurations to have multiple CIS with
the same peer which is now covered by iso-tester in the following test
cases:

    ISO AC 6(i) - Success
    ISO AC 7(i) - Success
    ISO AC 8(i) - Success
    ISO AC 9(i) - Success
    ISO AC 11(i) - Success

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-04-23 22:03:13 -07:00
Luiz Augusto von Dentz 288c90224e Bluetooth: Enable all supported LE PHY by default
This enables 2M and Coded PHY by default if they are marked as supported
in the LE features bits.

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-04-23 21:59:41 -07:00
Iulia Tanasescu 0fe8c8d071 Bluetooth: Split bt_iso_qos into dedicated structures
Split bt_iso_qos into dedicated unicast and broadcast
structures and add additional broadcast parameters.

Fixes: eca0ae4aea ("Bluetooth: Add initial implementation of BIS connections")
Signed-off-by: Iulia Tanasescu <iulia.tanasescu@nxp.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-04-23 21:59:17 -07:00
Abhishek Pandit-Subedi 9695ef876f Bluetooth: Add support for hci devcoredump
Add devcoredump APIs to hci core so that drivers only have to provide
the dump skbs instead of managing the synchronization and timeouts.

The devcoredump APIs should be used in the following manner:
 - hci_devcoredump_init is called to allocate the dump.
 - hci_devcoredump_append is called to append any skbs with dump data
   OR hci_devcoredump_append_pattern is called to insert a pattern.
 - hci_devcoredump_complete is called when all dump packets have been
   sent OR hci_devcoredump_abort is called to indicate an error and
   cancel an ongoing dump collection.

The high level APIs just prepare some skbs with the appropriate data and
queue it for the dump to process. Packets part of the crashdump can be
intercepted in the driver in interrupt context and forwarded directly to
the devcoredump APIs.

Internally, there are 5 states for the dump: idle, active, complete,
abort and timeout. A devcoredump will only be in active state after it
has been initialized. Once active, it accepts data to be appended,
patterns to be inserted (i.e. memset) and a completion event or an abort
event to generate a devcoredump. The timeout is initialized at the same
time the dump is initialized (defaulting to 10s) and will be cleared
either when the timeout occurs or the dump is complete or aborted.

Signed-off-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org>
Signed-off-by: Manish Mandlik <mmandlik@google.com>
Reviewed-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-04-23 21:57:59 -07:00
Vasily Khoruzhick 8194f1ef5a Bluetooth: Add new quirk for broken local ext features page 2
Some adapters (e.g. RTL8723CS) advertise that they have more than
2 pages for local ext features, but they don't support any features
declared in these pages. RTL8723CS reports max_page = 2 and declares
support for sync train and secure connection, but it responds with
either garbage or with error in status on corresponding commands.

Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Signed-off-by: Bastian Germann <bage@debian.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-04-23 21:50:01 -07:00
Luiz Augusto von Dentz b8b23001b8 Bluetooth: L2CAP: Delay identity address updates
This delays the identity address updates to give time for userspace to
process the new address otherwise there is a risk that userspace
creates a duplicated device if the MGMT event is delayed for some
reason.

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-04-23 21:48:44 -07:00
Luiz Augusto von Dentz d4b20f0b84 Bluetooth: hci_core: Make hci_conn_hash_add append to the list
This makes hci_conn_hash_add append to the tail of the conn_hash so it
matches the order they are created, this is required if the controller
attempts to match the order of ACL with CIS which uses append logic
when programming the CIS ids on the CIG.

The result of this change affects Create CIS:

Before:

< HCI Command: LE Create Connected Isochronous Stream (0x08|0x0064) plen 9
        Number of CIS: 2
        CIS Handle: 2560
        ACL Handle: 3586
        CIS Handle: 2561
        ACL Handle: 3585

After:

< HCI Command: LE Create Connected Isochronous Stream (0x08|0x0064) plen 9
        Number of CIS: 2
        CIS Handle: 2560
        ACL Handle: 3585
        CIS Handle: 2561
        ACL Handle: 3586

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-04-23 21:46:00 -07:00
Luiz Augusto von Dentz a80d2c545d Bluetooth: MGMT: Use BIT macro when defining bitfields
This makes use of BIT macro when defining bitfields which makes it
clearer what bit it is toggling.

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-04-23 21:41:22 -07:00
David Howells e0416e7d33 rxrpc: Fix potential race in error handling in afs_make_call()
If the rxrpc call set up by afs_make_call() receives an error whilst it is
transmitting the request, there's the possibility that it may get to the
point the rxrpc call is ended (after the error_kill_call label) just as the
call is queued for async processing.

This could manifest itself as call->rxcall being seen as NULL in
afs_deliver_to_call() when it tries to lock the call.

Fix this by splitting rxrpc_kernel_end_call() into a function to shut down
an rxrpc call and a function to release the caller's reference and calling
the latter only when we get to afs_put_call().

Reported-by: Jeffrey Altman <jaltman@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: kafs-testing+fedora36_64checkkafs-build-306@auristor.com
cc: Marc Dionne <marc.dionne@auristor.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-22 15:16:39 +01:00
Jakub Kicinski 9a82cdc28f bpf-next-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZELn8wAKCRDbK58LschI
 g1khAQC1nmXPuKjM4EAfFK8Ysb3KoF8ADmpE97n+/HEDydCagwD/bX0+NABR75Nh
 ueGcoU1TcfcbshDzrH0s+C95owZDZw4=
 =BeZM
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2023-04-21

We've added 71 non-merge commits during the last 8 day(s) which contain
a total of 116 files changed, 13397 insertions(+), 8896 deletions(-).

The main changes are:

1) Add a new BPF netfilter program type and minimal support to hook
   BPF programs to netfilter hooks such as prerouting or forward,
   from Florian Westphal.

2) Fix race between btf_put and btf_idr walk which caused a deadlock,
   from Alexei Starovoitov.

3) Second big batch to migrate test_verifier unit tests into test_progs
   for ease of readability and debugging, from Eduard Zingerman.

4) Add support for refcounted local kptrs to the verifier for allowing
   shared ownership, useful for adding a node to both the BPF list and
   rbtree, from Dave Marchevsky.

5) Migrate bpf_for(), bpf_for_each() and bpf_repeat() macros from BPF
  selftests into libbpf-provided bpf_helpers.h header and improve
  kfunc handling, from Andrii Nakryiko.

6) Support 64-bit pointers to kfuncs needed for archs like s390x,
   from Ilya Leoshkevich.

7) Support BPF progs under getsockopt with a NULL optval,
   from Stanislav Fomichev.

8) Improve verifier u32 scalar equality checking in order to enable
   LLVM transformations which earlier had to be disabled specifically
   for BPF backend, from Yonghong Song.

9) Extend bpftool's struct_ops object loading to support links,
   from Kui-Feng Lee.

10) Add xsk selftest follow-up fixes for hugepage allocated umem,
    from Magnus Karlsson.

11) Support BPF redirects from tc BPF to ifb devices,
    from Daniel Borkmann.

12) Add BPF support for integer type when accessing variable length
    arrays, from Feng Zhou.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (71 commits)
  selftests/bpf: verifier/value_ptr_arith converted to inline assembly
  selftests/bpf: verifier/value_illegal_alu converted to inline assembly
  selftests/bpf: verifier/unpriv converted to inline assembly
  selftests/bpf: verifier/subreg converted to inline assembly
  selftests/bpf: verifier/spin_lock converted to inline assembly
  selftests/bpf: verifier/sock converted to inline assembly
  selftests/bpf: verifier/search_pruning converted to inline assembly
  selftests/bpf: verifier/runtime_jit converted to inline assembly
  selftests/bpf: verifier/regalloc converted to inline assembly
  selftests/bpf: verifier/ref_tracking converted to inline assembly
  selftests/bpf: verifier/map_ptr_mixing converted to inline assembly
  selftests/bpf: verifier/map_in_map converted to inline assembly
  selftests/bpf: verifier/lwt converted to inline assembly
  selftests/bpf: verifier/loops1 converted to inline assembly
  selftests/bpf: verifier/jeq_infer_not_null converted to inline assembly
  selftests/bpf: verifier/direct_packet_access converted to inline assembly
  selftests/bpf: verifier/d_path converted to inline assembly
  selftests/bpf: verifier/ctx converted to inline assembly
  selftests/bpf: verifier/btf_ctx_access converted to inline assembly
  selftests/bpf: verifier/bpf_get_stack converted to inline assembly
  ...
====================

Link: https://lore.kernel.org/r/20230421211035.9111-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-21 20:32:37 -07:00
Pablo Neira Ayuso b9703ed44f netfilter: nf_tables: support for adding new devices to an existing netdev chain
This patch allows users to add devices to an existing netdev chain.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-04-22 01:39:42 +02:00
Simon Horman c7d15aaa10 ipvs: Correct spelling in comments
Correct some spelling errors flagged by codespell and found by inspection.

Signed-off-by: Simon Horman <horms@kernel.org>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-04-22 01:39:41 +02:00
Simon Horman 210ffe4a74 ipvs: Remove {Enter,Leave}Function
Remove EnterFunction and LeaveFunction.

These debugging macros seem well past their use-by date.  And seem to
have little value these days. Removing them allows some trivial cleanup
of some exit paths for some functions. These are also included in this
patch. There is likely scope for further cleanup of both debugging and
unwind paths. But let's leave that for another day.

Only intended to change debug output, and only when CONFIG_IP_VS_DEBUG
is enabled. Compile tested only.

Signed-off-by: Simon Horman <horms@kernel.org>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-04-22 01:39:41 +02:00
Simon Horman e3478c68f6 ipvs: Update width of source for ip_vs_sync_conn_options
In ip_vs_sync_conn_v0() copy is made to struct ip_vs_sync_conn_options.
That structure looks like this:

struct ip_vs_sync_conn_options {
        struct ip_vs_seq        in_seq;
        struct ip_vs_seq        out_seq;
};

The source of the copy is the in_seq field of struct ip_vs_conn.  Whose
type is struct ip_vs_seq. Thus we can see that the source - is not as
wide as the amount of data copied, which is the width of struct
ip_vs_sync_conn_option.

The copy is safe because the next field in is another struct ip_vs_seq.
Make use of struct_group() to annotate this.

Flagged by gcc-13 as:

 In file included from ./include/linux/string.h:254,
                  from ./include/linux/bitmap.h:11,
                  from ./include/linux/cpumask.h:12,
                  from ./arch/x86/include/asm/paravirt.h:17,
                  from ./arch/x86/include/asm/cpuid.h:62,
                  from ./arch/x86/include/asm/processor.h:19,
                  from ./arch/x86/include/asm/timex.h:5,
                  from ./include/linux/timex.h:67,
                  from ./include/linux/time32.h:13,
                  from ./include/linux/time.h:60,
                  from ./include/linux/stat.h:19,
                  from ./include/linux/module.h:13,
                  from net/netfilter/ipvs/ip_vs_sync.c:38:
 In function 'fortify_memcpy_chk',
     inlined from 'ip_vs_sync_conn_v0' at net/netfilter/ipvs/ip_vs_sync.c:606:3:
 ./include/linux/fortify-string.h:529:25: error: call to '__read_overflow2_field' declared with attribute warning: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Werror=attribute-warning]
   529 |                         __read_overflow2_field(q_size_field, size);
       |

Compile tested only.

Signed-off-by: Simon Horman <horms@kernel.org>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-04-22 01:39:41 +02:00
Florian Westphal 46df417544 netfilter: nf_tables: do not store rule in traceinfo structure
pass it as argument instead.  This reduces size of traceinfo to
16 bytes.  Total stack usage:

 nf_tables_core.c:252 nft_do_chain    304     static

While its possible to also pass basechain as argument, doing so
increases nft_do_chaininfo function size.

Unlike pktinfo/verdict/rule the basechain info isn't used in
the expression evaluation path. gcc places it on the stack, which
results in extra push/pop when it gets passed to the trace helpers
as argument rather than as part of the traceinfo structure.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-04-22 01:39:41 +02:00
Florian Westphal 0a202145d5 netfilter: nf_tables: do not store verdict in traceinfo structure
Just pass it as argument to nft_trace_notify. Stack is reduced by 8 bytes:

nf_tables_core.c:256 nft_do_chain    312     static

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-04-22 01:39:41 +02:00
Florian Westphal 698bb828a6 netfilter: nf_tables: do not store pktinfo in traceinfo structure
pass it as argument.  No change in object size.

stack usage decreases by 8 byte:
 nf_tables_core.c:254  nft_do_chain       320     static

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-04-22 01:39:41 +02:00
Florian Westphal 00c320f9b7 netfilter: nf_tables: make validation state per table
We only need to validate tables that saw changes in the current
transaction.

The existing code revalidates all tables, but this isn't needed as
cross-table jumps are not allowed (chains have table scope).

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-04-22 01:39:40 +02:00