Commit graph

77414 commits

Author SHA1 Message Date
Jianguo Wu a2225e0250 netfilter: move the sysctl nf_hooks_lwtunnel into the netfilter core
Currently, the sysctl net.netfilter.nf_hooks_lwtunnel depends on the
nf_conntrack module, but the nf_conntrack module is not always loaded.
Therefore, accessing net.netfilter.nf_hooks_lwtunnel may have an error.

Move sysctl nf_hooks_lwtunnel into the netfilter core.

Fixes: 7a3f5b0de3 ("netfilter: add netfilter hooks to SRv6 data plane")
Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Jianguo Wu <wujianguo@chinatelecom.cn>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-06-19 18:41:59 +02:00
Jianguo Wu 9a3bc8d16e seg6: fix parameter passing when calling NF_HOOK() in End.DX4 and End.DX6 behaviors
input_action_end_dx4() and input_action_end_dx6() are called NF_HOOK() for
PREROUTING hook, in PREROUTING hook, we should passing a valid indev,
and a NULL outdev to NF_HOOK(), otherwise may trigger a NULL pointer
dereference, as below:

    [74830.647293] BUG: kernel NULL pointer dereference, address: 0000000000000090
    [74830.655633] #PF: supervisor read access in kernel mode
    [74830.657888] #PF: error_code(0x0000) - not-present page
    [74830.659500] PGD 0 P4D 0
    [74830.660450] Oops: 0000 [#1] PREEMPT SMP PTI
    ...
    [74830.664953] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
    [74830.666569] RIP: 0010:rpfilter_mt+0x44/0x15e [ipt_rpfilter]
    ...
    [74830.689725] Call Trace:
    [74830.690402]  <IRQ>
    [74830.690953]  ? show_trace_log_lvl+0x1c4/0x2df
    [74830.692020]  ? show_trace_log_lvl+0x1c4/0x2df
    [74830.693095]  ? ipt_do_table+0x286/0x710 [ip_tables]
    [74830.694275]  ? __die_body.cold+0x8/0xd
    [74830.695205]  ? page_fault_oops+0xac/0x140
    [74830.696244]  ? exc_page_fault+0x62/0x150
    [74830.697225]  ? asm_exc_page_fault+0x22/0x30
    [74830.698344]  ? rpfilter_mt+0x44/0x15e [ipt_rpfilter]
    [74830.699540]  ipt_do_table+0x286/0x710 [ip_tables]
    [74830.700758]  ? ip6_route_input+0x19d/0x240
    [74830.701752]  nf_hook_slow+0x3f/0xb0
    [74830.702678]  input_action_end_dx4+0x19b/0x1e0
    [74830.703735]  ? input_action_end_t+0xe0/0xe0
    [74830.704734]  seg6_local_input_core+0x2d/0x60
    [74830.705782]  lwtunnel_input+0x5b/0xb0
    [74830.706690]  __netif_receive_skb_one_core+0x63/0xa0
    [74830.707825]  process_backlog+0x99/0x140
    [74830.709538]  __napi_poll+0x2c/0x160
    [74830.710673]  net_rx_action+0x296/0x350
    [74830.711860]  __do_softirq+0xcb/0x2ac
    [74830.713049]  do_softirq+0x63/0x90

input_action_end_dx4() passing a NULL indev to NF_HOOK(), and finally
trigger a NULL dereference in rpfilter_mt()->rpfilter_is_loopback():

    static bool
    rpfilter_is_loopback(const struct sk_buff *skb,
          	       const struct net_device *in)
    {
            // in is NULL
            return skb->pkt_type == PACKET_LOOPBACK ||
          	 in->flags & IFF_LOOPBACK;
    }

Fixes: 7a3f5b0de3 ("netfilter: add netfilter hooks to SRv6 data plane")
Signed-off-by: Jianguo Wu <wujianguo@chinatelecom.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-06-19 16:26:42 +02:00
Jozsef Kadlecsik 8ecd06277a netfilter: ipset: Fix suspicious rcu_dereference_protected()
When destroying all sets, we are either in pernet exit phase or
are executing a "destroy all sets command" from userspace. The latter
was taken into account in ip_set_dereference() (nfnetlink mutex is held),
but the former was not. The patch adds the required check to
rcu_dereference_protected() in ip_set_dereference().

Fixes: 4e7aaa6b82 ("netfilter: ipset: Fix race between namespace cleanup and gc in the list:set type")
Reported-by: syzbot+b62c37cdd58103293a5a@syzkaller.appspotmail.com
Reported-by: syzbot+cfbe1da5fdfc39efc293@syzkaller.appspotmail.com
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202406141556.e0b6f17e-lkp@intel.com
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-06-19 15:12:56 +02:00
Xin Long 88c67aeb14 sched: act_ct: add netns into the key of tcf_ct_flow_table
zones_ht is a global hashtable for flow_table with zone as key. However,
it does not consider netns when getting a flow_table from zones_ht in
tcf_ct_init(), and it means an act_ct action in netns A may get a
flow_table that belongs to netns B if it has the same zone value.

In Shuang's test with the TOPO:

  tcf2_c <---> tcf2_sw1 <---> tcf2_sw2 <---> tcf2_s

tcf2_sw1 and tcf2_sw2 saw the same flow and used the same flow table,
which caused their ct entries entering unexpected states and the
TCP connection not able to end normally.

This patch fixes the issue simply by adding netns into the key of
tcf_ct_flow_table so that an act_ct action gets a flow_table that
belongs to its own netns in tcf_ct_init().

Note that for easy coding we don't use tcf_ct_flow_table.nf_ft.net,
as the ct_ft is initialized after inserting it to the hashtable in
tcf_ct_flow_table_get() and also it requires to implement several
functions in rhashtable_params including hashfn, obj_hashfn and
obj_cmpfn.

Fixes: 64ff70b80f ("net/sched: act_ct: Offload established connections to flow table")
Reported-by: Shuang Li <shuali@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/1db5b6cc6902c5fc6f8c6cbd85494a2008087be5.1718488050.git.lucien.xin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-18 15:24:24 +02:00
Xin Long 2ebe8f840c tipc: force a dst refcount before doing decryption
As it says in commit 3bc07321cc ("xfrm: Force a dst refcount before
entering the xfrm type handlers"):

"Crypto requests might return asynchronous. In this case we leave the
 rcu protected region, so force a refcount on the skb's destination
 entry before we enter the xfrm type input/output handlers."

On TIPC decryption path it has the same problem, and skb_dst_force()
should be called before doing decryption to avoid a possible crash.

Shuang reported this issue when this warning is triggered:

  [] WARNING: include/net/dst.h:337 tipc_sk_rcv+0x1055/0x1ea0 [tipc]
  [] Kdump: loaded Tainted: G W --------- - - 4.18.0-496.el8.x86_64+debug
  [] Workqueue: crypto cryptd_queue_worker
  [] RIP: 0010:tipc_sk_rcv+0x1055/0x1ea0 [tipc]
  [] Call Trace:
  [] tipc_sk_mcast_rcv+0x548/0xea0 [tipc]
  [] tipc_rcv+0xcf5/0x1060 [tipc]
  [] tipc_aead_decrypt_done+0x215/0x2e0 [tipc]
  [] cryptd_aead_crypt+0xdb/0x190
  [] cryptd_queue_worker+0xed/0x190
  [] process_one_work+0x93d/0x17e0

Fixes: fc1b6d6de2 ("tipc: introduce TIPC encryption & authentication")
Reported-by: Shuang Li <shuali@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Link: https://lore.kernel.org/r/fbe3195fad6997a4eec62d9bf076b2ad03ac336b.1718476040.git.lucien.xin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-18 15:08:57 +02:00
David Ruth d864319871 net/sched: act_api: fix possible infinite loop in tcf_idr_check_alloc()
syzbot found hanging tasks waiting on rtnl_lock [1]

A reproducer is available in the syzbot bug.

When a request to add multiple actions with the same index is sent, the
second request will block forever on the first request. This holds
rtnl_lock, and causes tasks to hang.

Return -EAGAIN to prevent infinite looping, while keeping documented
behavior.

[1]

INFO: task kworker/1:0:5088 blocked for more than 143 seconds.
Not tainted 6.9.0-rc4-syzkaller-00173-g3cdb45594619 #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/1:0 state:D stack:23744 pid:5088 tgid:5088 ppid:2 flags:0x00004000
Workqueue: events_power_efficient reg_check_chans_work
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5409 [inline]
__schedule+0xf15/0x5d00 kernel/sched/core.c:6746
__schedule_loop kernel/sched/core.c:6823 [inline]
schedule+0xe7/0x350 kernel/sched/core.c:6838
schedule_preempt_disabled+0x13/0x30 kernel/sched/core.c:6895
__mutex_lock_common kernel/locking/mutex.c:684 [inline]
__mutex_lock+0x5b8/0x9c0 kernel/locking/mutex.c:752
wiphy_lock include/net/cfg80211.h:5953 [inline]
reg_leave_invalid_chans net/wireless/reg.c:2466 [inline]
reg_check_chans_work+0x10a/0x10e0 net/wireless/reg.c:2481

Fixes: 0190c1d452 ("net: sched: atomically check-allocate action")
Reported-by: syzbot+b87c222546179f4513a7@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=b87c222546179f4513a7
Signed-off-by: David Ruth <druth@chromium.org>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20240614190326.1349786-1-druth@chromium.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-18 12:28:01 +02:00
Yue Haibing ff960f9d3e netns: Make get_net_ns() handle zero refcount net
Syzkaller hit a warning:
refcount_t: addition on 0; use-after-free.
WARNING: CPU: 3 PID: 7890 at lib/refcount.c:25 refcount_warn_saturate+0xdf/0x1d0
Modules linked in:
CPU: 3 PID: 7890 Comm: tun Not tainted 6.10.0-rc3-00100-gcaa4f9578aba-dirty #310
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
RIP: 0010:refcount_warn_saturate+0xdf/0x1d0
Code: 41 49 04 31 ff 89 de e8 9f 1e cd fe 84 db 75 9c e8 76 26 cd fe c6 05 b6 41 49 04 01 90 48 c7 c7 b8 8e 25 86 e8 d2 05 b5 fe 90 <0f> 0b 90 90 e9 79 ff ff ff e8 53 26 cd fe 0f b6 1
RSP: 0018:ffff8881067b7da0 EFLAGS: 00010286
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff811c72ac
RDX: ffff8881026a2140 RSI: ffffffff811c72b5 RDI: 0000000000000001
RBP: ffff8881067b7db0 R08: 0000000000000000 R09: 205b5d3730353139
R10: 0000000000000000 R11: 205d303938375420 R12: ffff8881086500c4
R13: ffff8881086500c4 R14: ffff8881086500b0 R15: ffff888108650040
FS:  00007f5b2961a4c0(0000) GS:ffff88823bd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055d7ed36fd18 CR3: 00000001482f6000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 ? show_regs+0xa3/0xc0
 ? __warn+0xa5/0x1c0
 ? refcount_warn_saturate+0xdf/0x1d0
 ? report_bug+0x1fc/0x2d0
 ? refcount_warn_saturate+0xdf/0x1d0
 ? handle_bug+0xa1/0x110
 ? exc_invalid_op+0x3c/0xb0
 ? asm_exc_invalid_op+0x1f/0x30
 ? __warn_printk+0xcc/0x140
 ? __warn_printk+0xd5/0x140
 ? refcount_warn_saturate+0xdf/0x1d0
 get_net_ns+0xa4/0xc0
 ? __pfx_get_net_ns+0x10/0x10
 open_related_ns+0x5a/0x130
 __tun_chr_ioctl+0x1616/0x2370
 ? __sanitizer_cov_trace_switch+0x58/0xa0
 ? __sanitizer_cov_trace_const_cmp2+0x1c/0x30
 ? __pfx_tun_chr_ioctl+0x10/0x10
 tun_chr_ioctl+0x2f/0x40
 __x64_sys_ioctl+0x11b/0x160
 x64_sys_call+0x1211/0x20d0
 do_syscall_64+0x9e/0x1d0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f5b28f165d7
Code: b3 66 90 48 8b 05 b1 48 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 81 48 2d 00 8
RSP: 002b:00007ffc2b59c5e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f5b28f165d7
RDX: 0000000000000000 RSI: 00000000000054e3 RDI: 0000000000000003
RBP: 00007ffc2b59c650 R08: 00007f5b291ed8c0 R09: 00007f5b2961a4c0
R10: 0000000029690010 R11: 0000000000000246 R12: 0000000000400730
R13: 00007ffc2b59cf40 R14: 0000000000000000 R15: 0000000000000000
 </TASK>
Kernel panic - not syncing: kernel: panic_on_warn set ...

This is trigger as below:
          ns0                                    ns1
tun_set_iff() //dev is tun0
   tun->dev = dev
//ip link set tun0 netns ns1
                                       put_net() //ref is 0
__tun_chr_ioctl() //TUNGETDEVNETNS
   net = dev_net(tun->dev);
   open_related_ns(&net->ns, get_net_ns); //ns1
     get_net_ns()
        get_net() //addition on 0

Use maybe_get_net() in get_net_ns in case net's ref is zero to fix this

Fixes: 0c3e0e3bb6 ("tun: Add ioctl() TUNGETDEVNETNS cmd to allow obtaining real net ns of tun device")
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Link: https://lore.kernel.org/r/20240614131302.2698509-1-yuehaibing@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-18 10:59:52 +02:00
Eric Dumazet d46401052c xfrm6: check ip6_dst_idev() return value in xfrm6_get_saddr()
ip6_dst_idev() can return NULL, xfrm6_get_saddr() must act accordingly.

syzbot reported:

Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
CPU: 1 PID: 12 Comm: kworker/u8:1 Not tainted 6.10.0-rc2-syzkaller-00383-gb8481381d4e2 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/02/2024
Workqueue: wg-kex-wg1 wg_packet_handshake_send_worker
 RIP: 0010:xfrm6_get_saddr+0x93/0x130 net/ipv6/xfrm6_policy.c:64
Code: df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 97 00 00 00 4c 8b ab d8 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 89 ea 48 c1 ea 03 <80> 3c 02 00 0f 85 86 00 00 00 4d 8b 6d 00 e8 ca 13 47 01 48 b8 00
RSP: 0018:ffffc90000117378 EFLAGS: 00010246
RAX: dffffc0000000000 RBX: ffff88807b079dc0 RCX: ffffffff89a0d6d7
RDX: 0000000000000000 RSI: ffffffff89a0d6e9 RDI: ffff88807b079e98
RBP: ffff88807ad73248 R08: 0000000000000007 R09: fffffffffffff000
R10: ffff88807b079dc0 R11: 0000000000000007 R12: ffffc90000117480
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff8880b9300000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f4586d00440 CR3: 0000000079042000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
  xfrm_get_saddr net/xfrm/xfrm_policy.c:2452 [inline]
  xfrm_tmpl_resolve_one net/xfrm/xfrm_policy.c:2481 [inline]
  xfrm_tmpl_resolve+0xa26/0xf10 net/xfrm/xfrm_policy.c:2541
  xfrm_resolve_and_create_bundle+0x140/0x2570 net/xfrm/xfrm_policy.c:2835
  xfrm_bundle_lookup net/xfrm/xfrm_policy.c:3070 [inline]
  xfrm_lookup_with_ifid+0x4d1/0x1e60 net/xfrm/xfrm_policy.c:3201
  xfrm_lookup net/xfrm/xfrm_policy.c:3298 [inline]
  xfrm_lookup_route+0x3b/0x200 net/xfrm/xfrm_policy.c:3309
  ip6_dst_lookup_flow+0x15c/0x1d0 net/ipv6/ip6_output.c:1256
  send6+0x611/0xd20 drivers/net/wireguard/socket.c:139
  wg_socket_send_skb_to_peer+0xf9/0x220 drivers/net/wireguard/socket.c:178
  wg_socket_send_buffer_to_peer+0x12b/0x190 drivers/net/wireguard/socket.c:200
  wg_packet_send_handshake_initiation+0x227/0x360 drivers/net/wireguard/send.c:40
  wg_packet_handshake_send_worker+0x1c/0x30 drivers/net/wireguard/send.c:51
  process_one_work+0x9fb/0x1b60 kernel/workqueue.c:3231
  process_scheduled_works kernel/workqueue.c:3312 [inline]
  worker_thread+0x6c8/0xf70 kernel/workqueue.c:3393
  kthread+0x2c1/0x3a0 kernel/kthread.c:389
  ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240615154231.234442-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-17 18:04:46 -07:00
Eric Dumazet b86762dbe1 ipv6: prevent possible NULL dereference in rt6_probe()
syzbot caught a NULL dereference in rt6_probe() [1]

Bail out if  __in6_dev_get() returns NULL.

[1]
Oops: general protection fault, probably for non-canonical address 0xdffffc00000000cb: 0000 [#1] PREEMPT SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000658-0x000000000000065f]
CPU: 1 PID: 22444 Comm: syz-executor.0 Not tainted 6.10.0-rc2-syzkaller-00383-gb8481381d4e2 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/02/2024
 RIP: 0010:rt6_probe net/ipv6/route.c:656 [inline]
 RIP: 0010:find_match+0x8c4/0xf50 net/ipv6/route.c:758
Code: 14 fd f7 48 8b 85 38 ff ff ff 48 c7 45 b0 00 00 00 00 48 8d b8 5c 06 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 19
RSP: 0018:ffffc900034af070 EFLAGS: 00010203
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffc90004521000
RDX: 00000000000000cb RSI: ffffffff8990d0cd RDI: 000000000000065c
RBP: ffffc900034af150 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000002 R12: 000000000000000a
R13: 1ffff92000695e18 R14: ffff8880244a1d20 R15: 0000000000000000
FS:  00007f4844a5a6c0(0000) GS:ffff8880b9300000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b31b27000 CR3: 000000002d42c000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
  rt6_nh_find_match+0xfa/0x1a0 net/ipv6/route.c:784
  nexthop_for_each_fib6_nh+0x26d/0x4a0 net/ipv4/nexthop.c:1496
  __find_rr_leaf+0x6e7/0xe00 net/ipv6/route.c:825
  find_rr_leaf net/ipv6/route.c:853 [inline]
  rt6_select net/ipv6/route.c:897 [inline]
  fib6_table_lookup+0x57e/0xa30 net/ipv6/route.c:2195
  ip6_pol_route+0x1cd/0x1150 net/ipv6/route.c:2231
  pol_lookup_func include/net/ip6_fib.h:616 [inline]
  fib6_rule_lookup+0x386/0x720 net/ipv6/fib6_rules.c:121
  ip6_route_output_flags_noref net/ipv6/route.c:2639 [inline]
  ip6_route_output_flags+0x1d0/0x640 net/ipv6/route.c:2651
  ip6_dst_lookup_tail.constprop.0+0x961/0x1760 net/ipv6/ip6_output.c:1147
  ip6_dst_lookup_flow+0x99/0x1d0 net/ipv6/ip6_output.c:1250
  rawv6_sendmsg+0xdab/0x4340 net/ipv6/raw.c:898
  inet_sendmsg+0x119/0x140 net/ipv4/af_inet.c:853
  sock_sendmsg_nosec net/socket.c:730 [inline]
  __sock_sendmsg net/socket.c:745 [inline]
  sock_write_iter+0x4b8/0x5c0 net/socket.c:1160
  new_sync_write fs/read_write.c:497 [inline]
  vfs_write+0x6b6/0x1140 fs/read_write.c:590
  ksys_write+0x1f8/0x260 fs/read_write.c:643
  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
  do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Fixes: 52e1635631 ("[IPV6]: ROUTE: Add router_probe_interval sysctl.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240615151454.166404-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-17 18:04:23 -07:00
Eric Dumazet 2eab4543a2 ipv6: prevent possible NULL deref in fib6_nh_init()
syzbot reminds us that in6_dev_get() can return NULL.

fib6_nh_init()
    ip6_validate_gw(  &idev  )
        ip6_route_check_nh(  idev  )
            *idev = in6_dev_get(dev); // can be NULL

Oops: general protection fault, probably for non-canonical address 0xdffffc00000000bc: 0000 [#1] PREEMPT SMP KASAN PTI
KASAN: null-ptr-deref in range [0x00000000000005e0-0x00000000000005e7]
CPU: 0 PID: 11237 Comm: syz-executor.3 Not tainted 6.10.0-rc2-syzkaller-00249-gbe27b8965297 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/07/2024
 RIP: 0010:fib6_nh_init+0x640/0x2160 net/ipv6/route.c:3606
Code: 00 00 fc ff df 4c 8b 64 24 58 48 8b 44 24 28 4c 8b 74 24 30 48 89 c1 48 89 44 24 28 48 8d 98 e0 05 00 00 48 89 d8 48 c1 e8 03 <42> 0f b6 04 38 84 c0 0f 85 b3 17 00 00 8b 1b 31 ff 89 de e8 b8 8b
RSP: 0018:ffffc900032775a0 EFLAGS: 00010202
RAX: 00000000000000bc RBX: 00000000000005e0 RCX: 0000000000000000
RDX: 0000000000000010 RSI: ffffc90003277a54 RDI: ffff88802b3a08d8
RBP: ffffc900032778b0 R08: 00000000000002fc R09: 0000000000000000
R10: 00000000000002fc R11: 0000000000000000 R12: ffff88802b3a08b8
R13: 1ffff9200064eec8 R14: ffffc90003277a00 R15: dffffc0000000000
FS:  00007f940feb06c0(0000) GS:ffff8880b9400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000000245e8000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
  ip6_route_info_create+0x99e/0x12b0 net/ipv6/route.c:3809
  ip6_route_add+0x28/0x160 net/ipv6/route.c:3853
  ipv6_route_ioctl+0x588/0x870 net/ipv6/route.c:4483
  inet6_ioctl+0x21a/0x280 net/ipv6/af_inet6.c:579
  sock_do_ioctl+0x158/0x460 net/socket.c:1222
  sock_ioctl+0x629/0x8e0 net/socket.c:1341
  vfs_ioctl fs/ioctl.c:51 [inline]
  __do_sys_ioctl fs/ioctl.c:907 [inline]
  __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:893
  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
  do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f940f07cea9

Fixes: 428604fb11 ("ipv6: do not set routes if disable_ipv6 has been enabled")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240614082002.26407-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-17 18:01:56 -07:00
Eric Dumazet 9e046bb111 tcp: clear tp->retrans_stamp in tcp_rcv_fastopen_synack()
Some applications were reporting ETIMEDOUT errors on apparently
good looking flows, according to packet dumps.

We were able to root cause the issue to an accidental setting
of tp->retrans_stamp in the following scenario:

- client sends TFO SYN with data.
- server has TFO disabled, ACKs only SYN but not payload.
- client receives SYNACK covering only SYN.
- tcp_ack() eats SYN and sets tp->retrans_stamp to 0.
- tcp_rcv_fastopen_synack() calls tcp_xmit_retransmit_queue()
  to retransmit TFO payload w/o SYN, sets tp->retrans_stamp to "now",
  but we are not in any loss recovery state.
- TFO payload is ACKed.
- we are not in any loss recovery state, and don't see any dupacks,
  so we don't get to any code path that clears tp->retrans_stamp.
- tp->retrans_stamp stays non-zero for the lifetime of the connection.
- after first RTO, tcp_clamp_rto_to_user_timeout() clamps second RTO
  to 1 jiffy due to bogus tp->retrans_stamp.
- on clamped RTO with non-zero icsk_retransmits, retransmits_timed_out()
  sets start_ts from tp->retrans_stamp from TFO payload retransmit
  hours/days ago, and computes bogus long elapsed time for loss recovery,
  and suffers ETIMEDOUT early.

Fixes: a7abf3cd76 ("tcp: consider using standard rtx logic in tcp_rcv_fastopen_synack()")
CC: stable@vger.kernel.org
Co-developed-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Co-developed-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240614130615.396837-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-17 17:50:39 -07:00
Gavrilov Ilia 0b9130247f netrom: Fix a memory leak in nr_heartbeat_expiry()
syzbot reported a memory leak in nr_create() [0].

Commit 409db27e3a ("netrom: Fix use-after-free of a listening socket.")
added sock_hold() to the nr_heartbeat_expiry() function, where
a) a socket has a SOCK_DESTROY flag or
b) a listening socket has a SOCK_DEAD flag.

But in the case "a," when the SOCK_DESTROY flag is set, the file descriptor
has already been closed and the nr_release() function has been called.
So it makes no sense to hold the reference count because no one will
call another nr_destroy_socket() and put it as in the case "b."

nr_connect
  nr_establish_data_link
    nr_start_heartbeat

nr_release
  switch (nr->state)
  case NR_STATE_3
    nr->state = NR_STATE_2
    sock_set_flag(sk, SOCK_DESTROY);

                        nr_rx_frame
                          nr_process_rx_frame
                            switch (nr->state)
                            case NR_STATE_2
                              nr_state2_machine()
                                nr_disconnect()
                                  nr_sk(sk)->state = NR_STATE_0
                                  sock_set_flag(sk, SOCK_DEAD)

                        nr_heartbeat_expiry
                          switch (nr->state)
                          case NR_STATE_0
                            if (sock_flag(sk, SOCK_DESTROY) ||
                               (sk->sk_state == TCP_LISTEN
                                 && sock_flag(sk, SOCK_DEAD)))
                               sock_hold()  // ( !!! )
                               nr_destroy_socket()

To fix the memory leak, let's call sock_hold() only for a listening socket.

Found by InfoTeCS on behalf of Linux Verification Center
(linuxtesting.org) with Syzkaller.

[0]: https://syzkaller.appspot.com/bug?extid=d327a1f3b12e1e206c16

Reported-by: syzbot+d327a1f3b12e1e206c16@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d327a1f3b12e1e206c16
Fixes: 409db27e3a ("netrom: Fix use-after-free of a listening socket.")
Signed-off-by: Gavrilov Ilia <Ilia.Gavrilov@infotecs.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-17 13:06:23 +01:00
Jakub Kicinski 7ed352d34f netdev-genl: fix error codes when outputting XDP features
-EINVAL will interrupt the dump. The correct error to return
if we have more data to dump is -EMSGSIZE.

Discovered by doing:

  for i in `seq 80`; do ip link add type veth; done
  ./cli.py --dbg-small-recv 5300 --spec netdev.yaml --dump dev-get >> /dev/null
  [...]
     nl_len = 64 (48) nl_flags = 0x0 nl_type = 19
     nl_len = 20 (4) nl_flags = 0x2 nl_type = 3
  	error: -22

Fixes: d3d854fd6a ("netdev-genl: create a simple family for netdev stuff")
Reviewed-by: Amritha Nambiar <amritha.nambiar@intel.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240613213044.3675745-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-14 18:04:29 -07:00
Jakub Kicinski c64da10adb bpf-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZmykPwAKCRDbK58LschI
 g7LOAQDVPkJ9k50/xrWIBtgvkGq1jCrMlpwEh49QYO0xoqh1IgEA+6Xje9jCIsdp
 AHz9WmZ6G0EpTuDgFq50K1NVZ7MgSQE=
 =zKfv
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2024-06-14

We've added 8 non-merge commits during the last 2 day(s) which contain
a total of 9 files changed, 92 insertions(+), 11 deletions(-).

The main changes are:

1) Silence a syzkaller splat under CONFIG_DEBUG_NET=y in pskb_pull_reason()
   triggered via __bpf_try_make_writable(), from Florian Westphal.

2) Fix removal of kfuncs during linking phase which then throws a kernel
   build warning via resolve_btfids about unresolved symbols,
   from Tony Ambardar.

3) Fix a UML x86_64 compilation failure from BPF as pcpu_hot symbol
   is not available on User Mode Linux, from Maciej Żenczykowski.

4) Fix a register corruption in reg_set_min_max triggering an invariant
   violation in BPF verifier, from Daniel Borkmann.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf: Harden __bpf_kfunc tag against linker kfunc removal
  compiler_types.h: Define __retain for __attribute__((__retain__))
  bpf: Avoid splat in pskb_pull_reason
  bpf: fix UML x86_64 compile failure
  selftests/bpf: Add test coverage for reg_set_min_max handling
  bpf: Reduce stack consumption in check_stack_write_fixed_off
  bpf: Fix reg_set_min_max corruption of fake_reg
  MAINTAINERS: mailmap: Update Stanislav's email address
====================

Link: https://lore.kernel.org/r/20240614203223.26500-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-14 17:57:10 -07:00
Florian Westphal 2bbe3e5a2f bpf: Avoid splat in pskb_pull_reason
syzkaller builds (CONFIG_DEBUG_NET=y) frequently trigger a debug
hint in pskb_may_pull.

We'd like to retain this debug check because it might hint at integer
overflows and other issues (kernel code should pull headers, not huge
value).

In bpf case, this splat isn't interesting at all: such (nonsensical)
bpf programs are typically generated by a fuzzer anyway.

Do what Eric suggested and suppress such warning.

For CONFIG_DEBUG_NET=n we don't need the extra check because
pskb_may_pull will do the right thing: return an error without the
WARN() backtrace.

Fixes: 219eee9c0d ("net: skbuff: add overflow debug check to pull/push helpers")
Reported-by: syzbot+0c4150bff9fff3bf023c@syzkaller.appspotmail.com
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Closes: https://syzkaller.appspot.com/bug?extid=0c4150bff9fff3bf023c
Link: https://lore.kernel.org/netdev/9f254c96-54f2-4457-b7ab-1d9f6187939c@gmail.com/
Link: https://lore.kernel.org/bpf/20240614101801.9496-1-fw@strlen.de
2024-06-14 17:20:21 +02:00
Jakub Kicinski 1afe4a6437 Various fixes:
* cfg80211: wext scan
  * mac80211: monitor regression, scan counted_by, offload
  * iwlwifi: locking, 6 GHz scan, remain-on-channel
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEpeA8sTs3M8SN2hR410qiO8sPaAAFAmZsBYsACgkQ10qiO8sP
 aABscRAAi2MEj3YsuP4BshjLSLESJB9GvgePfxPuu1sfpTaoUO3RNwX0eyy751tL
 nYFogzP//JKT4UWPYRFBHNDa9xj17BNamaV5vI27ee0M+dUNfVj1oKzY0V7V/d9T
 0s6DOBbUtrJ+mLou+EzwpAyBIdc0vSIkiExGKXCKbINfG7v2twA0IYgaDExIKXi/
 TsLYgPCLuKC3VQDdLp4GQqpA/BwudrjzyArqJm3Q9aGrYGihWBw1B5XAMjNSlYPE
 p00ZgboQrDAKv5eIW6Az9UI6CkhWjW+RrxqmHSxs4NJNC/sySpU28PliXFqAz7c2
 HTvkyZIBFt8//r9K9RCrhKjHq8w6JfVzSZ9z0Swh432ARpm3b2NfzSRgGaToJBWS
 udFieqgQY9r7BMWhQ73eFyvEn24SrcxwL0cHMrleeSibq8E7uoAYFZyUNMit5glt
 Lk5fJbRpbGD8lNyrgf6M++4IKaP71K07525ZuCe2NoBNsY6eHRf0Ba2ffZhF/YwB
 Mc6H89w8XtG/k4AxcGZM2Be0FLvbp0gNpIMpWfP8oPjGJAUM4VyTFwH/T/aR6GmB
 0BB5QQiIBSklaV7u4BROu/7Igjl3msBVh0r6u7b610T4g6JYiU4TG75z2tpD0c2L
 vXQQHuWjZQBWTEGu22022ryyGCIEzdMaonuhjCVOhg9azcb8EmY=
 =KA58
 -----END PGP SIGNATURE-----

Merge tag 'wireless-2024-06-14' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless

Johannes Berg says:

====================
Various fixes:
 * cfg80211: wext scan
 * mac80211: monitor regression, scan counted_by, offload
 * iwlwifi: locking, 6 GHz scan, remain-on-channel

* tag 'wireless-2024-06-14' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
  wifi: mac80211: fix monitor channel with chanctx emulation
  wifi: mac80211: Avoid address calculations via out of bounds array indexing
  wifi: mac80211: Recalc offload when monitor stop
  wifi: iwlwifi: scan: correctly check if PSC listen period is needed
  wifi: iwlwifi: mvm: fix ROC version check
  wifi: iwlwifi: mvm: unlock mvm mutex
  wifi: cfg80211: wext: add extra SIOCSIWSCAN data check
  wifi: cfg80211: wext: set ssids=NULL for passive scans
====================

Link: https://lore.kernel.org/r/20240614085710.24103-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-14 08:13:18 -07:00
Ondrej Mosnacek 89aa3619d1 cipso: make cipso_v4_skbuff_delattr() fully remove the CIPSO options
As the comment in this function says, the code currently just clears the
CIPSO part with IPOPT_NOP, rather than removing it completely and
trimming the packet. The other cipso_v4_*_delattr() functions, however,
do the proper removal and also calipso_skbuff_delattr() makes an effort
to remove the CALIPSO options instead of replacing them with padding.

Some routers treat IPv4 packets with anything (even NOPs) in the option
header as a special case and take them through a slower processing path.
Consequently, hardening guides such as STIG recommend to configure such
routers to drop packets with non-empty IP option headers [1][2]. Thus,
users might expect NetLabel to produce packets with minimal padding (or
at least with no padding when no actual options are present).

Implement the proper option removal to address this and to be closer to
what the peer functions do.

[1] https://www.stigviewer.com/stig/juniper_router_rtr/2019-09-27/finding/V-90937
[2] https://www.stigviewer.com/stig/cisco_ios_xe_router_rtr/2021-03-26/finding/V-217001

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-14 08:19:54 +01:00
Ondrej Mosnacek 9f36169912 cipso: fix total option length computation
As evident from the definition of ip_options_get(), the IP option
IPOPT_END is used to pad the IP option data array, not IPOPT_NOP. Yet
the loop that walks the IP options to determine the total IP options
length in cipso_v4_delopt() doesn't take IPOPT_END into account.

Fix it by recognizing the IPOPT_END value as the end of actual options.

Fixes: 014ab19a69 ("selinux: Set socket NetLabel based on connection endpoint")
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-14 08:18:49 +01:00
Johannes Berg 0d9c2beed1 wifi: mac80211: fix monitor channel with chanctx emulation
After the channel context emulation, there were reports that
changing the monitor channel no longer works. This is because
those drivers don't have WANT_MONITOR_VIF, so the setting the
channel always exits out quickly.

Fix this by always allocating the virtual monitor sdata, and
simply not telling the driver about it unless it wanted to.
This way, we have an interface/sdata to bind the chanctx to,
and the emulation can work correctly.

Cc: stable@vger.kernel.org
Fixes: 0a44dfc070 ("wifi: mac80211: simplify non-chanctx drivers")
Reported-and-tested-by: Savyasaachi Vanga <savyasaachiv@gmail.com>
Closes: https://lore.kernel.org/r/chwoymvpzwtbmzryrlitpwmta5j6mtndocxsyqvdyikqu63lon@gfds653hkknl
Link: https://msgid.link/20240612122351.b12d4a109dde.I1831a44417faaab92bea1071209abbe4efbe3fba@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-06-14 09:14:08 +02:00
Kenton Groombridge 2663d0462e wifi: mac80211: Avoid address calculations via out of bounds array indexing
req->n_channels must be set before req->channels[] can be used.

This patch fixes one of the issues encountered in [1].

[   83.964255] UBSAN: array-index-out-of-bounds in net/mac80211/scan.c:364:4
[   83.964258] index 0 is out of range for type 'struct ieee80211_channel *[]'
[...]
[   83.964264] Call Trace:
[   83.964267]  <TASK>
[   83.964269]  dump_stack_lvl+0x3f/0xc0
[   83.964274]  __ubsan_handle_out_of_bounds+0xec/0x110
[   83.964278]  ieee80211_prep_hw_scan+0x2db/0x4b0
[   83.964281]  __ieee80211_start_scan+0x601/0x990
[   83.964291]  nl80211_trigger_scan+0x874/0x980
[   83.964295]  genl_family_rcv_msg_doit+0xe8/0x160
[   83.964298]  genl_rcv_msg+0x240/0x270
[...]

[1] https://bugzilla.kernel.org/show_bug.cgi?id=218810

Co-authored-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <kees@kernel.org>
Signed-off-by: Kenton Groombridge <concord@gentoo.org>
Link: https://msgid.link/20240605152218.236061-1-concord@gentoo.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-06-14 09:08:41 +02:00
Linus Torvalds d20f6b3d74 Including fixes from bluetooth and netfilter.
Current release - regressions:
 
  - Revert "igc: fix a log entry using uninitialized netdev",
    it traded lack of netdev name in a printk() for a crash
 
 Previous releases - regressions:
 
  - Bluetooth: L2CAP: fix rejecting L2CAP_CONN_PARAM_UPDATE_REQ
 
  - geneve: fix incorrectly setting lengths of inner headers in the skb,
    confusing the drivers and causing mangled packets
 
  - sched: initialize noop_qdisc owner to avoid false-positive recursion
    detection (recursing on CPU 0), which bubbles up to user space as
    a sendmsg() error, while noop_qdisc should silently drop
 
  - netdevsim: fix backwards compatibility in nsim_get_iflink()
 
 Previous releases - always broken:
 
  - netfilter: ipset: fix race between namespace cleanup and gc
    in the list:set type
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmZrFjoACgkQMUZtbf5S
 Iru6Bw/+MfomIf6qvdCXKdka4eOeqZLg7gZU0UdC99VM1SH7QGazkAvj4ACbDMa7
 04mgNZKquV5Fx6AJQwjAodzHx2KUl5WA5cWzAuLyA78lJXoipI7W+KRtcBzGl0gs
 IQ+IQCofWjduLMc9y67TqTSnVhtDWaHWw6PwMW8Z4BotD9hXxoUeGXz373UA8xhW
 2Wz1HkQbDqIFqc0Sp1c0IfAQtnzzvg4yC+KCV+2nHB/d8CAlCUJ6deVWbCtF8d5O
 /ospqFykzkENbYh8ySMEs6bAH0mS2nMiLPRnoLW1b2vMQWgOwv8xYVaYHI5tP+7u
 NxMZd4JQntBLhe8jV3sc6ciPnlPSDu6rNDwWJcvK26EHPXYg/opsihH18nMu1esO
 fp//KvKz8BT4vrkAW+YpxaD86V1X0dKkPIr2qFQ3eMHF8A1p+lYcGiWd1BQNPj5A
 HHX1ERTVHxyl1nH2wy0FHhPXt1k5SzUT9AS0PyBou14stwN1O8VHHmGrTbu+CHe5
 /P1jJ9DNDGO6LdDr60W9r+ucyvGYGxoZe09NQOiBXYnJbb1Xq5Allh+d6O+oyT0y
 kM1jsPt2360nF2TZ8lMpn+R+OfTdOaQMw5nHXd+XFX0VktQ/231vW9L/dRfcOt6C
 ESuaDHz0Q1DE8PI/dfrxRQLDG7UckN27aTHdn+ZHkq4VjdUPUdk=
 =cyRR
 -----END PGP SIGNATURE-----

Merge tag 'net-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from bluetooth and netfilter.

  Slim pickings this time, probably a combination of summer, DevConf.cz,
  and the end of first half of the year at corporations.

  Current release - regressions:

   - Revert "igc: fix a log entry using uninitialized netdev", it traded
     lack of netdev name in a printk() for a crash

  Previous releases - regressions:

   - Bluetooth: L2CAP: fix rejecting L2CAP_CONN_PARAM_UPDATE_REQ

   - geneve: fix incorrectly setting lengths of inner headers in the
     skb, confusing the drivers and causing mangled packets

   - sched: initialize noop_qdisc owner to avoid false-positive
     recursion detection (recursing on CPU 0), which bubbles up to user
     space as a sendmsg() error, while noop_qdisc should silently drop

   - netdevsim: fix backwards compatibility in nsim_get_iflink()

  Previous releases - always broken:

   - netfilter: ipset: fix race between namespace cleanup and gc in the
     list:set type"

* tag 'net-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (35 commits)
  bnxt_en: Adjust logging of firmware messages in case of released token in __hwrm_send()
  af_unix: Read with MSG_PEEK loops if the first unread byte is OOB
  bnxt_en: Cap the size of HWRM_PORT_PHY_QCFG forwarded response
  gve: Clear napi->skb before dev_kfree_skb_any()
  ionic: fix use after netif_napi_del()
  Revert "igc: fix a log entry using uninitialized netdev"
  net: bridge: mst: fix suspicious rcu usage in br_mst_set_state
  net: bridge: mst: pass vlan group directly to br_mst_vlan_set_state
  net/ipv6: Fix the RT cache flush via sysctl using a previous delay
  net: stmmac: replace priv->speed with the portTransmitRate from the tc-cbs parameters
  gve: ignore nonrelevant GSO type bits when processing TSO headers
  net: pse-pd: Use EOPNOTSUPP error code instead of ENOTSUPP
  netfilter: Use flowlabel flow key when re-routing mangled packets
  netfilter: ipset: Fix race between namespace cleanup and gc in the list:set type
  netfilter: nft_inner: validate mandatory meta and payload
  tcp: use signed arithmetic in tcp_rtx_probe0_timed_out()
  mailmap: map Geliang's new email address
  mptcp: pm: update add_addr counters after connect
  mptcp: pm: inc RmAddr MIB counter once per RM_ADDR ID
  mptcp: ensure snd_una is properly initialized on connect
  ...
2024-06-13 11:11:53 -07:00
Linus Torvalds fd88e181d8 NFS client bugfixes for Linux 6.10
Bugfixes:
  - NFSv4.2: Fix a memory leak in nfs4_set_security_label
  - NFSv2/v3: abort nfs_atomic_open_v23 if the name is too long.
  - NFS: Add appropriate memory barriers to the sillyrename code
  - Propagate readlink errors in nfs_symlink_filler
  - NFS: don't invalidate dentries on transient errors
  - NFS: fix unnecessary synchronous writes in random write workloads
  - NFSv4.1: enforce rootpath check when deciding whether or not to trunk
 
 Other:
  - Change email address for Trond Myklebust due to email server concerns
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAmZrD7kACgkQZwvnipYK
 APJo/RAAlg5uU8kTzXCbUbe9LImF5nh+9XFtg6nnQ+rxQqCU17noT0bazghvBcDP
 N6v4evWJVeZhnqspVZkdMQWeyNEqsew5uPRoC+gy0sh4RdwT+BHsMwLMWtNTzXoc
 GW7DOJ7LePzxmh0bksIFn6vmsuhxyI7hKkDx8XuG0YjmHQDcl2TeyHLfii7TFIMP
 hFEVw63ZFb+HKV0oInyP27iiM1HstvZ8MbxLcu1PoA3IaiNUYXUgBViWF2c5P6uY
 KV7KynUMgkWQc289aaR7QE5Yz2f4vsYF4oD72+Z3v65W+5HunYut/HUnUGgjHPGq
 dI5EwSgxW5YKoo/kIvto3yF+ppkppl2gUYFlN3+O/IEXwh+FTXBF2b/tUWFkKQPH
 7X3YoosWV/WN1eWqa55znF5IzrG5gdR5z6Et1elTmn4xG4hwoC5U5lOP34DohS4Q
 N/MMxzVcL348j2teN+dFNXM3WkEVaMJudavJ7A7KehZKSTZAuFNHDxsMjvyGBbGI
 Za1DanWSCWBD8Bawt9hB8z+k4eN7dGfeWUgMHa2zxOowfeq0MYTjAlr1A8SuXADv
 E+1tjL7CL7HeHReOg0cbP11BxLOlXYiObsbivFcbYGglbWwtPqR4q4+CIA6AuhCF
 LSpxF/6uKCXUHCuuGdAgZ5nNApHvC+HoUN0gCBcpIALcazLq0TQ=
 =DeAG
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-6.10-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client fixes from Trond Myklebust:
 "Bugfixes:
   - NFSv4.2: Fix a memory leak in nfs4_set_security_label
   - NFSv2/v3: abort nfs_atomic_open_v23 if the name is too long.
   - NFS: Add appropriate memory barriers to the sillyrename code
   - Propagate readlink errors in nfs_symlink_filler
   - NFS: don't invalidate dentries on transient errors
   - NFS: fix unnecessary synchronous writes in random write workloads
   - NFSv4.1: enforce rootpath check when deciding whether or not to trunk

  Other:
   - Change email address for Trond Myklebust due to email server concerns"

* tag 'nfs-for-6.10-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFS: add barriers when testing for NFS_FSDATA_BLOCKED
  SUNRPC: return proper error from gss_wrap_req_priv
  NFSv4.1 enforce rootpath check in fs_location query
  NFS: abort nfs_atomic_open_v23 if name is too long.
  nfs: don't invalidate dentries on transient errors
  nfs: Avoid flushing many pages with NFS_FILE_SYNC
  nfs: propagate readlink errors in nfs_symlink_filler
  MAINTAINERS: Change email address for Trond Myklebust
  NFSv4: Fix memory leak in nfs4_set_security_label
2024-06-13 11:07:32 -07:00
Rao Shoaib a6736a0add af_unix: Read with MSG_PEEK loops if the first unread byte is OOB
Read with MSG_PEEK flag loops if the first byte to read is an OOB byte.
commit 22dd70eb2c ("af_unix: Don't peek OOB data without MSG_OOB.")
addresses the loop issue but does not address the issue that no data
beyond OOB byte can be read.

>>> from socket import *
>>> c1, c2 = socketpair(AF_UNIX, SOCK_STREAM)
>>> c1.send(b'a', MSG_OOB)
1
>>> c1.send(b'b')
1
>>> c2.recv(1, MSG_PEEK | MSG_DONTWAIT)
b'b'

>>> from socket import *
>>> c1, c2 = socketpair(AF_UNIX, SOCK_STREAM)
>>> c2.setsockopt(SOL_SOCKET, SO_OOBINLINE, 1)
>>> c1.send(b'a', MSG_OOB)
1
>>> c1.send(b'b')
1
>>> c2.recv(1, MSG_PEEK | MSG_DONTWAIT)
b'a'
>>> c2.recv(1, MSG_PEEK | MSG_DONTWAIT)
b'a'
>>> c2.recv(1, MSG_DONTWAIT)
b'a'
>>> c2.recv(1, MSG_PEEK | MSG_DONTWAIT)
b'b'
>>>

Fixes: 314001f0bf ("af_unix: Add OOB support")
Signed-off-by: Rao Shoaib <Rao.Shoaib@oracle.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20240611084639.2248934-1-Rao.Shoaib@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-13 08:03:55 -07:00
Nikolay Aleksandrov 546ceb1dfd net: bridge: mst: fix suspicious rcu usage in br_mst_set_state
I converted br_mst_set_state to RCU to avoid a vlan use-after-free
but forgot to change the vlan group dereference helper. Switch to vlan
group RCU deref helper to fix the suspicious rcu usage warning.

Fixes: 3a7c1661ae ("net: bridge: mst: fix vlan use-after-free")
Reported-by: syzbot+9bbe2de1bc9d470eb5fe@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=9bbe2de1bc9d470eb5fe
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/r/20240609103654.914987-3-razor@blackwall.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-12 18:24:24 -07:00
Nikolay Aleksandrov 36c92936e8 net: bridge: mst: pass vlan group directly to br_mst_vlan_set_state
Pass the already obtained vlan group pointer to br_mst_vlan_set_state()
instead of dereferencing it again. Each caller has already correctly
dereferenced it for their context. This change is required for the
following suspicious RCU dereference fix. No functional changes
intended.

Fixes: 3a7c1661ae ("net: bridge: mst: fix vlan use-after-free")
Reported-by: syzbot+9bbe2de1bc9d470eb5fe@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=9bbe2de1bc9d470eb5fe
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/r/20240609103654.914987-2-razor@blackwall.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-12 18:24:24 -07:00
Petr Pavlu 14a20e5b4a net/ipv6: Fix the RT cache flush via sysctl using a previous delay
The net.ipv6.route.flush system parameter takes a value which specifies
a delay used during the flush operation for aging exception routes. The
written value is however not used in the currently requested flush and
instead utilized only in the next one.

A problem is that ipv6_sysctl_rtcache_flush() first reads the old value
of net->ipv6.sysctl.flush_delay into a local delay variable and then
calls proc_dointvec() which actually updates the sysctl based on the
provided input.

Fix the problem by switching the order of the two operations.

Fixes: 4990509f19 ("[NETNS][IPV6]: Make sysctls route per namespace.")
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240607112828.30285-1-petr.pavlu@suse.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-12 17:51:35 -07:00
Jakub Kicinski d92589f8fd netfilter pull request 24-06-11
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEN9lkrMBJgcdVAPub1V2XiooUIOQFAmZoxygACgkQ1V2XiooU
 IOSznA/+IofQMauyDxkynJAVVQSPH0Ee8kbeGj0jJWVTZVtLejZ/nLenGbsVGL16
 SPv6QbiHIRCCbJ/Py+LlkXJ8iXF0ka2bu8p5IYQ2ic949JSay32XAaK4EZdQTtTS
 3gx3cM2PKIYY4yi9K0qDpMsC7ZyKjAlVVjOrBPiP8viYtXcgoVbMq1/E8kAUf/G1
 YB+occVHk6NDY6MJE+ooYwhssv4qSaPNHNHmQDQFHhQ4cdVKzT93do55Hu4/K7DN
 6LlXAW+YpVeSmopwARrc+FmchwxyIoSQsh26Yn2Q/LiE9kQGeLQ34+nZjtmp/4aD
 MkThTSk+ImJO1kuLibC2m84bg94c0dfDk/p5Gbcr1l0DxqOseYOz1hfnNTvfcBE8
 4m30+jdtFabDy2pJKIEC730+UW5TrH+0f8HWazGOYJxvJOpip/ZiVjzcp57EybJS
 y3GxExVWGab3hl+w2wfFfM5uHWosfXfJzKjdKMJNswpcZ89QR4/GqAZAvPMh8ljW
 dxhwcRQ6IEy6B6yVnQ9dq9W9aGfsojIHqcK3HXdp3xvSnW2ZcRD89EL+nw1xfRzw
 gl/89/EnTjKywCAJ1XhKh/WwFT5r+b8RGqDTH4aeT4OBrR0v2/Z9RUmXW9wF+C0t
 nll1UlwdZqE0E0VvhhvwGGgF0Fr8nfLC1taIE0YW1/u9DxWleHs=
 =yxF0
 -----END PGP SIGNATURE-----

Merge tag 'nf-24-06-11' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

Patch #1 fixes insufficient sanitization of netlink attributes for the
	 inner expression which can trigger nul-pointer dereference,
	 from Davide Ornaghi.

Patch #2 address a report that there is a race condition between
         namespace cleanup and the garbage collection of the list:set
         type. This patch resolves this issue with other minor issues
	 as well, from Jozsef Kadlecsik.

Patch #3 ip6_route_me_harder() ignores flowlabel/dsfield when ip dscp
	 has been mangled, this unbreaks ip6 dscp set $v,
	 from Florian Westphal.

All of these patches address issues that are present in several releases.

* tag 'nf-24-06-11' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: Use flowlabel flow key when re-routing mangled packets
  netfilter: ipset: Fix race between namespace cleanup and gc in the list:set type
  netfilter: nft_inner: validate mandatory meta and payload
====================

Link: https://lore.kernel.org/r/20240611220323.413713-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-12 16:29:00 -07:00
Remi Pommarel 7d09e17c04 wifi: mac80211: Recalc offload when monitor stop
When a monitor interface is started, ieee80211_recalc_offload() is
called and 802.11 encapsulation offloading support get disabled so
monitor interface could get native wifi frames directly. But when
this interface is stopped there is no need to keep the 802.11
encpasulation offloading off.

This call ieee80211_recalc_offload() when monitor interface is stopped
so 802.11 encapsulation offloading gets re-activated if possible.

Fixes: 6aea26ce5a ("mac80211: rework tx encapsulation offload API")
Signed-off-by: Remi Pommarel <repk@triplefau.lt>
Link: https://msgid.link/840baab454f83718e6e16fd836ac597d924e85b9.1716048326.git.repk@triplefau.lt
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-06-12 12:29:25 +02:00
Dmitry Antipov 6ef09cdc5b wifi: cfg80211: wext: add extra SIOCSIWSCAN data check
In 'cfg80211_wext_siwscan()', add extra check whether number of
channels passed via 'ioctl(sock, SIOCSIWSCAN, ...)' doesn't exceed
IW_MAX_FREQUENCIES and reject invalid request with -EINVAL otherwise.

Reported-by: syzbot+253cd2d2491df77c93ac@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=253cd2d2491df77c93ac
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Link: https://msgid.link/20240531032010.451295-1-dmantipov@yandex.ru
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-06-12 10:07:56 +02:00
Johannes Berg 0941772342 wifi: cfg80211: wext: set ssids=NULL for passive scans
In nl80211, we always set the ssids of a scan request to
NULL when n_ssids==0 (passive scan). Drivers have relied
on this behaviour in the past, so we fixed it in 6 GHz
scan requests as well, and added a warning so we'd have
assurance the API would always be called that way.

syzbot found that wext doesn't ensure that, so we reach
the check and trigger the warning. Fix the wext code to
set the ssids pointer to NULL when there are none.

Reported-by: syzbot+cd6135193ba6bb9ad158@syzkaller.appspotmail.com
Fixes: f7a8b10bfd ("wifi: cfg80211: fix 6 GHz scan request building")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-06-12 10:06:58 +02:00
Jakub Kicinski f6b2f578df bluetooth pull request for net:
- hci_sync: Fix not using correct handle
  - L2CAP: Fix rejecting L2CAP_CONN_PARAM_UPDATE_REQ
  - L2CAP: fix connection setup in l2cap_connect
 -----BEGIN PGP SIGNATURE-----
 
 iQJNBAABCAA3FiEE7E6oRXp8w05ovYr/9JCA4xAyCykFAmZnBfgZHGx1aXoudm9u
 LmRlbnR6QGludGVsLmNvbQAKCRD0kIDjEDILKXIHD/45N5EpMuwOkHMeXDO9FnNa
 LKAD2t1weNDd8SOqu96jmyxxfLwkU2b/mrSkn5gM+y8g6NNrCH1B+Pp7yRzZE1C4
 nMoxTUM26YY4AoznmWb56JgpTfXMsjUMAan6hZUnRdSKtycUOArDJdFwjACRwIkp
 KtpaUkRPq70kAV/OOtaB5aCn5GuCDIg10vyFhTD8ieP7d1DcRKyw2CJcnu+9s9ao
 KkvTiEICPGzmHOmOw5QXIUr5MUuhHfGsbk3oEE7dAgNd0E6yOgPa6Cs7KfNz3x9i
 gyCjJm9vOIr6xozCwIpwUhlPVbvH1aKY0LECLacK7vUYFTrcE6IimGi6HZcSqfsP
 G/WHMMEMJkLG53FeHVSSX9zzWN8Rvw0hBorW4EZsQm0RSbzhxQInMda05nSTJCos
 UuNSQCTTynkt27q/UqJj/NGAwkCL5f86UnrDk1Jb9O6r4QKlcQTxS40FSqXpy5nY
 6scPyagLdIEWuWM9eUTqtHP0Ax8ROHfwGvGRONFvhTZmRAEcIU3VtwkXV2OGLVhy
 E+yZTsW3EUnKRQRL30/N2KdTIBgH6V6stn5Bf0mX5pFoykNxSf+scBtvEcRDyUKu
 3pc47Q/iypzfa/VFDuXAXhx+IgKp+7d/cPpH+6MTpSGtZbyhkeK0eX112kPijF15
 sRbbDuQYZg11aVPThMhC0A==
 =mmLZ
 -----END PGP SIGNATURE-----

Merge tag 'for-net-2024-06-10' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth

Luiz Augusto von Dentz says:

====================
bluetooth pull request for net:

 - hci_sync: fix not using correct handle
 - L2CAP: fix rejecting L2CAP_CONN_PARAM_UPDATE_REQ
 - L2CAP: fix connection setup in l2cap_connect

* tag 'for-net-2024-06-10' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
  Bluetooth: fix connection setup in l2cap_connect
  Bluetooth: L2CAP: Fix rejecting L2CAP_CONN_PARAM_UPDATE_REQ
  Bluetooth: hci_sync: Fix not using correct handle
====================

Link: https://lore.kernel.org/r/20240610135803.920662-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-11 19:40:27 -07:00
Florian Westphal 6f8f132cc7 netfilter: Use flowlabel flow key when re-routing mangled packets
'ip6 dscp set $v' in an nftables outpute route chain has no effect.
While nftables does detect the dscp change and calls the reroute hook.
But ip6_route_me_harder never sets the dscp/flowlabel:
flowlabel/dsfield routing rules are ignored and no reroute takes place.

Thanks to Yi Chen for an excellent reproducer script that I used
to validate this change.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reported-by: Yi Chen <yiche@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-06-11 18:46:04 +02:00
Jozsef Kadlecsik 4e7aaa6b82 netfilter: ipset: Fix race between namespace cleanup and gc in the list:set type
Lion Ackermann reported that there is a race condition between namespace cleanup
in ipset and the garbage collection of the list:set type. The namespace
cleanup can destroy the list:set type of sets while the gc of the set type is
waiting to run in rcu cleanup. The latter uses data from the destroyed set which
thus leads use after free. The patch contains the following parts:

- When destroying all sets, first remove the garbage collectors, then wait
  if needed and then destroy the sets.
- Fix the badly ordered "wait then remove gc" for the destroy a single set
  case.
- Fix the missing rcu locking in the list:set type in the userspace test
  case.
- Use proper RCU list handlings in the list:set type.

The patch depends on c1193d9bbb (netfilter: ipset: Add list flush to cancel_gc).

Fixes: 97f7cf1cd8 (netfilter: ipset: fix performance regression in swap operation)
Reported-by: Lion Ackermann <nnamrec@gmail.com>
Tested-by: Lion Ackermann <nnamrec@gmail.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-06-11 18:46:04 +02:00
Davide Ornaghi c4ab9da85b netfilter: nft_inner: validate mandatory meta and payload
Check for mandatory netlink attributes in payload and meta expression
when used embedded from the inner expression, otherwise NULL pointer
dereference is possible from userspace.

Fixes: a150d122b6 ("netfilter: nft_meta: add inner match support")
Fixes: 3a07327d10 ("netfilter: nft_inner: support for inner tunnel header matching")
Signed-off-by: Davide Ornaghi <d.ornaghi97@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-06-11 18:46:04 +02:00
Eric Dumazet 36534d3c54 tcp: use signed arithmetic in tcp_rtx_probe0_timed_out()
Due to timer wheel implementation, a timer will usually fire
after its schedule.

For instance, for HZ=1000, a timeout between 512ms and 4s
has a granularity of 64ms.
For this range of values, the extra delay could be up to 63ms.

For TCP, this means that tp->rcv_tstamp may be after
inet_csk(sk)->icsk_timeout whenever the timer interrupt
finally triggers, if one packet came during the extra delay.

We need to make sure tcp_rtx_probe0_timed_out() handles this case.

Fixes: e89688e3e9 ("net: tcp: fix unexcepted socket die when snd_wnd is 0")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Menglong Dong <imagedong@tencent.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://lore.kernel.org/r/20240607125652.1472540-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-10 19:50:10 -07:00
YonglongLi 40eec1795c mptcp: pm: update add_addr counters after connect
The creation of new subflows can fail for different reasons. If no
subflow have been created using the received ADD_ADDR, the related
counters should not be updated, otherwise they will never be decremented
for events related to this ID later on.

For the moment, the number of accepted ADD_ADDR is only decremented upon
the reception of a related RM_ADDR, and only if the remote address ID is
currently being used by at least one subflow. In other words, if no
subflow can be created with the received address, the counter will not
be decremented. In this case, it is then important not to increment
pm.add_addr_accepted counter, and not to modify pm.accept_addr bit.

Note that this patch does not modify the behaviour in case of failures
later on, e.g. if the MP Join is dropped or rejected.

The "remove invalid addresses" MP Join subtest has been modified to
validate this case. The broadcast IP address is added before the "valid"
address that will be used to successfully create a subflow, and the
limit is decreased by one: without this patch, it was not possible to
create the last subflow, because:

- the broadcast address would have been accepted even if it was not
  usable: the creation of a subflow to this address results in an error,

- the limit of 2 accepted ADD_ADDR would have then been reached.

Fixes: 01cacb00b3 ("mptcp: add netlink-based PM")
Cc: stable@vger.kernel.org
Co-developed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: YonglongLi <liyonglong@chinatelecom.cn>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240607-upstream-net-20240607-misc-fixes-v1-3-1ab9ddfa3d00@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-10 19:49:10 -07:00
YonglongLi 6a09788c1a mptcp: pm: inc RmAddr MIB counter once per RM_ADDR ID
The RmAddr MIB counter is supposed to be incremented once when a valid
RM_ADDR has been received. Before this patch, it could have been
incremented as many times as the number of subflows connected to the
linked address ID, so it could have been 0, 1 or more than 1.

The "RmSubflow" is incremented after a local operation. In this case,
it is normal to tied it with the number of subflows that have been
actually removed.

The "remove invalid addresses" MP Join subtest has been modified to
validate this case. A broadcast IP address is now used instead: the
client will not be able to create a subflow to this address. The
consequence is that when receiving the RM_ADDR with the ID attached to
this broadcast IP address, no subflow linked to this ID will be found.

Fixes: 7a7e52e38a ("mptcp: add RM_ADDR related mibs")
Cc: stable@vger.kernel.org
Co-developed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: YonglongLi <liyonglong@chinatelecom.cn>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240607-upstream-net-20240607-misc-fixes-v1-2-1ab9ddfa3d00@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-10 19:49:10 -07:00
Paolo Abeni 8031b58c3a mptcp: ensure snd_una is properly initialized on connect
This is strictly related to commit fb7a0d3348 ("mptcp: ensure snd_nxt
is properly initialized on connect"). It turns out that syzkaller can
trigger the retransmit after fallback and before processing any other
incoming packet - so that snd_una is still left uninitialized.

Address the issue explicitly initializing snd_una together with snd_nxt
and write_seq.

Suggested-by: Mat Martineau <martineau@kernel.org>
Fixes: 8fd738049a ("mptcp: fallback in case of simultaneous connect")
Cc: stable@vger.kernel.org
Reported-by: Christoph Paasch <cpaasch@apple.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/485
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240607-upstream-net-20240607-misc-fixes-v1-1-1ab9ddfa3d00@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-10 19:49:10 -07:00
Johannes Berg 44180feacc net/sched: initialize noop_qdisc owner
When the noop_qdisc owner isn't initialized, then it will be 0,
so packets will erroneously be regarded as having been subject
to recursion as long as only CPU 0 queues them. For non-SMP,
that's all packets, of course. This causes a change in what's
reported to userspace, normally noop_qdisc would drop packets
silently, but with this change the syscall returns -ENOBUFS if
RECVERR is also set on the socket.

Fix this by initializing the owner field to -1, just like it
would be for dynamically allocated qdiscs by qdisc_alloc().

Fixes: 0f022d32c3 ("net/sched: Fix mirred deadlock on device recursion")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240607175340.786bfb938803.I493bf8422e36be4454c08880a8d3703cea8e421a@changeid
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-10 19:36:49 -07:00
Pauli Virtanen c695439d19 Bluetooth: fix connection setup in l2cap_connect
The amp_id argument of l2cap_connect() was removed in
commit 84a4bb6548 ("Bluetooth: HCI: Remove HCI_AMP support")

It was always called with amp_id == 0, i.e. AMP_ID_BREDR == 0x00 (ie.
non-AMP controller).  In the above commit, the code path for amp_id != 0
was preserved, although it should have used the amp_id == 0 one.

Restore the previous behavior of the non-AMP code path, to fix problems
with L2CAP connections.

Fixes: 84a4bb6548 ("Bluetooth: HCI: Remove HCI_AMP support")
Signed-off-by: Pauli Virtanen <pav@iki.fi>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-06-10 09:48:30 -04:00
Luiz Augusto von Dentz 806a5198c0 Bluetooth: L2CAP: Fix rejecting L2CAP_CONN_PARAM_UPDATE_REQ
This removes the bogus check for max > hcon->le_conn_max_interval since
the later is just the initial maximum conn interval not the maximum the
stack could support which is really 3200=4000ms.

In order to pass GAP/CONN/CPUP/BV-05-C one shall probably enter values
of the following fields in IXIT that would cause hci_check_conn_params
to fail:

TSPX_conn_update_int_min
TSPX_conn_update_int_max
TSPX_conn_update_peripheral_latency
TSPX_conn_update_supervision_timeout

Link: https://github.com/bluez/bluez/issues/847
Fixes: e4b019515f ("Bluetooth: Enforce validation on max value of connection interval")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-06-10 09:48:27 -04:00
Luiz Augusto von Dentz 86fbd9f63a Bluetooth: hci_sync: Fix not using correct handle
When setting up an advertisement the code shall always attempt to use
the handle set by the instance since it may not be equal to the instance
ID.

Fixes: e77f43d531 ("Bluetooth: hci_core: Fix not handling hdev->le_num_of_adv_sets=1")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-06-10 09:48:25 -04:00
Eric Dumazet d37fe4255a tcp: fix race in tcp_v6_syn_recv_sock()
tcp_v6_syn_recv_sock() calls ip6_dst_store() before
inet_sk(newsk)->pinet6 has been set up.

This means ip6_dst_store() writes over the parent (listener)
np->dst_cookie.

This is racy because multiple threads could share the same
parent and their final np->dst_cookie could be wrong.

Move ip6_dst_store() call after inet_sk(newsk)->pinet6
has been changed and after the copy of parent ipv6_pinfo.

Fixes: e994b2f0fb ("tcp: do not lock listener to process SYN packets")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-10 13:11:22 +01:00
Linus Torvalds eecba7c070 nfsd-6.10 fixes:
- Fix an occasional memory overwrite caused by a fix added in 6.10
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmZjB6EACgkQM2qzM29m
 f5cPXBAAstE7c7QAYCQtAwK/C/rsbr8xjAxcjivwAEOFl8znKc4ZTJ3u2tVSgAqI
 DSbjXK/JVOW10ahDLawqpnWuqTUHtGlK4CMVRdZkmfEqokulhny+FmelJ4B0ZWMQ
 fUSSdYO4ONuMRcytnFperqI0LL7XuEnSsKnE1cQ4HIybQchJyilj9c/TERIVGWzn
 eHcVke07JF+lMvHWWJoDCcHhHHtl82znDPpmrsyzOLKS+jtiaP41r2ZPtohhoPDm
 9VQmTqpSXWyexXpzgQzyu4TOWz6PPakoa701x7pW/2wZgzabdRaJhkYea64Bc8GK
 RTfr7TpSKkcnttQSb2jbb2+qG15g+f7nDbWtejqWuhEppwdj/EznuyrHi80N5Klf
 pw4+XFU8HoC15jcdltMb2Jecbv6uulQ0AcfCOLUh61E86tiy7YDChmhRLgDhzRsS
 jDvq8HxRe1CNmqoxfqIuWuWiMwyCRSSd06U7u3/eqBZeLXOxQb26bxG8GXitUPgY
 G4SCCSvExs2YqnegsZNQGB0L5YKd/zhhnkoc/Y29CU9ZhoItgHlblKODsix9kVJl
 ckN/qSZS5/rrV7IsvzsB0ywNb5VkfnkspHu7X8tlYBuwgnTHJCp25AuWVt31/6jH
 E+tW4wdvd1oBTLqpWRewclTwUrmvJYVYirS01LiTVehx/mnJdw8=
 =NvAo
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fix from Chuck Lever:

 - Fix an occasional memory overwrite caused by a fix added in 6.10

* tag 'nfsd-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  SUNRPC: Fix loop termination condition in gss_free_in_token_pages()
2024-06-07 15:07:57 -07:00
Su Hui 0dcc53abf5 net: ethtool: fix the error condition in ethtool_get_phy_stats_ethtool()
Clang static checker (scan-build) warning:
net/ethtool/ioctl.c:line 2233, column 2
Called function pointer is null (null dereference).

Return '-EOPNOTSUPP' when 'ops->get_ethtool_phy_stats' is NULL to fix
this typo error.

Fixes: 201ed315f9 ("net/ethtool/ioctl: split ethtool_get_phy_stats into multiple helpers")
Signed-off-by: Su Hui <suhui@nfschina.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Hariprasad Kelam <hkelam@marvell.com>
Link: https://lore.kernel.org/r/20240605034742.921751-1-suhui@nfschina.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-06 13:34:33 +02:00
Eric Dumazet b01e1c0307 ipv6: fix possible race in __fib6_drop_pcpu_from()
syzbot found a race in __fib6_drop_pcpu_from() [1]

If compiler reads more than once (*ppcpu_rt),
second read could read NULL, if another cpu clears
the value in rt6_get_pcpu_route().

Add a READ_ONCE() to prevent this race.

Also add rcu_read_lock()/rcu_read_unlock() because
we rely on RCU protection while dereferencing pcpu_rt.

[1]

Oops: general protection fault, probably for non-canonical address 0xdffffc0000000012: 0000 [#1] PREEMPT SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000090-0x0000000000000097]
CPU: 0 PID: 7543 Comm: kworker/u8:17 Not tainted 6.10.0-rc1-syzkaller-00013-g2bfcfd584ff5 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/02/2024
Workqueue: netns cleanup_net
 RIP: 0010:__fib6_drop_pcpu_from.part.0+0x10a/0x370 net/ipv6/ip6_fib.c:984
Code: f8 48 c1 e8 03 80 3c 28 00 0f 85 16 02 00 00 4d 8b 3f 4d 85 ff 74 31 e8 74 a7 fa f7 49 8d bf 90 00 00 00 48 89 f8 48 c1 e8 03 <80> 3c 28 00 0f 85 1e 02 00 00 49 8b 87 90 00 00 00 48 8b 0c 24 48
RSP: 0018:ffffc900040df070 EFLAGS: 00010206
RAX: 0000000000000012 RBX: 0000000000000001 RCX: ffffffff89932e16
RDX: ffff888049dd1e00 RSI: ffffffff89932d7c RDI: 0000000000000091
RBP: dffffc0000000000 R08: 0000000000000005 R09: 0000000000000007
R10: 0000000000000001 R11: 0000000000000006 R12: ffff88807fa080b8
R13: fffffbfff1a9a07d R14: ffffed100ff41022 R15: 0000000000000001
FS:  0000000000000000(0000) GS:ffff8880b9200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b32c26000 CR3: 000000005d56e000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
  __fib6_drop_pcpu_from net/ipv6/ip6_fib.c:966 [inline]
  fib6_drop_pcpu_from net/ipv6/ip6_fib.c:1027 [inline]
  fib6_purge_rt+0x7f2/0x9f0 net/ipv6/ip6_fib.c:1038
  fib6_del_route net/ipv6/ip6_fib.c:1998 [inline]
  fib6_del+0xa70/0x17b0 net/ipv6/ip6_fib.c:2043
  fib6_clean_node+0x426/0x5b0 net/ipv6/ip6_fib.c:2205
  fib6_walk_continue+0x44f/0x8d0 net/ipv6/ip6_fib.c:2127
  fib6_walk+0x182/0x370 net/ipv6/ip6_fib.c:2175
  fib6_clean_tree+0xd7/0x120 net/ipv6/ip6_fib.c:2255
  __fib6_clean_all+0x100/0x2d0 net/ipv6/ip6_fib.c:2271
  rt6_sync_down_dev net/ipv6/route.c:4906 [inline]
  rt6_disable_ip+0x7ed/0xa00 net/ipv6/route.c:4911
  addrconf_ifdown.isra.0+0x117/0x1b40 net/ipv6/addrconf.c:3855
  addrconf_notify+0x223/0x19e0 net/ipv6/addrconf.c:3778
  notifier_call_chain+0xb9/0x410 kernel/notifier.c:93
  call_netdevice_notifiers_info+0xbe/0x140 net/core/dev.c:1992
  call_netdevice_notifiers_extack net/core/dev.c:2030 [inline]
  call_netdevice_notifiers net/core/dev.c:2044 [inline]
  dev_close_many+0x333/0x6a0 net/core/dev.c:1585
  unregister_netdevice_many_notify+0x46d/0x19f0 net/core/dev.c:11193
  unregister_netdevice_many net/core/dev.c:11276 [inline]
  default_device_exit_batch+0x85b/0xae0 net/core/dev.c:11759
  ops_exit_list+0x128/0x180 net/core/net_namespace.c:178
  cleanup_net+0x5b7/0xbf0 net/core/net_namespace.c:640
  process_one_work+0x9fb/0x1b60 kernel/workqueue.c:3231
  process_scheduled_works kernel/workqueue.c:3312 [inline]
  worker_thread+0x6c8/0xf70 kernel/workqueue.c:3393
  kthread+0x2c1/0x3a0 kernel/kthread.c:389
  ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

Fixes: d52d3997f8 ("ipv6: Create percpu rt6_info")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/r/20240604193549.981839-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-06 13:05:54 +02:00
Kuniyuki Iwashima efaf24e30e af_unix: Annotate data-race of sk->sk_shutdown in sk_diag_fill().
While dumping sockets via UNIX_DIAG, we do not hold unix_state_lock().

Let's use READ_ONCE() to read sk->sk_shutdown.

Fixes: e4e541a848 ("sock-diag: Report shutdown for inet and unix sockets (v2)")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-06 12:57:15 +02:00
Kuniyuki Iwashima 5d915e584d af_unix: Use skb_queue_len_lockless() in sk_diag_show_rqlen().
We can dump the socket queue length via UNIX_DIAG by specifying
UDIAG_SHOW_RQLEN.

If sk->sk_state is TCP_LISTEN, we return the recv queue length,
but here we do not hold recvq lock.

Let's use skb_queue_len_lockless() in sk_diag_show_rqlen().

Fixes: c9da99e647 ("unix_diag: Fixup RQLEN extension report")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-06 12:57:15 +02:00
Kuniyuki Iwashima 83690b82d2 af_unix: Use skb_queue_empty_lockless() in unix_release_sock().
If the socket type is SOCK_STREAM or SOCK_SEQPACKET, unix_release_sock()
checks the length of the peer socket's recvq under unix_state_lock().

However, unix_stream_read_generic() calls skb_unlink() after releasing
the lock.  Also, for SOCK_SEQPACKET, __skb_try_recv_datagram() unlinks
skb without unix_state_lock().

Thues, unix_state_lock() does not protect qlen.

Let's use skb_queue_empty_lockless() in unix_release_sock().

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-06 12:57:15 +02:00
Kuniyuki Iwashima 45d872f0e6 af_unix: Use unix_recvq_full_lockless() in unix_stream_connect().
Once sk->sk_state is changed to TCP_LISTEN, it never changes.

unix_accept() takes advantage of this characteristics; it does not
hold the listener's unix_state_lock() and only acquires recvq lock
to pop one skb.

It means unix_state_lock() does not prevent the queue length from
changing in unix_stream_connect().

Thus, we need to use unix_recvq_full_lockless() to avoid data-race.

Now we remove unix_recvq_full() as no one uses it.

Note that we can remove READ_ONCE() for sk->sk_max_ack_backlog in
unix_recvq_full_lockless() because of the following reasons:

  (1) For SOCK_DGRAM, it is a written-once field in unix_create1()

  (2) For SOCK_STREAM and SOCK_SEQPACKET, it is changed under the
      listener's unix_state_lock() in unix_listen(), and we hold
      the lock in unix_stream_connect()

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-06 12:57:15 +02:00