linux/net
John Fastabend a454d84ee2 bpf, sockmap: Fix skb refcnt race after locking changes
There is a race where skb's from the sk_psock_backlog can be referenced
after userspace side has already skb_consumed() the sk_buff and its refcnt
dropped to zer0 causing use after free.

The flow is the following:

  while ((skb = skb_peek(&psock->ingress_skb))
    sk_psock_handle_Skb(psock, skb, ..., ingress)
    if (!ingress) ...
    sk_psock_skb_ingress
       sk_psock_skb_ingress_enqueue(skb)
          msg->skb = skb
          sk_psock_queue_msg(psock, msg)
    skb_dequeue(&psock->ingress_skb)

The sk_psock_queue_msg() puts the msg on the ingress_msg queue. This is
what the application reads when recvmsg() is called. An application can
read this anytime after the msg is placed on the queue. The recvmsg hook
will also read msg->skb and then after user space reads the msg will call
consume_skb(skb) on it effectively free'ing it.

But, the race is in above where backlog queue still has a reference to
the skb and calls skb_dequeue(). If the skb_dequeue happens after the
user reads and free's the skb we have a use after free.

The !ingress case does not suffer from this problem because it uses
sendmsg_*(sk, msg) which does not pass the sk_buff further down the
stack.

The following splat was observed with 'test_progs -t sockmap_listen':

  [ 1022.710250][ T2556] general protection fault, ...
  [...]
  [ 1022.712830][ T2556] Workqueue: events sk_psock_backlog
  [ 1022.713262][ T2556] RIP: 0010:skb_dequeue+0x4c/0x80
  [ 1022.713653][ T2556] Code: ...
  [...]
  [ 1022.720699][ T2556] Call Trace:
  [ 1022.720984][ T2556]  <TASK>
  [ 1022.721254][ T2556]  ? die_addr+0x32/0x80^M
  [ 1022.721589][ T2556]  ? exc_general_protection+0x25a/0x4b0
  [ 1022.722026][ T2556]  ? asm_exc_general_protection+0x22/0x30
  [ 1022.722489][ T2556]  ? skb_dequeue+0x4c/0x80
  [ 1022.722854][ T2556]  sk_psock_backlog+0x27a/0x300
  [ 1022.723243][ T2556]  process_one_work+0x2a7/0x5b0
  [ 1022.723633][ T2556]  worker_thread+0x4f/0x3a0
  [ 1022.723998][ T2556]  ? __pfx_worker_thread+0x10/0x10
  [ 1022.724386][ T2556]  kthread+0xfd/0x130
  [ 1022.724709][ T2556]  ? __pfx_kthread+0x10/0x10
  [ 1022.725066][ T2556]  ret_from_fork+0x2d/0x50
  [ 1022.725409][ T2556]  ? __pfx_kthread+0x10/0x10
  [ 1022.725799][ T2556]  ret_from_fork_asm+0x1b/0x30
  [ 1022.726201][ T2556]  </TASK>

To fix we add an skb_get() before passing the skb to be enqueued in the
engress queue. This bumps the skb->users refcnt so that consume_skb()
and kfree_skb will not immediately free the sk_buff. With this we can
be sure the skb is still around when we do the dequeue. Then we just
need to decrement the refcnt or free the skb in the backlog case which
we do by calling kfree_skb() on the ingress case as well as the sendmsg
case.

Before locking change from fixes tag we had the sock locked so we
couldn't race with user and there was no issue here.

Fixes: 799aa7f98d ("skmsg: Avoid lock_sock() in sk_psock_backlog()")
Reported-by: Jiri Olsa  <jolsa@kernel.org>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Xu Kuohai <xukuohai@huawei.com>
Tested-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20230901202137.214666-1-john.fastabend@gmail.com
2023-09-04 09:53:35 +02:00
..
6lowpan 6lowpan: Remove redundant initialisation. 2023-03-29 08:22:52 +01:00
9p net: annotate data-races around sock->ops 2023-08-09 15:32:43 -07:00
802
8021q Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-08-18 12:44:56 -07:00
appletalk sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
atm sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
ax25 sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
batman-adv Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-08-24 10:51:39 -07:00
bluetooth Bluetooth: HCI: Introduce HCI_QUIRK_BROKEN_LE_CODED 2023-08-24 12:23:46 -07:00
bpf bpf: Prevent inlining of bpf_fentry_test7() 2023-08-30 08:36:17 +02:00
bpfilter net: Use umd_cleanup_helper() 2023-05-31 13:06:57 +02:00
bridge netfilter: ebtables: fix fortify warnings in size_entry_mwt() 2023-08-22 15:13:20 +02:00
caif sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
can net: annotate data-races around sk->sk_tsflags 2023-09-01 07:27:33 +01:00
ceph libceph: fix potential hang in ceph_osdc_notify() 2023-08-02 09:07:34 +02:00
core bpf, sockmap: Fix skb refcnt race after locking changes 2023-09-04 09:53:35 +02:00
dcb net: dcb: choose correct policy to parse DCB_ATTR_BCN 2023-08-01 21:07:46 -07:00
dccp Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-08-29 07:44:56 +02:00
devlink devlink: move devlink_notify_register/unregister() to dev.c 2023-08-28 08:02:24 -07:00
dns_resolver
dsa net: dsa: mark parsed interface mode for legacy switch drivers 2023-08-09 13:08:09 -07:00
ethernet
ethtool ethtool: netlink: always pass genl_info to .prepare_data 2023-08-15 15:01:03 -07:00
handshake net/handshake: fix null-ptr-deref in handshake_nl_done_doit() 2023-09-01 07:25:14 +01:00
hsr net/hsr: Remove unused function declarations 2023-07-31 20:11:47 -07:00
ieee802154 genetlink: use attrs from struct genl_info 2023-08-15 15:00:45 -07:00
ife
ipv4 ipv4: ignore dst hint for multipath routes 2023-09-01 08:11:51 +01:00
ipv6 ipv6: ignore dst hint for multipath routes 2023-09-01 08:11:51 +01:00
iucv net/iucv: Fix size of interrupt data 2023-03-16 17:34:40 -07:00
kcm sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
key Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-08-18 12:44:56 -07:00
l2tp inet: introduce inet->inet_flags 2023-08-16 11:09:16 +01:00
l3mdev
lapb
llc net/llc/llc_conn.c: fix 4 instances of -Wmissing-variable-declarations 2023-08-09 15:34:28 -07:00
mac80211 wireless-next patches for v6.6 2023-08-25 18:35:09 -07:00
mac802154 Core WPAN changes: 2023-06-24 15:41:46 -07:00
mctp sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
mpls net: move gso declarations and functions to their own files 2023-06-10 00:11:41 -07:00
mptcp mptcp: annotate data-races around msk->rmem_fwd_alloc 2023-09-01 07:27:33 +01:00
ncsi genetlink: make genl_info->nlhdr const 2023-08-15 14:54:44 -07:00
netfilter netfilter: nf_tables: Audit log rule reset 2023-08-31 01:29:28 +02:00
netlabel netlabel: Remove unused declaration netlbl_cipsov4_doi_free() 2023-08-02 12:28:22 -07:00
netlink genetlink: add a family pointer to struct genl_info 2023-08-15 15:01:03 -07:00
netrom netrom: Deny concurrent connect(). 2023-08-28 06:58:46 +01:00
nfc genetlink: use attrs from struct genl_info 2023-08-15 15:00:45 -07:00
nsh net: move gso declarations and functions to their own files 2023-06-10 00:11:41 -07:00
openvswitch Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-08-18 12:44:56 -07:00
packet Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-08-10 14:10:53 -07:00
phonet sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
psample
qrtr net: qrtr: Handle IPCR control port format of older targets 2023-07-17 09:02:30 +01:00
rds net/rds: Remove unused function declarations 2023-08-13 12:25:42 +01:00
rfkill net: rfkill-gpio: Add explicit include for of.h 2023-04-06 20:36:27 +02:00
rose sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
rxrpc Networking changes for 6.5. 2023-06-28 16:43:10 -07:00
sched net/sched: fq_pie: avoid stalls in fq_pie_timer() 2023-08-31 11:21:52 +02:00
sctp sctp: annotate data-races around sk->sk_wmem_queued 2023-08-31 11:56:59 +02:00
smc net: annotate data-races around sk->sk_lingertime 2023-08-21 07:41:57 +01:00
strparser
sunrpc Networking changes for 6.6. 2023-08-29 11:33:01 -07:00
switchdev net: switchdev: Add a helper to replay objects on a bridge port 2023-07-21 08:54:03 +01:00
tipc genetlink: use attrs from struct genl_info 2023-08-15 15:00:45 -07:00
tls tls: get cipher_name from cipher_desc in tls_set_sw_offload 2023-08-27 17:17:42 -07:00
unix net: annotate data-races around sock->ops 2023-08-09 15:32:43 -07:00
vmw_vsock vsock: Remove unused function declarations 2023-07-31 14:41:08 -07:00
wireless wifi: nl80211: Remove unused declaration nl80211_pmsr_dump_results() 2023-08-22 21:40:40 +02:00
x25 sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
xdp xsk: Fix xsk_diag use-after-free error during socket cleanup 2023-08-31 13:21:11 +02:00
xfrm Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-08-18 12:44:56 -07:00
compat.c net/compat: Update msg_control_is_user when setting a kernel pointer 2023-04-14 11:09:27 +01:00
devres.c
Kconfig bpf: Add fd-based tcx multi-prog infra with link support 2023-07-19 10:07:27 -07:00
Kconfig.debug
Makefile net/handshake: Create a NETLINK service for handling handshake requests 2023-04-19 18:48:48 -07:00
socket.c net: annotate data-races around sk->sk_bind_phc 2023-09-01 07:27:33 +01:00
sysctl_net.c