linux/io_uring
Gabriel Krisman Bertazi 606559dc4f io_uring: Fix sqpoll utilization check racing with dying sqpoll
Commit 3fcb9d1720 ("io_uring/sqpoll: statistics of the true
utilization of sq threads"), currently in Jens for-next branch, peeks at
io_sq_data->thread to report utilization statistics. But, If
io_uring_show_fdinfo races with sqpoll terminating, even though we hold
the ctx lock, sqd->thread might be NULL and we hit the Oops below.

Note that we could technically just protect the getrusage() call and the
sq total/work time calculations.  But showing some sq
information (pid/cpu) and not other information (utilization) is more
confusing than not reporting anything, IMO.  So let's hide it all if we
happen to race with a dying sqpoll.

This can be triggered consistently in my vm setup running
sqpoll-cancel-hang.t in a loop.

BUG: kernel NULL pointer dereference, address: 00000000000007b0
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 16587 Comm: systemd-coredum Not tainted 6.8.0-rc3-g3fcb9d17206e-dirty #69
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 2/2/2022
RIP: 0010:getrusage+0x21/0x3e0
Code: 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 48 89 d1 48 89 e5 41 57 41 56 41 55 41 54 49 89 fe 41 52 53 48 89 d3 48 83 ec 30 <4c> 8b a7 b0 07 00 00 48 8d 7a 08 65 48 8b 04 25 28 00 00 00 48 89
RSP: 0018:ffffa166c671bb80 EFLAGS: 00010282
RAX: 00000000000040ca RBX: ffffa166c671bc60 RCX: ffffa166c671bc60
RDX: ffffa166c671bc60 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffffa166c671bbe0 R08: ffff9448cc3930c0 R09: 0000000000000000
R10: ffffa166c671bd50 R11: ffffffff9ee89260 R12: 0000000000000000
R13: ffff9448ce099480 R14: 0000000000000000 R15: ffff9448cff5b000
FS:  00007f786e225900(0000) GS:ffff94493bc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000007b0 CR3: 000000010d39c000 CR4: 0000000000750ef0
PKRU: 55555554
Call Trace:
 <TASK>
 ? __die_body+0x1a/0x60
 ? page_fault_oops+0x154/0x440
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? do_user_addr_fault+0x174/0x7c0
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? exc_page_fault+0x63/0x140
 ? asm_exc_page_fault+0x22/0x30
 ? getrusage+0x21/0x3e0
 ? seq_printf+0x4e/0x70
 io_uring_show_fdinfo+0x9db/0xa10
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? vsnprintf+0x101/0x4d0
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? seq_vprintf+0x34/0x50
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? seq_printf+0x4e/0x70
 ? seq_show+0x16b/0x1d0
 ? __pfx_io_uring_show_fdinfo+0x10/0x10
 seq_show+0x16b/0x1d0
 seq_read_iter+0xd7/0x440
 seq_read+0x102/0x140
 vfs_read+0xae/0x320
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? __do_sys_newfstat+0x35/0x60
 ksys_read+0xa5/0xe0
 do_syscall_64+0x50/0x110
 entry_SYSCALL_64_after_hwframe+0x6e/0x76
RIP: 0033:0x7f786ec1db4d
Code: e8 46 e3 01 00 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 80 3d d9 ce 0e 00 00 74 17 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 5b c3 66 2e 0f 1f 84 00 00 00 00 00 48 83 ec
RSP: 002b:00007ffcb361a4b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
RAX: ffffffffffffffda RBX: 000055a4c8fe42f0 RCX: 00007f786ec1db4d
RDX: 0000000000000400 RSI: 000055a4c8fe48a0 RDI: 0000000000000006
RBP: 00007f786ecfb0b0 R08: 00007f786ecfb2a8 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f786ecfaf60
R13: 000055a4c8fe42f0 R14: 0000000000000000 R15: 00007ffcb361a628
 </TASK>
Modules linked in:
CR2: 00000000000007b0
---[ end trace 0000000000000000 ]---
RIP: 0010:getrusage+0x21/0x3e0
Code: 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 48 89 d1 48 89 e5 41 57 41 56 41 55 41 54 49 89 fe 41 52 53 48 89 d3 48 83 ec 30 <4c> 8b a7 b0 07 00 00 48 8d 7a 08 65 48 8b 04 25 28 00 00 00 48 89
RSP: 0018:ffffa166c671bb80 EFLAGS: 00010282
RAX: 00000000000040ca RBX: ffffa166c671bc60 RCX: ffffa166c671bc60
RDX: ffffa166c671bc60 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffffa166c671bbe0 R08: ffff9448cc3930c0 R09: 0000000000000000
R10: ffffa166c671bd50 R11: ffffffff9ee89260 R12: 0000000000000000
R13: ffff9448ce099480 R14: 0000000000000000 R15: ffff9448cff5b000
FS:  00007f786e225900(0000) GS:ffff94493bc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000007b0 CR3: 000000010d39c000 CR4: 0000000000750ef0
PKRU: 55555554
Kernel panic - not syncing: Fatal exception
Kernel Offset: 0x1ce00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)

Fixes: 3fcb9d1720 ("io_uring/sqpoll: statistics of the true utilization of sq threads")
Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
Link: https://lore.kernel.org/r/20240309003256.358-1-krisman@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-03-09 07:27:09 -07:00
..
advise.c io_uring: always go async for unsupported fadvise flags 2023-01-29 15:18:26 -07:00
advise.h
alloc_cache.h io_uring: use mempool KASAN hook 2023-12-29 11:58:41 -08:00
cancel.c io_uring/cancel: don't default to setting req->work.cancel_seq 2024-02-08 13:27:06 -07:00
cancel.h io_uring/cancel: don't default to setting req->work.cancel_seq 2024-02-08 13:27:06 -07:00
epoll.c io_uring: undeprecate epoll_ctl support 2023-05-26 20:22:41 -06:00
epoll.h
fdinfo.c io_uring: Fix sqpoll utilization check racing with dying sqpoll 2024-03-09 07:27:09 -07:00
fdinfo.h
filetable.c io_uring: drop any code related to SCM_RIGHTS 2023-12-19 12:36:34 -07:00
filetable.h io_uring: expand main struct io_kiocb flags to 64-bits 2024-02-08 13:27:03 -07:00
fs.c io_uring/fs: consider link->flags when getting path for LINKAT 2023-11-20 09:01:42 -07:00
fs.h
futex.c io_uring: add support for vectored futex waits 2023-09-29 02:37:08 -06:00
futex.h io_uring: add support for vectored futex waits 2023-09-29 02:37:08 -06:00
io-wq.c io-wq: fully initialize wqe before calling cpuhp_state_add_instance_nocalls() 2023-10-05 14:11:18 -06:00
io-wq.h io_uring: break out of iowq iopoll on teardown 2023-09-07 09:02:27 -06:00
io_uring.c io_uring: refactor DEFER_TASKRUN multishot checks 2024-03-08 07:58:23 -07:00
io_uring.h io_uring/napi: ensure napi polling is aborted when work is available 2024-02-14 13:01:25 -07:00
kbuf.c io_uring/kbuf: rename REQ_F_PARTIAL_IO to REQ_F_BL_NO_RECYCLE 2024-03-08 07:56:27 -07:00
kbuf.h io_uring/kbuf: rename REQ_F_PARTIAL_IO to REQ_F_BL_NO_RECYCLE 2024-03-08 07:56:27 -07:00
Makefile io-uring: add napi busy poll support 2024-02-09 11:54:19 -07:00
msg_ring.c io_uring: use io_file_from_index in io_msg_grab_file 2023-06-20 09:36:22 -06:00
msg_ring.h io_uring: get rid of double locking 2022-12-07 06:47:13 -07:00
napi.c io_uring/napi: enable even with a timeout of 0 2024-02-15 15:37:28 -07:00
napi.h io_uring: add register/unregister napi function 2024-02-09 11:54:32 -07:00
net.c io_uring/net: dedup io_recv_finish req completion 2024-03-08 07:59:20 -07:00
net.h io_uring: Add KASAN support for alloc_caches 2023-04-03 07:16:14 -06:00
nop.c
nop.h
notif.c io_uring/notif: add constant for ubuf_info flags 2023-04-15 14:21:04 -06:00
notif.h io_uring/notif: add constant for ubuf_info flags 2023-04-15 14:21:04 -06:00
opdef.c io_uring: add support for ftruncate 2024-02-09 09:04:39 -07:00
opdef.h io_uring/rw: mark readv/writev as vectored in the opcode definition 2023-09-21 12:00:46 -06:00
openclose.c io_uring: enable audit and restrict cred override for IORING_OP_FIXED_FD_INSTALL 2024-01-23 15:25:14 -07:00
openclose.h io_uring/openclose: add support for IORING_OP_FIXED_FD_INSTALL 2023-12-12 07:42:57 -07:00
poll.c io_uring: fix io_queue_proc modifying req->flags 2024-03-07 11:10:28 -07:00
poll.h io_uring/rw: ensure poll based multishot read retries appropriately 2024-01-28 20:37:11 -07:00
refs.h
register.c io_uring: add register/unregister napi function 2024-02-09 11:54:32 -07:00
register.h io_uring/register: move io_uring_register(2) related code to register.c 2023-12-19 08:54:20 -07:00
rsrc.c io_uring: drop any code related to SCM_RIGHTS 2023-12-19 12:36:34 -07:00
rsrc.h io_uring: Don't include af_unix.h. 2024-02-12 19:02:11 -07:00
rw.c io_uring: refactor DEFER_TASKRUN multishot checks 2024-03-08 07:58:23 -07:00
rw.h io_uring/rw: add separate prep handler for fixed read/write 2023-11-06 07:43:16 -07:00
slist.h io_uring: silence variable ‘prev’ set but not used warning 2023-03-09 10:10:58 -07:00
splice.c splice: return type ssize_t from all helpers 2023-12-12 16:19:59 +01:00
splice.h
sqpoll.c io_uring/sqpoll: statistics of the true utilization of sq threads 2024-03-01 06:28:19 -07:00
sqpoll.h io_uring/sqpoll: statistics of the true utilization of sq threads 2024-03-01 06:28:19 -07:00
statx.c io_uring: for requests that require async, force it 2023-01-29 15:18:26 -07:00
statx.h
sync.c io_uring: for requests that require async, force it 2023-01-29 15:18:26 -07:00
sync.h
tctx.c io_uring: Add io_uring_setup flag to pre-register ring fd and never install it 2023-05-16 08:06:00 -06:00
tctx.h io_uring: simplify __io_uring_add_tctx_node 2022-10-07 12:25:30 -06:00
timeout.c io_uring: never overflow io_aux_cqe 2023-08-11 10:42:57 -06:00
timeout.h io_uring: remove unused return from io_disarm_next 2022-09-21 13:15:01 -06:00
truncate.c io_uring: add support for ftruncate 2024-02-09 09:04:39 -07:00
truncate.h io_uring: add support for ftruncate 2024-02-09 09:04:39 -07:00
uring_cmd.c io_uring: Don't include af_unix.h. 2024-02-12 19:02:11 -07:00
uring_cmd.h io_uring: Remove unnecessary BUILD_BUG_ON 2023-05-04 08:19:05 -06:00
waitid.c io_uring: add IORING_OP_WAITID support 2023-09-21 12:04:45 -06:00
waitid.h io_uring: add IORING_OP_WAITID support 2023-09-21 12:04:45 -06:00
xattr.c io_uring: use file_mnt_idmap helper 2024-02-06 19:55:14 -07:00
xattr.h