linux/kernel
Mike Christie f9010dbdce fork, vhost: Use CLONE_THREAD to fix freezer/ps regression
When switching from kthreads to vhost_tasks two bugs were added:
1. The vhost worker tasks's now show up as processes so scripts doing
ps or ps a would not incorrectly detect the vhost task as another
process.  2. kthreads disabled freeze by setting PF_NOFREEZE, but
vhost tasks's didn't disable or add support for them.

To fix both bugs, this switches the vhost task to be thread in the
process that does the VHOST_SET_OWNER ioctl, and has vhost_worker call
get_signal to support SIGKILL/SIGSTOP and freeze signals. Note that
SIGKILL/STOP support is required because CLONE_THREAD requires
CLONE_SIGHAND which requires those 2 signals to be supported.

This is a modified version of the patch written by Mike Christie
<michael.christie@oracle.com> which was a modified version of patch
originally written by Linus.

Much of what depended upon PF_IO_WORKER now depends on PF_USER_WORKER.
Including ignoring signals, setting up the register state, and having
get_signal return instead of calling do_group_exit.

Tidied up the vhost_task abstraction so that the definition of
vhost_task only needs to be visible inside of vhost_task.c.  Making
it easier to review the code and tell what needs to be done where.
As part of this the main loop has been moved from vhost_worker into
vhost_task_fn.  vhost_worker now returns true if work was done.

The main loop has been updated to call get_signal which handles
SIGSTOP, freezing, and collects the message that tells the thread to
exit as part of process exit.  This collection clears
__fatal_signal_pending.  This collection is not guaranteed to
clear signal_pending() so clear that explicitly so the schedule()
sleeps.

For now the vhost thread continues to exist and run work until the
last file descriptor is closed and the release function is called as
part of freeing struct file.  To avoid hangs in the coredump
rendezvous and when killing threads in a multi-threaded exec.  The
coredump code and de_thread have been modified to ignore vhost threads.

Remvoing the special case for exec appears to require teaching
vhost_dev_flush how to directly complete transactions in case
the vhost thread is no longer running.

Removing the special case for coredump rendezvous requires either the
above fix needed for exec or moving the coredump rendezvous into
get_signal.

Fixes: 6e890c5d50 ("vhost: use vhost_tasks for worker threads")
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Co-developed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-06-01 17:15:33 -04:00
..
bpf bpf: fix a memory leak in the LRU and LRU_PERCPU hash maps 2023-05-22 10:26:39 -07:00
cgroup cgroup changes for v6.4-rc1 2023-04-29 10:05:22 -07:00
configs Char/Misc drivers for 6.4-rc1 2023-04-27 12:07:50 -07:00
debug kdb: use srcu console list iterator 2022-12-02 11:25:00 +01:00
dma dma-mapping updates for Linux 6.4 2023-04-29 10:29:57 -07:00
entry ptrace: Provide set/get interface for syscall user dispatch 2023-04-16 14:23:07 +02:00
events perf/core: Fix perf_sample_data not properly initialized for different swevents in perf_tp_event() 2023-05-08 10:58:26 +02:00
futex - Prevent the leaking of a debug timer in futex_waitv() 2023-01-01 11:15:05 -08:00
gcov gcov: add support for checksum field 2022-12-21 14:31:52 -08:00
irq xen: branch for v6.4-rc4 2023-05-27 09:42:56 -07:00
kcsan - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of 2023-04-27 19:42:02 -07:00
livepatch Scheduler changes for v6.4: 2023-04-28 14:53:30 -07:00
locking Two fixes for debugobjects: 2023-05-28 07:15:33 -04:00
module module: fix module load for ia64 2023-05-30 09:29:40 -07:00
power More power management updates for 6.4-rc1 2023-05-03 12:01:05 -07:00
printk - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of 2023-04-27 19:42:02 -07:00
rcu RCU Changes for 6.4: 2023-04-24 12:16:14 -07:00
sched sched: fix cid_lock kernel-doc warnings 2023-05-08 10:58:28 +02:00
time tick/broadcast: Make broadcast device replacement work correctly 2023-05-08 23:18:16 +02:00
trace tracing: Have function_graph selftest call cond_resched() 2023-05-28 21:15:46 -04:00
.gitignore
acct.c acct: fix potential integer overflow in encode_comp_t() 2022-11-30 16:13:18 -08:00
async.c
audit.c audit: use time_after to compare time 2022-08-29 19:47:03 -04:00
audit.h audit: remove selinux_audit_rule_update() declaration 2022-09-07 11:30:15 -04:00
audit_fsnotify.c audit: fix potential double free on error path from fsnotify_add_inode_mark 2022-08-22 18:50:06 -04:00
audit_tree.c audit: use fsnotify group lock helpers 2022-04-25 14:37:28 +02:00
audit_watch.c audit_init_parent(): constify path 2022-09-01 17:39:30 -04:00
auditfilter.c
auditsc.c capability: just use a 'u64' instead of a 'u32[2]' array 2023-03-01 10:01:22 -08:00
backtracetest.c
bounds.c mm: multi-gen LRU: minimal implementation 2022-09-26 19:46:09 -07:00
capability.c capability: just use a 'u64' instead of a 'u32[2]' array 2023-03-01 10:01:22 -08:00
cfi.c cfi: Switch to -fsanitize=kcfi 2022-09-26 10:13:13 -07:00
compat.c sched_getaffinity: don't assume 'cpumask_size()' is fully initialized 2023-03-14 19:32:38 -07:00
configs.c
context_tracking.c context_tracking: Fix noinstr vs KASAN 2023-01-13 11:48:18 +01:00
cpu.c - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of 2023-04-27 19:42:02 -07:00
cpu_pm.c cpuidle, cpu_pm: Remove RCU fiddling from cpu_pm_{enter,exit}() 2023-01-13 11:48:15 +01:00
crash_core.c mm, treewide: redefine MAX_ORDER sanely 2023-04-05 19:42:46 -07:00
crash_dump.c
cred.c cred: Do not default to init_cred in prepare_kernel_cred() 2022-11-01 10:04:52 -07:00
delayacct.c delayacct: track delays from IRQ/SOFTIRQ 2023-04-18 16:39:34 -07:00
dma.c
exec_domain.c
exit.c fork, vhost: Use CLONE_THREAD to fix freezer/ps regression 2023-06-01 17:15:33 -04:00
extable.c context_tracking: Take NMI eqs entrypoints over RCU 2022-07-05 13:32:59 -07:00
fail_function.c kernel/fail_function: fix memory leak with using debugfs_lookup() 2023-02-08 13:36:22 +01:00
fork.c fork, vhost: Use CLONE_THREAD to fix freezer/ps regression 2023-06-01 17:15:33 -04:00
freezer.c freezer,sched: Rewrite core freezer logic 2022-09-07 21:53:50 +02:00
gen_kheaders.sh kheaders: use standard naming for the temporary directory 2023-01-22 23:43:34 +09:00
groups.c security: Add LSM hook to setgroups() syscall 2022-07-15 18:21:49 +00:00
hung_task.c kernel/hung_task.c: set some hung_task.c variables storage-class-specifier to static 2023-04-08 13:45:37 -07:00
iomem.c
irq_work.c trace: Add trace_ipi_send_cpu() 2023-03-24 11:01:29 +01:00
jump_label.c jump_label: Prevent key->enabled int overflow 2022-12-01 15:53:05 -08:00
kallsyms.c kallsyms: Delete an unused parameter related to {module_}kallsyms_on_each_symbol() 2023-03-19 13:27:19 -07:00
kallsyms_internal.h kallsyms: Reduce the memory occupied by kallsyms_seqs_of_names[] 2022-11-12 18:47:36 -08:00
kallsyms_selftest.c kallsyms: Delete an unused parameter related to {module_}kallsyms_on_each_symbol() 2023-03-19 13:27:19 -07:00
kallsyms_selftest.h kallsyms: Add self-test facility 2022-11-15 00:42:02 -08:00
kcmp.c
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kcov.c mm: replace vma->vm_flags direct modifications with modifier calls 2023-02-09 16:51:39 -08:00
kexec.c kexec: introduce sysctl parameters kexec_load_limit_* 2023-02-02 22:50:05 -08:00
kexec_core.c There is no particular theme here - mainly quick hits all over the tree. 2023-02-23 17:55:40 -08:00
kexec_elf.c
kexec_file.c kexec: remove unnecessary arch_kexec_kernel_image_load() 2023-04-08 13:45:38 -07:00
kexec_internal.h panic, kexec: make __crash_kexec() NMI safe 2022-09-11 21:55:06 -07:00
kheaders.c kheaders: Use array declaration instead of char 2023-03-24 20:10:59 -07:00
kprobes.c x86/kprobes: Fix arch_check_optimized_kprobe check within optimized_kprobe range 2023-02-21 08:49:16 +09:00
ksysfs.c kernel/ksysfs.c: use sysfs_emit for sysfs show handlers 2023-03-24 17:09:14 +01:00
kthread.c - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of 2023-04-27 19:42:02 -07:00
latencytop.c latencytop: use the last element of latency_record of system 2022-09-11 21:55:12 -07:00
Makefile modules-6.4-rc1 2023-04-27 16:36:55 -07:00
module_signature.c
notifier.c notifiers: add tracepoints to the notifiers infrastructure 2023-04-08 13:45:38 -07:00
nsproxy.c convert setns(2) to fdget()/fdput() 2023-04-20 22:55:35 -04:00
padata.c padata: use alignment when calculating the number of worker threads 2023-03-14 17:06:44 +08:00
panic.c cpu: Mark nmi_panic_self_stop() __noreturn 2023-04-14 17:31:26 +02:00
params.c module: make module_ktype structure constant 2023-03-09 12:55:15 -08:00
pid.c pid: add pidfd_prepare() 2023-04-03 11:16:56 +02:00
pid_namespace.c kernel: pid_namespace: simplify sysctls with register_sysctl() 2023-05-02 19:23:29 -07:00
pid_sysctl.h kernel: pid_namespace: simplify sysctls with register_sysctl() 2023-05-02 19:23:29 -07:00
profile.c kernel/profile.c: simplify duplicated code in profile_setup() 2022-09-11 21:55:12 -07:00
ptrace.c ptrace: Provide set/get interface for syscall user dispatch 2023-04-16 14:23:07 +02:00
range.c
reboot.c kernel/reboot: Add SYS_OFF_MODE_RESTART_PREPARE mode 2022-10-04 15:59:36 +02:00
regset.c
relay.c relayfs: fix out-of-bounds access in relay_file_read 2023-05-02 17:23:27 -07:00
resource.c dax/kmem: Fix leak of memory-hotplug resources 2023-02-17 14:58:01 -08:00
resource_kunit.c
rseq.c rseq: Extend struct rseq with per-memory-map concurrency ID 2022-12-27 12:52:12 +01:00
scftorture.c scftorture: Fix distribution of short handler delays 2022-04-11 17:07:29 -07:00
scs.c scs: add support for dynamic shadow call stacks 2022-11-09 18:06:35 +00:00
seccomp.c seccomp: simplify sysctls with register_sysctl_init() 2023-04-13 11:49:20 -07:00
signal.c fork, vhost: Use CLONE_THREAD to fix freezer/ps regression 2023-06-01 17:15:33 -04:00
smp.c trace,smp: Trace all smp_function_call*() invocations 2023-03-24 11:01:30 +01:00
smpboot.c smpboot: use atomic_try_cmpxchg in cpu_wait_death and cpu_report_death 2022-09-11 21:55:10 -07:00
smpboot.h
softirq.c softirq: Add trace points for tasklet entry/exit 2023-04-15 10:17:16 +02:00
stackleak.c stackleak: allow to specify arch specific stackleak poison function 2023-04-20 11:36:35 +02:00
stacktrace.c
static_call.c
static_call_inline.c static_call: Add call depth tracking support 2022-10-17 16:41:16 +02:00
stop_machine.c Scheduler changes in this cycle were: 2022-05-24 11:11:13 -07:00
sys.c mm/ksm: unmerge and clear VM_MERGEABLE when setting PR_SET_MEMORY_MERGE=0 2023-05-02 17:21:49 -07:00
sys_ni.c kernel/sys_ni: add compat entry for fadvise64_64 2022-08-20 15:17:45 -07:00
sysctl-test.c kernel/sysctl-test: use SYSCTL_{ZERO/ONE_HUNDRED} instead of i_{zero/one_hundred} 2022-09-08 16:56:45 -07:00
sysctl.c mm: compaction: move compaction sysctl to its own file 2023-04-13 11:49:35 -07:00
task_work.c task_work: use try_cmpxchg in task_work_add, task_work_cancel_match and task_work_run 2022-09-11 21:55:10 -07:00
taskstats.c genetlink: start to validate reserved header bytes 2022-08-29 12:47:15 +01:00
torture.c torture: Fix hang during kthread shutdown phase 2023-01-05 12:10:35 -08:00
tracepoint.c tracepoint: Allow livepatch module add trace event 2023-02-18 14:34:36 -05:00
tsacct.c taskstats: version 12 with thread group and exe info 2022-04-29 14:38:03 -07:00
ucount.c ucounts: Split rlimit and ucount values and max values 2022-05-18 18:24:57 -05:00
uid16.c
uid16.h
umh.c umh: simplify the capability pointer logic 2023-03-03 16:18:19 -08:00
up.c
user-return-notifier.c
user.c kernel/user: Allow user_struct::locked_vm to be usable for iommufd 2022-11-30 20:16:49 -04:00
user_namespace.c userns: fix a struct's kernel-doc notation 2023-02-02 22:50:04 -08:00
usermode_driver.c blob_to_mnt(): kern_unmount() is needed to undo kern_mount() 2022-05-19 23:25:47 -04:00
utsname.c
utsname_sysctl.c utsname: simplify one-level sysctl registration for uts_kern_table 2023-04-13 11:49:35 -07:00
vhost_task.c fork, vhost: Use CLONE_THREAD to fix freezer/ps regression 2023-06-01 17:15:33 -04:00
watch_queue.c modules-6.4-rc1 2023-04-27 16:36:55 -07:00
watchdog.c powerpc updates for 6.0 2022-08-06 16:38:17 -07:00
watchdog_hld.c Revert "printk: add functions to prefer direct printing" 2022-06-23 18:41:40 +02:00
workqueue.c workqueue changes for v6.4-rc1 2023-04-29 09:48:52 -07:00
workqueue_internal.h