linux/kernel
Eric Dumazet 4cd13c21b2 softirq: Let ksoftirqd do its job
A while back, Paolo and Hannes sent an RFC patch adding threaded-able
napi poll loop support : (https://patchwork.ozlabs.org/patch/620657/)

The problem seems to be that softirqs are very aggressive and are often
handled by the current process, even if we are under stress and that
ksoftirqd was scheduled, so that innocent threads would have more chance
to make progress.

This patch makes sure that if ksoftirq is running, we let it
perform the softirq work.

Jonathan Corbet summarized the issue in https://lwn.net/Articles/687617/

Tested:

 - NIC receiving traffic handled by CPU 0
 - UDP receiver running on CPU 0, using a single UDP socket.
 - Incoming flood of UDP packets targeting the UDP socket.

Before the patch, the UDP receiver could almost never get CPU cycles and
could only receive ~2,000 packets per second.

After the patch, CPU cycles are split 50/50 between user application and
ksoftirqd/0, and we can effectively read ~900,000 packets per second,
a huge improvement in DOS situation. (Note that more packets are now
dropped by the NIC itself, since the BH handlers get less CPU cycles to
drain RX ring buffer)

Since the load runs in well identified threads context, an admin can
more easily tune process scheduling parameters if needed.

Reported-by: Paolo Abeni <pabeni@redhat.com>
Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: David Miller <davem@davemloft.net>
Cc: Hannes Frederic Sowa <hannes@redhat.com>
Cc: Jesper Dangaard Brouer <jbrouer@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1472665349.14381.356.camel@edumazet-glaptop3.roam.corp.google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-30 10:43:36 +02:00
..
bpf bpf: fix bpf_skb_in_cgroup helper naming 2016-08-12 21:53:33 -07:00
configs kconfig: tinyconfig: provide whole choice blocks to avoid warnings 2016-09-01 17:52:01 -07:00
debug mm/init: Add 'rodata=off' boot cmdline parameter to disable read-only kernel mappings 2016-02-22 08:51:37 +01:00
events perf/core: Fix aux_mmap_count vs aux_refcount order 2016-09-10 11:15:36 +02:00
gcov gcov: add support for gcc version >= 6 2016-07-15 14:54:27 +09:00
irq genirq: Make function __irq_do_set_handler() static 2016-09-25 16:46:52 -04:00
livepatch modules: add ro_after_init support 2016-08-04 10:16:55 +09:30
locking locking/pvqspinlock: Fix a bug in qstat_read() 2016-08-10 14:13:29 +02:00
power PM / QoS: avoid calling cancel_delayed_work_sync() during early boot 2016-09-05 15:07:53 +02:00
printk printk/nmi: avoid direct printk()-s from __printk_nmi_flush() 2016-09-01 17:52:01 -07:00
rcu Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-07-29 13:55:30 -07:00
sched sched/core: Fix a race between try_to_wake_up() and a woken up task 2016-09-05 11:57:53 +02:00
time tick/nohz: Fix softlockup on scheduler stalls in kvm guest 2016-09-02 10:25:40 +02:00
trace block: Fix secure erase 2016-08-16 09:16:51 -06:00
.gitignore
acct.c
async.c async: export current_is_async() 2015-11-19 17:51:48 +01:00
audit.c Merge branch 'stable-4.8' of git://git.infradead.org/users/pcmoore/audit 2016-07-29 17:54:17 -07:00
audit.h Merge branch 'stable-4.8' of git://git.infradead.org/users/pcmoore/audit 2016-07-29 17:54:17 -07:00
audit_fsnotify.c wrappers for ->i_mutex access 2016-01-22 18:04:28 -05:00
audit_tree.c audit: cleanup prune_tree_thread 2016-04-04 09:46:47 -04:00
audit_watch.c Merge branch 'stable-4.8' of git://git.infradead.org/users/pcmoore/audit 2016-09-01 15:55:56 -07:00
auditfilter.c audit: add fields to exclude filter by reusing user filter 2016-06-27 11:01:00 -04:00
auditsc.c Merge branch 'stable-4.8' of git://git.infradead.org/users/pcmoore/audit 2016-07-29 17:54:17 -07:00
backtracetest.c
bounds.c
capability.c kernel: Add noaudit variant of ns_capable() 2016-06-06 20:16:18 +10:00
cgroup.c Merge branch 'for-4.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2016-07-29 14:29:04 -07:00
cgroup_freezer.c cgroup: kill cgrp_ss_priv[CGROUP_CANFORK_COUNT] and friends 2015-12-03 10:24:08 -05:00
cgroup_pids.c cgroup: Use lld instead of ld when printing pids controller events_limit 2016-06-21 15:03:36 -04:00
compat.c
configs.c
context_tracking.c context_tracking: Switch to new static_branch API 2015-11-24 09:56:43 +01:00
cpu.c timers/core: Correct callback order during CPU hot plug 2016-07-28 18:56:22 +02:00
cpu_pm.c
cpuset.c cpuset: make sure new tasks conform to the current config of the cpuset 2016-08-09 23:58:01 -04:00
crash_dump.c
cred.c cred: Reject inodes with invalid ids in set_create_file_as() 2016-06-30 18:05:09 -05:00
delayacct.c kmemcg: account certain kmem allocations to memcg 2016-01-14 16:00:49 -08:00
dma.c
elfcore.c
exec_domain.c
exit.c mm, mempolicy: task->mempolicy must be NULL before dropping final reference 2016-09-01 17:52:01 -07:00
extable.c
fork.c Merge branch 'akpm' (patches from Andrew) 2016-09-01 18:23:22 -07:00
freezer.c freezer, oom: check TIF_MEMDIE on the correct task 2016-07-28 16:07:41 -07:00
futex.c futex: Assume all mappings are private on !MMU systems 2016-07-29 18:44:14 +02:00
futex_compat.c ptrace: use fsuid, fsgid, effective creds for fs access checks 2016-01-20 17:09:18 -08:00
groups.c
hung_task.c kernel/hung_task.c: use timeout diff when timeout is updated 2016-03-22 15:36:02 -07:00
irq_work.c treewide: Remove old email address 2015-11-23 09:44:58 +01:00
jump_label.c powerpc updates for 4.8 #2 2016-08-05 09:00:54 -04:00
kallsyms.c kallsyms: add support for relative offsets in kallsyms address table 2016-03-15 16:55:16 -07:00
kcmp.c ptrace: use fsuid, fsgid, effective creds for fs access checks 2016-01-20 17:09:18 -08:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kcov.c kernel/kcov: unproxify debugfs file's fops 2016-06-15 04:56:35 -07:00
kexec.c kexec: allow architectures to override boot mapping 2016-08-02 19:35:27 -04:00
kexec_core.c kexec: add restriction on kexec_load() segment sizes 2016-08-02 19:35:31 -04:00
kexec_file.c kexec: fix double-free when failing to relocate the purgatory 2016-09-01 17:52:01 -07:00
kexec_internal.h kexec: move some memembers and definitions within the scope of CONFIG_KEXEC_FILE 2016-01-20 17:09:18 -08:00
kmod.c kmod: don't run async usermode helper as a child of kworker thread 2015-10-23 17:55:10 +09:00
kprobes.c
ksysfs.c kexec: add a kexec_crash_loaded() function 2016-08-02 19:35:30 -04:00
kthread.c
latencytop.c sched/debug: Make schedstats a runtime tunable that is disabled by default 2016-02-09 11:54:23 +01:00
Makefile ELF/MIPS build fix 2016-05-23 17:04:14 -07:00
membarrier.c
memremap.c mm: fix cache mode of dax pmd mappings 2016-09-09 17:34:46 -07:00
module-internal.h
module.c Removed the MODULE_SIG_FORCE-means-no-MODULE_FORCE_LOAD patch. 2016-08-04 09:14:38 -04:00
module_signing.c KEYS: Move the point of trust determination to __key_link() 2016-04-11 22:43:43 +01:00
notifier.c
nsproxy.c cgroup: introduce cgroup namespaces 2016-02-16 13:04:58 -05:00
padata.c kernel/padata.c: hide unused functions 2016-05-19 19:12:14 -07:00
panic.c kexec: use core_param for crash_kexec_post_notifiers boot option 2016-08-02 19:35:29 -04:00
params.c Nothing exciting, minor tweaks and cleanups. 2015-11-09 15:53:39 -08:00
pid.c remove lots of IS_ERR_VALUE abuses 2016-05-27 15:26:11 -07:00
pid_namespace.c
profile.c profile: Convert to hotplug state machine 2016-07-15 10:41:42 +02:00
ptrace.c tree-wide: replace config_enabled() with IS_ENABLED() 2016-08-04 08:50:07 -04:00
range.c
reboot.c
relay.c relay: add global mode support for buffer-only channels 2016-08-02 19:35:41 -04:00
resource.c /proc/iomem: only expose physical resource addresses to privileged users 2016-04-14 12:56:09 -07:00
seccomp.c seccomp: Fix tracer exit notifications during fatal signals 2016-08-30 16:12:46 -07:00
signal.c signals: Use hrtimer for sigtimedwait() 2016-07-07 10:35:07 +02:00
smp.c Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-07-29 13:55:30 -07:00
smpboot.c cpu/hotplug: Unpark smpboot threads from the state machine 2016-03-01 20:36:56 +01:00
smpboot.h cpu/hotplug: Create hotplug threads 2016-03-01 20:36:56 +01:00
softirq.c softirq: Let ksoftirqd do its job 2016-09-30 10:43:36 +02:00
stacktrace.c
stop_machine.c stop_machine: Touch_nmi_watchdog() after MULTI_STOP_PREPARE 2016-07-27 11:12:11 +02:00
sys.c prctl: make PR_SET_THP_DISABLE wait for mmap_sem killable 2016-05-23 17:04:14 -07:00
sys_ni.c vfs: add copy_file_range syscall and vfs helper 2015-12-01 14:00:53 -05:00
sysctl.c sysctl: handle error writing UINT_MAX to u32 fields 2016-08-26 17:39:35 -07:00
sysctl_binary.c kernel/sysctl_binary.c: use generic UUID library 2016-05-20 17:58:30 -07:00
task_work.c task_work: use READ_ONCE/lockless_dereference, avoid pi_lock if !task_works 2016-08-02 19:35:02 -04:00
taskstats.c taskstats: use the libnl API to align nlattr on 64-bit 2016-04-23 20:13:25 -04:00
test_kprobes.c
torture.c torture: Stop onoff task if there is only one cpu 2016-06-14 16:03:28 -07:00
tracepoint.c kernel/...: convert pr_warning to pr_warn 2016-03-22 15:36:02 -07:00
tsacct.c time, acct: Drop irq save & restore from __acct_update_integrals() 2016-02-29 09:53:09 +01:00
uid16.c
up.c
user-return-notifier.c
user.c
user_namespace.c fs: Limit file caps to the user namespace of the super block 2016-06-24 10:40:31 -05:00
utsname.c
utsname_sysctl.c
watchdog.c Revert "perf/x86/intel, watchdog: Switch NMI watchdog to ref cycles on x86" 2016-07-10 20:58:36 +02:00
workqueue.c Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-07-29 13:55:30 -07:00
workqueue_internal.h sched/core: Get rid of 'cpu' argument in wq_worker_sleeping() 2016-03-02 10:28:47 -05:00