linux/Documentation
Eric Dumazet 133c4c0d37 tcp: defer regular ACK while processing socket backlog
This idea came after a particular workload requested
the quickack attribute set on routes, and a performance
drop was noticed for large bulk transfers.

For high throughput flows, it is best to use one cpu
running the user thread issuing socket system calls,
and a separate cpu to process incoming packets from BH context.
(With TSO/GRO, the bottleneck is usually the 'user' cpu.)

The problem is that the user thread can spend a lot of time holding
the socket lock, forcing the BH handler to queue most incoming
packets in the socket backlog.

Whenever the user thread releases the socket lock, it must first
process all packets accumulated in the backlog, potentially
adding latency spikes. Due to flood mitigation, having too many
packets in the backlog increases the chance of unexpected drops.

Backlog processing unfortunately shifts a fair amount of cpu cycles
from the BH cpu to the 'user' cpu, thus reducing max throughput.

This patch takes advantage of the backlog processing,
and of the fact that ACKs are mostly cumulative.

The idea is to detect that we are processing the backlog
and coalesce all eligible ACKs into a single one,
sent from tcp_release_cb().

This saves cpu cycles on both sides, and network resources.
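
The deferral idea can be sketched as a toy model. This is not kernel
code: the ToySocket class, its fields and the run() helper are all
hypothetical names, and the model assumes that every in-order segment
would otherwise trigger an immediate ACK (as with quickack). It only
shows how cumulative ACKs let one ACK at lock release stand in for many.

```python
from dataclasses import dataclass, field


@dataclass
class ToySocket:
    """Toy receiver model (hypothetical, for illustration only)."""
    backlog_ack_defer: bool = True   # stands in for the sysctl
    owned_by_user: bool = False      # user thread holds the socket lock
    rcv_nxt: int = 0                 # next expected sequence number
    ack_deferred: bool = False       # an ACK is pending until release
    backlog: list = field(default_factory=list)    # segment lengths queued by BH
    acks_sent: list = field(default_factory=list)  # cumulative ACK values sent

    def send_ack(self):
        # A cumulative ACK covers everything received so far.
        self.acks_sent.append(self.rcv_nxt)

    def receive(self, seg_len, in_backlog=False):
        """Account for one in-order segment, then ACK now or defer."""
        self.rcv_nxt += seg_len
        if in_backlog and self.backlog_ack_defer:
            self.ack_deferred = True
        else:
            self.send_ack()

    def bh_deliver(self, seg_len):
        """BH context: queue to the backlog if the user owns the socket."""
        if self.owned_by_user:
            self.backlog.append(seg_len)
        else:
            self.receive(seg_len)

    def release(self):
        """User thread drops the lock: drain backlog, then flush one ACK."""
        for seg_len in self.backlog:
            self.receive(seg_len, in_backlog=True)
        self.backlog.clear()
        self.owned_by_user = False
        if self.ack_deferred:
            self.ack_deferred = False
            self.send_ack()


def run(defer):
    """Queue six 60,000-byte segments while locked, then release."""
    sk = ToySocket(backlog_ack_defer=defer, owned_by_user=True)
    for _ in range(6):
        sk.bh_deliver(60_000)
    sk.release()
    return sk.acks_sent
```

In this model run(False) emits one ACK per backlogged segment, while
run(True) emits a single ACK covering all six.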

Performance of a single TCP flow on a 200Gbit NIC:

- Throughput is increased by 20% (100Gbit -> 120Gbit).
- Number of generated ACKs per second shrinks from 240,000 to 40,000.
- Number of backlog drops per second shrinks from 230 to 0.

Benchmark context:
 - Regular netperf TCP_STREAM (no zerocopy)
 - Intel(R) Xeon(R) Platinum 8481C (Sapphire Rapids)
 - MAX_SKB_FRAGS = 17 (~60KB per GRO packet)
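
As a rough cross-check of the figures above, the average number of
bytes covered by each ACK can be derived from throughput and ACK rate.
The bytes_per_ack() helper is hypothetical, and the calculation assumes
1 Gbit = 10^9 bits and treats the quoted throughput as the acknowledged
byte rate.

```python
def bytes_per_ack(throughput_gbit, acks_per_sec):
    """Average bytes acknowledged per ACK (assumes 1 Gbit = 1e9 bits)."""
    return throughput_gbit * 1e9 / 8 / acks_per_sec


before = bytes_per_ack(100, 240_000)  # about 52 KB, close to one GRO packet
after = bytes_per_ack(120, 40_000)    # 375 KB, several GRO packets per ACK
```

Under these assumptions, each ACK covered roughly one ~60KB GRO packet
before the patch, and about six of them afterwards.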

This feature is guarded by a new sysctl, and enabled by default:
 /proc/sys/net/ipv4/tcp_backlog_ack_defer
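
A minimal sketch for checking the setting from user space. The helper
names are hypothetical, and the sketch assumes the file holds a single
integer (0 or 1) like other boolean sysctls. On kernels without this
patch the file does not exist, so the helper returns None.

```python
from pathlib import Path

SYSCTL = Path("/proc/sys/net/ipv4/tcp_backlog_ack_defer")


def parse_sysctl_bool(text):
    """Interpret sysctl file contents such as '1\\n' as a boolean."""
    return int(text.strip()) != 0


def backlog_ack_defer_enabled(path=SYSCTL):
    """Return True/False, or None if the sysctl cannot be read."""
    try:
        return parse_sysctl_bool(path.read_text())
    except (OSError, ValueError):
        # Missing file (older kernel, non-Linux) or unexpected contents.
        return None
```

Calling backlog_ack_defer_enabled() with no argument reads the path
named above.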

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-12 19:10:01 +02:00
ABI - Core Frameworks 2023-09-04 13:52:58 -07:00
accel
accounting
admin-guide workqueue: Changes for v6.6 2023-09-01 16:06:32 -07:00
arch Add x86 shadow stack support 2023-08-31 12:20:12 -07:00
block Documentation work keeps chugging along; stuff for 6.6 includes: 2023-08-30 20:05:42 -07:00
bpf Including fixes from netfilter and bpf. 2023-09-07 18:33:07 -07:00
cdrom
core-api printk changes for 6.6 2023-09-04 13:20:19 -07:00
cpu-freq
crypto
dev-tools
devicetree dt-bindings: net: Add compatible for AM64x in ICSSG 2023-09-12 10:23:50 +02:00
doc-guide
driver-api ata changes for 6.6 2023-09-05 12:37:28 -07:00
fault-injection Documentation: Fix typos 2023-08-18 11:29:03 -06:00
fb Documentation: Fix typos 2023-08-18 11:29:03 -06:00
features Documentation work keeps chugging along; stuff for 6.6 includes: 2023-08-30 20:05:42 -07:00
filesystems Mixed with some fixes and cleanups, this brings in reasonably complete 2023-09-06 12:10:15 -07:00
firmware-guide Documentation work keeps chugging along; stuff for 6.6 includes: 2023-08-30 20:05:42 -07:00
firmware_class
fpga
gpu Documentation work keeps chugging along; stuff for 6.6 includes: 2023-08-30 20:05:42 -07:00
hid HID: Add introduction about HID for non-kernel programmers 2023-08-07 13:24:36 +02:00
hwmon Documentation work keeps chugging along; stuff for 6.6 includes: 2023-08-30 20:05:42 -07:00
i2c media updates for v6.6-rc1 2023-09-01 12:21:32 -07:00
iio
images
infiniband
input input: docs: pxrc: remove reference to phoenix-sim 2023-08-28 12:43:32 -06:00
isdn
kbuild Kbuild updates for v6.6 2023-09-05 11:01:47 -07:00
kernel-hacking
leds
litmus-tests
livepatch Documentation: Fix typos 2023-08-18 11:29:03 -06:00
locking Documentation: Fix typos 2023-08-18 11:29:03 -06:00
maintainer Documentation work keeps chugging along; stuff for 6.6 includes: 2023-08-30 20:05:42 -07:00
mhi
misc-devices
mm Add x86 shadow stack support 2023-08-31 12:20:12 -07:00
netlabel
netlink doc/netlink: Add spec for rt route messages 2023-08-27 17:17:11 -07:00
networking tcp: defer regular ACK while processing socket backlog 2023-09-12 19:10:01 +02:00
nvdimm
nvme
PCI Merge branch 'pci/misc' 2023-08-29 11:03:57 -05:00
pcmcia
peci
power Documentation: Fix typos 2023-08-18 11:29:03 -06:00
powerpc powerpc updates for 6.6 2023-08-31 12:43:10 -07:00
process Including fixes from netfilter and bpf. 2023-09-07 18:33:07 -07:00
RCU
riscv RISC-V Patches for the 6.6 Merge Window, Part 1 2023-09-01 08:09:48 -07:00
rust Documentation work keeps chugging along; stuff for 6.6 includes: 2023-08-30 20:05:42 -07:00
scheduler Documentation work keeps chugging along; stuff for 6.6 includes: 2023-08-30 20:05:42 -07:00
scsi SCSI misc on 20230902 2023-09-02 12:02:41 -07:00
security Documentation: Fix typos 2023-08-18 11:29:03 -06:00
sound ALSA: emu10k1: add separate documentation for E-MU cards 2023-08-26 09:25:17 +02:00
sphinx Documentation: Fix typos 2023-08-18 11:29:03 -06:00
sphinx-static
spi Documentation: Fix typos 2023-08-18 11:29:03 -06:00
staging
target
timers
tools Documentation: Fix typos 2023-08-18 11:29:03 -06:00
trace Probes updates for v6.6: 2023-09-02 11:10:50 -07:00
translations Documentation work keeps chugging along; stuff for 6.6 includes: 2023-08-30 20:05:42 -07:00
usb USB / Thunderbolt / PHY driver update for 6.6-rc1 2023-09-01 09:23:34 -07:00
userspace-api Including fixes from netfilter and bpf. 2023-09-07 18:33:07 -07:00
virt ARM: 2023-09-07 13:52:20 -07:00
w1 Documentation: Fix typos 2023-08-18 11:29:03 -06:00
watchdog Documentation: Fix typos 2023-08-18 11:29:03 -06:00
wmi Documentation work keeps chugging along; stuff for 6.6 includes: 2023-08-30 20:05:42 -07:00
.gitignore
atomic_bitops.txt
atomic_t.txt
Changes
CodingStyle
conf.py
docutils.conf
dontdiff
index.rst
Kconfig
Makefile
memory-barriers.txt
SubmittingPatches
subsystem-apis.rst