linux/net/ipv4
Neal Cardwell 4720852ed9 tcp: fix delayed ACKs for MSS boundary condition
This commit fixes poor delayed ACK behavior that can cause poor TCP
latency in a particular boundary condition: when an application makes
a TCP socket write that is an exact multiple of the MSS size.

The problem is that there is painful boundary discontinuity in the
current delayed ACK behavior. With the current delayed ACK behavior,
we have:

(1) If an app reads data when > 1*MSS is unacknowledged, then
    tcp_cleanup_rbuf() ACKs immediately because of:

     tp->rcv_nxt - tp->rcv_wup > icsk->icsk_ack.rcv_mss ||

(2) If an app reads all received data, and the packets were < 1*MSS,
    and either (a) the app is not ping-pong or (b) we received two
    packets < 1*MSS, then tcp_cleanup_rbuf() ACKs immediately beecause
    of:

     ((icsk->icsk_ack.pending & ICSK_ACK_PUSHED2) ||
      ((icsk->icsk_ack.pending & ICSK_ACK_PUSHED) &&
       !inet_csk_in_pingpong_mode(sk))) &&

(3) *However*: if an app reads exactly 1*MSS of data,
    tcp_cleanup_rbuf() does not send an immediate ACK. This is true
    even if the app is not ping-pong and the 1*MSS of data had the PSH
    bit set, suggesting the sending application completed an
    application write.

Thus if the app is not ping-pong, we have this painful case where
>1*MSS gets an immediate ACK, and <1*MSS gets an immediate ACK, but a
write whose last skb is an exact multiple of 1*MSS can get a 40ms
delayed ACK. This means that any app that transfers data in one
direction and takes care to align write size or packet size with MSS
can suffer this problem. With receive zero copy making 4KB MSS values
more common, it is becoming more common to have application writes
naturally align with MSS, and more applications are likely to
encounter this delayed ACK problem.

The fix in this commit is to refine the delayed ACK heuristics with a
simple check: immediately ACK a received 1*MSS skb with PSH bit set if
the app reads all data. Why? If an skb has a len of exactly 1*MSS and
has the PSH bit set then it is likely the end of an application
write. So more data may not be arriving soon, and yet the data sender
may be waiting for an ACK if cwnd-bound or using TX zero copy. Thus we
set ICSK_ACK_PUSHED in this case so that tcp_cleanup_rbuf() will send
an ACK immediately if the app reads all of the data and is not
ping-pong. Note that this logic is also executed for the case where
len > MSS, but in that case this logic does not matter (and does not
hurt) because tcp_cleanup_rbuf() will always ACK immediately if the
app reads data and there is more than an MSS of unACKed data.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Cc: Xin Guo <guoxin0309@gmail.com>
Link: https://lore.kernel.org/r/20231001151239.1866845-2-ncardwell.sw@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-04 15:34:18 -07:00
..
bpfilter net: Use umd_cleanup_helper() 2023-05-31 13:06:57 +02:00
netfilter inet: move inet->nodefrag to inet->inet_flags 2023-08-16 11:09:17 +01:00
af_inet.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-08-24 10:51:39 -07:00
ah4.c net: ipv4: Remove completion function scaffolding 2023-02-13 18:35:15 +08:00
arp.c neighbour: annotate lockless accesses to n->nud_state 2023-03-15 00:37:32 -07:00
bpf_tcp_ca.c bpf: Drop useless btf_vmlinux in bpf_tcp_ca 2023-07-18 17:31:10 -07:00
cipso_ipv4.c inet: move inet->is_icsk to inet->inet_flags 2023-08-16 11:09:17 +01:00
datagram.c ipv4: fix data-races around inet->inet_id 2023-08-20 11:40:49 +01:00
devinet.c net: ipv4: fix one memleak in __inet_del_ifa() 2023-09-08 08:02:17 +01:00
esp4.c net: ipv4: Use kfree_sensitive instead of kfree 2023-07-19 11:03:03 +01:00
esp4_offload.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-06-22 18:40:38 -07:00
fib_frontend.c ipv4: Fix incorrect table ID in IOCTL path 2023-03-16 17:26:31 -07:00
fib_lookup.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2022-02-17 11:44:20 -08:00
fib_notifier.c
fib_rules.c ipv4: remove unnecessary type castings 2022-04-30 15:12:58 +01:00
fib_semantics.c ipv4/fib: send notify when delete source address routes 2023-10-03 09:00:40 +02:00
fib_trie.c ipv4/fib: send notify when delete source address routes 2023-10-03 09:00:40 +02:00
fou_bpf.c bpf,fou: Add bpf_skb_{set,get}_fou_encap kfuncs 2023-04-12 16:40:39 -07:00
fou_core.c bpf,fou: Add bpf_skb_{set,get}_fou_encap kfuncs 2023-04-12 16:40:39 -07:00
fou_nl.c net: ynl: prefix uAPI header include with uapi/ 2023-05-26 10:30:14 +01:00
fou_nl.h net: ynl: prefix uAPI header include with uapi/ 2023-05-26 10:30:14 +01:00
gre_demux.c
gre_offload.c net: move gso declarations and functions to their own files 2023-06-10 00:11:41 -07:00
icmp.c icmp: guard against too small mtu 2023-03-31 21:37:06 -07:00
igmp.c igmp: limit igmpv3_newpack() packet size to IP_MAX_MTU 2023-09-05 17:49:40 +01:00
inet_connection_sock.c tcp: annotate data-races around icsk->icsk_syn_retries 2023-07-20 12:34:18 -07:00
inet_diag.c inet: move inet->defer_connect to inet->inet_flags 2023-08-16 11:09:18 +01:00
inet_fragment.c net: dropreason: add SKB_DROP_REASON_FRAG_REASM_TIMEOUT 2022-10-31 20:14:27 -07:00
inet_hashtables.c tcp: Fix bind() regression for v4-mapped-v6 non-wildcard address. 2023-09-13 07:18:04 +01:00
inet_timewait_sock.c inet: move inet->transparent to inet->inet_flags 2023-08-16 11:09:17 +01:00
inetpeer.c inetpeer: Fix data-races around sysctl. 2022-07-08 12:10:33 +01:00
ip_forward.c net: ipv4, ipv6: fix IPSTATS_MIB_OUTOCTETS increment duplicated 2023-08-30 09:44:09 +01:00
ip_fragment.c networking: Update to register_net_sysctl_sz 2023-08-15 15:26:18 -07:00
ip_gre.c ipv4: ip_gre: fix return value check in erspan_xmit() 2023-07-19 12:27:09 +01:00
ip_input.c ipv4: ignore dst hint for multipath routes 2023-09-01 08:11:51 +01:00
ip_options.c
ip_output.c net: annotate data-races around sk->sk_tsflags 2023-09-01 07:27:33 +01:00
ip_sockglue.c net: annotate data-races around sk->sk_tsflags 2023-09-01 07:27:33 +01:00
ip_tunnel.c bpf-next-for-netdev 2023-04-13 16:43:38 -07:00
ip_tunnel_core.c tunnels: fix kasan splat when generating ipv4 pmtu error 2023-08-04 18:24:52 -07:00
ip_vti.c ip_vti: fix potential slab-use-after-free in decode_session6 2023-07-11 11:06:08 +02:00
ipcomp.c xfrm: ipcomp: add extack to ipcomp{4,6}_init_state 2022-09-29 07:18:00 +02:00
ipconfig.c net: ipconfig: move ic_nameservers_fallback into #ifdef block 2023-05-22 11:17:55 +01:00
ipip.c ipip,ip_tunnel,sit: Add FOU support for externally controlled ipip devices 2023-04-12 16:40:39 -07:00
ipmr.c net: ipv4, ipv6: fix IPSTATS_MIB_OUTOCTETS increment duplicated 2023-08-30 09:44:09 +01:00
ipmr_base.c ipmr: adopt rcu_read_lock() in mr_dump() 2022-06-24 11:34:38 +01:00
Kconfig tcp: configurable source port perturb table size 2022-11-16 13:02:04 +00:00
Makefile bpf,fou: Add bpf_skb_{set,get}_fou_encap kfuncs 2023-04-12 16:40:39 -07:00
metrics.c ipv4: prevent potential spectre v1 gadget in ip_metrics_convert() 2023-01-23 21:37:25 -08:00
netfilter.c netfilter: Use l3mdev flow key when re-routing mangled packets 2022-05-16 13:03:29 +02:00
netlink.c
nexthop.c nexthop: Do not increment dump sentinel at the end of the dump 2023-08-15 18:54:53 -07:00
ping.c inet: move inet->recverr to inet->inet_flags 2023-08-16 11:09:17 +01:00
proc.c icmp: Add counters for rate limits 2023-01-26 10:52:18 +01:00
protocol.c
raw.c inet: move inet->hdrincl to inet->inet_flags 2023-08-16 11:09:17 +01:00
raw_diag.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-04-06 12:01:20 -07:00
route.c ipv4: Set offload_failed flag in fibmatch results 2023-10-04 11:39:36 -07:00
syncookies.c tcp: Set route scope properly in cookie_v4_check(). 2023-06-06 21:13:03 -07:00
sysctl_net_ipv4.c networking: Update to register_net_sysctl_sz 2023-08-15 15:26:18 -07:00
tcp.c bpf: tcp_read_skb needs to pop skb regardless of seq 2023-09-29 17:04:07 +02:00
tcp_bbr.c bpf: Add __bpf_kfunc tag to all kfuncs 2023-02-02 00:25:14 +01:00
tcp_bic.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tcp_bpf.c bpf, sockmap: Do not inc copied_seq when PEEK flag set 2023-09-29 17:05:00 +02:00
tcp_cdg.c Random number generator fixes for Linux 6.1-rc1. 2022-10-16 15:27:07 -07:00
tcp_cong.c net: Update an existing TCP congestion control algorithm. 2023-03-22 22:53:00 -07:00
tcp_cubic.c bpf: Add __bpf_kfunc tag to all kfuncs 2023-02-02 00:25:14 +01:00
tcp_dctcp.c bpf: Add __bpf_kfunc tag to all kfuncs 2023-02-02 00:25:14 +01:00
tcp_dctcp.h
tcp_diag.c tcp: Access &tcp_hashinfo via net. 2022-09-20 10:21:49 -07:00
tcp_fastopen.c inet: move inet->defer_connect to inet->inet_flags 2023-08-16 11:09:18 +01:00
tcp_highspeed.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tcp_htcp.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tcp_hybla.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tcp_illinois.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tcp_input.c tcp: fix delayed ACKs for MSS boundary condition 2023-10-04 15:34:18 -07:00
tcp_ipv4.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-08-24 10:51:39 -07:00
tcp_lp.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tcp_metrics.c tcp_metrics: hash table allocation cleanup 2023-08-04 15:33:39 -07:00
tcp_minisocks.c inet: move inet->transparent to inet->inet_flags 2023-08-16 11:09:17 +01:00
tcp_nv.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tcp_offload.c net: move gso declarations and functions to their own files 2023-06-10 00:11:41 -07:00
tcp_output.c tcp: fix quick-ack counting to count actual ACKs of new data 2023-10-04 15:34:18 -07:00
tcp_plb.c prandom: remove prandom_u32_max() 2022-12-20 03:13:45 +01:00
tcp_rate.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2022-04-28 13:02:01 -07:00
tcp_recovery.c tcp: preserve const qualifier in tcp_sk() 2023-03-18 12:23:34 +00:00
tcp_scalable.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tcp_timer.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-08-18 12:44:56 -07:00
tcp_ulp.c net/ulp: use consistent error code when blocking ULP 2023-01-19 09:26:16 -08:00
tcp_vegas.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tcp_vegas.h
tcp_veno.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tcp_westwood.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tcp_yeah.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tunnel4.c
udp.c net: annotate data-races around sk->sk_forward_alloc 2023-09-01 07:27:33 +01:00
udp_bpf.c bpf, sockmap: Fix an infinite loop error when len is 0 in tcp_bpf_recvmsg_parser() 2023-03-03 17:25:15 +01:00
udp_diag.c udp: Access &udp_table via net. 2022-11-16 09:43:35 +00:00
udp_impl.h sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
udp_offload.c net: gro: fix misuse of CB in udp socket lookup 2023-07-29 17:10:27 +01:00
udp_tunnel_core.c inet: move inet->mc_loop to inet->inet_frags 2023-08-16 11:09:17 +01:00
udp_tunnel_nic.c udp_tunnel: Add checks for nla_nest_start() in __udp_tunnel_nic_dump_write() 2022-11-29 08:44:24 -08:00
udp_tunnel_stub.c
udplite.c sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
xfrm4_input.c xfrm: fix inbound ipv4/udp/esp packets to UDPv6 dualstack sockets 2023-06-09 08:16:34 +02:00
xfrm4_output.c
xfrm4_policy.c sysctl-6.6-rc1 2023-08-29 17:39:15 -07:00
xfrm4_protocol.c net: xfrm: unexport __init-annotated xfrm4_protocol_init() 2022-06-08 10:10:13 -07:00
xfrm4_state.c
xfrm4_tunnel.c xfrm: tunnel: add extack to ipip_init_state, xfrm6_tunnel_init_state 2022-09-29 07:18:00 +02:00