linux/net
Florian Westphal be0502a3f2 netfilter: conntrack: tcp: only close if RST matches exact sequence
TCP resets cause instant transition from established to closed state
provided the reset is in-window.  Endpoints that implement RFC 5961
require resets to match the next expected sequence number.
RST segments that are in-window (but that do not match RCV.NXT) are
ignored, and a "challenge ACK" is sent back.
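
The endpoint behaviour described above can be sketched as a small model. This is an illustrative simplification, not kernel code: the function names and the returned labels are hypothetical, and the receive window is passed in as a plain byte count.

```python
SEQ_MOD = 1 << 32  # TCP sequence numbers are 32-bit and wrap around


def seq_in_window(seq, rcv_nxt, rcv_wnd):
    """True if seq lies in [rcv_nxt, rcv_nxt + rcv_wnd) modulo 2**32."""
    if rcv_wnd == 0:
        # With a zero window only the exact next sequence is acceptable.
        return seq == rcv_nxt
    return ((seq - rcv_nxt) % SEQ_MOD) < rcv_wnd


def classify_rst(seq, rcv_nxt, rcv_wnd):
    """Model how an endpoint implementing RFC 5961 treats an incoming RST.

    Returns one of the (hypothetical) labels:
      "drop"          - out of window, silently discarded
      "reset"         - exact match with RCV.NXT, connection is reset
      "challenge_ack" - in window but not exact, a challenge ACK is sent
    """
    if not seq_in_window(seq, rcv_nxt, rcv_wnd):
        return "drop"
    if seq == rcv_nxt:
        return "reset"
    return "challenge_ack"
```

For example, with RCV.NXT = 1001 and an assumed 64128-byte window, `classify_rst(1010, 1001, 64128)` yields `"challenge_ack"`, while `classify_rst(1001, 1001, 64128)` yields `"reset"`.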

The main problem for conntrack is that it is a middlebox: whereas an end
host might have ACK'd SEQ (and would thus accept an RST with this
sequence number), conntrack might not have seen this ACK (yet).

Therefore we can't simply flag RSTs with non-exact match as invalid.

This updates RST processing as follows:

1. If the connection is in a state other than ESTABLISHED, nothing
   changes: the RST is subject to the normal in-window check.

2. If the RST's sequence number matches RCV.NXT exactly, the
   connection state moves to CLOSE.

3. The same applies if the RST sequence number aligns with a previous
   packet in the same direction.

In all other cases, the connection remains in ESTABLISHED state.
If the normal in-window check passes, the timeout will be lowered
to that of CLOSE.

If the peer sends a challenge ACK, the connection timeout will be reset.

If the challenge ACK triggers another RST (RST was valid after all),
this 2nd RST will match expected sequence and conntrack state changes to
CLOSE.

If no challenge ACK is received, the connection will time out after
CLOSE seconds (10 seconds by default), just like without this patch.
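
The updated RST handling can be modelled roughly as follows. This is a minimal sketch, not the kernel implementation: the `Conn` fields, function names and return labels are hypothetical, the in-window check is assumed to be done by the caller, and "timeout will be reset" is assumed here to mean a return to the ESTABLISHED timeout.

```python
from dataclasses import dataclass
from typing import Optional

CLOSE_TIMEOUT = 10            # seconds, the default mentioned above
ESTABLISHED_TIMEOUT = 432000  # seconds, as seen in the conntrack events


@dataclass
class Conn:
    state: str = "ESTABLISHED"
    timeout: int = ESTABLISHED_TIMEOUT
    expected_seq: int = 0           # RCV.NXT as seen by the tracker
    last_end: Optional[int] = None  # end of previous packet, same direction


def on_rst(conn, seq, in_window):
    """Apply the rules above to an RST; returns a hypothetical label."""
    if not in_window:
        return "invalid"            # fails the normal window check
    if conn.state != "ESTABLISHED":
        # Rule 1: behaviour unchanged, an in-window RST closes.
        conn.state, conn.timeout = "CLOSE", CLOSE_TIMEOUT
        return "close"
    if seq == conn.expected_seq or seq == conn.last_end:
        # Rules 2 and 3: exact match, or aligned with the previous packet.
        conn.state, conn.timeout = "CLOSE", CLOSE_TIMEOUT
        return "close"
    # In-window but not exact: stay ESTABLISHED, lower the timeout.
    conn.timeout = CLOSE_TIMEOUT
    return "keep"


def on_challenge_ack(conn):
    """A challenge ACK from the peer restores the timeout (assumption)."""
    if conn.state == "ESTABLISHED":
        conn.timeout = ESTABLISHED_TIMEOUT
```

With `expected_seq=1001`, an in-window RST at sequence 1010 leaves the model in ESTABLISHED with a 10 second timeout; a subsequent challenge ACK restores the long timeout, and a second RST at sequence 1001 moves it to CLOSE.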

Packetdrill test case:

0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
0.000 bind(3, ..., ...) = 0
0.000 listen(3, 1) = 0

0.100 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7>
0.100 > S. 0:0(0) ack 1 win 64240 <mss 1460,nop,nop,sackOK,nop,wscale 7>
0.200 < . 1:1(0) ack 1 win 257
0.200 accept(3, ..., ...) = 4

// Receive a segment.
0.210 < P. 1:1001(1000) ack 1 win 46
0.210 > . 1:1(0) ack 1001

// Application writes 1000 bytes.
0.250 write(4, ..., 1000) = 1000
0.250 > P. 1:1001(1000) ack 1001

// First reset, old sequence. Conntrack (correctly) considers this
// invalid due to failed window validation (regardless of this patch).
0.260 < R  2:2(0) ack 1001 win 260

// 2nd reset, but too far ahead sequence.  Same: correctly handled
// as invalid.
0.270 < R 99990001:99990001(0) ack 1001 win 260

// in-window, but not exact sequence.
// Current Linux kernels might reply with a challenge ack, and do not
// remove connection.
// Without this patch, conntrack state moves to CLOSE.
// With patch, timeout is lowered like CLOSE, but connection stays
// in ESTABLISHED state.
0.280 < R 1010:1010(0) ack 1001 win 260

// Expect challenge ACK
0.281 > . 1001:1001(0) ack 1001 win 501

// With or without this patch, RST will cause connection
// to move to CLOSE (sequence number matches)
// 0.282 < R 1001:1001(0) ack 1001 win 260

// ACK
0.300 < . 1001:1001(0) ack 1001 win 257

// more data could be exchanged here, connection
// is still established

// Client closes the connection.
0.610 < F. 1001:1001(0) ack 1001 win 260
0.650 > . 1001:1001(0) ack 1002

// Close the connection without reading outstanding data
0.700 close(4) = 0

// so one more reset.  Will be deemed acceptable with patch as well:
// connection is already closing.
0.701 > R. 1001:1001(0) ack 1002 win 501
// End packetdrill test case.

With the patch, this generates the following conntrack events:
   [NEW] 120 SYN_SENT src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [UNREPLIED]
[UPDATE] 60 SYN_RECV src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80
[UPDATE] 432000 ESTABLISHED src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]
[UPDATE] 120 FIN_WAIT src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]
[UPDATE] 60 CLOSE_WAIT src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]
[UPDATE] 10 CLOSE src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]

Without the patch, the first in-window RST moves the connection to CLOSE,
whereas the socket state does not change until the FIN is received.
   [NEW] 120 SYN_SENT src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [UNREPLIED]
[UPDATE] 60 SYN_RECV src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80
[UPDATE] 432000 ESTABLISHED src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [ASSURED]
[UPDATE] 10 CLOSE src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [ASSURED]

Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-01 14:19:31 +01:00
6lowpan
9p
802
8021q net: Remove switchdev.h inclusion from team/bond/vlan 2019-02-24 17:40:46 -08:00
appletalk
atm atm: clean up vcc_seq_next() 2019-02-16 18:12:22 -08:00
ax25
batman-adv Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-02-15 12:38:38 -08:00
bluetooth Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next 2019-02-24 22:27:19 -08:00
bpf Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-02-24 12:06:19 -08:00
bpfilter bpfilter: re-add header search paths to tools include to fix build error 2019-02-23 13:34:40 -08:00
bridge netfilter: ebtables: remove BUGPRINT messages 2019-02-27 10:47:57 +01:00
caif net: caif: use skb helpers instead of open-coding them 2019-02-17 11:01:17 -08:00
can
ceph libceph: handle an empty authorize reply 2019-02-18 18:05:33 +01:00
core devlink: require non-NULL ops for devlink instances 2019-02-26 08:49:05 -08:00
dcb
dccp
decnet
dns_resolver
dsa net: devlink: turn devlink into a built-in 2019-02-26 08:49:05 -08:00
ethernet net/ethernet: Add parse_protocol header_ops support 2019-02-22 12:55:31 -08:00
hsr net: hsr: Convert timers to use timer_setup() 2017-10-25 13:00:27 +09:00
ieee802154 net: remove unused struct inet_frag_queue.fragments field 2019-02-26 08:27:05 -08:00
ife
ipv4 netfilter: nat: remove nf_nat_l3proto.h and nf_nat_core.h 2019-02-27 10:54:08 +01:00
ipv6 netfilter: nat: remove nf_nat_l3proto.h and nf_nat_core.h 2019-02-27 10:54:08 +01:00
iucv
kcm kcm: Remove unnecessary SLAB_PANIC for kmem_cache_create() in kcm_init 2019-02-23 13:46:24 -08:00
key af_key: unconditionally clone on broadcast 2019-02-12 10:36:42 +01:00
l2tp
l3mdev
lapb
llc
mac80211 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-02-24 12:06:19 -08:00
mac802154
mpls mpls_iptunnel: use struct_size() helper 2019-02-08 22:57:27 -08:00
ncsi
netfilter netfilter: conntrack: tcp: only close if RST matches exact sequence 2019-03-01 14:19:31 +01:00
netlabel
netlink rhashtable: Remove obsolete rhashtable_walk_init function 2019-02-22 13:49:00 +01:00
netrom
nfc
nsh
openvswitch netfilter: nat: remove nf_nat_l3proto.h and nf_nat_core.h 2019-02-27 10:54:08 +01:00
packet net/packet: Remove redundant skb->protocol set 2019-02-22 12:55:31 -08:00
phonet phonet: fix building with clang 2019-02-21 16:23:56 -08:00
psample
qrtr
rds
rfkill
rose net: rose: add missing dev_put() on error in rose_bind 2019-02-19 13:22:46 -08:00
rxrpc
sched net: sched: pie: fix 64-bit division 2019-02-26 18:55:38 -08:00
sctp Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-02-24 12:06:19 -08:00
smc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-02-24 12:06:19 -08:00
strparser
sunrpc Two small fixes, one for crashes using nfs/krb5 with older enctypes, one 2019-02-16 17:38:01 -08:00
switchdev switchdev: Complete removal of switchdev_port_attr_get() 2019-02-24 22:31:41 -08:00
tipc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-02-24 12:06:19 -08:00
tls tls: Return type of non-data records retrieved using MSG_PEEK in recvmsg 2019-02-24 21:58:38 -08:00
unix missing barriers in some of unix_sock ->addr and ->path accesses 2019-02-20 20:06:28 -08:00
vmw_vsock Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-02-15 12:38:38 -08:00
wimax
wireless Merge remote-tracking branch 'net-next/master' into mac80211-next 2019-02-22 13:48:13 +01:00
x25 net/x25: fix a race in x25_bind() 2019-02-23 18:41:06 -08:00
xdp Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-02-24 12:06:19 -08:00
xfrm xfrm: Fix inbound traffic via XFRM interfaces across network namespaces 2019-02-18 10:58:54 +01:00
compat.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-02-24 12:06:19 -08:00
Kconfig net: devlink: turn devlink into a built-in 2019-02-26 08:49:05 -08:00
Makefile
socket.c
sysctl_net.c