Commit graph

7992 commits

Author SHA1 Message Date
Kristof Provost 8f04209d37 pf: simplify pf_addrcpy() and pf_match_addr()
Use the v4/v6 union members rather than the uint32_t ones.
Export IN_ARE_MASKED_ADDR_EQUAL() in in_var.h and use it (and its IPv6
equivalent) for masked comparisons rather than hand-rolled code.

Event:		Kitchener-Waterloo Hackathon 202406
2024-06-06 15:45:31 +02:00
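As an illustration of the masked comparison this commit refers to, here is a minimal standalone sketch. The helper names are hypothetical and are not the kernel macros; the sketch only assumes that both addresses and the mask are held in the same byte order.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not the kernel macro): true if a and b agree on
 * every bit that is set in mask.  All three values must use the same
 * byte order.
 */
static bool
masked_addr_equal4(uint32_t a, uint32_t b, uint32_t mask)
{
	return (((a ^ b) & mask) == 0);
}

/* The same idea for an IPv6 address viewed as four 32-bit words. */
static bool
masked_addr_equal6(const uint32_t a[4], const uint32_t b[4],
    const uint32_t mask[4])
{
	for (int i = 0; i < 4; i++) {
		if (((a[i] ^ b[i]) & mask[i]) != 0)
			return (false);
	}
	return (true);
}
```

With a /24 mask, for example, 192.168.1.1 and 192.168.1.254 compare equal, while 192.168.1.1 and 192.168.2.1 do not.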
Michael Tuexen 86c9325d34 tcp: simplify stack switching protocol
Before this patch, a stack (tfb) accepts a tcpcb (tp), if the
tp->t_state is TCPS_CLOSED or tfb->tfb_tcp_handoff_ok is not NULL
and tfb->tfb_tcp_handoff_ok(tp) returns 0.
After this patch, the only check is tfb->tfb_tcp_handoff_ok(tp)
returns 0. tfb->tfb_tcp_handoff_ok must always be provided.
For existing TCP stacks (FreeBSD, RACK and BBR) there is no
functional change. However, the logic is simpler.

Reviewed by:		lstewart, peter_lei_ieee_.org, rrs
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D45253
2024-06-06 08:29:05 +02:00
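The change in the acceptance check can be modelled outside the kernel as follows. This is a sketch only: the struct and function names are hypothetical stand-ins for tcpcb, the stack function block and tfb_tcp_handoff_ok, and the state constants are placeholders.

```c
#include <stdbool.h>
#include <stddef.h>

/* Placeholder state values for this model only. */
enum { MODEL_TCPS_CLOSED = 0, MODEL_TCPS_ESTABLISHED = 4 };

struct model_tcpcb {
	int t_state;
};

struct model_stack {
	/* Returns 0 if the stack is willing to take over tp. */
	int (*handoff_ok)(struct model_tcpcb *tp);
};

/* Old rule: CLOSED is always accepted; otherwise ask the optional callback. */
static bool
old_accepts(const struct model_stack *tfb, struct model_tcpcb *tp)
{
	return (tp->t_state == MODEL_TCPS_CLOSED ||
	    (tfb->handoff_ok != NULL && tfb->handoff_ok(tp) == 0));
}

/* New rule: the callback is mandatory and is the only check. */
static bool
new_accepts(const struct model_stack *tfb, struct model_tcpcb *tp)
{
	return (tfb->handoff_ok(tp) == 0);
}

/* A callback that accepts only closed connections. */
static int
handoff_closed_only(struct model_tcpcb *tp)
{
	return (tp->t_state == MODEL_TCPS_CLOSED ? 0 : 1);
}
```

For a stack whose callback accepts closed connections, both rules give the same answer, which matches the "no functional change" statement for existing stacks.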
Michael Tuexen e7381521aa tcp: remove unused code in tcp_usr_attach
pr_attach is only called on a socket (so) with so->so_listen != NULL
via sonewconn. However, sonewconn is not called from the TCP code.
The listening sockets are handled in tcp_syncache.c without using
sonewconn. Therefore, the code removed is never executed.
No functional change intended.

Reviewed by:		rrs, peter.lei_ieee.org
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D45412
2024-05-30 21:23:45 +02:00
Michael Tuexen df9de82f54 tcp: fix sending RST after second inp lookup
When we first find an inp, we set also the tp. If then a second
lookup is necessary, the inp is recomputed. If this fails, the
tp is not cleared, which resulted in failing KASSERT.
Therefore, clear the tp when starting the inp lookup procedure.
Reported by:	Jenkins
Fixes:		02d15215ce ("tcp: improve blackhole support")
MFC after:	1 week
Sponsored by:	Netflix, Inc.
2024-05-25 19:58:48 +02:00
Michael Tuexen 02d15215ce tcp: improve blackhole support
There are two improvements to the TCP blackhole support:
(1) If net.inet.tcp.blackhole is set to 2, also send no RST whenever
    a segment is received on an existing closed socket or if there is
    a port mismatch when using UDP encapsulation.
(2) If net.inet.tcp.blackhole is set to 3, no RST segment is sent in
    response to incoming segments on closed sockets or in response to
    unexpected segments on listening sockets.
Thanks to gallatin@ for suggesting such an improvement.

Reviewed by:		gallatin
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D45304
2024-05-24 06:59:13 +02:00
Henrich Hartzer 674956e199 sys/netinet/cc: Switch from deprecated random() to prng32()
Related: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=277655

Signed-off-by: henrichhartzer@tuta.io
Reviewed by: imp, mav
Pull Request: https://github.com/freebsd/freebsd-src/pull/1162
2024-05-23 15:10:09 -06:00
Cy Schubert 380ee9b3c0 sys/netinet/icmp6.h: Fix build
Fix stdint.h file not found.

Fixes: 		4b75afe885
2024-05-23 14:03:55 -07:00
Lexi Winter 4b75afe885 sys/netinet/icmp6.h: use C99 uintX_t constants for new PREF64 struct
Reviewed by: imp, glebius (prior suggestions done)
Pull Request: https://github.com/freebsd/freebsd-src/pull/1206
2024-05-23 14:40:48 -06:00
Lexi Winter 1e8eb413f6 netinet/icmp6: add PREF64 definitions (RFC 8781)
Reviewed by: imp, glebius (prior suggestions done)
Pull Request: https://github.com/freebsd/freebsd-src/pull/1206
2024-05-23 14:40:11 -06:00
Michael Tuexen fe136aecc2 tcp: improve inp locking in setsockopt
Ensure that the inp is not dropped when starting a stack switch.
While there, clean-up the code by using INP_WLOCK_RECHECK, which
also re-assigns tp.

Reviewed by:		glebius
MFC after:		3 days
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D45241
2024-05-23 22:19:12 +02:00
Randall Stewart ea916b6412 Remove TCP_SAD optional code now that the sack filter performs this function.
With the commit of D44903 we no longer need the SAD option. Instead all stacks that
use the sack filter inherit its protection against sack-attack.

Reviewed by: tuexen@
Differential Revision: https://reviews.freebsd.org/D45216
2024-05-18 10:57:04 -04:00
Marko Zec 42b3c16e30 fib_dxr: code hygiene, prune old code, no functional changes
The !DXR2 code corresponds to the original DXR encoding proposal from
2012 with a single direct-lookup stage, which is inferior to the more
recent (DXR2) variant with two-stage trie both in terms of memory
footprint of the lookup structures, and in terms of overall lookup
throughput.

I'm axing the old code chunks to (hopefully) somewhat improve readability,
as well as to simplify future maintenance and updates.

MFC after:	1 week
2024-05-17 18:57:25 +02:00
Marko Zec 19bd24caa4 fib_dxr: do not leak memory if FIB constellation hits structural limit
DXR lookup table encoding has an inherent structural limit on the amount
of binary search ranges it can accommodate.  With the current IPv4 BGP views
(circa 1 M prefixes) and default DXR encoding we are only at around 5% of
that limit, so far, far away from hitting it.  Just in case it ever gets
hit, make sure we free the allocated structures, instead of leaking them.

MFC after:	1 week
2024-05-17 18:46:41 +02:00
Marko Zec 4ab122e8ef fib_dxr: check if cached fib_data matches the new request in dxr_init()
When calling dxr_init(), the FIB_ALGO infrastructure may provide a
pointer to a previous dxr instance, which permits reuse of auxiliary
dxr structures, i.e. incremental lookup structure updates.  For dxr this
is a crucial feature provided by FIB_ALGO, since dxr incremental updates
are typically several orders of magnitude faster than full lookup table
rebuilds.

However, the auxiliary dxr structure caches a pointer to struct fib_data and
relies upon it for performing incremental updates.  Apparently, incremental
rebuild requests from FIB_ALGO, i.e. calls to dxr_init() with the pointer
old_data set, may (under not yet fully understood circumstances) be invoked
within a different fib_data context than the one cached in the previous
version of dxr auxiliary structures.  In such (rare) events, we ignore the
offered old dxr context, and proceed with a full lookup structure rebuild
instead of attempting an incremental one using a fib_data context which
may no longer be valid, and could thus lead to a system crash.

PR:		278422
MFC after:	1 week
2024-05-17 18:21:54 +02:00
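The guard described in this commit can be sketched as below. The types and the function name are hypothetical simplifications; the real dxr auxiliary structure and struct fib_data carry far more state.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct fib_data and the dxr auxiliary state. */
struct model_fib_data {
	int id;
};

struct model_dxr_aux {
	const struct model_fib_data *fd;	/* context cached at build time */
};

/*
 * Decide whether an offered old instance may be reused for an incremental
 * update.  If no old instance was offered, or its cached context differs
 * from the one in the new request, the caller should do a full rebuild.
 */
static bool
can_update_incrementally(const struct model_dxr_aux *old,
    const struct model_fib_data *fd)
{
	if (old == NULL)
		return (false);
	return (old->fd == fd);
}
```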
Gordon Bergling 78e4dbc345 ipfw: Fix a typo in a source code comment
- s/defaul/default/

MFC after:	3 days
2024-05-12 10:53:40 +02:00
Michael Tuexen 2f923a0ced tcp rack: improve handling of front states
When the RACK stack wants to send a FIN, but still has outstanding
or unsent data, it sends a challenge ack. Don't do this when the
TCP endpoint is still in the front states, since it does not
make sense.
Reviewed by:		rrs
MFC after:		3 days
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D45122
2024-05-11 16:28:45 +02:00
Michael Tuexen 5120ea0d88 sctp: improve heartbeat timer computation
PR:		278666
Reviewed by:	Albin Hellqvist
MFC after:	3 days
Pull Request:	https://reviews.freebsd.org/D45107
2024-05-10 21:02:56 +02:00
Michael Tuexen b67716dd58 sctp: store heartbeat creation time as time_t
Reported by:	Coverity Scan
CID:		1493087
MFC after:	3 days
2024-05-10 20:40:15 +02:00
Michael Tuexen 42aeb8d490 sctp: store vtag expire time as time_t
Reported by:	Coverity Scan
CID:		1492525
CID:		1493239
MFC after:	3 days
2024-05-10 20:28:38 +02:00
Michael Tuexen 9d8a3718e2 sctp: store cookie secret change time as time_t
Reported by:	Coverity Scan
CID:		1492349
CID:		1493281
MFC after:	3 days
2024-05-10 20:14:16 +02:00
Michael Tuexen 0d15140d6d sctp: minor cleanup
No functional change intended.
MFC after:	3 days
2024-05-09 00:51:09 +02:00
Michael Tuexen 8c37094036 sctp: allow stcb == NULL in sctp_shutdown()
Consistently handle this case.
Reported by:	Coverity Scan
CID:		1533813
MFC after:	3 days
2024-05-09 00:43:28 +02:00
Michael Tuexen 83dcc7790b sctp: don't provide uninitialized memory to process_chunk_drop()
Right now, the code in process_chunk_drop() does not look at the
corresponding fields.
Therefore, no functional change intended.
Reported by:	Coverity Scan
CID:		1472476
MFC after:	3 days
2024-05-09 00:17:13 +02:00
Michael Tuexen e187fa5690 sctp: fix sctp_sendall() when an mbuf chain is provided
In this case uio is NULL, which needs to be checked and m must
be copied into the sctp_copy_all structure.
Reported by:	Coverity Scan
CID:		1400449
MFC after:	3 days
2024-05-08 23:45:55 +02:00
Michael Tuexen 3d40cc7ab8 sctp: add missing check
If memory allocation fails, m is NULL. Since this is possible,
check for it.
Reported by:	Coverity Scan
CID:		1086866
MFC after:	3 days
2024-05-08 23:03:34 +02:00
Richard Scheffenegger 2a9aae9e5f tcp: add counter to track when SACK loss recovery uses TSO
Add a counter to track how frequently SACK has transmitted
more than one MSS using TSO. This is beneficial when PRR is
in use, or when ACK thinning due to GRO/LRO or ACK discards
by the network is present.

Reviewed By:		tuexen, #transport
Sponsored by:		NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D45070
2024-05-08 14:37:33 +02:00
Richard Scheffenegger dcdfe44901 tcp: add sysctl to allow/disallow TSO during SACK loss recovery
Introduce net.inet.tcp.sack.tso for future use when TSO is ready
to be used during loss recovery.

Reviewed By:		tuexen, #transport
Sponsored by:		NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D45068
2024-05-08 14:33:20 +02:00
Richard Scheffenegger cbf3575aa3 tcp: filter small SACK blocks
While the SACK Scoreboard in the base stack limits
the number of holes by default to only 128 per connection
in order to prevent CPU load attacks by splitting SACKs,
filtering out SACK blocks of unusually small size can
further improve the actual processing of SACK loss recovery.

Reviewed By:		tuexen, #transport
Sponsored by:		NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D45075
2024-05-08 14:00:10 +02:00
Gleb Smirnoff a254d6870e carp: isolate VRRP from CARP
There is only one functional change here - we don't allow SIOCSVH (or
netlink request) to change sc->sc_version.  I'm convinced that allowing
such a change doesn't bring any practical value, but creates endless
minefields in front of both developers and end users (sysadmins).  If
you want to switch from VRRP to CARP or vice versa, you'd need to recreate
the VHID.

Oh, one tiny functional change: carp_ioctl_set() won't modify any fields
if it returns EINVAL.  Previously you could provide valid advbase with
invalid advskew - that used to modify advbase and return EINVAL.

All other changes are a sweep around not ever using CARP fields when
we are in VRRP mode and vice versa.  Also adding assertions on sc_version
where necessary.

Do not send VRRP vars in CARP mode via NetLink and vice versa.  However
in compat ioctl SIOCGVH for VRRP mode the CARP fields would be zeroes.

This allows declaring softc as a union and thus prevents any future logic
deterioration with respect to mixing VRRP and CARP.

Reviewed by:	kp
Differential Revision:	https://reviews.freebsd.org/D45039
2024-05-08 13:19:04 +02:00
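The "no fields are modified when EINVAL is returned" behaviour mentioned for carp_ioctl_set() amounts to validating everything before committing anything. A minimal sketch, with a hypothetical softc and assumed value ranges (1 to 255 for advbase, 0 to 254 for advskew), which are illustrative and not taken from the commit:

```c
#include <errno.h>

/* Hypothetical, reduced softc holding only the two fields discussed. */
struct model_softc {
	int advbase;
	int advskew;
};

static int
model_set(struct model_softc *sc, int advbase, int advskew)
{
	/* Validate every input first (ranges are assumptions). */
	if (advbase < 1 || advbase > 255)
		return (EINVAL);
	if (advskew < 0 || advskew > 254)
		return (EINVAL);

	/* Only now touch the softc, so a failure leaves it unchanged. */
	sc->advbase = advbase;
	sc->advskew = advskew;
	return (0);
}
```

Passing a valid advbase together with an invalid advskew returns EINVAL and leaves both fields as they were.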
Gleb Smirnoff 601438fbfa carp: refactor packet tagging for ether_output()
- Separate HMAC preparation (CARP specific) from tagging.
- In unicast mode (CARP specific) don't put tag at all.
- Don't put pointer to software context into the tag.  Putting just vhid,
  an integer value, is a safer design.

Reviewed by:	kp
Differential Revision:	https://reviews.freebsd.org/D45038
2024-05-08 13:19:04 +02:00
Gleb Smirnoff cda57d955b carp: assert that we are calling correct input function. We are.
Reviewed by:	kp
Differential Revision:	https://reviews.freebsd.org/D45037
2024-05-08 13:19:04 +02:00
Gleb Smirnoff 5ee92cbd82 carp: don't chain call vrrp_send_ad via carp_send_ad
Provide inline send_ad_locked() that switches between protocol
specific sending function.

Rename carp_send_ad() to carp_callout() to avoid getting lost in
all these multiple foo_send_ad.

No functional change intended.

Reviewed by:	kp
Differential Revision:	https://reviews.freebsd.org/D45036
2024-05-08 13:19:04 +02:00
Kristof Provost 3711515467 carp: support VRRPv3
Allow carp(4) to use the VRRPv3 protocol (RFC 5798). We can distinguish carp and
VRRP based on the protocol version number (carp is 2, VRRPv3 is 3), and support
both from the carp(4) code.

Reviewed by:	glebius
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D44774
2024-05-08 13:19:03 +02:00
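Distinguishing the two protocols by version number could look roughly like this. The sketch assumes the version sits in the upper four bits of the first header byte, as in the VRRPv3 packet format; the enum and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

enum model_proto {
	MODEL_PROTO_UNKNOWN,
	MODEL_PROTO_CARP,	/* version 2 */
	MODEL_PROTO_VRRPV3	/* version 3 */
};

/*
 * Classify an advertisement by its version field.  Assumption: the
 * version occupies the high nibble of the first byte of the header.
 */
static enum model_proto
classify_advert(const uint8_t *pkt, size_t len)
{
	if (pkt == NULL || len < 1)
		return (MODEL_PROTO_UNKNOWN);

	switch (pkt[0] >> 4) {
	case 2:
		return (MODEL_PROTO_CARP);
	case 3:
		return (MODEL_PROTO_VRRPV3);
	default:
		return (MODEL_PROTO_UNKNOWN);
	}
}
```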
Gleb Smirnoff b6b4ac2faa tcp_hostcache: remove unnecessary socketvar.h 2024-05-07 14:15:49 -07:00
Richard Scheffenegger 59884aea8b tcp: clean up macro usage in tcp_fixed_maxseg()
Replace local PAD macro with PADTCPOLEN macro
No functional change.

Reviewed By:		tuexen, #transport
Sponsored by:		NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D45076
2024-05-04 13:04:25 +02:00
Marko Zec b24e353f9e fib_dxr: set fib_data field in struct dxr_aux early enough
Previously it was possible for dxr_build() to return with da->fd
unset in case of range_tbl or x_tbl malloc() failures.  This
may have led to NULL ptr dereferencing in dxr_change_rib_batch().

MFC after:	1 week

PR:		278422
2024-05-07 17:44:09 +02:00
Marko Zec 4aa275f12d fib_dxr: s/KASSERT/MPASS/
MFC after:	1 week
2024-05-07 17:33:23 +02:00
Marko Zec 7a5de1d4cc fib_dxr: KASSERTs for chasing NULL ptr and runaway refcount suspects
MFC after:	1 week
2024-05-07 17:22:00 +02:00
Marko Zec ed541e201a fib_dxr: move the bulk of malloc() failure logging into dxr_build() 2024-05-07 17:11:30 +02:00
Marko Zec 5295e891d0 fib_dxr: update comment.
MFC after:	1 week
2024-05-06 20:42:31 +02:00
Marko Zec 858010643c fib_dxr: free() does nothing if arg is NULL, so remove a redundant check.
MFC after:	1 week
2024-05-06 20:37:44 +02:00
Marko Zec 308caa38cd fib_dxr: log malloc() failures.
MFC after:	1 week
2024-05-06 20:21:55 +02:00
Randall Stewart fce03f85c5 TCP can be subject to Sack Attacks, let's fix this issue.
There is a type of attack that a TCP peer can launch on a connection. This is for sure in Rack or BBR and probably even the default stack if it uses lists in sack processing. The idea of the attack is that the attacker is driving you to look at 100's of sack blocks that only update 1 byte. So for example if you have 1 - 10,000 bytes outstanding the attacker sends in something like:

ACK 0 SACK(1-512) SACK(1024 - 1536), SACK(2048-2536), SACK(4096 - 4608), SACK(8192-8704)
This first sack looks fine but then the attacker sends

ACK 0 SACK(1-512) SACK(1025 - 1537), SACK(2049-2537), SACK(4097 - 4609), SACK(8193-8705)
ACK 0 SACK(1-512) SACK(1027 - 1539), SACK(2051-2539), SACK(4099 - 4611), SACK(8195-8707)
...
These blocks are making you hunt across your linked list and split things up so that you have an entry for every other byte. As your list grows you spend more and more CPU running through the lists. The idea here is the attacker chooses entries as far apart as possible that make you run through the list. This example is small but in theory if the window is open to say 1Meg you could end up with hundreds of thousands of linked list entries.

To combat this we introduce three things.

(1) When the peer requests a very small MSS we stop processing SACK's from them. This prevents a malicious peer from just using a small MSS to do the same thing.
(2) Any time we get a sack block, we use the sack-filter to remove sacks that are smaller than the smallest v4 mss (minus 40 for max TCP options) unless it ties up to snd_max (since that is legal). All other sacks in theory should be at least an MSS. If we get such an attacker that means we basically start skipping all but MSS sized Sacked blocks.
(3) The sack filter used to throw away data when its bounds were exceeded, instead now we increase its size to 15 and then throw away sack's if the filter gets over-run to prevent the malicious attacker from over-running the sack filter and thus we start to process things anyway.

The default stack will need to start using the sack-filter which we have talked about in past conference calls to take full advantage of the protections offered by it (and reduce cpu consumption when processing sacks).

After this set of changes is in, rack can drop its SAD detection completely.

Reviewed by: tuexen@, rscheff@
Differential Revision:	https://reviews.freebsd.org/D44903
2024-05-05 09:08:47 -04:00
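The rule of discarding undersized SACK blocks unless they end at snd_max can be sketched as follows. This is a simplified model, not the kernel's sack filter: the struct and function names are hypothetical, and the minimum length is passed in by the caller rather than derived from the MSS.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical SACK block: covers sequence numbers [start, end). */
struct model_sack_blk {
	uint32_t start;
	uint32_t end;
};

/*
 * Compact blks[] in place, keeping a block only if it is at least
 * min_len bytes long or if it ends exactly at snd_max.  Returns the
 * number of blocks kept.  Unsigned subtraction keeps the length
 * correct across sequence number wraparound.
 */
static size_t
filter_small_sacks(struct model_sack_blk *blks, size_t n, uint32_t min_len,
    uint32_t snd_max)
{
	size_t kept = 0;

	for (size_t i = 0; i < n; i++) {
		uint32_t len = blks[i].end - blks[i].start;

		if (len >= min_len || blks[i].end == snd_max)
			blks[kept++] = blks[i];
	}
	return (kept);
}
```

A one-byte block in the middle of the window is dropped, while a short block that reaches snd_max survives, since the tail of the outstanding data can legitimately be smaller than an MSS.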
Richard Scheffenegger 30cf0fbf26 in_pcb: don't leak credential refcounts on error
In the error path during allocating an in_pcb, the credentials
associated with the new struct get their reference count
increased early on, but not decremented when the allocation
fails.

Reported by:		cmiller_netapp.com
MFC after:		3 days
Reviewed by:		jhb, tuexen
Sponsored by:		NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D45033
2024-05-01 08:41:26 +02:00
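The class of bug fixed here, a reference taken early and not released on a later failure, can be illustrated with a small userland model. All names are hypothetical, and the fail_late flag merely simulates a failure that happens after the reference has been taken.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical credential with a plain (non-atomic) reference count. */
struct model_cred {
	int refcnt;
};

struct model_pcb {
	struct model_cred *cred;
};

static void
model_cred_hold(struct model_cred *c)
{
	c->refcnt++;
}

static void
model_cred_free(struct model_cred *c)
{
	c->refcnt--;
}

static struct model_pcb *
model_pcb_alloc(struct model_cred *cred, int fail_late)
{
	struct model_pcb *pcb = malloc(sizeof(*pcb));

	if (pcb == NULL)
		return (NULL);
	pcb->cred = cred;
	model_cred_hold(cred);		/* reference taken early */

	if (fail_late) {
		/* Error path: drop the reference before bailing out. */
		model_cred_free(cred);
		free(pcb);
		return (NULL);
	}
	return (pcb);
}
```

After a failed allocation the reference count is back at its previous value; without the release in the error path it would stay one too high.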
Gleb Smirnoff c68eed82a3 accf_tls: accept filter that waits for TLS handshake header 2024-04-24 17:53:10 -07:00
Denny Page fcdf9a1989 Support ARP for 802 networks
This is used by 802.3 Ethernet.  (Also used by 802.4 Token Bus and
802.5 Token Ring, but we don't support those.)

This was accidentally removed along with FDDI support in commit
0437c8e3b1, presumably because comments implied it was used only by
FDDI or Token Ring.

Fixes: 0437c8e3b1 ("Remove support for FDDI networks.")
Reviewed-by: emaste
Signed-off-by: Denny Page <dennypage@me.com>
Pull-request: https://github.com/freebsd/freebsd-src/pull/1166
2024-04-23 12:30:53 -04:00
Michael Tuexen 1941914d3b tcp rack: improve BBR_LOG_CWND event
Fix a typo, which resulted in missing r_ctl.gate_to_fs in the BBLog
event.

Reported by:		Coverity Scan
CID:			1540024
Reviewed by:		rrs, rscheff
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D44648
2024-04-18 21:57:44 +02:00
Michael Tuexen c9cd686bd4 tcp: drop data received after a FIN has been processed
RFC 9293 describes the handling of data in the CLOSE-WAIT, CLOSING,
LAST-ACK, and TIME-WAIT states:
This should not occur since a FIN has been received from the remote
side. Ignore the segment text.
Therefore, implement this handling.

Reviewed by:		rrs, rscheff
MFC after:		3 days
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D44746
2024-04-18 21:54:42 +02:00
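The quoted RFC 9293 rule can be expressed as a small state check. The enum below is a hypothetical model and does not use the kernel's TCPS_* constants; the function returns how many payload bytes of a segment should still be processed.

```c
#include <stddef.h>

/* Hypothetical subset of TCP connection states. */
enum model_tcp_state {
	MODEL_ESTABLISHED,
	MODEL_FIN_WAIT_1,
	MODEL_FIN_WAIT_2,
	MODEL_CLOSE_WAIT,
	MODEL_CLOSING,
	MODEL_LAST_ACK,
	MODEL_TIME_WAIT
};

/*
 * In the states entered after a FIN from the peer has been processed,
 * the segment text is ignored; in the other states it is kept.
 */
static size_t
segment_text_to_process(enum model_tcp_state state, size_t seg_len)
{
	switch (state) {
	case MODEL_CLOSE_WAIT:
	case MODEL_CLOSING:
	case MODEL_LAST_ACK:
	case MODEL_TIME_WAIT:
		return (0);
	default:
		return (seg_len);
	}
}
```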
Michael Tuexen 605a00660e tcp bbr: improve code consistency
Improve code consistency with the RACK stack.
Reviewed by:		gallatin, rscheff
MFC after:		3 days
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D44800
2024-04-15 23:52:08 +02:00
Mark Johnston 1d14e88e53 tcp: Make tcp_var.h more self-contained
struct tcpcb embeds a struct osd and a struct callout.  Rather than
forcing all consumers to pull in the same headers, include the headers
directly.

No functional change intended.

Reviewed by:	glebius
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D44685
2024-04-10 08:53:49 -04:00