Use the v4/v6 union members rather than the uint32_t ones.
Export IN_ARE_MASKED_ADDR_EQUAL() in in_var.h and use it (and its IPv6
equivalent) for masked comparisons rather than hand-rolled code.
Event: Kitchener-Waterloo Hackathon 202406
Before this patch, a stack (tfb) accepts a tcpcb (tp), if the
tp->t_state is TCPS_CLOSED or tfb->tfb_tcp_handoff_ok is not NULL
and tfb->tfb_tcp_handoff_ok(tp) returns 0.
After this patch, the only check is tfb->tfb_tcp_handoff_ok(tp)
returns 0. tfb->tfb_tcp_handoff_ok must always be provided.
For existing TCP stacks (FreeBSD, RACK and BBR) there is no
functional change. However, the logic is simpler.
Reviewed by: lstewart, peter_lei_ieee_.org, rrs
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D45253
pr_attach is only called on a socket (so) with so->so_listen != NULL
via sonewconn. However, sonewconn is not called from the TCP code.
The listening sockets are handled in tcp_syncache.c without using
sonewconn. Therefore, the code removed is never executed.
No functional change intended.
Reviewed by: rrs, peter.lei_ieee.org
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D45412
When we first find an inp, we set also the tp. If then a second
lookup is necessary, the inp is recomputed. If this fails, the
tp is not cleared, which resulted in failing KASSERT.
Therefore, clear the tp when staring the inp lookup procedure.
Reported by: Jenkins
Fixes: 02d15215ce ("tcp: improve blackhole support")
MFC after: 1 week
Sponsored by: Netflix, Inc.
There are two improvements to the TCP blackhole support:
(1) If net.inet.tcp.blackhole is set to 2, also sent no RST whenever
a segment is received on an existing closed socket or if there is
a port mismatch when using UDP encapsulation.
(2) If net.inet.tcp.blackhole is set to 3, no RST segment is sent in
response to incoming segments on closed sockets or in response to
unexpected segments on listening sockets.
Thanks to gallatin@ for suggesting such an improvement.
Reviewed by: gallatin
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D45304
Ensure that the inp is not dropped when starting a stack switch.
While there, clean-up the code by using INP_WLOCK_RECHECK, which
also re-assigns tp.
Reviewed by: glebius
MFC after: 3 days
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D45241
With the commit of D44903 we no longer need the SAD option. Instead all stacks that
use the sack filter inherit its protection against sack-attack.
Reviewed by: tuexen@
Differential Revision:https://reviews.freebsd.org/D45216
The !DXR2 code corresponds to the original DXR encoding proposal from
2012 with a single direct-lookup stage, which is inferior to the more
recent (DXR2) variant with two-stage trie both in terms of memory
footprint of the lookup structures, and in terms of overall lookup
througput.
I'm axing the old code chunks to (hopefully) somewhat improve readability,
as well as to simplify future maintenance and updates.
MFC after: 1 week
DXR lookup table encoding has an inherent structural limit on the amount
of binary search ranges it can accomodate. With the current IPv4 BGP views
(circa 1 M prefixes) and default DXR encoding we are only at around 5% of
that limit, so far, far away from hitting it. Just in case it ever gets
hit, make sure we free the allocated structures, instead of leaking it.
MFC after: 1 week
When calling dxr_init(), the FIB_ALGO infrastructure may provide a
pointer to a previous dxr instance, which permits reuse of auxiliary
dxr structures, i.e. incremental lookup structure updates. For dxr this
is a crucial feature provided by FIB_ALGO, since dxr incremental updates
are typically several orders of magnitude faster than full lookup table
rebuilds.
However, the auxiliary dxr structure caches a pointer to struct fib_data and
relies upon it for performing incremental updates. Apparently, incremental
rebuild requests from FIB_ALGO, i.e. a calls to dxr_init() with a pointer
old_data set, may (under not yet fully understood circumstances) be invoked
within a different fib_data context than the one cached in the previous
version of dxr auxiliary structures. In such (rare) events, we ignore the
offered old dxr context, and proceed with a full lookup structure rebuild
instead of attempting an incremental one using a fib_data context which
may or may not no longer be valid, and thus lead to a system crash.
PR: 278422
MFC after: 1 week
When the RACK stack wants to send a FIN, but still has outstanding
or unsent data, it sends a challenge ack. Don't do this when the
TCP endpoint is still in the front states, since it does not
make sense.
Reviewed by: rrs
MFC after: 3 days
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D45122
Right now, the code in process_chunk_drop() does not look the
the corresponding fields.
Therefore, no functional change intended.
Reported by: Coverity Scan
CID: 1472476
MFC after: 3 days
In this case uio is NULL, which needs to be checked and m must
be copied into the sctp_copy_all structure.
Reported by: Coverity Scan
CID: 1400449
MFC after: 3 days
Add a counter to track how frequently SACK has transmitted
more than one MSS using TSO. Instances when this will be
beneficial is the use of PRR, or when ACK thinning due to
GRO/LRO or ACK discards by the network are present.
Reviewed By: tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D45070
Introduce net.inet.tcp.sack.tso for future use when TSO is ready
to be used during loss recovery.
Reviewed By: tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D45068
While the SACK Scoreboard in the base stack limits
the number of holes by default to only 128 per connection
in order to prevent CPU load attacks by splitting SACKs,
filtering out SACK blocks of unusually small size can
further improve the actual processing of SACK loss recovery.
Reviewed By: tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D45075
There is only one functional change here - we don't allow SIOCSVH (or
netlink request) to change sc->sc_version. I'm convinced that allowing
such a change doesn't brings any practical value, but creates enless
minefields in front of both developers and end users (sysadmins). If
you want to switch from VRRP to CARP or vice versa, you'd need to recreate
the VHID.
Oh, one tiny funtional change: carp_ioctl_set() won't modify any fields
if it returns EINVAL. Previously you could provide valid advbase with
invalid advskew - that used to modify advbase and return EINVAL.
All other changes is a sweep around not ever using CARP fields when
we are in VRRP mode and vice versa. Also adding assertions on sc_version
where necessary.
Do not send VRRP vars in CARP mode via NetLink and vice versa. However
in compat ioctl SIOCGVH for VRRP mode the CARP fields would be zeroes.
This allows to declare softc as union and thus prevent any future logic
deterioration wrt to mixing VRRP and CARP.
Reviewed by: kp
Differential Revision: https://reviews.freebsd.org/D45039
- Separate HMAC preparation (CARP specific) from tagging.
- In unicast mode (CARP specific) don't put tag at all.
- Don't put pointer to software context into the tag. Putting just vhid,
an integer value, is a safer design.
Reviewed by: kp
Differential Revision: https://reviews.freebsd.org/D45038
Provide inline send_ad_locked() that switches between protocol
specific sending function.
Rename carp_send_ad() to carp_callout() to avoid getting lost in
all these multiple foo_send_ad.
No functional change intended.
Reviewed by: kp
Differential Revision: https://reviews.freebsd.org/D45036
Allow carp(4) to use the VRRPv3 protocol (RFC 5798). We can distinguish carp and
VRRP based on the protocol version number (carp is 2, VRRPv3 is 3), and support
both from the carp(4) code.
Reviewed by: glebius
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D44774
Replace local PAD macro with PADTCPOLEN macro
No functional change.
Reviewed By: tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D45076
Previously it was possible for dxr_build() to return with da->fd
unset in case of range_tbl or x_tbl malloc() failures. This
may have led to NULL ptr dereferencing in dxr_change_rib_batch().
MFC after: 1 week
PR: 278422
There is a type of attack that a TCP peer can launch on a connection. This is for sure in Rack or BBR and probably even the default stack if it uses lists in sack processing. The idea of the attack is that the attacker is driving you to look at 100's of sack blocks that only update 1 byte. So for example if you have 1 - 10,000 bytes outstanding the attacker sends in something like:
ACK 0 SACK(1-512) SACK(1024 - 1536), SACK(2048-2536), SACK(4096 - 4608), SACK(8192-8704)
This first sack looks fine but then the attacker sends
ACK 0 SACK(1-512) SACK(1025 - 1537), SACK(2049-2537), SACK(4097 - 4609), SACK(8193-8705)
ACK 0 SACK(1-512) SACK(1027 - 1539), SACK(2051-2539), SACK(4099 - 4611), SACK(8195-8707)
...
These blocks are making you hunt across your linked list and split things up so that you have an entry for every other byte. Has your list grows you spend more and more CPU running through the lists. The idea here is the attacker chooses entries as far apart as possible that make you run through the list. This example is small but in theory if the window is open to say 1Meg you could end up with 100's of thousands link list entries.
To combat this we introduce three things.
when the peer requests a very small MSS we stop processing SACK's from them. This prevents a malicious peer from just using a small MSS to do the same thing.
Any time we get a sack block, we use the sack-filter to remove sacks that are smaller than the smallest v4 mss (minus 40 for max TCP options) unless it ties up to snd_max (since that is legal). All other sacks in theory should be at least an MSS. If we get such an attacker that means we basically start skipping all but MSS sized Sacked blocks.
The sack filter used to throw away data when its bounds were exceeded, instead now we increase its size to 15 and then throw away sack's if the filter gets over-run to prevent the malicious attacker from over-running the sack filter and thus we start to process things anyway.
The default stack will need to start using the sack-filter which we have talked about in past conference calls to take full advantage of the protections offered by it (and reduce cpu consumption when processing sacks).
After this set of changes is in rack can drop its SAD detection completely
Reviewed by:tuexen@, rscheff@
Differential Revision: <https://reviews.freebsd.org/D44903>
In the error path during allocating an in_pcb, the credentials
associated with the new struct get their reference count
increased early on, but not decremented when the allocation
fails.
Reported by: cmiller_netapp.com
MFC after: 3 days
Reviewed by: jhb, tuexen
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D45033
This is used by 802.3 Ethernet. (Also be used by 802.4 Token Bus and
802.5 Token Ring, but we don't support those.)
This was accidentally removed along with FDDI support in commit
0437c8e3b1, presumably because comments implied it was used only by
FDDI or Token Ring.
Fixes: 0437c8e3b1 ("Remove support for FDDI networks.")
Reviewed-by: emaste
Signed-off-by: Denny Page <dennypage@me.com>
Pull-request: https://github.com/freebsd/freebsd-src/pull/1166
Fix a typo, which resulted in missing r_ctl.gate_to_fs in the BBLog
event.
Reported by: Coverity Scan
CID: 1540024
Reviewed by: rrs, rscheff
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D44648
RFC 9293 describes the handling of data in the CLOSE-WAIT, CLOSING,
LAST-ACK, and TIME-WAIT states:
This should not occur since a FIN has been received from the remote
side. Ignore the segment text.
Therefore, implement this handling.
Reviewed by: rrs, rscheff
MFC after: 3 days
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D44746
struct tcpcb embeds a struct osd and a struct callout. Rather than
forcing all consumers to pull in the same headers, include the headers
directly.
No functional change intended.
Reviewed by: glebius
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D44685