
2347 commits

Author SHA1 Message Date
Gleb Smirnoff f6963113f4 in6_rmx: remove unnecessary socketvar.h 2024-05-07 14:15:56 -07:00
Gleb Smirnoff b925d71967 sockets: garbage collect PRCOREQUESTS and stale comment
The code deleted predates FreeBSD history.  The comment deleted is 99%
outdated.  Why KAME decided to use these constants instead of normal ones
is also lost to the centuries.
2024-05-07 14:15:49 -07:00
Mike Karels eb3dbf2dbe in6.h: expose s6_addr* definitions to user level
The only element of in6_addr that is specified in RFC 3493 or
in POSIX.1-2017 is s6_addr, implemented via a #define to a union
member.  However, FreeBSD and other BSD systems have additional
definitions for the other union members, s6_addr{8,16,32} which
are defined for the kernel and loader.  Some Linux applications
also use them, and they seem to be allowed by the RFC and POSIX.
Remove the current ifdefs, exposing the additional fields to user
level, and replace with #if __BSD_VISIBLE.  Add an explanatory
comment expanding on the previous "nonstandard" comment.

MFC after:	1 week
Reviewed by:	bz
Differential Revision:	https://reviews.freebsd.org/D44979
2024-05-02 10:24:37 -05:00
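For illustration only, a minimal sketch of how portable code that may be built where only s6_addr is guaranteed can read one 32-bit word of an address, the same bytes that s6_addr32[i] overlays where __BSD_VISIBLE exposes it. The helper in6_word() is hypothetical and not part of any header.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: return the i-th 32-bit word (i = 0..3) of an IPv6
 * address in network byte order, touching only the POSIX member s6_addr.
 * Where __BSD_VISIBLE exposes it, s6_addr32[i] overlays the same bytes.
 */
static uint32_t
in6_word(const struct in6_addr *a, int i)
{
	uint32_t w;

	/* memcpy() avoids alignment and strict-aliasing assumptions. */
	memcpy(&w, &a->s6_addr[4 * i], sizeof(w));
	return (w);
}
```

For the loopback address, ntohl(in6_word(&a, 3)) is 1. On FreeBSD, user code compiled with __BSD_VISIBLE can now write a->s6_addr32[3] directly instead.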
Lexi Winter 042fb58d00 sys/netinet6/in6_pcb.c: fix compile without INET
in6_mapped_sockaddr() and in6_mapped_peeraddr() both define a local
variable named 'inp', but in the non-INET case, this variable is set
and never used, causing a compiler error:

/src/freebsd/src/lf/sys/netinet6/in6_pcb.c:547:16: error:
	variable 'inp' set but not used [-Werror,-Wunused-but-set-variable]
  547 |         struct  inpcb *inp;
      |                        ^
/src/freebsd/src/lf/sys/netinet6/in6_pcb.c:573:16: error:
	variable 'inp' set but not used [-Werror,-Wunused-but-set-variable]
  573 |         struct  inpcb *inp;

Fix this by guarding all the INET-specific logic, including the variable
definition, behind #ifdef INET.

While here, tweak formatting in in6_mapped_peeraddr() so both functions
are the same.

Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/1155
2024-04-12 10:54:27 -06:00
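The #ifdef INET guard pattern described above can be sketched as follows. struct fake_inpcb and mapped_addr_family_sketch() are hypothetical stand-ins; this is not the body of in6_mapped_sockaddr().

```c
/* Hypothetical stand-in for struct inpcb; only the field this sketch needs. */
struct fake_inpcb {
	int	v4mapped;	/* nonzero if the socket uses IPv4-mapped addresses */
};

/*
 * The local variable is declared, assigned and read only inside
 * #ifdef INET, so a build without INET has no set-but-unused variable
 * for -Wunused-but-set-variable to reject.
 * Returns 4 for the (INET-only) mapped path, 6 otherwise.
 */
static int
mapped_addr_family_sketch(struct fake_inpcb *pcb)
{
#ifdef INET
	struct fake_inpcb *inp;

	inp = pcb;
	if (inp->v4mapped)
		return (4);
#endif
	(void)pcb;	/* silence unused-parameter warnings in !INET builds */
	return (6);
}
```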
Gleb Smirnoff f7c4d12bcd icmp: correct the assertion that checks limit + jitter
Fixes:	4399e055ea
2024-04-08 16:54:19 -07:00
Kristof Provost 60d8dbbef0 netinet: add a probe point for IP, IP6, ICMP, ICMP6, UDP and TCP stats counters
When debugging network issues one common clue is an unexpectedly
incrementing error counter. This is helpful, in that it gives us an
idea of what might be going wrong, but often these counters may be
incremented in different functions.

Add a static probe point for them so that we can use dtrace to get
further information (e.g. a stack trace).

For example:
	dtrace -n 'mib:ip:count: { printf("%d", arg0); stack(); }'

This can be disabled by setting the following kernel option:
	options 	KDTRACE_NO_MIB_SDT

Reviewed by:	gallatin, tuexen (previous version), gnn (previous version)
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D43504
2024-04-08 17:29:59 +02:00
Gleb Smirnoff 4399e055ea icmp: allow zero value for ICMP limits
Zero means limit is disabled, so the value doesn't need to be checked
against jitter value.

Fixes:	ac44739fd8
Fixes:	a03aff88a1
2024-03-24 19:52:03 -07:00
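A hypothetical validator showing why zero needs a special case: if a non-zero limit must exceed the jitter, a blanket "limit <= jitter" check would wrongly reject 0, which means "disabled". The function name and the assumption that the effective limit is limit plus or minus jitter are illustrative, not taken from ip_icmp.c or icmp6.c.

```c
#include <errno.h>

/*
 * Hypothetical validator for an ICMP rate-limit setting and its jitter.
 * Assumes the effective per-second limit is limit +/- jitter, so a
 * non-zero limit must exceed the jitter to stay positive.  A limit of
 * zero disables rate limiting, so the jitter is irrelevant.
 */
static int
icmp_limit_validate_sketch(int limit, int jitter)
{
	if (limit < 0 || jitter < 0)
		return (EINVAL);
	if (limit == 0)
		return (0);	/* disabled: skip the jitter comparison */
	if (limit <= jitter)
		return (EINVAL);
	return (0);
}
```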
Gleb Smirnoff a03aff88a1 icmp6: bring rate limiting on a par with IPv4
Use counter_ratecheck() instead of racy and slow ppsratecheck. Use a
separate counter for every currently known type of ICMPv6. Provide logging
of ratelimit events. Provide jitter to counter open UDP port detection.

Reviewed by:		tuexen, zlei
Differential Revision:	https://reviews.freebsd.org/D44482
2024-03-24 09:13:23 -07:00
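A simplified, single-threaded sketch of per-type limiting with jitter. It is a plain fixed one-second window, not counter_ratecheck(9), and the type list, rl6_allow() and the way jitter is supplied are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ICMPv6 categories; the kernel's actual set differs. */
enum rl6_type { RL6_UNREACH, RL6_PKTTOOBIG, RL6_TIMXCEED, RL6_ECHOREPLY, RL6_NTYPES };

struct rl6_window {
	bool	valid;		/* window has been initialized */
	int64_t	second;		/* second this window covers */
	int64_t	count;		/* events counted in this window */
	int64_t	limit;		/* effective limit for this window */
};

/* One independent window per type. */
static struct rl6_window rl6_state[RL6_NTYPES];

/*
 * Fixed one-second window limiter.  'jitter' is a signed offset chosen by
 * the caller (e.g. at random) and applied once per window, so the
 * threshold seen from outside is not a constant.  Returns true if the
 * packet may be sent.
 */
static bool
rl6_allow(enum rl6_type t, int64_t now_sec, int64_t base, int64_t jitter)
{
	struct rl6_window *w = &rl6_state[t];

	if (base == 0)
		return (true);		/* zero disables limiting */
	if (!w->valid || w->second != now_sec) {
		w->valid = true;
		w->second = now_sec;
		w->count = 0;
		w->limit = base + jitter;
	}
	return (++w->count <= w->limit);
}
```

Keeping one window per type means a burst of, say, unreachables cannot use up the budget for echo replies.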
Gleb Smirnoff 4f96be33fe icmp6: move ICMPv6 related tunables to the files where they are used
Most of them can be declared as static after the move out of in6_proto.c.
Keeping sysctl(9) declarations with their text descriptions next to the
variable declaration creates self-documenting code.  There should be no
functional changes.

Differential Revision:	https://reviews.freebsd.org/D44481
2024-03-24 09:13:23 -07:00
Gleb Smirnoff 32aeee8ce7 icmp6: rate limit our echo replies
The generation of ICMP6_ECHO_REPLY bypasses icmp6_error(), thus rate
limit was not applied.

Reviewed by:		tuexen, zlei
Differential Revision:	https://reviews.freebsd.org/D44480
2024-03-24 09:13:23 -07:00
Gleb Smirnoff c6c96aaba8 icmp6: make icmp6_ratelimit() responsible for updating the stats counter
Reviewed by:		tuexen
Differential Revision:	https://reviews.freebsd.org/D44479
2024-03-24 09:13:23 -07:00
Andrew Gallatin 530c2c30b0 ip6_output: Reduce cache misses on pktopts
When profiling an IPv6-heavy workload, I noticed that we were
getting a lot of cache misses in ip6_output() around
ip6_pktopts. This was happening because the TCP stack passes
inp->in6p_outputopts even if all options are unused. So in the
common case of no options present, pkt_opts is not null, and is
checked repeatedly for different options. Since ip6_pktopts is
large (4 cachelines), and every field is checked, we take 4
cache misses (2 of which tend to be hidden by the adjacent line
prefetcher).

To fix this common case, I introduced a new flag in ip6_pktopts
(ip6po_valid) which tracks which options have been set. In the
common case where nothing is set, this causes just a single
cache miss to load. It also eliminates a test for some options
(if (opt != NULL && opt->val >= const) vs. if ((optvalid & flag) != 0)).

To keep the struct the same size in 64-bit kernels, and to keep
the integer values (like ip6po_hlim, ip6po_tclass, etc) on the
same cacheline, I moved them to the top.

As suggested by zlei, the null check in MAKE_EXTHDR() becomes
redundant, and can be removed.

For our web server workload (with the ip6po_tclass option set),
this drops the CPI from 2.9 to 2.4 for ip6_output.

Differential Revision: https://reviews.freebsd.org/D44204
Reviewed by: bz, glebius, zlei
No Objection from: melifaro
Sponsored by: Netflix Inc.
2024-03-20 15:50:57 -04:00
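The ip6po_valid idea can be sketched with a hypothetical, much smaller structure: a bitmask at the top records which options are set, so the common nothing-set case is answered from the first cache line. The PO_* flag names and struct pktopts_sketch are illustrative, not the real struct ip6_pktopts layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical validity bits; the kernel's flag names may differ. */
#define PO_HLIM		0x01u
#define PO_TCLASS	0x02u
#define PO_NEXTHOP	0x04u

/*
 * The validity bitmask and the small integer options sit first, so
 * checking "is anything set?" or reading hlim touches one cache line.
 * Large pointer members follow.
 */
struct pktopts_sketch {
	uint32_t	valid;		/* bitmask of PO_* options that are set */
	int		hlim;
	int		tclass;
	void		*nexthop;
	/* ... many more pointer members in the real structure ... */
};

static int
pktopts_hlim(const struct pktopts_sketch *opt, int dflt)
{
	/* Replaces a test of the form opt->val >= const. */
	if (opt != NULL && (opt->valid & PO_HLIM) != 0)
		return (opt->hlim);
	return (dflt);
}
```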
Gleb Smirnoff 56f7860087 carp: check CARP status in in_localip_fib(), in6_localip_fib()
Don't report a BACKUP CARP address as local.  These two functions are used
only by source address validation for input packets, controlled by sysctls
net.inet.ip.source_address_validation and
net.inet6.ip6.source_address_validation.  For this purpose we definitely
want to treat BACKUP addresses as non local.

This change is conservative and doesn't modify compat in_localip() and
in6_localip().  They are used more widely than the FIB-aware versions.
Changing them would alter the meaning of the ipfw(4) 'me' keyword.  There
might be other consequences as well, since in_localip() is used by various
tunneling protocols.

PR:			277349
2024-03-19 11:48:59 -07:00
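A hypothetical sketch of the source address validation rule described above: a configured address counts as local unless it is a CARP address in BACKUP state, since another host is acting as MASTER for it. The table and role enum are stand-ins, not the kernel's ifaddr or CARP structures.

```c
#include <netinet/in.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-address state. */
enum addr_role { ROLE_PLAIN, ROLE_CARP_BACKUP, ROLE_CARP_MASTER };

struct addr_entry {
	struct in6_addr	addr;
	enum addr_role	role;
};

/* Return true if 'a' should be treated as local for validation. */
static bool
sav_is_local_sketch(const struct addr_entry *tbl, size_t n,
    const struct in6_addr *a)
{
	for (size_t i = 0; i < n; i++) {
		if (memcmp(&tbl[i].addr, a, sizeof(*a)) == 0)
			return (tbl[i].role != ROLE_CARP_BACKUP);
	}
	return (false);		/* not configured here at all */
}
```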
Gleb Smirnoff ce69e37369 Revert "sockets: retire sorflush()"
Provide a comment in sorflush() explaining why the socket I/O sx(9) lock is actually
important.

This reverts commit 507f87a799.
2024-02-03 13:08:41 -08:00
Kristof Provost ffeab76b68 pfil: PFIL_PASS never frees the mbuf
pfil hooks (i.e. firewalls) may pass, modify or free the mbuf passed
to them. (E.g. when rejecting a packet, or when gathering up packets
for reassembly).

If the hook returns PFIL_PASS the mbuf must still be present. Assert
this in pfil_mem_common() and ensure that ipfilter follows this
convention. pf and ipfw already did.
Similarly, if the hook returns PFIL_DROPPED or PFIL_CONSUMED the mbuf
must have been freed (or now be owned by the firewall for further
processing, like packet scheduling or reassembly).

This allows us to remove a few extraneous NULL checks.

Suggested by:	tuexen
Reviewed by:	tuexen, zlei
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D43617
2024-01-29 14:10:19 +01:00
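The ownership contract can be sketched with hypothetical types (these are not the pfil(9) KPI): after a PASS the caller still owns a non-NULL packet, so it can skip NULL checks; after DROPPED or CONSUMED it must not touch the packet again.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical result codes and packet type. */
enum hook_result { HOOK_PASS, HOOK_DROPPED, HOOK_CONSUMED };

struct pkt {
	int	len;
};

typedef enum hook_result (*hook_fn)(struct pkt **pp);

/*
 * Run one hook and enforce the ownership contract:
 *  - PASS: the packet (possibly modified or replaced) is still ours.
 *  - DROPPED/CONSUMED: the hook freed or kept it; we must not touch it.
 */
static enum hook_result
run_hook(hook_fn hook, struct pkt **pp)
{
	enum hook_result rv = hook(pp);

	if (rv == HOOK_PASS)
		assert(*pp != NULL);	/* PASS never frees the packet */
	else
		*pp = NULL;		/* caller no longer owns it */
	return (rv);
}

/* Example hooks. */
static enum hook_result
hook_pass(struct pkt **pp)
{
	(void)pp;	/* a real hook might rewrite headers here */
	return (HOOK_PASS);
}

static enum hook_result
hook_drop(struct pkt **pp)
{
	free(*pp);
	*pp = NULL;
	return (HOOK_DROPPED);
}
```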
Mark Johnston bbf86c65d0 netinet: Remove stale references to Giant from comments
MFC after:	1 week
2024-01-27 13:51:13 -05:00
Gordon Bergling 496432f192 netinet6: Fix two typos in source code comments
- s/adddress/address/

MFC after:	3 days
2024-01-22 21:48:34 +01:00
Xavier Beaudouin 80044c785c Add UDP encapsulation of ESP in IPv6
This patch provides UDP encapsulation of ESP packets over IPv6.
Ports the IPv4 code to IPv6 and adds support for IPv6 in udpencap.c.
As required by the RFC and unlike in IPv4 encapsulation,
UDP checksums are calculated.

Co-authored-by:	Aurelien Cazuc <aurelien.cazuc.external@stormshield.eu>
Sponsored-by:	Stormshield
Sponsored-by:	Wiktel
Sponsored-by:	Klara, Inc.
2024-01-16 20:44:34 +00:00
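The UDP checksum over IPv6 covers a pseudo-header (source and destination addresses, the 32-bit upper-layer length, three zero bytes and next header 17), as described in RFC 8200, section 8.1. The sketch below computes it over a flat buffer; the function names are hypothetical and this is not the code in udpencap.c, which works on mbufs.

```c
#include <stddef.h>
#include <stdint.h>

/* Add big-endian 16-bit words of buf to sum; an odd trailing byte is
 * padded with a zero low byte.  Folds carries before returning. */
static uint32_t
cksum_add(uint32_t sum, const uint8_t *buf, size_t len)
{
	while (len > 1) {
		sum += (uint32_t)buf[0] << 8 | buf[1];
		buf += 2;
		len -= 2;
	}
	if (len == 1)
		sum += (uint32_t)buf[0] << 8;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (sum);
}

/*
 * One's-complement checksum of the IPv6 pseudo-header followed by the
 * UDP header and payload.  The caller zeroes the UDP checksum field
 * first; re-running over a packet that carries a correct checksum
 * yields 0xffff.
 */
static uint16_t
udp6_cksum_sketch(const uint8_t src[16], const uint8_t dst[16],
    const uint8_t *udp, uint32_t udp_len)
{
	uint8_t tail[8] = {
		(uint8_t)(udp_len >> 24), (uint8_t)(udp_len >> 16),
		(uint8_t)(udp_len >> 8), (uint8_t)udp_len,
		0, 0, 0, 17		/* next header: UDP */
	};
	uint32_t sum = 0;
	uint16_t c;

	sum = cksum_add(sum, src, 16);
	sum = cksum_add(sum, dst, 16);
	sum = cksum_add(sum, tail, 8);
	sum = cksum_add(sum, udp, udp_len);
	c = (uint16_t)~sum;
	/* IPv6 receivers discard a zero UDP checksum, so 0 is sent as 0xffff. */
	return (c == 0 ? 0xffff : c);
}
```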
Gleb Smirnoff 507f87a799 sockets: retire sorflush()
With removal of dom_dispose method the function boils down to two
meaningful function calls: socantrcvmore() and sbrelease().  The latter is
only relevant for protocols that use generic socket buffers.

The socket I/O sx(9) lock acquisition in sorflush() is not relevant for
shutdown(2) operation as it doesn't do any I/O that may interleave with
read(2) or write(2).  The socket buffer mutex acquisition inside
sbrelease() is what guarantees thread safety.  This sx(9) acquisition in
soshutdown() can be tracked down to 4.4BSD times, where it used to be
sblock(), and it was carried over through the years evolving together with
sockets with no reconsideration of why we carry it over.  I can't tell
if that sblock() made sense back then, but it doesn't make any today.

Reviewed by:		tuexen
Differential Revision:	https://reviews.freebsd.org/D43415
2024-01-16 10:30:49 -08:00
Gleb Smirnoff 5bba272807 sockets: make pr_shutdown fully protocol specific method
Disassemble a one-for-all soshutdown() into protocol specific methods.
This creates a small amount of copy & paste, but makes code a lot more
self documented, as protocol specific method would execute only the code
that is relevant to that protocol and nothing else.  This also fixes a
couple recent regressions and reduces risk of future regressions.  The
extended KPI for the new pr_shutdown removes need for the extra pr_flush
which was added for the sake of SCTP which could not perform its shutdown
properly with the old one.  Particularly for SCTP this change streamlines
a lot of code.

Some notes on why certain parts of code were copied or were not to certain
protocols:
* The (SS_ISCONNECTED | SS_ISCONNECTING | SS_ISDISCONNECTING) check is
  needed only for those protocols that may be connected or disconnected.
* The above reduces into only SS_ISCONNECTED for those protocols that
  always connect instantly.
* The ENOTCONN and continue processing hack is left only for datagram
  protocols.
* The SOLISTENING(so) block is copied to those protocols that listen(2).
* sorflush() on SHUT_RD is copied almost to every protocol, but that
  will be refactored later.
* wakeup(&so->so_timeo) is copied to protocols that can make a non-instant
  connect(2), can SO_LINGER or can accept(2).

There are three protocols (netgraph(4), Bluetooth, SDP) that did not have
pr_shutdown, but old soshutdown() would still perform sorflush() on
SHUT_RD for them and also wakeup(9).  Those protocols partially supported
shutdown(2) returning EOPNOTSUPP for SHUT_WR/SHUT_RDWR; now they have lost
shutdown(2) support entirely.  I'm pretty sure netgraph(4) and Bluetooth are okay
about that and SDP is almost abandoned anyway.

Reviewed by:		tuexen
Differential Revision:	https://reviews.freebsd.org/D43413
2024-01-16 10:30:37 -08:00
John Baldwin 8cb9b68f58 sys: Use mbufq_empty instead of comparing mbufq_len against 0
Reviewed by:	bz, emaste
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D43338
2024-01-09 11:00:46 -08:00
Mark Johnston 8d01ecd8e9 frag6: Add another use of frag6_rmqueue()
No functional change intended.

Reviewed by:	kp, bz
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D43256
2024-01-04 08:39:52 -05:00
Mark Johnston 0736a38072 frag6: Reduce code duplication
The code which removes a fragment queue from the per-VNET hash table was
duplicated three times.  Factor it out into a function.  No functional
change intended.

Reviewed by:	kp, bz
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D43228
2023-12-31 11:15:48 -05:00
Mark Johnston f12a9a4c04 frag6: Drop unneeded casts from malloc calls
No functional change intended.

MFC after:	1 week
2023-12-31 11:15:22 -05:00
Gleb Smirnoff a13039e270 inpcb: reorder inpcb destruction
First, merge in_pcbdetach() with in_pcbfree().  The comment for
in_pcbdetach() was no longer correct.  Then, make sure we remove
the inpcb from the hash before we commit any destructive actions
on it.  There are a couple of functions that rely on the hash lock,
skipping SMR + inpcb lock, to look up an inpcb.  Although there are
no known functions that similarly rely on the global inpcb list
lock, also do list removal before destructive actions.

PR:			273890
Reviewed by:		markj
Differential Revision:	https://reviews.freebsd.org/D43122
2023-12-27 08:34:37 -08:00
Gleb Smirnoff 0fac350c54 sockets: don't malloc/free sockaddr memory on getpeername/getsockname
Just like it was done for accept(2) in cfb1e92912, use same approach
for two simpler syscalls that return socket addresses.  Although
these two syscalls aren't performance critical, this change generalizes
some code between 3 syscalls trimming code size.

Following example of accept(2), provide VNET-aware and INVARIANT-checking
wrappers sopeeraddr() and sosockaddr() around protosw methods.

Reviewed by:		tuexen
Differential Revision:	https://reviews.freebsd.org/D42694
2023-11-30 08:31:10 -08:00
Gleb Smirnoff cfb1e92912 sockets: don't malloc/free sockaddr memory on accept(2)
Let the accept functions provide stack memory for protocols to fill it in.
Generic code should provide sockaddr_storage, specialized code may provide
smaller structure.

While rewriting accept(2), make 'addrlen' a true in/out parameter, reporting
the required length if the provided length was insufficient.  Our manual
page accept(2) and POSIX don't explicitly require that, but one can read
the text as they do.  Linux also does that. Update tests accordingly.

Reviewed by:		rscheff, tuexen, zlei, dchagin
Differential Revision:	https://reviews.freebsd.org/D42635
2023-11-30 08:30:55 -08:00
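A hypothetical sketch of the in/out 'addrlen' semantics described above: the protocol fills a sockaddr_storage on the caller's stack, the syscall copies as much as fits, and addrlen always reports the full length. copyout_addr_sketch() is illustrative; the real code also copies the result out to user space.

```c
#include <string.h>
#include <sys/socket.h>

/*
 * Copy as much of the address as fits into the caller's buffer, then
 * report the address's full length so the caller can detect truncation
 * (*addrlen larger than the buffer size on return).
 */
static void
copyout_addr_sketch(const struct sockaddr_storage *ss, socklen_t sslen,
    void *buf, socklen_t *addrlen)
{
	socklen_t n = sslen < *addrlen ? sslen : *addrlen;

	memcpy(buf, ss, n);
	*addrlen = sslen;
}
```

A caller that finds *addrlen larger than the buffer it passed knows the address was truncated and can retry with a struct sockaddr_storage.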
Warner Losh fdafd315ad sys: Automated cleanup of cdefs and other formatting
Apply the following automated changes to try to eliminate
no-longer-needed sys/cdefs.h includes as well as now-empty
blank lines in a row.

Remove /^#if.*\n#endif.*\n#include\s+<sys/cdefs.h>.*\n/
Remove /\n+#include\s+<sys/cdefs.h>.*\n+#if.*\n#endif.*\n+/
Remove /\n+#if.*\n#endif.*\n+/
Remove /^#if.*\n#endif.*\n/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/types.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/param.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/capsicum.h>/

Sponsored by:		Netflix
2023-11-26 22:24:00 -07:00
Warner Losh 29363fb446 sys: Remove ancient SCCS tags.
Remove ancient SCCS tags from the tree, automated scripting, with two
minor fixups to keep things compiling. All the common forms in the tree
were removed with a perl script.

Sponsored by:		Netflix
2023-11-26 22:23:30 -07:00
Michael Tuexen 03c3a70abe udplite: make socket options available on IPv6 sockets
This patch allows the IPPROTO_UDPLITE-level socket options
UDPLITE_SEND_CSCOV and UDPLITE_RECV_CSCOV to be used on
AF_INET6 sockets in addition to AF_INET sockets.

Reviewed by:		ae, rscheff
MFC after:		1 week
Differential Revision:	https://reviews.freebsd.org/D42430
2023-11-05 15:28:54 +01:00
Zhenlei Huang 03dac3e379 netinet6: Add sysctl flag CTLFLAG_TUN to loader tunables
The following sysctl variables are actually loader tunables. Add sysctl
flag CTLFLAG_TUN to them so that `sysctl -T` will report them correctly.

 1. net.inet6.ip6.auto_linklocal
 2. net.inet6.ip6.accept_rtadv
 3. net.inet6.ip6.no_radr

No functional change intended.

Reviewed by:	glebius
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D41928
2023-09-25 18:10:47 +08:00
Andrey V. Elsukov 0bf5377b6b Avoid IPv6 source address selection on accepting TCP connections
When an application listens on an IPv6 TCP socket, due to the ipfw
forwarding tag it may handle connections for addresses that do not
belong to the jail or even to the current host (transparent proxy).
Syncache code can successfully handle the TCP handshake for such connections.
When syncache finally accepts the connection, it uses in6_pcbconnect() to
properly initialize the new connection info.

For IPv4 this scenario just works, but for IPv6 it fails when the
local address doesn't belong to the jail. This check occurs when
in6_pcbladdr() applies the IPv6 source address selection (SAS) algorithm.
We need IPv6 SAS when we are the connection initiator, but in the above
case the connection is already established and both source and destination
addresses are known.

Use a previously unused argument to notify in6_pcbconnect() when we don't
need source address selection. This will fix `ipfw fwd` to a jailed IPv6
address.

When we are the connection initiator, we still use the IPv6 SAS algorithm
and apply all related restrictions.

MFC after:              1 month
Sponsored by:           Yandex LLC
Differential Revision:  https://reviews.freebsd.org/D41685
2023-09-14 11:39:06 +03:00
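A simplified sketch of the idea, with hypothetical names throughout (conn6_sketch, conn6_connect_sketch() and the selector callback are not kernel interfaces): source address selection, and the jail check it carries, runs only for an active open, while a connection completed from the syncache keeps the local address it already has.

```c
#include <errno.h>
#include <netinet/in.h>
#include <stdbool.h>

/* Hypothetical connection record; not struct inpcb. */
struct conn6_sketch {
	struct in6_addr	laddr;	/* local address, may already be known */
	struct in6_addr	faddr;	/* foreign address */
};

/* Hypothetical stand-in for SAS, which also enforces jail restrictions. */
typedef int (*select_src_t)(const struct in6_addr *dst, struct in6_addr *src);

/* Example selector modelling a jail that owns no suitable address. */
static int
select_src_jail_denies(const struct in6_addr *dst, struct in6_addr *src)
{
	(void)dst;
	(void)src;
	return (EADDRNOTAVAIL);
}

static int
conn6_connect_sketch(struct conn6_sketch *c, const struct in6_addr *faddr,
    bool need_sas, select_src_t select_src)
{
	int error;

	if (need_sas) {
		/* Active open: pick (and vet) a source address. */
		error = select_src(faddr, &c->laddr);
		if (error != 0)
			return (error);
	}
	/* Passive open: laddr is the destination of the accepted SYN. */
	c->faddr = *faddr;
	return (0);
}
```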
Michael Tuexen c3179e6660 sctp: cleanup cdefs.h include 2023-08-18 15:25:34 +02:00
Warner Losh 685dc743dc sys: Remove $FreeBSD$: one-line .c pattern
Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
2023-08-16 11:54:36 -06:00
Warner Losh dfc016587a sys: Remove $FreeBSD$: two-line .c pattern
Remove /^#include\s+<sys/cdefs.h>.*$\n\s+__FBSDID\("\$FreeBSD\$"\);\n/
2023-08-16 11:54:30 -06:00
Warner Losh 71625ec9ad sys: Remove $FreeBSD$: one-line .c comment pattern
Remove /^/[*/]\s*\$FreeBSD\$.*\n/
2023-08-16 11:54:24 -06:00
Warner Losh 2ff63af9b8 sys: Remove $FreeBSD$: one-line .h pattern
Remove /^\s*\*+\s*\$FreeBSD\$.*$\n/
2023-08-16 11:54:18 -06:00
Warner Losh 95ee2897e9 sys: Remove $FreeBSD$: two-line .h pattern
Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/
2023-08-16 11:54:11 -06:00
Michael Tuexen 9ade2745db sctp: remove duplicate code
No functional change intended.

MFC after:	1 week
2023-08-08 13:05:39 +02:00
Michael Tuexen c7587f7a3f sctp: cleanup
No functional change intended.

MFC after:	1 week
2023-08-08 12:40:51 +02:00
Jonathan T. Looney ff3d1a3f9d frag6: Avoid a possible integer overflow in fragment handling
Reviewed by:	kp, markj, bz
Approved by:	so
Security:	FreeBSD-SA-23:06.ipv6
Security:	CVE-2023-3107
2023-08-01 15:45:41 -04:00
Gleb Smirnoff e3ba0d6add inpcb: do not copy so_options into inp_flags2
Since f71cb9f748 the socket stays connected to the inpcb through the
latter's lifetime, so there is no reason to complicate things by copying
these flags.

Reviewed by:		markj
Differential Revision:	https://reviews.freebsd.org/D41198
2023-07-26 20:35:42 -07:00
Marius Strobl e82d7b2952 gif(4): Revert in{,6}_gif_output() misalignment handling
The code added in c89c8a1029 in order
to compensate for possible misalignment caused by prepending the IPv4/IPv6
header with an EtherIP one got broken at some point by a rewrite of
gif(4). For better or worse, 8018ac153f
relaxed the alignment of struct ip from 32 bit to 16 bit, though. As
a result, a 16 bit offset of the IPv4 header induced by the addition
of the 16 bit EtherIP one no longer is a problem in the first place.
The alignment of struct ip6_hdr currently is only 8 bit, making
it even less problematic with regard to possible misalignment.
Thus, remove the code for handling misalignment in in{,6}_gif_output()
altogether again.
While at it, replace the 3 bcopy(9) calls in gif(4) with memcpy(9) as
there's no need to handle overlap here.
2023-07-26 13:14:22 +02:00
Shivank Garg 215bab7924 mac_ipacl: new MAC policy module to limit jail/vnet IP configuration
The mac_ipacl policy module enables fine-grained control over IP address
configuration within VNET jails from the base system.
It allows the root user to define rules governing IP addresses for
jails and their interfaces using the sysctl interface.

Requested by:	multiple
Sponsored by:	Google, Inc. (GSoC 2019)
MFC after:	2 months
Reviewed by:	bz, dch (both earlier versions)
Differential Revision: https://reviews.freebsd.org/D20967
2023-07-26 00:07:57 +00:00
Kristof Provost 9c9a76dc68 mld: always commit state changes on leaving
Resolve a race condition where we'd lose the Solicited-node multicast
group subscription if we assigned the same IPv6 address twice.

PR:		233683
Reviewed by:	ae
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D41124
2023-07-24 16:47:34 +02:00
Konstantin Belousov bc310a95c5 ip output: ensure that mbufs are mapped if ipsec is enabled
IPsec needs access to packet headers to determine if a policy is
applicable. It seems that IP headers are typically mapped, but the code
arguably needs to check this before blindly accessing them. In addition,
operations like m_unshare() and m_makespace() are not yet ready for
unmapped mbufs.

Ensure that the packet is mapped before calling into IPSEC_OUTPUT().

PR:	272616
Reviewed by:	jhb, markj
Sponsored by:	NVidia networking
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D41112
2023-07-21 21:51:13 +03:00
Kristof Provost b8039bf5b3 Fix MINIMAL build
Pre-declare struct ucred, to fix build issues on the MINIMAL config:

In file included from /usr/src/sys/netpfil/pf/pfsync_nv.c:40:
/usr/src/sys/netinet6/ip6_var.h:384:31: error: declaration of 'struct ucred' will not be visible outside of this function [-Werror,-Wvisibility]
        struct ip6_pktopts *, struct ucred *, int);
                                     ^
/usr/src/sys/netinet6/ip6_var.h:408:28: error: declaration of 'struct ucred' will not be visible outside of this function [-Werror,-Wvisibility]
    struct inpcb *, struct ucred *, int, struct in6_addr *, int *);
                           ^
2 errors generated.
2023-07-14 09:18:43 +02:00
Alexander V. Chernikov bb06a80cf6 netinet[6]: make in[6]_control use ucred instead of td.
Reviewed by:	markj, zlei
Differential Revision: https://reviews.freebsd.org/D40793
MFC after:	2 weeks
2023-07-01 06:52:24 +00:00
Andrey V. Elsukov 0cd2d88d8d carp: use nd6log() macro to log debug messages
Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
2023-06-28 13:27:37 +03:00
Mark Johnston 6775ef4188 netinet6: Implement in6_cksum_partial() using m_apply()
This ensures that in6_cksum_partial() can be applied to unmapped mbufs,
which can happen at least when icmp6_reflect() quotes a packet.

The basic idea is to restructure in6_cksum_partial() to operate on one
mbuf at a time.  If the buffer length is odd or unaligned, an extra
residual byte may be returned, to be incorporated into the checksum when
processing the next buffer.

PR:		268400
Reviewed by:	cy
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D40598
2023-06-23 09:55:43 -04:00
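A byte-oriented sketch of the per-buffer approach: each chunk is summed on its own, and an odd trailing byte is carried into the next chunk as the high byte of the next 16-bit word. The cksum_state structure and functions are hypothetical; cksum_chunk() only mimics the shape of an m_apply(9) callback (arg, data, length), and the real in6_cksum_partial() also handles the pseudo-header and handles alignment differently.

```c
#include <stddef.h>
#include <stdint.h>

struct cksum_state {
	uint32_t	sum;
	int		have_residual;	/* an odd byte is pending */
	uint8_t		residual;	/* high byte of the next 16-bit word */
};

/* Consume one buffer; callback-shaped like an m_apply(9) function. */
static int
cksum_chunk(void *arg, void *data, unsigned int len)
{
	struct cksum_state *st = arg;
	const uint8_t *p = data;

	if (st->have_residual && len > 0) {
		/* Pair the pending byte with this buffer's first byte. */
		st->sum += (uint32_t)st->residual << 8 | p[0];
		p++;
		len--;
		st->have_residual = 0;
	}
	while (len > 1) {
		st->sum += (uint32_t)p[0] << 8 | p[1];
		p += 2;
		len -= 2;
	}
	if (len == 1) {
		st->residual = p[0];
		st->have_residual = 1;
	}
	st->sum = (st->sum & 0xffff) + (st->sum >> 16);	/* keep it bounded */
	return (0);
}

static uint16_t
cksum_finish(const struct cksum_state *st)
{
	uint32_t sum = st->sum;

	if (st->have_residual)
		sum += (uint32_t)st->residual << 8;	/* pad with zero */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return ((uint16_t)~sum);
}
```

Summing a buffer whole or in odd-sized pieces gives the same result, which is what allows the checksum to walk a chain one buffer at a time.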