2300 commits

Author SHA1 Message Date
Alexander V. Chernikov bb06a80cf6 netinet[6]: make in[6]_control use ucred instead of td.
Reviewed by:	markj, zlei
Differential Revision: https://reviews.freebsd.org/D40793
MFC after:	2 weeks
2023-07-01 06:52:24 +00:00
Andrey V. Elsukov 0cd2d88d8d carp: use nd6log() macro to log debug messages
Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
2023-06-28 13:27:37 +03:00
Mark Johnston 6775ef4188 netinet6: Implement in6_cksum_partial() using m_apply()
This ensures that in6_cksum_partial() can be applied to unmapped mbufs,
which can happen at least when icmp6_reflect() quotes a packet.

The basic idea is to restructure in6_cksum_partial() to operate on one
mbuf at a time.  If the buffer length is odd or unaligned, an extra
residual byte may be returned, to be incorporated into the checksum when
processing the next buffer.

PR:		268400
Reviewed by:	cy
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D40598
2023-06-23 09:55:43 -04:00
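The per-buffer approach described in 6775ef4188 can be illustrated with a small userland sketch. This is not the kernel code: it ignores the IPv6 pseudo-header and alignment concerns, and the names cksum_state, cksum_update and cksum_final are made up for illustration. It only shows how an odd trailing byte can be carried over as a residual and paired with the first byte of the next buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Running state carried from one buffer to the next (hypothetical type). */
struct cksum_state {
	uint64_t sum;		/* accumulator of 16-bit big-endian words */
	int	 has_odd;	/* nonzero if a high byte awaits its partner */
	uint8_t	 odd;		/* the pending high byte */
};

/* Fold one buffer into the running sum. */
static void
cksum_update(struct cksum_state *st, const uint8_t *buf, size_t len)
{
	size_t i = 0;

	assert(st != NULL);
	/* Pair the residual byte of the previous buffer with our first byte. */
	if (st->has_odd && len > 0) {
		st->sum += ((uint32_t)st->odd << 8) | buf[0];
		st->has_odd = 0;
		i = 1;
	}
	/* Sum the complete 16-bit words. */
	for (; i + 1 < len; i += 2)
		st->sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	/* An unpaired trailing byte becomes the new residual. */
	if (i < len) {
		st->odd = buf[i];
		st->has_odd = 1;
	}
}

/* Finish: pad a leftover byte with zero, fold carries, complement. */
static uint16_t
cksum_final(const struct cksum_state *st)
{
	uint64_t sum = st->sum;

	if (st->has_odd)
		sum += (uint32_t)st->odd << 8;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return ((uint16_t)~sum);
}
```

Feeding the same bytes as one buffer or as several odd-sized pieces gives the same result, which is the property a callback-per-mbuf structure relies on.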
Alexander V. Chernikov e32221a15f netinet6: make IPv6 fragment TTL per-VNET configurable.
Having it configurable adds more flexibility, especially
 for systems with a low amount of memory.
Additionally, it allows speeding up frag6/ test execution.

Reviewed by:	kp, markj, bz
Differential Revision:	https://reviews.freebsd.org/D35755
MFC after:	2 weeks
2023-06-01 12:04:49 +00:00
Alexander V. Chernikov a77facd273 ifnet: consistently call hooks when the interface gets up.
Some context on the current IPv6 interface setup & address management:

There are two data paths for IPv6 initialisation in the context of assigning
 LL addresses:
1) Userland explicitly requests IFF_UP for the interface w/o any addresses.
if_up() then calls in6_if_up(), which calls in6_ifattach().
The latter sets up some initial ND/IN6 state and disables IPv6 for the
interface if it’s not loopback. If the interface is loopback, then it
adds ::1/128 and LL addresses via in6_ifattach_loopback().
Then, devd notification is generated (if the VNET is the default one),
which triggers rc.network ifconfig_up(), causing ifdisabled to be removed
via SIOCSIFINFO_IN6 from ifconfig. The kernel SIOCSIFINFO_IN6 handler
calls in6_if_up() once again and it assigns the interface link-local address.

2) Userland adds an IPv4 or IPv6 address to the interface. The SIOCAIFADDR[_IN6]
kernel handler calls the IPv4/IPv6 protocol handler to add the address.
Both then call if_ioctl() with SIOCSIFADDR. The Ethernet/loopback ioctl handlers
silently set IFF_UP for the interface. Finally, the if.c:ifioctl() wrapper code
compares old and new interface flags and, if IFF_UP is added, it explicitly
calls in6_if_up(), which adds a link-local address if either the original
address is IPv6 or the interface is loopback.

In the latter case, “formal” interface-up notifications are missing.
The kernel does not trigger the event handler event, does not call the carp hook
and does not provide any userland notification.

This diff unifies the event handling in both scenarios, providing the
necessary notifications to the kernel and userland.

Reviewed By: kp
Differential Revision: https://reviews.freebsd.org/D40332
MFC after:	2 weeks
2023-06-01 11:44:19 +00:00
Doug Rabson 5ab151574c netinet*: Fix redirects for connections from localhost
Redirect rules use PFIL_IN and PFIL_OUT events to allow packet filter
rules to change the destination address and port for a connection.
Typically, the rule triggers on an input event when a packet is received
by a router and the destination address and/or port is changed to
implement the redirect. When a reply packet on this connection is output
to the network, the rule triggers again, reversing the modification.

When the connection is initiated on the same host as the packet filter,
it is initially output via lo0 which queues it for input processing.
This causes an input event on the lo0 interface, allowing redirect
processing to rewrite the destination and create state for the
connection. However, when the reply is received, no corresponding output
event is generated; instead, the packet is delivered to the higher level
protocol (e.g. tcp or udp) without reversing the redirect. As a result, the
reply is not matched to the connection and the packet is dropped (for tcp, a
connection reset is also sent).

This commit fixes the problem by adding a second packet filter call in
the input path. The second call happens right before the handoff to
higher level processing and provides the missing output event to allow
the redirect's reply processing to perform its rewrite. This extra
processing is disabled by default and can be enabled using pfilctl:

	pfilctl link -o pf:default-out inet-local
	pfilctl link -o pf:default-out6 inet6-local

PR:		268717
Reviewed-by:	kp, melifaro
MFC-after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D40256
2023-05-31 11:11:05 +01:00
Mark Johnston a306ed50ec inpcb: Restore missing validation of local addresses for jailed sockets
When looking up a listening socket, the SMR-protected lookup routine may
return a jailed socket with no local address.  This happens when using
classic jails with more than one IP address; in a single-IP classic
jail, a bound socket's local address is always rewritten to be that of
the jail.

After commit 7b92493ab1, the lookup path failed to check whether the
jail corresponding to a matched wildcard socket actually owns the
address, and would return the match regardless.  Restore the omitted
checks.

Fixes:		7b92493ab1 ("inpcb: Avoid inp_cred dereferences in SMR-protected lookup")
Reported by:	peter
Reviewed by:	bz
Differential Revision:	https://reviews.freebsd.org/D40268
2023-05-30 15:15:48 -04:00
Alexander V. Chernikov b50e1465e8 routing: plug mbuf leak for the packets hitting IPv6 blackhole route
Reported by:	Dmitriy Smirnov <fox@sage.su>
Tested by:	Dmitriy Smirnov <fox@sage.su>
MFC after:	1 day
2023-05-17 09:06:04 +00:00
Warner Losh 4d846d260e spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD
The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with:		pfg
MFC After:		3 days
Sponsored by:		Netflix
2023-05-12 10:44:03 -06:00
Ed Maste b73183d1a2 ipv6: disable RFC 4620 nodeinfo by default
RFC 4620 is an experimental RFC that can be used to request information
about a host, including:

- the fully-qualified or single-component name
- some set of the Responder's IPv6 unicast addresses
- some set of the Responder's IPv4 unicast addresses

This is not something that should be made available by default.

PR:		257709
Submitted by:	ruben@verweg.com
Reviewed by:	melifaro
Relnotes:	Yes
Sponsored by:	The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D39778
2023-04-26 13:47:59 -04:00
Mark Johnston 7b92493ab1 inpcb: Avoid inp_cred dereferences in SMR-protected lookup
The SMR-protected inpcb lookup algorithm currently has to check whether
a matching inpcb belongs to a jail, in order to prioritize jailed
bound sockets.  To do this it has to maintain a ucred reference, and for
this to be safe, the reference can't be released until the UMA
destructor is called, and this will not happen within any bounded time
period.

Changing SMR to periodically recycle garbage is not trivial.  Instead,
let's implement SMR-synchronized lookup without needing to dereference
inp_cred.  This will allow the inpcb code to free the inp_cred reference
immediately when a PCB is freed, ensuring that ucred (and thus jail)
references are released promptly.

Commit 220d892129 ("inpcb: immediately return matching pcb on lookup")
gets us part of the way there.  This patch goes further to handle
lookups of unconnected sockets.  Here, the strategy is to maintain a
well-defined order of items within a hash chain so that a wild lookup
can simply return the first match and preserve existing semantics.  This
makes insertion of listening sockets more complicated in order to make
lookup simpler, which seems like the right tradeoff anyway given that
bind() is already a fairly expensive operation and lookups are more
common.

In particular, when inserting an unconnected socket, in_pcbinhash() now
keeps the following ordering:
- jailed sockets before non-jailed sockets,
- specified local addresses before unspecified local addresses.

Most of the change adds a separate SMR-based lookup path for inpcb hash
lookups.  When a match is found, we try to lock the inpcb and
re-validate its connection info.  In the common case, this works well
and we can simply return the inpcb.  If this fails, typically because
something is concurrently modifying the inpcb, we go to the slow path,
which performs a serialized lookup.

Note, I did not touch lbgroup lookup, since there the credential
reference is formally synchronized by net_epoch, not SMR.  In
particular, lbgroups are rarely allocated or freed.

I think it is possible to simplify in_pcblookup_hash_wild_locked() now,
but I didn't do it in this patch.

Discussed with:	glebius
Tested by:	glebius
Sponsored by:	Klara, Inc.
Sponsored by:	Modirum MDPay
Differential Revision:	https://reviews.freebsd.org/D38572
2023-04-20 12:13:06 -04:00
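The ordered-chain strategy from 7b92493ab1 can be sketched in userland as follows. This is only an illustration: struct pcb, pcb_rank, chain_insert and chain_lookup are hypothetical names, a plain singly linked list stands in for the kernel's SMR-protected hash chain, and combining the two ordering rules into a single rank with "jailed" as the primary key is an assumption, not something the commit message spells out. Jail ownership checks on lookup are also left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-in for an unconnected (listening) PCB. */
struct pcb {
	struct pcb *next;
	uint32_t    laddr;	/* 0 means unspecified (wildcard) */
	bool	    jailed;
	int	    id;		/* label for the example only */
};

/* Lower rank sorts earlier: jailed first, then specified addresses. */
static int
pcb_rank(const struct pcb *p)
{
	return ((p->jailed ? 0 : 2) + (p->laddr != 0 ? 0 : 1));
}

/* Insert while keeping the chain sorted by rank. */
static void
chain_insert(struct pcb **head, struct pcb *p)
{
	struct pcb **pp = head;

	assert(head != NULL && p != NULL);
	while (*pp != NULL && pcb_rank(*pp) <= pcb_rank(p))
		pp = &(*pp)->next;
	p->next = *pp;
	*pp = p;
}

/* Because of the ordering, the first match is the preferred one. */
static struct pcb *
chain_lookup(struct pcb *head, uint32_t laddr)
{
	for (struct pcb *p = head; p != NULL; p = p->next)
		if (p->laddr == 0 || p->laddr == laddr)
			return (p);
	return (NULL);
}
```

The design choice mirrors the commit's reasoning: insertion does the extra work of finding the right position so that lookup can stop at the first hit.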
Mark Johnston 3e98dcb3d5 inpcb: Move inpcb matching logic into separate functions
These functions will get some additional callers in future revisions.

No functional change intended.

Discussed with:	glebius
Tested by:	glebius
Sponsored by:	Modirum MDPay
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D38571
2023-04-20 12:13:06 -04:00
Mark Johnston fdb987bebd inpcb: Split PCB hash tables
Currently we use a single hash table per PCB database for connected and
bound PCBs.  Since we started using net_epoch to synchronize hash table
lookups, there's been a bug, noted in a comment above in_pcbrehash():
connecting a socket can cause an inpcb to move between hash chains, and
this can cause a concurrent lookup to follow the wrong linkage pointers.
I believe this could cause rare, spurious ECONNREFUSED errors in the
worst case.

Address the problem by introducing a second hash table and adding more
linkage pointers to struct inpcb.  Now the database has one table each
for connected and unconnected sockets.

When inserting an inpcb into the hash table, in_pcbinhash() now looks at
the foreign address of the inpcb to figure out which table to use.  This
ensures that queue linkage pointers are stable until the socket is
disconnected, so the problem described above goes away.  There is also a
small benefit in that in_pcblookup_*() can now search just one of the
two possible hash buckets.

I also made the "rehash" parameter of in(6)_pcbconnect() unused.  This
parameter seems confusing and it is simpler to let the inpcb code figure
out what to do using the existing INP_INHASHLIST flag.

UDP sockets pose a special problem since they can be connected and
disconnected multiple times during their lifecycle.  To handle this, the
patch plugs a hole in the inpcb structure and uses it to store an SMR
sequence number.  When an inpcb is disconnected - an operation which
requires the global PCB database hash lock - the write sequence number
is advanced, and in order to reconnect, the connecting thread must wait
for readers to drain before reusing the inpcb's hash chain linkage
pointers.

raw_ip (ab)uses the hash table without using the corresponding
accessors.  Since there are now two hash tables, it arbitrarily uses the
"connected" table for all of its PCBs.  This will be addressed in some
way in the future.

inp iterators which specify a hash bucket will only visit connected
PCBs.  This is not really correct, but nothing in the tree uses that
functionality except raw_ip, which as mentioned above places all of its
PCBs in the "connected" table and so is unaffected.

Discussed with:	glebius
Tested by:	glebius
Sponsored by:	Klara, Inc.
Sponsored by:	Modirum MDPay
Differential Revision:	https://reviews.freebsd.org/D38569
2023-04-20 12:13:06 -04:00
Mateusz Guzik f5a365e51f inet6: protect address manipulation with a lock
This is a total hack/bare minimum which follows inet4.

Otherwise 2 threads removing the same address can easily crash.

Reviewed by:	kp
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D39317
2023-03-30 08:46:38 +00:00
Justin Hibbits bb55bb1740 inet6: Include if_private.h in one more netstack file
ip6_input() and ip6_destroy() both directly reference ifnet members.
This file was missed in 3d0d5b21

Fixes:		3d0d5b21 ("IfAPI: Explicitly include <net/if_private.h>...")
Sponsored by:	Juniper Networks, Inc.
2023-03-24 10:25:35 -04:00
Kristof Provost b52b61c0b6 pf: distinguish forwarding and output cases for pf_refragment6()
Re-introduce PFIL_FWD, because pf's pf_refragment6() needs to know if
we're ip6_forward()-ing or ip6_output()-ing.

ip6_forward() relies on m->m_pkthdr.rcvif, at least for link-local
traffic (for in6_get_unicast_scopeid()). rcvif is not set for locally
generated traffic (e.g. from icmp6_reflect()), so we need to call the
correct output function.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D39061
2023-03-16 10:59:04 +01:00
Mina Galić 0b0ae2e4cd jail: convert several functions from int to bool
These functions exclusively return (0) and (1), so convert them to bool.

We also convert some networking-related jail functions from int to bool,
some of which were returning an error that was never used.

Differential Revision: https://reviews.freebsd.org/D29659
Reviewed by: imp, jamie (earlier version)
Pull Request: https://github.com/freebsd/freebsd-src/pull/663
2023-03-14 21:05:33 -06:00
Mark Johnston e9ea690ae8 udp: Fix a memory leak in udp6_send()
Reviewed by:	glebius
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D38993
2023-03-14 11:58:02 -04:00
Pawel Biernacki 35b6e52c30 net.inet6.ip6.log_interval: use ppsratecheck(9) internally
Reported by:	mjg
Differential Revision:	https://reviews.freebsd.org/D38758
2023-03-13 16:47:06 +00:00
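For readers unfamiliar with events-per-second limiting of the kind ppsratecheck(9) provides, here is a minimal userland analogue. It is a sketch under assumptions, not the kernel routine: rate_state and rate_check are hypothetical names, the clock is passed in as whole seconds, and the treatment of a negative limit as "unlimited" and a zero limit as "suppress everything" follows my reading of the manual page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Per-limiter state: the current one-second window and its event count. */
struct rate_state {
	long window_start;	/* second in which counting began */
	int  count;		/* events allowed so far in this window */
};

/*
 * Return true if an event at time now_sec may proceed.
 * max_per_sec < 0 means no limit; 0 means nothing is allowed.
 */
static bool
rate_check(struct rate_state *rs, long now_sec, int max_per_sec)
{
	assert(rs != NULL);
	/* A new second starts a fresh window. */
	if (now_sec != rs->window_start) {
		rs->window_start = now_sec;
		rs->count = 0;
	}
	if (max_per_sec < 0)
		return (true);
	if (rs->count >= max_per_sec)
		return (false);
	rs->count++;
	return (true);
}
```

A caller would wrap the log statement in `if (rate_check(...))`, so excess messages within the same second are skipped rather than queued.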
Pawel Biernacki 3eaffc6265 netinet6: allow disabling excess log messages
RFC 4443 specifies cases where certain packets, like those originating from
local-scope addresses destined outside of the scope, shouldn't be forwarded.
The current practice is to drop them, send an ICMPv6 message where appropriate,
and log the message:

cannot forward src fe80:10::426:82ff:fe36:1d8, dst 2001:db8:db8::10, nxt
58, rcvif vlan5, outif vlan2

At times the volume of such messages can get very high. Let's allow local
admins to disable such messages on a per-vnet basis, keeping the current
default (log).

Reported by:	zarychtam@plan-b.pwste.edu.pl
Reviewed by:	zlei (previous version), pauamma (docs)
Differential Revision:	https://reviews.freebsd.org/D38644
2023-03-13 16:46:21 +00:00
Mark Johnston aa71d6b4a2 netinet: Disallow unspecified addresses in ICMP-embedded packets
Reported by:	glebius
Reported by:	syzbot+981c528ccb5c5534dffc@syzkaller.appspotmail.com
Reviewed by:	tuexen, glebius
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D38936
2023-03-13 10:45:56 -04:00
Mark Johnston 713264f6b8 netinet: Tighten checks for unspecified source addresses
The assertions added in commit b0ccf53f24 ("inpcb: Assert against
wildcard addrs in in_pcblookup_hash_locked()") revealed that protocol
layers may pass the unspecified address to in_pcblookup().

Add some checks to filter out such packets before we attempt an inpcb
lookup:
- Disallow the use of an unspecified source address in in_pcbladdr() and
  in6_pcbladdr().
- Disallow IP packets with an unspecified destination address.
- Disallow TCP packets with an unspecified source address, and add an
  assertion to verify the comment claiming that the case of an
  unspecified destination address is handled by the IP layer.

Reported by:	syzbot+9ca890fb84e984e82df2@syzkaller.appspotmail.com
Reported by:	syzbot+ae873c71d3c71d5f41cb@syzkaller.appspotmail.com
Reported by:	syzbot+e3e689aba1d442905067@syzkaller.appspotmail.com
Reviewed by:	glebius, melifaro
MFC after:	2 weeks
Sponsored by:	Klara, Inc.
Sponsored by:	Modirum MDPay
Differential Revision:	https://reviews.freebsd.org/D38570
2023-03-06 15:06:00 -05:00
Mark Johnston 317fa5169d netinet: Remove the IP(V6)_RSS_LISTEN_BUCKET socket option
It has no effect, and an exp-run revealed that it is not in use.

PR:		261398 (exp-run)
Reviewed by:	mjg, glebius
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D38822
2023-02-28 15:57:21 -05:00
Mark Johnston 3aff4ccdd7 netinet: Remove IP(V6)_BINDMULTI
This option was added in commit 0a100a6f1e but was never completed.
In particular, there is no logic to map flowids to different listening
sockets, so it accomplishes basically the same thing as SO_REUSEPORT.
Meanwhile, we've since added SO_REUSEPORT_LB, which at least tries to
balance among listening sockets using a hash of the 4-tuple and some
optional NUMA policy.

The option was never documented or completed, and an exp-run revealed
nothing using it in the ports tree.  Moreover, it complicates the
already very complicated in_pcbbind_setup(), and the checking in
in_pcbbind_check_bindmulti() is insufficient.  So, let's remove it.

PR:		261398 (exp-run)
Reviewed by:	glebius
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D38574
2023-02-27 10:03:11 -05:00
Gleb Smirnoff 96871af013 inpcb: use family specific sockaddr argument for bind functions
Do the cast from sockaddr to either IPv4 or IPv6 sockaddr in the
protocol's pr_bind method and from there on go down the call
stack with family specific argument.

Reviewed by:		zlei, melifaro, markj
Differential Revision:	https://reviews.freebsd.org/D38601
2023-02-15 10:30:16 -08:00
Mark Johnston 4130ea611f inpcb: Split in_pcblookup_hash_locked() and clean up a bit
Split the in_pcblookup_hash_locked() function into several independent
subroutine calls, each of which does some kind of hash table lookup.
This refactoring makes it easier to introduce variants of the lookup
algorithm that behave differently depending on whether they are
synchronized by SMR or the PCB database hash lock.

While here, do some related cleanup:
- Remove an unused ifnet parameter from internal functions.  Keep it in
  external functions so that it can be used in the future to derive a v6
  scopeid.
- Reorder the parameters to in_pcblookup_lbgroup() to be consistent with
  the other lookup functions.
- Remove an always-true check from in_pcblookup_lbgroup(): we can assume
  that we're performing a wildcard match.

No functional change intended.

Reviewed by:	glebius
Differential Revision:	https://reviews.freebsd.org/D38364
2023-02-09 16:15:03 -05:00
Gleb Smirnoff 220d892129 inpcb: immediately return matching pcb on lookup
This saves a lot of CPU cycles if you have a large connection table.

The code removed originates from 413628a7e3, a very large changeset.
Having discussed this with Bjoern and Jamie, we can't recover why we would ever
have identical 4-tuples in the hash, even in the presence of jails.
Bjoern did a test that confirms that it is impossible to allocate an
identical connection from a jail to a host. Code review also confirms
that the system shouldn't allow for such connections to exist.

Lacking a proper test suite, we decided to take a risk and go
forward with removing that code.

Reviewed by:		gallatin, bz, markj
Differential Revision:	https://reviews.freebsd.org/D38015
2023-02-07 09:21:52 -08:00
Gleb Smirnoff a9d22cce10 inpcb: use family specific sockaddr argument for connect functions
Do the cast from sockaddr to either IPv4 or IPv6 sockaddr in the
protocol's pr_connect method and from there on go down the call
stack with family specific argument.

Reviewed by:		markj
Differential revision:	https://reviews.freebsd.org/D38356
2023-02-03 11:33:36 -08:00
Gleb Smirnoff 3d76be28ec netinet6: require network epoch for in6_pcbconnect()
This removes recursive epoch entry in the syncache case.  Fixes
unprotected access to V_in6_ifaddrhead in in6_pcbladdr(), as
well as access to prison IP address lists. It also matches what
IPv4 in_pcbconnect() does.

Reviewed by:		markj
Differential revision:	https://reviews.freebsd.org/D38355
2023-02-03 11:33:36 -08:00
Gleb Smirnoff 221b9e3d06 inpcb: merge two versions of in6_pcbconnect() into one
No functional change.

Reviewed by:		markj
Differential revision:	https://reviews.freebsd.org/D38354
2023-02-03 11:33:35 -08:00
Mark Johnston 2589ec0f36 pcb: Move an assignment into in_pcbdisconnect()
All callers of in_pcbdisconnect() clear the local address, so let's just
do that in the function itself.

Note that the inp's local address is not a parameter to the inp hash
functions.  No functional change intended.

Reviewed by:	glebius
MFC after:	2 weeks
Sponsored by:	Klara, Inc.
Sponsored by:	Modirum MDPay
Differential Revision:	https://reviews.freebsd.org/D38362
2023-02-03 11:48:25 -05:00
Mark Johnston b0ccf53f24 inpcb: Assert against wildcard addrs in in_pcblookup_hash_locked()
No functional change intended.

Reviewed by:	glebius
MFC after:	1 week
Sponsored by:	Klara, Inc.
Sponsored by:	Modirum MDPay
Differential Revision:	https://reviews.freebsd.org/D38361
2023-02-03 11:48:25 -05:00
Mark Johnston 675e2618ae inpcb: Deduplicate some assertions
It makes more sense to check lookupflags in the function which actually
uses SMR.  No functional change intended.

Reviewed by:	glebius
MFC after:	1 week
Sponsored by:	Klara, Inc.
Sponsored by:	Modirum MDPay
Differential Revision:	https://reviews.freebsd.org/D38359
2023-02-03 11:48:25 -05:00
Justin Hibbits 3d0d5b21c9 IfAPI: Explicitly include <net/if_private.h> in netstack
Summary:
In preparation for making if_t completely opaque outside of the netstack,
explicitly include the header.  <net/if_var.h> will stop including the
header in the future.

Sponsored by:	Juniper Networks, Inc.
Reviewed by:	glebius, melifaro
Differential Revision: https://reviews.freebsd.org/D38200
2023-01-31 15:02:16 -05:00
Justin Hibbits 361ac40b0f IfAPI: Hide the in6m_lookup_locked() implementation.
Summary:
in6m_lookup_locked() iterates over the ifnet's multiaddrs list.  Keep
this implementation detail private, by moving the implementation to the
netstack source from the header.

Sponsored by:	Juniper Networks, Inc.
Reviewed by:	glebius, melifaro
Differential Revision: https://reviews.freebsd.org/D38201
2023-01-31 15:02:14 -05:00
Gleb Smirnoff 5c67f7c43e udp: don't forget to initialize udpcb for UDPv6
Reported by:	tuexen
Fixes:		483fe96511
2023-01-26 10:16:32 -08:00
Alexander V. Chernikov 30dd227cff netinet6: honor blackhole/unreach routes in the non-fastforwarding code.
Currently, under the conditions specified below, IPv6 ingress packet
 processing can ignore the blackhole/reject flag on the prefix. The packet
 will instead be looped locally until TTL expiration and a single ICMPv6
 unreachable message will be sent to the source even in case of
 RTF_BLACKHOLE.
The following conditions need to hold for the scenario to happen:
* IPv6 forwarding is enabled
* Packet is not fast-forwarded
* Destination prefix has either RTF_BLACKHOLE or RTF_REJECT flag
Fix this behavior by checking for the blackhole/reject flags in
ip6_forward().

Reported by:	Dmitriy Smirnov <fox@sage.su>
Reviewed by:	ae
Differential Revision: https://reviews.freebsd.org/D38164
MFC after:	3 days
2023-01-22 18:48:07 +00:00
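The kind of flag check that 30dd227cff adds to the forwarding path can be sketched like this. Everything here is illustrative: the SK_RTF_* bit values are placeholders rather than the kernel's RTF_* constants, fwd_decide is a hypothetical helper, and giving blackhole precedence when both flags are set is an assumption.

```c
#include <assert.h>

/* Placeholder flag bits; not the kernel's RTF_* values. */
#define SK_RTF_REJECT		0x1u
#define SK_RTF_BLACKHOLE	0x2u

enum fwd_action {
	FWD_SEND,		/* forward the packet normally */
	FWD_DROP_SILENT,	/* blackhole: drop without any ICMPv6 */
	FWD_DROP_UNREACH	/* reject: drop and report unreachable */
};

/* Decide what to do with a packet based on the route's flags. */
static enum fwd_action
fwd_decide(unsigned int rt_flags)
{
	/* This sketch only models the two flags defined above. */
	assert((rt_flags & ~(SK_RTF_REJECT | SK_RTF_BLACKHOLE)) == 0);

	if (rt_flags & SK_RTF_BLACKHOLE)
		return (FWD_DROP_SILENT);
	if (rt_flags & SK_RTF_REJECT)
		return (FWD_DROP_UNREACH);
	return (FWD_SEND);
}
```

Making the decision before the packet is handed on is what avoids the local loop until hop-limit expiry that the commit message describes.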
Gordon Bergling fa7de6dcb9 ip_gre: Fix a common typo in source code comments
- s/addres/address/

MFC after:	3 days
2023-01-19 14:13:02 +01:00
Alexander V. Chernikov 6468b6b23e nd6: fix panic in lltable_drop_entry_queue()
nd6_resolve_slow() can be called without mbuf. If the LLE entry
 is not reachable, nd6_resolve_slow() will add this NULL mbuf to
 the holdchain via lltable_append_entry_queue, which will "append"
 NULL to the end of the queue (effectively no-op) and bump la_numhold
 value. When this entry gets freed, the kernel will panic due to the
 inconsistency between the amount of mbufs in the queue and the value
 of la_numhold.

Fix the panic by checking that the mbuf is not NULL prior to inserting it
 into the holdchain.

Reported by:	kib
MFC after:	3 days
2023-01-15 15:22:42 +00:00
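The invariant behind 6468b6b23e, that the hold counter must always equal the number of queued packets, can be shown with a toy queue. This is a userland sketch with hypothetical names (pkt, hold_queue, hold_append, hold_packet, hold_drain); it does not reproduce the lltable code.

```c
#include <assert.h>
#include <stddef.h>

struct pkt {
	struct pkt *next;
};

/* A queue of held packets plus a counter that must match its length. */
struct hold_queue {
	struct pkt *head;
	struct pkt *tail;
	size_t	    numheld;
};

/* Append a packet; callers must not pass NULL. */
static void
hold_append(struct hold_queue *q, struct pkt *p)
{
	assert(q != NULL && p != NULL);
	p->next = NULL;
	if (q->tail == NULL)
		q->head = p;
	else
		q->tail->next = p;
	q->tail = p;
	q->numheld++;
}

/* Caller-side guard: only queue when there really is a packet. */
static void
hold_packet(struct hold_queue *q, struct pkt *p)
{
	if (p != NULL)
		hold_append(q, p);
}

/* Empty the queue and verify that the counter matched the list length. */
static size_t
hold_drain(struct hold_queue *q)
{
	size_t n = 0;

	while (q->head != NULL) {
		q->head = q->head->next;
		n++;
	}
	q->tail = NULL;
	assert(n == q->numheld);
	q->numheld = 0;
	return (n);
}
```

Without the guard in hold_packet(), a NULL "append" that still bumped the counter would trip the consistency assertion in hold_drain(), which is the userland equivalent of the panic.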
Justin Hibbits 5674838159 inet6: Fix LINT build
mli_delete_locked() is the only function that takes a const ifnet.
Since it's a static function there's no advantage to keeping it const.
Since `if_t` is not a const struct (currently) the compiler throws an
error passing the ifp around to ifnet functions.

Fixes:		eb1da3e525
Sponsored by:	Juniper Networks, Inc.
2022-12-20 15:23:49 -05:00
Gleb Smirnoff 3f89900bf1 udp6: fix build with INET6 and without INVARIANTS
Reported by:	Michael Butler <imb protected-networks.net>
Fixes:		483fe96511
2022-12-07 12:27:15 -08:00
Gleb Smirnoff 1aed3b3430 udp: add protocol method declarations to udp_var.h
They are shared between UDP over IPv4 and over IPv6.  To prevent all
possible kernel build failures wrap them in #ifdef _SYS_PROTOSW_H_.
Prompted by feedback from jhb@ and jrtc27@ on c93db4abf4.
2022-12-07 11:51:49 -08:00
Gleb Smirnoff 5bfc014f23 udp6: inline udp6_output() into udp6_send()
2022-12-07 11:51:48 -08:00
Gleb Smirnoff 483fe96511 udp: embed inpcb into udpcb
See similar change to TCP e68b379244 for more context.  For UDP the
change is much simpler, though.
2022-12-07 11:51:42 -08:00
Gleb Smirnoff e68b379244 tcp: embed inpcb into tcpcb
For the TCP protocol's inpcb storage, specify an allocation size that
provides space for most of the data a TCP connection needs, embedding
into struct tcpcb several structures that previously were allocated
separately.

The most important one is the inpcb itself.  With embedding we can provide
a strong guarantee that with a valid TCP inpcb the tcpcb is always valid
and vice versa.  Also we reduce the number of allocs/frees per connection.
The embedded inpcb is placed at the beginning of the struct tcpcb,
since in_pcballoc() requires that.  However, later we may want to move
it around for cache line efficiency, and this can be done with little
effort.  The new intotcpcb() macro is ready for such a move.

The congestion algorithm data, the TCP timers and osd(9) data are
also embedded into tcpcb, and the temporary struct tcpcb_mem goes away.
There was no extra allocation here, but we went through an extra pointer
every time we accessed this data.

One interesting side effect is that now TCP data is allocated from
SMR-protected zone.  Potentially this allows the TCP stacks or other
TCP related modules to utilize that for their own synchronization.

A large part of the change was done with a sed script:

s/tp->ccv->/tp->t_ccv./g
s/tp->ccv/\&tp->t_ccv/g
s/tp->cc_algo/tp->t_cc/g
s/tp->t_timers->tt_/tp->tt_/g
s/CCV\(ccv, osd\)/\&CCV(ccv, t_osd)/g

A dependency side effect is that code that needs to know struct tcpcb
must also know struct inpcb, which added several <netinet/in_pcb.h> includes.

Differential revision:	https://reviews.freebsd.org/D37127
2022-12-07 09:00:48 -08:00
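The embedding described in e68b379244 can be illustrated with stripped-down stand-ins for the two structures. The field lists below are invented for the example, tcp_state_of is a hypothetical helper, and the macro body is a generic container-of construction rather than a copy of the kernel's definition; only the name intotcpcb() comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the protocol-independent control block. */
struct inpcb {
	int inp_flags;
};

/* Stand-in for the TCP control block with the inpcb embedded in it. */
struct tcpcb {
	struct inpcb t_inpcb;	/* embedded; currently the first member */
	int	     t_state;
};

/*
 * Recover the enclosing tcpcb from a pointer to its embedded inpcb.
 * Using offsetof() keeps this correct even if the member is moved
 * away from the start of the structure later on.
 */
#define	intotcpcb(inp)							\
	((struct tcpcb *)(void *)((char *)(inp) -			\
	    offsetof(struct tcpcb, t_inpcb)))

/* Example accessor that starts from the generic inpcb pointer. */
static int
tcp_state_of(struct inpcb *inp)
{
	assert(inp != NULL);
	return (intotcpcb(inp)->t_state);
}
```

Because both objects live in one allocation, a valid inpcb pointer always implies a valid tcpcb, and the conversion costs a constant offset instead of a pointer load.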
John Baldwin d00c20882f udp[6]_multi_input: Don't unlock freed inp.
If udp[6]_append() returns non-zero, it is because the inp has gone
away (inpcbrele_rlocked returned 1 after running the tunnel function).

Reviewed by:	ae
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D37511
2022-11-30 14:38:51 -08:00
Michael Tuexen f83db6441a sctp: minor changes due to upstreaming of Gleb's recent changes
2022-11-06 23:06:40 +01:00
Mark Johnston d93ec8cb13 inpcb: Allow SO_REUSEPORT_LB to be used in jails
Currently SO_REUSEPORT_LB silently does nothing when set by a jailed
process.  It is trivial to support this option in VNET jails, but it's
also useful in traditional jails.

This patch enables LB groups in jails with the following semantics:
- all PCBs in a group must belong to the same jail,
- PCB lookup prefers jailed groups to non-jailed groups

This is a straightforward extension of the semantics used for individual
listening sockets.  One pre-existing quirk of the lbgroup implementation
is that non-jailed lbgroups are searched before jailed listening
sockets; that is preserved with this change.

Discussed with:	glebius
MFC after:	1 month
Sponsored by:	Modirum MDPay
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D37029
2022-11-02 13:46:24 -04:00
Mark Johnston 0d5d356b36 in6: Consolidate IN6_ARE_ADDR_EQUAL definitions
It is ok to use memcmp() in the kernel.  No functional change intended.

Reviewed by:	glebius, melifaro
MFC after:	1 week
Sponsored by:	Modirum MDPay
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D37028
2022-11-02 13:46:24 -04:00
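A memcmp()-based address comparison of the kind 0d5d356b36 consolidates on might look like the sketch below. The function name addr6_equal is hypothetical and this is not the text of the IN6_ARE_ADDR_EQUAL macro; it only assumes the POSIX struct in6_addr with its s6_addr byte array.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Compare two IPv6 addresses byte for byte. */
static bool
addr6_equal(const struct in6_addr *a, const struct in6_addr *b)
{
	assert(a != NULL && b != NULL);
	return (memcmp(a->s6_addr, b->s6_addr, sizeof(a->s6_addr)) == 0);
}
```

Comparing the raw 16 bytes avoids having separate word-wise definitions for different build contexts.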
Mark Johnston ac1750dd14 inpcb: Remove NULL checks of credential references
Some auditing of the code shows that "cred" is never NULL in these
functions, either because all callers pass a non-NULL reference or
because they unconditionally dereference "cred".  So, let's simplify the
code a bit and remove NULL checks.  No functional change intended.

Reviewed by:	glebius
MFC after:	1 week
Sponsored by:	Modirum MDPay
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D37025
2022-11-02 13:46:24 -04:00