Commit graph

5301 commits

Author SHA1 Message Date
Mark Johnston 02cbf9ebf1 lagg: Fix a teardown race
When a lagg interface is destroyed, it destroys all of the lagg ports,
which triggers an asynchronous link state change handler.  This in turn
may generate a netlink message, a portion of which requires netlink to
invoke the SIOCGIFMEDIA ioctl of the lagg interface, which involves
scanning the list of interface media.  This list is not internally
locked, it requires the interface driver to provide some kind of
synchronization.

Shortly after the link state notification has been raised, the lagg
interface detaches itself from the network stack.  As a part of this, it
blocks in order to wait for link state handlers to drain, but before
that it destroys the interface media list.  Reverse this order of
operations so that the link state change handlers drain first, avoiding
a use-after-free that is very occasionally triggered by lagg stress
tests.  This matches other ethernet drivers in the tree.

MFC after:	2 weeks
2024-06-24 10:47:29 -04:00
Mark Johnston 66b8cac8d8 pf: Sprinkle const qualifiers in state lookup routines
State keys are trivially const in lookup routines, so annotate them as
such.  No functional change intended.

Reviewed by:	kp
MFC after:	1 week
Sponsored by:	Klara, Inc.
Sponsored by:	Modirum
Differential Revision:	https://reviews.freebsd.org/D45671
2024-06-24 10:46:55 -04:00
Zhenlei Huang 71f8fbf9bd ifnet: Use NET_EPOCH_WAIT() macro
This makes it easier to grep the usage.

Reviewed by:	kp
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D45715
2024-06-24 17:57:14 +08:00
Mateusz Guzik b6196537b0 pf: fix the "keepcounters" to stop truncating to 32-bit
The machinery to support 64-bit counters even on 32-bit kernels had a
bug where it would unitentionally truncate the value back to 32-bits
when transferring to a new counter. This resulted in buggy be behavior
on 64-bit kernels as well.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
2024-06-20 17:55:43 +00:00
Kristof Provost ba2a920786 pf: convert DIOCBEGINADDRS to netlink 2024-06-08 04:46:43 +02:00
Kristof Provost d9ab899931 pf: migrate DIOCGETLIMIT/DIOCSETLIMIT to netlink
Event:		Kitchener-Waterloo Hackathon 202406
2024-06-07 20:59:02 +02:00
Zhenlei Huang 0dfd11abc4 bpf: Make bpf_peers_present a boolean inline function
This function was introduced in commit [1] and is actually used as a
boolean function although it was not defined as so.

No functional change intended.

1. 16d878cc99 Fix the following bpf(4) race condition which can result in a panic

Reviewed by:	markj, kp, #network
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D45509
2024-06-07 23:06:08 +08:00
Zhenlei Huang 89204d9dcb bpf: Prefer the boolean form when calling bpf_peers_present()
No functional change intended.

Reviewed by:	markj, kp, #network
MFC with:	8f31b879ec
Differential Revision:	https://reviews.freebsd.org/D45509
2024-06-07 23:06:07 +08:00
Kristof Provost 30bad751e8 pf: convert DIOCGETTIMEOUT/DIOCSETTIMEOUT to netlink 2024-06-06 20:46:18 +02:00
Zhenlei Huang 215a18d502 if_enc(4): Prefer the boolean form when calling bpf_peers_present()
No functional change intended.

MFC after:	1 week
2024-06-06 12:20:26 +08:00
Kristof Provost 9dbbe68bc5 pf: convert DIOCCLRSTATUS to netlink
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2024-06-04 14:59:58 +02:00
Kristof Provost 6ee3e37682 pf: fix incorrect anchor_call to userspace
777a4702c changed how we copy out the anchor_call string, and
incorrectly limited it to 8 (4 on 32-bit systems) bytes. Fix that so we
get the full anchor path, rather than just the first few characters.

PR:		279225
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2024-05-28 22:27:22 +02:00
Kristof Provost bdd12889ea if_vlan: handle VID conflicts
If we fail to change the vlan id we have to undo the removal (and vlan id
change) in the error path. Otherwise we'll have removed the vlan object from the
hash table, and have the wrong vlan id as well. Subsequent modification attempts
will then try to remove an entry which doesn't exist, and panic.

Undo the vlan id modification if the insertion in the hash table fails, and
re-insert it under the original vlan id.

PR:		279195
Reviewed by:	zlei
MFC atfer:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D45285
2024-05-22 09:08:02 +02:00
Zhenlei Huang 93fbfef0b5 if_vxlan(4): Add checking for loops and nesting of tunnels
User misconfiguration, either tunnel loops, or a large number of
different nested tunnels, can overflow the kernel stack. Prevent that
by using if_tunnel_check_nesting().

PR:		278394
Diagnosed by:	markj
Reviewed by:	kp
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D45197
2024-05-20 20:14:07 +08:00
Kristof Provost 59a6666ec9 if_ovpn: cope with loops
User misconfiguration may lead to routing loops where we try to send the tunnel
packet into the tunnel. This eventually leads to stack overflows and panics.

Avoid this using if_tunnel_check_nesting(), which will drop the packet if we're
looping or we hit three layers of nested tunnels.

MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2024-05-13 12:11:06 +02:00
Gleb Smirnoff fadbb6f85a lagg: remove use of net epoch in the ioctl paths
Rely on LAGG_SLOCK() instead.  The use of network epoch(9) here was added
in 6573d7580b (later tidied by 87bf9b9cbe) as a large sweep that
blindly substituted blocking kernel primitives with epoch(9).  In these
particular code paths use of epoch(9) is incorrect and doesn't provide any
protection against a stale pointer.  Recent fix 48698ead6f, which should
actually have removed the epoch use, created a potential sleeping in epoch
problem.
2024-05-06 15:27:32 -07:00
Gleb Smirnoff 570685971c lagg: propagate up/down to the children
Based on the old submission from asomers@.  With modern state of locking
in lagg(4), the patch got much simplier.  Enable the test that was
waiting for this change.

PR:			226144
Reviewed by:		asomers
Differential Revision:	https://reviews.freebsd.org/D44605
2024-05-06 15:27:32 -07:00
Kristof Provost 43387b4e57 if: guard against if_ioctl being NULL
There are situations where an struct ifnet has a NULL if_ioctl pointer.

For example, e6000sw creates such struct ifnets for each of its ports so it can
call into the MII code.

If there is then a link state event this calls do_link_state_change()
-> rtnl_handle_ifevent() -> dump_iface() -> get_operstate() ->
get_operstate_ether(). That wants to know if the link is up or down, so it tries
to ioctl(SIOCGIFMEDIA), which doesn't go well if if_ioctl is NULL.

Guard against this, and return EOPNOTSUPP.

PR:		275920
MFC ater:	3 days
Sponsored by:   Rubicon Communications, LLC ("Netgate")
2024-05-06 11:39:08 +02:00
Zhenlei Huang 73585176ff if_bridge: Minor style fixes
And more comments on the #ifdef INET blocks to improve readability.

While here, revert the order of two prototypes to produce minimal diff
compared to stable branches.

MFC with:	65767e6126
2024-04-26 02:19:11 +08:00
Lexi Winter 65767e6126 sys/net/if_bridge: support non-INET kernels
Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/1159
2024-04-23 15:13:00 -06:00
Denny Page fcdf9a1989 Support ARP for 802 networks
This is used by 802.3 Ethernet.  (Also be used by 802.4 Token Bus and
802.5 Token Ring, but we don't support those.)

This was accidentally removed along with FDDI support in commit
0437c8e3b1, presumably because comments implied it was used only by
FDDI or Token Ring.

Fixes: 0437c8e3b1 ("Remove support for FDDI networks.")
Reviewed-by: emaste
Signed-off-by: Denny Page <dennypage@me.com>
Pull-request: https://github.com/freebsd/freebsd-src/pull/1166
2024-04-23 12:30:53 -04:00
Lexi Winter 50ecbc5142 libipsec: make const-correct
- add const to the appropriate places in the libipsec public API and the
  relevant internal functions needed to support that.

- replace caddr_t with c_caddr_t in ipsec_dump_policy()

- update the ipsec_dump_policy manpage to use c_caddr_t (this manpage
  was already wrong as it had "char *" instead of caddr_t previously).

While here, update pfkeyv2.h to not cast away const in the PFKEY_*()
macros.

This should not cause any ABI changes as the actual types have not
changed.

Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/1099
2024-04-22 22:36:34 -06:00
Lexi Winter ef84dd8f49 if_bridge: clean up INET/INET6 handling
The if_bridge contains several instances of:

	if (AF_INET code ...
	#ifdef INET6
	    AF_INET6 code ...
	#endif
	) {
		...

Clean this up by adding a couple of macros at the top of the file that
are conditionally defined based on whether INET and/or INET6 are enabled,
which makes the code more readable and easier to maintain.

No functional change intended.

Reviewed by:	zlei, markj
MFC after:	1 week
Pull Request:	https://github.com/freebsd/freebsd-src/pull/1191
2024-04-22 12:01:27 -04:00
Seth Hoffert 2cb0fce24d bpf: Make BPF interop consistent with if_loop
The pseudo_AF_HDRCMPLT check is already being done in if_loop and
just needed to be ported over to if_ic, if_wg, if_disc, if_gif,
if_gre, if_me, if_tuntap and ng_iface.  This is needed in order to
allow these interfaces to work properly with e.g., tcpreplay.

PR:		256587
Reviewed by:	markj
MFC after:	2 weeks
Pull Request:	https://github.com/freebsd/freebsd-src/pull/876
2024-04-19 14:48:37 -04:00
Eric Joyner ed34a6b6ea
iflib: Add subinterface interrupt allocation function
The ice(4) driver will add the ability to create extra interfaces
that hang off of the base interface; to do that the driver requires
a method for the subinterface to request hardware interrupt resources
from the base interface.

Signed-off-by: Eric Joyner <erj@FreeBSD.org>

MFC after:	3 days
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D39930
2024-04-18 16:14:02 -07:00
Eric Joyner 3c7da27a47
iflib: Add sysctl to request extra MSIX vectors on driver load
Intended to be used with upcoming feature to add sub-interfaces, since
those new interfaces will be dynamically created and will need to have
spare MSI-X interrupts already allocated for them on driver load.

This sysctl is marked as a tunable since it will need to be set before
the driver is loaded since MSI-X interrupt allocation and setup is
done during the attach process.

Signed-off-by: Eric Joyner <erj@FreeBSD.org>

MFC after:	3 days
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D41326
2024-04-18 16:13:47 -07:00
Stephen J. Kiernan e4a0c92e7a iflib: Correct indentation according to style(9)
The indentation style for the SYSCTL_* macros used was not matching KNF.

Reported by:	jhb
Differential Revision:	https://reviews.freebsd.org/D44811
2024-04-16 16:36:25 -04:00
Stephen J. Kiernan 303dea74c2 iflib: Fix compiler warnings
Some of the QUAD sysctls are actually for unsigned quad values.
Switch to using UQUAD instead, as that is meant for unsigned.

Reviewed by:	erj, jhb
Obtained from:	Juniper Networks, Inc.
Differential Revision:	https://reviews.freebsd.org/D44620
2024-04-15 10:57:52 -04:00
Zhenlei Huang 6fe4d8395b debugnet: Fix logging of frame length
MFC after:	1 week
2024-04-09 00:47:10 +08:00
Zhenlei Huang e7102929bf ethernet: Fix logging of frame length
Both the mbuf length and the total packet length are signed.

While here, update a stall comment to reflect the current practice.

Reviewed by:	kp
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D42390
2024-04-09 00:44:33 +08:00
Eugene Grosbein 319a5d086b if_bridge: use IF_MINMTU
Replace incorrect constant 576 with IF_MINMTU to check for minumum MTU.
This unbreaks bridging tap interfaces with small mtu.

MFC after:	1 week
2024-04-01 10:35:59 +07:00
Gleb Smirnoff fa93ba4097 if_tuntap: simplify storage of per-vnet cloners
There is no need for a separate structure neither for a linked list.
Provide each VNET with an array of pointers to if_clone that has the same
size as the driver list.

Reviewed by:		zlei, kevans, kp
Differential Revision:	https://reviews.freebsd.org/D44307
2024-03-29 12:35:41 -07:00
Gleb Smirnoff 2497c70f81 vnet: remove unneeded backslash
Fixes:	430e0e409c
2024-03-15 12:17:04 -07:00
Kristof Provost 706d465dae pf: convert kill/clear state to use netlink
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D44090
2024-02-28 23:26:18 +01:00
Gleb Smirnoff 48698ead6f lagg: wrap lagg_port2req() into LAGG_SLOCK()
Although a port addition is coded in a sequence where first all softc
information is fulfilled and only then it is attached to the lagg, we
still need a locking primitive to guarantee cache invalidation.  Panic
observed in the wild shows that lacp_portreq() called via
lagg_port_ioctl(SIOCGLAGGPORT) immediately after port creation may see
lp->lp_psc as NULL and panic.  In the core file we will see valid data
all around.  A race via lagg_ioctl() wasn't observed but potentially
is possible.

Differential Revision:	https://reviews.freebsd.org/D43501
2024-02-23 17:56:46 -08:00
Tai-hwa Liang 25a5bb7318 net: bandaid for plugging a fw_com leak in fwip_detach()
Adding a temporary workaround for plugging a fw_com upon if_fwip unloading.

Steps to reproduce(needs two hosts connected with firewire):

  while true; do
    ifconfig fwip0 10.0.0.5 up
    fwcontrol -r
    ping -c 10.0.0.3
    kldunload if_fwip
  done

There's a chance that the unloading of if_fwip.ko triggers following warning:

	Warning: memory type fw_com leaked memory on destroy (1 allocations, 64 bytes leaked).

commit d79b6b8ec2 (origin/main, origin/HEAD)
2024-02-15 01:00:49 +00:00
Marius Strobl ed81a15517 fib_algo(4): Lower level of algorithm switching messages to LOG_INFO
Otherwise, with the default flm_debug_level of LOG_NOTICE, it's rather
easy to trigger debug messages such as:
[fib_algo] inet.0 (bsearch4#18) rebuild_fd_flm: switching algo to
radix4_lockless

Also, the "severity" of these events generally only justifies LOG_INFO
and not LOG_NOTICE.

Reviewed by:	melifaro
2024-02-05 23:44:38 +01:00
Gleb Smirnoff ce69e37369 Revert "sockets: retire sorflush()"
Provide a comment in sorflush() why the socket I/O sx(9) lock is actually
important.

This reverts commit 507f87a799.
2024-02-03 13:08:41 -08:00
Kristof Provost 777a4702c5 pf: implement addrule via netlink
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2024-02-02 17:55:16 +01:00
Kristof Provost ffeab76b68 pfil: PFIL_PASS never frees the mbuf
pfil hooks (i.e. firewalls) may pass, modify or free the mbuf passed
to them. (E.g. when rejecting a packet, or when gathering up packets
for reassembly).

If the hook returns PFIL_PASS the mbuf must still be present. Assert
this in pfil_mem_common() and ensure that ipfilter follows this
convention. pf and ipfw already did.
Similarly, if the hook returns PFIL_DROPPED or PFIL_CONSUMED the mbuf
must have been freed (or now be owned by the firewall for further
processing, like packet scheduling or reassembly).

This allows us to remove a few extraneous NULL checks.

Suggested by:	tuexen
Reviewed by:	tuexen, zlei
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D43617
2024-01-29 14:10:19 +01:00
Kristof Provost e95025ed93 pflow: show socket status in verbose mode
Introduce a verbose output mode to pflowctl, and expose the status of
the socket to userspace. This can be helpful in debugging configuration
errors.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
2024-01-25 17:37:51 +01:00
Gordon Bergling ab6d773dbf rtsock: Fix a typo in a source code comment
- s/adddress/address/

MFC after:	3 days
2024-01-22 21:53:21 +01:00
Kristof Provost 63a5fe8343 pflow: limit to no more than 128 flow exporters
While there are no inherent limits to the number of exporters we're
likely to scale rather badly to very large numbers. There's also no
obvious use case for more than a handful. Limit to 128 exporters to
prevent foot-shooting.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
2024-01-22 18:02:10 +01:00
Kristof Provost 54c62e3e5d pf: work around icmp6 packet-too-big not being sent when binat-ing
If we're applying NPTv6 we pass a packet with a modified source and/or
destination address to the network stack.

If that packet then turns out to be larger than the MTU of the sending
interface the stack will attempt to generate an icmp6 packet-too-big
error, but may fail to look up the appropriate source address for that
error message. Even if it does, pf would still have to undo the binat
operation inside the icmp6 packet so the sending host can make sense of
the error.

We can avoid both problems entirely by having pf also perform the MTU
check (taking the potential refragmentation into account), and
generating the icmp6 error directly in pf.

See also:	https://redmine.pfsense.org/issues/14290
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D43499
2024-01-22 12:52:14 +01:00
Gordon Bergling b4c94968d1 if_llatbl: Fix a typo in a KASSERT message
- s/entires/entries/

MFC after:	5 days
2024-01-20 21:00:22 +01:00
Gordon Bergling a2fcd3af5c net: Fix two typos in source code comments
- s/strucutres/structures/

MFC after:	3 days
2024-01-20 17:28:12 +01:00
Gleb Smirnoff 507f87a799 sockets: retire sorflush()
With removal of dom_dispose method the function boils down to two
meaningful function calls: socantrcvmore() and sbrelease().  The latter is
only relevant for protocols that use generic socket buffers.

The socket I/O sx(9) lock acquisition in sorflush() is not relevant for
shutdown(2) operation as it doesn't do any I/O that may interleave with
read(2) or write(2).  The socket buffer mutex acquisition inside
sbrelease() is what guarantees thread safety.  This sx(9) acquisition in
soshutdown() can be tracked down to 4.4BSD times, where it used to be
sblock(), and it was carried over through the years evolving together with
sockets with no reconsideration of why do we carry it over.  I can't tell
if that sblock() made sense back then, but it doesn't make any today.

Reviewed by:		tuexen
Differential Revision:	https://reviews.freebsd.org/D43415
2024-01-16 10:30:49 -08:00
Gleb Smirnoff 5bba272807 sockets: make pr_shutdown fully protocol specific method
Disassemble a one-for-all soshutdown() into protocol specific methods.
This creates a small amount of copy & paste, but makes code a lot more
self documented, as protocol specific method would execute only the code
that is relevant to that protocol and nothing else.  This also fixes a
couple recent regressions and reduces risk of future regressions.  The
extended KPI for the new pr_shutdown removes need for the extra pr_flush
which was added for the sake of SCTP which could not perform its shutdown
properly with the old one.  Particularly for SCTP this change streamlines
a lot of code.

Some notes on why certain parts of code were copied or were not to certain
protocols:
* The (SS_ISCONNECTED | SS_ISCONNECTING | SS_ISDISCONNECTING) check is
  needed only for those protocols that may be connected or disconnected.
* The above reduces into only SS_ISCONNECTED for those protocols that
  always connect instantly.
* The ENOTCONN and continue processing hack is left only for datagram
  protocols.
* The SOLISTENING(so) block is copied to those protocols that listen(2).
* sorflush() on SHUT_RD is copied almost to every protocol, but that
  will be refactored later.
* wakeup(&so->so_timeo) is copied to protocols that can make a non-instant
  connect(2), can SO_LINGER or can accept(2).

There are three protocols (netgraph(4), Bluetooth, SDP) that did not have
pr_shutdown, but old soshutdown() would still perform sorflush() on
SHUT_RD for them and also wakeup(9).  Those protocols partially supported
shutdown(2) returning EOPNOTSUP for SHUT_WR/SHUT_RDWR, now they fully lost
shutdown(2) support.  I'm pretty sure netgraph(4) and Bluetooth are okay
about that and SDP is almost abandoned anyway.

Reviewed by:		tuexen
Differential Revision:	https://reviews.freebsd.org/D43413
2024-01-16 10:30:37 -08:00
Kristof Provost 2be6f75707 pflow: Turn `pflowstats' statistics counters into per-CPU counters to make them mpsafe.
The weird interactions around `pflow_flows' and `sc_gcounter' replaced
by simple `pflow_flows' increment. Since the flow sequence is the 32
bits integer, the `sc_gcounter' type replaced by the type of uint32_t.

Obtained from:	OpenBSD
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D43116
2024-01-16 09:45:55 +01:00
Kristof Provost fc6e506996 pflow: add RFC8158 NAT support
Extend pflow(4) to send NAT44 Session Create and Delete events.
This applies only to IPFIX (i.e. proto version 10), and requires no
user configuration.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D43114
2024-01-16 09:45:55 +01:00