When a lagg interface is destroyed, it destroys all of the lagg ports,
which triggers an asynchronous link state change handler. This in turn
may generate a netlink message, a portion of which requires netlink to
invoke the SIOCGIFMEDIA ioctl of the lagg interface, which involves
scanning the list of interface media. This list is not internally
locked, it requires the interface driver to provide some kind of
synchronization.
Shortly after the link state notification has been raised, the lagg
interface detaches itself from the network stack. As a part of this, it
blocks in order to wait for link state handlers to drain, but before
that it destroys the interface media list. Reverse this order of
operations so that the link state change handlers drain first, avoiding
a use-after-free that is very occasionally triggered by lagg stress
tests. This matches other ethernet drivers in the tree.
MFC after: 2 weeks
State keys are trivially const in lookup routines, so annotate them as
such. No functional change intended.
Reviewed by: kp
MFC after: 1 week
Sponsored by: Klara, Inc.
Sponsored by: Modirum
Differential Revision: https://reviews.freebsd.org/D45671
The machinery to support 64-bit counters even on 32-bit kernels had a
bug where it would unitentionally truncate the value back to 32-bits
when transferring to a new counter. This resulted in buggy be behavior
on 64-bit kernels as well.
Sponsored by: Rubicon Communications, LLC ("Netgate")
This function was introduced in commit [1] and is actually used as a
boolean function although it was not defined as so.
No functional change intended.
1. 16d878cc99 Fix the following bpf(4) race condition which can result in a panic
Reviewed by: markj, kp, #network
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D45509
777a4702c changed how we copy out the anchor_call string, and
incorrectly limited it to 8 (4 on 32-bit systems) bytes. Fix that so we
get the full anchor path, rather than just the first few characters.
PR: 279225
Sponsored by: Rubicon Communications, LLC ("Netgate")
If we fail to change the vlan id we have to undo the removal (and vlan id
change) in the error path. Otherwise we'll have removed the vlan object from the
hash table, and have the wrong vlan id as well. Subsequent modification attempts
will then try to remove an entry which doesn't exist, and panic.
Undo the vlan id modification if the insertion in the hash table fails, and
re-insert it under the original vlan id.
PR: 279195
Reviewed by: zlei
MFC atfer: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D45285
User misconfiguration, either tunnel loops, or a large number of
different nested tunnels, can overflow the kernel stack. Prevent that
by using if_tunnel_check_nesting().
PR: 278394
Diagnosed by: markj
Reviewed by: kp
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D45197
User misconfiguration may lead to routing loops where we try to send the tunnel
packet into the tunnel. This eventually leads to stack overflows and panics.
Avoid this using if_tunnel_check_nesting(), which will drop the packet if we're
looping or we hit three layers of nested tunnels.
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Rely on LAGG_SLOCK() instead. The use of network epoch(9) here was added
in 6573d7580b (later tidied by 87bf9b9cbe) as a large sweep that
blindly substituted blocking kernel primitives with epoch(9). In these
particular code paths use of epoch(9) is incorrect and doesn't provide any
protection against a stale pointer. Recent fix 48698ead6f, which should
actually have removed the epoch use, created a potential sleeping in epoch
problem.
Based on the old submission from asomers@. With modern state of locking
in lagg(4), the patch got much simplier. Enable the test that was
waiting for this change.
PR: 226144
Reviewed by: asomers
Differential Revision: https://reviews.freebsd.org/D44605
There are situations where an struct ifnet has a NULL if_ioctl pointer.
For example, e6000sw creates such struct ifnets for each of its ports so it can
call into the MII code.
If there is then a link state event this calls do_link_state_change()
-> rtnl_handle_ifevent() -> dump_iface() -> get_operstate() ->
get_operstate_ether(). That wants to know if the link is up or down, so it tries
to ioctl(SIOCGIFMEDIA), which doesn't go well if if_ioctl is NULL.
Guard against this, and return EOPNOTSUPP.
PR: 275920
MFC ater: 3 days
Sponsored by: Rubicon Communications, LLC ("Netgate")
And more comments on the #ifdef INET blocks to improve readability.
While here, revert the order of two prototypes to produce minimal diff
compared to stable branches.
MFC with: 65767e6126
This is used by 802.3 Ethernet. (Also be used by 802.4 Token Bus and
802.5 Token Ring, but we don't support those.)
This was accidentally removed along with FDDI support in commit
0437c8e3b1, presumably because comments implied it was used only by
FDDI or Token Ring.
Fixes: 0437c8e3b1 ("Remove support for FDDI networks.")
Reviewed-by: emaste
Signed-off-by: Denny Page <dennypage@me.com>
Pull-request: https://github.com/freebsd/freebsd-src/pull/1166
- add const to the appropriate places in the libipsec public API and the
relevant internal functions needed to support that.
- replace caddr_t with c_caddr_t in ipsec_dump_policy()
- update the ipsec_dump_policy manpage to use c_caddr_t (this manpage
was already wrong as it had "char *" instead of caddr_t previously).
While here, update pfkeyv2.h to not cast away const in the PFKEY_*()
macros.
This should not cause any ABI changes as the actual types have not
changed.
Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/1099
The if_bridge contains several instances of:
if (AF_INET code ...
#ifdef INET6
AF_INET6 code ...
#endif
) {
...
Clean this up by adding a couple of macros at the top of the file that
are conditionally defined based on whether INET and/or INET6 are enabled,
which makes the code more readable and easier to maintain.
No functional change intended.
Reviewed by: zlei, markj
MFC after: 1 week
Pull Request: https://github.com/freebsd/freebsd-src/pull/1191
The pseudo_AF_HDRCMPLT check is already being done in if_loop and
just needed to be ported over to if_ic, if_wg, if_disc, if_gif,
if_gre, if_me, if_tuntap and ng_iface. This is needed in order to
allow these interfaces to work properly with e.g., tcpreplay.
PR: 256587
Reviewed by: markj
MFC after: 2 weeks
Pull Request: https://github.com/freebsd/freebsd-src/pull/876
The ice(4) driver will add the ability to create extra interfaces
that hang off of the base interface; to do that the driver requires
a method for the subinterface to request hardware interrupt resources
from the base interface.
Signed-off-by: Eric Joyner <erj@FreeBSD.org>
MFC after: 3 days
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D39930
Intended to be used with upcoming feature to add sub-interfaces, since
those new interfaces will be dynamically created and will need to have
spare MSI-X interrupts already allocated for them on driver load.
This sysctl is marked as a tunable since it will need to be set before
the driver is loaded since MSI-X interrupt allocation and setup is
done during the attach process.
Signed-off-by: Eric Joyner <erj@FreeBSD.org>
MFC after: 3 days
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D41326
Some of the QUAD sysctls are actually for unsigned quad values.
Switch to using UQUAD instead, as that is meant for unsigned.
Reviewed by: erj, jhb
Obtained from: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D44620
Both the mbuf length and the total packet length are signed.
While here, update a stall comment to reflect the current practice.
Reviewed by: kp
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D42390
There is no need for a separate structure neither for a linked list.
Provide each VNET with an array of pointers to if_clone that has the same
size as the driver list.
Reviewed by: zlei, kevans, kp
Differential Revision: https://reviews.freebsd.org/D44307
Although a port addition is coded in a sequence where first all softc
information is fulfilled and only then it is attached to the lagg, we
still need a locking primitive to guarantee cache invalidation. Panic
observed in the wild shows that lacp_portreq() called via
lagg_port_ioctl(SIOCGLAGGPORT) immediately after port creation may see
lp->lp_psc as NULL and panic. In the core file we will see valid data
all around. A race via lagg_ioctl() wasn't observed but potentially
is possible.
Differential Revision: https://reviews.freebsd.org/D43501
Adding a temporary workaround for plugging a fw_com upon if_fwip unloading.
Steps to reproduce(needs two hosts connected with firewire):
while true; do
ifconfig fwip0 10.0.0.5 up
fwcontrol -r
ping -c 10.0.0.3
kldunload if_fwip
done
There's a chance that the unloading of if_fwip.ko triggers following warning:
Warning: memory type fw_com leaked memory on destroy (1 allocations, 64 bytes leaked).
commit d79b6b8ec2 (origin/main, origin/HEAD)
Otherwise, with the default flm_debug_level of LOG_NOTICE, it's rather
easy to trigger debug messages such as:
[fib_algo] inet.0 (bsearch4#18) rebuild_fd_flm: switching algo to
radix4_lockless
Also, the "severity" of these events generally only justifies LOG_INFO
and not LOG_NOTICE.
Reviewed by: melifaro
pfil hooks (i.e. firewalls) may pass, modify or free the mbuf passed
to them. (E.g. when rejecting a packet, or when gathering up packets
for reassembly).
If the hook returns PFIL_PASS the mbuf must still be present. Assert
this in pfil_mem_common() and ensure that ipfilter follows this
convention. pf and ipfw already did.
Similarly, if the hook returns PFIL_DROPPED or PFIL_CONSUMED the mbuf
must have been freed (or now be owned by the firewall for further
processing, like packet scheduling or reassembly).
This allows us to remove a few extraneous NULL checks.
Suggested by: tuexen
Reviewed by: tuexen, zlei
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D43617
Introduce a verbose output mode to pflowctl, and expose the status of
the socket to userspace. This can be helpful in debugging configuration
errors.
Sponsored by: Rubicon Communications, LLC ("Netgate")
While there are no inherent limits to the number of exporters we're
likely to scale rather badly to very large numbers. There's also no
obvious use case for more than a handful. Limit to 128 exporters to
prevent foot-shooting.
Sponsored by: Rubicon Communications, LLC ("Netgate")
If we're applying NPTv6 we pass a packet with a modified source and/or
destination address to the network stack.
If that packet then turns out to be larger than the MTU of the sending
interface the stack will attempt to generate an icmp6 packet-too-big
error, but may fail to look up the appropriate source address for that
error message. Even if it does, pf would still have to undo the binat
operation inside the icmp6 packet so the sending host can make sense of
the error.
We can avoid both problems entirely by having pf also perform the MTU
check (taking the potential refragmentation into account), and
generating the icmp6 error directly in pf.
See also: https://redmine.pfsense.org/issues/14290
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D43499
With removal of dom_dispose method the function boils down to two
meaningful function calls: socantrcvmore() and sbrelease(). The latter is
only relevant for protocols that use generic socket buffers.
The socket I/O sx(9) lock acquisition in sorflush() is not relevant for
shutdown(2) operation as it doesn't do any I/O that may interleave with
read(2) or write(2). The socket buffer mutex acquisition inside
sbrelease() is what guarantees thread safety. This sx(9) acquisition in
soshutdown() can be tracked down to 4.4BSD times, where it used to be
sblock(), and it was carried over through the years evolving together with
sockets with no reconsideration of why do we carry it over. I can't tell
if that sblock() made sense back then, but it doesn't make any today.
Reviewed by: tuexen
Differential Revision: https://reviews.freebsd.org/D43415
Disassemble a one-for-all soshutdown() into protocol specific methods.
This creates a small amount of copy & paste, but makes code a lot more
self documented, as protocol specific method would execute only the code
that is relevant to that protocol and nothing else. This also fixes a
couple recent regressions and reduces risk of future regressions. The
extended KPI for the new pr_shutdown removes need for the extra pr_flush
which was added for the sake of SCTP which could not perform its shutdown
properly with the old one. Particularly for SCTP this change streamlines
a lot of code.
Some notes on why certain parts of code were copied or were not to certain
protocols:
* The (SS_ISCONNECTED | SS_ISCONNECTING | SS_ISDISCONNECTING) check is
needed only for those protocols that may be connected or disconnected.
* The above reduces into only SS_ISCONNECTED for those protocols that
always connect instantly.
* The ENOTCONN and continue processing hack is left only for datagram
protocols.
* The SOLISTENING(so) block is copied to those protocols that listen(2).
* sorflush() on SHUT_RD is copied almost to every protocol, but that
will be refactored later.
* wakeup(&so->so_timeo) is copied to protocols that can make a non-instant
connect(2), can SO_LINGER or can accept(2).
There are three protocols (netgraph(4), Bluetooth, SDP) that did not have
pr_shutdown, but old soshutdown() would still perform sorflush() on
SHUT_RD for them and also wakeup(9). Those protocols partially supported
shutdown(2) returning EOPNOTSUP for SHUT_WR/SHUT_RDWR, now they fully lost
shutdown(2) support. I'm pretty sure netgraph(4) and Bluetooth are okay
about that and SDP is almost abandoned anyway.
Reviewed by: tuexen
Differential Revision: https://reviews.freebsd.org/D43413
The weird interactions around `pflow_flows' and `sc_gcounter' replaced
by simple `pflow_flows' increment. Since the flow sequence is the 32
bits integer, the `sc_gcounter' type replaced by the type of uint32_t.
Obtained from: OpenBSD
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D43116
Extend pflow(4) to send NAT44 Session Create and Delete events.
This applies only to IPFIX (i.e. proto version 10), and requires no
user configuration.
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D43114