If we apply a route-to to an inbound packet pf_route() may hand that
packet over to dummynet. Dummynet may then delay the packet, and later
re-inject it. This re-injection (in dummynet_send()) needs to know
if the packet was inbound or outbound, to call the correct path for
continued processing.
That's done based on the pf_pdesc we pass along (through
pf_dummynet_route() and pf_pdesc_to_dnflow()). In the case of pf_route()
on inbound packets that may be wrong, because we're called in the input
path, and didn't update pf_pdesc->dir.
This can manifest in issues with fragmented packets. For example, a
fragmented packet will be re-fragmented in pf_route(), and if dummynet
makes different decisions for some of the fragments (that is, it delays
some and allows others to pass through directly) this will break.
The packets that pass through dummynet without delay will be transmitted
correctly (through the ifp->if_output() call in pf_route()), but
the delayed packets will be re-injected in the input path (and not
the output path, as they should be). These packets will pass through
pf_test(PF_IN) as they're tagged PF_MTAG_FLAG_DUMMYNET. However,
this tag is then removed and the packet will be routed and enter
pf_test(PF_OUT) where pf_reassemble() will hold them indefinitely
(as some fragments have been transmitted directly, and will never hit
pf_test(PF_OUT)).
The fix is simple: we must update pf_pfdesc->dir to PF_OUT before we
pass the packet to dummynet.
See also: https://redmine.pfsense.org/issues/15156
Reviewed by: rcm
Sponsored by: Rubicon Communications, LLC ("Netgate")
The TCP implied connect is an artifact left after T/TCP. To my surprise
it still works, hence the existence of this test. Please read this email
first:
https://lists.freebsd.org/pipermail/freebsd-net/2010-August/026311.html
An interesting fact that this test takes 220 - 240 milliseconds to
execute on my Threadripper PRO. Flipping the '#if 0' to '#if 1' in the
test, thus bringing it back to normal connect(2), would speed the test up
a hundred times and I guess all this time is fork+exec of the test.
When we route-to the state should be bound to the route-to interface,
not the default route interface. However, we should only do so for
outbound traffic, because inbound traffic should bind on the arriving
interface, not the one we eventually transmit on.
Explicitly check for this in BOUND_IFACE().
We must also extend pf_find_state(), because subsequent packets within
the established state will attempt to match the original interface, not
the route-to interface.
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D43589
Otherwise we get spurious test failures when running tests in parallel.
The intent here was to name jails after the tests, but this was done
incorrectly in a couple of places.
MFC after: 1 week
While there are no inherent limits to the number of exporters we're
likely to scale rather badly to very large numbers. There's also no
obvious use case for more than a handful. Limit to 128 exporters to
prevent foot-shooting.
Sponsored by: Rubicon Communications, LLC ("Netgate")
Ensure we print it as such, rather than as a signed integer, as that
would lead to confusion.
Reported by: Jim Pingle <jimp@netgate.com>
Sponsored by: Rubicon Communications, LLC ("Netgate")
pflow(4) now also exports NAT session creation/destruction information.
Test that this works as expected.
While here improve the parsing of ipfix (i.e. pflowproto 10) a bit, and
check more information for the existing state information exports.
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D43117
Test that we can send netflow information over IPv6.
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D43115
Test that we actually send netflow messages when configured to do so.
We do not yet inspect the generated netflow messages.
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D43111
Basic creation, validation and cleanup test for the new pflow interface.
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D43109
This would previously return 1 if the slave side of the pts was closed
to force an application to read() from it and observe the EOF, but it's
not clear why and this is inconsistent both with how we handle devices
with similar mechanics (like pipes) and also with other kernels, such as
OpenBSD/NetBSD and Linux.
PR: 239604
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D43457
If a copy_file_range operation tries to read from a page that was
previously written via mmap, that page must be flushed first.
MFC after: 2 weeks
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D43451
Ensure that we do the right thing when we reassemble fragmented packet
and send it through a dummynet pipe.
Sponsored by: Rubicon Communications, LLC ("Netgate")
The bug isn't fusefs-specific, but this is the easiest way to reproduce
it.
PR: 276191
MFC after: 1 week
MFC with: bdb46c21a3
Differential Revision: https://reviews.freebsd.org/D43446
Reviewed by: kib
Change sequence of syscalls: instead of "add, delete, check, check"
run sequence "add, check, delete, check". Seems to make more sense.
Do minimal parsing of incoming messages: find the IPv4 address there
and compare it to the original.
Allocate 9000 bytes to match the largest requsted size. Add a check to
prevent the list of sizes and buffer size from getting out of sync
again.
Reviewed by: markj
Found with: CheriBSD
Differential Revision: https://reviews.freebsd.org/D43340
You can't ever safely map a single page and then map a superpage sized
mapping over it with MAP_FIXED. Even in a single-threaded program, ASLR
might mean you land too close to another mapping and on CheriBSD we
don't allow the initial reservation to grow because doing so requires
program changes that are hard to automate.
To avoid this, map the entire region we want to use upfront.
Reviewed by: markj
Obtained from: CheriBSD
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D43282
Implement Netlink socket receive buffer as a simple TAILQ of nl_buf's,
same part of struct sockbuf that is used for send buffer already.
This shaves a lot of code and a lot of extra processing. The pcb rids
of the I/O queues as the socket buffer is exactly the queue. The
message writer is simplified a lot, as we now always deal with linear
buf. Notion of different buffer types goes away as way as different
kinds of writers. The only things remaining are: a socket writer and
a group writer.
The impact on the network stack is that we no longer use mbufs, so
a workaround from d187154750 disappears.
Note on message throttling. Now the taskqueue throttling mechanism
needs to look at both socket buffers protected by their respective
locks and on flags in the pcb that are protected by the pcb lock.
There is definitely some room for optimization, but this changes tries
to preserve as much as possible.
Note on new nl_soreceive(). It emulates soreceive_generic(). It
must undergo further optimization, see large comment put in there.
Note on tests/sys/netlink/test_netlink_message_writer.py. This test
boiled down almost to nothing with mbufs removed. However, I left
it with minimal functionality (it basically checks that allocating N
bytes we get N bytes) as it is one of not so many examples of ktest
framework that allows to test KPIs with python.
Note on Linux support. It got much simplier: Netlink message writer
loses notion of Linux support lifetime, it is same regardless of
process ABI. On socket write from Linux process we perform
conversion immediately in nl_receive_message() and on an output
conversion to Linux happens in in nl_send_one(). XXX: both
conversions use M_NOWAIT allocation, which used to be the case
before this change, too.
Reviewed by: melifaro
Differential Revision: https://reviews.freebsd.org/D42524
With upcoming protocol specific socket buffer for Netlink we need some
additional tests that cover basic socket operations, w/o much of actual
Netlink knowledge. Following tests are performed:
1) Overflow. If an application keeps sending messages to the kernel,
but doesn't read out the replies, then first the receive buffer shall
fill and after that further messages from applications will be queued
on the send buffer until it is filled. After that socket operations
should block. However, reading from the receive buffer some data should
wake up the taskqueue and the send buffer should start draining again.
2) Peek & trunc. Check that socket correctly reports amount of readable
data with MSG_PEEK & MSG_TRUNC. This is typical pattern of Netlink apps.
3) Sizes. Check that zero size read doesn't affect the socket, undersize
read will return one truncated message and the message is removed from
the buffer. Check that large buffer will be filled in one read, without
any boundaries imposed by internal representation of the buffer. Check
that any meaningful read is amended with control data if requested so.
Reviewed by: melifaro
Differential Revision: https://reviews.freebsd.org/D42525
After ad874544d9, interface name
validation has been removed, resulting in two unit tests failures.
Drop the failing tests since they no longer apply.
Reported by: markj
Building tests/sys/fs/fusefs with clang 18 results the following
warning:
tests/sys/fs/fusefs/cache.cc:145:14: error: variable length arrays in C++ are a Clang extension [-Werror,-Wvla-cxx-extension]
145 | uint8_t buf[bufsize];
| ^~~~~~~
Because we do not particularly care that this is a clang extension,
suppress the warning.
MFC after: 3 days
When this functionality was moved to libifconfig in 3dfbda3401,
the end of list calculation was modified for unknown reasons, practically
limiting the number of bridge member returned to (about) 102.
This patch changes the calculation back to what it was originally and
adds a unit test to verify it works as expected.
Reported by: Patrick M. Hausen (via ML)
Reviewed by: kp
Approved by: kp
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D43135
Have a simple Gilbert-Elliott channel model in
dummynet to mimick correlated loss behavior of
realistic environments. This allows simpler testing
of burst-loss environments.
Reviewed By: tuexen, kp, pauamma_gundo.com, #manpages
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D42980
Let the accept functions provide stack memory for protocols to fill it in.
Generic code should provide sockaddr_storage, specialized code may provide
smaller structure.
While rewriting accept(2) make 'addrlen' a true in/out parameter, reporting
required length in case if provided length was insufficient. Our manual
page accept(2) and POSIX don't explicitly require that, but one can read
the text as they do. Linux also does that. Update tests accordingly.
Reviewed by: rscheff, tuexen, zlei, dchagin
Differential Revision: https://reviews.freebsd.org/D42635
If ZFS reports that a disk had at least 8 I/O operations over 60s that
were each delayed by at least 30s (implying a queue depth > 4 or I/O
aggregation, obviously), fault that disk. Disks that respond this
slowly can degrade the entire system's performance.
MFC after: 2 weeks
Sponsored by: Axcient
Reviewed by: delphij
Differential Revision: https://reviews.freebsd.org/D42825
Add tests for adding a route using an interface only (without an IP
address).
Reviewed by: rcm
Approved by: kp (mentor)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D41436