Commit 46d7b45a26 introduced these code
paths. Test and document them.
- Add inner packet too short test
- Add inner IHL too short test
- Add quoted data too short test
- Add IHL too short test
- Add max inner packet IHL without payload test
Reviewed by: markj
MFC after: 1 week
Pull Request: https://github.com/freebsd/freebsd-src/pull/863
Differential Revision: https://reviews.freebsd.org/D38528
Avoid calculating the square root of negative zero, which can easily
happen on certain architectures when calculating the population standard
deviation with a sample size of one, e.g., 0.01 - (0.1 * 0.1) =
-0.000000.
Avoid returning a NaN by capping the minimum possible variance value to
zero (positive).
In the future, maybe skip reporting statistics at all for a single
sample.
Reported by: Jenkins
Reviewed by: asomers
MFC after: 1 week
Pull Request: https://github.com/freebsd/freebsd-src/pull/863
Differential Revision: https://reviews.freebsd.org/D42114
The primary bottleneck *was* vnode_list mtx, which got artificially
worsened due to the following work done with the lock held:
1. the global heavily modified numvnodes counter was being read,
inducing massive cache line ping pong
2. should the value fit limits (which it normally did) there would be an
avoidable write to vn_alloc_cyclecount, which is being read outside
of the lock, once more inducing traffic
But if vn_alloc_cyclecount is 0, which it normally is even when facing
vnode shortage, there is no need to check numvnodes nor set it to 0 again.
Another problem was numvnodes adjustment (which made the locked read
much worse). While it fundamentally does not scale as it is not
distributed in any fashion, it was avoidably slow. When bumping over the
vnode limit, it would be modified with atomics 3 times: inc + dec to
backpedal in vn_alloc, then final inc in vn_alloc_hard.
One can let some slop persist over calls to vnlru_free instead.
In principle each thread in the system could get here and bump it, so a
limit is put in place to keep things sane.
Bench setup same as in prior commits: zfs, 20 separate directory trees
each with 1 million files in total and 20 find(1) processes stating them
in parallel (one per each tree).
Total run time (in seconds) goes down as follows:
vnode limit 8388608 400000
before ~20 ~35
after ~8 ~15
With this in place the primary bottleneck is now ZFS.
Sponsored by: Rubicon Communications, LLC ("Netgate")
bsddialog(1) uses getopt_long(3) to parse command line argument list.
Add '--' to avoid errors caused by arguments (menu items) begin
with '-'.
The change is compatible with dialog(1) and Xdialog(1).
The man page had `kern.ktrace.geniosize` but the sysctl node contains an
underscore.
PR: 274274
Reported by: Ivan Rozhuk
Sponsored by: The FreeBSD Foundation
* Interrupt the option loop as soon as we have an indication of which
protocol is intended.
* If we end up having to perform a DNS lookup, loop over the entire
result looking for either IPv4 or IPv6 addresses.
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
Reviewed by: rscheff, kevans, allanjude
Differential Revision: https://reviews.freebsd.org/D42137
While we should have cleared all the pending I/O prior to calling
nvme_qpair_destroy, which should ensure that if the callout_drain causes
a call to nvme_qpair_timeout(), it won't schedule any new
timeout. However, it doesn't hurt to set timeout_pending to false in
nvme_qpair_destroy() and have nvme_qpair_timeout() exit early if it sees
it w/o scheduling a timeout. Since we don't otherwise stop the timeout
until we're about to destroy the qpair, this ensures we fail safe. The
lock/unlock also ensures the callout_drain will either remove the callout,
or wait for it to run with the early bailout.
We can likely further improve this by using callout_stop() inside the
pending lock. I'll investigate that for future refinement.
Sponsored by: Netflix
Suggestions by: jhb
Reviewed by: gallatin
Differential Revision: https://reviews.freebsd.org/D42065
While it seemed like a good idea to have this state, we can do
everything we wanted with the state by checking ctrlr->is_failed since
that's set before we start failing the qpairs. Add some comments about
racing when we're failing the controller, though in practice I'm not
sure that kind of race could even be lost.
Sponsored by: Netflix
Reviewed by: chuck, gallatin, jhb
Differential Revision: https://reviews.freebsd.org/D42051
After da8324a925, the pre/post hooks are gone. So remove a coment
about why we don't call them in this case.
Sponsored by: Netflix
Reviewed by: chuck, jhb
Differential Revision: https://reviews.freebsd.org/D42050
da8324a925 removed one of the two instances of NVME_2X_RESET. It
failed to snag the other one, and remove it from the options file.
Remove from both of those here.
Sponsored by: Netflix
Reviewed by: chuck, gallatin, jhb
Differential Revision: https://reviews.freebsd.org/D42049
In 4b977e6dda we removed the call to nvme_ctrlr_post_failed_request
because we can now directly fail requests in this context since we're in
the reset task already. No need to queue it. I left it in place against
future need, but it's been two years and no panics have resulted. Since
the static analysis (code checking) and the dyanmic analysis (surviving
in the field for 2 years, including at $WORK where we know we've gone
through this path when we've failed drives) both signal that it's not
really needed, go ahead and GC it. If we discover at a later date a flaw
in this analysis, we can add it back easily enough by reverting this and
4b977e6dda.
Sponsored by: Netflix
Reviewed by: chuck, gallatin, jhb
Differential Revision: https://reviews.freebsd.org/D42048
Works around severe performance problems in certain corner cases, see
the commentary added.
Modifying vnlru logic has proven rather error prone in the past and a
release is near, thus take the easy way out and fix it without having to
dig into the current machinery.
If the count is high enough there is no point trying to produce more.
Not going there reduces traffic on the vnode_list mtx.
This further shaves total real time in a test mentioned in:
74be676d87 ("vfs: drop one vnode list lock trip during vnlru free
recycle") -- 20 instances of find each creating 1 million vnodes, while
total limit is set to 400k.
Time goes down from ~41 to ~35 seconds.
Sponsored by: Rubicon Communications, LLC ("Netgate")
In face of vnode shortage the count very easily can go few units above
the limit before going back down.
Calling uma_reclaim results in massive amount of work which in this case
is not warranted.
Sponsored by: Rubicon Communications, LLC ("Netgate")
Add the build flags to enable branch protection on arm64. This enable
the use of PAC and BTI in the kernel.
For PAC we already install the kernel keys when entering the kernel
from userspace so this will start using these to sign the stack.
For BTI we need to mark the kernel page tables with a new guarded page
field. As this will require all code that could be reached through a
function pointer with an appropriate branch target instruction we
are enabling this before setting the field.
As the pointer authentication support shouldn't be reached via a
function pointer it is safe to not enable the use of BTI there.
Reviewed by: markj
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D42079
When we enable BTI iboth the first instruction in a function that could
be called indirectly, and a branch within a function need a valid
landing pad instruction.
There are three options for these instructions:
1. A breakpoint instruction
2. A pointer authentication PACIASP/PACIBSP
3. A BTI instruction
Option 1 will raise a breakpoint exception so isn't useable in either
cases. Option 2 could be used in some function entry cases, but needs
to be paired with an authentication instruction, and is normally only
used in non-leaf functions we can't use it in this case. This leaves
option 3.
There are four variants of the instruction, the C variant is used on
function entry and the J variant is for jumping within a function.
There is also a JC that works with both and one with no target that
works with neither.
Reviewed by: markj
Sponsored by: Arm Ltd
Sponsored by: The FreeBSD Foundation (earlier version)
Differential Revision: https://reviews.freebsd.org/D42078
We now have an improved version (via netlink). The old-style ioctl will
be removed in FreeBSD 16.
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D42101
Use it wherever COMPAT_FREEBSD13 is currently specified.
Reviewed by: brooks, zlei
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D42100
Allow userspace to retrieve a list of distinct creator ids for the
current states.
This is used by pfSense, and used to require dumping all states to
userspace. It's rather inefficient to export a (potentially extremely
large) state table to obtain a handful (typically 2) of 32-bit integers.
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D42092
Allow consumers to start processing states as the kernel supplies them,
rather than having to build a full list and only then start processing.
Especially for very large state tables this can significantly reduce
memory use.
Without this change when retrieving 1M states time -l reports:
real 3.55
user 1.95
sys 1.05
318832 maximum resident set size
194 average shared memory size
15 average unshared data size
127 average unshared stack size
79041 page reclaims
0 page faults
0 swaps
0 block input operations
0 block output operations
15096 messages sent
250001 messages received
0 signals received
22 voluntary context switches
34 involuntary context switches
With it it reported:
real 3.32
user 1.88
sys 0.86
3220 maximum resident set size
195 average shared memory size
11 average unshared data size
128 average unshared stack size
260 page reclaims
0 page faults
0 swaps
0 block input operations
0 block output operations
15096 messages sent
250001 messages received
0 signals received
21 voluntary context switches
31 involuntary context switches
Reviewed by: mjg
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D42091
Use netlink to export pf's state table.
The primary motivation is to improve how we deal with very large state
stables. With the previous implementation we had to build the entire
list (both in the kernel and in userspace) before we could start
processing. With netlink we start to get data in userspace while the
kernel is still generating more. This reduces peak memory consumption
(which can get to the GB range once we hit millions of states).
Netlink also makes future extension easier, in that we can easily add
fields to the state export without breaking userspace. In that regard
it's similar to an nvlist-based approach, except that it also deals
with transport to userspace and that it performs significantly better
than nvlists. Testing has failed to measure a performance difference
between the previous struct-copy based ioctl and the netlink approach.
Differential Revision: https://reviews.freebsd.org/D38888
Rework the packages TUI, do that the index caching is now done with
dialog --gauge (tested with cdialog and bsddialog).
With pkg we can know in avance the number of packages making it
possible to have a real gauge.
The cache of the index is now a file that can be sourced, meaning it
is not anymore an index like file, but a post process one, simplifying
the code.
Each menu is now built calling directly pkg rquery with just the
informations required to build the menu instead of parsing an indexfile
install all the awk index processing into a separate file to ease
reading and debuggung
svnliteversion was provided by the base system copy of subversion,
which was disabled in a2bc17474b ("Disable building svnlite(1) by
default.")
Reviewed by: zlei
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D42034
Previously when printing the sysctl description (via the -d flag) we
omitted the newline if the node provided no description (i.e., NULL).
This could be observed via e.g. `sysctl -d dev`.
PR: 44034
Reviewed by: zlei
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D42112
Commit 35305a8dc1 (r211393) added a check on whether 'uid' was equal
to getuid() before calling setlogincontext(). Doing so still allows
a setuid program to apply resource limits and priorities specified in
a user-controlled configuration file ('~/.login_conf') where
a non-setuid program could not. Plug the hole by checking instead that
the process' effective UID is the target one (which is likely what was
meant in the initial commit).
PR: 271750
Reviewed by: kib, des
MFC after: 2 weeks
Sponsored by: Kumacom SAS
Differential Revision: https://reviews.freebsd.org/D40351
When early AP startup was introduced in 2016 it was put behind a kernel
option EARLY_AP_STARTUP as a transition aid, so that it could be turned
off if necessary. For x86 the non-EARLY_AP_STARTUP case is no longer
functional, so disallow it.
Other archs are still incompatible with EARLY_AP_STARTUP, so the option
cannot yet be removed entirely.
Reported by: wollman
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D41351
Various reports recently hit the "Invalid TXQ id" in iwlwifi again.
Unconditionally enable logging and add a note to report to a specific
PR in the log message for now.
Along with 018d93ece1 this will hopefully help us to understand what
is going on.
Sponsored by: The FreeBSD Foundation
PR: 274382
Multiple reports have shown missed state transitions in net80211 without
major cause obvious (or with a txq warning in iwlwifi).
In order to better track down potential problems add unconditional
ic_printf calls to any case in the lkpi state machine compat code which
would let us return with an error in the hope that it helps us to catch
the actual problems.
Also remove the debug conditions from ieee80211_{beacon,connection}_loss
which can also cause state transitions to have the ic_printf all the time
there too.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Since Cirrus Labs is limiting their free usage tier [1], limit CI runs
on pull requests only. Otherwise, we might deplete our monthly quota
within a few days.
Adapt the task amd64-llvm16 to execute on downstream repos or on pull
requests only.
Other alternatives will be further studied.
[1]: https://cirrus-ci.org/blog/2023/07/17/limiting-free-usage-of-cirrus-ci/