Commit graph

149946 commits

Author SHA1 Message Date
John Baldwin 8488fc417f nvme: Use the NVMEM macro instead of expanded versions
A few of these omitted a shift of 0, but this is more consistent.

Reviewed by:	chuck
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D43602
2024-01-29 10:59:37 -08:00
John Baldwin 1dade1f255 nvme: Rename NVMEB helper macro to NVMEM
The current macro always builds a full mask for a named field, so use
the M suffix for mask.

Reviewed by:	chuck, imp
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D43601
2024-01-29 10:58:28 -08:00
John Baldwin 2cb78e7150 nda: Use the NVMEV macro instead of expanded versions
Reviewed by:	chuck
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D43596
2024-01-29 10:33:39 -08:00
John Baldwin 479680f235 nvme: Use the NVMEV macro instead of expanded versions
Reviewed by:	chuck
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D43595
2024-01-29 10:30:54 -08:00
Michael Tuexen f30c7d5654 TCP LRO: convert TCP header fields to host byte order earlier
This is a preparation for adding dtrace hooks in a follow-up commit,
which are missing in the code path, where packets are directly queued
to the tcpcb. The dtrace hooks expect the fields to be in host byte
order. This only applies when TCP HPTS is used.
No functional change intended.

Reviewed by:		rscheff
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D43594
2024-01-29 18:52:17 +01:00
Ed Maste 7e77089dcc linuxkpi: remove invalid KASSERT from hash_add_rcu
hash_add_rcu asserted that the node's prev pointer was NULL in an
attempt to detect addition of a node already on a list, but the caller
is not required to provide a zeroed node.

Reported in https://github.com/freebsd/drm-kmod/issues/282

Reviewed by:	bz, manu
Sponsored by:	The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D43645
2024-01-29 11:40:17 -05:00
Kristof Provost 31828075e4 pf: bind route-to states to their route-to interface
When we route-to the state should be bound to the route-to interface,
not the default route interface. However, we should only do so for
outbound traffic, because inbound traffic should bind on the arriving
interface, not the one we eventually transmit on.

Explicitly check for this in BOUND_IFACE().

We must also extend pf_find_state(), because subsequent packets within
the established state will attempt to match the original interface, not
the route-to interface.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D43589
2024-01-29 14:10:26 +01:00
Kristof Provost ffeab76b68 pfil: PFIL_PASS never frees the mbuf
pfil hooks (i.e. firewalls) may pass, modify or free the mbuf passed
to them. (E.g. when rejecting a packet, or when gathering up packets
for reassembly).

If the hook returns PFIL_PASS the mbuf must still be present. Assert
this in pfil_mem_common() and ensure that ipfilter follows this
convention. pf and ipfw already did.
Similarly, if the hook returns PFIL_DROPPED or PFIL_CONSUMED the mbuf
must have been freed (or now be owned by the firewall for further
processing, like packet scheduling or reassembly).

This allows us to remove a few extraneous NULL checks.

Suggested by:	tuexen
Reviewed by:	tuexen, zlei
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D43617
2024-01-29 14:10:19 +01:00
Richard Scheffenegger 0b3f9e435f tcp: move cc_post_recovery past snd_una update
The RFC6675 pipe calculation (sack.revised, enabled
by default since D28702), uses outdated information,
while the previous default calculated it correctly
with up-to-date information from the incoming ACK.

This difference can become as large as the receive
window (not the congestion window previously),
potentially triggering a massive burst of new packets.

MFC after:             1 week
Reviewed By:           tuexen, #transport
Sponsored by:          NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D43520
2024-01-28 00:18:51 +01:00
Warner Losh 3be59adbb5 vtnet: Adjust for ethernet alignment.
If the header that we add to the packet's size is 0 % 4 and we're
strictly aligning, then we need to adjust where we store the header so
the packet that follows will have it's struct ip header properly
aligned.  We do this on allocation (and when we check the length of the
mbufs in the lro_nomrg case). We can't just adjust the clustersz in the
softc, because it's also used to allocate the mbufs and it needs to be
the proper size for that. Since we otherwise use the size of the mbuf
(or sometimes the smaller size of the received packet) to compute how
much we can buffer, this ensures no overflows. The 2 byte adjustment
also does not affect how many packets we can receive in the lro_nomrg
case.

PR:			271288
Sponsored by:		Netflix
Reviewed by:		bryanv
Differential Revision:	https://reviews.freebsd.org/D43224
2024-01-28 22:08:55 -07:00
Warner Losh 34f0a01b6b mpi3mr: Fix confusion over | and &
Use sc->mpi3mr_debug & MPI3MR_IOT over the | version to test if a bit is
set.

CID: 1529718
Sponsored by:		Netflix
2024-01-28 21:43:56 -07:00
Florian Walpen fb87726303 snd_hdspe(4): Per device sysctl for period.
Let the user choose a period (interrupt cadence in samples), in the
official RME drivers this setting is available as "Buffer Size".
Override the period propagated through blocksize by pcm channel latency
settings (see sound(4)), since these are unreliable and differ between
playback and recording channels.

Differential Revision: https://reviews.freebsd.org/D43527
2024-01-28 20:20:26 +00:00
Gordon Bergling 551921a757 if_eqos: Fix a typo in a kernel error message
- s/errer/error/

MFC after:	5 days
2024-01-28 16:42:12 +01:00
Andriy Gapon 4d1161f094 subr_bus: report DEVICE_SUSPEND failures
This greatly aids with diagnosing system suspend failures when
they are due to a device driver or hardware.
2024-01-28 15:21:09 +02:00
Andriy Gapon c053a56c0f hdaa_pcmchannel_setup: do not advertise AC3 8+0
The rest of the sound system supports 7+1 maximum and is not aware of 8+0.

I believe that these messages are caused by 8+0:
kernel: feeder_init(0xfffff801f935d680) on feeder_matrix returned 22
kernel: pcm0: feeder_build_matrix(): can't add feeder_matrix
2024-01-28 15:18:55 +02:00
Andriy Gapon e92491d95f dtrace: make 'ring' and 'fill' policies imply 'noswitch' flag
This should disable allocation of the second per-CPU principal buffer
which is never used.  This will also enable additional asserts
for buffers that are never switched.
2024-01-28 15:15:17 +02:00
Andriy Gapon 9cdf326b4f run acpi_shutdown_final later to give other handlers a chance
For example, shutdown_panic wants to produce some output and maybe take
some input before a system is actually reset.

The change should only make difference for the case of system reset
(reboot), poweroff and halt should not be affected.

The change makes difference only if hw.acpi.handle_reboot is set.  It
used to default to zero until r213755 / ac731af567.
2024-01-28 15:04:55 +02:00
Andriy Gapon cbf7c81b60 ichsmb: fix block read operation
First of all and unlike I2C, it's not the master that dictates how many
bytes to read in block read operation.  It's the device that informs the
master how many bytes it's sending back.

Thus, for ichsmb_bread() the count parameter is purely an output
parameter.  The code has been changed to reflect that.
The sanity checking of the response length is now done once it (the
first byte of the response) is received.

While here, handling of ICH_HST_STA_FAILED status bit has been added.
Plus some code style improvements and some new code comments in the
vicinity of the changed code.
2024-01-28 14:54:19 +02:00
Andriy Gapon 8fdb261601 change ipmi watchdog to awlays stop when system is halted
That is, wd_shutdown_countdown value is ignored when halting.

A halted system should remain halted for as long as needed until
a power cycle, so the watchdog should not reset the system.
2024-01-28 14:45:16 +02:00
Andriy Gapon 90dc788982 fix signature of ipmi_shutdown_event
The function had a signature of watchdog_fn while in fact it is used as
shutdown_fn.
2024-01-28 14:44:13 +02:00
Andriy Gapon 43ca38eb59 mmc_fdt_parse: remove redundant bootverbose check 2024-01-28 13:39:03 +02:00
Andriy Gapon ea7f45d3d8 dwmmc: fix a typo 2024-01-28 13:37:57 +02:00
Andriy Gapon 89cb925fdd rk_i2s: remove unused definition 2024-01-28 13:37:25 +02:00
Andriy Gapon 406e959d07 rk_i2s: change interrupt type from MISC to AV (audio/video)
This gives a higher priority to the interrupt handling thread.
We need it because its work (filling the hardware FIFO) is time sensitive.
2024-01-28 13:36:19 +02:00
Andriy Gapon 408a9efd75 rk3328_codec: remove diagostic printfs
Most likeyly there were intended as reminders for unimplemented functions,
but they do not seem to be useful.

Also, a small style nit.
2024-01-28 13:33:16 +02:00
Andriy Gapon b587cb69f3 audio_soc: set "status" as being at simplebus
This is more true and less confusing than previous "at EXPERIMENT".
2024-01-28 13:32:38 +02:00
Andriy Gapon 320e4beb97 gpiopower: trigger low, high and both edges
Power off or reset may be activated either by low or high signal or by an
edge.  So, try everything.

Also, the driver now supports DTS properties for timings.

Finally, the driver does not change the pin configuration during attach.
It is assumed that the pin is already in a state that does not trigger
the power event (otherwise we wouldn't be running).
2024-01-28 13:29:41 +02:00
Andriy Gapon 197944948e add allwinner overlays for enabling additional USB ports
For instance, on NanoPi NEO two additional ports are available via a
GPIO header.
2024-01-28 13:12:39 +02:00
Andriy Gapon 5b54b6ac8c usbdevs: add Ralink RT7601 aka MT7601
This is a popular USB WiFi chip.
Unfortunately, it's not supported by FreeBSD yet.
2024-01-28 13:10:51 +02:00
Andriy Gapon 34694f3da7 ds1307: restore hints-based configuration on FDT systems
Fall-through to non-FDT probe code if no matching device node is found.
While here, fix indentation of the switch statement.
Also, make the device description for the hints-based attachment the
same as for FDT attachment.

Fixes:	2486b446db ds1307: add support for the EPSON RX-8035SA I2C RTC
2024-01-28 12:45:57 +02:00
Alexander Motin 3883c6fbf2 ntb_hw_plx: Workaround read-only scratchpad registers
On several systems we've noticed that when NTB link goes down, the
Physical Layer User Test Pattern registers we use as additional
scratchpad registers (that is explicitly allowed by the chip specs)
become read-only for about 100us.  I see no explanation for this in
the chip specs, neither why it was not seen before, may be a race.
Since we do need these registers, workaround it by repeating writes
until we succeed or 1ms timeout expire.

MFC after:	1 week
2024-01-27 17:29:01 -05:00
Mark Johnston bbf86c65d0 netinet: Remove stale references to Giant from comments
MFC after:	1 week
2024-01-27 13:51:13 -05:00
Olivier Certner 6b35310173
SCHEDULER_STOPPED(): Rely on a global variable
A commit from 2012 (5d7380f8e3, r228424) introduced
'td_stopsched', on the ground that a global variable would cause all
CPUs to have a copy of it in their cache, and consequently of all other
variables sharing the same cache line.

This is really a problem only if that cache line sees relatively
frequent modifications.  This was unlikely to be the case back then
because nearby variables are almost never modified as well.  In any
case, today we have a new tool at our disposal to ensure that this
variable goes into a read-mostly section containing frequently-accessed
variables ('__read_frequently').  Most of the cache lines covering this
section are likely to always be in every CPU cache.  This makes the
second reason stated in the commit message (ensuring the field is in the
same cache line as some lock-related fields, since these are accessed in
close proximity) moot, as well as the second order effect of requiring
an additional line to be present in the cache (the one containing the
new 'scheduler_stopped' boolean, see below).

From a pure logical point of view, whether the scheduler is stopped is
a global state and is certainly not a per-thread quality.

Consequently, remove 'td_stopsched', which immediately frees a byte in
'struct thread'.  Currently, the latter's size (and layout) stays
unchanged, but some of the later re-orderings will probably benefit from
this removal.  Available bytes at the original position for
'td_stopsched' have been made explicit with the addition of the
'_td_pad0' member.

Store the global state in the new 'scheduler_stopped' boolean, which is
annotated with '__read_frequently'.

Replace uses of SCHEDULER_STOPPED_TD() with SCHEDULER_STOPPER() and
remove the former as it is now unnecessary.

Reviewed by:            markj, kib
Approved by:            markj (mentor)
MFC after:              2 weeks
Sponsored by:           The FreeBSD Foundation
Differential Revision:  https://reviews.freebsd.org/D43572
2024-01-26 22:09:38 +01:00
Olivier Certner cd0c52e50b
SCHEDULER_STOPPED(): Move it (back) to 'systm.h'
It's not an assertion, so doesn't logically belong to 'kassert.h'.
Moreover, a subsequent commit will make it rely on a variable whose
declaration also belongs to 'systm.h'.

Approved by:            markj (mentor)
MFC after:              2 weeks
Sponsored by:           The FreeBSD Foundation
Differential Revision:  https://reviews.freebsd.org/D43571
2024-01-26 22:09:16 +01:00
Olivier Certner 12d6a032df
Annotate 'rebooting' with __read_mostly
While here, put such annotation after the variable for 'dumping', since
it concerns the variable and not the type.

Reviewed by:            markj
Approved by:            markj (mentor)
MFC after:              2 weeks
Sponsored by:           The FreeBSD Foundation
Differential Revision:  https://reviews.freebsd.org/D43570
2024-01-26 22:09:10 +01:00
Olivier Certner eaed922eda
panic()/KERNEL_PANICKED(): Move back to using 'panicstr' as a flag
Currently, no performance-critical path tests for a panic.  Moreover, we
today have KERNEL_PANICKED() which wraps the test into
__predict_false(), already catering to those (potential) use cases.
Also, in practice we don't support 64-bit architectures without caches,
so reading an 'int' instead of a pointer doesn't (directly) save any
memory access.  Finally, 'panicked' is redundant with 'panicstr' (and
wastes a tiny amount of memory).

Consequently:
1. Use again 'panicstr' as a flag indicating that the system is
panicking.  To this end:
  - Modify panic() so that it ensures this pointer is set to some
    non-NULL value even if the caller didn't pass any panic string.
  - Modify KERNEL_PANICKED() to test for 'panicstr'.
  - Remove 'panicked'.
2. Annotate 'panicstr' with '__read_mostly' (instead of using
'__read_frequently' as for 'panicked').  This may have to be changed if,
in the future, some performance-intensive path needs to test it.
3. Convert a few more direct tests of 'panicstr' to using
KERNEL_PANICKED().

Reviewed by:            kib, markj, emaste
Approved by:            markj (mentor)
MFC after:              2 weeks
Sponsored by:           The FreeBSD Foundation
Differential Revision:  https://reviews.freebsd.org/D43569
2024-01-26 22:07:56 +01:00
Jamie Gritton ab0841bdbe jail: expose children.max and children.cur via sysctl
Submitted by:	Igor Ostapenko <igor.ostapenko_pm.me>
Differential Revision:	<https://reviews.freebsd.org/D43565>
2024-01-26 09:45:40 -08:00
Mark Johnston 90372a9e3c arm64: Remove pmap_san_bootstrap() and call kasan_init_early() directly
pmap_san_bootstrap() doesn't really do much, and it was hard-coding the
the bootstrap stack size defined in locore.S.  Moreover, the name is a
bit confusing given the existence of pmap_bootstrap_san().  Just remove
it and call kasan_init_early() directly like we do on amd64.  It will
not be used by KMSAN in a forthcoming patch series.

No functional change intended.

MFC after:	1 week
Sponsored by:	Klara, Inc.
Sponsored by:	Juniper Networks, Inc.
Differential Revision:	https://reviews.freebsd.org/D43403
2024-01-26 10:42:34 -05:00
Richard Scheffenegger 2d05a1c81b tcp: commonize check for more data to send, style changes
Use SEQ_SUB instead of a plain subtraction, for an implict
type conversion and prevention of a possible overflow.
Use curly brackets in stacked if statements throughout.
Use of the ? operator to enhance readability when clearing
the FIN flag in tcp_output().

None of the above change the function.

Reviewed By:           tuexen, cc, #transport
Sponsored by:          NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D43539
2024-01-26 01:20:35 +01:00
Jessica Clarke 3896a6cc0a ldscript.powerpc*: Only put .dynamic in PT_DYNAMIC
Currently there are a few output sections left as implicitly using
:kernel :dynamic before :kernel on its own is used again, which means
they end up in both the PT_LOAD and the PT_DYNAMIC segments, an unusual
situation which the new libelf-based kldxref initially treated as
invalid. Thus, hoist the :kernel to the very next section to ensure only
.dynamic is in PT_DYNAMIC, as is more normal.

Whilst here, sync ldscript.powerpc64le with ldscript.powerpc64 to pick
up various fixes that were presumably made between the start of the
powerpc64le port and it being committed and got missed.

Reviewed by:	jhibbits, jhb
Differential Revision:	https://reviews.freebsd.org/D43066
2024-01-26 00:19:02 +00:00
Richard Scheffenegger fc262fd3dc tcp: AccECN access ACE field by shifting bits
Shifting bits is quicker than checking header flag bits
one by one. Also improve readability by the use of switch
statements.

No change in behaviour.

Reviewed By:           glebius, tuexen, #transport
Sponsored by:          NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D43560
2024-01-26 00:16:22 +01:00
Richard Scheffenegger 0932fb565a tcp: fix TCPSTAT accounting for SACK
Account for SACK retransmitted bytes once the actual length
is known. This prevents a call to tcp_maxseg() and prepares
for TSO support when transmitting from the SACK scoreboard.

Reviewed By:           tuexen, #transport
Sponsored by:          NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D43447
2024-01-25 22:58:33 +01:00
Mark Johnston 48d5dab7ba arm64: Add a VM_FREELIST_DMA32 freelist
When booting a KMSAN kernel on an Ampere Altra, I've seen some boot time
hangs when the XHCI controller driver attempts to allocate memory for
32-bit DMA.  The system boots fine with a GENERIC kernel; I believe that
the additional memory requirements of KMSAN push it over the edge.  The
system has a bit less than 2GB of RAM below the 4GB boundary.

Allocate a new freelist to segregate memory below 4GB, as we do on
amd64, so that such memory allocation failures are less likely to occur.

Reviewed by:	alc
MFC after:	1 month
Sponsored by:	Klara, Inc.
Sponsored by:	Juniper Networks, Inc.
Differential Revision:	https://reviews.freebsd.org/D43503
2024-01-25 16:33:46 -05:00
Kristof Provost e95025ed93 pflow: show socket status in verbose mode
Introduce a verbose output mode to pflowctl, and expose the status of
the socket to userspace. This can be helpful in debugging configuration
errors.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
2024-01-25 17:37:51 +01:00
Jessica Clarke 6ec8bf9f3d riscv: Convert local interrupt controller to a newbus PIC
Currently the local interrupt controller implementation is based on
pre-INTRNG arm/arm64 code, using hand-rolled event code rather than
INTRNG. This then interacts weirdly with the PLIC, and other future
interrupt controllers like the APLIC and IMSICs in the upcoming AIA
specification, since they become the root PIC despite not being the
logical root. Instead, use a real newbus device for it and register
it as the root PIC.

This also adapts the IPI code to make use of the newly-added INTRNG
generic IPI handling framework, adding a new sbi_ipi as the PIC. In
future there will be alternative devices for sending IPIs that will
register with higher priorities, such as the proposed AIA IMSIC and
ACLINT SSWI.

Reviewed by:	mhorne
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D35901
2024-01-24 23:49:54 +00:00
Jessica Clarke c55272fdf8 riscv: Create a newbus device for the SBI driver
This approach is based on the Arm PSCI driver, though that makes more
extensive use of its softc than we do here. This will be used to extract
the SBI IPI code as a real PIC.

Reviewed by:	mhorne, imp
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D35900
2024-01-24 23:49:54 +00:00
Jessica Clarke 103d39efe0 intrng: Allow alternative IPI PICs to be registered and used
On RISC-V, the root PIC (whether the PLIC or, as will be the case in
future, the local interrupt controller) cannot send IPIs, relying on
another means to trigger the necessary software interrupts (firmware
calls), but there are upcoming standard devices that will be able to
inject them, so we can't just put the firmware calls in the root PIC
driver.

Thus, split out a new intr_ipi_dev from intr_irq_root_dev to use for
sending IPIs. New devices can be registered with a given priority up
until the first IPI is set up, when the best device seen so far gets
frozen as the IPI device to use.

Reviewed by:	mhorne
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D35899
2024-01-24 23:49:54 +00:00
Jessica Clarke fae8755f16 intrng: Extract arm/arm64 IPI->PIC glue code
The arm and arm64 implementations of dispatching IPIs via PIC_IPI_SEND
are almost identical, and entirely MI with the lone exception of a
single store barrier on arm64 (that is likely either redundant or needed
on arm too). Thus, de-duplicate this code by moving it to INTRNG as a
generic IPI glue framework. The ipi_* functions remain declared in MD
smp.h headers and implemented in MD code, but are trivial wrappers
around intr_ipi_send that could be made MI, at least for INTRNG ports,
at a later date.

Note that, whilst both arm and arm64 had an ii_send member in intr_ipi
to abstract over how to send interrupts,, they were always ultimately
using PIC_IPI_SEND, and so this complexity has been removed. A follow-up
commit will re-introduce the same flexibility by instead allowing a
device other than the root PIC to be registered as the IPI sender.

As part of this, strengthen a MAXCPU assertion that was missed in commit
2f0b059eea ("intrng: switch from MAXCPU to mp_ncpus") (which itself is
mis-titled).

Reviewed by:	mmel, mhorne
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D35898
2024-01-24 23:49:53 +00:00
Jessica Clarke e06afdb285 intrng: Remove irq_root_ipicount and corresponding intr_pic_claim_root arg
The static irq_root_ipicount variable is only ever written to (with the
value passed to irq_root_ipicount), never read. Moreover, the bcm2836
driver, as used by the Raspberry Pi 2B and 3A/B (but not 4, which uses a
GIC-400, though does have the legacy interrupt controller present too)
passes 0 as ipicount, despite implementing IPIs. It's thus inaccurate
and serves no purpose, so should be removed.

Reviewed by:	mmel, imp, mhorne
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D35897
2024-01-24 23:49:53 +00:00
Kyle Evans 5738d741fb kern: tty: fix recanonicalization
`ti->ti_begin` is actually the offset within the first block that is
unread, so we must use that for our lower bound.

Moving to the previous block has to be done at the end of the loop in
order to correctly handle the case of ti_begin == TTYINQ_DATASIZE.  At
that point, lastblock is still the last one with data written and the
next write into the queue would advance lastblock.  If we move to the
previous block at the beginning, then we're essentially off by one block
for the entire scan and run the risk of running off the end of the block
queue.

The ti_begin == 0 case is still handled correctly, as we skip the loop
entirely and the linestart gets recorded as the first byte available for
writing.  The bit after the loop about moving to the next block is also
still correct, even with both previous fixes in mind: we skipped moving
to the previous block if we hit ti_begin, and `off + 1` would in-fact be
a member of the next block from where we're reading if it falls on a
block boundary.

Reported by:	dim
Fixes:	522083ffbd ("kern: tty: recanonicalize the buffer on [...]")
2024-01-24 13:48:31 -06:00