Commit graph

1479 commits

Author SHA1 Message Date
Mark Johnston ca4c785900 nexus: Consistently return a pointer in failure paths
No functional change intended.

MFC after:	1 week
2023-05-26 15:38:08 -04:00
Mark Johnston 9fb6718d1b smp: Dynamically allocate the stoppcbs array
This avoids bloating the kernel image when MAXCPU is large.

A follow-up patch for kgdb and other kernel debuggers is needed since
the stoppcbs symbol is now a pointer.  Bump __FreeBSD_version so that
debuggers can use osreldate to figure out how to handle stoppcbs.

PR:		269572
MFC after:	never
Reviewed by:	mjg, emaste
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D39806
2023-05-25 18:09:55 -04:00
Warner Losh b61a573019 spdx: The BSD-2-Clause-NetBSD identifier is obsolete, drop -NetBSD
The SPDX folks have obsoleted the BSD-2-Clause-NetBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with:		pfg
MFC After:		3 days
Sponsored by:		Netflix
2023-05-12 10:44:04 -06:00
Warner Losh 4d846d260e spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD
The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with:		pfg
MFC After:		3 days
Sponsored by:		Netflix
2023-05-12 10:44:03 -06:00
Igor Ostapenko a1f2b76761 sys/x86/x86/mptable.c: typo (compatiblity)
https://bugs.freebsd.org/269753

PR:                      269753
Reported by:             Igor Ostapenko
Approved by:             doc, src (delphij, imp, zlei)
Differential revision:   https://reviews.freebsd.org/D38741
2023-05-05 01:23:09 +01:00
Jason A. Harmening 6f378116e9 Intel DMAR: remove parsing of 6-level paging capability
Early versions of the VT-d spec mentioned 6-level paging support as a
possible value for the SAGAW capability, but later versions removed it
and SAGAW=0x10 is currently listed as a reserved value.

The 6-level (agaw=64) entry in sagaw_bits is furthermore problematic
with clang15 because the attempted comparison against 1ULL << 64 in
dmar_maxaddr2mgaw() causes the compiler to elide the last iteration
of the initial loop, which bypasses the subsequent logic to find the
greatest HW-supported address width.  This results in 5-level paging
always being selected regardless of whether the hardware supports it,
which can result address translation failure due to invalid context-
entry programming.

Reviewed by:	kib
MFC after:	3 days
Differential Revision: https://reviews.freebsd.org/D39896
2023-05-02 09:06:11 -05:00
Mark Johnston 6eca9db1e7 busdma: Update KMSAN shadow maps later in bounce_bus_dmamap_sync()
Otherwise POSTREAD syncs may re-invalidate the shadow of the data buffer
when copying from bounce pages, resulting in false-positive KMSAN
reports.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2023-04-28 10:59:01 -04:00
Dimitry Andric bab8274c09 Use bool for one-bit wide bit-fields
A signed one-bit wide bit-field can take only the values 0 and -1. Clang
16 introduced a warning that "implicit truncation from 'int' to a
one-bit wide bit-field changes value from 1 to -1". Fix the warnings by
using C99 bool.

Reported by:	Clang 16
Reviewed by:	emaste, jhb
MFC after:	3 days
Differential Revision: https://reviews.freebsd.org/D39705
2023-04-25 19:26:03 +02:00
Dmitry Chagin de4da6cd04 x86: Move i386 timerreg.h to x86
Reviewed by:		emaste, jhb
Differential Revision:	https://reviews.freebsd.org/D39656
MFC after:		1 month
2023-04-20 19:42:59 +03:00
Dmitry Chagin d1f4c44aa8 x86: Move i386 ppireg.h to x86
Differential Revision:	https://reviews.freebsd.org/D39655
MFC after:		1 month
2023-04-20 19:42:59 +03:00
Elliott Mitchell 6d765bff6f xen: move common variables off of sys/x86/xen/hvm.c
The xen_domain_type and HYPERVISOR_shared_info variables are shared by
all Xen architectures, so they should be in common rather than
reimplemented by each architecture.

hvm_start_flags is used by xen_initial_domain() and so needs to be in
common.

Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D28982
2023-04-14 15:59:11 +02:00
Julien Grall 5e2183dab8 xen/intr: move sys/x86/xen/xen_intr.c to sys/dev/xen/bus/
The event channel source code or equivalent is needed on all
architectures.  Since much of this is viable to share, get this moved out
of x86-land.  Each interrupt interface then needs a distinct back-end
implementation.

Reviewed by: royger
Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com>
Original implementation: Julien Grall <julien@xen.org>, 2014-01-13 17:41:04
Differential Revision: https://reviews.freebsd.org/D30236
2023-04-14 15:58:57 +02:00
Elliott Mitchell 6699c22c1c xen/intr: move interrupt allocation/release to architecture
Simply moving the interrupt allocation and release functions into files
which belong to the architecture.  Since x86 interrupt handling is quite
distinct from other architectures, this is a crucial necessary step.

Identifying the border between x86 and architecture-independent is
actually quite tricky.  Similarly, getting the prototypes for the
border right is also quite tricky.

Inspired by the work of Julien Grall <julien@xen.org>,
2015-10-20 09:14:56, but heavily adjusted.

Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D30936
2023-04-14 15:58:56 +02:00
Julien Grall 2d795ab1ea xen/intr: move x86 PIC interface to xen_arch_intr.c, introduce wrappers
The x86 PIC interface is very much x86-specific and not used by other
architectures.  Since most of xen_intr.c can be shared with other
architectures, the PIC interface needs to be broken off.

Introduce wrappers for calls into the architecture-dependent interrupt
layer.  All architectures need roughly the same functionality, but the
interface is slightly different between architectures.  Due to the
wrappers being so thin, all of them are implemented as inline in
arch-intr.h.

The original implementation was done by Julien Grall in 2015, but this
has required major updating.

Removal of PVHv1 meant substantial portions disappeared.  The original
implementation took care of moving interrupt allocation to
xen_arch_intr.c, but this has required massive rework and was broken
off.

In the original implementation the wrappers were normal functions.  Some
had empty stubs in xen_intr.c and were removed.

Reviewed by: royger
Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com>
Original implementation: Julien Grall <julien@xen.org>, 2015-10-20 09:14:56
Differential Revision: https://reviews.freebsd.org/D30909
2023-04-14 15:58:56 +02:00
Elliott Mitchell 373301019f xen/intr: remove type argument from xen_intr_alloc_isrc()
This value doesn't need to be set in xen_intr_alloc_isrc().  What is
needed is simply to ensure the allocated xenisrc won't appear as free,
even if xi_type is written non-atomically.  Since the type is no longer
used to indicate free or not, the calling function should take care of
all non-architecture initialization.

Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D31188
2023-04-14 15:58:55 +02:00
Elliott Mitchell d0a69069bb xen/x86: rework isrc allocation to use list instead of table scanning
Scanning the list of interrupts to find an unused entry is rather
inefficient.  Instead overlay a free list structure and use a list
instead.

This also has the useful effect of removing the last use of evtchn_type
values outside of xen_intr.c.

Reviewed by: royger
[royger]
 - Make avail_list static.
2023-04-14 15:58:54 +02:00
Julien Grall ab7ce14b1d xen/intr: introduce dev/xen/bus/intr-internal.h
Move the xenisrc structure which needs to be shared between the core Xen
interrupt code and architecture-dependent code into a separate header.  A
similar situation exists for the NR_EVENT_CHANNELS constant.

Turn xi_intsrc into a type definition named xi_arch to reflect the new
purpose of being an architectural variable for the interrupt source.

This was originally implemented by Julien Grall, but has been heavily
modified.  The core side was renamed "intr-internal.h" and is #include'd
by "arch-intr.h" instead of the other way around.  This allows the
architecture to add function definitions which use struct xenisrc.

The original version only moved xi_intsrc into xen_arch_isrc_t.  Moving
xi_vector was done by the submitter.

The submitter had also moved xi_activehi and xi_edgetrigger into
xen_arch_isrc_t.  Those disappeared with the removal of PVHv1 support.

Copyright note.  The current xenisrc structure was introduced at
76acc41fb7 by Justin T. Gibbs.  Traces remain, but the strength of
Copyright claims from before 2013 seem pretty weak.

Reviewed by: royger
Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com>, 2021-03-17 19:09:01
Original implementation: Julien Grall <julien@xen.org>, 2015-10-20 09:14:56
Differential Revision: https://reviews.freebsd.org/D30648
[royger]
 - Adjust some line lengths
 - Fix comment about NR_EVENT_CHANNELS after movement.
 - Use #include instead of symlinks.
2023-04-14 15:58:53 +02:00
Elliott Mitchell af610cabf1 xen/intr: adjust xen_intr_handle_upcall() to match driver filter
xen_intr_handle_upcall() has two interfaces.  It needs to be called by
the x86 assembly code invoked by the APIC.  Second, it needs to be called
as a driver_filter_t for the XenPCI code and for architectures besides
x86.

Unfortunately the driver_filter_t interface was implemented as a wrapper
around the x86-APIC interface.  Now create a simple wrapper for the
x86-APIC code, which calls an architecture-independent
xen_intr_handle_upcall().

When called via intr_event_handle(), driver_filter_t functions expect
preemption to be disabled.  This removes the need for
critical_enter()/critical_exit() when called this way.

The lapic_eoi() call is only needed on x86 in some cases when invoked
directly as an APIC vector handler.

Additionally driver_filter_t functions have no need to handle interrupt
counters.  The intrcnt_add() calling function was reworked to match the
current situation.  intrcnt_add() is now only called via one path.

The increment/decrement of curthread->td_intr_nesting_level had
previously been left out.  Appears this was mostly harmless, but this
was noticed during implementation and has been added.

CONFIG_X86 is a leftover from use with Linux.  While the barrier isn't
needed for FreeBSD on x86, it will be needed for FreeBSD on other
architectures.

Copyright note.  xen_intr_intrcnt_add() was introduced at 76acc41fb7
by Justin T. Gibbs.  xen_intrcnt_init() was introduced at fd036deac1
by John Baldwin.

sys/x86/xen/xen_arch_intr.c was originally created by Julien Grall in
2015 for the purpose of holding the x86 interrupt interface.  Later it
was found xen_intr_handle_upcall() was better earlier, and the x86
interrupt interface better later.  As such the filename and header list
belong to Julien Grall, but what those were created for is later.

Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D30006
2023-04-14 15:58:52 +02:00
Elliott Mitchell 2794893ebf xen/intr: do full xenisrc initialization during binding
Keeping released xenisrcs in a known state simplifies allocation, but
forces the allocation function to maintain that state.  This turns into
a problem when trying to allow for interchangeable allocation functions.
Fix this issue by ensuring xenisrcs are always *fully* initialized
during binding.

Reviewed by: royger
2023-04-14 15:58:51 +02:00
Elliott Mitchell ff73b1d69b xen/intr: split xen_intr_isrc_lock uses
There are actually several distinct locking domains in xen_intr.c, but
all were sharing the same lock.  Both xen_intr_port_to_isrc[] and the
x86 interrupt structures needed protection.  Split these two apart as a
precursor to splitting the architecture portions off the file.

Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D30726
2023-04-14 15:58:51 +02:00
Elliott Mitchell 834013dea2 xen/intr: rework xen_intr_alloc_isrc() locking
Locking for allocation was being done in xen_intr_bind_isrc(), but the
unlock was inside xen_intr_alloc_isrc().  While the lock acquisition at
the end of xen_intr_alloc_isrc() was to modify xen_intr_port_to_isrc[],
NOT allocation.  Fix this garbled (though working) locking scheme.

Now locking for allocation is strictly in xen_intr_alloc_isrc(), while
locking to modify xen_intr_port_to_isrc[] is in xen_intr_bind_isrc().

Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D30726
2023-04-14 15:58:50 +02:00
Elliott Mitchell 09bd542d17 xen/intr: rework xen_intr_alloc_isrc() call structure
The call structure around xen_intr_alloc_isrc() was rather awful.
Notably finding a structure for reuse is part of allocation, but this
was done outside xen_intr_alloc_isrc().  Move this into
xen_intr_alloc_isrc() so the function handles all allocation steps.

Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D30726
2023-04-14 15:58:49 +02:00
Elliott Mitchell 149c581018 xen/intr: adjust xenisrc types, adjust format strings to match
As "CPUs", IRQs (vector) and virtual IRQs are always positive integers,
adjust the Xen code to use unsigned integers.  Several format strings
need adjustment to match.  Additionally single-bit bitfields are
boolean.

No functional change expected.

Reviewed by: royger
2023-04-14 15:58:49 +02:00
Julien Grall 28a78d860e xen: introduce XEN_CPUID_TO_VCPUID()/XEN_VCPUID()
Part of the series for allowing FreeBSD/ARM to run on Xen.  On ARM the
function is a trivial pass-through, other architectures need distinct
implementations.

While implementing XEN_VCPUID() as a call to XEN_CPUID_TO_VCPUID()
works, that involves multiple accesses to the PCPU region.  As such make
this a distinct macro.  Only callers in machine independent code have
been switched.

Add a wrapper for the x86 PIC interface to use matching the old
prototype.

Partially inspired by the work of Julien Grall <julien@xen.org>,
2015-08-01 09:45:06, but XEN_VCPUID() was redone by Elliott Mitchell on
2022-06-13 12:51:57.

Reviewed by: royger
Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com>
Original implementation: Julien Grall <julien@xen.org>, 2014-04-19 08:57:40
Original implementation: Julien Grall <julien@xen.org>, 2014-04-19 14:32:01
Differential Revision: https://reviews.freebsd.org/D29404
2023-04-14 15:58:46 +02:00
Elliott Mitchell 054073c283 xen/intr: xen_intr_bind_isrc() always set handle
Previously the upper layer handle was being set before the last
potential error condition.  The reasoning appears to have been it was
assumed invalid in case of an error being returned.  Now ensure it is
invalid until just before a successful return.

Fixes: 76acc41fb7 ("Implement vector callback for PVHVM and unify event channel implementations")
Fixes: 6d54cab1fe ("xen: allow to register event channels without handlers")
Reviewed by: royger
2023-04-14 15:58:45 +02:00
Elliott Mitchell b94341afcb xen/intr: rework xen_intr_resume() for in-place remapping
The prior implementation of xen_intr_resume() was wiping
xen_intr_port_to_isrc[] and then rebuilding from the x86 interrupt
table.  Rework to instead wipe the channel numbers (->xi_port) and then
scan the table for sources with invalid channels.

This will be slower due to scanning the whole table, but this removes
the dependency on the x86 interrupt code.

Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D30599
[royger]
Split line over 80 characters.
2023-03-29 09:51:45 +02:00
Elliott Mitchell e45c8ea31c xen/intr: merge parts of resume functionality into new function
The portions of xen_rebind_ipi() and xen_rebind_virq() were already
near-identical.  While xen_rebind_ipi() should panic() on
single-processor, still having the functionality to invoke seems
harmless.

Meanwhile much of the loop from xen_intr_resume() seemed to want to be
closer to this same code.  This pushes related bits closer together.

Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D30598
2023-03-29 09:51:44 +02:00
Julien Grall 910bd069f8 xen/intr: remove x86 APIC headers from xen_intr.c
Remove these no longer needed headers.  Key for making xen_intr.c
machine-independent as they don't exist on other architectures.

Originally this was part of a much larger commit, but was broken off
for submission to the FreeBSD project.

Reviewed by: royger
Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com>
Original implementation: Julien Grall <julien@xen.org>, 2015-10-20 09:14:56
MFC after: 1 week
2023-03-29 09:51:43 +02:00
Elliott Mitchell 40ad9aaa88 xen/intr: stop passing shared_info_t to xen_intr_active_ports()
There is only a single global HYPERVISOR_shared_info pointer, so
directly use the global pointer.

Reviewed by: royger
MFC after: 1 week
2023-03-29 09:51:43 +02:00
Elliott Mitchell 9f3be3a6ec xen: switch to using core atomics for synchronization
Now that the atomic macros are always genuinely atomic on x86, they can
be used for synchronization with Xen.  A single core VM isn't too
unusual, but actual single core hardware is uncommon.

Replace an open-coding of evtchn_clear_port() with the inline.

Substantially inspired by work done by Julien Grall <julien@xen.org>,
2014-01-13 17:40:58.

Reviewed by: royger
MFC after: 1 week
2023-03-29 09:51:42 +02:00
Elliott Mitchell 49ca3167b7 xen/intr: add check for intr_register_source() errors
While unusual, intr_register_source() can return failure.  A likely
cause might be another device grabbing from Xen's interrupt range.
This should NOT happen, but could happen due to a bug.  As such check
for this and fail if it occurs.

This theoretical situation also effects xen_intr_find_unused_isrc().
There, .is_pic must be tested to ensure such an intrusion doesn't cause
misbehavior.

Reviewed by: royger
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31995
2023-03-29 09:51:41 +02:00
Elliott Mitchell 1797ff9627 xen/intr: cleanup event channel number use
Consistently use ~0 instead of 0 when clearing xenisrc structures.
0 is a valid event channel number, even though it is reserved by Xen.
Whereas ~0 is guaranteed invalid.

Reviewed by: royger
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D30743
2023-03-29 09:51:41 +02:00
Elliott Mitchell 2b2415bafa xen/intr: fix corruption of event channel table
In xen_intr_release_isrc(), the isrc should only be removed if it is
assigned to a valid port.  This had been mitigated by using 0 for not
having a port, but this is actually corrupting the table.  Fix this bug
as modifying the code would cause this bug to manifest as kernel memory
corruption.  Similar issue for the vCPU bitmap masks.

The KASSERT() doesn't need lock protection.

Reviewed by: royger
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D30743
2023-03-29 09:51:40 +02:00
Elliott Mitchell 0ebf9bb42d xen/intr: fix overflow of Xen interrupt range
The comparison was wrong.  Hopefully this never occurred in the wild,
but now ensure the error message will occur before damage is caused.
This appears non-exploitable as exploitation would require a guest to
force Domain 0 to allocate all event channels, which a guest shouldn't
be able to do.

Adjust the error message to better describe what has occurred.

Reviewed by: royger
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D30743
2023-03-29 09:51:39 +02:00
Elliott Mitchell 2d5e325303 xen/intr: always set xi_close in xen_intr_bind_isrc()
Appears errors are uncommon since calling xen_intr_release_isrc() on a
xenisrc with xi_close in an undefined state could be bad.  Fix this
problematic lurking nasty.

Reviewed by: royger
MFC after: 1 week
2023-03-29 09:51:39 +02:00
Corvin Köhne e8988d60d2
pci: expose intel_graphics_stolen as sysctl
The Intel graphics stolen memory is used by the Intel GOP driver on
boot. When using bhyve with GPU passthrough, it's also used by the guest
GOP driver at guest boot. For that reason, bhyve needs to know the
address and size of this region to inform the guest about this region.
Exposing the variables as sysctl allows bhyve to easily read them.
2023-03-27 11:40:49 +02:00
Brooks Davis eb232cffc9 amd64: reduce header pollution in _stdint.h
In 38d1ac34ff SIGATOMIC_{MIN,MAX} were
defined in terms of LONG_{MIN,MAX}.  Later, they were switched to
__LONG_{MIN,MAX} in 78fe75bc28 where an
include of machine/_limits.h was added.  Switch to using fixed width
INT64_{MIN,MAX} and remove the header pollution.

No functional change.

Reviewed by:	theraven, emaste
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D39196
2023-03-22 16:23:57 +00:00
Jean-Sébastien Pédron d780c6a6ab
x86/pci_early_quirks: Support Intel 11th+ gen
Newer Intel CPUs/iGPUs use a new method to determine the base address of
the stolen memory. This code was ported from Linux.

Reviewed by:	manu
Approved by:	manu
Differential Revision:	https://reviews.freebsd.org/D39057
2023-03-20 21:47:36 +01:00
Mitchell Horne 99bd5c1fe3 x86: nexus code tidy-up
Make a pass at the various nexus implementations, fixing some very minor
style issues, obsolete comments, etc.

The method declaration section has become unwieldy in many respects.
Attempt to tame it by:
 - Using generated method typedefs
 - Grouping methods roughly by category, and then alphabetically.

Reviewed by:	jhb
Differential Revision:	https://reviews.freebsd.org/D38495
2023-03-20 17:35:47 -03:00
Roger Pau Monné a78e46a7db xen: take struct size into account for video information
The xenpf_dom0_console_t structure can grow as more data is added, and
hence we need to check that the fields we accesses have been filled by
Xen.  The only extra field FreeBSD currently uses is the top 32 bits
for the frame buffer physical address.

Note that this field is present in all the versions that make the
information available from the platform hypercall interface, so the
check here is mostly cosmetic, and to remember us that newly added
fields require checking the size of the returned data.

Fixes: 6f80738b22 ('xen: fetch dom0 video console information from Xen')
Sponsored by: Citrix Systems R&D
2023-03-14 09:59:08 +01:00
Roger Pau Monné 6f80738b22 xen: fetch dom0 video console information from Xen
It's possible for Xen to switch the video mode set by the boot loader,
so that the information passed in the kernel metadata is no longer
valid.  Fetch the video mode used by Xen using an hypercall and update
the medatada for the kernel to use the correct video mode.

Sponsored by: Citrix Systems R&D
2023-03-09 17:13:17 +01:00
Roger Pau Monné 5489d7e93a xen: bump used interface version
This is required for a further change that will make use of a field
that was added in version 0x00040d00.

No functional change expected.

Sponsored by: Citrix Systems R&D
2023-03-09 17:13:17 +01:00
John-Mark Gurney 2fee875629
abstract out the vm detection via smbios..
This makes the detection of VMs common between platforms that
have SMBios.

Reviewed by:		imp, kib
Differential Revision:	https://reviews.freebsd.org/D38800
2023-03-02 16:54:21 -08:00
Mina Galić 499171a98c apic: prevent divide by zero in CPU frequency init
If a CPU for some reason returns 0 as CPU frequency, we currently panic
on the resulting divide by zero when trying to initialize the CPU(s) via
APIC. When this happens, we'll fallback to measuring the frequency
instead.

PR: 269767
Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/664
2023-02-25 09:47:40 -07:00
Mateusz Guzik f4a9e9fc79 x86: whack kernel gcov vestige
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2023-02-23 16:42:38 +00:00
Dmitry Chagin 496a9a4f60 linux(4): Cleanup includes under x86/linux
Cleanup unneeded includes, sort the rest according to style(9).
No functional changes.

MFC after:		2 weeks
2023-02-14 17:46:33 +03:00
Dmitry Chagin 10d16789a3 linux(4): Get rid of the opt_compat.h include.
Since e013e369 COMPAT_LINUX, COMPAT_LINUX32 build options are removed,
so include of opt_compat.h is no more needed.

MFC after:		2 weeks
2023-02-12 20:24:32 +03:00
Val Packett 4a1c4de232 Allow sysctl hw.machine/hw.machine_arch in capability mode
There's no harm in reading strings like 'amd64'.

Reviewed by: emaste, manu
Sponsored by: https://www.patreon.com/valpackett
Differential Revision: https://reviews.freebsd.org/D28703
2023-02-06 14:00:52 -05:00
Mark Johnston 2bed14192c pvclock: Export a vDSO page even without rdtscp available
When the cycle counter is "stable", i.e., synchronized across vCPUs by
the hypervisor, userspace can use a serialized rdtsc instead of relying
on rdtscp, just like the kernel timecounter does.  This can be useful
for performance in guests where the hypervisor hides rdtscp for some
reason.

To avoid breaking compatibility with older userspace which expects
rdtscp to be usable when pvclock exports timekeeping info, hide this
feature behind a sysctl.

Reviewed by:	kib
Tested by:	Shrikanth R Kamath <kshrikanth@juniper.net>
MFC after:	2 weeks
Sponsored by:	Klara, Inc.
Sponsored by:	Juniper Networks, Inc.
Differential Revision:	https://reviews.freebsd.org/D38342
2023-02-03 11:48:25 -05:00
Dmitry Chagin a95cb95e12 linux(4): Preserve fpu fxsave state across signal delivery on amd64.
PR:			240768
Reviewed by:		kib
Differential Revision:	https://reviews.freebsd.org/D38302
MFC after:		1 week
2023-02-02 20:21:37 +03:00
Corvin Köhne 55f1ca209d
atrtc: expose power loss as sysctl
Exposing the a power loss of the rtc as an sysctl makes it easier to
detect an empty cmos battery.

Reviewed by:		manu
MFC after:		1 week
Sponsored by:		Beckhoff Automation GmbH & Co. KG
Differential Revision:	https://reviews.freebsd.org/D38325
2023-02-02 08:25:08 +01:00
Konstantin Belousov 2555f175b3 Move kstack_contains() and GET_STACK_USAGE() to MD machine/stack.h
Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D38320
2023-02-02 00:59:26 +02:00
Dmitry Chagin 5c32146723 amd64: Eliminate write only cpu_fxsr.
Reviewed by:		kib
Differential Revision:	https://reviews.freebsd.org/D38289
MFC after:		1 week
2023-02-01 18:17:06 +03:00
Dmitry Chagin 290afc5d55 mp_x86: Trim trailing whitespaces.
MFC after:		1 week
2023-01-29 16:18:39 +03:00
Dmitry Chagin 6fdf04a2be smp: Drop confusing braces and return statement as panic() is never returns.
Reviewed by:		imp, kib
Differential Revision:	https://reviews.freebsd.org/D38235
MFC after:		1 week
2023-01-29 15:33:16 +03:00
Konstantin Belousov 11989314dc x86: add more definitions for XCR0 bits
This covers all currently defined bits, adding PKRU and TILE.

Reviewed by:	jhb, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D38219
2023-01-27 19:44:49 +02:00
Corvin Köhne 122405c903
x86: ignore stepping for APL30 errata
The issue is present in all apollolake cpus and it doesn't look like
there'll be a fix in the future.

See
https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/pentium-celeron-n-series-j-series-datasheet-spec-update.pdf

MFC after:		1 week
Sponsored by:		Beckhoff Automation GmbH & Co. KG
Differential Revision: https://reviews.freebsd.org/D37621
2023-01-12 10:08:17 +01:00
Konstantin Belousov 45ac7755a7 amd64: identify small cores
Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D37770
2023-01-01 00:09:45 +02:00
Li-Wen Hsu 611c6b233d
Complete retire cp(4)
And fix the LINT build.

Sponsored by:	The FreeBSD Foundation

Fixes:	895992bb66
2022-12-14 11:38:55 +08:00
Mateusz Guzik c3f1a13902 Retire broken GPROF support from the kernel
The option is not even recognized and with that patched it does not
compile. Even if it did work, it would be prohibitively expensive to
use.

Interested parties can use pmcstat or dtrace instead.
2022-11-15 14:17:10 +00:00
Warner Losh 1d21f64149 bhyve: Implement MSR_MISC_FEATURES_ENABLES
Linux reads MISC_FEATURES_ENABLES to manage the CPUID faulting feature
(undocumented in the Intel SDM, but documented in 323850-004 (Intel
Virtualization Technology FlexMigration Application Note). Since bhyve
doesn't emulate this feature, we always return 0. Neither does bhyve
support the MONITOR/MWAIT fault bit also in this MSR (which is
documented in the sdm), so always return 0.

Sponsored by:		Netflix
Reviewed by:		jhb
Differential Revision:	https://reviews.freebsd.org/D36602
2022-10-27 11:34:41 -06:00
Ed Maste c8113dad7e Increase MAX_APIC_ID safeguard to 0x800
MAX_APIC_ID must be at least twice MAXCPU.  Increase it to 0x800 so that
it is possible to set MAXCPU to 512 or 1024 in a custom kernel config
file.

Note that increasing this limit does not itself cause any allocations
to be larger; it just allows madt_parse_cpu() to process higher APIC
IDs.

APIC IDs may be sparse and so we can waste memory.  This is independent
of this change, but becomes more of an issue as the maximum APIC ID
grows.  This should be addressed with future work.

Reviewed by:	royger
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D37067
2022-10-27 12:33:34 -04:00
Konstantin Belousov 829145388b x86/include/elf.h: make inclusion blocks for elf32.h and elf64.h similar
They were copy-pasted when x86/include/elf.h file was merged from its
i386 and amd64 counterparts.  Having the text around inclusions
significantly different is somewhat confusing.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D37085
2022-10-25 19:00:44 +03:00
Konstantin Belousov 5f00525dfc i386: move hard-coded load address for PIE below default linker base
both for i386 native and compat32 amd64.  We know the ld-elf.so.1 size
in advance, it fits there.  Trying to push it up after the end of a
binary cannot work reliably and eventually fail for large binaries.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D37085
2022-10-25 19:00:44 +03:00
Colin Percival b7761f1f08 x86/busdma: Limit reserved pages if low nsegs
When bus_dmamap_create is called, if bouncing might be required we
reserve enough pages for a maximum-length request, subject to the
MAX_BPAGES constraint (32 MB on amd64; 32 MB or 2 MB on i386
depending on the amount of RAM).

Since pages used for bouncing are typically non-consecutive, each
bounced page will typically constitute a busdma segment; as such, we
are unlikely to ever successfully use more pages than the nsegments
limit.  Limit the number of pages reserved to nsegments.

On FreeBSD/Firecracker, this reduces bounce page memory consumption
from 32 MB to 512 kB, making VMs with 128 MB of RAM usable.

Reviewed by:	imp, mav
Sponsored by:	https://www.patreon.com/cperciva
Differential Revision:	https://reviews.freebsd.org/D37082
2022-10-21 22:47:33 -07:00
Colin Percival 13f34e211b PVH: Set bootmethod to PVH
Now that we can PVH boot on a non-Xen hypervisor, we shouldn't set
machdep.bootmethod to "XEN".  Instead, set it to "PVH"; there are
other ways to discern the hypervisor.

Reviewed by:	royger
Sponsored by:	https://www.patreon.com/cperciva
Differential Revision:	https://reviews.freebsd.org/D36191
2022-10-17 23:02:22 -07:00
Colin Percival c4a4011c74 PVH: support whitespace cmdline splitting
For historical reasons, Xen kernel command lines have options
separated by commas.  Every other FreeBSD platform uses whitespace;
this is also necessary in PVH in order to support the Firecracker
VMM.  Allow options to be separated by any combination of commas
and whitespace.

Reviewed by:	imp
Sponsored by:	https://www.patreon.com/cperciva
Differential Revision:	https://reviews.freebsd.org/D36190
2022-10-17 23:02:22 -07:00
Colin Percival a8ea154064 x86: Distinguish Xen from non-Xen PVH boots
The PVH boot protocol, introduced by Xen, is now used by some non-Xen
platforms (e.g. the Firecracker VM) as well.  In order to accommodate
these, we use CPUID to detect Xen and only perform Xen-specific setup
when running on that platform.

The "isxen" function duplicates some work done by identcpu.c later in
the boot process; but we need it here since this is the very first C
code which runs when PVH booting (even before hammer_time).

In many places the existing code had
        xc_printf(...);
        HYPERVISOR_shutdown(SHUTDOWN_crash);
making use of Xen functionality to print a message and shut down; in
the places where this idiom can be reached in the non-xen case, we
replace it idiom with a CRASH(...) macro which calls those in the Xen
case and halts in the non-Xen case.

Reviewed by:	royger
Sponsored by:	https://www.patreon.com/cperciva
Differential Revision:	https://reviews.freebsd.org/D35801
2022-10-17 23:02:22 -07:00
Colin Percival 023a025b5c x86: Add support for PVH version 1 memmap
Version 0 of PVH booting uses a Xen hypercall to retrieve the system
memory map; in version 1 the memory map can be provided via the
start_info structure.

Using the memory map from the version 1 start_info structure allows
FreeBSD to use PVH booting on systems other than Xen, e.g. on the
Firecracker VM.

Reviewed by:	royger
Sponsored by:	https://www.patreon.com/cperciva
Differential Revision:	https://reviews.freebsd.org/D35800
2022-10-17 23:02:22 -07:00
Colin Percival d1ca8cc638 x86: Add MPTABLE_LINUX_BUG_COMPAT option
Linux has two bugs in its handling of the x86 MP table:
1. It assumes that there is always 640 kB of base memory, and looks for
the MP table in the top kB of this even if the memory map indicates
that memory location does not exist.
2. It ignores that entry_count field and instead iterates through the
MP table by scanning until it runs out of bytes in the table.

The Firecracker VM (and probably other related VMs) relies on both of
these bugs.  With the MPTABLE_LINUX_BUG_COMPAT option, we search for
the MP table at address 639k even if that isn't in the memory map; and
replace a zeroed entry_count with a value computed from scanning the
table until we run out of table bytes.

Reviewed by:	imp
Sponsored by:	https://www.patreon.com/cperciva
Differential Revision:	https://reviews.freebsd.org/D35799
2022-10-17 23:02:22 -07:00
Colin Percival 2297a1633d Add NO_LEGACY_PCIB kernel option to i386, amd64
On systems without a PCI bus, legacy_pcib_identify by default creates
one anyway:
    legacy_pcib_identify: no bridge found, adding pcib0 anyway

This commit adds a kernel option NO_LEGACY_PCIB which disables this,
allowing systems to be fully PCI-free.

Reviewed by:	imp
Sponsored by:	https://www.patreon.com/cperciva
Differential Revision:	https://reviews.freebsd.org/D35798
2022-10-17 23:02:22 -07:00
John Baldwin a9fca3b987 Fix various places which cast a pointer to a vm_paddr_t or vice versa.
GCC warns about the mismatched sizes on i386 where vm_paddr_t is 64
bits.

Reviewed by:	imp, markj
Differential Revision:	https://reviews.freebsd.org/D36750
2022-10-03 16:10:41 -07:00
John Baldwin f49fd63a6a kmem_malloc/free: Use void * instead of vm_offset_t for kernel pointers.
Reviewed by:	kib, markj
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D36549
2022-09-22 15:09:19 -07:00
John Baldwin 7ae99f80b6 pmap_unmapdev/bios: Accept a pointer instead of a vm_offset_t.
This matches the return type of pmap_mapdev/bios.

Reviewed by:	kib, markj
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D36548
2022-09-22 15:08:52 -07:00
Konstantin Belousov fd25c62278 i386: check that trap() and syscall() run on the thread kstack
and not on the trampoline stack.  This is a useful way to ensure that
we did not enabled interrupts while on user %cr3 or trampoline stack.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2022-09-14 18:46:32 +03:00
Gordon Bergling 9755e244c9 x86: Correct a typo in source code comment
- s/occured/occurred/

MFC after:	3 days
2022-09-04 13:36:53 +02:00
Colin Percival 02ab915ae0 lapic_init: Reduce LOOPS
While I'm here, instrument lapic_init with TSLOG so it shows up (or
typically not, after this change) on flamecharts.

Reviewed by:	kib
Sponsored by:	https://www.patreon.com/cperciva
Differential Revision:	https://reviews.freebsd.org/D36186
2022-08-13 15:28:09 -07:00
Mateusz Guzik 648edd6378 x86: remove MP_WATCHDOG
It does not work with ULE, which is the default scheduler for over a
decade.

Reviewed by:	emaste, kib
Differential Revision:	https://reviews.freebsd.org/D36094
2022-08-11 21:35:32 +00:00
Emmanuel Vadot 821b850a3b x86: Remove redundant parentheses
Reported by:	avg
Sponsored by:	Beckhoff Automation GmbH & Co. KG
MFC after:	1 week
MFC-With:	b223c1f1a0 ("x86: Add another cpuid for Apollo Lake errata APL30")
2022-08-09 09:46:50 +02:00
Corvin Köhne b223c1f1a0 x86: Add another cpuid for Apollo Lake errata APL30
Sponsored by:	Beckhoff Automation GmbH & Co. KG
MFC after:	1 week
2022-08-09 09:07:59 +02:00
Alan Cox 7f46deccbe x86/iommu: Reduce the number of queued invalidation interrupts
Restructure dmar_qi_task() so as to reduce the number of invalidation
completion interrupts.  Specifically, because processing completed
invalidations in dmar_qi_task() can take quite some time, don't reenable
completion interrupts until processing has completed a first time. Then,
check a second time after reenabling completion interrupts, so that
any invalidations that complete just before interrupts are reenabled
do not linger until a future invalidation might raise an interrupt.
(Recent changes have made checking for completed invalidations cheap; no
locking is required.)

Reviewed by:	kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D36054
2022-08-06 13:05:58 -05:00
Alexander Motin ac64943ca8 mca: Add sysctl to mute corrected errors.
Setting hw.mca.log_corrected to 0 will mute corrected errors logging
except ones marked as reaching Yellow threshold by hardware.

MFC after:	1 week
2022-08-05 13:48:05 -04:00
Alan Cox 4670f90846 iommu_gas: Eliminate redundant parameters and push down lock acquisition
Since IOMMU map entries store a reference to the domain in which they
reside, there is no need to pass the domain to iommu_gas_free_entry(),
iommu_gas_free_space(), and iommu_gas_free_region().

Push down the acquisition and release of the IOMMU domain lock into
iommu_gas_free_space() and iommu_gas_free_region().

Both of these changes allow for simplifications in the callers of the
functions without really complicating the functions themselves.
Moreover, the latter change eliminates the direct use of the IOMMU
domain lock from the x86-specific DMAR code.

Reviewed by:	kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D35995
2022-07-30 14:28:48 -05:00
Alan Cox 42736dc44d x86/iommu: Reduce DMAR lock contention
Replace the DMAR unit's tlb_flush TAILQ by a custom list implementation
that enables dmar_qi_task() to dequeue entries without holding the DMAR
lock.

Reviewed by:	kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D35951
2022-07-29 00:11:33 -05:00
Alan Cox c251563470 x86/iommu: Correct a recent change to iommu_domain_unload_entry()
Correct 8bc3673847.  When iommu_domain_unload_entry() performs a
synchronous IOTLB invalidation, it must call dmar_domain_free_entry()
to remove the entry from the domain's RB_TREE.

Push down the acquisition and release of the DMAR lock into the
recently introduced function dmar_qi_invalidate_sync_locked() and
remove the _locked suffix.

MFC with:	8bc3673847
2022-07-26 01:07:21 -05:00
Alan Cox 8bc3673847 iommu_gas: Eliminate a possible case of use-after-free
Eliminate a possible case of use-after-free in an error handling path
after a mapping failure.  Specifically, eliminate IOMMU_MAP_ENTRY_QI_NF
and instead perform the IOTLB invalidation synchronously.  Otherwise,
when iommu_domain_unload_entry() is called and told not to free the
IOMMU map entry, the caller could free the entry before dmar_qi_task()
is finished with it.

Reviewed by:	kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D35878
2022-07-25 11:14:58 -05:00
Dimitry Andric eadef926b0 Adjust linux_vdso_{cpu,tsc}_selector_idx() definitions to avoid clang 15 warnings
With clang 15, the following -Werror warnings are produced:

    sys/x86/linux/linux_vdso_selector_x86.c:44:28: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
    linux_vdso_tsc_selector_idx()
                               ^
                                void
    sys/x86/linux/linux_vdso_selector_x86.c:62:28: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
    linux_vdso_cpu_selector_idx()
                               ^
                                void

This is because linux_vdso_tsc_selector_idx() and
linux_vdso_cpu_selector_idx are declared with (void) argument lists, but
defined with empty argument lists. Make the definitions match the
declarations.

MFC after:	3 days
2022-07-25 00:40:13 +02:00
Alan Cox 4eaaacc755 x86/iommu: Shrink the critical section in dmar_qi_task()
It is safe to test and clear the Invalidation Wait Descriptor
Complete flag before acquiring the DMAR lock in dmar_qi_task(),
rather than waiting until the lock is held.

Reviewed by:	kib
MFC after:	2 weeks
2022-07-18 22:23:13 -05:00
Colin Percival 05350f0936 x86: Remove 1 second DELAY from cpu_reset
On SMP systems, cpu_reset broadcasts a message telling the APs to stop
themselves, and then the BSP waits 1 second before actually resetting
itself; this behaviour dates back to 1998-05-17.

I assume that this delay was added in order to allow the APs to stop
themselves before the BSP resets; but we wait until the APs have all
acknowledged entering the "stopped" state, so it no longer seems to
serve any purpose.

Reviewed by:	jhb, kib
Sponsored by:	https://www.patreon.com/cperciva
Differential Revision:	https://reviews.freebsd.org/D35797
2022-07-18 17:23:25 -07:00
Mitchell Horne c84c5e00ac ddb: annotate some commands with DB_CMD_MEMSAFE
This is not completely exhaustive, but covers a large majority of
commands in the tree.

Reviewed by:	markj
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D35583
2022-07-18 22:06:09 +00:00
Alan Cox da55f86c61 x86/iommu: Eliminate redundant wrappers
Reviewed by:	kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D35832
2022-07-16 18:05:37 -05:00
Alan Cox db0110a536 iommu: Shrink the iommu map entry structure
Eliminate the unroll_entry field from struct iommu_map_entry, shrinking
the struct by 16 bytes on 64-bit architectures.

Reviewed by:	kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D35769
2022-07-15 22:24:52 -05:00
Mark Johnston 03f868b163 x86: Add a required store-load barrier in cpu_idle()
ULE's tdq_notify() tries to avoid delivering IPIs to the idle thread.
In particular, it tries to detect whether the idle thread is running.
There are two mechanisms for this:
- tdq_cpu_idle, an MI flag which is set prior to calling cpu_idle().  If
  tdq_cpu_idle == 0, then no IPI is needed;
- idle_state, an x86-specific state flag which is updated after
  cpu_idleclock() is called.

The implementation of the second mechanism is racy; the race can cause a
CPU to go to sleep with pending work.  Specifically, cpu_idle_*() set
idle_state = STATE_SLEEPING, then check for pending work by loading the
tdq_load field of the CPU's runqueue.  These operations can be reordered
so that the idle thread observes tdq_load == 0, and tdq_notify()
observes idle_state == STATE_RUNNING.

Some counters indicate that the idle_state check in tdq_notify()
frequently elides an IPI.  So, fix the problem by inserting a fence
after the store to idle_state, immediately before idling the CPU.

PR:		264867
Reviewed by:	mav, kib, jhb
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D35777
2022-07-14 10:28:01 -04:00
Mark Johnston ece453d5fa eventtimer: Simplify KTR traces
Stop including the current CPU in all event messages, since it's already
saved in KTR log entries and thus is redundant.  All eventtimer traces
occur in a context where CPU migration is not possible.

MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2022-07-11 15:58:43 -04:00
Mitchell Horne 258958b3c7 ddb: use _FLAGS command macros where appropriate
Some command definitions were forced to use DB_FUNC in order to specify
their required flags, CS_OWN or CS_MORE. Use the new macros to simplify
these.

Reviewed by:	markj, jhb
MFC after:	3 days
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D35582
2022-07-05 11:56:55 -03:00
Dmitry Chagin 03473e8ec8 linux(4): Use saved cpu feature bits
MFC after:		3 days
2022-07-04 23:42:07 +03:00
Warner Losh 26031009cf amd64/efi: Stop falling back to hints for RSDP
All boot loaders for the last 6 years set acpi.rsdp in addition to the
hints. This was planned for removal ~5 years ago. Belatedly remove it
from here.

Sponsored by:		Netflix
Reviewed by:		jhb
Differential Revision:	https://reviews.freebsd.org/D35633
2022-07-02 08:02:12 -06:00
Roger Pau Monné 77cb05db0c x86/xen: stop assuming kernel memory loading order in PVH
Do not assume that start_info will always be loaded at the highest
memory address, and instead check the position of all the loaded
elements in order to find the last loaded one, and thus a likely safe
place to use as early boot allocation memory space.

Reported by: markj, cperciva
Sponsored by: Citrix Systems R&D
Reviewed by: markj
Differential revision: https://reviews.freebsd.org/D35628
2022-06-30 08:53:16 +02:00
Dmitry Chagin 050f5a8405 amd64: Reload CPU ext features after resume or cr4 changes
Reviewed by:		kib
Differential revision:	https://reviews.freebsd.org/D35555
MFC after:		2 weeks
2022-06-29 10:34:43 +03:00
Roger Pau Monné 091febc04a xen/blkback: do not use x86 CPUID in generic code
Move checker for whether Xen creates IOMMU mappings for foreign pages
into a helper that's defined in arch-specific code.

Reported by: Elliott Mitchell <ehem+freebsd@m5p.com>
Fixes: 1d528f95e8 ('xen/blkback: remove bounce buffering mode')
Sponsored by: Citrix Systems R&D
2022-06-28 09:51:57 +02:00
John Baldwin 15a6642da6 x86 mptable: Include <x86/legacvar.h> for legacy_get_pcibus().
Fixes:		b076d8d54c mptable_hostb: Use legacy_get_pcibus() to fetch PCI bus number.
MFC after:	1 week
2022-06-23 15:00:12 -07:00
Mitchell Horne 8701571df9 set_cputicker: use a bool
The third argument to this function indicates whether the supplied
ticker is fixed or variable, i.e. requiring calibration. Give this
argument a type and name that better conveys this purpose.

Reviewed by:	kib, markj
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D35459
2022-06-23 15:15:11 -03:00
John Baldwin b076d8d54c mptable_hostb: Use legacy_get_pcibus() to fetch PCI bus number.
The mptable_hostb driver is a child of legacy0 and has legacy bus
ivars, not PCI or PCI bridge ivars.

PR:		264819
Reported by:	Dennis Clarke <dclarke@blastwave.org>
Diagnosed by:	avg
Reviewed by:	avg
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D35548
2022-06-23 10:49:09 -07:00
Mark Johnston f6b799a86b Fix the test used to wait for AP startup on x86, arm64, riscv
On arm64, testing pc_curpcb != NULL is not correct since pc_curpcb is
set in pmap_switch() while the bootstrap stack is still in use.  As a
result, smp_after_idle_runnable() can free the boot stack prematurely.

Take a different approach: use smp_rendezvous() to wait for all APs to
acknowledge an interrupt.  Since APs must not enable interrupts until
they've entered the scheduler, i.e., switched off the boot stack, this
provides the right guarantee without depending as much on the
implementation of cpu_throw().  And, this approach applies to all
platforms, so convert x86 and riscv as well.

Reported by:	mmel
Tested by:	mmel
Reviewed by:	kib
Fixes:		8db2e8fd16 ("Remove the secondary_stacks array in arm64 and riscv kernels.")
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D35435
2022-06-15 11:38:04 -04:00
Dmitry Chagin 4a6c2d075d linux(4): Properly restore the thread signal mask after signal delivery on i386
Replace sigframe sf_extramask by native sigset_t and use it to
store/restore the thread signal mask without conversion to/from
Linux signal mask.

Pointy hat to:		dchagin
MFC after:		2 weeks
2022-05-30 20:03:49 +03:00
Corvin Köhne 7468332f55 x86/mp: don't create empty cpu groups
When some APICs are disabled by tunables, some cpu groups could end up
empty. An empty cpu group causes the system to panic because not all
functions handle them correctly. Additionally, it's wasted time to
handle and inspect empty cpu groups. Therefore, just don't create them.

Reviewed by:	kib, avg, cem
Sponsored by:	Beckhoff Automation GmbH & Co. KG
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D24927
2022-05-30 11:21:46 +02:00
Dmitry Chagin 9016ec056a linux(4): Deduplicate bsd_to_linux_trapcode()
As bsd_to_linux_trapcode() is common for x86 Linuxulators,
move it under x86/linux.

MFC after:		2 weeks
2022-05-23 13:16:58 +03:00
Dmitry Chagin 2434137f69 linux(4): Deduplicate translate_traps()
As translate_traps() is common for x86 Linuxulators,
move it under x86/linux.

MFC after:		2 weeks
2022-05-23 13:16:26 +03:00
Dmitry Chagin 6e826d27c3 linux(4): Better naming for ucontext field of struct rt_sigframe
To reduce sendsig code difference and to avoid confusing me,
rename sf_sc to sf_uc to match the content.

MFC after:		2 weeks
2022-05-15 21:06:47 +03:00
Dmitry Chagin 21f2461741 linux(4): Move sigframe definitions to separate headers
The signal trampoine-related definitions are used only in the MD part
of code, wherefore moved from everywhere used linux.h to separate MD
headers.

MFC after:		2 weeks
2022-05-15 21:03:01 +03:00
Dmitry Chagin 5a6a4fb284 linux(4): Implement vdso getcpu for x86.
This is modeled after f2395455 (by kib@).

MFC after:		2 weeks
2022-05-08 17:20:52 +03:00
Dmitry Chagin 332eca05b5 linux(4): Refactor vdso_gettc_x86 includes.
Factor out includes from common vdso_gettc_x86 file to the corresponding
MD files.

MFC after:		2 weeks
2022-05-08 17:20:51 +03:00
John Baldwin 80d2b3de16 x86: Remove unused devclass arguments to DRIVER_MODULE. 2022-05-06 15:46:58 -07:00
John Baldwin b3407dcc58 cpufreq: Remove unused devclass arguments to DRIVER_MODULE. 2022-05-06 15:39:29 -07:00
John Baldwin 09fd3b43ad Remove isa_devclass from ISA bus drivers. 2022-05-06 15:39:28 -07:00
Dmitry Chagin fe2c9f83a6 Remove dead code.
is_physical_memory() dead since 235a54de.

Reviewed by:		markj
Differential revision:	https://reviews.freebsd.org/D35056
MFC after:		2 weeks
2022-04-26 19:40:59 +03:00
John Baldwin d4ab3a8d4f busdma_bounce: Add free_bounce_pages helper function.
Deduplicate code to iterate over the bpages list in a bus_dmamap_t
freeing bounce pages during bus_dmamap_unload.

Reviewed by:	imp
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D34967
2022-04-21 10:42:14 -07:00
John Baldwin 489e8f24a5 smbios/vpd: Use devclass_find to lookup devclass in module event handler.
While here, use a modern function declaration for smbios_modevent and
vpd_modevent.

Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D34996
2022-04-21 10:29:14 -07:00
Kornel Duleba 06f659c39d dmar: Disable PMR in driver attach routine
Previously it was disabled right before translation was enabled.
This way the disable logic is still executed even when translation
is not be activated, e.g. with hw.iommu.dma=0 tunable set.
On some platforms we need to disable PMR in order for core dump to work.
At the same time it was observed that enabling translation has
a significant impact on network performance.
With this patch PMR can be disabled, with IOMMU translation not being
turned on by appending the following to the loader.conf:

hw.dmar.enable=1
hw.dmar.pmr.disable=1
hw.dmar.dma=0

Sponsored by: Stormshield
Obtained from: Semihalf
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D34907
2022-04-20 09:40:28 +02:00
John Baldwin 3d6f4411e4 Remove checks for <sys/cdefs.h> being included.
These files no longer depend on the macros required when these checks
were added.

PR:		263102 (exp-run)
Reviewed by:	brooks, imp, emaste
Differential Revision:	https://reviews.freebsd.org/D34804
2022-04-12 10:06:18 -07:00
John Baldwin 56f5947a71 Remove checks for __GNUCLIKE_ASM assuming it is always true.
All supported compilers (modern versions of GCC and clang) support
this.

Many places didn't have an #else so would just silently do the wrong
thing.  Ancient versions of icc (the original motivation for this) are
no longer a compiler FreeBSD supports.

PR:		263102 (exp-run)
Reviewed by:	brooks, imp
Differential Revision:	https://reviews.freebsd.org/D34797
2022-04-12 10:05:45 -07:00
Mark Johnston aa597d4049 i386: Fix the nodevice apic build
PR:		263124
Fixes:		62d09b46ad ("x86: Defer LAPIC calibration until after timecounters are available")
Reviewed by:	kib, jhb, emaste
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D34830
2022-04-08 11:47:52 -04:00
John Baldwin 354ef278e9 powernow(4): Fix unused variable warnings by using the variables. 2022-04-06 16:45:28 -07:00
John Baldwin 89abc0fbbd x86 bounce_bus_dma_tag_destroy: Silence set but unused warning. 2022-04-06 16:45:27 -07:00
Gordon Bergling bba12ee453 xen(4): Fix a few typos in source code comments
- s/querried/queried/

MFC after:	3 days
2022-03-28 19:37:20 +02:00
John Baldwin 931983ee08 x86: Add a NT_X86_SEGBASES register set.
This register set contains the values of the fsbase and gsbase
registers.  Note that these registers can already be controlled
individually via ptrace(2) via MD operations, so the main reason for
adding this is to include these register values in core dumps.  In
particular, this will enable looking up the value of TLS variables
from core dumps in gdb.

The value of NT_X86_SEGBASES was chosen to match the value of
NT_386_TLS on Linux.  The notes serve similar purposes, but FreeBSD
will never dump a note equivalent to NT_386_TLS (which dumps a single
segment descriptor rather than a pair of addresses) and picking a
currently-unused value in the NT_X86_* range could result in a future
conflict.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D34650
2022-03-24 11:36:19 -07:00
Roger Pau Monné 1ca34862dc x86/tsc: fetch frequency from CPUID when running on Xen
Introduce a helper to fetch the TSC frequency from CPUID when running
under Xen.

Since the TSC can also be initialized early when running as a Xen
guest pull out the call to tsc_init() from the
early_clock_source_init() handlers and place it in clock_init(), as
otherwise all handlers would call tsc_init() anyway.

Reviewed by: markj
Sponsored by: Citrix Systems R&D
Differential revision: https://reviews.freebsd.org/D34581
2022-03-18 10:21:04 +01:00
Roger Pau Monné 396a8479b0 x86/xen: fix CPUID signature
MFC: 3 days
Reviewed by: cem
Sponsored by: Citrix Systems R&D
Differential revision: https://reviews.freebsd.org/D34580
2022-03-17 12:56:36 +01:00
Zhenlei Huang ba46c6c4b7 x86: Correctly report unexpected cache level
Reviewed by:	rpokala, emaste
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D34577
2022-03-16 16:30:38 -04:00
Mark Johnston 075e2779ac x86: Defer early TSC timecounter calibration to SI_SUB_CPU
If we can't determine the TSC frequency using CPU registers, we need to
give a chance for Hyper-V drivers to register a timecounter (during
SI_SUB_HYPERVISOR) since an emulated 8254 might not be available.
Thus, split probe_tsc_freq() into early and late stages, and wait until
the latter to attempt calibration using a reference clock.

Fixes:		84369dd523 ("x86: Probe the TSC frequency earlier")
Reported and tested by:	khng, Shawn Webb
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D34444
2022-03-04 19:34:43 -05:00
Mark Johnston 84369dd523 x86: Probe the TSC frequency earlier
This lets us use the TSC to implement early DELAY, limiting the use of
the sometimes-unreliable 8254 PIT.

PR:		262155
Reviewed by:	emaste
Tested by:	emaste, mike tancsa <mike@sentex.net>, Stefan Hegnauer <stefan.hegnauer@gmx.ch>
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D34367
2022-03-01 09:39:35 -05:00
Elliott Mitchell ad7dd51499 xen: switch to use headers in contrib
These headers originate with the Xen project and shouldn't be mixed with
the main portion of the FreeBSD kernel. Notably they shouldn't be the
target of clean-up commits.

Switch to use the headers in sys/contrib/xen.

Reviewed by: royger
2022-02-07 10:11:56 +01:00
Elliott Mitchell b6da4ec609 xen: remove leftover bits missed in commit ac3ede5371
These fields are now unused, remove them.

Fixes: ac3ede5371 ('x86/xen: remove PVHv1 code')
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D31206
2022-02-07 10:06:27 +01:00
Takanori Watanabe eb815a7419 atrtc: Install address space handler for \_SB and its descendant.
SystemCMOS address space is accessible for system wide.
 So install address handler in \_SB space.

Reviewed by: jhb

Differential Revision: https://reviews.freebsd.org/D33892
2022-01-21 15:32:30 +09:00
Roger Pau Monné e0516c7553 x86/apic: remove apic_ops
All supported Xen instances by FreeBSD provide a local APIC
implementation, so there's no need to replace the native local APIC
implementation anymore.

Leave just the ipi_vectored hook in order to be able to override it
with an implementation based on event channels if the underlying local
APIC is not virtualized by hardware. Note the hook cannot use ifuncs,
because at the point where ifuncs are resolved the kernel doesn't yet
know whether it will benefit from using the optimization.

Sponsored by: Citrix Systems R&D
Reviewed by: kib
Differential revision: https://reviews.freebsd.org/D33917
2022-01-18 10:19:04 +01:00
Roger Pau Monné 2450da6776 x86/xen: use x{2}APIC if virtualized by hardware
Instead of using event channels or hypercalls to deal with IPIs and
NMIs.

Using a hardware virtualized APIC should be faster than using any PV
interface, since the VM exit can be avoided.

Xen exposes whether the domain is using hardware assisted x{2}APIC
emulation in a CPUID bit.

Sponsored by: Citrix Systems R&D
2022-01-18 10:18:22 +01:00
Roger Pau Monné ad15eeeaba x86/xen: fallback when VCPUOP_send_nmi is not available
It has been reported that on some AWS instances VCPUOP_send_nmi
returns -38 (ENOSYS). The hypercall is only available for HVM guests
in Xen 4.7 and newer. Add a fallback to use the native NMI sending
procedure when VCPUOP_send_nmi is not available, so that the NMI is
not lost.

Reported and Tested by: avg
MFC after: 1 week
Fixes: b2802351c1 ('xen: fix dispatching of NMIs')
Sponsored by: Citrix Systems R&D
2022-01-17 11:06:40 +01:00
Colin Percival de1292c6ff Use CPUID leaf 0x40000010 for local APIC freq
Some VM systems announce the frequency of the local APIC via the
CPUID leaf 0x40000010.  Using this allows us to boot slightly
faster by avoiding the need for timer calibration.

Reviewed by:	markj
Sponsored by:	https://www.patreon.com/cperciva
2022-01-14 17:30:17 -08:00
Colin Percival 4a432614f6 TSC: Use 0x40000010 CPUID leaf for all VM types
While this CPUID leaf was originally only used by VMWare, other
hypervisors now also use it to announce the TSC frequency to guests.

This speeds up the boot process by 100 ms in EC2 and other systems,
by allowing the early calibration DELAY to be skipped.

Reviewed by:	markj
Sponsored by:	https://www.patreon.com/cperciva
2022-01-14 17:30:17 -08:00
Colin Percival fd980feb57 Detect CPU type before asking VMWare for TSC freq
This allows us to set tsc_is_invariant and select appropriately
fenced versions of RDTSC based on the CPU type.

Reviewed by:	markj
Sponsored by:	https://www.patreon.com/cperciva
2022-01-14 17:30:17 -08:00
Austin Zhang e1ef6c0ef2 atrtc: reads Century field from FADT table
The ACPI spec describes the FADT->Century field as:

    The RTC CMOS RAM index to the century of data value (hundred and
    thousand year decimals).  If this field contains a zero, then the
    RTC centenary feature is not supported.  If this field has a non-zero
    value, then this field contains an index into RTC RAM space that
    OSPM can use to program the centenary field.

Use this field to decide whether to program the CENTURY register
of the CMOS RTC device.

Reviewed by:	akumar3@isilon.com, dab, vangyzen
MFC after:	1 week
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D33667

MFC after:	1 week
Sponsored by:	Dell EMC Isilon
2022-01-13 11:24:00 -06:00
Roger Pau Monné 7d06c761c8 x86/madt: allow Xen guest to use x2APIC mode
The old bogus Xen versions that would deliver a GPF when writing to
the LAPIC MSR are likely retired, so it's safe to enable x2APIC
unconditionally now if available.

Tested by: avg
Reviewed by: kib
Sponsored by: Citrix Systems R&D
Differential revision: https://reviews.freebsd.org/D33877
2022-01-13 17:15:24 +01:00
Roger Pau Monné ca46f3289d xen: use an hypercall for shutdown and reboot
When running as a Xen guest it's easier to use an hypercall in order
to do power management operations (power off, power cycle). Do this
for all supported guest types (HVM and PVH). Note that for HVM the
power operation could also be done using ACPI, but there's no reason
to differentiate between PVH and HVM.

While there fix the shutdown handler to properly differentiate between
power cycle and power off requests.

Reported by: Freddy DISSAUX
MFC: 1 week
Sponsored by: Citrix Systems R&D
2022-01-13 16:54:30 +01:00
Colin Percival c2705ceaeb x86: Speed up clock calibration
Prior to this commit, the TSC and local APIC frequencies were calibrated
at boot time by measuring the clocks before and after a one-second sleep.
This was simple and effective, but had the disadvantage of *requiring a
one-second sleep*.

Rather than making two clock measurements (before and after sleeping) we
now perform many measurements; and rather than simply subtracting the
starting count from the ending count, we calculate a best-fit regression
between the target clock and the reference clock (for which the current
best available timecounter is used). While we do this, we keep track
of an estimate of the uncertainty in the regression slope (aka. the ratio
of clock speeds), and stop measuring when we believe the uncertainty is
less than 1 PPM.

In order to avoid the risk of aliasing resulting from the data-gathering
loop synchronizing with (a multiple of) the frequency of the reference
clock, we add some additional spinning depending upon the iteration number.

For numerical stability and simplicity of implementation, we make use of
floating-point arithmetic for the statistical calculations.

On the author's Dell laptop, this reduces the time spent in calibration
from 2000 ms to 29 ms; on an EC2 c5.xlarge instance, it is reduced from
2000 ms to 2.5 ms.

Reviewed by:	bde (previous version), kib
MFC after:	1 month
Sponsored by:	https://www.patreon.com/cperciva
Differential Revision:	https://reviews.freebsd.org/D33802
2022-01-12 12:34:07 -08:00
John Baldwin 7def1e10b3 bus_dma: Deduplicate locking helper functions.
- Move busdma_lock_mutex to subr_bus_dma.c.

- Move _busdma_lock_dflt to subr_bus_dma.c.  This function was named a
  couple of different things previously.  It is not a public API but
  an internal helper used in place of a NULL pointer.  The prototype
  is in <sys/bus_dma.h> as not all backends include
  <sys/bus_dma_internal.h>.

Reviewed by:	kib
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D33694
2022-01-05 13:50:40 -08:00
John Baldwin 85b4607324 Deduplicate bus_dma bounce code.
Move mostly duplicated code in various MD bus_dma backends to support
bounce pages into sys/kern/subr_busdma_bounce.c.  This file is
currently #include'd into the backends rather than compiled standalone
since it requires access to internal members of opaque bus_dma
structures such as bus_dmamap_t and bus_dma_tag_t.

Reviewed by:	kib
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D33684
2022-01-05 13:50:40 -08:00
Mark Johnston 0e494a9e3f x86: Skip late calibration if our reference timer has low quality
Some AMD Geode-based systems end up using the 8254 PIT to calibrate the
TSC during late calibration, which doesn't work because that
timecounter's mask (65535) is much smaller than its frequency (1193182).
Moreover, early calibration is done against the 8254 timer anyway.

Work around the problem by simply using early calibration results if no
high-quality timecounters exist.

PR:		260868
Fixes:		22875f8879 ("x86: Implement deferred TSC calibration")
Reported and tested by:	mike@sentex.net, Stefan Hegnauer <stefan.hegnauer@gmx.ch>
Reviewed by:	imp, kib
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D33730
2022-01-03 13:00:50 -05:00
Colin Percival 698727d637 Fix variable name: freq_khz -> freq
An earlier version of this code computed the TSC frequency in kHz.
When the code was changed to compute the frequency more accurately,
the variable name was not updated.

Reviewed by:	markj
Fixes:		22875f8879 x86: Implement deferred TSC calibration
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D33696
2022-01-02 13:07:53 -08:00
Colin Percival 9cb3288287 Skip TSC calibration if exact value known
It's possible that the "early" TSC calibration gave us a value which
is known to be exact; in that case, skip the later re-calibration.

MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D33695
2022-01-02 13:07:53 -08:00
Doug Moore f1e7a532d1 busdma: _bus_dmamap_addseg repaired
A recent change introduced a one-off error into a test allowing
coalescing chunks into segments.  This fixes that error.

broke a check in _bus_dmamap_addseg on many architectures. This change makes it clear that it is not a particular range that is being boundary-checked, but the proposed union of the two adjacent ranges.
Reported by:	se
Reviewed by:	se
Fixes:	c606ab59e7 vm_extern: use standard address checkers everywhere
Differential Revision:	https://reviews.freebsd.org/D33715
2022-01-02 12:37:05 -06:00