Summary:
This review ports mlx5 driver, kernel's OFED stack (userland is already enabled), KTLS and krping to powerpc64 and powerpc64le.
krping requires a small change since it uses assembly for amd64 / i386.
NOTE: On powerpc64le RDMA works fine in the userspace with libmlx5, but on powerpc64 it does not. The problem is that contrib/ofed/libmlx5/doorbell.h checks for SIZEOF_LONG but this macro exists on neither powerpc64* nor amd64. Thus, the file silently goes to the fallback function written for 32-bit architectures. It works fine on little-endian architectures, but causes a hard fail on big-endian. It's possible it may also cause some runtime issues on little-endian.
Thus, on powerpc64 I verified that RDMA works with krping.
Reviewers: #powerpc, hselasky
Subscribers: bdrewery, imp, emaste, jhibbits
Differential Revision: https://reviews.freebsd.org/D38786
Most options in kernel config files use "options<space><tab>OPTION".
This allows the option to be commented out without shifting columns.
A few options had two tabs, and some had spaces. Make them consistent.
Very old versions of gcc defined _BIG_ENDIAN and _LITTLE_ENDIAN. So to
work around that, we undefined them here. However, that causes problems
for programs that do:
(and many other variations on that theme). Since this often is the
result of weirdly nested includes in the ports world that are hard to
unwind, drop this workaround to help more ports build out of the box.
If there's still an issue here (and my testing hasn't shown it), we'll
fix the issue in a brand-new way once I have a reproducer.
This fixes the mesa-devel build, and others
Sponsored by: Netflix
Tested by: pkubaj
MFC After: 3 days
Differential Revision: https://reviews.freebsd.org/D38564
Rather than using the DEVICE_IDENTIFY method, let's have other
ofwbus-using platforms add ofwbus0 explicitly in nexus, like arm64. This
gives them the same flexibility, e.g. if riscv starts supporting ACPI,
and cleans up the #ifdefs.
We were doing this already on riscv, but adjust the 'order' parameters.
Reviewed by: andrew, jhb
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D38492
The architecture nexus should handle allocation and release of memory and
interrupts. This is to ensure that system-wide resources such as these
are available to all devices, not just children of ofwbus0.
On powerpc this moves the ownership of these resources up one level,
from ofwbus0 to nexus0. Other architectures already have the required
logic in their nexus implementation, so this eliminates the duplication
of resources. An implementation of nexus_adjust_resource() is added for
arm, arm64, and riscv.
As noted by ian@ in the review, resource handling was the main bit of
logic distinguishing ofwbus from simplebus. With some attention to
detail, it should be possible to merge the two in the future.
Co-authored by: mhorne
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D30554
This is a followup of 692e19cf51 (add netlink to GENERIC@amd64).
Netlink is a communication protocol defined in RFC 3549. It is async,
TLV-based protocol, providing 1-1 and 1-many communications between kernel
and userland. Netlink is currently used in Linux kernel to modify, read and
subscribe for nearly all networking states. Interface state, addresses, routes,
firewall, rules, fibs, etc, are controlled via Netlink.
Netlink support was added in D36002. It has got a number of improvements and
first customers since then:
* net/bird2 got netlink support, enabling route multipath in FreeBSD
* netlink-based devd notifications are being worked on ( D37574 ).
* linux(4) fully supports and depends on Netlink
Enabling Netlink in GENERIC targets two goals.
The first one is to provide stability for the third-party userland applications,
so they can rely on the fact that netlink always exists since 14.0 and potentially 13.2.
Loadable module makes life of the app delepers harder. For example, `net/bird2` can be
either build with netlink or rtsock support, but not both.
The second goal is to enable gradual conversion of the base userland tools
to use netlink(4) interfaces. Converting tools like netstat (D36529), route,
ifconfig one-by-one simplifies testing and addressing the feedback.
Othewise, switching all base to use netlink at once may be too big of a leap.
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D37783
Remove the platform-specific definitions of VM_BATCHQUEUE_SIZE
for amd64 and powerpc64, and instead treat all 64-bit platforms
identically. This has the effect of increasing the arm64
and riscv VM_BATCHQUEUE_SIZE to match that of other platforms.
Reviewed by: jhb, markj
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D37707
Rather than waiting until the batchqueue is full to acquire the lock &
process the queue, we now start trying to acquire the lock using trylocks
when the batchqueue is 1/2 full. This removes almost all contention on the
vm pagequeue mutex for for our busy sendfile() based web workload.
It also greadly reduces the amount of time a network driver ithread
remains blocked on a mutex, and eliminates some packet drops under
heavy load.
So that the system does not loose the benefit of processing large
batchqueues, I've doubled the size of the batchqueues. This way, when
there is no contention, we process the same batch size as before.
This has been run for several months on a busy Netflix server, as well
as on my personal desktop.
Reviewed by: markj
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D37305
Flags should be M_WAITOK | M_ZERO instead of just M_ZERO
Reviewed by: markj
MFC after: 2 days
Sponsored by: Instituto de Pesquisas Eldorado (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D36865
This matches the return type of pmap_mapdev/bios.
Reviewed by: kib, markj
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D36548
This changes the default TCP Congestion Control (CC) to CUBIC.
For small, transactional exchanges (e.g. web objects <15kB), this
will not have a material effect. However, for long duration data
transfers, CUBIC allocates a slightly higher fraction of the
available bandwidth, when competing against NewReno CC.
Reviewed By: tuexen, mav, #transport, guest-ccui, emaste
Relnotes: Yes
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D36537
This applies one of the changes from
5567d6b441 to other architectures
besides arm64.
Reviewed by: kib
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D36263
With clang 15, the following -Werror warnings are produced:
sys/powerpc/booke/mp_cpudep.c:54:20: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
cpudep_ap_bootstrap()
^
void
sys/powerpc/booke/mp_cpudep.c:97:16: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
cpudep_ap_setup()
^
void
This is because cpudep_ap_bootstrap() and cpudep_ap_setup() are declared
with (void) argument lists, but defined with empty argument lists. Make
the definitions match the declarations.
MFC after: 3 days
With clang 15, the following -Werror warning is produced:
sys/powerpc/aim/moea64_native.c:306:22: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
moea64_install_native()
^
void
This is because moea64_install_native() is declared with a (void)
argument list, but defined with an empty argument list. Make the
definition match the declaration.
MFC after: 3 days
With clang 15, the following -Werror warning is produced:
sys/powerpc/aim/mmu_radix.c:5409:22: error: variable 'freed' set but not used [-Werror,-Wunused-but-set-variable]
int allfree, field, freed, idx;
^
The 'freed' variable is only used when PV_STATS is defined. Ensure it is
only declared and set in that case.
MFC after: 3 days
With clang 15, the following -Werror warnings are produced:
sys/powerpc/aim/mmu_radix.c:786:20: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
mmu_radix_tlbie_all()
^
void
sys/powerpc/aim/mmu_radix.c:3615:15: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
mmu_radix_init()
^
void
sys/powerpc/aim/mmu_radix.c:6081:20: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
mmu_radix_scan_init()
^
void
This is because mmu_radix_tlbie_all(), mmu_radix_init(), and
mmu_radix_scan_init() are declared with (void) argument lists, but
defined with empty argument lists. Make the definitions match the
declarations.
MFC after: 3 days
With clang 15, the following -Werror warnings are produced:
sys/powerpc/aim/mmu_oea64.c:1947:12: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
moea64_init()
^
void
sys/powerpc/aim/mmu_oea64.c:3257:17: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
moea64_scan_init()
^
void
This is because moea64_init() and moea64_scan_init() are declared with
(void) argument lists, but defined with empty argument lists. Make the
definitions match the declarations.
MFC after: 3 days
With clang 15, the following -Werror warning is produced:
sys/powerpc/aim/aim_machdep.c:750:14: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
mpc745x_sleep()
^
void
This is because mpc745x_sleep() is declared with a (void) argument list,
but defined with an empty argument list. Make the definition match the
declaration.
MFC after: 3 days
Since the likelihood of new Book-E PowerPC SoCs being produced is near
zero clamp MAXCPU to around the highest number of cores/threads
available currently, for both 64-bit and 32-bit. Even though the
current highest core/thread count is 24, the cap is set at 32 in case
there is code that assumes power of 2.
The CAM 'maxio' is a 'pessimized' size, assuming 4k pages and one page
per segment. Since there are at most 63 segments in a transaction with
this driver, and one would necessarily be the indirect segment marker,
clamp the maxio to the minimum of maxphys (tunable) or (63 - 1) pages
(248k).
MFC after: 3 days
Make powerpspe kernel config in sync with other targets making
GEOM_LABEL built-in to allow use of labels when mounting partitions.
MFC after: 2 days
Sponsored by: Instituto de Pesquisas Eldorado (eldorado.org.br)
Make most AST handlers dynamically registered. This allows to have
subsystem-specific handler source located in the subsystem files,
instead of making subr_trap.c aware of it. For instance, signal
delivery code on return to userspace is now moved to kern_sig.c.
Also, it allows to have some handlers designated as the cleanup (kclear)
type, which are called both at AST and on thread/process exit. For
instance, ast(), exit1(), and NFS server no longer need to be aware
about UFS softdep processing.
The dynamic registration also allows third-party modules to register AST
handlers if needed. There is one caveat with loadable modules: the
code does not make any effort to ensure that the module is not unloaded
before all threads processed through AST handler in it. In fact, this
is already present behavior for hwpmc.ko and ufs.ko. I do not think it
is worth the efforts and the runtime overhead to try to fix it.
Reviewed by: markj
Tested by: emaste (arm64), pho
Discussed with: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D35888
Sometimes we need to reset a PCIe bus, but sometimes it breaks the
downstream device(s). Since, from my testing, this is only needed for
Radeon cards installed in the AmigaOne machines because the card was
already initialized by firmware, make the reset dependent on a device
hint (hint.pcib.X.reset=1). With this, AmigaOne X5000 machines can have
other devices in the secondary PCIe slots.
The devmap variants used vm_offset_t for some reason, and a few places
explicitly cast bus addresses to vm_offset_t. (Probably those casts
along with similar casts for vm_size_t should just be removed and
instead permit the compiler to DTRT.)
Reviewed by: markj
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D35961
Store the shared page address in struct vmspace.
Also instead of storing absolute addresses of various shared page
segments save their offsets with respect to the shared page address.
This will be more useful when the shared page address is randomized.
Approved by: mw(mentor)
Sponsored by: Stormshield
Obtained from: Semihalf
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D35393
Use a getter macro instead of fetching the sigcode address directly
from a sysent of a given process. It assumes that the sigcode is stored
in the shared page, which is true in all cases, except for a.out
binaries. This will be later useful when the shared page address
randomization is introduced.
No functional change intended.
Approved by: mw(mentor)
Sponsored by: Stormshield
Obtained from: Semihalf
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D35392
Stop including the current CPU in all event messages, since it's already
saved in KTR log entries and thus is redundant. All eventtimer traces
occur in a context where CPU migration is not possible.
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Partially revert the previous change; we need to keep this method as a
specific override for pci_driver subclasses which should not use
pci_rescan_method() -- cardbus and ofw_pcibus. However, change the return
value to ENODEV for the same reasoning given in the original commit, and
use this as the default rescan method in bus_if.m.
Reported by: jhb
Fixes: 36a8572ee8 ("bus_if: provide a default null rescan method")
MFC with: 36a8572ee8
The third argument to this function indicates whether the supplied
ticker is fixed or variable, i.e. requiring calibration. Give this
argument a type and name that better conveys this purpose.
Reviewed by: kib, markj
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D35459
There is an existing helper method in subr_bus.c, but almost no drivers
know to use it. It also returns the same error as an empty method,
making it not very useful. Move this to bus_if.m and return a more
sensible error code.
This gives a slightly more meaningful error message when attempting
'devctl rescan' on buses and devices alike:
"Device not configured" --> "Operation not supported by device"
Reviewed by: imp
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D35501
It is unused, especially now that the underlying d_dumper methods do not
accept the argument.
Reviewed by: markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D35174
Keep the global variable for its uses in ata-pci.c and
chipsets/ata-fsl.c but initialize it in the existing
ata_module_event_handler. Move the module event handler a bit earlier
to ensure the variable is set before any devices are attached.
Deduplicate code to iterate over the bpages list in a bus_dmamap_t
freeing bounce pages during bus_dmamap_unload.
Reviewed by: imp
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D34967
The probe routine was setting a value in the softc, but since the
probe routine was not returning zero, this value was lost since the
softc was reallocated (and re-zeroed) when the device was attached.
This is similar in nature to the fixes from
965205eb66.
To fix, move the code to set the 'shasta' flag to the start of attach
along with related code to set an IRQ resource on some non-shasta
devices. The IRQ resource still "worked" being in the probe routine
as the IRQ resource persisted after probe returned, but it is cleaner
to go ahead and move it to attach after setting the 'shasta' flag.
I have no way to test this, but noticed this while reading the code.
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D34888
In theory the errors during llan_attach should be handled, but other
errors in llan_attach (e.g. bus_setup_intr) are already ignored, so
just remove the unused variable to preserve the status quo.
These files no longer depend on the macros required when these checks
were added.
PR: 263102 (exp-run)
Reviewed by: brooks, imp, emaste
Differential Revision: https://reviews.freebsd.org/D34804
This fixes PCI devices not being found on QEMU ppce500. This
generic board used to have its first PCI slot at 0x11, like the
mpc8544dsi and some real HW. After commit [1], it was changed to
0x1 and our driver wasn't prepared for that.
[1] 3bb7e02a97
Reviewed by: jhibbits, bdragon
MFC after: 2 days
Sponsored by: Institudo de Pesquisas Eldorado (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D34621
Adding it in order to make easier using powerpcspe images under qemu
Reviewed by: jhibbits
Sponsored by: Instituto de Pesquisas Eldorado (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D34554
Add a static assert for the siginfo_t, mcontext_t and ucontext_t
sizes. These are de-facto ABI options and cannot change size ever. For
powerpc64, also add asserts for {u,m}mcontext32_t and siginfo32.
Reviewed by: andrew
Differential Revision: https://reviews.freebsd.org/D34213
clang doesn't implement it, and Linux doesn't enforce it. As a
result, new instances keep cropping up both in FreeBSD's code and in
upstream sources from vendors.
Reviewed by: emaste
Differential Revision: https://reviews.freebsd.org/D34144
After b5d227b0 FreeBSD was panicking on boot with "Duplicate free" in
UMA. Analyzing the asm, the '1' mask was treated as an integer, rather
than a long, causing 'slw' (shift left word) to be used for the shifting
instruction, not 'sld' (shift left double). This means the upper bits
of the bitfield were not getting used, resulting in corruption of the
bitfield.
While fixing this, the 'and' check of the mask does not need to be
recorded, so don't record (drop the '.').
This adds the PT_GETREGSET and PT_SETREGSET ptrace types. These can be
used to access all the registers from a specified core dump note type.
The NT_PRSTATUS and NT_FPREGSET notes are initially supported. Other
machine-dependant types are expected to be added in the future.
The ptrace addr points to a struct iovec pointing at memory to hold the
registers along with its length. On success the length in the iovec is
updated to tell userspace the actual length the kernel wrote or, if the
base address is NULL, the length the kernel would have written.
Because the data field is an int the arguments are backwards when
compared to the Linux PTRACE_GETREGSET call.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19831
The size of the ps_strings structure varies between ABIs, so this is
useful for computing the address of the ps_strings structure relative to
the top of the stack when stack address randomization is enabled.
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33704
powerpc64le requires at minimum POWER8 hardware, so ISA 2.06 atomic
instructions are always available.
This isn't so for powerpc64 (BE), so isn't enabled by default there.
Add machine-optimized implementations for the following:
* atomic_testandset_int
* atomic_testandclear_int
* atomic_testandset_long
* atomic_testandclear_long
This fixes the build with ISA_206_ATOMICS enabled.
Add the associated atomic_testandset_32, atomic_testandclear_32, so
that ice(4) can potentially build.
- Move busdma_lock_mutex to subr_bus_dma.c.
- Move _busdma_lock_dflt to subr_bus_dma.c. This function was named a
couple of different things previously. It is not a public API but
an internal helper used in place of a NULL pointer. The prototype
is in <sys/bus_dma.h> as not all backends include
<sys/bus_dma_internal.h>.
Reviewed by: kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D33694
Move mostly duplicated code in various MD bus_dma backends to support
bounce pages into sys/kern/subr_busdma_bounce.c. This file is
currently #include'd into the backends rather than compiled standalone
since it requires access to internal members of opaque bus_dma
structures such as bus_dmamap_t and bus_dma_tag_t.
Reviewed by: kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D33684
The original logic was to check if there's no filter and the address is
misaligned relative to the requirements. The refactoring in
c606ab59e7 missed this, and instead caused
it to return failure if the address *is* properly aligned.
A recent change introduced a one-off error into a test allowing
coalescing chunks into segments. This fixes that error.
broke a check in _bus_dmamap_addseg on many architectures. This change makes it clear that it is not a particular range that is being boundary-checked, but the proposed union of the two adjacent ranges.
Reported by: se
Reviewed by: se
Fixes: c606ab59e7 vm_extern: use standard address checkers everywhere
Differential Revision: https://reviews.freebsd.org/D33715
Define simple functions for alignment and boundary checks and use them
everywhere instead of having slightly different implementations
scattered about. Define them in vm_extern.h and use them where
possible where vm_extern.h is included.
Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D33685
The introduction of <sched.h> improved compatibility with some 3rd
party software, but caused the configure scripts of some ports to
assume that they were run in a GLIBC compatible environment.
Parts of sched.h were made conditional on -D_WITH_CPU_SET_T being
added to ports, but there still were compatibility issues due to
invalid assumptions made in autoconfigure scripts.
The differences between the FreeBSD version of macros like CPU_AND,
CPU_OR, etc. and the GLIBC versions was in the number of arguments:
FreeBSD used a 2-address scheme (one source argument is also used as
the destination of the operation), while GLIBC uses a 3-adderess
scheme (2 source operands and a separately passed destination).
The GLIBC scheme provides a super-set of the functionality of the
FreeBSD macros, since it does not prevent passing the same variable
as source and destination arguments. In code that wanted to preserve
both source arguments, the FreeBSD macros required a temporary copy of
one of the source arguments.
This patch set allows to unconditionally provide functions and macros
expected by 3rd party software written for GLIBC based systems, but
breaks builds of externally maintained sources that use any of the
following macros: CPU_AND, CPU_ANDNOT, CPU_OR, CPU_XOR.
One contributed driver (contrib/ofed/libmlx5) has been patched to
support both the old and the new CPU_OR signatures. If this commit
is merged to -STABLE, the version test will have to be extended to
cover more ranges.
Ports that have added -D_WITH_CPU_SET_T to build on -CURRENT do
no longer require that option.
The FreeBSD version has been bumped to 1400046 to reflect this
incompatible change.
Reviewed by: kib
MFC after: 2 weeks
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D33451
When a DMA request using bounce pages completes, a swi is triggered to
schedule pending DMA requests using the just-freed bounce pages. For
a long time this bus_dma swi has been tied to a "virtual memory" swi
(swi_vm). However, all of the swi_vm implementations are the same and
consist of checking a flag (busdma_swi_pending) which is always true
and if set calling busdma_swi. I suspect this dates back to the
pre-SMPng days and that the intention was for swi_vm to serve as a
mux. However, in the current scheme there's no need for the mux.
Instead, remove swi_vm and vm_ih. Each bus_dma implementation that
uses bounce pages is responsible for creating its own swi (busdma_ih)
which it now schedules directly. This swi invokes busdma_swi directly
removing the need for busdma_swi_pending.
One consequence is that the swi now works on RISC-V which had previously
failed to invoke busdma_swi from swi_vm.
Reviewed by: imp, kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D33447
Summary:
Disable timebase on (some) AIM platforms (tested on PowerMac G4) prior
to synchronization.
Some platforms use a GPIO to enable and disable timebase, while others
use a platform function.
This mirrors 0d69f00b on mpc85xx.
Todo:
* Implement various G5 timebase controls.
* Print out platform code on unknown G5s so we can collect it.
* Change API to be give/take pairs like Linux does so it's possible to
do a software sync protocol.
Reviewed By: #powerpc, jhibbits
Subscribers: mikael, markmi_dsl-only.net, luporl, alfredo
Tags: #powerpc
Differential Revision: https://reviews.freebsd.org/D29136
The calculation of Maxmem was skipping the last phys_avail segment,
because of a wrong stop condition.
This was detected when using QEMU/PowerNV with Radix MMU and low
memory (2G). In this case opal_pci would allocate a DMA window that
was too small to cover all physical memory, resulting in reading all
zeroes from disk when using memory that was not inside the allocated
window.
Reviewed by: jhibbits
Sponsored by: Instituto de Pesquisas Eldorado (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D33449
MFC after: 2 weeks
The header exports the following:
- Definition of struct tcb.
- Helpers to get/set the tcb for the current thread.
- TLS_TCB_SIZE (size of TCB)
- TLS_TCB_ALIGN (alignment of TCB)
- TLS_VARIANT_I or TLS_VARIANT_II
- TLS_DTV_OFFSET (bias of pointers in dtv[])
- TLS_TP_OFFSET (bias of "thread pointer" relative to TCB)
Note that TLS_TP_OFFSET does not account for if the unbiased thread
pointer points to the start of the TCB (arm and x86) or the end of the
TCB (MIPS, PowerPC, and RISC-V).
Note also that for amd64, the struct tcb does not include the unused
tcb_spare field included in the current structure in libthr. libthr
does not use this field, and the existing calls in libc and rtld that
allocate a TCB for amd64 assume it is the size of 3 Elf_Addr's (and
thus do not allocate room for tcb_spare).
A <sys/_tls_variant_i.h> header is used by architectures using
Variant I TLS which uses a common struct tcb.
Reviewed by: kib (older version of x86/tls.h), jrtc27
Sponsored by: The University of Cambridge, Google Inc.
Differential Revision: https://reviews.freebsd.org/D33351
After a round of cleanups in late 2020, all definitions are
functionally identical.
This removes a rotted __aligned(8) on arm. It was added in
b7112ead32 and was intended to align the
args member so that 64-bit types (off_t, etc) could be safely read on
armeb compiled with clang. With the removal of armev, this is no
longer needed (armv7 requires that 32-bit aligned reads of 64-bit
values be supported and we enable such support on armv6). As further
evidence this is unnecessary, cleanups to struct syscall_args have
resulted in args being 32-bit aligned on 32-bit systems. The sole
effect is to bloat the struct by 4 bytes.
Reviewed by: kib, jhb, imp
Differential Revision: https://reviews.freebsd.org/D33308
We do not consider the space reserved for the pcb to be part of the
total kstack size, so it should not be included in the calculation of
the used stack size.
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Add configuration file to be used by "FreeBSD-<branch>-powerpc64le-LINT"
CI/Jenkins job
Reviewed by: lwhsu
MFC after: 2 days
Sponsored by: Instituto de Pesquisas Eldorado (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D33136
When using QEMU PowerNV with latest op-build release (v2.7), its
kexec transfers control to FreeBSD kernel in BE mode, causing an
instant exception on LE kernels. Make kboot able to detect and
swap endian to fix this.
Reviewed by: imp
Sponsored by: Instituto de Pesquisas Eldorado (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D33104
Add a new ioctl to vt to make it possible to export RGB offsets
set by vt drivers. This is needed to fix colors on X and Mesa
on some machines, especially on modern PowerPC64 BE ones.
With the appropriate changes in SCFB, to use this ioctl to find
out the correct RGB offsets, this fixes wrong colors on Talos II
and Blackbird, when used with their built-in video cards.
Reviewed by: alfredo
Sponsored by: Instituto de Pesquisas Eldorado (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D29000
in_cksum() and related routines are implemented separately for each
platform, but only i386 and arm have optimized versions. Other
platforms' copies of in_cksum.c are identical except for style
differences and support for big-endian CPUs.
Deduplicate the implementations for the rest of the platforms. This
will make it easier to implement in_cksum() for unmapped mbufs. On arm
and i386, define HAVE_MD_IN_CKSUM to mean that the MI implementation is
not to be compiled.
No functional change intended.
Reviewed by: kp, glebius
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33095
It was never implemented on powerpc or riscv and appears to have been
unused since it was added in 1998. No functional change intended.
Reviewed by: kp, glebius, cy
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33093
When constructing the set of dumpable pages, use the bitset provided by
the state argument, rather than assuming vm_page_dump invariably. For
normal kernel minidumps this will be a pointer to vm_page_dump, but when
dumping the live system it will not.
To do this, the functions in vm_dumpset.h are extended to accept the
desired bitset as an argument. Note that this provided bitset is assumed
to be derived from vm_page_dump, and therefore has the same size.
Reviewed by: kib, markj, jhb
MFC after: 2 weeks
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D31992
Don't assume we are dumping the global message buffer, but use the one
provided by the state argument. While here, drop superfluous
cast to char *.
Reviewed by: markj, jhb
MFC after: 2 weeks
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D31991
The minidump code is written assuming that certain global state will not
change, and rightly so, since it executes from a kernel debugger
context. In order to support taking minidumps of a live system, we
should allow copies of relevant global state that is likely to change to
be passed as parameters to the minidumpsys() function.
This patch does the work of parameterizing this function, by adding a
struct minidumpstate argument. For now, this struct allows for copies of
the kernel message buffer, and the bitset that tracks which pages should
be dumped (vm_page_dump). Follow-up changes will actually make use of
these arguments.
Notably, dump_avail[] does not need a snapshot, since it is not expected
to change after system initialization.
The existing minidumpsys() definitions are renamed, and a thin MI
wrapper is added to kern_dump.c, which handles the construction of
the state struct. Thus, calling minidumpsys() remains as simple as
before.
Reviewed by: kib, markj, jhb
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D31989
Define CC_NEWRENO in all the appropriate DEFAULTS and std.* config
files. It's the default congestion control algorithm. Add code to cc.c
so that CC_DEFAULT is "newreno" if it's not overriden in the config
file.
Sponsored by: Netflix
Fixes: b8d60729de ("tcp: Congestion control cleanup.")
Revired by: manu, hselasky, jhb, glebius, tuexen
Differential Revision: https://reviews.freebsd.org/D32964
NOTE: HEADS UP read the note below if your kernel config is not including GENERIC!!
This patch does a bit of cleanup on TCP congestion control modules. There were some rather
interesting surprises that one could get i.e. where you use a socket option to change
from one CC (say cc_cubic) to another CC (say cc_vegas) and you could in theory get
a memory failure and end up on cc_newreno. This is not what one would expect. The
new code fixes this by requiring a cc_data_sz() function so we can malloc with M_WAITOK
and pass in to the init function preallocated memory. The CC init is expected in this
case *not* to fail but if it does and a module does break the
"no fail with memory given" contract we do fall back to the CC that was in place at the time.
This also fixes up a set of common newreno utilities that can be shared amongst other
CC modules instead of the other CC modules reaching into newreno and executing
what they think is a "common and understood" function. Lets put these functions in
cc.c and that way we have a common place that is easily findable by future developers or
bug fixers. This also allows newreno to evolve and grow support for its features i.e. ABE
and HYSTART++ without having to dance through hoops for other CC modules, instead
both newreno and the other modules just call into the common functions if they desire
that behavior or roll there own if that makes more sense.
Note: This commit changes the kernel configuration!! If you are not using GENERIC in
some form you must add a CC module option (one of CC_NEWRENO, CC_VEGAS, CC_CUBIC,
CC_CDG, CC_CHD, CC_DCTCP, CC_HTCP, CC_HD). You can have more than one defined
as well if you desire. Note that if you create a kernel configuration that does not
define a congestion control module and includes INET or INET6 the kernel compile will
break. Also you need to define a default, generic adds 'options CC_DEFAULT=\"newreno\"
but you can specify any string that represents the name of the CC module (same names
that show up in the CC module list under net.inet.tcp.cc). If you fail to add the
options CC_DEFAULT in your kernel configuration the kernel build will also break.
Reviewed by: Michael Tuexen
Sponsored by: Netflix Inc.
RELNOTES:YES
Differential Revision: https://reviews.freebsd.org/D32693
sched_throw() can no longer take a NULL thread, APs enter through
sched_ap_entry() instead. This completely removes branching in the
common case and cleans up both paths. No functional change intended.
Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D32829
schedinit_ap() sets up an AP for a later call to sched_throw(NULL).
Currently, ULE sets up some pcpu bits and fixes the idlethread lock with
a call to sched_throw(NULL); this results in a window where curthread is
setup in platforms' init_secondary(), but it has the wrong td_lock.
Typical platform AP startup procedure looks something like:
- Setup curthread
- ... other stuff, including cpu_initclocks_ap()
- Signal smp_started
- sched_throw(NULL) to enter the scheduler
cpu_initclocks_ap() may have callouts to process (e.g., nvme) and
attempt to sched_add() for this AP, but this attempt fails because
of the noted violated assumption leading to locking heartburn in
sched_setpreempt().
Interrupts are still disabled until cpu_throw() so we're not really at
risk of being preempted -- just let the scheduler in on it a little
earlier as part of setting up curthread.
Reviewed by: alfredo, kib, markj
Triage help from: andrew, markj
Smoke-tested by: alfredo (ppc), kevans (arm64, x86), mhorne (arm)
Differential Revision: https://reviews.freebsd.org/D32797
Remove page zeroing code from consumers and stop specifying
VM_ALLOC_NOOBJ. In a few places, also convert an allocation loop to
simply use VM_ALLOC_WAITOK.
Similarly, convert vm_page_alloc_domain() callers.
Note that callers are now responsible for assigning the pindex.
Reviewed by: alc, hselasky, kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31986
As Radix MMU with superpages enabled is now stable, make it the
default choice on supported hardware (POWER9 and above), since its
performance is greater than that of HPT MMU.
Reviewed by: alfredo, jhibbits
Sponsored by: Instituto de Pesquisas Eldorado (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D30797
Current implementation of Radix MMU doesn't support mapping
arbitrary virtual addresses, such as the ones generated by
"direct mapping" I/O addresses. This caused the system to hang, when
early I/O addresses, such as those used by OpenFirmware Frame Buffer,
were remapped after the MMU was up.
To avoid having to modify mmu_radix_kenter_attr just to support this
use case, this change makes early I/O map use virtual addresses from
KVA area instead (similar to what mmu_radix_mapdev_attr does), as
these can be safely remapped later.
Reviewed by: alfredo (earlier version), jhibbits (in irc)
MFC after: 2 weeks
Sponsored by: Instituto de Pesquisas Eldorado (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D31232
This partially reverts e81e77c5a0, leaving the option both in
GENERICs on amd64/arm64/arm, and in global NOTES file. Apparently
this better matches existing practice, where we do not try to hard
to make LINT and GENERIC complimentary.
Requested and reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Remove the option from NOTES/LINT, and add to NOTES for powerpc and
riscv.
PR: 259036
Requested by: John Hay <john@sanren.ac.za>
Discussed with: ian, imp
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
The implementation of the progress bar is simple, but duplicated for
most minidump implementations. Extract the common bits to kern_dump.c.
Ensure that the bar is reset with each subsequent dump; this was only
done on some platforms previously.
Reviewed by: markj
MFC after: 2 weeks
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D31885
The function is identical in each minidump implementation, so move it to
vm_phys.c. The only slight exception is powerpc where the function was
public, for use in moea64_scan_pmap().
Reviewed by: kib, markj, imp (earlier version)
MFC after: 2 weeks
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D31884
When running in a virtualized environment, TLB invalidations can only
be performed on process scope, as only the hypervisor is allowed to
invalidate a global scope, or else a Program Interrupt is triggered.
Since we are here, also make sure that the register process table
hypercall returns success.
Reviewed by: jhibbits
MFC after: 2 weeks
Sponsored by: Instituto de Pesquisas Eldorado (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D31775
Summary:
Failure to update the FP / vector state was causing daemon(3) to violate C ABI by failing to preserve nonvolatile registers.
This was causing a weird issue where moused was not working on PowerBook G4s when daemonizing, but was working fine when running it foreground.
Force saving off the same state that cpu_switch() does in cases where we are about to copy a thread.
MFC after: 1 week
Sponsored by: Tag1 Consulting, Inc.
Test Plan:
```
/*
* Test for ABI violation due to side effects of daemon(3).
*
* NOTE: Compile with -O2 to see the effect.
*/
/* Allow compiling for Linux too. */
static double test = 1234.56f;
/*
* This contrivance coerces clang to not bounce the double
* off of memory again in main.
*/
void __attribute__((noinline))
print_double(int j1, int j2, double d)
{
printf("%f\n", d);
}
int
main(int argc, char *argv[])
{
print_double(0, 0, test);
if (daemon(0, 1)) {
}
/* Compiler assumes nonvolatile regs are intact... */
print_double(0, 0, test);
return(0);
}
```
Working output:
```
1234.560059
1234.560059
```
Output in broken case:
```
1234.560059
0.0
```
Reviewers: #powerpc
Subscribers: jhibbits, luporl, alfredo
Tags: #powerpc
Differential Revision: https://reviews.freebsd.org/D29851
Move the common kernel function signatures from machine/reg.h to a new
sys/reg.h. This is in preperation for adding PT_GETREGSET to ptrace(2).
Reviewed by: imp, markj
Sponsored by: DARPA, AFRL (original work)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19830
ISA 3.0 allows for nested radix translations with minimal to no
involvement of the hypervisor. This should make pseries signficantly
faster on POWER9 pseries instances, as fewer hypercalls are needed to
manage pmap now.
MFC after: 2 weeks
Relnotes: yes
which is the place to put MD asserts about allocated pages.
On amd64, verify that allocated page does not belong to the kernel
(text, data) or early allocated pages.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31121
As the Processor Version Register (PVR) is a 32-bit PowerPC
register, change mfpvr() return type to match it and avoid
type casts on its callers.
Suggested by: jhibbits
Reviewed by: jhibbits, imp
Sponsored by: Instituto de Pesquisas Eldorado (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D31332