Commit graph

2463 commits

Author SHA1 Message Date
Linus Torvalds c89d780cc1 arm64 updates for 6.11:
* Virtual CPU hotplug support for arm64 ACPI systems
 
 * cpufeature infrastructure cleanups and making the FEAT_ECBHB ID bits
   visible to guests
 
 * CPU errata: expand the speculative SSBS workaround to more CPUs
 
 * arm64 ACPI:
 
   - acpi=nospcr option to disable SPCR as default console for arm64
 
   - Move some ACPI code (cpuidle, FFH) to drivers/acpi/arm64/
 
 * GICv3, use compile-time PMR values: optimise the way regular IRQs are
   masked/unmasked when GICv3 pseudo-NMIs are used, removing the need for
   a static key in fast paths by using a priority value chosen
   dynamically at boot time
 
 * arm64 perf updates:
 
   - Rework of the IMX PMU driver to enable support for I.MX95
 
   - Enable support for tertiary match groups in the CMN PMU driver
 
   - Initial refactoring of the CPU PMU code to prepare for the fixed
     instruction counter introduced by Arm v9.4
 
   - Add missing PMU driver MODULE_DESCRIPTION() strings
 
   - Hook up DT compatibles for recent CPU PMUs
 
 * arm64 kselftest updates:
 
   - Kernel mode NEON fp-stress
 
   - Cleanups, spelling mistakes
 
 * arm64 Documentation update with a minor clarification on TBI
 
 * Miscellaneous:
 
   - Fix missing IPI statistics
 
   - Implement raw_smp_processor_id() using thread_info rather than a
     per-CPU variable (better code generation)
 
   - Make MTE checking of in-kernel asynchronous tag faults conditional
     on KASAN being enabled
 
   - Minor cleanups, typos
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmaQKN4ACgkQa9axLQDI
 XvE0Nw/+JZ6OEQ+DMUHXZfbWanvn1p0nVOoEV3MYVpOeQK1ILYCoDapatLNIlet0
 wcja7tohKbL1ifc7GOqlkitu824LMlotncrdOBycRqb/4C5KuJ+XhygFv5hGfX0T
 Uh2zbo4w52FPPEUMICfEAHrKT3QB9tv7f66xeUNbWWFqUn3rY02/ZVQVVdw6Zc0e
 fVYWGUUoQDR7+9hRkk6tnYw3+9YFVAUAbLWk+DGrW7WsANi6HuJ/rBMibwFI6RkG
 SZDZHum6vnwx0Dj9H7WrYaQCvUMm7AlckhQGfPbIFhUk6pWysfJtP5Qk49yiMl7p
 oRk/GrSXpiKumuetgTeOHbokiE1Nb8beXx0OcsjCu4RrIaNipAEpH1AkYy5oiKoT
 9vKZErMDtQgd96JHFVaXc+A3D2kxVfkc1u7K3TEfVRnZFV7CN+YL+61iyZ+uLxVi
 d9xrAmwRsWYFVQzlZG3NWvSeQBKisUA1L8JROlzWc/NFDwTqDGIt/zS4pZNL3+OM
 EXW0LyKt7Ijl6vPXKCXqrODRrPlcLc66VMZxofZOl0/dEqyJ+qLL4GUkWZu8lTqO
 BqydYnbTSjiDg/ntWjTrD0uJ8c40Qy7KTPEdaPqEIQvkDEsUGlOnhAQjHrnGNb9M
 psZtpDW2xm7GykEOcd6rgSz4Xeky2iLsaR4Wc7FTyDS0YRmeG44=
 =ob2k
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Catalin Marinas:
 "The biggest part is the virtual CPU hotplug that touches ACPI,
  irqchip. We also have some GICv3 optimisation for pseudo-NMIs that has
  been queued via the arm64 tree. Otherwise the usual perf updates,
  kselftest, various small cleanups.

  Core:

   - Virtual CPU hotplug support for arm64 ACPI systems

   - cpufeature infrastructure cleanups and making the FEAT_ECBHB ID
     bits visible to guests

   - CPU errata: expand the speculative SSBS workaround to more CPUs

   - GICv3, use compile-time PMR values: optimise the way regular IRQs
     are masked/unmasked when GICv3 pseudo-NMIs are used, removing the
     need for a static key in fast paths by using a priority value
     chosen dynamically at boot time

  ACPI:

   - 'acpi=nospcr' option to disable SPCR as default console for arm64

   - Move some ACPI code (cpuidle, FFH) to drivers/acpi/arm64/

  Perf updates:

   - Rework of the IMX PMU driver to enable support for I.MX95

   - Enable support for tertiary match groups in the CMN PMU driver

   - Initial refactoring of the CPU PMU code to prepare for the fixed
     instruction counter introduced by Arm v9.4

   - Add missing PMU driver MODULE_DESCRIPTION() strings

   - Hook up DT compatibles for recent CPU PMUs

  Kselftest updates:

   - Kernel mode NEON fp-stress

   - Cleanups, spelling mistakes

  Miscellaneous:

   - arm64 Documentation update with a minor clarification on TBI

   - Fix missing IPI statistics

   - Implement raw_smp_processor_id() using thread_info rather than a
     per-CPU variable (better code generation)

   - Make MTE checking of in-kernel asynchronous tag faults conditional
     on KASAN being enabled

   - Minor cleanups, typos"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (69 commits)
  selftests: arm64: tags: remove the result script
  selftests: arm64: tags_test: conform test to TAP output
  perf: add missing MODULE_DESCRIPTION() macros
  arm64: smp: Fix missing IPI statistics
  irqchip/gic-v3: Fix 'broken_rdists' unused warning when !SMP and !ACPI
  ACPI: Add acpi=nospcr to disable ACPI SPCR as default console on ARM64
  Documentation: arm64: Update memory.rst for TBI
  arm64/cpufeature: Replace custom macros with fields from ID_AA64PFR0_EL1
  KVM: arm64: Replace custom macros with fields from ID_AA64PFR0_EL1
  perf: arm_pmuv3: Include asm/arm_pmuv3.h from linux/perf/arm_pmuv3.h
  perf: arm_v6/7_pmu: Drop non-DT probe support
  perf/arm: Move 32-bit PMU drivers to drivers/perf/
  perf: arm_pmuv3: Drop unnecessary IS_ENABLED(CONFIG_ARM64) check
  perf: arm_pmuv3: Avoid assigning fixed cycle counter with threshold
  arm64: Kconfig: Fix dependencies to enable ACPI_HOTPLUG_CPU
  perf: imx_perf: add support for i.MX95 platform
  perf: imx_perf: fix counter start and config sequence
  perf: imx_perf: refactor driver for imx93
  perf: imx_perf: let the driver manage the counter usage rather the user
  perf: imx_perf: add macro definitions for parsing config attr
  ...
2024-07-15 17:06:19 -07:00
Catalin Marinas 3346c56685 Merge branches 'for-next/cpufeature', 'for-next/misc', 'for-next/kselftest', 'for-next/mte', 'for-next/errata', 'for-next/acpi', 'for-next/gic-v3-pmr' and 'for-next/doc', remote-tracking branch 'arm64/for-next/perf' into for-next/core
* arm64/for-next/perf:
  perf: add missing MODULE_DESCRIPTION() macros
  perf: arm_pmuv3: Include asm/arm_pmuv3.h from linux/perf/arm_pmuv3.h
  perf: arm_v6/7_pmu: Drop non-DT probe support
  perf/arm: Move 32-bit PMU drivers to drivers/perf/
  perf: arm_pmuv3: Drop unnecessary IS_ENABLED(CONFIG_ARM64) check
  perf: arm_pmuv3: Avoid assigning fixed cycle counter with threshold
  perf: imx_perf: add support for i.MX95 platform
  perf: imx_perf: fix counter start and config sequence
  perf: imx_perf: refactor driver for imx93
  perf: imx_perf: let the driver manage the counter usage rather the user
  perf: imx_perf: add macro definitions for parsing config attr
  dt-bindings: perf: fsl-imx-ddr: Add i.MX95 compatible
  perf: pmuv3: Add new Cortex and Neoverse PMUs
  dt-bindings: arm: pmu: Add new Cortex and Neoverse cores
  perf/arm-cmn: Enable support for tertiary match group
  perf/arm-cmn: Decouple wp_config registers from filter group number

* for-next/cpufeature:
  : Various cpufeature infrastructure patches
  arm64/cpufeature: Replace custom macros with fields from ID_AA64PFR0_EL1
  KVM: arm64: Replace custom macros with fields from ID_AA64PFR0_EL1
  arm64/cpufeatures/kvm: Add ARMv8.9 FEAT_ECBHB bits in ID_AA64MMFR1 register

* for-next/misc:
  : Miscellaneous patches
  arm64: smp: Fix missing IPI statistics
  arm64: Cleanup __cpu_set_tcr_t0sz()
  arm64/mm: Stop using ESR_ELx_FSC_TYPE during fault
  arm64: Kconfig: fix typo in __builtin_return_adddress
  ARM64: reloc_test: add missing MODULE_DESCRIPTION() macro
  arm64: implement raw_smp_processor_id() using thread_info
  arm64/arch_timer: include <linux/percpu.h>

* for-next/kselftest:
  : arm64 kselftest updates
  selftests: arm64: tags: remove the result script
  selftests: arm64: tags_test: conform test to TAP output
  kselftest/arm64: Fix a couple of spelling mistakes
  kselftest/arm64: Fix redundancy of a testcase
  kselftest/arm64: Include kernel mode NEON in fp-stress

* for-next/mte:
  : MTE updates
  arm64: mte: Make mte_check_tfsr_*() conditional on KASAN instead of MTE

* for-next/errata:
  : Arm CPU errata workarounds
  arm64: errata: Expand speculative SSBS workaround
  arm64: errata: Unify speculative SSBS errata logic
  arm64: cputype: Add Cortex-X925 definitions
  arm64: cputype: Add Cortex-A720 definitions
  arm64: cputype: Add Cortex-X3 definitions

* for-next/acpi:
  : arm64 ACPI patches
  ACPI: Add acpi=nospcr to disable ACPI SPCR as default console on ARM64
  ACPI / amba: Drop unnecessary check for registered amba_dummy_clk
  arm64: FFH: Move ACPI specific code into drivers/acpi/arm64/
  arm64: cpuidle: Move ACPI specific code into drivers/acpi/arm64/
  ACPI: arm64: Sort entries alphabetically

* for-next/gic-v3-pmr:
  : arm64: irqchip/gic-v3: Use compiletime constant PMR values
  arm64: irqchip/gic-v3: Select priorities at boot time
  irqchip/gic-v3: Detect GICD_CTRL.DS and SCR_EL3.FIQ earlier
  irqchip/gic-v3: Make distributor priorities variables
  irqchip/gic-common: Remove sync_access callback
  wordpart.h: Add REPEAT_BYTE_U32()

* for-next/doc:
  : arm64 documentation updates
  Documentation: arm64: Update memory.rst for TBI
2024-07-11 19:09:35 +01:00
Anshuman Khandual 42b9fed388 KVM: arm64: Replace custom macros with fields from ID_AA64PFR0_EL1
This replaces custom macros usage (i.e ID_AA64PFR0_EL1_ELx_64BIT_ONLY and
ID_AA64PFR0_EL1_ELx_32BIT_64BIT) and instead directly uses register fields
from ID_AA64PFR0_EL1 sysreg definition.

Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: kvmarm@lists.linux.dev
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20240613102710.3295108-2-anshuman.khandual@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-07-04 12:21:12 +01:00
Rob Herring (Arm) d688ffa269 perf: arm_pmuv3: Include asm/arm_pmuv3.h from linux/perf/arm_pmuv3.h
The arm64 asm/arm_pmuv3.h depends on defines from
linux/perf/arm_pmuv3.h. Rather than depend on include order, follow the
usual pattern of "linux" headers including "asm" headers of the same
name.

With this change, the include of linux/kvm_host.h is problematic due to
circular includes:

In file included from ../arch/arm64/include/asm/arm_pmuv3.h:9,
                 from ../include/linux/perf/arm_pmuv3.h:312,
                 from ../include/kvm/arm_pmu.h:11,
                 from ../arch/arm64/include/asm/kvm_host.h:38,
                 from ../arch/arm64/mm/init.c:41:
../include/linux/kvm_host.h:383:30: error: field 'arch' has incomplete type

Switching to asm/kvm_host.h solves the issue.

Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://lore.kernel.org/r/20240626-arm-pmu-3-9-icntr-v2-5-c9784b4f4065@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2024-07-03 14:07:14 +01:00
Vincent Donnefort d66e50beb9 KVM: arm64: FFA: Release hyp rx buffer
According to the FF-A spec (Buffer states and ownership), after a
producer has written into a buffer, it is "full" and now owned by the
consumer. The producer won't be able to use that buffer, until the
consumer hands it over with an invocation such as RX_RELEASE.

It is clear in the following paragraph (Transfer of buffer ownership),
that MEM_RETRIEVE_RESP is transferring the ownership from producer (in
our case SPM) to consumer (hypervisor). RX_RELEASE is therefore
mandatory here.

It is less clear though what is happening with MEM_FRAG_TX. But this
invocation, as a response to MEM_FRAG_RX writes into the same hypervisor
RX buffer (see paragraph "Transmission of transaction descriptor in
fragments"). Also this is matching the TF-A implementation where the RX
buffer is marked "full" during a MEM_FRAG_RX.

Release the RX hypervisor buffer in those two cases. This will unblock
later invocations using this buffer which would otherwise fail.
(RETRIEVE_REQ, MEM_FRAG_RX and PARTITION_INFO_GET).

Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Link: https://lore.kernel.org/r/20240611175317.1220842-1-vdonnefort@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-11 19:39:22 +01:00
Marc Zyngier 0d92e4a7ff KVM: arm64: Disassociate vcpus from redistributor region on teardown
When tearing down a redistributor region, make sure we don't have
any dangling pointer to that region stored in a vcpu.

Fixes: e5a3563546 ("kvm: arm64: vgic-v3: Introduce vgic_v3_free_redist_region()")
Reported-by: Alexander Potapenko <glider@google.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20240605175637.1635653-1-maz@kernel.org
Cc: stable@vger.kernel.org
2024-06-06 08:54:15 +01:00
Fuad Tabba afb91f5f8a KVM: arm64: Ensure that SME controls are disabled in protected mode
KVM (and pKVM) do not support SME guests. Therefore KVM ensures
that the host's SME state is flushed and that SME controls for
enabling access to ZA storage and for streaming are disabled.

pKVM needs to protect against a buggy/malicious host. Ensure that
it wouldn't run a guest when protected mode is enabled should any
of the SME controls be enabled.

Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20240603122852.3923848-10-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-04 15:06:33 +01:00
Fuad Tabba a69283ae1d KVM: arm64: Refactor CPACR trap bit setting/clearing to use ELx format
When setting/clearing CPACR bits for EL0 and EL1, use the ELx
format of the bits, which covers both. This makes the code
clearer, and reduces the chances of accidentally missing a bit.

No functional change intended.

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20240603122852.3923848-9-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-04 15:06:33 +01:00
Fuad Tabba 1696fc2174 KVM: arm64: Consolidate initializing the host data's fpsimd_state/sve in pKVM
Now that we have introduced finalize_init_hyp_mode(), lets
consolidate the initializing of the host_data fpsimd_state and
sve state.

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20240603122852.3923848-8-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-04 15:06:33 +01:00
Fuad Tabba b5b9955617 KVM: arm64: Eagerly restore host fpsimd/sve state in pKVM
When running in protected mode we don't want to leak protected
guest state to the host, including whether a guest has used
fpsimd/sve. Therefore, eagerly restore the host state on guest
exit when running in protected mode, which happens only if the
guest has used fpsimd/sve.

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20240603122852.3923848-7-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-04 15:06:33 +01:00
Fuad Tabba 66d5b53e20 KVM: arm64: Allocate memory mapped at hyp for host sve state in pKVM
Protected mode needs to maintain (save/restore) the host's sve
state, rather than relying on the host kernel to do that. This is
to avoid leaking information to the host about guests and the
type of operations they are performing.

As a first step towards that, allocate memory mapped at hyp, per
cpu, for the host sve state. The following patch will use this
memory to save/restore the host state.

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20240603122852.3923848-6-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-04 15:06:33 +01:00
Fuad Tabba e511e08a9f KVM: arm64: Specialize handling of host fpsimd state on trap
In subsequent patches, n/vhe will diverge on saving the host
fpsimd/sve state when taking a guest fpsimd/sve trap. Add a
specialized helper to handle it.

No functional change intended.

Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20240603122852.3923848-5-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-04 15:06:33 +01:00
Fuad Tabba 6d8fb3cbf7 KVM: arm64: Abstract set/clear of CPTR_EL2 bits behind helper
The same traps controlled by CPTR_EL2 or CPACR_EL1 need to be
toggled in different parts of the code, but the exact bits and
their polarity differ between these two formats and the mode
(vhe/nvhe/hvhe).

To reduce the amount of duplicated code and the chance of getting
the wrong bit/polarity or missing a field, abstract the set/clear
of CPTR_EL2 bits behind a helper.

Since (h)VHE is the way of the future, use the CPACR_EL1 format,
which is a subset of the VHE CPTR_EL2, as a reference.

No functional change intended.

Suggested-by: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20240603122852.3923848-4-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-04 15:06:33 +01:00
Fuad Tabba 45f4ea9bcf KVM: arm64: Fix prototype for __sve_save_state/__sve_restore_state
Since the prototypes for __sve_save_state/__sve_restore_state at
hyp were added, the underlying macro has acquired a third
parameter for saving/restoring ffr.

Fix the prototypes to account for the third parameter, and
restore the ffr for the guest since it is saved.

Suggested-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20240603122852.3923848-3-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-04 15:06:32 +01:00
Fuad Tabba 87bb39ed40 KVM: arm64: Reintroduce __sve_save_state
Now that the hypervisor is handling the host sve state in
protected mode, it needs to be able to save it.

This reverts commit e66425fc9b ("KVM: arm64: Remove unused
__sve_save_state").

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20240603122852.3923848-2-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-04 15:06:32 +01:00
Marc Zyngier 47eb2d68d1 KVM: arm64: nv: Expose BTI and CSV_frac to a guest hypervisor
Now that we expose PAC to NV guests, we can also expose BTI (as
the two as joined at the hip, due to some of the PAC instructions
being landing pads).

While we're at it, also propagate CSV_frac, which requires no
particular emulation.

Fixes: f4f6a95bac ("KVM: arm64: nv: Advertise support for PAuth")
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240528100632.1831995-3-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-05-30 17:36:22 +01:00
Marc Zyngier 41011e2de3 KVM: arm64: nv: Fix relative priorities of exceptions generated by ERETAx
ERETAx can fail in multiple ways:

(1) ELR_EL2 points lalaland
(2) we get a PAC failure
(3) SPSR_EL2 has the wrong mode

(1) is easy, as we just let the CPU do its thing and deliver an
Instruction Abort. However, (2) and (3) are interesting, because
the PAC failure priority is way below that of the Illegal Execution
State exception.

Which means that if we have detected a PAC failure (and that we have
FPACCOMBINE), we must be careful to give priority to the Illegal
Execution State exception, should one be pending.

Solving this involves hoisting the SPSR calculation earlier and
testing for the IL bit before injecting the FPAC exception.

In the extreme case of a ERETAx returning to an invalid mode *and*
failing its PAC check, we end up with an Instruction Abort (due
to the new PC being mangled by the failed Auth) *and* PSTATE.IL
being set. Which matches the requirements of the architecture.

Whilst we're at it, remove a stale comment that states the obvious
and only confuses the reader.

Fixes: 213b3d1ea1 ("KVM: arm64: nv: Handle ERETA[AB] instructions")
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240528100632.1831995-2-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-05-30 17:36:22 +01:00
Marc Zyngier c92e8b9eac KVM: arm64: AArch32: Fix spurious trapping of conditional instructions
We recently upgraded the view of ESR_EL2 to 64bit, in keeping with
the requirements of the architecture.

However, the AArch32 emulation code was left unaudited, and the
(already dodgy) code that triages whether a trap is spurious or not
(because the condition code failed) broke in a subtle way:

If ESR_EL2.ISS2 is ever non-zero (unlikely, but hey, this is the ARM
architecture we're talking about), the hack that tests the top bits
of ESR_EL2.EC will break in an interesting way.

Instead, use kvm_vcpu_trap_get_class() to obtain the EC, and list
all the possible ECs that can fail a condition code check.

While we're at it, add SMC32 to the list, as it is explicitly listed
as being allowed to trap despite failing a condition code check (as
described in the HCR_EL2.TSC documentation).

Fixes: 0b12620fdd ("KVM: arm64: Treat ESR_EL2 as a 64-bit register")
Cc: stable@vger.kernel.org
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240524141956.1450304-4-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-05-27 17:46:09 +01:00
Marc Zyngier dfe6d190f3 KVM: arm64: Allow AArch32 PSTATE.M to be restored as System mode
It appears that we don't allow a vcpu to be restored in AArch32
System mode, as we *never* included it in the list of valid modes.

Just add it to the list of allowed modes.

Fixes: 0d854a60b1 ("arm64: KVM: enable initialization of a 32bit vcpu")
Cc: stable@vger.kernel.org
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240524141956.1450304-3-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-05-27 17:45:35 +01:00
Marc Zyngier 947051e361 KVM: arm64: Fix AArch32 register narrowing on userspace write
When userspace writes to one of the core registers, we make
sure to narrow the corresponding GPRs if PSTATE indicates
an AArch32 context.

The code tries to check whether the context is EL0 or EL1 so
that it narrows the correct registers. But it does so by checking
the full PSTATE instead of PSTATE.M.

As a consequence, and if we are restoring an AArch32 EL0 context
in a 64bit guest, and that PSTATE has *any* bit set outside of
PSTATE.M, we narrow *all* registers instead of only the first 15,
destroying the 64bit state.

Obviously, this is not something the guest is likely to enjoy.

Correctly masking PSTATE to only evaluate PSTATE.M fixes it.

Fixes: 90c1f934ed ("KVM: arm64: Get rid of the AArch32 register mapping code")
Reported-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com>
Cc: stable@vger.kernel.org
Reviewed-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240524141956.1450304-2-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-05-27 17:45:21 +01:00
Linus Torvalds ff9a79307f Kbuild updates for v6.10
- Avoid 'constexpr', which is a keyword in C23
 
  - Allow 'dtbs_check' and 'dt_compatible_check' run independently of
    'dt_binding_check'
 
  - Fix weak references to avoid GOT entries in position-independent
    code generation
 
  - Convert the last use of 'optional' property in arch/sh/Kconfig
 
  - Remove support for the 'optional' property in Kconfig
 
  - Remove support for Clang's ThinLTO caching, which does not work with
    the .incbin directive
 
  - Change the semantics of $(src) so it always points to the source
    directory, which fixes Makefile inconsistencies between upstream and
    downstream
 
  - Fix 'make tar-pkg' for RISC-V to produce a consistent package
 
  - Provide reasonable default coverage for objtool, sanitizers, and
    profilers
 
  - Remove redundant OBJECT_FILES_NON_STANDARD, KASAN_SANITIZE, etc.
 
  - Remove the last use of tristate choice in drivers/rapidio/Kconfig
 
  - Various cleanups and fixes in Kconfig
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmZFlGcVHG1hc2FoaXJv
 eUBrZXJuZWwub3JnAAoJED2LAQed4NsG8voQALC8NtFpduWVfLRj2Qg6Ll/xf1vX
 2igcTJEOFHkeqXLGoT8dTDKLEipUBUvKyguPq66CGwVTe2g6zy/nUSXeVtFrUsIa
 msLTi8FqhqUo5lodNvGMRf8qqmuqcvnXoiQwIocF92jtsFy14bhiFY+n4HfcFNjj
 GOKwqBZYQUwY/VVb090efc7RfS9c7uwABJSBelSoxg3AGZriwjGy7Pw5aSKGgVYi
 inqL1eR6qwPP6z7CgQWM99soP+zwybFZmnQrsD9SniRBI4rtAat8Ih5jQFaSUFUQ
 lk2w0NQBRFN88/uR2IJ2GWuIlQ74WeJ+QnCqVuQ59tV5zw90wqSmLzngfPD057Dv
 JjNuhk0UyXVtpIg3lRtd4810ppNSTe33b9OM4O2H846W/crju5oDRNDHcflUXcwm
 Rmn5ho1rb5QVzDVejJbgwidnUInSgJ9PZcvXQ/RJVZPhpgsBzAY9pQexG1G3hviw
 y9UDrt6KP6bF9tHjmolmtdIes9Pj0c4dN6/Rdj4HS4hIQ/GDar0tnwvOvtfUctNL
 orJlBsA6GeMmDVXKkR0ytOCWRYqWWbyt8g70RVKQJfuHX7/hGyAQPaQ2/u4mQhC2
 aevYfbNJMj0VDfGz81HDBKFtkc5n+Ite8l157dHEl2LEabkOkRdNVcn7SNbOvZmd
 ZCSnZ31h7woGfNho
 =D5B/
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild updates from Masahiro Yamada:

 - Avoid 'constexpr', which is a keyword in C23

 - Allow 'dtbs_check' and 'dt_compatible_check' run independently of
   'dt_binding_check'

 - Fix weak references to avoid GOT entries in position-independent code
   generation

 - Convert the last use of 'optional' property in arch/sh/Kconfig

 - Remove support for the 'optional' property in Kconfig

 - Remove support for Clang's ThinLTO caching, which does not work with
   the .incbin directive

 - Change the semantics of $(src) so it always points to the source
   directory, which fixes Makefile inconsistencies between upstream and
   downstream

 - Fix 'make tar-pkg' for RISC-V to produce a consistent package

 - Provide reasonable default coverage for objtool, sanitizers, and
   profilers

 - Remove redundant OBJECT_FILES_NON_STANDARD, KASAN_SANITIZE, etc.

 - Remove the last use of tristate choice in drivers/rapidio/Kconfig

 - Various cleanups and fixes in Kconfig

* tag 'kbuild-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (46 commits)
  kconfig: use sym_get_choice_menu() in sym_check_prop()
  rapidio: remove choice for enumeration
  kconfig: lxdialog: remove initialization with A_NORMAL
  kconfig: m/nconf: merge two item_add_str() calls
  kconfig: m/nconf: remove dead code to display value of bool choice
  kconfig: m/nconf: remove dead code to display children of choice members
  kconfig: gconf: show checkbox for choice correctly
  kbuild: use GCOV_PROFILE and KCSAN_SANITIZE in scripts/Makefile.modfinal
  Makefile: remove redundant tool coverage variables
  kbuild: provide reasonable defaults for tool coverage
  modules: Drop the .export_symbol section from the final modules
  kconfig: use menu_list_for_each_sym() in sym_check_choice_deps()
  kconfig: use sym_get_choice_menu() in conf_write_defconfig()
  kconfig: add sym_get_choice_menu() helper
  kconfig: turn defaults and additional prompt for choice members into error
  kconfig: turn missing prompt for choice members into error
  kconfig: turn conf_choice() into void function
  kconfig: use linked list in sym_set_changed()
  kconfig: gconf: use MENU_CHANGED instead of SYMBOL_CHANGED
  kconfig: gconf: remove debug code
  ...
2024-05-18 12:39:20 -07:00
Masahiro Yamada 7f7f6f7ad6 Makefile: remove redundant tool coverage variables
Now Kbuild provides reasonable defaults for objtool, sanitizers, and
profilers.

Remove redundant variables.

Note:

This commit changes the coverage for some objects:

  - include arch/mips/vdso/vdso-image.o into UBSAN, GCOV, KCOV
  - include arch/sparc/vdso/vdso-image-*.o into UBSAN
  - include arch/sparc/vdso/vma.o into UBSAN
  - include arch/x86/entry/vdso/extable.o into KASAN, KCSAN, UBSAN, GCOV, KCOV
  - include arch/x86/entry/vdso/vdso-image-*.o into KASAN, KCSAN, UBSAN, GCOV, KCOV
  - include arch/x86/entry/vdso/vdso32-setup.o into KASAN, KCSAN, UBSAN, GCOV, KCOV
  - include arch/x86/entry/vdso/vma.o into GCOV, KCOV
  - include arch/x86/um/vdso/vma.o into KASAN, GCOV, KCOV

I believe these are positive effects because all of them are kernel
space objects.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Roberto Sassu <roberto.sassu@huawei.com>
2024-05-14 23:35:48 +09:00
Paolo Bonzini e5f62e27b1 KVM/arm64 updates for Linux 6.10
- Move a lot of state that was previously stored on a per vcpu
   basis into a per-CPU area, because it is only pertinent to the
   host while the vcpu is loaded. This results in better state
   tracking, and a smaller vcpu structure.
 
 - Add full handling of the ERET/ERETAA/ERETAB instructions in
   nested virtualisation. The last two instructions also require
   emulating part of the pointer authentication extension.
   As a result, the trap handling of pointer authentication has
   been greattly simplified.
 
 - Turn the global (and not very scalable) LPI translation cache
   into a per-ITS, scalable cache, making non directly injected
   LPIs much cheaper to make visible to the vcpu.
 
 - A batch of pKVM patches, mostly fixes and cleanups, as the
   upstreaming process seems to be resuming. Fingers crossed!
 
 - Allocate PPIs and SGIs outside of the vcpu structure, allowing
   for smaller EL2 mapping and some flexibility in implementing
   more or less than 32 private IRQs.
 
 - Purge stale mpidr_data if a vcpu is created after the MPIDR
   map has been created.
 
 - Preserve vcpu-specific ID registers across a vcpu reset.
 
 - Various minor cleanups and improvements.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmY/PT4ACgkQI9DQutE9
 ekNwSA/7BTro0n5gP5/SfSFJeEedigpmHQJtHJk9og0LBzjXZTvYqKpI5J1HnpWE
 AFsDf3aDRPaSCvI+S14LkkK+TmGtVEXUg8YGytQo08IcO2x6xBT/YjpkVOHy23kq
 SGgNMPNUH2sycb7hTcz9Z/V0vBeYwFzYEAhmpvtROvmaRd8ZIyt+ofcclwUZZAQ2
 SolOXR2d+ynCh8ZCOexqyZ67keikW1NXtW5aNWWFc6S6qhmcWdaWJGDcSyHauFac
 +YuHjPETJYh7TNpwYTmKclRh1fk/CgA/e+r71Hlgdkg+DGCyVnEZBQxqMi6GTzNC
 dzy3qhTtRT61SR54q55yMVIC3o6uRSkht+xNg1Nd+UghiqGKAtoYhvGjduodONW2
 1Eas6O+vHipu98HgFnkJRPlnF1HR3VunPDwpzIWIZjK0fIXEfrWqCR3nHFaxShOR
 dniTEPfELguxOtbl3jCZ+KHCIXueysczXFlqQjSDkg/P1l0jKBgpkZzMPY2mpP1y
 TgjipfSL5gr1GPdbrmh4WznQtn5IYWduKIrdEmSBuru05OmBaCO4geXPUwL4coHd
 O8TBnXYBTN/z3lORZMSOj9uK8hgU1UWmnOIkdJ4YBBAL8DSS+O+KtCRkHQP0ghl+
 whl0q1SWTu4LtOQzN5CUrhq9Tge11erEt888VyJbBJmv8x6qJjE=
 =CEfD
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 updates for Linux 6.10

- Move a lot of state that was previously stored on a per vcpu
  basis into a per-CPU area, because it is only pertinent to the
  host while the vcpu is loaded. This results in better state
  tracking, and a smaller vcpu structure.

- Add full handling of the ERET/ERETAA/ERETAB instructions in
  nested virtualisation. The last two instructions also require
  emulating part of the pointer authentication extension.
  As a result, the trap handling of pointer authentication has
  been greattly simplified.

- Turn the global (and not very scalable) LPI translation cache
  into a per-ITS, scalable cache, making non directly injected
  LPIs much cheaper to make visible to the vcpu.

- A batch of pKVM patches, mostly fixes and cleanups, as the
  upstreaming process seems to be resuming. Fingers crossed!

- Allocate PPIs and SGIs outside of the vcpu structure, allowing
  for smaller EL2 mapping and some flexibility in implementing
  more or less than 32 private IRQs.

- Purge stale mpidr_data if a vcpu is created after the MPIDR
  map has been created.

- Preserve vcpu-specific ID registers across a vcpu reset.

- Various minor cleanups and improvements.
2024-05-12 03:15:53 -04:00
Paolo Bonzini 4232da23d7 Merge tag 'loongarch-kvm-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD
LoongArch KVM changes for v6.10

1. Add ParaVirt IPI support.
2. Add software breakpoint support.
3. Add mmio trace events support.
2024-05-10 13:20:18 -04:00
Masahiro Yamada b1992c3772 kbuild: use $(src) instead of $(srctree)/$(src) for source directory
Kbuild conventionally uses $(obj)/ for generated files, and $(src)/ for
checked-in source files. It is merely a convention without any functional
difference. In fact, $(obj) and $(src) are exactly the same, as defined
in scripts/Makefile.build:

    src := $(obj)

When the kernel is built in a separate output directory, $(src) does
not accurately reflect the source directory location. While Kbuild
resolves this discrepancy by specifying VPATH=$(srctree) to search for
source files, it does not cover all cases. For example, when adding a
header search path for local headers, -I$(srctree)/$(src) is typically
passed to the compiler.

This introduces inconsistency between upstream and downstream Makefiles
because $(src) is used instead of $(srctree)/$(src) for the latter.

To address this inconsistency, this commit changes the semantics of
$(src) so that it always points to the directory in the source tree.

Going forward, the variables used in Makefiles will have the following
meanings:

  $(obj)     - directory in the object tree
  $(src)     - directory in the source tree  (changed by this commit)
  $(objtree) - the top of the kernel object tree
  $(srctree) - the top of the kernel source tree

Consequently, $(srctree)/$(src) in upstream Makefiles need to be replaced
with $(src).

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
2024-05-10 04:34:52 +09:00
Marc Zyngier eaa46a28d5 Merge branch kvm-arm64/mpidr-reset into kvmarm-master/next
* kvm-arm64/mpidr-reset:
  : .
  : Fixes for CLIDR_EL1 and MPIDR_EL1 being accidentally mutable across
  : a vcpu reset, courtesy of Oliver. From the cover letter:
  :
  : "For VM-wide feature ID registers we ensure they get initialized once for
  : the lifetime of a VM. On the other hand, vCPU-local feature ID registers
  : get re-initialized on every vCPU reset, potentially clobbering the
  : values userspace set up.
  :
  : MPIDR_EL1 and CLIDR_EL1 are the only registers in this space that we
  : allow userspace to modify for now. Clobbering the value of MPIDR_EL1 has
  : some disastrous side effects as the compressed index used by the
  : MPIDR-to-vCPU lookup table assumes MPIDR_EL1 is immutable after KVM_RUN.
  :
  : Series + reproducer test case to address the problem of KVM wiping out
  : userspace changes to these registers. Note that there are still some
  : differences between VM and vCPU scoped feature ID registers from the
  : perspective of userspace. We do not allow the value of VM-scope
  : registers to change after KVM_RUN, but vCPU registers remain mutable."
  : .
  KVM: selftests: arm64: Test vCPU-scoped feature ID registers
  KVM: selftests: arm64: Test that feature ID regs survive a reset
  KVM: selftests: arm64: Store expected register value in set_id_regs
  KVM: selftests: arm64: Rename helper in set_id_regs to imply VM scope
  KVM: arm64: Only reset vCPU-scoped feature ID regs once
  KVM: arm64: Reset VM feature ID regs from kvm_reset_sys_regs()
  KVM: arm64: Rename is_id_reg() to imply VM scope

Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-05-09 18:44:15 +01:00
Oliver Upton e016333745 KVM: arm64: Only reset vCPU-scoped feature ID regs once
The general expecation with feature ID registers is that they're 'reset'
exactly once by KVM for the lifetime of a vCPU/VM, such that any
userspace changes to the CPU features / identity are honored after a
vCPU gets reset (e.g. PSCI_ON).

KVM handles what it calls VM-scoped feature ID registers correctly, but
feature ID registers local to a vCPU (CLIDR_EL1, MPIDR_EL1) get wiped
after every reset. What's especially concerning is that a
potentially-changing MPIDR_EL1 breaks MPIDR compression for indexing
mpidr_data, as the mask of useful bits to build the index could change.

This is absolutely no good. Avoid resetting vCPU feature ID registers
more than once.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240502233529.1958459-4-oliver.upton@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-05-09 18:39:45 +01:00
Oliver Upton 44cbe80b76 KVM: arm64: Reset VM feature ID regs from kvm_reset_sys_regs()
A subsequent change to KVM will expand the range of feature ID registers
that get special treatment at reset. Fold the existing ones back in to
kvm_reset_sys_regs() to avoid the need for an additional table walk.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240502233529.1958459-3-oliver.upton@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-05-09 18:39:45 +01:00
Oliver Upton 592efc606b KVM: arm64: Rename is_id_reg() to imply VM scope
The naming of some of the feature ID checks is ambiguous. Rephrase the
is_id_reg() helper to make its purpose slightly clearer.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240502233529.1958459-2-oliver.upton@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-05-09 18:39:45 +01:00
Marc Zyngier e28157060c Merge branch kvm-arm64/misc-6.10 into kvmarm-master/next
* kvm-arm64/misc-6.10:
  : .
  : Misc fixes and updates targeting 6.10
  :
  : - Improve boot-time diagnostics when the sysreg tables
  :   are not correctly sorted
  :
  : - Allow FFA_MSG_SEND_DIRECT_REQ in the FFA proxy
  :
  : - Fix duplicate XNX field in the ID_AA64MMFR1_EL1
  :   writeable mask
  :
  : - Allocate PPIs and SGIs outside of the vcpu structure, allowing
  :   for smaller EL2 mapping and some flexibility in implementing
  :   more or less than 32 private IRQs.
  :
  : - Use bitmap_gather() instead of its open-coded equivalent
  :
  : - Make protected mode use hVHE if available
  :
  : - Purge stale mpidr_data if a vcpu is created after the MPIDR
  :   map has been created
  : .
  KVM: arm64: Destroy mpidr_data for 'late' vCPU creation
  KVM: arm64: Use hVHE in pKVM by default on CPUs with VHE support
  KVM: arm64: Fix hvhe/nvhe early alias parsing
  KVM: arm64: Convert kvm_mpidr_index() to bitmap_gather()
  KVM: arm64: vgic: Allocate private interrupts on demand
  KVM: arm64: Remove duplicated AA64MMFR1_EL1 XNX
  KVM: arm64: Remove FFA_MSG_SEND_DIRECT_REQ from the denylist
  KVM: arm64: Improve out-of-order sysreg table diagnostics

Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-05-08 16:41:50 +01:00
Oliver Upton ce5d2448eb KVM: arm64: Destroy mpidr_data for 'late' vCPU creation
A particularly annoying userspace could create a vCPU after KVM has
computed mpidr_data for the VM, either by racing against VGIC
initialization or having a userspace irqchip.

In any case, this means mpidr_data no longer fully describes the VM, and
attempts to find the new vCPU with kvm_mpidr_to_vcpu() will fail. The
fix is to discard mpidr_data altogether, as it is only a performance
optimization and not required for correctness. In all likelihood KVM
will recompute the mappings when KVM_RUN is called on the new vCPU.

Note that reads of mpidr_data are not guarded by a lock; promote to RCU
to cope with the possibility of mpidr_data being invalidated at runtime.

Fixes: 54a8006d0b ("KVM: arm64: Fast-track kvm_mpidr_to_vcpu() when mpidr_data is available")
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240508071952.2035422-1-oliver.upton@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-05-08 16:39:41 +01:00
Marc Zyngier 8540bd1b99 Merge branch kvm-arm64/pkvm-6.10 into kvmarm-master/next
* kvm-arm64/pkvm-6.10: (25 commits)
  : .
  : At last, a bunch of pKVM patches, courtesy of Fuad Tabba.
  : From the cover letter:
  :
  : "This series is a bit of a bombay-mix of patches we've been
  : carrying. There's no one overarching theme, but they do improve
  : the code by fixing existing bugs in pKVM, refactoring code to
  : make it more readable and easier to re-use for pKVM, or adding
  : functionality to the existing pKVM code upstream."
  : .
  KVM: arm64: Force injection of a data abort on NISV MMIO exit
  KVM: arm64: Restrict supported capabilities for protected VMs
  KVM: arm64: Refactor setting the return value in kvm_vm_ioctl_enable_cap()
  KVM: arm64: Document the KVM/arm64-specific calls in hypercalls.rst
  KVM: arm64: Rename firmware pseudo-register documentation file
  KVM: arm64: Reformat/beautify PTP hypercall documentation
  KVM: arm64: Clarify rationale for ZCR_EL1 value restored on guest exit
  KVM: arm64: Introduce and use predicates that check for protected VMs
  KVM: arm64: Add is_pkvm_initialized() helper
  KVM: arm64: Simplify vgic-v3 hypercalls
  KVM: arm64: Move setting the page as dirty out of the critical section
  KVM: arm64: Change kvm_handle_mmio_return() return polarity
  KVM: arm64: Fix comment for __pkvm_vcpu_init_traps()
  KVM: arm64: Prevent kmemleak from accessing .hyp.data
  KVM: arm64: Do not map the host fpsimd state to hyp in pKVM
  KVM: arm64: Rename __tlb_switch_to_{guest,host}() in VHE
  KVM: arm64: Support TLB invalidation in guest context
  KVM: arm64: Avoid BBM when changing only s/w bits in Stage-2 PTE
  KVM: arm64: Check for PTE validity when checking for executable/cacheable
  KVM: arm64: Avoid BUG-ing from the host abort path
  ...

Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-05-03 11:39:52 +01:00
Marc Zyngier 3d5689e01a Merge branch kvm-arm64/lpi-xa-cache into kvmarm-master/next
* kvm-arm64/lpi-xa-cache:
  : .
  : New and improved LPI translation cache from Oliver Upton.
  :
  : From the cover letter:
  :
  : "As discussed [*], here is the new take on the LPI translation cache,
  : migrating to an xarray indexed by (devid, eventid) per ITS.
  :
  : The end result is quite satisfying, as it becomes possible to rip out
  : other nasties such as the lpi_list_lock. To that end, patches 2-6 aren't
  : _directly_ related to the translation cache cleanup, but instead are
  : done to enable the cleanups at the end of the series.
  :
  : I changed out my test machine from the last time so the baseline has
  : moved a bit, but here are the results from the vgic_lpi_stress test:
  :
  : +----------------------------+------------+-------------------+
  : |       Configuration        |  v6.8-rc1  | v6.8-rc1 + series |
  : +----------------------------+------------+-------------------+
  : | -v 1 -d 1 -e 1 -i 1000000  | 2063296.81 |        1362602.35 |
  : | -v 16 -d 16 -e 16 -i 10000 |  610678.33 |        5200910.01 |
  : | -v 16 -d 16 -e 17 -i 10000 |  678361.53 |        5890675.51 |
  : | -v 32 -d 32 -e 1 -i 100000 |  580918.96 |        8304552.67 |
  : | -v 1 -d 1 -e 17 -i 1000    | 1512443.94 |         1425953.8 |
  : +----------------------------+------------+-------------------+
  :
  : Unlike last time, no dramatic regressions at any performance point. The
  : regression on a single interrupt stream is to be expected, as the
  : overheads of SRCU and two tree traversals (kvm_io_bus_get_dev(),
  : translation cache xarray) are likely greater than that of a linked-list
  : with a single node."
  : .
  KVM: selftests: Add stress test for LPI injection
  KVM: selftests: Use MPIDR_HWID_BITMASK from cputype.h
  KVM: selftests: Add helper for enabling LPIs on a redistributor
  KVM: selftests: Add a minimal library for interacting with an ITS
  KVM: selftests: Add quadword MMIO accessors
  KVM: selftests: Standardise layout of GIC frames
  KVM: selftests: Align with kernel's GIC definitions
  KVM: arm64: vgic-its: Get rid of the lpi_list_lock
  KVM: arm64: vgic-its: Rip out the global translation cache
  KVM: arm64: vgic-its: Use the per-ITS translation cache for injection
  KVM: arm64: vgic-its: Spin off helper for finding ITS by doorbell addr
  KVM: arm64: vgic-its: Maintain a translation cache per ITS
  KVM: arm64: vgic-its: Scope translation cache invalidations to an ITS
  KVM: arm64: vgic-its: Get rid of vgic_copy_lpi_list()
  KVM: arm64: vgic-debug: Use an xarray mark for debug iterator
  KVM: arm64: vgic-its: Walk LPI xarray in vgic_its_cmd_handle_movall()
  KVM: arm64: vgic-its: Walk LPI xarray in vgic_its_invall()
  KVM: arm64: vgic-its: Walk LPI xarray in its_sync_lpi_pending_table()
  KVM: Treat the device list as an rculist

Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-05-03 11:39:31 +01:00
Marc Zyngier 2d38f43930 Merge branch kvm-arm64/nv-eret-pauth into kvmarm-master/next
* kvm-arm64/nv-eret-pauth:
  : .
  : Add NV support for the ERETAA/ERETAB instructions. From the cover letter:
  :
  : "Although the current upstream NV support has *some* support for
  : correctly emulating ERET, that support is only partial as it doesn't
  : support the ERETAA and ERETAB variants.
  :
  : Supporting these instructions was cast aside for a long time as it
  : involves implementing some form of PAuth emulation, something I wasn't
  : overly keen on. But I have reached a point where enough of the
  : infrastructure is there that it actually makes sense. So here it is!"
  : .
  KVM: arm64: nv: Work around lack of pauth support in old toolchains
  KVM: arm64: Drop trapping of PAuth instructions/keys
  KVM: arm64: nv: Advertise support for PAuth
  KVM: arm64: nv: Handle ERETA[AB] instructions
  KVM: arm64: nv: Add emulation for ERETAx instructions
  KVM: arm64: nv: Add kvm_has_pauth() helper
  KVM: arm64: nv: Reinject PAC exceptions caused by HCR_EL2.API==0
  KVM: arm64: nv: Handle HCR_EL2.{API,APK} independently
  KVM: arm64: nv: Honor HFGITR_EL2.ERET being set
  KVM: arm64: nv: Fast-track 'InHost' exception returns
  KVM: arm64: nv: Add trap forwarding for ERET and SMC
  KVM: arm64: nv: Configure HCR_EL2 for FEAT_NV2
  KVM: arm64: nv: Drop VCPU_HYP_CONTEXT flag
  KVM: arm64: Constraint PAuth support to consistent implementations
  KVM: arm64: Add helpers for ESR_ELx_ERET_ISS_ERET*
  KVM: arm64: Harden __ctxt_sys_reg() against out-of-range values

Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-05-03 11:39:19 +01:00
Marc Zyngier 34c0d5a606 Merge branch kvm-arm64/host_data into kvmarm-master/next
* kvm-arm64/host_data:
  : .
  : Rationalise the host-specific data to live as part of the per-CPU state.
  :
  : From the cover letter:
  :
  : "It appears that over the years, we have accumulated a lot of cruft in
  : the kvm_vcpu_arch structure. Part of the gunk is data that is strictly
  : host CPU specific, and this result in two main problems:
  :
  : - the structure itself is stupidly large, over 8kB. With the
  :   arch-agnostic kvm_vcpu, we're above 10kB, which is insane. This has
  :   some ripple effects, as we need physically contiguous allocation to
  :   be able to map it at EL2 for !VHE. There is more to it though, as
  :   some data structures, although per-vcpu, could be allocated
  :   separately.
  :
  : - We lose track of the life-cycle of this data, because we're
  :   guaranteed that it will be around forever and we start relying on
  :   wrong assumptions. This is becoming a maintenance burden.
  :
  : This series rectifies some of these things, starting with the two main
  : offenders: debug and FP, a lot of which gets pushed out to the per-CPU
  : host structure. Indeed, their lifetime really isn't that of the vcpu,
  : but tied to the physical CPU the vpcu runs on.
  :
  : This results in a small reduction of the vcpu size, but mainly a much
  : clearer understanding of the life-cycle of these structures."
  : .
  KVM: arm64: Move management of __hyp_running_vcpu to load/put on VHE
  KVM: arm64: Exclude FP ownership from kvm_vcpu_arch
  KVM: arm64: Exclude host_fpsimd_state pointer from kvm_vcpu_arch
  KVM: arm64: Exclude mdcr_el2_host from kvm_vcpu_arch
  KVM: arm64: Exclude host_debug_data from vcpu_arch
  KVM: arm64: Add accessor for per-CPU state

Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-05-03 11:39:07 +01:00
Marc Zyngier 9a39359903 KVM: arm64: Move management of __hyp_running_vcpu to load/put on VHE
The per-CPU host context structure contains a __hyp_running_vcpu that
serves as a replacement for kvm_get_current_vcpu() in contexts where
we cannot make direct use of it (such as in the nVHE hypervisor).
Since there is a lot of common code between nVHE and VHE, the latter
also populates this field even if kvm_get_running_vcpu() always works.

We currently pretty inconsistent when populating __hyp_running_vcpu
to point to the currently running vcpu:

- on {n,h}VHE, we set __hyp_running_vcpu on entry to __kvm_vcpu_run
  and clear it on exit.

- on VHE, we set __hyp_running_vcpu on entry to __kvm_vcpu_run_vhe
  and never clear it, effectively leaving a dangling pointer...

VHE is obviously the odd one here. Although we could make it behave
just like nVHE, this wouldn't match the behaviour of KVM with VHE,
where the load phase is where most of the context-switch gets done.

So move all the __hyp_running_vcpu management to the VHE-specific
load/put phases, giving us a bit more sanity and matching the
behaviour of kvm_get_running_vcpu().

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240502154030.3011995-1-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-05-03 11:36:24 +01:00
Marc Zyngier 03b3d00a70 KVM: arm64: vgic: Allocate private interrupts on demand
Private interrupts are currently part of the CPU interface structure
that is part of each and every vcpu we create.

Currently, we have 32 of them per vcpu, resulting in a per-vcpu array
that is just shy of 4kB. On its own, that's no big deal, but it gets
in the way of other things:

- each vcpu gets mapped at EL2 on nVHE/hVHE configurations. This
  requires memory that is physically contiguous. However, the EL2
  code has no purpose looking at the interrupt structures and
  could do without them being mapped.

- supporting features such as EPPIs, which extend the number of
  private interrupts past the 32 limit would make the array
  even larger, even for VMs that do not use the EPPI feature.

Address these issues by moving the private interrupt array outside
of the vcpu, and replace it with a simple pointer. We take this
opportunity to make it obvious what gets initialised when, as
that path was remarkably opaque, and tighten the locking.

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240502154545.3012089-1-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-05-03 11:33:50 +01:00
Marc Zyngier 3b467b1658 KVM: arm64: Force injection of a data abort on NISV MMIO exit
If a vcpu exits for a data abort with an invalid syndrome, the
expectations are that userspace has a chance to save the day if
it has requested to see such exits.

However, this is completely futile in the case of a protected VM,
as none of the state is available. In this particular case, inject
a data abort directly into the vcpu, consistent with what userspace
could do.

This also helps with pKVM, which discards all syndrome information when
forwarding data aborts that are not known to be MMIO.

Finally, document this tweak to the API.

Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240423150538.2103045-31-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-05-01 16:48:15 +01:00
Fuad Tabba 92536992cf KVM: arm64: Restrict supported capabilities for protected VMs
For practical reasons as well as security related ones, not all
capabilities are supported for protected VMs in pKVM.

Add a function that restricts the capabilities for protected VMs.
This behaves as an allow-list to ensure that future capabilities
are checked for compatibility and security before being allowed
for protected VMs.

Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240423150538.2103045-30-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-05-01 16:48:14 +01:00
Fuad Tabba 97a3dee172 KVM: arm64: Refactor setting the return value in kvm_vm_ioctl_enable_cap()
Initialize r = -EINVAL to get rid of the error-path
initializations in kvm_vm_ioctl_enable_cap().

No functional change intended.

Suggested-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240423150538.2103045-29-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-05-01 16:48:14 +01:00
Fuad Tabba eef4ce6363 KVM: arm64: Clarify rationale for ZCR_EL1 value restored on guest exit
Expand comment clarifying why the host value representing SVE
vector length being restored for ZCR_EL1 on guest exit isn't the
same as it was on guest entry.

Signed-off-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240423150538.2103045-21-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-05-01 16:48:14 +01:00
Fuad Tabba b6ed4fa941 KVM: arm64: Introduce and use predicates that check for protected VMs
In order to determine whether or not a VM or vcpu are protected,
introduce helpers to query this state. While at it, use the vcpu
helper to check vcpus protected state instead of the kvm one.

Co-authored-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240423150538.2103045-19-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-05-01 16:48:14 +01:00
Marc Zyngier 948e1a53c2 KVM: arm64: Simplify vgic-v3 hypercalls
Consolidate the GICv3 VMCR accessor hypercalls into the APR save/restore
hypercalls so that all of the EL2 GICv3 state is covered by a single pair
of hypercalls.

Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240423150538.2103045-17-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-05-01 16:48:14 +01:00
Fuad Tabba 9c30fc615d KVM: arm64: Move setting the page as dirty out of the critical section
Move the unlock earlier in user_mem_abort() to shorten the
critical section. This also helps for future refactoring and
reuse of similar code.

This moves out marking the page as dirty outside of the critical
section. That code does not interact with the stage-2 page
tables, which the read lock in the critical section protects.

Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240423150538.2103045-16-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-05-01 16:48:14 +01:00
Fuad Tabba cc81b6dfc3 KVM: arm64: Change kvm_handle_mmio_return() return polarity
Most exit handlers return <= 0 to indicate that the host needs to
handle the exit. Make kvm_handle_mmio_return() consistent with
the exit handlers in handle_exit(). This makes the code easier to
reason about, and makes it easier to add other handlers in future
patches.

No functional change intended.

Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240423150538.2103045-15-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-05-01 16:48:14 +01:00
Fuad Tabba 40458a66af KVM: arm64: Fix comment for __pkvm_vcpu_init_traps()
Fix the comment to clarify that __pkvm_vcpu_init_traps()
initializes traps for all VMs in protected mode, and not only
for protected VMs.

Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240423150538.2103045-14-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-05-01 16:48:14 +01:00
Quentin Perret 06cacc9d28 KVM: arm64: Prevent kmemleak from accessing .hyp.data
We've added a .data section for the hypervisor, which kmemleak is
eager to parse. This clearly doesn't go well, so add the section
to kmemleak's block list.

Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240423150538.2103045-13-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-05-01 16:48:14 +01:00
Fuad Tabba d48965bc47 KVM: arm64: Do not map the host fpsimd state to hyp in pKVM
pKVM maintains its own state at EL2 for tracking the host fpsimd
state. Therefore, no need to map and share the host's view with
it.

Signed-off-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240423150538.2103045-12-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-05-01 16:48:14 +01:00
Fuad Tabba cfbdc546b6 KVM: arm64: Rename __tlb_switch_to_{guest,host}() in VHE
Rename __tlb_switch_to_{guest,host}() to
{enter,exit}_vmid_context() in VHE code to maintain symmetry
between the nVHE and VHE TLB invalidations.

No functional change intended.

Suggested-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240423150538.2103045-11-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-05-01 16:48:14 +01:00
Will Deacon 58f3b0fc3b KVM: arm64: Support TLB invalidation in guest context
Typically, TLB invalidation of guest stage-2 mappings using nVHE is
performed by a hypercall originating from the host. For the invalidation
instruction to be effective, therefore, __tlb_switch_to_{guest,host}()
swizzle the active stage-2 context around the TLBI instruction.

With guest-to-host memory sharing and unsharing hypercalls
originating from the guest under pKVM, there is need to support
both guest and host VMID invalidations issued from guest context.

Replace the __tlb_switch_to_{guest,host}() functions with a more general
{enter,exit}_vmid_context() implementation which supports being invoked
from guest context and acts as a no-op if the target context matches the
running context.

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240423150538.2103045-10-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-05-01 16:47:45 +01:00