mirror of
https://github.com/torvalds/linux
synced 2024-10-03 01:43:05 +00:00
Documentation: hyperv: Improve synic and interrupt handling description
Current documentation does not describe how Linux handles the synthetic interrupt controller (synic) that Hyper-V provides to guest VMs, nor how VMBus or timer interrupts are handled. Add text describing the synic and reorganize existing text to make this more clear. Signed-off-by: Michael Kelley <mhklinux@outlook.com> Reviewed-by: Easwar Hariharan <eahariha@linux.microsoft.com> Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com> Link: https://lore.kernel.org/r/20240511133818.19649-2-mhklinux@outlook.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20240511133818.19649-2-mhklinux@outlook.com>
This commit is contained in:
parent
4c5a65fd10
commit
a0b134032e
|
@ -62,12 +62,21 @@ shared page with scale and offset values into user space. User
|
|||
space code performs the same algorithm of reading the TSC and
|
||||
applying the scale and offset to get the constant 10 MHz clock.
|
||||
|
||||
Linux clockevents are based on Hyper-V synthetic timer 0. While
|
||||
Hyper-V offers 4 synthetic timers for each CPU, Linux only uses
|
||||
timer 0. Interrupts from stimer0 are recorded on the "HVS" line in
|
||||
/proc/interrupts. Clockevents based on the virtualized PIT and
|
||||
local APIC timer also work, but the Hyper-V synthetic timer is
|
||||
preferred.
|
||||
Linux clockevents are based on Hyper-V synthetic timer 0 (stimer0).
|
||||
While Hyper-V offers 4 synthetic timers for each CPU, Linux only uses
|
||||
timer 0. In older versions of Hyper-V, an interrupt from stimer0
|
||||
results in a VMBus control message that is demultiplexed by
|
||||
vmbus_isr() as described in the Documentation/virt/hyperv/vmbus.rst
|
||||
documentation. In newer versions of Hyper-V, stimer0 interrupts can
|
||||
be mapped to an architectural interrupt, which is referred to as
|
||||
"Direct Mode". Linux prefers to use Direct Mode when available. Since
|
||||
x86/x64 doesn't support per-CPU interrupts, Direct Mode statically
|
||||
allocates an x86 interrupt vector (HYPERV_STIMER0_VECTOR) across all CPUs
|
||||
and explicitly codes it to call the stimer0 interrupt handler. Hence
|
||||
interrupts from stimer0 are recorded on the "HVS" line in /proc/interrupts
|
||||
rather than being associated with a Linux IRQ. Clockevents based on the
|
||||
virtualized PIT and local APIC timer also work, but Hyper-V stimer0
|
||||
is preferred.
|
||||
|
||||
The driver for the Hyper-V synthetic system clock and timers is
|
||||
drivers/clocksource/hyperv_timer.c.
|
||||
|
|
|
@ -102,10 +102,10 @@ resources. For Windows Server 2019 and later, this limit is
|
|||
approximately 1280 Mbytes. For versions prior to Windows Server
|
||||
2019, the limit is approximately 384 Mbytes.
|
||||
|
||||
VMBus messages
|
||||
--------------
|
||||
All VMBus messages have a standard header that includes the message
|
||||
length, the offset of the message payload, some flags, and a
|
||||
VMBus channel messages
|
||||
----------------------
|
||||
All messages sent in a VMBus channel have a standard header that includes
|
||||
the message length, the offset of the message payload, some flags, and a
|
||||
transactionID. The portion of the message after the header is
|
||||
unique to each VSP/VSC pair.
|
||||
|
||||
|
@ -137,7 +137,7 @@ control message contains a list of GPAs that describe the data
|
|||
buffer. For example, the storvsc driver uses this approach to
|
||||
specify the data buffers to/from which disk I/O is done.
|
||||
|
||||
Three functions exist to send VMBus messages:
|
||||
Three functions exist to send VMBus channel messages:
|
||||
|
||||
1. vmbus_sendpacket(): Control-only messages and messages with
|
||||
embedded data -- no GPAs
|
||||
|
@ -165,6 +165,37 @@ performed in this temporary buffer without the risk of Hyper-V
|
|||
maliciously modifying the message after it is validated but before
|
||||
it is used.
|
||||
|
||||
Synthetic Interrupt Controller (synic)
|
||||
--------------------------------------
|
||||
Hyper-V provides each guest CPU with a synthetic interrupt controller
|
||||
that is used by VMBus for host-guest communication. While each synic
|
||||
defines 16 synthetic interrupts (SINT), Linux uses only one of the 16
|
||||
(VMBUS_MESSAGE_SINT). All interrupts related to communication between
|
||||
the Hyper-V host and a guest CPU use that SINT.
|
||||
|
||||
The SINT is mapped to a single per-CPU architectural interrupt (i.e,
|
||||
an 8-bit x86/x64 interrupt vector, or an arm64 PPI INTID). Because
|
||||
each CPU in the guest has a synic and may receive VMBus interrupts,
|
||||
they are best modeled in Linux as per-CPU interrupts. This model works
|
||||
well on arm64 where a single per-CPU Linux IRQ is allocated for
|
||||
VMBUS_MESSAGE_SINT. This IRQ appears in /proc/interrupts as an IRQ labelled
|
||||
"Hyper-V VMbus". Since x86/x64 lacks support for per-CPU IRQs, an x86
|
||||
interrupt vector is statically allocated (HYPERVISOR_CALLBACK_VECTOR)
|
||||
across all CPUs and explicitly coded to call vmbus_isr(). In this case,
|
||||
there's no Linux IRQ, and the interrupts are visible in aggregate in
|
||||
/proc/interrupts on the "HYP" line.
|
||||
|
||||
The synic provides the means to demultiplex the architectural interrupt into
|
||||
one or more logical interrupts and route the logical interrupt to the proper
|
||||
VMBus handler in Linux. This demultiplexing is done by vmbus_isr() and
|
||||
related functions that access synic data structures.
|
||||
|
||||
The synic is not modeled in Linux as an irq chip or irq domain,
|
||||
and the demultiplexed logical interrupts are not Linux IRQs. As such,
|
||||
they don't appear in /proc/interrupts or /proc/irq. The CPU
|
||||
affinity for one of these logical interrupts is controlled via an
|
||||
entry under /sys/bus/vmbus as described below.
|
||||
|
||||
VMBus interrupts
|
||||
----------------
|
||||
VMBus provides a mechanism for the guest to interrupt the host when
|
||||
|
@ -176,16 +207,18 @@ unnecessary. If a guest sends an excessive number of unnecessary
|
|||
interrupts, the host may throttle that guest by suspending its
|
||||
execution for a few seconds to prevent a denial-of-service attack.
|
||||
|
||||
Similarly, the host will interrupt the guest when it sends a new
|
||||
message on the VMBus control path, or when a VMBus channel "in" ring
|
||||
buffer transitions from empty to non-empty. Each CPU in the guest
|
||||
may receive VMBus interrupts, so they are best modeled as per-CPU
|
||||
interrupts in Linux. This model works well on arm64 where a single
|
||||
per-CPU IRQ is allocated for VMBus. Since x86/x64 lacks support for
|
||||
per-CPU IRQs, an x86 interrupt vector is statically allocated (see
|
||||
HYPERVISOR_CALLBACK_VECTOR) across all CPUs and explicitly coded to
|
||||
call the VMBus interrupt service routine. These interrupts are
|
||||
visible in /proc/interrupts on the "HYP" line.
|
||||
Similarly, the host will interrupt the guest via the synic when
|
||||
it sends a new message on the VMBus control path, or when a VMBus
|
||||
channel "in" ring buffer transitions from empty to non-empty due to
|
||||
the host inserting a new VMBus channel message. The control message stream
|
||||
and each VMBus channel "in" ring buffer are separate logical interrupts
|
||||
that are demultiplexed by vmbus_isr(). It demultiplexes by first checking
|
||||
for channel interrupts by calling vmbus_chan_sched(), which looks at a synic
|
||||
bitmap to determine which channels have pending interrupts on this CPU.
|
||||
If multiple channels have pending interrupts for this CPU, they are
|
||||
processed sequentially. When all channel interrupts have been processed,
|
||||
vmbus_isr() checks for and processes any messages received on the VMBus
|
||||
control path.
|
||||
|
||||
The guest CPU that a VMBus channel will interrupt is selected by the
|
||||
guest when the channel is created, and the host is informed of that
|
||||
|
@ -212,10 +245,9 @@ neither "unmanaged" nor "managed" interrupts.
|
|||
The CPU that a VMBus channel will interrupt can be seen in
|
||||
/sys/bus/vmbus/devices/<deviceGUID>/ channels/<channelRelID>/cpu.
|
||||
When running on later versions of Hyper-V, the CPU can be changed
|
||||
by writing a new value to this sysfs entry. Because the interrupt
|
||||
assignment is done outside of the normal Linux affinity mechanism,
|
||||
there are no entries in /proc/irq corresponding to individual
|
||||
VMBus channel interrupts.
|
||||
by writing a new value to this sysfs entry. Because VMBus channel
|
||||
interrupts are not Linux IRQs, there are no entries in /proc/interrupts
|
||||
or /proc/irq corresponding to individual VMBus channel interrupts.
|
||||
|
||||
An online CPU in a Linux guest may not be taken offline if it has
|
||||
VMBus channel interrupts assigned to it. Any such channel
|
||||
|
@ -223,15 +255,6 @@ interrupts must first be manually reassigned to another CPU as
|
|||
described above. When no channel interrupts are assigned to the
|
||||
CPU, it can be taken offline.
|
||||
|
||||
When a guest CPU receives a VMBus interrupt from the host, the
|
||||
function vmbus_isr() handles the interrupt. It first checks for
|
||||
channel interrupts by calling vmbus_chan_sched(), which looks at a
|
||||
bitmap setup by the host to determine which channels have pending
|
||||
interrupts on this CPU. If multiple channels have pending
|
||||
interrupts for this CPU, they are processed sequentially. When all
|
||||
channel interrupts have been processed, vmbus_isr() checks for and
|
||||
processes any message received on the VMBus control path.
|
||||
|
||||
The VMBus channel interrupt handling code is designed to work
|
||||
correctly even if an interrupt is received on a CPU other than the
|
||||
CPU assigned to the channel. Specifically, the code does not use
|
||||
|
|
Loading…
Reference in a new issue