Commit graph

76932 commits

Author SHA1 Message Date
Bharat Bhushan ee53e560a8 booke: Added DBCR4 SPR number
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-02-13 12:56:42 +01:00
Bharat Bhushan 1d542d9c2b KVM: PPC: booke: Allow multiple exception types
Current kvmppc_booke_handlers uses the same macro (KVM_HANDLER) and
all handlers are considered to be the same size. This will not be
the case if we want to use different macros for different handlers.

This patch improves the kvmppc_booke_handler so that it can
support different macros for different handlers.

Signed-off-by: Liu Yu <yu.liu@freescale.com>
[bharat.bhushan@freescale.com: Substantial changes]
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-02-13 12:56:40 +01:00
Bharat Bhushan ffe129ecd7 KVM: PPC: booke: use vcpu reference from thread_struct
Like other places, use thread_struct to get vcpu reference.

Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-02-13 12:56:39 +01:00
Alexander Graf dd92d6f274 Merge commit 'origin/next' into kvm-ppc-next 2013-02-13 12:56:14 +01:00
Gleb Natapov b0da5bec30 KVM: VMX: add missing exit names to VMX_EXIT_REASONS array
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-02-05 23:29:46 -02:00
Dongxiao Xu c08800a56c KVM: VMX: disable SMEP feature when guest is in non-paging mode
SMEP is disabled if CPU is in non-paging mode in hardware.
However KVM always uses paging mode to emulate guest non-paging
mode with TDP. To emulate this behavior, SMEP needs to be manually
disabled when guest switches to non-paging mode.

We met an issue that, SMP Linux guest with recent kernel (enable
SMEP support, for example, 3.5.3) would crash with triple fault if
setting unrestricted_guest=0. This is because KVM uses an identity
mapping page table to emulate the non-paging mode, where the page
table is set with USER flag. If SMEP is still enabled in this case,
guest will meet unhandlable page fault and then crash.

Reviewed-by: Gleb Natapov <gleb@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-02-05 23:28:07 -02:00
Gleb Natapov 834be0d83f Revert "KVM: MMU: split kvm_mmu_free_page"
This reverts commit bd4c86eaa6.

There is not user for kvm_mmu_isolate_page() any more.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-02-05 22:47:39 -02:00
Gleb Natapov eb3fce87cc KVM: MMU: drop superfluous is_present_gpte() check.
Gust page walker puts only present ptes into ptes[] array. No need to
check it again.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-02-04 23:24:28 -02:00
Gleb Natapov 116eb3d30e KVM: MMU: drop superfluous min() call.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-02-04 23:24:28 -02:00
Gleb Natapov 2c9afa52ef KVM: MMU: set base_role.nxe during mmu initialization.
Move base_role.nxe initialisation to where all other roles are initialized.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-02-04 23:24:28 -02:00
Gleb Natapov 9bb4f6b15e KVM: MMU: drop unneeded checks.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-02-04 23:24:28 -02:00
Gleb Natapov feb3eb704a KVM: MMU: make spte_is_locklessly_modifiable() more clear
spte_is_locklessly_modifiable() checks that both SPTE_HOST_WRITEABLE and
SPTE_MMU_WRITEABLE are present on spte. Make it more explicit.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-02-04 23:24:28 -02:00
Christian Borntraeger 0c29b2293b s390/kvm: Fix instruction decoding
Instructions with long displacement have a signed displacement.
Currently the sign bit is interpreted as 2^20: Lets fix it by doing the
sign extension from 20bit to 32bit and then use it as a signed variable
in the addition (see kvm_s390_get_base_disp_rsy).

Furthermore, there are lots of "int" in that code. This is problematic,
because shifting on a signed integer is undefined/implementation defined
if the bit value happens to be negative.
Fortunately the promotion rules will make the right hand side unsigned
anyway, so there is no real problem right now.
Let's convert them anyway to unsigned where appropriate to avoid
problems if the code is changed or copy/pasted later on.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-01-30 12:35:59 +02:00
Christian Borntraeger 15bc8d8457 s390/kvm: Fix store status for ACRS/FPRS
On store status we need to copy the current state of registers
into a save area. Currently we might save stale versions:
The sie state descriptor doesnt have fields for guest ACRS,FPRS,
those registers are simply stored in the host registers. The host
program must copy these away if needed. We do that in vcpu_put/load.

If we now do a store status in KVM code between vcpu_put/load, the
saved values are not up-to-date. Lets collect the ACRS/FPRS before
saving them.

This also fixes some strange problems with hotplug and virtio-ccw,
since the low level machine check handler (on hotplug a machine check
will happen) will revalidate all registers with the content of the
save area.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
CC: stable@vger.kernel.org
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-01-30 12:35:51 +02:00
Yang Zhang c7c9c56ca2 x86, apicv: add virtual interrupt delivery support
Virtual interrupt delivery avoids KVM to inject vAPIC interrupts
manually, which is fully taken care of by the hardware. This needs
some special awareness into existing interrupr injection path:

- for pending interrupt, instead of direct injection, we may need
  update architecture specific indicators before resuming to guest.

- A pending interrupt, which is masked by ISR, should be also
  considered in above update action, since hardware will decide
  when to inject it at right time. Current has_interrupt and
  get_interrupt only returns a valid vector from injection p.o.v.

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-01-29 10:48:19 +02:00
Yang Zhang 8d14695f95 x86, apicv: add virtual x2apic support
basically to benefit from apicv, we need to enable virtualized x2apic mode.
Currently, we only enable it when guest is really using x2apic.

Also, clear MSR bitmap for corresponding x2apic MSRs when guest enabled x2apic:
0x800 - 0x8ff: no read intercept for apicv register virtualization,
               except APIC ID and TMCCT which need software's assistance to
               get right value.

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-01-29 10:48:06 +02:00
Yang Zhang 83d4c28693 x86, apicv: add APICv register virtualization support
- APIC read doesn't cause VM-Exit
- APIC write becomes trap-like

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yang Zhang <yang.z.zhang@intel.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-01-29 10:47:54 +02:00
Avi Kivity 3f0c3d0bb2 KVM: x86 emulator: fix test_cc() build failure on i386
'pushq' doesn't exist on i386.  Replace with 'push', which should work
since the operand is a register.

Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-01-27 11:09:38 +02:00
Alexander Graf b9e3e20893 KVM: PPC: E500: Remove kvmppc_e500_tlbil_all usage from guest TLB code
The guest TLB handling code should not have any insight into how the host
TLB shadow code works.

kvmppc_e500_tlbil_all() is a function that is used for distinction between
e500v2 and e500mc (E.HV) on how to flush shadow entries. This function really
is private between the e500.c/e500mc.c file and e500_mmu_host.c.

Instead of this one, use the public kvmppc_core_flush_tlb() function to flush
all shadow TLB entries. As a nice side effect, with this we also end up
flushing TLB1 entries which we forgot to do before.

Signed-off-by: Alexander Graf <agraf@suse.de>
2013-01-24 19:23:34 +01:00
Alexander Graf 483ba97c0f KVM: PPC: E500: Make clear_tlb_refs and clear_tlb1_bitmap static
Host shadow TLB flushing is logic that the guest TLB code should have
no insight about. Declare the internal clear_tlb_refs and clear_tlb1_bitmap
functions static to the host TLB handling file.

Instead of these, we can use the already exported kvmppc_core_flush_tlb().
This gives us a common API across the board to say "please flush any
pending host shadow translation".

Signed-off-by: Alexander Graf <agraf@suse.de>
2013-01-24 19:23:33 +01:00
Alexander Graf c015c62b13 KVM: PPC: e500: Implement TLB1-in-TLB0 mapping
When a host mapping fault happens in a guest TLB1 entry today, we
map the translated guest entry into the host's TLB1.

This isn't particularly clever when the guest is mapped by normal 4k
pages, since these would be a lot better to put into TLB0 instead.

This patch adds the required logic to map 4k TLB1 shadow maps into
the host's TLB0.

Signed-off-by: Alexander Graf <agraf@suse.de>
2013-01-24 19:23:32 +01:00
Alexander Graf b71c9e2fb7 KVM: PPC: E500: Split host and guest MMU parts
This patch splits the file e500_tlb.c into e500_mmu.c (guest TLB handling)
and e500_mmu_host.c (host TLB handling).

The main benefit of this split is readability and maintainability. It's
just a lot harder to write dirty code :).

Signed-off-by: Alexander Graf <agraf@suse.de>
2013-01-24 19:23:31 +01:00
Alexander Graf 9d98b3ff94 KVM: PPC: e500: Call kvmppc_mmu_map for initial mapping
When emulating tlbwe, we want to automatically map the entry that just got
written in our shadow TLB map, because chances are quite high that it's
going to be used very soon.

Today this happens explicitly, duplicating all the logic that is in
kvmppc_mmu_map() already. Just call that one instead.

Signed-off-by: Alexander Graf <agraf@suse.de>
2013-01-24 19:23:31 +01:00
Alexander Graf 2c378fd779 KVM: PPC: E500: Propagate errors when shadow mapping
When shadow mapping a page, mapping this page can fail. In that case we
don't have a shadow map.

Take this case into account, otherwise we might end up writing bogus TLB
entries into the host TLB.

While at it, also move the write_stlbe() calls into the respective TLBn
handlers.

Signed-off-by: Alexander Graf <agraf@suse.de>
2013-01-24 19:23:30 +01:00
Alexander Graf 523f0e5421 KVM: PPC: E500: Explicitly mark shadow maps invalid
When we invalidate shadow TLB maps on the host, we don't mark them
as not valid. But we should.

Fix this by removing the E500_TLB_VALID from their flags when
invalidating.

Signed-off-by: Alexander Graf <agraf@suse.de>
2013-01-24 19:23:30 +01:00
Alexander Graf 9445ef0181 KVM: PPC: E500: Move write_stlbe higher
Later patches want to call the function and it doesn't have
dependencies on anything below write_host_tlbe.

Move it higher up in the file.

Signed-off-by: Alexander Graf <agraf@suse.de>
2013-01-24 19:23:29 +01:00
Gleb Natapov 141687869f KVM: VMX: set vmx->emulation_required only when needed.
If emulate_invalid_guest_state=false vmx->emulation_required is never
actually used, but it ends up to be always set to true since
handle_invalid_guest_state(), the only place it is reset back to
false, is never called. This, besides been not very clean, makes vmexit
and vmentry path to check emulate_invalid_guest_state needlessly.

The patch fixes that by keeping emulation_required coherent with
emulate_invalid_guest_state setting.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-24 00:40:31 -02:00
Gleb Natapov 378a8b099f KVM: x86: fix use of uninitialized memory as segment descriptor in emulator.
If VMX reports segment as unusable, zero descriptor passed by the emulator
before returning. Such descriptor will be considered not present by the
emulator.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-24 00:40:31 -02:00
Gleb Natapov 91b0aa2ca6 KVM: VMX: rename fix_pmode_dataseg to fix_pmode_seg.
The function deals with code segment too.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-24 00:40:30 -02:00
Gleb Natapov 25391454e7 KVM: VMX: don't clobber segment AR of unusable segments.
Usability is returned in unusable field, so not need to clobber entire
AR. Callers have to know how to deal with unusable segments already
since if emulate_invalid_guest_state=true AR is not zeroed.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-24 00:40:28 -02:00
Gleb Natapov 218e763f45 KVM: VMX: skip vmx->rmode.vm86_active check on cr0 write if unrestricted guest is enabled
vmx->rmode.vm86_active is never true is unrestricted guest is enabled.
Make it more explicit that neither enter_pmode() nor enter_rmode() is
called in this case.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-24 00:40:28 -02:00
Gleb Natapov 286da4156d KVM: VMX: remove hack that disables emulation on vcpu reset/init
There is no reason for it. If state is suitable for vmentry it
will be detected during guest entry and no emulation will happen.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-24 00:40:27 -02:00
Gleb Natapov c5e97c80b5 KVM: VMX: if unrestricted guest is enabled vcpu state is always valid.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-24 00:40:26 -02:00
Gleb Natapov 2f143240cb KVM: VMX: reset CPL only on CS register write.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-24 00:40:26 -02:00
Gleb Natapov 1f3141e80b KVM: VMX: remove special CPL cache access during transition to real mode.
Since vmx_get_cpl() always returns 0 when VCPU is in real mode it is no
longer needed. Also reset CPL cache to zero during transaction to
protected mode since transaction may happen while CS.selectors & 3 != 0,
but in reality CPL is 0.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-24 00:40:25 -02:00
Avi Kivity 158de57f90 KVM: x86 emulator: convert a few freestanding emulations to fastop
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-23 22:15:40 -02:00
Avi Kivity 34b77652b9 KVM: x86 emulator: rearrange fastop definitions
Make fastop opcodes usable in other emulations.

Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-23 22:15:39 -02:00
Avi Kivity 4d7583493e KVM: x86 emulator: convert 2-operand IMUL to fastop
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-23 22:15:38 -02:00
Avi Kivity 11c363ba8f KVM: x86 emulator: convert BT/BTS/BTR/BTC/BSF/BSR to fastop
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-23 22:15:37 -02:00
Avi Kivity 95413dc413 KVM: x86 emulator: convert INC/DEC to fastop
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-23 22:15:37 -02:00
Avi Kivity 9ae9febae9 KVM: x86 emulator: covert SETCC to fastop
This is a bit of a special case since we don't have the usual
byte/word/long/quad switch; instead we switch on the condition code embedded
in the instruction.

Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-23 22:15:36 -02:00
Avi Kivity 007a3b5475 KVM: x86 emulator: convert shift/rotate instructions to fastop
SHL, SHR, ROL, ROR, RCL, RCR, SAR, SAL

Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-23 22:15:35 -02:00
Avi Kivity 0bdea06892 KVM: x86 emulator: Convert SHLD, SHRD to fastop
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-23 22:15:33 -02:00
Xiao Guangrong 93c05d3ef2 KVM: x86: improve reexecute_instruction
The current reexecute_instruction can not well detect the failed instruction
emulation. It allows guest to retry all the instructions except it accesses
on error pfn

For example, some cases are nested-write-protect - if the page we want to
write is used as PDE but it chains to itself. Under this case, we should
stop the emulation and report the case to userspace

Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-21 22:58:33 -02:00
Xiao Guangrong 95b3cf69bd KVM: x86: let reexecute_instruction work for tdp
Currently, reexecute_instruction refused to retry all instructions if
tdp is enabled. If nested npt is used, the emulation may be caused by
shadow page, it can be fixed by dropping the shadow page. And the only
condition that tdp can not retry the instruction is the access fault
on error pfn

Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-21 22:58:32 -02:00
Xiao Guangrong 22368028fe KVM: x86: clean up reexecute_instruction
Little cleanup for reexecute_instruction, also use gpa_to_gfn in
retry_instruction

Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-21 22:58:31 -02:00
Cong Ding a046b816a4 KVM: s390: kvm/sigp.c: fix memory leakage
the variable inti should be freed in the branch CPUSTAT_STOPPED.

Signed-off-by: Cong Ding <dinggnu@gmail.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-01-17 08:41:48 +02:00
Takuya Yoshikawa 6b81b05e44 KVM: MMU: Conditionally reschedule when kvm_mmu_slot_remove_write_access() takes a long time
If the userspace starts dirty logging for a large slot, say 64GB of
memory, kvm_mmu_slot_remove_write_access() needs to hold mmu_lock for
a long time such as tens of milliseconds.  This patch controls the lock
hold time by asking the scheduler if we need to reschedule for others.

One penalty for this is that we need to flush TLBs before releasing
mmu_lock.  But since holding mmu_lock for a long time does affect not
only the guest, vCPU threads in other words, but also the host as a
whole, we should pay for that.

In practice, the cost will not be so high because we can protect a fair
amount of memory before being rescheduled: on my test environment,
cond_resched_lock() was called only once for protecting 12GB of memory
even without THP.  We can also revisit Avi's "unlocked TLB flush" work
later for completely suppressing extra TLB flushes if needed.

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-01-14 11:14:28 +02:00
Takuya Yoshikawa 9d1beefb71 KVM: Make kvm_mmu_slot_remove_write_access() take mmu_lock by itself
Better to place mmu_lock handling and TLB flushing code together since
this is a self-contained function.

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-01-14 11:14:17 +02:00
Takuya Yoshikawa b34cb590fb KVM: Make kvm_mmu_change_mmu_pages() take mmu_lock by itself
No reason to make callers take mmu_lock since we do not need to protect
kvm_mmu_change_mmu_pages() and kvm_mmu_slot_remove_write_access()
together by mmu_lock in kvm_arch_commit_memory_region(): the former
calls kvm_mmu_commit_zap_page() and flushes TLBs by itself.

Note: we do not need to protect kvm->arch.n_requested_mmu_pages by
mmu_lock as can be seen from the fact that it is read locklessly.

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-01-14 11:14:09 +02:00