Commit graph

4016 commits

Author SHA1 Message Date
Radim Krčmář a9ff720e0f Merge branch 'x86/cpufeature' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next
For AVX512_VPOPCNTDQ.
2017-01-17 17:53:01 +01:00
Paolo Bonzini 33ab91103b KVM: x86: fix emulation of "MOV SS, null selector"
This is CVE-2017-2583.  On Intel this causes a failed vmentry because
SS's type is neither 3 nor 7 (even though the manual says this check is
only done for usable SS, and the dmesg splat says that SS is unusable!).
On AMD it's worse: svm.c is confused and sets CPL to 0 in the vmcb.

The fix fabricates a data segment descriptor when SS is set to a null
selector, so that CPL and SS.DPL are set correctly in the VMCS/vmcb.
Furthermore, only allow setting SS to a NULL selector if SS.RPL < 3;
this in turn ensures CPL < 3 because RPL must be equal to CPL.

Thanks to Andy Lutomirski and Willy Tarreau for help in analyzing
the bug and deciphering the manuals.

Reported-by: Xiaohan Zhang <zhangxiaohan1@huawei.com>
Fixes: 79d5b4c3cd
Cc: stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-01-12 15:17:13 +01:00
Wanpeng Li 546d87e5c9 KVM: x86: fix NULL deref in vcpu_scan_ioapic
Reported by syzkaller:

    BUG: unable to handle kernel NULL pointer dereference at 00000000000001b0
    IP: _raw_spin_lock+0xc/0x30
    PGD 3e28eb067
    PUD 3f0ac6067
    PMD 0
    Oops: 0002 [#1] SMP
    CPU: 0 PID: 2431 Comm: test Tainted: G           OE   4.10.0-rc1+ #3
    Call Trace:
     ? kvm_ioapic_scan_entry+0x3e/0x110 [kvm]
     kvm_arch_vcpu_ioctl_run+0x10a8/0x15f0 [kvm]
     ? pick_next_task_fair+0xe1/0x4e0
     ? kvm_arch_vcpu_load+0xea/0x260 [kvm]
     kvm_vcpu_ioctl+0x33a/0x600 [kvm]
     ? hrtimer_try_to_cancel+0x29/0x130
     ? do_nanosleep+0x97/0xf0
     do_vfs_ioctl+0xa1/0x5d0
     ? __hrtimer_init+0x90/0x90
     ? do_nanosleep+0x5b/0xf0
     SyS_ioctl+0x79/0x90
     do_syscall_64+0x6e/0x180
     entry_SYSCALL64_slow_path+0x25/0x25
    RIP: _raw_spin_lock+0xc/0x30 RSP: ffffa43688973cc0

The syzkaller folks reported a NULL pointer dereference due to
ENABLE_CAP succeeding even without an irqchip.  The Hyper-V
synthetic interrupt controller is activated, resulting in a
wrong request to rescan the ioapic and a NULL pointer dereference.

    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <sys/types.h>
    #include <linux/kvm.h>
    #include <pthread.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    #ifndef KVM_CAP_HYPERV_SYNIC
    #define KVM_CAP_HYPERV_SYNIC 123
    #endif

    void* thr(void* arg)
    {
	struct kvm_enable_cap cap;
	cap.flags = 0;
	cap.cap = KVM_CAP_HYPERV_SYNIC;
	ioctl((long)arg, KVM_ENABLE_CAP, &cap);
	return 0;
    }

    int main()
    {
	void *host_mem = mmap(0, 0x1000, PROT_READ|PROT_WRITE,
			MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
	int kvmfd = open("/dev/kvm", 0);
	int vmfd = ioctl(kvmfd, KVM_CREATE_VM, 0);
	struct kvm_userspace_memory_region memreg;
	memreg.slot = 0;
	memreg.flags = 0;
	memreg.guest_phys_addr = 0;
	memreg.memory_size = 0x1000;
	memreg.userspace_addr = (unsigned long)host_mem;
	host_mem[0] = 0xf4;
	ioctl(vmfd, KVM_SET_USER_MEMORY_REGION, &memreg);
	int cpufd = ioctl(vmfd, KVM_CREATE_VCPU, 0);
	struct kvm_sregs sregs;
	ioctl(cpufd, KVM_GET_SREGS, &sregs);
	sregs.cr0 = 0;
	sregs.cr4 = 0;
	sregs.efer = 0;
	sregs.cs.selector = 0;
	sregs.cs.base = 0;
	ioctl(cpufd, KVM_SET_SREGS, &sregs);
	struct kvm_regs regs = { .rflags = 2 };
	ioctl(cpufd, KVM_SET_REGS, &regs);
	ioctl(vmfd, KVM_CREATE_IRQCHIP, 0);
	pthread_t th;
	pthread_create(&th, 0, thr, (void*)(long)cpufd);
	usleep(rand() % 10000);
	ioctl(cpufd, KVM_RUN, 0);
	pthread_join(th, 0);
	return 0;
    }

This patch fixes it by failing ENABLE_CAP if without an irqchip.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Fixes: 5c919412fe (kvm/x86: Hyper-V synthetic interrupt controller)
Cc: stable@vger.kernel.org # 4.5+
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-01-12 14:52:52 +01:00
Steve Rutherford 129a72a0d3 KVM: x86: Introduce segmented_write_std
Introduces segemented_write_std.

Switches from emulated reads/writes to standard read/writes in fxsave,
fxrstor, sgdt, and sidt.  This fixes CVE-2017-2584, a longstanding
kernel memory leak.

Since commit 283c95d0e3 ("KVM: x86: emulate FXSAVE and FXRSTOR",
2016-11-09), which is luckily not yet in any final release, this would
also be an exploitable kernel memory *write*!

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: stable@vger.kernel.org
Fixes: 96051572c8
Fixes: 283c95d0e3
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Steve Rutherford <srutherford@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-01-12 14:34:58 +01:00
David Matlack cef84c302f KVM: x86: flush pending lapic jump label updates on module unload
KVM's lapic emulation uses static_key_deferred (apic_{hw,sw}_disabled).
These are implemented with delayed_work structs which can still be
pending when the KVM module is unloaded. We've seen this cause kernel
panics when the kvm_intel module is quickly reloaded.

Use the new static_key_deferred_flush() API to flush pending updates on
module unload.

Signed-off-by: David Matlack <dmatlack@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-01-12 14:33:17 +01:00
Jim Mattson 21e7fbe7db kvm: nVMX: Reorder error checks for emulated VMXON
Checks on the operand to VMXON are performed after the check for
legacy mode operation and the #GP checks, according to the pseudo-code
in Intel's SDM.

Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-01-09 14:48:04 +01:00
Paolo Bonzini 4d82d12b39 KVM: lapic: do not scan IRR when delivering an interrupt
On interrupt delivery the PPR can only grow (except for auto-EOI),
so it is impossible that non-auto-EOI interrupt delivery results
in KVM_REQ_EVENT.  We can therefore use __apic_update_ppr.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-01-09 14:48:03 +01:00
Paolo Bonzini 26fbbee581 KVM: lapic: do not set KVM_REQ_EVENT unnecessarily on PPR update
On PPR update, we set KVM_REQ_EVENT unconditionally anytime PPR is lowered.
But we can take into account IRR here already.

Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-01-09 14:48:02 +01:00
Paolo Bonzini b3c045d332 KVM: lapic: remove unnecessary KVM_REQ_EVENT on PPR update
PPR needs to be updated whenever on every IRR read because we
may have missed TPR writes that _increased_ PPR.  However, these
writes need not generate KVM_REQ_EVENT, because either KVM_REQ_EVENT
has been set already in __apic_accept_irq, or we are going to
process the interrupt right away.

Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-01-09 14:48:01 +01:00
Paolo Bonzini eb90f3417a KVM: vmx: speed up TPR below threshold vmexits
Since we're already in VCPU context, all we have to do here is recompute
the PPR value.  That will in turn generate a KVM_REQ_EVENT if necessary.

Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-01-09 14:48:00 +01:00
Paolo Bonzini 0f1e261ead KVM: x86: add VCPU stat for KVM_REQ_EVENT processing
This statistic can be useful to estimate the cost of an IRQ injection
scenario, by comparing it with irq_injections.  For example the stat
shows that sti;hlt triggers more KVM_REQ_EVENT than sti;nop.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-01-09 14:47:59 +01:00
Tom Lendacky 0f89b207b0 kvm: svm: Use the hardware provided GPA instead of page walk
When a guest causes a NPF which requires emulation, KVM sometimes walks
the guest page tables to translate the GVA to a GPA. This is unnecessary
most of the time on AMD hardware since the hardware provides the GPA in
EXITINFO2.

The only exception cases involve string operations involving rep or
operations that use two memory locations. With rep, the GPA will only be
the value of the initial NPF and with dual memory locations we won't know
which memory address was translated into EXITINFO2.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-01-09 14:47:58 +01:00
Radim Krčmář 5bd5db385b KVM: x86: allow hotplug of VCPU with APIC ID over 0xff
LAPIC after reset is in xAPIC mode, which poses a problem for hotplug of
VCPUs with high APIC ID, because reset VCPU is waiting for INIT/SIPI,
but there is no way to uniquely address it using xAPIC.

From many possible options, we chose the one that also works on real
hardware: accepting interrupts addressed to LAPIC's x2APIC ID even in
xAPIC mode.

KVM intentionally differs from real hardware, because real hardware
(Knights Landing) does just "x2apic_id & 0xff" to decide whether to
accept the interrupt in xAPIC mode and it can deliver one interrupt to
more than one physical destination, e.g. 0x123 to 0x123 and 0x23.

Fixes: 682f732ecf ("KVM: x86: bump MAX_VCPUS to 288")
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-01-09 14:47:48 +01:00
Radim Krčmář b4535b58ae KVM: x86: make interrupt delivery fast and slow path behave the same
Slow path tried to prevent IPIs from x2APIC VCPUs from being delivered
to xAPIC VCPUs and vice-versa.  Make slow path behave like fast path,
which never distinguished that.

Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-01-09 14:47:40 +01:00
Radim Krčmář 6e50043912 KVM: x86: replace kvm_apic_id with kvm_{x,x2}apic_id
There were three calls sites:
 - recalculate_apic_map and kvm_apic_match_physical_addr, where it would
   only complicate implementation of x2APIC hotplug;
 - in apic_debug, where it was still somewhat preserved, but keeping the
   old function just for apic_debug was not worth it

Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-01-09 14:47:29 +01:00
Radim Krčmář f98a3efb28 KVM: x86: use delivery to self in hyperv synic
Interrupt to self can be sent without knowing the APIC ID.

Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-01-09 14:46:13 +01:00
Junaid Shahid f160c7b7bb kvm: x86: mmu: Lockless access tracking for Intel CPUs without EPT A bits.
This change implements lockless access tracking for Intel CPUs without EPT
A bits. This is achieved by marking the PTEs as not-present (but not
completely clearing them) when clear_flush_young() is called after marking
the pages as accessed. When an EPT Violation is generated as a result of
the VM accessing those pages, the PTEs are restored to their original values.

Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-01-09 14:46:11 +01:00
Junaid Shahid 37f0e8fe6b kvm: x86: mmu: Do not use bit 63 for tracking special SPTEs
MMIO SPTEs currently set both bits 62 and 63 to distinguish them as special
PTEs. However, bit 63 is used as the SVE bit in Intel EPT PTEs. The SVE bit
is ignored for misconfigured PTEs but not necessarily for not-Present PTEs.
Since MMIO SPTEs use an EPT misconfiguration, so using bit 63 for them is
acceptable. However, the upcoming fast access tracking feature adds another
type of special tracking PTE, which uses not-Present PTEs and hence should
not set bit 63.

In order to use common bits to distinguish both type of special PTEs, we
now use only bit 62 as the special bit.

Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-01-09 14:46:10 +01:00
Junaid Shahid f39a058d0e kvm: x86: mmu: Introduce a no-tracking version of mmu_spte_update
mmu_spte_update() tracks changes in the accessed/dirty state of
the SPTE being updated and calls kvm_set_pfn_accessed/dirty
appropriately. However, in some cases (e.g. when aging the SPTE),
this shouldn't be done. mmu_spte_update_no_track() is introduced
for use in such cases.

Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-01-09 14:46:09 +01:00
Junaid Shahid 83ef6c8155 kvm: x86: mmu: Refactor accessed/dirty checks in mmu_spte_update/clear
This simplifies mmu_spte_update() a little bit.
The checks for clearing of accessed and dirty bits are refactored into
separate functions, which are used inside both mmu_spte_update() and
mmu_spte_clear_track_bits(), as well as kvm_test_age_rmapp(). The new
helper functions handle both the case when A/D bits are supported in
hardware and the case when they are not.

Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-01-09 14:46:08 +01:00
Junaid Shahid 97dceba29a kvm: x86: mmu: Fast Page Fault path retries
This change adds retries into the Fast Page Fault path. Without the
retries, the code still works, but if a retry does end up being needed,
then it will result in a second page fault for the same memory access,
which will cause much more overhead compared to just retrying within the
original fault.

This would be especially useful with the upcoming fast access tracking
change, as that would make it more likely for retries to be needed
(e.g. due to read and write faults happening on different CPUs at
the same time).

Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-01-09 14:46:07 +01:00
Junaid Shahid ea4114bcd3 kvm: x86: mmu: Rename spte_is_locklessly_modifiable()
This change renames spte_is_locklessly_modifiable() to
spte_can_locklessly_be_made_writable() to distinguish it from other
forms of lockless modifications. The full set of lockless modifications
is covered by spte_has_volatile_bits().

Signed-off-by: Junaid Shahid <junaids@google.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-01-09 14:46:06 +01:00
Junaid Shahid 27959a4415 kvm: x86: mmu: Use symbolic constants for EPT Violation Exit Qualifications
This change adds some symbolic constants for VM Exit Qualifications
related to EPT Violations and updates handle_ept_violation() to use
these constants instead of hard-coded numbers.

Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-01-09 14:46:05 +01:00
David Matlack 114df303a7 kvm: x86: reduce collisions in mmu_page_hash
When using two-dimensional paging, the mmu_page_hash (which provides
lookups for existing kvm_mmu_page structs), becomes imbalanced; with
too many collisions in buckets 0 and 512. This has been seen to cause
mmu_lock to be held for multiple milliseconds in kvm_mmu_get_page on
VMs with a large amount of RAM mapped with 4K pages.

The current hash function uses the lower 10 bits of gfn to index into
mmu_page_hash. When doing shadow paging, gfn is the address of the
guest page table being shadow. These tables are 4K-aligned, which
makes the low bits of gfn a good hash. However, with two-dimensional
paging, no guest page tables are being shadowed, so gfn is the base
address that is mapped by the table. Thus page tables (level=1) have
a 2MB aligned gfn, page directories (level=2) have a 1GB aligned gfn,
etc. This means hashes will only differ in their 10th bit.

hash_64() provides a better hash. For example, on a VM with ~200G
(99458 direct=1 kvm_mmu_page structs):

hash            max_mmu_page_hash_collisions
--------------------------------------------
low 10 bits     49847
hash_64         105
perfect         97

While we're changing the hash, increase the table size by 4x to better
support large VMs (further reduces number of collisions in 200G VM to
29).

Note that hash_64() does not provide a good distribution prior to commit
ef703f49a6 ("Eliminate bad hash multipliers from hash_32() and
hash_64()").

Signed-off-by: David Matlack <dmatlack@google.com>
Change-Id: I5aa6b13c834722813c6cca46b8b1ed6f53368ade
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-01-09 14:46:04 +01:00
David Matlack f3414bc774 kvm: x86: export maximum number of mmu_page_hash collisions
Report the maximum number of mmu_page_hash collisions as a per-VM stat.
This will make it easy to identify problems with the mmu_page_hash in
the future.

Signed-off-by: David Matlack <dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-01-09 14:46:03 +01:00
Radim Krčmář 826da32140 KVM: x86: simplify conditions with split/kernel irqchip
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-01-09 14:45:57 +01:00
Radim Krčmář 8231f50d98 KVM: x86: prevent setup of invalid routes
The check in kvm_set_pic_irq() and kvm_set_ioapic_irq() was just a
temporary measure until the code improved enough for us to do this.

This changes APIC in a case when KVM_SET_GSI_ROUTING is called to set up pic
and ioapic routes before KVM_CREATE_IRQCHIP.  Those rules would get overwritten
by KVM_CREATE_IRQCHIP at best, so it is pointless to allow it.  Userspaces
hopefully noticed that things don't work if they do that and don't do that.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-01-09 14:45:50 +01:00
Radim Krčmář e5dc48777d KVM: x86: refactor pic setup in kvm_set_routing_entry
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-01-09 14:45:36 +01:00
Radim Krčmář 099413664c KVM: x86: make pic setup code look like ioapic setup
We don't treat kvm->arch.vpic specially anymore, so the setup can look
like ioapic.  This gets a bit more information out of return values.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-01-09 14:45:28 +01:00
Radim Krčmář 49776faf93 KVM: x86: decouple irqchip_in_kernel() and pic_irqchip()
irqchip_in_kernel() tried to save a bit by reusing pic_irqchip(), but it
just complicated the code.
Add a separate state for the irqchip mode.

Reviewed-by: David Hildenbrand <david@redhat.com>
[Used Paolo's version of condition in irqchip_in_kernel().]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-01-09 14:42:47 +01:00
Radim Krčmář 35e6eaa3df KVM: x86: don't allow kernel irqchip with split irqchip
Split irqchip cannot be created after creating the kernel irqchip, but
we forgot to restrict the other way.  This is an API change.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-01-09 14:38:19 +01:00
Linus Torvalds 08289086b0 KVM fixes for v4.10-rc3
MIPS: (both for stable)
  - fix host kernel crashes when receiving a signal with 64-bit userspace
  - flush instruction cache on all vcpus after generating entry code
 
 x86:
  - fix NULL dereference in MMU caused by SMM transitions (for stable)
  - correct guest instruction pointer after emulating some VMX errors
  - minor cleanup
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABCAAGBQJYb/N7AAoJEED/6hsPKofoa4QH/0/jwHr64lFeiOzMxqZfTF0y
 wufcTqw3zGq5iPaNlEwn+6AkKnTq2IPws92FludfPHPb7BrLUPqrXxRlSRN+XPVw
 pHVcV9u0q4yghMi7/6Flu3JASnpD6PrPZ7ezugZwgXFrR7pewd/+sTq6xBUnI9rZ
 nNEYsfh8dYiBicxSGXlmZcHLuJJHKshjsv9F6ngyBGXAAf/F+nLiJReUzPO0m2+P
 gmXi5zhVu6z05zlaCW1KAmJ1QV1UJla1vZnzrnK3twRK/05l7YX+xCbHIo1wB03R
 2YhKDnSrnG3Zt+KpXfRhADXazNgM5ASvORdvI6RvjLNVxlnOveQtAcfRyvZezT4=
 =LXLf
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Radim Krčmář:
 "MIPS:
   - fix host kernel crashes when receiving a signal with 64-bit
     userspace

   - flush instruction cache on all vcpus after generating entry code

     (both for stable)

  x86:
   - fix NULL dereference in MMU caused by SMM transitions (for stable)

   - correct guest instruction pointer after emulating some VMX errors

   - minor cleanup"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: VMX: remove duplicated declaration
  KVM: MIPS: Flush KVM entry code from icache globally
  KVM: MIPS: Don't clobber CP0_Status.UX
  KVM: x86: reset MMU on KVM_SET_VCPU_EVENTS
  KVM: nVMX: fix instruction skipping during emulated vm-entry
2017-01-06 15:27:17 -08:00
Jan Dakinevich 69130ea1e6 KVM: VMX: remove duplicated declaration
Declaration of VMX_VPID_EXTENT_SUPPORTED_MASK occures twice in the code.
Probably, it was happened after unsuccessful merge.

Signed-off-by: Jan Dakinevich <jan.dakinevich@gmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-01-05 15:08:48 +01:00
Linus Torvalds 3ddc76dfc7 Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer type cleanups from Thomas Gleixner:
 "This series does a tree wide cleanup of types related to
  timers/timekeeping.

   - Get rid of cycles_t and use a plain u64. The type is not really
     helpful and caused more confusion than clarity

   - Get rid of the ktime union. The union has become useless as we use
     the scalar nanoseconds storage unconditionally now. The 32bit
     timespec alike storage got removed due to the Y2038 limitations
     some time ago.

     That leaves the odd union access around for no reason. Clean it up.

  Both changes have been done with coccinelle and a small amount of
  manual mopping up"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  ktime: Get rid of ktime_equal()
  ktime: Cleanup ktime_set() usage
  ktime: Get rid of the union
  clocksource: Use a plain u64 instead of cycle_t
2016-12-25 14:30:04 -08:00
Thomas Gleixner 8b0e195314 ktime: Cleanup ktime_set() usage
ktime_set(S,N) was required for the timespec storage type and is still
useful for situations where a Seconds and Nanoseconds part of a time value
needs to be converted. For anything where the Seconds argument is 0, this
is pointless and can be replaced with a simple assignment.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
2016-12-25 17:21:22 +01:00
Thomas Gleixner a5a1d1c291 clocksource: Use a plain u64 instead of cycle_t
There is no point in having an extra type for extra confusion. u64 is
unambiguous.

Conversion was done with the following coccinelle script:

@rem@
@@
-typedef u64 cycle_t;

@fix@
typedef cycle_t;
@@
-cycle_t
+u64

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: John Stultz <john.stultz@linaro.org>
2016-12-25 11:04:12 +01:00
Thomas Gleixner 73c1b41e63 cpu/hotplug: Cleanup state names
When the state names got added a script was used to add the extra argument
to the calls. The script basically converted the state constant to a
string, but the cleanup to convert these strings into meaningful ones did
not happen.

Replace all the useless strings with 'subsys/xxx/yyy:state' strings which
are used in all the other places already.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Link: http://lkml.kernel.org/r/20161221192112.085444152@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-25 10:47:44 +01:00
Xiao Guangrong 6ef4e07ecd KVM: x86: reset MMU on KVM_SET_VCPU_EVENTS
Otherwise, mismatch between the smm bit in hflags and the MMU role
can cause a NULL pointer dereference.

Cc: stable@vger.kernel.org
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-12-24 10:16:04 +01:00
David Matlack b428018a06 KVM: nVMX: fix instruction skipping during emulated vm-entry
kvm_skip_emulated_instruction() should not be called after emulating
a VM-entry failure during or after loading guest state
(nested_vmx_entry_failure()). Otherwise the L1 hypervisor is resumed
some number of bytes past vmcs->host_rip.

Fixes: eb27756217
Signed-off-by: David Matlack <dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-12-21 18:55:09 +01:00
Jim Mattson ef85b67385 kvm: nVMX: Allow L1 to intercept software exceptions (#BP and #OF)
When L2 exits to L0 due to "exception or NMI", software exceptions
(#BP and #OF) for which L1 has requested an intercept should be
handled by L1 rather than L0. Previously, only hardware exceptions
were forwarded to L1.

Signed-off-by: Jim Mattson <jmattson@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-12-19 16:05:31 +01:00
Andrea Arcangeli cc0d907c09 kvm: take srcu lock around kvm_steal_time_set_preempted()
kvm_memslots() will be called by kvm_write_guest_offset_cached() so
take the srcu lock.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-12-19 15:45:15 +01:00
Andrea Arcangeli 931f261b42 kvm: fix schedule in atomic in kvm_steal_time_set_preempted()
kvm_steal_time_set_preempted() isn't disabling the pagefaults before
calling __copy_to_user and the kernel debug notices.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-12-19 15:45:14 +01:00
Paolo Bonzini 3f5ad8be37 KVM: hyperv: fix locking of struct kvm_hv fields
Introduce a new mutex to avoid an AB-BA deadlock between kvm->lock and
vcpu->mutex.  Protect accesses in kvm_hv_setup_tsc_page too, as suggested
by Roman.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-12-16 17:53:38 +01:00
Yi Sun 83781d180b KVM: x86: Expose Intel AVX512IFMA/AVX512VBMI/SHA features to guest.
Expose AVX512IFMA/AVX512VBMI/SHA features to guest.

AVX512 spec can be found at:
https://software.intel.com/sites/default/files/managed/26/40/319433-026.pdf

SHA spec can be found at:
https://software.intel.com/sites/default/files/managed/39/c5/325462-sdm-vol-1-2abcd-3abcd.pdf

This patch depends on below patch.
http://marc.info/?l=linux-kernel&m=147932800828178&w=2

Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-12-15 15:02:44 +01:00
GanShun 37b9a671f3 kvm: nVMX: Correct a VMX instruction error code for VMPTRLD
When the operand passed to VMPTRLD matches the address of the VMXON
region, the VMX instruction error code should be
VMXERR_VMPTRLD_VMXON_POINTER rather than VMXERR_VMCLEAR_VMXON_POINTER.

Signed-off-by: GanShun <ganshun@google.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-12-15 15:02:44 +01:00
Linus Torvalds 93173b5bf2 Small release, the most interesting stuff is x86 nested virt improvements.
x86: userspace can now hide nested VMX features from guests; nested
 VMX can now run Hyper-V in a guest; support for AVX512_4VNNIW and
 AVX512_FMAPS in KVM; infrastructure support for virtual Intel GPUs.
 
 PPC: support for KVM guests on POWER9; improved support for interrupt
 polling; optimizations and cleanups.
 
 s390: two small optimizations, more stuff is in flight and will be
 in 4.11.
 
 ARM: support for the GICv3 ITS on 32bit platforms.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQExBAABCAAbBQJYTkP0FBxwYm9uemluaUByZWRoYXQuY29tAAoJEL/70l94x66D
 lZIH/iT1n9OQXcuTpYYnQhuCenzI3GZZOIMTbCvK2i5bo0FIJKxVn0EiAAqZSXvO
 nO185FqjOgLuJ1AD1kJuxzye5suuQp4HIPWWgNHcexLuy43WXWKZe0IQlJ4zM2Xf
 u31HakpFmVDD+Cd1qN3yDXtDrRQ79/xQn2kw7CWb8olp+pVqwbceN3IVie9QYU+3
 gCz0qU6As0aQIwq2PyalOe03sO10PZlm4XhsoXgWPG7P18BMRhNLTDqhLhu7A/ry
 qElVMANT7LSNLzlwNdpzdK8rVuKxETwjlc1UP8vSuhrwad4zM2JJ1Exk26nC2NaG
 D0j4tRSyGFIdx6lukZm7HmiSHZ0=
 =mkoB
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "Small release, the most interesting stuff is x86 nested virt
  improvements.

  x86:
   - userspace can now hide nested VMX features from guests
   - nested VMX can now run Hyper-V in a guest
   - support for AVX512_4VNNIW and AVX512_FMAPS in KVM
   - infrastructure support for virtual Intel GPUs.

  PPC:
   - support for KVM guests on POWER9
   - improved support for interrupt polling
   - optimizations and cleanups.

  s390:
   - two small optimizations, more stuff is in flight and will be in
     4.11.

  ARM:
   - support for the GICv3 ITS on 32bit platforms"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (94 commits)
  arm64: KVM: pmu: Reset PMSELR_EL0.SEL to a sane value before entering the guest
  KVM: arm/arm64: timer: Check for properly initialized timer on init
  KVM: arm/arm64: vgic-v2: Limit ITARGETSR bits to number of VCPUs
  KVM: x86: Handle the kthread worker using the new API
  KVM: nVMX: invvpid handling improvements
  KVM: nVMX: check host CR3 on vmentry and vmexit
  KVM: nVMX: introduce nested_vmx_load_cr3 and call it on vmentry
  KVM: nVMX: propagate errors from prepare_vmcs02
  KVM: nVMX: fix CR3 load if L2 uses PAE paging and EPT
  KVM: nVMX: load GUEST_EFER after GUEST_CR0 during emulated VM-entry
  KVM: nVMX: generate MSR_IA32_CR{0,4}_FIXED1 from guest CPUID
  KVM: nVMX: fix checks on CR{0,4} during virtual VMX operation
  KVM: nVMX: support restore of VMX capability MSRs
  KVM: nVMX: generate non-true VMX MSRs based on true versions
  KVM: x86: Do not clear RFLAGS.TF when a singlestep trap occurs.
  KVM: x86: Add kvm_skip_emulated_instruction and use it.
  KVM: VMX: Move skip_emulated_instruction out of nested_vmx_check_vmcs12
  KVM: VMX: Reorder some skip_emulated_instruction calls
  KVM: x86: Add a return value to kvm_emulate_cpuid
  KVM: PPC: Book3S: Move prototypes for KVM functions into kvm_ppc.h
  ...
2016-12-13 15:47:02 -08:00
Linus Torvalds 9439b3710d Main pull request for drm for 4.10 kernel
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJYT3qqAAoJEAx081l5xIa+dLMP/2dqBybSAeWlPmAwVenIHRtS
 KFNktISezFSY/LBcIP2mHkFJmjTKBMZFxWnyEJL9NmFUD1cS2WMyNnC1282h/+rD
 +P8Bsmzmt/daV4UTFxVDpzlmVlavAyakNi6FnSQfAfmf+3PB1yzU3gn8ld9pU/if
 h7KEp9fDn9eYZreTRfCUloI2yoVpD9d0DG3uaGDN/N0kGUnCC6TZT5ig5j2JO016
 fYf/DqoYAk3ItWF9WK/uG7qJIGi37afCpQq+kbSSJk+p3HjJqu8JUe9jzqYdl7j9
 26TGSY5o9WLhZkxDgbcCIJzcFJhMmXgMdhjil9lqaHmnNG5FPFU7g8DK1CZqbel9
 m8+aRPn1EgxIahMgdl8NblW1pfO2Kco0tZmoP5vXx1uqhivd67h0hiQqp66WxOJd
 i2yMLncaCEv8M161CVEgtzuI5a7nCfaZv7J9ArzbkD/huBwu51IZgTs7Dz4njgvz
 VPB5FBTB/ZYteErUNoh6gjF0hLngWvvJSPvuzT+EFO7yypek0IJ28GTdbxYSP+jR
 13697s5Itigf/D3KUdRRGsWRzyVVN9n+djkl//sy5ddL9eOlKSKEga4ujOUjTWaW
 hTvAxpK9GmJS/Iun5jIP6f75zDbi+e8FWUeB/OI2lPtnApaSKdXBTPXsco2RnTEV
 +G6XrH8IMEIsTxOk7hWU
 =7s/c
 -----END PGP SIGNATURE-----

Merge tag 'drm-for-v4.10' of git://people.freedesktop.org/~airlied/linux

Pull drm updates from Dave Airlie:
 "This is the main pull request for drm for 4.10 kernel.

  New drivers:
   - ZTE VOU display driver (zxdrm)
   - Amlogic Meson Graphic Controller GXBB/GXL/GXM SoCs (meson)
   - MXSFB support (mxsfb)

  Core:
   - Format handling has been reworked
   - Better atomic state debugging
   - drm_mm leak debugging
   - Atomic explicit fencing support
   - fbdev helper ops
   - Documentation updates
   - MST fbcon fixes

  Bridge:
   - Silicon Image SiI8620 driver

  Panel:
   - Add support for new simple panels

  i915:
   - GVT Device model
   - Better HDMI2.0 support on skylake
   - More watermark fixes
   - GPU idling rework for suspend/resume
   - DP Audio workarounds
   - Scheduler prep-work
   - Opregion CADL handling
   - GPU scheduler and priority boosting

  amdgfx/radeon:
   - Support for virtual devices
   - New VM manager for non-contig VRAM buffers
   - UVD powergating
   - SI register header cleanup
   - Cursor fixes
   - Powermanagement fixes

  nouveau:
   - Powermangement reworks for better voltage/clock changes
   - Atomic modesetting support
   - Displayport Multistream (MST) support.
   - GP102/104 hang and cursor fixes
   - GP106 support

  hisilicon:
   - hibmc support (BMC chip for aarch64 servers)

  armada:
   - add tracing support for overlay change
   - refactor plane support
   - de-midlayer the driver

  omapdrm:
   - Timing code cleanups

  rcar-du:
   - R8A7792/R8A7796 support
   - Misc fixes.

  sunxi:
   - A31 SoC display engine support

  imx-drm:
   - YUV format support
   - Cleanup plane atomic update

  mali-dp:
   - Misc fixes

  dw-hdmi:
   - Add support for HDMI i2c master controller

  tegra:
   - IOMMU support fixes
   - Error handling fixes

  tda998x:
   - Fix connector registration
   - Improved robustness
   - Fix infoframe/audio compliance

  virtio:
   - fix busid issues
   - allocate more vbufs

  qxl:
   - misc fixes and cleanups.

  vc4:
   - Fragment shader threading
   - ETC1 support
   - VEC (tv-out) support

  msm:
   - A5XX GPU support
   - Lots of atomic changes

  tilcdc:
   - Misc fixes and cleanups.

  etnaviv:
   - Fix dma-buf export path
   - DRAW_INSTANCED support
   - fix driver on i.MX6SX

  exynos:
   - HDMI refactoring

  fsl-dcu:
   - fbdev changes"

* tag 'drm-for-v4.10' of git://people.freedesktop.org/~airlied/linux: (1343 commits)
  drm/nouveau/kms/nv50: fix atomic regression on original G80
  drm/nouveau/bl: Do not register interface if Apple GMUX detected
  drm/nouveau/bl: Assign different names to interfaces
  drm/nouveau/bios/dp: fix handling of LevelEntryTableIndex on DP table 4.2
  drm/nouveau/ltc: protect clearing of comptags with mutex
  drm/nouveau/gr/gf100-: handle GPC/TPC/MPC trap
  drm/nouveau/core: recognise GP106 chipset
  drm/nouveau/ttm: wait for bo fence to signal before unmapping vmas
  drm/nouveau/gr/gf100-: FECS intr handling is not relevant on proprietary ucode
  drm/nouveau/gr/gf100-: properly ack all FECS error interrupts
  drm/nouveau/fifo/gf100-: recover from host mmu faults
  drm: Add fake controlD* symlinks for backwards compat
  drm/vc4: Don't use drm_put_dev
  drm/vc4: Document VEC DT binding
  drm/vc4: Add support for the VEC (Video Encoder) IP
  drm: Add TV connector states to drm_connector_state
  drm: Turn DRM_MODE_SUBCONNECTOR_xx definitions into an enum
  drm/vc4: Fix ->clock_select setting for the VEC encoder
  drm/amdgpu/dce6: Set MASTER_UPDATE_MODE to 0 in resume_mc_access as well
  drm/amdgpu: use pin rather than pin_restricted in a few cases
  ...
2016-12-13 09:35:09 -08:00
Linus Torvalds 518bacf5a5 Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 FPU updates from Ingo Molnar:
 "The main changes in this cycle were:

   - do a large round of simplifications after all CPUs do 'eager' FPU
     context switching in v4.9: remove CR0 twiddling, remove leftover
     eager/lazy bts, etc (Andy Lutomirski)

   - more FPU code simplifications: remove struct fpu::counter, clarify
     nomenclature, remove unnecessary arguments/functions and better
     structure the code (Rik van Riel)"

* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/fpu: Remove clts()
  x86/fpu: Remove stts()
  x86/fpu: Handle #NM without FPU emulation as an error
  x86/fpu, lguest: Remove CR0.TS support
  x86/fpu, kvm: Remove host CR0.TS manipulation
  x86/fpu: Remove irq_ts_save() and irq_ts_restore()
  x86/fpu: Stop saving and restoring CR0.TS in fpu__init_check_bugs()
  x86/fpu: Get rid of two redundant clts() calls
  x86/fpu: Finish excising 'eagerfpu'
  x86/fpu: Split old_fpu & new_fpu handling into separate functions
  x86/fpu: Remove 'cpu' argument from __cpu_invalidate_fpregs_state()
  x86/fpu: Split old & new FPU code paths
  x86/fpu: Remove __fpregs_(de)activate()
  x86/fpu: Rename lazy restore functions to "register state valid"
  x86/fpu, kvm: Remove KVM vcpu->fpu_counter
  x86/fpu: Remove struct fpu::counter
  x86/fpu: Remove use_eager_fpu()
  x86/fpu: Remove the XFEATURE_MASK_EAGER/LAZY distinction
  x86/fpu: Hard-disable lazy FPU mode
  x86/crypto, x86/fpu: Remove X86_FEATURE_EAGER_FPU #ifdef from the crc32c code
2016-12-12 14:27:49 -08:00
Ingo Molnar 6f38751510 Merge branch 'linus' into locking/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-12-11 13:07:13 +01:00
Petr Mladek 36da91bdf5 KVM: x86: Handle the kthread worker using the new API
Use the new API to create and destroy the "kvm-pit" kthread
worker. The API hides some implementation details.

In particular, kthread_create_worker() allocates and initializes
struct kthread_worker. It runs the kthread the right way
and stores task_struct into the worker structure.

kthread_destroy_worker() flushes all pending works, stops
the kthread and frees the structure.

This patch does not change the existing behavior except for
dynamically allocating struct kthread_worker and storing
only the pointer of this structure.

It is compile tested only because I did not find an easy
way how to run the code. Well, it should be pretty safe
given the nature of the change.

Signed-off-by: Petr Mladek <pmladek@suse.com>
Message-Id: <1476877847-11217-1-git-send-email-pmladek@suse.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-12-08 15:31:11 +01:00