Skip to content

Commit

Permalink
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Browse files Browse the repository at this point in the history
Pull kvm fixes from Paolo Bonzini:

 - Only do MSR filtering for MSRs accessed by rdmsr/wrmsr

 - Documentation improvements

 - Prevent module exit until all VMs are freed

 - PMU Virtualization fixes

 - Fix for kvm_irq_delivery_to_apic_fast() NULL-pointer dereferences

 - Other miscellaneous bugfixes

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (42 commits)
  KVM: x86: fix sending PV IPI
  KVM: x86/mmu: do compare-and-exchange of gPTE via the user address
  KVM: x86: Remove redundant vm_entry_controls_clearbit() call
  KVM: x86: cleanup enter_rmode()
  KVM: x86: SVM: fix tsc scaling when the host doesn't support it
  kvm: x86: SVM: remove unused defines
  KVM: x86: SVM: move tsc ratio definitions to svm.h
  KVM: x86: SVM: fix avic spec based definitions again
  KVM: MIPS: remove reference to trap&emulate virtualization
  KVM: x86: document limitations of MSR filtering
  KVM: x86: Only do MSR filtering when access MSR by rdmsr/wrmsr
  KVM: x86/emulator: Emulate RDPID only if it is enabled in guest
  KVM: x86/pmu: Fix and isolate TSX-specific performance event logic
  KVM: x86: mmu: trace kvm_mmu_set_spte after the new SPTE was set
  KVM: x86/svm: Clear reserved bits written to PerfEvtSeln MSRs
  KVM: x86: Trace all APICv inhibit changes and capture overall status
  KVM: x86: Add wrappers for setting/clearing APICv inhibits
  KVM: x86: Make APICv inhibit reasons an enum and cleanup naming
  KVM: X86: Handle implicit supervisor access with SMAP
  KVM: X86: Rename variable smap to not_smap in permission_fault()
  ...
  • Loading branch information
torvalds committed Apr 2, 2022
2 parents 6f34f8c + c15e0ae commit 3890491
Show file tree
Hide file tree
Showing 49 changed files with 617 additions and 414 deletions.
61 changes: 55 additions & 6 deletions Documentation/virt/kvm/api.rst
Original file line number Diff line number Diff line change
Expand Up @@ -151,12 +151,6 @@ In order to create user controlled virtual machines on S390, check
KVM_CAP_S390_UCONTROL and use the flag KVM_VM_S390_UCONTROL as
privileged user (CAP_SYS_ADMIN).

To use hardware assisted virtualization on MIPS (VZ ASE) rather than
the default trap & emulate implementation (which changes the virtual
memory layout to fit in user mode), check KVM_CAP_MIPS_VZ and use the
flag KVM_VM_MIPS_VZ.


On arm64, the physical address size for a VM (IPA Size limit) is limited
to 40bits by default. The limit can be configured if the host supports the
extension KVM_CAP_ARM_VM_IPA_SIZE. When supported, use
Expand Down Expand Up @@ -4081,6 +4075,11 @@ x2APIC MSRs are always allowed, independent of the ``default_allow`` setting,
and their behavior depends on the ``X2APIC_ENABLE`` bit of the APIC base
register.

.. warning::
MSR accesses coming from nested vmentry/vmexit are not filtered.
This includes both writes to individual VMCS fields and reads/writes
through the MSR lists pointed to by the VMCS.

If a bit is within one of the defined ranges, read and write accesses are
guarded by the bitmap's value for the MSR index if the kind of access
is included in the ``struct kvm_msr_filter_range`` flags. If no range
Expand Down Expand Up @@ -5293,6 +5292,10 @@ type values:

KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO
Sets the guest physical address of the vcpu_info for a given vCPU.
As with the shared_info page for the VM, the corresponding page may be
dirtied at any time if event channel interrupt delivery is enabled, so
userspace should always assume that the page is dirty without relying
on dirty logging.

KVM_XEN_VCPU_ATTR_TYPE_VCPU_TIME_INFO
Sets the guest physical address of an additional pvclock structure
Expand Down Expand Up @@ -7719,3 +7722,49 @@ only be invoked on a VM prior to the creation of VCPUs.
At this time, KVM_PMU_CAP_DISABLE is the only capability. Setting
this capability will disable PMU virtualization for that VM. Usermode
should adjust CPUID leaf 0xA to reflect that the PMU is disabled.

9. Known KVM API problems
=========================

In some cases, KVM's API has some inconsistencies or common pitfalls
that userspace need to be aware of. This section details some of
these issues.

Most of them are architecture specific, so the section is split by
architecture.

9.1. x86
--------

``KVM_GET_SUPPORTED_CPUID`` issues
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In general, ``KVM_GET_SUPPORTED_CPUID`` is designed so that it is possible
to take its result and pass it directly to ``KVM_SET_CPUID2``. This section
documents some cases in which that requires some care.

Local APIC features
~~~~~~~~~~~~~~~~~~~

CPU[EAX=1]:ECX[21] (X2APIC) is reported by ``KVM_GET_SUPPORTED_CPUID``,
but it can only be enabled if ``KVM_CREATE_IRQCHIP`` or
``KVM_ENABLE_CAP(KVM_CAP_IRQCHIP_SPLIT)`` are used to enable in-kernel emulation of
the local APIC.

The same is true for the ``KVM_FEATURE_PV_UNHALT`` paravirtualized feature.

CPU[EAX=1]:ECX[24] (TSC_DEADLINE) is not reported by ``KVM_GET_SUPPORTED_CPUID``.
It can be enabled if ``KVM_CAP_TSC_DEADLINE_TIMER`` is present and the kernel
has enabled in-kernel emulation of the local APIC.

Obsolete ioctls and capabilities
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

KVM_CAP_DISABLE_QUIRKS does not let userspace know which quirks are actually
available. Use ``KVM_CHECK_EXTENSION(KVM_CAP_DISABLE_QUIRKS2)`` instead if
available.

Ordering of KVM_GET_*/KVM_SET_* ioctls
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

TBD
26 changes: 7 additions & 19 deletions Documentation/virt/kvm/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -8,25 +8,13 @@ KVM
:maxdepth: 2

api
amd-memory-encryption
cpuid
halt-polling
hypercalls
locking
mmu
msr
nested-vmx
ppc-pv
s390-diag
s390-pv
s390-pv-boot
timekeeping
vcpu-requests

review-checklist
devices/index

arm/index
s390/index
ppc-pv
x86/index

devices/index

running-nested-guests
locking
vcpu-requests
review-checklist
43 changes: 34 additions & 9 deletions Documentation/virt/kvm/locking.rst
Original file line number Diff line number Diff line change
Expand Up @@ -210,32 +210,47 @@ time it will be set using the Dirty tracking mechanism described above.
3. Reference
------------

:Name: kvm_lock
``kvm_lock``
^^^^^^^^^^^^

:Type: mutex
:Arch: any
:Protects: - vm_list

:Name: kvm_count_lock
``kvm_count_lock``
^^^^^^^^^^^^^^^^^^

:Type: raw_spinlock_t
:Arch: any
:Protects: - hardware virtualization enable/disable
:Comment: 'raw' because hardware enabling/disabling must be atomic /wrt
migration.

:Name: kvm_arch::tsc_write_lock
:Type: raw_spinlock
``kvm->mn_invalidate_lock``
^^^^^^^^^^^^^^^^^^^^^^^^^^^

:Type: spinlock_t
:Arch: any
:Protects: mn_active_invalidate_count, mn_memslots_update_rcuwait

``kvm_arch::tsc_write_lock``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:Type: raw_spinlock_t
:Arch: x86
:Protects: - kvm_arch::{last_tsc_write,last_tsc_nsec,last_tsc_offset}
- tsc offset in vmcb
:Comment: 'raw' because updating the tsc offsets must not be preempted.

:Name: kvm->mmu_lock
:Type: spinlock_t
``kvm->mmu_lock``
^^^^^^^^^^^^^^^^^
:Type: spinlock_t or rwlock_t
:Arch: any
:Protects: -shadow page/shadow tlb entry
:Comment: it is a spinlock since it is used in mmu notifier.

:Name: kvm->srcu
``kvm->srcu``
^^^^^^^^^^^^^
:Type: srcu lock
:Arch: any
:Protects: - kvm->memslots
Expand All @@ -246,10 +261,20 @@ time it will be set using the Dirty tracking mechanism described above.
The srcu index can be stored in kvm_vcpu->srcu_idx per vcpu
if it is needed by multiple functions.

:Name: blocked_vcpu_on_cpu_lock
``kvm->slots_arch_lock``
^^^^^^^^^^^^^^^^^^^^^^^^
:Type: mutex
:Arch: any (only needed on x86 though)
:Protects: any arch-specific fields of memslots that have to be modified
in a ``kvm->srcu`` read-side critical section.
:Comment: must be held before reading the pointer to the current memslots,
until after all changes to the memslots are complete

``wakeup_vcpus_on_cpu_lock``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Type: spinlock_t
:Arch: x86
:Protects: blocked_vcpu_on_cpu
:Protects: wakeup_vcpus_on_cpu
:Comment: This is a per-CPU lock and it is used for VT-d posted-interrupts.
When VT-d posted-interrupts is supported and the VM has assigned
devices, we put the blocked vCPU on the list blocked_vcpu_on_cpu
Expand Down
12 changes: 12 additions & 0 deletions Documentation/virt/kvm/s390/index.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
.. SPDX-License-Identifier: GPL-2.0
====================
KVM for s390 systems
====================

.. toctree::
:maxdepth: 2

s390-diag
s390-pv
s390-pv-boot
File renamed without changes.
File renamed without changes.
File renamed without changes.
10 changes: 10 additions & 0 deletions Documentation/virt/kvm/vcpu-requests.rst
Original file line number Diff line number Diff line change
Expand Up @@ -135,6 +135,16 @@ KVM_REQ_UNHALT
such as a pending signal, which does not indicate the VCPU's halt
emulation should stop, and therefore does not make the request.

KVM_REQ_OUTSIDE_GUEST_MODE

This "request" ensures the target vCPU has exited guest mode prior to the
sender of the request continuing on. No action needs be taken by the target,
and so no request is actually logged for the target. This request is similar
to a "kick", but unlike a kick it guarantees the vCPU has actually exited
guest mode. A kick only guarantees the vCPU will exit at some point in the
future, e.g. a previous kick may have started the process, but there's no
guarantee the to-be-kicked vCPU has fully exited guest mode.

KVM_REQUEST_MASK
----------------

Expand Down
File renamed without changes.
39 changes: 39 additions & 0 deletions Documentation/virt/kvm/x86/errata.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,39 @@

=======================================
Known limitations of CPU virtualization
=======================================

Whenever perfect emulation of a CPU feature is impossible or too hard, KVM
has to choose between not implementing the feature at all or introducing
behavioral differences between virtual machines and bare metal systems.

This file documents some of the known limitations that KVM has in
virtualizing CPU features.

x86
===

``KVM_GET_SUPPORTED_CPUID`` issues
----------------------------------

x87 features
~~~~~~~~~~~~

Unlike most other CPUID feature bits, CPUID[EAX=7,ECX=0]:EBX[6]
(FDP_EXCPTN_ONLY) and CPUID[EAX=7,ECX=0]:EBX]13] (ZERO_FCS_FDS) are
clear if the features are present and set if the features are not present.

Clearing these bits in CPUID has no effect on the operation of the guest;
if these bits are set on hardware, the features will not be present on
any virtual machine that runs on that hardware.

**Workaround:** It is recommended to always set these bits in guest CPUID.
Note however that any software (e.g ``WIN87EM.DLL``) expecting these features
to be present likely predates these CPUID feature bits, and therefore
doesn't know to check for them anyway.

Nested virtualization features
------------------------------

TBD

File renamed without changes.
File renamed without changes.
19 changes: 19 additions & 0 deletions Documentation/virt/kvm/x86/index.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
.. SPDX-License-Identifier: GPL-2.0
===================
KVM for x86 systems
===================

.. toctree::
:maxdepth: 2

amd-memory-encryption
cpuid
errata
halt-polling
hypercalls
mmu
msr
nested-vmx
running-nested-guests
timekeeping
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
2 changes: 1 addition & 1 deletion arch/s390/kvm/kvm-s390.c
Original file line number Diff line number Diff line change
Expand Up @@ -3462,7 +3462,7 @@ void exit_sie(struct kvm_vcpu *vcpu)
/* Kick a guest cpu out of SIE to process a request synchronously */
void kvm_s390_sync_request(int req, struct kvm_vcpu *vcpu)
{
kvm_make_request(req, vcpu);
__kvm_make_request(req, vcpu);
kvm_s390_vcpu_request(vcpu);
}

Expand Down
46 changes: 31 additions & 15 deletions arch/x86/include/asm/kvm_host.h
Original file line number Diff line number Diff line change
Expand Up @@ -249,6 +249,7 @@ enum x86_intercept_stage;
#define PFERR_SGX_BIT 15
#define PFERR_GUEST_FINAL_BIT 32
#define PFERR_GUEST_PAGE_BIT 33
#define PFERR_IMPLICIT_ACCESS_BIT 48

#define PFERR_PRESENT_MASK (1U << PFERR_PRESENT_BIT)
#define PFERR_WRITE_MASK (1U << PFERR_WRITE_BIT)
Expand All @@ -259,6 +260,7 @@ enum x86_intercept_stage;
#define PFERR_SGX_MASK (1U << PFERR_SGX_BIT)
#define PFERR_GUEST_FINAL_MASK (1ULL << PFERR_GUEST_FINAL_BIT)
#define PFERR_GUEST_PAGE_MASK (1ULL << PFERR_GUEST_PAGE_BIT)
#define PFERR_IMPLICIT_ACCESS (1ULL << PFERR_IMPLICIT_ACCESS_BIT)

#define PFERR_NESTED_GUEST_PAGE (PFERR_GUEST_PAGE_MASK | \
PFERR_WRITE_MASK | \
Expand Down Expand Up @@ -430,7 +432,7 @@ struct kvm_mmu {
void (*inject_page_fault)(struct kvm_vcpu *vcpu,
struct x86_exception *fault);
gpa_t (*gva_to_gpa)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
gpa_t gva_or_gpa, u32 access,
gpa_t gva_or_gpa, u64 access,
struct x86_exception *exception);
int (*sync_page)(struct kvm_vcpu *vcpu,
struct kvm_mmu_page *sp);
Expand Down Expand Up @@ -512,6 +514,7 @@ struct kvm_pmu {
u64 global_ctrl_mask;
u64 global_ovf_ctrl_mask;
u64 reserved_bits;
u64 raw_event_mask;
u8 version;
struct kvm_pmc gp_counters[INTEL_PMC_MAX_GENERIC];
struct kvm_pmc fixed_counters[KVM_PMC_MAX_FIXED];
Expand Down Expand Up @@ -1040,14 +1043,16 @@ struct kvm_x86_msr_filter {
struct msr_bitmap_range ranges[16];
};

#define APICV_INHIBIT_REASON_DISABLE 0
#define APICV_INHIBIT_REASON_HYPERV 1
#define APICV_INHIBIT_REASON_NESTED 2
#define APICV_INHIBIT_REASON_IRQWIN 3
#define APICV_INHIBIT_REASON_PIT_REINJ 4
#define APICV_INHIBIT_REASON_X2APIC 5
#define APICV_INHIBIT_REASON_BLOCKIRQ 6
#define APICV_INHIBIT_REASON_ABSENT 7
enum kvm_apicv_inhibit {
APICV_INHIBIT_REASON_DISABLE,
APICV_INHIBIT_REASON_HYPERV,
APICV_INHIBIT_REASON_NESTED,
APICV_INHIBIT_REASON_IRQWIN,
APICV_INHIBIT_REASON_PIT_REINJ,
APICV_INHIBIT_REASON_X2APIC,
APICV_INHIBIT_REASON_BLOCKIRQ,
APICV_INHIBIT_REASON_ABSENT,
};

struct kvm_arch {
unsigned long n_used_mmu_pages;
Expand Down Expand Up @@ -1401,7 +1406,7 @@ struct kvm_x86_ops {
void (*enable_nmi_window)(struct kvm_vcpu *vcpu);
void (*enable_irq_window)(struct kvm_vcpu *vcpu);
void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr);
bool (*check_apicv_inhibit_reasons)(ulong bit);
bool (*check_apicv_inhibit_reasons)(enum kvm_apicv_inhibit reason);
void (*refresh_apicv_exec_ctrl)(struct kvm_vcpu *vcpu);
void (*hwapic_irr_update)(struct kvm_vcpu *vcpu, int max_irr);
void (*hwapic_isr_update)(struct kvm_vcpu *vcpu, int isr);
Expand Down Expand Up @@ -1585,7 +1590,7 @@ void kvm_mmu_module_exit(void);

void kvm_mmu_destroy(struct kvm_vcpu *vcpu);
int kvm_mmu_create(struct kvm_vcpu *vcpu);
void kvm_mmu_init_vm(struct kvm *kvm);
int kvm_mmu_init_vm(struct kvm *kvm);
void kvm_mmu_uninit_vm(struct kvm *kvm);

void kvm_mmu_after_set_cpuid(struct kvm_vcpu *vcpu);
Expand Down Expand Up @@ -1795,11 +1800,22 @@ gpa_t kvm_mmu_gva_to_gpa_system(struct kvm_vcpu *vcpu, gva_t gva,

bool kvm_apicv_activated(struct kvm *kvm);
void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu);
void kvm_request_apicv_update(struct kvm *kvm, bool activate,
unsigned long bit);
void __kvm_set_or_clear_apicv_inhibit(struct kvm *kvm,
enum kvm_apicv_inhibit reason, bool set);
void kvm_set_or_clear_apicv_inhibit(struct kvm *kvm,
enum kvm_apicv_inhibit reason, bool set);

static inline void kvm_set_apicv_inhibit(struct kvm *kvm,
enum kvm_apicv_inhibit reason)
{
kvm_set_or_clear_apicv_inhibit(kvm, reason, true);
}

void __kvm_request_apicv_update(struct kvm *kvm, bool activate,
unsigned long bit);
static inline void kvm_clear_apicv_inhibit(struct kvm *kvm,
enum kvm_apicv_inhibit reason)
{
kvm_set_or_clear_apicv_inhibit(kvm, reason, false);
}

int kvm_emulate_hypercall(struct kvm_vcpu *vcpu);

Expand Down
Loading

0 comments on commit 3890491

Please sign in to comment.