Skip to content

Commit

Permalink
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Browse files Browse the repository at this point in the history
Pull kvm updates from Paolo Bonzini:
 "ARM:
   - VHE optimizations

   - EL2 address space randomization

   - speculative execution mitigations ("variant 3a", aka execution past
     invalid privilege register access)

   - bugfixes and cleanups

  PPC:
   - improvements for the radix page fault handler for HV KVM on POWER9

  s390:
   - more kvm stat counters

   - virtio gpu plumbing

   - documentation

   - facilities improvements

  x86:
   - support for VMware magic I/O port and pseudo-PMCs

   - AMD pause loop exiting

   - support for AMD core performance extensions

   - support for synchronous register access

   - expose nVMX capabilities to userspace

   - support for Hyper-V signaling via eventfd

   - use Enlightened VMCS when running on Hyper-V

   - allow userspace to disable MWAIT/HLT/PAUSE vmexits

   - usual roundup of optimizations and nested virtualization bugfixes

  Generic:
   - API selftest infrastructure (though the only tests are for x86 as
     of now)"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (174 commits)
  kvm: x86: fix a prototype warning
  kvm: selftests: add sync_regs_test
  kvm: selftests: add API testing infrastructure
  kvm: x86: fix a compile warning
  KVM: X86: Add Force Emulation Prefix for "emulate the next instruction"
  KVM: X86: Introduce handle_ud()
  KVM: vmx: unify adjacent #ifdefs
  x86: kvm: hide the unused 'cpu' variable
  KVM: VMX: remove bogus WARN_ON in handle_ept_misconfig
  Revert "KVM: X86: Fix SMRAM accessing even if VM is shutdown"
  kvm: Add emulation for movups/movupd
  KVM: VMX: raise internal error for exception during invalid protected mode state
  KVM: nVMX: Optimization: Dont set KVM_REQ_EVENT when VMExit with nested_run_pending
  KVM: nVMX: Require immediate-exit when event reinjected to L2 and L1 event pending
  KVM: x86: Fix misleading comments on handling pending exceptions
  KVM: x86: Rename interrupt.pending to interrupt.injected
  KVM: VMX: No need to clear pending NMI/interrupt on inject realmode interrupt
  x86/kvm: use Enlightened VMCS when running on Hyper-V
  x86/hyper-v: detect nested features
  x86/hyper-v: define struct hv_enlightened_vmcs and clean field bits
  ...
  • Loading branch information
torvalds committed Apr 9, 2018
2 parents e9092d0 + e01bca2 commit d8312a3
Show file tree
Hide file tree
Showing 150 changed files with 11,900 additions and 1,982 deletions.
3 changes: 3 additions & 0 deletions Documentation/admin-guide/kernel-parameters.txt
Original file line number Diff line number Diff line change
Expand Up @@ -1907,6 +1907,9 @@
kvm.ignore_msrs=[KVM] Ignore guest accesses to unhandled MSRs.
Default is 0 (don't ignore, but inject #GP)

kvm.enable_vmware_backdoor=[KVM] Support VMware backdoor PV interface.
Default is false (don't support).

kvm.mmu_audit= [KVM] This is a R/W parameter which allows audit
KVM MMU at runtime.
Default is 0 (off)
Expand Down
9 changes: 6 additions & 3 deletions Documentation/arm64/memory.txt
Original file line number Diff line number Diff line change
Expand Up @@ -86,9 +86,12 @@ Translation table lookup with 64KB pages:
+-------------------------------------------------> [63] TTBR0/1


When using KVM without the Virtualization Host Extensions, the hypervisor
maps kernel pages in EL2 at a fixed offset from the kernel VA. See the
kern_hyp_va macro for more details.
When using KVM without the Virtualization Host Extensions, the
hypervisor maps kernel pages in EL2 at a fixed (and potentially
random) offset from the linear mapping. See the kern_hyp_va macro and
kvm_update_va_mask function for more details. MMIO devices such as
GICv2 gets mapped next to the HYP idmap page, as do vectors when
ARM64_HARDEN_EL2_VECTORS is selected for particular CPUs.

When using KVM with the Virtualization Host Extensions, no additional
mappings are created, since the host kernel runs directly in EL2.
10 changes: 7 additions & 3 deletions Documentation/virtual/kvm/00-INDEX
Original file line number Diff line number Diff line change
@@ -1,7 +1,12 @@
00-INDEX
- this file.
amd-memory-encryption.rst
- notes on AMD Secure Encrypted Virtualization feature and SEV firmware
command description
api.txt
- KVM userspace API.
arm
- internal ABI between the kernel and HYP (for arm/arm64)
cpuid.txt
- KVM-specific cpuid leaves (x86).
devices/
Expand All @@ -26,6 +31,5 @@ s390-diag.txt
- Diagnose hypercall description (for IBM S/390)
timekeeping.txt
- timekeeping virtualization for x86-based architectures.
amd-memory-encryption.txt
- notes on AMD Secure Encrypted Virtualization feature and SEV firmware
command description
vcpu-requests.rst
- internal VCPU request API
135 changes: 124 additions & 11 deletions Documentation/virtual/kvm/api.txt
Original file line number Diff line number Diff line change
Expand Up @@ -3480,7 +3480,7 @@ encrypted VMs.

Currently, this ioctl is used for issuing Secure Encrypted Virtualization
(SEV) commands on AMD Processors. The SEV commands are defined in
Documentation/virtual/kvm/amd-memory-encryption.txt.
Documentation/virtual/kvm/amd-memory-encryption.rst.

4.111 KVM_MEMORY_ENCRYPT_REG_REGION

Expand Down Expand Up @@ -3516,6 +3516,38 @@ Returns: 0 on success; -1 on error
This ioctl can be used to unregister the guest memory region registered
with KVM_MEMORY_ENCRYPT_REG_REGION ioctl above.

4.113 KVM_HYPERV_EVENTFD

Capability: KVM_CAP_HYPERV_EVENTFD
Architectures: x86
Type: vm ioctl
Parameters: struct kvm_hyperv_eventfd (in)

This ioctl (un)registers an eventfd to receive notifications from the guest on
the specified Hyper-V connection id through the SIGNAL_EVENT hypercall, without
causing a user exit. SIGNAL_EVENT hypercall with non-zero event flag number
(bits 24-31) still triggers a KVM_EXIT_HYPERV_HCALL user exit.

struct kvm_hyperv_eventfd {
__u32 conn_id;
__s32 fd;
__u32 flags;
__u32 padding[3];
};

The conn_id field should fit within 24 bits:

#define KVM_HYPERV_CONN_ID_MASK 0x00ffffff

The acceptable values for the flags field are:

#define KVM_HYPERV_EVENTFD_DEASSIGN (1 << 0)

Returns: 0 on success,
-EINVAL if conn_id or flags is outside the allowed range
-ENOENT on deassign if the conn_id isn't registered
-EEXIST on assign if the conn_id is already registered


5. The kvm_run structure
------------------------
Expand Down Expand Up @@ -3873,7 +3905,7 @@ in userspace.
__u64 kvm_dirty_regs;
union {
struct kvm_sync_regs regs;
char padding[1024];
char padding[SYNC_REGS_SIZE_BYTES];
} s;

If KVM_CAP_SYNC_REGS is defined, these fields allow userspace to access
Expand Down Expand Up @@ -4078,6 +4110,46 @@ Once this is done the KVM_REG_MIPS_VEC_* and KVM_REG_MIPS_MSA_* registers can be
accessed, and the Config5.MSAEn bit is accessible via the KVM API and also from
the guest.

6.74 KVM_CAP_SYNC_REGS
Architectures: s390, x86
Target: s390: always enabled, x86: vcpu
Parameters: none
Returns: x86: KVM_CHECK_EXTENSION returns a bit-array indicating which register
sets are supported (bitfields defined in arch/x86/include/uapi/asm/kvm.h).

As described above in the kvm_sync_regs struct info in section 5 (kvm_run):
KVM_CAP_SYNC_REGS "allow[s] userspace to access certain guest registers
without having to call SET/GET_*REGS". This reduces overhead by eliminating
repeated ioctl calls for setting and/or getting register values. This is
particularly important when userspace is making synchronous guest state
modifications, e.g. when emulating and/or intercepting instructions in
userspace.

For s390 specifics, please refer to the source code.

For x86:
- the register sets to be copied out to kvm_run are selectable
by userspace (rather that all sets being copied out for every exit).
- vcpu_events are available in addition to regs and sregs.

For x86, the 'kvm_valid_regs' field of struct kvm_run is overloaded to
function as an input bit-array field set by userspace to indicate the
specific register sets to be copied out on the next exit.

To indicate when userspace has modified values that should be copied into
the vCPU, the all architecture bitarray field, 'kvm_dirty_regs' must be set.
This is done using the same bitflags as for the 'kvm_valid_regs' field.
If the dirty bit is not set, then the register set values will not be copied
into the vCPU even if they've been modified.

Unused bitfields in the bitarrays must be set to zero.

struct kvm_sync_regs {
struct kvm_regs regs;
struct kvm_sregs sregs;
struct kvm_vcpu_events events;
};

7. Capabilities that can be enabled on VMs
------------------------------------------

Expand Down Expand Up @@ -4286,6 +4358,26 @@ enables QEMU to build error log and branch to guest kernel registered
machine check handling routine. Without this capability KVM will
branch to guests' 0x200 interrupt vector.

7.13 KVM_CAP_X86_DISABLE_EXITS

Architectures: x86
Parameters: args[0] defines which exits are disabled
Returns: 0 on success, -EINVAL when args[0] contains invalid exits

Valid bits in args[0] are

#define KVM_X86_DISABLE_EXITS_MWAIT (1 << 0)
#define KVM_X86_DISABLE_EXITS_HLT (1 << 1)

Enabling this capability on a VM provides userspace with a way to no
longer intercept some instructions for improved latency in some
workloads, and is suggested when vCPUs are associated to dedicated
physical CPUs. More bits can be added in the future; userspace can
just pass the KVM_CHECK_EXTENSION result to KVM_ENABLE_CAP to disable
all such vmexits.

Do not enable KVM_FEATURE_PV_UNHALT if you disable HLT exits.

8. Other capabilities.
----------------------

Expand Down Expand Up @@ -4398,15 +4490,6 @@ reserved.
Both registers and addresses are 64-bits wide.
It will be possible to run 64-bit or 32-bit guest code.

8.8 KVM_CAP_X86_GUEST_MWAIT

Architectures: x86

This capability indicates that guest using memory monotoring instructions
(MWAIT/MWAITX) to stop the virtual CPU will not cause a VM exit. As such time
spent while virtual CPU is halted in this way will then be accounted for as
guest running time on the host (as opposed to e.g. HLT).

8.9 KVM_CAP_ARM_USER_IRQ

Architectures: arm, arm64
Expand Down Expand Up @@ -4483,3 +4566,33 @@ Parameters: none
This capability indicates if the flic device will be able to get/set the
AIS states for migration via the KVM_DEV_FLIC_AISM_ALL attribute and allows
to discover this without having to create a flic device.

8.14 KVM_CAP_S390_PSW

Architectures: s390

This capability indicates that the PSW is exposed via the kvm_run structure.

8.15 KVM_CAP_S390_GMAP

Architectures: s390

This capability indicates that the user space memory used as guest mapping can
be anywhere in the user memory address space, as long as the memory slots are
aligned and sized to a segment (1MB) boundary.

8.16 KVM_CAP_S390_COW

Architectures: s390

This capability indicates that the user space memory used as guest mapping can
use copy-on-write semantics as well as dirty pages tracking via read-only page
tables.

8.17 KVM_CAP_S390_BPB

Architectures: s390

This capability indicates that kvm will implement the interfaces to handle
reset, migration and nested KVM for branch prediction blocking. The stfle
facility 82 should not be provided to the guest without this capability.
15 changes: 13 additions & 2 deletions Documentation/virtual/kvm/cpuid.txt
Original file line number Diff line number Diff line change
Expand Up @@ -23,8 +23,8 @@ This function queries the presence of KVM cpuid leafs.


function: define KVM_CPUID_FEATURES (0x40000001)
returns : ebx, ecx, edx = 0
eax = and OR'ed group of (1 << flag), where each flags is:
returns : ebx, ecx
eax = an OR'ed group of (1 << flag), where each flags is:


flag || value || meaning
Expand Down Expand Up @@ -66,3 +66,14 @@ KVM_FEATURE_CLOCKSOURCE_STABLE_BIT || 24 || host will warn if no guest-side
|| || per-cpu warps are expected in
|| || kvmclock.
------------------------------------------------------------------------------

edx = an OR'ed group of (1 << flag), where each flags is:


flag || value || meaning
==================================================================================
KVM_HINTS_DEDICATED || 0 || guest checks this feature bit to
|| || determine if there is vCPU pinning
|| || and there is no vCPU over-commitment,
|| || allowing optimizations
----------------------------------------------------------------------------------
2 changes: 1 addition & 1 deletion MAINTAINERS
Original file line number Diff line number Diff line change
Expand Up @@ -6516,7 +6516,7 @@ S: Maintained
F: Documentation/networking/netvsc.txt
F: arch/x86/include/asm/mshyperv.h
F: arch/x86/include/asm/trace/hyperv.h
F: arch/x86/include/uapi/asm/hyperv.h
F: arch/x86/include/asm/hyperv-tlfs.h
F: arch/x86/kernel/cpu/mshyperv.c
F: arch/x86/hyperv
F: drivers/hid/hid-hyperv.c
Expand Down
5 changes: 4 additions & 1 deletion arch/arm/include/asm/kvm_asm.h
Original file line number Diff line number Diff line change
Expand Up @@ -70,7 +70,10 @@ extern void __kvm_tlb_flush_local_vmid(struct kvm_vcpu *vcpu);

extern void __kvm_timer_set_cntvoff(u32 cntvoff_low, u32 cntvoff_high);

extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
/* no VHE on 32-bit :( */
static inline int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu) { BUG(); return 0; }

extern int __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu);

extern void __init_stage2_translation(void);

Expand Down
21 changes: 13 additions & 8 deletions arch/arm/include/asm/kvm_emulate.h
Original file line number Diff line number Diff line change
Expand Up @@ -41,7 +41,17 @@ static inline unsigned long *vcpu_reg32(struct kvm_vcpu *vcpu, u8 reg_num)
return vcpu_reg(vcpu, reg_num);
}

unsigned long *vcpu_spsr(struct kvm_vcpu *vcpu);
unsigned long *__vcpu_spsr(struct kvm_vcpu *vcpu);

static inline unsigned long vpcu_read_spsr(struct kvm_vcpu *vcpu)
{
return *__vcpu_spsr(vcpu);
}

static inline void vcpu_write_spsr(struct kvm_vcpu *vcpu, unsigned long v)
{
*__vcpu_spsr(vcpu) = v;
}

static inline unsigned long vcpu_get_reg(struct kvm_vcpu *vcpu,
u8 reg_num)
Expand Down Expand Up @@ -92,14 +102,9 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
vcpu->arch.hcr = HCR_GUEST_MASK;
}

static inline unsigned long vcpu_get_hcr(const struct kvm_vcpu *vcpu)
{
return vcpu->arch.hcr;
}

static inline void vcpu_set_hcr(struct kvm_vcpu *vcpu, unsigned long hcr)
static inline unsigned long *vcpu_hcr(const struct kvm_vcpu *vcpu)
{
vcpu->arch.hcr = hcr;
return (unsigned long *)&vcpu->arch.hcr;
}

static inline bool vcpu_mode_is_32bit(const struct kvm_vcpu *vcpu)
Expand Down
6 changes: 3 additions & 3 deletions arch/arm/include/asm/kvm_host.h
Original file line number Diff line number Diff line change
Expand Up @@ -155,9 +155,6 @@ struct kvm_vcpu_arch {
/* HYP trapping configuration */
u32 hcr;

/* Interrupt related fields */
u32 irq_lines; /* IRQ and FIQ levels */

/* Exception Information */
struct kvm_vcpu_fault_info fault;

Expand Down Expand Up @@ -315,4 +312,7 @@ static inline bool kvm_arm_harden_branch_predictor(void)
return false;
}

static inline void kvm_vcpu_load_sysregs(struct kvm_vcpu *vcpu) {}
static inline void kvm_vcpu_put_sysregs(struct kvm_vcpu *vcpu) {}

#endif /* __ARM_KVM_HOST_H__ */
4 changes: 4 additions & 0 deletions arch/arm/include/asm/kvm_hyp.h
Original file line number Diff line number Diff line change
Expand Up @@ -110,6 +110,10 @@ void __sysreg_restore_state(struct kvm_cpu_context *ctxt);

void __vgic_v3_save_state(struct kvm_vcpu *vcpu);
void __vgic_v3_restore_state(struct kvm_vcpu *vcpu);
void __vgic_v3_activate_traps(struct kvm_vcpu *vcpu);
void __vgic_v3_deactivate_traps(struct kvm_vcpu *vcpu);
void __vgic_v3_save_aprs(struct kvm_vcpu *vcpu);
void __vgic_v3_restore_aprs(struct kvm_vcpu *vcpu);

asmlinkage void __vfp_save_state(struct vfp_hard_struct *vfp);
asmlinkage void __vfp_restore_state(struct vfp_hard_struct *vfp);
Expand Down
16 changes: 15 additions & 1 deletion arch/arm/include/asm/kvm_mmu.h
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,13 @@
*/
#define kern_hyp_va(kva) (kva)

/* Contrary to arm64, there is no need to generate a PC-relative address */
#define hyp_symbol_addr(s) \
({ \
typeof(s) *addr = &(s); \
addr; \
})

/*
* KVM_MMU_CACHE_MIN_PAGES is the number of stage2 page table translation levels.
*/
Expand All @@ -42,8 +49,15 @@
#include <asm/pgalloc.h>
#include <asm/stage2_pgtable.h>

/* Ensure compatibility with arm64 */
#define VA_BITS 32

int create_hyp_mappings(void *from, void *to, pgprot_t prot);
int create_hyp_io_mappings(void *from, void *to, phys_addr_t);
int create_hyp_io_mappings(phys_addr_t phys_addr, size_t size,
void __iomem **kaddr,
void __iomem **haddr);
int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
void **haddr);
void free_hyp_pgds(void);

void stage2_unmap_vm(struct kvm *kvm);
Expand Down
9 changes: 9 additions & 0 deletions arch/arm/include/uapi/asm/kvm.h
Original file line number Diff line number Diff line change
Expand Up @@ -135,6 +135,15 @@ struct kvm_arch_memory_slot {
#define KVM_REG_ARM_CRM_SHIFT 7
#define KVM_REG_ARM_32_CRN_MASK 0x0000000000007800
#define KVM_REG_ARM_32_CRN_SHIFT 11
/*
* For KVM currently all guest registers are nonsecure, but we reserve a bit
* in the encoding to distinguish secure from nonsecure for AArch32 system
* registers that are banked by security. This is 1 for the secure banked
* register, and 0 for the nonsecure banked register or if the register is
* not banked by security.
*/
#define KVM_REG_ARM_SECURE_MASK 0x0000000010000000
#define KVM_REG_ARM_SECURE_SHIFT 28

#define ARM_CP15_REG_SHIFT_MASK(x,n) \
(((x) << KVM_REG_ARM_ ## n ## _SHIFT) & KVM_REG_ARM_ ## n ## _MASK)
Expand Down
Loading

0 comments on commit d8312a3

Please sign in to comment.