Merge branch 'x86-mpx-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 MPX support from Thomas Gleixner:
 "This enables support for x86 MPX.

  MPX is a new debug feature for bound checking in user space.  It
  requires kernel support to handle the bound tables and decode the
  bound violating instruction in the trap handler"

* 'x86-mpx-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  asm-generic: Remove asm-generic arch_bprm_mm_init()
  mm: Make arch_unmap()/bprm_mm_init() available to all architectures
  x86: Cleanly separate use of asm-generic/mm_hooks.h
  x86 mpx: Change return type of get_reg_offset()
  fs: Do not include mpx.h in exec.c
  x86, mpx: Add documentation on Intel MPX
  x86, mpx: Cleanup unused bound tables
  x86, mpx: On-demand kernel allocation of bounds tables
  x86, mpx: Decode MPX instruction to get bound violation information
  x86, mpx: Add MPX-specific mmap interface
  x86, mpx: Introduce VM_MPX to indicate that a VMA is MPX specific
  x86, mpx: Add MPX to disabled features
  ia64: Sync struct siginfo with general version
  mips: Sync struct siginfo with general version
  mpx: Extend siginfo structure to include bound violation information
  x86, mpx: Rename cfg_reg_u and status_reg
  x86: mpx: Give bndX registers actual names
  x86: Remove arbitrary instruction size limit in instruction decoder
torvalds committed Dec 10, 2014
2 parents 9e66645 + 9f7789f commit 3eb5b89
Showing 35 changed files with 1,591 additions and 47 deletions.
234 changes: 234 additions & 0 deletions Documentation/x86/intel_mpx.txt
@@ -0,0 +1,234 @@
1. Intel(R) MPX Overview
========================

Intel(R) Memory Protection Extensions (Intel(R) MPX) is a new capability
introduced into Intel Architecture. Intel MPX provides hardware features
that can be used in conjunction with compiler changes to check memory
references, for those references whose compile-time normal intentions are
usurped at runtime due to buffer overflow or underflow.

For more information, please refer to Intel(R) Architecture Instruction
Set Extensions Programming Reference, Chapter 9: Intel(R) Memory Protection
Extensions.

Note: Currently no hardware with MPX ISA is available but it is always
possible to use SDE (Intel(R) Software Development Emulator) instead, which
can be downloaded from
http://software.intel.com/en-us/articles/intel-software-development-emulator


2. How to get the advantage of MPX
==================================

For MPX to work, changes are required in the kernel, binutils and compiler.
No source changes are required for applications, just a recompile.

There are a lot of moving parts that all have to work right. The following
is how we expect the compiler, application and kernel to work together.

1) Application developer compiles with -fmpx. The compiler will add the
instrumentation as well as some setup code called early after the app
starts. New instruction prefixes are noops for old CPUs.
2) That setup code allocates (virtual) space for the "bounds directory",
points the "bndcfgu" register to the directory and notifies the kernel
(via the new prctl(PR_MPX_ENABLE_MANAGEMENT)) that the app will be using
MPX.
3) The kernel detects that the CPU has MPX, allows the new prctl() to
succeed, and notes the location of the bounds directory. Userspace is
expected to keep the bounds directory at that location. We note it
instead of reading it each time because the 'xsave' operation needed
to access the bounds directory register is an expensive operation.
4) If the application needs to spill bounds out of the 4 registers, it
issues a bndstx instruction. Since the bounds directory is empty at
this point, a bounds fault (#BR) is raised, the kernel allocates a
bounds table (in the user address space) and makes the relevant entry
in the bounds directory point to the new table.
5) If the application violates the bounds specified in the bounds registers,
a separate kind of #BR is raised which will deliver a signal with
information about the violation in the 'struct siginfo'.
6) Whenever memory is freed, we know that it can no longer contain valid
pointers, and we attempt to free the associated space in the bounds
tables. If an entire table becomes unused, we will attempt to free
the table and remove the entry in the directory.

To summarize, there are essentially three things interacting here:

GCC with -fmpx:
* enables annotation of code with MPX instructions and prefixes
* inserts code early in the application to call in to the "gcc runtime"
GCC MPX Runtime:
* Checks for hardware MPX support in cpuid leaf
* allocates virtual space for the bounds directory (malloc() essentially)
* points the hardware BNDCFGU register at the directory
* calls a new prctl(PR_MPX_ENABLE_MANAGEMENT) to notify the kernel to
start managing the bounds directories
Kernel MPX Code:
* Checks for hardware MPX support in cpuid leaf
* Handles #BR exceptions and sends SIGSEGV to the app when it violates
bounds, like during a buffer overflow.
* When bounds are spilled in to an unallocated bounds table, the kernel
notices in the #BR exception, allocates the virtual space, then
updates the bounds directory to point to the new table. It keeps
special track of the memory with a VM_MPX flag.
* Frees unused bounds tables at the time that the memory they described
is unmapped.
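
As an illustration of the "checks for hardware MPX support in cpuid leaf"
step, here is a minimal sketch. It assumes the MPX feature flag is
CPUID leaf 7, subleaf 0, EBX bit 14, and it uses the `__get_cpuid_count()`
helper from the compiler's `<cpuid.h>`. The function name `cpu_has_mpx()` is
hypothetical, and a real runtime would likely also check that the OS has
enabled the MPX state components.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Assumption: MPX is reported in CPUID.(EAX=7,ECX=0):EBX bit 14. */
#define MPX_CPUID_EBX_BIT 14

/* Hypothetical helper: returns 1 if the CPU reports MPX, 0 otherwise. */
static int cpu_has_mpx(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

    /* __get_cpuid_count() returns 0 when leaf 7 is not supported. */
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0;
    return (int)((ebx >> MPX_CPUID_EBX_BIT) & 1u);
#else
    /* Not an x86 CPU, so there is no MPX. */
    return 0;
#endif
}
```

A runtime would call this before allocating the bounds directory and skip
all MPX setup when it returns 0.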


3. How does MPX kernel code work
================================

Handling #BR faults caused by MPX
---------------------------------

When MPX is enabled, there are 2 new situations that can generate
#BR faults.
* new bounds tables (BT) need to be allocated to save bounds.
* bounds violation caused by MPX instructions.

We hook the #BR handler to handle these two new situations.
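
The two situations can be told apart by looking at the bounds status
register. The sketch below is a userspace model, not the kernel handler. It
assumes that the low two bits of BNDSTATUS carry an error code where 1 means
a bounds violation and 2 means an invalid bounds directory entry; that
encoding is not stated in this document and all names are hypothetical.

```c
#include <stdint.h>

/* Assumption: error code lives in the low two bits of BNDSTATUS. */
#define BNDSTATUS_ERROR_MASK 0x3u

enum br_cause {
    BR_NOT_MPX = 0,          /* #BR was not caused by an MPX operation */
    BR_BOUNDS_VIOLATION = 1, /* deliver a signal with bounds info      */
    BR_MISSING_TABLE = 2,    /* allocate a bounds table on demand      */
    BR_UNKNOWN = 3           /* unexpected error code                  */
};

/* Hypothetical helper: map a BNDSTATUS value to the action to take. */
static enum br_cause classify_br(uint64_t bndstatus)
{
    switch (bndstatus & BNDSTATUS_ERROR_MASK) {
    case 0:
        return BR_NOT_MPX;
    case 1:
        return BR_BOUNDS_VIOLATION;
    case 2:
        return BR_MISSING_TABLE;
    default:
        return BR_UNKNOWN;
    }
}
```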

On-demand kernel allocation of bounds tables
--------------------------------------------

MPX only has 4 hardware registers for storing bounds information. If
MPX-enabled code needs more than these 4 registers, it needs to spill
them somewhere. It has two special instructions for this which allow
the bounds to be moved between the bounds registers and some new "bounds
tables".

#BR exceptions are a new class of exceptions just for MPX. They are
similar conceptually to a page fault and will be raised by the MPX
hardware both during bounds violations and when the tables are not
present. The kernel handles those #BR exceptions for not-present tables
by carving the space out of the normal process's address space and then
pointing the bounds-directory over to it.

The tables need to be accessed and controlled by userspace because
the instructions for moving bounds in and out of them are extremely
frequent. They potentially happen every time a register points to
memory. Any direct kernel involvement (like a syscall) to access the
tables would obviously destroy performance.

Why not do this in userspace? MPX does not strictly require anything in
the kernel. It can theoretically be done completely from userspace. Here
are a few ways this could be done. We don't think any of them are practical
in the real-world, but here they are.

Q: Can virtual space simply be reserved for the bounds tables so that we
never have to allocate them?
A: An MPX-enabled application may create a lot of bounds tables in
process address space to save bounds information. These tables can take
up huge swaths of memory (as much as 80% of the memory on the system)
even if we clean them up aggressively. In the worst-case scenario, the
tables can be 4x the size of the data structure being tracked. IOW, a
1-page structure can require 4 bounds-table pages. An X-GB virtual
area needs 4*X GB of virtual space, plus 2GB for the bounds directory.
If we were to preallocate them for the 128TB of user virtual address
space, we would need to reserve 512TB+2GB, which is larger than the
entire virtual address space today. This means they can not be reserved
ahead of time. Also, a single process's pre-populated bounds directory
consumes 2GB of virtual *AND* physical memory. IOW, it's completely
infeasible to prepopulate bounds directories.
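
The arithmetic in this answer can be checked with a short sketch. It uses
only the figures given above (a 4x worst-case table overhead and a 2GB
bounds directory); the function and macro names are hypothetical.

```c
#include <stdint.h>

/* Figures taken from the text above. */
#define BT_WORST_CASE_FACTOR 4ULL          /* tables can be 4x the data */
#define BD_SIZE_BYTES        (2ULL << 30)  /* 2GB bounds directory      */

/*
 * Hypothetical helper: virtual space that would have to be reserved up
 * front to cover 'tracked_bytes' of address space in the worst case.
 */
static uint64_t mpx_worst_case_reserve(uint64_t tracked_bytes)
{
    return BT_WORST_CASE_FACTOR * tracked_bytes + BD_SIZE_BYTES;
}
```

For the 128TB user address space, `mpx_worst_case_reserve(128ULL << 40)`
yields 512TB + 2GB, which is the figure quoted in the answer.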

Q: Can we preallocate bounds table space at the same time memory is
allocated which might contain pointers that might eventually need
bounds tables?
A: This would work if we could hook the site of each and every memory
allocation syscall. This can be done for small, constrained applications.
But, it isn't practical at a larger scale since a given app has no
way of controlling how all the parts of the app might allocate memory
(think libraries). The kernel is really the only place to intercept
these calls.

Q: Could a bounds fault be handed to userspace and the tables allocated
there in a signal handler instead of in the kernel?
A: mmap() is not on the list of safe async handler functions and even
if mmap() would work it still requires locking or nasty tricks to
keep track of the allocation state there.

Having ruled out all of the userspace-only approaches for managing
bounds tables that we could think of, we create them on demand in
the kernel.

Decoding MPX instructions
-------------------------

If a #BR is generated due to a bounds violation caused by MPX, we need
to decode the MPX instruction to get the violation address and store
this address in the extended struct siginfo.

The _sigfault field of struct siginfo is extended as follows:

    /* SIGILL, SIGFPE, SIGSEGV, SIGBUS */
    struct {
        void __user *_addr; /* faulting insn/memory ref. */
#ifdef __ARCH_SI_TRAPNO
        int _trapno; /* TRAP # which caused the signal */
#endif
        short _addr_lsb; /* LSB of the reported address */
        struct {
            void __user *_lower;
            void __user *_upper;
        } _addr_bnd;
    } _sigfault;

The '_addr' field refers to the violation address, and the new '_addr_bnd'
field refers to the upper/lower bounds in effect when the #BR was raised.

Glibc will also be updated to support this new siginfo, so that users
can get the violation address and bounds when a bounds violation occurs.
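
From userspace, reading these fields might look like the sketch below. It
assumes the C library exposes the new fields as `si_lower` and `si_upper`
and defines a `SEGV_BNDERR` si_code for bounds violations; none of those
names appear in this document, so the code compiles them in only when they
are available. `struct bounds_fault` and `get_bounds_fault()` are
hypothetical.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>

/* Hypothetical container for what a handler wants to report. */
struct bounds_fault {
    void *addr;   /* address that violated the bounds */
    void *lower;  /* lower bound in effect            */
    void *upper;  /* upper bound in effect            */
};

/*
 * Returns 1 and fills 'out' if 'info' describes a bounds violation,
 * otherwise returns 0 and leaves 'out' untouched.
 */
static int get_bounds_fault(const siginfo_t *info, struct bounds_fault *out)
{
#if defined(SEGV_BNDERR) && defined(si_lower) && defined(si_upper)
    if (info->si_signo == SIGSEGV && info->si_code == SEGV_BNDERR) {
        out->addr = info->si_addr;
        out->lower = info->si_lower;
        out->upper = info->si_upper;
        return 1;
    }
#else
    /* Assumed names are missing in this C library: nothing to decode. */
    (void)info;
    (void)out;
#endif
    return 0;
}
```

A SIGSEGV handler installed with sigaction() and SA_SIGINFO would pass its
siginfo_t argument to this helper and fall back to ordinary fault handling
when it returns 0.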

Cleanup unused bounds tables
----------------------------

When a BNDSTX instruction attempts to save bounds to a bounds directory
entry marked as invalid, a #BR is generated. This is an indication that
no bounds table exists for this entry. In this case the fault handler
will allocate a new bounds table on demand.

Since the kernel allocated those tables on-demand without userspace
knowledge, it is also responsible for freeing them when the associated
mappings go away.

The solution is to hook do_munmap() and check whether the process is
MPX enabled. If it is, the bounds tables that cover the virtual address
region being unmapped are freed as well.
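
To find which bounds directory entries an unmapped region touches, the hook
has to map addresses to directory indexes. The sketch below assumes a 64-bit
layout in which the 2GB directory holds 2^28 eight-byte entries and each
entry therefore covers 1MB of virtual address space; the entry size and the
resulting shift are assumptions, and the helper names are hypothetical.

```c
#include <stdint.h>

/* Assumptions: 2GB directory / 8-byte entries, 1MB covered per entry. */
#define BD_NR_ENTRIES       (1ULL << 28)
#define BD_ENTRY_SPAN_SHIFT 20

/* Hypothetical helper: directory index that covers 'addr'. */
static uint64_t bd_index(uint64_t addr)
{
    return (addr >> BD_ENTRY_SPAN_SHIFT) & (BD_NR_ENTRIES - 1);
}

/*
 * Hypothetical helper: number of directory entries whose bounds tables
 * describe some part of the half-open range [start, end).
 */
static uint64_t bd_entries_touched(uint64_t start, uint64_t end)
{
    if (end <= start)
        return 0;
    return bd_index(end - 1) - bd_index(start) + 1;
}
```

Touching an entry does not mean its table can be freed: only the part of a
table that describes the unmapped range is released, and the table itself
goes away only once it is entirely unused.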

Adding new prctl commands
-------------------------

Two new prctl commands are added to enable and disable MPX bounds tables
management in kernel.

    #define PR_MPX_ENABLE_MANAGEMENT 43
    #define PR_MPX_DISABLE_MANAGEMENT 44

The runtime library in userspace is responsible for allocating the bounds
directory, so the kernel has to use the XSAVE instruction to get the base
of the bounds directory from the BNDCFGU register.

But XSAVE is expected to be very expensive. As a performance optimization,
the kernel reads the base of the bounds directory once, while executing the
PR_MPX_ENABLE_MANAGEMENT command, and saves it in struct mm_struct for
later use.
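
A minimal sketch of how a runtime library might issue these commands from
userspace follows. It uses the standard prctl() call with the two command
values listed above, defined locally in case the system headers lack them;
the wrapper name `mpx_set_management()` is hypothetical. The call is
expected to fail on a kernel or CPU without MPX support.

```c
#include <errno.h>
#include <sys/prctl.h>

/* Command values from this document, in case the headers lack them. */
#ifndef PR_MPX_ENABLE_MANAGEMENT
#define PR_MPX_ENABLE_MANAGEMENT 43
#endif
#ifndef PR_MPX_DISABLE_MANAGEMENT
#define PR_MPX_DISABLE_MANAGEMENT 44
#endif

/*
 * Hypothetical wrapper: ask the kernel to start (enable != 0) or stop
 * (enable == 0) managing bounds tables for this process.
 * Returns 0 on success or a negative errno value on failure.
 */
static int mpx_set_management(int enable)
{
    int cmd = enable ? PR_MPX_ENABLE_MANAGEMENT
                     : PR_MPX_DISABLE_MANAGEMENT;

    if (prctl(cmd, 0UL, 0UL, 0UL, 0UL) == -1)
        return -errno;
    return 0;
}
```

The runtime would call `mpx_set_management(1)` only after it has allocated
the bounds directory and pointed BNDCFGU at it, since the kernel records the
directory base at that moment.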


4. Special rules
================

1) If userspace is requesting help from the kernel to do the management
of bounds tables, it may not create or modify entries in the bounds directory.

Users can certainly allocate bounds tables themselves, forcibly point the
bounds directory at them through the XSAVE instruction, and set the valid
bit of a bounds directory entry to make that entry valid. But the kernel
will decline to assist in managing these tables.

2) Userspace may not take multiple bounds directory entries and point
them at the same bounds table.

This is allowed architecturally. For more information, see "Intel(R)
Architecture Instruction Set Extensions Programming Reference" (9.3.4).

However, if users did this, the kernel might be fooled into unmapping an
in-use bounds table since it does not recognize sharing.
8 changes: 6 additions & 2 deletions arch/ia64/include/uapi/asm/siginfo.h
@@ -63,6 +63,10 @@ typedef struct siginfo {
unsigned int _flags; /* see below */
unsigned long _isr; /* isr */
short _addr_lsb; /* lsb of faulting address */
struct {
void __user *_lower;
void __user *_upper;
} _addr_bnd;
} _sigfault;

/* SIGPOLL */
@@ -110,9 +114,9 @@ typedef struct siginfo {
/*
* SIGSEGV si_codes
*/
-#define __SEGV_PSTKOVF (__SI_FAULT|3) /* paragraph stack overflow */
+#define __SEGV_PSTKOVF (__SI_FAULT|4) /* paragraph stack overflow */
#undef NSIGSEGV
-#define NSIGSEGV 3
+#define NSIGSEGV 4

#undef NSIGTRAP
#define NSIGTRAP 4
4 changes: 4 additions & 0 deletions arch/mips/include/uapi/asm/siginfo.h
@@ -92,6 +92,10 @@ typedef struct siginfo {
int _trapno; /* TRAP # which caused the signal */
#endif
short _addr_lsb;
struct {
void __user *_lower;
void __user *_upper;
} _addr_bnd;
} _sigfault;

/* SIGPOLL, SIGXFSZ (To do ...) */
11 changes: 11 additions & 0 deletions arch/s390/include/asm/mmu_context.h
@@ -120,4 +120,15 @@ static inline void arch_exit_mmap(struct mm_struct *mm)
{
}

static inline void arch_unmap(struct mm_struct *mm,
struct vm_area_struct *vma,
unsigned long start, unsigned long end)
{
}

static inline void arch_bprm_mm_init(struct mm_struct *mm,
struct vm_area_struct *vma)
{
}

#endif /* __S390_MMU_CONTEXT_H */
24 changes: 19 additions & 5 deletions arch/um/include/asm/mmu_context.h
@@ -10,7 +10,26 @@
#include <asm/mmu.h>

extern void uml_setup_stubs(struct mm_struct *mm);
/*
* Needed since we do not use the asm-generic/mm_hooks.h:
*/
static inline void arch_dup_mmap(struct mm_struct *oldmm, struct mm_struct *mm)
{
uml_setup_stubs(mm);
}
extern void arch_exit_mmap(struct mm_struct *mm);
static inline void arch_unmap(struct mm_struct *mm,
struct vm_area_struct *vma,
unsigned long start, unsigned long end)
{
}
static inline void arch_bprm_mm_init(struct mm_struct *mm,
struct vm_area_struct *vma)
{
}
/*
* end asm-generic/mm_hooks.h functions
*/

#define deactivate_mm(tsk,mm) do { } while (0)

@@ -41,11 +60,6 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
}
}

-static inline void arch_dup_mmap(struct mm_struct *oldmm, struct mm_struct *mm)
-{
-uml_setup_stubs(mm);
-}

static inline void enter_lazy_tlb(struct mm_struct *mm,
struct task_struct *tsk)
{
11 changes: 11 additions & 0 deletions arch/unicore32/include/asm/mmu_context.h
@@ -86,4 +86,15 @@ static inline void arch_dup_mmap(struct mm_struct *oldmm,
{
}

static inline void arch_unmap(struct mm_struct *mm,
struct vm_area_struct *vma,
unsigned long start, unsigned long end)
{
}

static inline void arch_bprm_mm_init(struct mm_struct *mm,
struct vm_area_struct *vma)
{
}

#endif
4 changes: 4 additions & 0 deletions arch/x86/Kconfig
@@ -248,6 +248,10 @@ config HAVE_INTEL_TXT
def_bool y
depends on INTEL_IOMMU && ACPI

config X86_INTEL_MPX
def_bool y
depends on CPU_SUP_INTEL

config X86_32_SMP
def_bool y
depends on X86_32 && SMP
8 changes: 7 additions & 1 deletion arch/x86/include/asm/disabled-features.h
@@ -10,6 +10,12 @@
* cpu_feature_enabled().
*/

#ifdef CONFIG_X86_INTEL_MPX
# define DISABLE_MPX 0
#else
# define DISABLE_MPX (1<<(X86_FEATURE_MPX & 31))
#endif

#ifdef CONFIG_X86_64
# define DISABLE_VME (1<<(X86_FEATURE_VME & 31))
# define DISABLE_K6_MTRR (1<<(X86_FEATURE_K6_MTRR & 31))
@@ -34,6 +40,6 @@
#define DISABLED_MASK6 0
#define DISABLED_MASK7 0
#define DISABLED_MASK8 0
-#define DISABLED_MASK9 0
+#define DISABLED_MASK9 (DISABLE_MPX)

#endif /* _ASM_X86_DISABLED_FEATURES_H */