Skip to content

Commit

Permalink
Merge tag 'libnvdimm-for-4.8' of git://git.kernel.org/pub/scm/linux/k…
Browse files Browse the repository at this point in the history
…ernel/git/nvdimm/nvdimm

Pull libnvdimm updates from Dan Williams:

 - Replace pcommit with ADR / directed-flushing.

   The pcommit instruction, which has not shipped on any product, is
   deprecated.  Instead, the requirement is that platforms implement
   either ADR, or provide one or more flush addresses per nvdimm.

   ADR (Asynchronous DRAM Refresh) flushes data in posted write buffers
   to the memory controller on a power-fail event.

   Flush addresses are defined in ACPI 6.x as an NVDIMM Firmware
   Interface Table (NFIT) sub-structure: "Flush Hint Address Structure".
   A flush hint is an mmio address that when written and fenced assures
   that all previous posted writes targeting a given dimm have been
   flushed to media.

 - On-demand ARS (address range scrub).

   Linux uses the results of the ACPI ARS commands to track bad blocks
   in pmem devices.  When latent errors are detected we re-scrub the
   media to refresh the bad block list, userspace can also request a
   re-scrub at any time.

 - Support for the Microsoft DSM (device specific method) command
   format.

 - Support for EDK2/OVMF virtual disk device memory ranges.

 - Various fixes and cleanups across the subsystem.

* tag 'libnvdimm-for-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (41 commits)
  libnvdimm-btt: Delete an unnecessary check before the function call "__nd_device_register"
  nfit: do an ARS scrub on hitting a latent media error
  nfit: move to nfit/ sub-directory
  nfit, libnvdimm: allow an ARS scrub to be triggered on demand
  libnvdimm: register nvdimm_bus devices with an nd_bus driver
  pmem: clarify a debug print in pmem_clear_poison
  x86/insn: remove pcommit
  Revert "KVM: x86: add pcommit support"
  nfit, tools/testing/nvdimm/: unify shutdown paths
  libnvdimm: move ->module to struct nvdimm_bus_descriptor
  nfit: cleanup acpi_nfit_init calling convention
  nfit: fix _FIT evaluation memory leak + use after free
  tools/testing/nvdimm: add manufacturing_{date|location} dimm properties
  tools/testing/nvdimm: add virtual ramdisk range
  acpi, nfit: treat virtual ramdisk SPA as pmem region
  pmem: kill __pmem address space
  pmem: kill wmb_pmem()
  libnvdimm, pmem: use nvdimm_flush() for namespace I/O writes
  fs/dax: remove wmb_pmem()
  libnvdimm, pmem: flush posted-write queues on shutdown
  ...
  • Loading branch information
torvalds committed Jul 29, 2016
2 parents d94ba9e + 0606263 commit f0c98eb
Show file tree
Hide file tree
Showing 65 changed files with 1,375 additions and 1,015 deletions.
2 changes: 1 addition & 1 deletion Documentation/filesystems/Locking
Original file line number Diff line number Diff line change
Expand Up @@ -395,7 +395,7 @@ prototypes:
int (*release) (struct gendisk *, fmode_t);
int (*ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);
int (*compat_ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);
int (*direct_access) (struct block_device *, sector_t, void __pmem **,
int (*direct_access) (struct block_device *, sector_t, void **,
unsigned long *);
int (*media_changed) (struct gendisk *);
void (*unlock_native_capacity) (struct gendisk *);
Expand Down
28 changes: 9 additions & 19 deletions Documentation/nvdimm/btt.txt
Original file line number Diff line number Diff line change
Expand Up @@ -256,28 +256,18 @@ If any of these error conditions are encountered, the arena is put into a read
only state using a flag in the info block.


5. In-kernel usage
==================
5. Usage
========

Any block driver that supports byte granularity IO to the storage may register
with the BTT. It will have to provide the rw_bytes interface in its
block_device_operations struct:
The BTT can be set up on any disk (namespace) exposed by the libnvdimm subsystem
(pmem, or blk mode). The easiest way to set up such a namespace is using the
'ndctl' utility [1]:

int (*rw_bytes)(struct gendisk *, void *, size_t, off_t, int rw);
For example, the ndctl command line to setup a btt with a 4k sector size is:

It may register with the BTT after it adds its own gendisk, using btt_init:
ndctl create-namespace -f -e namespace0.0 -m sector -l 4k

struct btt *btt_init(struct gendisk *disk, unsigned long long rawsize,
u32 lbasize, u8 uuid[], int maxlane);
See ndctl create-namespace --help for more options.

note that maxlane is the maximum amount of concurrency the driver wishes to
allow the BTT to use.

The BTT 'disk' appears as a stacked block device that grabs the underlying block
device in the O_EXCL mode.

When the driver wishes to remove the backing disk, it should similarly call
btt_fini using the same struct btt* handle that was provided to it by btt_init.

void btt_fini(struct btt *btt);
[1]: https://github.com/pmem/ndctl

4 changes: 2 additions & 2 deletions arch/powerpc/sysdev/axonram.c
Original file line number Diff line number Diff line change
Expand Up @@ -143,12 +143,12 @@ axon_ram_make_request(struct request_queue *queue, struct bio *bio)
*/
static long
axon_ram_direct_access(struct block_device *device, sector_t sector,
void __pmem **kaddr, pfn_t *pfn, long size)
void **kaddr, pfn_t *pfn, long size)
{
struct axon_ram_bank *bank = device->bd_disk->private_data;
loff_t offset = (loff_t)sector << AXON_RAM_SECTOR_SHIFT;

*kaddr = (void __pmem __force *) bank->io_addr + offset;
*kaddr = (void *) bank->io_addr + offset;
*pfn = phys_to_pfn_t(bank->ph_addr + offset, PFN_DEV);
return bank->size - offset;
}
Expand Down
1 change: 0 additions & 1 deletion arch/x86/include/asm/cpufeatures.h
Original file line number Diff line number Diff line change
Expand Up @@ -225,7 +225,6 @@
#define X86_FEATURE_RDSEED ( 9*32+18) /* The RDSEED instruction */
#define X86_FEATURE_ADX ( 9*32+19) /* The ADCX and ADOX instructions */
#define X86_FEATURE_SMAP ( 9*32+20) /* Supervisor Mode Access Prevention */
#define X86_FEATURE_PCOMMIT ( 9*32+22) /* PCOMMIT instruction */
#define X86_FEATURE_CLFLUSHOPT ( 9*32+23) /* CLFLUSHOPT instruction */
#define X86_FEATURE_CLWB ( 9*32+24) /* CLWB instruction */
#define X86_FEATURE_AVX512PF ( 9*32+26) /* AVX-512 Prefetch */
Expand Down
77 changes: 19 additions & 58 deletions arch/x86/include/asm/pmem.h
Original file line number Diff line number Diff line change
Expand Up @@ -26,73 +26,48 @@
* @n: length of the copy in bytes
*
* Copy data to persistent memory media via non-temporal stores so that
* a subsequent arch_wmb_pmem() can flush cpu and memory controller
* write buffers to guarantee durability.
* a subsequent pmem driver flush operation will drain posted write queues.
*/
static inline void arch_memcpy_to_pmem(void __pmem *dst, const void *src,
size_t n)
static inline void arch_memcpy_to_pmem(void *dst, const void *src, size_t n)
{
int unwritten;
int rem;

/*
* We are copying between two kernel buffers, if
* __copy_from_user_inatomic_nocache() returns an error (page
* fault) we would have already reported a general protection fault
* before the WARN+BUG.
*/
unwritten = __copy_from_user_inatomic_nocache((void __force *) dst,
(void __user *) src, n);
if (WARN(unwritten, "%s: fault copying %p <- %p unwritten: %d\n",
__func__, dst, src, unwritten))
rem = __copy_from_user_inatomic_nocache(dst, (void __user *) src, n);
if (WARN(rem, "%s: fault copying %p <- %p unwritten: %d\n",
__func__, dst, src, rem))
BUG();
}

static inline int arch_memcpy_from_pmem(void *dst, const void __pmem *src,
size_t n)
static inline int arch_memcpy_from_pmem(void *dst, const void *src, size_t n)
{
if (static_cpu_has(X86_FEATURE_MCE_RECOVERY))
return memcpy_mcsafe(dst, (void __force *) src, n);
memcpy(dst, (void __force *) src, n);
return memcpy_mcsafe(dst, src, n);
memcpy(dst, src, n);
return 0;
}

/**
* arch_wmb_pmem - synchronize writes to persistent memory
*
* After a series of arch_memcpy_to_pmem() operations this drains data
* from cpu write buffers and any platform (memory controller) buffers
* to ensure that written data is durable on persistent memory media.
*/
static inline void arch_wmb_pmem(void)
{
/*
* wmb() to 'sfence' all previous writes such that they are
* architecturally visible to 'pcommit'. Note, that we've
* already arranged for pmem writes to avoid the cache via
* arch_memcpy_to_pmem().
*/
wmb();
pcommit_sfence();
}

/**
* arch_wb_cache_pmem - write back a cache range with CLWB
* @vaddr: virtual start address
* @size: number of bytes to write back
*
* Write back a cache range using the CLWB (cache line write back)
* instruction. This function requires explicit ordering with an
* arch_wmb_pmem() call.
* instruction.
*/
static inline void arch_wb_cache_pmem(void __pmem *addr, size_t size)
static inline void arch_wb_cache_pmem(void *addr, size_t size)
{
u16 x86_clflush_size = boot_cpu_data.x86_clflush_size;
unsigned long clflush_mask = x86_clflush_size - 1;
void *vaddr = (void __force *)addr;
void *vend = vaddr + size;
void *vend = addr + size;
void *p;

for (p = (void *)((unsigned long)vaddr & ~clflush_mask);
for (p = (void *)((unsigned long)addr & ~clflush_mask);
p < vend; p += x86_clflush_size)
clwb(p);
}
Expand All @@ -113,16 +88,14 @@ static inline bool __iter_needs_pmem_wb(struct iov_iter *i)
* @i: iterator with source data
*
* Copy data from the iterator 'i' to the PMEM buffer starting at 'addr'.
* This function requires explicit ordering with an arch_wmb_pmem() call.
*/
static inline size_t arch_copy_from_iter_pmem(void __pmem *addr, size_t bytes,
static inline size_t arch_copy_from_iter_pmem(void *addr, size_t bytes,
struct iov_iter *i)
{
void *vaddr = (void __force *)addr;
size_t len;

/* TODO: skip the write-back by always using non-temporal stores */
len = copy_from_iter_nocache(vaddr, bytes, i);
len = copy_from_iter_nocache(addr, bytes, i);

if (__iter_needs_pmem_wb(i))
arch_wb_cache_pmem(addr, bytes);
Expand All @@ -136,28 +109,16 @@ static inline size_t arch_copy_from_iter_pmem(void __pmem *addr, size_t bytes,
* @size: number of bytes to zero
*
* Write zeros into the memory range starting at 'addr' for 'size' bytes.
* This function requires explicit ordering with an arch_wmb_pmem() call.
*/
static inline void arch_clear_pmem(void __pmem *addr, size_t size)
static inline void arch_clear_pmem(void *addr, size_t size)
{
void *vaddr = (void __force *)addr;

memset(vaddr, 0, size);
memset(addr, 0, size);
arch_wb_cache_pmem(addr, size);
}

static inline void arch_invalidate_pmem(void __pmem *addr, size_t size)
static inline void arch_invalidate_pmem(void *addr, size_t size)
{
clflush_cache_range((void __force *) addr, size);
}

static inline bool __arch_has_wmb_pmem(void)
{
/*
* We require that wmb() be an 'sfence', that is only guaranteed on
* 64-bit builds
*/
return static_cpu_has(X86_FEATURE_PCOMMIT);
clflush_cache_range(addr, size);
}
#endif /* CONFIG_ARCH_HAS_PMEM_API */
#endif /* __ASM_X86_PMEM_H__ */
46 changes: 0 additions & 46 deletions arch/x86/include/asm/special_insns.h
Original file line number Diff line number Diff line change
Expand Up @@ -253,52 +253,6 @@ static inline void clwb(volatile void *__p)
: [pax] "a" (p));
}

/**
* pcommit_sfence() - persistent commit and fence
*
* The PCOMMIT instruction ensures that data that has been flushed from the
* processor's cache hierarchy with CLWB, CLFLUSHOPT or CLFLUSH is accepted to
* memory and is durable on the DIMM. The primary use case for this is
* persistent memory.
*
* This function shows how to properly use CLWB/CLFLUSHOPT/CLFLUSH and PCOMMIT
* with appropriate fencing.
*
* Example:
* void flush_and_commit_buffer(void *vaddr, unsigned int size)
* {
* unsigned long clflush_mask = boot_cpu_data.x86_clflush_size - 1;
* void *vend = vaddr + size;
* void *p;
*
* for (p = (void *)((unsigned long)vaddr & ~clflush_mask);
* p < vend; p += boot_cpu_data.x86_clflush_size)
* clwb(p);
*
* // SFENCE to order CLWB/CLFLUSHOPT/CLFLUSH cache flushes
* // MFENCE via mb() also works
* wmb();
*
* // PCOMMIT and the required SFENCE for ordering
* pcommit_sfence();
* }
*
* After this function completes the data pointed to by 'vaddr' has been
* accepted to memory and will be durable if the 'vaddr' points to persistent
* memory.
*
* PCOMMIT must always be ordered by an MFENCE or SFENCE, so to help simplify
* things we include both the PCOMMIT and the required SFENCE in the
* alternatives generated by pcommit_sfence().
*/
static inline void pcommit_sfence(void)
{
alternative(ASM_NOP7,
".byte 0x66, 0x0f, 0xae, 0xf8\n\t" /* pcommit */
"sfence",
X86_FEATURE_PCOMMIT);
}

#define nop() asm volatile ("nop")


Expand Down
1 change: 0 additions & 1 deletion arch/x86/include/asm/vmx.h
Original file line number Diff line number Diff line change
Expand Up @@ -72,7 +72,6 @@
#define SECONDARY_EXEC_SHADOW_VMCS 0x00004000
#define SECONDARY_EXEC_ENABLE_PML 0x00020000
#define SECONDARY_EXEC_XSAVES 0x00100000
#define SECONDARY_EXEC_PCOMMIT 0x00200000
#define SECONDARY_EXEC_TSC_SCALING 0x02000000

#define PIN_BASED_EXT_INTR_MASK 0x00000001
Expand Down
4 changes: 1 addition & 3 deletions arch/x86/include/uapi/asm/vmx.h
Original file line number Diff line number Diff line change
Expand Up @@ -78,7 +78,6 @@
#define EXIT_REASON_PML_FULL 62
#define EXIT_REASON_XSAVES 63
#define EXIT_REASON_XRSTORS 64
#define EXIT_REASON_PCOMMIT 65

#define VMX_EXIT_REASONS \
{ EXIT_REASON_EXCEPTION_NMI, "EXCEPTION_NMI" }, \
Expand Down Expand Up @@ -127,8 +126,7 @@
{ EXIT_REASON_INVVPID, "INVVPID" }, \
{ EXIT_REASON_INVPCID, "INVPCID" }, \
{ EXIT_REASON_XSAVES, "XSAVES" }, \
{ EXIT_REASON_XRSTORS, "XRSTORS" }, \
{ EXIT_REASON_PCOMMIT, "PCOMMIT" }
{ EXIT_REASON_XRSTORS, "XRSTORS" }

#define VMX_ABORT_SAVE_GUEST_MSR_FAIL 1
#define VMX_ABORT_LOAD_HOST_MSR_FAIL 4
Expand Down
2 changes: 1 addition & 1 deletion arch/x86/kvm/cpuid.c
Original file line number Diff line number Diff line change
Expand Up @@ -366,7 +366,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
F(FSGSBASE) | F(BMI1) | F(HLE) | F(AVX2) | F(SMEP) |
F(BMI2) | F(ERMS) | f_invpcid | F(RTM) | f_mpx | F(RDSEED) |
F(ADX) | F(SMAP) | F(AVX512F) | F(AVX512PF) | F(AVX512ER) |
F(AVX512CD) | F(CLFLUSHOPT) | F(CLWB) | F(PCOMMIT);
F(AVX512CD) | F(CLFLUSHOPT) | F(CLWB);

/* cpuid 0xD.1.eax */
const u32 kvm_cpuid_D_1_eax_x86_features =
Expand Down
8 changes: 0 additions & 8 deletions arch/x86/kvm/cpuid.h
Original file line number Diff line number Diff line change
Expand Up @@ -144,14 +144,6 @@ static inline bool guest_cpuid_has_rtm(struct kvm_vcpu *vcpu)
return best && (best->ebx & bit(X86_FEATURE_RTM));
}

static inline bool guest_cpuid_has_pcommit(struct kvm_vcpu *vcpu)
{
struct kvm_cpuid_entry2 *best;

best = kvm_find_cpuid_entry(vcpu, 7, 0);
return best && (best->ebx & bit(X86_FEATURE_PCOMMIT));
}

static inline bool guest_cpuid_has_rdtscp(struct kvm_vcpu *vcpu)
{
struct kvm_cpuid_entry2 *best;
Expand Down
Loading

0 comments on commit f0c98eb

Please sign in to comment.