Commit

Merge branches 'for-next/tpidr2' and 'for-next/sme2' into for-next/signal

Patches on this branch depend on the branches merged above.
ctmarinas committed Feb 1, 2023
2 parents 8ced928 + b2ab432 commit ea776e4
Showing 40 changed files with 1,557 additions and 72 deletions.
10 changes: 10 additions & 0 deletions Documentation/arm64/booting.rst
@@ -369,6 +369,16 @@ Before jumping into the kernel, the following conditions must be met:

- HCR_EL2.ATA (bit 56) must be initialised to 0b1.

For CPUs with the Scalable Matrix Extension version 2 (FEAT_SME2):

- If EL3 is present:

- SMCR_EL3.EZT0 (bit 30) must be initialised to 0b1.

- If the kernel is entered at EL1 and EL2 is present:

- SMCR_EL2.EZT0 (bit 30) must be initialised to 0b1.

The requirements described above for CPU mode, caches, MMUs, architected
timers, coherency and system registers apply to all CPUs. All CPUs must
enter the kernel in the same exception level. Where the values documented
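The boot requirement above comes down to one bit in SMCR_EL3 and SMCR_EL2. The sketch below only illustrates the bit arithmetic on a plain 64-bit value; it is not firmware code. The macro and function names are hypothetical, and real firmware would read and write the register with MRS/MSR at the relevant exception level.

```c
#include <stdint.h>

/* EZT0 is bit 30 in both SMCR_EL3 and SMCR_EL2, per the booting.rst hunk.
 * The macro name is hypothetical. */
#define SMCR_ELx_EZT0 (UINT64_C(1) << 30)

/* Return the register value with ZT0 access enabled. */
static inline uint64_t smcr_enable_zt0(uint64_t smcr)
{
	return smcr | SMCR_ELx_EZT0;
}

/* Non-zero when the given register value already has EZT0 set. */
static inline int smcr_zt0_enabled(uint64_t smcr)
{
	return (smcr & SMCR_ELx_EZT0) != 0;
}
```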
18 changes: 18 additions & 0 deletions Documentation/arm64/elf_hwcaps.rst
@@ -284,6 +284,24 @@ HWCAP2_RPRFM
HWCAP2_SVE2P1
Functionality implied by ID_AA64ZFR0_EL1.SVEver == 0b0010.

HWCAP2_SME2
Functionality implied by ID_AA64SMFR0_EL1.SMEver == 0b0001.

HWCAP2_SME2P1
Functionality implied by ID_AA64SMFR0_EL1.SMEver == 0b0010.

HWCAP2_SMEI16I32
Functionality implied by ID_AA64SMFR0_EL1.I16I32 == 0b0101

HWCAP2_SMEBI32I32
Functionality implied by ID_AA64SMFR0_EL1.BI32I32 == 0b1

HWCAP2_SMEB16B16
Functionality implied by ID_AA64SMFR0_EL1.B16B16 == 0b1

HWCAP2_SMEF16F16
Functionality implied by ID_AA64SMFR0_EL1.F16F16 == 0b1

4. Unused AT_HWCAP bits
-----------------------

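A userspace program would typically test for the new hwcaps through the AT_HWCAP2 auxiliary vector entry. The following is a minimal sketch, assuming a Linux libc that provides getauxval(). The fallback bit values are copied from the uapi hwcap.h hunk of this commit, and the helper names are hypothetical.

```c
#include <stdbool.h>
#include <sys/auxv.h>

/* Fallbacks for toolchain headers that predate this commit; the bit
 * positions are taken from the uapi hwcap.h hunk. */
#ifndef HWCAP2_SME2
#define HWCAP2_SME2   (1UL << 37)
#endif
#ifndef HWCAP2_SME2P1
#define HWCAP2_SME2P1 (1UL << 38)
#endif

/* Pure helper: true when every bit in mask is set in an AT_HWCAP2 value. */
static inline bool hwcap2_has(unsigned long hwcap2, unsigned long mask)
{
	return (hwcap2 & mask) == mask;
}

/* Query the running process. The answer is only meaningful on arm64 Linux;
 * on other architectures the same bit may mean something else. */
static inline bool cpu_has_sme2(void)
{
	return hwcap2_has(getauxval(AT_HWCAP2), HWCAP2_SME2);
}
```

Keeping the bit test separate from the getauxval() call makes the logic checkable on a development machine that has no SME2 hardware.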
52 changes: 43 additions & 9 deletions Documentation/arm64/sme.rst
@@ -18,14 +18,19 @@ model features for SME is included in Appendix A.
1. General
-----------

* PSTATE.SM, PSTATE.ZA, the streaming mode vector length, the ZA
register state and TPIDR2_EL0 are tracked per thread.
* PSTATE.SM, PSTATE.ZA, the streaming mode vector length, the ZA and (when
present) ZTn register state and TPIDR2_EL0 are tracked per thread.

* The presence of SME is reported to userspace via HWCAP2_SME in the aux vector
AT_HWCAP2 entry. Presence of this flag implies the presence of the SME
instructions and registers, and the Linux-specific system interfaces
described in this document. SME is reported in /proc/cpuinfo as "sme".

* The presence of SME2 is reported to userspace via HWCAP2_SME2 in the
aux vector AT_HWCAP2 entry. Presence of this flag implies the presence of
the SME2 instructions and ZT0, and the Linux-specific system interfaces
described in this document. SME2 is reported in /proc/cpuinfo as "sme2".

* Support for the execution of SME instructions in userspace can also be
detected by reading the CPU ID register ID_AA64PFR1_EL1 using an MRS
instruction, and checking that the value of the SME field is nonzero. [3]
@@ -44,6 +49,7 @@ model features for SME is included in Appendix A.
HWCAP2_SME_B16F32
HWCAP2_SME_F32F32
HWCAP2_SME_FA64
HWCAP2_SME2

This list may be extended over time as the SME architecture evolves.

@@ -52,8 +58,8 @@ model features for SME is included in Appendix A.
cpu-feature-registers.txt for details.

* Debuggers should restrict themselves to interacting with the target via the
NT_ARM_SVE, NT_ARM_SSVE and NT_ARM_ZA regsets. The recommended way
of detecting support for these regsets is to connect to a target process
NT_ARM_SVE, NT_ARM_SSVE, NT_ARM_ZA and NT_ARM_ZT regsets. The recommended
way of detecting support for these regsets is to connect to a target process
first and then attempt a

ptrace(PTRACE_GETREGSET, pid, NT_ARM_<regset>, &iov).
@@ -89,13 +95,13 @@ be zeroed.
-------------------------

* On syscall PSTATE.ZA is preserved, if PSTATE.ZA==1 then the contents of the
ZA matrix are preserved.
ZA matrix and ZTn (if present) are preserved.

* On syscall PSTATE.SM will be cleared and the SVE registers will be handled
as per the standard SVE ABI.

* Neither the SVE registers nor ZA are used to pass arguments to or receive
results from any syscall.
* None of the SVE registers, ZA or ZTn are used to pass arguments to
or receive results from any syscall.

* On process creation (eg, clone()) the newly created process will have
PSTATE.SM cleared.
@@ -137,6 +143,14 @@ be zeroed.
__reserved[] referencing this space. za_context is then written in the
extra space. Refer to [1] for further details about this mechanism.

* If ZTn is supported and PSTATE.ZA==1 then a signal frame record for ZTn will
be generated.

* The signal record for ZTn has magic ZT_MAGIC (0x5a544e01) and consists of a
standard signal frame header followed by a struct zt_context specifying
the number of ZTn registers supported by the system, then zt_context.nregs
blocks of 64 bytes of data per register.
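A signal handler could locate the ZTn record by walking the chain of records and comparing each header's magic against ZT_MAGIC. The sketch below is written under stated assumptions: the two structs are local stand-ins for struct _aarch64_ctx and struct zt_context so that it builds off-target, find_record() is a hypothetical helper, and records placed in the extra space described above are not followed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ZT_MAGIC 0x5a544e01u    /* value from the sigcontext.h hunk */

/* Local stand-in for struct _aarch64_ctx. */
struct ctx_head {
	uint32_t magic;
	uint32_t size;          /* size of the whole record, header included */
};

/* Local stand-in for struct zt_context. */
struct zt_record {
	struct ctx_head head;
	uint16_t nregs;
	uint16_t reserved[3];
};

/*
 * Scan a buffer of back-to-back records for one with the given magic.
 * Returns its byte offset, or -1 if a terminator (magic 0), a malformed
 * size, or the end of the buffer is reached first.
 */
static inline long find_record(const unsigned char *buf, size_t len,
			       uint32_t magic)
{
	size_t off = 0;

	while (off + sizeof(struct ctx_head) <= len) {
		struct ctx_head h;

		memcpy(&h, buf + off, sizeof(h));
		if (h.magic == magic)
			return (long)off;
		if (h.magic == 0 || h.size < sizeof(h) || h.size > len - off)
			return -1;
		off += h.size;
	}
	return -1;
}
```

On a match, the register data would start sizeof(struct zt_record) bytes past the returned offset and run for nregs blocks of 64 bytes.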


5. Signal return
-----------------
@@ -154,6 +168,9 @@ When returning from a signal handler:
the signal frame does not match the current vector length, the signal return
attempt is treated as illegal, resulting in a forced SIGSEGV.

* If ZTn is not supported or PSTATE.ZA==0 then it is illegal to have a
signal frame record for ZTn, resulting in a forced SIGSEGV.


6. prctl extensions
--------------------
@@ -217,8 +234,8 @@ prctl(PR_SME_SET_VL, unsigned long arg)
vector length that will be applied at the next execve() by the calling
thread.

* Changing the vector length causes all of ZA, P0..P15, FFR and all bits of
Z0..Z31 except for Z0 bits [127:0] .. Z31 bits [127:0] to become
* Changing the vector length causes all of ZA, ZTn, P0..P15, FFR and all
bits of Z0..Z31 except for Z0 bits [127:0] .. Z31 bits [127:0] to become
unspecified, including both streaming and non-streaming SVE state.
Calling PR_SME_SET_VL with vl equal to the thread's current vector
length, or calling PR_SME_SET_VL with the PR_SVE_SET_VL_ONEXEC flag,
@@ -320,6 +337,15 @@ The regset data starts with struct user_za_header, containing:

* The effect of writing a partial, incomplete payload is unspecified.

* A new regset NT_ARM_ZT is defined for access to ZTn state via
PTRACE_GETREGSET and PTRACE_SETREGSET.

* The NT_ARM_ZT regset consists of a single 512 bit register.

* When PSTATE.ZA==0 reads of NT_ARM_ZT will report all bits of ZTn as 0.

* Writes to NT_ARM_ZT will set PSTATE.ZA to 1.
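A debugger could read ZT0 from a stopped tracee roughly as sketched below. The numeric value of NT_ARM_ZT does not appear in this diff; 0x40d is assumed here from the kernel's uapi elf.h and should be checked against the headers in use. The helper names are hypothetical.

```c
#include <stdint.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/uio.h>

#ifndef NT_ARM_ZT
#define NT_ARM_ZT 0x40d   /* assumed value; not shown in this diff */
#endif

#define ZT0_BYTES 64      /* a single 512-bit register */

/* Describe a caller-supplied buffer that will receive ZT0. */
static inline struct iovec zt0_iovec(uint8_t buf[ZT0_BYTES])
{
	struct iovec iov = { .iov_base = buf, .iov_len = ZT0_BYTES };

	return iov;
}

/* Read ZT0 from a stopped tracee; returns 0 on success, -1 on failure. */
static inline int read_zt0(pid_t pid, uint8_t buf[ZT0_BYTES])
{
	struct iovec iov = zt0_iovec(buf);

	if (ptrace(PTRACE_GETREGSET, pid, (void *)(uintptr_t)NT_ARM_ZT,
		   &iov) == -1)
		return -1;
	/* The kernel updates iov_len to the number of bytes it wrote. */
	return iov.iov_len == ZT0_BYTES ? 0 : -1;
}
```

Per the bullets above, a read while PSTATE.ZA==0 reports all zeroes, so a zero buffer alone does not tell the debugger whether ZA is enabled.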


8. ELF coredump extensions
---------------------------
@@ -334,6 +360,11 @@ The regset data starts with struct user_za_header, containing:
been read if a PTRACE_GETREGSET of NT_ARM_ZA were executed for each thread
when the coredump was generated.

* A NT_ARM_ZT note will be added to each coredump for each thread of the
dumped process. The contents will be equivalent to the data that would have
been read if a PTRACE_GETREGSET of NT_ARM_ZT were executed for each thread
when the coredump was generated.

* The NT_ARM_TLS note will be extended to two registers, the second register
will contain TPIDR2_EL0 on systems that support SME and will be read as
zero with writes ignored otherwise.
@@ -409,6 +440,9 @@ In A64 state, SME adds the following:
For best system performance it is strongly encouraged for software to enable
ZA only when it is actively being used.

* A new ZT0 register is introduced when SME2 is present. This is a 512 bit
register which is accessible when PSTATE.ZA is set, as ZA itself is.

* Two new 1 bit fields in PSTATE which may be controlled via the SMSTART and
SMSTOP instructions or by access to the SVCR system register:

6 changes: 6 additions & 0 deletions arch/arm64/include/asm/cpufeature.h
@@ -769,6 +769,12 @@ static __always_inline bool system_supports_sme(void)
cpus_have_const_cap(ARM64_SME);
}

static __always_inline bool system_supports_sme2(void)
{
return IS_ENABLED(CONFIG_ARM64_SME) &&
cpus_have_const_cap(ARM64_SME2);
}

static __always_inline bool system_supports_fa64(void)
{
return IS_ENABLED(CONFIG_ARM64_SME) &&
1 change: 1 addition & 0 deletions arch/arm64/include/asm/esr.h
@@ -341,6 +341,7 @@
#define ESR_ELx_SME_ISS_ILL 1
#define ESR_ELx_SME_ISS_SM_DISABLED 2
#define ESR_ELx_SME_ISS_ZA_DISABLED 3
#define ESR_ELx_SME_ISS_ZT_DISABLED 4

#ifndef __ASSEMBLY__
#include <asm/types.h>
30 changes: 22 additions & 8 deletions arch/arm64/include/asm/fpsimd.h
@@ -61,7 +61,7 @@ extern void fpsimd_kvm_prepare(void);
struct cpu_fp_state {
struct user_fpsimd_state *st;
void *sve_state;
void *za_state;
void *sme_state;
u64 *svcr;
unsigned int sve_vl;
unsigned int sme_vl;
@@ -105,19 +105,27 @@ static inline void *sve_pffr(struct thread_struct *thread)
return (char *)thread->sve_state + sve_ffr_offset(vl);
}

static inline void *thread_zt_state(struct thread_struct *thread)
{
/* The ZT register state is stored immediately after the ZA state */
unsigned int sme_vq = sve_vq_from_vl(thread_get_sme_vl(thread));
return thread->sme_state + ZA_SIG_REGS_SIZE(sme_vq);
}

extern void sve_save_state(void *state, u32 *pfpsr, int save_ffr);
extern void sve_load_state(void const *state, u32 const *pfpsr,
int restore_ffr);
extern void sve_flush_live(bool flush_ffr, unsigned long vq_minus_1);
extern unsigned int sve_get_vl(void);
extern void sve_set_vq(unsigned long vq_minus_1);
extern void sme_set_vq(unsigned long vq_minus_1);
extern void za_save_state(void *state);
extern void za_load_state(void const *state);
extern void sme_save_state(void *state, int zt);
extern void sme_load_state(void const *state, int zt);

struct arm64_cpu_capabilities;
extern void sve_kernel_enable(const struct arm64_cpu_capabilities *__unused);
extern void sme_kernel_enable(const struct arm64_cpu_capabilities *__unused);
extern void sme2_kernel_enable(const struct arm64_cpu_capabilities *__unused);
extern void fa64_kernel_enable(const struct arm64_cpu_capabilities *__unused);

extern u64 read_zcr_features(void);
@@ -355,14 +363,20 @@ extern int sme_get_current_vl(void);

/*
* Return how many bytes of memory are required to store the full SME
* specific state (currently just ZA) for task, given task's currently
* configured vector length.
* specific state for task, given task's currently configured vector
* length.
*/
static inline size_t za_state_size(struct task_struct const *task)
static inline size_t sme_state_size(struct task_struct const *task)
{
unsigned int vl = task_get_sme_vl(task);
size_t size;

size = ZA_SIG_REGS_SIZE(sve_vq_from_vl(vl));

if (system_supports_sme2())
size += ZT_SIG_REG_SIZE;

return ZA_SIG_REGS_SIZE(sve_vq_from_vl(vl));
return size;
}

#else
@@ -382,7 +396,7 @@ static inline int sme_max_virtualisable_vl(void) { return 0; }
static inline int sme_set_current_vl(unsigned long arg) { return -EINVAL; }
static inline int sme_get_current_vl(void) { return -EINVAL; }

static inline size_t za_state_size(struct task_struct const *task)
static inline size_t sme_state_size(struct task_struct const *task)
{
return 0;
}
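The layout that thread_zt_state() and sme_state_size() rely on (ZA first, ZT0 immediately after it) can be checked with a little arithmetic. The sketch below is a userspace restatement, not the kernel code: it assumes ZA_SIG_REGS_SIZE(vq) is the square of the vector length in bytes, and all names are hypothetical. It computes the minimum space needed; the kernel function in the diff adds ZT_SIG_REG_SIZE, which the sigcontext.h hunk defines as 512, so the kernel appears to reserve more than the 64 bytes ZT0 occupies.

```c
#include <stddef.h>

#define SVE_VQ_BYTES 16u   /* bytes per 128-bit quadword */
#define ZT0_BYTES    64u   /* 512 bits */

/* Quadwords per vector for a vector length given in bytes. */
static inline unsigned int vq_from_vl(unsigned int vl)
{
	return vl / SVE_VQ_BYTES;
}

/* ZA is a square array of vl x vl bytes (assumed to match
 * ZA_SIG_REGS_SIZE(vq) in the kernel headers). */
static inline size_t za_bytes(unsigned int vl)
{
	size_t row = (size_t)vq_from_vl(vl) * SVE_VQ_BYTES;

	return row * row;
}

/* Offset of ZT0 inside the combined buffer: directly after ZA. */
static inline size_t zt0_offset(unsigned int vl)
{
	return za_bytes(vl);
}

/* Minimum bytes needed to hold ZA and, when SME2 is present, ZT0. */
static inline size_t sme_state_bytes(unsigned int vl, int have_sme2)
{
	return za_bytes(vl) + (have_sme2 ? ZT0_BYTES : 0);
}
```

For a 32-byte streaming vector length this gives 1024 bytes of ZA with ZT0 at offset 1024.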
22 changes: 22 additions & 0 deletions arch/arm64/include/asm/fpsimdmacros.h
@@ -220,6 +220,28 @@
| ((\offset) & 7)
.endm

/*
* LDR (ZT0)
*
* LDR ZT0, nx
*/
.macro _ldr_zt nx
_check_general_reg \nx
.inst 0xe11f8000 \
| (\nx << 5)
.endm

/*
* STR (ZT0)
*
* STR ZT0, nx
*/
.macro _str_zt nx
_check_general_reg \nx
.inst 0xe13f8000 \
| (\nx << 5)
.endm

/*
* Zero the entire ZA array
* ZERO ZA
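The _ldr_zt and _str_zt macros hand-encode the instructions, presumably so the file assembles with toolchains that lack SME2 support. The same encoding can be written as plain C to see which word each macro emits. This is a sketch with hypothetical function names; the 0..30 register range is an assumption about what _check_general_reg enforces.

```c
#include <assert.h>
#include <stdint.h>

/* Instruction word for "LDR ZT0, [Xn]": base opcode with Rn in bits [9:5]. */
static inline uint32_t encode_ldr_zt0(unsigned int nx)
{
	assert(nx <= 30);   /* assumed range check, mirroring _check_general_reg */
	return 0xe11f8000u | ((uint32_t)nx << 5);
}

/* Instruction word for "STR ZT0, [Xn]". */
static inline uint32_t encode_str_zt0(unsigned int nx)
{
	assert(nx <= 30);
	return 0xe13f8000u | ((uint32_t)nx << 5);
}
```

The two base opcodes differ only in bit 21, which selects store rather than load in these encodings.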
6 changes: 6 additions & 0 deletions arch/arm64/include/asm/hwcap.h
@@ -123,6 +123,12 @@
#define KERNEL_HWCAP_CSSC __khwcap2_feature(CSSC)
#define KERNEL_HWCAP_RPRFM __khwcap2_feature(RPRFM)
#define KERNEL_HWCAP_SVE2P1 __khwcap2_feature(SVE2P1)
#define KERNEL_HWCAP_SME2 __khwcap2_feature(SME2)
#define KERNEL_HWCAP_SME2P1 __khwcap2_feature(SME2P1)
#define KERNEL_HWCAP_SME_I16I32 __khwcap2_feature(SME_I16I32)
#define KERNEL_HWCAP_SME_BI32I32 __khwcap2_feature(SME_BI32I32)
#define KERNEL_HWCAP_SME_B16B16 __khwcap2_feature(SME_B16B16)
#define KERNEL_HWCAP_SME_F16F16 __khwcap2_feature(SME_F16F16)

/*
* This yields a mask that user programs can use to figure out what
2 changes: 1 addition & 1 deletion arch/arm64/include/asm/processor.h
@@ -161,7 +161,7 @@ struct thread_struct {
enum fp_type fp_type; /* registers FPSIMD or SVE? */
unsigned int fpsimd_cpu;
void *sve_state; /* SVE registers, if any */
void *za_state; /* ZA register, if any */
void *sme_state; /* ZA and ZT state, if any */
unsigned int vl[ARM64_VEC_MAX]; /* vector length */
unsigned int vl_onexec[ARM64_VEC_MAX]; /* vl after next exec */
unsigned long fault_address; /* fault info */
6 changes: 6 additions & 0 deletions arch/arm64/include/uapi/asm/hwcap.h
@@ -96,5 +96,11 @@
#define HWCAP2_CSSC (1UL << 34)
#define HWCAP2_RPRFM (1UL << 35)
#define HWCAP2_SVE2P1 (1UL << 36)
#define HWCAP2_SME2 (1UL << 37)
#define HWCAP2_SME2P1 (1UL << 38)
#define HWCAP2_SME_I16I32 (1UL << 39)
#define HWCAP2_SME_BI32I32 (1UL << 40)
#define HWCAP2_SME_B16B16 (1UL << 41)
#define HWCAP2_SME_F16F16 (1UL << 42)

#endif /* _UAPI__ASM_HWCAP_H */
19 changes: 19 additions & 0 deletions arch/arm64/include/uapi/asm/sigcontext.h
@@ -160,6 +160,14 @@ struct za_context {
__u16 __reserved[3];
};

#define ZT_MAGIC 0x5a544e01

struct zt_context {
struct _aarch64_ctx head;
__u16 nregs;
__u16 __reserved[3];
};

#endif /* !__ASSEMBLY__ */

#include <asm/sve_context.h>
@@ -312,4 +320,15 @@ struct za_context {
#define ZA_SIG_CONTEXT_SIZE(vq) \
(ZA_SIG_REGS_OFFSET + ZA_SIG_REGS_SIZE(vq))

#define ZT_SIG_REG_SIZE 512

#define ZT_SIG_REG_BYTES (ZT_SIG_REG_SIZE / 8)

#define ZT_SIG_REGS_OFFSET sizeof(struct zt_context)

#define ZT_SIG_REGS_SIZE(n) (ZT_SIG_REG_BYTES * n)

#define ZT_SIG_CONTEXT_SIZE(n) \
(sizeof(struct zt_context) + ZT_SIG_REGS_SIZE(n))

#endif /* _UAPI__ASM_SIGCONTEXT_H */
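The size macros above can be exercised off-target to confirm the record size for a given register count. The sketch below restates them under different names with local stand-in structs, so none of it is the uapi header itself. One deliberate difference: the argument of the per-register macro is parenthesised here, since ZT_SIG_REGS_SIZE(n) as written in the diff would mis-expand for an argument such as `a + b`.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for struct _aarch64_ctx and struct zt_context. */
struct ctx_head {
	uint32_t magic;
	uint32_t size;
};

struct zt_record {
	struct ctx_head head;
	uint16_t nregs;
	uint16_t reserved[3];
};

#define ZT_REG_BITS         512
#define ZT_REG_BYTES        (ZT_REG_BITS / 8)
#define ZT_REGS_OFFSET      sizeof(struct zt_record)
#define ZT_REGS_BYTES(n)    (ZT_REG_BYTES * (n))   /* argument parenthesised */
#define ZT_CONTEXT_BYTES(n) (ZT_REGS_OFFSET + ZT_REGS_BYTES(n))

/* Pointer to the data of register idx inside a record. */
static inline uint8_t *zt_reg_data(struct zt_record *rec, unsigned int idx)
{
	return (uint8_t *)rec + ZT_REGS_OFFSET + ZT_REGS_BYTES(idx);
}
```

With one register, the whole record comes to 16 bytes of header plus 64 bytes of data.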