wufei committed Apr 12, 2024 (1 parent fcb1f0d, commit 220291b)
Showing 4 changed files with 131 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,131 @@ | ||
--- | ||
layout: post | ||
title: A Summary of RISC-V Vector on Valgrind | ||
author: "Fei Wu" | ||
header-mask: 0.4 | ||
tags: | ||
- Qemu | ||
- RISC-V | ||
--- | ||
|
||
* content | ||
{:toc} | ||
|
||
# Current Status | ||
|
||
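As of 2024/04/12, the status of RVV on Valgrind is: | ||
* Two tools are supported: nulgrind and memcheck | ||
* All RVV instructions are supported except the floating-point and fixed-point ones | ||
* It can run applications such as autovectorized CoreMark | ||
* Parts of the RVV memcheck logic still have room for improvement | ||
* If needed, the complete RVV instruction set can be supported, even if some parts would be imperfect | ||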
|
||
The latest code is at: | ||
* repo - https://github.com/intel/valgrind-rvv | ||
* branch - poc-rvv-remove-vl-from-ir | ||
|
||
# Valgrind Background | ||
|
||
## How It Works | ||
|
||
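* Valgrind has its own intermediate representation (IR) | ||
* Both guest code and instrumentation code (such as memcheck) are first expressed in IR, and the IR is finally translated into host instructions. | ||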
|
||
![valgrind flow](/img/rvv-valgrind/valgrind-flow.png) | ||
|
||
Translation goes through the following phases; the figure below is taken from [1]. | ||
![vex](/img/rvv-valgrind/vex.png) | ||
|
||
## Code Flow | ||
|
||
``` | ||
LibVEX_Translate | ||
  irsb = LibVEX_FrontEnd(vta, &res, &pxControl); | ||
    disInstrFn = RISCV64FN(disInstr_RISCV64); | ||
    irsb = bb_to_IR(vta->guest_extents, disInstrFn, ...); | ||
      switch (INSN(1, 0)) | ||
        case 0b11: | ||
          dres->len = inst_size = 4; | ||
          ok = dis_RISCV64_standard(dres, irsb, insn, ...); | ||
    irsb = do_iropt_BB(irsb, specHelper, preciseMemExnsFn, *pxControl, ...); | ||
    irsb = vta->instrument1(vta->callback_opaque, irsb); | ||
    irsb = vta->instrument2(vta->callback_opaque, irsb); | ||
    irsb = cprop_BB(irsb); | ||
  LibVEX_BackEnd(vta, &res, irsb, pxControl); | ||
    switch (vta->arch_host) { | ||
      case VexArchRISCV64: | ||
        iselSB = RISCV64FN(iselSB_RISCV64); | ||
        emit = CAST_TO_TYPEOF(emit)RISCV64FN(emit_RISCV64Instr); | ||
    iselSB | ||
      for (i = 0; i < bb->stmts_used; i++) | ||
        iselStmt(env, bb->stmts[i]); | ||
          switch (stmt->tag) | ||
            case Ist_Store: | ||
              if (tyd == Ity_I64) addInstr(env, RISCV64Instr_Store(RISCV64op_SD, src, addr, 0)); | ||
                RISCV64Instr* i = LibVEX_Alloc_inline(sizeof(RISCV64Instr)); | ||
                i->tag = RISCV64in_Store; | ||
                i->RISCV64in.Store.op = op; | ||
    for (i = 0; i < rcode->arr_used; i++) | ||
      emit | ||
        switch (i->tag) { | ||
          case RISCV64in_MV: | ||
            UInt dst = iregEnc(i->RISCV64in.MV.dst); | ||
            UInt src = iregEnc(i->RISCV64in.MV.src); | ||
            p = emit_CR(p, 0b10, src, dst, 0b1000); | ||
              UShort the_insn = 0; | ||
              the_insn |= opcode << 0; | ||
              the_insn |= rs2 << 2; | ||
              the_insn |= rd << 7; | ||
              the_insn |= funct4 << 12; | ||
              return emit16(p, the_insn); | ||
``` | ||
|
||
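The last step of the trace above packs four fields into a 16-bit compressed instruction. As a rough, standalone illustration (not the VEX code itself), the sketch below reproduces that bit layout for the CR format and uses it to encode `c.mv`. The names `encode_cr`, `encode_c_mv`, and `demo_c_mv` are hypothetical and exist only for this example; the field positions follow the shifts shown in the trace.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not VEX's emit_CR): pack the four fields of a
   RISC-V compressed CR-format instruction into one 16-bit word. */
static uint16_t encode_cr(uint32_t opcode, uint32_t rs2, uint32_t rd,
                          uint32_t funct4)
{
    uint16_t insn = 0;
    insn |= (uint16_t)((opcode & 0x3u) << 0);   /* bits  1:0  */
    insn |= (uint16_t)((rs2 & 0x1fu) << 2);     /* bits  6:2  */
    insn |= (uint16_t)((rd & 0x1fu) << 7);      /* bits 11:7  */
    insn |= (uint16_t)((funct4 & 0xfu) << 12);  /* bits 15:12 */
    return insn;
}

/* c.mv rd, rs2 uses opcode 0b10 and funct4 0b1000, as in the trace. */
static uint16_t encode_c_mv(uint32_t rd, uint32_t rs2)
{
    return encode_cr(0x2u, rs2, rd, 0x8u);
}

/* Example: c.mv a0, a1 (a0 = x10, a1 = x11) encodes to 0x852e. */
void demo_c_mv(void)
{
    assert(encode_c_mv(10, 11) == 0x852e);
}
```

The masks are only there to keep an out-of-range field from corrupting its neighbours; the real backend works on already-validated register encodings.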
## Memcheck Logic | ||
|
||
* Valid-value (V) bits | ||
|
||
> In short, each bit in the system has (conceptually) an associated V bit, | ||
which follows it around everywhere, even inside the CPU. Yes, all the CPU's | ||
registers (integer, floating point, vector and condition registers) have | ||
their own V bit vectors. | ||
|
||
* Valid-address (A) bits | ||
|
||
> all bytes in memory, but not in the CPU, have an associated valid-address (A) | ||
bit. This indicates whether or not the program can legitimately read or write | ||
that location. | ||
|
||
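To make the V-bit idea concrete, here is a minimal sketch of how shadow bits could follow a value through two bitwise operations. It is an illustration under stated assumptions, not memcheck's actual code: the `Shadowed` type and the `shadowed`, `shadow_xor`, `shadow_and`, and `demo_vbits` functions are hypothetical, and the sketch adopts the convention that a set shadow bit means "undefined".

```c
#include <assert.h>
#include <stdint.h>

/* A 64-bit value plus its shadow V bits.
   Convention for this sketch only: a set shadow bit means "undefined". */
typedef struct {
    uint64_t val;
    uint64_t vbits;
} Shadowed;

static Shadowed shadowed(uint64_t val, uint64_t vbits)
{
    Shadowed s = { val, vbits };
    return s;
}

/* XOR: a result bit is undefined iff either input bit is undefined. */
static Shadowed shadow_xor(Shadowed a, Shadowed b)
{
    return shadowed(a.val ^ b.val, a.vbits | b.vbits);
}

/* AND: a defined 0 on either side forces a defined 0 in the result,
   so those positions are cleared from the combined shadow. */
static Shadowed shadow_and(Shadowed a, Shadowed b)
{
    uint64_t defined_zero = (~a.val & ~a.vbits) | (~b.val & ~b.vbits);
    return shadowed(a.val & b.val, (a.vbits | b.vbits) & ~defined_zero);
}

void demo_vbits(void)
{
    Shadowed a = shadowed(0x0F, 0x00); /* fully defined */
    Shadowed b = shadowed(0xFF, 0xF0); /* top nibble undefined */
    assert(shadow_xor(a, b).vbits == 0xF0); /* undefinedness propagates */
    assert(shadow_and(a, b).vbits == 0x00); /* masked by a's defined zeros */
}
```

Every new IR op needs a rule of this kind, which is part of why adding many vector ops is expensive on the memcheck side.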
![instrument](/img/rvv-valgrind/vex.png) | ||
|
||
# RVV Support | ||
|
||
## Adding Ordinary Instructions | ||
|
||
There are generally the following ways to add an instruction to Valgrind, listed in order of preference (earlier is clearly better). | ||
|
||
* existing Iops (IR) | ||
* creating a new Iop | ||
* a clean helper | ||
* a dirty helper | ||
|
||
## Adding RVV Instructions | ||
|
||
The existing IR cannot express RVV, so the next-best option is to add new IR. Helpers are used only as a last resort, because a helper makes instrumentation hard to do. | ||
|
||
### Implementation Challenges | ||
|
||
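* RVV is the first variable-length ISA to be supported in Valgrind, so there is no reference implementation | ||
* Valgrind's IR assumes fixed sizes by default, which does not hold for RVV; adding such new ISAs breaks some of Valgrind's existing assumptions | ||
* RVV introduces a large number of new IR ops, and the memcheck logic for each of them has to be written from scratch. Memcheck's logic for scalar IR is already fairly complex, and it is more complex still for some vector instructions | ||
* RVV has concepts such as LMUL, so the size of a register (group) is variable, which complicates register allocation in the backend | ||
* RVV has many instructions, so the implementation workload is large. The backend currently uses a mechanism that reuses QEMU code as much as possible, which also solves the register allocation problem above, although theoretical performance is somewhat lower | ||
* The community wants a unified IR shared between RVV and Arm SVE, which adds further complexity; I have reservations about whether this is really necessary. There are related discussions on the Valgrind mailing list, and this is currently the main blocker | ||
* Vector load/store: if they are split into scalar operations one element at a time, a long VLEN makes the generated IR too long and breaks Valgrind's existing assumptions, all of which has to be handled; if they are not split, how can the memcheck logic be guaranteed? | ||
* Some implementation details also need gradual improvement. For example, the size of struct VexGuestRISCV64State constrains VLEN. This is not a big problem, but each such issue has to be solved one by one | ||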
|
||
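To give a feel for the vector load/store point, the sketch below computes the maximum element count of one vector instruction using the RVV relation VLMAX = LMUL * VLEN / SEW. It assumes, purely for illustration, that splitting a vector access would produce one scalar operation per element; the function names `vlmax` and `demo_vlmax` are hypothetical, and fractional LMUL is left out to keep the example short.

```c
#include <assert.h>
#include <stdint.h>

/* VLMAX = LMUL * VLEN / SEW for integer LMUL (1, 2, 4, 8).
   Returns 0 for invalid input. Fractional LMUL is not modelled here. */
static uint32_t vlmax(uint32_t vlen_bits, uint32_t sew_bits, uint32_t lmul)
{
    if (sew_bits == 0 || lmul == 0)
        return 0;
    return (vlen_bits / sew_bits) * lmul;
}

void demo_vlmax(void)
{
    /* One byte-element instruction with LMUL=8: the element count, and
       hence the number of split scalar operations, scales with VLEN. */
    assert(vlmax(128, 8, 8) == 128);
    assert(vlmax(1024, 8, 8) == 1024);
    /* 64-bit elements with LMUL=1 on a 128-bit machine: only 2 elements. */
    assert(vlmax(128, 64, 1) == 2);
}
```

Under that assumption, a single byte-element access at VLEN=1024 and LMUL=8 would expand into on the order of a thousand scalar accesses, which is the kind of IR growth described above.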
# References | ||
|
||
* [Valgrind on RISC-V](https://archive.fosdem.org/2022/schedule/event/valgrind_riscv/attachments/slides/4869/export/events/attachments/valgrind_riscv/slides/4869/valgrind_riscv.pdf) | ||
* [Memcheck Manual](https://valgrind.org/docs/manual/mc-manual.html) | ||
* [Valgrind: A Framework for Heavyweight Dynamic Binary Instrumentation](https://valgrind.org/docs/valgrind2007.pdf) |