add rvv valgrind

atwufei · Apr 12, 2024 · 220291b
1 parent fcb1f0d

Showing 4 changed files with 131 additions and 0 deletions.
diff --git a/_posts/2024-04-12-rvv-valgrind.md b/_posts/2024-04-12-rvv-valgrind.md
@@ -0,0 +1,131 @@
+---
+layout: post
+title: A Summary of RISC-V Vector on Valgrind
+author: "Fei Wu"
+header-mask: 0.4
+tags:
+  - Qemu
+  - RISC-V
+---
+
+* content
+{:toc}
+
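+# Current Status
+
+As of 2024/04/12, RVV on Valgrind stands as follows:
+* Two tools are supported: Nulgrind and Memcheck.
+* All RVV instructions are supported except the floating-point and fixed-point ones.
+* It can run applications such as an auto-vectorized CoreMark.
+* Parts of the Memcheck logic for RVV still need improvement.
+* If needed, the complete RVV instruction set could be supported, even though some parts would be imperfect.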
+
+The latest code is at:
+* repo - https://github.com/intel/valgrind-rvv
+* branch - poc-rvv-remove-vl-from-ir
+
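+# Valgrind Background
+
+## How It Works
+
+* Valgrind has its own intermediate representation (IR), known as VEX IR.
+* Both the guest code and the instrumentation code (for example, Memcheck's) are first expressed in IR, and the IR is finally translated into host instructions.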
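
To make this concrete, here is a toy model (not VEX's real data structures) of how a guest instruction becomes IR: `add rd, rs1, rs2` is lowered to flat, temp-based statements (two register reads, an add, a register write), and a small interpreter runs that IR against a guest register file. `ToyStmt`, `lower_add`, and `run_ir` are hypothetical names; in VEX the real counterparts are `IRSB` blocks holding `IRStmt`s such as `Ist_WrTmp` and `Ist_Put`, and the IR is compiled to host code rather than interpreted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy IR, loosely modeled on VEX's flat, temp-based statements.
   None of these names are part of the real VEX API. */
typedef enum { T_GET, T_ADD64, T_PUT } ToyOp;

typedef struct {
    ToyOp op;
    int   dst;  /* destination temp for T_GET / T_ADD64 (unused for T_PUT) */
    int   a;    /* T_GET, T_PUT: guest register; T_ADD64: first temp */
    int   b;    /* T_ADD64: second temp; T_PUT: source temp */
} ToyStmt;

#define TOY_NTEMPS 8

/* Lower the guest instruction "add rd, rs1, rs2" into toy IR.
   Writes 4 statements to out and returns the count. */
static int lower_add(ToyStmt *out, int rd, int rs1, int rs2) {
    assert(rd >= 0 && rd < 32 && rs1 >= 0 && rs1 < 32 && rs2 >= 0 && rs2 < 32);
    out[0] = (ToyStmt){ T_GET,   0, rs1, 0 };  /* t0 = GET(rs1)     */
    out[1] = (ToyStmt){ T_GET,   1, rs2, 0 };  /* t1 = GET(rs2)     */
    out[2] = (ToyStmt){ T_ADD64, 2, 0,   1 };  /* t2 = Add64(t0,t1) */
    out[3] = (ToyStmt){ T_PUT,  -1, rd,  2 };  /* PUT(rd) = t2      */
    return 4;
}

/* Execute toy IR against a guest register file; x0 stays zero as on RISC-V. */
static void run_ir(const ToyStmt *s, int n, uint64_t regs[32]) {
    uint64_t tmp[TOY_NTEMPS] = {0};
    for (int i = 0; i < n; i++) {
        switch (s[i].op) {
        case T_GET:   tmp[s[i].dst] = regs[s[i].a];                break;
        case T_ADD64: tmp[s[i].dst] = tmp[s[i].a] + tmp[s[i].b];   break;
        case T_PUT:   if (s[i].a != 0) regs[s[i].a] = tmp[s[i].b]; break;
        }
    }
}
```

For example, with `x11 = 40` and `x12 = 2`, lowering `add x10, x11, x12` yields four statements, and running them leaves 42 in `x10`.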
+
+![valgrind flow](/img/rvv-valgrind/valgrind-flow.png)
+
+The translation goes through the following steps (figure from [1]):
+![vex](/img/rvv-valgrind/vex.png)
+
+## Code Walkthrough
+
+```
+LibVEX_Translate
+    irsb = LibVEX_FrontEnd(vta, &res, &pxControl);
+    disInstrFn = RISCV64FN(disInstr_RISCV64);
+    irsb = bb_to_IR(vta->guest_extents, disInstrFn, ...);
+        // for each guest instruction, disInstrFn decodes it into IR:
+        switch (INSN(1, 0))
+            case 0b11:
+                dres->len = inst_size = 4;
+                ok = dis_RISCV64_standard(dres, irsb, insn, ...);
+    irsb = do_iropt_BB(irsb, specHelper, preciseMemExnsFn, *pxControl, ...);
+    irsb = vta->instrument1(vta->callback_opaque, irsb, ...);
+    irsb = vta->instrument2(vta->callback_opaque, irsb, ...);
+    irsb = cprop_BB(irsb);
+    libvex_BackEnd(vta, &res, irsb, pxControl);
+        switch (vta->arch_host) {
+            case VexArchRISCV64:
+                iselSB = RISCV64FN(iselSB_RISCV64);
+                emit = CAST_TO_TYPEOF(emit) RISCV64FN(emit_RISCV64Instr);
+
+        iselSB
+            for (i = 0; i < bb->stmts_used; i++)
+                iselStmt(env, bb->stmts[i]);
+                    switch (stmt->tag)
+                        case Ist_Store:
+                            if (tyd == Ity_I64) addInstr(env, RISCV64Instr_Store(RISCV64op_SD, src, addr, 0));
+                                RISCV64Instr* i = LibVEX_Alloc_inline(sizeof(RISCV64Instr));
+                                i->tag = RISCV64in_Store;
+                                i->RISCV64in.Store.op = op;
+        for (i = 0; i < rcode->arr_used; i++)
+            emit
+                switch (i->tag) {
+                    case RISCV64in_MV:
+                        UInt dst = iregEnc(i->RISCV64in.MV.dst);
+                        UInt src = iregEnc(i->RISCV64in.MV.src);
+                        p = emit_CR(p, 0b10, src, dst, 0b1000);
+                            UShort the_insn = 0;
+                            the_insn |= opcode << 0;
+                            the_insn |= rs2 << 2;
+                            the_insn |= rd << 7;
+                            the_insn |= funct4 << 12;
+                            return emit16(p, the_insn);
+```
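
The last step of the trace, `emit_CR`, is plain bit packing, so it can be checked by hand. The sketch below re-creates it as standalone C, with Valgrind's `vassert` replaced by `assert`, the binary literals written in hex, and `emit16` assumed to store the halfword little-endian; `encode_c_mv` is a hypothetical wrapper added for testing. With rd = x10 (a0) and rs2 = x11 (a1), `c.mv a0, a1` should encode as `0x852E`.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t  UChar;
typedef uint16_t UShort;
typedef uint32_t UInt;

/* Write a 16-bit instruction in little-endian byte order. */
static UChar* emit16(UChar* p, UShort val) {
    *p++ = (val >> 0) & 0xff;
    *p++ = (val >> 8) & 0xff;
    return p;
}

/* Compressed CR format: funct4[15:12] | rd/rs1[11:7] | rs2[6:2] | op[1:0]. */
static UChar* emit_CR(UChar* p, UInt opcode, UInt rs2, UInt rd, UInt funct4) {
    assert(opcode >> 2 == 0);
    assert(rs2 >> 5 == 0);
    assert(rd >> 5 == 0);
    assert(funct4 >> 4 == 0);

    UShort the_insn = 0;
    the_insn |= opcode << 0;
    the_insn |= rs2 << 2;
    the_insn |= rd << 7;
    the_insn |= funct4 << 12;
    return emit16(p, the_insn);
}

/* Hypothetical test wrapper: encode "c.mv rd, rs2" and read the halfword back. */
static UShort encode_c_mv(UInt rd, UInt rs2) {
    UChar buf[2];
    assert(rs2 != 0);  /* funct4 = 0b1000 with rs2 = 0 would be c.jr */
    UChar* end = emit_CR(buf, 0x2 /* op = 0b10 */, rs2, rd, 0x8 /* funct4 = 0b1000 */);
    assert(end == buf + 2);
    return (UShort)(buf[0] | (buf[1] << 8));
}
```

`encode_c_mv(10, 11)` returns `0x852E`; the `rs2 != 0` check reflects that in the CR format the same funct4 with rs2 = 0 encodes `c.jr`, not `c.mv`.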
+
+## Memcheck Logic
+
+* Valid-value (V) bits
+
+> In short, each bit in the system has (conceptually) an associated V bit,
+which follows it around everywhere, even inside the CPU. Yes, all the CPU's
+registers (integer, floating point, vector and condition registers) have
+their own V bit vectors.
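
As an illustration of V bits following a value through the CPU, here is a model of Memcheck's cheap rule for a 64-bit add, assuming Memcheck's convention that a V bit of 1 means undefined. The operands' undefinedness is merged with a bitwise OR (UifU), then smeared upwards from the lowest undefined bit (Left), because a carry out of an undefined bit can change every higher bit of the sum. The names echo `mkUifU64` and `mkLeft64` in `memcheck/mc_translate.c`, but this sketch computes on concrete shadow values, whereas Memcheck emits IR that does the computation at run time.

```c
#include <assert.h>
#include <stdint.h>

/* V-bit convention (as in Memcheck): 1 = undefined, 0 = defined. */

/* "Undefined if either is undefined": bitwise OR of the shadows. */
static uint64_t uifu64(uint64_t vx, uint64_t vy) {
    return vx | vy;
}

/* Smear undefinedness upwards: v | -v sets every bit at or above
   the lowest set bit of v. */
static uint64_t left64(uint64_t v) {
    return v | (0 - v);
}

/* Shadow (V bits) of "x + y", given the shadows of x and y. */
static uint64_t shadow_add64(uint64_t vx, uint64_t vy) {
    return left64(uifu64(vx, vy));
}
```

If only bit 4 of one operand is undefined, `shadow_add64(0x10, 0)` gives `0xFFFFFFFFFFFFFFF0`: bits 0 to 3 of the sum stay defined and bits 4 to 63 become undefined.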
+
+* Valid-address (A) bits
+
+> all bytes in memory, but not in the CPU, have an associated valid-address (A)
+bit. This indicates whether or not the program can legitimately read or write
+that location.
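
The A-bit rule can be modeled with a byte-granular shadow map: one addressability flag per byte of a tiny, hypothetical address space, and an access of `n` bytes is invalid if any byte it touches is unaddressable. Real Memcheck keeps A and V bits in a compressed multi-level shadow memory, so this sketch only illustrates the rule; `ToyShadow`, `mark_addressable`, and `check_access` are made-up names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MEM_SIZE 64  /* size of the toy address space, in bytes */

/* One A bit per byte: true = the program may access this byte. */
typedef struct {
    bool a[TOY_MEM_SIZE];
} ToyShadow;

static void shadow_init(ToyShadow *s) {
    memset(s->a, 0, sizeof s->a);  /* everything starts unaddressable */
}

/* Mark [addr, addr + len) addressable, e.g. after a malloc(len). */
static void mark_addressable(ToyShadow *s, size_t addr, size_t len) {
    assert(addr <= TOY_MEM_SIZE && len <= TOY_MEM_SIZE - addr);
    for (size_t i = 0; i < len; i++)
        s->a[addr + i] = true;
}

/* True if an n-byte access at addr touches only addressable bytes;
   Memcheck would report an invalid read/write otherwise. */
static bool check_access(const ToyShadow *s, size_t addr, size_t n) {
    if (addr > TOY_MEM_SIZE || n > TOY_MEM_SIZE - addr)
        return false;  /* outside the toy address space */
    for (size_t i = 0; i < n; i++)
        if (!s->a[addr + i])
            return false;
    return true;
}
```

After `mark_addressable(&s, 16, 8)`, an 8-byte access at 16 passes and one at 20 fails. For a unit-stride vector load, the same check would have to cover `vl * SEW / 8` bytes, which is one reason the length of vector accesses matters for Memcheck.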
+
+![instrument](/img/rvv-valgrind/instrument.png)
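
Instrumentation interleaves shadow code with the original IR. The sketch below mimics that on a textual toy IR: each statement is preceded by its V-bit computation, and a store first checks that its address is defined. The notation loosely follows VEX's pretty-printer (`STle(addr) = data`) and reuses the Left/UifU rule for Add64; `check_defined` and `STle_shadow` are hypothetical placeholders for what Memcheck actually emits (a call that reports an error if the address is undefined, and a helper call that writes V bits to shadow memory).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* A two-statement toy IR; names and notation are illustrative only. */
typedef enum { S_ADD64, S_STORE } Kind;

typedef struct {
    Kind kind;
    int  dst, a, b;  /* S_ADD64: t<dst> = Add64(t<a>, t<b>)
                        S_STORE: STle(t<a>) = t<b>  (dst unused) */
} Stmt;

/* Write the instrumented form of n statements into out (capacity cap > 0).
   Each original statement is preceded by its shadow code; shadow temps
   are written with a trailing quote, e.g. t2'. */
static void instrument(const Stmt *s, int n, char *out, size_t cap) {
    size_t len = 0;
    assert(cap > 0);
    out[0] = '\0';
    for (int i = 0; i < n; i++) {
        int w = 0;
        switch (s[i].kind) {
        case S_ADD64:
            w = snprintf(out + len, cap - len,
                         "t%d' = Left64(UifU64(t%d',t%d'))\n"
                         "t%d = Add64(t%d,t%d)\n",
                         s[i].dst, s[i].a, s[i].b,
                         s[i].dst, s[i].a, s[i].b);
            break;
        case S_STORE:
            /* complain if the address is undefined, then store the data's
               V bits to shadow memory before the real store */
            w = snprintf(out + len, cap - len,
                         "check_defined(t%d')\n"
                         "STle_shadow(t%d) = t%d'\n"
                         "STle(t%d) = t%d\n",
                         s[i].a, s[i].a, s[i].b, s[i].a, s[i].b);
            break;
        }
        assert(w >= 0 && (size_t)w < cap - len);  /* no truncation */
        len += (size_t)w;
    }
}
```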
+
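+# RVV Support
+
+## Adding Ordinary Instructions
+
+There are generally four ways to add an instruction to Valgrind, listed from most to least preferable:
+
+* using existing IROps
+* creating a new IROp
+* calling a clean helper
+* calling a dirty helper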
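
One reason the earlier options are preferable is Memcheck precision. With an existing IROp, Memcheck can apply an op-specific rule; behind a clean helper it only sees an opaque call and by default treats the result pessimistically: any undefined input bit makes the whole result undefined. The sketch contrasts the two for a 64-bit AND, assuming Memcheck's rule for `Iop_And64` (UifU of the shadows, improved by bits that are a defined 0 in either operand, since such a bit forces the result bit to 0). The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* V-bit convention (as in Memcheck): 1 = undefined, 0 = defined. */

/* x & y expressed with an IROp: op-specific rule. */
static uint64_t shadow_and64_irop(uint64_t x, uint64_t vx,
                                  uint64_t y, uint64_t vy) {
    uint64_t naive   = vx | vy;               /* UifU: undefined if either is */
    uint64_t improve = (x | vx) & (y | vy);   /* 0 where an operand is a defined 0 */
    return naive & improve;
}

/* Same operation hidden behind a clean helper: only "is any input bit
   undefined?" is known, so the result is all-defined or all-undefined. */
static uint64_t shadow_helper64(uint64_t vx, uint64_t vy) {
    return (vx | vy) != 0 ? UINT64_MAX : 0;
}
```

With `x` a fully defined 0 and `y` completely undefined, the IROp rule reports the result as fully defined (it must be 0), while the helper rule reports it as fully undefined, which can lead to false positives downstream.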
+
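+## Adding RVV Instructions
+
+The existing IR cannot express RVV, so the fallback is to add new IR. Helpers are used only as a last resort: a helper call is opaque to the tool, which makes instrumentation hard to do well.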
+
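+### Implementation Challenges
+
+* RVV is the first variable-length vector ISA supported in Valgrind, so there is no reference implementation to follow.
+* Valgrind's IR assumes fixed sizes everywhere, but RVV's sizes are not fixed; adding this ISA breaks some of Valgrind's existing assumptions.
+* RVV introduces a large number of new IR ops, and the Memcheck logic for each of them has to be written. Memcheck's handling of scalar IR is already fairly complex, and some vector instructions are even more so.
+* RVV has concepts such as LMUL, so the size of a register (group) is variable, which complicates register allocation in the backend.
+* RVV has many instructions, so implementing them is a lot of work. The backend currently uses a mechanism that reuses QEMU code as much as possible, which also sidesteps the register-allocation problem above, at the cost of some theoretical performance.
+* The community wants a unified IR shared by RVV and Arm SVE, which adds further complexity; I have reservations about whether this is necessary. There is related discussion on the Valgrind mailing list, and this is currently the main blocker.
+* Vector loads/stores: splitting them into individual scalar operations makes the generated IR too long when VLEN is large, which breaks Valgrind's existing assumptions and has to be handled; not splitting them raises the question of how to keep the Memcheck logic correct.
+* Some implementation details also need gradual improvement. For example, the size of struct VexGuestRISCV64State limits the maximum VLEN. None of these is a big problem, but each has to be solved one by one.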
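
Several of the points above come down to arithmetic from the RVV specification: a register group spans LMUL × VLEN / 8 bytes, and VLMAX = LMUL × VLEN / SEW elements. The sketch below (LMUL as a numerator/denominator pair so fractional values work; all function names hypothetical, not from valgrind-rvv) shows how widely these vary, how many element accesses a naive scalar split of one load could produce, and how large a guest-state struct gets if it embeds all 32 vector registers inline (an assumption about the layout).

```c
#include <assert.h>
#include <stdint.h>

/* LMUL as a fraction: 1/8, 1/4, 1/2, 1, 2, 4, 8. */
typedef struct { uint32_t num, den; } Lmul;

/* Bytes covered by one register group (for fractional LMUL, the used
   part of a single register). */
static uint32_t group_bytes(uint32_t vlen, Lmul lmul) {
    assert(vlen % 8 == 0 && lmul.den != 0);
    return vlen / 8 * lmul.num / lmul.den;
}

/* VLMAX = LMUL * VLEN / SEW: the maximum element count per instruction,
   i.e. how many scalar accesses a naive split of one load would need. */
static uint32_t vlmax(uint32_t vlen, uint32_t sew, Lmul lmul) {
    assert(sew == 8 || sew == 16 || sew == 32 || sew == 64);
    assert(lmul.den != 0);
    return vlen * lmul.num / (sew * lmul.den);
}

/* Bytes needed to hold all 32 vector registers at a given VLEN. */
static uint32_t vreg_file_bytes(uint32_t vlen) {
    return 32u * (vlen / 8);
}
```

At VLEN = 128, a `vle32.v` with LMUL = 8 handles at most 32 elements, but at the specification's maximum VLEN of 65536 an SEW = 8, LMUL = 8 load can touch 65536 elements, and 32 registers need 256 KiB of guest state.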
+
+# References
+
+1. [Valgrind on RISC-V](https://archive.fosdem.org/2022/schedule/event/valgrind_riscv/attachments/slides/4869/export/events/attachments/valgrind_riscv/slides/4869/valgrind_riscv.pdf)
+2. [Memcheck Manual](https://valgrind.org/docs/manual/mc-manual.html)
+3. [Valgrind: A Framework for Heavyweight Dynamic Binary Instrumentation](https://valgrind.org/docs/valgrind2007.pdf)
diff --git a/img/rvv-valgrind/instrument.png b/img/rvv-valgrind/instrument.png
diff --git a/img/rvv-valgrind/valgrind-flow.png b/img/rvv-valgrind/valgrind-flow.png
diff --git a/img/rvv-valgrind/vex.png b/img/rvv-valgrind/vex.png