+1585 -36 lines changed
@@ -54,3 +54,5 @@ autoconf/autom4te.cache
 .vs
 # clangd index
 .clangd
+# patch reject
+*.rej
@@ -1,18 +1,12 @@
-The LLVM Compiler Infrastructure
-================================
-
-This directory and its subdirectories contain source code for LLVM,
-a toolkit for the construction of highly optimized compilers,
-optimizers, and runtime environments.
-
-LLVM is open source software. You may freely distribute it under the terms of
-the license agreement found in LICENSE.txt.
-
-Please see the documentation provided in docs/ for further
-assistance with LLVM, and in particular docs/GettingStarted.rst for getting
-started with LLVM and docs/README.txt for an overview of LLVM's
-documentation setup.
-
-If you are writing a package for LLVM, see docs/Packaging.rst for our
-suggestions.
-
+This repository contains a fork of LLVM (see https://llvm.org) with patches
+towards supporting the RISC-V vector extension (see
+https://github.com/riscv/riscv-v-spec/). This is very much a work in progress.
+
+See `docs/RISCVVectorCodegen.md` for some documentation.
+
+These patches are regularly rebased to track LLVM trunk, which means commit
+hashes will change. To mitigate the annoyance that causes, development always
+happens on a branch named `rvv-<date>` where `<date>` is the date of the last
+rebase, and on the next rebase this branch is left alone and a new branch is
+created. The downside of this is that you will have to check the list of
+branches to find the latest version.
@@ -0,0 +1,69 @@
+Code Generation for the RISC-V Vector Extension
+===============================================
+
+The vector processing capabilities added by the vector extension are very flexible, but this flexibility introduces some unique challenges in generating correct code for it, let alone optimized code. This document gives a bird's eye view of how vector values and operations are handled throughout the RISCV backend to solve these challenges.
+
+## Context
+
+The LLVM IR we receive as input explicitly operates on vectors of unknown length (e.g., `<scalable 1 x i32>`). The number of elements in an IR vector can be a compile-time integer multiple of the unknown `vscale` (e.g. `<scalable 4 x i32>`), but only a multiple of 1 is legal in the RISCV backend. The unknown factor of the vector element count (`vscale`) corresponds directly to the RVV *Maximum Vector Length* (MVL).
+
+The IR occasionally uses RISCV-specific intrinsics that take and return the *active vector length* as an integer value, as in this fragment:
+
+```
+%vl = call i32 @llvm.riscv.setvl(i32 %n)
+%a = call <scalable 1 x i32> @llvm.riscv.vlw(i32* %p, i32 %vl)
+```
+
+However, there are also target-independent operations such as `add <scalable 1 x i32>` that have no such concept.
+
+## Instruction Selection
+
+Instruction selection ignores the vector unit configuration entirely: it operates as if we had a full hard-wired set of regular vector registers with some unknown-but-fixed number of elements. This is done by using pseudo-instructions with suffix `_ImpConf` (*imp*licit *conf*iguration).
+
+In general these instructions would omit an important dependency and therefore open us up to miscompilations, but during instruction selection it's fine because we select no instructions that *change* the configuration (that comes later).
+
+Another concern is the active vector length, which is an operand to most vector instructions we want to select. For IR instructions such as `add <scalable 1 x i32>` that have no VL operand in IR, we use MVL for that operation (and hope to optimize it later). For RISCV-specific intrinsics that handle the active vector length, we just use the integer passed to that intrinsic.
+
+## Vector length register
+
+We represent the VL CSR as an ordinary physical register that can hold an XLEN-sized integer (`XLenVT`). There is a register class, `VLR`, that contains only VL. Most vector instructions have `VLR` inputs, and `setvl` has a `VLR` output (along with the GPR result).
+
+This register class is naturally not the preferred one for selecting integer code, but copies between it and GPRs are possible via CSR reads and writes.
+
+> Note: Still need to decide on a solution for VLR virtregs with overlapping live ranges. As-is they cause crashes during register allocation.
+
+## Vector unit configuration CSR
+
+We represent the vector unit configuration CSR as an unallocatable, permanently reserved physical register.
+It is implicitly used by most vector instructions, and instructions that change the configuration define this register implicitly.
+There are no corresponding virtual registers; it's a physical register operand from the very start.
+
+## Selecting the active vector length
+
+> Note: This part is not implemented yet.
+
+RISC-V vector instructions are generally controlled by the VL register and only process the lanes up to VL and zero the remaining elements of the destination. This does not match LLVM IR semantics, where many vector operations operate on the full vector and predication is often applied as a separate operation afterwards.
+
+For correctness, the RISCV backend must therefore be prepared to generate code that sets VL to MVL, i.e., operates on full vectors. This is particularly relevant for register copies and spill/fill code, but may also sometimes be required for arithmetic operations (though it shouldn't be required for IR produced by a V-aware vectorizer).
+
+Nevertheless, we really want to use VL in the natural way. Thus we perform a "demanded elements" analysis which symbolically computes how many elements of each vector value are actually needed, so that operations can run under the active VL rather than MVL wherever that is provably safe.
+
+This is only correct for side-effect-free operations, but operations with side effects need to be "inherently predicated" in the IR already (e.g., consider `@llvm.masked.load` versus a regular vector load).
+
+## Deciding the configuration
+
+The configuration is decided before register allocation, because register allocation needs complete knowledge of the physical register file.
+Integrating these decisions into register allocation would be even better, but it's not clear how to do that (and it would likely be a huge project in any case).
+By looking at the live intervals, the proportions of data type widths among them, etc., we can at least try to heuristically find a good fit.
+
+In the future we should try to separate the function body into several regions between which we can safely change the configuration (at minimum this means no vector values live between them, not even in memory).
+
+## Scalar operations in vector registers
+
+Details TBD; see https://lists.llvm.org/pipermail/llvm-dev/2018-October/126733.html for some thoughts on this.
+
+## Emitting configuration instructions
+
+Currently the vector unit is configured in the prologue and disabled in the epilogue.
+
+In the future, especially once we start to support more than one configuration per function, we may insert configuration changes earlier, and also place them closer to the code that actually uses the vector unit.
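
A small IR sketch of the distinction drawn in the "Context" and "Instruction Selection" sections above (illustrative only, not part of the new file; it reuses the intrinsic signatures and the `<scalable ...>` syntax from the fragment in that document, and the value names are made up). The intrinsic calls carry the active vector length explicitly, while the plain `add` does not, so the backend has to assume MVL for it:

```
%vl  = call i32 @llvm.riscv.setvl(i32 %n)
%a   = call <scalable 1 x i32> @llvm.riscv.vlw(i32* %p, i32 %vl)
%b   = call <scalable 1 x i32> @llvm.riscv.vlw(i32* %q, i32 %vl)
; VL is an explicit operand of the RISCV-specific intrinsic
%sum = call <scalable 1 x i32> @llvm.riscv.vadd(<scalable 1 x i32> %a, <scalable 1 x i32> %b, i32 %vl)
; target-independent form: no VL operand, so it is selected with VL = MVL
%alt = add <scalable 1 x i32> %a, %b
```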
@@ -2032,6 +2032,8 @@ class ShuffleVectorInst : public Instruction {
   static void getShuffleMask(const Constant *Mask,
                              SmallVectorImpl<int> &Result);
 
+  static bool getShuffleMask(Value *Mask, SmallVectorImpl<int> &Result);
+
   /// Return the mask for this instruction as a vector of integers. Undefined
   /// elements of the mask are returned as -1.
   void getShuffleMask(SmallVectorImpl<int> &Result) const {
@@ -285,6 +285,8 @@ def llvm_v2f64_ty : LLVMType<v2f64>; // 2 x double
 def llvm_v4f64_ty : LLVMType<v4f64>; // 4 x double
 def llvm_v8f64_ty : LLVMType<v8f64>; // 8 x double
 
+def llvm_nxv1i32_ty : LLVMType<nxv1i32>; // scalable 1 x i32
+
 def llvm_vararg_ty : LLVMType<isVoid>; // this means vararg here
 
 //===----------------------------------------------------------------------===//
@@ -1230,6 +1232,8 @@ let IntrProperties = [IntrNoMem, IntrWillReturn] in {
                                                        [llvm_anyvector_ty]>;
   def int_experimental_vector_reduce_fmin : Intrinsic<[LLVMVectorElementType<0>],
                                                        [llvm_anyvector_ty]>;
+  def int_experimental_vector_splatvector : Intrinsic<[LLVMVectorElementType<0>],
+                                                       [llvm_anyvector_ty]>;
 }
 
 //===---------- Intrinsics to control hardware supported loops ----------===//
@@ -65,4 +65,51 @@ def int_riscv_masked_cmpxchg_i64
                       llvm_i64_ty, llvm_i64_ty],
                      [IntrArgMemOnly, NoCapture<0>, ImmArg<4>]>;
 
+//===----------------------------------------------------------------------===//
+// Vector extension
+
+def int_riscv_setvl : Intrinsic<[llvm_i32_ty], [llvm_i32_ty], [IntrNoMem]>;
+
+def int_riscv_vadd : Intrinsic<[llvm_nxv1i32_ty],
+                               [llvm_nxv1i32_ty, llvm_nxv1i32_ty, llvm_i32_ty],
+                               [IntrNoMem]>;
+
+def int_riscv_vsub : Intrinsic<[llvm_nxv1i32_ty],
+                               [llvm_nxv1i32_ty, llvm_nxv1i32_ty, llvm_i32_ty],
+                               [IntrNoMem]>;
+
+
+def int_riscv_vmul : Intrinsic<[llvm_nxv1i32_ty],
+                               [llvm_nxv1i32_ty, llvm_nxv1i32_ty, llvm_i32_ty],
+                               [IntrNoMem]>;
+
+
+def int_riscv_vand : Intrinsic<[llvm_nxv1i32_ty],
+                               [llvm_nxv1i32_ty, llvm_nxv1i32_ty, llvm_i32_ty],
+                               [IntrNoMem]>;
+
+
+def int_riscv_vor : Intrinsic<[llvm_nxv1i32_ty],
+                              [llvm_nxv1i32_ty, llvm_nxv1i32_ty, llvm_i32_ty],
+                              [IntrNoMem]>;
+
+def int_riscv_vxor : Intrinsic<[llvm_nxv1i32_ty],
+                               [llvm_nxv1i32_ty, llvm_nxv1i32_ty, llvm_i32_ty],
+                               [IntrNoMem]>;
+
+def int_riscv_vlw : Intrinsic<[llvm_nxv1i32_ty],
+                              [llvm_ptr32_ty, llvm_i32_ty],
+                              [IntrReadMem]>;
+def int_riscv_vsw : Intrinsic<[],
+                              [llvm_ptr32_ty, llvm_nxv1i32_ty, llvm_i32_ty],
+                              [IntrWriteMem]>;
+
+def int_riscv_vmpopcnt : Intrinsic<[llvm_i32_ty],
+                                   [llvm_nxv1i32_ty, llvm_i32_ty],
+                                   [IntrNoMem]>;
+
+def int_riscv_vmfirst : Intrinsic<[llvm_i32_ty],
+                                  [llvm_nxv1i32_ty, llvm_i32_ty],
+                                  [IntrNoMem]>;
+
 } // TargetPrefix = "riscv"
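
As a usage sketch for the intrinsics declared above (hypothetical and not part of the patch: the function name, the strip-mined loop structure, and the assumption that `@llvm.riscv.setvl` returns a non-zero count while elements remain follow the fragment in the codegen document rather than anything defined here):

```
define void @vadd32(i32* %a, i32* %b, i32* %c, i32 %n) {
entry:
  br label %loop

loop:
  %i = phi i32 [ 0, %entry ], [ %i.next, %loop ]
  %rem = sub i32 %n, %i
  ; ask for the remaining element count; the returned VL is what gets processed
  %vl = call i32 @llvm.riscv.setvl(i32 %rem)
  %pa = getelementptr i32, i32* %a, i32 %i
  %pb = getelementptr i32, i32* %b, i32 %i
  %pc = getelementptr i32, i32* %c, i32 %i
  %va = call <scalable 1 x i32> @llvm.riscv.vlw(i32* %pa, i32 %vl)
  %vb = call <scalable 1 x i32> @llvm.riscv.vlw(i32* %pb, i32 %vl)
  %vc = call <scalable 1 x i32> @llvm.riscv.vadd(<scalable 1 x i32> %va, <scalable 1 x i32> %vb, i32 %vl)
  call void @llvm.riscv.vsw(i32* %pc, <scalable 1 x i32> %vc, i32 %vl)
  %i.next = add i32 %i, %vl
  %done = icmp uge i32 %i.next, %n
  br i1 %done, label %exit, label %loop

exit:
  ret void
}

declare i32 @llvm.riscv.setvl(i32)
declare <scalable 1 x i32> @llvm.riscv.vlw(i32*, i32)
declare <scalable 1 x i32> @llvm.riscv.vadd(<scalable 1 x i32>, <scalable 1 x i32>, i32)
declare void @llvm.riscv.vsw(i32*, <scalable 1 x i32>, i32)
```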
@@ -1545,6 +1545,17 @@ SDValue SelectionDAGBuilder::getValueImpl(const Value *V) {
         Op = DAG.getConstantFP(0, getCurSDLoc(), EltVT);
       else
         Op = DAG.getConstant(0, getCurSDLoc(), EltVT);
+
+      if (VT.isScalableVector()) {
+        auto INum = DAG.getConstant(Intrinsic::experimental_vector_splatvector,
+                                    getCurSDLoc(), MVT::i32);
+
+        auto Splat = DAG.getNode(ISD::INTRINSIC_WO_CHAIN, getCurSDLoc(), VT,
+                                 INum, Op);
+
+        return Splat;
+      }
+
       Ops.assign(NumElements, Op);
     }
 
@@ -3538,17 +3549,51 @@ void SelectionDAGBuilder::visitExtractElement(const User &I) {
 void SelectionDAGBuilder::visitShuffleVector(const User &I) {
   SDValue Src1 = getValue(I.getOperand(0));
   SDValue Src2 = getValue(I.getOperand(1));
+  Value *MaskV = I.getOperand(2);
   SDLoc DL = getCurSDLoc();
 
-  SmallVector<int, 8> Mask;
-  ShuffleVectorInst::getShuffleMask(cast<Constant>(I.getOperand(2)), Mask);
-  unsigned MaskNumElts = Mask.size();
-
   const TargetLowering &TLI = DAG.getTargetLoweringInfo();
   EVT VT = TLI.getValueType(DAG.getDataLayout(), I.getType());
+  bool IsScalable = VT.isScalableVector();
   EVT SrcVT = Src1.getValueType();
   unsigned SrcNumElts = SrcVT.getVectorNumElements();
 
+  SmallVector<int, 8> Mask;
+  if (!ShuffleVectorInst::getShuffleMask(MaskV, Mask)) {
+    SDValue Mask = getValue(I.getOperand(2));
+    unsigned NumElts = VT.getVectorNumElements();
+    // We don't currently support variable shuffles on fixed-length vectors
+    assert(IsScalable && "Non-constant shuffle mask on fixed-length vector");
+
+    // We haven't introduced a vector_shuffle_var intrinsic to support shuffles
+    // where we need to extract or merge vectors.
+    if (NumElts != SrcNumElts)
+      llvm_unreachable("Haven't implemented VECTOR_SHUFFLE_VAR intrinsic yet");
+
+    // Currently only handling splats of a single value for scalable vectors
+    if (auto *CMask = dyn_cast<Constant>(MaskV))
+      if (CMask->isNullValue()) {
+        // Splat of first element.
+        auto FirstElt = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL,
+                                    SrcVT.getScalarType(), Src1,
+                                    DAG.getConstant(0, DL,
+                                        TLI.getVectorIdxTy(DAG.getDataLayout())));
+
+        auto INum = DAG.getConstant(Intrinsic::experimental_vector_splatvector,
+                                    DL, MVT::i32);
+
+        auto Splat = DAG.getNode(ISD::INTRINSIC_WO_CHAIN, DL, VT,
+                                 INum, FirstElt);
+
+        setValue(&I, Splat);
+        return;
+      }
+    llvm_unreachable("Haven't implemented VECTOR_SHUFFLE_VAR intrinsic yet");
+    return;
+  }
+
+  unsigned MaskNumElts = Mask.size();
+
   if (SrcNumElts == MaskNumElts) {
     setValue(&I, DAG.getVectorShuffle(VT, DL, Src1, Src2, Mask));
     return;
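
For reference, the scalable case that the new `visitShuffleVector` path accepts is the usual splat idiom: a `shufflevector` whose constant mask is `zeroinitializer`. A sketch (not from the patch; the exact spelling of the scalable mask type is an assumption):

```
; splat %x across a scalable vector: insert into lane 0, then shuffle with an
; all-zero mask so every lane reads element 0 of the first source
%ins   = insertelement <scalable 1 x i32> undef, i32 %x, i32 0
%splat = shufflevector <scalable 1 x i32> %ins, <scalable 1 x i32> undef,
                       <scalable 1 x i32> zeroinitializer
```

The code above lowers this by extracting element 0 of the first source and wrapping it in an `experimental_vector_splatvector` node; any other scalable shuffle still falls into the `llvm_unreachable` paths.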
@@ -797,8 +797,9 @@ Constant *llvm::ConstantFoldExtractElementInstruction(Constant *Val,
 
   if (ConstantInt *CIdx = dyn_cast<ConstantInt>(Idx)) {
     // ee({w,x,y,z}, wrong_value) -> undef
-    if (CIdx->uge(Val->getType()->getVectorNumElements()))
-      return UndefValue::get(Val->getType()->getVectorElementType());
+    if (!Val->getType()->getVectorIsScalable())
+      if (CIdx->uge(Val->getType()->getVectorNumElements()))
+        return UndefValue::get(Val->getType()->getVectorElementType());
     return Val->getAggregateElement(CIdx->getZExtValue());
   }
   return nullptr;
@@ -810,6 +811,10 @@ Constant *llvm::ConstantFoldInsertElementInstruction(Constant *Val,
   if (isa<UndefValue>(Idx))
     return UndefValue::get(Val->getType());
 
+  // Everything after this point assumes you can iterate across Val.
+  if (Val->getType()->getVectorIsScalable())
+    return nullptr;
+
   ConstantInt *CIdx = dyn_cast<ConstantInt>(Idx);
   if (!CIdx) return nullptr;
 
@@ -837,8 +842,9 @@ Constant *llvm::ConstantFoldInsertElementInstruction(Constant *Val,
 Constant *llvm::ConstantFoldShuffleVectorInstruction(Constant *V1,
                                                      Constant *V2,
                                                      Constant *Mask) {
-  unsigned MaskNumElts = Mask->getType()->getVectorNumElements();
-  Type *EltTy = V1->getType()->getVectorElementType();
+  auto *MaskTy = cast<VectorType>(Mask->getType());
+  auto MaskNumElts = MaskTy->getElementCount();
+  Type *EltTy = V1->getType()->getVectorElementType();
 
   // Undefined shuffle mask -> undefined value.
   if (isa<UndefValue>(Mask))
@@ -847,11 +853,23 @@ Constant *llvm::ConstantFoldShuffleVectorInstruction(Constant *V1,
   // Don't break the bitcode reader hack.
   if (isa<ConstantExpr>(Mask)) return nullptr;
 
+  if (MaskTy->isScalable()) {
+    // Is splat?
+    if (Mask->isNullValue()) {
+      Constant *Zero = Constant::getNullValue(MaskTy->getElementType());
+      Constant *SplatVal = ConstantFoldExtractElementInstruction(V1, Zero);
+      // Is splat of zero?
+      if (SplatVal && SplatVal->isNullValue())
+        return Constant::getNullValue(VectorType::get(EltTy, MaskNumElts));
+    }
+    return nullptr;
+  }
+
   unsigned SrcNumElts = V1->getType()->getVectorNumElements();
 
   // Loop over the shuffle mask, evaluating each element.
   SmallVector<Constant*, 32> Result;
-  for (unsigned i = 0; i != MaskNumElts; ++i) {
+  for (unsigned i = 0; i != MaskNumElts.Min; ++i) {
     int Elt = ShuffleVectorInst::getMaskValue(Mask, i);
     if (Elt == -1) {
       Result.push_back(UndefValue::get(EltTy));
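
A sketch of the one scalable shape the new folding handles (illustrative, not part of the patch): a constant shuffle whose mask is `zeroinitializer` is a splat of element 0 of the first operand, so when that element folds to zero the whole expression folds to `zeroinitializer`; every other scalable case now returns `nullptr` (no folding) instead of iterating over an unknown number of elements.

```
; splat of element 0 of an all-zero vector -> folds to zeroinitializer
shufflevector (<scalable 1 x i32> zeroinitializer,
               <scalable 1 x i32> undef,
               <scalable 1 x i32> zeroinitializer)
```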
@@ -2166,8 +2166,9 @@ Constant *ConstantExpr::getShuffleVector(Constant *V1, Constant *V2,
     return FC; // Fold a few common cases.
 
   unsigned NElts = Mask->getType()->getVectorNumElements();
+  bool Scalable = Mask->getType()->getVectorIsScalable();
   Type *EltTy = V1->getType()->getVectorElementType();
-  Type *ShufTy = VectorType::get(EltTy, NElts);
+  Type *ShufTy = VectorType::get(EltTy, NElts, Scalable);
 
   if (OnlyIfReducedTy == ShufTy)
     return nullptr;
@@ -149,7 +149,7 @@ class ShuffleVectorConstantExpr : public ConstantExpr {
   ShuffleVectorConstantExpr(Constant *C1, Constant *C2, Constant *C3)
     : ConstantExpr(VectorType::get(
                        cast<VectorType>(C1->getType())->getElementType(),
-                       cast<VectorType>(C3->getType())->getNumElements()),
+                       cast<VectorType>(C3->getType())->getElementCount()),
                    Instruction::ShuffleVector,
                    &Op<0>(), 3) {
     Op<0>() = C1;