dart-sdk/runtime/vm/instructions_arm.h
Ryan Macnak 04ba20aa98 [vm] Support RISC-V.
Implements a backend targeting RV32GC and RV64GC, based on Linux standardizing around GC. The assembler is written to make it easy to disable usage of C, but because the sizes of some instruction sequences are compile-time constants, an additional build configuration would need to be defined to make use of it.

The assembler and disassembler cover every RV32/64GC instruction. The simulator covers all instructions except accessing CSRs and the floating point state accessible through such, including accrued exceptions and dynamic rounding mode.

Quirks:
  - RISC-V is a compare-and-branch architecture, but some existing "architecture-independent" parts of the Dart compiler assume a condition code architecture. To avoid rewriting these parts, we use a peephole in the assembler to map to compare-and-branch. See Assembler::BranchIf. Luckily nothing depended on taking multiple branches on the same condition code set.
  - There are no hardware overflow checks, so we must use Hacker's Delight style software checks. Often these are very cheap: if the sign of one operand is known, a single branch is needed.
  - The ranges of RISC-V branches and jumps are such that we use 3 levels of generation for forward branches, instead of the 2 levels of near and far branches used on ARM[64]. Nearly all code is handled by the first two levels with 20 bits of range, with enormous regex matchers triggering the third level that uses auipc+jalr to get 32 bits of range.
  - For PC-relative calls in AOT, we always generate auipc+jalr pairs with 32 bits of range, so we never generate trampolines.
  - Only a subset of registers is available in some compressed instructions, so we assign the most popular uses to these registers: THR, TMP[2], CODE and PP. This has the effect of assigning CODE and PP to volatile registers in the C calling convention, whereas they are assigned preserved registers on the other architectures. As on ARM64, PP is untagged; this is so short indices can be accessed with a compressed instruction.
  - There are no push or pop instructions, so combining pushes and pops is preferred so we can update SP once.
  - The C calling convention has a strongly aligned stack, but unlike on ARM64 we don't need to use an alternate stack pointer. The author ensured language was added to the RISC-V psABI making the OS responsible for realigning the stack pointer for signal handlers, allowing Dart to leave the stack pointer misaligned from the C calling convention's point of view until a foreign call.
  - We don't bother with the link register tracking done on ARM[64]. Instead we make use of an alternate link register to avoid inline spilling in the write barrier.
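
The software overflow checks mentioned in the quirks above can be sketched in portable C++. This is a minimal illustration of the Hacker's Delight sign-comparison idea, not the code the RISC-V assembler emits; the function names AddOverflows and AddOverflowsKnownNonNegative are hypothetical.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not Dart VM API): computes a + b with wrap-around and
// reports whether the mathematically exact sum does not fit in int64_t.
bool AddOverflows(int64_t a, int64_t b, int64_t* result) {
  // Add as unsigned so the wrap-around is well defined in C++.
  const uint64_t ua = static_cast<uint64_t>(a);
  const uint64_t ub = static_cast<uint64_t>(b);
  const uint64_t sum = ua + ub;
  // Conversion back to signed is modular on two's complement targets.
  *result = static_cast<int64_t>(sum);
  // Overflow happened iff both operands have the same sign and the sign of
  // the sum differs from it: the top bit of (a ^ sum) & (b ^ sum) is set.
  return (((ua ^ sum) & (ub ^ sum)) >> 63) != 0;
}

// Hypothetical helper: same addition when the caller guarantees b >= 0.
// With the sign of one operand known, one comparison (a single branch in
// generated code) is enough: the sum overflowed iff it wrapped below a.
bool AddOverflowsKnownNonNegative(int64_t a, int64_t b, int64_t* result) {
  const uint64_t sum = static_cast<uint64_t>(a) + static_cast<uint64_t>(b);
  *result = static_cast<int64_t>(sum);
  return *result < a;
}
```

The second variant corresponds to the cheap case described above, where knowing the sign of one operand reduces the check to one compare-and-branch.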

Unimplemented:
 - non-trivial FFI cases
 - Compressed pointers - No intention to implement.
 - Unboxed SIMD - We might make use of the V extension registers when the V extension is ratified.
 - BigInt intrinsics

TEST=existing tests for IL level, new tests for assembler/disassembler/simulator
Bug: https://github.com/dart-lang/sdk/issues/38587
Bug: https://github.com/dart-lang/sdk/issues/48164
Change-Id: I991d1df4be5bf55efec5371b767b332d37dfa3e0
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/217289
Reviewed-by: Alexander Markov <alexmarkov@google.com>
Reviewed-by: Daco Harkes <dacoharkes@google.com>
Reviewed-by: Slava Egorov <vegorov@google.com>
Commit-Queue: Ryan Macnak <rmacnak@google.com>
2022-01-20 00:57:57 +00:00

// Copyright (c) 2013, the Dart project authors. Please see the AUTHORS file
// for details. All rights reserved. Use of this source code is governed by a
// BSD-style license that can be found in the LICENSE file.
// Classes that describe assembly patterns as used by inline caches.
#ifndef RUNTIME_VM_INSTRUCTIONS_ARM_H_
#define RUNTIME_VM_INSTRUCTIONS_ARM_H_
#ifndef RUNTIME_VM_INSTRUCTIONS_H_
#error Do not include instructions_arm.h directly; use instructions.h instead.
#endif
#include "vm/allocation.h"
#include "vm/constants.h"
#include "vm/native_function.h"
#include "vm/tagged_pointer.h"
#if !defined(DART_PRECOMPILED_RUNTIME)
#include "vm/compiler/assembler/assembler.h"
#endif // !defined(DART_PRECOMPILED_RUNTIME)
namespace dart {
class ICData;
class Code;
class Object;
class ObjectPool;
class UntaggedCode;
class InstructionPattern : public AllStatic {
 public:
  // Decodes a load sequence ending at 'end' (the last instruction of the
  // load sequence is the instruction before the one at end).  Returns the
  // address of the first instruction in the sequence.  Returns the register
  // being loaded and the loaded immediate value in the output parameters
  // 'reg' and 'value' respectively.
  static uword DecodeLoadWordImmediate(uword end,
                                       Register* reg,
                                       intptr_t* value);

  // Encodes a load immediate sequence ending at 'end' (the last instruction of
  // the load sequence is the instruction before the one at end).
  //
  // Supports only a subset of [DecodeLoadWordImmediate], namely:
  //   movw r, #lower16
  //   movt r, #upper16
  static void EncodeLoadWordImmediate(uword end, Register reg, intptr_t value);

  // Decodes a load sequence ending at 'end' (the last instruction of the
  // load sequence is the instruction before the one at end).  Returns the
  // address of the first instruction in the sequence.  Returns the register
  // being loaded and the index in the pool being read from in the output
  // parameters 'reg' and 'index' respectively.
  // IMPORTANT: When generating code loading values from pool on ARM use
  // LoadWordFromPool macro instruction instead of emitting direct load.
  // The macro instruction takes care of pool offsets that can't be
  // encoded as immediates.
  static uword DecodeLoadWordFromPool(uword end,
                                      Register* reg,
                                      intptr_t* index);
};

class CallPattern : public ValueObject {
 public:
  CallPattern(uword pc, const Code& code);

  CodePtr TargetCode() const;
  void SetTargetCode(const Code& code) const;

 private:
  const ObjectPool& object_pool_;

  intptr_t target_code_pool_index_;

  DISALLOW_COPY_AND_ASSIGN(CallPattern);
};

class ICCallPattern : public ValueObject {
 public:
  ICCallPattern(uword pc, const Code& code);

  ObjectPtr Data() const;
  void SetData(const Object& data) const;

  CodePtr TargetCode() const;
  void SetTargetCode(const Code& code) const;

 private:
  const ObjectPool& object_pool_;

  intptr_t target_pool_index_;
  intptr_t data_pool_index_;

  DISALLOW_COPY_AND_ASSIGN(ICCallPattern);
};

class NativeCallPattern : public ValueObject {
 public:
  NativeCallPattern(uword pc, const Code& code);

  CodePtr target() const;
  void set_target(const Code& target) const;

  NativeFunction native_function() const;
  void set_native_function(NativeFunction target) const;

 private:
  const ObjectPool& object_pool_;

  uword end_;
  intptr_t native_function_pool_index_;
  intptr_t target_code_pool_index_;

  DISALLOW_COPY_AND_ASSIGN(NativeCallPattern);
};

// Instance call that can switch between a direct monomorphic call, an IC call,
// and a megamorphic call.
//   load guarded cid            load ICData             load MegamorphicCache
//   load monomorphic target <-> load ICLookup stub  ->  load MMLookup stub
//   call target.entry           call stub.entry         call stub.entry
class SwitchableCallPatternBase : public ValueObject {
 public:
  explicit SwitchableCallPatternBase(const ObjectPool& object_pool);

  ObjectPtr data() const;
  void SetData(const Object& data) const;

 protected:
  const ObjectPool& object_pool_;
  intptr_t data_pool_index_;
  intptr_t target_pool_index_;

 private:
  DISALLOW_COPY_AND_ASSIGN(SwitchableCallPatternBase);
};

// See [SwitchableCallPatternBase] for switchable calls in general.
//
// The target slot is always a [Code] object: Either the code of the
// monomorphic function or a stub code.
class SwitchableCallPattern : public SwitchableCallPatternBase {
 public:
  SwitchableCallPattern(uword pc, const Code& code);

  uword target_entry() const;
  void SetTarget(const Code& target) const;

 private:
  DISALLOW_COPY_AND_ASSIGN(SwitchableCallPattern);
};

// See [SwitchableCallPatternBase] for switchable calls in general.
//
// The target slot is always a direct entrypoint address: Either the entry point
// of the monomorphic function or a stub entry point.
class BareSwitchableCallPattern : public SwitchableCallPatternBase {
 public:
  explicit BareSwitchableCallPattern(uword pc);

  uword target_entry() const;
  void SetTarget(const Code& target) const;

 private:
  DISALLOW_COPY_AND_ASSIGN(BareSwitchableCallPattern);
};

class ReturnPattern : public ValueObject {
 public:
  explicit ReturnPattern(uword pc);

  // bx_lr = 1.
  static const int kLengthInBytes = 1 * Instr::kInstrSize;

  int pattern_length_in_bytes() const { return kLengthInBytes; }

  bool IsValid() const;

 private:
  const uword pc_;
};

class PcRelativeCallPatternBase : public ValueObject {
 public:
  // The branch encodes a 24-bit signed word offset, which gets multiplied by 4
  // to form the byte offset (hence the 1 << 25 bounds below).
  static constexpr intptr_t kLowerCallingRange =
      -(1 << 25) + Instr::kPCReadOffset;
  static constexpr intptr_t kUpperCallingRange =
      (1 << 25) - Instr::kInstrSize + Instr::kPCReadOffset;

  explicit PcRelativeCallPatternBase(uword pc) : pc_(pc) {}

  static const int kLengthInBytes = 1 * Instr::kInstrSize;

  int32_t distance() {
#if !defined(DART_PRECOMPILED_RUNTIME)
    return compiler::Assembler::DecodeBranchOffset(
        *reinterpret_cast<int32_t*>(pc_));
#else
    UNREACHABLE();
    return 0;
#endif
  }

  void set_distance(int32_t distance) {
#if !defined(DART_PRECOMPILED_RUNTIME)
    int32_t* word = reinterpret_cast<int32_t*>(pc_);
    *word = compiler::Assembler::EncodeBranchOffset(distance, *word);
#else
    UNREACHABLE();
#endif
  }

 protected:
  uword pc_;
};

class PcRelativeCallPattern : public PcRelativeCallPatternBase {
 public:
  explicit PcRelativeCallPattern(uword pc) : PcRelativeCallPatternBase(pc) {}

  bool IsValid() const;
};

class PcRelativeTailCallPattern : public PcRelativeCallPatternBase {
 public:
  explicit PcRelativeTailCallPattern(uword pc)
      : PcRelativeCallPatternBase(pc) {}

  bool IsValid() const;
};

// Instruction pattern for a tail call to a signed 32-bit PC-relative offset
//
// The AOT compiler can emit PC-relative calls. If the destination of such a
// call is not in range for the "bl.<cond> <offset>" instruction, the AOT
// compiler will emit a trampoline which is in range. That trampoline will
// then tail-call to the final destination (also via PC-relative offset, but it
// supports a full signed 32-bit offset).
//
// The pattern of the trampoline looks like:
//
//     movw TMP, #lower16
//     movt TMP, #upper16
//     add  PC, PC, TMP lsl #0
//
class PcRelativeTrampolineJumpPattern : public ValueObject {
 public:
  explicit PcRelativeTrampolineJumpPattern(uword pattern_start)
      : pattern_start_(pattern_start) {
    USE(pattern_start_);
  }

  static const int kLengthInBytes = 3 * Instr::kInstrSize;

  void Initialize();

  int32_t distance();
  void set_distance(int32_t distance);
  bool IsValid() const;

 private:
  // This offset must be applied to account for the fact that
  //   a) the actual "branch" is only in the 3rd instruction
  //   b) when reading the PC it reports current instruction + 8
  static const intptr_t kDistanceOffset = -4 * Instr::kInstrSize;

  // add  PC, PC, TMP lsl #0
  static const uint32_t kAddPcEncoding =
      (ADD << kOpcodeShift) | (AL << kConditionShift) | (PC << kRnShift) |
      (PC << kRdShift) | (TMP << kRmShift);

  uword pattern_start_;
};

} // namespace dart
#endif // RUNTIME_VM_INSTRUCTIONS_ARM_H_
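
The range constants in PcRelativeCallPatternBase and the movw/movt pair used by EncodeLoadWordImmediate and the trampoline can be illustrated with a small standalone sketch. It is not part of the header: the namespace sketch, the helpers FitsInPcRelativeCall, SplitImmediate and JoinImmediate, and the values kInstrSize = 4 and kPCReadOffset = 8 are assumptions made for illustration (the header only states that reading PC reports the current instruction + 8).

```cpp
#include <cstdint>

namespace sketch {

// Assumed ARM32 values; the real ones live in Instr, not in this header.
constexpr int32_t kInstrSize = 4;
constexpr int32_t kPCReadOffset = 8;

// Same formulas as PcRelativeCallPatternBase: a 24-bit signed word offset
// scaled by 4, shifted by the PC read offset.
constexpr int32_t kLowerCallingRange = -(1 << 25) + kPCReadOffset;
constexpr int32_t kUpperCallingRange = (1 << 25) - kInstrSize + kPCReadOffset;

// Hypothetical helper: true if a byte distance can be reached by a single
// PC-relative call, so no trampoline would be needed.
inline bool FitsInPcRelativeCall(int32_t distance) {
  return distance >= kLowerCallingRange && distance <= kUpperCallingRange;
}

// The two 16-bit immediates carried by a movw/movt pair.
struct MovwMovtImmediates {
  uint16_t lower16;  // movw r, #lower16
  uint16_t upper16;  // movt r, #upper16
};

// Hypothetical helper: splits a 32-bit value into movw/movt immediates.
inline MovwMovtImmediates SplitImmediate(int32_t value) {
  const uint32_t bits = static_cast<uint32_t>(value);
  return {static_cast<uint16_t>(bits & 0xFFFF),
          static_cast<uint16_t>(bits >> 16)};
}

// Hypothetical helper: rebuilds the 32-bit value from the two immediates.
inline int32_t JoinImmediate(MovwMovtImmediates imm) {
  const uint32_t bits = (static_cast<uint32_t>(imm.upper16) << 16) |
                        static_cast<uint32_t>(imm.lower16);
  return static_cast<int32_t>(bits);
}

}  // namespace sketch
```

Under these assumptions, a distance outside [kLowerCallingRange, kUpperCallingRange] is the case in which the AOT compiler would fall back to the three-instruction trampoline, whose movw/movt pair carries the full signed 32-bit offset.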