[vm] Faster floor/ceil/truncate on JIT/x64

This change enables use of the roundsd instruction on x64 when SSE4.1 is
detected (in JIT mode).

Also, the result XMM register is cleared before roundsd (when it differs
from the input register) to avoid a false dependency caused by the partial
register write in roundsd.

Microbenchmark results on JIT/x64:
Before: BenchFloor(RunTime): 303.81512987999395 us.
After:  BenchFloor(RunTime): 135.00067499156262 us.

Custom floor implementation in Dart:
BenchFastFloor(RunTime): 147.609889298893 us.

TEST=ci

Issue: https://github.com/dart-lang/sdk/issues/46650
Change-Id: I13502f5d40edf32916edec14cfab027758a22457
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/209764
Reviewed-by: Slava Egorov <vegorov@google.com>
Commit-Queue: Alexander Markov <alexmarkov@google.com>
Alexander Markov 2021-08-11 16:32:17 +00:00 committed by commit-bot@chromium.org
parent 3befe682e9
commit 15e5ec1762
2 changed files with 8 additions and 1 deletions

@@ -5290,6 +5290,11 @@ LocationSummary* DoubleToDoubleInstr::MakeLocationSummary(Zone* zone,
 void DoubleToDoubleInstr::EmitNativeCode(FlowGraphCompiler* compiler) {
   XmmRegister value = locs()->in(0).fpu_reg();
   XmmRegister result = locs()->out(0).fpu_reg();
+  if (value != result) {
+    // Clear full register to avoid false dependency due to
+    // a partial access to XMM register in roundsd instruction.
+    __ xorps(result, result);
+  }
   switch (recognized_kind()) {
     case MethodRecognizer::kDoubleTruncate:
       __ roundsd(result, value, compiler::Assembler::kRoundToZero);
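
The idea behind the added xorps can be sketched outside the VM with SSE4.1 intrinsics. This is an illustrative sketch only, not VM code: the helper names (`TruncateSketch`, `FloorSketch`, `CeilSketch`) are hypothetical, and it assumes a compiler that defines `__SSE4_1__` when SSE4.1 is enabled (for example GCC or Clang with -msse4.1). Without that macro it falls back to the C++ standard library. A compiler picks its own registers, so the zeroed first operand below only mirrors the intent of the JIT change and does not guarantee the same machine code.

```cpp
#include <cmath>
#if defined(__SSE4_1__)
#include <smmintrin.h>  // SSE4.1 intrinsics, including _mm_round_sd
#endif

// Hypothetical helpers, not part of the Dart VM.
// _mm_round_sd(a, b, mode) rounds the low double of b and copies the upper
// lane from a. Passing an all-zero a is the intrinsic-level analogue of
// clearing the result register before roundsd.

double TruncateSketch(double value) {
#if defined(__SSE4_1__)
  const __m128d zeroed = _mm_setzero_pd();
  const __m128d input = _mm_set_sd(value);
  return _mm_cvtsd_f64(
      _mm_round_sd(zeroed, input, _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC));
#else
  return std::trunc(value);  // Fallback when SSE4.1 is not enabled.
#endif
}

double FloorSketch(double value) {
#if defined(__SSE4_1__)
  const __m128d zeroed = _mm_setzero_pd();
  const __m128d input = _mm_set_sd(value);
  return _mm_cvtsd_f64(
      _mm_round_sd(zeroed, input, _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC));
#else
  return std::floor(value);  // Fallback when SSE4.1 is not enabled.
#endif
}

double CeilSketch(double value) {
#if defined(__SSE4_1__)
  const __m128d zeroed = _mm_setzero_pd();
  const __m128d input = _mm_set_sd(value);
  return _mm_cvtsd_f64(
      _mm_round_sd(zeroed, input, _MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC));
#else
  return std::ceil(value);  // Fallback when SSE4.1 is not enabled.
#endif
}
```

Both paths give the same results for ordinary inputs, so the helpers can be compared directly against std::trunc, std::floor and std::ceil when experimenting.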

@@ -61,7 +61,9 @@ class TargetCPUFeatures : public AllStatic {
   static bool sse4_1_supported() { return HostCPUFeatures::sse4_1_supported(); }
   static bool popcnt_supported() { return HostCPUFeatures::popcnt_supported(); }
   static bool abm_supported() { return HostCPUFeatures::abm_supported(); }
-  static bool double_truncate_round_supported() { return false; }
+  static bool double_truncate_round_supported() {
+    return HostCPUFeatures::sse4_1_supported();
+  }
 };

 } // namespace dart
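
How a code generator might consult a flag like double_truncate_round_supported() can be sketched as follows. Everything here is a hypothetical stand-in rather than the VM's real classes: `HostFeaturesSketch`, `TargetFeaturesSketch`, `RoundingPath` and `SelectRoundingPath` are invented for illustration, and the host check assumes GCC or Clang on x86, where `__builtin_cpu_supports` is available. What the VM actually does when the flag is false is not shown in this change, so the fallback is left generic.

```cpp
// Hypothetical stand-ins; none of these names are the VM's real classes.
struct HostFeaturesSketch {
  static bool sse4_1_supported() {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    // Queries the CPU the program is currently running on.
    return __builtin_cpu_supports("sse4.1") != 0;
#else
    return false;  // Assumption: treat unknown hosts as lacking SSE4.1.
#endif
  }
};

struct TargetFeaturesSketch {
  // Mirrors the shape of the changed method: the rounding fast path is
  // considered available exactly when SSE4.1 is.
  static bool double_truncate_round_supported() {
    return HostFeaturesSketch::sse4_1_supported();
  }
};

// The two choices a code generator could make for floor/ceil/truncate.
enum class RoundingPath { kInlineRoundsd, kGenericFallback };

// Picks the inline instruction only when the feature flag allows it.
RoundingPath SelectRoundingPath(bool round_supported) {
  return round_supported ? RoundingPath::kInlineRoundsd
                         : RoundingPath::kGenericFallback;
}
```

A caller would evaluate `SelectRoundingPath(TargetFeaturesSketch::double_truncate_round_supported())` once per compiled call site, which keeps the CPU check out of the generated fast path.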