Commit graph

10 commits

Author SHA1 Message Date
Paul E. Murphy 2b0a157d68 cmd/compile: intrinsify math.MulUintptr on PPC64
This can be done efficiently with few instructions.

This also adds MULHDUCC for further codegen improvement.

Change-Id: I06320ba4383a679341b911a237a360ef07b19168
Reviewed-on: https://go-review.googlesource.com/c/go/+/605975
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Archana Ravindar <aravinda@redhat.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2024-08-26 17:02:43 +00:00
khr@golang.org 9b4268c3df cmd/compile: simplify prove pass
We don't need noLimit checks in a bunch of places.
Also simplify folding of provable constant results.

At this point in the CL stack, compilebench reports no performance
changes. The only thing of note is that binaries got a bit smaller.

name                      old text-bytes    new text-bytes    delta
HelloSize                       960kB ± 0%        952kB ± 0%  -0.83%  (p=0.000 n=10+10)
CmdGoSize                      12.3MB ± 0%       12.1MB ± 0%  -1.53%  (p=0.000 n=10+10)

Change-Id: Id4be75eec0f8c93f2f3b93a8521ce2278ee2ee2c
Reviewed-on: https://go-review.googlesource.com/c/go/+/599197
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2024-08-07 16:08:20 +00:00
Paul E. Murphy 773039ed5c cmd/compile/internal/ssa: on PPC64, merge (CMPconst [0] (op ...)) more aggressively
Generate the CC version of many opcodes whose result is compared against
signed 0. The approach taken here works even if the opcode result is used in
multiple places too.

Add support for ADD, ADDconst, ANDN, SUB, NEG, CNTLZD, NOR conversions
to their CC opcode variant. These are the most commonly used variants.

Also, do not set clobberFlags of CNTLZD and CNTLZW, they do not clobber
flags.

This results in about 1% smaller text sections in kubernetes binaries,
and no regressions in the crypto benchmarks.

Change-Id: I9e0381944869c3774106bf348dead5ecb96dffda
Reviewed-on: https://go-review.googlesource.com/c/go/+/538636
Run-TryBot: Paul Murphy <murp@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Jayanth Krishnamurthy <jayanth.krishnamurthy@ibm.com>
Reviewed-by: Heschi Kreinick <heschi@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
2023-11-13 22:12:32 +00:00
Lynn Boger 80834af206 cmd/compile: avoid ANDCCconst on PPC64 if condition not needed
In the PPC64 ISA, the instruction to do an 'and' operation
using an immediate constant is only available in the form that
also sets CR0 (i.e. clobbers the condition register.) This means
CR0 is being clobbered unnecessarily in many cases. That
affects some decisions made during some compiler passes
that check for it.

In those cases when the constant used by the ANDCC is a right
justified consecutive set of bits, a shift instruction can
be used which has the same effect if CR0 does not need to be
set. The rule to do that has been added to the late rules file
after other rules using ANDCCconst have been processed in the
main rules file.

Some codegen tests had to be updated since ANDCC is no
longer generated for some cases. A new test case was added to
verify the ANDCC is present if the results for both the AND
and CR0 are used.

Change-Id: I304f607c039a458e2d67d25351dd00aea72ba542
Reviewed-on: https://go-review.googlesource.com/c/go/+/531435
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Paul Murphy <murp@ibm.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Jayanth Krishnamurthy <jayanth.krishnamurthy@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2023-10-18 15:56:53 +00:00
Archana R a432d89137 cmd/compile: add rules to emit SETBC/R instructions on power10
This CL adds rules that replaces instances of ISEL that produce
a boolean result based on a condition register by SETBC/SETBCR
operations. On Power10 these are convereted to SETBC/SETBCR
instructions that use one register instead of 3 registers
conventionally used by ISEL and hence reduces register pressure.
On loops written specifically to exercise such instances of ISEL
extensively, a performance improvement of 2.5% is seen on Power10.
Also added verification tests to verify correct generation of
SETBC/SETBCR instructions on Power10.

Change-Id: Ib719897f09d893de40324440a43052dca026e8fa
Reviewed-on: https://go-review.googlesource.com/c/go/+/449795
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Run-TryBot: Archana Ravindar <aravind5@in.ibm.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-02-06 12:49:53 +00:00
Paul E. Murphy 1540531746 test/codegen: merge identical ppc64 and ppc64le tests
Manually consolidate the remaining ppc64/ppc64le test which
are not so trivial to automatically merge.

The remaining ppc64le tests are limited to cases where load/stores are
merged (this only happens on ppc64le) and the race detector (only
supported on ppc64le).

Change-Id: I1f9c0f3d3ddbb7fbbd8c81fbbd6537394fba63ce
Reviewed-on: https://go-review.googlesource.com/c/go/+/463217
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
2023-01-27 19:03:02 +00:00
Paul E. Murphy 0301c6c351 test/codegen: combine trivial PPC64 tests into ppc64x
Use a small python script to consolidate duplicate
ppc64/ppc64le tests into a single ppc64x codegen test.

This makes small assumption that anytime two tests with
for different arch/variant combos exists, those tests
can be combined into a single ppc64x test.

E.x:

  // ppc64le: foo
  // ppc64le/power9: foo
into
  // ppc64x: foo

or

  // ppc64: foo
  // ppc64le: foo
into
  // ppc64x: foo

import glob
import re
files = glob.glob("codegen/*.go")
for file in files:
    with open(file) as f:
        text = [l for l in f]
    i = 0
    while i < len(text):
        first = re.match("\s*// ?ppc64(le)?(/power[89])?:(.*)", text[i])
        if first:
            j = i+1
            while j < len(text):
                second = re.match("\s*// ?ppc64(le)?(/power[89])?:(.*)", text[j])
                if not second:
                    break
                if (not first.group(2) or first.group(2) == second.group(2)) and first.group(3) == second.group(3):
                    text[i] = re.sub(" ?ppc64(le|x)?"," ppc64x",text[i])
                    text=text[:j] + (text[j+1:])
                else:
                    j += 1
        i+=1
    with open(file, 'w') as f:
        f.write("".join(text))

Change-Id: Ic6b009b54eacaadc5a23db9c5a3bf7331b595821
Reviewed-on: https://go-review.googlesource.com/c/go/+/463220
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-01-27 18:24:12 +00:00
Lynn Boger c1bfefe9d1 cmd/compile: fix confusion with ANDCCconst in PPC64 rules
Currently there is a an ANDconst and an ANDCCconst op in PPC64,
which is confusing since they map onto the same instruction.
One of these ops sets the result of the AND operation, and the
other sets the flag (condition register).

This converts ANDCCconst into an op with the 2 expected results:
the integer result of the AND and the flag setting. The ANDconst
op has been removed.

Note that in the PPC64 ISA the only variation of the 'and immediate'
is the one that sets the condition bit, which probably led to the
original (confusing) implementation.

This also adds a few rules to improve the use of ANDCCconst with
ISELB and some testcases to verify those improvements.

Change-Id: I523703fa4da2098eb995dc3ba744d36fa28e41d4
Reviewed-on: https://go-review.googlesource.com/c/go/+/422015
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Paul Murphy <murp@ibm.com>
2022-08-08 20:15:55 +00:00
Jake Ciolek 732f6fa9d5 cmd/compile: use ANDL for small immediates
We can rewrite ANDQ with an immediate fitting in 32bit with an ANDL, which is shorter to encode.

Looking at Go binary itself, before the change there was:

ANDL: 2337
ANDQ: 4476

After the change:

ANDL: 3790
ANDQ: 3024

So we got rid of 1452 ANDQs

This makes the Linux x86_64 binary 0.03% smaller.

There seems to be an impact on performance.

Intel Cascade Lake benchmarks (with perflock):

name                     old time/op    new time/op    delta
BinaryTree17-8              1.91s ± 1%     1.89s ± 1%  -1.22%  (p=0.000 n=21+18)
Fannkuch11-8                2.34s ± 0%     2.34s ± 0%    ~     (p=0.052 n=20+20)
FmtFprintfEmpty-8          27.7ns ± 1%    27.4ns ± 3%    ~     (p=0.497 n=21+21)
FmtFprintfString-8         53.2ns ± 0%    51.5ns ± 0%  -3.21%  (p=0.000 n=20+19)
FmtFprintfInt-8            57.3ns ± 0%    55.7ns ± 0%  -2.89%  (p=0.000 n=19+19)
FmtFprintfIntInt-8         92.3ns ± 0%    88.4ns ± 1%  -4.23%  (p=0.000 n=20+21)
FmtFprintfPrefixedInt-8     103ns ± 0%     103ns ± 0%  +0.23%  (p=0.000 n=20+21)
FmtFprintfFloat-8           147ns ± 0%     148ns ± 0%  +0.75%  (p=0.000 n=20+21)
FmtManyArgs-8               384ns ± 0%     381ns ± 0%  -0.63%  (p=0.000 n=21+21)
GobDecode-8                3.86ms ± 1%    3.88ms ± 1%  +0.52%  (p=0.000 n=20+21)
GobEncode-8                2.77ms ± 1%    2.77ms ± 0%    ~     (p=0.078 n=21+21)
Gzip-8                      168ms ± 1%     168ms ± 0%  +0.24%  (p=0.000 n=20+20)
Gunzip-8                   25.1ms ± 0%    24.3ms ± 0%  -3.03%  (p=0.000 n=21+21)
HTTPClientServer-8         61.4µs ± 8%    59.1µs ±10%    ~     (p=0.088 n=20+21)
JSONEncode-8               6.86ms ± 0%    6.70ms ± 0%  -2.29%  (p=0.000 n=20+19)
JSONDecode-8               30.8ms ± 1%    30.6ms ± 1%  -0.82%  (p=0.000 n=20+20)
Mandelbrot200-8            3.85ms ± 0%    3.85ms ± 0%    ~     (p=0.191 n=16+17)
GoParse-8                  2.61ms ± 2%    2.60ms ± 1%    ~     (p=0.561 n=21+20)
RegexpMatchEasy0_32-8      48.5ns ± 2%    45.9ns ± 3%  -5.26%  (p=0.000 n=20+21)
RegexpMatchEasy0_1K-8       139ns ± 0%     139ns ± 0%  +0.27%  (p=0.000 n=18+20)
RegexpMatchEasy1_32-8      41.3ns ± 0%    42.1ns ± 4%  +1.95%  (p=0.000 n=17+21)
RegexpMatchEasy1_1K-8       216ns ± 2%     216ns ± 0%  +0.17%  (p=0.020 n=21+19)
RegexpMatchMedium_32-8      790ns ± 7%     803ns ± 8%    ~     (p=0.178 n=21+21)
RegexpMatchMedium_1K-8     23.5µs ± 5%    23.7µs ± 5%    ~     (p=0.421 n=21+21)
RegexpMatchHard_32-8       1.09µs ± 1%    1.09µs ± 1%  -0.53%  (p=0.000 n=19+18)
RegexpMatchHard_1K-8       33.0µs ± 0%    33.0µs ± 0%    ~     (p=0.610 n=21+20)
Revcomp-8                   348ms ± 0%     353ms ± 0%  +1.38%  (p=0.000 n=17+18)
Template-8                 42.0ms ± 1%    41.9ms ± 1%  -0.30%  (p=0.049 n=20+20)
TimeParse-8                 185ns ± 0%     185ns ± 0%    ~     (p=0.387 n=20+18)
TimeFormat-8                237ns ± 1%     241ns ± 1%  +1.57%  (p=0.000 n=21+21)
[Geo mean]                 35.4µs         35.2µs       -0.66%

name                     old speed      new speed      delta
GobDecode-8               199MB/s ± 1%   198MB/s ± 1%  -0.52%  (p=0.000 n=20+21)
GobEncode-8               277MB/s ± 1%   277MB/s ± 0%    ~     (p=0.075 n=21+21)
Gzip-8                    116MB/s ± 1%   115MB/s ± 0%  -0.25%  (p=0.000 n=20+20)
Gunzip-8                  773MB/s ± 0%   797MB/s ± 0%  +3.12%  (p=0.000 n=21+21)
JSONEncode-8              283MB/s ± 0%   290MB/s ± 0%  +2.35%  (p=0.000 n=20+19)
JSONDecode-8             63.0MB/s ± 1%  63.5MB/s ± 1%  +0.82%  (p=0.000 n=20+20)
GoParse-8                22.2MB/s ± 2%  22.3MB/s ± 1%    ~     (p=0.539 n=21+20)
RegexpMatchEasy0_32-8     660MB/s ± 2%   697MB/s ± 3%  +5.57%  (p=0.000 n=20+21)
RegexpMatchEasy0_1K-8    7.36GB/s ± 0%  7.34GB/s ± 0%  -0.26%  (p=0.000 n=18+20)
RegexpMatchEasy1_32-8     775MB/s ± 0%   761MB/s ± 4%  -1.88%  (p=0.000 n=17+21)
RegexpMatchEasy1_1K-8    4.74GB/s ± 2%  4.74GB/s ± 0%  -0.18%  (p=0.020 n=21+19)
RegexpMatchMedium_32-8   40.6MB/s ± 7%  39.9MB/s ± 9%    ~     (p=0.191 n=21+21)
RegexpMatchMedium_1K-8   43.7MB/s ± 5%  43.2MB/s ± 5%    ~     (p=0.435 n=21+21)
RegexpMatchHard_32-8     29.3MB/s ± 1%  29.4MB/s ± 1%  +0.53%  (p=0.000 n=19+18)
RegexpMatchHard_1K-8     31.0MB/s ± 0%  31.0MB/s ± 0%    ~     (p=0.572 n=21+20)
Revcomp-8                 730MB/s ± 0%   720MB/s ± 0%  -1.36%  (p=0.000 n=17+18)
Template-8               46.2MB/s ± 1%  46.3MB/s ± 1%  +0.30%  (p=0.041 n=20+20)
[Geo mean]                204MB/s        205MB/s       +0.30%

Change-Id: Iac75d0ec184a515ce0e65e19559d5fe2e9840514
Reviewed-on: https://go-review.googlesource.com/c/go/+/354970
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Trust: Josh Bleecher Snyder <josharian@gmail.com>
Trust: Keith Randall <khr@golang.org>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
2021-10-12 22:02:39 +00:00
Josh Bleecher Snyder e7c1873691 cmd/compile: optimize x & 1 != 0 to x & 1 on amd64
Triggers a handful of times in std+cmd.

Change-Id: I9bb8ce9a5f8bae2547cb61157cd8f256e1b63e76
Reviewed-on: https://go-review.googlesource.com/c/go/+/229602
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-04-23 17:52:28 +00:00