go/test/writebarrier.go
Josh Bleecher Snyder 2db4cc38a0 cmd/compile: improve generated code for concrete cases in type switches
Consider

switch x:= x.(type) {
case int:
  // int stmts
case error:
  // error stmts
}

Prior to this change, we lowered this roughly as:

if x, ok := x.(int); ok {
  // int stmts
} else if x, ok := x.(error); ok {
  // error stmts
}

x, ok := x.(error) is implemented with a call to runtime.assertE2I2 or runtime.assertI2I2.

x, ok := x.(int) generates inline code that checks whether x has type int,
and populates x and ok as appropriate. We then immediately branch again on ok.
The shortcircuit pass in the SSA backend is designed to recognize situations
like this, in which we are immediately branching on a bool value
that we just calculated with a branch.

However, the shortcircuit pass has limitations when the intermediate state has phis.
In this case, the phi value is x (the int).
CL 222923 improved the situation, but many cases are still unhandled.
I have further improvements in progress, which is how I found this particular problem,
but they are expensive, and may or may not see the light of day.

In the common case of a lone concrete type in a type switch case,
it is easier and cheaper to simply lower a different way, roughly:

if _, ok := x.(int); ok {
  x := x.(int)
  // int stmts
}

Instead of using a type assertion, though, we extract the value of x
from the interface directly.

This removes the need to track x (the int) across the branch on ok,
which removes the phi, which lets the shortcircuit pass do its job.

Benchmarks for encoding/binary show improvements, as well as some
wild swings on the super fast benchmarks (alignment effects?):

name                      old time/op    new time/op    delta
ReadSlice1000Int32s-8       5.25µs ± 2%    4.87µs ± 3%   -7.11%  (p=0.000 n=44+49)
ReadStruct-8                 451ns ± 2%     417ns ± 2%   -7.39%  (p=0.000 n=45+46)
WriteStruct-8                412ns ± 2%     405ns ± 3%   -1.58%  (p=0.000 n=46+48)
ReadInts-8                   296ns ± 8%     275ns ± 3%   -7.23%  (p=0.000 n=48+50)
WriteInts-8                  324ns ± 1%     318ns ± 2%   -1.67%  (p=0.000 n=44+49)
WriteSlice1000Int32s-8      5.21µs ± 2%    4.92µs ± 1%   -5.67%  (p=0.000 n=46+44)
PutUint16-8                 0.58ns ± 2%    0.59ns ± 2%   +0.63%  (p=0.000 n=49+49)
PutUint32-8                 0.87ns ± 1%    0.58ns ± 1%  -33.10%  (p=0.000 n=46+44)
PutUint64-8                 0.66ns ± 2%    0.87ns ± 2%  +33.07%  (p=0.000 n=47+48)
LittleEndianPutUint16-8     0.86ns ± 2%    0.87ns ± 2%   +0.55%  (p=0.003 n=47+50)
LittleEndianPutUint32-8     0.87ns ± 1%    0.87ns ± 1%     ~     (p=0.547 n=45+47)
LittleEndianPutUint64-8     0.87ns ± 2%    0.87ns ± 1%     ~     (p=0.451 n=46+47)
ReadFloats-8                79.8ns ± 5%    75.9ns ± 2%   -4.83%  (p=0.000 n=50+47)
WriteFloats-8               89.3ns ± 1%    88.9ns ± 1%   -0.48%  (p=0.000 n=46+44)
ReadSlice1000Float32s-8     5.51µs ± 1%    4.87µs ± 2%  -11.74%  (p=0.000 n=47+46)
WriteSlice1000Float32s-8    5.51µs ± 1%    4.93µs ± 1%  -10.60%  (p=0.000 n=48+47)
PutUvarint32-8              25.9ns ± 2%    24.0ns ± 2%   -7.02%  (p=0.000 n=48+50)
PutUvarint64-8              75.1ns ± 1%    61.5ns ± 2%  -18.12%  (p=0.000 n=45+47)
[Geo mean]                  57.3ns         54.3ns        -5.33%

Despite the rarity of type switches, this generates noticeably smaller binaries.

file      before    after     Δ       %
addr2line 4413296   4409200   -4096   -0.093%
api       5982648   5962168   -20480  -0.342%
cgo       4854168   4833688   -20480  -0.422%
compile   19694784  19682560  -12224  -0.062%
cover     5278008   5265720   -12288  -0.233%
doc       4694824   4682536   -12288  -0.262%
fix       3411336   3394952   -16384  -0.480%
link      6721496   6717400   -4096   -0.061%
nm        4371152   4358864   -12288  -0.281%
objdump   4760960   4752768   -8192   -0.172%
pprof     14810820  14790340  -20480  -0.138%
trace     11681076  11668788  -12288  -0.105%
vet       8285464   8244504   -40960  -0.494%
total     115824120 115627576 -196544 -0.170%

Compiler performance is marginally improved (note that go/types has many type switches):

name        old alloc/op      new alloc/op      delta
Template         35.0MB ± 0%       35.0MB ± 0%  +0.09%  (p=0.008 n=5+5)
Unicode          28.5MB ± 0%       28.5MB ± 0%    ~     (p=0.548 n=5+5)
GoTypes           114MB ± 0%        114MB ± 0%  -0.76%  (p=0.008 n=5+5)
Compiler          541MB ± 0%        541MB ± 0%  -0.03%  (p=0.008 n=5+5)
SSA              1.17GB ± 0%       1.17GB ± 0%    ~     (p=0.841 n=5+5)
Flate            21.9MB ± 0%       21.9MB ± 0%    ~     (p=0.421 n=5+5)
GoParser         26.9MB ± 0%       26.9MB ± 0%    ~     (p=0.222 n=5+5)
Reflect          74.6MB ± 0%       74.6MB ± 0%    ~     (p=1.000 n=5+5)
Tar              32.9MB ± 0%       32.8MB ± 0%    ~     (p=0.056 n=5+5)
XML              42.4MB ± 0%       42.1MB ± 0%  -0.77%  (p=0.008 n=5+5)
[Geo mean]       73.2MB            73.1MB       -0.15%

name        old allocs/op     new allocs/op     delta
Template           377k ± 0%         377k ± 0%  +0.06%  (p=0.008 n=5+5)
Unicode            354k ± 0%         354k ± 0%    ~     (p=0.095 n=5+5)
GoTypes           1.31M ± 0%        1.30M ± 0%  -0.73%  (p=0.008 n=5+5)
Compiler          5.44M ± 0%        5.44M ± 0%  -0.04%  (p=0.008 n=5+5)
SSA               11.7M ± 0%        11.7M ± 0%    ~     (p=1.000 n=5+5)
Flate              239k ± 0%         239k ± 0%    ~     (p=1.000 n=5+5)
GoParser           302k ± 0%         302k ± 0%  -0.04%  (p=0.008 n=5+5)
Reflect            977k ± 0%         977k ± 0%    ~     (p=0.690 n=5+5)
Tar                346k ± 0%         346k ± 0%    ~     (p=0.889 n=5+5)
XML                431k ± 0%         430k ± 0%  -0.25%  (p=0.008 n=5+5)
[Geo mean]         806k              806k       -0.10%

For packages with many type switches, this considerably shrinks function text size.
Some examples:

file                                                           before   after    Δ       %
encoding/binary.s                                              30726    29504    -1222   -3.977%
go/printer.s                                                   77597    76005    -1592   -2.052%
cmd/vendor/golang.org/x/tools/go/ast/astutil.s                 65704    63318    -2386   -3.631%
cmd/vendor/golang.org/x/tools/go/analysis/passes/unreachable.s 8047     7714     -333    -4.138%

Text size regressions are rare.

Change-Id: Ic10982bbb04876250eaa5bfee97990141ae5fc28
Reviewed-on: https://go-review.googlesource.com/c/go/+/228106
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
2020-04-14 17:34:31 +00:00

292 lines
5.6 KiB
Go

// errorcheck -0 -l -d=wb
// Copyright 2015 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
// Test where write barriers are and are not emitted.
package p
import "unsafe"
func f(x **byte, y *byte) {
*x = y // no barrier (dead store)
z := y // no barrier
*x = z // ERROR "write barrier"
}
func f1(x *[]byte, y []byte) {
*x = y // no barrier (dead store)
z := y // no barrier
*x = z // ERROR "write barrier"
}
func f1a(x *[]byte, y *[]byte) {
*x = *y // ERROR "write barrier"
z := *y // no barrier
*x = z // ERROR "write barrier"
}
func f2(x *interface{}, y interface{}) {
*x = y // no barrier (dead store)
z := y // no barrier
*x = z // ERROR "write barrier"
}
func f2a(x *interface{}, y *interface{}) {
*x = *y // no barrier (dead store)
z := y // no barrier
*x = z // ERROR "write barrier"
}
func f3(x *string, y string) {
*x = y // no barrier (dead store)
z := y // no barrier
*x = z // ERROR "write barrier"
}
func f3a(x *string, y *string) {
*x = *y // ERROR "write barrier"
z := *y // no barrier
*x = z // ERROR "write barrier"
}
func f4(x *[2]string, y [2]string) {
*x = y // ERROR "write barrier"
z := y // no barrier
*x = z // ERROR "write barrier"
}
func f4a(x *[2]string, y *[2]string) {
*x = *y // ERROR "write barrier"
z := *y // no barrier
*x = z // ERROR "write barrier"
}
type T struct {
X *int
Y int
M map[int]int
}
func f5(t, u *T) {
t.X = &u.Y // ERROR "write barrier"
}
func f6(t *T) {
t.M = map[int]int{1: 2} // ERROR "write barrier"
}
func f7(x, y *int) []*int {
var z [3]*int
i := 0
z[i] = x // ERROR "write barrier"
i++
z[i] = y // ERROR "write barrier"
i++
return z[:i]
}
func f9(x *interface{}, v *byte) {
*x = v // ERROR "write barrier"
}
func f10(x *byte, f func(interface{})) {
f(x)
}
func f11(x *unsafe.Pointer, y unsafe.Pointer) {
*x = unsafe.Pointer(uintptr(y) + 1) // ERROR "write barrier"
}
func f12(x []*int, y *int) []*int {
// write barrier for storing y in x's underlying array
x = append(x, y) // ERROR "write barrier"
return x
}
func f12a(x []int, y int) []int {
// y not a pointer, so no write barriers in this function
x = append(x, y)
return x
}
func f13(x []int, y *[]int) {
*y = append(x, 1) // ERROR "write barrier"
}
func f14(y *[]int) {
*y = append(*y, 1) // ERROR "write barrier"
}
type T1 struct {
X *int
}
func f15(x []T1, y T1) []T1 {
return append(x, y) // ERROR "write barrier"
}
type T8 struct {
X [8]*int
}
func f16(x []T8, y T8) []T8 {
return append(x, y) // ERROR "write barrier"
}
func t1(i interface{}) **int {
// From issue 14306, make sure we have write barriers in a type switch
// where the assigned variable escapes.
switch x := i.(type) {
case *int: // ERROR "write barrier"
return &x
}
switch y := i.(type) {
case **int: // no write barrier here
return y
}
return nil
}
type T17 struct {
f func(*T17)
}
func f17(x *T17) {
// Originally from golang.org/issue/13901, but the hybrid
// barrier requires both to have barriers.
x.f = f17 // ERROR "write barrier"
x.f = func(y *T17) { *y = *x } // ERROR "write barrier"
}
type T18 struct {
a []int
s string
}
func f18(p *T18, x *[]int) {
p.a = p.a[:5] // no barrier
*x = (*x)[0:5] // no barrier
p.a = p.a[3:5] // ERROR "write barrier"
p.a = p.a[1:2:3] // ERROR "write barrier"
p.s = p.s[8:9] // ERROR "write barrier"
*x = (*x)[3:5] // ERROR "write barrier"
}
func f19(x, y *int, i int) int {
// Constructing a temporary slice on the stack should not
// require any write barriers. See issue 14263.
a := []*int{x, y} // no barrier
return *a[i]
}
func f20(x, y *int, i int) []*int {
// ... but if that temporary slice escapes, then the
// write barriers are necessary.
a := []*int{x, y} // ERROR "write barrier"
return a
}
var x21 *int
var y21 struct {
x *int
}
var z21 int
// f21x: Global -> heap pointer updates must have write barriers.
func f21a(x *int) {
x21 = x // ERROR "write barrier"
y21.x = x // ERROR "write barrier"
}
func f21b(x *int) {
x21 = &z21 // ERROR "write barrier"
y21.x = &z21 // ERROR "write barrier"
}
func f21c(x *int) {
y21 = struct{ x *int }{x} // ERROR "write barrier"
}
func f22(x *int) (y *int) {
// pointer write on stack should have no write barrier.
// this is a case that the frontend failed to eliminate.
p := &y
*p = x // no barrier
return
}
type T23 struct {
p *int
a int
}
var t23 T23
var i23 int
// f23x: zeroing global needs write barrier for the hybrid barrier.
func f23a() {
t23 = T23{} // ERROR "write barrier"
}
func f23b() {
// also test partial assignments
t23 = T23{a: 1} // ERROR "write barrier"
}
func f23c() {
t23 = T23{} // no barrier (dead store)
// also test partial assignments
t23 = T23{p: &i23} // ERROR "write barrier"
}
var g int
func f24() **int {
p := new(*int)
*p = &g // no write barrier here
return p
}
func f25() []string {
return []string{"abc", "def", "ghi"} // no write barrier here
}
type T26 struct {
a, b, c int
d, e, f *int
}
var g26 int
func f26(p *int) *T26 { // see issue 29573
return &T26{
a: 5,
b: 6,
c: 7,
d: &g26, // no write barrier: global ptr
e: nil, // no write barrier: nil ptr
f: p, // ERROR "write barrier"
}
}
func f27(p *int) []interface{} {
return []interface{}{
nil, // no write barrier: zeroed memory, nil ptr
(*T26)(nil), // no write barrier: zeroed memory, type ptr & nil ptr
&g26, // no write barrier: zeroed memory, type ptr & global ptr
7, // no write barrier: zeroed memory, type ptr & global ptr
p, // ERROR "write barrier"
}
}