High Performance Go Compiler Optimisations
1. history
- Plan 9 compiler toolchain (since 2007)
- Go 1.5: compiler rewritten in Go instead of C (2015)
- Go 1.7: new compiler backend based on SSA (2016)
2. escape analysis
Allocating and deallocating space on the stack is much cheaper than on the heap. In Go, if a value must outlive the function call that created it, the compiler moves it to the heap; the value is said to escape to the heap. Consider the following snippet:
type TestStruct struct {
	a, b, c, d int
}
func NewTestStruct() *TestStruct {
	return &TestStruct{
		a: 1,
		b: 2,
		c: 3,
		d: 4,
	}
}
The assembly code generated by the compiler:
--------NewTestStruct-------------
"".NewTestStruct STEXT size=103 args=0x8 locals=0x18
0x0000 00000 (escape.go:7) TEXT "".NewTestStruct(SB), ABIInternal, $24-8
0x0000 00000 (escape.go:7) MOVQ (TLS), CX
0x0009 00009 (escape.go:7) CMPQ SP, 16(CX)
0x000d 00013 (escape.go:7) JLS 96
0x000f 00015 (escape.go:7) SUBQ $24, SP
0x0013 00019 (escape.go:7) MOVQ BP, 16(SP)
0x0018 00024 (escape.go:7) LEAQ 16(SP), BP
0x001d 00029 (escape.go:7) FUNCDATA $0, gclocals·9fb7f0986f647f17cb53dda1484e0f7a(SB)
0x001d 00029 (escape.go:7) FUNCDATA $1, gclocals·69c1753bd5f81501d95132d08af04464(SB)
0x001d 00029 (escape.go:7) FUNCDATA $2, gclocals·9fb7f0986f647f17cb53dda1484e0f7a(SB)
0x001d 00029 (escape.go:12) PCDATA $0, $1
0x001d 00029 (escape.go:12) PCDATA $1, $0
0x001d 00029 (escape.go:12) LEAQ type."".TestStruct(SB), AX
0x0024 00036 (escape.go:12) PCDATA $0, $0
0x0024 00036 (escape.go:12) MOVQ AX, (SP)
0x0028 00040 (escape.go:12) CALL runtime.newobject(SB)
0x002d 00045 (escape.go:12) PCDATA $0, $1
0x002d 00045 (escape.go:12) MOVQ 8(SP), AX
0x0032 00050 (escape.go:9) MOVQ $1, (AX)
0x0039 00057 (escape.go:10) MOVQ $2, 8(AX)
0x0041 00065 (escape.go:11) MOVQ $3, 16(AX)
0x0049 00073 (escape.go:12) MOVQ $4, 24(AX)
0x0051 00081 (escape.go:8) PCDATA $0, $0
0x0051 00081 (escape.go:8) PCDATA $1, $1
0x0051 00081 (escape.go:8) MOVQ AX, "".~r0+32(SP)
0x0056 00086 (escape.go:8) MOVQ 16(SP), BP
0x005b 00091 (escape.go:8) ADDQ $24, SP
0x005f 00095 (escape.go:8) RET
0x0060 00096 (escape.go:8) NOP
0x0060 00096 (escape.go:7) PCDATA $1, $-1
0x0060 00096 (escape.go:7) PCDATA $0, $-1
0x0060 00096 (escape.go:7) CALL runtime.morestack_noctxt(SB)
0x0065 00101 (escape.go:7) JMP 0
0x0000 64 48 8b 0c 25 00 00 00 00 48 3b 61 10 76 51 48 dH..%....H;a.vQH
0x0010 83 ec 18 48 89 6c 24 10 48 8d 6c 24 10 48 8d 05 ...H.l$.H.l$.H..
0x0020 00 00 00 00 48 89 04 24 e8 00 00 00 00 48 8b 44 ....H..$.....H.D
0x0030 24 08 48 c7 00 01 00 00 00 48 c7 40 08 02 00 00 $.H......H.@....
0x0040 00 48 c7 40 10 03 00 00 00 48 c7 40 18 04 00 00 .H.@.....H.@....
0x0050 00 48 89 44 24 20 48 8b 6c 24 10 48 83 c4 18 c3 .H.D$ H.l$.H....
0x0060 e8 00 00 00 00 eb 99 .......
rel 5+4 t=16 TLS+0
rel 32+4 t=15 type."".TestStruct+0
rel 41+4 t=8 runtime.newobject+0
rel 97+4 t=8 runtime.morestack_noctxt+0
--------StackNewStruct-------------
"".StackNewStruct STEXT nosplit size=37 args=0x20 locals=0x0
0x0000 00000 (escape.go:16) TEXT "".StackNewStruct(SB), NOSPLIT|ABIInternal, $0-32
0x0000 00000 (escape.go:16) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
0x0000 00000 (escape.go:16) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
0x0000 00000 (escape.go:16) FUNCDATA $2, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
0x0000 00000 (escape.go:17) PCDATA $0, $0
0x0000 00000 (escape.go:17) PCDATA $1, $0
0x0000 00000 (escape.go:17) MOVQ $1, "".~r0+8(SP)
0x0009 00009 (escape.go:17) MOVQ $2, "".~r0+16(SP)
0x0012 00018 (escape.go:17) MOVQ $3, "".~r0+24(SP)
0x001b 00027 (escape.go:17) MOVQ $4, "".~r0+32(SP)
0x0024 00036 (escape.go:17) RET
0x0000 48 c7 44 24 08 01 00 00 00 48 c7 44 24 10 02 00 H.D$.....H.D$...
0x0010 00 00 48 c7 44 24 18 03 00 00 00 48 c7 44 24 20 ..H.D$.....H.D$
0x0020 04 00 00 00 c3 .....
Comparing the assembly generated for the two functions, we find some interesting things:
- First: the assembly generated for NewTestStruct is much longer than that for StackNewStruct.
- Second: runtime.newobject is called in the assembly of the first function, which tells us that something is allocated on the heap.
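The source of StackNewStruct is not shown in the listing above. The sketch below is a reconstruction, assuming it returns TestStruct by value (which would match the 32-byte result and the absence of runtime.newobject in its assembly). It also uses testing.AllocsPerRun from the standard library to count heap allocations per call. The //go:noinline directives, the sink variables and the countAllocs helper are additions for this sketch, there only to stop the compiler from optimising the calls away.

```go
package main

import (
	"fmt"
	"testing"
)

type TestStruct struct {
	a, b, c, d int
}

// NewTestStruct returns a pointer, so the struct must outlive the call
// and is allocated on the heap.
//
//go:noinline
func NewTestStruct() *TestStruct {
	return &TestStruct{a: 1, b: 2, c: 3, d: 4}
}

// StackNewStruct is a reconstruction (not taken from the original source):
// it returns the struct by value, so no heap allocation is needed.
//
//go:noinline
func StackNewStruct() TestStruct {
	return TestStruct{a: 1, b: 2, c: 3, d: 4}
}

// Package-level sinks keep the results alive so the calls are not discarded.
var (
	sinkPtr *TestStruct
	sinkVal TestStruct
)

// countAllocs reports the average number of heap allocations per call
// for each of the two constructors.
func countAllocs() (heap, stack float64) {
	heap = testing.AllocsPerRun(1000, func() { sinkPtr = NewTestStruct() })
	stack = testing.AllocsPerRun(1000, func() { sinkVal = StackNewStruct() })
	return heap, stack
}

func main() {
	heap, stack := countAllocs()
	fmt.Printf("NewTestStruct: %v allocs/call, StackNewStruct: %v allocs/call\n", heap, stack)
}
```

With the standard gc toolchain, the pointer-returning version is expected to report one allocation per call and the by-value version none.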
The compiler can also do the opposite: move something that would be assumed to be allocated on the heap onto the stack. Let's see another example:
func Sum() int {
	numbers := make([]int, 10)
	return len(numbers) // use numbers so the build does not fail with "declared and not used"
}
func main() {
	answer := Sum()
	fmt.Println(answer) // needs import "fmt" at the top of escape.go
}
Sum makes an int slice that is used only inside the function, so nothing outside can observe it. The compiler arranges to store the 10 integers of that slice on the stack rather than on the heap.
$ go build -gcflags=-m escape.go
# command-line-arguments
./escape.go:28:6: can inline Sum
./escape.go:45:15: inlining call to Sum
./escape.go:46:13: inlining call to fmt.Println
./escape.go:29:17: Sum make([]int, 10) does not escape
./escape.go:45:15: main make([]int, 10) does not escape
./escape.go:46:13: answer escapes to heap
./escape.go:46:13: main []interface {} literal does not escape
./escape.go:46:13: io.Writer(os.Stdout) escapes to heap
<autogenerated>:1: (*File).close .this does not escape
The line "Sum make([]int, 10) does not escape" tells us that the compiler has detected that the slice does not escape.
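For contrast, here is a minimal sketch of the same slice in a function that lets it escape. The names SumLocal and Numbers are hypothetical and not part of escape.go. Building this with -gcflags=-m should report the slice in Numbers as escaping, because the caller receives it, while the one in SumLocal should not.

```go
package main

import "fmt"

// SumLocal uses the slice only inside the function, so the backing array
// can stay on the stack.
func SumLocal() int {
	numbers := make([]int, 10)
	sum := 0
	for i := range numbers {
		numbers[i] = i + 1
		sum += numbers[i]
	}
	return sum
}

// Numbers returns the slice, so the backing array must outlive the call
// and is expected to be allocated on the heap. The noinline directive
// keeps the escape decision local to this function.
//
//go:noinline
func Numbers() []int {
	numbers := make([]int, 10)
	for i := range numbers {
		numbers[i] = i + 1
	}
	return numbers
}

func main() {
	fmt.Println(SumLocal())     // sum of 1..10
	fmt.Println(len(Numbers())) // length of the returned slice
}
```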
3. inlining
3.1 why inlining?
Function calls have a fixed overhead, and inlining is the classical optimisation that avoids it. Inlining is aimed at leaf functions, that is, functions that do their own work and do not call other functions. Be careful: heavy inlining can make stack traces harder to follow.
3.2 inlining in action:
See the following example:
func Max(a, b int) int {
	if a > b {
		return a
	}
	return b
}
func F() {
	const a, b = 100, 20
	if Max(a, b) == b {
		panic(b)
	}
}
Use -gcflags=-m to view the compiler's optimisation decisions:
$ go build -gcflags=-m inline.go
# command-line-arguments
./inline.go:3:6: can inline Max
./inline.go:10:6: can inline F
./inline.go:12:8: inlining call to Max
- inline.go:3:6: Max can be inlined.
- inline.go:10:6: F itself can be inlined.
- inline.go:12:8: the body of Max has been inlined into F.
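To compare against a call that is not inlined, one option is the //go:noinline directive. The sketch below is an illustration only; MaxNoInline is a hypothetical name introduced here. Building it with -gcflags=-m should list Max as inlinable, while MaxNoInline should not be reported as inlinable.

```go
package main

import "fmt"

// Max is small enough that the compiler is expected to inline it.
func Max(a, b int) int {
	if a > b {
		return a
	}
	return b
}

// MaxNoInline has the same body, but the directive asks the compiler
// to always emit a real function call.
//
//go:noinline
func MaxNoInline(a, b int) int {
	if a > b {
		return a
	}
	return b
}

func main() {
	fmt.Println(Max(100, 20), MaxNoInline(100, 20))
}
```

Both functions return the same results; only the generated code differs, which can be checked with the -S flag.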
4. dead code elimination
Borrowing the example from the previous section, let's see how the compiler performs dead code elimination.
Original code:
func Max(a, b int) int {
	if a > b {
		return a
	}
	return b
}
func F() {
	const a, b = 100, 20
	if Max(a, b) == b {
		panic(b)
	}
}
After Max is inlined into F, F is effectively:
func F() {
	const a, b = 100, 20
	var result int
	if a > b {
		result = a
	} else {
		result = b
	}
	if result == b {
		panic(b)
	}
}
Because a and b are constants, the compiler knows their values at compile time, so the result of a > b is known as well. F can be simplified further to:
func F() {
	const a, b = 100, 20
	var result int
	if true {
		result = a
	} else {
		result = b
	}
	if result == b {
		panic(b)
	}
}
Now the compiler knows that the else branch can never be reached, so it eliminates that branch:
func F() {
	const a, b = 100, 20
	var result int
	result = a
	if result == b {
		panic(b)
	}
}
The condition result == b is now known to be false, so the panic can never be reached. Eliminating it gives:
func F() {
	const a, b = 100, 20
	var result = a
}
Finally, since result is never used, F becomes:
func F() {
}
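A common way to put this to use is a constant debug switch. The sketch below is illustrative (the names debug and process are made up for this example): because debug is a constant false, the guarded branch is expected to be removed at compile time, in the same way as the unreachable branches of F.

```go
package main

import "fmt"

// debug is a compile-time constant; setting it to true re-enables the logging.
const debug = false

// process sums xs. The "if debug" branch is dead code while debug is false.
func process(xs []int) int {
	total := 0
	for _, x := range xs {
		if debug {
			fmt.Println("adding", x)
		}
		total += x
	}
	return total
}

func main() {
	fmt.Println(process([]int{1, 2, 3}))
}
```

Using a constant rather than a variable matters here: the value of a package-level variable is not known at compile time, so the compiler would have to keep the branch.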
5. prove pass
Consider the following snippet:
func test(x uint32) bool {
	if x < 5 {
		if x < 10 {
			return true
		}
		panic("x not less 10")
	}
	return false
}
If x is less than 5, then x must also be less than 10, so the inner comparison is always true. We can ask the compiler to show the working of the prove pass:
$ go build -gcflags=-d=ssa/prove/debug escape.go
# command-line-arguments
./escape.go:39:10: Proved Less32U
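The same kind of reasoning is what allows redundant slice bounds checks to be dropped. In this sketch (sum4 is a hypothetical function, not part of escape.go), the early access to b[3] establishes that len(b) is greater than 3, so the later index expressions should be provable as in range. Passing -gcflags=-d=ssa/check_bce/debug=1 prints the bounds checks that remain.

```go
package main

import "fmt"

// sum4 adds the first four elements of b. It panics if len(b) < 4.
func sum4(b []int) int {
	_ = b[3] // single bounds check; lets the compiler prove indexes 0..3 are valid below
	return b[0] + b[1] + b[2] + b[3]
}

func main() {
	fmt.Println(sum4([]int{1, 2, 3, 4}))
}
```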
6. review of compiler flags
Compiler flags are provided with:
go build -gcflags=$FLAGS
Investigate the operation of the following compiler flags:
- -S prints the (Go flavoured) assembly of the package being compiled.
- -l controls the behaviour of the inliner; -l disables inlining, -l -l increases it (more -l's increase the compiler's appetite for inlining code). Experiment with the difference in compile time, program size, and run time.
- -m controls printing of optimisation decisions such as inlining and escape analysis. -m -m prints more details about what the compiler was thinking.
- -l -N disables all optimisations.
- -d=ssa/prove/debug=on prints the working of the prove pass; it also takes values of 2 and above, see what they print.
- The -d flag takes other values; you can find out what they are with the command go tool compile -d help. Experiment and see what you can discover.