Benchmark Your Code

when you want know the throughput of a database, you do benchmark; when you want to know the delay time of net package, you do benchmark; when you want to know the performance of a function write in some language, you do benchmark; maybe when god was creating this world, he/she did so much benchmarking.:) now let's see how to do benchmark in golang.

1. prepare benchmark environment

the mechine must be idle;
be careful with power saving and thermal scaling;
do not use virtual mechine and shared cloud hosting;

2. benchmarking using golang testing package:

let's use a simple function for example, the following Fib function is raw and too badlly slow!

benching.go:

func Fib2(n int) int {
	switch n {
	case 0:
		return 0
	case 1:
		return 1
	default:
		return Fib2(n-1) + Fib2(n-2)
	}
}

benching_test.go:

func BenchmarkFib20(b *testing.B) {
	for n := 0; n < b.N; n++ {
		Fib2(20)
	}
}

func BenchmarkFib28(b *testing.B) {
	for n := 0; n < b.N; n++ {
		Fib2(28)
	}
}

run go test command to do the real benching work:

    $go test -bench=. .

the privious command run all the benching functions under current directory. you can also run specific benching functions:

    $go test -bench=Fib20 .

the privious command run specific benching function which name contain “Fib20”;
Tips: if you use emacs do go development, try add the following code snippet into your emacs config file:

(defun get-word-on-point()
  (interactive)
  (let ((word (thing-at-point 'word 'no-properties)))
	word))

(defun benching-at-point()
  ;;benching the function at point
  (interactive)
  (let* ((curr-func-name (get-word-on-point))
		 (bench-cmd (concat "go test -bench=" curr-func-name " .")))
	(shell-command bench-cmd)))

(defun benching-all ()
  ;;benching all the benchmark function in current file
  (interactive)
  (let ((bench-cmd (concat "go test -bench=. .")))
	(shell-command bench-cmd)))

(defun benching-golang ()
  "running and testing current file"
  (local-set-key (kbd "C-c C-c C-b") 'benching-at-point)
  (local-set-key (kbd "C-c C-c C-a") 'benching-all))

(add-hook 'go-mode-hook 'benching-golang)

now, when you are in a benching file buffer, move your point to the benching function, then press”C-c C-c C-b”, and wait for the benching result to come. en!! fast and amazing!!!

3. how benchmarks work?

In the benching function, b.N start at 1. If the benched function completes in under 1 second–then b.N is increase approximately 20% and benched function running again.

goos: linux
goarch: amd64
BenchmarkFib20-8   	   46526	     25536 ns/op
BenchmarkFib28-8   	     939	   1204306 ns/op
PASS
ok  	_/home/hjiang/github/playground/golang/high_performance/benching	2.713s

In the benching result above, we can see almost 46000 loops took just over a second. and every loop took almost 25000ns.

4. how to increase benchmark accuracy?

Think the following scene, you have a function which took 0.5 second to finish. when you do benching using the default benching time – 1 second, the function only run two time. the average of this two runs may have a high standard deviation.

you can increase the benchmark time by the “-benchtime” flag:

go test -bench . -benchtime=10s .
goos: linux
goarch: amd64
BenchmarkFib20-8   	  467419	     25576 ns/op
BenchmarkFib28-8   	    9517	   1200492 ns/op
PASS
ok  	_/home/hjiang/github/playground/golang/high_performance/benching	23.771s

also, you can do any number of benching for the same function, using “-count”:

go test -bench Fib20 -count=10 .
goos: linux
goarch: amd64
BenchmarkFib20-8   	   45270	     25574 ns/op
BenchmarkFib20-8   	   45276	     25659 ns/op
BenchmarkFib20-8   	   46740	     25609 ns/op
BenchmarkFib20-8   	   45705	     25610 ns/op
BenchmarkFib20-8   	   46526	     25597 ns/op
BenchmarkFib20-8   	   46834	     25701 ns/op
BenchmarkFib20-8   	   46414	     25543 ns/op
BenchmarkFib20-8   	   46909	     25561 ns/op
BenchmarkFib20-8   	   46792	     25571 ns/opp
BenchmarkFib20-8   	   46754	     25475 ns/op
PASS
ok  	_/home/hjiang/github/playground/golang/high_performance/benching	14.483s

you can see the ns/op changed vary little each running. our banching is reliable! Also, you can use benchstat tool to tell how stable is your benching.

go test -bench Fib20 -count=10 . >> old.txt
benchstat old.txt

name     time/op
Fib20-8  25.6µs ± 0%

the benching is very stable.

5. improve function performance, then do benching again

we hard code another number from the fibonacci series, reduce the depth of each recusive call by one.

func Fib3(n int) int {
	switch n {
	case 0:
		return 0
	case 1:
		return 1
	case 2:
		return 1
	default:
		return Fib2(n-1) + Fib2(n-2)
	}
}

func BenchmarkFib20(b *testing.B) {
	for n := 0; n < b.N; n++ {
		Fib3(20)
	}
}

func BenchmarkFib28(b *testing.B) {
	for n := 0; n < b.N; n++ {
		Fib3(28)
	}
}

compare the new version with the old version with benchstat:

go test -bench Fib20 -count=10 . > new.txt

benchstat old.txt new.txt
name     old time/op  new time/op  delta
Fib20-8  39.2µs ± 0%  39.3µs ± 0%   ~     (p=0.424 n=10+10)

go test -bench Fib28 -count=10 . > new.txt
benchstat old.txt new.txt
name     old time/op  new time/op  delta
Fib28-8  1.84ms ± 0%  1.84ms ± 0%   ~     (p=0.870 n=10+10)

The recursion depth reducing does not have much effect on the performance! We re-implement the fibonacci series generate function with iterate method.

func Fib_Iter(n int) int {
	a := 0
	b := 1
	c := 0

	if n == 0 {
		return a
	}

	if n == 1 {
		return b
	}

	for i := 2; i <= n; i++ {
		c = b + a
		a = b
		b = c
	}

	return c
}

Use benchstat do compare:

benchstat old.txt new.txt

name     old time/op  new time/op  delta
Fib20-8  39.2µs ± 0%   0.0µs ± 1%  -99.97%  (p=0.000 n=9+10)

The iterate version is so much fast that the recur version.

6. avoid setup interference when benching

When need some setup before or in the middle of the benching, one can reset the benching timer:

once per run setup, use ResetTimer():

func BenchmarkExpensive(b *testing.B) {
        boringAndExpensiveSetup()
        b.ResetTimer() 
        for n := 0; n < b.N; n++ {
                // function under test
        }
}

per loop setup, use StopTimer() and StartTimer():

func BenchmarkComplicated(b *testing.B) {
        for n := 0; n < b.N; n++ {
                b.StopTimer() 
                complicatedSetup()
                b.StartTimer() 
                // function under test
        }
}

7. miscellaneous in benching

7.1 record the number of memory allocations:

func BenchmarkRead(b *testing.B) {
        b.ReportAllocs()
        for n := 0; n < b.N; n++ {
                // function under test
        }
}

7.2 profiling benchmarks:

-cpuprofile=$FILE writes a CPU profile to $FILE.
-memprofile=$FILE, writes a memory profile to $FILE, -memprofilerate=N adjusts the profile rate to 1/N.
-blockprofile=$FILE, writes a block profile to $FILE.

this the end of world now:) –> Do benching yourself!!!!