Iterative benchmarking demo

Setup

You'll need:

Go installed.
The internet.

Steps:

Open up a terminal/command prompt:
git clone https://github.com/jh125486/dd_benchmarking.git
go mod tidy

Feel free to open the Go code in your favorite editor, but for this demonstration, you can run all the commands in the shell/command prompt.

Background

To demonstrate the builtin go tool benchmarking, we'll be using some simple functions to compute all the Prime numbers up to a max int value. My point in this isn't to demonstrate Prime Number algorithms... merely Go benchmarking.

Testing first....

A test functions in Go start with the word Test, and takes *testing.T as the only parameter.

To test each of our Prime number functions, we have created a test function, which compares the output of the function to a known good list of all primes numbers under 1000... which should give us a good indication of it's correctness for our needs.

If we wrote the functions correctly, each should pass within a few milliseconds :)

go test ./...

First round: Naive basic

A benchmark function in Go starts with the word Benchmark and takes *testing.B as the only parameter. To run a benchmark, pass the bench flag to go test, along with the package to test. In our case, we're restricting each benchmark for clarity with -args.

Let's run Go's builtin benchmark on our naive Prime Number function:

go test -bench=. -args Basic

Pass a count arg to the go command to run the benchmark multiple times.

go test -bench=. -count 10 -args Basic

Second round: Naive with sub-benchmarks

For many algorithms, performance issues only are found on different "sets" of input... for example, this naive function has exponential growth which we can detect with larger and larger input values.

A way to detect this is with sub-benchmarks, e.g.:

var inputs = []int{
	100,
	...
	10000000,
}
for _, v := range inputs {
    // b.Run <- starts a sub-benchmark
    b.Run(fmt.Sprint("sub-", v), func(b *testing.B) {
        for i := 0; i < b.N; i++ {
            functionToBenchmark(v)
        }
    })
}

Run benchmarks with sub-benchmarks:

go test -bench=. -args Naive

Notice that the last sub-benchmark only completed one run (or not many runs at least)... this is because Go by default only runs the benchmark for 1s, so let's run that again with 10s by setting benchtime:

go test -bench=. -benchtime=10s -args Naive

Note: If your system is still having problems running that last su-benchmark multiple times, comment out or delete line main_test.go:23 to just disable that large input.

You can also use the NNNx format with benchtime, which will run the test NNN number of times instead.

We can also show memory allocations and total memory per operation with benchmem:

go test -bench=. -benchmem -args Naive

Third round: we can do better

There's faster algorithm for finding primes, that is surprising old (~2,300 years): Sieve of Eratosthenes.

Run benchmarks for Sieve of Eratosthenes (with sub-benchmarks):

go test -bench=. -args Eratos

This should feel faster... but without a direct comparison benchmark to benchmark, I can't be sure. So let's verify.

Gophers created a program to do just that, so let's install it:

go install golang.org/x/perf/cmd/benchstat@latest

Now let's re-run the benchmarks and save off the results:

go test -bench=. -count 10 -args Naive > naive_10.txt
go test -bench=. -count 10 -args Eratos > eratos_10.txt

benchstat can calculate differences with a confidence interval at level 0.95, so let's show generate that:

benchstat eratos_10.txt naive_10.txt

It should show an impressive speed up... so that's a great start.

Sidenote: benchstat can only compare the benchmarks if the test names are the same, which is the reason we have been running the same benchmark function, and just passing a command line arg to switch between the different "backing" algorithms.

Final round?: time to get concurrent

If you noticed during the last few benchmarks that the CPU really wasn't being used at all... that's because all modern CPUs have multiple cores and these algorithms are single-CPU bound.

So let's fix that. There's a number of concurrent Prime-finding algorithms, but we'll use the Sieve of Atkin because, well, it's already written for us here primegen.

Run a benchmark on the concurrent Sieve of Atkin and save off the results:

go test -bench=. -count 10 -args Atkin > atkin_10.txt

Check the results against both the non-concurrent Sieve of Eratosthenes, and the Naive version:

benchstat atkin_10.txt eratos_10.txt

You should notice that on the smaller input values that Atkin is actually slower. That's probably because the actual setup and coordination to handle the concurrent communication has some overhead, which overruns the performance gains on the low-end. Once the values start to increase, we should see that it performs ~500% faster on the high inputs.

Finishing up

There's plenty of other ways to discovery performance bottlenecks using the builtin tooling. I haven't covered profiling, which is through go tool pprof.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
samples		samples
LICENSE		LICENSE
README.md		README.md
go.mod		go.mod
go.sum		go.sum
main.go		main.go
main_test.go		main_test.go

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Iterative benchmarking demo

Setup

Background

Testing first....

First round: Naive basic

Second round: Naive with sub-benchmarks

Third round: we can do better

Final round?: time to get concurrent

Finishing up

Epilogue: Helpful links about Go benchmarking

About

Languages

License

jh125486/benchmarking_demo

Folders and files

Latest commit

History

Repository files navigation

Iterative benchmarking demo

Setup

Background

Testing first....

First round: Naive basic

Second round: Naive with sub-benchmarks

Third round: we can do better

Final round?: time to get concurrent

Finishing up

Epilogue: Helpful links about Go benchmarking

About

Topics

Resources

License

Stars

Watchers

Forks

Languages