Golang's Killer Tool for Performance Profiling: PProf

 

After writing tons of code and implementing hundreds of interfaces, you finally managed to deploy your application successfully. However, you soon discover that the performance is not up to par. What a nightmare! 😭

This article was first published in the Medium MPP plan. If you are a Medium user, please follow me on Medium. Thank you very much.

The Need for Performance Analysis

Introducing PProf

To optimize performance, the first thing to focus on is the toolchain provided by Go itself. In this article, we will explore and utilize the powerful features of Go’s performance profiling tool, PProf. It covers the following areas:

  • runtime/pprof: Collects runtime data of non-server programs for analysis
  • net/http/pprof: Collects runtime data of HTTP servers for analysis

What is PProf?

PProf is a tool used for visualizing and analyzing performance profiling data. It reads a collection of analysis samples in the profile.proto format and generates reports to visualize and analyze the data (supports both text and graphical reports).

The profile.proto file is a Protocol Buffer v3 description that represents a set of sampled call stacks together with symbolization information, suitable for statistical analysis. It is a common interchange format for stack trace profiles.

Supported Usage Modes

  • Report generation: Generates reports
  • Interactive terminal use: Supports interactive terminal-based usage
  • Web interface: Provides a web-based interface

What Can You Do with PProf?

  • CPU Profiling: Collects CPU (including registers) usage of the monitored application at a certain frequency. It helps identify the time spent by the application in actively consuming CPU cycles.
  • Memory Profiling: Records stack traces when heap allocations occur in the application. It monitors current and historical memory usage and helps detect memory leaks.
  • Block Profiling: Records the locations where goroutines block and wait for synchronization (including timer channels).
  • Mutex Profiling: Reports the competition status of mutexes.

A Simple Example

Let’s start with a simple example that has some performance issues. This will serve as a basic demonstration of program analysis.

Writing the Demo Files

  1. Create a file named demo.go with the following content:
package main

import (
    "log"
    "net/http"
    _ "net/http/pprof"
    "github.com/EDDYCJY/go-pprof-example/data"
)

func main() {
    go func() {
        for {
            log.Println(data.Add("https://github.com/EDDYCJY"))
        }
    }()

    http.ListenAndServe("0.0.0.0:6060", nil)
}
  2. Create a file named data/d.go with the following content:
package data

var datas []string

func Add(str string) string {
    data := []byte(str)
    sData := string(data)
    datas = append(datas, sData)

    return sData
}

When you run this program, the HTTP server exposes an additional endpoint group under /debug/pprof for observing the application's state.

Analysis

1. Using the Web Interface

To view the current overview, visit http://127.0.0.1:6060/debug/pprof/.

/debug/pprof/
profiles:
0    block
5    goroutine
3    heap
0    mutex
9    threadcreate

full goroutine stack dump

This page contains several subpages. Let’s dive deeper to see what we can find:

  • cpu (CPU Profiling): $HOST/debug/pprof/profile. This performs CPU profiling for 30 seconds by default and generates a profile file for analysis.
  • block (Block Profiling): $HOST/debug/pprof/block. This shows the stack traces causing blocking synchronization.
  • goroutine: $HOST/debug/pprof/goroutine. This displays the stack traces of all currently running goroutines.
  • heap (Memory Profiling): $HOST/debug/pprof/heap. This shows the memory allocation of active objects.
  • mutex (Mutex Profiling): $HOST/debug/pprof/mutex. This displays the stack traces of mutex contention.

2. Using the Interactive Terminal

  1. Execute the following command: go tool pprof http://localhost:6060/debug/pprof/profile?seconds=60.
$ go tool pprof http://localhost:6060/debug/pprof/profile\?seconds\=60
Fetching profile over HTTP from http://localhost:6060/debug/pprof/profile?seconds=60
Saved profile in /Users/eddycjy/pprof/pprof.samples.cpu.007.pb.gz
Type: cpu
Duration: 1mins, Total samples = 26.55s (44.15%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) 

After executing this command, wait for 60 seconds (you can adjust the seconds parameter). PProf performs CPU profiling during this time. Once finished, it enters interactive mode, where you can view or export the analysis results. For a list of available commands, type help at the (pprof) prompt.

(pprof) top10
Showing nodes accounting for 25.92s, 97.63% of 26.55s total
Dropped 85 nodes (cum <= 0.13s)
Showing top 10 nodes out of 21
      flat  flat%   sum%        cum   cum%
    23.28s 87.68% 87.68%     23.29s 87.72%  syscall.Syscall
     0.77s  2.90% 90.58%      0.77s  2.90%  runtime.memmove
     0.58s  2.18% 92.77%      0.58s  2.18%  runtime.freedefer
     0.53s  2.00% 94.76%      1.42s  5.35%  runtime.scanobject
     0.36s  1.36% 96.12%      0.39s  1.47%  runtime.heapBitsForObject
     0.35s  1.32% 97.44%      0.45s  1.69%  runtime.greyobject
     0.02s 0.075% 97.51%     24.96s 94.01%  main.main.func1
     0.01s 0.038% 97.55%     23.91s 90.06%  os.(*File).Write
     0.01s 0.038% 97.59%      0.19s  0.72%  runtime.mallocgc
     0.01s 0.038% 97.63%     23.30s 87.76%  syscall.Write
  • flat: The time spent in the function itself, excluding time spent in its callees.
  • flat%: flat as a percentage of the total sampled time.
  • sum%: The running total of flat% for this row and all rows above it.
  • cum: The time spent in the function and everything it calls.
  • cum%: cum as a percentage of the total sampled time.

The last column represents the function names. In most cases, these five columns provide insights into the application’s runtime behavior, helping you optimize it. 🤔
  2. Execute the following command: go tool pprof http://localhost:6060/debug/pprof/heap.

$ go tool pprof http://localhost:6060/debug/pprof/heap
Fetching profile over HTTP from http://localhost:6060/debug/pprof/heap
Saved profile in /Users/eddycjy/pprof/pprof.alloc_objects.alloc_space.inuse_objects.inuse_space.008.pb.gz
Type: inuse_space
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 837.48MB, 100% of 837.48MB total
      flat  flat%   sum%        cum   cum%
  837.48MB   100%   100%   837.48MB   100%  main.main.func1
  • -inuse_space: Analyzes memory that is currently allocated and not yet freed (resident memory).
  • -alloc_objects: Analyzes every object allocated since the program started, including objects already freed; useful for spotting allocation churn.
  3. Execute the following command: go tool pprof http://localhost:6060/debug/pprof/block.
  4. Execute the following command: go tool pprof http://localhost:6060/debug/pprof/mutex.

3. PProf Visualization Interface

This is the exciting part! But before we proceed, we need to write a simple test case to run.

Writing the Test Case
  1. Create a file named data/d_test.go with the following content:
package data

import "testing"

const url = "https://github.com/EDDYCJY"

func TestAdd(t *testing.T) {
    s := Add(url)
    if s == "" {
        t.Errorf("Test.Add error!")
    }
}

func BenchmarkAdd(b *testing.B) {
    for i := 0; i < b.N; i++ {
        Add(url)
    }
}
  2. Run the test case:
$ go test -bench=. -cpuprofile=cpu.prof
pkg: github.com/EDDYCJY/go-pprof-example/data
BenchmarkAdd-4       10000000           187 ns/op
PASS
ok      github.com/EDDYCJY/go-pprof-example/data    2.300s

You can also explore -memprofile.

Launching the PProf Visualization Interface
Method 1:
$ go tool pprof -http=:8080 cpu.prof
Method 2:
$ go tool pprof cpu.prof 
(pprof) web

If you encounter the message “Could not execute dot; may need to install graphviz,” it means you need to install graphviz (please consult your favorite search engine).

Viewing the PProf Visualization Interface

When you open the PProf visualization interface, you will notice that it is more refined than the official toolchain’s PProf. Additionally, it includes a Flame Graph.

The Flame Graph is the highlight of this section. It is a dynamic visualization where the call sequence is represented from top to bottom (A -> B -> C -> D). Each block represents a function, and the larger the block, the more CPU time it consumes. It also supports drill-down analysis by clicking on the blocks!

Conclusion

In this article, we provided a brief introduction to PProf, the performance profiling tool in Go. PProf is very helpful in locating and analyzing performance issues in specific scenarios.
We hope this article has been helpful to you. We encourage you to try it out yourself and delve deeper into the various features and knowledge points it offers.

Thought Questions

Congratulations on making it to the end! Here are two simple thought questions to expand your thinking:

  1. Is flat always greater than cum? Why? In what scenarios would cum be greater than flat?
  2. What performance issues can you identify in the demo code provided in this article? How would you address them?

Now it’s your turn to share your thoughts!

Last updated on Jun 29, 2024 20:50 CST