
How to Analyze Go Program Performance with Pprof

Mastering PProf helps you write more efficient Go code.

 

Introduction

After writing tons of code and implementing hundreds of interfaces, you finally managed to deploy your application successfully. However, you soon discover that the performance is not up to par. What a nightmare! 😭

This article was first published as part of the Medium MPP plan. If you are a Medium user, please follow me on Medium. Thank you very much.

Why We Need Performance Analysis


Introducing PProf

To optimize performance, the first thing to focus on is the toolchain provided by Go itself. In this article, we will explore and utilize the powerful features of Go’s performance profiling tool, PProf. It covers the following areas:

  • runtime/pprof: Collects runtime data of non-server (standalone) programs for analysis (a minimal usage sketch follows this list)
  • net/http/pprof: Collects runtime data of a running HTTP server for analysis
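
Since the rest of this article focuses on net/http/pprof, here is a minimal, purely illustrative sketch of the runtime/pprof flow for a standalone program. The file names cpu.prof and mem.prof and the doWork function are placeholders invented for this sketch:

package main

import (
    "log"
    "os"
    "runtime/pprof"
)

func main() {
    // CPU profiling: start writing samples to a file and stop before exiting.
    cpuFile, err := os.Create("cpu.prof") // illustrative file name
    if err != nil {
        log.Fatal(err)
    }
    defer cpuFile.Close()
    if err := pprof.StartCPUProfile(cpuFile); err != nil {
        log.Fatal(err)
    }
    defer pprof.StopCPUProfile()

    doWork() // the code you actually want to measure

    // Heap profiling: write a snapshot of the current allocations.
    memFile, err := os.Create("mem.prof") // illustrative file name
    if err != nil {
        log.Fatal(err)
    }
    defer memFile.Close()
    if err := pprof.WriteHeapProfile(memFile); err != nil {
        log.Fatal(err)
    }
}

// doWork stands in for whatever workload you want to profile.
func doWork() {
    s := make([]int, 0)
    for i := 0; i < 1_000_000; i++ {
        s = append(s, i)
    }
    _ = s
}

The resulting files can then be opened with go tool pprof cpu.prof, just like the profiles fetched over HTTP later in this article.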

What is PProf?

PProf is a tool used for visualizing and analyzing performance profiling data. It reads a collection of analysis samples in the profile.proto format and generates reports to visualize and analyze the data (supports both text and graphical reports).

The profile.proto file is a Protocol Buffer v3 description that captures a set of call stacks and symbolization information. It represents a collection of sampled call stacks for statistical analysis and serves as a common interchange format for profiling data.

Supported Usage Modes

  • Report generation: Produces analysis reports in text or graphical form
  • Interactive terminal use: Lets you explore a profile interactively from the command line
  • Web interface: Serves the reports, graphs, and flame graphs through a local web UI

What Can You Do with PProf?

  • CPU Profiling: Samples the CPU usage (including registers) of the monitored application at a fixed frequency, helping you identify where the application actively burns CPU cycles.
  • Memory Profiling: Records a stack trace when a heap allocation occurs, so you can monitor current and historical memory usage and detect memory leaks.
  • Block Profiling: Records the locations where goroutines block while waiting on synchronization primitives (including timer channels).
  • Mutex Profiling: Reports contention on mutexes (these profiles can also be dumped from code, as shown in the sketch after this list).
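
Apart from CPU profiling, the profile types above are also exposed as named profiles in runtime/pprof, so they can be dumped without an HTTP server. The following is only an illustrative sketch that writes two of them to standard output:

package main

import (
    "os"
    "runtime/pprof"
)

func main() {
    // Dump the stack traces of all current goroutines in human-readable form.
    // A debug value of 0 would emit the binary profile.proto format instead.
    pprof.Lookup("goroutine").WriteTo(os.Stdout, 1)

    // Dump the current heap profile the same way.
    pprof.Lookup("heap").WriteTo(os.Stdout, 1)

    // "threadcreate", "block" and "mutex" are also valid profile names here,
    // although block and mutex stay empty until their samplers are enabled,
    // as covered later in this article.
}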

A Simple Example

Let’s start with a simple example that has some performance issues. This will serve as a basic demonstration of program analysis.

Writing the Demo Files

  1. Create a file named demo.go with the following content:
package main

import (
    "log"
    "net/http"
    _ "net/http/pprof"

    "github.com/EDDYCJY/go-pprof-example/data"
)

func main() {
    go func() {
        for {
            log.Println(data.Add("https://github.com/EDDYCJY"))
        }
    }()

    http.ListenAndServe("0.0.0.0:6060", nil)
}
  2. Create a file named data/d.go with the following content:
package data

var datas []string

func Add(str string) string {
    data := []byte(str)
    sData := string(data)
    datas = append(datas, sData)

    return sData
}

When you run this program, the HTTP server exposes an additional set of endpoints under /debug/pprof/ for observing the application's state.
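
The blank import _ "net/http/pprof" works by registering its handlers on http.DefaultServeMux in its init function, which is why passing nil to http.ListenAndServe is enough. If your service uses its own router, you can wire the same handlers onto a custom mux; the sketch below mirrors the demo's address and is only meant as an illustration:

package main

import (
    "net/http"
    "net/http/pprof"
)

func main() {
    mux := http.NewServeMux()

    // Register the pprof handlers explicitly instead of relying on DefaultServeMux.
    mux.HandleFunc("/debug/pprof/", pprof.Index)
    mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
    mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
    mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
    mux.HandleFunc("/debug/pprof/trace", pprof.Trace)

    http.ListenAndServe("0.0.0.0:6060", mux)
}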

Analysis

1. Using the Web Interface

To view the current overview, visit http://127.0.0.1:6060/debug/pprof/ .

/debug/pprof/
 profiles:
 0    block
 5    goroutine
 3    heap
 0    mutex
 9    threadcreate
 full goroutine stack dump

This page contains several subpages. Let’s dive deeper to see what we can find:

  • cpu (CPU Profiling): $HOST/debug/pprof/profile. Performs CPU profiling for 30 seconds by default and returns a profile file for analysis.
  • block (Block Profiling): $HOST/debug/pprof/block. Shows the stack traces where goroutines blocked on synchronization primitives.
  • goroutine: $HOST/debug/pprof/goroutine. Displays the stack traces of all current goroutines.
  • heap (Memory Profiling): $HOST/debug/pprof/heap. Shows the heap allocations of live objects.
  • mutex (Mutex Profiling): $HOST/debug/pprof/mutex. Displays the stack traces of mutex contention.
  • threadcreate: $HOST/debug/pprof/threadcreate. Displays the stack traces that led to the creation of new OS threads.

2. Using the Interactive Terminal

  1. Execute the following command: go tool pprof http://localhost:6060/debug/pprof/profile?seconds=60 .
$ go tool pprof http://localhost:6060/debug/pprof/profile\?seconds\=60
 Fetching profile over HTTP from http://localhost:6060/debug/pprof/profile?seconds=60
 Saved profile in /Users/eddycjy/pprof/pprof.samples.cpu.007.pb.gz
 Type: cpu
 Duration: 1mins, Total samples = 26.55s (44.15%)
 Entering interactive mode (type "help" for commands, "o" for options)
 (pprof)

After executing this command, wait 60 seconds (you can adjust the seconds parameter); PProf performs CPU profiling during this window. Once it finishes, it drops into interactive command mode, where you can view or export the analysis results. For a list of available commands, type help at the (pprof) prompt.

(pprof) top10
 Showing nodes accounting for 25.92s, 97.63% of 26.55s total
 Dropped 85 nodes (cum <= 0.13s)
 Showing top 10 nodes out of 21
       flat  flat%   sum%        cum   cum%
     23.28s 87.68% 87.68%     23.29s 87.72%  syscall.Syscall
      0.77s  2.90% 90.58%      0.77s  2.90%  runtime.memmove
      0.58s  2.18% 92.77%      0.58s  2.18%  runtime.freedefer
      0.53s  2.00% 94.76%      1.42s  5.35%  runtime.scanobject
      0.36s  1.36% 96.12%      0.39s  1.47%  runtime.heapBitsForObject
      0.35s  1.32% 97.44%      0.45s  1.69%  runtime.greyobject
      0.02s 0.075% 97.51%     24.96s 94.01%  main.main.func1
      0.01s 0.038% 97.55%     23.91s 90.06%  os.(*File).Write
      0.01s 0.038% 97.59%      0.19s  0.72%  runtime.mallocgc
      0.01s 0.038% 97.63%     23.30s 87.76%  syscall.Write
  • flat: The time spent in the function itself, excluding time spent in the functions it calls.
  • flat%: flat expressed as a percentage of the total sample time.
  • sum%: The running total of flat% for the current row and all rows above it.
  • cum: The total time spent in the function and everything it calls.
  • cum%: cum expressed as a percentage of the total sample time.

The last column represents the function names. In most cases, these five columns provide insights into the application’s runtime behavior, helping you optimize it. 🤔
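
As a concrete, purely hypothetical illustration of flat versus cum, consider profiling the two functions below: child burns CPU directly, so its flat and cum values would be close, while parent does almost nothing itself, so its flat would be tiny even though its cum includes all of child's time.

package main

// parent does almost no work of its own, so its flat time is tiny,
// but its cum time includes everything spent inside child.
func parent() int {
    return child() + 1
}

// child consumes CPU directly, so its flat and cum times are nearly equal.
func child() int {
    sum := 0
    for i := 0; i < 100_000_000; i++ {
        sum += i
    }
    return sum
}

func main() {
    _ = parent()
}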

  2. Execute the following command: go tool pprof http://localhost:6060/debug/pprof/heap .
$ go tool pprof http://localhost:6060/debug/pprof/heap
 Fetching profile over HTTP from http://localhost:6060/debug/pprof/heap
 Saved profile in /Users/eddycjy/pprof/pprof.alloc_objects.alloc_space.inuse_objects.inuse_space.008.pb.gz
 Type: inuse_space
 Entering interactive mode (type "help" for commands, "o" for options)
 (pprof) top
 Showing nodes accounting for 837.48MB, 100% of 837.48MB total
       flat  flat%   sum%        cum   cum%
   837.48MB   100%   100%   837.48MB   100%  main.main.func1
  • -inuse_space: Analyzes the memory currently held (resident) by the application.
  • -alloc_objects: Analyzes every allocation made over the program's lifetime, including short-lived objects that have already been freed (see the sketch below).
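
To make the difference between the two views concrete, here is a small hypothetical program (the names keep and scratch are invented for illustration): the memory retained by keep shows up under inuse_space, while the short-lived allocations in scratch only accumulate under the alloc_objects/alloc_space views once they have been garbage collected.

package main

var retained [][]byte

// keep allocates memory that stays reachable, so it contributes to inuse_space.
func keep() {
    buf := make([]byte, 1<<20) // 1 MiB that is never released
    retained = append(retained, buf)
}

// scratch allocates memory that becomes garbage almost immediately, so over time
// it shows up under alloc_objects/alloc_space but not under inuse_space.
func scratch() {
    buf := make([]byte, 1<<20)
    _ = buf
}

func main() {
    for i := 0; i < 100; i++ {
        keep()
        scratch()
    }
}
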
  3. Execute the following command: go tool pprof http://localhost:6060/debug/pprof/block
  4. Execute the following command: go tool pprof http://localhost:6060/debug/pprof/mutex (see the note below for enabling these profiles)
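
Note that both of these profiles are empty by default: the runtime only records block and mutex events after you opt in from code. Below is a minimal sketch of how a main function like the demo's could enable them; the rates of 1 are chosen for illustration, and production code usually samples less aggressively:

package main

import (
    "net/http"
    _ "net/http/pprof"
    "runtime"
)

func main() {
    // A rate of 1 records every blocking event; larger values sample
    // less often and add less overhead.
    runtime.SetBlockProfileRate(1)

    // Report roughly 1 out of every N mutex contention events (here N = 1).
    runtime.SetMutexProfileFraction(1)

    http.ListenAndServe("0.0.0.0:6060", nil)
}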

3. PProf Visualization Interface

This is the exciting part! But before we proceed, we need to write a simple test case to run.

Writing the Test Case

  1. Create a file named data/d_test.go with the following content:
package data

import "testing"

const url = "https://github.com/EDDYCJY"

func TestAdd(t *testing.T) {
    s := Add(url)
    if s == "" {
        t.Errorf("Test.Add error!")
    }
}

func BenchmarkAdd(b *testing.B) {
    for i := 0; i < b.N; i++ {
        Add(url)
    }
}
  2. Run the test case:
$ go test -bench=. -cpuprofile=cpu.prof
 pkg: github.com/EDDYCJY/go-pprof-example/data
 BenchmarkAdd-4       10000000           187 ns/op
 PASS
 ok      github.com/EDDYCJY/go-pprof-example/data    2.300s

You can also explore -memprofile in the same way (for example, go test -bench=. -memprofile=mem.prof).
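
If allocations are your main concern, the testing package can also report them directly, without a separate profile. Below is a hedged sketch of an extra benchmark that could be added to data/d_test.go; the name BenchmarkAddReportAllocs is purely illustrative:

// BenchmarkAddReportAllocs is the same benchmark as above, but calling
// b.ReportAllocs() makes "go test -bench=." print allocs/op and B/op
// alongside ns/op for this benchmark.
func BenchmarkAddReportAllocs(b *testing.B) {
    b.ReportAllocs()
    for i := 0; i < b.N; i++ {
        Add(url)
    }
}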

Launching the PProf Visualization Interface

Method 1:

$ go tool pprof -http=:8080 cpu.prof

Method 2:

$ go tool pprof cpu.prof
(pprof) web

If you encounter the message “Could not execute dot; may need to install graphviz,” it means you need to install graphviz (please consult your favorite search engine).

Viewing the PProf Visualization Interface

When you open the PProf visualization interface, you will notice that it is considerably more polished than the terminal-based view. It also includes a flame graph.

The flame graph is the highlight of this section. It is a dynamic visualization in which the call sequence runs from top to bottom (A -> B -> C -> D). Each block represents a function, and the wider the block, the more CPU time it consumes. It also supports drill-down analysis: just click on a block!

Conclusion

In this article, we provided a brief introduction to PProf, the performance profiling tool in Go.
PProf offers great assistance in locating and analyzing performance issues in specific scenarios.
We hope this article has been helpful to you. We encourage you to try it out yourself and delve deeper into the various features and knowledge points it offers.

Thought Questions

Congratulations on making it to the end! Here are two simple thought questions to expand your thinking:

  1. Is flat always greater than cum? Why? In what scenarios would cum be greater than flat?
  2. What performance issues can you identify in the demo code provided in this article? How would you address them?
Now it's your turn to share your thoughts!