After writing tons of code and implementing hundreds of interfaces, you finally managed to deploy your application. However, you soon discover that its performance could be better. What a nightmare!
This article was first published in the Medium MPP plan. If you are a Medium user, please follow me on Medium. Thank you very much.
The Need for Performance Analysis
Introducing PProf
To optimize performance, the first thing to focus on is the toolchain provided by Go itself. In this article, we will explore and utilize the powerful features of Go’s performance profiling tool, PProf. It covers the following areas:
- runtime/pprof: Collects runtime data of non-server programs for analysis
- net/http/pprof: Collects runtime data of HTTP servers for analysis
What is pprof?
pprof is a tool used to visualize and analyze performance profiling data. It reads a collection of profiling samples in the profile.proto format and generates reports to visualize and analyze the data (both text and graphical reports are supported).
The profile.proto file is a Protocol Buffers v3 descriptor that describes a set of call stacks and symbolization information. It represents a set of sampled call stacks for statistical analysis and is a common format for stack-trace profiles.
Supported Usage Modes
- Report generation: Generates reports
- Interactive terminal use: Supports interactive terminal-based usage
- Web interface: Provides a web-based interface
What Can You Do with pprof?
- CPU Profiling: Samples the CPU usage (including registers) of the monitored application at a fixed frequency. It helps identify where the application spends time actively consuming CPU cycles.
- Memory Profiling: Records stack traces when heap allocations occur in the application. It monitors current and historical memory usage and helps detect memory leaks.
- Block Profiling: Records the locations where goroutines block and wait for synchronization (including timer channels).
- Mutex Profiling: Reports where goroutines contend for mutexes.
A Simple Example
Let’s start with a simple example that has some performance issues. This will serve as a basic demonstration of program analysis.
Writing the Demo Files
- Create a file named demo.go with the following content:
- Create a file named data/d.go with the following content:
When you run this file, your HTTP server will have an additional endpoint, /debug/pprof, for observing the application’s status.
Analysis
1. Using the Web Interface
To view the current overview, visit http://127.0.0.1:6060/debug/pprof/.
This page contains several subpages. Let’s dive deeper to see what we can find:
- cpu (CPU Profiling): $HOST/debug/pprof/profile. This performs CPU profiling for 30 seconds by default and generates a profile file for analysis.
- block (Block Profiling): $HOST/debug/pprof/block. This shows the stack traces that led to blocking on synchronization primitives.
- goroutine: $HOST/debug/pprof/goroutine. This displays the stack traces of all current goroutines.
- heap (Memory Profiling): $HOST/debug/pprof/heap. This shows a sampling of the memory allocations of live objects.
- mutex (Mutex Profiling): $HOST/debug/pprof/mutex. This displays the stack traces of holders of contended mutexes.
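Most of the subpages above correspond to named profiles that runtime/pprof also exposes inside the process (the CPU profile is handled separately and is not a named profile). The following sketch, with a hypothetical profileCounts helper, lists them and prints the goroutine profile in a human-readable text form.

```go
package main

import (
	"fmt"
	"os"
	"runtime/pprof"
)

// profileCounts (hypothetical helper) maps each named profile to the number
// of stacks it currently holds.
func profileCounts() map[string]int {
	counts := make(map[string]int)
	for _, p := range pprof.Profiles() {
		counts[p.Name()] = p.Count()
	}
	return counts
}

func main() {
	counts := profileCounts()
	// pprof.Profiles returns the profiles sorted by name.
	for _, p := range pprof.Profiles() {
		fmt.Printf("%-14s %d\n", p.Name(), counts[p.Name()])
	}

	// A debug value of 1 selects a text format instead of the binary
	// profile.proto encoding.
	if err := pprof.Lookup("goroutine").WriteTo(os.Stdout, 1); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```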
2. Using the Interactive Terminal
1. Execute the following command: go tool pprof http://localhost:6060/debug/pprof/profile?seconds=60
After executing this command, wait for 60 seconds (you can adjust the value of seconds). PProf performs CPU profiling during this time. Once finished, it enters interactive command mode, where you can view or export the analysis results. For a list of available commands, type help at the (pprof) prompt.
The top command prints one row per function with the following columns:
- flat: The time spent in the function itself, excluding its callees.
- flat%: flat as a percentage of the total sampled CPU time.
- sum%: The running total of flat% for this row and all rows above it.
- cum: The total time spent in the function and its callees.
- cum%: cum as a percentage of the total sampled CPU time.
The last column is the function name. In most cases, these five columns provide enough insight into the application’s runtime behavior to guide optimization.
2. Execute the following command: go tool pprof http://localhost:6060/debug/pprof/heap
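- -inuse_space: Analyzes the memory the application currently holds (allocated and not yet freed).
- -alloc_objects: Analyzes the number of objects allocated over the program’s lifetime, including those already freed, which highlights temporary allocations.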
3. Execute the following command: go tool pprof http://localhost:6060/debug/pprof/block
4. Execute the following command: go tool pprof http://localhost:6060/debug/pprof/mutex
3. PProf Visualization Interface
This is the exciting part! But before we proceed, we need to write a simple test case to run.
Writing the Test Case
- Create a file named data/d_test.go with the following content:
- Run the test case:
You can also explore -memprofile.
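A benchmark for this step could be sketched as follows. This is not the article's original test file: Add is a simplified, hypothetical placeholder for the function under test, and the main function exists only so the sketch runs as a standalone program. In a real project, BenchmarkAdd would live in data/d_test.go and be driven by go test.

```go
package main

import (
	"fmt"
	"testing"
)

// Add is a simplified, hypothetical placeholder for the function under test.
func Add(s string) string {
	return string([]byte(s))
}

// sink keeps the result alive so the call is not optimized away.
var sink string

// BenchmarkAdd has the signature that go test -bench looks for in a
// _test.go file.
func BenchmarkAdd(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sink = Add("hello pprof")
	}
}

func main() {
	// testing.Benchmark runs a benchmark function outside of go test.
	res := testing.Benchmark(BenchmarkAdd)
	fmt.Println(res)
}
```

With the benchmark in a _test.go file, a typical invocation would be go test -bench=. -cpuprofile=cpu.prof, and the resulting profile can be opened in a browser with go tool pprof -http=:8080 cpu.prof. The file name and port here are assumptions.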
Launching the PProf Visualization Interface
Method 1:
Method 2:
If you encounter the message “Could not execute dot; may need to install graphviz,” it means you need to install Graphviz, which provides the dot command (please consult your favorite search engine).
Viewing the PProf Visualization Interface
When you open the PProf visualization interface, you will notice that it is more refined than the official toolchain’s PProf. Additionally, it includes a Flame Graph.
The Flame Graph is the highlight of this section. It is a dynamic visualization in which the call sequence runs from top to bottom (A -> B -> C -> D). Each block represents a function, and the wider the block, the more CPU time it consumes. You can also click a block to drill down into it.
Conclusion
In this article, we provided a brief introduction to PProf, the performance profiling tool in Go. PProf is very helpful in locating and analyzing performance issues in specific scenarios.
We hope this article has been helpful to you. We encourage you to try it out yourself and delve deeper into the various features and knowledge points it offers.
Thought Questions
Congratulations on making it to the end! Here are two simple thought questions to expand your thinking:
- Is flat always greater than cum? Why? In what scenarios would cum be greater than flat?
- What performance issues can you identify in the demo code provided in this article? How would you address them?
Now it’s your turn to share your thoughts!