Unmasking Go Memory Leaks: CloudWeGo Open Sources goref for Deep Heap Analysis

In Go development, memory leaks are often hard to pinpoint. Traditional tools such as pprof help to some extent, but their capabilities are limited in complex scenarios. To analyze and resolve these issues more effectively, the CloudWeGo team has developed a new tool called goref.
Built on Delve, goref performs deep analysis of heap object references in Go programs and shows how memory is distributed across those references, helping developers quickly locate memory leaks or reduce garbage collection (GC) overhead. It can analyze both running processes and core dump files.

Limitations Of `pprof`

When encountering memory leaks in Go development, most people first generate a heap profile to investigate. However, the heap profile flame graph is often of little help in troubleshooting, because it only records where objects were allocated. In complex business scenarios, where objects are passed through multiple layers of dependencies or reused from memory pools, it is nearly impossible to identify the root cause from the allocation stack alone.

For example, in the following heap profile, the FastRead frame belongs to a deserialization function in the Kitex framework. If a business goroutine leaks a request object, the profile does not point to the code responsible for the leak; it only shows that memory was allocated under FastRead.

Heap Profile Example

As we know, Go is a garbage-collected language: an object cannot be released mainly because the GC’s reference analysis has marked it as alive. Java, another GC language, has more sophisticated analysis tools such as JProfiler, which can effectively display object reference relationships. We therefore wanted to build an efficient reference analysis tool for Go that tells us accurately and directly how memory references are distributed and related, freeing us from laborious static analysis. The good news is that the tool is nearly complete and has been open-sourced in the goref repository, with usage instructions in its README.

The following sections will share the design ideas and detailed implementation of this tool.

Implementation Ideas Of goref

GC Marking Process

Before discussing the specific implementation, let’s review how the GC marks objects as alive.

Go uses a tiered allocation scheme similar to tcmalloc: at allocation time, each heap object is placed in an mspan, and all objects in a given mspan have the same fixed size. During GC, the runtime passes a heap address to runtime.spanOf, which looks up the owning mspan in a multi-level index; from that mspan the GC obtains the base address and size of the object containing the address.

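```go
// simplified code from the Go runtime (runtime-internal, not standalone)
func spanOf(p uintptr) *mspan {
	ri := arenaIndex(p)                         // index of the heap arena containing p
	ha := mheap_.arenas[ri.l1()][ri.l2()]       // two-level arena map lookup
	return ha.spans[(p/pageSize)%pagesPerArena] // page within the arena -> its mspan
}
```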
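To make the lookup concrete, here is a minimal sketch that models only the idea, not the runtime’s data structures: a flat map from page number to a hypothetical mspan struct (startAddr, npages, elemsize), plus a hypothetical objectOf helper that rounds a possibly interior pointer down to the start of its object. The real runtime uses the multi-level arena index shown above; this sketch also assumes page-aligned spans and a 64-bit platform.

```go
package main

import "fmt"

// pageSize matches the runtime's 8 KiB page size.
const pageSize = 8192

// mspan is a simplified, hypothetical model: every object in a span has the
// same size (elemsize). startAddr is assumed to be page-aligned.
type mspan struct {
	startAddr uintptr // address of the span's first byte
	npages    uintptr // number of pages in the span
	elemsize  uintptr // size of every object in this span
}

// pageIndex maps a page number (addr / pageSize) to the span that owns it.
type pageIndex map[uintptr]*mspan

func (idx pageIndex) add(s *mspan) {
	first := s.startAddr / pageSize
	for i := uintptr(0); i < s.npages; i++ {
		idx[first+i] = s
	}
}

// spanOf returns the span owning address p, or nil if p is not in the heap.
func (idx pageIndex) spanOf(p uintptr) *mspan {
	return idx[p/pageSize]
}

// objectOf maps a possibly interior pointer p to the base and size of the
// object that contains it.
func (idx pageIndex) objectOf(p uintptr) (base, size uintptr, ok bool) {
	s := idx.spanOf(p)
	if s == nil {
		return 0, 0, false
	}
	n := (p - s.startAddr) / s.elemsize // index of the object within the span
	if (n+1)*s.elemsize > s.npages*pageSize {
		return 0, 0, false // p falls in the unused tail of the span
	}
	return s.startAddr + n*s.elemsize, s.elemsize, true
}

func main() {
	idx := pageIndex{}
	idx.add(&mspan{startAddr: 0xc000010000, npages: 1, elemsize: 48})
	base, size, ok := idx.objectOf(0xc000010000 + 100)
	fmt.Printf("base=%#x size=%d ok=%v\n", base, size, ok) // prints base=0xc000010060 size=48 ok=true
}
```

Because every object in a span has the same size, finding an object’s base from an interior pointer takes a single division.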

Using the runtime.heapBitsForAddr function, the GC obtains the GC bitmap for an object’s address range. The bitmap records, for each pointer-sized word of the object (8 bytes on 64-bit platforms), whether that word holds a pointer, and therefore whether the GC must follow it to mark downstream objects.

For example, consider the following Go code snippet:

```go
package main

type Object struct {
	A string  // 16 bytes: data pointer + length
	B int64   // 8 bytes, no pointer
	C *[]byte // 8-byte pointer
}

// global variables
var a = echo()
var b *int64 = &echo().B

func echo() *Object {
	bytes := make([]byte, 1024)
	return &Object{A: string(bytes), C: &bytes}
}

func main() {}
```

When the GC scans the variable b, it does not scan only the memory of the B int64 field. Instead, it resolves b’s interior pointer through the mspan index to the object’s base address and elem size, then scans the whole object, so the memory of fields A and C, as well as their downstream objects, is marked as alive too.

When the GC follows the variable a to its Object, the object’s GC bits are 1001. How should we interpret this? Reading one bit per 8-byte word, the words at base+0 and base+24 are pointers, so the objects they point to must be scanned further. Here, both A string (whose first word points to the string data) and C *[]byte point to downstream objects.

GC Bitmap Example

Based on this brief analysis, the principle for finding all live objects is simple: start from the GC roots and scan the GC bits of each object. Whenever a word’s bit is 1, follow the pointer downstream. Each downstream address must first be resolved through its mspan to obtain the complete object’s base address, size, and GC bits.
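The marking principle above fits in a short worklist loop. The sketch below uses a hypothetical toy heap (a map of word values plus a sorted list of objects with their pointer bitmaps) instead of real runtime structures: an interior root pointer is resolved to its object, and only words whose bit is set are followed.

```go
package main

import (
	"fmt"
	"sort"
)

const wordSize = 8

// object is a toy heap object: ptrBits[i] == true means word i holds a pointer.
type object struct {
	base, size uintptr
	ptrBits    []bool
}

// heap is a hypothetical model: objects sorted by base, memory as word values.
type heap struct {
	objects []object
	mem     map[uintptr]uintptr // word address -> word value
}

// find returns the object containing p (p may be an interior pointer).
func (h *heap) find(p uintptr) (*object, bool) {
	i := sort.Search(len(h.objects), func(i int) bool {
		return h.objects[i].base+h.objects[i].size > p
	})
	if i < len(h.objects) && h.objects[i].base <= p {
		return &h.objects[i], true
	}
	return nil, false
}

// mark starts from root pointer values, follows only words whose GC bit is
// set, and returns the base addresses of all reachable objects.
func (h *heap) mark(roots []uintptr) map[uintptr]bool {
	live := map[uintptr]bool{}
	work := append([]uintptr(nil), roots...)
	for len(work) > 0 {
		p := work[len(work)-1]
		work = work[:len(work)-1]
		obj, ok := h.find(p)
		if !ok || live[obj.base] {
			continue // not a heap pointer, or already marked
		}
		live[obj.base] = true
		for i, isPtr := range obj.ptrBits {
			if isPtr {
				work = append(work, h.mem[obj.base+uintptr(i)*wordSize])
			}
		}
	}
	return live
}

func main() {
	h := &heap{
		objects: []object{
			{base: 0x1000, size: 32, ptrBits: []bool{true, false, false, true}},
			{base: 0x2000, size: 16, ptrBits: []bool{false, false}},
			{base: 0x3000, size: 8, ptrBits: []bool{false}},
			{base: 0x4000, size: 8, ptrBits: []bool{false}}, // unreachable
		},
		mem: map[uintptr]uintptr{0x1000: 0x2000, 0x1018: 0x3000},
	}
	// The root is an interior pointer into the first object (like b above).
	live := h.mark([]uintptr{0x1010})
	fmt.Println(len(live), live[0x4000]) // prints 3 false
}
```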

DWARF Type Information

However, knowing only the reference relationships between objects is of little use for troubleshooting, because it gives developers no meaningful variable names or type information with which to locate the problem. Another crucial step, therefore, is to recover the variable names and types of these objects.

Go is a statically typed language, and objects generally do not carry their own type information. For instance, when we create an object with obj := new(Object), its memory holds only the values of fields A, B, and C, occupying just 32 bytes on a 64-bit platform. How, then, can we obtain type information?

Implementation Of goref

Introduction To Delve Tool

Anyone with Go development experience is probably familiar with Delve. Even if you think you haven’t used it, you most likely have: the debugger in the GoLand IDE is built on Delve. Recall the debugging window from one of your debugging sessions: the variable names, values, and types it displays are exactly the type information we need!

```
$ ./dlv attach 270
(dlv) ...
(dlv) locals
tccCli = ("*code.byted.org/gopkg/tccclient.ClientV2")(0xc000782240)
ticker = (*time.Ticker)(0xc001086be0)
```

So how does Delve obtain this variable information? When it attaches to a process, Delve resolves the path of the executable through the /proc/<pid>/exe symlink. During compilation, Go emits debugging information in the standard DWARF format and stores it in the .debug_* sections of the executable. The type information that reference analysis needs for both global and local variables can be parsed from this DWARF data.

For global variables, Delve iterates through all DWARF entries and parses those with the Variable tag. These entries carry attributes such as Location, Type, and Name:

- The Type attribute records the variable’s type information, which can be traversed recursively in DWARF to determine the type of each sub-object.
- The Location attribute is more complex: it holds a DWARF location expression, which may be a simple variable address or a small program that computes a variable’s memory address or yields a register’s value. When parsing a global variable, Delve uses it to obtain the variable’s memory address.
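A rough sketch of this global-variable pass using only Go’s standard debug/elf and debug/dwarf packages; it is not Delve’s code. It assumes a Linux ELF binary that still contains DWARF (not built with -ldflags=-w), a little-endian 64-bit target, and globals whose location is the simple DW_OP_addr expression. The decodeAddr helper is hypothetical.

```go
package main

import (
	"debug/dwarf"
	"debug/elf"
	"encoding/binary"
	"fmt"
	"os"
)

// opAddr is the DWARF DW_OP_addr opcode; it is followed by a target address.
const opAddr = 0x03

// decodeAddr handles only the location expression "DW_OP_addr <8-byte addr>".
// Anything else (register-relative, location lists, ...) is reported as unhandled.
func decodeAddr(expr []byte) (uint64, bool) {
	if len(expr) != 9 || expr[0] != opAddr {
		return 0, false
	}
	return binary.LittleEndian.Uint64(expr[1:]), true
}

func main() {
	pid := "self"
	if len(os.Args) > 1 {
		pid = os.Args[1]
	}
	// Resolve the binary of the target process via /proc (Linux only).
	path, err := os.Readlink("/proc/" + pid + "/exe")
	if err != nil {
		fmt.Println("readlink:", err)
		return
	}
	f, err := elf.Open(path)
	if err != nil {
		fmt.Println("open:", err)
		return
	}
	defer f.Close()
	d, err := f.DWARF() // loads (and decompresses) the .debug_* sections
	if err != nil {
		fmt.Println("no DWARF data:", err)
		return
	}
	r := d.Reader()
	for {
		e, err := r.Next()
		if err != nil || e == nil {
			break
		}
		if e.Tag != dwarf.TagVariable {
			continue
		}
		name, _ := e.Val(dwarf.AttrName).(string)
		loc, _ := e.Val(dwarf.AttrLocation).([]byte)
		addr, ok := decodeAddr(loc)
		if !ok {
			continue // locals and complex locations are out of scope here
		}
		typeName := "?"
		if off, isOff := e.Val(dwarf.AttrType).(dwarf.Offset); isOff {
			if t, err := d.Type(off); err == nil {
				typeName = t.String()
			}
		}
		fmt.Printf("%#x %s %s\n", addr, name, typeName)
	}
}
```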

The principle for parsing local variables in goroutines is similar to that of global variables, but it is somewhat more complex. For example, it requires determining the DWARF offset based on the PC, and the location expressions are more complicated, involving register access. We will not elaborate on this here.

Building GC Analysis Metadata

Delve’s process attachment and core file analysis capabilities also give us access to the memory of the process being analyzed. Mimicking how the GC marks objects, we rebuild the metadata needed for the analysis inside the tool’s own memory. This includes:

- The address range of each goroutine stack in the target process, together with the stack maps from which the gcmask of each stack frame is derived, marking which stack words may point to live heap objects.
- The address range of each data/bss segment in the target process, together with the gcmask of each segment, which likewise marks words that may point to live heap objects.
- The mspan index of the target process, including the base, elem size, gcmask, and other information of each mspan, restored in the tool’s memory.

The first two items supply the GC roots; the mspan index lets the tool resolve any heap address to its object.

These steps outline the general process. Some details also need handling, such as GC finalizer objects and the allocation headers introduced in Go 1.22, which we will not go into here.
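As a sketch of how segment metadata yields roots, the following models a data/bss segment as an address range plus a per-word gcmask and collects the address of every word whose bit is set. The types metadata, segment, and addrRange are hypothetical names chosen for illustration, not goref’s actual structures; stacks and the mspan index are omitted, and 8-byte words are assumed.

```go
package main

import "fmt"

// addrRange is a half-open address range [lo, hi).
type addrRange struct{ lo, hi uintptr }

// segment is a data or bss segment plus a pointer mask with one entry per
// 8-byte word (true = the word may hold a heap pointer).
type segment struct {
	addrRange
	gcmask []bool
}

type metadata struct {
	segments []segment
}

// segmentRoots returns the addresses of all words in the data/bss segments
// whose mask bit is set; their values are candidate pointers into the heap.
func (m *metadata) segmentRoots() []uintptr {
	var roots []uintptr
	for _, s := range m.segments {
		for i, isPtr := range s.gcmask {
			if isPtr {
				roots = append(roots, s.lo+uintptr(i)*8)
			}
		}
	}
	return roots
}

func main() {
	m := &metadata{segments: []segment{
		{addrRange{0x5000, 0x5020}, []bool{true, false, false, true}},
	}}
	fmt.Printf("%#x\n", m.segmentRoots()) // prints [0x5000 0x5018]
}
```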

DWARF Type Scanning

With everything in place, we are ready for the most critical step: object reference relationship analysis.

For each GC root variable, we call the findRef function, which accesses the object’s memory according to its DWARF type. If a field is a pointer that may point to a downstream object, we read the pointer’s value and look up that downstream object in the GC metadata, which gives us its base address, elem size, gcmask, and other information.

Once an object has been visited, we set a mark bit so that it is not visited again. We then construct a new variable from the DWARF type of each sub-object and call findRef recursively, until every object whose type is known has been visited.
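The recursion can be sketched with a toy type description standing in for DWARF. The typ and ptrField types are hypothetical, and this findRef is an illustration rather than goref’s actual function. For simplicity, pointers always hold an object’s base address here, whereas a real tool would first resolve them through the mspan index. Each visited object is marked so it is processed only once, and its size is recorded under the reference chain that led to it.

```go
package main

import "fmt"

// typ is a toy stand-in for DWARF type information (hypothetical).
type typ struct {
	name   string
	size   uintptr
	fields []ptrField
}

// ptrField is a pointer-typed field at byte offset off whose target has type elem.
type ptrField struct {
	off  uintptr
	name string
	elem *typ
}

// findRef visits the object at addr using its known type, records its size
// under the reference chain path, then follows each pointer field.
// visited plays the role of the mark bit; mem maps word addresses to values.
func findRef(mem map[uintptr]uintptr, addr uintptr, t *typ, path string,
	visited map[uintptr]bool, out map[string]uintptr) {
	if addr == 0 || visited[addr] {
		return // nil pointer, or object already visited
	}
	visited[addr] = true
	out[path] += t.size
	for _, f := range t.fields {
		child := mem[addr+f.off] // read the pointer value stored in the field
		findRef(mem, child, f.elem, path+"."+f.name, visited, out)
	}
}

func main() {
	byteArr := &typ{name: "[1024]uint8", size: 1024}
	strData := &typ{name: "string data", size: 1024}
	sliceHdr := &typ{name: "[]uint8", size: 24,
		fields: []ptrField{{off: 0, name: "array", elem: byteArr}}}
	object := &typ{name: "Object", size: 32, fields: []ptrField{
		{off: 0, name: "A", elem: strData},
		{off: 24, name: "C", elem: sliceHdr},
	}}
	// Object at 0x1000: A's data at 0x2000; C -> slice header at 0x3000 -> array at 0x4000.
	mem := map[uintptr]uintptr{0x1000: 0x2000, 0x1018: 0x3000, 0x3000: 0x4000}
	visited := map[uintptr]bool{}
	out := map[string]uintptr{}
	findRef(mem, 0x1000, object, "a", visited, out)
	fmt.Println(out) // prints map[a:32 a.A:1024 a.C:24 a.C.array:1024]
}
```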

However, this way of scanning references is fundamentally different from the GC’s approach, mainly because Go code performs many unsafe type conversions. For instance, an object containing pointer fields might be created and returned like this:

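```go
import "unsafe"

// Object is the struct defined earlier; the *Object is returned disguised as a *byte.
func echo() *byte {
	bytes := make([]byte, 1024)
	obj := &Object{A: string(bytes), C: &bytes}
	return (*byte)(unsafe.Pointer(obj))
}
```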

From the GC’s perspective, the conversion to *byte through unsafe does not matter, because the object’s gcmask is unchanged. When scanning downstream objects, the GC still scans the complete Object, finds the downstream bytes object, and marks it as alive.

DWARF type scanning, however, cannot do this. When it encounters the byte type, it treats the target as a non-pointer object and skips further scanning. The solution is therefore to scan with DWARF types first and to mark whatever cannot be scanned that way using the GC method.

To implement this, whenever we follow a pointer in an object through its DWARF type, we clear the corresponding gcmask bit from 1 to 0. After scanning an object, any pointer bits still set within the object’s address range are recorded as final marking tasks. Once all objects have been scanned with DWARF types, we take these tasks and perform a second pass using the GC method.
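The gcmask bookkeeping can be sketched as a small helper (residualPointers is a hypothetical name): clear the bit of every pointer word reached through DWARF types, and whatever bits remain become final-scan tasks. It assumes 8-byte words and a mask with one entry per word, indexed from base+0.

```go
package main

import "fmt"

// residualPointers clears the gcmask bit of every pointer word whose byte
// offset appears in typed (i.e. already followed via DWARF types) and returns
// the byte offsets of the pointer words that are still set: these become
// final-scan tasks handled with the GC bitmap.
func residualPointers(gcmask []bool, typed []uintptr) []uintptr {
	mask := append([]bool(nil), gcmask...) // keep the caller's mask intact
	for _, off := range typed {
		mask[off/8] = false // 1 -> 0: handled by DWARF type scanning
	}
	var pending []uintptr
	for i, bit := range mask {
		if bit {
			pending = append(pending, uintptr(i)*8)
		}
	}
	return pending
}

func main() {
	// Object's gcmask is 1001 (pointers at base+0 and base+24). If only field A
	// (offset 0) was reached through its DWARF type, offset 24 (field C) remains.
	fmt.Println(residualPointers([]bool{true, false, false, true}, []uintptr{0})) // prints [24]
}
```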

Reference Scanning Example

For example, when accessing the Object described above, its gcmask is 1001. After field A is read, the gcmask becomes 0001. If field C is not accessed because of a type conversion, its remaining bit causes it to be included in the final scan using GC marking.

Besides type conversions, references that reach beyond the memory described by a variable’s type are also common. For instance, with var b *int64 = &echo().B from the example code, the DWARF type of b covers only field B, so fields A and C lie in memory that DWARF type scanning cannot reach; they are also handled in the final scan.

Final Scan

Fields that were type-converted or lie beyond the address range defined by their DWARF type, as well as values such as unsafe.Pointer whose types cannot be determined, are all marked during the final scan. Because these objects have no specific type, we only record their size and count under the reference chain known up to that point.
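For objects found only in the final scan, the bookkeeping reduces to counting objects and summing sizes per reference chain, similar in spirit to pprof’s inuse_objects and inuse_space sample values. The sketch below is illustrative only; the stat and untyped types and the chain names are hypothetical.

```go
package main

import "fmt"

// stat holds one reference chain's totals: object count and total bytes.
type stat struct {
	count int64
	bytes int64
}

// untyped is an object reached only by the final GC-bitmap scan: its type is
// unknown, so it is charged to the reference chain known at that point.
type untyped struct {
	chain string
	size  int64
}

// aggregate sums counts and sizes per reference chain.
func aggregate(objs []untyped) map[string]stat {
	out := make(map[string]stat)
	for _, o := range objs {
		s := out[o.chain]
		s.count++
		s.bytes += o.size
		out[o.chain] = s
	}
	return out
}

func main() {
	// Hypothetical chains, for illustration only.
	res := aggregate([]untyped{
		{chain: "pkg.cache.entries", size: 48},
		{chain: "pkg.cache.entries", size: 1024},
		{chain: "pkg.pool.buf", size: 64},
	})
	fmt.Println(res["pkg.cache.entries"], res["pkg.pool.buf"]) // prints {2 1072} {1 64}
}
```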

Many commonly used packages in Go’s standard library and runtime use unsafe.Pointer internally, so their sub-objects cannot be identified from DWARF types alone. These types require special handling.

Output File Format

After all objects have been scanned, the reference chains, object counts, and object sizes are written to a file in the pprof profile format, which is a protobuf-encoded binary format.

Root object format:
- Stack variable: package name + function name + stack variable name, e.g. github.com/cloudwego/kitex/client.invokeHandleEndpoint.func1.sendMsg
- Global variable: package name + global variable name, e.g. github.com/cloudwego/kitex/pkg/loadbalance/lbcache.balancerFactories
Sub-object format:
- The type name of the sub-object is output, e.g. net.Conn;
- A map key or value is output as $mapkey. (type_name) or $mapval. (type_name);
- An array element is output as [0]. (type_name); elements at index 10 and above are output as [10+]. (type_name).

Effect Demonstration

Below is an object reference flame graph that the tool sampled from a real business service:

Object Reference Flame Graph

The graph shows the name of each root variable, along with the names and types of the fields it references. Note: because the DWARF information generated before Go 1.23 does not record the field offsets of closure types, the closure variable wpool.(*Pool).GoCtx.func1.task cannot currently display its downstream objects.

By selecting the inuse_objects sample type, you can also view a flame graph of the object count distribution:

Object Count Distribution Flame Graph
