Golang High-Performance Programming EP3 : Memory Alignment

 

All examples in this article were run on a MacBook Pro M1, which has a 64-bit CPU.

This is the third article on high-performance programming in Go, analyzing why memory alignment is needed, the rules of Go memory alignment, and practical examples of memory alignment usage. Finally, it shares two tools to help us identify memory alignment issues during development.

This article was first published in the Medium MPP plan. If you are a Medium user, please follow me on Medium. Thank you very much.

What is Memory Alignment?

To a programmer, memory may look like one huge byte array into which we can write an int16 (two bytes) or an int32 (four bytes) at any position. For example, consider this struct:

type T1 struct {  
    a int8  
    b int64  
    c int16  
}

Those unfamiliar with memory alignment might expect the fields to be laid out back to back, taking up 1 + 8 + 2 = 11 bytes in total.

Figure 1: Memory layout as understood by some people

One after another, compact and perfect. In reality, the layout is different. If we print the sizes and addresses of T1's fields, we find that the struct actually occupies 24 bytes.

Figure 2: Actual memory layout of T1

List 1: T1 size

package main

import (
    "fmt"
    "unsafe"
)

func main() {
    t := T1{}
    // Sizes of the three fields, then of the whole struct.
    fmt.Printf("%d %d %d %d\n", unsafe.Sizeof(t.a), unsafe.Sizeof(t.b), unsafe.Sizeof(t.c), unsafe.Sizeof(t))
    // Addresses of the fields: each one starts 8 bytes after the previous one.
    fmt.Printf("%p %p %p\n", &t.a, &t.b, &t.c)
    fmt.Println(unsafe.Alignof(t))
}
// output
// 1 8 2 24
// 0x14000114018 0x14000114020 0x14000114028
// 8
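
The addresses in List 1 change from run to run, so it can be easier to look at field offsets instead. The following is a minimal sketch using unsafe.Offsetof; the helper name t1Offsets is made up for this illustration, and the values in the comment assume a 64-bit platform.

```go
package main

import (
	"fmt"
	"unsafe"
)

// T1 is the struct from the article.
type T1 struct {
	a int8
	b int64
	c int16
}

// t1Offsets (hypothetical helper) returns the byte offset of each field
// of T1 and the total size of the struct.
func t1Offsets() (a, b, c, size uintptr) {
	var t T1
	return unsafe.Offsetof(t.a), unsafe.Offsetof(t.b), unsafe.Offsetof(t.c), unsafe.Sizeof(t)
}

func main() {
	a, b, c, size := t1Offsets()
	fmt.Printf("a at %d, b at %d, c at %d, total %d bytes\n", a, b, c, size)
	// On a 64-bit platform: a at 0, b at 8, c at 16, total 24 bytes
}
```

The gap between a (offset 0, 1 byte) and b (offset 8) is padding, and so is everything after c (offset 16, 2 bytes) up to the total size.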

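The CPU fetches data from memory in units of its word size. For example, a 64-bit CPU has a word size of 8 bytes, meaning it accesses memory in 8-byte units; this unit is called the memory access granularity.

If the fields of T1 were packed without padding, b would occupy bytes 1 through 8 and straddle two 8-byte words, so reading it would take two memory accesses plus extra work to combine the two halves. Such unaligned access can cause several serious problems:

  1. Performance degradation due to the extra memory access.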
  2. What was originally an atomic operation for reading a variable is no longer atomic.
  3. Other unexpected situations.

Therefore, compilers generally implement memory alignment, sacrificing memory space to ensure:

  • Platform Compatibility: Not all hardware platforms can access arbitrary data at arbitrary addresses. For example, specific hardware platforms only allow fetching specific types of data at specific addresses, otherwise leading to exceptions.
  • Performance: Accessing unaligned memory causes the CPU to perform two memory accesses and spend extra clock cycles handling alignment and computation. Aligned memory can be accessed in a single operation, improving efficiency—a typical space-for-time tradeoff.
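
To make the "two memory accesses" point concrete, here is a small sketch that counts how many word-sized units a value overlaps. It is a deliberately simplified model (real CPUs also have caches and wider buses), and the function name wordsTouched is invented for this example.

```go
package main

import "fmt"

// wordsTouched (hypothetical helper) reports how many word-sized memory
// units a value of the given size overlaps when it starts at byte offset off.
// This is a simplified model of memory access granularity, not a CPU simulator.
func wordsTouched(off, size, word uint64) uint64 {
	if size == 0 {
		return 0
	}
	first := off / word             // index of the word holding the first byte
	last := (off + size - 1) / word // index of the word holding the last byte
	return last - first + 1
}

func main() {
	const word = 8 // word size of a 64-bit CPU, as described above

	fmt.Println(wordsTouched(8, 8, word)) // int64 at offset 8 (aligned): 1
	fmt.Println(wordsTouched(1, 8, word)) // int64 packed right after an int8: 2
}
```

With padding, b starts at offset 8 and fits in a single word; packed at offset 1 it would span two.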

Memory Alignment in Go

The Go spec defines size and alignment guarantees. The sizes of the numeric types are fixed:

type                                 size in bytes

byte, uint8, int8                     1
uint16, int16                         2
uint32, int32, float32                4
uint64, int64, float64, complex64     8
complex128                           16

The following minimal alignment properties are guaranteed:

  1. For a variable x of any type: unsafe.Alignof(x) is at least 1.
  2. For a variable x of struct type: unsafe.Alignof(x) is the largest of all the values unsafe.Alignof(x.f) for each field f of x, but at least 1.
  3. For a variable x of array type: unsafe.Alignof(x) is the same as the alignment of a variable of the array’s element type.
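
To see how these rules produce the 24 bytes of T1, the sketch below computes a struct layout by hand and compares it with what the compiler reports. It is only an approximation of the usual procedure (place each field at the next multiple of its alignment, then round the total up to the struct's alignment) and ignores special cases such as zero-size trailing fields. The names field, alignUp, layout, t1Fields and matchesCompiler are all invented for this example.

```go
package main

import (
	"fmt"
	"unsafe"
)

// field (hypothetical type) describes one struct field by size and alignment.
type field struct {
	size, align uintptr
}

// alignUp rounds n up to the next multiple of align (align must be >= 1).
func alignUp(n, align uintptr) uintptr {
	return (n + align - 1) / align * align
}

// layout places each field at the next offset that satisfies its alignment.
// The struct's alignment is the largest field alignment (rule 2 above), and
// the total size is rounded up to a multiple of that alignment.
func layout(fields []field) (offsets []uintptr, size, align uintptr) {
	align = 1
	var off uintptr
	for _, f := range fields {
		off = alignUp(off, f.align)
		offsets = append(offsets, off)
		off += f.size
		if f.align > align {
			align = f.align
		}
	}
	return offsets, alignUp(off, align), align
}

// T1 is the struct from the article.
type T1 struct {
	a int8
	b int64
	c int16
}

// t1Fields describes T1's fields using the values the compiler reports.
func t1Fields() []field {
	var t T1
	return []field{
		{unsafe.Sizeof(t.a), unsafe.Alignof(t.a)},
		{unsafe.Sizeof(t.b), unsafe.Alignof(t.b)},
		{unsafe.Sizeof(t.c), unsafe.Alignof(t.c)},
	}
}

// matchesCompiler reports whether the hand-computed layout agrees with unsafe.
func matchesCompiler() bool {
	_, size, align := layout(t1Fields())
	return size == unsafe.Sizeof(T1{}) && align == unsafe.Alignof(T1{})
}

func main() {
	offsets, size, align := layout(t1Fields())
	fmt.Println(offsets, size, align) // on a 64-bit platform: [0 8 16] 24 8
	fmt.Println(matchesCompiler())
}
```

Because the field sizes and alignments are read through unsafe, the comparison also holds on platforms where int64 has a different alignment.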

In most cases, the Go compiler automatically aligns memory for us, and we don’t need to worry about it. However, in one particular case, manual alignment is required.

The exception is 64-bit atomic operations on 32-bit platforms such as 386 and ARM: the operand must be 8-byte aligned, or the program will panic. For example, consider the following code:

package main

import "sync/atomic"

type T3 struct {
    b int64
    c int32
    d int64
}

func main() {
    a := T3{}
    atomic.AddInt64(&a.d, 1)
}

Running this on the amd64 architecture won’t cause an error, but on the i386 architecture, it will panic.

Figure 3: T3 panic

The reason is that int64, and therefore T3, is only 4-byte aligned on a 32-bit platform but 8-byte aligned on a 64-bit platform. On a 64-bit platform, the compiler inserts 4 bytes of padding after c, so d starts at offset 16:

Figure 4: T3 memory layout on amd64

But on i386 there is no padding after c, so d starts at offset 12, which is not a multiple of 8:

Figure 5: T3 memory layout on i386
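
The difference between the two layouts can be checked without triggering the panic by printing the offset of d. This is a minimal sketch; dOffset is a helper name made up for this example, and the values in the comments are what the usual alignment rules give on each architecture.

```go
package main

import (
	"fmt"
	"unsafe"
)

// T3 is the unpadded struct from the article.
type T3 struct {
	b int64
	c int32
	d int64
}

// dOffset (hypothetical helper) returns the byte offset of field d inside T3.
func dOffset() uintptr {
	var t T3
	return unsafe.Offsetof(t.d)
}

func main() {
	off := dOffset()
	fmt.Printf("offset of d: %d, multiple of 8: %v\n", off, off%8 == 0)
	// amd64 or arm64: offset of d: 16, multiple of 8: true
	// GOARCH=386:     offset of d: 12, multiple of 8: false
}
```

Since an allocated struct starts at a 64-bit aligned address, an offset that is not a multiple of 8 means d itself is not 64-bit aligned.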

This issue is documented in the atomic package.

  • On 386, the 64-bit functions use instructions unavailable before the Pentium MMX.
    On non-Linux ARM, the 64-bit functions use instructions unavailable before the ARMv6k core.
    On ARM, 386, and 32-bit MIPS, it is the caller’s responsibility to arrange for 64-bit alignment of 64-bit words accessed atomically via the primitive atomic functions (types Int64 and Uint64 are automatically aligned). The first word in an allocated struct, array, or slice; in a global variable; or in a local variable (because the subject of all atomic operations will escape to the heap) can be relied upon to be 64-bit aligned.

To resolve this, we can manually pad T3 so that d starts at an offset that is a multiple of 8:

type T3 struct {
    b int64
    c int32
    _ int32
    d int64
}

Similar operations can be seen in the Go source code and open-source libraries, such as:

  1. mgc
  2. groupcache

Fortunately, we have many tools to help identify and optimize these issues.

Practical Engineering

fieldalignment

fieldalignment is an analyzer from the official golang.org/x/tools repository that finds structs whose fields could be reordered to use less memory; with the -fix flag it rewrites them automatically. For example, it reorders the fields of T1 so that the struct shrinks from 24 to 16 bytes.

➜  go_mem_alignment git:(main) ✗ fieldalignment -fix .          
/Users/hxzhouh/workspace/github/blog-example/go/go_mem_alignment/main.go:8:8: struct of size 24 could be 16

// T1 after the fix
type T1 struct {  
    b int64  
    c int16  
    a int8  
}

It can also be used through golangci-lint, where fieldalignment is one of the analyzers of govet. Enable it in .golangci.yml as follows:

# .golangci.yml
linters:  
  disable-all: true  
  enable:  
    - govet  
  fast: false  
  
linters-settings:  
  govet:  
    # report about shadowed variables  
    check-shadowing: false  
    fast: false  
    # disable:  
    #  - fieldalignment # I'm ok to waste some bytes
    enable:  
      - fieldalignment

However, fieldalignment has a frustrating drawback: it removes all blank lines and comments when rearranging struct fields. It is therefore best to commit your work first, run the tool, then review its changes with git diff and restore anything that was lost. For this reason I rarely use it in production and prefer structlayout.

structlayout

structlayout displays the layout and size of structs and can output data in svg or json formats. If a struct is complex, this tool can help optimize it.

Installation

go install honnef.co/go/tools/cmd/structlayout@latest 
go install honnef.co/go/tools/cmd/structlayout-pretty@latest
go install honnef.co/go/tools/cmd/structlayout-optimize@latest
go install github.com/ajstarks/svgo/structlayout-svg@latest

Analyze T1 with structlayout

structlayout -json ./main.go T1 | structlayout-svg > T1.svg

Figure 6: T1 Structure Layout

We can clearly see two padding areas: 7 bytes after a and 6 bytes after c, for 13 bytes of padding in total.

Optimized T2

type T2 struct {  
    a int8
    c int16
    b int64  
}

Figure 7: T2 Structure Layout

There are still two padding areas (1 byte after a and 4 bytes after c), but they total only 5 bytes, and the struct shrinks from 24 to 16 bytes.
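
When structlayout is not at hand, a rough version of the same comparison can be done with the reflect package. The sketch below sums the field sizes and treats the rest of the struct as padding. The function name padding is invented here, it assumes its argument is a struct value, and it does not look inside nested structs.

```go
package main

import (
	"fmt"
	"reflect"
)

// T1 and T2 are the two field orderings from the article.
type T1 struct {
	a int8
	b int64
	c int16
}

type T2 struct {
	a int8
	c int16
	b int64
}

// padding (hypothetical helper) returns the total size of a struct value and
// how many of those bytes are padding. Assumption: v is a struct; padding
// inside nested struct fields is not counted.
func padding(v any) (size, pad uintptr) {
	t := reflect.TypeOf(v)
	size = t.Size()
	var used uintptr
	for i := 0; i < t.NumField(); i++ {
		used += t.Field(i).Type.Size()
	}
	return size, size - used
}

func main() {
	for _, v := range []any{T1{}, T2{}} {
		size, pad := padding(v)
		fmt.Printf("%T: size %d, padding %d\n", v, size, pad)
	}
	// On a 64-bit platform:
	// main.T1: size 24, padding 13
	// main.T2: size 16, padding 5
}
```

Both structs hold the same 11 bytes of data; only the field order, and therefore the padding, differs.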

Summary

In programming, memory alignment is a crucial technique designed to enhance program performance and compatibility. This article uses Go as an example to explain the basic concepts and necessity of memory alignment in detail, demonstrating the actual layout of different structs in memory through code examples.

Memory alignment rules in Go are primarily reflected in the order of struct fields. The compiler ensures performance and platform portability through automatic alignment, but in some cases, developers need to manually adjust struct fields to avoid performance issues and potential errors.

The empty struct is a helpful tool for memory alignment optimization. For specific operations, refer to my other article: Golang High-Performance Programming EP1: Empty Struct.

To help developers detect and optimize memory alignment issues, this article introduces two practical tools:

  1. fieldalignment: An official Go tool that detects structs with avoidable padding and can reorder their fields automatically.
  2. structlayout: Displays the memory layout of structs, helping developers understand and optimize memory usage more intuitively.

By using these tools effectively, developers can reduce memory waste and improve development efficiency while ensuring program performance and stability.

Last updated on Jul 07, 2024 22:03 CST