Go High-Performance Programming EP3: Memory Alignment

All examples in this article were run on a MacBook Pro M1, which has a 64-bit CPU architecture.

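This is the third article in the Go high-performance programming series. It explains why memory alignment is needed, the rules Go uses for alignment, and how alignment shows up in real examples. Finally, it introduces two tools that help us spot alignment problems during development.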

This article was first published in the Medium MPP plan. If you are a Medium user, please follow me on Medium. Thank you very much.

What is memory alignment?

To a programmer, memory may look like one huge array. We can write an int16 into it, taking two bytes, or an int32, taking four bytes. For example:

type T1 struct {  
    a int8  
    b int64  
    c int16  
}

Someone unfamiliar with Go might assume this struct is laid out as shown below, occupying 11 bytes in total.
Figure 1: Memory layout as understood by some people

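Every field sits right next to the previous one: compact and tidy.
But that is not how it actually works. If we print the addresses of T1's fields, we get something like the output below, and the struct occupies 24 bytes in total.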

Figure 2: Actual memory layout of T1

Listing 1: T1 size

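package main

import (
	"fmt"
	"unsafe"
)

// T1 is the struct defined above.
func main() {
	t := T1{}
	// Sizes of each field and of the whole struct.
	fmt.Printf("%d %d %d %d\n", unsafe.Sizeof(t.a), unsafe.Sizeof(t.b), unsafe.Sizeof(t.c), unsafe.Sizeof(t))
	// Addresses of each field (these differ from run to run).
	fmt.Printf("%p %p %p\n", &t.a, &t.b, &t.c)
	fmt.Println(unsafe.Alignof(t))
}
// output
// 1 8 2 24
// 0x14000114018 0x14000114020 0x14000114028
// 8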

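This happens because the CPU reads memory in units of its word size. On a 64-bit CPU the word size is 8 bytes, so the CPU also accesses memory in 8-byte units. We call the size of a processor's memory access its memory access granularity. If T1 were packed with no gaps, b would start at offset 1 and straddle two 8-byte words, so reading it would take two accesses instead of one.
Unaligned access like this causes several serious problems: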
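To make the access-granularity argument concrete, here is a small sketch that counts how many aligned 8-byte words a field overlaps, given its offset and size. The helper name wordsTouched is hypothetical, and it models an idealized CPU that can only fetch whole, word-aligned chunks; real hardware (including the M1 and amd64) can often handle unaligned loads, just at a cost and without atomicity guarantees.

```go
package main

import "fmt"

// wordsTouched is a hypothetical helper: it returns how many aligned
// word-sized chunks the byte range [offset, offset+size) overlaps,
// assuming the CPU can only fetch whole, word-aligned chunks.
func wordsTouched(offset, size, word uintptr) uintptr {
	first := offset / word             // word holding the first byte
	last := (offset + size - 1) / word // word holding the last byte
	return last - first + 1
}

func main() {
	const word = 8 // 64-bit CPU: 8-byte word

	// T1.b (an int64) if the fields were packed with no padding: offset 1.
	fmt.Println("packed b:", wordsTouched(1, 8, word)) // packed b: 2

	// T1.b as the compiler actually lays it out: offset 8.
	fmt.Println("aligned b:", wordsTouched(8, 8, word)) // aligned b: 1
}
```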

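Lower performance, because an extra memory access (and extra CPU instructions) is needed
A read that would otherwise be atomic is no longer atomic
Other unexpected behavior
That is why compilers generally align memory, trading some memory space to guarantee: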

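Platform (portability)
Not every hardware platform can access arbitrary data at arbitrary addresses. For example, some platforms only allow data of certain types to be loaded from certain addresses and raise an exception otherwise.
Performance
Accessing unaligned memory may require the CPU to perform two memory accesses and spend extra clock cycles handling the misalignment, whereas aligned data can be read in a single access. This is clearly more efficient, and a classic case of trading space for time.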

Memory alignment in Go

The Go spec defines the size and alignment guarantees for Go types:

type                                 size in bytes

byte, uint8, int8                     1
uint16, int16                         2
uint32, int32, float32                4
uint64, int64, float64, complex64     8
complex128                           16

For a variable x of any type: unsafe.Alignof(x) is at least 1.

For a variable x of struct type: unsafe.Alignof(x) is the largest of all the values unsafe.Alignof(x.f) for each field f of x, but at least 1.

For a variable x of array type: unsafe.Alignof(x) is the same as the alignment of a variable of the array’s element type.
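These rules are enough to predict a struct's layout by hand. The sketch below applies them: each field's offset is rounded up to that field's alignment, and the struct's alignment is the largest field alignment. The final step, rounding the total size up to a multiple of the struct's alignment, is what the gc compiler does so that elements of an array of the struct stay aligned. The field type and the layout/alignUp functions are hypothetical names, and the sizes and alignments are hard-coded for a 64-bit platform such as the M1.

```go
package main

import "fmt"

// field describes one struct field by its size and alignment in bytes.
// (Hypothetical helper type for this sketch.)
type field struct {
	name        string
	size, align uintptr
}

// alignUp rounds n up to the next multiple of a (a must be a power of two).
func alignUp(n, a uintptr) uintptr {
	return (n + a - 1) &^ (a - 1)
}

// layout returns each field's offset plus the struct's total size and alignment.
func layout(fields []field) (offsets []uintptr, size, align uintptr) {
	align = 1 // alignment is "at least 1"
	var off uintptr
	for _, f := range fields {
		off = alignUp(off, f.align) // place the field at an aligned offset
		offsets = append(offsets, off)
		off += f.size
		if f.align > align {
			align = f.align // struct alignment = largest field alignment
		}
	}
	size = alignUp(off, align) // trailing padding keeps arrays of the struct aligned
	return offsets, size, align
}

func main() {
	// T1 on a 64-bit platform: int8, int64, int16.
	t1 := []field{{"a", 1, 1}, {"b", 8, 8}, {"c", 2, 2}}
	offs, size, align := layout(t1)
	fmt.Println(offs, size, align) // [0 8 16] 24 8

	// The same fields in the order fieldalignment produces: b, c, a.
	t1r := []field{{"b", 8, 8}, {"c", 2, 2}, {"a", 1, 1}}
	offs, size, align = layout(t1r)
	fmt.Println(offs, size, align) // [0 8 10] 16 8
}
```

The computed offsets 0, 8 and 16 match the address differences printed in Listing 1.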

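In most cases the Go compiler aligns memory for us automatically and we don't need to care whether it is aligned, but there is one situation where manual alignment is required.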

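That situation is 64-bit atomic operations (on int64/uint64 values) on 32-bit platforms such as 386. Manual alignment is needed because on 32-bit platforms a 64-bit atomic operation requires its operand to be 8-byte aligned; otherwise the program panics.
Take the following code as an example: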

package main

import "sync/atomic"

type T3 struct {
	b int64
	c int32
	d int64
}

func main() {
	a := T3{}
	atomic.AddInt64(&a.d, 1)
}

It runs without error on amd64, but panics on 386 (32-bit x86).
Figure 3: T3 panic

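The reason is that T3 (and int64 itself) is only 4-byte aligned on 32-bit platforms such as 386, but 8-byte aligned on 64-bit platforms. On a 64-bit platform, T3's memory layout is:
Figure 4: Memory layout of T3 on amd64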

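On 386, however, d starts at offset 12, which is not a multiple of 8, so atomic.AddInt64(&a.d, 1) panics. The layout on 386 is:
Figure 5: Memory layout of T3 on 386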

This pitfall is documented in the Bugs section of the sync/atomic package documentation:

On 386, the 64-bit functions use instructions unavailable before the Pentium MMX.
On non-Linux ARM, the 64-bit functions use instructions unavailable before the ARMv6k core.
On ARM, 386, and 32-bit MIPS, it is the caller’s responsibility to arrange for 64-bit alignment of 64-bit words accessed atomically via the primitive atomic functions (types Int64 and Uint64 are automatically aligned). The first word in an allocated struct, array, or slice; in a global variable; or in a local variable (because the subject of all atomic operations will escape to the heap) can be relied upon to be 64-bit aligned.

To fix this, we have to pad T3 by hand so that d lands on an 8-byte boundary even on 32-bit platforms:

type T3 struct {
	b int64
	c int32
	_ int32
	d int64
}
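Since Go 1.19, sync/atomic also provides the atomic.Int64 and atomic.Uint64 types, which the documentation quoted above says are automatically aligned. Here is a sketch of T3 that uses that type instead of a manual padding field. It assumes Go 1.19 or later, and the type name T3Atomic is hypothetical.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// T3Atomic is a hypothetical variant of T3 in which d is an atomic.Int64.
// atomic.Int64 values are guaranteed to be 8-byte aligned, even on
// 32-bit platforms, so no manual padding field is needed.
type T3Atomic struct {
	b int64
	c int32
	d atomic.Int64
}

func main() {
	var a T3Atomic
	a.d.Add(1)              // atomic increment through the method
	fmt.Println(a.d.Load()) // 1
}
```

One trade-off: a value containing an atomic.Int64 must not be copied after first use, and go vet's copylocks check reports such copies.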

Similar padding tricks can be found throughout the Go source code and in many open-source libraries.

Fortunately, there are several tools that help us identify and fix these problems.

Engineering practice

fieldalignment

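fieldalignment is an official Go tool (golang.org/x/tools/go/analysis/passes/fieldalignment). It finds structs whose fields could be reordered to take less memory and can reorder them for us automatically. For example, it rewrites T1 into an aligned layout: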

➜  go_mem_alignment git:(main) ✗ fieldalignment -fix .          
/Users/hxzhouh/workspace/github/blog-example/go/go_mem_alignment/main.go:8:8: struct of size 24 could be 16

// T1 after fieldalignment -fix
type T1 struct {  
    b int64  
    c int16  
    a int8  
}

You can also use it through golangci-lint. fieldalignment is one of the analyzers in govet, and it can be enabled in .golangci.yml like this:

# .golangci.yml
linters:
  disable-all: true
  enable:
    - govet
  fast: false

linters-settings:
  govet:
    # report about shadowed variables
    check-shadowing: false
    fast: false
    # disable:
    #   - fieldalignment # I'm ok to waste some bytes
    enable:
      - fieldalignment

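However, fieldalignment has an annoying quirk: when it reorders struct fields, it deletes all blank lines and comments inside the struct. So before running it you should make a git commit, run the tool, review its changes with git diff, and then clean up by hand. For this reason I rarely use it in production and usually use structlayout instead.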

structlayout

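structlayout displays a struct's layout and size, and can output the data as JSON, which structlayout-svg can render as an SVG image. For complex structs, it is a handy aid for optimization.
Installation: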

go install honnef.co/go/tools/cmd/structlayout@latest 
go install honnef.co/go/tools/cmd/structlayout-pretty@latest
go install honnef.co/go/tools/cmd/structlayout-optimize@latest
go install github.com/ajstarks/svgo/structlayout-svg@latest

Let's analyze T1 with structlayout:

structlayout -json ./main.go T1 | structlayout-svg  >T1.svg

Figure 6: T1 Structure Layout

We can clearly see two padding regions: 7 bytes after a and 6 bytes after c.
The optimized T2:

type T2 struct {  
    a int8
    c int16
    b int64  
}

Figure 7: T2 Structure Layout

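There are still two padding regions, but together they take only 5 bytes (1 byte after a and 4 bytes after c), so T2 occupies 16 bytes instead of 24.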

Summary

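Memory alignment is a key technique for improving program performance and compatibility. Using Go as an example, this article explained what memory alignment is and why it is needed, and used code examples to show how different structs are actually laid out in memory.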

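In Go, memory alignment mainly shows up in how struct fields are ordered. The compiler aligns data automatically to guarantee performance and portability, but in some cases developers need to adjust struct fields by hand to avoid performance problems and potential bugs.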

Empty structs are also a useful helper when optimizing memory alignment; see my other article, Golang High-Performance Programming EP1: Empty Struct, for details.

To help developers detect and fix memory alignment problems, this article introduced two practical tools:

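fieldalignment: an official Go tool that can automatically reorder struct fields for better alignment.
structlayout: displays a struct's memory layout, helping developers understand and optimize memory usage more intuitively.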

By using these tools sensibly, developers can reduce memory waste and work more efficiently while keeping their programs fast and stable.
