深入探讨 BigCache 的性能优化手段

bigcache是一个高性能的内存缓存库，专为需要高并发访问和低延迟响应的应用场景设计。本文将深入探讨 BigCache 的性能优化手段，包括分片机制、高效哈希算法、读写锁的使用等，并引用相关源码进行详细说明。

分段加锁

从使用者的角度来看cache就像一个大的hashtable,可以存储k/v 格式的数据。那么，是不是可以使用一个map[string][]byte + sync.RWMutex 实现满足需求的cache呢？
如果性能要求不高，的确可以这么做。
sync.RWMutex虽然对读写进行了优化，但是对于并发的读，最终还是把写变成了串行，一旦写的并发量大的时候，即使写不同的key, 对应的goroutine也会block，只允许一个写执行，这是一个瓶颈，并且不可控。
bigcache 参考了 java ConcurrentMap 的实现方式，将一个大hashtable 分成多个小的 shard，每个分片一把锁，很多大并发场景下为了减小并发的压力都会采用这种方法，比如MongoDB的sharding等。Golang也有一个第三方的 ConcurrentMap 实现 concurrent-map。

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24


// https://github.com/allegro/bigcache/blob/a2f05d7cbfdc7000a8b7e1c40f27d3f239c204df/bigcache.go#L58
type BigCache struct {
    shards       []*cacheShard
    shardMask    uint64
    config       Config
}
func newBigCache(ctx context.Context, config Config, clock clock) (*BigCache, error) {
    ......
    cache := &BigCache{
        shards: make([]*cacheShard, config.Shards),
        lifeWindow: lifeWindowSeconds,
        clock: clock,
        hash: config.Hasher,
        config: config,
        shardMask: uint64(config.Shards - 1),
        close: make(chan struct{}),
    }
    ......
    for i := 0; i < config.Shards; i++ {
        cache.shards[i] = initNewShard(config, onRemove, clock)
    }
    ......
    return cache, nil
}

对于每一个缓存对象，根据它的key计算它的哈希值: hash(key) % N。理想情况下N个 goroutine 每次请求正好平均落在各自的shard上，这样就不会有锁竞争了。即使有多个goroutine同时请求，如果hash比较平均的话，单个shard的压力也会比较小。可以降低延迟，因为等待获取锁的时间变小了。

N的选择

既然分片可以很好的降低锁的竞争，那么N是不是越大越好呢？当然不是，如果N非常大，比如每个缓存对象一个锁，那么会带来很多额外的不必要的开销。可以选择一个不太大的值，在性能和花销上寻找一个平衡。

另外, N是 2的幂，比如16、32、64。这样设计的好处就是计算余数可以使用位运算快速计算。

1
2
3
4


// https://github.com/allegro/bigcache/blob/a2f05d7cbfdc7000a8b7e1c40f27d3f239c204df/bigcache.go#L253
func (c *BigCache) getShard(hashedKey uint64) (shard *cacheShard) {
    return c.shards[hashedKey&c.shardMask]
}

因为对于 2 的幂N，对于任意的x, 下面的公式成立:
x mod N = (x & (N − 1))
所以只需要使用一次按位 AND (&) 就可以求得它的余数。

选择合适的 Hash 算法

计算机科学家已经发明了很多的Hash算法，gopher也实现了很多Hash算法。一个优秀的Hash算法有以下特点

哈希值应该比较随机 (质量)
哈希速度比较快 (速度)
尽量不产生额外的内存分配, 避免对垃圾回收产生压力 (耗费资源少)

bigcache 提供了一个默认的 Hash算法的实现，采用 fnv64a 算法。这个算法的好处是采用位运算的方式在栈上进行运算，避免在堆上分配。

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14


type fnv64a struct{}

const (
    offset64 = 14695981039346656037
    prime64 = 1099511628211
)
func (f fnv64a) Sum64(key string) uint64 {
var hash uint64 = offset64
for i := 0; i < len(key); i++ {
        hash ^= uint64(key[i])
        hash *= prime64
    }
return hash
}

内存优化

对于 Go 语言中的 map, 垃圾回收器在 mark和scan阶段检查 map 中的每一个元素, 如果缓存中包含数百万的缓存对象，垃圾回收器对这些对象无意义的检查导致不必要的时间开销。

作者做了测试。他们测试了简单的 HTTP/JSON 序列化 (不会访问 cache)。在 cache 为空的时候 1 万的 QPS 的耗时大约 10 毫秒。当 cache 填满的时候， P99 的请求都会超过 1 秒。监控显示堆中包含 4 千万的对象， GC 过程中的 mark 和 scan 也需要 4 秒。
那是因为如果hashtable中的元素包含指针，那么GC在mark会扫描每个包含指针的元素，如果不包含指针，在mark阶段就会跳过这些元素。
我们可以用一个简单的例子测试一下。

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26


var pointMap map[int]*int  
var noPointMap map[int]int  
  
func BenchmarkPointMap(b *testing.B) {  
    pointMap = make(map[int]*int)  
    for i := 0; i < 10e6; i++ {  
       pointMap[i] = &i  
    }  
    b.ResetTimer()  
    for i := 0; i < b.N; i++ {  
       delete(pointMap, i)  
       pointMap[i] = &i  
    }  
}  
  
func BenchmarkNoPointMap(b *testing.B) {  
    noPointMap = make(map[int]int)  
    for i := 0; i < 10e6; i++ {  
       noPointMap[i] = i  
    }  
    b.ResetTimer()  
    for i := 0; i < b.N; i++ {  
       delete(noPointMap, i)  
       noPointMap[i] = i  
    }  
}

测试结果如下

1
2
3
4
5
6
7


➜  gc git:(main) ✗ GOMAXPROCS=1  go test --bench=. 
goos: darwin
goarch: arm64
pkg: blog-example/go/gc
cpu: Apple M1 Pro
BenchmarkPointMap        5273188               209.4 ns/op
BenchmarkNoPointMap      7037848               178.5 ns/op

然后分别运行两个测试，分析 gc

1
2


go test -bench=BenchmarkPointMap -trace trace_point.out  
go test -bench=BenchmarkNoPointMap -trace trace_no_point.out 

NoPointMap 的 Wall Duration 只有PointMap 的2% 。虽然PointMap 的并发量很小，并且单个的 goroutine也没有竞争，但是由于元素的数量很多，垃圾回收在mark/scan阶段需要花费上百毫秒进行标记和遍历。

Pasted image 20241205155902
Pasted image 20241205155823
bigcache 是如何解决这个问题的？禁止你的用户在 bigcache 上存储带有指针的数据。
开个玩笑，如果真的这么做，你的用户会抛弃你。有几种办法可以解决这个问题。

参考 offheap，使用定制化的Malloc() 和 Free() 来手动管理内存，而绕过runtime 垃圾回收，但是这样会比较容易导致内存泄露。
第二种方式是使用 freecache。freecache 通过减少指针的数量以零 GC 开销实现 map。它将键和值保存在ringbuffer中，并使用索引查找对象。

bigcache 实现的方式是使用哈希值作为map[int]int的 key。把缓存对象序列化后放到一个预先分配的大的字节数组中，然后将它在数组中的 offset 作为map[int]int的 value。

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66


//https://github.com/allegro/bigcache/blob/a2f05d7cbfdc7000a8b7e1c40f27d3f239c204df/shard.go#L18
type cacheShard struct {
    hashmap     map[uint64]uint32
    entries     queue.BytesQueue
    lock        sync.RWMutex
    entryBuffer []byte
    onRemove    onRemoveCallback
    isVerbose    bool
    statsEnabled bool
    logger       Logger
    clock        clock
    lifeWindow   uint64
    hashmapStats map[uint64]uint32
    stats        Stats
}

func (s *cacheShard) set(key string, hashedKey uint64, entry []byte) error {  
    currentTimestamp := uint64(s.clock.Epoch()) 
  
    s.lock.Lock() 
  
    if previousIndex := s.hashmap[hashedKey]; previousIndex != 0 { // 检查是否存在相同哈希键的旧条目  
       if previousEntry, err := s.entries.Get(int(previousIndex)); err == nil {  
          resetHashFromEntry(previousEntry) // 重置旧条目的哈希  
          delete(s.hashmap, hashedKey) // 从哈希表中删除旧条目  
       }  
    }  
  
    if !s.cleanEnabled { // 如果清理功能未启用  
       if oldestEntry, err := s.entries.Peek(); err == nil {  
          s.onEvict(oldestEntry, currentTimestamp, s.removeOldestEntry) // 处理最旧的条目  
       }  
    }  
  
    w := wrapEntry(currentTimestamp, hashedKey, key, entry, &s.entryBuffer)  
  
    for {  
       if index, err := s.entries.Push(w); err == nil { // 尝试将新条目推入队列  
          s.hashmap[hashedKey] = uint64(index) // 更新哈希表  
          s.lock.Unlock() 
          return nil 
       }  
       if s.removeOldestEntry(NoSpace) != nil { // 如果空间不足，删除最旧的条目  
          s.lock.Unlock()  
          return errors.New("entry is bigger than max shard size") // 返回错误  
       }  
    }  
}

func (s *cacheShard) getValidWrapEntry(key string, hashedKey uint64) ([]byte, error) {  
    wrappedEntry, err := s.getWrappedEntry(hashedKey)  
    if err != nil {  
       return nil, err  
    }  
  
    if !compareKeyFromEntry(wrappedEntry, key) {  
       s.collision()  
       if s.isVerbose {  
          s.logger.Printf("Collision detected. Both %q and %q have the same hash %x", key, readKeyFromEntry(wrappedEntry), hashedKey)  
       }  
  
       return nil, ErrEntryNotFound  
    }  
    s.hitWithoutLock(hashedKey)  
    return wrappedEntry, nil  
}

queue.BytesQueue 是一个字节数组，按需分配。当加入一个[]byte时，它会把数据 copy 到尾部。

bigcache 在删除缓存元素的时候，只是把它从的索引从map[uint64]uint32中删除了，并把它在queue.BytesQueue队列中的数据置为0。删除操作会在 queue.BytesQueue中造成很多的 “空洞”，而且这些 “虫洞” 不会被整理，也不会被移除。因为它的底层是使用一个字节数组实现的，“空洞” 的移除是一个耗时的操作，会导致锁的持有时间过长。bigcache 只能等待清理机制把这些 “空洞” 删除掉。
其他一些细节：

bigcache 中的缓存对象没有刷新过期时间的功能，所有的缓存最终都会过期。
所有缓存的生命周期都是由config.LifeWindow 配置，不能针对单独的key设置。