
BigCache: High-Performance Memory Caching in Go

 

BigCache[1] is a high-performance in-memory caching library designed for applications that demand high concurrency and low latency. This article explores BigCache’s performance optimization techniques, including its sharding mechanism, efficient hashing algorithms, usage of read-write locks, and more, with detailed explanations based on the source code.

Sharding & Locking

From a user’s perspective, a cache functions like a large hashtable that stores key-value (k/v) pairs. Could this be achieved simply with a map[string][]byte protected by a sync.RWMutex?

Yes, but only when performance requirements are modest. Although sync.RWMutex allows concurrent reads, it serializes all writes, even writes to different keys. Under high write concurrency this becomes a bottleneck: many goroutines block while only one writer proceeds at a time.
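To make the bottleneck concrete, here is a minimal sketch of that naive design (illustrative code, not from the bigcache source):

package cache

import "sync"

// naiveCache: one big map guarded by one lock. Reads can run concurrently,
// but every write, no matter which key it touches, waits for the same mutex.
type naiveCache struct {
    mu    sync.RWMutex
    items map[string][]byte
}

func newNaiveCache() *naiveCache {
    return &naiveCache{items: make(map[string][]byte)}
}

func (c *naiveCache) Set(key string, value []byte) {
    c.mu.Lock() // all writers serialize here
    c.items[key] = value
    c.mu.Unlock()
}

func (c *naiveCache) Get(key string) ([]byte, bool) {
    c.mu.RLock() // readers may proceed in parallel
    v, ok := c.items[key]
    c.mu.RUnlock()
    return v, ok
}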

BigCache addresses this with a sharding scheme similar to the one used by Java’s ConcurrentHashMap: it splits one large hashtable into many smaller shards, each protected by its own lock. The same idea appears in other high-concurrency systems, such as MongoDB’s sharding, and effectively reduces contention. Go also has a third-party implementation of this pattern: concurrent-map[2].

Source Code Example: Sharding in BigCache

// https://github.com/allegro/bigcache/blob/a2f05d7cbfdc7000a8b7e1c40f27d3f239c204df/bigcache.go#L58
type BigCache struct {
    shards     []*cacheShard
    lifeWindow uint64
    clock      clock
    hash       Hasher
    config     Config
    shardMask  uint64
    close      chan struct{}
}

func newBigCache(ctx context.Context, config Config, clock clock) (*BigCache, error) {
    // ...
    cache := &BigCache{
        shards:     make([]*cacheShard, config.Shards),
        lifeWindow: lifeWindowSeconds,
        clock:      clock,
        hash:       config.Hasher,
        config:     config,
        shardMask:  uint64(config.Shards - 1),
        close:      make(chan struct{}),
    }
    // ...
    for i := 0; i < config.Shards; i++ {
        cache.shards[i] = initNewShard(config, onRemove, clock)
    }
    // ...
    return cache, nil
}

Each cached object is assigned to a shard based on the hash of its key: hash(key) % N. Ideally, concurrent requests spread evenly across the shards, minimizing lock contention and reducing latency.

Optimal Number of Shards (N)

Is a larger N always better? No. Excessive shards add unnecessary overhead, so it is important to balance performance and cost by choosing a reasonable N, typically a power of 2 (e.g., 16, 32, 64). With a power-of-two shard count, the modulo operation can be replaced by a fast bitwise AND:

// https://github.com/allegro/bigcache/blob/a2f05d7cbfdc7000a8b7e1c40f27d3f239c204df/bigcache.go#L253
func (c *BigCache) getShard(hashedKey uint64) (shard *cacheShard) {
    return c.shards[hashedKey & c.shardMask]
}

This works because, for any x and any N that is a power of 2, x mod N = x & (N - 1), so a single bitwise AND is enough to compute the remainder. For example, with N = 64, 1234567 mod 64 = 1234567 & 63 = 7.
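Putting the pieces together, bigcache’s top-level write path boils down to: hash the key once, select one shard via the mask, and lock only that shard. The real Set in bigcache looks essentially like this (simplified; check the source for details):

// Set hashes the key once, picks a shard via the mask, and locks only that shard.
func (c *BigCache) Set(key string, entry []byte) error {
    hashedKey := c.hash.Sum64(key) // e.g. fnv64a, see the next section
    shard := c.getShard(hashedKey) // shards[hashedKey & shardMask]
    return shard.set(key, hashedKey, entry)
}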

Choosing the Right Hashing Algorithm

Computer scientists have invented many hash algorithms, and gophers have implemented plenty of them in Go. An ideal hashing algorithm for a cache should meet these criteria:

  • High randomness in hash values.
  • Fast computation.
  • Minimal memory allocations to reduce garbage collection (GC) pressure.

BigCache’s default hasher implements the fnv64a (FNV-1a, 64-bit) algorithm, which processes the key byte by byte with simple XOR and multiply operations and allocates nothing on the heap.

type fnv64a struct{}

const (
    // FNV-1a 64-bit parameters
    offset64 = 14695981039346656037
    prime64  = 1099511628211
)

// Sum64 computes the FNV-1a hash of key without any heap allocation.
func (f fnv64a) Sum64(key string) uint64 {
    var hash uint64 = offset64
    for i := 0; i < len(key); i++ {
        hash ^= uint64(key[i])
        hash *= prime64
    }
    return hash
}
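The hasher is pluggable: Config exposes a Hasher field (used as config.Hasher in the constructor above), and anything implementing Sum64(string) uint64 can be swapped in. Below is a hedged sketch that wraps the standard library’s hash/fnv, assuming the bigcache/v3 module path and its exported Hasher interface; note that the []byte conversion here allocates, which is exactly what bigcache’s built-in fnv64a avoids.

package main

import (
    "context"
    "hash/fnv"
    "time"

    "github.com/allegro/bigcache/v3"
)

// stdFNVHasher satisfies bigcache's Hasher interface (Sum64(string) uint64)
// using the standard library instead of the built-in fnv64a.
type stdFNVHasher struct{}

func (stdFNVHasher) Sum64(key string) uint64 {
    h := fnv.New64a()
    h.Write([]byte(key)) // this conversion allocates, unlike the default hasher
    return h.Sum64()
}

func main() {
    config := bigcache.DefaultConfig(10 * time.Minute)
    config.Hasher = stdFNVHasher{} // swap in a custom hasher
    cache, err := bigcache.New(context.Background(), config)
    if err != nil {
        panic(err)
    }
    _ = cache.Set("my-key", []byte("my-value"))
}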

Memory Optimization

In Go, the garbage collector scans every element of a map during the mark phase if the map’s keys or values contain pointers. With millions of cached objects, this scanning adds significant, avoidable overhead. If neither the keys nor the values contain pointers, the GC skips the map’s contents entirely during marking.
We can see the difference with a simple benchmark.

package gc

import "testing"

var pointMap map[int]*int
var noPointMap map[int]int

// pointMap's values are pointers, so the GC must scan its entries while marking.
func BenchmarkPointMap(b *testing.B) {
    pointMap = make(map[int]*int)
    for i := 0; i < 10e6; i++ {
        pointMap[i] = &i
    }
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        delete(pointMap, i)
        pointMap[i] = &i
    }
}

// noPointMap holds no pointers, so the GC can skip its contents entirely.
func BenchmarkNoPointMap(b *testing.B) {
    noPointMap = make(map[int]int)
    for i := 0; i < 10e6; i++ {
        noPointMap[i] = i
    }
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        delete(noPointMap, i)
        noPointMap[i] = i
    }
}

Result

➜  gc git:(main) GOMAXPROCS=1 go test --bench=.
goos: darwin
goarch: arm64
pkg: blog-example/go/gc
cpu: Apple M1 Pro
BenchmarkPointMap        5273188               209.4 ns/op
BenchmarkNoPointMap      7037848               178.5 ns/op

Then run the two benchmarks separately with execution tracing enabled so we can analyze the GC:

go test -bench=BenchmarkPointMap -trace trace_point.out  
go test -bench=BenchmarkNoPointMap -trace trace_no_point.out 
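
The generated traces can then be inspected in the browser with the standard Go trace viewer:

go tool trace trace_point.out
go tool trace trace_no_point.out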

In the traces, the GC wall duration for NoPointMap is only about 2% of PointMap’s. PointMap has little concurrency and no contention between goroutines, yet garbage collection still spends hundreds of milliseconds in the mark/scan phase simply because the map holds millions of pointer-containing entries.
(Execution trace screenshots: GC mark/scan duration for BenchmarkPointMap vs. BenchmarkNoPointMap.)

How does bigcache solve this problem? By forbidding users from storing pointers in the cache?
Just kidding: if you did that, your users would abandon you. There are several practical ways to solve this problem.

  1. Refer to offheap[4]: use custom Malloc() and Free() functions to manage memory manually, bypassing the runtime garbage collector entirely. This works, but manual management makes memory leaks more likely.
  2. Use freecache[5]. freecache achieves a map with zero GC overhead by eliminating pointers: it stores keys and values in a ring buffer and uses an index to look up each object.
  3. bigcache takes a similar approach: it uses the hash of the key as the map key, serializes the cached object into a pre-allocated byte queue, and stores the entry’s offset in that queue as the map value. Since neither the keys nor the values contain pointers, the GC skips the map’s contents during marking. A simplified sketch of the idea is shown below, followed by the relevant bigcache source.
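
A toy version of this idea (illustrative only, not bigcache’s actual code) could look like this:

package cache

import "encoding/binary"

// zeroGCCache illustrates the "pointer-free map + byte buffer" idea.
// The map stores only integers, so the GC never scans its entries; the real
// bigcache additionally stores the key and a timestamp in each entry to detect
// hash collisions and handle expiration.
type zeroGCCache struct {
    index map[uint64]uint32 // hash(key) -> offset of the entry in buf
    buf   []byte            // length-prefixed entries, appended back to back
}

func newZeroGCCache() *zeroGCCache {
    return &zeroGCCache{index: make(map[uint64]uint32)}
}

func (c *zeroGCCache) set(hashedKey uint64, value []byte) {
    offset := uint32(len(c.buf))
    var hdr [4]byte
    binary.LittleEndian.PutUint32(hdr[:], uint32(len(value))) // length prefix
    c.buf = append(c.buf, hdr[:]...)
    c.buf = append(c.buf, value...)
    c.index[hashedKey] = offset
}

func (c *zeroGCCache) get(hashedKey uint64) ([]byte, bool) {
    offset, ok := c.index[hashedKey]
    if !ok {
        return nil, false
    }
    size := binary.LittleEndian.Uint32(c.buf[offset : offset+4])
    return c.buf[offset+4 : offset+4+size], true
}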
// https://github.com/allegro/bigcache/blob/a2f05d7cbfdc7000a8b7e1c40f27d3f239c204df/shard.go#L18
type cacheShard struct {
    hashmap      map[uint64]uint64 // hash(key) -> offset in entries
    entries      queue.BytesQueue  // serialized entries, back to back
    lock         sync.RWMutex
    entryBuffer  []byte
    onRemove     onRemoveCallback
    isVerbose    bool
    statsEnabled bool
    logger       Logger
    clock        clock
    lifeWindow   uint64
    hashmapStats map[uint64]uint32
    stats        Stats
    cleanEnabled bool
}

func (s *cacheShard) set(key string, hashedKey uint64, entry []byte) error {  
    currentTimestamp := uint64(s.clock.Epoch()) 
  
    s.lock.Lock() 
  
    if previousIndex := s.hashmap[hashedKey]; previousIndex != 0 { // Check for old entries with the same hash key
       if previousEntry, err := s.entries.Get(int(previousIndex)); err == nil {  
          resetHashFromEntry(previousEntry) // reset hash of old entries
          delete(s.hashmap, hashedKey) // Remove old entries from the hash table 
       }  
    }  
  
    if !s.cleanEnabled { // If cleanup is not enabled  
       if oldestEntry, err := s.entries.Peek(); err == nil {  
          s.onEvict(oldestEntry, currentTimestamp, s.removeOldestEntry) // Process the oldest entries
       }  
    }  
  
    w := wrapEntry(currentTimestamp, hashedKey, key, entry, &s.entryBuffer)  
  
    for {  
       if index, err := s.entries.Push(w); err == nil { // Try to push new entries into the queue
          s.hashmap[hashedKey] = uint64(index) // Update the hash table
          s.lock.Unlock() 
          return nil 
       }  
       if s.removeOldestEntry(NoSpace) != nil { // If there is not enough space, delete the oldest entry  
          s.lock.Unlock()  
          return errors.New("entry is bigger than max shard size") 
       }  
    }  
}

func (s *cacheShard) getValidWrapEntry(key string, hashedKey uint64) ([]byte, error) {  
    wrappedEntry, err := s.getWrappedEntry(hashedKey)  
    if err != nil {  
       return nil, err  
    }  
  
    if !compareKeyFromEntry(wrappedEntry, key) {  
       s.collision()  
       if s.isVerbose {  
          s.logger.Printf("Collision detected. Both %q and %q have the same hash %x", key, readKeyFromEntry(wrappedEntry), hashedKey)  
       }  
  
       return nil, ErrEntryNotFound  
    }  
    s.hitWithoutLock(hashedKey)  
    return wrappedEntry, nil  
}

queue.BytesQueue is a byte array that grows on demand. When a new []byte entry is added, its data is copied to the end of the array.
When bigcache deletes a cached element, it only removes the entry’s index from the shard’s hashmap and resets the entry’s hash field in the queue.BytesQueue to 0. Deletions therefore leave many “empty holes” behind, and these holes are never compacted: because the queue is backed by a plain byte array, compacting it would be a time-consuming operation that holds the lock for too long. bigcache can only wait for its cleanup mechanism to reclaim these holes as entries expire.

Some other details:

  1. Cached objects in bigcache cannot refresh their expiration time; every cache entry eventually expires.
  2. The lifetime of all entries is configured globally via config.LifeWindow and cannot be set per key (see the configuration sketch below).
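
For completeness, here is a minimal usage sketch (assuming the bigcache/v3 module path; field names follow the bigcache README, so verify them against the version you use). LifeWindow sets the global lifetime of every entry, and CleanWindow controls how often expired entries, and the holes they leave behind, are cleaned up.

package main

import (
    "context"
    "fmt"
    "time"

    "github.com/allegro/bigcache/v3"
)

func main() {
    config := bigcache.DefaultConfig(10 * time.Minute) // LifeWindow: every entry lives 10 minutes
    config.Shards = 1024                               // must be a power of 2
    config.CleanWindow = 5 * time.Minute               // background sweep of expired entries

    cache, err := bigcache.New(context.Background(), config)
    if err != nil {
        panic(err)
    }

    _ = cache.Set("my-key", []byte("my-value"))
    if entry, err := cache.Get("my-key"); err == nil {
        fmt.Println(string(entry))
    }
}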