Go High-Performance Programming EP11: lock-free practice

Introduction

The previous article discussed two lock-free programming strategies: eliminating shared data and avoiding concurrent access to the same data through careful design. While achieving lock-free programming is ambitious, it could be more practical. Generally, when discussing lock-free techniques, the aim is to avoid locks, and using Compare-And-Swap (CAS) is a good alternative. If CAS is insufficient, we can minimize the granularity of Mutex or replace it with RWMutex.

Advantages of Lock-Free Programming

Reduces thread blocking and waiting time
It avoids thread priority inversion.
Improves concurrency performance.
Eliminates issues like race conditions, deadlocks, and starvation.
Simplifies and clarifies code.

This article will explore and implement several lock-free programming techniques in Go.

Channel

The Go language advocates for the principle of sharing memory through communication, as stated in its official blog:

Do not communicate by sharing memory; instead, share memory by communicating.

Channels enable lock-free programming because:

Channel transfer ownership of data.
The channel uses internal locks for synchronization.

For example:

1
2
3
4
5
6
7
8
9


type Resource string

func Poller(in, out chan *Resource) {
    for r := range in {
        // Process the URL
        // Send the processed Resource to the output channel
        out <- r
    }
}

Data flows between in and out channels without multiple goroutines simultaneously operating on the same data, achieving a lock-free design.

CAS (Compare-And-Swap)

Modern CPUs support atomic CAS operations, allowing atomic data exchange in multithreaded environments. CAS helps avoid data inconsistencies caused by unpredictable execution orders and interruptions.
more about cas [[解密go 使用atomic 包解决并发问题-en]]

Lock-Free Stack Using CAS

A stack can typically be implemented with slice and RWMutex for concurrency safety. Alternatively, CAS can be used to implement a lock-free stack. Below is an example:
Lock-Free Stack Implementation

Benchmarking

A simple benchmark test shows the lock-free stack outperforms the slice + RWMutex implementation by approximately 20%. The performance gains are even more significant in high-concurrency scenarios.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13


func BenchmarkConcurrentPushLockFree(b *testing.B) {
    stack := NewStackByLockFree()
    for i := 0; i < b.N; i++ {
       stack.Push(i)
    }
}

func BenchmarkConcurrentPushSlice(b *testing.B) {
    stack := NewStackBySlice()
    for i := 0; i < b.N; i++ {
       stack.Push(i)
    }
}

1
2
3
4


➜  lock-free git:(main) ✗ go test --bench=.

BenchmarkConcurrentPushLockFree-10      22657522                49.66 ns/op
BenchmarkConcurrentPushSlice-10         29135671                59.04 ns/op

Struct Copy: Trading Space for Time

Struct Copy is a technique that avoids data contention by duplicating shared resources. Each goroutine operates on its own copy, eliminating the need for locks.

Example

Struct Copy Implementation

Each goroutine owns a unique instance by wrapping shared resources with a BufferWrapper, avoiding data races, and using sync.Pool further optimizes memory allocation.

Benchmarking

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27


func BenchmarkSingleBuffer_print(b *testing.B) {
    for i := 0; i < b.N; i++ {
       run_buff_bench(singleBuff)
    }
}

func BenchmarkBufferWrapper_print(b *testing.B) {
    bufferWrapper := NewBufferWrapper()
    for i := 0; i < b.N; i++ {
       run_buff_bench(bufferWrapper)
    }
}

func run_buff_bench(buff printId) {
    wg := sync.WaitGroup{}
    f := func(id int) {
       defer wg.Done()
       for j := 0; j < 10000; j++ {
          buff.print(id)
       }
    }
    for j := 0; j < 100; j++ {
       wg.Add(1)
       go f(j)
    }
    wg.Wait()
}

1
2


BenchmarkSingleBuffer_print-10                 4         261640427 ns/op
BenchmarkBufferWrapper_print-10               67          18320110 ns/op

A similar technique is used in fastjson’s Parser.

Conclusion

This article explored three practical techniques for lock-free programming in Go, with performance tests validating their effectiveness. While fully lock-free programming is challenging, proper design and technique selection can significantly reduce lock usage and enhance concurrency performance.