Go High-Performance Programming EP11: lock-free practice

 

Introduction

The previous article discussed two lock-free programming strategies: eliminating shared data and avoiding concurrent access to the same data through careful design. While achieving lock-free programming is ambitious, it could be more practical. Generally, when discussing lock-free techniques, the aim is to avoid locks, and using Compare-And-Swap (CAS) is a good alternative. If CAS is insufficient, we can minimize the granularity of Mutex or replace it with RWMutex.

Advantages of Lock-Free Programming

  • Reduces thread blocking and waiting time
  • It avoids thread priority inversion.
  • Improves concurrency performance.
  • Eliminates issues like race conditions, deadlocks, and starvation.
  • Simplifies and clarifies code.

This article will explore and implement several lock-free programming techniques in Go.

Channel

The Go language advocates for the principle of sharing memory through communication, as stated in its official blog:

Do not communicate by sharing memory; instead, share memory by communicating.

Channels enable lock-free programming because:

  1. Channel transfer ownership of data.
  2. The channel uses internal locks for synchronization.

For example:

1
2
3
4
5
6
7
8
9
type Resource string

func Poller(in, out chan *Resource) {
    for r := range in {
        // Process the URL
        // Send the processed Resource to the output channel
        out <- r
    }
}

Data flows between in and out channels without multiple goroutines simultaneously operating on the same data, achieving a lock-free design.


CAS (Compare-And-Swap)

Modern CPUs support atomic CAS operations, allowing atomic data exchange in multithreaded environments. CAS helps avoid data inconsistencies caused by unpredictable execution orders and interruptions.
more about cas [[解密go 使用atomic 包 解决并发问题-en]]

Lock-Free Stack Using CAS

A stack can typically be implemented with slice and RWMutex for concurrency safety. Alternatively, CAS can be used to implement a lock-free stack. Below is an example:
Lock-Free Stack Implementation

Benchmarking

A simple benchmark test shows the lock-free stack outperforms the slice + RWMutex implementation by approximately 20%. The performance gains are even more significant in high-concurrency scenarios.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
func BenchmarkConcurrentPushLockFree(b *testing.B) {
    stack := NewStackByLockFree()
    for i := 0; i < b.N; i++ {
       stack.Push(i)
    }
}

func BenchmarkConcurrentPushSlice(b *testing.B) {
    stack := NewStackBySlice()
    for i := 0; i < b.N; i++ {
       stack.Push(i)
    }
}
1
2
3
4
➜  lock-free git:(main) ✗ go test --bench=.

BenchmarkConcurrentPushLockFree-10      22657522                49.66 ns/op
BenchmarkConcurrentPushSlice-10         29135671                59.04 ns/op

Struct Copy: Trading Space for Time

Struct Copy is a technique that avoids data contention by duplicating shared resources. Each goroutine operates on its own copy, eliminating the need for locks.

Example

Struct Copy Implementation

Each goroutine owns a unique instance by wrapping shared resources with a BufferWrapper, avoiding data races, and using sync.Pool further optimizes memory allocation.

Benchmarking

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
func BenchmarkSingleBuffer_print(b *testing.B) {
    for i := 0; i < b.N; i++ {
       run_buff_bench(singleBuff)
    }
}

func BenchmarkBufferWrapper_print(b *testing.B) {
    bufferWrapper := NewBufferWrapper()
    for i := 0; i < b.N; i++ {
       run_buff_bench(bufferWrapper)
    }
}

func run_buff_bench(buff printId) {
    wg := sync.WaitGroup{}
    f := func(id int) {
       defer wg.Done()
       for j := 0; j < 10000; j++ {
          buff.print(id)
       }
    }
    for j := 0; j < 100; j++ {
       wg.Add(1)
       go f(j)
    }
    wg.Wait()
}
1
2
BenchmarkSingleBuffer_print-10                 4         261640427 ns/op
BenchmarkBufferWrapper_print-10               67          18320110 ns/op

A similar technique is used in fastjson’s Parser.

Conclusion

This article explored three practical techniques for lock-free programming in Go, with performance tests validating their effectiveness. While fully lock-free programming is challenging, proper design and technique selection can significantly reduce lock usage and enhance concurrency performance.

Built with Hugo
Theme Stack designed by Jimmy