Go High-Performance Programming EP10: Two Useful Golang Lock-Free Programming Tips

 

In lock-free programming, the absence of locks is only the surface; the deeper goal is to organize data and processing so that data races cannot occur. Traditional concurrent programming protects data with critical sections, exclusive locks, or read-write locks to prevent incorrect reads or writes during processing. In contrast, lock-free programming resolves these issues by eliminating shared data or concurrent operations on the same data.

Two of the most common techniques in lock-free programming include:

  1. Structure Copy
  2. Bumper Loop

Other methods are derivatives of the same principles.


The “other methods” mentioned here refer to algorithm designs that do not rely on locks or CAS. This article discusses designing data structures and algorithms that avoid both.

Some lock-free solutions, such as RingBuffer libraries, forcefully eliminate race conditions on shared data. However, they depend heavily on the specific data structure and CPU design. These are constrained approaches with limited abstraction and generality. For instance, even a well-designed RingBuffer may be inefficient on single-core CPUs or single vCPUs. Therefore, this article focuses on more general lock-free techniques that work across different programming languages and targets, regardless of their support for threads or coroutines.

Structure-Copy

Structure-Copy is a classic lock-free programming technique. The key idea is to manage mutable data through a structure, often called an Entry, and then process data within this structure. Since the data processing only operates on its associated structure, the data remains non-shared, eliminating the potential for data races.

type Worker struct {
  closed int32 // avoid duplicated release of resources in Close()
  entries []*Entry
}

type Entry struct {
  Count int
  Data  []any
}

func (w *Worker) New(data ...any) *Entry {
  e := &Entry{Data: data}
  w.entries = append(w.entries, e)
  return e
}

func (e *Entry) Run() {
  // process e.Data
}

In the above example, data []any is provided to the Worker, which creates an Entry based on it. The Entry.Run method then processes the initialized data, ensuring that the data package (Count, Data) is exclusive and independent, thereby avoiding shared data issues.

Sometimes, breaking up the data package is tricky. In such cases, you can use a derived technique to decompose it, either by redesigning the data flow into steps (e.g., step1, step2, etc.) or by splitting the data into smaller sub-tasks that are combined later, in the spirit of Map-Reduce.

Regardless of the decomposition strategy, the principle remains the same: the data operated on by each processing step must be an independent copy, thus eliminating the need for shared locking and unlocking.
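As a sketch of the splitting approach (the names and the summation task here are illustrative, not from the original), each worker owns exactly one Entry with its own sub-slice and its own result field, and a single-threaded reduce step combines the results afterward:

```go
package main

import (
	"fmt"
	"sync"
)

// Entry holds an independent sub-slice plus its own result field,
// so each goroutine only ever touches its own Entry.
type Entry struct {
	Data []int
	Sum  int
}

func sumParallel(data []int, chunks int) int {
	size := (len(data) + chunks - 1) / chunks
	var entries []*Entry
	for i := 0; i < len(data); i += size {
		end := i + size
		if end > len(data) {
			end = len(data)
		}
		entries = append(entries, &Entry{Data: data[i:end]})
	}

	var wg sync.WaitGroup
	for _, e := range entries {
		wg.Add(1)
		go func(e *Entry) { // each goroutine owns exactly one Entry
			defer wg.Done()
			for _, v := range e.Data {
				e.Sum += v
			}
		}(e)
	}
	wg.Wait()

	// Reduce step: runs single-threaded after all workers finish,
	// so reading every Entry here is race-free.
	total := 0
	for _, e := range entries {
		total += e.Sum
	}
	return total
}

func main() {
	data := make([]int, 100)
	for i := range data {
		data[i] = i + 1
	}
	fmt.Println(sumParallel(data, 4)) // 1+2+...+100 = 5050
}
```

No locks appear anywhere: the sub-slices are disjoint, each Sum field belongs to one goroutine, and the reduce only starts once wg.Wait returns.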

Challenges

Handling too many small memory chunks (like Entry structs) can introduce overhead. This can lead to increased memory consumption or excessive allocation and deallocation costs, especially when dealing with large lists or arrays. In garbage-collected languages, frequent small allocations also add GC pressure; tools like sync.Pool can help alleviate it by reusing objects.
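A minimal sketch of pooling the Entry struct from earlier with sync.Pool (the process function is an illustrative stand-in for real work):

```go
package main

import (
	"fmt"
	"sync"
)

// Entry mirrors the struct from the text.
type Entry struct {
	Count int
	Data  []any
}

// entryPool recycles Entry values so hot paths avoid a fresh
// allocation per event.
var entryPool = sync.Pool{
	New: func() any { return new(Entry) },
}

func process(data ...any) int {
	e := entryPool.Get().(*Entry)
	defer func() {
		// Reset before returning to the pool so no stale state leaks
		// into the next user of this Entry.
		e.Count = 0
		e.Data = e.Data[:0] // keep capacity, drop contents
		entryPool.Put(e)
	}()

	e.Data = append(e.Data, data...)
	e.Count = len(e.Data)
	return e.Count
}

func main() {
	fmt.Println(process("a", "b", "c")) // 3
}
```

Note that resetting the slice with `e.Data[:0]` retains the backing array, which is exactly the allocation we want to reuse.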

Example

//https://github.com/golang/go/blob/f2d118fd5f7e872804a5825ce29797f81a28b0fa/src/log/slog/logger.go#L111
type Logger struct {
    handler Handler // for structured logging
}

type Handler interface {
    Handle(context.Context, Record) error
}

func (l *Logger) Debug(msg string, args ...any) {
    l.log(context.Background(), LevelDebug, msg, args...)
}

func (l *Logger) log(ctx context.Context, level Level, msg string, args ...any) {
    ...
    r := NewRecord(time.Now(), level, msg, pc)
    ...
    _ = l.Handler().Handle(ctx, r)
}

In slog, each log call wraps its arguments in a fresh Record (via NewRecord), avoiding data races. The Entry pattern is widely used and is not limited to lock-free programming, but it provides the benefit of lock-free operation wherever it is applied.

Bumper-Loop

Bumper-Loop is a design pattern built around a single event-dispatching loop.

A classic example of the Bumper Loop is Redis: although newer versions use a multithreaded network model, command execution remains single-threaded.

The core concept of the Bumper Loop is to serialize parallel events and then distribute them to specific handlers. Since only one handler runs at a time, shared data remains exclusive, eliminating race conditions. If handlers complete their tasks quickly and without blocking, the Bumper-Loop becomes an efficient consumption model.

func (w *Worker) runLoop() {
  // all events are serialized through this single loop; with multiple
  // event sources, a select over several channels plays the same role
  for ev := range fileWatcherCh {
    onFileChanged(ev)
  }
}

The Bumper-Loop is ideal for low-frequency event distribution, removing locks while improving performance. It is often used in TCP/IP servers to handle incoming data. A new connection request is dealt with by creating an independent connectionProcessor using the Entry pattern, which then accepts input data through a Bumper Loop, dispatching it to specific handlers based on the recognized input patterns.

Challenges

The downside of a Bumper-Loop is that any blocking call in the loop stalls the entire cycle, delaying all subsequent events. High-frequency events can cause bottlenecks or even data loss. Despite these challenges, the Bumper-Loop is still widely used in TCP/IP server-client programming due to its simplicity and efficiency.

Combining Structure-Copy and Bumper-Loop

The Bumper Loop is not well suited to high-frequency events. It performs best when events arrive at intervals of roughly 80ms or longer, meaning each event should be processed in less than 80ms. In less demanding environments (e.g., outside financial high-frequency trading), a costly event handler can spawn a goroutine to handle the event asynchronously, allowing the main bumper loop to continue processing subsequent events. The Entry pattern can then copy the minimal state needed, decoupling the handler from potential race conditions.

Licensed under CC BY-NC-SA 4.0
Last updated on Oct 19, 2024 12:45 CST