In lock-free programming, the absence of locks is only the surface. The real goal is to organize data and processing so that data races cannot occur. Traditional concurrent programming protects data with critical sections, exclusive locks, or read-write locks to prevent incorrect reads or writes during processing. Lock-free programming instead eliminates the problem at its root: it avoids shared data, or avoids concurrent operations on the same data.
Two of the most common techniques in lock-free programming include:
- Structure Copy
- Bumper Loop
Other methods are derivatives of the same principles.
The “other methods” mentioned here refer to algorithm designs that do not rely on locks or CAS (compare-and-swap). This article discusses designing data structures and algorithms that avoid both.
Some lock-free solutions, such as RingBuffer libraries, eliminate race conditions on shared data by brute force. However, they depend heavily on a specific data structure and CPU design, making them constrained approaches with limited abstraction and generality. For instance, even a well-designed RingBuffer may perform poorly on single-core CPUs or single vCPUs. This article therefore focuses on more general lock-free techniques that work across programming languages and targets, regardless of their support for threads or coroutines.
Structure-Copy
Structure-Copy is a classic lock-free programming technique. The key idea is to manage mutable data through a structure, often called an Entry, and to process the data within that structure. Since processing only ever operates on its own associated structure, the data is never shared, eliminating the potential for data races.
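The Entry pattern described below can be sketched as follows. The names Worker, Entry, and Entry.Run come from the surrounding text; their exact fields and signatures are assumptions for illustration.

```go
package main

import "fmt"

// Entry bundles a private copy of the data with its processing state,
// so each run operates on non-shared memory and needs no locks.
type Entry struct {
	Count int
	Data  []any
}

// Run processes only this Entry's own copy of the data.
func (e *Entry) Run() {
	for _, d := range e.Data {
		_ = d // stand-in for real per-item processing
		e.Count++
	}
}

// Worker turns input data into an independent Entry and runs it.
type Worker struct{}

func (w *Worker) Process(data []any) *Entry {
	// Copy the slice so the Entry never shares backing memory
	// with the caller.
	e := &Entry{Data: append([]any(nil), data...)}
	e.Run()
	return e
}

func main() {
	w := &Worker{}
	e := w.Process([]any{1, "two", 3.0})
	fmt.Println(e.Count) // 3
}
```

Because each goroutine would receive its own Entry, many Workers can run concurrently without synchronization.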
In the example above, the input data []any is provided to the Worker, which creates an Entry from it. The Entry.Run method then processes the initialized data, ensuring that the data package (Count, Data) is exclusive and independent, avoiding shared-data issues.
Sometimes a data package is hard to break apart. In such cases, you can use a derived technique to decompose it: either redesign the data flow into sequential steps (e.g., step1, step2, and so on), or split the data into smaller sub-tasks that are combined later, in Map-Reduce style.
Regardless of the decomposition strategy, the principle remains the same: the data operated on by each processing step must be an independent copy, thus eliminating the need for shared locking and unlocking.
Challenges
Handling many small memory chunks (like Entry structs) can introduce overhead: increased memory consumption or excessive allocation and deallocation costs, especially with large lists or arrays. Garbage-collected languages absorb some of this pressure, and tools like sync.Pool can further reduce the cost of frequent small allocations.
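As a sketch of that mitigation, pooled Entry structs can be recycled instead of reallocated. The Entry shape here is the hypothetical one from the earlier discussion; note that pooled objects must be reset before reuse.

```go
package main

import (
	"fmt"
	"sync"
)

// Entry is the per-task structure from the Structure-Copy pattern.
type Entry struct {
	Count int
	Data  []any
}

// entryPool recycles Entry values to reduce allocation churn when
// many short-lived Entries are created and discarded.
var entryPool = sync.Pool{
	New: func() any { return new(Entry) },
}

func process(data []any) int {
	e := entryPool.Get().(*Entry)
	defer entryPool.Put(e)

	// Reset before use: pooled objects retain their old contents.
	e.Count = 0
	e.Data = append(e.Data[:0], data...)

	for range e.Data {
		e.Count++
	}
	return e.Count
}

func main() {
	fmt.Println(process([]any{"a", "b"})) // 2
}
```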
Example
In slog, each log entry is wrapped into a fresh Record (created via NewRecord), avoiding data races. The Entry pattern is widely used and is not limited to lock-free programming, but it brings the benefit of lock-free operation wherever it is applied.
Bumper-Loop
Bumper-Loop is a design pattern for serializing concurrent work. A classic example is Redis: while Redis uses a multithreaded network model, it employs a single-threaded model for transaction processing.
The core concept of the Bumper Loop is to serialize parallel events and then distribute them to specific handlers. Since only one handler runs at a time, shared data is effectively exclusive, eliminating race conditions. The Bumper-Loop becomes an efficient consumption model when handlers complete their tasks quickly and without blocking.
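A minimal sketch of a bumper loop, assuming a channel as the serialization point: events from any number of producers are consumed one at a time and dispatched by kind, so the state the handlers touch needs no locking.

```go
package main

import "fmt"

// Event carries a kind tag used to select a handler.
type Event struct {
	Kind    string
	Payload int
}

// run is the bumper loop: it serializes events from the channel and
// dispatches each to its handler. Handlers execute one at a time,
// so `total` is never accessed concurrently.
func run(events <-chan Event, done chan<- int) {
	total := 0
	handlers := map[string]func(Event){
		"add": func(e Event) { total += e.Payload },
		"sub": func(e Event) { total -= e.Payload },
	}
	for e := range events {
		if h, ok := handlers[e.Kind]; ok {
			h(e)
		}
	}
	done <- total
}

func main() {
	events := make(chan Event, 4)
	done := make(chan int)
	go run(events, done)

	// Producers on other goroutines could send here safely; the
	// channel serializes everything for the single consumer.
	events <- Event{"add", 5}
	events <- Event{"add", 3}
	events <- Event{"sub", 2}
	close(events)

	fmt.Println(<-done) // 6
}
```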
The Bumper-Loop is ideal for low-frequency event distribution, removing locks and improving performance. It is often used in TCP/IP servers to handle incoming data: each new connection is served by creating an independent connectionProcessor using the Entry pattern, which then accepts input data through a Bumper Loop and dispatches it to specific handlers based on the recognized input patterns.
Challenges
The downside of a Bumper-Loop is that any blocking operation in the loop stalls the entire cycle, delaying all subsequent events. High-frequency events can cause bottlenecks or even data loss. Despite these challenges, the Bumper-Loop remains widely used in TCP/IP server-client programming due to its simplicity and efficiency.
Combining Structure-Copy and Bumper-Loop
The Bumper Loop is not well suited to high-frequency events. It performs best when events arrive no more often than roughly every 80ms, meaning each event should be processed in under 80ms. In less demanding environments (i.e., outside financial high-frequency trading), a costly event handler can spawn a goroutine to handle the event asynchronously, allowing the main bumper loop to continue processing subsequent events. The Entry pattern then copies the minimal required state, decoupling the handler from race conditions.