The Truth About Panic And Recover In Go

Background

In the previous article, we learned that panic can occur in three ways:

Initiated by developers: by calling the panic() function.
Generated by the compiler: for example, in the case of division by zero.
Sent to the process by the kernel: for example, in the case of an illegal memory access.

This article was first published under the Medium MPP plan. If you are a Medium user, please follow me on Medium. Thank you very much.

All three cases ultimately end up as a call to the panic() function, which shows that a panic in Go is just a special function call handled at the language level. Now that we know how a panic is triggered, the next step is to understand how it is handled. When I first started learning Go, I often had these questions in mind:

What exactly is panic? Is it a struct or a function?
Why does panic cause the Go process to exit?
Why does recover need to be placed inside a defer statement to take effect?
Even if recover is placed inside a defer statement, why doesn’t the process recover?
Why is it possible to panic again after a panic? What are the implications?

Today, we will delve into the code to clarify these questions.

Based on Go 1.21.4

The `_panic` Data Structure

Let’s start by looking at an example of an actively triggered panic. Through the assembly code, we can see that the panic call is made to the runtime.gopanic function, which contains a crucial data structure called _panic.

Let’s take a look at the _panic data structure:

type _panic struct {
    argp      unsafe.Pointer // pointer to arguments of deferred call run during panic; cannot move - known to liblink
    arg       any            // argument to panic
    link      *_panic        // link to earlier panic
    pc        uintptr        // where to return to in runtime if this panic is bypassed
    sp        unsafe.Pointer // where to return to in runtime if this panic is bypassed
    recovered bool           // whether this panic is over
    aborted   bool           // the panic was aborted
    goexit    bool
}

Key Fields to Focus On:

link: A pointer to another _panic structure, which means _panic records form a singly linked list, similar to the _defer list.
recovered: Whether a panic counts as recovered depends entirely on whether this field is true. recover() does nothing more than set this field.

Now let’s take a look at two important fields in the g structure:

type g struct {
	_panic *_panic // panic linked list, the innermost one
	_defer *_defer // defer linked list, the innermost one
	// ...
}

From here, we can see that both the _defer and _panic linked lists hang off the goroutine. When can the _panic linked list hold more than one element? Only when panic() is called inside a deferred function. Once a panic starts, gopanic runs nothing but the goroutine's _defer functions, so a deferred function is the only place where a second panic() can be raised while the first one is still in flight.
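A minimal sketch of that situation (the function name nested is made up for this illustration): a deferred function panics while an earlier panic is still being processed, so two _panic records are chained on the goroutine, and recover() only ever looks at the innermost one.

```go
package main

import "fmt"

// nested is a hypothetical helper used only for this illustration.
func nested() (got any) {
	// Registered first, so it runs last; by then gp._panic points at "second".
	defer func() { got = recover() }()
	// Registered last, so it runs first, while "first" is still in flight.
	defer func() { panic("second") }()
	panic("first")
}

func main() {
	fmt.Println(nested()) // second
}
```

The value of the first panic is not returned by recover() here, because gorecover reads gp._panic, the head of the list.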

The `recover()` Function

For the sake of explanation, let’s start with a simple analysis of what the recover() function does:

defer func() {
	recover()
}()

The recover() function corresponds to the gorecover function implementation in the runtime/panic.go file.

The `gorecover` Function

func gorecover(argp uintptr) any {
	// Must be in a function running as part of a deferred call during the panic.
	// Must be called from the topmost function of the call
	// (the function used in the defer statement).
	// p.argp is the argument pointer of that topmost deferred function call.
	// Compare against argp reported by caller.
	// If they match, the caller is the one who can recover.
	gp := getg()
	p := gp._panic
	if p != nil && !p.goexit && !p.recovered && argp == uintptr(p.argp) {
		p.recovered = true
		return p.arg
	}
	return nil
}

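This function is quite simple:

Retrieve the current goroutine structure.
Get the latest _panic from the _panic linked list of the current goroutine. Proceed only if it is not nil, was not created by runtime.Goexit (the goexit flag), has not already been recovered, and the caller's argp matches p.argp, which means recover() was called directly by the deferred function.
Set the recovered field of the _panic structure to true and return the arg field.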

That’s all there is to the recover() function. It only sets the recovered field in the _panic structure and does not perform any magical code jumps itself. The flag takes effect later, inside the logic of gopanic.
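The argp comparison has a visible consequence: recover() only works when it is called directly by the deferred function. The following sketch shows this; helper and compare are names invented for this example.

```go
package main

import "fmt"

// helper calls recover, but it is not itself the deferred function.
func helper() any { return recover() }

// compare is a hypothetical function for this illustration.
func compare() (indirect, direct any) {
	// Runs second: recover is called directly by the deferred function.
	defer func() { direct = recover() }()
	// Runs first: recover sits one call level too deep, so it returns nil
	// and the panic keeps going.
	defer func() { indirect = helper() }()
	panic("boom")
}

func main() {
	indirect, direct := compare()
	fmt.Println(indirect, direct) // <nil> boom
}
```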

The `panic()` Function

Based on the previous assembly code, we know that the panic call is made to the runtime.gopanic function.

The `gopanic` Function

The most important part of the panic mechanism is the gopanic function. All the details about panic are in this function. The complexity of understanding panic lies in two points:

Recursive execution of gopanic when panics are nested.
The program counter (pc) and stack pointer (sp) are not manipulated in the usual way of function call and return. During recovery, the saved pc and sp of the goroutine are overwritten directly, which skips the remaining logic of gopanic and even the frames of nested gopanic calls.

We can understand the logic of gopanic by dividing it into two parts: inside the loop and outside the loop.

Inside the Loop

The actions inside the loop can be divided into the following steps:

Iterate through the defer linked list of the goroutine and retrieve a _defer deferred function.
Set the d.started flag and bind the current d._panic (used for recursive detection).
Execute the _defer deferred function.
Remove the executed _defer function from the linked list.
Check whether the recovered field of the _panic structure is set to true and act accordingly.
- If it is true, reset the pc and sp registers (usually so that execution continues at deferreturn) and resume the goroutine from there.
Repeat the above steps.
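The observable effect of this loop can be sketched with a small program (unwind is a hypothetical function, not runtime code): deferred functions run in last-in, first-out order, and once one of them recovers, the remaining ones still run before the function returns.

```go
package main

import "fmt"

// unwind is a hypothetical function; it records the order in which things run.
func unwind() (trace []string) {
	defer func() { trace = append(trace, "defer_0") }()
	defer func() {
		recover() // marks the current panic as recovered
		trace = append(trace, "defer_1 (recover)")
	}()
	defer func() { trace = append(trace, "defer_2") }()
	panic("boom")
}

func main() {
	fmt.Println(unwind()) // [defer_2 defer_1 (recover) defer_0]
}
```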

Questions to Consider

Notice that the recovered field can only be modified during the third step, while a deferred function is running. It cannot be modified anywhere else.
Question 1: Why does recover need to be placed inside a defer statement to take effect?
Because that’s the only opportunity!

Let’s consider a few straightforward examples:

func main() {
	panic("test")
	recover()
}

In the above example, recover() is written after panic(). Why does the program still crash? Because recover() is never executed. The execution order is as follows:

panic
  -> gopanic
     -> Execute the defer linked list
        -> exit

Someone might argue, “What if I put recover() before panic("test")?”

func main() {
	recover()
	panic("test")
}

No, it won’t work because when recover() is executed, the _panic is not attached to the goroutine yet. So recover() is useless in this case.
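For contrast, here is a minimal sketch of the placement that does work. The wrapper name safeCall is an assumption made for this example: it puts recover() inside a deferred closure so that it runs while the _panic is attached to the goroutine.

```go
package main

import "fmt"

// safeCall is a hypothetical wrapper: it runs fn and turns a panic into an error.
func safeCall(fn func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	fn()
	return nil
}

func main() {
	fmt.Println(recover())                          // <nil>: no panic in flight
	fmt.Println(safeCall(func() { panic("test") })) // recovered: test
	fmt.Println(safeCall(func() {}))                // <nil>
}
```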

Question 2: Even if recover is placed inside a defer statement, why doesn’t the process recover?
Let’s recall the operations in the for loop:

// Step: Iterate through the _defer linked list
d := gp._defer
// Step: Execute the defer function.
// In Go 1.21 the deferred closure is called directly; older releases
// went through reflectcall(...) at this point.
d.fn()
// Step: Remove the executed defer function
gp._defer = d.link

Key Point: In the gopanic function, only the _defer functions of the current goroutine are executed. Therefore, if a recover() is performed in a defer function attached to another goroutine, it will have no effect.

Let’s consider an example:

func main() { // g1
    go func() { // g2
        defer func() {
            recover()
        }()
    }()
    panic("test")
}

Since the panic and recover are in two different goroutines, the _panic is attached to g1, and the recover in g2’s _defer chain cannot access the _panic structure of g1. Therefore, it cannot set the recovered field to true, and the program still crashes.
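The practical consequence is that every goroutine has to recover its own panics. A minimal sketch, assuming a helper named runWorker (invented for this example) that reports the outcome over a channel:

```go
package main

import "fmt"

// runWorker is a hypothetical helper: it runs job in a new goroutine that
// recovers its own panic and reports the outcome over a channel.
func runWorker(job func()) error {
	done := make(chan error, 1)
	go func() {
		// This defer lives on the worker's own _defer list, so gopanic
		// running in the worker goroutine will find and execute it.
		defer func() {
			if r := recover(); r != nil {
				done <- fmt.Errorf("worker panicked: %v", r)
				return
			}
			done <- nil
		}()
		job()
	}()
	return <-done
}

func main() {
	fmt.Println(runWorker(func() { panic("test") })) // worker panicked: test
}
```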

Question 3: Why is it possible to panic again after a panic? What are the implications?
This is actually quite easy to understand, although some people overthink it. Can we call panic() recursively? Yes, we can.
The scenario is usually as follows:

The gopanic function calls a _defer function.
The _defer function calls panic() or gopanic().

This is plain function recursion, and there is nothing special about it. In this scenario, a _panic linked list is formed starting from gp._panic. The instructions executed by gopanic are special in two ways:

If the _panic structure is marked as recovered, the pc and sp registers are reset, bypassing gopanic (including nested function stacks), and jumping directly to the instruction to be executed after the defer function (deferreturn).
If no deferred function recovers the _panic, the process exits and no subsequent instructions are executed.

Let’s look at an example of nested panic:

func main() {
    defer func() { // defer_0
        panic("panic again")
    }()
    panic("first")
}

The function execution is as follows:

    gopanic // First panic
        defer_0 is executed
            gopanic // Second panic
                defer_0 is removed from the linked list (recursive call), termination condition is met
                
    // Print stack trace and exit the program
    fatalpanic

Here’s another example for comparison:

func main() {
    println("=== begin ===")
    defer func() { // defer_0
        println("=== come in defer_0 ===")
    }()
    defer func() { // defer_1
        recover()
    }()
    defer func() { // defer_2
        panic("panic 2")
    }()
    panic("panic 1")
    println("=== end ===")
}

Will this function print the stack trace and exit?
The answer is no. The output will be:

➜ panic ./test_panic
=== begin ===
=== come in defer_0 ===

Did you guess it correctly? Let me explain the complete route:

main
    gopanic // First panic
        1. Retrieve defer_2, set started
        2. Execute defer_2
            gopanic // Second panic
                1. Retrieve defer_2 again (already started), mark the first panic as aborted
                2. Remove defer_2 from the linked list
                3. Execute defer_1
                    - Execute recover
                4. Remove defer_1
                5. Execute recovery, reset pc register, jump to the instruction registered in defer_1 (usually deferreturn)

    // Jump out of the recursive call of gopanic, directly to the execution of deferreturn;
    deferreturn
        1. Iterate through the defer function chain, there is still defer_0 left, retrieve defer_0
        2. Execute defer_0
    // End of the main function

Here’s another example:

func main() {
    println("=== begin ===")
    defer func() { // defer_0
        println("=== come in defer_0 ===")
    }()
    defer func() { // defer_1
        panic("panic 2")    
    }()
    defer func() { // defer_2
        recover()
    }()
    panic("panic 1")
    println("=== end ===")
}

Will this function print the stack trace and exit?
The answer is yes.
The output will be:

➜ panic ./test_panic
=== begin ===
=== come in defer_0 ===
panic: panic 2
goroutine 1 [running]:
main.main.func2()
    /Users/code/gopher/src/panic/test_panic.go:9 +0x39
main.main()
    /Users/code/gopher/src/panic/test_panic.go:11 +0xf7

The execution path is as follows:

main
    gopanic // First panic
        1. Retrieve defer_2, set started
        2. Execute defer_2
            - Execute recover, set panic_1 as recovered
        3. Remove defer_2 from the linked list
        4. Execute recovery, reset pc register, jump to the instruction registered in defer_2 (usually deferreturn)

    // Jump out of gopanic, execute deferreturn;
    deferreturn
        1. Iterate through the defer function chain, retrieve defer_1
        2. Remove defer_1
        3. Execute defer_1
            gopanic // Second panic
                1. There is a defer_0 on the defer chain
                2. Execute defer_0 (defer_0 does not recover, only prints one line of output)
                3. Remove defer_0, the chain is empty, exit the for loop
                4. Execute fatalpanic
                    - exit(2) terminates the process

Did you guess correctly?
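The second example can also be checked without crashing the process. In the sketch below, inner reproduces the three defers, and outer is a hypothetical caller added only to catch whatever escapes.

```go
package main

import "fmt"

// inner mirrors the second example: defer_2 recovers, then defer_1 panics again.
func inner(trace *[]string) {
	defer func() { *trace = append(*trace, "defer_0") }() // defer_0
	defer func() { panic("panic 2") }()                   // defer_1
	defer func() { recover() }()                          // defer_2
	panic("panic 1")
}

// outer is a hypothetical caller that catches whatever escapes from inner.
func outer() (trace []string, escaped any) {
	defer func() { escaped = recover() }()
	inner(&trace)
	return trace, nil
}

func main() {
	trace, escaped := outer()
	fmt.Println(trace, escaped) // [defer_0] panic 2
}
```

"panic 1" is recovered by defer_2, but "panic 2" is raised afterwards and only defer_0, which does not recover, is left in inner, so the second panic leaves the function.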

The `recovery` Function

Finally, let’s take a look at the crucial recovery function. In the gopanic function, when executing the defer functions in the loop, if the recovered field of the _panic structure is set to true, the mcall(recovery) function is called to perform the so-called “recovery.”

Let’s take a look at the implementation of the recovery function. It’s a very simple function that resets the pc and sp registers and resumes the goroutine at the new location.

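// runtime/panic.go (simplified)
func recovery(gp *g) {
    // Retrieve the stack pointer and program counter that gopanic saved
    // from the recovered _defer (d.sp and d.pc)
    sp := gp.sigcode0
    pc := gp.sigcode1
    // Reset the pc and sp registers of the Goroutine
    gp.sched.sp = sp
    gp.sched.pc = pc
    // Make the deferproc call at that pc appear to return 1, so the
    // calling function jumps to its return epilogue (deferreturn)
    gp.sched.ret = 1
    // Resume the Goroutine at the new pc and sp
    gogo(&gp.sched)
}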

What does resetting the pc and sp registers mean? The pc register holds the address of the next instruction, so overwriting it makes the goroutine jump to another location instead of continuing with the instructions that follow in gopanic. The new value comes from _defer.pc, so where does that address point?

For this, let’s recall the chapter on defer. Every registered deferred function corresponds to a _defer structure. When the structure is created, its _defer.pc field is set to the address of the instruction right after the call that registered it. This was explained in detail in the chapter “In-Depth Analysis of Defer”.

For example, if the _defer is allocated on the stack, that call is deferprocStack. So mcall(recovery) jumps to the instruction right after the deferprocStack call, and from there the logic follows deferreturn, which executes the remaining _defer function chain.
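Seen from user code, this jump means a recovered function does not continue after the statement that panicked; it leaves through its return path and the caller carries on. A minimal sketch, where divide is a hypothetical function written for this example:

```go
package main

import "fmt"

// divide is a hypothetical function: the runtime panics when b == 0, the
// deferred closure recovers, and divide then returns to its caller instead
// of resuming after the failing statement.
func divide(a, b int) (q int, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	q = a / b
	q += 1000 // skipped when the division panics
	return q, nil
}

func main() {
	fmt.Println(divide(6, 3)) // 1002 <nil>
	fmt.Println(divide(1, 0)) // 0 recovered: runtime error: integer divide by zero
}
```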

That concludes the explanation of panic. It is just a function call; the only special thing about it is the instruction jump it performs during recovery.
