
Decrypt Go: Panic and Recover

 

Background

In the previous article, we learned that panic can occur in three ways:

  • Initiated by developers: by calling the panic() function.
  • Hidden code generated by the compiler: for example, in the case of division by zero.
  • Signals sent to the process by the kernel: for example, in the case of illegal memory access.
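As a minimal sketch of these three triggers, the program below provokes each one and captures the value the panic carries. The helper names (catch, explicit, divide, deref) are made up for this illustration, and the nil pointer case assumes a platform where the fault reaches the runtime as a signal, such as Linux or macOS.

```go
package main

import "fmt"

// catch is a hypothetical helper: it runs f and returns the value carried
// by a panic raised inside f, or nil if f returned normally.
func catch(f func()) (v any) {
	defer func() { v = recover() }()
	f()
	return nil
}

// explicit panics because the developer calls panic() directly.
func explicit() { panic("called panic()") }

// divide panics through a compiler-inserted check when b is zero.
func divide(a, b int) int { return a / b }

type node struct{ val int }

// deref touches invalid memory when n is nil; the runtime turns the
// resulting fault into a panic.
func deref(n *node) int { return n.val }

func main() {
	fmt.Println(catch(explicit))
	fmt.Println(catch(func() { divide(1, 0) }))
	fmt.Println(catch(func() { deref(nil) }))
}
```

The first call yields the string passed to panic(), while the other two yield runtime error values created by the Go runtime.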

This article was first published in the Medium MPP plan. If you are a Medium user, please follow me on Medium. Thank you very much.

All three cases ultimately end up as a call to the panic() function, which shows that panic in Go is just a special function call handled at the language level. Now that we know how panic is triggered, the next step is to understand how panic is handled. When I was first learning Go, I often had some questions in my mind:

  • What exactly is panic? Is it a struct or a function?
  • Why does panic cause the process to exit?
  • Why does recover need to be placed inside a defer statement to take effect?
  • Even if recover is placed inside a defer statement, why doesn’t the process recover?
  • Why is it possible to panic again after a panic? What are the consequences?

Today, let’s dive into the code to clarify these questions.

Based on Go 1.21.4

_panic Data Structure

First, let’s look at an example of an actively triggered panic (example). By examining the assembly code, we can see that panic() compiles into a call to the runtime.gopanic function, which works with a crucial data structure called _panic.

Let’s take a look at the _panic data structure:

type _panic struct {
    argp      unsafe.Pointer // pointer to arguments of deferred call run during panic; cannot move - known to liblink
    arg       any            // argument to panic
    link      *_panic        // link to earlier panic
    pc        uintptr        // where to return to in runtime if this panic is bypassed
    sp        unsafe.Pointer // where to return to in runtime if this panic is bypassed
    recovered bool           // whether this panic is over
    aborted   bool           // the panic was aborted
    goexit    bool
}

Key fields to focus on:

  • link: A pointer to the earlier _panic, which makes _panic a singly linked list, similar to the _defer linked list.
  • recovered: Indicates whether this _panic has been recovered. The recover() function simply sets this field.


Let’s also take a look at two important fields in g:

type g struct {
	_panic *_panic // panic linked list, this is the innermost one
	_defer *_defer // defer linked list, this is the innermost one
	// ...
}

From this, we can see that both the _defer and _panic linked lists are attached to a goroutine. When can the _panic linked list hold more than one element?
Only when a deferred function calls panic() again while an earlier panic is still being processed. The panic flow runs nothing but the goroutine’s deferred functions, so a deferred function is the only place where a second panic can start before the first one has finished.
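A small sketch of such a nested panic, using a made-up function name (nested): the body panics once, and a deferred function panics again while the first panic is still in flight. The recover() in the outermost deferred function is expected to return the value of the newest panic.

```go
package main

import "fmt"

// nested panics twice: once in its body and once more inside a deferred
// function that runs while the first panic is still being processed.
func nested() (got any) {
	// Registered first, so it runs last and sees the newest panic.
	defer func() { got = recover() }()
	// Registered second, so it runs first and raises a second panic.
	defer func() { panic("second") }()
	panic("first")
}

func main() {
	fmt.Println(nested()) // second
}
```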

The recover() Function

To facilitate explanation, let’s start by analyzing what the recover() function does:

defer func() { 
	recover() 
}()

The recover() function corresponds to the gorecover function implementation in the runtime/panic.go file.

The gorecover Function

func gorecover(argp uintptr) any {
	// Must be in a function running as part of a deferred call during the panic.
	// Must be called from the topmost function of the call
	// (the function used in the defer statement).
	// p.argp is the argument pointer of that topmost deferred function call.
	// Compare against argp reported by caller.
	// If they match, the caller is the one who can recover.
	gp := getg()
	p := gp._panic
	if p != nil && !p.goexit && !p.recovered && argp == uintptr(p.argp) {
		p.recovered = true
		return p.arg
	}
	return nil
}

This function is quite simple:

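  1. Retrieve the current goroutine structure.
  2. Retrieve the latest _panic from the _panic linked list of the current goroutine. Proceed only if it is not nil, its goexit field is false, it has not already been recovered, and the caller’s argp matches p.argp, meaning recover() was called directly by the deferred function.
  3. Set the recovered field of the _panic structure to true and return the arg field. In every other case, return nil.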

That’s all the recover() function does. It simply sets the value of the recovered field and does not involve any magical code jumps. The setting of the recovered field takes effect within the logic of the panic() function.
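The argp comparison in gorecover is what makes recover() work only when the deferred function calls it directly. The sketch below illustrates that behavior; helper and indirect are hypothetical names chosen for this example.

```go
package main

import "fmt"

// helper calls recover(), but helper is not itself the deferred function,
// so the argp check in gorecover is expected to fail and recover() returns nil.
func helper() any { return recover() }

// indirect panics and then tries to recover in two different ways.
func indirect() (fromHelper, fromDefer any) {
	// Runs last: recover() is called directly by the deferred function.
	defer func() { fromDefer = recover() }()
	// Runs first: recover() is called one level deeper, inside helper.
	defer func() { fromHelper = helper() }()
	panic("boom")
}

func main() {
	fromHelper, fromDefer := indirect()
	fmt.Println(fromHelper, fromDefer) // <nil> boom
}
```

The call made through helper leaves the panic untouched, so the panic is still active when the second deferred function runs and recovers it.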

The panic() Function

Based on the previous assembly code, we know that panic calls the runtime.gopanic function.

The gopanic Function

The most important part of the panic mechanism is the gopanic function, which contains all the details about panic. The complexity of understanding panic lies in two points:

  1. Recursive execution of gopanic when panic is nested.
  2. The program counter (pc) and stack pointer (sp) are not advanced through ordinary calls and returns. The runtime overwrites the goroutine’s saved pc and sp directly, which bypasses the logic after gopanic and also has to cope with recursive gopanic calls.

The logic inside gopanic can be divided into two parts: inside the loop and outside the loop.

Inside the Loop

The actions inside the loop can be broken down into the following steps:

  1. Traverse the _defer linked list of the goroutine to retrieve a _defer deferred function.
  2. Set the d.started flag and point d._panic at the current _panic (used as a check during recursion).
  3. Execute the _defer deferred function.
  4. Remove the executed _defer function from the linked list.
  5. Check whether the recovered field of _panic has been set to true and act accordingly.
    • If it is true, reset the saved pc and sp (execution generally resumes at the deferreturn call of the function that registered the defer) and switch back to running the goroutine from that point.
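The steps above can be mimicked with a toy model. This is not the runtime code: toyPanic, toyDefer, toyG, toyGopanic, and runToy are hypothetical stand-ins invented for this sketch, and the pc/sp manipulation is reduced to a plain return value.

```go
package main

import "fmt"

// toyPanic and toyDefer are hypothetical stand-ins for _panic and _defer.
type toyPanic struct {
	arg       any
	recovered bool
}

type toyDefer struct {
	fn      func(p *toyPanic) // the deferred function
	started bool
	link    *toyDefer // next (older) deferred call
}

// toyG stands in for the goroutine structure g.
type toyG struct {
	deferHead *toyDefer
}

// deferFn pushes fn onto the front of the list, so the newest entry runs first.
func (g *toyG) deferFn(fn func(p *toyPanic)) {
	g.deferHead = &toyDefer{fn: fn, link: g.deferHead}
}

// toyGopanic walks the defer list and reports whether a deferred function
// marked the panic as recovered.
func (g *toyG) toyGopanic(arg any) bool {
	p := &toyPanic{arg: arg}
	for g.deferHead != nil {
		d := g.deferHead     // step 1: take the next deferred function
		d.started = true     // step 2: mark it as started
		d.fn(p)              // step 3: run it
		g.deferHead = d.link // step 4: remove it from the list
		if p.recovered {     // step 5: stop as soon as the panic is recovered
			return true
		}
	}
	return false
}

// runToy registers three deferred functions; the middle one "recovers".
func runToy() (bool, []string) {
	g := &toyG{}
	var order []string
	g.deferFn(func(p *toyPanic) { order = append(order, "outer") })
	g.deferFn(func(p *toyPanic) { order = append(order, "middle"); p.recovered = true })
	g.deferFn(func(p *toyPanic) { order = append(order, "inner") })
	return g.toyGopanic("boom"), order
}

func main() {
	fmt.Println(runToy()) // true [inner middle]
}
```

In this model the outer entry stays on the list after the loop stops, because the walk ends as soon as the recovered flag is seen.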

Some Considerations

You may notice that the recovered field can only change during the third step, while a deferred function is running. No other point in the flow modifies the value of _panic.recovered.
Question 1: Why does recover need to be placed inside a defer statement to take effect?
Because that is the only opportunity!
Let’s consider a few simple examples:

func main() {
	panic("test")
	recover()
}

In the above example, recover() appears in the code, so why does the program still panic?
Because execution never reaches the recover() call.
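To make this visible, here is a small sketch with hypothetical names (fail, run): the statement placed after the panicking call never executes, while a recover() registered through defer still runs.

```go
package main

import "fmt"

// fail panics the same way the example above does.
func fail() { panic("test") }

// run reports whether the line after the panicking call was reached and
// what the deferred recover() returned.
func run() (reached bool, got any) {
	defer func() { got = recover() }()
	fail()
	reached = true // never executed: control has already left this path
	return reached, got
}

func main() {
	fmt.Println(run()) // false test
}
```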

Last updated on Jun 28, 2024 19:32 CST