Analyze various high-performance JSON parsing libraries in Go.

 

Compare the performance, advantages, and disadvantages of fastjson, gjson, and jsonparser

This article delves into the analysis of how the standard library in Go parses JSON and then explores popular JSON parsing libraries, their characteristics, and how they can better assist us in development in different scenarios.

This article was first published on Medium as part of the MPP plan. If you are a Medium user, please follow me on Medium. Thank you very much.

I hadn't planned to look into JSON library performance. Recently, however, I ran pprof on my project, and the flame graph below showed that more than half of the time spent in business logic processing went to JSON parsing. Hence this article.

[Figure: pprof flame graph of the project]


The following libraries are covered (star counts as of 2024-06-13):

| Library | Stars |
| --- | --- |
| JSON Unmarshal (standard library) | - |
| valyala/fastjson | 2.2 k |
| tidwall/gjson | 13.8 k |
| buger/jsonparser | 5.4 k |

JSON Unmarshal

func Unmarshal(data []byte, v interface{}) error

The standard library's Unmarshal takes two parameters: the JSON bytes to decode and a pointer to the value that will receive the result. Before any parsing happens, reflect.ValueOf is called to obtain the reflection object for v. The parsing method is then chosen based on the first non-whitespace character of the incoming data.

func (d *decodeState) value(v reflect.Value) error {
    switch d.opcode {
    default:
        panic(phasePanicMsg)
    // array 
    case scanBeginArray:
        ...
    // struct or map
    case scanBeginObject:
        ...
    // Literals, including int, string, float, etc.
    case scanBeginLiteral:
        ...
    }
    return nil
}

If the parsed object starts with [, it indicates that this is an array object and will enter the scanBeginArray branch; if it starts with {, it indicates that the parsed object is a struct or map, and then enters the scanBeginObject branch, and so on.
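As a minimal sketch of the two common ways to call Unmarshal, the example below decodes the same bytes into a struct and into a generic map. The Person type and the helper names are made up for illustration; only json.Unmarshal itself comes from the standard library.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Person is a hypothetical target type used only for this example.
type Person struct {
	Name string `json:"name"`
	Age  int    `json:"age"`
}

// decodePerson decodes into a struct. The second argument to Unmarshal
// must be a non-nil pointer so that reflection can set the fields.
func decodePerson(data []byte) (Person, error) {
	var p Person
	err := json.Unmarshal(data, &p)
	return p, err
}

// decodeMap decodes the same bytes into a generic map.
// JSON numbers arrive as float64 in this form.
func decodeMap(data []byte) (map[string]interface{}, error) {
	var m map[string]interface{}
	err := json.Unmarshal(data, &m)
	return m, err
}

func main() {
	data := []byte(`{"name":"li","age":18}`)

	p, err := decodePerson(data)
	fmt.Println(p.Name, p.Age, err)

	m, err := decodeMap(data)
	fmt.Println(m["name"], m["age"], err)
}
```

Both calls start with {, so both go through the scanBeginObject branch; what differs is the reflected type of the destination.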

Sub Summary

Unmarshal's source code relies heavily on reflection to set field values, and nested JSON requires recursive reflection. This is the main reason its performance is poor.

However, if performance is not a priority, using it directly is a very good choice. It is feature-complete, and the Go team continues to iterate on and optimize it, so its performance may improve substantially in future versions. Of the libraries covered here, it is the only one that can convert JSON objects directly into Go structs.

fastjson

As its name suggests, this library's defining characteristic is speed. Its introduction page says so:

Fast. As usual, up to 15x faster than the standard encoding/json.

Its usage is also very simple, as follows:

package main

import (
    "fmt"
    "log"

    "github.com/valyala/fastjson"
)

func main() {
    var p fastjson.Parser
    v, err := p.Parse(`{
                "str": "bar",
                "int": 123,
                "float": 1.23,
                "bool": true,
                "arr": [1, "foo", {}]
        }`)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("foo=%s\n", v.GetStringBytes("str"))
    fmt.Printf("int=%d\n", v.GetInt("int"))
    fmt.Printf("float=%f\n", v.GetFloat64("float"))
    fmt.Printf("bool=%v\n", v.GetBool("bool"))
    // Array elements are addressed by their index, passed as a string key.
    fmt.Printf("arr.1=%s\n", v.GetStringBytes("arr", "1"))
}
// Output:
// foo=bar
// int=123
// float=1.230000
// bool=true
// arr.1=foo

To use fastjson, first pass the JSON string to a Parser, then read values from the object returned by Parse. For a nested value, pass the chain of keys from parent to child to the Get method.

Analysis

The design of fastjson differs from the standard library Unmarshal in that it divides JSON parsing into two parts: Parse and Get.

Parse is responsible for parsing the JSON string into a structure and returning it. Data is then retrieved from the returned structure. A Parser does no locking and must not be shared between goroutines, so if you want to call Parse concurrently, you need to use ParserPool.
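The pooling idea can be sketched with the standard library alone. The toyParser type below is a hypothetical stand-in for any parser that reuses internal buffers and is therefore unsafe to share between goroutines; it is not fastjson's Parser, and the sketch makes no claim about how ParserPool is implemented.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// toyParser is a hypothetical parser that reuses a scratch buffer between
// calls, which makes a single instance unsafe for concurrent use.
type toyParser struct {
	fields []string
}

// countFields splits s on commas into the reused buffer and returns the count.
func (p *toyParser) countFields(s string) int {
	p.fields = p.fields[:0]
	for _, f := range strings.Split(s, ",") {
		p.fields = append(p.fields, f)
	}
	return len(p.fields)
}

// parserPool hands every caller its own parser and recycles it afterwards.
var parserPool = sync.Pool{
	New: func() interface{} { return new(toyParser) },
}

// countConcurrently parses each input in its own goroutine.
func countConcurrently(inputs []string) []int {
	out := make([]int, len(inputs))
	var wg sync.WaitGroup
	for i, in := range inputs {
		wg.Add(1)
		go func(i int, in string) {
			defer wg.Done()
			p := parserPool.Get().(*toyParser)
			defer parserPool.Put(p)
			out[i] = p.countFields(in)
		}(i, in)
	}
	wg.Wait()
	return out
}

func main() {
	fmt.Println(countConcurrently([]string{"a,b", "c", "d,e,f"}))
}
```

Each goroutine borrows a parser for the duration of one call, so the scratch buffer is never touched by two goroutines at once while still being reused across calls.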

fastjson processes JSON by traversing it from top to bottom, storing the parsed data in a Value structure:

type Value struct {
    o Object
    a []*Value
    s string
    t Type
}

This structure is very simple:

  • o Object: Indicates that the parsed structure is an object.
  • a []*Value: Indicates that the parsed structure is an array.
  • s string: If the parsed structure is neither an object nor an array, other types of values are stored in this field as a string.
  • t Type: Represents the type of this structure, which can be TypeObject, TypeArray, TypeString, TypeNumber, etc.

type Object struct {
    kvs           []kv
    keysUnescaped bool
}

type kv struct {
    k string
    v *Value
}

This structure stores the recursive structure of objects. After parsing the JSON string in the example above, the resulting structure looks like this:

[Figure: the Value tree that fastjson builds for the example JSON above]
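To make the tree concrete, here is a simplified sketch that copies the field layout of Value, Object, and kv from above, builds part of the example tree by hand, and walks it with a Get method. The Get logic and the exampleTree helper are written for illustration and are assumptions, not fastjson's actual code.

```go
package main

import (
	"fmt"
	"strconv"
)

type Type int

const (
	TypeObject Type = iota
	TypeArray
	TypeString
	TypeNumber
)

// Value, Object and kv mirror the field layout quoted above.
type Value struct {
	o Object
	a []*Value
	s string
	t Type
}

type Object struct {
	kvs []kv
}

type kv struct {
	k string
	v *Value
}

// Get follows keys through the tree: an object is searched by key, an array
// is indexed by the decimal number in the key. It returns nil when a step fails.
func (v *Value) Get(keys ...string) *Value {
	for _, key := range keys {
		if v == nil {
			return nil
		}
		switch v.t {
		case TypeObject:
			var next *Value
			for _, item := range v.o.kvs {
				if item.k == key {
					next = item.v
					break
				}
			}
			v = next
		case TypeArray:
			n, err := strconv.Atoi(key)
			if err != nil || n < 0 || n >= len(v.a) {
				return nil
			}
			v = v.a[n]
		default:
			return nil
		}
	}
	return v
}

// exampleTree hand-builds the tree for {"str":"bar","arr":[1,"foo",{}]}.
func exampleTree() *Value {
	arr := &Value{t: TypeArray, a: []*Value{
		{t: TypeNumber, s: "1"},
		{t: TypeString, s: "foo"},
		{t: TypeObject},
	}}
	return &Value{t: TypeObject, o: Object{kvs: []kv{
		{k: "str", v: &Value{t: TypeString, s: "bar"}},
		{k: "arr", v: arr},
	}}}
}

func main() {
	root := exampleTree()
	fmt.Println(root.Get("str").s)
	fmt.Println(root.Get("arr", "1").s)
	fmt.Println(root.Get("arr", "9") == nil)
}
```

Because the tree exists once parsing is done, every further Get is a walk over already parsed nodes rather than a new pass over the JSON text.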

Code

In terms of implementation, the absence of reflection code makes the entire parsing process very clean. Let’s directly look at the main part of the parsing:

func parseValue(s string, c *cache, depth int) (*Value, string, error) {
    if len(s) == 0 {
        return nil, s, fmt.Errorf("cannot parse empty string")
    }
    depth++
    // The maximum depth of the json string cannot exceed MaxDepth
    if depth > MaxDepth {
        return nil, s, fmt.Errorf("too big depth for the nested JSON; it exceeds %d", MaxDepth)
    }
    // parse object
    if s[0] == '{' {
        v, tail, err := parseObject(s[1:], c, depth)
        if err != nil {
            return nil, tail, fmt.Errorf("cannot parse object: %s", err)
        }
        return v, tail, nil
    }
    // parse array
    if s[0] == '[' {
        ...
    }
    // parse string
    if s[0] == '"' {
        ...
    } 
    ...
    return v, tail, nil
}

parseValue determines the type to parse from the first non-whitespace character of the string. Taking the object type as an example:

func parseObject(s string, c *cache, depth int) (*Value, string, error) {
    ...
    o := c.getValue()
    o.t = TypeObject
    o.o.reset()
    for {
        var err error
        // Get a kv entry from the Object structure
        kv := o.o.getKV()
        ...
        // Parse the key
        kv.k, s, err = parseRawKey(s[1:])
        ...
        // Recursively parse the value
        kv.v, s, err = parseValue(s, c, depth)
        ...
        // On ',' continue with the next key/value pair
        if s[0] == ',' {
            s = s[1:]
            continue
        }
        // On '}' the object is complete
        if s[0] == '}' {
            return o, s[1:], nil
        }
        return nil, s, fmt.Errorf("missing ',' after object value")
    }
}

The parseObject function is also very simple. In a loop it reads each key and then calls parseValue recursively to parse the value, working through the object from top to bottom until it reaches the closing }.
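The same recursive shape, dispatching on the first byte, threading the unparsed tail through each call, and enforcing a depth limit, can be sketched on a much smaller grammar. The parser below handles only nested arrays of non-negative integers without whitespace; every name in it and the small maxDepth value are assumptions chosen for the example.

```go
package main

import (
	"errors"
	"fmt"
)

const maxDepth = 3 // deliberately small, hypothetical limit

// node is either an integer leaf or a list of child nodes.
type node struct {
	isList bool
	num    int
	items  []node
}

// parseValue dispatches on the first byte and returns the parsed node
// together with the unparsed tail of the input.
func parseValue(s string, depth int) (node, string, error) {
	if len(s) == 0 {
		return node{}, s, errors.New("cannot parse empty string")
	}
	depth++
	if depth > maxDepth {
		return node{}, s, fmt.Errorf("nesting exceeds %d", maxDepth)
	}
	if s[0] == '[' {
		return parseArray(s[1:], depth)
	}
	return parseInt(s)
}

// parseArray parses elements until the closing ']', recursing into
// parseValue for each element.
func parseArray(s string, depth int) (node, string, error) {
	n := node{isList: true}
	if len(s) > 0 && s[0] == ']' {
		return n, s[1:], nil
	}
	for {
		item, tail, err := parseValue(s, depth)
		if err != nil {
			return node{}, tail, err
		}
		n.items = append(n.items, item)
		s = tail
		if len(s) == 0 {
			return node{}, s, errors.New("unexpected end of array")
		}
		if s[0] == ',' {
			s = s[1:]
			continue
		}
		if s[0] == ']' {
			return n, s[1:], nil
		}
		return node{}, s, errors.New("missing ',' after array value")
	}
}

// parseInt consumes a run of decimal digits. It is only called with a
// non-empty string.
func parseInt(s string) (node, string, error) {
	i, num := 0, 0
	for i < len(s) && s[i] >= '0' && s[i] <= '9' {
		num = num*10 + int(s[i]-'0')
		i++
	}
	if i == 0 {
		return node{}, s, fmt.Errorf("unexpected character %q", s[0])
	}
	return node{num: num}, s[i:], nil
}

func main() {
	n, tail, err := parseValue("[1,[2,3]]", 0)
	fmt.Println(len(n.items), n.items[1].items[1].num, tail == "", err)

	_, _, err = parseValue("[[[1]]]", 0)
	fmt.Println(err)
}
```

The second call fails because the integer sits four levels deep, one more than the limit allows.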

Sub Summary

Through the above analysis, it can be seen that fastjson is much simpler in implementation and has higher performance than the standard library. After using Parse to parse the JSON tree, it can be reused multiple times, avoiding the need for repeated parsing and improving performance.

However, its functionality is very rudimentary and lacks common operations such as JSON to struct or JSON to map conversion. If you only want to simply retrieve values from JSON, then using this library is very convenient. But if you want to convert JSON values into a structure, you will need to manually set each value yourself.

GJSON

In my tests, GJSON is not as fast as fastjson, but it is feature-rich and its performance is still good. Next, let me briefly introduce the functionality of GJSON.

GJSON is as simple to use as fastjson: pass in the JSON string and the path of the value you want as parameters.

json := `{"name":{"first":"li","last":"dj"},"age":18}`
lastName := gjson.Get(json, "name.last")

In addition to this function, simple fuzzy matching can also be performed. It supports wildcard characters * and ? in the key. * matches any number of characters, while ? matches a single character, as follows:

json := `{
    "name":{"first":"Tom", "last": "Anderson"},
    "age": 37,
    "children": ["Sara", "Alex", "Jack"]
}`
fmt.Println("third child*:", gjson.Get(json, "child*.2"))
fmt.Println("first c?ild:", gjson.Get(json, "c?ildren.0"))
  • child*.2: First, child* matches children, .2 reads the third element;
  • c?ildren.0: c?ildren matches children, .0 reads the first element;
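The two wildcards are simple enough to sketch in a few lines. The wildcardMatch function below is a hypothetical, byte-oriented matcher written for this article; GJSON uses its own matching package, and this sketch does not reproduce its escaping rules or Unicode handling.

```go
package main

import "fmt"

// wildcardMatch reports whether name matches pattern, where '*' matches any
// run of bytes (including none) and '?' matches exactly one byte.
func wildcardMatch(name, pattern string) bool {
	for len(pattern) > 0 {
		switch pattern[0] {
		case '*':
			// Try every possible split point for the rest of the pattern.
			for i := 0; i <= len(name); i++ {
				if wildcardMatch(name[i:], pattern[1:]) {
					return true
				}
			}
			return false
		case '?':
			if len(name) == 0 {
				return false
			}
		default:
			if len(name) == 0 || name[0] != pattern[0] {
				return false
			}
		}
		name, pattern = name[1:], pattern[1:]
	}
	return len(name) == 0
}

func main() {
	fmt.Println(wildcardMatch("children", "child*"))
	fmt.Println(wildcardMatch("children", "c?ildren"))
	fmt.Println(wildcardMatch("name", "child*"))
}
```

The first two calls report a match and the third does not, which is why both paths in the example above resolve to the children key.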

In addition to fuzzy matching, it also supports modifier operations.

json := `{
    "name":{"first":"Tom", "last": "Anderson"},
    "age": 37,
    "children": ["Sara", "Alex", "Jack"]
}`
fmt.Println("reversed children:", gjson.Get(json, "children|@reverse"))

children|@reverse first reads the array children, then reverses it with the @reverse modifier and returns the result.

nestedJSON := `{"nested": ["one", "two", ["three", "four"]]}`
fmt.Println(gjson.Get(nestedJSON, "nested|@flatten"))

@flatten flattens the inner array of nested into the outer array and returns:

["one","two","three","four"]

There are other interesting features as well; see the official documentation.

Analysis

The Get method parameter of GJSON is composed of two parts, one is a JSON string, and the other is called Path, which represents the matching path of the JSON value to be obtained.

Because GJSON supports many kinds of path expressions, parsing is divided into two steps: the Path is parsed first, and then the JSON string is traversed.
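The first step can be sketched as splitting off the leading path component. The pathPart type and splitPath function below are hypothetical simplifications: the field names are borrowed from the rp value used in GJSON's parseObject (part, path, pipe, piped, wild, more), but escape sequences, queries, and the other path syntax GJSON supports are ignored.

```go
package main

import "fmt"

// pathPart is a simplified, hypothetical description of the first component
// of a path and of what follows it.
type pathPart struct {
	part  string // first component, e.g. "name"
	path  string // remainder after '.', e.g. "last"
	pipe  string // remainder after '|', e.g. "@reverse"
	piped bool   // true when a '|' follows the component
	wild  bool   // true when the component contains '*' or '?'
	more  bool   // true when a '.' and further components follow
}

// splitPath scans the path once and stops at the first '.' or '|'.
func splitPath(path string) pathPart {
	var r pathPart
	for i := 0; i < len(path); i++ {
		switch path[i] {
		case '.':
			r.part, r.path, r.more = path[:i], path[i+1:], true
			return r
		case '|':
			r.part, r.pipe, r.piped = path[:i], path[i+1:], true
			return r
		case '*', '?':
			r.wild = true
		}
	}
	r.part = path
	return r
}

func main() {
	fmt.Printf("%+v\n", splitPath("name.last"))
	fmt.Printf("%+v\n", splitPath("child*.2"))
	fmt.Printf("%+v\n", splitPath("children|@reverse"))
}
```

With the component split off, the traversal only has to compare each key it meets against part and, on a match, either return the value or continue with the remainder.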

If a matching value is found during parsing, it is returned directly, with no need to traverse any further. If multiple values are matched, the whole JSON string is traversed. If the Path matches nothing in the JSON string, the complete JSON string is traversed as well.

Unlike fastjson, GJSON does not save what it parses in a structure that can be reused. So when you call GetMany to return multiple values, the JSON string is actually traversed many times, and the efficiency is relatively low.


It’s important to be aware that GJSON does not validate the JSON it parses, and that includes modifiers such as @flatten. Even if the input string is not valid JSON, it will still be parsed, and the result may not be what you expect. If the input comes from an untrusted or unpredictable source, validate it before parsing.

Code

func Get(json, path string) Result {
    // Parse the path
    if len(path) > 1 {
        ...
    }
    var i int
    var c = &parseContext{json: json}
    if len(path) >= 2 && path[0] == '.' && path[1] == '.' {
        c.lines = true
        parseArray(c, 0, path[2:])
    } else {
        // Parse according to different objects, and loop here until '{' or '[' is found
        for ; i < len(c.json); i++ {
            if c.json[i] == '{' {
                i++

                parseObject(c, i, path)
                break
            }
            if c.json[i] == '[' {
                i++
                parseArray(c, i, path)
                break
            }
        }
    }
    if c.piped {
        res := c.value.Get(c.pipe)
        res.Index = 0
        return res
    }
    fillIndex(json, c)
    return c.value
}

In the Get method, you can see a long stretch of code that parses the various kinds of path. Then a for loop traverses the JSON until it finds ‘{’ or ‘[’ and hands over to the corresponding parsing logic.

func parseObject(c *parseContext, i int, path string) (int, bool) {
    var pmatch, kesc, vesc, ok, hit bool
    var key, val string
    rp := parseObjectPath(path)
    if !rp.more && rp.piped {
        c.pipe = rp.pipe
        c.piped = true
    }
    // Two nested for loops locate the key
    for i < len(c.json) {
        for ; i < len(c.json); i++ {
            if c.json[i] == '"' { 
                i++
                var s = i
                for ; i < len(c.json); i++ {
                    if c.json[i] > '\\' {
                        continue
                    }
                    // Find the key value and jump to parse_key_string_done
                    if c.json[i] == '"' {
                        i, key, kesc, ok = i+1, c.json[s:i], false, true
                        goto parse_key_string_done
                    }
                    ...
                }
                key, kesc, ok = c.json[s:], false, false
            // break
            parse_key_string_done:
                break
            }
            if c.json[i] == '}' {
                return i + 1, false
            }
        }
        if !ok {
            return i, false
        }
        // Check whether it is a fuzzy match
        if rp.wild {
            if kesc {
                pmatch = match.Match(unescape(key), rp.part)
            } else {
                pmatch = match.Match(key, rp.part)
            }
        } else {
            if kesc {
                pmatch = rp.part == unescape(key)
            } else {
                pmatch = rp.part == key
            }
        }
        // parse value
        hit = pmatch && !rp.more
        for ; i < len(c.json); i++ {
            switch c.json[i] {
            default:
                continue
            case '"':
                i++
                i, val, vesc, ok = parseString(c.json, i)
                if !ok {
                    return i, false
                }
                if hit {
                    if vesc {
                        c.value.Str = unescape(val[1 : len(val)-1])
                    } else {
                        c.value.Str = val[1 : len(val)-1]
                    }
                    c.value.Raw = val
                    c.value.Type = String
                    return i, true
                }
            case '{':
                if pmatch && !hit {
                    i, hit = parseObject(c, i+1, rp.path)
                    if hit {
                        return i, true
                    }
                } else {
                    i, val = parseSquash(c.json, i)
                    if hit {
                        c.value.Raw = val
                        c.value.Type = JSON
                        return i, true
                    }
                }
            ...
            break
        }
    }
    return i, false
}

I show the parseObject code not to teach JSON parsing or string traversal, but as an example of code that is hard to follow. The nested for loops and long chains of if statements can be overwhelming and may remind you of a colleague’s code you’ve encountered at work.

Sub Summary

Advantages:

  1. Performance: GJSON performs relatively well compared to the standard library.
  2. Flexibility: It offers various retrieval methods and customizable return values, making it very convenient.

Disadvantages:

  1. No JSON Validation: It does not check for the correctness of the JSON input.
  2. Code Smell: The code structure is cumbersome and hard to read, which can make maintenance challenging.

Note

When parsing JSON to retrieve values, the GetMany function will traverse the JSON string multiple times based on the specified keys. Converting the JSON to a map can reduce the number of traversals.

Conclusion

While GJSON has notable performance and flexibility, its lack of JSON validation and complex, hard-to-read code structure present significant drawbacks. If you need to parse JSON and retrieve values frequently, consider the trade-offs between performance and code maintainability.

jsonparser

Analysis

jsonparser also processes an input JSON byte slice and allows for quickly locating and returning values by passing multiple keys.

Similar to GJSON, jsonparser does not cache the parsed JSON string in a data structure as fastjson does. However, when multiple values need to be parsed, the EachKey function can be used to parse multiple values in a single pass through the JSON string.

If a matching value is found, jsonparser returns immediately without further traversal. For multiple matches, it traverses the entire JSON string. If a path does not match any value in the JSON string, it still traverses the entire string.

jsonparser reduces the use of recursion by employing loops during JSON traversal, thus decreasing the call stack depth and enhancing performance.
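Replacing recursion with a loop usually means tracking the nesting depth in a counter. The skipBlock function below is a hypothetical sketch of that technique, written for this article rather than taken from jsonparser: it finds the end of a nested object or array in one loop, however deep the nesting goes.

```go
package main

import "fmt"

// skipBlock returns the index just past the object or array that starts at
// data[0], or -1 if the block is not terminated. It assumes data[0] is '{'
// or '[' and tracks nesting with a counter instead of recursion.
func skipBlock(data []byte) int {
	depth := 0
	inString := false
	for i := 0; i < len(data); i++ {
		c := data[i]
		if inString {
			switch c {
			case '\\':
				i++ // skip the escaped byte
			case '"':
				inString = false
			}
			continue
		}
		switch c {
		case '"':
			inString = true
		case '{', '[':
			depth++
		case '}', ']':
			depth--
			if depth == 0 {
				return i + 1
			}
		}
	}
	return -1
}

func main() {
	data := []byte(`{"a":[1,{"b":"}"}],"c":"\"]"} tail`)
	end := skipBlock(data)
	fmt.Println(string(data[:end]))
}
```

Brackets inside strings are ignored, so the "}" and "]" that appear as string contents in the example do not end the block early.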

In terms of functionality, ArrayEach, ObjectEach, and EachKey functions allow for passing a custom function to meet specific needs, greatly enhancing the utility of jsonparser.

The code for jsonparser is straightforward and clear, making it easy to analyze. Those interested can examine it themselves.

Sub Summary

The high performance of jsonparser compared to the standard library can be attributed to:

  1. Using for loops to minimize recursion.
  2. Avoiding reflection, unlike the standard library.
  3. Exiting immediately upon finding the corresponding key value without further recursion.
  4. Operating on the passed-in JSON string without allocating new space, thus reducing memory allocations.
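The last point comes down to returning sub-slices of the input instead of copies. The firstString function below is a deliberately naive, hypothetical sketch of that idea: it is not how jsonparser locates values, and it does not handle escape sequences.

```go
package main

import (
	"bytes"
	"fmt"
)

// firstString returns the contents of the first JSON string in data as a
// sub-slice of data, so no bytes are copied. Escapes are not handled.
func firstString(data []byte) []byte {
	start := bytes.IndexByte(data, '"')
	if start < 0 {
		return nil
	}
	end := bytes.IndexByte(data[start+1:], '"')
	if end < 0 {
		return nil
	}
	return data[start+1 : start+1+end]
}

func main() {
	data := []byte(`{"name":"li"}`)
	key := firstString(data)
	fmt.Printf("%s\n", key)

	// The result shares memory with the input: changing the input changes it.
	data[2] = 'N'
	fmt.Printf("%s\n", key)
}
```

The second print shows the modified byte, which is the price of avoiding the copy: the returned slice is only valid for as long as the input buffer is left untouched.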

Additionally, the API design is highly practical. Functions like ArrayEach, ObjectEach, and EachKey allow for passing custom functions, solving many issues in actual business development.

However, jsonparser has a significant drawback: it does not validate JSON. If the input is not valid JSON, jsonparser will not detect it.
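When the input comes from an untrusted source, one option is to validate it before handing it to a non-validating library. The sketch below uses json.Valid from the standard library; lookupValidated and the lookup callback are hypothetical names, with the callback standing in for whatever library call you actually make.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// lookupValidated rejects malformed input before passing it to lookup,
// which represents a call into a library that does not validate JSON.
func lookupValidated(data []byte, lookup func([]byte) string) (string, error) {
	if !json.Valid(data) {
		return "", errors.New("input is not valid JSON")
	}
	return lookup(data), nil
}

func main() {
	// fakeLookup is a placeholder for a real, non-validating lookup.
	fakeLookup := func(data []byte) string { return "found" }

	fmt.Println(lookupValidated([]byte(`{"a":1}`), fakeLookup))
	fmt.Println(lookupValidated([]byte(`{"a":1`), fakeLookup))
}
```

Validation is a full extra pass over the input, so it gives back part of the speed advantage; whether that is acceptable depends on how much you trust the source.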

Performance Comparison

Parsing Small JSON Strings

Parsing a simple JSON string of approximately 190 bytes

| Library | Operation | Time per Iteration | Memory Usage | Memory Allocations | Performance |
| --- | --- | --- | --- | --- | --- |
| Standard Library | Parse to map | 724 ns/op | 976 B/op | 51 allocs/op | Slow |
| Standard Library | Parse to struct | 297 ns/op | 256 B/op | 5 allocs/op | Average |
| fastjson | get | 68.2 ns/op | 0 B/op | 0 allocs/op | Fastest |
| fastjson | parse | 35.1 ns/op | 0 B/op | 0 allocs/op | Fastest |
| GJSON | Convert to map | 255 ns/op | 1009 B/op | 11 allocs/op | Average |
| GJSON | get | 232 ns/op | 448 B/op | 1 allocs/op | Average |
| jsonparser | get | 106 ns/op | 232 B/op | 3 allocs/op | Fast |

Parsing Medium JSON Strings

Parsing a JSON string of moderate complexity, approximately 2.3KB

| Library | Operation | Time per Iteration | Memory Usage | Memory Allocations | Performance |
| --- | --- | --- | --- | --- | --- |
| Standard Library | Parse to map | 4263 ns/op | 10212 B/op | 208 allocs/op | Slow |
| Standard Library | Parse to struct | 4789 ns/op | 9206 B/op | 259 allocs/op | Slow |
| fastjson | get | 285 ns/op | 0 B/op | 0 allocs/op | Fastest |
| fastjson | parse | 302 ns/op | 0 B/op | 0 allocs/op | Fastest |
| GJSON | Convert to map | 2571 ns/op | 8539 B/op | 83 allocs/op | Average |
| GJSON | get | 1489 ns/op | 448 B/op | 1 allocs/op | Average |
| jsonparser | get | 878 ns/op | 2728 B/op | 5 allocs/op | Fast |

Parsing Large JSON Strings

Parsing a JSON string of high complexity, approximately 2.2MB

| Library | Operation | Time per Iteration | Memory Usage | Memory Allocations | Performance |
| --- | --- | --- | --- | --- | --- |
| Standard Library | Parse to map | 2292959 ns/op | 5214009 B/op | 95402 allocs/op | Slow |
| Standard Library | Parse to struct | 1165490 ns/op | 2023 B/op | 76 allocs/op | Average |
| fastjson | get | 368056 ns/op | 0 B/op | 0 allocs/op | Fast |
| fastjson | parse | 371397 ns/op | 0 B/op | 0 allocs/op | Fast |
| GJSON | Convert to map | 1901727 ns/op | 4788894 B/op | 54372 allocs/op | Average |
| GJSON | get | 1322167 ns/op | 448 B/op | 1 allocs/op | Average |
| jsonparser | get | 233090 ns/op | 1788865 B/op | 376 allocs/op | Fastest |
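Figures in this format (ns/op, B/op, allocs/op) are what Go's benchmark tooling reports. As a minimal sketch of how the two standard-library rows could be measured, the program below uses testing.Benchmark on a small sample document. The sample, the person type, and the helper names are assumptions made for this example; this is not the harness that produced the numbers above, and results will differ by machine and input.

```go
package main

import (
	"encoding/json"
	"fmt"
	"testing"
)

// sample is a small made-up document used only for this sketch.
var sample = []byte(`{"name":{"first":"Tom","last":"Anderson"},"age":37,"children":["Sara","Alex","Jack"]}`)

// person is a hypothetical struct matching the sample document.
type person struct {
	Name     struct{ First, Last string } `json:"name"`
	Age      int                          `json:"age"`
	Children []string                     `json:"children"`
}

// toMap decodes into a generic map (the "Parse to map" case).
func toMap(data []byte) (map[string]interface{}, error) {
	var m map[string]interface{}
	err := json.Unmarshal(data, &m)
	return m, err
}

// toStruct decodes into a typed struct (the "Parse to struct" case).
func toStruct(data []byte) (person, error) {
	var p person
	err := json.Unmarshal(data, &p)
	return p, err
}

func main() {
	mapRes := testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			if _, err := toMap(sample); err != nil {
				b.Fatal(err)
			}
		}
	})
	structRes := testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			if _, err := toStruct(sample); err != nil {
				b.Fatal(err)
			}
		}
	})
	fmt.Println("parse to map:   ", mapRes.String(), mapRes.MemString())
	fmt.Println("parse to struct:", structRes.String(), structRes.MemString())
}
```

To compare the other libraries in the same way, add one benchmark function per library and operation and run them against the same input.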

Summary

During this comparison, I analyzed several high-performance JSON parsing libraries. It was evident that these libraries share several common characteristics:

  • They avoid using reflection.
  • They parse JSON by traversing the bytes of the JSON string sequentially.
  • They minimize memory allocation by directly parsing the input JSON string.
  • They sacrifice some compatibility for performance.

Despite these trade-offs, each library offers unique features. The fastjson API is the simplest to use; GJSON offers fuzzy searching capabilities and high customizability; jsonparser supports inserting callback functions during high-performance parsing, providing a degree of convenience.

For my use case, which involves simply parsing certain fields from HTTP response JSON strings with predetermined fields and occasional custom operations, jsonparser is the most suitable tool.

Therefore, if performance concerns you, consider selecting a JSON parser based on your business requirements.

Reference

https://github.com/buger/jsonparser
https://github.com/tidwall/gjson
https://github.com/valyala/fastjson
https://github.com/json-iterator/go
https://github.com/mailru/easyjson
https://github.com/Jeffail/gabs
https://github.com/bitly/go-simplejson

Licensed under CC BY-NC-SA 4.0
Last updated on Jun 13, 2024 17:10 CST