从Go开源项目BigCache学习加速并发访问和避免高额的GC开销

2019-11-21 Go语言中文网

几天前，我读了一篇关于BigCache的文章，我对它是如何做到以下两点十分感兴趣：

加速并发访问
避免高额的 GC 开销

于是我去阅读了它的代码。我觉得它的做法很赞，所以我写了这篇文章来与你分享。

BigCache 是一个快速，支持并发访问，自淘汰的内存型缓存，可以在存储大量元素时依然保持高性能。BigCache 将元素保存在堆上却避免了 GC 的开销。—— 摘自《BigCache README 中的简介》

// BigCache README 中的简单使用示例

import "github.com/allegro/bigcache"

cache, _ := bigcache.NewBigCache(bigcache.DefaultConfig(10 * time.Minute))

cache.Set("my-unique-key", []byte("value"))

entry, _ := cache.Get("my-unique-key")
fmt.Println(string(entry))

BigCache github 地址[1]

并发访问

一个缓存，支持并发访问是必须的，无论是程序使用了协程，还是 HTTP 服务器为每个请求分配的协程，都可能并发访问缓存。最常见的解决方案是使用读写锁(sync.RWMutex)来保证在一个时间点只允许一个协程修改缓存内容。但是如果这样做，锁可能会阻塞后续的操作，导致缓存性能下降。

为了解决这个问题，BigCache使用了shards（译者 yoko 注：单词意思为碎片，可以理解为桶或分片或分区），shard是什么？一个shard是一个结构体，它包含了一个带锁的cache实例。

BigCache使用了一个元素个数为 N 的shard数组，然后将数据打散到不同的shard中，所以当你从缓存中读写数据时，缓存会选取其中的一个shard使用（选取策略下文会说）。使用这种方式，锁粒度被大幅度减小，因为锁范围从全局缓存缩小到了单个shard中。（译者 yoko 注：对shard数组的访问是不需要加锁的，因为该数组在初始化时创建，后续大小就不会变了，即后续对数组的访问只有读，当然，对数组中的元素shard是有读有写的，但是这已经和数组没关系了）

高额 GC 开销

var map[string]Item

Go 中实现缓存最简单最常见的方式是使用一个 map 来存储元素，但是如果使用 map，GC（垃圾回收器）会在标记阶段访问 map 中的每一个元素，当 map 非常大时这会对程序性能带来非常大的开销。

go 1.5 版本之后，如果你使用的 map 的 key 和 value 中都不包含指针，那么 GC 会忽略这个 map。

var map[int]int

为避免这个问题，BigCache使用了一个 key 和 value 都不包含指针的 map，这样，GC 就会忽略掉这个 map。具体做法为：

map 的 key 为存入缓存中的 key 经过 hash 函数后得到的值。

map 的 value 比较值得一说，BigCache并不是直接把存入缓存的 value 作为 map 的 value，而是将存入缓存的 value 序列化成二进制的[]byte，然后将序列化后的[]byte追加到一个全局的[]byte中（一个 shard 包含一个全局[]byte）。map 中存储的是该序列化后数据在全局[]byte中的下标。

使用全局的[]byte是非常聪明的做法，因为这样做，只会给 GC 增加了一个额外对象，由于字节切片除了自身对象并不包含其他指针数据，所以 GC 对于整个对象的标记时间是 O(1)的。

译者 yoko 注
英文原文写到这，基本上关于优化方面的思想已经说明白了。大致是以下两点：
第一，将元素非常多的容器通过 hash 打散到各个桶（子容器）中。这是一种比较常见通用的减小锁粒度，提高性能的手段。
第二，就是把 map 的 key 和 value 都弄成了无指针类型，具体做法上面已经说了。这种做法说白了是针对 Go GC 的特性所做的针对性优化。
另外，值得一提的是，关于第二点，由于桶中的全局的[]byte使用的是数组类型，那么显然从中间删除元素的开销是很大的。我去看了看BigCache的实现，它也确实没有提供删除指定 key 的接口。这种缓存，一般来说，删除元素靠的是全局过期时间（注意，是先进先出的过期，并不能为每个 key 单独指定不同的过期时间）或缓存总大小达到一定阈值后删除，也即把数组当队列用。所以，这种实现的前提是，缓存是自淘汰类型，而非可手动删除指定元素类型的。
英文原文的后续部分，是英文作者在BigCache的基础上自己写了一个简单版本的 cache，然后通过代码来说明上面原理。如果你看到这觉得 ok 了，后面的内容就不用看了。

看代码

以下是我写的一个cache 的简单实现[2]，我去掉了关于过期淘汰，容量等功能，只为了更好的演示我上面所说的内容。

首先，哈希函数是从BigCache中抄的，这个哈希实现是零堆内存申请的。

hasher.go

package main

// newDefaultHasher returns a new 64-bit FNV-1a Hasher which makes no memory allocations.
// Its Sum64 method will lay the value out in big-endian byte order.
// See https://en.wikipedia.org/wiki/Fowler–Noll–Vo_hash_function
func newDefaultHasher() fnv64a {
\treturn fnv64a{}
}

type fnv64a struct{}

const (
\t// offset64 FNVa offset basis. See https://en.wikipedia.org/wiki/Fowler–Noll–Vo_hash_function#FNV-1a_hash
\toffset64 = 14695981039346656037
\t// prime64 FNVa prime value. See https://en.wikipedia.org/wiki/Fowler–Noll–Vo_hash_function#FNV-1a_hash
\tprime64 = 1099511628211
)

// Sum64 gets the string and returns its uint64 hash value.
func (f fnv64a) Sum64(key string) uint64 {
\tvar hash uint64 = offset64
\tfor i := 0; i < len(key); i++ {
\t\thash ^= uint64(key[i])
\t\thash *= prime64
\t}
\treturn hash
}

然后，cache 结构体包含了获取 shard 的逻辑，以及 get 和 set 方法。

在前文并发访问小节中，提到了为数据选取特定 shard 的函数，该函数的实现是首先通过前面的哈希函数将 key 转换成一个 hash 值，然后用 hash 值与 shard 的数量计算出 shard 数组的下标。值得一提的是，这里不是用取余运算得到结果，而是通过按位与计算的。

hashedkey&mask

AND 1101 (mask) = 0101

cache.go

package main

var minShards = 1024

type cache struct {
\tshards []*cacheShard
\thash fnv64a
}

func newCache() *cache {
\tcache := &cache{
\t\thash: newDefaultHasher(),
\t\tshards: make([]*cacheShard, minShards),
\t}
\tfor i := 0; i < minShards; i++ {
\t\tcache.shards[i] = initNewShard()
\t}

\treturn cache
}

func (c *cache) getShard(hashedKey uint64) (shard *cacheShard) {
\treturn c.shards[hashedKey&uint64(minShards-1)]
}

func (c *cache) set(key string, value []byte) {
\thashedKey := c.hash.Sum64(key)
\tshard := c.getShard(hashedKey)
\tshard.set(hashedKey, value)
}

func (c *cache) get(key string) ([]byte, error) {
\thashedKey := c.hash.Sum64(key)
\tshard := c.getShard(hashedKey)
\treturn shard.get(key, hashedKey)
}

最后，也就是最赞的地方，在每个 shard 中有一个字符数组[]byte和一个map[uint64]uint32。在 map 中，存储每个键值对的值在全局字符数组中的下标，在字符数组中存储键值对的值。

使用tail变量来保存字符数组的尾部下标。

shard.go

package main

import (
\t"encoding/binary"
\t"errors"
\t"sync"
)

const (
\theaderEntrySize = 4
\tdefaultValue = 1024 // For this example we use 1024 like default value.
)

type cacheShard struct {
\titems map[uint64]uint32
\tlock sync.RWMutex
\tarray []byte
\ttail int
\theaderBuffer []byte
}

func initNewShard() *cacheShard {
\treturn &cacheShard{
\t\titems: make(map[uint64]uint32, defaultValue),
\t\tarray: make([]byte, defaultValue),
\t\ttail: 1,
\t\theaderBuffer: make([]byte, headerEntrySize),
\t}
}

func (s *cacheShard) set(hashedKey uint64, entry []byte) {
\tw := wrapEntry(entry)
\ts.lock.Lock()
\tindex := s.push(w)
\ts.items[hashedKey] = uint32(index)
\ts.lock.Unlock()
}

func (s *cacheShard) push(data []byte) int {
\tdataLen := len(data)
\tindex := s.tail
\ts.save(data, dataLen)
\treturn index
}

func (s *cacheShard) save(data []byte, len int) {
\t// Put in the first 4 bytes the size of the value
\tbinary.LittleEndian.PutUint32(s.headerBuffer, uint32(len))
\ts.copy(s.headerBuffer, headerEntrySize)
\ts.copy(data, len)
}

func (s *cacheShard) copy(data []byte, len int) {
\t// Using the tail to keep the order to write in the array
\ts.tail += copy(s.array[s.tail:], data[:len])
}

func (s *cacheShard) get(key string, hashedKey uint64) ([]byte, error) {
\ts.lock.RLock()
\titemIndex := int(s.items[hashedKey])
\tif itemIndex == 0 {
\t\ts.lock.RUnlock()
\t\treturn nil, errors.New("key not found")
\t}

\t// Read the first 4 bytes after the index, remember these 4 bytes have the size of the value, so
\t// you can use this to get the size and get the value in the array using index+blockSize to know until what point
\t// you need to read
\tblockSize := int(binary.LittleEndian.Uint32(s.array[itemIndex : itemIndex+headerEntrySize]))
\tentry := s.array[itemIndex+headerEntrySize : itemIndex+headerEntrySize+blockSize]
\ts.lock.RUnlock()
\treturn readEntry(entry), nil
}

func readEntry(data []byte) []byte {
\tdst := make([]byte, len(data))
\tcopy(dst, data)

\treturn dst
}

func wrapEntry(entry []byte) []byte {
\t// You can put more information like a timestamp if you want.
\tblobLength := len(entry)
\tblob := make([]byte, blobLength)
\tcopy(blob, entry)
\treturn blob
}

main.go

package main

import "fmt"

func main() {
\tcache := newCache()
\tcache.set("key", []byte("the value"))

\tvalue, err := cache.get("key")
\tif err != nil {
\t\tfmt.Println(err)
\t}

\tfmt.Println(string(value))
\t// OUTPUT:
\t// the value
}

英文原文地址：How BigCache avoids expensive GC cycles and speeds up concurrent access in Go

原文链接：https://pengrl.com/p/35302/
原文出处：yoko blog
原文作者： yoko
版权声明： 本文欢迎任何形式转载，转载时完整保留本声明信息（包含原文链接、原文出处、原文作者、版权声明）即可。本文后续所有修改都会第一时间在原始地址更新。

从Go开源项目BigCache学习加速并发访问和避免高额的GC开销

项目中要不要使用 Go？我是这么思考的

Go语言 CPU 性能、内存分析调试方法大汇总：你要的都在这

Go官方的限流器 time/rate 如何使用

Go语言的 defer 链如何被遍历执行？

Go1.14 的这个改进让 Gopher 生活更美好

字节跳动商业产品研发团队招聘：各种职位应有尽有，Go当然有

TIOBE 发布2020年3月编程语言榜单：Go 冲进前十，Delphi 没落

好消息，Go 新的代码文档中心：pkg.go.dev 不久将开源

Go1.13.8和Go1.12.7发布，Go1.14 延期16天，还有些 bug 没解决

Go 程序员的演变，最后的“Rob Pike”这个梗看懂了吗？

创业公司更适合用 Go 语言，那大公司呢？

别告诉我这是真的？goroutine 可能使程序变慢

Go 语言中文网 2019 年终总结暨 2020 年展望

Go 在马蜂窝即时通讯服务建设中的实践

Go1.13 标准库的 http 包爆出重大 bug，你的项目中招了吗？

明白了，原来 Go Web 框架中的中间件都是这样实现的

像 Awesome-Go 一样提升企业 Go 项目代码质量

妙到颠毫：你应该学会的 bigcache 优化技巧

Go 如何处理 HTTP 请求？掌握这两点即可

sync.Pool 一定会提升性能吗？建议你先学习一下它的设计

Go 号称几行代码开启一个 HTTP Server，底层都做了什么？

Go 简单性的价值：来自对 Go 倍加青睐的谷歌软件工程师的自述

Go语言爱好者周刊：第 18 期

从Go开源项目BigCache学习加速并发访问和避免高额的GC开销