從Go開源項目BigCache學習加速並發訪問和避免高額的GC開銷

幾天前，我讀了一篇關於BigCache的文章，我對它是如何做到以下兩點十分感興趣：

加速並發訪問
避免高額的 GC 開銷

於是我去閱讀了它的代碼。我覺得它的做法很贊，所以我寫了這篇文章來與你分享。

BigCache 是一個快速，支持並發訪問，自淘汰的內存型緩存，可以在存儲大量元素時依然保持高性能。BigCache 將元素保存在堆上卻避免了 GC 的開銷。—— 摘自《BigCache README 中的簡介》

// BigCache README 中的簡單使用示例

import "github.com/allegro/bigcache"

cache, _ := bigcache.NewBigCache(bigcache.DefaultConfig(10 * time.Minute))

cache.Set("my-unique-key", []byte("value"))

entry, _ := cache.Get("my-unique-key")
fmt.Println(string(entry))

BigCache github 地址[1]

並發訪問

一個緩存，支持並發訪問是必須的，無論是程序使用了協程，還是 HTTP 伺服器為每個請求分配的協程，都可能並發訪問緩存。最常見的解決方案是使用讀寫鎖(sync.RWMutex)來保證在一個時間點只允許一個協程修改緩存內容。但是如果這樣做，鎖可能會阻塞後續的操作，導致緩存性能下降。

為了解決這個問題，BigCache使用了shards（譯者 yoko 註：單詞意思為碎片，可以理解為桶或分片或分區），shard是什麼？一個shard是一個結構體，它包含了一個帶鎖的cache實例。

BigCache使用了一個元素個數為 N 的shard數組，然後將數據打散到不同的shard中，所以當你從緩存中讀寫數據時，緩存會選取其中的一個shard使用（選取策略下文會說）。使用這種方式，鎖粒度被大幅度減小，因為鎖範圍從全局緩存縮小到了單個shard中。（譯者 yoko 註：對shard數組的訪問是不需要加鎖的，因為該數組在初始化時創建，後續大小就不會變了，即後續對數組的訪問只有讀，當然，對數組中的元素shard是有讀有寫的，但是這已經和數組沒關係了）

高額 GC 開銷

var map[string]Item

Go 中實現緩存最簡單最常見的方式是使用一個 map 來存儲元素，但是如果使用 map，GC（垃圾回收器）會在標記階段訪問 map 中的每一個元素，當 map 非常大時這會對程序性能帶來非常大的開銷。

go 1.5 版本之後，如果你使用的 map 的 key 和 value 中都不包含指針，那麼 GC 會忽略這個 map。

var map[int]int

為避免這個問題，BigCache使用了一個 key 和 value 都不包含指針的 map，這樣，GC 就會忽略掉這個 map。具體做法為：

map 的 key 為存入緩存中的 key 經過 hash 函數後得到的值。

map 的 value 比較值得一說，BigCache並不是直接把存入緩存的 value 作為 map 的 value，而是將存入緩存的 value 序列化成二進位的[]byte，然後將序列化後的[]byte追加到一個全局的[]byte中（一個 shard 包含一個全局[]byte）。map 中存儲的是該序列化後數據在全局[]byte中的下標。

使用全局的[]byte是非常聰明的做法，因為這樣做，只會給 GC 增加了一個額外對象，由於位元組切片除了自身對象並不包含其他指針數據，所以 GC 對於整個對象的標記時間是 O(1)的。

譯者 yoko 注
英文原文寫到這，基本上關於優化方面的思想已經說明白了。大致是以下兩點：
第一，將元素非常多的容器通過 hash 打散到各個桶（子容器）中。這是一種比較常見通用的減小鎖粒度，提高性能的手段。
第二，就是把 map 的 key 和 value 都弄成了無指針類型，具體做法上面已經說了。這種做法說白了是針對 Go GC 的特性所做的針對性優化。

另外，值得一提的是，關於第二點，由於桶中的全局的[]byte使用的是數組類型，那麼顯然從中間刪除元素的開銷是很大的。我去看了看BigCache的實現，它也確實沒有提供刪除指定 key 的接口。這種緩存，一般來說，刪除元素靠的是全局過期時間（注意，是先進先出的過期，並不能為每個 key 單獨指定不同的過期時間）或緩存總大小達到一定閾值後刪除，也即把數組當隊列用。所以，這種實現的前提是，緩存是自淘汰類型，而非可手動刪除指定元素類型的。
英文原文的後續部分，是英文作者在BigCache的基礎上自己寫了一個簡單版本的 cache，然後通過代碼來說明上面原理。如果你看到這覺得 ok 了，後面的內容就不用看了。

看代碼

以下是我寫的一個cache 的簡單實現[2]，我去掉了關於過期淘汰，容量等功能，只為了更好的演示我上面所說的內容。

首先，哈希函數是從BigCache中抄的，這個哈希實現是零堆內存申請的。

hasher.go

package main

// newDefaultHasher returns a new 64-bit FNV-1a Hasher which makes no memory allocations.
// Its Sum64 method will lay the value out in big-endian byte order.
// See https://en.wikipedia.org/wiki/Fowler–Noll–Vo_hash_function
func newDefaultHasher() fnv64a {
\treturn fnv64a{}
}

type fnv64a struct{}

const (
\t// offset64 FNVa offset basis. See https://en.wikipedia.org/wiki/Fowler–Noll–Vo_hash_function#FNV-1a_hash
\toffset64 = 14695981039346656037
\t// prime64 FNVa prime value. See https://en.wikipedia.org/wiki/Fowler–Noll–Vo_hash_function#FNV-1a_hash
\tprime64 = 1099511628211
)

// Sum64 gets the string and returns its uint64 hash value.
func (f fnv64a) Sum64(key string) uint64 {
\tvar hash uint64 = offset64
\tfor i := 0; i < len(key); i++ {
\t\thash ^= uint64(key[i])
\t\thash *= prime64
\t}
\treturn hash
}

然後，cache 結構體包含了獲取 shard 的邏輯，以及 get 和 set 方法。

在前文並發訪問小節中，提到了為數據選取特定 shard 的函數，該函數的實現是首先通過前面的哈希函數將 key 轉換成一個 hash 值，然後用 hash 值與 shard 的數量計算出 shard 數組的下標。值得一提的是，這裡不是用取余運算得到結果，而是通過按位與計算的。

hashedkey&mask

AND 1101 (mask) = 0101

cache.go

package main

var minShards = 1024

type cache struct {
\tshards []*cacheShard
\thash fnv64a
}

func newCache() *cache {
\tcache := &cache{
\t\thash: newDefaultHasher(),
\t\tshards: make([]*cacheShard, minShards),
\t}
\tfor i := 0; i < minShards; i++ {
\t\tcache.shards[i] = initNewShard()
\t}

\treturn cache
}

func (c *cache) getShard(hashedKey uint64) (shard *cacheShard) {
\treturn c.shards[hashedKey&uint64(minShards-1)]
}

func (c *cache) set(key string, value []byte) {
\thashedKey := c.hash.Sum64(key)
\tshard := c.getShard(hashedKey)
\tshard.set(hashedKey, value)
}

func (c *cache) get(key string) ([]byte, error) {
\thashedKey := c.hash.Sum64(key)
\tshard := c.getShard(hashedKey)
\treturn shard.get(key, hashedKey)
}

最後，也就是最贊的地方，在每個 shard 中有一個字符數組[]byte和一個map[uint64]uint32。在 map 中，存儲每個鍵值對的值在全局字符數組中的下標，在字符數組中存儲鍵值對的值。

使用tail變量來保存字符數組的尾部下標。

shard.go

package main

import (
\t"encoding/binary"
\t"errors"
\t"sync"
)

const (
\theaderEntrySize = 4
\tdefaultValue = 1024 // For this example we use 1024 like default value.
)

type cacheShard struct {
\titems map[uint64]uint32
\tlock sync.RWMutex
\tarray []byte
\ttail int
\theaderBuffer []byte
}

func initNewShard() *cacheShard {
\treturn &cacheShard{
\t\titems: make(map[uint64]uint32, defaultValue),
\t\tarray: make([]byte, defaultValue),
\t\ttail: 1,
\t\theaderBuffer: make([]byte, headerEntrySize),
\t}
}

func (s *cacheShard) set(hashedKey uint64, entry []byte) {
\tw := wrapEntry(entry)
\ts.lock.Lock()
\tindex := s.push(w)
\ts.items[hashedKey] = uint32(index)
\ts.lock.Unlock()
}

func (s *cacheShard) push(data []byte) int {
\tdataLen := len(data)
\tindex := s.tail
\ts.save(data, dataLen)
\treturn index
}

func (s *cacheShard) save(data []byte, len int) {
\t// Put in the first 4 bytes the size of the value
\tbinary.LittleEndian.PutUint32(s.headerBuffer, uint32(len))
\ts.copy(s.headerBuffer, headerEntrySize)
\ts.copy(data, len)
}

func (s *cacheShard) copy(data []byte, len int) {
\t// Using the tail to keep the order to write in the array
\ts.tail += copy(s.array[s.tail:], data[:len])
}

func (s *cacheShard) get(key string, hashedKey uint64) ([]byte, error) {
\ts.lock.RLock()
\titemIndex := int(s.items[hashedKey])
\tif itemIndex == 0 {
\t\ts.lock.RUnlock()
\t\treturn nil, errors.New("key not found")
\t}

\t// Read the first 4 bytes after the index, remember these 4 bytes have the size of the value, so
\t// you can use this to get the size and get the value in the array using index+blockSize to know until what point
\t// you need to read
\tblockSize := int(binary.LittleEndian.Uint32(s.array[itemIndex : itemIndex+headerEntrySize]))
\tentry := s.array[itemIndex+headerEntrySize : itemIndex+headerEntrySize+blockSize]
\ts.lock.RUnlock()
\treturn readEntry(entry), nil
}

func readEntry(data []byte) []byte {
\tdst := make([]byte, len(data))
\tcopy(dst, data)

\treturn dst
}

func wrapEntry(entry []byte) []byte {
\t// You can put more information like a timestamp if you want.
\tblobLength := len(entry)
\tblob := make([]byte, blobLength)
\tcopy(blob, entry)
\treturn blob
}

main.go

package main

import "fmt"

func main() {
\tcache := newCache()
\tcache.set("key", []byte("the value"))

\tvalue, err := cache.get("key")
\tif err != nil {
\t\tfmt.Println(err)
\t}

\tfmt.Println(string(value))
\t// OUTPUT:
\t// the value
}

英文原文地址：How BigCache avoids expensive GC cycles and speeds up concurrent access in Go

原文連結：https://pengrl.com/p/35302/
原文出處：yoko blog
原文作者： yoko
版權聲明： 本文歡迎任何形式轉載，轉載時完整保留本聲明信息（包含原文連結、原文出處、原文作者、版權聲明）即可。本文後續所有修改都會第一時間在原始地址更新。

從Go開源項目BigCache學習加速並發訪問和避免高額的GC開銷

項目中要不要使用 Go？我是這麼思考的

Go語言 CPU 性能、內存分析調試方法大匯總：你要的都在這

Go官方的限流器 time/rate 如何使用

Go語言的 defer 鏈如何被遍歷執行？

Go1.14 的這個改進讓 Gopher 生活更美好

位元組跳動商業產品研發團隊招聘：各種職位應有盡有，Go當然有

TIOBE 發布2020年3月程式語言榜單：Go 衝進前十，Delphi 沒落

好消息，Go 新的代碼文檔中心：pkg.go.dev 不久將開源

Go1.13.8和Go1.12.7發布，Go1.14 延期16天，還有些 bug 沒解決

Go 程式設計師的演變，最後的「Rob Pike」這個梗看懂了嗎？

創業公司更適合用 Go 語言，那大公司呢？

別告訴我這是真的？goroutine 可能使程序變慢

Go 語言中文網 2019 年終總結暨 2020 年展望

Go 在馬蜂窩即時通訊服務建設中的實踐

Go1.13 標準庫的 http 包爆出重大 bug，你的項目中招了嗎？

明白了，原來 Go Web 框架中的中間件都是這樣實現的

像 Awesome-Go 一樣提升企業 Go 項目代碼質量

妙到顛毫：你應該學會的 bigcache 優化技巧

Go 如何處理 HTTP 請求？掌握這兩點即可

sync.Pool 一定會提升性能嗎？建議你先學習一下它的設計

Go 號稱幾行代碼開啟一個 HTTP Server，底層都做了什麼？

Go 簡單性的價值：來自對 Go 倍加青睞的谷歌軟體工程師的自述

Go語言愛好者周刊：第 18 期

從Go開源項目BigCache學習加速並發訪問和避免高額的GC開銷