
How to cache well in Go

Optimize Go app performance with caching strategies. Learn local vs distributed cache, memory management, and eviction policies. Enhance efficiency with Go's new unique package.


Caching is essential for speeding up application APIs, so when there are high performance requirements it must be considered from the initial design stage.

If caching is needed, the most important thing during the design phase is to estimate how much memory it will need.
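As a rough back-of-the-envelope sketch of such an estimate (the entry count and average entry size below are purely illustrative assumptions):

package main

import "fmt"

func main() {
    const (
        entries       = 1_000_000 // how many keys we expect to keep in the cache
        bytesPerEntry = 512       // rough average of key + value + bookkeeping overhead
    )
    totalMB := float64(entries*bytesPerEntry) / (1 << 20)
    fmt.Printf("estimated cache size: %.0f MB\n", totalMB) // roughly 488 MB
}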

We first need to be clear about which data we need to cache.

For an application whose user base keeps growing, it is not advisable to cache all of its data.

Local memory is limited by the physical resources of a single machine, so caching data without limit will eventually lead to an OOM and force the application to exit.

With a distributed cache, the high hardware cost likewise forces us to make trade-offs.

If there were no physical resource limits, it would be best to put everything on the fastest devices.

But real business scenarios do not allow that, so we need to split data into hot and cold, and even archive and compress the cold data appropriately, storing it on cheaper media.

Analyzing which data can live in local memory is the first step toward good local caching.

Balancing caching with statelessness

Since the data is stored locally in the application, in a distributed system, our application is no longer stateless.

Take a web backend as an example: if we deploy 10 Pods and add a cache in the one Pod that handles a request, the same request forwarded to another Pod will not find the corresponding data.

There are three solutions:

  • Using Distributed Cache Redis

  • Forward the same request to the same Pod

  • Cache the same data in each Pod

The first method needs no elaboration here; it simply makes storage centralized again.

The second method requires specific identifying information, such as the user's uid, to implement the forwarding logic, so it is constrained by the practical scenario.
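As an illustration of what such forwarding logic could look like, here is a minimal sketch that hashes the uid to a fixed Pod; the pod address list and the pickPod function are hypothetical:

package main

import (
    "fmt"
    "hash/fnv"
)

// pickPod hashes a user's uid to a fixed Pod so that the same user's requests
// always land on the same local cache.
func pickPod(uid string, podAddrs []string) string {
    h := fnv.New32a()
    h.Write([]byte(uid))
    return podAddrs[h.Sum32()%uint32(len(podAddrs))]
}

func main() {
    pods := []string{"pod-0:8080", "pod-1:8080", "pod-2:8080"}
    fmt.Println(pickPod("user-42", pods)) // always the same Pod for this uid
}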

The third method consumes more storage, since compared with the second method we keep a copy of the data in every Pod. Although it is not exactly stateless either, the chance of cache breakdown is lower than with the second method: if the gateway misbehaves and cannot forward a request to the Pod holding the specific data, the other Pods can still serve it normally.

There is no silver bullet; choose according to the actual scenario. But keep in mind that the farther away the cache sits, the longer access takes.

Goim, for example, also uses memory alignment to raise the CPU cache hit rate as much as possible.

When the CPU performs an operation, it first looks for the data it needs in L1, then in L2, and finally in L3. If none of those caches hold it, the data has to be fetched from main memory.

The farther out it has to go, the longer the operation takes.
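Goim's actual code is not shown here, but the general trick looks something like the sketch below: padding a hot field out to a full 64-byte cache line so two cores do not fight over the same line (false sharing). The struct and sizes are illustrative assumptions:

package main

import (
    "fmt"
    "unsafe"
)

// Two counters written by different goroutines. Padding each one out to a full
// 64-byte cache line keeps them on separate lines, so one core's writes do not
// keep invalidating the line the other core holds in its L1/L2 (false sharing).
type paddedCounter struct {
    n uint64
    _ [56]byte // 8-byte counter + 56 bytes of padding = one 64-byte cache line
}

func main() {
    var counters [2]paddedCounter
    fmt.Println(unsafe.Sizeof(counters[0])) // 64
}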

Eviction strategy

If the cache's memory footprint must be strictly controlled, LRU can be used to manage it. Let's take a look at a Go implementation of an LRU cache.

LRU cache

It suits scenarios that need a bounded cache size with automatic eviction of infrequently used entries.

For example, suppose you only want to keep 128 keys. The LRU cache keeps growing until it is full, and whenever an existing key is accessed or a new one is added, that key is moved to the front so it is not evicted soon.

https://github.com/hashicorp/golang-lru is an LRU cache implemented in Go.

Let's look at how it is used through a test.

import (
 "testing"

 lru "github.com/hashicorp/golang-lru/v2"
 "github.com/stretchr/testify/assert"
)

func TestLRU(t *testing.T) {
 l, _ := lru.New[int, any](128)
 for i := 0; i < 256; i++ {
  l.Add(i, i+1)
 }

 // This key has not been evicted, so it can still be read
 value, ok := l.Get(200)
 assert.Equal(t, true, ok)
 assert.Equal(t, 201, value.(int))

 // This key has already been evicted
 value, ok = l.Get(1)
 assert.Equal(t, false, ok)
 assert.Equal(t, nil, value)
}

You can see that key 200 has not been evicted, so it can still be retrieved.

Key 1, however, falls outside the size = 128 limit; it has been evicted and can no longer be read.

This fits situations where there is too much data to keep all of it: frequently used entries keep getting moved to the head, which improves the cache hit rate.

Internally, the package maintains all cached elements in a doubly linked list.

On every Add, if the key already exists, it is moved to the head of the list.

func (l *LruList[K, V]) move(e, at *Entry[K, V]) {
 if e == at {
  return
 }
 e.prev.next = e.next
 e.next.prev = e.prev
 e.prev = at
 e.next = at.next
 e.prev.next = e
 e.next.prev = e
}

If the key does not exist, it is inserted via the insert method.

func (l *LruList[K, V]) insert(e, at *Entry[K, V]) *Entry[K, V] {
 e.prev = at
 e.next = at.next
 e.prev.next = e
 e.next.prev = e
 e.list = l
 l.len++
 return e
}

If the cache exceeds its size limit, the last element of the linked list is removed, i.e. the element that was added earliest and has not been used recently.

func (c *LRU[K, V]) removeOldest() {
 if ent := c.evictList.Back(); ent != nil {
  c.removeElement(ent)
 }
}

func (c *LRU[K, V]) removeElement(e *internal.Entry[K, V]) {
 c.evictList.Remove(e)
 delete(c.items, e.Key)
      //Callback after deleting key     
 if c.onEvict != nil {
  c.onEvict(e.Key, e.Value)
 }
}

func (l *LruList[K, V]) Remove(e *Entry[K, V]) V {
 e.prev.next = e.next
 e.next.prev = e.prev
      //To prevent memory leaks, set it to nil     
 e.next = nil 
 e.prev = nil 
 e.list = nil
 l.len--
 return e.Value
}
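The onEvict callback seen above is exposed to users of the package. A small usage sketch, assuming the v2 module's NewWithEvict constructor:

package main

import (
    "fmt"

    lru "github.com/hashicorp/golang-lru/v2"
)

func main() {
    // The callback runs whenever an entry is pushed out of the cache.
    c, _ := lru.NewWithEvict[int, string](2, func(key int, value string) {
        fmt.Printf("evicted key=%d value=%s\n", key, value)
    })
    c.Add(1, "a")
    c.Add(2, "b")
    c.Add(3, "c") // exceeds size 2, so key 1 is evicted and the callback fires
}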

Cache update

Timely updating cache in distributed systems can reduce data inconsistency issues.

Different methods also apply to different scenarios.

Cached data is obtained in different ways depending on the scenario. A popular-rankings list, for example, is not tied to any user; with multiple Pods we have to maintain it in every Pod's local cache, and a write or update has to notify all Pods to refresh.

If the data belongs to a single user, we would rather keep it in one fixed Pod and route that user's requests to the same Pod by user ID (uid). That way we do not maintain multiple copies across Pods, and memory consumption goes down.

Most of the time we also want the application to be stateless, so we put this cached data into Redis.

There are three main strategies for updating a distributed cache: the bypass (cache-aside) strategy, the write-through strategy (write the cache, then the database), and the write-back strategy.

The bypass (cache-aside) update strategy is the one we use most often: when updating data, delete the cache first and then write to the database; if a later read finds no cache entry, the value is loaded from the database and written back into the cache.
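A minimal sketch of the bypass flow is shown below; the Cache and Store interfaces are hypothetical stand-ins for Redis and the database:

// Hypothetical interfaces standing in for Redis and the database.
type Cache interface {
    Get(key string) (string, bool)
    Set(key, value string)
    Del(key string)
}

type Store interface {
    Load(key string) (string, error)
    Save(key, value string) error
}

// Read: try the cache first; on a miss, load from the store and repopulate.
func Read(c Cache, s Store, key string) (string, error) {
    if v, ok := c.Get(key); ok {
        return v, nil
    }
    v, err := s.Load(key)
    if err != nil {
        return "", err
    }
    c.Set(key, v)
    return v, nil
}

// Write: delete the cache entry first, then write the new value to the store.
func Write(c Cache, s Store, key, value string) error {
    c.Del(key)
    return s.Save(key, value)
}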

This strategy can still produce inconsistency when read QPS is very high: after the cache is deleted but before the database is updated, a concurrent read misses the cache, loads the old value from the database, and writes it back, so the stale value keeps being served.

Although the probability of this actually happening is low, we still need to evaluate the specific scenario: if such an inconsistency would be devastating to the system's data, this strategy cannot be used.

If this situation is acceptable but you want to keep the inconsistency window as short as possible, you can set a cache expiration time, so that even when no write triggers a refresh, the cache expires on its own and the data is reloaded.

The write-through and write-back strategies both update the cache first and then write to the database; the only difference is whether entries are flushed to the database one at a time or in batches.

One obvious disadvantage of these strategies is that data is relatively easy to lose. Redis does persist to disk, but for a high-QPS application, losing even one second of data after a power failure is still a large amount, so whether to adopt this strategy has to be decided from the actual business scenario.

If Redis still cannot meet our performance requirements, we need to keep the cached content directly in application variables, that is, a local cache, so that after a hit it is returned straight to the user with no network request.

So what we discuss below are strategies for updating a local cache in a distributed setting.

1. Proactively notify updates, similar to the bypass update strategy.

In a distributed setup, this can be done by broadcasting through etcd, so cached data spreads quickly instead of waiting to be loaded on the next query.
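One possible wiring uses etcd's watch API, roughly as sketched below; the endpoints, key prefix, and the local-refresh hook are assumptions:

package main

import (
    "context"
    "log"
    "time"

    clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
    cli, err := clientv3.New(clientv3.Config{
        Endpoints:   []string{"localhost:2379"},
        DialTimeout: 5 * time.Second,
    })
    if err != nil {
        log.Fatal(err)
    }
    defer cli.Close()

    go func() {
        // Each Pod watches a shared prefix and refreshes its local cache on events.
        for resp := range cli.Watch(context.Background(), "cache/update/", clientv3.WithPrefix()) {
            for _, ev := range resp.Events {
                log.Printf("refresh local cache: key=%s value=%s", ev.Kv.Key, ev.Kv.Value)
                // updateLocalCache(ev.Kv.Key, ev.Kv.Value) // hypothetical refresh hook
            }
        }
    }()

    // The writer broadcasts the new value after updating the database.
    if _, err := cli.Put(context.Background(), "cache/update/rank:hot", "new-value"); err != nil {
        log.Fatal(err)
    }
    time.Sleep(time.Second) // give the watcher a moment in this toy example
}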

But there is a problem here. Suppose a cache-update notification is sent at T1, and before downstream services have finished applying it, another update notification is generated at T2 = T1 + 1s while the T1 update is still in progress.

Depending on how quickly each notification is applied, the new value from T2 may then be overwritten by the old value from T1.

This can be solved by attaching a monotonically increasing version to the data. Once the T2 version has taken effect, the T1 notification can no longer overwrite it, which avoids the old value clobbering the new one.
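A minimal sketch of such a version guard for a local cache (the types here are hypothetical):

import "sync"

// A version-guarded local cache: an update is applied only if its version is
// newer than the stored one, so a late T1 notification cannot overwrite T2.
type entry struct {
    version uint64
    value   string
}

type versionedCache struct {
    mu    sync.Mutex
    items map[string]entry
}

func newVersionedCache() *versionedCache {
    return &versionedCache{items: make(map[string]entry)}
}

// apply returns false and leaves the cache untouched for stale notifications.
func (c *versionedCache) apply(key, value string, version uint64) bool {
    c.mu.Lock()
    defer c.mu.Unlock()
    if cur, ok := c.items[key]; ok && cur.version >= version {
        return false // older (or duplicate) version: ignore it
    }
    c.items[key] = entry{version: version, value: value}
    return true
}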

A proactive notification can also carry the specific key to update, so only that cache entry is refreshed, avoiding the excessive load of reloading all cached data.

This update strategy is similar to bypass updates, except that it changes from updating the distributed cache to updating the local cache.

2. Wait for the cache to automatically expire

This approach suits scenarios where data consistency requirements are not high. For a local cache that has to be spread across all Pods, the maintenance cost is also relatively high.

We can use a Go open-source package such as https://github.com/patrickmn/go-cache to manage expiration times instead of implementing them ourselves.

Next, let's take a look at how go-cache implements local caching.

Go Cache

https://github.com/patrickmn/go-cache is an open-source local cache package for Go.
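Before looking at the internals, typical usage looks roughly like this (the durations are just examples); the second argument to cache.New is the cleanup interval used by the janitor described below:

package main

import (
    "fmt"
    "time"

    "github.com/patrickmn/go-cache"
)

func main() {
    // Default expiration of 5 minutes; the janitor sweeps every 10 minutes.
    c := cache.New(5*time.Minute, 10*time.Minute)

    c.Set("foo", "bar", cache.DefaultExpiration)
    if v, found := c.Get("foo"); found {
        fmt.Println(v) // bar
    }
}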

Internally, data is stored in a map.

type Cache struct {
 *cache
}

type cache struct {
 defaultExpiration time.Duration
 items             map[string]Item
 mu                sync.RWMutex
 onEvicted         func(string, interface{})
 janitor           *janitor
}

All cached data lives in items.

Every Set and Get operates on items.

The janitor deletes expired keys at a fixed interval, and that interval can be configured.

func (j *janitor) Run(c *cache) {
 ticker := time.NewTicker(j.Interval)
 for {
  select {
  case <-ticker.C:
   c.DeleteExpired()
  case <-j.stop:
   ticker.Stop()
   return
  }
 }
}

A Ticker fires at the configured interval, and each tick calls the DeleteExpired method to remove expired keys.

func (c *cache) DeleteExpired() {
     //The eliminated KV value    
 var evictedItems []keyAndValue
 now := time.Now().UnixNano()
 c.mu.Lock()
     //Find expired keys and delete them    
 for k, v := range c.items {
  if v.Expiration > 0 && now > v.Expiration {
   ov, evicted := c.delete(k)
   if evicted {
    evictedItems = append(evictedItems, keyAndValue{k, ov})
   }
  }
 }
 c.mu.Unlock()
     //If there is a callback after being eliminated, it will be executed    
 for _, v := range evictedItems {
  c.onEvicted(v.key, v.value)
 }
}

From the code we can see that cache expiration relies on this periodic sweep loop.

So what happens if we read a key that has expired but has not yet been deleted?

When reading, the key's expiration is checked as well.

func (c *cache) Get(k string) (interface{}, bool) {
 c.mu.RLock()
     //If not found, return directly    
 item, found := c.items[k]
 if !found {
  c.mu.RUnlock()
  return nil, false
 }
 // If the obtained item has expired, return nil directly and wait for the cleanup loop to delete the key
 if item.Expiration > 0 {
  if time.Now().UnixNano() > item.Expiration {
   c.mu.RUnlock()
   return nil, false
  }
 }
 c.mu.RUnlock()
 return item.Object, true
}

Every read performs this check, so expired key-value pairs are reliably never returned.

Cache preheating

How should we preload at startup? Should we wait for initialization to complete before starting? Can we load in segments? Will the concurrency put pressure on middleware? All of these questions need to be considered when warming the cache at startup.

If waiting for all initialization to finish before preloading makes startup consume too much time and resources, we can run initialization and preloading in parallel, but we must make sure key components (database connections, network services, and so on) are ready, so the preloading does not hit unavailable resources.

If a request reaches the application before preloading has finished, there needs to be a fallback strategy so the request can still return normally.

The advantage of segmented loading is that it shortens initialization time through concurrency, but concurrent loading puts extra pressure on middleware (cache servers, databases, and so on) even as it speeds up preloading.

When implementing this, evaluate the system's capacity for concurrency and set reasonable concurrency limits; a rate-limiting mechanism can relieve the pressure and avoid overloading the middleware.

In Go, a buffered channel can also be used as a simple way to cap concurrency, as sketched below.
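A minimal sketch, assuming a hypothetical loadIntoCache function and key list:

import (
    "log"
    "sync"
)

// preload warms the cache for a list of keys while keeping at most maxInFlight
// loads running at once.
func preload(keys []string, loadIntoCache func(key string) error, maxInFlight int) {
    sem := make(chan struct{}, maxInFlight)
    var wg sync.WaitGroup
    for _, key := range keys {
        wg.Add(1)
        sem <- struct{}{} // acquire a slot; blocks once maxInFlight loads are running
        go func(k string) {
            defer wg.Done()
            defer func() { <-sem }() // release the slot
            if err := loadIntoCache(k); err != nil {
                log.Printf("preload %s failed: %v", k, err)
            }
        }(key)
    }
    wg.Wait()
}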

Cache preheating matters a great deal in production. During a release, the application's local cache disappears with the restart. Even with a rolling update, at least one Pod has to go back to the source, and under very high QPS the peak load on that Pod can drag down the database and trigger an avalanche effect.

There are two ways to deal with this. One is to avoid releases during peak hours and upgrade during low-traffic periods instead, which are easy to identify through monitoring.

The other is to preload the data at startup and only serve external traffic once loading completes. The downside is that if the released version has problems and we need to roll back, the longer startup time works against a quick rollback.
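One way to gate traffic on preload completion is a readiness flag, for example wired to a Kubernetes readiness probe; the handler path and preload hook below are illustrative assumptions:

package main

import (
    "net/http"
    "sync/atomic"
)

var ready atomic.Bool

func main() {
    go func() {
        // preloadCache() // hypothetical preloading step
        ready.Store(true)
    }()

    // Report not-ready until preloading finishes, so traffic is only routed
    // to this Pod once its cache is warm.
    http.HandleFunc("/healthz/ready", func(w http.ResponseWriter, r *http.Request) {
        if !ready.Load() {
            http.Error(w, "cache warming up", http.StatusServiceUnavailable)
            return
        }
        w.WriteHeader(http.StatusOK)
    })
    http.ListenAndServe(":8080", nil)
}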

Both methods have their own advantages and disadvantages, so choose according to your actual needs. The most important principle, though, is to minimize reliance on special conditions: the more a release depends on, the more likely something goes wrong when publishing.
