· Zen HuiFer · Learn · 6 min read

New package in Go 1.23: unique

Go 1.23 introduces the unique package for value normalization, improving memory efficiency and equality checks. Learn how "interning" works with unique and how it benefits Go developers.


The unique package provides tools for normalizing comparable values (that is, "interning" them).

Specifically, "normalization" or "interning" means merging multiple identical values (such as strings or structs with the same content) into a single copy through some mechanism. When many identical values exist, only one normalized copy is kept in memory and every other occurrence points to it, which saves memory and speeds up equality comparisons.

On the official Go blog, Michael Knyszek, the creator of the unique package, published "Introduction to the unique package", which also describes some discoveries made while implementing the package (weak pointers and an alternative to finalizers). What follows is a translation of that article:

The Go 1.23 standard library introduces a new package called unique. The goal of this package is to normalize comparable values. Put simply, it lets you deduplicate values so that they point to a single, normalized copy, while managing those normalized copies efficiently under the hood. You may already know this concept under the name "interning". Let's look at how it works and why it is useful.

A simple interning implementation

At a high level, interning is very simple. The following code deduplicates strings using nothing more than a regular map.

import "strings"

// internPool must be initialized: writing to a nil map panics.
var internPool = make(map[string]string)

// Intern returns a string equal to s, but one that may share storage with
// a string previously passed to Intern.
func Intern(s string) string {
    pooled, ok := internPool[s]
    if !ok {
        // Clone the string in case it is part of a much larger string.
        // If interning is used well, this should rarely happen.
        pooled = strings.Clone(s)
        internPool[pooled] = pooled
    }
    return pooled
}

This is useful when you construct many strings that are likely to repeat, for example when parsing a text format.
However, although this implementation is simple, there are some issues:

  1. It will never remove strings from the object pool.

  2. It cannot be safely used concurrently in multiple goroutines.

  3. It only works for strings, even though the idea is quite general.
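Issues 2 and 3 can be addressed with ordinary Go. The following is a minimal sketch, not the approach the unique package takes: Pool, NewPool, and Len are hypothetical names introduced only for illustration. A mutex makes the pool safe for concurrent use and a type parameter lets it hold any comparable type. Issue 1 remains, because nothing is ever removed.

```go
package main

import (
	"fmt"
	"sync"
)

// Pool is a hypothetical generic intern pool guarded by a mutex.
// It never removes entries, so it still grows without bound.
type Pool[T comparable] struct {
	mu     sync.Mutex
	values map[T]T
}

// NewPool returns an empty pool ready for use.
func NewPool[T comparable]() *Pool[T] {
	return &Pool[T]{values: make(map[T]T)}
}

// Intern returns the pooled value equal to v, storing v the first time it is seen.
func (p *Pool[T]) Intern(v T) T {
	p.mu.Lock()
	defer p.mu.Unlock()
	if pooled, ok := p.values[v]; ok {
		return pooled
	}
	p.values[v] = v
	return v
}

// Len reports how many distinct values the pool currently holds.
func (p *Pool[T]) Len() int {
	p.mu.Lock()
	defer p.mu.Unlock()
	return len(p.values)
}

func main() {
	pool := NewPool[string]()

	// Several goroutines intern overlapping values at the same time.
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for _, w := range []string{"alpha", "beta", "alpha"} {
				pool.Intern(w)
			}
		}()
	}
	wg.Wait()

	fmt.Println(pool.Len()) // 2: only "alpha" and "beta" are stored
}
```

A single mutex is the simplest choice here; it serializes every lookup, which is one reason a purpose-built concurrent map is attractive for a global pool.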

In addition, this implementation misses a subtle optimization. Under the hood, a string is an immutable structure consisting of a pointer and a length. When two strings are compared and their pointers differ, their contents must be compared to decide whether they are equal. But if we know that both strings are normalized, comparing their pointers is enough.
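To make the pointer argument concrete, here is a small sketch that reuses the map-based Intern and inspects the backing storage with unsafe.StringData (available since Go 1.20). The helper sameStorage is a hypothetical name introduced for this illustration; it is a way to observe sharing, not something production code should rely on.

```go
package main

import (
	"fmt"
	"strings"
	"unsafe"
)

// internPool is initialized up front; writing to a nil map would panic.
var internPool = make(map[string]string)

// Intern mirrors the map-based function shown in the article.
func Intern(s string) string {
	pooled, ok := internPool[s]
	if !ok {
		pooled = strings.Clone(s)
		internPool[pooled] = pooled
	}
	return pooled
}

// sameStorage is a hypothetical helper: it reports whether two strings
// have the same length and point at the same backing bytes.
func sameStorage(a, b string) bool {
	return len(a) == len(b) && unsafe.StringData(a) == unsafe.StringData(b)
}

func main() {
	// Build two equal strings at run time so they live in separate allocations.
	a := strings.Repeat("go", 8)
	b := strings.Repeat("go", 8)
	fmt.Println(a == b, sameStorage(a, b)) // true false

	// After interning, both results refer to the single pooled copy.
	ia, ib := Intern(a), Intern(b)
	fmt.Println(ia == ib, sameStorage(ia, ib)) // true true
}
```

Once two strings are known to share storage, an equality check could stop at the pointer and length instead of walking the bytes.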

Introducing the unique package

The new unique package provides a function similar to Intern, called Make. It works much like Intern: internally there is a global map (a fast generic concurrent map), and Make looks up the given value in that map. It differs from Intern in two important ways. First, it accepts values of any comparable type. Second, it returns a wrapper value, a Handle[T], from which the normalized value can be retrieved.

Handle[T] is the key to the design. Two Handle[T] values are equal if and only if the values used to create them are equal. More importantly, comparing two Handle[T] values is cheap: it is just a pointer comparison. Comparing two long strings, by contrast, can cost far more!
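A short sketch of these properties, using only unique.Make, Handle equality, and Handle.Value from the standard library (Go 1.23 or later). The endpoint type and the handlesEqual helper are hypothetical and exist only for this example.

```go
package main

import (
	"fmt"
	"strings"
	"unique"
)

// endpoint is a hypothetical comparable struct used only for illustration.
type endpoint struct {
	host string
	port int
}

// handlesEqual is a hypothetical helper: it reports whether the handles
// made from a and b compare equal.
func handlesEqual[T comparable](a, b T) bool {
	return unique.Make(a) == unique.Make(b)
}

func main() {
	// Two equal strings built separately at run time.
	long1 := strings.Repeat("x", 1024)
	long2 := strings.Repeat("x", 1024)

	h1, h2 := unique.Make(long1), unique.Make(long2)
	fmt.Println(h1 == h2)            // true: equal values give equal handles
	fmt.Println(h1.Value() == long1) // true: Value returns the normalized value

	// Different values give different handles.
	fmt.Println(handlesEqual("a", "b")) // false

	// Make accepts any comparable type, not only strings.
	fmt.Println(handlesEqual(
		endpoint{host: "example.com", port: 443},
		endpoint{host: "example.com", port: 443},
	)) // true
}
```

Comparing h1 and h2 does not touch the 1024 bytes of string content; only the handles are compared.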

So far, all of this could be implemented in ordinary Go code. But Handle[T] has a second job: as long as a Handle[T] exists for a value, the map retains the normalized copy of that value. Once every Handle[T] that maps to a specific value is gone, the package marks the internal map entry as deletable so that the garbage collector can reclaim it later. This gives a clear policy for when to remove entries from the map: when a normalized entry is no longer in use, the garbage collector is free to clean it up.

If you have ever used Lisp, all of this may sound familiar. Lisp symbols are interned strings, but they are not strings themselves, and the string values of all symbols are guaranteed to be in the same pool. The relationship between symbols and strings is similar to the relationship between Handle[string] and string.

A practical example

How might you use unique? Take a look at the net/netip package in the standard library, which interns values of type addrDetail, part of the netip.Addr structure. The following is a simplified version of the actual code in net/netip that uses the unique package.

type Addr struct {
    // Other unexported fields are omitted here.

    // Details about the address, wrapped up together and normalized.
    z unique.Handle[addrDetail]
}

type addrDetail struct {
    isV6   bool   // false for IPv4, true for IPv6.
    zoneV6 string // May be != "" if isV6 is true.
}

var z6noz = unique.Make(addrDetail{isV6: true})

// WithZone returns an IP that is the same as ip but with the given zone.
// If zone is empty, the zone is removed.
func (ip Addr) WithZone(zone string) Addr {
    if !ip.Is6() {
        return ip
    }
    if zone == "" {
        ip.z = z6noz
        return ip
    }
    ip.z = unique.Make(addrDetail{isV6: true, zoneV6: zone})
    return ip
}

Since many IP addresses are likely to use the same zone, and the zone is part of their identity, it makes sense to normalize them. Deduplicating zones reduces the average memory footprint of each netip.Addr, and because the zones are normalized, comparing zone names becomes a simple pointer comparison, which makes value comparisons more efficient.
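From the caller's side the interning is invisible: netip.Addr values are simply compared with ==. The sketch below uses only the public net/netip API (MustParseAddr, WithZone, Zone); the sameZoned helper is a hypothetical name added for illustration.

```go
package main

import (
	"fmt"
	"net/netip"
)

// sameZoned is a hypothetical helper: it parses ip, applies each zone with
// WithZone, and reports whether the two resulting addresses compare equal.
func sameZoned(ip, zoneA, zoneB string) bool {
	base := netip.MustParseAddr(ip)
	return base.WithZone(zoneA) == base.WithZone(zoneB)
}

func main() {
	a := netip.MustParseAddr("fe80::1").WithZone("eth0")
	b := netip.MustParseAddr("fe80::1%eth0")
	c := netip.MustParseAddr("fe80::1").WithZone("eth1")

	fmt.Println(a == b) // true: same address and same zone
	fmt.Println(a == c) // false: the zone is part of the address identity

	// An empty zone removes the zone.
	fmt.Println(a.WithZone("").Zone() == "") // true

	// WithZone leaves IPv4 addresses unchanged.
	fmt.Println(sameZoned("192.0.2.1", "eth0", "eth1")) // true
}
```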

A footnote on interning strings

Although the unique package is useful, Make is not quite the same as Intern for strings, because a Handle[T] is required to keep a string from being deleted from the internal map. This means you have to modify your code to keep the Handle[T] as well as the string.

But strings are special: although they behave like values, they contain a pointer under the hood. In theory, then, just the underlying storage of a string could be normalized, with the details of the Handle[T] hidden inside the string itself. So there is still room in the future for what might be called transparent string interning, in which strings can be interned without a Handle[T], similar to the Intern function but with semantics closer to Make.

In the meantime, unique.Make("my string").Value() is one possible workaround. If the Handle[T] is not retained, the string may be deleted from unique's internal map, but not immediately. In practice, an entry is not deleted until at least the next garbage collection completes, so this workaround still allows some degree of deduplication in the periods between collections.
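The workaround can be wrapped in a small helper. This is a sketch under the assumptions stated above: internString and internAll are hypothetical names, and because no handle is retained, deduplication is only best effort between garbage collections.

```go
package main

import (
	"fmt"
	"strings"
	"unique"
)

// internString is a hypothetical helper around the workaround above.
// The handle is dropped immediately, so the map entry may be removed
// after a later garbage collection.
func internString(s string) string {
	return unique.Make(s).Value()
}

// internAll is a hypothetical helper that interns every element of words.
func internAll(words []string) []string {
	out := make([]string, len(words))
	for i, w := range words {
		out[i] = internString(w)
	}
	return out
}

func main() {
	// Repeated tokens, as might come out of a simple text parser.
	line := "GET /index GET /about POST /index"
	tokens := internAll(strings.Fields(line))
	fmt.Println(tokens) // [GET /index GET /about POST /index]
}
```

The returned strings are always equal to the inputs; whether two of them share storage depends on whether the map entry survived between the two calls.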

Some history and prospects

In fact, the net/netip package has interned zone strings ever since it was introduced. The interning package it used was an internal copy of go4.org/intern. Like the unique package, it has a Value type (which looks a lot like a pre-generics Handle[T]), and entries in its internal map are removed once they are no longer referenced.

To achieve this behavior, the old intern package had to do some unsafe things, in particular implementing weak pointers outside the runtime. Weak pointers are also the core abstraction behind the unique package. A weak pointer is a pointer that does not prevent the garbage collector from reclaiming a variable; when the variable is reclaimed, the weak pointer automatically becomes nil.

While implementing the unique package, we added proper weak pointer support to the garbage collector. After working through the design decisions involved, we were surprised by how simple and straightforward everything turned out to be. Weak pointers have now become a public proposal.

This work also prompted us to re-examine finalizers, which led to a proposal for an easier-to-use and more efficient finalizer alternative. With a hash function for comparable values on the way as well, the future of building memory-efficient caches in Go looks bright!

References

Introduction to the unique package: https://go.dev/blog/unique

Generic concurrent map: https://pkg.go.dev/internal/concurrent@go1.23.0

go4.org/intern: https://pkg.go.dev/go4.org/intern

Public proposal (weak pointers): https://go.dev/issue/67552

Finalizer alternative: https://go.dev/issue/67535

Efficient caching: https://go.dev/issue/67552#issuecomment-2200755798
