Zen HuiFer · Learn · 6 min read
New package in Go 1.23: unique
Go 1.23 introduces the unique package for canonicalizing values, which improves memory efficiency and speeds up equality checks. Learn how "interning" works with unique and how it benefits Go developers.
The unique package provides tools for canonicalizing (that is, "interning") comparable values. Interning means merging multiple identical values, such as strings or structs with the same contents, into a single canonical copy. When many identical values exist, only one canonical version is kept in memory and every other occurrence refers to that copy, which saves memory and speeds up equality comparisons.
On the official Go blog, Michael Knyszek, the author of the unique package, published an introduction to the unique package that also describes some discoveries made while implementing it (weak pointers and a replacement for finalizers). A translation follows:
The Go 1.23 standard library includes a new package called unique. Its purpose is to enable the canonicalization of comparable values. Put simply, it lets you deduplicate values so that they point to a single, canonical copy, and it manages those canonical copies efficiently under the hood. You may already know this concept as "interning". Let's look at how it works and why it's useful.
A simple interning implementation
At a high level, interning is very simple. The following code sample deduplicates strings using just a regular map.
import "strings"

// internPool must be initialized: writing to a nil map panics.
var internPool = make(map[string]string)

// Intern returns a string equal to s that may share storage with
// a string previously passed to Intern.
func Intern(s string) string {
	pooled, ok := internPool[s]
	if !ok {
		// Clone the string in case it is part of a much larger string.
		// This should be rare if interning is used well.
		pooled = strings.Clone(s)
		internPool[pooled] = pooled
	}
	return pooled
}
This is useful when you construct many strings that are likely to be duplicates, for example while parsing a text format.
This implementation is simple, but it has some problems:
It never removes strings from the pool.
It cannot be safely used by multiple goroutines concurrently.
It only works with strings, even though the idea is quite general.
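The second and third problems can be addressed in ordinary Go. The sketch below is one possible approach and is not how the unique package is implemented: Pool, NewPool, and Len are hypothetical names, and a mutex-protected generic map stands in for the pool. It still never removes entries, so the first problem remains.

```go
package main

import (
	"fmt"
	"sync"
)

// Pool is a hypothetical generic intern pool guarded by a mutex.
type Pool[T comparable] struct {
	mu     sync.Mutex
	values map[T]T
}

// NewPool returns an empty pool that is ready to use.
func NewPool[T comparable]() *Pool[T] {
	return &Pool[T]{values: make(map[T]T)}
}

// Intern returns the pooled value equal to v, storing v the first time
// it is seen. Entries are never removed.
func (p *Pool[T]) Intern(v T) T {
	p.mu.Lock()
	defer p.mu.Unlock()
	if pooled, ok := p.values[v]; ok {
		return pooled
	}
	p.values[v] = v
	return v
}

// Len reports how many distinct values the pool holds.
func (p *Pool[T]) Len() int {
	p.mu.Lock()
	defer p.mu.Unlock()
	return len(p.values)
}

func main() {
	pool := NewPool[string]()
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			pool.Intern("GET")
			pool.Intern("POST")
		}()
	}
	wg.Wait()
	fmt.Println(pool.Len()) // 2: only the distinct values are stored
}
```

A single mutex keeps the sketch short; it serializes every lookup, which a real concurrent map would avoid.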
This implementation also misses a subtle optimization. Under the hood, a string is an immutable structure consisting of a pointer and a length. When two strings are compared and their pointers differ, their contents must be compared to decide whether they are equal. But if we know that both strings are canonical, comparing their pointers is enough.
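To make the pointer-comparison idea concrete, here is a small sketch. It repeats a map-based Intern (with an initialized map) and adds sameStorage, a hypothetical helper that uses unsafe.StringData to check whether two strings share the same backing bytes. It is for illustration only; ordinary code should simply use ==.

```go
package main

import (
	"fmt"
	"strings"
	"unsafe"
)

var internPool = make(map[string]string)

// Intern returns a string equal to s that shares storage with any
// equal string previously passed to Intern.
func Intern(s string) string {
	pooled, ok := internPool[s]
	if !ok {
		pooled = strings.Clone(s)
		internPool[pooled] = pooled
	}
	return pooled
}

// sameStorage (hypothetical helper) reports whether a and b have the
// same length and point at the same backing bytes.
func sameStorage(a, b string) bool {
	return len(a) == len(b) && unsafe.StringData(a) == unsafe.StringData(b)
}

func main() {
	// Two equal strings in separate allocations.
	a := strings.Clone("example.com")
	b := strings.Clone("example.com")

	fmt.Println(a == b, sameStorage(a, b))         // true false
	fmt.Println(sameStorage(Intern(a), Intern(b))) // true
}
```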
Introducing the unique package
The new unique package provides a function similar to Intern, called Make. It works in much the same way as Intern: internally there is a global map (a fast generic concurrent map), and Make looks up the provided value in that map. It differs from Intern in two important ways. First, it accepts values of any comparable type. Second, it returns a wrapper value, a Handle[T], from which the canonical value can be retrieved.
Handle[T] is the key to the design. A Handle[T] has the property that two handles are equal if and only if the values used to create them are equal. What's more, comparing two Handle[T] values is cheap: it comes down to a pointer comparison. Comparing two long strings, by contrast, is much more expensive!
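A minimal sketch of these properties, assuming Go 1.23 or later. The endpoint type and the sameHandle helper are hypothetical and exist only for illustration; Make, Handle[T], and Value are the package's own API.

```go
package main

import (
	"fmt"
	"strings"
	"unique"
)

// endpoint is a hypothetical comparable struct used only for illustration.
type endpoint struct {
	host string
	port int
}

// sameHandle (hypothetical helper) reports whether a and b canonicalize
// to the same handle.
func sameHandle[T comparable](a, b T) bool {
	return unique.Make(a) == unique.Make(b)
}

func main() {
	// Two equal strings built separately.
	a := strings.Repeat("ab", 4)
	b := strings.Repeat("ab", 4)

	h := unique.Make(a)
	fmt.Println(h == unique.Make(b)) // true: equal values give equal handles
	fmt.Println(h.Value())           // abababab

	// Any comparable type works, not just strings.
	fmt.Println(sameHandle(endpoint{"example.com", 443}, endpoint{"example.com", 443})) // true
	fmt.Println(sameHandle(endpoint{"example.com", 443}, endpoint{"example.com", 80}))  // false
}
```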
So far, this is nothing you couldn't do in ordinary Go code. But Handle[T] also serves a second purpose: as long as a Handle[T] exists for a value, the map retains the canonical copy of that value. Once every Handle[T] that maps to a specific value is gone, the package marks the internal map entry as deletable so that the garbage collector can reclaim it later. This gives a clear policy for when to remove entries from the map: when the canonical entries are no longer in use, the garbage collector is free to clean them up.
If you have used Lisp before, this may all sound familiar. Lisp symbols are interned strings, but they are not strings themselves, and the string values of all symbols are guaranteed to be in the same pool. The relationship between a symbol and a string is analogous to the relationship between Handle[string] and string.
A practical example
So how might you use unique? Look no further than the net/netip package in the standard library, which interns values of type addrDetail, part of the netip.Addr structure. Below is a simplified version of the actual net/netip code that uses unique.
type Addr struct {
	// Details about the address, wrapped up together and canonicalized.
	z unique.Handle[addrDetail]
}

type addrDetail struct {
	isV6   bool   // False for IPv4, true for IPv6.
	zoneV6 string // May be non-empty if the address is IPv6.
}

var z6noz = unique.Make(addrDetail{isV6: true})

// WithZone returns an IP address that is the same as ip but with the
// given zone. If zone is empty, the zone is removed.
func (ip Addr) WithZone(zone string) Addr {
	if !ip.Is6() {
		return ip
	}
	if zone == "" {
		ip.z = z6noz
		return ip
	}
	ip.z = unique.Make(addrDetail{isV6: true, zoneV6: zone})
	return ip
}
Since many IP addresses are likely to use the same zone, and the zone is part of their identity, it makes a lot of sense to canonicalize them. Deduplicating zones reduces the average memory footprint of each netip.Addr, and because the zones are canonical, comparing zone names becomes a simple pointer comparison, which makes comparing values more efficient.
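From the caller's side, none of this is visible: netip.Addr values are compared with == as usual. The sketch below uses only the public net/netip API (MustParseAddr, WithZone, Zone); withZone is a hypothetical convenience wrapper, and the addresses and zone names are made up for illustration.

```go
package main

import (
	"fmt"
	"net/netip"
)

// withZone (hypothetical helper) parses addr and attaches zone to it.
// It panics if addr is not a valid IP address.
func withZone(addr, zone string) netip.Addr {
	return netip.MustParseAddr(addr).WithZone(zone)
}

func main() {
	a := withZone("fe80::1", "eth0")
	b := netip.MustParseAddr("fe80::1%eth0")

	fmt.Println(a == b)                            // true: same address, same zone
	fmt.Println(a.Zone())                          // eth0
	fmt.Println(a == withZone("fe80::1", "eth1"))  // false: zones differ
	fmt.Println(withZone("192.0.2.1", "eth0").Zone() == "") // true: IPv4 has no zone
}
```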
A footnote about interning strings
Although the unique package is useful, Make is not quite like Intern for strings, because the Handle[T] is required to keep a string from being deleted from the internal map. This means you need to modify your code to retain handles as well as strings.
But strings are special: although they behave like values, they contain a pointer under the hood. In principle, then, it would be possible to canonicalize only the underlying storage of a string and hide the details of a Handle[T] inside the string itself. So there is still room in the future for what might be called transparent string interning, in which strings are interned without the Handle[T] type, much like the Intern function but with semantics closer to Make.
In the meantime, unique.Make("my string").Value() is one possible workaround. If no Handle[T] is retained, the string may be deleted from unique's internal map, but not immediately. In practice, an entry is not deleted until at least the next garbage collection completes, so this workaround still allows some degree of deduplication in the periods between collections.
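As a rough sketch of that workaround, assuming Go 1.23 or later: internString is a hypothetical helper name, and the input data is made up. The returned strings are always equal to the inputs; what may change is that equal strings end up sharing backing storage, and only for as long as the map entry survives.

```go
package main

import (
	"fmt"
	"unique"
)

// internString (hypothetical helper) returns a string equal to s. The
// handle is discarded, so the entry may be dropped from unique's
// internal map after a later garbage collection.
func internString(s string) string {
	return unique.Make(s).Value()
}

func main() {
	fields := []string{"GET", "POST", "GET", "GET"}
	for i, f := range fields {
		fields[i] = internString(f)
	}
	fmt.Println(fields) // [GET POST GET GET]
}
```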
Some history and a look ahead
In fact, the net/netip package has interned zone strings ever since it was first introduced. The interning package it used was an internal copy of go4.org/intern. Like the unique package, it has a Value type (which looks a lot like a pre-generics Handle[T]), and entries in its internal map are removed once they are no longer referenced.
To achieve this behavior, the old intern package had to do some unsafe things, in particular implementing weak pointers outside the runtime. Weak pointers are also the core abstraction underlying the unique package. A weak pointer is a pointer that does not prevent the garbage collector from reclaiming a variable; when the variable is reclaimed, the weak pointer automatically becomes nil.
While implementing the unique package, we added proper weak pointer support to the garbage collector. After working through the design decisions that come with weak pointers, we were surprised by how simple and straightforward it all turned out to be. Weak pointers are now a public proposal.
This work also led us to reexamine finalizers, and we ended up proposing an easier-to-use and more efficient replacement for finalizers. With a hash function for comparable values on the way as well, the future of building memory-efficient caches in Go is bright!
References
Introduction to the unique package: https://go.dev/blog/unique
Generic concurrent map: https://pkg.go.dev/internal/concurrent@go1.23.0
go4.org/intern: https://pkg.go.dev/go4.org/intern
Weak pointer proposal: https://go.dev/issue/67552
Finalizer replacement proposal: https://go.dev/issue/67535
Memory-efficient caches: https://go.dev/issue/67552#issuecomment-2200755798