Interning in Go

Val Deleplace
Google Cloud - Community
4 min read · Sep 16, 2024

Go 1.23 comes with a new package unique implementing interning, and a blog post about it. Interning is re-using objects of equal value instead of keeping duplicate equal objects in memory. It is intended to reduce memory usage.

Example program

The source code is hosted on GitHub. This program is an illustration of a use case for unique, with numbers!

A large string in memory

In this example, I have a very large plaintext file and I’m loading it fully in memory. This results in a string variable book that occupies 282 MiB of memory.

data, err := os.ReadFile("./large_book.txt")
if err != nil {
	log.Fatalln(err)
}
book := string(data)

The memory will be monitored using a helper function mem() that runs a full garbage collection, and displays the remaining memory usage:

func mem() {
	var memstat runtime.MemStats
	runtime.GC() // run a full garbage collection before measuring
	runtime.ReadMemStats(&memstat)
	const MiB = 1024 * 1024
	fmt.Println("The program is now using", memstat.Alloc/MiB, "MiB")
}

After loading the file, this is the output of the function mem():

The program is now using 282 MiB

Let’s say I want to build a slice Bwords of all of the words in the book that start with the letter “B”. In my sample book, about 2% of the words start with a “B”. Once I have my B-words, I don’t need the full book anymore, and I’d like its memory to be freed.

Let’s explore 3 different strategies to achieve this.

Slices of a large string

Source: nointerning.go

Inside a single loop I’m finding all the words in book, and adding them to Bwords if they start with a “B”:

word := book[a:i]
if word[0] == 'b' || word[0] == 'B' {
	Bwords = append(Bwords, word)
}

In Go, strings are immutable and it is safe to use a slice book[a:i] as a small substring of a much larger string. Slicing copies no bytes: the result is a new string header that points into the same memory.

Bwords therefore contains string headers that happen to have pointers to fragments of memory inside the large string book. As long as we keep using Bwords, the memory of book has no chance of being reclaimed by the garbage collector (GC).

The program is now using 299 MiB

Building Bwords allocated 17 MiB of string headers, in addition to the 282 MiB still used by book.
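The fragment above omits the scanning loop. Here is a minimal sketch of what such a loop could look like, assuming that a word is a maximal run of ASCII letters. The helper names wordsStartingWithB, isLetter and pointsInside are made up for this illustration and are not taken from nointerning.go. The sketch also uses unsafe.StringData (Go 1.20+) to check that a B-word really points inside the memory of book.

```go
package main

import (
	"fmt"
	"unsafe"
)

// isLetter is a simplifying assumption: only ASCII letters form words.
func isLetter(c byte) bool {
	return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
}

// wordsStartingWithB returns every word of book starting with 'b' or 'B'.
// Each returned string is a slice of book: no bytes are copied.
func wordsStartingWithB(book string) []string {
	var Bwords []string
	a := -1 // start index of the current word, or -1 when outside a word
	for i := 0; i <= len(book); i++ {
		inWord := i < len(book) && isLetter(book[i])
		switch {
		case inWord && a < 0:
			a = i // a new word starts here
		case !inWord && a >= 0:
			word := book[a:i]
			if word[0] == 'b' || word[0] == 'B' {
				Bwords = append(Bwords, word)
			}
			a = -1
		}
	}
	return Bwords
}

// pointsInside reports whether the bytes of sub live inside the bytes of s.
func pointsInside(sub, s string) bool {
	if len(sub) == 0 || len(s) == 0 {
		return false
	}
	p := uintptr(unsafe.Pointer(unsafe.StringData(sub)))
	start := uintptr(unsafe.Pointer(unsafe.StringData(s)))
	return p >= start && p+uintptr(len(sub)) <= start+uintptr(len(s))
}

func main() {
	book := "Bob bought beaucoup apples, but Alice baked bread."
	Bwords := wordsStartingWithB(book)
	fmt.Println(len(Bwords), Bwords)           // 6 [Bob bought beaucoup but baked bread]
	fmt.Println(pointsInside(Bwords[2], book)) // true: "beaucoup" shares memory with book
}
```

Because every element of Bwords points into book, keeping a single B-word alive is enough to keep the whole book in memory.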

Cloned strings

Source: nointerning_Clone.go

The function strings.Clone, introduced in Go 1.18, is designed precisely to “retain only a small substring of a much larger string”.

word := book[a:i]
if word[0] == 'b' || word[0] == 'B' {
	cloned := strings.Clone(word)
	Bwords = append(Bwords, cloned)
}

Calling mem() after the last use of book, and before the last use of Bwords, prints:

The program is now using 23 MiB

23 MiB is the total size of the contents of the small cloned strings, plus the size of the Bwords backing array, plus some runtime overhead. This output shows that the large string book was effectively garbage-collected.
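To see the difference between a substring and its clone, the following sketch compares the addresses of the underlying bytes. The helpers cloneSubstring and sameData are hypothetical names introduced here for illustration, and the sample text is made up.

```go
package main

import (
	"fmt"
	"strings"
	"unsafe"
)

// cloneSubstring returns a fresh copy of book[a:i] that does not
// reference the memory of book.
func cloneSubstring(book string, a, i int) string {
	return strings.Clone(book[a:i])
}

// sameData reports whether x and y start at the same memory address.
func sameData(x, y string) bool {
	return unsafe.StringData(x) == unsafe.StringData(y)
}

func main() {
	book := "beaucoup de mots, beaucoup de lignes"
	word := book[0:8]
	cloned := cloneSubstring(book, 0, 8)

	fmt.Println(word == cloned)         // true: same contents
	fmt.Println(sameData(word, book))   // true: word points into book
	fmt.Println(sameData(word, cloned)) // false: cloned has its own bytes
}
```

Once only the clones are kept, nothing references the bytes of book anymore, which is what lets the GC reclaim it.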

Interned strings

Source: interning.go

A frequent word such as “beaucoup” gets cloned many times, producing identical copies in memory.

When using the new package unique, you know that identical words will become a single interned object:

word := book[a:i]
if word[0] == 'b' || word[0] == 'B' {
	handle := unique.Make(word)
	Bwords = append(Bwords, handle)
}

Calling mem() after the last use of book, and before the last use of Bwords, prints:

The program is now using 8 MiB

8 MiB is the total size of the unique words starting with “B”, plus the size of Bwords (which contains many duplicate handles), plus some runtime overhead.

The original very long string book has been effectively freed, because the implementation of unique.Make takes care of cloning any string you give to it! The interning pool does not reference the original string objects.

Well, that’s true after a small fix to be shipped in Go 1.23.2. In Go 1.23.0 and 1.23.1, the interning pool was keeping references to the original strings, which was not intended.
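With interning, Bwords is a slice of unique.Handle[string] rather than a slice of strings. The sketch below (Go 1.23+) shows how handles behave: equal words yield equal handles, and Value() gives the string back. The helpers internWords and countDistinct are hypothetical names used only for this illustration.

```go
package main

import (
	"fmt"
	"unique"
)

// internWords returns one handle per input word.
// Equal words always produce equal handles.
func internWords(words []string) []unique.Handle[string] {
	handles := make([]unique.Handle[string], 0, len(words))
	for _, w := range words {
		handles = append(handles, unique.Make(w))
	}
	return handles
}

// countDistinct counts the distinct handles. Handles are comparable,
// so they can be used directly as map keys.
func countDistinct(handles []unique.Handle[string]) int {
	seen := make(map[unique.Handle[string]]struct{})
	for _, h := range handles {
		seen[h] = struct{}{}
	}
	return len(seen)
}

func main() {
	Bwords := internWords([]string{"beaucoup", "bien", "beaucoup", "bon", "beaucoup"})

	fmt.Println(len(Bwords), countDistinct(Bwords)) // 5 3
	fmt.Println(Bwords[0] == Bwords[2])             // true: same interned value
	fmt.Println(Bwords[0].Value())                  // beaucoup
}
```

Comparing two handles is a cheap pointer comparison, whatever the length of the interned strings.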

Only strings?

unique accepts any comparable object (comparable with ==) as input: numbers, strings, arrays, pointers, or your custom structs containing only comparable fields.

String is the only comparable type that supports slicing, which must be handled with care by the unique package, in order to not accidentally keep a reference to a substring of a very large string. The other comparable types don’t have this slicing problem at all.

The IP address struct type in the announcement blog post is a good example of a custom type we’d want to intern.
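As a minimal sketch of interning a custom struct (Go 1.23+), consider the type below. The endpoint type and the intern helper are hypothetical, invented for this example; they are not the IP address type from the announcement blog post.

```go
package main

import (
	"fmt"
	"unique"
)

// endpoint is a hypothetical struct. All of its fields are comparable,
// so endpoint itself is comparable and can be interned.
type endpoint struct {
	Host string
	Port uint16
	IP   [4]byte
}

// intern returns the canonical handle for e.
func intern(e endpoint) unique.Handle[endpoint] {
	return unique.Make(e)
}

func main() {
	a := intern(endpoint{Host: "example.com", Port: 443, IP: [4]byte{192, 0, 2, 1}})
	b := intern(endpoint{Host: "example.com", Port: 443, IP: [4]byte{192, 0, 2, 1}})
	c := intern(endpoint{Host: "example.com", Port: 80, IP: [4]byte{192, 0, 2, 1}})

	fmt.Println(a == b)         // true: equal structs share one handle
	fmt.Println(a == c)         // false: the ports differ
	fmt.Println(a.Value().Port) // 443

	// Plain numbers work too.
	fmt.Println(unique.Make(42) == unique.Make(42)) // true
}
```

A struct containing a slice, a map or a function field is not comparable, and passing it to unique.Make would not compile.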

Servers, and more

A common use case for processing similar strings over and over is a stateful web server, which may be running for several weeks. When the requests hitting your service consist of structured data with a lot of similarities, interning may help you keep the memory usage lower, run more frugal instances, or fewer of them, and save money.
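One way this could look in a long-running service is to store handles instead of strings in the records kept in memory. The sketch below is only an illustration under assumptions: requestLog and newRequestLog are hypothetical names, and the loop simulates incoming requests instead of running a real HTTP server.

```go
package main

import (
	"fmt"
	"unique"
)

// requestLog is a hypothetical record kept in memory for each request.
// Repeated user agents and paths are stored once in the interning pool.
type requestLog struct {
	UserAgent unique.Handle[string]
	Path      unique.Handle[string]
	Status    int
}

// newRequestLog interns the string fields of a request.
func newRequestLog(userAgent, path string, status int) requestLog {
	return requestLog{
		UserAgent: unique.Make(userAgent),
		Path:      unique.Make(path),
		Status:    status,
	}
}

func main() {
	var history []requestLog
	for i := 0; i < 1000; i++ {
		// fmt.Sprintf allocates a fresh string each time,
		// much like parsing an incoming request would.
		path := fmt.Sprintf("/items/%d", i%3)
		history = append(history, newRequestLog("curl/8.5.0", path, 200))
	}

	fmt.Println(len(history))                         // 1000
	fmt.Println(history[0].Path == history[3].Path)   // true: both are "/items/0"
	fmt.Println(history[0].Path == history[1].Path)   // false
	fmt.Println(history[999].UserAgent.Value())       // curl/8.5.0
}
```

The 1000 records share three interned paths and one interned user agent, and the temporary strings built for each request can be garbage-collected.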

More generally, any execution environment (batch jobs, embedded systems, etc.) where memory is a precious resource can benefit from interning objects.

Val Deleplace
Google Cloud - Community

Engineer on cloudy things @Google. Opinions my own. Twitter @val_deleplace