Hacker News
Manual Memory Management in Go using jemalloc (dgraph.io)
50 points by chewxy 7 days ago | hide | past | favorite | 46 comments

Reposting my comment from when this was on Reddit:

I'm confused about why you need jemalloc instead of just grabbing an N-gigabyte []Node and then manually making them "live" or "dead" by taking pointers. Or take a giant []byte and then cut that up into nodes with unsafe. Seems like those would be equivalent to jemalloc. Maybe throw in some runtime.KeepAlive to keep the GC from scanning it?

I think the idea is that you have memory that is invisible to the Go GC. That way you are free (and at risk of memory leaks) to do whatever you want with that slab.

You can do this in pure Go though, you just need to manage the allocator yourself. That is a significant technical undertaking, but once done the advantages are huge. No GC overhead, no cgo. Done properly you can even have multiple readers and writers working concurrently in the same allocation space.

OK you piqued my curiosity. How?

You just mmap memory as a slice. You can grow and shrink it, flush to disk if backed by a file, etc. In code it’s just a slice. Tough part is you need to allocate, control bounds, binary encode/decode, struct align, defrag, etc. It’s hard work, but the performance is 100% worth it if performance is your main goal.

TIL mmap'd memory is invisible to Go's GC

malloc isn't magic, just mmap some memory to dole out.

Because then you have to actually implement the allocation logic.

Allocation logic is much simpler than you think. A solution to a specific problem will always be simpler and faster than a general solution like jemalloc.

Yes? Allocation in a multithreaded program is simple? Balancing allocator space overhead, fragmentation, and allocator performance? What about getting statistics and insight into allocations? Detecting buffer overruns?

You couldn't be more wrong. Behold the source code to jemalloc and despair.

Allocation can be simple in very specific cases where you can use a single threaded arena.

Allocation and deallocation in a performant manner, without introducing fragmentation, while being thread safe, isn't easy by any means.

Add in the relevant taxes, such as detecting use-after-free, etc., and it takes a LOT of work.

The solution in TFA doesn’t do anything about use after free, does it?

That is a logical fallacy. A solution for a specific problem _can_ be simpler and faster than a general solution, given enough time. However, jemalloc has had an absolutely huge amount of people-hours invested into optimizing it, so it is not unlikely that it'll still be faster for specific problems unless the specific solution also has significant time invested into it.

Their Allocator library is really an arena, a special-purpose allocator that was discussed on HN recently. [1] I think it's fair to say that when not using GC, it's worth looking for a suitable scope for arenas: short-lived, bounded but significant number of allocations. In many servers, an arena per request is appropriate. You can totally beat directly using the global allocator by layering arenas on top, whether the global allocator is jemalloc or anything else. Batching the frees makes a huge difference, both because there are fewer calls to free and because you have to do less pointer-chasing (with potential cache misses) to decide what calls to make to free.

Maybe the arenas reduce the allocations enough (and make them of reasonably uniform size) that a simple buddy or slab allocator underneath would beat jemalloc. These simple allocators would have an "unfair" advantage of not having to go through Cgo.

Or maybe just having each Allocator (arena) use the Go allocator for its backing allocations would be okay. It'd still be garbage-collected, which they've been trying to avoid, but the collector would no longer be looking through each individual allocation, so maybe it'd be zippier about freeing stuff.

Note that (as in my other comment) I still think Rust is a better language for this program. In fairness to them, there are plenty of practical reasons they might have ended up here:

* Rust may not have existed or been a reasonable choice when they started coding. In my experience, porting to Rust really is fairly labor-intensive.

* There may be significant other parts of the program where Go/GC is more appropriate.

* Maybe the whole rest of their company's codebase is in Go and they want to use that expertise.

[1] https://news.ycombinator.com/item?id=24762840

I have never seen a well-made but general solution beat a well-made and specific solution for one problem, in complexity or run time, ever. This is very true with allocators. A lot of the time people will just use 'malloc' without any thought into what they're actually allocating. For example, if you only allocate/deallocate from one thread, jemalloc is already way overblown in complexity.

That's not what I meant. If you can muster the time and budget for a well-made specific solution, great. What I was getting at is that due to time and/or budget constraints, most custom solutions will not actually be well-made, and the implementer would have been better off just picking the battle-tested off-the-shelf solution.

But TFA isn’t about just adapting some off the shelf quickie solution. It explains all the hoops necessary to cross the CGo barrier and use jemalloc instead of the normal Go garbage collector. ISTM once you put in that LOE, you’re in the space where a specific solution can beat a general one.

This actually sounds like the worst of both worlds. The number one feature of Go is simplicity. In service to that simplicity it deliberately left out a lot of features such as destructors and generics. And the thing is, the language works for its purposes. However, if you subvert that simplicity, it doesn't have the tools and abstractions to make using manual memory safer (destructors, generics, etc). My guess is you follow this path, you will be accumulating technical debt at a very high interest rate.

> However, if you subvert that simplicity, it doesn't have ... destructors

But it does have that, `runtime.SetFinalizer`. Sure it's clunky, and that's probably intentional: It's just there to be there as the escape hatch for you when you do need to subvert the normal way of doing things.

They did a great job of clearly laying out the caveats with this approach, which are extensive. It's obvious that they chose the wrong tool for the job and (un)fortunately were clever enough to make it fit. Go and Rust are both great languages. Rust is a much better language for this particular program.

How do you know they chose the wrong tool for the job without more context into the decision making process and important factors/constraints at the time?

I peeked at the reddit conversation earthboundkid mentioned. [1] The author wrote there:

> Go’s code readability, concurrency model, fast compilation, gofmt, go profiler, go vet, performance and so on. Also, we didn’t ditch Go GC entirely. We’re still using it pretty expansively in all non-critical paths and wherever it’s hard to trace the memory ownership and release.

> I see this as a no different than taking some pieces of your code and converting them to assembly (utilize SIMD instructions, for e.g.), which Go does as well. Also, note that the workload that databases need to incur are lot heavier than typical Go applications.

I don't think most of those are too satisfying as reasons to choose Go over Rust for this program. I think Go's code readability and safety are significantly compromised by this kind of manual memory management, enough so that I'd consider Rust more readable in this context. Likewise Rust's performance is significantly better—the biggest reason Go is faster to compile is that it has a simple compiler that does almost no optimizations. Rust can do that, too, in debug mode, especially with the recent cranelift backend. And Rust has equivalents to "go fmt" and "go vet".

Now it's quite possible that when they started this program years ago, Rust wasn't a good choice. Obviously at least the cranelift backend I just mentioned didn't exist at the time. Pragmatically, you often end up with this kind of "path-dependent design" / "for legacy reasons". Porting is a lot of effort, maybe more than is worth it. You can't always be chasing the hot new fashion or you'd get nothing else done. I think it still should be acknowledged however that a better tool exists now.

I agree with them the concurrency model in Go is genuinely more pleasant than in Rust. In Go, you just write synchronous code and it gets translated to a green thread model for you. In Rust, there's the async stuff which is harder to program and an async ecosystem that is still settling today. That's a clear advantage for Go, and maybe also why they say Go is more readable. (I find Rust otherwise pleasant to read.) And maybe Go is more reasonable for the non-critical path stuff as they said, and maybe that does have to be in the same program.

[1] https://www.reddit.com/r/golang/comments/jobgq9/manual_memor...

When you need to fight the garbage collector in a language that has one by design, it's a clear sign the wrong language was chosen, even if you'll find clever ways to fight it.

My point is that, even if that is true, there are many other factors that go into technical decisions than pure technical merits. These decisions happen in the context of a business, with existing systems, codebases, teams of people with different experiences and preferences, unique market dynamics, etc. To say "you used the wrong tool for the job" without considering the conditions under which that tool was chosen isn't a productive assertion.

I've heard this argument before, and in my opinion it's often misused to justify the choice of the wrong tool, because it's not based on objective technical reasons. So it's easy to wield it with a vague appeal to "there are some business needs".

Are you saying the non-technical reasons they considered weren’t valid? What were they?

I'm saying this argument is often misused in practice.

> In our experience, doing manual memory allocation and chasing potential memory leaks takes less effort than trying to optimize memory usage in a language with garbage collection

They're generalising from the garbage collectors they know to all languages with garbage collectors. Go's GC is particularly crude, and great advances in GC technology have been made in recent years. E.g. take a look at the state of the G1 and ZGC collectors in OpenJDK 15/16.

At this point, why not just use C?

UB, memory leaks, memory corruption, implicit conversions,...

The benefit of using Go is keeping the memory safety for like 90% of the application, with just a tiny unsafe code portion.

In C, 100% of the source code is unsafe.

> UB, memory leaks, memory corruption, implicit conversions, [...]

> In C, 100% of the source code is unsafe

Is it perhaps better to focus on context? That is, cost vs. benefit with respect to context:

- How much safety and what kind and level of safety assurances does the specific application need?

- How much does it cost in development time/friction, application performance, engineering complexity, [insert other relevant cost axes] to achieve the desired level of safety and safety assurances?

As the high-integrity safety standards show, if you want to write provably safe code in C, there is no way around something like MISRA-C or Frama-C, alongside certification tooling like that sold by LDRA.

Naturally this is the kind of expense that 99% of companies aren't going to take on until having security exploits in their software finally becomes a legal liability.

Assuming that Go can easily call into C code you can still implement the performance-critical parts in carefully written and tested C and the other 90% in high-level Go.

On the other hand, if implanting a faster allocator fixes performance problems, then there's something bigger amiss in the overall application design. Creating and destroying small objects at such a high frequency that memory management overhead becomes noticeable isn't a good idea in any case, GC or not.

> performance-critical parts in carefully written and tested C

Is something that not even the Linux kernel, with its careful and thorough patch review process, is able to achieve.

I believe calling C from Go is a massive pain, and is slow, because of goroutines. Go makes the OS syscalls directly to avoid going through libc.

-Wimplicit, UBSan, ASan,...

The idea that "you can't write safe C" is a big joke. C is as safe as you make it.

that'd be a rewrite

They've done the hard work already. It'd be trivial to port.

Someone needs to implement Go in the JVM so it can finally have a good GC

You jest but I toyed with just this and came to the conclusion that it is feasible. In my case, I just transpiled to Kotlin (but would take a different approach when Loom is completed). I don't have much interest in the JVM anymore, but all of these [0] Go files compile and run. Because I'm mad, I also toyed with the inverse, JVM to Go [1], and seriously stressed the compiler [2].

0 - https://github.com/cretz/go2k/tree/master/compiler/src/test/...

1 - https://github.com/cretz/goahead

2 - https://github.com/golang/go/issues/18602

What’s wrong with Go’s GC? I think it’s actually pretty fast and gets better with each release.

Plenty of stuff,


As usual, Go knows better than the older languages, and then a kludge of a solution gets done.

It seems to me like your opinion about Go GC is that it is an extremely buggy and slow piece of software.

Every software has bugs. Whether those bugs are a major blocker for people to use them or not should be taken into account.

The issues that you pointed out are difficult items which need more time and thought to take care of. But they aren't something that should cause any major pain in your day-to-day usage of Go. I would say that as of 1.15, the GC has improved by leaps and bounds, and at this point all major warts with the system are taken care of.

But then again, I don't know in what context you are using Go. It may be the case that you have been affected by any of these bugs, and unless they get fixed, using Go is a blocker for you.

It's not particularly fast, for instance. The golang GC has pretty short pause times, but its actual throughput in GB/s collected does not stand out. Short pause times are nice for interactive systems but don't matter as much for e.g. batch or background jobs, where total time to completion is much more important. The sister comment also has a link to a GH issue detailing many more specific flaws of the GC.

Maybe something in GraalVM? serious implementations of golang there don't seem to exist yet, though.
