What are malloc's alignment guarantees? #1533

RalfJung · 2019-07-02T07:12:54Z

What exactly are the guarantees that jemalloc's malloc provides in terms of alignment?

The docs say

> The allocated space is suitably aligned (after possible pointer coercion) for storage of any type of object.

However, in rust-lang/rust#45955 we noticed that this is not correct: at least with GCC/clang extensions, one can define a type of size 8 that has alignment 16. However, jemalloc has been observed handing out allocations of size 8 that are just 8-aligned.
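For illustration, here is a minimal sketch of such a type, assuming GCC or clang. Applying the `aligned` attribute to a typedef raises the alignment without padding the size. The names `overaligned_u64`, `is_aligned_for`, and `can_hold_overaligned` are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* GCC/clang extension: a typedef whose alignment (16) exceeds its size (8).
 * Hypothetical name, used only for this illustration. */
typedef uint64_t overaligned_u64 __attribute__((aligned(16)));

/* Returns 1 if address p is a multiple of align (align must be a power of two). */
static int is_aligned_for(const void *p, size_t align) {
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Returns 1 if a block starting at p is aligned well enough
 * to store an overaligned_u64. */
static int can_hold_overaligned(const void *p) {
    return is_aligned_for(p, _Alignof(overaligned_u64));
}
```

If `malloc(sizeof(overaligned_u64))` hands back a block that is only 8-aligned, `can_hold_overaligned` returns 0 for it, which is the situation described above.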

System allocator functions usually seem to guarantee that everything is at least 16-byte aligned on an x86-64 system -- at least that's what comments in the Rust source say, but I do not know where that information is coming from. However, jemalloc violates that expectation. It would be useful to know what exactly is guaranteed in terms of alignment for small allocations (including small non-power-of-2 allocations).

EDIT: This value of 16 seems to originate from https://www.gnu.org/software/libc/manual/html_node/Aligned-Memory-Blocks.html#Aligned-Memory-Blocks (Ctrl-F "sixteen"). Of course that is not normative for jemalloc, but it will catch applications by surprise when jemalloc is used as a drop-in replacement.

davidtgoldblatt · 2019-07-03T02:43:48Z

The (non-alignment-specifying) allocation functions return memory aligned for any type of object that can live in the returned space.

Practically, this means an alignment of 8 for 8-byte allocations, and 16 for 16-or-more-byte allocations, on 64-bit systems. (This can be tweaked as a config option, as well).
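A small probe can show what a given allocator actually returns on one machine. This is only a sketch: the helper names `ptr_alignment` and `report_small_alignments` are hypothetical, and an observed alignment is not a guarantee, since a pointer can be more aligned than promised by coincidence.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Largest power of two that divides the address of p (p must be non-null). */
static size_t ptr_alignment(const void *p) {
    uintptr_t addr = (uintptr_t)p;
    return (size_t)(addr & (~addr + 1)); /* isolate the lowest set bit */
}

/* Prints the observed alignment of malloc(size) for sizes 1, 2, 4, ..., 32. */
static void report_small_alignments(void) {
    for (size_t size = 1; size <= 32; size *= 2) {
        void *p = malloc(size);
        if (p != NULL) {
            printf("malloc(%zu) -> alignment %zu\n", size, ptr_alignment(p));
            free(p);
        }
    }
}
```

Calling `report_small_alignments()` under different allocators (for example with and without jemalloc preloaded) makes the difference for sizes of 8 bytes or less visible.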

RalfJung · 2019-07-03T09:11:06Z

After some more chatting with @gnzlbg, I think I understand better now why I feel that there is a gap in the docs here.

You are specifically talking about types and objects in C here. In Rust, we support allocations (but not types) where align > size. Those do not exist in C, and hence the docs and also what you just said say nothing about how jemalloc behaves on them.

> Practically, this means an alignment of 8 for 8-byte allocations, and 16 for 16-or-more-byte allocations, on 64-bit systems. (This can be tweaked as a config option, as well).

~~This is not accurate when the size is small -- e.g., I have seen malloc(8) return non-16-aligned allocations on a 64bit system.~~

EDIT: Sorry I misread. I assume you mean in general "alignment N for N-byte allocations" when N is a power of two. What about non-powers-of-two? I expect it's something like "N rounded down to the next power of 2"?

gnzlbg · 2019-07-03T09:40:03Z

> You are specifically talking about types and objects in C here. In Rust, we support allocations (but not types) where align > size. Those do not exist in C, and hence the docs and also what you just said say nothing about how jemalloc behaves on them.

It was "obvious" to me that, since malloc is a C API, it only adheres to C rules. But it should also have been obvious to me that, because people call malloc from all sorts of programming languages with different models, this isn't necessarily obvious to everybody.

Interpreting the guarantees from the C standard isn't trivial, and extracting the precise guarantees from platform ABI documents is hard.

I will send a PR documenting this, and showing an example for, e.g., SysV64.

This will need a bit of iteration to make sure that we only guarantee what we are allowed to guarantee, so that we do not limit platform compatibility, configurability, optimization opportunities, etc.

About align > size, first, not all APIs allow the user to pass an alignment - these will follow the C rules + the ABI implementation-defined behavior.

The standard APIs that do support an alignment argument, posix_memalign and aligned_alloc, are already documented (see https://jemalloc.net/jemalloc.3.html), and they call out the semantics precisely. It would be helpful if you could review those and let us know if there is anything we can improve there.
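As a sketch of the align > size case with a standard API, the helper below requests an 8-byte block with 16-byte alignment through `posix_memalign`. The name `alloc_overaligned` is hypothetical. `aligned_alloc` is deliberately not used here, because C11 requires its size argument to be a multiple of the alignment.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocates `size` bytes aligned to `align` via posix_memalign.
 * `align` must be a power of two and a multiple of sizeof(void *).
 * Returns NULL on failure. Hypothetical helper for illustration. */
static void *alloc_overaligned(size_t size, size_t align) {
    void *p = NULL;
    if (posix_memalign(&p, align, size) != 0) {
        return NULL;
    }
    return p;
}
```

For example, `alloc_overaligned(8, 16)` should yield a pointer whose address is a multiple of 16; the block is released with `free` as usual.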

That leaves the non-standard jemalloc-specific APIs, which support an alignment request via the flags argument. For these, I can't find the "size must be a multiple of the alignment" requirement anywhere; we only document how the flags are computed from an alignment request:

> Align the memory allocation to start at an address that is a multiple of a, where a is a power of two. This macro does not validate that a is a power of 2.

For example, to perform a 2-byte allocation with an alignment of 4, so that align > size, one would pass MALLOCX_ALIGN(4) as the flags argument. The docs don't say whether mallocx(2, MALLOCX_ALIGN(4)) is OK or not. Looking at the tests, it is also not clear to me whether this is actually tested.

I think that this behavior should be called out explicitly.

RalfJung · 2019-07-03T18:46:27Z

> I will send a PR documenting this, and showing an example for, e.g., SysV64.
>
> This will need a bit of iteration to make sure that we only guarantee what we are allowed to guarantee, so that we do not limit platform compatibility, configurability, optimization opportunities, etc.

Thanks!

> The standard APIs that do support an alignment argument, posix_memalign and aligned_alloc, are already documented (see https://jemalloc.net/jemalloc.3.html), and they call out the semantics precisely. It would be helpful if you could review those and let us know if there is anything we can improve there.

Both seem fairly clear in the documentation. However, I can't tell if there are any other extra assumptions they might be making that are not mentioned. What about size == 0?

There is a curious difference between "must be" and "behavior is undefined"; my assumption would be that a violation of a "must" clause also causes UB?

gnzlbg · 2019-07-03T21:31:44Z

> Both seem fairly clear in the documentation. However, I can't tell if there are any other extra assumptions they might be making that are not mentioned. What about size == 0?

See #1277 .

jasone · 2019-07-04T21:08:16Z

See the --with-lg-quantum documentation in INSTALL.md.

--with-lg-quantum=<lg-quantum>

Specify the base 2 log of the minimum allocation alignment. jemalloc needs to know the minimum
alignment that meets the following C standard requirement (quoted from the April 12, 2011 draft of
the C11 standard):

The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned
to a pointer to any type of object with a fundamental alignment requirement and then used
to access such an object or an array of such objects in the space allocated [...]

This setting is architecture-specific, and although jemalloc includes known safe values for the
most commonly used modern architectures, there is a wrinkle related to GNU libc (glibc) that may
impact your choice of <lg-quantum>. On most modern architectures, this mandates 16-byte
alignment (<lg-quantum>=4), but the glibc developers chose not to meet this requirement for performance
reasons. An old discussion can be found at https://sourceware.org/bugzilla/show_bug.cgi?id=206 .
Unlike glibc, jemalloc does follow the C standard by default (caveat: jemalloc technically cheats
for size classes smaller than the quantum), but the fact that Linux systems already work around
this allocator noncompliance means that it is generally safe in practice to let jemalloc's minimum
alignment follow glibc's lead. If you specify --with-lg-quantum=3 during configuration, jemalloc
will provide additional size classes that are not 16-byte-aligned (24, 40, and 56).
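The effect of the quantum on small size classes can be sketched as follows. This is a simplified model assumed for illustration, not jemalloc's actual size-class code: it ignores the sub-quantum classes mentioned above and the wider spacing of larger classes, so it is only meant for requests between one quantum and 64 bytes. The name `round_up_to_quantum` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model (an assumption, not jemalloc's real implementation):
 * round a small request up to the next multiple of the quantum 2^lg_quantum. */
static size_t round_up_to_quantum(size_t size, unsigned lg_quantum) {
    size_t quantum = (size_t)1 << lg_quantum;
    return (size + quantum - 1) & ~(quantum - 1);
}
```

Under this model, requests of 24, 40, and 56 bytes keep their size with `lg_quantum` 3 but are rounded up to 32, 48, and 64 with `lg_quantum` 4, which matches the extra size classes listed in the quoted documentation.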

gnzlbg · 2019-07-04T21:37:34Z

> but the glibc developers chose not to meet this requirement for performance
> reasons. An old discussion can be found at https://sourceware.org/bugzilla/show_bug.cgi?id=206 .

Note that this bug has been fixed recently upstream because it made glibc's malloc incompatible with gcc, e.g., see also https://sourceware.org/bugzilla/show_bug.cgi?id=21120 .

RalfJung · 2019-07-04T21:58:08Z

> See #1277 .

Yeah, basically that but for posix_memalign and aligned_alloc.

gnzlbg · 2019-07-04T22:09:20Z

The behavior is the same for all C standard APIs, so the behavior of aligned_alloc is covered in #1277 as well. The behavior of posix_memalign is clear, if the allocation fails, an error is returned, and otherwise the allocation succeeds.

EDIT: Actually POSIX:2018 improves on this, guaranteeing the same behavior as C:

> If the size of the space requested is 0, the behavior is implementation-defined: either a null pointer shall be returned in memptr, or the behavior shall be as if the size were some non-zero value, except that the behavior is undefined if the value returned in memptr is used to access an object.

So AFAICT this means that allocating zero size cannot ever return error, since there is always sufficient memory to store zero bytes.
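One way to observe which of the two choices an implementation makes is a zero-size request like the sketch below. The helper name `zero_size_memalign` is hypothetical. The returned pointer, null or not, must never be dereferenced, and passing it to `free` is fine in both cases.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <assert.h>
#include <stdlib.h>

/* Requests zero bytes with pointer-sized alignment.
 * Returns posix_memalign's result code; *out receives either NULL or a
 * pointer that may only be passed to free(), never dereferenced. */
static int zero_size_memalign(void **out) {
    *out = NULL;
    return posix_memalign(out, sizeof(void *), 0);
}
```

If the reading above is right, the return code is 0 on conforming implementations, and only the value stored in `*out` differs between them.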

RalfJung · 2019-07-06T08:33:07Z

> The behavior of posix_memalign is clear, if the allocation fails, an error is returned, and otherwise the allocation succeeds.

You consider it clear. ;)

> The behavior is the same for all C standard APIs, so the behavior of aligned_alloc is covered in #1277 as well.

I did not know that aligned_alloc is a C standard API. Other readers of the docs might not know that either.

oxalica · 2021-11-20T19:08:11Z

> Practically, this means an alignment of 8 for 8-byte allocations, and 16 for 16-or-more-byte allocations, on 64-bit systems. (This can be tweaked as a config option, as well).

This violates an assumption about malloc that clang gained in LLVM 13, added in https://reviews.llvm.org/D100879.
Clang assumes that the pointer returned by malloc is 16-byte aligned for any size, even for allocations smaller than 8 bytes.

This actually caused a crash in Firefox due to an unaligned access when LLVM 13 and jemalloc were used together; see
https://bugzilla.mozilla.org/show_bug.cgi?id=1741454

davidtgoldblatt · 2021-11-22T18:18:13Z

Thanks! Replying upstream.

davidtgoldblatt · 2021-12-01T23:43:13Z

LLVM upstream's take on this is that the glibc malloc behavior of 16-byte alignment is a platform guarantee for anything targeting -linux-gnu targets, and so must be obeyed by malloc replacements on those targets.

From what I know of the runtime upstream, I don't actually think this is a correct interpretation. I read it as information about the default alignment rather than a contract; other parts of the stdlib distribution avoid making similar assumptions, and from what I know of the glibc malloc maintainers, they're fairly live-and-let-live in terms of the constraints they try to impose on malloc replacements. But given a combination of factors (I'm on parental leave, my job can easily work around the perf regression via compiler flags, and upstream doesn't agree), I'm not super motivated to argue about it further.

For us going forward, I think just disabling sub-quantum size classes might be easiest. So long as the config setting stays around it's still semi-opt-in, just with some more coordination required.

zuiderkwast · 2022-11-23T14:02:22Z

Hello. What about 32bit systems? Is it possible to use lg-quantum=2 and what are the size classes with lg-quantum 2, 3 and 4?

I'm guessing here so please correct me if I'm wrong:

| lg-quantum | size classes |
|---|---|
| 2 | 4, 8, 12, 16, 20, 24, 28?, 32, 36?, 40, 44?, 48, 52?, 56, 60?, 64, ... |
| 3 | 4, 8, 16, 24, 32, 40, 48, 56, 64, ... |
| 4 | 4, 6, 16, 32, 48, 64, ... |

davidtgoldblatt · 2022-11-30T04:57:14Z

Practically, SC_LG_TINY_MIN is a floor on lg-quantum and is unconfigurably set to 3. I think it might be possible to set lg-quantum below this, but I wouldn't expect things to work correctly -- the size class computation logic has this as a built-in assumption.

gitamohr · 2023-04-11T19:27:30Z

Is it possible to configure jemalloc (5.3.0) so that it always returns 16-byte aligned addresses, even for requests of size <= 8? I'm running into an issue with a 3rd-party library on Linux that assumes all heap pointers are 16-byte aligned.

I'm using lg-quantum=4, but jemalloc will return 8-byte aligned addresses for sizes <= 8.

This was referenced Jul 2, 2019

We call posix_memalign with a too small alignment rust-lang/rust#62251

Closed

Figure out rules for minimal alignment of system allocator rust-lang/miri#812

Closed

gnzlbg mentioned this issue Jul 3, 2019

Clarify the alignment guarantees for memory returned by malloc #1537

Closed

evanj mentioned this issue Dec 29, 2021

NewCompressReader: Separate buffers to LZ4_compress_fast_continue DataDog/golz4#27

Merged

space88man mentioned this issue Jun 21, 2022

[Bug]: SIGSEGV as (sub)struct in WOLFSSL not aligned to 16 bytes wolfSSL/wolfssl#5264

Closed

interwq mentioned this issue Aug 5, 2022

crashes inside jemalloc version 5.2.1 with 8-bytes alignment (./configure --with-lg-quantum=3 ...) #2311

Open

MikePall mentioned this issue Sep 21, 2023

arm32 about sbufL(sb) LuaJIT/LuaJIT#1091

Closed
