What are malloc's alignment guarantees? #1533

Open
RalfJung opened this issue Jul 2, 2019 · 16 comments

@RalfJung

RalfJung commented Jul 2, 2019

What exactly are the guarantees that jemalloc's malloc provides in terms of alignment?

The docs say

> The allocated space is suitably aligned (after possible pointer coercion) for storage of any type of object.

However, in rust-lang/rust#45955 we noticed that this is not correct: at least with GCC/clang extensions, one can define a type of size 8 that has alignment 16. However, jemalloc has been observed handing out allocations of size 8 that are just 8-aligned.

System allocator functions usually seem to guarantee that everything is at least 16-byte aligned on an x86-64 system -- at least that's what comments in the Rust source say, but I do not know where that information is coming from. However, jemalloc violates that expectation. It would be useful to know what exactly is guaranteed in terms of alignment for small allocations (including small non-power-of-2 allocations).

EDIT: This value of 16 seems to originate from https://www.gnu.org/software/libc/manual/html_node/Aligned-Memory-Blocks.html#Aligned-Memory-Blocks (Ctrl-F "sixteen"). Of course that is not normative for jemalloc, but it will catch applications by surprise when jemalloc is used as a drop-in replacement.
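For readers who want to reproduce the observation, here is a minimal sketch of such a type and an alignment check. It assumes a GCC- or clang-compatible compiler; the names `over_aligned_u64`, `is_aligned`, and `malloc_satisfies_over_aligned` are made up for illustration and do not come from jemalloc or Rust.

```c
#include <stdint.h>
#include <stdlib.h>

/* GCC/clang extension: raising the alignment on a typedef leaves sizeof
   unchanged, so this type has size 8 but alignment 16. */
typedef uint64_t over_aligned_u64 __attribute__((aligned(16)));

/* Returns 1 if ptr is a multiple of align (align must be a power of two). */
static int is_aligned(const void *ptr, size_t align) {
    return ((uintptr_t)ptr & (align - 1)) == 0;
}

/* Allocates sizeof(over_aligned_u64) bytes with plain malloc and reports
   whether the result happens to satisfy the type's alignment.
   Returns 1 (aligned), 0 (not aligned), or -1 if malloc failed. */
static int malloc_satisfies_over_aligned(void) {
    void *p = malloc(sizeof(over_aligned_u64));
    if (p == NULL) {
        return -1;
    }
    /* __alignof__ is the GNU spelling; it reports 16 for this typedef. */
    int ok = is_aligned(p, __alignof__(over_aligned_u64));
    free(p);
    return ok;
}
```

Whether `malloc_satisfies_over_aligned()` returns 1 or 0 depends on the allocator that is linked in, which is exactly the question raised here.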

@davidtgoldblatt
Member

The (non-alignment-specifying) allocation functions return memory aligned for any type of object that can live in the returned space.

Practically, this means an alignment of 8 for 8-byte allocations, and 16 for 16-or-more-byte allocations, on 64-bit systems. (This can be tweaked as a config option, as well).

@RalfJung
Author

RalfJung commented Jul 3, 2019

After some more chatting with @gnzlbg, I think I understand better now why I feel that there is a gap in the docs here.

You are specifically talking about types and objects in C here. In Rust, we support allocations (but not types) where align > size. Those do not exist in C, and hence the docs and also what you just said say nothing about how jemalloc behaves on them.

> Practically, this means an alignment of 8 for 8-byte allocations, and 16 for 16-or-more-byte allocations, on 64-bit systems. (This can be tweaked as a config option, as well).

This is not accurate when the size is small -- e.g., I have seen malloc(8) return non-16-aligned allocations on a 64-bit system.

EDIT: Sorry I misread. I assume you mean in general "alignment N for N-byte allocations" when N is a power of two. What about non-powers-of-two? I expect it's something like "N rounded down to the next power of 2"?
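The guess in the EDIT above can be written down as a small function. This is only a sketch of the hypothesized rule ("size rounded down to a power of two, capped at the quantum"), not jemalloc's actual size-class logic; `floor_pow2` and `guessed_min_alignment` are hypothetical names.

```c
#include <stddef.h>

/* Largest power of two that is <= n; returns 0 for n == 0. */
static size_t floor_pow2(size_t n) {
    size_t p = 1;
    if (n == 0) {
        return 0;
    }
    /* Comparing against n / 2 avoids overflow when doubling p. */
    while (p <= n / 2) {
        p *= 2;
    }
    return p;
}

/* Hypothesized lower bound on the alignment of malloc(size):
   size rounded down to a power of two, but never more than the quantum
   (16 on the 64-bit systems discussed in this thread). */
static size_t guessed_min_alignment(size_t size, size_t quantum) {
    size_t a = floor_pow2(size);
    return a < quantum ? a : quantum;
}
```

Under this guess, `guessed_min_alignment(8, 16)` is 8 and `guessed_min_alignment(24, 16)` is 16, which matches the "8 for 8-byte, 16 for 16-or-more-byte" description quoted above. An allocator may well return something more aligned than this bound.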

@gnzlbg
Contributor

gnzlbg commented Jul 3, 2019

> You are specifically talking about types and objects in C here. In Rust, we support allocations (but not types) where align > size. Those do not exist in C, and hence the docs and also what you just said say nothing about how jemalloc behaves on them.

It was "obvious" to me that, since malloc is a C API, it only adheres to C rules. But it should also have been obvious to me that, because people call malloc from all sorts of programming languages with different models, this isn't necessarily obvious to everybody.

Interpreting the guarantees from the C standard isn't trivial, and extracting the precise guarantees from platform ABI documents is hard.

I will send a PR documenting this, and showing an example for, e.g., SysV64.

This will need a bit of iteration to make sure that we only guarantee what we are allowed to guarantee, so that we do not limit platform compatibility, configurability, optimization opportunities, etc.


About align > size, first, not all APIs allow the user to pass an alignment - these will follow the C rules + the ABI implementation-defined behavior.

The standard APIs that do support an alignment argument, posix_memalign and aligned_alloc, are already documented (see https://jemalloc.net/jemalloc.3.html), and they call out the semantics precisely. It would be helpful if you could review those and let us know if there is anything we can improve there.

That leaves the non-standard jemalloc-specific APIs, which support an alignment request via the flags. For the non-standard API, I can't find the "size must be a multiple of the alignment" requirement anywhere; we only document how the flags are computed from an alignment request:

> Align the memory allocation to start at an address that is a multiple of a, where a is a power of two. This macro does not validate that a is a power of 2.

For example, when one wants to perform a 2-byte allocation with an alignment of 4, such that align > size, MALLOCX_ALIGN(4) only produces the flag value that encodes the alignment request. The docs don't say whether mallocx(2, MALLOCX_ALIGN(4)) is OK or not. Looking at the tests, it is also not clear to me whether this is actually tested.

I think that this behavior should be called out explicitly.
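Until the mallocx case is documented, an align > size request can at least be expressed with the standard API. The sketch below uses posix_memalign, which does not require the size to be a multiple of the alignment; `alloc_aligned` is a hypothetical helper name, and nothing here says how mallocx itself behaves.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L /* expose posix_memalign under strict -std modes */
#endif
#include <stdint.h>
#include <stdlib.h>

/* Requests `size` bytes whose address is a multiple of `align`.
   POSIX requires `align` to be a power of two and a multiple of
   sizeof(void *); `size` may be smaller than `align`.
   Returns NULL if the request fails. */
static void *alloc_aligned(size_t align, size_t size) {
    void *p = NULL;
    if (posix_memalign(&p, align, size) != 0) {
        return NULL;
    }
    return p;
}
```

For example, `alloc_aligned(16, 8)` asks for an 8-byte block on a 16-byte boundary; the result is released with free() as usual.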

@RalfJung
Author

RalfJung commented Jul 3, 2019

> I will send a PR documenting this, and showing an example for, e.g., SysV64.
>
> This will need a bit of iteration to make sure that we only guarantee what we are allowed to guarantee, so that we do not limit platform compatibility, configurability, optimization opportunities, etc.

Thanks!

> The standard APIs that do support an alignment argument, posix_memalign and aligned_alloc, are already documented (see https://jemalloc.net/jemalloc.3.html), and they call out the semantics precisely. It would be helpful if you could review those and let us know if there is anything we can improve there.

Both seem fairly clear in the documentation. However, I can't tell if there are any other extra assumptions they might be making that are not mentioned. What about size == 0?

There is a curious difference between "must be" and "behavior is undefined"; my assumption would be that a violation of a "must" clause also causes UB?

@gnzlbg
Contributor

gnzlbg commented Jul 3, 2019

> Both seem fairly clear in the documentation. However, I can't tell if there are any other extra assumptions they might be making that are not mentioned. What about size == 0?

See #1277 .

@jasone
Member

jasone commented Jul 4, 2019

See the --with-lg-quantum documentation in INSTALL.md.

> --with-lg-quantum=<lg-quantum>
>
> Specify the base 2 log of the minimum allocation alignment. jemalloc needs to know the minimum
> alignment that meets the following C standard requirement (quoted from the April 12, 2011 draft of
> the C11 standard):
>
> > The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned
> > to a pointer to any type of object with a fundamental alignment requirement and then used
> > to access such an object or an array of such objects in the space allocated [...]
>
> This setting is architecture-specific, and although jemalloc includes known safe values for the
> most commonly used modern architectures, there is a wrinkle related to GNU libc (glibc) that may
> impact your choice of <lg-quantum>. On most modern architectures, this mandates 16-byte
> alignment (=4), but the glibc developers chose not to meet this requirement for performance
> reasons. An old discussion can be found at https://sourceware.org/bugzilla/show_bug.cgi?id=206 .
> Unlike glibc, jemalloc does follow the C standard by default (caveat: jemalloc technically cheats
> for size classes smaller than the quantum), but the fact that Linux systems already work around
> this allocator noncompliance means that it is generally safe in practice to let jemalloc's minimum
> alignment follow glibc's lead. If you specify --with-lg-quantum=3 during configuration, jemalloc
> will provide additional size classes that are not 16-byte-aligned (24, 40, and 56).
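To see where 24, 40, and 56 come from, here is a rough sketch that rounds a request up to a multiple of the quantum. It is an assumption-laden model that only covers the small, quantum-spaced classes mentioned in the quoted text (roughly 16 to 64 bytes); it is not jemalloc's real size-class computation, and `round_up_to_quantum` is a made-up name.

```c
#include <stddef.h>

/* Rounds `size` up to a multiple of the quantum, where the quantum is
   1 << lg_quantum (16 for lg_quantum == 4, 8 for lg_quantum == 3).
   Only meant to illustrate the small size classes discussed above. */
static size_t round_up_to_quantum(size_t size, unsigned lg_quantum) {
    size_t quantum = (size_t)1 << lg_quantum;
    return (size + quantum - 1) & ~(quantum - 1);
}
```

In this model a 24-byte request stays at 24 with lg_quantum 3 but is rounded up to 32 with lg_quantum 4; 40 and 56 behave the same way (rounded to 48 and 64), which is why those three classes are the ones that are not 16-byte-aligned.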

@gnzlbg
Contributor

gnzlbg commented Jul 4, 2019

> but the glibc developers chose not to meet this requirement for performance
> reasons. An old discussion can be found at https://sourceware.org/bugzilla/show_bug.cgi?id=206 .

Note that this bug has been fixed recently upstream because it made glibc's malloc incompatible with gcc, e.g., see also https://sourceware.org/bugzilla/show_bug.cgi?id=21120 .

@RalfJung
Author

RalfJung commented Jul 4, 2019

> See #1277 .

Yeah, basically that but for posix_memalign and aligned_alloc.

@gnzlbg
Contributor

gnzlbg commented Jul 4, 2019

The behavior is the same for all C standard APIs, so the behavior of aligned_alloc is covered in #1277 as well. The behavior of posix_memalign is clear: if the allocation fails, an error is returned; otherwise the allocation succeeds.

EDIT: Actually POSIX:2018 improves on this, guaranteeing the same behavior as C:

> If the size of the space requested is 0, the behavior is implementation-defined: either a null pointer shall be returned in memptr, or the behavior shall be as if the size were some non-zero value, except that the behavior is undefined if the value returned in memptr is used to access an object.

So AFAICT this means that a zero-size allocation can never return an error, since there is always sufficient memory to store zero bytes.
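The two outcomes that POSIX allows for a zero-size request can be told apart at runtime. The following is a small probe, not a statement about what any particular allocator does; the enum and `probe_zero_size` are hypothetical names.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L /* expose posix_memalign under strict -std modes */
#endif
#include <stdlib.h>

/* Possible outcomes of a zero-size posix_memalign call. */
enum zero_size_result {
    ZS_ERROR = 0,  /* a non-zero error code was returned */
    ZS_NULL = 1,   /* success, memptr was set to a null pointer */
    ZS_NONNULL = 2 /* success, memptr is non-null but must not be dereferenced */
};

/* Calls posix_memalign with size 0 and classifies the result.
   `align` must be a power of two and a multiple of sizeof(void *). */
static enum zero_size_result probe_zero_size(size_t align) {
    void *p = NULL;
    if (posix_memalign(&p, align, 0) != 0) {
        return ZS_ERROR;
    }
    if (p == NULL) {
        return ZS_NULL;
    }
    free(p); /* a non-null result still has to be released */
    return ZS_NONNULL;
}
```

Portable callers should be prepared for either `ZS_NULL` or `ZS_NONNULL`, and should pass a non-null result to free() without ever reading or writing through it.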

@RalfJung
Author

RalfJung commented Jul 6, 2019

> The behavior of posix_memalign is clear, if the allocation fails, an error is returned, and otherwise the allocation succeeds.

You consider it clear. ;)

> The behavior is the same for all C standard APIs, so the behavior of aligned_alloc is covered in #1277 as well.

I did not know that aligned_alloc is a C standard API. Other readers of the docs might not know that either.

@oxalica

oxalica commented Nov 20, 2021

> Practically, this means an alignment of 8 for 8-byte allocations, and 16 for 16-or-more-byte allocations, on 64-bit systems. (This can be tweaked as a config option, as well).

This violates an assumption about malloc that clang gained in LLVM 13, added in https://reviews.llvm.org/D100879.
Clang assumes that the pointer returned by malloc is 16-byte-aligned for any size, even for allocations smaller than 8 bytes.

This actually caused a crash in Firefox due to an unaligned access when using LLVM 13 and jemalloc together; see
https://bugzilla.mozilla.org/show_bug.cgi?id=1741454

@davidtgoldblatt
Member

Thanks! Replying upstream.

@davidtgoldblatt
Member

davidtgoldblatt commented Dec 1, 2021

LLVM upstream's take on this is that the glibc malloc behavior of 16-byte alignment is a platform guarantee for anything targeting -linux-gnu targets, and so must be obeyed by malloc replacements on those targets.

From what I know of the runtime upstream I don't actually think this is a correct interpretation (I read it as information about the default alignment rather than a contract; other parts of the stdlib distribution avoid making similar assumptions, and from what I know of the glibc malloc maintainers they're fairly live-and-let-live in terms of the constraints they try to impose on malloc replacements), but for a combination of me being on parental leave / my job can easily work around the perf regression via compiler flags / upstream doesn't agree about it, I'm not super motivated to argue about it further.

For us going forward, I think just disabling sub-quantum size classes might be easiest. So long as the config setting stays around it's still semi-opt-in, just with some more coordination required.

@zuiderkwast

Hello. What about 32-bit systems? Is it possible to use lg-quantum=2, and what are the size classes with lg-quantum 2, 3 and 4?

I'm guessing here so please correct me if I'm wrong:

| lg-quantum | size classes |
|---|---|
| 2 | 4, 8, 12, 16, 20, 24, 28?, 32, 36?, 40, 44?, 48, 52?, 56, 60?, 64, ... |
| 3 | 4, 8, 16, 24, 32, 40, 48, 56, 64, ... |
| 4 | 4, 6, 16, 32, 48, 64, ... |

@davidtgoldblatt
Member

Practically, SC_LG_TINY_MIN is a floor on lg-quantum and is unconfigurably set to 3. I think it might be possible to set lg-quantum below this, but I wouldn't expect things to work correctly -- the size class computation logic has this as a built-in assumption.

@gitamohr

Is it possible to configure jemalloc (5.3.0) so that it always returns 16-byte aligned addresses, even for requests of size <= 8? I'm running into an issue with a 3rd-party library on Linux that assumes all heap pointers are 16-byte aligned.

I'm using lg-quantum=4, but jemalloc will return 8-byte aligned addresses for sizes <= 8.
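The thread does not answer whether a configuration option exists for this. As a stopgap for code that you control, one possible approach is a wrapper that always requests 16-byte alignment explicitly. This is a sketch only: `malloc_min16` and `MIN_ALIGN` are hypothetical names, and it does not change what plain malloc returns to a third-party library unless that library's allocation calls are routed through it.

```c
#include <stdint.h>
#include <stdlib.h>

#define MIN_ALIGN 16

/* Returns at least `size` bytes on a 16-byte boundary, or NULL on failure.
   The size is rounded up to a multiple of MIN_ALIGN because C11
   aligned_alloc expects size to be a multiple of the alignment. */
static void *malloc_min16(size_t size) {
    if (size == 0) {
        size = 1; /* avoid the implementation-defined zero-size case */
    }
    if (size > SIZE_MAX - (MIN_ALIGN - 1)) {
        return NULL; /* rounding up would overflow */
    }
    size_t rounded = (size + MIN_ALIGN - 1) & ~(size_t)(MIN_ALIGN - 1);
    return aligned_alloc(MIN_ALIGN, rounded);
}
```

The cost is that every request of 8 bytes or fewer occupies at least 16 bytes, which is the space overhead the smaller size classes are meant to avoid.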
