MALLOC_MMAP_THRESHOLD_=0 makes systemd-cryptsetup fail #39
Reported also to glibc: bug 26663.
Why are you setting MALLOC_MMAP_THRESHOLD_=0? I would like to understand the specific use case so I can see if there isn't anything else we can do.
I assumed that with mmap(), the allocations would be spread randomly over the memory ("ASLR for malloc()"). Sadly this is not the case and the mmap()ed regions are pretty much consecutive, so there's not much advantage over sbrk(). There can still be gaps if something is munmap()ed in the middle of the area.
This thread has no context so I have no idea what the issue is really about, but for information there are 'critical sections' within libdm during which memory allocations must not happen, to avoid the possibility of the machine deadlocking (e.g. if some I/O is blocked waiting for the sequence of libdm operations to complete and then an mmap within that sequence gets blocked waiting for that blocked I/O). LVM2's use of libdm pre-allocates the memory it requires in advance (brk) and never calls mmap() within those sections - but it does call malloc(), which hands out free memory from what it previously reserved with brk.
So if you wanted to have malloc use mmap instead of brk in any user-space device mapper tools, you would need a way to instruct malloc to preallocate a certain amount of pinned free memory (already allocated, present in core and not swappable) with mmap, available for future allocation requests, and to keep hold of it if freed, not releasing it back to the system until instructed to do so by the application (corresponding to entering and leaving critical sections of code in the application).
@kergon The malloc API uses brk or mmap with MAP_ANONYMOUS | MAP_NORESERVE (note that morecore() is deprecated in 2.32, so you can't provide your own special pages without providing the entire allocator API). How would those operations block waiting on I/O? Would they block waiting for the swap device?
@topimiettinen If you could confirm why you are using MALLOC_MMAP_THRESHOLD_=0 that would be helpful.
On 24.9.2020 19.20, codonell wrote:
> @topimiettinen If you could confirm why you are using MALLOC_MMAP_THRESHOLD_=0 that would be helpful.

The reason for MALLOC_MMAP_THRESHOLD_=0 was that it changes malloc() to use mmap() for allocating new memory, instead of sbrk().

-Topi
This doesn't answer my question. Why are you doing this? You write "ASLR for malloc()," but is there a particular threat model you're trying to target, or a specific security vulnerability that happened in the past?
The risk I'm trying to mitigate with more randomized memory allocations is that when memory is allocated from the process heap area, this area is contiguous, so buffer overflows give an attacker a pretty large surface to play with. It may be possible to access pretty much anything allocated at the time of the attack. The attacker may even predetermine the offsets between the buffer (the base of the attack) and an area of interest (for example security credentials) if the memory allocation patterns of the target process happen to be relatively fixed. But if instead the memory for malloc() were dispersed throughout the address space, a buffer overflow would be contained to a very small area, perhaps only a 4k page. Since nothing would be mapped in the near vicinity, trying to access any memory address above or below the area would be invalid, probably leading to a quick segfault in a buffer overflow attack.
@topimiettinen Thanks for explaining your position and intent. I reached out quickly to a security researcher I know (Eyal Itkin) and we agree that what you're really looking for here is not ASLR, but rather control over the heap layout. With ASLR you only ever randomize the arena heap's base, and the minute you start batching mmap's for performance (like the arena does) you will invariably end up with a known base address and a deterministic layout (heap shape). What you want is random allocations within the heap such that the shape of the heap is random (you don't know what follows next). You should look at Google's Scudo (https://llvm.org/docs/ScudoHardenedAllocator.html) project to see if that allocator meets your specific needs. It is not likely the generic system allocator in glibc will ever do what you want by default; the performance impact of the randomization can be quite high in certain scenarios.
@topimiettinen To summarize: I suggest not using MALLOC_MMAP_THRESHOLD_=0 since it will cost you a lot of performance, has kernel interactions with the number of mmaps, and doesn't do what you want. Instead you should more clearly refine your threat model and look for an allocator that meets your requirements.
Thanks for the tip, but Scudo does not look very interesting. I'm looking at the description of OpenBSD's malloc(). It seems to use mmap() extensively. At least the directory pages are protected with guard pages, but the memory pages don't seem to be.
Are you trying to limit the effect of a possible linear overflow between heap allocations (using unmapped pages / guard pages)? Or are you trying to hinder attacks from one buffer to another, from attackers that know how the heap is shaped or have shaped the heap to a known state? In the latter case, Scudo should work, as this is one of the threats in their threat model, as Carlos mentioned earlier. In the first case, the randomness of the buffer addresses is totally irrelevant, and you just need the guards between the allocations. That in turn is either a waste of memory (at least 50% of the heap memory won't be used, serving only as guards) or requires mmap / the kernel to separate the allocations - and they are currently designed differently, so you would get adjacent allocations, defeating your cause.
As Carlos mentioned, please formalize your threat model, so we can help you design a proper solution or point you to an existing one. Technical terms like mmap() should depend on the problem you want to solve and be part of a possible solution if necessary. Don't try to guess a solution before formalizing your threat model. Such a fixation will most probably leave you with the wrong solution to your original problem.
I'm interested in guard pages rather than the shape of the heap. I think randomized addresses would also improve security; they would work similarly to shuffling of the heap, so the attacker can't guess offsets between items. The disadvantage of the contiguous heap is that the attacker may be able to probe the heap without segfaulting; this is not possible if the memory pages are dispersed with guard pages. Instead of trusting that the kernel will randomize the addresses of mmap(), a random address hint can be supplied as the first argument of the mmap() system call. If the address is available, the kernel will use it; otherwise it returns a sequential address as if zero had been supplied. I made this small program to test this. It maps a guard page with PROT_NONE, a memory page and then another guard page. It assumes that the kernel accepts a random address below 2^47.
There's no output, but strace shows that the pages are indeed at a random address:
While this is more clearly stated, I'm still missing a few details. If there is a guard page, a linear overflow will crash the program. Randomization against an attacker that is "guessing" the offsets implies an attacker with a write-what-where (absolute or relative) primitive. If your attacker already has such a powerful write primitive, they could corrupt the return address, a global function pointer or various other control-flow-critical components, and win. Simply stated, an attacker with such a powerful primitive most probably won't bother corrupting the heap's metadata in order to gain code execution. So the solution would only protect vtables / function pointers stored on the heap from being corrupted, while doing nothing for the stack / global variables, which remain accessible to the attacker.

If the primitive is a read and not a write primitive, the randomization would still work poorly. Assuming that most heap objects contain pointers to other objects, an attacker with a relative read primitive will easily traverse the pointers, leak by leak, and build a nice map of the entire memory. An absolute read primitive can be used the same way, requiring only a single initial pointer leak.

As to the suggested code snippet, please note that most heap allocations are usually not that important / error-prone. Allocating 3 pages per heap object is memory-wasteful in the extreme, and is recommended only for specific security-sensitive allocations. In addition, since 47 address bits are assumed and the hint is to a page-aligned address, one can request only 47-12=35 bits from the OS, thus saving precious entropy. Also make sure to check whether the calls to getrandom() or mmap() failed, and act accordingly.
That's true, but fixing such problems is beyond the scope of poor malloc(). I think only processor manufacturers could introduce features to prevent such attacks. For example, there should be a separate stack for return addresses. Function pointers could be opaque IDs loaded from code descriptors. Execute access should not imply read access. Data and program address spaces should be separate. There should be a fast and unprivileged method to manipulate page tables or switch between address spaces, so various parts of a program or its libraries would be able to protect their data when they are not in control. Speculative execution should not cross privilege levels, and cache lines should be tagged with address space identifiers. Etc.
Why would most heap objects contain pointers? I'd guess strings would be more common.
This was just a quick code snippet; I didn't even bother to put in printf statements to print the addresses, never mind checking for errors or optimizing the random bits. Two out of three pages are mapped with PROT_NONE - why would they also consume memory?
OpenBSD's malloc() has another interesting feature: the directory structures are offset from the start of the page by a random number of bytes.
With the patch below to glibc, memory mappings where the address is not important are mapped at more randomized locations, and guard pages are also installed. This also applies to mappings made by ld.so. So far I haven't noticed any problems.
Example strace from /bin/sync:
Actually the address for the guard page above the mapping is wrong; it should be aligned to the next higher page boundary. As it is, the kernel assigns it an address, which should also be prevented with MAP_FIXED, since here we very much care what the address is. A different way would be to map one continuous guard area two pages larger than the actual mapped area and then place the mapping in the middle; this also saves one system call.
I doubt that this patch will work without breaking things, seeing that it totally ignores the flags passed to mmap(), such as the flags for large/huge pages for instance. Changing the default behavior of mmap() under the hood, and adding guard pages (and a dependency on an infinite supply of random values) to every allocation, feels wasteful. Without a proper threat analysis, I fail to see the need for such a drastic measure, especially when the code for it will obviously fail in some edge cases, as stated above. As I am not a maintainer of this project, I leave the decision to the maintainers themselves while choosing to leave this thread altogether.
It would surely be trivial to align the address further if MAP_HUGE_* is used; thanks for pointing this out. I'm happy to spend resources if it improves security, though I don't know if PROT_NONE actually consumes much memory. Certainly some kernel VM structures may grow, but the growth shouldn't equal one page for each PROT_NONE page if the kernel is any good. It's also certainly possible to drop the GRND_RANDOM flag, though IMHO the distinction between true randomness and a CSPRNG isn't very interesting: attackers who know the secret RDRAND algorithms and other machine internals so well that they can recreate the pseudo-randomness probably have lots of other options to do whatever they please. Why are you so negative, calling this "breaking", "wasteful", "drastic", "failing" etc. (and I agree there may be further bugs in code which didn't exist yesterday, and this may have marginal effects on resources too) - can't you see any possible benefits from doing proper ASLR? Are you not concerned at all that the libraries and anonymous mappings are located pretty much contiguously by default, so if one address is known to an attacker, it may be possible to infer other addresses? Windows and OpenBSD seem to do this by default, for example - why should Linux be worse?
Another bug: nothing removes the guard page mappings during munmap() of the original pages, so there are plenty of useless mappings after a while. So I agree that the guard pages shouldn't be installed automatically when something calls mmap(). Perhaps they could be, if there were also some clever tracking mechanism which would remove them when the caller-requested pages get munmap()ed (or mremap()ed...). Doing that properly would be much more complex than this trivial proof of concept. For malloc(), protecting the arenas with guard pages would still make sense, since those mappings are managed only by the memory allocator.
This version lets the kernel handle MAP_32BIT and HUGETLB mappings. I dropped the guard pages and GRND_RANDOM.
Strace:
Perhaps this should be fixed in the kernel instead, so I prepared a patch.
@eyalitki Here's my attempt to formalize the threat scenario: A malware group knows of a previously unknown vulnerability V in a small library L1. V is limited in scope to running only the small set of ROP gadgets which can be found in L1 itself (RL1). The group wishes to utilize V to employ a specific exploit E against the Linux kernel in a target system S. The group has previously been able to determine the OS vendor of S, including the versions of L1 and L2.

RL1 doesn't contain the ROP gadgets (RE) needed to launch E, but RE can be found in another (larger) library L2. RL1 does, however, contain a ROP gadget for a jump relative to RIP (RJ), so RE in L2 can be called from L1 iff the exact relative VM offset (O) between RJ in L1 and RE in L2 is known.

With the current, unmodified libc and kernel, the group is able to determine O by running the same versions of the software on another, non-target system S' and examining the locations of L1 and L2. This is possible since the kernel only randomizes the first mapping and then reuses the same VMA for the following mappings with predictable allocation patterns. Thus the group can continue with the exploit attempt. When libc (or the Linux kernel) is modified to fully randomize the locations of mappings, and thus the locations of L1 and L2, this method is no longer possible since O is also random. This seems to be the situation on Windows and OpenBSD.
I think there is some basic misunderstanding about some key terminology here, but I'll try my best. Your claim is that you wish to improve ASLR, effectively breaking the existing (Linux) correlation between two mapped libraries (.so files). Your threat actor has the capability to perform a full ROP attack against a vulnerable target, plus a known address and version of a small library L1, plus the version of a larger library L2. The attacker will build a ROP stack with gadgets from L2, based on its "known" address relative to L1, and fully take over the target process. Notice that a threat analysis defines which attacker (local or remote; it doesn't matter in this case), with what capabilities (stated above), will perform what kind of attack in an attempt to gain which assets (full code execution over the target process). For the sake of the argument, we are talking about a VERY powerful attacker in this case.

Judging by this threat scenario, we can see that the guard pages play no effective role, so I will ignore them. In addition, except for dynamic loading (dlopen), this code is only needed at load time, and shouldn't necessarily affect ALL mmap() invocations in a given program. It should also be noted that the allocations are not exactly adjacent at boot time, and an inspection of a sample process between boots will show that estimating the addresses of all libraries based on a single leaked library isn't easy, if even possible, when there are more than 10 loaded .so libraries. I tried that on guacd (Apache Guacamole), and eventually leaked multiple library addresses as the gaps appeared random (in bulk) on my Ubuntu 18 machine.

I suggest a closer inspection of Linux's program loader and mmap() logic. If there is some determinism that could be randomized, I would consult the Linux kernel developers about the nature of the fix and the proper place for it. It could be only the loader, and might also be mmap() itself.
I also advise a closer examination of the code involved (it is open source), together with more test cases, so that the true behavior is understood before a solution is designed for a problem that isn't fully understood yet.
Thanks for the review. In some cases it's trivial to find the software versions; for example Apache and sshd may tell their version (which may include OS vendor info) to remote attackers. I agree that guard pages play no role here.

Regarding the adjacent libraries: I had different results showing strong determinism in the locations, at least in simple cases.

The loader just uses mmap(NULL, ...) to map ELF segments. In principle it could also implement randomization by specifying a random address for each mmap() operation instead of zero. Another improvement for the loader could be to shuffle the order in which the libraries are loaded, though maybe the order can't be changed; and if the resulting mappings are random anyway (due to randomization of mmap()), a shuffled order shouldn't give any further benefit. I'd actually prefer a fix in the kernel, something like the sysctl I proposed.

I think the only improvement I'd want for malloc() is that it should be possible to forbid using the heap. It is always located next to the program mappings, so the offset to those is predictable.
It looks like heap use can be disabled with, for example, a seccomp filter which returns EPERM for brk(). Example again with /bin/sync:
Comparing to the previous strace, glibc substitutes the heap with mmap() and thus its address is random (since I'm using a patched kernel with …).

Perhaps the preferred solution is again to modify the kernel, for example so that brk() always returns ENOSYS when compiled without CONFIG_BRK_SYSCALL.
Sent a patch for disabling brk() entirely to linux-mm. With the patched kernel, the strace from /bin/sync is the same as with the seccomp filter (modulo addresses), but instead of EPERM the errno is ENOSYS. I don't see any problems here.
OK, found the first problem: I think libc should use mmap() also for TLS; the heap doesn't seem a great choice here either. The fix could be to introduce a version of mmap() for glibc-internal use which does not access errno but passes the error back some other way. Then the crash would be avoided.
With these changes, __libc_setup_tls() and also malloc() use mmap() instead of sbrk(). There are probably better ways to change malloc(), but the comments in the file seemed to suggest something like this.
Submitted a set of patches to the glibc list for comments.
There's nothing wrong with LVM in this issue, so I'll close this.
I added MALLOC_MMAP_THRESHOLD_=0 to the systemd-cryptsetup service files to instruct glibc malloc() to use mmap() instead of the heap. But then systemd-cryptsetup refuses to start.

The error message "Couldn't create ioctl argument." comes from device_mapper/ioctl/libdm-iface.c#L1818. The problem seems to be that mmap() (called from malloc(), called from dm_zalloc(), called from device_mapper/ioctl/libdm-iface.c#L1190) may fail with EAGAIN. I'm puzzled why mmap() for 4096 bytes or even 1M would fail, so perhaps this is a bug in glibc or the kernel, but I'd like first to check whether this could be a bug in lvm2.