MALLOC_MMAP_THRESHOLD_=0 makes systemd-cryptsetup fail #39
Reported also to glibc: bug 26663.
Why are you setting MALLOC_MMAP_THRESHOLD_=0? I would like to understand the specific use case so I can see if there isn't anything else we can do.
I assumed that with mmap(), the allocations would be spread randomly over the memory ("ASLR for malloc()"). Sadly this is not the case and the mmap()ed regions are pretty much consecutive, so there's not much advantage over sbrk(). There can still be gaps if something is munmap()ed in the middle of the area.
This thread has no context so I have no idea what the issue is really about, but for information there are 'critical sections' within libdm during which memory allocations must not happen, to avoid the possibility of the machine deadlocking (e.g. if some I/O is blocked waiting for the sequence of libdm operations to complete and then an mmap within that sequence gets blocked waiting for that blocked I/O). LVM2's use of libdm pre-allocates the memory it requires in advance (brk) and never calls mmap() within those sections - but it does call malloc(), which hands out free memory from what it previously reserved with brk.
So if you wanted to have malloc use mmap instead of brk in any user-space device mapper tools, you would need a way to instruct malloc to preallocate a certain amount of pinned free memory (already allocated, present in core and not swappable) with mmap, available for future allocation requests, and to keep hold of it if freed, not releasing it back to the system until instructed to do so by the application (corresponding to entering and leaving critical sections of code in the application).
@kergon The malloc API uses brk or mmap with MAP_ANONYMOUS | MAP_NORESERVE (note that morecore() is deprecated in 2.32, so you can't provide your own special pages without providing the entire allocator API). How would those operations block waiting on I/O? Would they block waiting for the swap device?
@topimiettinen If you could confirm why you are using MALLOC_MMAP_THRESHOLD_=0 that would be helpful.
On 24.9.2020 19.20, codonell wrote:
> @topimiettinen If you could confirm why you are using MALLOC_MMAP_THRESHOLD_=0 that would be helpful.

The reason for MALLOC_MMAP_THRESHOLD_=0 was that it changes malloc() to use mmap() for allocating new memory, instead of sbrk().

-Topi
This doesn't answer my question. Why are you doing this? You write "ASLR for malloc()," but is there a particular threat model you're trying to target, or a specific security vulnerability that happened in the past?
The risk I'm trying to mitigate with more randomized memory allocations is that when memory is allocated from the process heap area, this area is contiguous, so buffer overflows give an attacker a pretty large surface to play with. It may be possible to access pretty much anything allocated at the time of the attack. The attacker may even predetermine the offsets between the buffer (the base of the attack) and an area of interest (for example security credentials) if the memory allocation patterns of the target process happen to be relatively fixed. But if instead the memory for malloc() were dispersed throughout the address space, a buffer overflow would be contained to a very small area, perhaps only a 4k page. Since nothing would be mapped in the near vicinity, trying to access any memory address above or below the area would be invalid, probably leading to a quick segfault in a buffer overflow attack.
@topimiettinen Thanks for explaining your position and intent. I reached out quickly to a security researcher I know (Eyal Itkin) and we agree that what you're really looking for here is not ASLR, but rather control over the heap layout. With ASLR you only ever randomize the arena heap's base, and the minute you start batching mmap's for performance (like the arena does) you will invariably end up with a known base address and a deterministic layout (heap shape). What you want is random allocations within the heap such that the shape of the heap is random (you don't know what follows next). You should look at Google's Scudo (https://llvm.org/docs/ScudoHardenedAllocator.html) project to see if that allocator meets your specific needs. It is not likely the generic system allocator in glibc will ever do what you want by default; the performance impact of the randomization can be quite high in certain scenarios.
@topimiettinen To summarize: I suggest not using MALLOC_MMAP_THRESHOLD_=0 since it will cost you a lot of performance, has kernel interactions with the number of mmaps, and doesn't do what you want. Instead you should more clearly refine your threat model and look for an allocator that meets your requirements.
Thanks for the tip, but Scudo does not look very interesting. I'm looking at the description of OpenBSD's malloc(). It seems to use mmap() extensively. At least the directory pages are protected with guard pages, but the memory pages don't seem to be.
Are you trying to limit the effect of a possible linear overflow between heap allocations (using unmapped pages / guard pages)? Or are you trying to hinder attacks from one buffer to another, from attackers that know how the heap is shaped or have shaped the heap to a known state? In the latter case, Scudo should work, as this is one of the threats in their threat model, as Carlos mentioned earlier. In the first case, the randomness of the buffer addresses is totally irrelevant, and you just need the guards between the allocations. That in turn is either a waste of memory (at least 50% of the heap memory won't be used, serving only as guards) or requires mmap / the kernel to separate the allocations - and they are currently designed differently, so you would get adjacent allocations, defeating your cause.
As Carlos mentioned, please formalize your threat model, so we can help you design a proper solution or point you to an existing one. Technical terms like mmap() should depend on the problem you want to solve and be part of a possible solution if necessary. Don't try to guess a solution before formalizing your threat model. Such a fixation will most probably leave you with the wrong solution to your original problem.
I'm interested in guard pages rather than the shape of the heap. I think randomized addresses would also improve security; they would work similarly to shuffling of the heap, so the attacker can't guess offsets between items. The disadvantage of the contiguous heap is that the attacker may be able to probe the heap without segfaulting; this is not possible if the memory pages are dispersed with guard pages. Instead of trusting that the kernel will randomize the addresses of mmap(), a random address hint can be supplied as the first argument of the mmap() system call. If the address is available, the kernel will use it; otherwise it returns a sequential address as if zero had been supplied. I made this small program to test this. It maps a guard page with PROT_NONE, a memory page and then another guard page. It assumes that the kernel accepts a random address below 2^47.
There's no output, but strace shows that the pages are indeed at a random address:
While this is more clearly stated, I'm still missing a few details. If there is a guard page, a linear overflow will crash the program. Randomization against an attacker that is "guessing" the offsets implies an attacker with a write-what-where (absolute or relative) primitive. If your attacker already has such a powerful write primitive, they could corrupt the return address, a global function pointer or various other control-flow-critical components, and win. Simply stated, an attacker with such a powerful primitive most probably won't bother corrupting the heap's metadata in order to gain code execution. So the solution would only protect vtables / function pointers stored on the heap from being corrupted, while doing nothing for the stack / global variables, which remain accessible to the attacker.

If the primitive is a read and not a write primitive, the randomization would still work poorly. Assuming that most heap objects contain pointers to other objects, an attacker with a relative read primitive will easily traverse the pointers, leak by leak, and build a nice map of the entire memory. An absolute read primitive can be used the same way, requiring only a single initial pointer leak.

As to the suggested code snippet, please note that most heap allocations are usually not that important / error-prone. Allocating 3 pages per heap object is memory-wasteful in the extreme, and is recommended only for specific security-sensitive allocations. In addition, since 47 address bits are assumed and the hint is to a page-aligned address, one can request only 47-12=35 bits from the OS, thus saving precious entropy. Also make sure to check whether the calls to getrandom() or mmap() failed, and act accordingly.
That's true, but fixing such problems is beyond the scope of poor malloc(). I think only processor manufacturers could introduce features to prevent such attacks. For example, there should be a separate stack for return addresses. Function pointers could be opaque IDs loaded from code descriptors. Execute access should not imply read access. Data and program address spaces should be separate. There should be a fast and unprivileged method to manipulate page tables or switch between address spaces, so various parts of a program or its libraries would be able to protect their data when they are not in control. Speculative execution should not cross privilege levels, and cache lines should be tagged with address space identifiers. Etc.
Why would most heap objects contain pointers? I'd guess strings would be more common.
This was just a quick code snippet; I didn't even bother to put in printf statements to print the addresses, never mind checking for errors or optimizing the random bits. Two out of three pages are mapped with PROT_NONE - why would they also consume memory?
OpenBSD's malloc() has another interesting feature: the directory structures are offset from the start of the page by a random number of bytes.
With the patch below to glibc, memory mappings where the address is not important are mapped at more randomized locations, and guard pages are also installed. This also applies to mappings made by ld.so. So far I haven't noticed any problems.
Example strace from /bin/sync:
Actually the address for the guard page above the mapping is wrong; it should be aligned to the next higher page boundary. As it is, the kernel assigns it an address, which should also be prevented with MAP_FIXED, since here we very much care what the address is. A different way would be to map one continuous guard area two pages larger than the actual mapped area and then place the mapping in the middle; this also saves one system call.
I doubt that this patch will work without breaking things, seeing that it totally ignores the flags passed to mmap(), such as the flags for large/huge pages for instance. Changing the default behavior of mmap() under the hood, and adding guard pages (and a dependency on an infinite supply of random values) to every allocation, feels wasteful. Without a proper threat analysis, I fail to see the need for such a drastic measure, especially when the code for it will obviously fail in some edge cases, as stated above. As I am not a maintainer of this project, I leave the decision to the maintainers themselves while choosing to leave this thread altogether.
It would surely be trivial to align the address further if MAP_HUGE_* is used; thanks for pointing this out. I'm happy to spend resources if it improves security, though I don't know if PROT_NONE actually consumes much memory. Certainly some kernel VM structures may grow, but the growth shouldn't equal one page for each PROT_NONE page if the kernel is any good. It's also certainly possible to drop the GRND_RANDOM flag, though IMHO the distinction between true randomness and a CSPRNG isn't very interesting: attackers who know the secret RDRAND algorithms and other machine internals so well that they can recreate the pseudo-randomness probably have lots of other options to do whatever they please. Why are you so negative, calling this "breaking", "wasteful", "drastic", "failing" etc. (and I agree there may be further bugs in code which didn't exist yesterday, and this may have marginal effects on resources too) - can't you see any possible benefits from doing proper ASLR? Are you not concerned at all that the libraries and anonymous mappings are located pretty much contiguously by default, so if one address is known to an attacker, it may be possible to infer other addresses? Windows and OpenBSD seem to do this by default, for example - why should Linux be worse?
Another bug: nothing removes the guard page mappings during munmap() of the original pages, so there are plenty of useless mappings after a while. So I agree that the guard pages shouldn't be installed automatically when something calls mmap(). Perhaps they could be, if there were also some clever tracking mechanism which would remove them when the caller-requested pages get munmap()ed (or mremap()ed...). Doing that properly would be much more complex than this trivial proof of concept. For malloc(), protecting the arenas with guard pages would still make sense, since those mappings are managed only by the memory allocator.
This version lets the kernel handle MAP_32BIT and HUGETLB mappings. I dropped the guard pages and GRND_RANDOM.
Strace:
Perhaps this should be fixed in the kernel instead, so I prepared a patch.
@eyalitki Here's my attempt to formalize the threat scenario: A malware group knows of a previously unknown vulnerability V in a small library L1. V is limited in scope to running only the small set of ROP gadgets which can be found in L1 itself (RL1). The group wishes to utilize V to employ a specific exploit E against the Linux kernel in a target system S. The group has previously been able to determine the OS vendor of S, including the versions of L1 and L2.

RL1 doesn't contain the ROP gadgets (RE) needed to launch E, but RE can be found in another (larger) library L2. RL1 does, however, contain a ROP gadget for a jump relative to RIP (RJ), so RE in L2 can be called from L1 iff the exact relative VM offset (O) between RJ in L1 and RE in L2 is known.

With the current, unmodified libc and kernel, the group is able to determine O by running the same versions of the software on another, non-target system S' and examining the locations of L1 and L2. This is possible since the kernel only randomizes the first mapping and then reuses the same VMA for the following mappings with predictable allocation patterns. Thus the group can continue with the exploit attempt. When libc (or the Linux kernel) is modified to fully randomize the locations of mappings, and thus the locations of L1 and L2, this method is no longer possible since O is also random. This seems to be the situation on Windows and OpenBSD.
I think there is some basic misunderstanding about some key terminology here, but I'll try my best. Your claim is that you wish to improve ASLR, effectively breaking the existing (Linux) correlation between two mapped libraries (.so files). Your threat actor has the capability to perform a full ROP attack against a vulnerable target, plus a known address and version of a small library L1, plus the version of a larger library L2. The attacker will build a ROP stack with gadgets from L2, based on its "known" address relative to L1, and fully take over the target process. Notice that a threat analysis defines which attacker (local or remote; it doesn't matter in this case), with what capabilities (stated above), will perform what kind of attack in an attempt to gain which assets (full code execution over the target process). For the sake of the argument, we are talking about a VERY powerful attacker in this case.

Judging by this threat scenario, we can see that the guard pages play no effective role, so I will ignore them. In addition, except for dynamic loading (dlopen), this code is only needed at load time, and shouldn't necessarily affect ALL mmap() invocations in a given program. It should also be noted that the allocations are not exactly adjacent at boot time, and an inspection of a sample process between boots will show that estimating the addresses of all libraries based on a single leaked library isn't easy, if even possible, when there are more than 10 loaded .so libraries. I tried that on guacd (Apache Guacamole), and eventually leaked multiple library addresses as the gaps appeared random (in bulk) on my Ubuntu 18 machine.

I suggest a closer inspection of Linux's program loader and mmap() logic. If there is some determinism that could be randomized, I would consult the Linux kernel developers about the nature of the fix and the proper place for it. It could be only the loader, and might also be mmap() itself.
I also advise a closer examination of the code involved (it is open source), together with more test cases, so that the true behavior is understood before a solution is designed for a problem that isn't fully understood yet.
Thanks for the review. In some cases it's trivial to find the software versions; for example Apache and sshd may tell their version (which may include OS vendor info) to remote attackers. I agree that guard pages play no role here.

Regarding the adjacent libraries: I had different results showing strong determinism in the locations, at least in simple cases.

The loader just uses mmap(NULL, ...) to map ELF segments. In principle it could also implement randomization by specifying a random address for each mmap() operation instead of zero. Another improvement for the loader could be to shuffle the order in which the libraries are loaded, though maybe the order can't be changed; and if the resulting mappings are random anyway (due to randomization of mmap()), a shuffled order shouldn't give any further benefit. I'd actually prefer a fix in the kernel, something like the sysctl I proposed.

I think the only improvement I'd want for malloc() is that it should be possible to forbid using the heap. It is always located next to the program mappings, so the offset to those is predictable.
It looks like heap use can be disabled with, for example, a seccomp filter which returns EPERM for brk(). Example again with /bin/sync:
Comparing to the previous strace, glibc substitutes the heap with mmap() and thus its address is random (since I'm using a patched kernel with …).

Perhaps the preferred solution is again to modify the kernel, for example so that brk() always returns ENOSYS when compiled without CONFIG_BRK_SYSCALL.
Sent a patch for disabling brk() entirely to linux-mm. With the patched kernel, the strace from /bin/sync is the same as with the seccomp filter (modulo addresses), but instead of EPERM the errno is ENOSYS. I don't see any problems here.
OK, found the first problem: I think libc should use mmap() also for TLS; the heap doesn't seem a great choice here either. The fix could be to introduce a version of mmap() for glibc-internal use which does not access errno but passes the error back some other way. Then the crash would be avoided.
With these changes, __libc_setup_tls() and also malloc() use mmap() instead of sbrk(). There are probably better ways to change malloc(), but the comments in the file seemed to suggest something like this.
Submitted a set of patches to the glibc list for comments.
There's nothing wrong with LVM in this issue, so I'll close this.
I added MALLOC_MMAP_THRESHOLD_=0 to the systemd-cryptsetup service files to instruct glibc malloc() to use mmap() instead of the heap. But then systemd-cryptsetup refuses to start.

The error message "Couldn't create ioctl argument." comes from device_mapper/ioctl/libdm-iface.c#L1818. The problem seems to be that mmap() (called from malloc(), called from dm_zalloc(), called from device_mapper/ioctl/libdm-iface.c#L1190) may fail with EAGAIN. I'm puzzled why mmap() for 4096 bytes or even 1M would fail, so perhaps this is a bug in glibc or the kernel, but I'd like first to check whether this could be a bug in lvm2.