Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Workaround for a bug in malloc() from newlib-nano 4.1.0 (in GCC 10.3) #15045

Merged
merged 2 commits into from
Sep 3, 2021

Conversation

LDong-Arm
Copy link
Contributor

@LDong-Arm LDong-Arm commented Sep 1, 2021

Summary of changes

The GNU Arm Embedded Toolchain 10.3-2021.07 bundles the newlib 4.1.0 release.

The commit 84d0689 "Nano-malloc: Fix for unwanted external heap fragmentation" from newlib 4.1.0 introduced several optimizations, one of which is as follows:

When the last chunk of buffer in the free list (i.e. buffers recycled by free()) is smaller than requested, nano_malloc() calls sbrk(0) to see if the heap's current head is adjacent to this chunk, and if so it asks sbrk() to allocate the difference in bytes only and expands the current chunk.

This doesn't work if the heap consists of non-contiguous regions. sbrk(0) returns the the current region's head if the region has any remaining capacity. But if this capacity is not enough for the second (non-trivial) call to sbrk() described above, allocation will happen from the next region if available. Expanding the current chunk won't work and will result in a segmentation fault.

So this optimization needs to be reverted in order to bring back compatibility with non-contiguous heaps. Before the next version of newlib becomes available and gets updated in the GCC Arm Embedded Toolchain, we work around this issue by including the fix in Mbed OS. The linker prioritizes malloc() from the project over the one from the toolchain.

Meantime, we try to upstream the fix to newlib so the next release will have correctly behaving malloc().

Note: The Mbed OS implementation of sbrk() for split-heap is here.

Fixes: #15023

Impact of changes

This workaround ensures malloc() works correctly on targets with two heap regions (i.e. MBED_SPLIT_HEAP) such as K64F and DISCO_L475VG_IOT01A when using GCC 10.3-2021.07.

Migration actions required

Documentation

None.


Pull request type

[x] Patch update (Bug fix / Target update / Docs update / Test update / Refactor)
[] Feature update (New feature / Functionality change / New API)
[] Major update (Breaking change E.g. Return code change / API behaviour change)

Test results

[] No Tests required for this change (E.g docs only update)
[x] Covered by existing mbed-os tests (Greentea or Unittest)
[x] Tests / results supplied as part of this PR

The CI currently uses GCC 9 because this bug prevented us from upgrading it. I tested this workaround locally with GCC 10.3. After this PR gets in, we can update the CI to use GCC 10.3.


Reviewers

@ARMmbed/mbed-os-core @mbed-os-test @donatieng @kjbracey-arm


@LDong-Arm LDong-Arm requested review from a team and donatieng September 1, 2021 15:56
@ciarmcom
Copy link
Member

ciarmcom commented Sep 1, 2021

@LDong-Arm, thank you for your changes.
@ARMmbed/mbed-os-maintainers please review.

The commit 84d0689 "Nano-malloc: Fix for unwanted external heap
fragmentation" from newlib 4.1.0 introduced several optimizations,
one of which is as follows:

When the last chunk in the free list is smaller than requested,
nano_malloc() calls sbrk(0) to see if the heap's current head is
adjacent to this chunk, and if so it asks sbrk() to allocate the
difference in bytes only and expands the current chunk.

This doesn't work if the heap consists of non-contiguous regions.
sbrk(0) returns the the current region's head if the region has any
remaining capacity. But if this capacity is not enough for the second
(non-trivial) call to sbrk() described above, allocation will happen
from the next region if available. Expanding the current chunk won't
work and will result in a segmentation fault.

So this optimization needs to be reverted in order to bring back
compatibility with non-contiguous heaps. Before the next version
of newlib becomes available and gets updated in the GCC Arm Embedded
Toolchain, we work around this issue by including the fix in Mbed OS.
The linker prioritizes malloc() from the project to the one from the
toolchain.
This reverts commit 6649e95.

In the malloc test, `ALLOC_ARRAY_SIZE` defines the *capacity* of array
that stores pointers to `malloc`'d buffers. It is *not* the number
of allocations.

In an ideal scenario, the test makes as many allocations as possible
until the heap runs out and malloc() returns NULL. So revert the
capacity of the array of pointers from 50 back to 100 so this array
is less likely to run out before the heap does.

#include <newlib.h>

#if (__NEWLIB__ == 4) && (__NEWLIB_MINOR__ == 1) && (__NEWLIB_PATCHLEVEL__ == 0)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are assuming the next release will have it fixed.

@0xc0170
Copy link
Contributor

0xc0170 commented Sep 2, 2021

Nice work bisecting this and finding the cause.

@mbed-ci
Copy link

mbed-ci commented Sep 3, 2021

Jenkins CI Test : ✔️ SUCCESS

Build Number: 2 | 🔒 Jenkins CI Job | 🌐 Logs & Artifacts

CLICK for Detailed Summary

jobs Status
jenkins-ci/mbed-os-ci_unittests ✔️
jenkins-ci/mbed-os-ci_build-cloud-example-ARM ✔️
jenkins-ci/mbed-os-ci_cmake-cloud-example-ARM ✔️
jenkins-ci/mbed-os-ci_build-cloud-example-GCC_ARM ✔️
jenkins-ci/mbed-os-ci_cmake-cloud-example-GCC_ARM ✔️
jenkins-ci/mbed-os-ci_build-greentea-ARM ✔️
jenkins-ci/mbed-os-ci_build-greentea-GCC_ARM ✔️
jenkins-ci/mbed-os-ci_cmake-example-GCC_ARM ✔️
jenkins-ci/mbed-os-ci_build-example-GCC_ARM ✔️
jenkins-ci/mbed-os-ci_build-example-ARM ✔️
jenkins-ci/mbed-os-ci_cmake-example-ARM ✔️
jenkins-ci/mbed-os-ci_greentea-test ✔️

@LDong-Arm
Copy link
Contributor Author

Once it's merged, it'd be good to update GCC in CI to 10.3 and do a whole run as soon as possible.

@0xc0170
Copy link
Contributor

0xc0170 commented Sep 3, 2021

@ARMmbed/mbed-os-core @donatieng Please review

Copy link
Contributor

@saheerb saheerb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fix has been tested with GCC-10.3 in CI. Looks good.

Copy link
Contributor

@donatieng donatieng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @LDong-Arm - really good investigation, and thanks for making a workaround :)

@0xc0170 0xc0170 merged commit c14396a into ARMmbed:master Sep 3, 2021
@mergify mergify bot added release version missing When PR does not contain release version, bot should label it and we fix it afterwards and removed ready for merge labels Sep 3, 2021
@mergify
Copy link

mergify bot commented Sep 3, 2021

This PR does not contain release version label after merging.

@0xc0170 0xc0170 added release-type: patch Indentifies a PR as containing just a patch and removed release version missing When PR does not contain release version, bot should label it and we fix it afterwards labels Sep 3, 2021
@0xc0170
Copy link
Contributor

0xc0170 commented Sep 3, 2021

This PR does not contain release version label after merging.

Fixed

@mbedmain mbedmain added release-version: 6.15.0 Release-pending and removed release-type: patch Indentifies a PR as containing just a patch Release-pending labels Sep 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Investigate and fix heap allocation test failure with GNU Arm Embedded Toolchain 10.3-2021.07
7 participants