Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleanstrtab #472

Open
wants to merge 8 commits into
base: master
Choose a base branch
from
Open

Cleanstrtab #472

wants to merge 8 commits into from

Conversation

rohoog
Copy link

@rohoog rohoog commented Mar 2, 2023

I have created the change that adds cleaning of the dynstr. All unreferenced symbols are removed if --clean-strtab option is given and the section is shrunk. It could solve issues #453 and #449.

However I found that the binary actually grows by about 4KB. I think this is due to this note section normalizing. I'm not sure why this is done and why this should be needed. Why not leave all sections before the one(s) that are modified alone?

Copy link
Collaborator

@brenoguim brenoguim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, it seems there are many changes in this PR, some needed for the functionality and some cosmetic. This makes it a bit hard to review with confidence.

Also, adding a test is pretty important, so it would go a long way in helping the PR to be accepted.

src/patchelf.cc Show resolved Hide resolved
src/patchelf.cc Outdated

#include <argz.h>

class mymap : public map<symstr, forward_list<symref>, Comparator> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps one would want a better name for this class

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would you suggest? symmap or such?

src/patchelf.cc Outdated
auto strTab = (char *)fileContents->data() + rdi(shdrDynStr.sh_offset);
int verneednum = 0;
int verdefnum = 0;
//mkref(strTab, rdi(shdrDynStr.sh_size)); // new strInfo()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cleanup old code

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

weird, this line is gone already here.... commit "remove some leftover comment-out lines " should have removed it.

changed = true;
}

template<ElfFileParams>
void ElfFile<ElfFileParamNames>::clearSymbolVersions(const std::set<std::string> & syms)
void ElfFile<ElfFileParamNames()>::clearSymbolVersions(const std::set<std::string> & syms)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nicer if these changes weren't made in this PR, or at least in a different commit

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll (try to) split it in cosmetic commit and functional commit.

src/patchelf.cc Outdated
}

struct symstr {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps one could use a std::string_view for this?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe... This looks most like the original C-implementation I did. However, I need none of the 'burden' of string_view.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looked into string_view and the constructor string_view(char s) can be useful to replace the argz_ functions too. Good tip,

src/patchelf.cc Outdated
Comment on lines 432 to 486
static uint64_t roundUp(uint64_t n, uint64_t m)
{
return ((n - 1) / m + 1) * m;
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this change related?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, part of the 'cosmetic' changes.

src/patchelf.cc Outdated
@@ -600,7 +614,7 @@ void ElfFile<ElfFileParamNames>::writeReplacedSections(Elf_Off & curOff,
const std::string & sectionName = i.first;
Elf_Shdr & shdr = findSectionHeader(sectionName);
if (rdi(shdr.sh_type) != SHT_NOBITS)
memset(fileContents->data() + rdi(shdr.sh_offset), 'X', rdi(shdr.sh_size));
memset(fileContents->data() + rdi(shdr.sh_offset), 'Y', rdi(shdr.sh_size));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this necessary?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left-over from my debugging

src/patchelf.cc Outdated
Comment on lines 715 to 731

curOff += roundUp(i->second.size(), sectionAlignment);
debug("curoff %d, size %d, align %d\n", curOff, i->second.size(), sectionAlignment);
curOff += roundUp((Elf_Off)i->second.size(), sectionAlignment);
debug("-> next curoff %d\n", curOff);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose of this change?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

debugging left-over

src/patchelf.cc Outdated
@@ -1514,7 +1529,7 @@ void ElfFile<ElfFileParamNames>::modifyRPath(RPathOp op,
size_t rpathSize = 0;
if (rpath) {
rpathSize = strlen(rpath);
memset(rpath, 'X', rpathSize);
memset(rpath, 0, rpathSize);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change necessary?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

debugging left-over

Comment on lines +1606 to 1612
if (!changed) return;
auto newsize = rdi(shdrDynamic.sh_size) - sizeof(Elf_Dyn) * (dyn - last);
wri(shdrDynamic.sh_size, newsize);
dynamic.resize(newsize);
debug("new .dynamic size %d\n", rdi(shdrDynamic.sh_size));
replaceSection(".dynamic", dynamic);

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Misleading indentation

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, this used to be if (changed) {...

@brenoguim
Copy link
Collaborator

I also forgot to say. You might want to enable the CI in your github fork so that you can make sure all tests are passing.

@brenoguim
Copy link
Collaborator

We are going to need to decide between my implementation and yours.
Mine has the disadvantage of undoing the sharing, but I intend to implement sharing detection on it, which would then create sharing for the modifications.
Yours persist the sharing, but would not add sharing. I suspect adding sharing detection on my implementation would be more fiting the algorithm, while in yours it would need a more significant rewrite.
What do you think about it?

@rohoog
Copy link
Author

rohoog commented Mar 6, 2023

Sorry, I didn't look into your implementation. Mine was a C++ learning experience for me. Do with it as you please.
I addressed your review comments, except the split of the initial big commit and the adding of the test case (if you did a test-case, then that would probably work for my code too). Please have a look.
Initially I also had the idea of adding the symbol sharing, until I found that the gnu binutils already does that, so I didn't bother. Since the 'patching' is about removing DT_NEEDED, I don't think there will be new opportunities for sym sharing.

I do find the growing of the binary disturbing. Also the number of program headers increases by the 'rewriteSections' method. Should I make a new issue for that?

@brenoguim
Copy link
Collaborator

Since the 'patching' is about removing DT_NEEDED, I don't think there will be new opportunities for sym sharing.

Patchelf has many operations that will grow the dynstr table. For example, changing RPATH, or renaming symbols. So redoing the sharing could improve in those cases.

I do find the growing of the binary disturbing. Also the number of program headers increases by the 'rewriteSections' method. Should I make a new issue for that?

There is definitely room for improvement indeed. Patchelf will run a generic algorithm to change the place of modified sections, and to do that it needs to add more loads.
But if you know that you did not grow any section, you can just tune the algorithm to avoid calling rewriteSections.

@r-hoogenboom
Copy link

My changes are all about shrinking sections, not growing. Do you mean that I don't need to call rewriteSections after shrinking? What else should I call to 'propagate' the shrinking into the section headers and program headers and yield a smaller binary?

@brenoguim
Copy link
Collaborator

I understand that. I just meant that it would be useful to redo the sharing because some options in patchelf add new strings.

If you just set changed=true without rewriting the sections, the binary will have exactly the same size. At least it won't grow. Patchelf will just update the changed bytes in the file.

If you want to actually shrink, I guess this needs to be implemented.

It's complicated because if you want to keep the section in the same place as it was before and make the file smaller, that would mean all other sections would be loaded in an earlier address space, which breaks the binary.
If you want to move the new smaller section to another place in the virtual space, you need to add a new program header table entry. And maybe there isn't room for that, forcing you to displace the next section into another location. But to displace sections, you might have to normalize note sections... So you get to the issue existing today.

@r-hoogenboom
Copy link

HMM, I expected that exactly this was the responsibility of rewriteSections. If it can deal with grown sections, why wouldn't it be able to deal with shrunk sections.

Note sections are just arbitrary 'advisory' non-critical things (like which assembler assembled the binary, which linker linked it, etc), right? Do they really refer to absolute locations of data in the binary?

I think patchelf is growing to be a full-blown linker (and un-linker) and by that, one that supports many architectures and bit-nesses. I think patchelf should leave stuff alone which it can, try to leave as much as possible to the linker that originally built the binary. But that's just my opinion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants