Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[NFC] Refactor symbol table creation in binary writer #1535

Merged
merged 1 commit into from
Sep 10, 2020

Conversation

wingo
Copy link
Contributor

@wingo wingo commented Sep 8, 2020

This commit refactors the representations of symbols in the binary writer. It will allow us to fix #1534 in a followup.

src/binary-writer.cc Outdated Show resolved Hide resolved
@binji binji requested a review from sbc100 September 9, 2020 01:06
Copy link
Member

@binji binji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, but let @sbc100 take a look too. Can you modify/add on to one of the existing tests (perhaps test/binary/linking_section.txt?)

src/binary-writer.cc Outdated Show resolved Hide resolved
auto iter = symtab_.find(name);
if (iter != symtab_.end()) {
Index sym_index = iter->second;
assert(symbols_[sym_index].type() == T::type);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure whether this assert is safe; seems like it could be invalid based on malformed data in wasm module. @sbc100 do you know?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point; let's print a warning in that case instead of failing.

In any case, the current logic doesn't actually follow the toolchain-conventions invariants; symbols that are neither imported nor exported by a module have no name uniqueness requirements AFAIU. I will fix the logic in a followup.

src/binary-writer.cc Outdated Show resolved Hide resolved
src/binary-writer.cc Outdated Show resolved Hide resolved
@wingo wingo force-pushed the refactor-symbol-table-writing branch from bc34d78 to f185838 Compare September 9, 2020 12:58
This commit refactors the representations of symbols in the binary
writer.  It will allow us to fix WebAssembly#1534 in a followup.
@wingo wingo force-pushed the refactor-symbol-table-writing branch from f185838 to 55f2a91 Compare September 9, 2020 13:17
@wingo
Copy link
Contributor Author

wingo commented Sep 9, 2020

Updated patch addresses review comments, except regarding tests: because this patch has no functional change (it doesn't fixed the linked issues yet), no test needs to be modified. I'll extend the linking-section.txt test in the followup though; thanks for the pointer!

@wingo
Copy link
Contributor Author

wingo commented Sep 9, 2020

(And apologies wrt force-pushes; I gather the convention is to push commits to fix nits, then you rebase/squash when landing -- will do that next time.)

@sbc100
Copy link
Member

sbc100 commented Sep 9, 2020

(And apologies wrt force-pushes; I gather the convention is to push commits to fix nits, then you rebase/squash when landing -- will do that next time.)

I don't think we have any rules around this. I basically always use a rebase flow which means I often/always have to force push to github PRs (even if i don't squash, which I also occasionally do too).

@wingo wingo mentioned this pull request Sep 9, 2020
Copy link
Member

@sbc100 sbc100 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Maybe add "NFC" to the PR title?

src/binary-writer.cc Show resolved Hide resolved
bool explicit_name() const { return flags() & WABT_SYMBOL_FLAG_EXPLICIT_NAME; }
bool no_strip() const { return flags() & WABT_SYMBOL_FLAG_NO_STRIP; }

bool IsFunction() const { return type() == Function::type; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just use SymbolType::Function here? What purpose does the Function::type serve?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose of the ::type definitions is to allow InternSymbol to know what kind of symbol to make. I admit that I wasn't sure which name to use here in IsFunction and friends; if you prefer that I change it, I am happy to do so.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it will make more sense if/when we add data symbol but for now it looks like you could avoid a lot of complexity and avoid InternSymbol being a template at all by just passing the SymbolType to InternSymbol.

It seems like having a separate class nested inside of Symbol, for each different type of symbol payload adds a fair amount of complexity, but perhaps that is justified and will make more sense once the followups land?

lgtm aside from this minor concern.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you are saying, and if every symbol just had an Index payload it would be entirely clear-cut. I guess since I saw the SymbolType::Data symbols having additional fields, I found it confusing for Symbol to have unused fields for e.g. SymbolType::Function symbols, which would propagate around as arguments to e.g. InternSymbol and related functions. But that is a theoretical concern given that wabt doesn't know anything about data symbols, and my patches don't change that.

The followup #1537 doesn't do anything amazing here; this is just a question of data representation. If you like I can change to a less structured class Symbol { string name; uint8_t flags; SymbolType type; Index index; Offset offset; Address size; } and assert that offset/size accessors only happen for data symbols (of which we will have none), or even exclude data symbols from the representation. WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm good with this approach. What do you think binji?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy either way, tbh. I like having more explicit types (i.e. more structured) since it seems like it will be easier to expand if/when we add new symbol values. OTOH, YAGNI and all :)

@wingo
Copy link
Contributor Author

wingo commented Sep 10, 2020

Thanks for the review @binji and @sbc100 :) AFAICS this can be merged. (Cheeky request: may I have merging privileges? Very OK to say no.)

@@ -312,34 +401,49 @@ Index BinaryWriter::GetEventVarDepth(const Var* var) {
return var->index();
}

template <typename T>
Index BinaryWriter::InternSymbol(const std::string& name, uint8_t flags,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this have a better name? Perhaps addToSymtab or getSymbolIndex?

bool explicit_name() const { return flags() & WABT_SYMBOL_FLAG_EXPLICIT_NAME; }
bool no_strip() const { return flags() & WABT_SYMBOL_FLAG_NO_STRIP; }

bool IsFunction() const { return type() == Function::type; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it will make more sense if/when we add data symbol but for now it looks like you could avoid a lot of complexity and avoid InternSymbol being a template at all by just passing the SymbolType to InternSymbol.

It seems like having a separate class nested inside of Symbol, for each different type of symbol payload adds a fair amount of complexity, but perhaps that is justified and will make more sense once the followups land?

lgtm aside from this minor concern.

@wingo wingo changed the title Refactor symbol table creation in binary writer [NFC] Refactor symbol table creation in binary writer Sep 10, 2020
@binji
Copy link
Member

binji commented Sep 10, 2020

may I have merging privileges?

Fine with me, I kinda thought you already could as long as there was a reviewer. Lemme check the settings.

@binji binji merged commit 774a115 into WebAssembly:master Sep 10, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

wat2wasm --relocatable should record exports in linking symtab
3 participants