-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Hungarian Notation] Variables with different names but same labels are renamed to the same variable. #317
base: main
Are you sure you want to change the base?
Conversation
To decide:
|
Also: Substitute variables in a batch (dictionary) instead of iterating over AST every time |
Will only go once through the whole tree, but will still call n times replace_variable for all n replacement pairs in the nodes. If the nodes would know the variables in their instructions, it could be faster. |
decompiler/pipeline/controlflowanalysis/variable_name_generation.py
Outdated
Show resolved
Hide resolved
decompiler/pipeline/controlflowanalysis/variable_name_generation.py
Outdated
Show resolved
Hide resolved
for var in self._variables: | ||
names[var] = var.copy(self.getVariableName(var)) | ||
|
||
self._ast.replace_variables_in_subtree(self._ast.root, names) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure you want to use the substitute here and not the replace function Hendrik is using?
My point is the following:
The variables var1 = Variable("var1", int, None, False, Variable("rax", int, 5), "some tag")
and variable var2 = Variable("var1", int, None, False, Variable("rax", int, 7), "some other tag")
have the hash and are equivalent, although some values differ.
Now, if you find variable var1
before variable var2
, then each occurrence of var2
is also replaced by variable Variable("iVar3", int, None, False, Variable("rax", int, 5), "some tag")
.
There are two options, depending on what we want resp. need:
- We say from this point on, we do not need the tags and old SSA-Variable names. Then we can ignore this information when creating the replacement variable, and both would be replaced by the same variable anyway.
- We really only change the name, but then each variable need to be replaced by its unique new variable, i.e., we really check that we have the same object.
In general, I would prefer the second, because we do not know whether we need it later, and it is more general. However, then I think your approach is not the most efficient one, because as soon as we find a variable, it would make sense to replace it in its expression.
However, when we decide that we do not need this information after this stage, you only have to update line 56, i.e., remove the unnecessary information when copying the variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get your point.
The old algorithm, which simply renamed every variable (var.name
), did it like in the second approach.
But my problem was, how messy variables are handled.
I have no grantee if a variable is a real copy of some other variable, or not.
For example: Let's say we collected three times var_0
, where the first and third are the same object and the second is a real copy.
If I would iterate over every variable and simply assign the .name
field, the third variable would change as I'm changing the first one.
Afterwards, as the third one is being processed, if will again be renamed, destroying the name completely.
Using a set, does not help, because as you said, their hashes are the same, which results in missing collected variables.
The only other thing i could do, is to keep a list of id
s of variables which I already renamed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And we are back to the point that pseudo should be immutable.
Okay, but I think Hendrik did exactly this, considering the id
s when replacing. So this is probably the cleanest way. Collecting variables according to their id
s, assigning the new names and replace them with this new visitor.
Seems very complicated for such an easy thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will look into his code/ask him today and probably block this request until his code is in main.
And YES, I HATE IT.
Someone would think that renaming variables would be an easy task, but nope.
Hopefully with the id's everything works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this solved now and if so, where and how?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The stage stores the new names as a dictionary and uses the SubstituteVisitor
to iterate only once over each dataflow object.
- Collection:
__init__()
ofHungarianScheme
- Iteration:
_rename_with_scheme
in the stage itself
- Renaming:
- done via callback method
_rename_variable
ofHungarianScheme
The AST visitor methods were removed.
decompiler/pipeline/controlflowanalysis/variable_name_generation.py
Outdated
Show resolved
Hide resolved
71996e6
to
0be5830
Compare
0be5830
to
20facab
Compare
Rebase onto main. |
Removing old review because of too much changes. |
* Copy variables when renaming * Fix: ArrayType,config,tests * Fix: Prefix+Arrays * Fix: Custom test --------- Co-authored-by: rihi <[email protected]> Co-authored-by: Spartak Ehrlich <[email protected]>
@@ -215,7 +215,7 @@ | |||
}, | |||
{ | |||
"dest": "variable-name-generation.notation", | |||
"default": "default", | |||
"default": "system_hungarian", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should discuss on Thursday whether this should be the default or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure.
tests/pipeline/controlflowanalysis/test_variable_name_generation.py
Outdated
Show resolved
Hide resolved
# because the way our cfg works, each use site of each variable could theoretically have a different type | ||
# we just take the first assuming that they are all the same... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure about this. I thought out-of-SSA gives two variables only the same name, if they have the same type. The only exception could be aliased variables.
I am not happy with the first type, could we do something like the largest type or so? This way the order is important and I am afraid we get some non-deterministic behavior this way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure we can change it.
But what exactly do you mean by largest type?
Most common in vars?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, most common was my idea, but I am open to suggestions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's the most reasonable criteria.
Done with next commit.
closes #309