-
Notifications
You must be signed in to change notification settings - Fork 127
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixed index counting bug in attribution #89
Conversation
@MarioKrenn6240 @akshat998 - It looks like all the dataset tests failed because the SMILES couldn't parse correctly. Maybe we need to pin a dependency here? |
we are on it, thanks! |
selfies/encoder.py
Outdated
# account for branch symbol because it is inserted after | ||
for i in range(start, end): | ||
attribution_maps[i].index += len(Q_as_symbols) + 1 | ||
derived.append(branch_symbol) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay! I believe the reason the tests are failing might be the moving of this line, since it swaps the order of SELFIES index and branch symbols.
@@ -442,7 +446,7 @@ def _derive_smiles_from_fragment( | |||
token = atom_to_smiles(curr_atom) | |||
derived.append(token) | |||
attribution_maps.append(AttributionMap( | |||
len(derived) - 1 + attribution_index, | |||
_strlen(derived) - 1 + attribution_index, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition, I am a little hesitant to make this change because it joins the entire derived
list together every call. Would it be possible to implement this as some sort of running counter instead?
Hi @alstonlo - you have a sharp eye! I did not spot that line switch - thanks. For the second comment about timing, I did try what you said and found it to add quite a bit of complexity to the code because there are possible empty tokens added that require if/else statements for a counter approach. I decided to just join them (usually quite fast) to count instead. I've put timing information below. Some paths are slower/faster on PR so I don't think it's a meaningful impact on performance. master 0f0061a 16.10s call tests/test_on_datasets.py::test_roundtrip_translation[test_path1]
10.49s call tests/test_on_datasets.py::test_roundtrip_translation[test_path9]
9.98s call tests/test_on_datasets.py::test_roundtrip_translation[test_path11]
9.93s call tests/test_on_datasets.py::test_roundtrip_translation[test_path3]
6.23s call tests/test_on_datasets.py::test_roundtrip_translation[test_path14]
6.11s call tests/test_on_datasets.py::test_roundtrip_translation[test_path13]
4.94s call tests/test_on_datasets.py::test_roundtrip_translation[test_path10]
3.60s call tests/test_on_datasets.py::test_roundtrip_translation[test_path2]
2.23s call tests/test_on_datasets.py::test_roundtrip_translation[test_path4]
1.98s call tests/test_on_datasets.py::test_roundtrip_translation[test_path5]
1.86s call tests/test_on_datasets.py::test_roundtrip_translation[test_path12]
1.58s call tests/test_on_datasets.py::test_roundtrip_translation[test_path6]
0.62s call tests/test_on_datasets.py::test_roundtrip_translation[test_path7]
0.24s call tests/test_on_datasets.py::test_roundtrip_translation[test_path8]
0.04s call tests/test_on_datasets.py::test_roundtrip_translation[test_path0] PR e51e3dc 15.74s call tests/test_on_datasets.py::test_roundtrip_translation[test_path1]
10.62s call tests/test_on_datasets.py::test_roundtrip_translation[test_path9]
10.04s call tests/test_on_datasets.py::test_roundtrip_translation[test_path11]
9.71s call tests/test_on_datasets.py::test_roundtrip_translation[test_path3]
6.45s call tests/test_on_datasets.py::test_roundtrip_translation[test_path14]
5.91s call tests/test_on_datasets.py::test_roundtrip_translation[test_path13]
4.80s call tests/test_on_datasets.py::test_roundtrip_translation[test_path10]
3.54s call tests/test_on_datasets.py::test_roundtrip_translation[test_path2]
2.27s call tests/test_on_datasets.py::test_roundtrip_translation[test_path4]
2.03s call tests/test_on_datasets.py::test_roundtrip_translation[test_path5]
1.86s call tests/test_on_datasets.py::test_roundtrip_translation[test_path12]
1.60s call tests/test_on_datasets.py::test_roundtrip_translation[test_path6]
0.62s call tests/test_on_datasets.py::test_roundtrip_translation[test_path7]
0.24s call tests/test_on_datasets.py::test_roundtrip_translation[test_path8]
0.04s call tests/test_on_datasets.py::test_roundtrip_translation[test_path0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
I've revised the indexing added in 2.1.0 to follow SMILES string index exactly and SELFIES token index exactly. Previously it used the current length of the SMILES string list which could contain empty elements. In SELFIES, it incorrectly treated order of index. This PR also increments version/updates changelog.
New behavior: