-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New mapping file format and associated processor overhaul #172
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am almost at Schiphol, so here are some very preliminary thoughts. I get that the new mapping is not compatible with the old one. We therefore need a script (even quick and dirty) to convert the old mappings to the new format. Such script would also allow to convert mappings from backward now that the compatibility is lost with them. An other possibility is to add some flexibility in the new format to remain compatible with the old files.
What is required for the most simple use case? It looks like a lot of things to define here. There should not be the need to define more than what the current format defines.
vermouth/map_parser.py
Outdated
class MappingDirector: | ||
RESNAME_NUM_SEP = '#' | ||
RESIDUE_ATOM_SEP = ':' | ||
MAP_TYPES = ['block', 'modification'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calling the section "block" is error prone. It looks like you are defining a block instead of a mapping. "mapping block" may be better?
"modification" is tricky as well. Indeed, as the force field and the mapping files now look so similar, we have to expect users mistaking one for the other. (We can also think about merging the parsers at some point to save on code duplication).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not accidental that the fileformats resemble eachother. And merging parsers is a great idea, but for later.
I agree with the error prone-ness of the current headers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I spent most of my day reviewing the PR. I still have the main meat to review: the do_mapping processor. Though, I start slowing down reading the code which mean it is time to go home. In the meantime, here are the comments so far.
The new format is powerful, but quite tedious for the simple cases (being, cases that do not cross residues). As I stated in my previous read through, I'd like to see how it looks like on a simple case (e.g. leucine), and how it can be reduced so the user has to provide only the very minimum.
I would like to see the format of each section at least demonstrated in the docstring of the corresponding parsing method. It makes understanding the method much easier, and it will be of tremendous help when writing the file format documentation.
I made multiple comments about LinkPredicate.match
as I understood better the ins and outs by reading the code further. I do not like the change, I'd rather have attribute_match
be modified to handle the case were an attribute is a LinkPredicate
in the reference Molecule
instance. Having the case handled in LinkPredicate.match
and its subclass equivalents puts the responsibility of that piece of logic in the hand of the ones writing the link predicates, where it should not be.
What I read so far is good work despite my comments. There are things I think should be fixed, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Continuation of yesterday.
What happens if you do not define a reference in your mapping? I would expect that if you have a residue-crossing mapping, then you get an error at parsing time; if you have a non-crossing mapping, then the first bead if used.
vermouth/processors/do_mapping.py
Outdated
return out | ||
|
||
|
||
def map_modifications(molecule, graph_out, mol_to_out, out_to_mol): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before reading it, I would tell that this function is too long and need to be split.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Latest commit should help significantly.
vermouth/processors/do_mapping.py
Outdated
all_matches.append((mol_to_mod, modification)) | ||
LOGGER.info('Applying modification mapping {}', modification.name, type='general') | ||
mod_to_mol = defaultdict(dict) | ||
for mol_idx, mod_idxs in mol_to_mod.items(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Damn these names are close to each other.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm open to suggestions ;)
These do describe what they contain though.
To address some of the questions:
The only missing feature so far is easily sharing atoms between beads. At the moment you'd have to specify that atom multiple times. Not defining a reference is fine (todo, implement for modifications where it's required). do_mapping:355 takes care of warning of garbage node attributes. I'll try to process your comments, but I also kind of promised The Boss a draft of the chapter before the holidays, so writing has priority at the moment. |
No worries. Writing comes first. |
I was afraid you'd say that :') |
Just a note. Earlier this week, we discussed about the options to have the
We settled on the first one, though after some thoughts, I am almost sure that we must adopt the second instead:
|
Do you do any filtering on |
No, I don't. I'm also not convinced it'll make sense; I can't come up with a usage example. Mappings map from one FF to another, and I'm not sure I see where the meta comes into that. |
My question barely makes sense; likely because I asked it with a fried brained. Let's try again. Links have a In the case of links, the |
I ran into a problem with test_map_input.py. The semantics of backmapping files changed in that the blocks that are mapped should be in the relevant forcefield(s). The best solution would be to make a force_fields fixture that defines what's needed, but that's quite a bit of rather tedious work. Do you have any blinding insight in how to fix it? |
Do you need any information in the forcefield? If you only need confirmation that the block exists, you could mock the forcefield with an object that just says OK. |
Unfortunately yes. I need Blocks with nodes with the correct 'atomname' attribute. |
From what I see, there is enough in the test cases to build a fake force field with the needed blocks. If you take the target mapping information, you have the name of the block, and the atom names in it. If you have a function that digest a mapping reference and outputs a minimally populated force field, you should be able to fix the tests. I would not make it a fixture, though, just a function that gets called in the tests. |
d6febf3
to
e9c300d
Compare
4de2735
to
6d85301
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not have time to read as much as I want, but here are a couple of comments I managed to write between two interruptions.
Also, if you change how termini are handled, you should change it in all the martini-based force fields.
raise NotImplementedError | ||
|
||
def __eq__(self, other): | ||
# Should maybe be: | ||
# return (isinstance(other, self.__class__) or isinstance(self, other.__class__))\ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comparing the classes directly seems better than the isinstance
approach. It is always a question to know if a subclass should be considered equal to its parent, in this case I think it should not. The non-commented code looks better to me than the commented code, or even than the code currently in master.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm hedging on this. I guess we'll only really know once we need and make a subclass of e.g. the ChoicePredicate.
I suggest to leave the commented code there until then.
Some thoughts I had this morning: I suggested to rename the old mapping files from In insight, though, the renaming is likely not a good idea. First, the Based on these thoughts, I suggest to revert 7b75264 so that the old mapping files keep the This way, we still have a suitable name for the new mapping format we intend to keep; but we do not break compatibility, and we do not introduce a misleading name. We do not encourage users to check for the compatibility change to do on the backward like files; though, this encouragement requires anyway to introduce some meaningful error messages that we still can implement. |
I know you're busy and all, but what more is needed to progress this PR? |
Last time I checked is was almost there. I need to give a new look to the termini, and to run in on real cases. I'll try to do that before the end of the week. In the meantime, I can feel pylint complaining at the very least because of missing docstrings; so it would be good that you run pylint and please the mighty linter. Could you also have a look at what you are not covering in your patch? I'd aim for 100% coverage for the parser, and the new processors. The mapping is a big beast so I will not cry if the coverage is not complete, but try to still have a look. |
Part of the untested code is (abstract) base class methods, so that makes sense. I'll have a more detailed look at the coverage and pylint to see if I can squeeze out a few more percent :) |
You can mark the abstract method to be skipped by coverage. Would be cleaner than getting use to uncovered lines. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can't all be fully okay/optimal. But changes for improvement should probably be planned for future PRs now. One worry is that there are constraints posed by the file formats that are tricky to tackle later on without revising the file formats and possibly giving problems for backwards compatibility. However, I can't exactly put my finger on it now. One is to possibly reformat a label as AA1:CA in a broken up form, where AA1: sets a scope and the corresponding particles are on subsequent lines. Approving now. From here better to just carefully run over the working code after this PR...
7ee1ec7
to
d1265f4
Compare
- Implements PTM mapping in do_mapping. It's a greedy covering of everything not covered by normal mappings. - Implement a new parser for a new mapping file format in map_parser. This one allows for mappings to cross residue boundaries without wanting to hurt yourself. - Make sure blocks read from RTP get a 'resname' attribute - Make ForceField.modifications a dict of modification name: modification - Relax forcefield equality test - Make LinkPredicates match with other LinkPredicates Most of the changes are probably temporary Fix a bunch of docstrings and broken tests Add docstrings Improve some do_mapping comments Implement macro expansion for mappings In addition, add some sample mapping files, and a first bit of cleaning
Address most comments involving map_parser Finish parser infra; improve docstrings Make SectionLineParser pass the closed section to finalize_section Implement flexible mappings folders Update do_mapping tests Appease intersphinx Add tests for modern mapping file parser. Fix minor bugs/missing features Add simple test for multiple mappings per file Make node equality order independent. Will still cause issues for edges though, but I'm ok with that until it breaks.
Fix issue with reference atoms. Add attribute_must to do_mapping Attribute_must is there for those attributes that should be in the output graph, but can either come from the input molecule, OR the blocks; and they should be taken from the blocks preferentially.
Fix simple changes in map_input tests Make test_map_input new mapping style compliant
Also, propagate some minor changes to tests
Previously, interactions from modifications could not be added to the final molecule. Sem-Ver: bugfix Fix broken test
…iles Caused an issue where atoms could not be found because the nodes in the blocks did have the "correct" resid (1, 2, ...), but the associated identifiers would all have resid 1. Sem-Ver: bugfix
In addition, since references have no (sane) resid, that should be taken from residue being repaired when printing missing atoms. Sem-Ver: bugfix
Sem-Ver: bugfix
Add tests for cover Make naming of mappings more consistent; add integrative test for mapping
a47bd39
to
40d6895
Compare
This is a first sample of the new mapping file format, and the associated processor. This is an early stage PR so that @jbarnoud can play with it and find all the outstanding issues. In the meantime I'll be working on paying the accrued maintenance debt on this (cleaning, tests, docs).
TODO:
Modifications remove atoms Modifications can't just remove atoms #98Out of scopeFixes #165
Fixes #160
Fixes #128
Fixes partially #124
Fixes #45