Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New mapping file format and associated processor overhaul #172

Merged
merged 21 commits into from
Sep 13, 2019

Conversation

pckroon
Copy link
Member

@pckroon pckroon commented Dec 19, 2018

This is a first sample of the new mapping file format, and the associated processor. This is an early stage PR so that @jbarnoud can play with it and find all the outstanding issues. In the meantime I'll be working on paying the accrued maintenance debt on this (cleaning, tests, docs).

TODO:

Fixes #165
Fixes #160
Fixes #128
Fixes partially #124
Fixes #45

@pckroon pckroon added the WIP label Dec 19, 2018
Copy link
Collaborator

@jbarnoud jbarnoud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am almost at Schiphol, so here are some very preliminary thoughts. I get that the new mapping is not compatible with the old one. We therefore need a script (even quick and dirty) to convert the old mappings to the new format. Such script would also allow to convert mappings from backward now that the compatibility is lost with them. An other possibility is to add some flexibility in the new format to remain compatible with the old files.

What is required for the most simple use case? It looks like a lot of things to define here. There should not be the need to define more than what the current format defines.

vermouth/map_parser.py Outdated Show resolved Hide resolved
class MappingDirector:
RESNAME_NUM_SEP = '#'
RESIDUE_ATOM_SEP = ':'
MAP_TYPES = ['block', 'modification']
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling the section "block" is error prone. It looks like you are defining a block instead of a mapping. "mapping block" may be better?

"modification" is tricky as well. Indeed, as the force field and the mapping files now look so similar, we have to expect users mistaking one for the other. (We can also think about merging the parsers at some point to save on code duplication).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not accidental that the fileformats resemble eachother. And merging parsers is a great idea, but for later.
I agree with the error prone-ness of the current headers.

vermouth/map_parser.py Outdated Show resolved Hide resolved
@jbarnoud jbarnoud self-assigned this Jan 7, 2019
Copy link
Collaborator

@jbarnoud jbarnoud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I spent most of my day reviewing the PR. I still have the main meat to review: the do_mapping processor. Though, I start slowing down reading the code which mean it is time to go home. In the meantime, here are the comments so far.

The new format is powerful, but quite tedious for the simple cases (being, cases that do not cross residues). As I stated in my previous read through, I'd like to see how it looks like on a simple case (e.g. leucine), and how it can be reduced so the user has to provide only the very minimum.

I would like to see the format of each section at least demonstrated in the docstring of the corresponding parsing method. It makes understanding the method much easier, and it will be of tremendous help when writing the file format documentation.

I made multiple comments about LinkPredicate.match as I understood better the ins and outs by reading the code further. I do not like the change, I'd rather have attribute_match be modified to handle the case were an attribute is a LinkPredicate in the reference Molecule instance. Having the case handled in LinkPredicate.match and its subclass equivalents puts the responsibility of that piece of logic in the hand of the ones writing the link predicates, where it should not be.

What I read so far is good work despite my comments. There are things I think should be fixed, though.

bin/flup/jon.map Outdated Show resolved Hide resolved
bin/flup/modifications.map Outdated Show resolved Hide resolved
bin/martinize2 Outdated Show resolved Hide resolved
bin/martinize2 Outdated Show resolved Hide resolved
bin/martinize2 Show resolved Hide resolved
vermouth/molecule.py Outdated Show resolved Hide resolved
vermouth/parser_utils.py Outdated Show resolved Hide resolved
vermouth/parser_utils.py Outdated Show resolved Hide resolved
vermouth/processors/average_beads.py Outdated Show resolved Hide resolved
vermouth/processors/canonicalize_modifications.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@jbarnoud jbarnoud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Continuation of yesterday.

What happens if you do not define a reference in your mapping? I would expect that if you have a residue-crossing mapping, then you get an error at parsing time; if you have a non-crossing mapping, then the first bead if used.

return out


def map_modifications(molecule, graph_out, mol_to_out, out_to_mol):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before reading it, I would tell that this function is too long and need to be split.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Latest commit should help significantly.

vermouth/processors/do_mapping.py Show resolved Hide resolved
vermouth/processors/do_mapping.py Show resolved Hide resolved
all_matches.append((mol_to_mod, modification))
LOGGER.info('Applying modification mapping {}', modification.name, type='general')
mod_to_mol = defaultdict(dict)
for mol_idx, mod_idxs in mol_to_mod.items():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Damn these names are close to each other.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm open to suggestions ;)
These do describe what they contain though.

vermouth/processors/do_mapping.py Show resolved Hide resolved
vermouth/processors/do_mapping.py Show resolved Hide resolved
vermouth/processors/do_mapping.py Outdated Show resolved Hide resolved
vermouth/tests/test_molecule.py Outdated Show resolved Hide resolved
@pckroon
Copy link
Member Author

pckroon commented Jan 14, 2019

To address some of the questions:

[ block mapping ]
[ from blocks]
LEU
[ to blocks ]
LEU
[ from ]
charmm
[ to ]
martini
[ mapping ]
N BB
H BB 0
C BB
O BB
CA BB
HA BB 0
CB SC1
HB1 SC1 0
HB2 SC1 0
CG SC1
HG SC1 0
CD1 SC1
HD11 SC1 0
HD12 SC1 0
HD13 SC1 0
CD2 SC1
HD21 SC1 0
HD22 SC1 0
HD23 SC1 0

The only missing feature so far is easily sharing atoms between beads. At the moment you'd have to specify that atom multiple times.

Not defining a reference is fine (todo, implement for modifications where it's required). do_mapping:355 takes care of warning of garbage node attributes.

I'll try to process your comments, but I also kind of promised The Boss a draft of the chapter before the holidays, so writing has priority at the moment.

@jbarnoud
Copy link
Collaborator

I'll try to process your comments, but I also kind of promised The Boss a draft of the chapter before the holidays, so writing has priority at the moment.

No worries. Writing comes first.

@pckroon
Copy link
Member Author

pckroon commented Jan 14, 2019

I'll try to process your comments, but I also kind of promised The Boss a draft of the chapter before the holidays, so writing has priority at the moment.

No worries. Writing comes first.

I was afraid you'd say that :')
Feel free to resolve/fix your own comments in the meantime ;-)

@pckroon pckroon mentioned this pull request Jan 21, 2019
@jbarnoud
Copy link
Collaborator

Just a note. Earlier this week, we discussed about the options to have the parser_section account for the section hierarchy. We suggested two data structures for METH_DICT:

  • a recursive dict of dicts where keys are section names, and the values are either a dict of the child sections, or a callback for leaf sections;
  • a flat dict where keys are tuples describing the path in the tree, and values are callbacks.

We settled on the first one, though after some thoughts, I am almost sure that we must adopt the second instead:

  • Zen of python says "Flat is better than nested";
  • We need to maintain a deque to keep track of where we are in the tree, and a tuple van be almost directly match to that deque;
  • The force field parser have sections that need both a callback and children, which is not compatible with the recursive dict approach.

@jbarnoud
Copy link
Collaborator

Do you do any filtering on molecule.molecule_meta as we do with links? I am not completely sure it makes sense as it does not work for blocks, though the parser authorize to define modification wide meta attributes.

@pckroon
Copy link
Member Author

pckroon commented Jan 30, 2019

No, I don't. I'm also not convinced it'll make sense; I can't come up with a usage example. Mappings map from one FF to another, and I'm not sure I see where the meta comes into that.
Also, I can't parse your last sentence.

@jbarnoud
Copy link
Collaborator

My question barely makes sense; likely because I asked it with a fried brained. Let's try again.

Links have a molecule_meta attribute. When links are applied, the molecule_meta have to match the meta of the molecule. Because modifications are actually links, they also have this molecule_meta attribute. Therefore, they could be filtered when testing for isomorphism based on the matching with molecule.meta.

In the case of links, the molecule_meta allows to decorate the molecule based on the feature requested, and apply the links accordingly. It could work the same for modification, and it would make it consistent. Though I am not fully sure it makes sense.

vermouth/forcefield.py Outdated Show resolved Hide resolved
@pckroon
Copy link
Member Author

pckroon commented Feb 11, 2019

I ran into a problem with test_map_input.py. The semantics of backmapping files changed in that the blocks that are mapped should be in the relevant forcefield(s).

The best solution would be to make a force_fields fixture that defines what's needed, but that's quite a bit of rather tedious work. Do you have any blinding insight in how to fix it?

@jbarnoud
Copy link
Collaborator

Do you need any information in the forcefield? If you only need confirmation that the block exists, you could mock the forcefield with an object that just says OK.

@pckroon
Copy link
Member Author

pckroon commented Feb 12, 2019

Unfortunately yes. I need Blocks with nodes with the correct 'atomname' attribute.

@jbarnoud
Copy link
Collaborator

From what I see, there is enough in the test cases to build a fake force field with the needed blocks. If you take the target mapping information, you have the name of the block, and the atom names in it. If you have a function that digest a mapping reference and outputs a minimally populated force field, you should be able to fix the tests. I would not make it a fixture, though, just a function that gets called in the tests.

@pckroon
Copy link
Member Author

pckroon commented Feb 19, 2019

I'm done with this for now, it's getting way too large anyway.
I'll leave #154 and #153 open for the time being. I hardcoded half a solution to #154 so that resids are always kept.

Copy link
Collaborator

@jbarnoud jbarnoud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not have time to read as much as I want, but here are a couple of comments I managed to write between two interruptions.

Also, if you change how termini are handled, you should change it in all the martini-based force fields.

vermouth/tests/test_mapping_integrative.py Outdated Show resolved Hide resolved
vermouth/forcefield.py Outdated Show resolved Hide resolved
vermouth/molecule.py Outdated Show resolved Hide resolved
raise NotImplementedError

def __eq__(self, other):
# Should maybe be:
# return (isinstance(other, self.__class__) or isinstance(self, other.__class__))\
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comparing the classes directly seems better than the isinstance approach. It is always a question to know if a subclass should be considered equal to its parent, in this case I think it should not. The non-commented code looks better to me than the commented code, or even than the code currently in master.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm hedging on this. I guess we'll only really know once we need and make a subclass of e.g. the ChoicePredicate.
I suggest to leave the commented code there until then.

@jbarnoud
Copy link
Collaborator

jbarnoud commented Mar 1, 2019

Some thoughts I had this morning: I suggested to rename the old mapping files from *.map to *.backmap. The reasoning was that *.map would be better to keep for the format we intend to use on the long run, and that having to rename backward compatible files would push users to chack they did the needed compatibility changes.

In insight, though, the renaming is likely not a good idea. First, the .backmap extension is misleading: the files have nothing to do with backmapping. Then, we do have users now since Paulo directed the protein task force for Martini3 toward martinize2. The renaming means that we introduce a backward incompatible change to users that still need to be convinced about the benefits of the new version.

Based on these thoughts, I suggest to revert 7b75264 so that the old mapping files keep the .map extension, and to rename the new mapping files from .map to .mapping.

This way, we still have a suitable name for the new mapping format we intend to keep; but we do not break compatibility, and we do not introduce a misleading name. We do not encourage users to check for the compatibility change to do on the backward like files; though, this encouragement requires anyway to introduce some meaningful error messages that we still can implement.

@pckroon
Copy link
Member Author

pckroon commented Mar 19, 2019

I know you're busy and all, but what more is needed to progress this PR?

@jbarnoud
Copy link
Collaborator

Last time I checked is was almost there. I need to give a new look to the termini, and to run in on real cases. I'll try to do that before the end of the week.

In the meantime, I can feel pylint complaining at the very least because of missing docstrings; so it would be good that you run pylint and please the mighty linter. Could you also have a look at what you are not covering in your patch? I'd aim for 100% coverage for the parser, and the new processors. The mapping is a big beast so I will not cry if the coverage is not complete, but try to still have a look.

@pckroon
Copy link
Member Author

pckroon commented Mar 19, 2019

Part of the untested code is (abstract) base class methods, so that makes sense. I'll have a more detailed look at the coverage and pylint to see if I can squeeze out a few more percent :)

@jbarnoud
Copy link
Collaborator

You can mark the abstract method to be skipped by coverage. Would be cleaner than getting use to uncovered lines.

Tsjerk
Tsjerk previously approved these changes Sep 13, 2019
Copy link
Collaborator

@Tsjerk Tsjerk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can't all be fully okay/optimal. But changes for improvement should probably be planned for future PRs now. One worry is that there are constraints posed by the file formats that are tricky to tackle later on without revising the file formats and possibly giving problems for backwards compatibility. However, I can't exactly put my finger on it now. One is to possibly reformat a label as AA1:CA in a broken up form, where AA1: sets a scope and the corresponding particles are on subsequent lines. Approving now. From here better to just carefully run over the working code after this PR...

pckroon and others added 20 commits September 13, 2019 16:55
- Implements PTM mapping in do_mapping. It's a greedy covering of everything not covered by normal mappings.
- Implement a new parser for a new mapping file format in map_parser. This one allows for mappings to cross residue boundaries without wanting to hurt yourself.
- Make sure blocks read from RTP get a 'resname' attribute
- Make ForceField.modifications a dict of modification name: modification
- Relax forcefield equality test
- Make LinkPredicates match with other LinkPredicates

Most of the changes are probably temporary

Fix a bunch of docstrings and broken tests

Add docstrings

Improve some do_mapping comments

Implement macro expansion for mappings

In addition, add some sample mapping files, and a first bit of cleaning
Address most comments involving map_parser

Finish parser infra; improve docstrings

Make SectionLineParser pass the closed section to finalize_section

Implement flexible mappings folders

Update do_mapping tests

Appease intersphinx

Add tests for modern mapping file parser. Fix minor bugs/missing features

Add simple test for multiple mappings per file

Make node equality order independent.

Will still cause issues for edges though, but I'm ok with that until it breaks.
Fix issue with reference atoms. Add attribute_must to do_mapping

Attribute_must is there for those attributes that should be in the output graph, but can either come from the input molecule, OR the blocks; and they should be taken from the blocks preferentially.
Fix simple changes in map_input tests

Make test_map_input new mapping style compliant
Also, propagate some minor changes to tests
Previously, interactions from modifications could not be added to the final molecule.
Sem-Ver: bugfix

Fix broken test
…iles

Caused an issue where atoms could not be found because the nodes in the blocks did have the "correct" resid (1, 2, ...), but the associated identifiers would all have resid 1.
Sem-Ver: bugfix
In addition, since references have no (sane) resid, that should be taken from residue being repaired when printing missing atoms.
Sem-Ver: bugfix
Add tests for cover

Make naming of mappings more consistent; add integrative test for mapping
@pckroon pckroon merged commit 1133126 into master Sep 13, 2019
@pckroon pckroon deleted the mapping-issue165 branch June 3, 2020 09:46
@pckroon pckroon restored the mapping-issue165 branch September 29, 2020 16:03
@pckroon pckroon deleted the mapping-issue165 branch September 29, 2020 16:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants