New mapping file format and associated processor overhaul #172

pckroon · 2018-12-19T10:38:23Z

This is a first sample of the new mapping file format, and the associated processor. This is an early stage PR so that @jbarnoud can play with it and find all the outstanding issues. In the meantime I'll be working on paying the accrued maintenance debt on this (cleaning, tests, docs).

TODO:

- Modification mappings (Mappings and modifications #165)
- Protein termini (Links for protein termini not working as intended #160)
- Resids (resids are not kept #154)
- Mappings define variables (Allow mappings to define variables #153)
- ? Residue renaming (Handle residue renaming by PTMs in the data inputs #128)
- ? Capped termini (Capped termini #124)
- ? Mappings transfer too much (DoMapping transfer to many attributes #45)
- ~~Modifications remove atoms (Modifications can't just remove atoms #98)~~ Out of scope

Fixes #165
Fixes #160
Fixes #128
Fixes partially #124
Fixes #45

jbarnoud

I am almost at Schiphol, so here are some very preliminary thoughts. I get that the new mapping format is not compatible with the old one. We therefore need a script (even a quick and dirty one) to convert old mappings to the new format. Such a script would also allow converting mappings from backward, now that compatibility with them is lost. Another possibility is to add some flexibility to the new format so that it remains compatible with the old files.

What is required for the simplest use case? It looks like there are a lot of things to define here. There should be no need to define more than what the current format defines.

vermouth/map_parser.py

jbarnoud · 2018-12-21T14:05:26Z

vermouth/map_parser.py

+class MappingDirector:
+ RESNAME_NUM_SEP = '#'
+ RESIDUE_ATOM_SEP = ':'
+ MAP_TYPES = ['block', 'modification']


Calling the section "block" is error-prone. It looks like you are defining a block instead of a mapping. "mapping block" may be better?

"modification" is tricky as well. Indeed, as the force field and the mapping files now look so similar, we have to expect users to mistake one for the other. (We can also think about merging the parsers at some point to save on code duplication.)

It's not accidental that the file formats resemble each other. And merging parsers is a great idea, but for later.
I agree that the current headers are error-prone.

vermouth/map_parser.py

jbarnoud

I spent most of my day reviewing the PR. I still have the main meat to review: the do_mapping processor. However, I am starting to slow down while reading the code, which means it is time to go home. In the meantime, here are the comments so far.

The new format is powerful, but quite tedious for the simple cases (that is, cases that do not cross residues). As I stated in my previous read-through, I'd like to see what it looks like for a simple case (e.g. leucine), and how it can be reduced so the user has to provide only the bare minimum.

I would like to see the format of each section at least demonstrated in the docstring of the corresponding parsing method. It makes understanding the method much easier, and it will be of tremendous help when writing the file format documentation.

I made multiple comments about LinkPredicate.match as I came to understand the ins and outs better by reading further into the code. I do not like the change; I'd rather have attribute_match be modified to handle the case where an attribute is a LinkPredicate in the reference Molecule instance. Having the case handled in LinkPredicate.match and its subclass equivalents puts the responsibility for that piece of logic in the hands of those writing the link predicates, where it should not be.

What I read so far is good work despite my comments. There are things I think should be fixed, though.

bin/flup/jon.map

bin/flup/modifications.map

bin/martinize2

vermouth/molecule.py

vermouth/parser_utils.py

vermouth/processors/average_beads.py

vermouth/processors/canonicalize_modifications.py

jbarnoud

Continuation of yesterday.

What happens if you do not define a reference in your mapping? I would expect that if you have a residue-crossing mapping, you get an error at parsing time; if you have a non-crossing mapping, the first bead is used.

jbarnoud · 2019-01-08T09:38:19Z

vermouth/processors/do_mapping.py

+ return out
+
+
+def map_modifications(molecule, graph_out, mol_to_out, out_to_mol):


Before even reading it, I would say that this function is too long and needs to be split.

Latest commit should help significantly.

vermouth/processors/do_mapping.py

jbarnoud · 2019-01-08T11:03:15Z

vermouth/processors/do_mapping.py

+ all_matches.append((mol_to_mod, modification))
+ LOGGER.info('Applying modification mapping {}', modification.name, type='general')
+ mod_to_mol = defaultdict(dict)
+ for mol_idx, mod_idxs in mol_to_mod.items():


Damn these names are close to each other.

I'm open to suggestions ;)
These do describe what they contain though.

vermouth/processors/do_mapping.py

vermouth/tests/test_molecule.py

pckroon · 2019-01-14T11:04:12Z

To address some of the questions:

[ block mapping ]
[ from blocks]
LEU
[ to blocks ]
LEU
[ from ]
charmm
[ to ]
martini
[ mapping ]
N BB
H BB 0
C BB
O BB
CA BB
HA BB 0
CB SC1
HB1 SC1 0
HB2 SC1 0
CG SC1
HG SC1 0
CD1 SC1
HD11 SC1 0
HD12 SC1 0
HD13 SC1 0
CD2 SC1
HD21 SC1 0
HD22 SC1 0
HD23 SC1 0

The only missing feature so far is easily sharing atoms between beads. At the moment you'd have to specify that atom multiple times.
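To make the leucine example concrete, here is a minimal sketch of how the lines of a `[ mapping ]` section could be read. It is not the map_parser implementation: `parse_mapping_lines` and `SAMPLE` are hypothetical names, and it assumes each line is `atom bead [weight]`, where a missing weight defaults to 1 and a trailing `0` means the atom belongs to the bead without contributing to its position. It also shows the current workaround for a shared atom, which is simply listed once per bead.

```python
from collections import defaultdict

# Hypothetical excerpt of a [ mapping ] section; CB is shared between two beads
# by listing it twice, which is the workaround described above.
SAMPLE = """\
N BB
H BB 0
CA BB
CB SC1
CB BB
"""


def parse_mapping_lines(text):
    """Return {bead: {atom: weight}} from 'atom bead [weight]' lines.

    Assumption: the optional third column is a weight that defaults to 1.
    """
    beads = defaultdict(dict)
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) not in (2, 3):
            raise ValueError(f"Cannot parse mapping line: {raw!r}")
        atom, bead = fields[:2]
        weight = float(fields[2]) if len(fields) == 3 else 1.0
        beads[bead][atom] = weight
    return dict(beads)


beads = parse_mapping_lines(SAMPLE)
for bead, atoms in beads.items():
    print(bead, atoms)
```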

Not defining a reference is fine (todo, implement for modifications where it's required). do_mapping:355 takes care of warning of garbage node attributes.

I'll try to process your comments, but I also kind of promised The Boss a draft of the chapter before the holidays, so writing has priority at the moment.

jbarnoud · 2019-01-14T11:16:43Z

> I'll try to process your comments, but I also kind of promised The Boss a draft of the chapter before the holidays, so writing has priority at the moment.

No worries. Writing comes first.

pckroon · 2019-01-14T13:19:06Z

> > I'll try to process your comments, but I also kind of promised The Boss a draft of the chapter before the holidays, so writing has priority at the moment.
>
> No worries. Writing comes first.

I was afraid you'd say that :')
Feel free to resolve/fix your own comments in the meantime ;-)

jbarnoud · 2019-01-25T07:54:43Z

Just a note. Earlier this week, we discussed the options for having parser_section account for the section hierarchy. We suggested two data structures for METH_DICT:

- a recursive dict of dicts where keys are section names, and values are either a dict of the child sections or a callback for leaf sections;
- a flat dict where keys are tuples describing the path in the tree, and values are callbacks.

We settled on the first one, though after some thought, I am almost sure that we must adopt the second instead:

- The Zen of Python says "Flat is better than nested";
- We need to maintain a deque to keep track of where we are in the tree, and a tuple can be matched almost directly to that deque;
- The force field parser has sections that need both a callback and children, which is not compatible with the recursive dict approach.
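A minimal sketch of the flat option, to show why the tuple keys line up with the deque. This is not the actual parser: `parse_sections` and the section names are made up for illustration, and the rule for closing sections (pop until the new header is a registered child of the current path) is an assumption.

```python
from collections import deque


def parse_sections(lines, callbacks):
    """Dispatch each data line to the callback registered for the current path.

    `callbacks` is a flat dict: keys are tuples of section names describing the
    path in the section tree, values are callables taking one line.
    """
    path = deque()
    results = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith('[') and line.endswith(']'):
            name = line[1:-1].strip()
            # Close sections until `name` is a registered child of the path.
            while path and tuple(path) + (name,) not in callbacks:
                path.pop()
            if tuple(path) + (name,) not in callbacks:
                raise KeyError(f"Unknown section {name!r}")
            path.append(name)
            continue
        key = tuple(path)
        if key not in callbacks:
            raise ValueError(f"Data line outside of any section: {raw!r}")
        results.append(callbacks[key](line))
    return results


# 'modification' has both its own callback and children, which the nested
# dict-of-dicts layout cannot express directly.
callbacks = {
    ('modification',): lambda line: ('name', line),
    ('modification', 'atoms'): lambda line: ('atom', line),
    ('modification', 'edges'): lambda line: ('edge', tuple(line.split())),
}
lines = [
    '[ modification ]', 'C-ter',
    '[ atoms ]', 'OXT',
    '[ edges ]', 'C OXT',
    '[ modification ]', 'N-ter',
]
parsed = parse_sections(lines, callbacks)
print(parsed)
```

The key lookup is just `tuple(path)`, so no tree walk is needed to find the callback for the current position.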

jbarnoud · 2019-01-30T14:44:00Z

Do you do any filtering on molecule.molecule_meta as we do with links? I am not completely sure it makes sense, as it does not work for blocks, though the parser allows defining modification-wide meta attributes.

pckroon · 2019-01-30T15:07:11Z

No, I don't. I'm also not convinced it'll make sense; I can't come up with a usage example. Mappings map from one FF to another, and I'm not sure I see where the meta comes into that.
Also, I can't parse your last sentence.

jbarnoud · 2019-01-30T21:15:08Z

My question barely makes sense, likely because I asked it with a fried brain. Let's try again.

Links have a molecule_meta attribute. When links are applied, the molecule_meta have to match the meta of the molecule. Because modifications are actually links, they also have this molecule_meta attribute. Therefore, they could be filtered when testing for isomorphism based on the matching with molecule.meta.

In the case of links, molecule_meta allows decorating the molecule based on the features requested and applying the links accordingly. It could work the same way for modifications, which would make things consistent. Though I am not fully sure it makes sense.
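To illustrate what I mean, here is a rough sketch of such a filter. It does not use the real Link or Molecule classes: modifications are plain dicts here, `meta_matches` and `filter_by_meta` are hypothetical helpers, and I assume plain equality on each required key, which may not be how link matching actually compares values.

```python
def meta_matches(required_meta, molecule_meta):
    """True if every key required by the modification has the same value in the molecule."""
    return all(
        key in molecule_meta and molecule_meta[key] == value
        for key, value in required_meta.items()
    )


def filter_by_meta(modifications, molecule_meta):
    """Keep only the modifications whose molecule_meta is satisfied by the molecule."""
    return [
        mod for mod in modifications
        if meta_matches(mod.get('molecule_meta', {}), molecule_meta)
    ]


# Hypothetical modifications: one unconditional, one requiring a feature flag.
modifications = [
    {'name': 'C-ter', 'molecule_meta': {}},
    {'name': 'C-ter-scfix', 'molecule_meta': {'scfix': True}},
]
plain = filter_by_meta(modifications, {})
decorated = filter_by_meta(modifications, {'scfix': True})
print([mod['name'] for mod in plain])
print([mod['name'] for mod in decorated])
```

Candidates that fail this check would be dropped before the isomorphism test is even attempted.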

vermouth/forcefield.py

pckroon · 2019-02-11T14:27:35Z

I ran into a problem with test_map_input.py. The semantics of backmapping files changed in that the blocks that are mapped should be in the relevant forcefield(s).

The best solution would be to make a force_fields fixture that defines what's needed, but that's quite a bit of rather tedious work. Do you have any blinding insight into how to fix it?

jbarnoud · 2019-02-11T14:59:55Z

Do you need any information in the forcefield? If you only need confirmation that the block exists, you could mock the forcefield with an object that just says OK.

pckroon · 2019-02-12T11:54:03Z

Unfortunately yes. I need Blocks with nodes with the correct 'atomname' attribute.

jbarnoud · 2019-02-13T10:03:33Z

From what I see, there is enough in the test cases to build a fake force field with the needed blocks. If you take the target mapping information, you have the name of the block and the atom names in it. If you have a function that digests a mapping reference and outputs a minimally populated force field, you should be able to fix the tests. I would not make it a fixture, though, just a function that gets called in the tests.
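Something along these lines is what I have in mind. This is only a sketch with stand-in objects: `fake_force_field` is a hypothetical helper, and it builds `SimpleNamespace` instances rather than real ForceField and Block objects, so the tests would need to swap in the actual classes.

```python
from types import SimpleNamespace


def fake_force_field(name, block_atoms):
    """Build a minimally populated stand-in for a force field.

    `block_atoms` maps a block name to the list of atom names it contains,
    which is the information available in the target side of a mapping.
    """
    blocks = {}
    for block_name, atomnames in block_atoms.items():
        # Each node only carries the attributes the mapping code looks at.
        nodes = {
            idx: {'atomname': atomname, 'resname': block_name}
            for idx, atomname in enumerate(atomnames)
        }
        blocks[block_name] = SimpleNamespace(name=block_name, nodes=nodes)
    return SimpleNamespace(name=name, blocks=blocks)


force_field = fake_force_field('martini', {'LEU': ['BB', 'SC1']})
print(sorted(force_field.blocks))
print([node['atomname'] for node in force_field.blocks['LEU'].nodes.values()])
```

Called at the top of each test, this keeps the test data next to the assertions instead of hiding it in a shared fixture.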

pckroon · 2019-02-19T15:50:03Z

I'm done with this for now, it's getting way too large anyway.
I'll leave #154 and #153 open for the time being. I hardcoded half a solution to #154 so that resids are always kept.

jbarnoud

I did not have time to read as much as I wanted, but here are a couple of comments I managed to write between two interruptions.

Also, if you change how termini are handled, you should change it in all the martini-based force fields.

vermouth/tests/test_mapping_integrative.py

vermouth/forcefield.py

vermouth/molecule.py

jbarnoud · 2019-02-28T15:02:01Z

vermouth/molecule.py

 raise NotImplementedError

+ def __eq__(self, other):
+ # Should maybe be:
+ # return (isinstance(other, self.__class__) or isinstance(self, other.__class__))\


Comparing the classes directly seems better than the isinstance approach. Whether a subclass should be considered equal to its parent is always a question; in this case I think it should not. The non-commented code looks better to me than the commented code, or even than the code currently in master.

I'm hedging on this. I guess we'll only really know once we need and make a subclass of e.g. the ChoicePredicate.
I suggest leaving the commented code there until then.
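For reference, the difference between the two approaches only shows up once a subclass exists. The sketch below uses toy classes (`StrictChoice`, `LooseChoice` and their children are hypothetical, not the real LinkPredicate hierarchy) to show both behaviours side by side.

```python
class StrictChoice:
    """Equality requires the exact same class, as in the non-commented code."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if type(self) is not type(other):
            return NotImplemented
        return self.value == other.value


class StrictChild(StrictChoice):
    pass


class LooseChoice:
    """Equality accepts a parent/child pair, as in the commented-out variant."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        related = (isinstance(other, self.__class__)
                   or isinstance(self, other.__class__))
        if not related:
            return NotImplemented
        return self.value == other.value


class LooseChild(LooseChoice):
    pass


strict_same = StrictChoice(1) == StrictChoice(1)
strict_sub = StrictChoice(1) == StrictChild(1)
loose_sub = LooseChoice(1) == LooseChild(1)
print(strict_same, strict_sub, loose_sub)
```

With the strict version a subclass is never equal to its parent, even with identical values; with the loose version it is.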

jbarnoud · 2019-03-01T10:30:41Z

Some thoughts I had this morning: I suggested renaming the old mapping files from *.map to *.backmap. The reasoning was that *.map would be better kept for the format we intend to use in the long run, and that having to rename backward-compatible files would push users to check that they made the needed compatibility changes.

In hindsight, though, the renaming is likely not a good idea. First, the .backmap extension is misleading: the files have nothing to do with backmapping. Second, we do have users now, since Paulo directed the protein task force for Martini3 toward martinize2. The renaming means that we introduce a backward-incompatible change for users who still need to be convinced of the benefits of the new version.

Based on these thoughts, I suggest reverting 7b75264 so that the old mapping files keep the .map extension, and renaming the new mapping files from .map to .mapping.

This way, we still have a suitable name for the new mapping format we intend to keep, but we do not break compatibility and we do not introduce a misleading name. We do not encourage users to check for the compatibility changes to make to the backward-like files; that encouragement would in any case require introducing some meaningful error messages, which we can still implement.

pckroon · 2019-03-19T08:52:48Z

I know you're busy and all, but what more is needed to progress this PR?

jbarnoud · 2019-03-19T09:26:55Z

Last time I checked it was almost there. I need to take another look at the termini, and to run it on real cases. I'll try to do that before the end of the week.

In the meantime, I can feel pylint complaining, at the very least because of missing docstrings, so it would be good if you ran pylint and pleased the mighty linter. Could you also have a look at what you are not covering in your patch? I'd aim for 100% coverage for the parser and the new processors. The mapping is a big beast so I will not cry if the coverage is not complete, but still try to have a look.

pckroon · 2019-03-19T09:42:36Z

Part of the untested code is (abstract) base class methods, so that makes sense. I'll have a more detailed look at the coverage and pylint to see if I can squeeze out a few more percent :)

jbarnoud · 2019-03-19T10:16:43Z

You can mark the abstract methods to be skipped by coverage. That would be cleaner than getting used to uncovered lines.
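A minimal sketch of what that could look like, using coverage.py's default exclusion comment `# pragma: no cover`. The class names are hypothetical and not taken from the patch.

```python
from abc import ABC, abstractmethod


class BaseLineParser(ABC):
    """Hypothetical abstract base; the body below can never run in tests."""

    @abstractmethod
    def parse_line(self, line):  # pragma: no cover
        raise NotImplementedError


class UpperLineParser(BaseLineParser):
    """Concrete subclass that the tests actually exercise."""

    def parse_line(self, line):
        return line.upper()


parsed_line = UpperLineParser().parse_line('bb sc1')
print(parsed_line)
```

Since the comment sits on the `def` line, coverage.py excludes the whole function body from the report, so the remaining uncovered lines are the ones that really need a test.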

Stale

Tsjerk

This can't all be fully okay/optimal. But changes for improvement should probably be planned for future PRs now. One worry is that there are constraints posed by the file formats that are tricky to tackle later on without revising the file formats and possibly giving problems for backwards compatibility. However, I can't exactly put my finger on it now. One is to possibly reformat a label as AA1:CA in a broken up form, where AA1: sets a scope and the corresponding particles are on subsequent lines. Approving now. From here better to just carefully run over the working code after this PR...

- Implements PTM mapping in do_mapping. It's a greedy covering of everything not covered by normal mappings.
- Implement a new parser for a new mapping file format in map_parser. This one allows for mappings to cross residue boundaries without wanting to hurt yourself.
- Make sure blocks read from RTP get a 'resname' attribute
- Make ForceField.modifications a dict of modification name: modification
- Relax forcefield equality test
- Make LinkPredicates match with other LinkPredicates

Most of the changes are probably temporary

Fix a bunch of docstrings and broken tests
Add docstrings
Improve some do_mapping comments
Implement macro expansion for mappings
In addition, add some sample mapping files, and a first bit of cleaning

Address most comments involving map_parser
Finish parser infra; improve docstrings
Make SectionLineParser pass the closed section to finalize_section
Implement flexible mappings folders
Update do_mapping tests
Appease intersphinx
Add tests for modern mapping file parser. Fix minor bugs/missing features
Add simple test for multiple mappings per file
Make node equality order independent. Will still cause issues for edges though, but I'm ok with that until it breaks.

Fix issue with reference atoms.
Add attribute_must to do_mapping
Attribute_must is there for those attributes that should be in the output graph, but can either come from the input molecule, OR the blocks; and they should be taken from the blocks preferentially.

pckroon added the WIP label Dec 19, 2018

pckroon requested a review from jbarnoud December 19, 2018 10:38

jbarnoud reviewed Dec 21, 2018

jbarnoud self-assigned this Jan 7, 2019

jbarnoud previously requested changes Jan 7, 2019

jbarnoud reviewed Jan 8, 2019

pckroon mentioned this pull request Jan 21, 2019

Implement ISMAGS #155

Merged

jbarnoud reviewed Feb 7, 2019

pckroon force-pushed the mapping-issue165 branch from d6febf3 to e9c300d Compare February 13, 2019 15:31

pckroon mentioned this pull request Feb 19, 2019

Less scary disclaimer in the README #188

Merged

pckroon force-pushed the mapping-issue165 branch from 4de2735 to 6d85301 Compare February 19, 2019 10:54

pckroon removed the WIP label Feb 19, 2019

jbarnoud reviewed Feb 28, 2019

Tsjerk previously approved these changes Sep 13, 2019

pckroon dismissed Tsjerk’s stale review via d1265f4 September 13, 2019 14:36

pckroon force-pushed the mapping-issue165 branch from 7ee1ec7 to d1265f4 Compare September 13, 2019 14:36

pckroon and others added 20 commits September 13, 2019 16:55

Remove apply blocks processor. It's function is taken over by do_mapping

ee2dfbe

Improve attribute_match logic

f6f82a0

Change prints for debug messages.

1b7b483

Fix simple changes in map_input tests Make test_map_input new mapping style compliant

Adapt do_mapping for changes in #103

0218f5a

Pre-empt issues with PTMs that cross residues and modify existing nodes

b0f2e99

Hardcode temporary fix for #154; move termini to mod mapping instead …

1554e66

…of links

Address first comments

60344a5

Improve martini22 mod mappings, fix node modification by mod mapping

284fb9b

Implement a fragile solution to make -nt flag work again

e1e46b7

Also, propagate some minor changes to tests

Add test for interactions from modification mappings. Fix bug.

bd00838

Fix small bug in interaction propagation

099d1d4

Previously, interactions from modifications could not be added to the final molecule. Sem-Ver: bugfix Fix broken test

Fix resids automatically assigned to identifiers in parsing mapping f…

95d89db

…iles Caused an issue where atoms could not be found because the nodes in the blocks did have the "correct" resid (1, 2, ...), but the associated identifiers would all have resid 1. Sem-Ver: bugfix

Add some debug/progress output to repair_graph

0458cc6

In addition, since references have no (sane) resid, that should be taken from residue being repaired when printing missing atoms. Sem-Ver: bugfix

Make the PDB writer deal with missing res/atomname

af97e63

Make ff parser add interactions to modifications

40d6895

Sem-Ver: bugfix

Add friendly message for people that forgot to rename their backmappi…

5f5fef4

…ng files

Test for modification_matches, minor bugsquash in do_mapping

4ad62f7

Add tests for cover Make naming of mappings more consistent; add integrative test for mapping

pckroon force-pushed the mapping-issue165 branch from a47bd39 to 40d6895 Compare September 13, 2019 14:55

Remove remnant files that survived the rebase

a87b447

pckroon merged commit 1133126 into master Sep 13, 2019

pckroon deleted the mapping-issue165 branch June 3, 2020 09:46

pckroon restored the mapping-issue165 branch September 29, 2020 16:03

pckroon deleted the mapping-issue165 branch September 29, 2020 16:06
