(re-)Implement GMX RTP parser using lark-parser #359

Draft: wants to merge 1 commit into master

@pckroon (Member) commented Apr 8, 2021

This PR (re-)implements the RTP parser using the lark-parser parsing framework, in which the file grammar is separated from its interpretation.
If we like this style, we'll use it to parse FF files as well, after we redo that file format.

This PR is a draft because its main purpose is to foster some discussion. It also requires more extensive testing. On top of that, one TODO remains regarding the generation of 1-4 pairs between hydrogens, and docstrings are missing.

@fgrunewald what's your opinion/take on this?
The main advantages/disadvantages that I see are the following:
+ The grammar and syntax are explicitly defined, which helps documentation
+ It's faster than the SectionLineParser (but I haven't quantified this)
+ When we implement the FF format as well we can start inheriting/including grammar from the other files, which makes the syntax more uniform
+ It'll allow us to write more complex things for e.g. the JSON parts of the FF format, relaxing the need for quotes and such. See #175
- We need to redo the parsers. Again.
- The interpreter logic is not always as clearly depicted as with the SectionLineParser, but that may also be because I'm not good at working with abstract syntax trees.

@pckroon requested a review from @fgrunewald on April 8, 2021, 10:07
@fgrunewald (Member) commented

@pckroon

Thanks for sharing this LARK version of the RTP parser. In general it seems more complicated to use than the SectionLineParser, but I also find it clearer as a language. However, I think the deciding factor will be whether we can handle the GROMACS-specific syntax with LARK as well. That remains to be seen, because RTP, ironically, is the easiest GROMACS file format to parse, in my opinion.

Having said that, could you run one or two atomistic FFs on lysozyme using PDB2GMX and compare that to what you get with martinize using this LARK parser? I think it is good to get a feeling for how much is missing.

@pckroon (Member, Author) commented Apr 20, 2021

> because RTP, ironically, is the easiest GROMACS file format to parse, in my opinion

I don't think I agree. The semantics of the format are very unclear due to missing documentation. It's also the format that we currently can't parse with the SectionLineParser (SLP).

> Having said that, could you run one or two atomistic FFs on lysozyme using PDB2GMX and compare that to what you get with martinize using this LARK parser? I think it is good to get a feeling for how much is missing.

Good idea, I'll see if I can get around to that this week.
