soft_to_angle theory #15

Open
TrentBrick opened this issue Apr 18, 2019 · 4 comments

Comments

@TrentBrick

Would it be possible to share a link to the research or reasoning behind the soft_to_angle Module for someone new to structural protein problems?

My current hunch is that you ran a mixture model on the Pfam database to find the average angle conformations of the different families. You then use a LogSoftmax activation function so that each amino acid can choose which of the omega, psi and phi angles it wants from this table of options, and then use sin, cos and arctan to convert those values into angles. Is that right?
Why does the mixture model have 500 clusters, how was the mixture_model table generated, and why is there a 90:10 pos/neg omega ratio that is then randomly shuffled in?
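
For readers following the question, here is a minimal sketch of the mechanism it describes: a softmax over a table of candidate angles, followed by sin, cos and arctan to turn the weighted mix back into one angle per torsion. It is not the repository's soft_to_angle module. The function names, the tiny two-entry alphabet, and the use of a plain softmax (rather than LogSoftmax) are assumptions made for illustration, and it uses only the Python standard library.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def soft_to_angle_sketch(logits, alphabet):
    """Blend an alphabet of (phi, psi, omega) triples, in radians.

    Hypothetical helper: each angle is mapped to a point on the unit
    circle, the points are averaged with the softmax weights, and
    atan2 converts the averaged point back into an angle.
    """
    probs = softmax(logits)
    blended = []
    for k in range(3):  # 0 = phi, 1 = psi, 2 = omega
        s = sum(p * math.sin(entry[k]) for p, entry in zip(probs, alphabet))
        c = sum(p * math.cos(entry[k]) for p, entry in zip(probs, alphabet))
        blended.append(math.atan2(s, c))
    return tuple(blended)

# Illustrative alphabet with two entries (values are made up).
alphabet = [
    (math.radians(170.0), math.radians(-60.0), math.radians(180.0)),
    (math.radians(-170.0), math.radians(-40.0), math.radians(180.0)),
]
phi, psi, omega = soft_to_angle_sketch([0.0, 0.0], alphabet)
```

Averaging on the unit circle instead of averaging the raw numbers is what makes the sin/cos/arctan step worthwhile: with equal weights, 170° and -170° blend to roughly ±180° rather than to 0°.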

Again, I am a noob, so pointers to any papers or other grounded reasoning for this approach would be really appreciated.

@TrentBrick
Author

TrentBrick commented Apr 19, 2019

Along similar lines, in preprocessing.py you take the ProteinNet tertiary data, which is in coordinate format, convert it into angles, and then convert it back to coordinates again. Why?

Starting from line 132:

    angles, batch_sizes = calculate_dihedral_angles_over_minibatch(pos, [len(prim)], use_gpu=use_gpu)
    tertiary, _ = get_backbone_positions_from_angular_prediction(angles, batch_sizes, use_gpu=use_gpu)
    tertiary = tertiary.squeeze(1)

@TrentBrick
Author

Any further insight on this would be really appreciated!

@JeppeHallgren
Collaborator

For inspiration on the model design, you're probably best off reading https://www.cell.com/cell-systems/fulltext/S2405-4712(19)30076-6
In preprocessing.py we're currently converting to angles and back again to ensure that the distances between amino acids are exactly the ones we use in the pnerf module. Going from coordinates -> angles -> coordinates should give back exactly the same coordinates. However, the original (measured) coordinates can contain some noise, so this is essentially a preprocessing step to remove it.
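
The round trip described above can be sketched for a single atom as follows. This is not the repository's pnerf code: the helper names, the example points, and the ideal bond length and bond angle are illustrative assumptions. It measures the torsion defined by four points, then re-places the fourth atom with a NeRF-style construction from that torsion and fixed ideal geometry, so the torsion is kept while any noise in the bond length is discarded.

```python
import math

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def norm(a):
    return math.sqrt(dot(a, a))

def unit(a):
    n = norm(a)
    return (a[0] / n, a[1] / n, a[2] / n)

def dihedral(p0, p1, p2, p3):
    """Signed torsion angle (radians) defined by four points."""
    b0, b1, b2 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n2 = cross(b1, b2)
    y = norm(b1) * dot(b0, n2)
    x = dot(cross(b0, b1), n2)
    return math.atan2(y, x)

def place_atom(a, b, c, length, bond_angle, torsion):
    """NeRF-style placement of an atom d after a, b, c such that
    |d - c| = length, angle(b, c, d) = bond_angle and
    dihedral(a, b, c, d) = torsion."""
    bc = unit(sub(c, b))
    n = unit(cross(sub(b, a), bc))
    m = cross(n, bc)
    dx = -length * math.cos(bond_angle)
    dy = length * math.sin(bond_angle) * math.cos(torsion)
    dz = length * math.sin(bond_angle) * math.sin(torsion)
    return tuple(c[i] + dx * bc[i] + dy * m[i] + dz * n[i] for i in range(3))

def idealize(a, b, c, d_measured, length, bond_angle):
    """Keep the measured torsion, replace bond length and angle with ideal values."""
    torsion = dihedral(a, b, c, d_measured)
    return place_atom(a, b, c, length, bond_angle, torsion)

# Illustrative values only (not taken from the repository).
IDEAL_LENGTH = 1.33  # assumed bond length
IDEAL_ANGLE = 2.0    # assumed bond angle in radians
a, b, c = (0.0, 1.4, 0.0), (0.0, 0.0, 0.0), (1.5, 0.2, 0.0)
d_noisy = (2.1, 0.9, 1.0)
d_clean = idealize(a, b, c, d_noisy, IDEAL_LENGTH, IDEAL_ANGLE)
```

In this sketch the rebuilt atom keeps the torsion of the measured one but sits at exactly the ideal distance from its predecessor, which is the sense in which the round trip acts as a cleaning step.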

@TrentBrick
Author

TrentBrick commented Apr 22, 2019

Thanks. I read this paper a while ago and didn't remember the right side of Figure 2 with the "torsional alphabet"; it may have been added in a later edition.

I still don't see any information about using a mixture model, nor any actual mixture model angles in the RGN GitHub repo (you have three different files for these). Did you generate these yourself or correspond with AlQuraishi to get them?

And the preprocessing.py noise removal makes sense.
