CLI support for graph transformation 'pipelines' #195
Comments
Hi @cimranm great suggestion! To address your immediate problem, I think you can try just passing the filenames (no extension) as the
With respect to pipelining, I think this would be a great feature (and not too tricky to implement). It should be quite straightforward to write a parser for the transformation functions from a YAML file (see: https://stackoverflow.com/questions/67442071/passing-python-functions-from-yaml-file). I can provide some support and help implement some of this if you're keen to build this feature. I don't have the bandwidth at the moment to pick this up on my own though.
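To make the parser idea concrete, here is a rough sketch of how function references in a config could be turned into callables. It assumes each entry names a function by its dotted import path plus optional keyword arguments. The helper names (`resolve_callable`, `build_transforms`, `apply_transforms`) and the `function`/`kwargs` keys are made up for illustration and are not part of graphein. The spec list is written as the structure a YAML loader would return, and builtins stand in for real graph transformations so the snippet runs on its own.

```python
import importlib
from functools import partial
from typing import Any, Callable, Dict, List


def resolve_callable(dotted_path: str) -> Callable[..., Any]:
    """Import 'package.module.func' and return the function object."""
    module_name, _, attr = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"Expected 'module.function', got {dotted_path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr)


def build_transforms(specs: List[Dict[str, Any]]) -> List[Callable[[Any], Any]]:
    """Turn a list of {'function': ..., 'kwargs': ...} entries into callables."""
    transforms = []
    for spec in specs:
        func = resolve_callable(spec["function"])
        kwargs = spec.get("kwargs") or {}
        # Bind the configured keyword arguments; the graph is passed later.
        transforms.append(partial(func, **kwargs))
    return transforms


def apply_transforms(obj: Any, transforms: List[Callable[[Any], Any]]) -> Any:
    """Apply each transform in order, feeding the output into the next one."""
    for transform in transforms:
        obj = transform(obj)
    return obj


# Hypothetical YAML layout (an assumption, not an existing graphein format):
#   - function: builtins.sorted
#     kwargs: {reverse: true}
#   - function: builtins.len
# Below is the equivalent parsed structure, with builtins as stand-ins.
example_specs = [
    {"function": "builtins.sorted", "kwargs": {"reverse": True}},
    {"function": "builtins.len"},
]

transforms = build_transforms(example_specs)
result = apply_transforms([3, 1, 2], transforms)
```

Since this imports whatever path the config names, it should only be used with config files you trust. A fixed registry of allowed function names would be a safer alternative.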
Sure, I've already built something like this for my own use case so would be happy to figure out an elegant way to make it generalisable and add it to the graphein CLI. Will let you know if I'm stuck!
I'm aiming to generate protein graphs in bulk in order to then perform unsupervised clustering on them. I would also like to repeat this process on several different proteomes.
I would also like to apply several intermediate steps (e.g. select subgraph of radius `r` for each graph; select subgraph of threshold `rsa`).
So far, I have seen that `ProteinGraphDataset` retrieves PDB files from a list of `id`s (either UniProt or PDB accession codes) and downloads from PDB or AF2, and the 'intermediate steps' can be achieved by supplying functions to the `graph_transformation_funcs` parameter.
However, I would like to use a subset of a proteome (list of IDs) and an already existing set of `.pdb` files in a directory (as opposed to downloading them again). Would it be possible for a more elegant solution to exist in a similar fashion to the existing command line interface?
I was thinking that some sort of 'pipeline' could be written as a CLI command, perhaps by providing
- `config.yml` file for graph construction
- `graph_processing.yml` file detailing a list of functions to apply (e.g. subgraph selection)
This is just my naive idea for now, I haven't fleshed out exactly how it would work; but maybe a way to describe 'transformations' in a `processing.yml` file in a similar way to the `ProteinGraphConfig` parser?
I think a framework that allows people to script pipelines (like the one I am trying to make) from the command line would allow for ease of experimentation and simplicity, compared to making it all in python using the low-level functions.
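For the part about reusing files already on disk, a minimal sketch of the ID-to-file matching I have in mind is below. It assumes every structure is stored as `<id>.pdb` in a single directory, which will not hold for every naming scheme (downloaded AF2 models, for instance, may carry extra prefixes and suffixes). `match_local_pdbs` is a hypothetical helper, not an existing graphein function.

```python
from pathlib import Path
from typing import Dict, Iterable, List, Tuple, Union


def match_local_pdbs(
    ids: Iterable[str], pdb_dir: Union[str, Path]
) -> Tuple[Dict[str, Path], List[str]]:
    """Split IDs into those with a local '<id>.pdb' file and those without."""
    wanted = list(ids)
    # Index the directory once: filename without extension -> full path.
    available = {path.stem: path for path in Path(pdb_dir).glob("*.pdb")}
    found = {pdb_id: available[pdb_id] for pdb_id in wanted if pdb_id in available}
    missing = [pdb_id for pdb_id in wanted if pdb_id not in available]
    return found, missing
```

The `found` mapping could then drive graph construction from local files, and `missing` lists the IDs that would still need downloading.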
Would appreciate any thoughts on this!