Assembly Step for new pipeline kernel: Questions and Strategies #1676
Comments
Basically, I think the thing to argue against is as follows:
We definitely could eyeball it and figure it out, but only because of how the names historically happen to line up: paint_mgi goes into mgi. If we did it this way, the name would carry real semantic meaning. That's fine if we want to do that, but I feel mild discomfort about it; maybe it just feels brittle. I'm definitely not opposed, though. We'd have to document this convention somewhere. Although, now that I'm saying this, we are the ones who control all the "mixins", so the naming convention is mostly on us anyway, which makes me less uncomfortable.
In geneontology/pipeline#206 we're taking steps to reform the pipeline kernel. Currently, @dustine32 and I are working on the Assembly step ("shovel2pile"), which should take "pristine", validated annotations in gpad+gpi format and merge any mixin gpads into the final product.
For example, we have mgi and paint_mgi. At the end of the run, a validated paint_mgi will be merged into a validated mgi, and their corresponding headers will also be joined, to produce the final mgi dataset product.
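The merge described above can be sketched roughly as follows. This is only an illustration, not the pipeline's actual code: it assumes header lines are the ones starting with "!", and the function names `split_header` and `assemble` are hypothetical.

```python
from typing import Iterable, List, Tuple


def split_header(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split lines into header lines and annotation rows.

    Assumption: header lines are exactly those starting with '!'.
    """
    header: List[str] = []
    rows: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        (header if line.startswith("!") else rows).append(line)
    return header, rows


def assemble(dataset: Iterable[str], mixins: Iterable[Iterable[str]]) -> List[str]:
    """Join all headers first, then all rows: dataset first, mixins in order."""
    headers: List[str] = []
    rows: List[str] = []
    for source in [dataset, *mixins]:
        h, r = split_header(source)
        headers.extend(h)
        rows.extend(r)
    return headers + rows
```

Here `assemble(mgi_lines, [paint_mgi_lines])` would give the lines of the final mgi product. Plain concatenation keeps every header line, including any repeated format-version line, so how to deduplicate or rewrite the joined header is still an open question.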
Here we discuss various strategies for this:
Final <dataset> = Sum[<dataset>.header, <mixin0>.header, <mixin1>.header, ...] + Sum[<dataset>, <mixin0>, <mixin1>, ...]
merges_into: mgi

mgi_valid -> mgi; paint_mgi_valid -> paint_mgi; <mixin>_<dataset>

<mixin>_<dataset>, where <dataset> matches an existing source, namely "mgi".

Check whether the <dataset> part of the name corresponds to an existing file in "pristine". If it does, then we have a <dataset> and a <mixin>_<dataset> match.

For <group>_<dataset>, look in <group>.yaml for a <dataset> entry, and check whether it has merges_into: <dataset>. If so, we can confirm that this mixin should merge into the given dataset name.

"has_mixin": ["paint_mgi"]
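To make the name-based and metadata-based checks concrete, here is a minimal sketch. It is not a proposal for the real implementation: `merge_target` and `mixins_by_dataset` are hypothetical names, `pristine` stands in for the set of file stems found in "pristine", and `group_metadata` is an assumed in-memory stand-in for the parsed <group>.yaml files (group name -> dataset name -> entry).

```python
from typing import Dict, List, Optional, Set


def merge_target(
    name: str,
    pristine: Set[str],
    group_metadata: Optional[Dict[str, Dict[str, dict]]] = None,
) -> Optional[str]:
    """Return the dataset that `name` should merge into, or None if it is not a mixin."""
    # Naming convention: <group>_<dataset>, split on the first underscore.
    group, sep, dataset = name.partition("_")
    if not sep or dataset not in pristine:
        return None  # no underscore, or no matching base dataset in "pristine"
    if group_metadata is not None:
        # Optional confirmation via the (assumed) merges_into key.
        entry = group_metadata.get(group, {}).get(dataset)
        if entry is None or entry.get("merges_into") != dataset:
            return None
    return dataset


def mixins_by_dataset(
    pristine: Set[str],
    group_metadata: Optional[Dict[str, Dict[str, dict]]] = None,
) -> Dict[str, List[str]]:
    """Invert the mapping: dataset -> mixins, similar in spirit to "has_mixin"."""
    result: Dict[str, List[str]] = {}
    for name in sorted(pristine):
        target = merge_target(name, pristine, group_metadata)
        if target is not None:
            result.setdefault(target, []).append(name)
    return result
```

With only the naming convention, a name like goa_human is not treated as a mixin unless a file named human also exists in "pristine". Passing `group_metadata` adds the explicit confirmation step, which is less brittle but means the metadata has to be kept in sync with the names.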