Data upload trait mismatch/duplication #28

Open
BoMeyering opened this issue Mar 30, 2023 · 5 comments
Labels
bug (Something isn't working), enhancement (New feature or request)

Comments

@BoMeyering

The Issue

When importing data from a Germinate upload template, traits that share a name with existing traits in the database are not merged automatically, so the user has to run trait unification for every trait they uploaded.

Additional Quirks

  • Using BrAPI import in GridScore to import traits works fine. However, when exporting data from GridScore, the Germinate upload template only includes the name and the data type of each trait and none of the other trait metadata.
  • When defining a new trait with categorical data in GridScore, the trait exports with the discrete categories in a comma-separated format like hi,med,lo instead of the format required by Germinate uploads, ["hi","med","lo"].
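As a stopgap for the second quirk, the exported category string can be rewritten by hand or with a few lines of script before upload. This is a minimal sketch, not part of GridScore or Germinate; the function name `categories_to_germinate` is hypothetical, and it assumes category labels never contain commas themselves.

```python
import json


def categories_to_germinate(raw: str) -> str:
    """Turn a comma-separated export like 'hi,med,lo' into a JSON array string."""
    # Split on commas, trim whitespace, and drop empty entries.
    items = [part.strip() for part in raw.split(",") if part.strip()]
    # Compact separators give ["hi","med","lo"] without spaces after the commas.
    return json.dumps(items, separators=(",", ":"), ensure_ascii=False)


print(categories_to_germinate("hi,med,lo"))
```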

Proposed Solution

Could Germinate check traits during upload to see whether the name, data type, and categories (if applicable) match an existing trait, and merge them automatically if they do? If no match is found, it would then upload the trait as a new one.
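The proposal could look roughly like the sketch below. It is only an illustration of the idea, not Germinate code: the `Trait` class, the `find_or_create` function, and the in-memory list standing in for the database are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trait:
    name: str
    data_type: str
    categories: tuple[str, ...] = ()  # empty for non-categorical traits


def find_or_create(incoming: Trait, existing: list[Trait]) -> tuple[Trait, bool]:
    """Return (trait, created). Reuse an existing trait when all fields match."""
    for trait in existing:
        # Dataclass equality compares name, data type and categories together.
        if trait == incoming:
            return trait, False
    existing.append(incoming)
    return incoming, True


database = [Trait("Vigor", "categorical", ("hi", "med", "lo"))]
_, created = find_or_create(Trait("Vigor", "categorical", ("hi", "med", "lo")), database)
print(created, len(database))
```

Uploading the same trait twice reuses the stored definition, while a trait with the same name but different categories would be added as a new one.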

@sebastian-raubach
Member

Germinate will check for exact matches on the trait name, short name, description, data type, unit and restrictions.
If no exact match exists, a new one is created. This is a very strict check and I can see this being relaxed a bit, but we need to be careful not to match on too few fields. A trait with a unit of "kg" should not be handled the same as a trait with the same name, but a unit of "g".
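To make the "kg" versus "g" point concrete, here is a small sketch of a strict match key built from the six fields listed above. The dictionary keys and the `match_key` helper are illustrative names only and do not reflect Germinate's actual schema or code.

```python
# Hypothetical field names standing in for the six attributes that are compared.
MATCH_FIELDS = ("name", "short_name", "description", "data_type", "unit", "restrictions")


def match_key(trait: dict) -> tuple:
    """Build the tuple of values that must all be equal for two traits to match."""
    return tuple(trait.get(field) for field in MATCH_FIELDS)


yield_kg = {"name": "Yield", "data_type": "numeric", "unit": "kg"}
yield_g = {"name": "Yield", "data_type": "numeric", "unit": "g"}

# Same name and data type, but the unit differs, so the keys differ.
print(match_key(yield_kg) == match_key(yield_g))
```

Dropping `unit` from `MATCH_FIELDS` would make these two traits match, which is the kind of over-relaxation to avoid.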

Having the user pick matches during the data upload would not fit into the current way data templates are uploaded and checked.

I think this needs some fine-tuning to make sure that at least data imported into GridScore from Germinate via BrAPI will not create new trait definitions when uploading data templates exported from GridScore at a later date.

@sebastian-raubach added the bug and enhancement labels on Apr 4, 2023
@sebastian-raubach changed the title from "Issue: Trait Auto-Merging" to "Data upload trait mismatch/duplication" on Apr 4, 2023
@BoMeyering
Author

I agree, this should be a strict check. I just don't know the workaround for it in the meantime.
If it checks all of those data fields, then that should be sufficient. However, I guess the problem we are running into is that when we import the traits into GridScore via BrAPI, it only imports the trait name and data type (and categories if applicable), but not the short name or the description. I'll have to do some tests to see if adding those back into the upload template will allow the traits to merge. It would be helpful to have those other fields auto-populate the upload template when exporting data from GridScore.

@sebastian-raubach
Member

I think they aren't sent because the BrAPI definition of a variable doesn't have the short name and description. Looking at the spec, though, the variable contains a trait, which has them, so there may be potential to submit them properly.
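A rough sketch of what reading those fields from the nested trait could look like is below. The field names (`observationVariableName`, `trait.mainAbbreviation`, `trait.traitDescription`) are my reading of the BrAPI v2 observation variable object and should be checked against the spec; mapping the abbreviation to Germinate's short name is also an assumption, and `trait_metadata` is a hypothetical helper.

```python
def trait_metadata(variable: dict) -> dict:
    """Pull name, short name and description out of a BrAPI-style variable dict."""
    # The nested trait object may be absent or null, so fall back to an empty dict.
    trait = variable.get("trait") or {}
    return {
        "name": variable.get("observationVariableName"),
        "short_name": trait.get("mainAbbreviation"),    # assumed mapping
        "description": trait.get("traitDescription"),   # assumed mapping
    }


example = {
    "observationVariableName": "Plant height",
    "trait": {"mainAbbreviation": "PH", "traitDescription": "Height of the plant"},
}
print(trait_metadata(example))
```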

@BoMeyering
Author

Oh, that makes sense. I didn't realize the API doesn't pull in the other fields. Yeah, the BrAPI (and Crop Ontology, for that matter) specs for these variables are a little different from how traits are defined in Germinate.

@BoMeyering
Author

Also, this gets complicated even when uploading traits that have exactly the same information. I just uploaded three test datasets for the same trial, with data taken on different dates for two traits. The traits had no short name or description, just the name, data type and units. Germinate still uploads them as separate traits that need to be unified.
[image]
It seems like these kinds of exact matches should be merged too, even when some information is missing.
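One way to express "merge even when some information is missing" is to treat absent, null and blank values as the same thing before comparing. The sketch below only illustrates that idea; whether this is what goes wrong in Germinate is not established here, and `normalize`, `same_trait` and the field names are hypothetical.

```python
FIELDS = ("name", "short_name", "description", "data_type", "unit")


def normalize(value):
    """Map None, empty and whitespace-only values to None; trim everything else."""
    if value is None:
        return None
    text = str(value).strip()
    return text or None


def same_trait(a: dict, b: dict, fields=FIELDS) -> bool:
    """Compare two traits field by field after normalizing missing values."""
    return all(normalize(a.get(f)) == normalize(b.get(f)) for f in fields)


first = {"name": "Plant height", "data_type": "numeric", "unit": "cm", "short_name": ""}
second = {"name": "Plant height", "data_type": "numeric", "unit": "cm"}
print(same_trait(first, second))
```

A blank short name in one upload and a missing one in the next then count as equal, while a differing unit still keeps the traits apart.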

2 participants