Confusing documentation regarding SNP input files #11

Open
Balthasar-eu opened this issue Apr 28, 2021 · 1 comment
@Balthasar-eu

Hi,

I am interested in the tool, but find parts of the documentation somewhat confusing.

Could you maybe clarify the difference between these options:

--siteVCF
--candidateVCF
--predefinedVCF
--dbSNP
--dbsnpVCF
--SVDPrefix
--RefVCF
--RefVCFList

And the difference between:
--callableRegion
and
--regionList

siteVCF (for candidate sites in the wiki) and predefinedVCF (for predefined sites) are used in the index function, so I assume they contain the SNPs used to build the reduced region described in the paper, which is also filtered by either callableRegion or regionList. But what exactly is the difference between candidate and predefined sites, and what is the difference between the target region and the callable region?

In the main Readme file you use dbSNP; in the wiki you have dbsnpVCF as an option. Is that a typo, or are there two options? If there is only one option, how is this VCF file handled differently from the candidate or predefined sites?

Also in the examples the candidateVCF option is used, which is not found in the wiki.
The same goes for RefVCFList.

I would appreciate it if you could take the time to explain what each option does.

@Griffan
Owner

Griffan commented Apr 30, 2021

I will update the readme later to better describe the differences. For now, here is a brief description.

First of all, there are two layers of command-line arguments: one for the wrapper script FASTQuick.sh, the other for the underlying binary executables.

It is recommended to stick with the FASTQuick.sh arguments, which are a simplified version of the binary executables' arguments. If you are interested in the binary executables, you can always run "$program_name --help" for a detailed description.

Now, back to your question:

--siteVCF specifies the list of variants to choose from; it is the "variant pool".
--predefinedVCF specifies the exact list of variants you want, which could be the variants chosen in a previous round.

--candidateVCF specifies a variant list that is passed to either --siteVCF or --predefinedVCF, depending on the --step argument. Unless they have customized tasks, regular users don't need to interact with --siteVCF or --predefinedVCF directly.

--dbSNP specifies the dbSNP variant list; it tells the program whether a given variant is catalogued in the dbSNP database.

--SVDPrefix specifies the prefix of the SVD resource files. These can be any version of the resource files from the resource directory in the repo, or the output (resource files, SVD files) of a previous round.

--RefVCF (single file) and --RefVCFList (multiple files) specify the reference genotype VCF panel, ideally something like the 1000 Genomes Project VCFs, which provides the genotype matrix for the SVD step.

--callableRegion indicates which regions are easy to align to and call genotypes in, based on the reference genome content.

--regionList indicates which regions the user wants, or which regions the dataset is limited to.
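
To make the pool/predefined and callable/target distinctions concrete, here is a minimal conceptual sketch in Python. It is not FASTQuick's actual code: the function names, the tuple representation of variants and regions, and the 1-based inclusive interval convention are all assumptions made for illustration. How FASTQuick actually picks variants from the pool, and whether it applies region filters to a predefined list, is not described in this thread; the sketch simply assumes a predefined list is taken as given.

```python
from typing import List, Optional, Tuple

Variant = Tuple[str, int]      # (chromosome, position); hypothetical representation
Region = Tuple[str, int, int]  # (chromosome, start, end); 1-based inclusive (assumption)


def in_regions(variant: Variant, regions: List[Region]) -> bool:
    """Return True if the variant falls inside any of the given regions."""
    chrom, pos = variant
    return any(c == chrom and start <= pos <= end for c, start, end in regions)


def choose_from_pool(
    pool: List[Variant],
    callable_regions: List[Region],
    target_regions: Optional[List[Region]] = None,
) -> List[Variant]:
    """Pool mode (siteVCF-like): keep only variants in callable regions and,
    if a target region list is given, also inside the target regions."""
    kept = [v for v in pool if in_regions(v, callable_regions)]
    if target_regions is not None:
        kept = [v for v in kept if in_regions(v, target_regions)]
    return kept


def use_predefined(predefined: List[Variant]) -> List[Variant]:
    """Predefined mode (predefinedVCF-like): take the list exactly as given
    (an assumption of this sketch, not a documented behavior)."""
    return list(predefined)
```

For example, with a pool of `("chr1", 100)`, `("chr1", 500)` and `("chr2", 50)`, a callable region `("chr1", 1, 600)` and a target region `("chr1", 400, 1000)`, only `("chr1", 500)` survives both filters, whereas the predefined path returns all three variants unchanged.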

Hope this answers your question.
