Export to additional population genetics formats #10
Comments
I have written an |
Another format which might be appreciated by plant breeders is HapMap. To the best of my knowledge, there isn't software that can import a HapMap file with polyploid genotypes, but the format has the advantage of being easily comprehensible to the human eye. A tetraploid SNP genotype could for example be represented as |
There are also requests for export to GenoDive and to |
Overview
There has been some interest in using genotypes called by polyRAD for population genetics analysis. Two pieces of software that have long supported polyploid genotypes are Structure and SPAGeDi. Although they were originally designed for relatively small sets of PCR markers, they both scale up to large sequence-based datasets. They each use a custom format, and it would be nice if polyRAD exported to those formats.
I may eventually get to writing these export functions. If you would like to see them (or others), feel free to comment here! If you have some R experience and would like to write the functions yourself, see below. First-time contributors are welcome!
Contributing
You will need to make your own fork of polyRAD, clone that fork to your computer, commit your changes, push those commits to GitHub, and open a pull request. See the GitHub documentation if you need help understanding those tasks.
R code
Name the function Export_Structure or Export_SPAGeDi, as appropriate. Add it to the file R/data_export.R.
The first argument for the function should be a RADdata object named object. There should also be an argument called file indicating the path to write to. Another possible argument might indicate a particular ploidy to use for export; see other export functions, and check the documentation for Structure and SPAGeDi to see if variable ploidy is allowed.
The function will need to call GetProbableGenotypes with omit1allelePerLocus = FALSE and multiallelic = "correct". This will get you a list, the first item of which is a matrix with individuals in rows and alleles in columns, showing the copy number of each allele in each individual. This will then need to be converted to a different format, where each allele is assigned an integer, and each individual has a number of alleles up to its ploidy.
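As a rough illustration of that conversion, here is a minimal sketch for a single locus. The matrix below is made-up data standing in for the first list item returned by GetProbableGenotypes; the object names (geno, codes, coded) are hypothetical, and the sketch assumes every individual is tetraploid with no missing genotypes.

```r
# Toy copy-number matrix for ONE locus with three alleles (made-up data).
# Rows are tetraploid individuals; each row sums to the ploidy (4).
geno <- matrix(c(2L, 2L, 0L,
                 0L, 1L, 3L,
                 4L, 0L, 0L),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("ind1", "ind2", "ind3"),
                               c("loc1_A", "loc1_B", "loc1_C")))

# Assign each allele (column) an integer code, starting from 1.
codes <- seq_len(ncol(geno))

# For each individual, repeat each allele code by its copy number.
# This only yields a matrix because every row sums to the same ploidy.
coded <- t(apply(geno, 1, function(copies) rep(codes, times = copies)))
colnames(coded) <- paste0("copy", seq_len(ncol(coded)))
print(coded)
```

Here ind1, with two copies each of the first and second alleles, becomes 1 1 2 2, and ind3 becomes 1 1 1 1. A real export function would also need to handle missing genotypes and, if the target format allows it, individuals of differing ploidy.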
Your function will make use of object$alleles2loc in order to group the columns of the matrix output by GetProbableGenotypes into loci, which can then be processed one at a time in an R loop.
See the MakeGTstrings internal function in src/PrepVCFexport.cpp. For SPAGeDi in particular, this function could simply have an argument added to it to start numbering from 1 rather than 0. For Structure, MakeGTstrings should probably not be used directly but can provide some inspiration for what needs to be done.
See the Structure or SPAGeDi documentation for details on the file format needed. Your function should export a text file in that format.
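The grouping of matrix columns into loci described above might be sketched as follows. This is not the loop from the original issue. It uses made-up stand-ins: geno mimics the copy-number matrix, and alleles2loc mimics object$alleles2loc under the assumption that it is an integer vector giving the locus index of each column. The fixed ploidy and the use of NA for missing genotypes are also assumptions; the missing-data code actually required depends on the target file format.

```r
# Toy stand-ins for real polyRAD objects (made-up data).
geno <- matrix(c(2L, 2L, 0L, 1L, 3L,
                 0L, 1L, 3L, 4L, 0L),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("ind1", "ind2"), NULL))
alleles2loc <- c(1L, 1L, 1L, 2L, 2L)  # columns 1-3 are locus 1, 4-5 are locus 2
ploidy <- 4L                          # assumed constant across individuals

loci <- sort(unique(alleles2loc))
coded <- vector("list", length(loci))

for (i in seq_along(loci)) {
  # Columns of the genotype matrix that belong to this locus
  cols <- which(alleles2loc == loci[i])
  sub <- geno[, cols, drop = FALSE]

  # One row per individual, 'ploidy' columns; NA marks a missing genotype
  out <- matrix(NA_integer_, nrow = nrow(sub), ncol = ploidy,
                dimnames = list(rownames(sub), NULL))
  for (ind in seq_len(nrow(sub))) {
    copies <- sub[ind, ]
    if (!anyNA(copies) && sum(copies) == ploidy) {
      # Alleles are numbered from 1 within the locus
      out[ind, ] <- rep(seq_along(cols), times = copies)
    }
  }
  coded[[i]] <- out
}
```

Each element of coded is then an individuals-by-ploidy matrix of integer allele codes for one locus, which can be pasted together or written out column by column as the target format requires.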
Test the function on exampleRAD after running IterateHWE. Install Structure or SPAGeDi on your computer and see if it can import the file.
Documentation and integration
Edit man/ExportGAPIT.Rd to document your function.
Add your function to the export call in NAMESPACE.
Mention your function in README.md and in vignettes/polyRADtutorial.Rmd.
Run vignettes/render_vignettes.R, with vignettes as the working directory, to recompile the vignette.
Add yourself to the DESCRIPTION file! Use the "ctb" role code in Authors@R and Author.
Run R CMD build and R CMD check to make sure the package builds and passes checks.