Applying polygenic scores (PGS) on imputed genotypes
- command line program (works on Linux or macOS)
- supports vcf.gz files (imputed or genotyped)
- supports different filters (e.g. r2 or variant list)
- supports PGS Catalog format (https://www.pgscatalog.org, currently over 2,000 scores)
- creates an interactive html report
- supports liftover of score files
- supports converting rsIDs to positions
- Download pgs-calc-*.tar.gz from the latest release
- Extract the downloaded archive (e.g. tar -xf pgs-calc-*.tar.gz)
- Validate the installation with pgs-calc --version
Applying polygenic scores (PGS) on imputed genotypes:
pgs-calc apply --ref PGS000018 --out PGS000018.scores.txt chr*.dose.noID.vcf.gz --report-html PGS000018.html
The weights for score PGS000018 are downloaded automatically from PGSCatalog and all scores are written to file PGS000018.scores.txt. An interactive HTML report is written to PGS000018.html.
--ref <file(s) or PGS-ID>
- Score file with weights or a PGS ID. Multiple scores are separated by , (e.g. score1.txt.gz,score2.txt.gz or PGS000018,PGS000027)
--out <file>
- Output file name
--minR2 <value>
- Use only variants with an imputation quality (R2) >= <value>
--writeVariants <file>
- Writes csv file with all variants used in calculation
--includeVariants <file>
- Restrict calculation to use only variants from this csv file
--genotypes GT|DS
- Use genotypes or dosage
--report-html <file>
- Creates an interactive html report. The report includes summary statistics (like coverage) for each score and can be filtered by e.g. id or trait.
--samples
- Restrict calculation to use only samples from this csv file
--meta <file>
- Use this meta file to annotate scores
- VCF file format (*.vcf and *.vcf.gz)
- one VCF file per chromosome (e.g. output of Imputationserver)
- works out of the box with imputed genotypes from Michigan Imputation Server
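Since one VCF file per chromosome is expected, a quick pre-flight check can reveal missing files before a long run. The following is a minimal sketch, not part of pgs-calc: the helper name missing_chromosomes, the restriction to autosomes 1-22, and the <prefix><chr><suffix> naming pattern are all assumptions to adapt to your data.

```shell
# Hypothetical helper (not part of pgs-calc): print every autosome (1-22)
# for which no file named <prefix><chr><suffix> exists.
missing_chromosomes() {
  prefix="$1"
  suffix="$2"
  for chr in $(seq 1 22); do
    if [ ! -f "${prefix}${chr}${suffix}" ]; then
      echo "$chr"
    fi
  done
}

# Example: list which of test.chr1.vcf.gz ... test.chr22.vcf.gz are absent.
missing_chromosomes "test.chr" ".vcf.gz"
```

An empty output means all 22 files were found.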
pgs-calc supports PGSCatalog out of the box: open the website, find your score of interest and download the provided txt.gz files.
As pgs-calc works with chromosomal positions and not with marker ids, the following requirements must be fulfilled:
- The build of your genotypes and the build of the score must be the same. If the score is on a different build, you can use the pgs-calc resolve command to lift the score file over to the build of the genotypes.
- The score file needs chr_name and chr_position columns. If only rsID is present, you need to set the parameter --dbsnp to the correct index so that rsIDs are converted on the fly to chromosomal positions. Choose the dbsnp-index that matches the build of your genotypes (hg19 or hg38); the download link is listed at the end of this document.
- The column other_allele is mandatory to handle multi-allelic variants in a unified way.
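To see which of these requirements a given score file meets, it can help to look at its column header. The sketch below builds a tiny demo file (made-up values) and inspects it. It assumes that metadata lines start with # and that the first remaining line is the tab-separated column header; the file name demo_score.txt.gz is hypothetical.

```shell
# Build a tiny demo score file so the example is self-contained (values are made up).
{
  printf '#pgs_id=DEMO\n'
  printf 'rsID\teffect_allele\tother_allele\teffect_weight\n'
  printf 'rs123\tA\tG\t0.12\n'
} | gzip -c > demo_score.txt.gz

# Take the first non-comment line (the column header) and put one column name per line.
header="$(gzip -dc demo_score.txt.gz | grep -v '^#' | head -n 1 | tr '\t' '\n')"

# Decide whether positions are present or whether --dbsnp would be needed.
if echo "$header" | grep -qx 'chr_name' && echo "$header" | grep -qx 'chr_position'; then
  result="positions present"
elif echo "$header" | grep -qx 'rsID'; then
  result="only rsID present: --dbsnp needed"
else
  result="neither positions nor rsID found"
fi
echo "$result"
```

For the demo file this reports that only rsID is present, so the --dbsnp parameter would be required.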
If you want to create your own weight files, you need a tab-delimited text file with the following columns:
chr_name chr_position effect_allele other_allele effect_weight
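As an illustration, a weight file with these columns could be written as follows. This is a sketch only: the file name my_score.txt and all positions, alleles and weights are made-up placeholder values.

```shell
# Write the header and two example variants, separated by tabs (values are made up).
printf 'chr_name\tchr_position\teffect_allele\tother_allele\teffect_weight\n' > my_score.txt
printf '1\t100000\tA\tG\t0.12\n' >> my_score.txt
printf '2\t200000\tT\tC\t-0.05\n' >> my_score.txt

# Sanity check: every line should have exactly five tab-separated fields.
if awk -F'\t' 'NF != 5 { bad = 1 } END { exit bad }' my_score.txt; then
  echo "my_score.txt has five columns on every line"
fi
```

The resulting file can then be passed to --ref in place of a PGS Catalog file.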
Apply PGS to a single file (e.g. one chromosome):
pgs-calc apply --ref PGS000018.txt.gz test.chr1.vcf.gz --out scores.txt
All scores are written to file scores.txt.
Apply PGS to multiple files (e.g. multiple chromosomes):
pgs-calc apply --ref PGS000018.txt.gz test.chr1.vcf.gz test.chr2.vcf.gz test.chr3.vcf.gz test.chr4.vcf.gz --out scores.txt
Apply PGS to multiple files by using file patterns:
pgs-calc apply --ref PGS000018.txt.gz test.chr*.vcf.gz --out scores.txt
Apply multiple score files:
pgs-calc apply --ref PGS000018.txt.gz,PGS000027.txt.gz test.chr*.vcf.gz --out scores.txt
You can also create a file scores_filenames.txt that lists all paths to your score files:
scores
PGS000018.txt.gz
PGS000027.txt.gz
pgs-calc apply --ref scores_filenames.txt test.chr*.vcf.gz --out scores.txt
Attention: All paths inside the file are relative to the location of the file itself.
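To make the relative-path rule concrete, the sketch below places the list file in a subdirectory and prints the paths its entries would refer to. The directory name pgs and the helper resolve_entries are hypothetical, and treating the first line ("scores") as a header to skip is an assumption based on the listing above.

```shell
# Hypothetical layout: the list file lives in ./pgs next to the score files it names.
mkdir -p pgs
printf 'scores\nPGS000018.txt.gz\nPGS000027.txt.gz\n' > pgs/scores_filenames.txt

# Resolve each entry against the directory that contains the list file.
# Assumption: the first line ("scores") is a header and is skipped.
resolve_entries() {
  list="$1"
  base="$(dirname "$list")"
  tail -n +2 "$list" | while IFS= read -r entry; do
    echo "${base}/${entry}"
  done
}

resolve_entries pgs/scores_filenames.txt
```

With this layout the entries refer to pgs/PGS000018.txt.gz and pgs/PGS000027.txt.gz, regardless of the directory from which pgs-calc is started.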
Use only variants with an imputation quality (R2) >= 0.9:
pgs-calc apply --ref PGS000018.txt.gz test.chr*.vcf.gz --minR2 0.9 --out scores.txt
If a PGS id is provided, pgs-calc downloads the file from PGSCatalog automatically:
pgs-calc apply --ref PGS000018 test.chr1.vcf.gz --out scores.txt
All scores are written to file scores.txt.
You can also use the download command to download a specific PGS id:
pgs-calc download PGS000018 --out PGS000018.txt.gz
The weights are saved in file PGS000018.txt.gz.
If the --dbsnp parameter is set, pgs-calc automatically converts all rsIDs to their positions on the fly. Choose the dbsnp index that matches the build of your genotypes (hg19 or hg38); the download link is listed at the end of this document.
pgs-calc apply --ref PGS002297 test.chr1.vcf.gz --out scores.txt --dbsnp dbsnp154_hg19.txt.gz
All scores are written to file scores.txt.
The build of your genotypes and the score must be the same. If the score is on a different build, you can use the pgs-calc resolve command to lift over the score file to the build of the genotypes. You need a dbsnp-index file and a chain file.
pgs-calc resolve --in PGS002297 --out PGS002297.hg38.txt.gz --dbsnp dbsnp154_hg38.txt.gz --chain hg19_to_hg38.over.chain.gz
The new positions are written to file PGS002297.hg38.txt.gz, and this file can then be used by pgs-calc apply.
- dbsnp-index files to resolve rsIDs: https://imputationserver.sph.umich.edu/resources/chain/
- Chain files: https://imputationserver.sph.umich.edu/resources/chain/
Lukas Forer, Institute of Genetic Epidemiology, Medical University of Innsbruck