VariantCalling

This package provides a set of tools to assist with variant calling on Ultima data. The best practice pipeline is published here. The code in this repository is used mostly in the post-GATK filtering step.

In addition, the code provides:

Tools to evaluate a callset against the ground truth.
Tools to build a database of noisy locations (SEC) and filter a callset against it (still undocumented).
A set of tools for MRD (minimal residual disease) (still undocumented).
Tools for germline CNV calling: germline_cnv_calling

Setup

Make sure git-lfs is installed on your system if you want to clone test resources along with the code (https://git-lfs.github.com/)
Clone VariantCalling repository to e.g. software/VariantCalling
Create the conda environments:
- conda env create -f setup/environment.yml
- conda env create -f setup/other_envs/ucsc.yml
- conda env create -f setup/other_envs/cutadapt.yml
- conda env create -f setup/other_envs/cnmops.yml
Activate the main conda environment
- conda activate genomics.py3
Install the package
- cd software/VariantCalling
- pip install .
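
Before creating the environments, it can help to confirm that the command-line tools these steps rely on are on your PATH. The following is a minimal sketch and not part of the package; the function name missing_prerequisites and the default list of tools are assumptions made for illustration.

```python
import shutil


def missing_prerequisites(tools=("git", "git-lfs", "conda")):
    """Return the names of command-line tools that are not found on PATH.

    The default tool list is an assumption based on the setup steps above.
    """
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_prerequisites()
if missing:
    print("Missing tools:", ", ".join(missing))
else:
    print("All prerequisite tools were found on PATH.")
```

Note that in some installations conda is available only as a shell function, in which case this check may report it as missing even though the command works in your shell.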

Using ugvc package

Run through cli

To get a list of available cli tools:

python /path/to/ugvc

To run a specific tool:

python /path/to/ugvc <tool_name> <args>
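
The pattern of one entry point that either lists tools or forwards arguments to a named tool can be sketched as follows. This is a generic illustration only and not the actual contents of ugvc's __main__.py; the tool names say_hello and count_args, the TOOLS registry and the dispatch function are all hypothetical.

```python
# Hypothetical stand-ins for real tools; each receives the remaining arguments.
def say_hello(argv):
    return "hello " + " ".join(argv)


def count_args(argv):
    return str(len(argv))


# Hypothetical registry mapping a tool name to the function that runs it.
TOOLS = {"say_hello": say_hello, "count_args": count_args}


def dispatch(argv):
    """Run the tool named by argv[0], or list the available tools if argv is empty."""
    if not argv:
        return "Available tools:\n" + "\n".join(sorted(TOOLS))
    tool_name, tool_args = argv[0], argv[1:]
    if tool_name not in TOOLS:
        raise SystemExit(f"Unknown tool: {tool_name}")
    return TOOLS[tool_name](tool_args)


print(dispatch([]))                      # lists the registered tool names
print(dispatch(["count_args", "a", "b"]))  # forwards ["a", "b"] to count_args
```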

Run individual tools not through CLI

coverage_analysis:
	Run full coverage analysis of an aligned bam/cram file

evaluate_concordance:
	Calculate precision and recall for compared HDF5

filter_variants_pipeline:
	POST-GATK variant filtering

run_comparison_pipeline:
	Concordance between VCF and ground truth

train_models_pipeline:
	Train filtering models
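
For reference, precision and recall follow the standard definitions based on counts of true positives (TP), false positives (FP) and false negatives (FN). The sketch below illustrates those definitions only; it is not the implementation in evaluate_concordance, and the function name and the example counts are made up for illustration.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from true-positive, false-positive and false-negative counts.

    precision = TP / (TP + FP): the fraction of called variants that are correct.
    recall    = TP / (TP + FN): the fraction of true variants that were called.
    A metric is reported as NaN when its denominator is zero.
    """
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    return precision, recall


# Made-up counts: 90 correct calls, 10 spurious calls, 30 missed variants.
print(precision_recall(90, 10, 30))  # (0.9, 0.75)
```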

Documentation of individual tools:

Train post-calling model: train_models_pipeline
Filter callset using pre-trained ML model: filter_variants_pipeline
Compare callset to ground truth: run_comparison_pipeline
Coverage bias analyses: coverage_analysis
Evaluation of compared callsets: evaluate_concordance

Howtos

Test

Recommended way to run tests for external users

./run_tests.sh

This script validates that the test resources were cloned correctly, and only then runs the tests.
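
One way such a check could work is to look for files that are still git-lfs pointers, which is what a clone without git-lfs leaves in place of the real data. The sketch below is an assumption about how this can be done, not a description of what run_tests.sh actually does; the function names are hypothetical. It relies on the fact that git-lfs pointer files start with the line "version https://git-lfs.github.com/spec/v1".

```python
from pathlib import Path

# First bytes of every git-lfs pointer file, per the git-lfs pointer format.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file starts with the git-lfs pointer header."""
    with open(path, "rb") as handle:
        return handle.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def unfetched_resources(root):
    """Return a sorted list of files under root that are still git-lfs pointers."""
    return sorted(
        str(path)
        for path in Path(root).rglob("*")
        if path.is_file() and is_lfs_pointer(path)
    )


# Example usage (the directory name is an assumption):
# unfetched_resources("test/resources")
```

An empty result suggests the resources were fetched; any listed file would still need to be fetched with git-lfs.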

Run all tests

python -m pytest

Note that test_db_access requires your machine to have access credentials for MongoDB. To skip this test, run:

python -m pytest --ignore test/unit/test_db_access.py

Run unit-tests

python -m pytest test/unit

Run system-tests

python -m pytest test/system

Git-lfs

Whenever committing a data file to the repo, check that its suffix is tracked by git-lfs in .gitattributes. If not, add the new suffix to the .gitattributes file before adding the data file and committing it. Also make sure to commit .gitattributes itself.

git-lfs track "*.new_suffix"
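
The check described above can be sketched in Python. This is a simplified illustration, not part of the package: the function names are hypothetical, and it matches only the file's base name with fnmatch, which approximates but does not fully reproduce git's own pattern matching. It assumes tracked entries carry the filter=lfs attribute, as in the lines written by git-lfs track.

```python
import fnmatch
import posixpath


def lfs_patterns(gitattributes_text):
    """Return the patterns in .gitattributes content that carry filter=lfs."""
    patterns = []
    for line in gitattributes_text.splitlines():
        fields = line.split()
        # Skip blank lines and comments; keep patterns with the lfs filter.
        if len(fields) >= 2 and not fields[0].startswith("#") and "filter=lfs" in fields[1:]:
            patterns.append(fields[0])
    return patterns


def is_tracked_by_lfs(filename, gitattributes_text):
    """Return True if the file's base name matches any git-lfs tracked pattern."""
    name = posixpath.basename(filename)
    return any(fnmatch.fnmatchcase(name, p) for p in lfs_patterns(gitattributes_text))


# Made-up .gitattributes content for illustration.
example = "*.bam filter=lfs diff=lfs merge=lfs -text\n*.py text\n"
print(is_tracked_by_lfs("test/resources/sample.bam", example))  # True
print(is_tracked_by_lfs("test/resources/sample.vcf", example))  # False
```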

Development guidelines

Always develop on a branch, not on master
Public functions/classes should be tested, using either pytest or unittest syntax
Commit and push your changes to that branch on the remote repo
Open a pull request through GitHub
1. Add at least one code reviewer
2. Wait for CI tests to pass (green V sign)
Scripts that you want to be available on the path should be added to setup.py
Scripts that you want to be available to ugvc should be added to __main__.py
Code changes should pass all pre-commit hooks

How To pre-commit

pre-commit hooks are configured within .pre-commit-config.yaml

Install pre-commit: https://pre-commit.com/#installation

After the pre-commit package is installed, set up the git hook scripts:

pre-commit install
pre-commit install -t pre-commit

After installation, pre-commit runs the hooks on all files changed as part of each commit. The output should look like the example below. Pay particular attention to any red "Failed" items, which you must fix; the commit is allowed only once the hooks pass:

trim trailing whitespace.................................................Passed
fix end of files.........................................................Passed
check yaml...........................................(no files to check)Skipped
check json...........................................(no files to check)Skipped
check for added large files..............................................Passed
pycln....................................................................Passed
isort....................................................................Passed
black....................................................................Passed
flake8...................................................................Passed
pylint...................................................................Passed
[master 9a1a910e] Test pre-commit
 1 file changed, 1 deletion(-)

To run all pre-commit hooks on all files (useful for the initial pre-commit run), use: pre-commit run --all-files

The hooks we use are:

pycln - remove unused import statements

isort - Python utility that sorts imports alphabetically and automatically separates them into sections and by type

black - uncompromising Python code formatter

flake8 - Python style checker that enforces PEP 8

pylint - Python static code analysis tool
