This package provides a set of tools to assist variant calling on Ultima data. The best practice pipeline is published here. The code below is used mostly in the post-GATK filtering step.
In addition, the code provides:
- Tools to evaluate the callset relative to the ground truth.
- Tools to build a database of noisy locations (SEC) and filter the callset relative to them - still undocumented.
- A set of tools for MRD (minimal residual disease) - still undocumented.
- Tools to perform germline CNV calling: germline_cnv_calling
- Make sure git-lfs is installed on your system if you want to clone test resources along with the code (https://git-lfs.github.com/)
- Clone the VariantCalling repository, e.g. to software/VariantCalling
- Create the conda environments:
  conda env create -f setup/environment.yml
  conda env create -f setup/other_envs/ucsc.yml
  conda env create -f setup/other_envs/cutadapt.yml
  conda env create -f setup/other_envs/cnmops.yml
- Activate the main conda environment:
  conda activate genomics.py3
- Install the package:
  cd software/VariantCalling
  pip install .
To get a list of the available CLI tools:
python /path/to/ugvc
To run a specific tool:
python /path/to/ugvc <tool_name> <args>
coverage_analysis:
Run a full coverage analysis of an aligned BAM/CRAM file
evaluate_concordance:
Calculate precision and recall for a compared HDF5 file
filter_variants_pipeline:
Post-GATK variant filtering
run_comparison_pipeline:
Concordance between VCF and ground truth
train_models_pipeline:
Train filtering models
- Train post-calling model: train_models_pipeline
- Filter callset using pre-trained ML model: filter_variants_pipeline
- Compare callset to ground truth: run_comparison_pipeline
- Coverage bias analyses: coverage_analysis
- Evaluation of compared callsets: evaluate_concordance
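As a reminder of what "precision and recall" mean when a callset is evaluated against a ground truth, here is a minimal sketch using the standard definitions. It is not the implementation used by evaluate_concordance; the function name `precision_recall` and the counts are hypothetical and chosen only for illustration.

```python
from typing import Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) from variant counts.

    tp: calls that match the ground truth (true positives)
    fp: calls that are absent from the ground truth (false positives)
    fn: ground-truth variants that were not called (false negatives)
    """
    # Guard against empty denominators so an empty callset does not crash.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts, for illustration only.
precision, recall, f1 = precision_recall(tp=90, fp=10, fn=30)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Filtering a callset typically trades recall for precision, which is why both numbers are reported together.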
./run_tests.sh
This script validates that the test resources were cloned correctly, and only then runs the tests.
python -m pytest
Note that test_db_access requires your machine to have access credentials for MongoDB. To skip this test, run:
python -m pytest --ignore test/unit/test_db_access.py
python -m pytest test/unit
python -m pytest test/system
Whenever committing a data file to the repo, check that its suffix is tracked by git-lfs in .gitattributes. If not, add the new suffix to the .gitattributes file before adding the data file and committing it. Also make sure to commit .gitattributes itself.
git-lfs track "*.new_suffix"
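The check described above can be sketched in Python. This is a hypothetical helper, not part of the package: it assumes .gitattributes contains lines in the form that `git lfs track` writes (a pattern followed by `filter=lfs ...`), and it only handles simple basename patterns such as `*.suffix`, not the full gitattributes matching rules.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath
from typing import List


def lfs_patterns(gitattributes_text: str) -> List[str]:
    """Collect the patterns that are routed through the LFS filter."""
    patterns = []
    for line in gitattributes_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if "filter=lfs" in fields[1:]:
            patterns.append(fields[0])
    return patterns


def is_lfs_tracked(filename: str, gitattributes_text: str) -> bool:
    """True if the file's basename matches one of the LFS patterns."""
    name = PurePosixPath(filename).name
    return any(fnmatchcase(name, p) for p in lfs_patterns(gitattributes_text))


# Hypothetical .gitattributes content, for illustration only.
example = (
    "*.bam filter=lfs diff=lfs merge=lfs -text\n"
    "*.h5 filter=lfs diff=lfs merge=lfs -text\n"
)
print(is_lfs_tracked("test/resources/sample.bam", example))
print(is_lfs_tracked("notes.txt", example))
```

In a real checkout you would read the text with `Path(".gitattributes").read_text()` before calling `is_lfs_tracked`.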
- Always develop on a branch, not on master
- Public functions/classes should be tested, using either pytest or unittest syntax
- Commit and push your changes to that branch on the remote repo
- Open a pull request through GitHub
- Add at least one code reviewer
- Wait for CI tests to pass (green V sign)
- Scripts that you want to be available on the path should be added to setup.py
- Scripts that you want to be available to ugvc should be added to __main__.py
- Code changes should pass all pre-commit hooks
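To illustrate the guideline that public functions should be tested with either pytest or unittest syntax, here is a minimal sketch showing both styles side by side. The function `is_snp` is a hypothetical helper invented for this example and is not part of the package.

```python
import unittest


def is_snp(ref: str, alt: str) -> bool:
    """Hypothetical helper: True when both alleles are single, different bases."""
    return len(ref) == 1 and len(alt) == 1 and ref != alt


# pytest style: a plain function whose name starts with "test_", using bare asserts.
def test_is_snp_pytest_style():
    assert is_snp("A", "T")
    assert not is_snp("A", "AT")  # insertion, not a SNP
    assert not is_snp("A", "A")   # identical alleles


# unittest style: methods on a TestCase subclass, using assert* methods.
class TestIsSnp(unittest.TestCase):
    def test_substitution(self):
        self.assertTrue(is_snp("C", "G"))

    def test_insertion_is_not_snp(self):
        self.assertFalse(is_snp("C", "CG"))
```

pytest collects both styles, so `python -m pytest` runs either kind of test without extra configuration.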
Pre-commit hooks are configured in .pre-commit-config.yaml
To install pre-commit, see: https://pre-commit.com/#installation
After the pre-commit package is installed, you need to set up the git hook scripts:
pre-commit install
pre-commit install -t pre-commit
After installation, pre-commit runs its hooks on every file changed in a commit. Any red "Failed" entries must be fixed; pre-commit verifies the fix before it allows the commit. The output should look like this:
trim trailing whitespace.................................................Passed
fix end of files.........................................................Passed
check yaml...........................................(no files to check)Skipped
check json...........................................(no files to check)Skipped
check for added large files..............................................Passed
pycln....................................................................Passed
isort....................................................................Passed
black....................................................................Passed
flake8...................................................................Passed
pylint...................................................................Passed
[master 9a1a910e] Test pre-commit
1 file changed, 1 deletion(-)
To run all pre-commit hooks on all files (useful for the initial pre-commit run), use: pre-commit run --all-files
pycln - removes unused import statements
isort - sorts imports alphabetically and automatically separates them into sections and by type
black - uncompromising Python code formatter
flake8 - Python style checker for PEP 8
pylint - Python static code analysis tool