
Ultima scripts related to downstream processing of WGS, MRD, methylation analysis






Code style: black

This package provides a set of tools to assist variant calling on Ultima data. The best practice pipeline is published here. The code below is used mostly in the post-GATK filtering step.

In addition, the code provides:

  • Tools for evaluating a callset against the ground truth.
  • Tools for building a database of noisy locations (SEC) and filtering a callset against it (still undocumented).
  • A set of tools for MRD (minimal residual disease) analysis (still undocumented).
  • Tools for germline CNV calling: germline_cnv_calling


  1. Make sure git-lfs is installed on your system if you want to clone test resources along with the code (

  2. Clone VariantCalling repository to e.g. software/VariantCalling

  3. Create the conda environments:

    • conda env create -f setup/environment.yml
    • conda env create -f setup/other_envs/ucsc.yml
    • conda env create -f setup/other_envs/cutadapt.yml
    • conda env create -f setup/other_envs/cnmops.yml
  4. Activate the main conda environment

    • conda activate genomics.py3
  5. Install the package

    • cd software/VariantCalling
    • pip install .
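Before starting the installation steps above, it can help to confirm that the required command-line tools are reachable. The following is a minimal sketch, not part of the package: the helper name missing_tools is hypothetical, and it assumes that git, git-lfs and conda are each exposed as an executable on PATH, which is typical but not guaranteed on every system.

```python
import shutil


def missing_tools(tools=("git", "git-lfs", "conda")):
    """Return the names in `tools` that cannot be found on PATH.

    Hypothetical helper for illustration; the default tool list is an
    assumption based on the installation steps in this README.
    """
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("Missing from PATH:", ", ".join(missing))
else:
    print("All prerequisite tools were found on PATH.")
```

An empty result only shows that the executables exist; it does not check their versions or that the conda environments were created.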

Using ugvc package

Run through cli

To get a list of available cli tools:

python /path/to/ugvc

To run a specific tool:

python /path/to/ugvc <tool_name> <args>
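The two invocations above follow a common dispatcher pattern: with no arguments the entry point lists the tools, and otherwise it forwards the remaining arguments to the named tool. The sketch below illustrates that pattern only; it is not the ugvc implementation, and the tool names example_tool_a and example_tool_b, the TOOLS registry and the dispatch function are all hypothetical.

```python
# Hypothetical tools; the real ugvc tool names and entry points will differ.
def example_tool_a(argv):
    return f"example_tool_a called with {argv}"


def example_tool_b(argv):
    return f"example_tool_b called with {argv}"


# Hypothetical registry mapping a tool name to the function that runs it.
TOOLS = {
    "example_tool_a": example_tool_a,
    "example_tool_b": example_tool_b,
}


def dispatch(argv):
    """Run the tool named by argv[0], or list the tools when argv is empty."""
    if not argv:
        return "Available tools:\n" + "\n".join(sorted(TOOLS))
    tool_name, tool_args = argv[0], argv[1:]
    if tool_name not in TOOLS:
        raise SystemExit(f"Unknown tool: {tool_name}")
    return TOOLS[tool_name](tool_args)


# No arguments: list the tools.
print(dispatch([]))
# Tool name plus arguments: forward the arguments to that tool.
print(dispatch(["example_tool_b", "--input", "sample.vcf"]))
```

In a real entry point the argument list would come from sys.argv[1:] rather than being passed in by hand.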

Run individual tools outside the CLI

	Run full coverage analysis of an aligned bam/cram file

	Calculate precision and recall for compared HDF5

	POST-GATK variant filtering

	Concordance between VCF and ground truth

	Train filtering models
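For readers unfamiliar with the metrics mentioned in the list above, precision and recall follow directly from the counts of true positives (TP), false positives (FP) and false negatives (FN). The sketch below shows only the standard formulas; it does not reproduce how the package's tool reads its HDF5 input, and the function name precision_recall_f1 is hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from raw counts.

    precision = TP / (TP + FP): fraction of called variants that are correct.
    recall    = TP / (TP + FN): fraction of true variants that were called.
    Empty denominators yield 0.0 instead of raising ZeroDivisionError.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Illustrative counts only, not results from any real callset.
precision, recall, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```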

Documentation of individual tools:



Recommended way to run tests for external users


This script will validate that test resources were correctly cloned, and only then run tests

Run all tests

python -m pytest

Note that test_db_access requires your machine to have access credentials for MongoDB. To ignore this test, run:

python -m pytest --ignore test/unit/

Run unit-tests

python -m pytest test/unit

Run system-tests

python -m pytest test/system


Whenever committing a data file to the repo, check that its suffix is tracked by git-lfs in .gitattributes. If it is not, add the new suffix to the .gitattributes file before adding and committing the data file. Also make sure to commit .gitattributes itself.

git-lfs track "*.new_suffix"
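The track command records the pattern in .gitattributes as a line carrying the filter=lfs attribute, so the check described above can be automated. This is a minimal sketch with hypothetical helper names (lfs_patterns, is_lfs_tracked). It uses fnmatch on the file's base name, which approximates git's pattern matching for simple suffix patterns such as *.bam but does not cover every gitattributes rule.

```python
import os
from fnmatch import fnmatch


def lfs_patterns(gitattributes_text):
    """Return the patterns that carry filter=lfs in .gitattributes content."""
    patterns = []
    for line in gitattributes_text.splitlines():
        fields = line.split()
        # Skip blank lines and comments.
        if not fields or fields[0].startswith("#"):
            continue
        if "filter=lfs" in fields[1:]:
            patterns.append(fields[0])
    return patterns


def is_lfs_tracked(filename, gitattributes_text):
    """Approximate check: does any LFS pattern match the file's base name?"""
    base = os.path.basename(filename)
    return any(fnmatch(base, pattern) for pattern in lfs_patterns(gitattributes_text))


# Example .gitattributes content, invented for illustration.
example = "*.bam filter=lfs diff=lfs merge=lfs -text\n# a comment\n*.py text\n"
print(is_lfs_tracked("test/resources/sample.bam", example))
print(is_lfs_tracked("test/resources/sample.new_suffix", example))
```

In practice the text would be read from the repository's own .gitattributes file before staging a new data file.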

Development guidelines

  1. Always develop on a branch, not on master
  2. Public functions/classes should be tested, using either pytest or unittest syntax
  3. Commit and push your changes to that branch on the remote repo
  4. Open a pull request through GitHub
    1. Add at least one code reviewer
    2. Wait for CI tests to pass (green V sign)
  5. Scripts that you want to be available on the path should be added to
  6. Scripts that you want to be available to ugvc should be added to
  7. Code changes should pass all pre-commit hooks
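Guideline 2 allows either pytest or unittest syntax. The sketch below shows both styles side by side for one function. The function reverse_complement is a hypothetical stand-in written for this example, not a function from the package.

```python
import unittest


def reverse_complement(seq):
    """Hypothetical public function, used only to illustrate the two test styles."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


# pytest style: a plain function whose name starts with test_ and uses assert.
def test_reverse_complement():
    assert reverse_complement("ACGTN") == "NACGT"


# unittest style: a TestCase subclass using assertion methods.
class TestReverseComplement(unittest.TestCase):
    def test_lowercase_input(self):
        self.assertEqual(reverse_complement("aacg"), "CGTT")
```

pytest collects both styles, so either can be run with python -m pytest.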

How To pre-commit

pre-commit hooks are configured within .pre-commit-config.yaml


After the pre-commit package is installed, set up the git hook scripts:

pre-commit install
pre-commit install -t pre-commit

Once installed, the pre-commit hooks run on every file changed as part of a commit. The output should look like the example below. Pay particular attention to any red "Failed" entries: you must fix them, and pre-commit verifies the fix before it allows the commit.

trim trailing whitespace.................................................Passed
fix end of files.........................................................Passed
check yaml...........................................(no files to check)Skipped
check json...........................................(no files to check)Skipped
check for added large files..............................................Passed
[master 9a1a910e] Test pre-commit
 1 file changed, 1 deletion(-)

To run all pre-commit hooks on all files (useful for the initial pre-commit run), use: pre-commit run --all-files

The hooks we use are:

pycln - remove unused import statements

isort - Python utility that sorts imports alphabetically and automatically separates them into sections and by type

black - uncompromising Python code formatter

flake8 - Python linter that checks code against the PEP 8 style guide

pylint - Python static code analysis tool