MyVCF2 is a tool for storing, loading, querying and analysing mutational data (SNPs and indels) from VCF files. It implements a simple Django interface backed by an sqlite3 database, with secure access (login). VCF files are stored in the database and can be opened and browsed once the user has logged in to the application. The tool can be deployed either locally or on a dedicated server (see deployment).
This tool is a modified version of the code present in https://github.com/apietrelli/myVCF/
The tool requires well-formed VCF files that have been annotated with Annovar, VEP or snpEff. If multiple samples are available, it is recommended to merge them into a single VCF file. The tool will recognize the different samples and show them as separate columns.
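As an illustration of how samples can be recognized in a merged file, the sketch below reads the sample names from the `#CHROM` header line, where the VCF format places them after the nine fixed columns. The function name `sample_names` is hypothetical and is not taken from the tool's code.

```python
def sample_names(header_lines):
    """Return the sample names listed on the #CHROM header line of a VCF.

    Hypothetical helper for illustration. In a VCF, the first nine
    tab-separated columns of this line are fixed (CHROM ... FORMAT) and
    every following column is one sample.
    """
    for line in header_lines:
        if line.startswith("#CHROM"):
            columns = line.rstrip("\n").split("\t")
            return columns[9:]
    raise ValueError("no #CHROM header line found")
```

A file with no genotype columns yields an empty list, which can serve as a quick check before creating a project.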
The following fields are required for each of the annotation tools:
- VEP = CSQ, symbol and consequence
- Annovar = gene_refgene, gene_ensgene and exonicfunc_ensgene
- snpEff = gene_name, gene_id, ANN and annotation
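The field names listed above can be checked against a VCF header before loading a file. The following is a minimal sketch, not the tool's own validation. It assumes that field names are compared after lowercasing and replacing non-alphanumeric characters with underscores (so `Gene.refGene` becomes `gene_refgene`), and that VEP and snpEff subfields are read from the `|`-separated list in the INFO description. All function names are hypothetical.

```python
import re

# Required fields per annotation tool, as listed above.
REQUIRED_FIELDS = {
    "VEP": {"csq", "symbol", "consequence"},
    "Annovar": {"gene_refgene", "gene_ensgene", "exonicfunc_ensgene"},
    "snpEff": {"gene_name", "gene_id", "ann", "annotation"},
}


def normalize(name):
    """Lowercase a name and turn non-alphanumeric runs into '_' (assumed convention)."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")


def header_fields(header_lines):
    """Collect normalized INFO IDs and '|'-separated subfield names from descriptions."""
    fields = set()
    for line in header_lines:
        if not line.startswith("##INFO="):
            continue
        id_match = re.search(r"ID=([^,>]+)", line)
        if id_match:
            fields.add(normalize(id_match.group(1)))
        desc_match = re.search(r'Description="([^"]*)"', line)
        if desc_match and "|" in desc_match.group(1):
            # Keep only the text after the first colon, e.g. "Format: A|B|C".
            subfield_text = desc_match.group(1).split(":", 1)[-1]
            for part in subfield_text.split("|"):
                fields.add(normalize(part.strip(" '")))
    return fields


def detect_annotation_tools(header_lines):
    """Return the tools whose required fields are all present in the header."""
    fields = header_fields(header_lines)
    return [tool for tool, required in REQUIRED_FIELDS.items() if required <= fields]
```

An empty result suggests the file is missing at least one required field for every supported annotation tool.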
Instructions on how to deploy the tool with Anaconda:
git clone https://github.com/jfnavarro/myVCF.git
cd myVCF
conda env create -f environment.yml
conda activate myvcf
# set to 0 for production
export DEBUG=1
export SECRET_KEY=yourkey
export DJANGO_ALLOWED_HOSTS="localhost 127.0.0.1 [::1]"
python manage.py runserver
# Open a web browser and go to http://localhost:8000/
# The user "admin" with password "1234admin" is already registered in the database
# Go to http://localhost:8000/admin to change the password
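The three exported variables are typically consumed by the Django settings module. How this project's settings read them is not shown here, so the following is only a sketch under assumptions: `DEBUG` is treated as enabled when it equals "1", and `DJANGO_ALLOWED_HOSTS` is split on whitespace, matching the space-separated value used above. The function name `load_settings` is hypothetical.

```python
import os


def load_settings(environ=os.environ):
    """Build the three deployment settings from environment variables.

    Hypothetical helper; the variable names follow the exports above.
    """
    return {
        # Debug mode is on only when DEBUG is exactly "1" (assumed convention).
        "DEBUG": environ.get("DEBUG", "0") == "1",
        # Fail loudly with KeyError if no secret key is configured.
        "SECRET_KEY": environ["SECRET_KEY"],
        # "localhost 127.0.0.1 [::1]" -> ["localhost", "127.0.0.1", "[::1]"]
        "ALLOWED_HOSTS": environ.get("DJANGO_ALLOWED_HOSTS", "").split(),
    }
```

Requiring `SECRET_KEY` rather than falling back to a default makes a missing key fail at startup instead of silently running with a known value.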
Instructions on how to deploy the tool with Docker:
git clone https://github.com/jfnavarro/myVCF.git
cd myVCF
# .env contains 3 environment variables that are used to deploy (DEBUG, SECRET_KEY and ALLOWED_HOSTS)
docker-compose build
docker-compose up --detach
# Open a web browser and go to http://localhost:8000/
# The user "admin" with password "1234admin" is already registered in the database
# Go to http://localhost:8000/admin to change the password
To deploy the tool in a production environment, follow these steps:
- Update the secret key in .env (KEEP THIS SAFE)
- Update .env to add your host to ALLOWED_HOSTS (if needed)
- Update .env to set DEBUG to 0
- Create a superuser using python manage.py createsuperuser
- Deploy the tool (see instructions above)
- Change the password of the superuser at https://ADDRESS/admin
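A fresh secret key for the first step can be generated with the Python standard library. This is one possible approach rather than the project's prescribed method, and the function name `generate_secret_key` is hypothetical. Django also provides `get_random_secret_key` in `django.core.management.utils` for the same purpose.

```python
import secrets


def generate_secret_key(nbytes=50):
    """Return a random URL-safe string suitable for use as SECRET_KEY.

    Hypothetical helper; nbytes is the number of random bytes, so the
    resulting text is somewhat longer than nbytes characters.
    """
    return secrets.token_urlsafe(nbytes)


if __name__ == "__main__":
    # Print a key that can be pasted into .env as the SECRET_KEY value.
    print(generate_secret_key())
```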
The folder /data/annotation contains files with gene names and Ensembl IDs downloaded from different versions of Ensembl. These are added to the database so that gene names can be fetched from Ensembl IDs.
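The lookup from Ensembl ID to gene name can be pictured as a simple mapping. The layout of the annotation files is not described here, so the sketch below assumes a hypothetical tab-separated layout with the Ensembl ID in the first column and the gene name in the second. The function name `load_gene_map` is also hypothetical.

```python
import csv


def load_gene_map(handle, delimiter="\t"):
    """Read (Ensembl ID, gene name) rows into a dict.

    Assumed layout: column 1 is the Ensembl ID, column 2 is the gene name.
    Rows starting with '#' and rows with fewer than two columns are skipped.
    """
    gene_map = {}
    for row in csv.reader(handle, delimiter=delimiter):
        if len(row) < 2 or row[0].startswith("#"):
            continue
        ensembl_id, gene_name = row[0].strip(), row[1].strip()
        if ensembl_id and gene_name:
            gene_map[ensembl_id] = gene_name
    return gene_map
```

For example, a row containing `ENSG00000141510` and `TP53` lets a query for that ID resolve to the gene name `TP53`.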
The tool was configured using the following commands:
cd myVCF
python manage.py startproject
python data/db/popuplate_genes.py
python manage.py makemigrations
python manage.py migrate
python manage.py createsuperuser
# admin
# 1234admin
Two test VCF files annotated with Annovar and VEP are located in /data/VCFs. They can be used to test and develop the tool.
The tool requires login access. Once logged in, the user sees the main page, where projects can be created, deleted and opened (one project per VCF file). Opening a project enters the VCF browser mode, where the following options are available:
- See summary statistics of the dataset.
- Open the settings to define groups or which columns are visible.
- Query the dataset using: genes, regions and variants.
- See the results of the query (genes and regions) in tables with filtering options.
- See variants where multiple statistics and graphs are displayed.
Most of the visualizations and tables are interactive.
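The exact syntax accepted for region queries is not specified here. Purely as an illustration, the sketch below assumes a region is written as chromosome:start-end with inclusive coordinates, and shows how such a string could be parsed and used to test whether a variant position falls inside it. The names `parse_region` and `in_region` are hypothetical.

```python
import re

# Assumed region syntax: <chromosome>:<start>-<end>, e.g. chr17:7668402-7687550
REGION_PATTERN = re.compile(r"^([^:\s]+):(\d+)-(\d+)$")


def parse_region(text):
    """Parse 'chrom:start-end' into a (chrom, start, end) tuple."""
    match = REGION_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a chrom:start-end region: {text!r}")
    chrom, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    if start > end:
        raise ValueError(f"region start is after its end: {text!r}")
    return chrom, start, end


def in_region(chrom, pos, region):
    """Return True if a variant at chrom:pos lies inside the region (inclusive)."""
    region_chrom, start, end = region
    return chrom == region_chrom and start <= pos <= end
```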
In order to upload a file to the database, the VCF file must be located in the folder /data/VCFs.
- Jose Fernandez Navarro
See LICENSE