Kobie-Kirven/SCiMS

Sex Calling for Metagenomic Sequences


What is SCiMS?

SCiMS (Sex Calling for Metagenomic Sequences) is a tool for determining host sex from shotgun metagenomic data. It currently supports host organisms that have two sex chromosomes (e.g., X and Y in humans). Briefly, SCiMS aligns shotgun metagenomic reads to a reference genome and compares the coverage of the homogametic chromosome to the coverage of the autosomes.
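
The coverage comparison described above can be sketched in Python. This is an illustration only, not the SCiMS implementation: the function names, the scaffold IDs, the read counts, and the use of mapped reads per base as a stand-in for coverage are all assumptions made for the example.

```python
def coverage(read_count: int, scaffold_length: int) -> float:
    """Approximate coverage as mapped reads per base of scaffold (an assumption)."""
    if scaffold_length <= 0:
        raise ValueError("scaffold length must be positive")
    return read_count / scaffold_length


def x_to_autosome_ratio(counts: dict, lengths: dict, x_id: str) -> float:
    """Ratio of homogametic-chromosome coverage to mean autosomal coverage.

    `counts` and `lengths` are expected to hold only the autosomes plus `x_id`.
    """
    x_cov = coverage(counts[x_id], lengths[x_id])
    autosomes = [scaffold for scaffold in counts if scaffold != x_id]
    mean_auto = sum(coverage(counts[s], lengths[s]) for s in autosomes) / len(autosomes)
    return x_cov / mean_auto


# Hypothetical counts and lengths, chosen only to keep the arithmetic simple.
counts = {"chr1": 2000, "chr2": 1000, "chrX": 500}
lengths = {"chr1": 2000, "chr2": 1000, "chrX": 1000}
ratio = x_to_autosome_ratio(counts, lengths, "chrX")
print(ratio)
```

In an XY system, a ratio near 0.5 would be consistent with one copy of X per cell and a ratio near 1 with two copies, since the autosomes are present in two copies in both sexes.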

Installation

Requirements

  • bowtie2==2.4.4
  • Samtools==1.13

All of the required tools can be installed with conda (see Install Requirements).

SCiMS can be easily installed with:

pip3 install git+https://github.com/Kobie-Kirven/SCiMS

Usage

  • SCiMS is designed to work with both single and paired-end metagenomic data. The command-line usage is:
    usage: scims [-h] [-v] [--bowtie-index INDEX] [--ref-genome REF] [--scaffold-names SCAFFOLD] [-1 FORWARD] [-2 REVERSE] [-s SINGLE] [--x HOMOGAMETIC]
             [--y HETEROGAMETIC] --o OUTPUT [--t THREADS] [--from-sam FROM_SAM] [--get-scaffold-lengths LENGTHS] [--scaffold-lengths SCAF_LENGTHS]
    Sex Calling for Metagenomic Sequences
    options:
    -h, --help            show this help message and exit
    -v, --version         show program's version number and exit
    --bowtie-index INDEX  Path to Bowtie2 index
    --ref-genome REF      Reference genome in FASTA or FASTA.gz format
    --scaffold-names SCAFFOLD
                           Path to file with scaffold names
    -1 FORWARD            Forward reads in FASTA or FASTQ format (PE mode only)
    -2 REVERSE            Reverse reads in FASTA or FASTQ format (PE mode only)
    -s SINGLE             Single-end reads in FASTA or FASTQ format (SE mode only)
    --x HOMOGAMETIC       ID of homogametic sex chromosome (ex. X)
    --y HETEROGAMETIC     ID of heterogametic sex chromesome (ex. Y)
    --o OUTPUT            Output plot prefix
    --t THREADS           Number of threads to use
    --from-sam FROM_SAM   Use sam file instead of preforming the alignment
    --get-scaffold-lengths LENGTHS
                           Determine scaffold lengths
    --scaffold-lengths SCAF_LENGTHS
                           Path to file with scaffold lengths
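
When SCiMS is driven from a script, the command line can be assembled programmatically. The following is a minimal sketch that uses only flags listed in the help text above. The helper `build_scims_command` is hypothetical, and the rule that exactly one input mode (SAM file, single-end reads, or paired-end reads) is given is an assumption inferred from the option descriptions, not confirmed behavior.

```python
import shlex


def build_scims_command(scaffold_names, scaffold_lengths, x_id, y_id, output_prefix,
                        sam=None, forward=None, reverse=None, single=None,
                        bowtie_index=None, threads=1):
    """Assemble a scims argument list (hypothetical helper, not part of SCiMS)."""
    paired = forward is not None or reverse is not None
    modes = [sam is not None, single is not None, paired]
    # Assumption: the three input modes are mutually exclusive.
    if sum(modes) != 1:
        raise ValueError("choose exactly one input: SAM file, single-end, or paired-end reads")
    if paired and (forward is None or reverse is None):
        raise ValueError("paired-end mode needs both forward and reverse reads")

    cmd = ["scims",
           "--scaffold-names", scaffold_names,
           "--scaffold-lengths", scaffold_lengths,
           "--x", x_id, "--y", y_id,
           "--o", output_prefix,
           "--t", str(threads)]
    if bowtie_index is not None:
        cmd += ["--bowtie-index", bowtie_index]
    if sam is not None:
        cmd += ["--from-sam", sam]
    elif single is not None:
        cmd += ["-s", single]
    else:
        cmd += ["-1", forward, "-2", reverse]
    return cmd


cmd = build_scims_command("scaffolds.txt", "scaffold_lengths.txt",
                          "NC_000023.11", "NC_000024.10", "test",
                          sam="male_test.sam")
print(shlex.join(cmd))
```

The resulting list can be passed directly to `subprocess.run`, which avoids shell quoting problems with file paths that contain spaces.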
    

Install Requirements

To install the required tools with conda, run the following commands:

conda install -c bioconda bowtie2
conda install -c bioconda samtools

Note: The version of Bowtie2 installed this way can fail because of an issue with the tbb library. If Bowtie2 does not run, install the following tbb version:

conda install tbb=2020.2
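
After installing, it can help to confirm that both tools are discoverable on your `PATH` before running SCiMS. The snippet below is a small sketch using Python's standard `shutil.which`; the helper name `missing_tools` is hypothetical, and the check only confirms that the executables exist, not that their versions match the requirements listed above.

```python
import shutil

# Executable names for the two required tools.
REQUIRED_TOOLS = ("bowtie2", "samtools")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


missing = missing_tools()
if missing:
    print("Missing from PATH:", ", ".join(missing))
else:
    print("All required tools found")
```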

Quick Tutorial

This tutorial checks that your SCiMS installation is working correctly.

Directly from SAM file

  1. Download test files:

    wget https://github.com/Kobie-Kirven/SCiMS/raw/main/test_data/male_test.sam
    wget https://github.com/Kobie-Kirven/SCiMS/raw/main/test_data/scaffold_lengths.txt
    wget https://github.com/Kobie-Kirven/SCiMS/raw/main/test_data/scaffolds.txt
  2. Run SCiMS in alignment-free mode:

    scims --scaffold-names scaffolds.txt --x NC_000023.11 --y NC_000024.10 --o test --t 1 --from-sam male_test.sam --scaffold-lengths scaffold_lengths.txt
    
  3. You should see output similar to the following:

    Using SAM file
    Extracting propper alignments:
          A total of 18449 alignments met the criteria
    The average NC_000024.10:NC_000023.11 ratio for the file was 0.5285776962795797
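
If you want to check the reported ratio from a script, the final output line can be parsed with a regular expression. This is a sketch that assumes the output keeps the exact wording shown above; the function name `parse_ratio` is hypothetical and not part of SCiMS.

```python
import re

# Matches: "The average <id>:<id> ratio for the file was <number>"
RATIO_LINE = re.compile(
    r"The average ([^\s:]+):([^\s:]+) ratio for the file was ([0-9.eE+-]+)"
)


def parse_ratio(text):
    """Return (numerator_id, denominator_id, ratio), or None if no ratio line is found."""
    match = RATIO_LINE.search(text)
    if match is None:
        return None
    return match.group(1), match.group(2), float(match.group(3))


# Sample text copied from the tutorial output above.
sample = """Using SAM file
Extracting propper alignments:
      A total of 18449 alignments met the criteria
The average NC_000024.10:NC_000023.11 ratio for the file was 0.5285776962795797
"""
result = parse_ratio(sample)
print(result)
```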
    

Perform alignment first

  1. Download the sequence files (this example also uses the scaffolds.txt and scaffold_lengths.txt files downloaded in the previous section):

    wget https://github.com/Kobie-Kirven/SCiMS/raw/main/test_data/female_10000_1.fa
    wget https://github.com/Kobie-Kirven/SCiMS/raw/main/test_data/female_10000_2.fa
  2. Download the reference genome:

    wget https://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/annotation/GRCh38_latest/refseq_identifiers/GRCh38_latest_genomic.fna.gz
  3. Build the Bowtie2 index:

    bowtie2-build GRCh38_latest_genomic.fna.gz GRCh38_latest_genomic.fna.gz
  4. Run SCiMS in alignment mode:

    scims --scaffold-names scaffolds.txt --x NC_000023.11 --y NC_000024.10 --o test --t 1 -1 female_10000_1.fa -2 female_10000_2.fa --scaffold-lengths scaffold_lengths.txt --bowtie-index GRCh38_latest_genomic.fna.gz

  5. You should see output similar to the following:

    Aligning sequences to GRCh38_latest_genomic.fna.gz:
    Forward reads: female_10000_1.fa
    Reverse reads: female_10000_2.fa
    Extracting propper alignments:
          A total of 18552 alignments met the criteria
    The average NC_000024.10:NC_000023.11 ratio for the file was 0.9519628016280371
    
