Command line tool

alumi edited this page May 23, 2024 · 16 revisions

Installation

cljam provides a command-line interface for trying out its features quickly.

Downloading the executable:

$ curl -sSLO https://github.com/chrovis/cljam/releases/download/0.8.5/cljam
$ chmod +x cljam

Place cljam in a directory on your $PATH (e.g. ~/bin) so that your shell can find it.

Homebrew (macOS):

$ brew install xcoo/formulae/cljam

Building from source:

lein bin creates a standalone console executable in the target directory.

$ lein bin

Copy the cljam executable to a directory on your $PATH (e.g. ~/bin).

Usage

Sub-commands

cljam has various sub-commands similar to those of SAMtools. Use the --help option to print all sub-commands and their descriptions.

$ cljam --help
Usage: cljam {view,convert,normalize,sort,index,pileup,faidx,dict,level,version} ...

 Options                Default  Desc
 -------                -------  ----
 -h, --no-help, --help  false    Show help

 Command    Desc
 -------    ----
 view       Extract/print all or sub alignments in SAM or BAM format.
 convert    Convert file format based on the file extension.
 normalize  Normalize references of alignments.
 sort       Sort alignments by leftmost coordinates.
 index      Index sorted alignment for fast random access.
 pileup     Generate pileup for the BAM file.
 faidx      Index reference sequence in the FASTA format.
 dict       Create a FASTA sequence dictionary file.
 level      Add level of alignments.
 version    Print version number.

cljam [sub-command] --help shows the help for that sub-command.

$ cljam view --help
Extract/print all or sub alignments in SAM or BAM format.

Usage: cljam view [--header] [-f FORMAT] [-r REGION] <in.bam|sam>

Options:
      --header               Include header
  -f, --format FORMAT  auto  Input file format <auto|sam|bam>
  -r, --region REGION        Only print in region (e.g. chr6:1000-2000)
  -h, --help                 Print help

view

view prints the contents of a SAM/BAM file. It is the equivalent of samtools view.

$ cljam view --header test-resources/sam/test.sam
@SQ     SN:ref  LN:45
@SQ     SN:ref2 LN:40
r003    16      ref     29      30      6H5M    *       0       0       TAGGC   *
r001    163     ref     7       30      8M4I4M1D3M      =       37      39      TTAGATAAAGAGGATACTG     *       XX:B:S,12561,2,20,112
r002    0       ref     9       30      1S2I6M1P1I1P1I4M2I      *       0       0       AAAAGATAAGGGATAAA       *
r003    0       ref     9       30      5H6M    *       0       0       AGCTAA  *
x3      0       ref2    6       30      9M4I13M *       0       0       TTATAAAACAAATAATTAAGTCTACA      ??????????????????????????
r004    0       ref     16      30      6M14N1I5M       *       0       0       ATAGCTCTCAGC    *
r001    83      ref     37      30      9M      =       7       -39     CAGCGCCAT       *
x1      0       ref2    1       30      20M     *       0       0       AGGTTTTATAAAACAAATAA    ????????????????????
x2      0       ref2    2       30      21M     *       0       0       GGTTTTATAAAACAAATAATT   ?????????????????????
x4      0       ref2    10      30      25M     *       0       0       CAAATAATTAAGTCTACAGAGCAAC       ?????????????????????????
x6      0       ref2    14      30      23M     *       0       0       TAATTAAGTCTACAGAGCAACTA ???????????????????????
x5      0       ref2    12      30      24M     *       0       0       AATAATTAAGTCTACAGAGCAACT        ????????????????????????
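Each alignment line printed by view is a tab-separated SAM record (the tabs are rendered as spaces above). The following is a minimal Python sketch, not part of cljam, for reading such a line in a script. The names SamRecord, parse_sam_line and decode_flag are hypothetical; the field order and FLAG bit meanings follow the SAM specification.

```python
# Sketch: split one SAM alignment line into its 11 mandatory fields
# and decode the FLAG bits defined by the SAM specification.
# All names here are illustrative; none of this is cljam's API.
from typing import NamedTuple

FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


class SamRecord(NamedTuple):
    qname: str
    flag: int
    rname: str
    pos: int
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str
    tags: tuple  # optional fields, kept as raw TAG:TYPE:VALUE strings


def parse_sam_line(line: str) -> SamRecord:
    """Parse a single tab-separated alignment line (not a header line)."""
    f = line.rstrip("\n").split("\t")
    return SamRecord(f[0], int(f[1]), f[2], int(f[3]), int(f[4]), f[5],
                     f[6], int(f[7]), int(f[8]), f[9], f[10], tuple(f[11:]))


def decode_flag(flag: int) -> list:
    """Return the names of the FLAG bits that are set."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


if __name__ == "__main__":
    line = ("r001\t163\tref\t7\t30\t8M4I4M1D3M\t=\t37\t39\t"
            "TTAGATAAAGAGGATACTG\t*\tXX:B:S,12561,2,20,112")
    rec = parse_sam_line(line)
    print(rec.qname, rec.pos, decode_flag(rec.flag))
```

For the r001 record with FLAG 163, decode_flag reports a paired, properly paired read whose mate is on the reverse strand and which is the second read in its pair.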

convert

convert converts SAM to BAM or BAM to SAM while preserving the contents. The file formats are determined from the file extensions.

$ cljam convert test-resources/sam/test.sam /tmp/test.converted.bam
$ cljam convert test-resources/bam/test.bam /tmp/test.converted.sam

normalize

normalize normalizes reference names in a SAM/BAM file. For example, chr01 is replaced by chr1, and 11 by chr11.

$ cljam view --header test-resources/sam/normalize_before.sam
@SQ     SN:chr01        LN:45
@SQ     SN:11   LN:40
r003    16      chr01   29      30      6H5M    *       0       0       TAGGC   *
r001    163     chr01   7       30      8M4I4M1D3M      =       37      39      TTAGATAAAGAGGATACTG     *       XX:B:S,12561,2,20,112
r002    0       chr01   9       30      1S2I6M1P1I1P1I4M2I      *       0       0       AAAAGATAAGGGATAAA       *
r003    0       chr01   9       30      5H6M    *       0       0       AGCTAA  *
x3      0       11      6       30      9M4I13M *       0       0       TTATAAAACAAATAATTAAGTCTACA      ??????????????????????????
r004    0       chr01   16      30      6M14N1I5M       *       0       0       ATAGCTCTCAGC    *
r001    83      chr01   37      30      9M      =       7       -39     CAGCGCCAT       *
x1      0       11      1       30      20M     *       0       0       AGGTTTTATAAAACAAATAA    ????????????????????
x2      0       11      2       30      21M     *       0       0       GGTTTTATAAAACAAATAATT   ?????????????????????
x4      0       11      10      30      25M     *       0       0       CAAATAATTAAGTCTACAGAGCAAC       ?????????????????????????
x6      0       11      14      30      23M     *       0       0       TAATTAAGTCTACAGAGCAACTA ???????????????????????
x5      0       11      12      30      24M     *       0       0       AATAATTAAGTCTACAGAGCAACT        ????????????????????????

$ cljam normalize test-resources/sam/normalize_before.sam /tmp/normalized.sam

$ cljam view --header /tmp/normalized.sam
@SQ     SN:chr1 LN:45
@SQ     SN:chr11        LN:40
r003    16      chr1    29      30      6H5M    *       0       0       TAGGC   *
r001    163     chr1    7       30      8M4I4M1D3M      =       37      39      TTAGATAAAGAGGATACTG     *       XX:B:S,12561,2,20,112
r002    0       chr1    9       30      1S2I6M1P1I1P1I4M2I      *       0       0       AAAAGATAAGGGATAAA       *
r003    0       chr1    9       30      5H6M    *       0       0       AGCTAA  *
x3      0       chr11   6       30      9M4I13M *       0       0       TTATAAAACAAATAATTAAGTCTACA      ??????????????????????????
r004    0       chr1    16      30      6M14N1I5M       *       0       0       ATAGCTCTCAGC    *
r001    83      chr1    37      30      9M      =       7       -39     CAGCGCCAT       *
x1      0       chr11   1       30      20M     *       0       0       AGGTTTTATAAAACAAATAA    ????????????????????
x2      0       chr11   2       30      21M     *       0       0       GGTTTTATAAAACAAATAATT   ?????????????????????
x4      0       chr11   10      30      25M     *       0       0       CAAATAATTAAGTCTACAGAGCAAC       ?????????????????????????
x6      0       chr11   14      30      23M     *       0       0       TAATTAAGTCTACAGAGCAACTA ???????????????????????
x5      0       chr11   12      30      24M     *       0       0       AATAATTAAGTCTACAGAGCAACT        ????????????????????????
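The renaming shown above can be approximated in a few lines of Python. This is a sketch inferred only from the two documented cases (chr01 becomes chr1, 11 becomes chr11). The function name normalize_ref_name is hypothetical, and cljam's actual rules may cover more cases, such as sex chromosomes or mitochondrial names, which this sketch deliberately leaves untouched.

```python
# Sketch: normalize purely numeric reference names to the "chrN" form.
# Assumption: only the two cases documented above are modelled;
# this is not cljam's implementation.
import re

_NUMERIC_REF = re.compile(r"(?:chr)?0*(\d+)", flags=re.IGNORECASE)


def normalize_ref_name(name: str) -> str:
    """Drop leading zeros and ensure a 'chr' prefix for numeric names."""
    m = _NUMERIC_REF.fullmatch(name)
    if m is None:
        return name  # names this sketch does not understand are left as-is
    return "chr" + m.group(1)


if __name__ == "__main__":
    for before in ("chr01", "11", "chr1", "ref"):
        print(before, "->", normalize_ref_name(before))
```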

sort

sort sorts the alignments in a SAM/BAM file by leftmost coordinate or by query name.

$ cljam view test-resources/bam/test.bam
r003    16      ref     29      30      6H5M    *       0       0       TAGGC   *
r001    163     ref     7       30      8M4I4M1D3M      =       37      39      TTAGATAAAGAGGATACTG     *       XX:B:S,12561,2,20,112
r002    0       ref     9       30      1S2I6M1P1I1P1I4M2I      *       0       0       AAAAGATAAGGGATAAA       *
r003    0       ref     9       30      5H6M    *       0       0       AGCTAA  *
x3      0       ref2    6       30      9M4I13M *       0       0       TTATAAAACAAATAATTAAGTCTACA      ??????????????????????????
r004    0       ref     16      30      6M14N1I5M       *       0       0       ATAGCTCTCAGC    *
r001    83      ref     37      30      9M      =       7       -39     CAGCGCCAT       *
x1      0       ref2    1       30      20M     *       0       0       AGGTTTTATAAAACAAATAA    ????????????????????
x2      0       ref2    2       30      21M     *       0       0       GGTTTTATAAAACAAATAATT   ?????????????????????
x4      0       ref2    10      30      25M     *       0       0       CAAATAATTAAGTCTACAGAGCAAC       ?????????????????????????
x6      0       ref2    14      30      23M     *       0       0       TAATTAAGTCTACAGAGCAACTA ???????????????????????
x5      0       ref2    12      30      24M     *       0       0       AATAATTAAGTCTACAGAGCAACT        ????????????????????????

$ cljam sort test-resources/bam/test.bam /tmp/test.sorted.bam

$ cljam view /tmp/test.sorted.bam
r001    163     ref     7       30      8M4I4M1D3M      =       37      39      TTAGATAAAGAGGATACTG     *       XX:B:S,12561,2,20,112
r002    0       ref     9       30      1S2I6M1P1I1P1I4M2I      *       0       0       AAAAGATAAGGGATAAA       *
r003    0       ref     9       30      5H6M    *       0       0       AGCTAA  *
r004    0       ref     16      30      6M14N1I5M       *       0       0       ATAGCTCTCAGC    *
r003    16      ref     29      30      6H5M    *       0       0       TAGGC   *
r001    83      ref     37      30      9M      =       7       -39     CAGCGCCAT       *
x1      0       ref2    1       30      20M     *       0       0       AGGTTTTATAAAACAAATAA    ????????????????????
x2      0       ref2    2       30      21M     *       0       0       GGTTTTATAAAACAAATAATT   ?????????????????????
x3      0       ref2    6       30      9M4I13M *       0       0       TTATAAAACAAATAATTAAGTCTACA      ??????????????????????????
x4      0       ref2    10      30      25M     *       0       0       CAAATAATTAAGTCTACAGAGCAAC       ?????????????????????????
x5      0       ref2    12      30      24M     *       0       0       AATAATTAAGTCTACAGAGCAACT        ????????????????????????
x6      0       ref2    14      30      23M     *       0       0       TAATTAAGTCTACAGAGCAACTA ???????????????????????

Supply the -o queryname option to sort alignments by query name.

$ cljam sort -o queryname test-resources/bam/test.bam /tmp/test.sorted2.bam

$ cljam view /tmp/test.sorted2.bam
r001    83      ref     37      30      9M      =       7       -39     CAGCGCCAT       *
r001    163     ref     7       30      8M4I4M1D3M      =       37      39      TTAGATAAAGAGGATACTG     *       XX:B:S,12561,2,20,112
r002    0       ref     9       30      1S2I6M1P1I1P1I4M2I      *       0       0       AAAAGATAAGGGATAAA       *
r003    16      ref     29      30      6H5M    *       0       0       TAGGC   *
r003    0       ref     9       30      5H6M    *       0       0       AGCTAA  *
r004    0       ref     16      30      6M14N1I5M       *       0       0       ATAGCTCTCAGC    *
x1      0       ref2    1       30      20M     *       0       0       AGGTTTTATAAAACAAATAA    ????????????????????
x2      0       ref2    2       30      21M     *       0       0       GGTTTTATAAAACAAATAATT   ?????????????????????
x3      0       ref2    6       30      9M4I13M *       0       0       TTATAAAACAAATAATTAAGTCTACA      ??????????????????????????
x4      0       ref2    10      30      25M     *       0       0       CAAATAATTAAGTCTACAGAGCAAC       ?????????????????????????
x5      0       ref2    12      30      24M     *       0       0       AATAATTAAGTCTACAGAGCAACT        ????????????????????????
x6      0       ref2    14      30      23M     *       0       0       TAATTAAGTCTACAGAGCAACTA ???????????????????????
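The two orderings above can be reproduced with ordinary sort keys. The sketch below is an illustration in Python, not cljam's code. It assumes that coordinate order means (position of the reference in the header, leftmost position), and that query-name ties are broken by the first/second-in-pair FLAG bits. That tie-break is inferred from the sample output (r001 with FLAG 83 precedes r001 with FLAG 163) and is not confirmed by the source. The function names are hypothetical.

```python
# Sketch: sort keys that reproduce the sample outputs shown above.
# Records are reduced to (qname, flag, rname, pos) for brevity.
# The queryname tie-break on the pair bits is an assumption.

def coordinate_key(rec, ref_order):
    """Order by header position of the reference, then leftmost coordinate."""
    _qname, _flag, rname, pos = rec
    return (ref_order[rname], pos)


def queryname_key(rec):
    """Order by query name; first-in-pair (0x40) before second-in-pair (0x80)."""
    qname, flag, _rname, _pos = rec
    return (qname, flag & 0xC0)


# Records in the order printed by `cljam view test-resources/bam/test.bam`.
RECORDS = [
    ("r003", 16, "ref", 29), ("r001", 163, "ref", 7), ("r002", 0, "ref", 9),
    ("r003", 0, "ref", 9), ("x3", 0, "ref2", 6), ("r004", 0, "ref", 16),
    ("r001", 83, "ref", 37), ("x1", 0, "ref2", 1), ("x2", 0, "ref2", 2),
    ("x4", 0, "ref2", 10), ("x6", 0, "ref2", 14), ("x5", 0, "ref2", 12),
]
REF_ORDER = {"ref": 0, "ref2": 1}  # assumed order of the @SQ header lines

if __name__ == "__main__":
    # sorted() is stable, so records with equal keys keep their input order.
    for rec in sorted(RECORDS, key=lambda r: coordinate_key(r, REF_ORDER)):
        print(*rec, sep="\t")
    print()
    for rec in sorted(RECORDS, key=queryname_key):
        print(*rec, sep="\t")
```

Because Python's sort is stable, r002 and r003 (both at position 9 on ref) keep their input order, which matches the sorted output shown earlier.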

index

index creates a BAM index (.bai) for fast random access. The input BAM must be sorted beforehand.

$ cp test-resources/bam/test.sorted.bam /tmp/
$ cljam index /tmp/test.sorted.bam

pileup

pileup generates a pileup for a BAM file. It is the equivalent of samtools mpileup. The input BAM must be sorted and indexed beforehand.

$ cljam pileup test-resources/bam/test.sorted.bam
ref     7       N       1       T       ~
ref     8       N       1       T       ~
ref     9       N       3       AAA     ~~~
ref     10      N       3       GGG     ~~~
ref     11      N       3       AAC     ~~~
ref     12      N       3       TTT     ~~~
ref     13      N       3       AAA     ~~~
ref     14      N       3       A+4AGAGA+2GGA   ~~~
ref     15      N       2       GG      ~~
  ...
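The fifth column of each pileup line lists the bases of the reads covering that position, with insertions written inline as +<length><sequence> (for example A+4AGAGA+2GGA at ref:14). The Python sketch below separates the covering bases from the indels. It assumes the samtools mpileup conventions for the markers ^, $, + and -, and the function name parse_pileup_bases is hypothetical.

```python
# Sketch: split a pileup read-bases column into per-read bases and indels.
# Assumption: the column follows samtools mpileup conventions
# (^ + one mapping-quality char = read start, $ = read end, +N/-N = indel).
import re


def parse_pileup_bases(column: str):
    """Return (bases, indels); indels are (sign, sequence) tuples."""
    bases, indels = [], []
    i = 0
    while i < len(column):
        ch = column[i]
        if ch == "^":        # read start marker plus one mapping-quality char
            i += 2
        elif ch == "$":      # read end marker
            i += 1
        elif ch in "+-":     # indel: sign, decimal length, then the sequence
            m = re.match(r"\d+", column[i + 1:])
            if m is None:
                raise ValueError(f"malformed indel at offset {i}: {column!r}")
            n = int(m.group())
            start = i + 1 + len(m.group())
            indels.append((ch, column[start:start + n]))
            i = start + n
        else:                # one base (or placeholder) from one read
            bases.append(ch)
            i += 1
    return bases, indels


if __name__ == "__main__":
    print(parse_pileup_bases("A+4AGAGA+2GGA"))
```

For the ref:14 line, this yields three covering bases, consistent with the depth of 3 in the fourth column, and two insertions (AGAG and GG).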

faidx

faidx creates a FASTA index (.fai) for fast random access.

$ cp test-resources/fasta/test.fa /tmp/
$ cljam faidx /tmp/test.fa
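For orientation, a .fai index as defined by samtools faidx has one line per sequence with five columns: name, length, byte offset of the first base, bases per line, and bytes per line. The Python sketch below computes those columns for a small in-memory FASTA. It illustrates the file format and is not cljam's implementation. The function name build_fai and the sample data are made up, and the input is assumed to be well formed, with a non-empty header and uniform line lengths within each sequence.

```python
# Sketch: compute the five .fai columns (NAME, LENGTH, OFFSET,
# LINEBASES, LINEWIDTH) for FASTA data held in memory.
# Assumption: well-formed input; no validation of line-length uniformity.

def build_fai(fasta: bytes):
    entries = []
    current = None  # [name, length, offset, linebases, linewidth]
    offset = 0      # byte offset of the line being processed
    for raw in fasta.splitlines(keepends=True):
        line = raw.rstrip(b"\r\n")
        if raw.startswith(b">"):
            if current is not None:
                entries.append(tuple(current))
            name = line[1:].split()[0].decode()  # name ends at first whitespace
            # The sequence starts right after the header line.
            current = [name, 0, offset + len(raw), 0, 0]
        elif current is not None and line:
            if current[3] == 0:           # first sequence line sets the layout
                current[3] = len(line)    # bases per line
                current[4] = len(raw)     # bytes per line, newline included
            current[1] += len(line)
        offset += len(raw)
    if current is not None:
        entries.append(tuple(current))
    return entries


if __name__ == "__main__":
    demo = b">ref\nACGTACGT\nACGT\n>ref2 desc\nGGCC\n"  # made-up data
    for entry in build_fai(demo):
        print(*entry, sep="\t")
```

With these columns, the byte position of any base can be computed directly, which is what makes random access into the FASTA file fast.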

dict

dict creates a FASTA sequence dictionary (.dict).

$ cljam dict test-resources/fasta/test.fa /tmp/test.dict

level

level adds a level field (LV) to each alignment in a BAM file.

$ cljam level test-resources/bam/test.sorted.bam /tmp/leveled.bam

$ cljam view /tmp/leveled.bam
r001    163     ref     7       30      8M4I4M1D3M      =       37      39      TTAGATAAAGAGGATACTG     *       LV:i:0  XX:B:S,12561,2,20,112
r002    0       ref     9       30      1S2I6M1P1I1P1I4M2I      *       0       0       AAAAGATAAGGGATAAA       *       LV:i:1
r003    0       ref     9       30      5H6M    *       0       0       AGCTAA  *       LV:i:2
r004    0       ref     16      30      6M14N1I5M       *       0       0       ATAGCTCTCAGC    *       LV:i:2
r003    16      ref     29      30      6H5M    *       0       0       TAGGC   *       LV:i:0
r001    83      ref     37      30      9M      =       7       -39     CAGCGCCAT       *       LV:i:0
x1      0       ref2    1       30      20M     *       0       0       AGGTTTTATAAAACAAATAA    ????????????????????    LV:i:0
x2      0       ref2    2       30      21M     *       0       0       GGTTTTATAAAACAAATAATT   ?????????????????????   LV:i:1
x3      0       ref2    6       30      9M4I13M *       0       0       TTATAAAACAAATAATTAAGTCTACA      ??????????????????????????      LV:i:2
x4      0       ref2    10      30      25M     *       0       0       CAAATAATTAAGTCTACAGAGCAAC       ?????????????????????????       LV:i:3
x5      0       ref2    12      30      24M     *       0       0       AATAATTAAGTCTACAGAGCAACT        ????????????????????????        LV:i:4
x6      0       ref2    14      30      23M     *       0       0       TAATTAAGTCTACAGAGCAACTA ??????????????????????? LV:i:5

The LV:i:N tag on each record is the level field; for example, x6 has LV:i:5 in the above result.
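The levels in this output are consistent with a greedy layout in which each alignment, taken in coordinate order, receives the lowest level whose previous alignment on the same reference has already ended. The Python sketch below reproduces the LV values above under that assumption. It is inferred from the sample output rather than from cljam's source, and the names ref_end and assign_levels are hypothetical.

```python
# Sketch: greedy level assignment that reproduces the LV values shown above.
# Assumption: input is coordinate-sorted and levels restart per reference;
# this is inferred from the sample output, not taken from cljam's code.
import re

REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def ref_end(pos: int, cigar: str) -> int:
    """1-based inclusive end coordinate of an alignment on the reference."""
    length = sum(int(n) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
                 if op in REF_CONSUMING)
    return pos + length - 1


def assign_levels(alignments):
    """alignments: iterable of (rname, pos, cigar); returns a list of levels."""
    levels = []
    ends = {}  # rname -> end coordinate of the last alignment on each level
    for rname, pos, cigar in alignments:
        lane_ends = ends.setdefault(rname, [])
        for level, end in enumerate(lane_ends):
            if end < pos:          # this level is free again
                break
        else:                      # every level is occupied: open a new one
            level = len(lane_ends)
            lane_ends.append(0)
        lane_ends[level] = ref_end(pos, cigar)
        levels.append(level)
    return levels


if __name__ == "__main__":
    sample = [
        ("ref", 7, "8M4I4M1D3M"), ("ref", 9, "1S2I6M1P1I1P1I4M2I"),
        ("ref", 9, "5H6M"), ("ref", 16, "6M14N1I5M"),
        ("ref", 29, "6H5M"), ("ref", 37, "9M"),
        ("ref2", 1, "20M"), ("ref2", 2, "21M"), ("ref2", 6, "9M4I13M"),
        ("ref2", 10, "25M"), ("ref2", 12, "24M"), ("ref2", 14, "23M"),
    ]
    print(assign_levels(sample))
```

For example, r004 starts at position 16, after the r003 alignment on level 2 has ended at position 14, so it reuses level 2 instead of opening a new one.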

version

version prints the version of cljam.

$ cljam version
0.8.5