Getting Started for Clojure Beginners

alumi edited this page May 23, 2024 · 41 revisions

Requirements

  • Java (1.8 or greater)
  • Leiningen (2.0 or greater)
  • Unix environment*

* Clojure works in any environment supported by Java, but this guide assumes a Unix-like OS such as Linux or macOS.

Java

Install Java from the Oracle website, the OpenJDK website, or your package manager.

Leiningen

Leiningen is a popular build tool for Clojure.

Leiningen is similar to Maven and Gradle. It automatically resolves library dependencies and the classpath from the project configuration, so you do not have to install Clojure itself.

Install Leiningen by following the official instructions.

Getting started

Creating project

First, create a new Leiningen project with the lein new command.

$ lein new cljam-start
$ cd cljam-start

This command creates the cljam-start/ directory and generates the project files in it.

cljam-start/
├── CHANGELOG.md
├── LICENSE
├── README.md
├── doc/
│   └── intro.md
├── project.clj
├── resources/
├── src/
│   └── cljam_start/
│       └── core.clj
└── test/
    └── cljam_start/
        └── core_test.clj

Configuration

Then add the cljam dependency to the project configuration in project.clj.

(defproject cljam-start "0.1.0-SNAPSHOT"
  :description "FIXME: write description"
  :url "http://example.com/FIXME"
  :license {:name "Eclipse Public License"
            :url "http://www.eclipse.org/legal/epl-v10.html"}
  :dependencies [[org.clojure/clojure "1.8.0"]
                 [cljam "0.8.5"]]) ; <- Add this line

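Leiningen automatically downloads Clojure, cljam, and other dependent libraries when you run lein deps. The versions in the output follow what project.clj declares, so they may differ from the sample below.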

$ lein deps
Retrieving org/clojure/clojure/1.8.0/clojure-1.8.0.pom from central
Retrieving org/sonatype/oss/oss-parent/7/oss-parent-7.pom from central
Retrieving cljam/cljam/0.5.0/cljam-0.5.0.pom from clojars
  ...
Retrieving org/clojure/tools.logging/0.3.1/tools.logging-0.3.1.jar from central
Retrieving org/clojure/clojure/1.8.0/clojure-1.8.0.jar from central
Retrieving org/clojure/tools.cli/0.3.5/tools.cli-0.3.5.jar from central
  ...
Retrieving org/clojure/tools.nrepl/0.2.12/tools.nrepl-0.2.12.jar from central
Retrieving cljam/cljam/0.5.0/cljam-0.5.0.jar from clojars
Retrieving me/raynes/fs/1.4.6/fs-1.4.6.jar from clojars
  ...

Preparing resources

Download the example SAM/BAM files (test.sam and test.bam) into the resources/ directory.

$ wget https://raw.githubusercontent.com/chrovis/cljam/master/test-resources/sam/test.sam -O resources/test.sam
$ wget https://raw.githubusercontent.com/chrovis/cljam/master/test-resources/bam/test.bam -O resources/test.bam

Try in REPL

Clojure provides an interactive shell called the REPL, where you can try Clojure code quickly.

lein repl launches a REPL inside the Leiningen project, with the project's dependencies already resolved.

$ lein repl
nREPL server started on port 49451 on host 127.0.0.1 - nrepl://127.0.0.1:49451
REPL-y 0.3.7, nREPL 0.2.12
Clojure 1.8.0
Java HotSpot(TM) 64-Bit Server VM 1.8.0_05-b13
    Docs: (doc function-name-here)
          (find-doc "part-of-name-here")
  Source: (source function-name-here)
 Javadoc: (javadoc java-object-or-class-here)
    Exit: Control+D or (exit) or (quit)
 Results: Stored in vars *1, *2, *3, an exception in *e

user=>

Reading SAM/BAM

require loads the specified namespace, and the :as option gives it a short alias (here, sam) in the current namespace.

user=> (require '[cljam.io.sam :as sam])
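
The same pattern works for any namespace. As a small illustration that needs no extra dependency, the sketch below aliases Clojure's built-in clojure.string namespace; the alias name str is only a choice made for this example.

```clojure
;; Load clojure.string and refer to it through the short alias `str`.
(require '[clojure.string :as str])

;; Functions in the namespace are now reachable as alias/function-name.
(str/upper-case "taggc")        ; => "TAGGC"
(str/join "," ["ref" "ref2"])   ; => "ref,ref2"
```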

Open test.sam with a reader and read its header information.

user=> (with-open [r (sam/reader "resources/test.sam")]
         (sam/read-header r))
{:SQ [{:SN "ref", :LN 45} {:SN "ref2", :LN 40}]}
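
The header is an ordinary Clojure map, so the usual collection functions apply to it. The following sketch works on a literal copy of the map printed above rather than on a file, so it can be tried without the example data; the names header and ref-lengths are made up for this illustration.

```clojure
;; A literal copy of the header shown above
;; (normally this would be the result of sam/read-header).
(def header {:SQ [{:SN "ref", :LN 45} {:SN "ref2", :LN 40}]})

;; Names of the reference sequences, in header order.
(map :SN (:SQ header))      ; => ("ref" "ref2")

;; Map from reference name to reference length.
(def ref-lengths
  (into {} (map (juxt :SN :LN)) (:SQ header)))

(get ref-lengths "ref2")    ; => 40
```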

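To read alignments, use sam/read-alignments. Wrapping the result in doall realizes the alignments before with-open closes the reader.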

user=> (with-open [r (sam/reader "resources/test.sam")]
         (doall (take 5 (sam/read-alignments r))))
(#cljam.io.protocols.SAMAlignment{:qname r003, :flag 16, :rname ref, :pos 29, :end 33, :mapq 30, :cigar 6H5M, :rnext *, :pnext 0, :tlen 0, :seq TAGGC, :qual *, :options []}
 #cljam.io.protocols.SAMAlignment{:qname r001, :flag 163, :rname ref, :pos 7, :end 22, :mapq 30, :cigar 8M4I4M1D3M, :rnext =, :pnext 37, :tlen 39, :seq TTAGATAAAGAGGATACTG, :qual *, :options [{:XX {:type B, :value S,12561,2,20,112}}]}
 #cljam.io.protocols.SAMAlignment{:qname r002, :flag 0, :rname ref, :pos 9, :end 18, :mapq 30, :cigar 1S2I6M1P1I1P1I4M2I, :rnext *, :pnext 0, :tlen 0, :seq AAAAGATAAGGGATAAA, :qual *, :options []}
 #cljam.io.protocols.SAMAlignment{:qname r003, :flag 0, :rname ref, :pos 9, :end 14, :mapq 30, :cigar 5H6M, :rnext *, :pnext 0, :tlen 0, :seq AGCTAA, :qual *, :options []}
 #cljam.io.protocols.SAMAlignment{:qname x3, :flag 0, :rname ref2, :pos 6, :end 27, :mapq 30, :cigar 9M4I13M, :rnext *, :pnext 0, :tlen 0, :seq TTATAAAACAAATAATTAAGTCTACA, :qual ??????????????????????????, :options []})
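
Each alignment is a record, so its fields can be read with keywords just like a map. The sketch below uses plain maps holding a few fields copied from the output above as stand-ins for the records, and decodes the flag field using the reverse-strand bit (0x10) from the SAM specification. The names alns and reverse-strand? are hypothetical and not part of cljam.

```clojure
;; Plain maps standing in for some of the SAMAlignment records printed above.
(def alns
  [{:qname "r003" :flag 16  :rname "ref"  :pos 29}
   {:qname "r001" :flag 163 :rname "ref"  :pos 7}
   {:qname "r002" :flag 0   :rname "ref"  :pos 9}
   {:qname "x3"   :flag 0   :rname "ref2" :pos 6}])

;; Bit 0x10 of the SAM flag marks a read mapped to the reverse strand.
(defn reverse-strand? [aln]
  (pos? (bit-and (:flag aln) 0x10)))

(map :qname alns)                                  ; => ("r003" "r001" "r002" "x3")
(map :qname (filter reverse-strand? alns))         ; => ("r003")
(map :qname (filter #(= "ref2" (:rname %)) alns))  ; => ("x3")
```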

Sorting SAM/BAM

cljam.algo.sorter provides sorting functions.

user=> (require '[cljam.io.sam :as sam]
                '[cljam.algo.sorter :as sorter])
user=> (with-open [r (sam/reader "resources/test.bam")
                   w (sam/writer "resources/test.sorted.bam")]
         (sorter/sort-by-pos r w))
nil

The above code creates a sorted BAM file (test.sorted.bam).

$ ls resources
test.bam  test.sam  test.sorted.bam

cljam.algo.sorter/sort-by-pos accepts a reader and a writer as arguments. In this case, the reader is the source BAM and the writer is the sorted BAM that will be created.
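
Coordinate sorting conventionally orders alignments by the position of their reference in the header and then by leftmost position. The sketch below illustrates that ordering on plain maps with values copied from the earlier REPL output. It is not how cljam implements sort-by-pos, and ref-order, alns, and by-position are hypothetical names used only here.

```clojure
;; Reference names in header order, mapped to their index.
(def ref-order {"ref" 0, "ref2" 1})

;; Stand-ins for alignments, in the unsorted order seen earlier.
(def alns
  [{:qname "r003" :rname "ref"  :pos 29}
   {:qname "r001" :rname "ref"  :pos 7}
   {:qname "r002" :rname "ref"  :pos 9}
   {:qname "x3"   :rname "ref2" :pos 6}])

;; Sort key: [index of the reference, leftmost position].
(def by-position
  (sort-by (juxt (comp ref-order :rname) :pos) alns))

(map (juxt :rname :pos) by-position)
;; => (["ref" 7] ["ref" 9] ["ref" 29] ["ref2" 6])
```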

Indexing BAM

To create a BAM index (BAI), pass the sorted BAM path and the index path to create-index.

user=> (require '[cljam.algo.bam-indexer :as bai])
user=> (bai/create-index "resources/test.sorted.bam"
                         "resources/test.sorted.bam.bai")
nil

The index file (test.sorted.bam.bai) has been generated.

$ ls resources
test.bam  test.sam  test.sorted.bam  test.sorted.bam.bai

Pileup

cljam.algo.depth/depth calculates a simple pileup and returns it as a lazy sequence.

user=> (require '[cljam.algo.depth :as depth])
user=> (with-open [r (sam/reader "resources/test.sorted.bam")]
         (depth/depth r {:chr "ref" :start 1 :end 30}))
(0 0 0 0 0 0 0 1 1 3 3 3 3 3 3 2 3 3 3 2 2 2 2 1 1 1 1 1 1 2)

Pileup requires the BAI, so you need to create it beforehand.
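
The result is a plain sequence of numbers, one per position in the requested range, so summary statistics can be computed with core functions. The sketch below uses a literal copy of the values printed above; the name depths is chosen only for this example.

```clojure
;; Literal copy of the depth values shown above for ref, positions 1 to 30.
(def depths [0 0 0 0 0 0 0 1 1 3 3 3 3 3 3 2 3 3 3 2 2 2 2 1 1 1 1 1 1 2])

(count depths)                  ; => 30 (one value per position)
(apply max depths)              ; => 3  (maximum depth)
(count (filter pos? depths))    ; => 23 (positions covered by at least one read)

;; Mean depth over the range, as a floating-point number.
(/ (reduce + depths) (double (count depths)))
```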

Command-line interface

cljam provides a command-line interface to check its features quickly.

Installation

$ wget https://github.com/chrovis/cljam/releases/download/0.8.5/cljam
$ chmod +x cljam

Place cljam in a directory on your $PATH so that your shell can find it (e.g. ~/bin).

Usage

cljam has sub-commands similar to those of SAMtools. Use the --help option to print all sub-commands and their descriptions.

$ cljam --help
Usage: cljam {view,convert,normalize,sort,index,pileup,faidx,dict,level,version} ...

 Options                Default  Desc
 -------                -------  ----
 -h, --no-help, --help  false    Show help

 Command    Desc
 -------    ----
 view       Extract/print all or sub alignments in SAM or BAM format.
 convert    Convert file format based on the file extension.
 normalize  Normalize references of alignments.
 sort       Sort alignments by leftmost coordinates.
 index      Index sorted alignment for fast random access.
 pileup     Generate pileup for the BAM file.
 faidx      Index reference sequence in the FASTA format.
 dict       Create a FASTA sequence dictionary file.
 level      Add level of alignments.
 version    Print version number.

cljam [sub-command] --help shows help for that sub-command.

$ cljam view --help
Extract/print all or sub alignments in SAM or BAM format.

Usage: cljam view [--header] [-f FORMAT][-r REGION] <in.bam|sam>

Options:
      --header               Include header
  -f, --format FORMAT  auto  Input file format <auto|sam|bam>
  -r, --region REGION        Only print in region (e.g. chr6:1000-2000)
  -h, --help                 Print help
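
A region string such as chr6:1000-2000 carries the same three pieces of information as the {:chr ... :start ... :end ...} map passed to depth/depth in the REPL section. As a rough sketch of that correspondence, the hypothetical helper parse-region below (not part of cljam) splits such a string with a regular expression. It assumes the simple name:start-end form only.

```clojure
;; Hypothetical helper, not part of cljam:
;; parse "name:start-end" into a {:chr :start :end} map, or nil if it does not match.
(defn parse-region [s]
  (when-let [[_ chr start end] (re-matches #"(.+):(\d+)-(\d+)" s)]
    {:chr   chr
     :start (Long/parseLong start)
     :end   (Long/parseLong end)}))

(parse-region "chr6:1000-2000")  ; => {:chr "chr6", :start 1000, :end 2000}
(parse-region "ref:1-30")        ; => {:chr "ref", :start 1, :end 30}
(parse-region "not-a-region")    ; => nil
```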

view

view prints the contents of a SAM/BAM file and is equivalent to samtools view.

$ cljam view --header resources/test.sam
@SQ     SN:ref  LN:45
@SQ     SN:ref2 LN:40
r003    16      ref     29      30      6H5M    *       0       0       TAGGC   *
r001    163     ref     7       30      8M4I4M1D3M      =       37      39      TTAGATAAAGAGGATACTG     *       XX:B:S,12561,2,20,112
r002    0       ref     9       30      1S2I6M1P1I1P1I4M2I      *       0       0       AAAAGATAAGGGATAAA       *
r003    0       ref     9       30      5H6M    *       0       0       AGCTAA  *
x3      0       ref2    6       30      9M4I13M *       0       0       TTATAAAACAAATAATTAAGTCTACA      ??????????????????????????
r004    0       ref     16      30      6M14N1I5M       *       0       0       ATAGCTCTCAGC    *
r001    83      ref     37      30      9M      =       7       -39     CAGCGCCAT       *
x1      0       ref2    1       30      20M     *       0       0       AGGTTTTATAAAACAAATAA    ????????????????????
x2      0       ref2    2       30      21M     *       0       0       GGTTTTATAAAACAAATAATT   ?????????????????????
x4      0       ref2    10      30      25M     *       0       0       CAAATAATTAAGTCTACAGAGCAAC       ?????????????????????????
x6      0       ref2    14      30      23M     *       0       0       TAATTAAGTCTACAGAGCAACTA ???????????????????????
x5      0       ref2    12      30      24M     *       0       0       AATAATTAAGTCTACAGAGCAACT        ????????????????????????

convert

The convert command converts SAM to BAM or BAM to SAM, choosing the formats from the file extensions.

$ cljam convert resources/test.sam /tmp/test.converted.bam
$ cljam convert resources/test.bam /tmp/test.converted.sam

See the command-line tool manual for more information.