Quick start

There are a lot of different partis subcommands, but the first thing you probably want to do is run partition on a fasta input file /path/to/yourseqs.fa. This groups the sequences into clonal families and then annotates each family with its V gene, naive sequence, etc. Unless you pass it an existing parameter directory, it will also first infer parameters, including germline inference.

/path/to/<partis_dir>/bin/partis partition --infname /path/to/yourseqs.fa --outfname /path/to/yourseqs-partition.yaml

Note that all input must be plus strand sequences, and if your sequences are not human igh you must set the --species {human,mouse,macaque} and/or --locus {tra,trb,trd,trg,igl,igk,igh} options. If your input file has a mix of plus and minus strand sequences and/or different loci (e.g. igk and igl sequences in the same file), you'll need to run ./bin/split-loci.py (see --reverse-negative-strands) to split them into separate files. If you have heavy/light pairing information, you can incorporate it as described here.

If you're using Docker and mounted your host filesystem as described here, replace /path/to with the appropriate host mount point within Docker. To parallelize on your local machine, add --n-procs N; to parallelize over many machines, the slurm and sge batch systems are currently supported (details here).
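If you drive partis from a Python script, a small wrapper that assembles the command line can help keep the options above straight. This is a minimal sketch, not part of partis: the helper name build_partition_cmd is hypothetical, and it only uses the options named in this section (--infname, --outfname, --species, --locus, --n-procs) with the choices listed above.

```python
from pathlib import Path
from typing import List, Optional

# Option values listed in this section for --species and --locus.
SPECIES = {"human", "mouse", "macaque"}
LOCI = {"tra", "trb", "trd", "trg", "igl", "igk", "igh"}


def build_partition_cmd(
    partis_dir: str,
    infname: str,
    outfname: str,
    species: Optional[str] = None,
    locus: Optional[str] = None,
    n_procs: Optional[int] = None,
) -> List[str]:
    """Hypothetical helper: assemble a `partis partition` command as an argv list."""
    cmd = [str(Path(partis_dir) / "bin" / "partis"), "partition",
           "--infname", infname, "--outfname", outfname]
    if species is not None:
        if species not in SPECIES:
            raise ValueError(f"unsupported --species value: {species}")
        cmd += ["--species", species]
    if locus is not None:
        if locus not in LOCI:
            raise ValueError(f"unsupported --locus value: {locus}")
        cmd += ["--locus", locus]
    if n_procs is not None:
        # Local parallelization only; batch systems are configured separately.
        cmd += ["--n-procs", str(n_procs)]
    return cmd


if __name__ == "__main__":
    # Placeholder paths; substitute your own.
    cmd = build_partition_cmd("/path/to/partis", "/path/to/yourseqs.fa",
                              "/path/to/yourseqs-partition.yaml",
                              species="mouse", locus="igk", n_procs=8)
    print(" ".join(cmd))
    # To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Building an argv list rather than a single shell string avoids quoting problems when paths contain spaces, and checking the --species/--locus values up front catches typos before a long run starts.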

Typically, you can expect to annotate 10,000 sequences on an 8-core desktop in about five minutes, and partition in 25 minutes.

In addition to any output files specified with --outfname (described here), partis writes to two directories on your file system. Temporary working files go in --workdir, which is entirely removed upon successful completion. The workdir defaults to a subdirectory of /tmp (/tmp/$USER/hmms/<random.randint>); you shouldn't need to change this unless you're using a batch system to run on multiple machines, in which case it must be on a network mount that all of them can see. Permanent parameter files are written to --parameter-dir, which defaults to a subdirectory of the current directory (see here).
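The workdir behavior described above can be sketched as follows. This is an illustration, not partis's own code: the function names are hypothetical, and the range passed to random.randint is an assumption, since only the /tmp/$USER/hmms/<random.randint> pattern is documented here. The base argument stands in for a shared network mount when running on a batch system.

```python
import os
import random
import shutil
import tempfile
from typing import Callable, Optional


def default_workdir(user: Optional[str] = None, base: str = "/tmp") -> str:
    """Hypothetical: build a path following the /tmp/$USER/hmms/<random int> pattern."""
    if user is None:
        user = os.environ.get("USER", "unknown")
    # The integer range is an assumption; partis may use a different one.
    return os.path.join(base, user, "hmms", str(random.randint(0, 999999)))


def run_in_workdir(workdir: str, work: Callable[[str], None]) -> None:
    """Create workdir, run `work` in it, and remove workdir only if `work` succeeds."""
    os.makedirs(workdir)
    # In this sketch an exception skips the cleanup below, so files remain for inspection.
    work(workdir)
    shutil.rmtree(workdir)  # successful completion: remove everything


if __name__ == "__main__":
    base = tempfile.mkdtemp()  # stand-in for /tmp or a shared network mount
    wd = default_workdir(user="alice", base=base)
    run_in_workdir(wd, lambda d: open(os.path.join(d, "scratch.txt"), "w").close())
    print(os.path.exists(wd))  # False: removed after success
    shutil.rmtree(base)
```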

You can add --debug 1 (or --debug 2) to print a lot of additional information about what's going on. This output is usually best viewed with less -RS, either by piping it directly (|) or by first redirecting it (>) to a log file (-S disables line wrapping; use the left/right arrow keys to scroll side-to-side).
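The -R flag is presumably needed because the debug output contains color codes, which less otherwise shows as raw escape sequences. If you'd rather read or grep a saved debug log without less, a small filter can remove them. This is a minimal sketch assuming the colors are standard ANSI SGR sequences of the form ESC[...m; the script and the strip_colors function are hypothetical and not part of partis.

```python
import re
import sys

# Matches ANSI SGR (color/style) sequences such as "\x1b[91m" and "\x1b[0m".
ANSI_SGR = re.compile(r"\x1b\[[0-9;]*m")


def strip_colors(text: str) -> str:
    """Return text with ANSI color/style escape sequences removed."""
    return ANSI_SGR.sub("", text)


if __name__ == "__main__":
    # Usage: python strip_colors.py partis-debug.log > partis-debug-plain.log
    for path in sys.argv[1:]:
        with open(path) as f:
            for line in f:
                sys.stdout.write(strip_colors(line))
```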

For details on the large number of available partition options, run partis partition --help.

A variety of overview plots will be written to disk if you set --plotdir <plotdir>. Details on their content can be found here.

If it's taking too long, or using too much memory, try the suggestions here, here and here.

After you've partitioned your sample, you might want to view an ascii-art representation of the resulting clusters and annotations with view-output, or calculate selection metrics to predict affinity with get-selection-metrics. You might also want to use the linearham package for accurate Bayesian inference of trees and naive sequences. And for rich, browser-based visualization of families, trees, and annotations we recommend Olmsted.