
SpLitteR - Repeat resolution in assembly graph using synthetic long reads



SpLitteR is a tool that uses synthetic long reads (SLRs) to improve the contiguity of HiFi assemblies. Given an SLR library and a HiFi assembly graph in GFA format, SpLitteR resolves repeats in the assembly graph using linked reads and generates a simplified (more contiguous) assembly graph with the corresponding scaffolds.

Building SpLitteR requires:
  • g++ (version 5.3.1 or higher)
  • cmake (version 3.12 or higher)
  • zlib
  • libbz2

To build SpLitteR, run:
cd spades/assembler/
mkdir build && cd build && cmake ../src
make splitter

To run SpLitteR, move to the assembler/ folder and execute:



The tool requires:

  • Assembly graph file in GFA 1.0 format, with scaffolds included as path lines.
  • SLR library description in YAML format. The tool supports SLR libraries produced using 10X Genomics Chromium and UST TELL-Seq technologies. Other SLR technologies, such as stLFR or LoopSeq, can potentially be used as input if converted to the 10X or TELL-Seq format.

SpLitteR supports LJA and Flye assembly graphs out of the box. Other assembly graphs should preferably be converted into blunt format, e.g. with the GetBlunted utility.
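Since SpLitteR expects scaffolds to be present as path (P) lines, it can help to check an input graph before a run. The Python sketch below is not part of SpLitteR: `gfa_paths` is a hypothetical helper that relies only on the GFA 1.0 P-line layout (record type `P`, path name, comma-separated oriented segment names, overlaps).

```python
from typing import Dict, Iterable, List, Tuple


def gfa_paths(lines: Iterable[str]) -> Dict[str, List[Tuple[str, str]]]:
    """Collect GFA 1.0 path (P) records as {path name: [(segment, orientation), ...]}."""
    paths: Dict[str, List[Tuple[str, str]]] = {}
    for line in lines:
        # P lines are tab-separated: P <name> <seg+,seg-,...> <overlaps>
        if not line.startswith("P\t"):
            continue
        fields = line.rstrip("\n").split("\t")
        name, segment_list = fields[1], fields[2]
        # Each step is a segment name followed by its orientation sign (+ or -).
        paths[name] = [(step[:-1], step[-1]) for step in segment_list.split(",")]
    return paths
```

For example, opening assembly_graph.gfa and calling `len(gfa_paths(file))` gives the number of path lines; zero means the graph carries no scaffolds as paths.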

UST TELL-Seq format

A TELL-Seq library should include barcodes, left reads, and right reads as three separate FASTQ files.

For example, if you have a TELL-Seq library


the YAML file should look like this:

        orientation: "fr",
        type: "tell-seq",
        right reads: [
        left reads: [
        aux: [
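The YAML description can also be generated by a script. The following is a hypothetical Python helper (`tellseq_yaml`, not shipped with SpLitteR) that writes a TELL-Seq description with the same keys as the template above. It assumes the `aux` entry holds the barcode FASTQ (the third of the three files a TELL-Seq library consists of) and that you pass absolute paths; verify both against your own data.

```python
import json


def tellseq_yaml(left: str, right: str, barcodes: str) -> str:
    """Render a TELL-Seq library description in the flow-style YAML shown above."""
    # json.dumps quotes and escapes each path; JSON strings are valid YAML scalars.
    q = json.dumps
    return (
        "[\n"
        "    {\n"
        '        orientation: "fr",\n'
        '        type: "tell-seq",\n'
        f"        right reads: [\n            {q(right)}\n        ],\n"
        f"        left reads: [\n            {q(left)}\n        ],\n"
        # Assumption: aux lists the barcode (index) FASTQ file.
        f"        aux: [\n            {q(barcodes)}\n        ]\n"
        "    }\n"
        "]\n"
    )
```

Write the returned string to a file such as tellseq_dataset.yaml and pass that file to splitter as the SLR library description.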

10X Genomics Chromium format

A 10X library should be in FASTQ format, with barcodes attached as BC:Z or BX:Z tags:


For example, if you have an SLR library


the YAML file should look like this:

        orientation: "fr",
        type: "clouds10x",
        right reads: [
        left reads: [
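Before a run, it may be worth checking that the reads actually carry barcode tags. This sketch uses hypothetical helpers (`barcode_from_header`, `fraction_barcoded`) that scan FASTQ headers for a whitespace-separated `BX:Z:` or `BC:Z:` token. It assumes unwrapped 4-line FASTQ records and returns the first matching tag; which tag SpLitteR itself prefers when both are present is not stated here.

```python
import gzip
import re
from typing import Optional

# A SAM-style barcode tag in a FASTQ header comment, e.g. "BX:Z:ACGTACGT-1".
_BARCODE_TAG = re.compile(r"^(?:BX|BC):Z:(\S+)$")


def barcode_from_header(header: str) -> Optional[str]:
    """Return the first BX:Z or BC:Z barcode in a FASTQ header line, or None."""
    # The first whitespace-separated token is the read name; tags follow it.
    for token in header.split()[1:]:
        match = _BARCODE_TAG.match(token)
        if match:
            return match.group(1)
    return None


def fraction_barcoded(path: str) -> float:
    """Fraction of records in a (possibly gzipped) FASTQ file that carry a barcode tag."""
    opener = gzip.open if path.endswith(".gz") else open
    total = tagged = 0
    with opener(path, "rt") as fastq:
        for i, line in enumerate(fastq):
            if i % 4 == 0:  # header line of a 4-line FASTQ record
                total += 1
                if barcode_from_header(line) is not None:
                    tagged += 1
    return tagged / total if total else 0.0
```

A fraction close to zero usually means the barcodes were not attached to the read headers.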

Command line

Synopsis: splitter <graph (in binary or GFA)> <SLR library description (in YAML)> <path to output directory> [OPTION...]

Main options:

  • -t Number of threads to use (default: 1/2 of available threads)
  • --mapping-k k-mer length for read mapping (default: 31)
  • -Gmdbg|-Gblunt Assembly graph type: mDBG (LJA) or blunted (Flye)
  • -Mdiploid|-Mmeta Repeat resolution mode (diploid or meta)
  • --assembly-info Path to metaFlye assembly_info.txt file (meta mode, metaFlye graphs only)

Barcode index construction:

  • --count-threshold Minimum number of reads for barcode index
  • --frame-size Resolution of the barcode index
  • --length-threshold Minimum scaffold graph edge length (meta mode option)
  • --linkage-distance Reads are assigned to the same fragment on long edges based on the linkage distance
  • --min-read-threshold Minimum number of reads for path cluster extraction
  • --relative-score-threshold Relative score threshold for path cluster extraction
  • --sampling-factor Downsample input SLR reads by this factor
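The --linkage-distance and --frame-size options are described only briefly above. The sketch below shows one common way linked-read tools interpret such parameters, as an assumption for illustration and not SpLitteR's code: reads sharing a barcode on one edge form one fragment while consecutive reads are at most `linkage_distance` apart, and each fragment is then recorded in fixed-size frames. The function names are hypothetical.

```python
from typing import List


def split_into_fragments(positions: List[int], linkage_distance: int) -> List[List[int]]:
    """Group positions of same-barcode reads on one edge into putative fragments."""
    fragments: List[List[int]] = []
    for pos in sorted(positions):
        # Extend the current fragment if this read lies within linkage_distance
        # of the previous read; otherwise start a new fragment.
        if fragments and pos - fragments[-1][-1] <= linkage_distance:
            fragments[-1].append(pos)
        else:
            fragments.append([pos])
    return fragments


def frames_covered(fragment: List[int], frame_size: int) -> List[int]:
    """Indices of the fixed-size frames spanned by a (sorted) fragment."""
    return list(range(fragment[0] // frame_size, fragment[-1] // frame_size + 1))
```

With reads at positions 100, 120, 900, 25000, and 26000 and a linkage distance of 5000, this yields two fragments, [100, 120, 900] and [25000, 26000]. A larger frame size gives a coarser (smaller) index at the cost of positional resolution.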

Repeat resolution:

  • --score Score threshold for link index.
  • --tail-threshold Barcodes are assigned to the first and last <tail_threshold> nucleotides of the edge.
  • --scaffold-links Use scaffold links in addition to graph links for repeat resolution
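The --tail-threshold option restricts barcodes to edge ends. Here is a minimal sketch of that idea under stated assumptions: each alignment is a (barcode, 0-based start position) pair on one edge, and two edges are compared by the barcodes shared between the tail of one and the head of the other. The shared-barcode count is only an illustrative support measure, since the README does not define how --score is computed. Both helper names are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def end_barcodes(alignments: Iterable[Tuple[str, int]], edge_length: int,
                 tail_threshold: int) -> Dict[str, Set[str]]:
    """Split barcodes aligned to one edge into 'head' and 'tail' sets."""
    ends: Dict[str, Set[str]] = {"head": set(), "tail": set()}
    for barcode, pos in alignments:
        # Two independent checks: on an edge shorter than 2 * tail_threshold,
        # the head and tail windows overlap, so a barcode may land in both.
        if pos < tail_threshold:
            ends["head"].add(barcode)
        if pos >= edge_length - tail_threshold:
            ends["tail"].add(barcode)
    return ends


def shared_barcodes(from_edge: Dict[str, Set[str]], to_edge: Dict[str, Set[str]]) -> int:
    """Illustrative link support: barcodes shared by from_edge's tail and to_edge's head."""
    return len(from_edge["tail"] & to_edge["head"])
```

Barcodes aligned to the middle of a long edge are ignored, which keeps links local to the repeat being resolved.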

Developer options:

  • --ref Reference path for repeat resolution evaluation
  • --bin-load Load read-to-graph alignment
  • --debug Produce lots of debug data, save read-to-graph alignment
  • --tmp-dir Scratch directory to use
  • -h, --help Print help message

Example command lines:

  • Assembly produced by LJA from a HiFi diploid human dataset, with a 10X SLR library (HPC compressed)
    splitter lja_output/mdbg/mdbg.hpc.gfa 10x_dataset.yaml output -Mdiploid -Gmdbg
  • Assembly produced by metaFlye from a metagenomic dataset, with a TELL-Seq SLR library
    splitter metaflye_output/assembly_graph.gfa tellseq_dataset.yaml output --assembly-info metaflye_output/assembly_info.txt -Mmeta -Gblunt
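For scripted runs, the documented options can be assembled into an argument list. The hypothetical `splitter_cmd` helper below uses only flags listed above and reproduces the two example command lines. It assumes that `-t` takes its value as a separate argument and that the binary is called `splitter` (override `binary` otherwise).

```python
from typing import List, Optional


def splitter_cmd(graph: str, library_yaml: str, out_dir: str, *, mode: str,
                 graph_type: str, threads: Optional[int] = None,
                 assembly_info: Optional[str] = None,
                 binary: str = "splitter") -> List[str]:
    """Build a SpLitteR argument list from the documented options."""
    if mode not in ("diploid", "meta"):
        raise ValueError("mode must be 'diploid' or 'meta'")
    if graph_type not in ("mdbg", "blunt"):
        raise ValueError("graph_type must be 'mdbg' or 'blunt'")
    if assembly_info is not None and mode != "meta":
        raise ValueError("--assembly-info applies to meta mode only")
    # Positional arguments: graph, SLR library description, output directory.
    cmd = [binary, graph, library_yaml, out_dir]
    if assembly_info is not None:
        cmd += ["--assembly-info", assembly_info]
    if threads is not None:
        cmd += ["-t", str(threads)]  # assumption: value passed as a separate argument
    cmd += [f"-M{mode}", f"-G{graph_type}"]
    return cmd
```

Run it with `subprocess.run(splitter_cmd(...), check=True)`. Building a list rather than a shell string avoids quoting problems with paths that contain spaces.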

Output
SpLitteR stores all output files in the output directory <output_dir>, which is set by the user.

  • <output_dir>/assembly_graph.gfa input assembly graph in mDBG encoding
  • <output_dir>/resolved_graph.gfa output assembly graph after repeat resolution
  • <output_dir>/contigs.fasta output scaffolds

In addition, the output directory contains:

  • <output_dir>/edge_transform.tsv map from input graph edges to resolved graph edges
  • <output_dir>/vertex_stats.tsv Statistics for complex vertices
  • <output_dir>/resolved_graph.fasta Sequences of the resolved graph edges
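Because <output_dir>/contigs.fasta holds the output scaffolds, a quick contiguity check is the N50 of its records. The sketch below is a standard N50 computation, not part of SpLitteR; `fasta_lengths` and `n50` are hypothetical helper names.

```python
from typing import Iterable, List


def fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Sequence lengths of the records in a FASTA stream (multi-line records allowed)."""
    lengths: List[int] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            lengths.append(0)  # start a new record
        elif lengths:
            lengths[-1] += len(line)
    return lengths


def n50(lengths: List[int]) -> int:
    """Largest length L such that records of length >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0
```

Opening contigs.fasta and calling `n50(fasta_lengths(file))` gives the scaffold N50; computing the same value for the scaffolds of the input assembly shows how much repeat resolution improved contiguity.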