MotomuMatsui/gs

gs2

gs2 is software for conducting a new method of phylogenetic analysis, Graph Splitting (GS). It can effectively resolve the early evolution of protein families, and its accuracy and speed were demonstrated by extensive evolutionary simulations.
gs2 is open-source software (GPL v3.0) implemented in C++ for Linux, Mac (macOS) and Windows (Cygwin).

Reference: Motomu Matsui and Wataru Iwasaki, Systematic Biology, 2019
Online tool: GS analysis server
Our Laboratory: Iwasaki Lab

History

version 2.4 (2019/02/12)

  • Modified distance function

version 2.3 (2018/11/16)

  • Added Transfer Bootstrap Expectation algorithm (F. Lemoine, et al., Nature, 2018)

version 2.2 (2018/11/07)

  • Added a warning when redundant sequences are given as input

version 2.1 (2018/10/15)

  • Added transitivity function
  • Modified addEP function

version 2.0 (2018/06/01)

  • Re-implemented in C++
  • MMseqs2 is used for all-to-all pairwise sequence alignment

version 1.0 (2017/02/07)

  • Implemented in R and Perl
  • BLAST+ is used for all-to-all pairwise sequence alignment

Installation

0. Requirements

  • GNU GCC compiler (5.0+) is required to compile gs2

  • CMake (3.0+) is required to compile mmseqs

    ❗ Mac users are advised to install gcc and cmake using Homebrew
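
The requirements above can be checked before compiling. The following is a minimal sketch, not part of gs2: the helper names (version_ge, tool_version, check_tool) are hypothetical, and it assumes that each tool prints its version as the first dotted number of its `--version` output. On macOS, `g++` is often an alias for clang, so a passing check there does not prove that GNU GCC is installed.

```shell
# version_ge A B: succeed when dotted numeric version A is >= B.
# The body runs in a subshell, so its variables do not leak out.
version_ge() (
  a=$1 b=$2
  while [ -n "$a" ] || [ -n "$b" ]; do
    x=${a%%.*}; y=${b%%.*}              # leading field of each version
    [ "${x:-0}" -gt "${y:-0}" ] && exit 0
    [ "${x:-0}" -lt "${y:-0}" ] && exit 1
    # drop the leading field; an exhausted version becomes empty
    case $a in *.*) a=${a#*.} ;; *) a= ;; esac
    case $b in *.*) b=${b#*.} ;; *) b= ;; esac
  done
  exit 0
)

# tool_version TOOL: print the first dotted number from `TOOL --version`.
tool_version() {
  "$1" --version 2>/dev/null | grep -Eo '[0-9]+(\.[0-9]+)+' | head -n 1
}

# check_tool TOOL MIN: report whether TOOL is on PATH and at least version MIN.
check_tool() (
  tool=$1 min=$2
  command -v "$tool" >/dev/null 2>&1 || { echo "$tool: not found"; exit 1; }
  ver=$(tool_version "$tool")
  if [ -n "$ver" ] && version_ge "$ver" "$min"; then
    echo "$tool $ver: OK (>= $min)"
  else
    echo "$tool ${ver:-unknown}: need >= $min"
    exit 1
  fi
)

check_tool g++ 5.0 || true
check_tool cmake 3.0 || true
```

If a versioned compiler such as g++-8 is installed, pass that name to check_tool instead of g++.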

1. Compile from source code:

    $ git clone https://github.com/MotomuMatsui/gs
    $ cd gs
    $ make
  • You can adjust the Makefile to suit your environment (e.g. CXX := g++-8, CXXFLAGS += -std=c++1z)

2. Set PATH environment variable:

    $ export PATH=$(pwd)/MMseqs2/build/bin:$PATH
  • You can move mmseqs to any directory you like (e.g. ~/bin) and add that directory to your PATH environment variable (e.g. export PATH=~/bin:$PATH)
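
Running the export line repeatedly adds the same directory to PATH again each time. The sketch below shows one way to avoid that and to confirm that mmseqs can be found afterwards. The function name prepend_path is hypothetical and is not provided by gs2 or MMseqs2.

```shell
# prepend_path DIR: put DIR at the front of PATH unless it is already listed.
prepend_path() {
  case ":$PATH:" in
    *":$1:"*) ;;                  # already present, leave PATH unchanged
    *) PATH="$1:$PATH" ;;
  esac
  export PATH
}

# Same directory as in step 2 (run from the gs directory).
prepend_path "$(pwd)/MMseqs2/build/bin"

# Warn if the mmseqs executable still cannot be found.
command -v mmseqs >/dev/null 2>&1 || echo "mmseqs not found on PATH yet" >&2
```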

Known issues

Mac ... Compiling LAPACK/BLAS sometimes fails

  • Change OPTS = -O2 -frecursive to OPTS = -O3 -frecursive -pipe in lapack-3.7.1/make.inc, then run make again

Mac ... You might get the following error message

    ld: library not found for -lgfortran
    clang: error: linker command failed with exit code 1 (use -v to see invocation)
    make: *** [gs2] Error 1
  • First, run locate gfortran to find the path to gfortran. If gfortran is already installed (e.g. /usr/local/bin/gfortran-8), run the following commands, adjusting the paths to your environment.
    $ ln -sf /usr/local/bin/gcc-8 /usr/local/bin/gcc
    $ ln -sf /usr/local/bin/g++-8 /usr/local/bin/g++
    $ ln -sf /usr/local/bin/gfortran-8 /usr/local/bin/gfortran
    $ hash -r
    $ make clean
    $ make
  • If gfortran is not installed yet, install the latest version of gcc using Homebrew, then run the commands above

Mac ... You might get the following error message

    ar cr ../../liblapacke.a 
    ar: no archive members specified
        ...
        ...
        ...
    make: *** [lapack] Error 2
  • Run make again

Mac ... If an old version of gcc was installed previously, building mmseqs sometimes fails

Windows ... Installing LAPACK/BLAS version 3.8.0 can fail

  • Choose LAPACK/BLAS version 3.7.1 for installation (default)

Usage

To get on-line help:

    $ ./gs2 -h

The following command calculates a GS tree (a phylogenetic tree reconstructed by the Graph Splitting method):

    $ ./gs2 [arguments] input > output

❗ A multi-sequence file in FASTA format (e.g. example/200.faa) is required as input
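
Before a long run, it can help to confirm that the input looks like FASTA. The sketch below is an illustration only: the helper names are hypothetical, and the checks (the first non-empty line starts with `>`, plus a count of header lines) are assumptions about what a reasonable pre-check looks like, not a description of how gs2 validates its input.

```shell
# count_fasta_records FILE: print the number of header lines (lines starting with '>').
count_fasta_records() {
  grep -c '^>' "$1"
}

# looks_like_fasta FILE: succeed when the first non-empty line is a header line.
looks_like_fasta() (
  first=$(grep -v '^[[:space:]]*$' "$1" | head -n 1)
  case $first in
    '>'*) exit 0 ;;
    *) exit 1 ;;
  esac
)

# Example on the bundled file, if it is present in the current directory.
if [ -f example/200.faa ]; then
  looks_like_fasta example/200.faa && echo "records: $(count_fasta_records example/200.faa)"
fi
```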

Arguments:

Option                  Description
-e [integer(>=0)]       The number of replicates for EP method. Default: 0
-r [integer(>=1)]       The random seed number for EP method. Default: random number
-t [integer(>=1)]       The number of threads for MMseqs. Default: 1
-m [real(1–7.5)]        Sensitivity for MMseqs. Default: 7.5
-b [string(tbe/fbs)]    The bootstrap method. Default: tbe
-s                      Silent mode: do not report progress. Default: Off
-l                      Newick format with actual names. Default: Off
-h                      Show help messages. Default: Off
-v                      Show the version. Default: Off
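
When gs2 is called from a script, the ranges in the table can be checked up front. The following is a minimal sketch under that assumption: gs2_args is a hypothetical helper, not part of gs2, and it covers only -e, -t, -m and -b with the ranges listed above.

```shell
# gs2_args EP THREADS SENS METHOD: validate the values against the documented
# ranges and print the matching option string, or fail with a message.
gs2_args() (
  ep=$1 threads=$2 sens=$3 method=$4
  case $ep in
    ''|*[!0-9]*) echo "-e must be an integer >= 0" >&2; exit 1 ;;
  esac
  case $threads in
    ''|*[!0-9]*) echo "-t must be an integer >= 1" >&2; exit 1 ;;
  esac
  [ "$threads" -ge 1 ] || { echo "-t must be an integer >= 1" >&2; exit 1; }
  case $method in
    tbe|fbs) ;;
    *) echo "-b must be tbe or fbs" >&2; exit 1 ;;
  esac
  # awk performs the real-valued range check for -m
  awk -v m="$sens" 'BEGIN { exit !(m ~ /^[0-9]+(\.[0-9]+)?$/ && m >= 1 && m <= 7.5) }' \
    || { echo "-m must be between 1 and 7.5" >&2; exit 1; }
  echo "-e $ep -t $threads -m $sens -b $method"
)

# Run only when gs2 and the example input are present in the current directory.
if [ -x ./gs2 ] && [ -f example/200.faa ]; then
  opts=$(gs2_args 100 4 7.5 tbe) && ./gs2 $opts example/200.faa > example/200.nwk
fi
```

The option string is expanded unquoted on purpose so that the shell splits it into separate arguments.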

Examples

The GS tree (in Newick format) is written to STDOUT (correspondence table between IDs and sequence names: example/200_annotation.txt):

    $ ./gs2 example/200.faa

GS tree with branch reliability (Edge perturbation; EP) scores will be saved in example/200.nwk:

    $ ./gs2 -e 100 example/200.faa > example/200.nwk

GS tree with EP scores; a seed number is specified for EP method:

    $ ./gs2 -e 100 -r 12345 example/200.faa > example/200.nwk

GS tree WITHOUT EP scores + silent mode:

    $ ./gs2 -e 0 -s example/200.faa > example/200.nwk

MMseqs2 runs multithreaded jobs (4 CPUs are used in parallel):

    $ ./gs2 -e 100 -t 4 example/200.faa > example/200.nwk
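
The single-file commands above extend naturally to several inputs. The sketch below assumes a directory of .faa files and uses a fixed seed so that EP scores are reproducible across runs. Both function names are hypothetical, and only options listed in the Arguments table are used.

```shell
# nwk_path FILE: map an input path ending in .faa to the matching .nwk path.
nwk_path() {
  printf '%s.nwk\n' "${1%.faa}"
}

# run_batch DIR: run gs2 on every .faa file in DIR with 100 EP replicates,
# a fixed seed and silent mode, writing each tree next to its input.
run_batch() {
  for faa in "$1"/*.faa; do
    [ -f "$faa" ] || continue        # skip the unmatched glob pattern
    ./gs2 -e 100 -r 12345 -s "$faa" > "$(nwk_path "$faa")"
  done
}
```

Calling `run_batch example` from the gs directory would write example/200.nwk for the bundled input.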

[Figure: 200.nwk visualized with iTOL]

License

This software is distributed under the GNU GPL, see LICENSE
Copyright © 2019, Motomu Matsui

Author

Motomu Matsui

Reference

Frederic Lemoine, Jean-Baka Domelevo Entfellner, Eduan Wilkinson, Damien Correia, Miraine Davila Felipe, Tulio De Oliveira, and Olivier Gascuel, Renewing Felsenstein's phylogenetic bootstrap in the era of big data, Nature, 2018
Motomu Matsui and Wataru Iwasaki, Graph Splitting: A Graph-Based Approach for Superfamily-Scale Phylogenetic Tree Reconstruction, Systematic Biology, 2019

Acknowledgements

This package includes the LAPACKE/CBLAS (Univ. of Tennessee; Univ. of California, Berkeley; Univ. of Colorado Denver; and NAG Ltd.) and MMseqs (Söding Laboratory) packages. The authors give special thanks to both teams. Detailed information is available at https://www.netlib.org/lapack/ and https://github.com/soedinglab/MMseqs2.