Installation: Conda, Docker, etc #389

Open
nextgenusfs opened this issue Mar 2, 2020 · 17 comments

Comments

@nextgenusfs
Owner

nextgenusfs commented Mar 2, 2020

I know funannotate has a lot of dependencies, which isn't ideal. I spent quite some time getting funannotate working on conda, but as most of you who have worked with conda know, it is also far from perfect. I can't possibly test everybody's system; my two test environments are CentOS and macOS. In my experience, the key to keeping conda working is to never install anything in the base environment. This is especially true on a shared system, where multiple users inevitably run into permissions issues. If you are trying to install funannotate via conda, i.e. conda create -n funannotate funannotate, and conda reports that it is unable to solve the environment, then you most likely have packages installed in your base environment that clash with the dependencies.

I also know there are several people interested in Docker/Singularity containers #183 #366 #257 #379 . I've been told that conda doesn't work in these environments and fails to find a solution. I finally had some time to look around and see what worked/didn't work for me.

On my macOS system, I'm able to build a Docker image for funannotate v1.7.4 using python 2.7 as follows. Note that on Debian systems the forge program in the Bioconda snap recipe apparently isn't working (#387, thanks to @nhartwic for the fix), so this recipe builds forge from source as a workaround. For me, bioconda snap works on both CentOS and macOS.

FROM continuumio/miniconda3:latest

RUN conda config --add channels defaults && conda config --add channels bioconda && \
	conda config --add channels conda-forge && conda update -n base -c defaults --yes conda && \
	apt-get update && apt-get -y install gcc make

RUN conda create -n funannotate --yes "funannotate=1.7.4" && conda clean -afy && \
	mkdir -p /home/funannotate_db && echo "source activate funannotate" > ~/.bashrc

ENV FUNANNOTATE_DB=/home/funannotate_db

SHELL ["conda", "run", "-n", "funannotate", "/bin/bash", "-c"]

# bioconda snap is partially broken on some systems (apparently Debian): forge is the problem, so build it from source
WORKDIR /opt

RUN git clone https://github.com/KorfLab/SNAP.git && cd SNAP && make && \
	cp /opt/SNAP/forge /opt/conda/envs/funannotate/bin/forge

RUN funannotate setup -i all 

#for some reason USER needs to be set for seqclean to work in docker
ENV USER='me'

WORKDIR /home

I'm also working on a python3 port of the code which is in the python3 branch. I'm still doing some testing on this branch to make sure everything is working. It would be helpful for others to test the code as well, as I know people use the scripts in slightly different ways. This can be tested in a docker environment like this (note I found a unicode error that needs to be fixed in this py3 port, but you get the idea):

FROM continuumio/miniconda3:latest

RUN conda config --add channels defaults && conda config --add channels bioconda &&\
  conda config --add channels conda-forge && conda update -n base -c defaults --yes \
  conda && apt-get update && apt-get -y install gcc make

RUN conda create -n funannotate --yes python=3.7 biopython goatools matplotlib natsort \
  psutil requests scikit-learn scipy seaborn "blast=2.2.31" tantan bedtools hmmer \
  exonerate "diamond>0.9,<=0.9.24" tbl2asn ucsc-pslcdnafilter "pasa>=2.4.1" \
  trimmomatic raxml trimal "mafft>=7" iqtree "kallisto>=0.46,<0.46.2" evidencemodeler \
  codingquarry stringtie snap glimmerhmm trnascan-se hisat2 "proteinortho>=6.0.9" \
  "salmon>=0.9" perl "perl-bioperl>1.7" perl-dbd-mysql perl-clone perl-hash-merge \
  perl-soap-lite perl-json perl-logger-simple perl-scalar-util-numeric minimap2 \
  perl-text-soundex perl-parallel-forkmanager "r-base>=3.4.1" bamtools numpy pandas \
  "augustus>3.3" "trinity>=2.8.5=h8b12597_5" pip && conda clean -afy && \
  mkdir -p /home/funannotate_db && echo "source activate funannotate" > ~/.bashrc

ENV FUNANNOTATE_DB=/home/funannotate_db

SHELL ["conda", "run", "-n", "funannotate", "/bin/bash", "-c"]

# bioconda snap is partially broken on some systems (apparently Debian): forge is the problem, so build it from source
WORKDIR /opt

RUN git clone https://github.com/KorfLab/SNAP.git && cd SNAP && make && \
  cp /opt/SNAP/forge /opt/conda/envs/funannotate/bin/forge

RUN python -m pip install git+https://github.com/nextgenusfs/funannotate.git@python3

RUN funannotate setup -i all -w

#for some reason USER needs to be set for seqclean to work in docker
ENV USER='me'

WORKDIR /home
@photocyte
Contributor

Hi there,

Just double checking: Is there something wrong with using the Docker container automatically pushed by Bioconda to Quay.io?
https://bioconda.github.io/recipes/funannotate/README.html

For newer versions of Singularity, you can also pull directly from quay.io using the docker:// URL format, e.g.:

singularity pull docker://quay.io/biocontainers/funannotate:1.7.4--py27_0

I just tested it, and the singularity image seemed to pull / be executable with singularity shell without any obvious errors.
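
For anyone who wants a different tag, here is a small sketch of how the pull URI is put together. The helper name biocontainer_uri and the output file name funannotate.sif are made up for illustration, and the commented commands assume Singularity 3.x is installed.

```shell
#!/bin/sh
# Build the docker:// URI for a Biocontainers tag of funannotate.
# (biocontainer_uri is a hypothetical helper name, not part of any tool.)
biocontainer_uri() {
    # $1: image tag as listed on quay.io, e.g. 1.7.4--py27_0
    printf 'docker://quay.io/biocontainers/funannotate:%s\n' "$1"
}

# Example (not run here): pull into a local image file, then open a shell in it.
# singularity pull funannotate.sif "$(biocontainer_uri 1.7.4--py27_0)"
# singularity shell funannotate.sif
```

The tag has to match one that actually exists on quay.io; the one in the example is the tag tested above.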

@reslp
Contributor

reslp commented May 12, 2020

Hi everyone,

I also spent some time creating a Docker/Singularity container for funannotate. It is available here: https://github.com/reslp/funannotate-docker. I have been using it quite a bit lately and it works pretty well (also in a cluster environment using Snakemake and job scheduling), although I have not tested all features (e.g. training with RNA-seq evidence).
It does not use the preferred way to install funannotate (which is conda), because when I started working on it there were some dependency issues with ete3 (which are now fixed). Everything in the container is thus installed manually. However, I may transition to the conda installation at some point, probably when funannotate 1.8 is released.

I am posting this here in the hope that this could be useful to some of you.

all the best,
Philipp

@apredeus

apredeus commented Aug 2, 2021

Hello @nextgenusfs,

Thank you for these comments on the installation! What makes it a bit harder and confusing for new users (like myself) is the fact that the instructions on readthedocs and in the readme of this GitHub differ.

Also, I see the discussion above about moving to python3. Does that mean that we should create the conda env with python=2.7?

Thank you in advance.

@nextgenusfs
Owner Author

@apredeus this is >1 year old. You should not use python2.7 at this point. Sorry that the docs are not always up to date -- this entire project is all outside of work and I volunteer my time.

@apredeus

apredeus commented Aug 2, 2021

Thank you for commenting - didn't notice that at all, I was just searching for comments that can bring some clarity to the instructions. I certainly didn't mean the comment in any negative way - big thanks for keeping the project alive in your spare time!

PS So just to clarify: all the readthedocs instructions can be ignored, and github readme is the way to go, right?

@nextgenusfs
Owner Author

As of 3 minutes ago installation instructions are the same in both places.

@yizhouc

yizhouc commented Sep 26, 2021

I get "IndexError: list index out of range" in funannotate update at the step "Parsing GenBank files...comparing annotation".
I am trying to use funannotate to annotate the two-spotted mite (genome size 90 Mb) with the docker image. I have Illumina RNA-seq and PacBio Iso-Seq data as evidence. The clean, sort, mask, train, and predict steps ran smoothly, but update fails with this error. I also tried --pasa_db mysql and the error message is the same. I would appreciate some help.
Here is the command I used:
funannotate-docker update -i fun --pasa_db mysql --cpus 32
The error message:
(base) yizhouc@yct7920:~/raid_mnt/p_ausgem_urticae$ bin/2021-09-26_funannote_Susceptible_update.sh "

[Sep 26 09:28 AM]: OS: Debian GNU/Linux 10, 64 cores, ~ 791 GB RAM. Python: 3.8.10
[Sep 26 09:28 AM]: Running 1.8.9
[Sep 26 09:28 AM]: No NCBI SBT file given, will use default, for NCBI submissions pass one here '--sbt'
[Sep 26 09:28 AM]: Found relevant files in fun/training, will re-use them:
Forward reads: fun/training/left.fq.gz
Reverse reads: fun/training/right.fq.gz
Forward Q-trimmed reads: fun/training/trimmomatic/trimmed_left.fastq.gz
Reverse Q-trimmed reads: fun/training/trimmomatic/trimmed_right.fastq.gz
Forward normalized reads: fun/training/normalize/left.norm.fq
Reverse normalized reads: fun/training/normalize/right.norm.fq
Trinity results: fun/training/funannotate_train.trinity-GG.fasta
Long-read results: fun/training/funannotate_long-reads.fasta
PASA config file: fun/training/pasa/alignAssembly.txt
BAM alignments: fun/training/funannotate_train.coordSorted.bam
StringTie GTF: fun/training/funannotate_train.stringtie.gtf
[Sep 26 09:28 AM]: Reannotating Tetranychus urticae, NCBI accession: None
[Sep 26 09:28 AM]: Previous annotation consists of: 17,462 protein coding gene models and 108 non-coding gene models
[Sep 26 09:28 AM]: Existing BAM alignments found: fun/update_misc/trinity.alignments.bam, fun/update_misc/transcript.alignments.bam
[Sep 26 09:28 AM]: Skipping PASA, found existing output: fun/update_misc/pasa_final.gff3
[Sep 26 09:28 AM]: Existing Kallisto output found: fun/update_misc/kallisto.tsv
[Sep 26 09:28 AM]: Parsing Kallisto results. Keeping alt-splicing transcripts if expressed at least 10.0% of highest transcript per locus.
[Sep 26 09:28 AM]: Wrote 18,358 transcripts derived from 17,390 protein coding loci.
[Sep 26 09:28 AM]: Validating gene models (renaming, checking translations, filtering, etc)
[Sep 26 09:28 AM]: Writing 17,486 loci to TBL format: dropped 0 overlapping, 0 too short, and 0 frameshift gene models
[Sep 26 09:28 AM]: Converting to Genbank format
[Sep 26 09:31 AM]: Collecting final annotation files
[Sep 26 09:31 AM]: Parsing GenBank files...comparing annotation
Traceback (most recent call last):
  File "/venv/bin/funannotate", line 8, in <module>
    sys.exit(main())
  File "/venv/lib/python3.8/site-packages/funannotate/funannotate.py", line 705, in main
    mod.main(arguments)
  File "/venv/lib/python3.8/site-packages/funannotate/update.py", line 2435, in main
    compareAnnotations2(GBK, final_gbk, Changes, args=args)
  File "/venv/lib/python3.8/site-packages/funannotate/update.py", line 1443, in compareAnnotations2
    UTRs = findUTRs(
  File "/venv/lib/python3.8/site-packages/funannotate/update.py", line 1497, in findUTRs
    refInterlap = InterLap(mrna[i])
IndexError: list index out of range

@nextgenusfs
Owner Author

Please open a new issue as this is unrelated to docker. Make sure to post log files and commands you ran.

@ccgallen

Hi! I am trying to install funannotate with docker using the instructions in the README.md file. I am getting it to run (nice!) but am not able to get some of the dependencies working, such as signalp and genemark. I found some instructions related to docker here but it is for an older version of funannotate and as far as I can tell, that dockerfile doesn't exist anymore. Can I do something similar with the more recent nextgenusfs/funannotate/Dockerfile? I am using a linux server.

Thanks!

(base) ccallen@debary-vm1:/data/ccallen$ docker run --rm -it nextgenusfs/funannotate funannotate check --show-versions

Checking dependencies for 1.8.10

You are running Python v 3.8.12. Now checking python packages...
biopython: 1.77
goatools: 1.1.12
matplotlib: 3.5.1
natsort: 8.0.2
numpy: 1.22.0
pandas: 1.3.5
psutil: 5.9.0
requests: 2.27.1
scikit-learn: 1.0.2
scipy: 1.5.3
seaborn: 0.11.2
All 11 python packages installed

You are running Perl v b'5.026002'. Now checking perl modules...
Carp: 1.38
Clone: 0.42
DBD::SQLite: 1.64
DBD::mysql: 4.046
DBI: 1.642
DB_File: 1.855
Data::Dumper: 2.173
File::Basename: 2.85
File::Which: 1.23
Getopt::Long: 2.5
Hash::Merge: 0.300
JSON: 4.02
LWP::UserAgent: 6.39
Logger::Simple: 2.0
POSIX: 1.76
Parallel::ForkManager: 2.02
Pod::Usage: 1.69
Scalar::Util::Numeric: 0.40
Storable: 3.15
Text::Soundex: 3.05
Thread::Queue: 3.12
Tie::File: 1.02
URI::Escape: 3.31
YAML: 1.29
local::lib: 2.000024
threads: 2.15
threads::shared: 1.56
ERROR: Bio::Perl not installed, install with cpanm Bio::Perl

Checking Environmental Variables...
$FUNANNOTATE_DB=/opt/databases
$PASAHOME=/venv/opt/pasa-2.4.1
$TRINITYHOME=/venv/opt/trinity-2.8.5
$EVM_HOME=/venv/opt/evidencemodeler-1.1.1
$AUGUSTUS_CONFIG_PATH=/venv/config
ERROR: GENEMARK_PATH not set. export GENEMARK_PATH=/path/to/dir

Checking external dependencies...
PASA: 2.4.1
CodingQuarry: 2.0
Trinity: 2.8.5
augustus: 3.3.3
bamtools: bamtools 2.5.1
bedtools: bedtools v2.30.0
blat: BLAT v36
diamond: 2.0.13
ete3: 3.1.2
exonerate: exonerate 2.4.0
fasta: no way to determine
glimmerhmm: 3.0.4
gmap: 2017-11-15
hisat2: 2.2.1
hmmscan: HMMER 3.3.2 (Nov 2020)
hmmsearch: HMMER 3.3.2 (Nov 2020)
java: 11.0.9.1-internal
kallisto: 0.46.1
mafft: v7.490 (2021/Oct/30)
makeblastdb: makeblastdb 2.2.31+
minimap2: 2.24-r1122
pigz: pigz 2.6
proteinortho: 6.0.16
pslCDnaFilter: no way to determine
salmon: salmon 0.14.1
samtools: samtools 1.12
snap: 2006-07-28
stringtie: 2.2.0
tRNAscan-SE: 2.0.9 (July 2021)
tantan: tantan 26
tbl2asn: no way to determine, likely 25.X
tblastn: tblastn 2.2.31+
trimal: trimAl v1.4.rev15 build[2013-12-17]
trimmomatic: 0.39
ERROR: emapper.py not installed
ERROR: gmes_petap.pl not installed
ERROR: signalp not installed
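
In a long report like this one, the failures are easier to spot when the output is filtered for lines that start with ERROR:. A minimal sketch, assuming the report format shown above; the function name check_errors is made up for illustration.

```shell
#!/bin/sh
# Keep only the problem lines from a `funannotate check` report read on stdin.
# (check_errors is a hypothetical helper name.)
check_errors() {
    # grep exits non-zero when nothing matches; "|| true" treats a clean report as success
    grep '^ERROR:' || true
}

# Example (not run here; -it is dropped because the output is piped):
# docker run --rm nextgenusfs/funannotate funannotate check --show-versions | check_errors
```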

@hyphaltip
Collaborator

genemark, eggnog, and signalp need to be installed outside of conda; they have separate licenses that do not let them be packaged within conda.
biocore/conda-recipes#17
The conda instructions here guide you to info on installing genemark
https://funannotate.readthedocs.io/en/latest/conda.html

It looks like eggnog mapper needs to be installed too. https://anaconda.org/bioconda/eggnog-mapper
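
After installing these tools by hand, one quick way to see which are still missing is to test for the executables that funannotate check looks for. This is only a sketch: the function name missing_tools is made up for illustration, and the executable names are the ones from the check report above.

```shell
#!/bin/sh
# Print each named executable that cannot be found on PATH.
# (missing_tools is a hypothetical helper name.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Executable names taken from the `funannotate check` report.
missing_tools gmes_petap.pl emapper.py signalp

# funannotate check also expects GENEMARK_PATH; the path here is a placeholder.
# export GENEMARK_PATH=/path/to/dir
```

An empty result means all three executables are on PATH. It does not confirm that they actually run.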

@ccgallen

Thanks @hyphaltip! I have all of these working using conda, or can at least run them externally and pass the files to predict or annotate. I am trying to get the latest build installed with docker and haven’t worked out yet how to incorporate these tools into the container. I am new to docker and will keep working at it thanks!

@nextgenusfs
Owner Author

@ccgallen you'll need to build your own docker image to do this. Note that I wouldn't actually recommend docker if you already have something working with conda, since conda gives you a lot more flexibility. Docker is great but has some limitations: in the context of funannotate, the database sizes and the need to incorporate other tools (eggnog/interproscan/etc.) make it difficult to use.

The docs you linked to above are several years old. The docker image is now built by GitHub Actions on every commit and whenever a new version is tagged. To incorporate the tools that have separate licenses, you'll need to set up a new Dockerfile with the build instructions, and you can use the funannotate image as the base. Here is a quick example (note this is untested code). The idea is that you copy the install packages into the docker container and install them there during the build; the tools also need to be in the PATH. You can look at the Dockerfiles in this repo to see how the base image is made. It's built in a two-step process to save space, but effectively /venv/bin/ is in the PATH, so you can softlink tools to that location and funannotate should be able to find them.

The sed statements below are part of a "normal" signalp 4.1 installation: anything you would need to do to install a particular tool on your existing system needs to be in the Dockerfile.

FROM nextgenusfs/funannotate

WORKDIR /opt

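# the signalp tarball must sit in the build context (it is licensed separately, so you download it yourself)
COPY signalp-4.1f.Linux.tar.gz /opt

# unpack, point signalp at its install dir, use perl from the PATH, then link it into /venv/bin
RUN tar -zxvf signalp-4.1f.Linux.tar.gz && \
    sed -i 's,/usr/cbs/bio/src/signalp-4.1,/opt/signalp-4.1,g' signalp-4.1/signalp && \
    sed -i 's,#!/usr/bin/perl,#!/usr/bin/env perl,g' signalp-4.1/signalp && \
    ln -s /opt/signalp-4.1/signalp /venv/bin/signalp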
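
Building and checking an image from a Dockerfile like this could look roughly as follows. This is a sketch under assumptions: the image tag funannotate-signalp and the function name build_with_signalp are made up, and the commands are run from the directory that holds the Dockerfile and the signalp tarball.

```shell
#!/bin/sh
# Build the derived image, then show what `funannotate check` reports for signalp.
# (build_with_signalp and the tag funannotate-signalp are hypothetical names.)
build_with_signalp() {
    tarball=signalp-4.1f.Linux.tar.gz
    # COPY can only see files inside the build context (the current directory here)
    if [ ! -f "$tarball" ]; then
        echo "missing $tarball in build context" >&2
        return 1
    fi
    docker build -t funannotate-signalp . &&
        docker run --rm funannotate-signalp funannotate check --show-versions | grep -i signalp
}

# Example (not run here):
# build_with_signalp
```

If the softlink worked, the signalp line in the report should show a version rather than an ERROR.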

@ccgallen

thanks so much @nextgenusfs!

@aramos-solena

Hi. I am trying to add procps ("ps command") to the latest Docker image. When trying to build the latest Docker file (funannotate-1.8.11) without changing anything, I got the following error:

docker build -t funannotate-1.8.11 .
Step 6/13 : RUN conda-pack -n funannotate -o /tmp/env.tar &&     mkdir /venv && cd /venv && tar xf /tmp/env.tar &&     rm /tmp/env.tar
 ---> Running in 21c43c8747ce
CondaPackError: 
Files managed by conda were found to have been deleted/overwritten in the
following packages:

- xlrd 2.0.1:
    lib/python3.8/site-packages/xlrd-2.0.1.dist-info/INSTALLER
    lib/python3.8/site-packages/xlrd-2.0.1.dist-info/LICENSE
    lib/python3.8/site-packages/xlrd-2.0.1.dist-info/METADATA
    + 5 others

This is usually due to `pip` uninstalling or clobbering conda managed files,
resulting in an inconsistent environment. Please check your environment for
conda/pip conflicts using `conda list`, and fix the environment by ensuring
only one version of each package is installed (conda preferred).

Do you have any suggestions on how to fix this?
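
The conda list check suggested in the error message can be narrowed down to the packages that pip installed. A minimal sketch, assuming conda list marks pip-installed packages with pypi in its fourth (Channel) column; the function name pip_installed is made up for illustration.

```shell
#!/bin/sh
# From `conda list` output on stdin, print name and version of pip-installed packages.
# (pip_installed is a hypothetical helper name; it assumes the Channel column reads "pypi".)
pip_installed() {
    awk '$4 == "pypi" { print $1, $2 }'
}

# Example (not run here), inside the build container:
# conda list -n funannotate | pip_installed
```

Any package that shows up here and is also managed by conda is a candidate for the conflict that conda-pack complains about.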

@nextgenusfs
Owner Author

@aramos-solena it was due to a Conda-pack issue -- the current docker image should have procps installed. You can see the Dockerfile and what I changed to make it work.

@aramos-solena

Thanks so much @nextgenusfs , this fixed the issue.

@tjhinet

tjhinet commented Mar 20, 2023

Hi,

I am fairly new to this forum, so pardon me if I'm not supposed to comment here. While checking dependencies on the latest docker image, I ran into a SignalP problem similar to the one described previously in #702.

Logfile

Checking dependencies for 1.8.14

You are running Python v 3.8.12. Now checking python packages...
biopython: 1.80
goatools: 1.2.3
matplotlib: 3.7.0
natsort: 8.2.0
numpy: 1.22.4
pandas: 1.5.3
psutil: 5.9.4
requests: 2.28.2
scikit-learn: 1.1.1
scipy: 1.5.3
seaborn: 0.12.2
All 11 python packages installed

You are running Perl v b'5.026002'. Now checking perl modules...
Carp: 1.38
Clone: 0.42
DBD::SQLite: 1.64
DBD::mysql: 4.046
DBI: 1.642
DB_File: 1.855
Data::Dumper: 2.173
File::Basename: 2.85
File::Which: 1.23
Getopt::Long: 2.5
Hash::Merge: 0.300
JSON: 4.02
LWP::UserAgent: 6.39
Logger::Simple: 2.0
POSIX: 1.76
Parallel::ForkManager: 2.02
Pod::Usage: 1.69
Scalar::Util::Numeric: 0.40
Storable: 3.15
Text::Soundex: 3.05
Thread::Queue: 3.12
Tie::File: 1.02
URI::Escape: 3.31
YAML: 1.29
local::lib: 2.000029
threads: 2.15
threads::shared: 1.56
All 27 Perl modules installed

Checking Environmental Variables...
$FUNANNOTATE_DB=/opt/databases
$PASAHOME=/venv/opt/pasa-2.4.1
$TRINITYHOME=/venv/opt/trinity-2.8.5
$EVM_HOME=/venv/opt/evidencemodeler-1.1.1
$AUGUSTUS_CONFIG_PATH=/usr/share/augustus/config
$GENEMARK_PATH=/home/u4485090/funannotate/gmes_linux_64_4
All 6 environmental variables are set

Checking external dependencies...
ERROR: pslDnaFiler found but error running: pslCDnaFilter: error while loading shared libraries: libssl.so.1.0.0: cannot open shared object file: No such file or directory

ERROR: signalp found but error running signalp
PASA: 2.4.1
CodingQuarry: 2.0
Trinity: 2.8.5
augustus: 3.3.2
bamtools: bamtools 2.5.2
bedtools: bedtools v2.30.0
blat: BLAT v35
diamond: 2.0.15
ete3: 3.1.2
exonerate: exonerate 2.4.0
fasta: 36.3.8g
glimmerhmm: 3.0.4
gmap: 2017-11-15
gmes_petap.pl: 4.71_lic
hisat2: 2.2.1
hmmscan: HMMER 3.3.2 (Nov 2020)
hmmsearch: HMMER 3.3.2 (Nov 2020)
java: 11.0.8-internal
kallisto: 0.46.1
mafft: v7.515 (2023/Jan/15)
makeblastdb: makeblastdb 2.2.31+
minimap2: 2.24-r1122
pigz: 2.6
proteinortho: 6.0.16
salmon: salmon 0.14.1
samtools: samtools 1.12
snap: 2006-07-28
stringtie: 2.2.1
tRNAscan-SE: 2.0.9 (July 2021)
tantan: tantan 40
tbl2asn: 25.8
tblastn: tblastn 2.2.31+
trimal: trimAl v1.4.rev15 build[2013-12-17]
trimmomatic: 0.39
ERROR: emapper.py not installed
ERROR: pslCDnaFilter not installed
ERROR: signalp not installed
Singularity> signalp
Can't locate Getopt/Std.pm in @INC (you may need to install the Getopt::Std module) (@INC contains: /home/u4485090/perl5/lib/perl5/x86_64-linux-gnu-thread-multi /home/u4485090/perl5/lib/perl5 /etc/perl /usr/local/lib/x86_64-linux-gnu/perl/5.28.1 /usr/local/share/perl/5.28.1 /usr/lib/x86_64-linux-gnu/perl5/5.28 /usr/share/perl5 /usr/lib/x86_64-linux-gnu/perl/5.28 /usr/share/perl/5.28 /usr/local/lib/site_perl /usr/lib/x86_64-linux-gnu/perl-base) at /home/u4485090/funannotate/signalp-4.1/signalp line 76.
BEGIN failed--compilation aborted at /home/u4485090/funannotate/signalp-4.1/signalp line 76.

Any help is appreciated! Thanks.

Cheers,
Erick