
Wrong PDF file at the end of the Sequeduct demo #2

Open
madeleinevlt opened this issue May 31, 2024 · 5 comments

@madeleinevlt

Dear authors,

I tried to run the Sequeduct demo pipeline https://github.com/Edinburgh-Genome-Foundry/Sequeduct_demo/
However, at the end of the pipeline, the generated PDF contained only the first 2 pages, and no errors were reported.

To install Sequeduct, I used

nextflow pull edinburgh-genome-foundry/Sequeduct -r v0.3.1

I had to change the GeneBlocks installation in the Dockerfile to make the pipeline run:

# GeneBlocks:
RUN pip install --no-cache-dir biopython==1.78
RUN apt-get install -y ncbi-blast+
RUN pip install --no-cache-dir geneblocks==1.2.3

to:

# GeneBlocks:
RUN pip install --no-cache-dir biopython==1.80
RUN apt-get update && apt-get install -y wget procps \
    &&  wget https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.10.1/ncbi-blast-2.10.1+-x64-linux.tar.gz  \
    &&  tar -xvf ncbi-blast-2.10.1+-x64-linux.tar.gz
ENV PATH="/ncbi-blast-2.10.1+/bin:$PATH"
RUN pip install --no-cache-dir geneblocks==1.2.3

I made these changes because apt-get install -y ncbi-blast+ wasn't working, and because I got a biopython error in the 'analysis:analysis_workflow:runEdiacara' step, as reported in Edinburgh-Genome-Foundry/DnaFeaturesViewer#84.

Then I built the Docker image with:
docker build . --network host -f Sequeduct/containers/Dockerfile --tag sequeduct_local

And ran the demo with:

git clone https://github.com/Edinburgh-Genome-Foundry/Sequeduct_demo/
cd Sequeduct_demo/
sudo nextflow run edinburgh-genome-foundry/Sequeduct -r v0.3.1 -entry analysis --fastq_dir='fastq_pass' --reference_dir='genbank' --sample_sheet='sample_sheet.csv' --projectname='EGF demo' -with-docker sequeduct_local

I didn't see anything suspicious, but I might have missed something. Do you have any idea?
Ediacara_report.pdf

Best

@veghp
Member

veghp commented May 31, 2024

Thanks for trying out the pipeline and for the provided details. Yes, it does seem that the GeneBlocks plot is not generated, and this terminates the PDF (specifically the HTML generation, which is then converted into PDF). I'll try to build a local image in the coming days; until then, a few comments: perhaps biopython==1.81 would be better, as that's the version the linked issues address.

Alternatively, you can try the versions in our currently working image. Ediacara also has dependencies which have been updated in the meantime. The original versions are:

matplotlib==3.7.2
numpy==1.25.2
pandas==2.0.3
weighted_levenshtein==0.2.2
biopython==1.78
dna_features_viewer==3.1.2
geneblocks==1.2.3
cyvcf2==0.30.22
pdf_reports==0.3.5
portion==2.4.1

Geneblocks dependencies:
networkx==3.1
python-Levenshtein==0.21.1

PDF Reports dependencies:
pypugjs==5.9.12
jinja2==3.1.2
weasyprint==59.0
beautifulsoup4==4.12.2
Markdown==3.4.4

So maybe these could be installed (with pip install --upgrade) before geneblocks.

The Python base image has also been updated, and that could be an issue, especially for weasyprint and PDF Reports.
This will be replaced with a base Ubuntu image, as our packages were developed on it.

Strangely, BLAST should install, as ncbi-blast+ is still one of the packages: https://packages.debian.org/bookworm/ncbi-blast+

@veghp
Member

veghp commented May 31, 2024

You can check versions in your current image with
docker run -it --entrypoint=/bin/bash sequeduct_local

then run python or pip list
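To compare what is actually installed in the image against the pinned versions listed in the earlier comment, the pip freeze output can be diffed against a pin file. This is a minimal sketch, assuming bash and awk are available in the image; the function name check_pins and the pin file are hypothetical, not part of Sequeduct.

```shell
#!/usr/bin/env bash
# check_pins PINS_FILE FREEZE_FILE
# Hypothetical helper: compares pinned "name==version" lines in PINS_FILE with
# the "name==version" lines in FREEZE_FILE (e.g. the output of pip freeze).
# Prints one line per missing or mismatched package; returns 1 if any were found.
check_pins() {
  local pins_file="$1" freeze_file="$2"
  awk -F'==' '
    # Normalize names the way pip does: lowercase, runs of "-", "_", "." become "-".
    function norm(s) { s = tolower(s); gsub(/[-_.]+/, "-", s); return s }
    # First file (the freeze output): remember the installed version of each package.
    FILENAME == ARGV[1] { if (NF >= 2) installed[norm($1)] = $2; next }
    # Second file (the pins): report anything absent or at a different version.
    NF >= 2 {
      name = norm($1)
      if (!(name in installed)) {
        printf "MISSING %s (want %s)\n", $1, $2; bad = 1
      } else if ((installed[name] "") != ($2 "")) {
        # Concatenating "" forces a string comparison, so 3.1 and 3.10 differ.
        printf "MISMATCH %s: installed %s, want %s\n", $1, installed[name], $2; bad = 1
      }
    }
    END { exit bad }
  ' "$freeze_file" "$pins_file"
}
```

Inside the container, something like check_pins pins.txt <(pip freeze) would print one line per missing or mismatched package and return a non-zero status if there are any, which makes it usable as a quick sanity check after docker build.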

@madeleinevlt
Author

madeleinevlt commented Jun 3, 2024

Hi,
So I updated my Dockerfile with your recommendations and it seems to work.

Before :

# GeneBlocks:
RUN pip install --no-cache-dir biopython==1.78
RUN apt-get install -y ncbi-blast+
RUN pip install --no-cache-dir geneblocks==1.2.3

Now :

# GeneBlocks:
RUN pip install --no-cache-dir biopython==1.81
RUN pip install --no-cache-dir ediacara==0.2.2 matplotlib==3.7.2 numpy==1.25.2 pandas==2.0.3 weighted_levenshtein==0.2.2 biopython==1.78 dna_features_viewer==3.1.2 geneblocks==1.2.3 cyvcf2==0.30.22 pdf_reports==0.3.5 portion==2.4.1
RUN pip install --no-cache-dir networkx==3.1 python-Levenshtein==0.21.1 pypugjs==5.9.12 jinja2==3.1.2 weasyprint==59.0 beautifulsoup4==4.12.2 Markdown==3.4.4
RUN apt-get update
RUN apt-get install -y ncbi-blast+
RUN pip install --no-cache-dir geneblocks==1.2.3

I also added ediacara to those lines.
I found that by moving the RUN apt-get update line, I can make RUN apt-get install -y ncbi-blast+ work.

The command is a bit long, but at least it ensures the right versions.

@veghp
Member

veghp commented Jun 5, 2024

Thanks for the update! Note that biopython (and geneblocks) are installed twice, and I believe you actually end up with version 1.78, so the first line can be removed. I'd also start with the "networkx..." line so that dependencies are updated first (and not twice), then the EGF packages. I'll implement this in the Dockerfile (and then close the issue).
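Double installations like the one pointed out above are easy to miss in a long Dockerfile. As a minimal sketch, the pins can be scanned with awk to list packages pinned more than once. The function name find_duplicate_pins is hypothetical, and it only inspects pins that sit on the same line as a RUN pip install instruction.

```shell
#!/usr/bin/env bash
# find_duplicate_pins DOCKERFILE
# Hypothetical helper: lists packages pinned as name==version more than once
# on "RUN pip install ..." lines, with every version seen in file order.
# Pins on backslash-continuation lines are not counted. Output order is unspecified.
find_duplicate_pins() {
  awk '
    # Normalize names the way pip does: lowercase, runs of "-", "_", "." become "-".
    function norm(s) { s = tolower(s); gsub(/[-_.]+/, "-", s); return s }
    /^[ \t]*RUN[ \t]+pip[ \t]+install/ {
      for (i = 1; i <= NF; i++) {
        # Only fields that look like name==version; flags such as --no-cache-dir are skipped.
        if ($i ~ /^[A-Za-z0-9_.-]+==[^=]+$/) {
          split($i, parts, "==")
          name = norm(parts[1])
          count[name]++
          if (count[name] == 1) versions[name] = parts[2]
          else versions[name] = versions[name] " " parts[2]
        }
      }
    }
    END {
      for (name in count)
        if (count[name] > 1) printf "%s: %s\n", name, versions[name]
    }
  ' "$1"
}
```

Run against the lines posted above (piping the result through sort for a stable order), it would report biopython (1.81, then 1.78) and geneblocks (1.2.3 twice). Since a later pinned pip install normally replaces an earlier one, the last version listed is usually the one the image ends up with.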

@madeleinevlt
Author

Oh, you're right; then you can remove biopython==1.81.
Thank you so much for your time and your help !
