bismark aligning comma-separated list of fastq files stops after first sample finished #637

Closed
chuddy-ibk opened this issue Nov 9, 2023 · 1 comment

Comments

chuddy-ibk commented Nov 9, 2023

Dear colleagues,

I am re-running a WGBS pipeline to see how well it can be replicated with the partial code I have.

I am not stuck, but I want the code to be more efficient, so that I do not have to start the next sample (pair) by hand each time one sample has finished aligning. So I use the following script:

echo "Bismark aligning"
input_files_1=""
input_files_2=""
for file in fastq/trim/*_R1_001_val_1.fq.gz; do
    input_files_1+="${file},"
done
for file in fastq/trim/*_R2_001_val_2.fq.gz; do
    input_files_2+="${file},"
done
input_files_1=${input_files_1%,} # Remove the trailing comma
input_files_2=${input_files_2%,}

bismark --genome ~/bioinformatics/ref_genomes/mouse_38/genome \
    -1 "${input_files_1}" -2 "${input_files_2}" \
    -o BAM/prededuplicate/ --temp_dir BAM/ \
    --parallel 3 -q --score_min L,0,-0.2 --maxins 500

The input_files_1 variable then holds the following file names (comma-separated, as requested in bismark --help):
fastq/trim/Ctrl-1_R1_001_val_1.fq.gz,fastq/trim/Ctrl-2_R1_001_val_1.fq.gz,fastq/trim/F1-1_R1_001_val_1.fq.gz,fastq/trim/F1-2_R1_001_val_1.fq.gz
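
The list-building step above could also be written so that both lists are derived from the R1 files only, which keeps the pairs in the same order and catches a missing mate before the alignment starts. This is only a sketch: the function names mate_of and build_pair_lists are hypothetical, and it assumes the file suffixes shown in this thread.

```shell
# Print the R2 file that belongs to an R1 file (suffixes as used in this thread).
mate_of() {
    printf '%s\n' "${1%_R1_001_val_1.fq.gz}_R2_001_val_2.fq.gz"
}

# Print two lines for the directory given as $1:
#   line 1: comma-separated R1 files, line 2: the matching R2 files.
# The body runs in a subshell "( ... )" so its variables do not leak out.
build_pair_lists() (
    dir=$1
    list_1=""
    list_2=""
    for r1 in "$dir"/*_R1_001_val_1.fq.gz; do
        [ -e "$r1" ] || continue            # the glob matched nothing
        r2=$(mate_of "$r1")
        if [ ! -e "$r2" ]; then
            echo "missing mate for $r1: $r2" >&2
            exit 1                          # leaves the subshell with status 1
        fi
        list_1="${list_1:+$list_1,}$r1"     # comma only between entries
        list_2="${list_2:+$list_2,}$r2"
    done
    printf '%s\n%s\n' "$list_1" "$list_2"
)

# Possible usage (file names containing newlines are not supported):
#   lists=$(build_pair_lists fastq/trim) || exit 1
#   input_files_1=$(printf '%s\n' "$lists" | sed -n 1p)
#   input_files_2=$(printf '%s\n' "$lists" | sed -n 2p)
```

Because the comma is only added between entries, there is no trailing comma to strip afterwards.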

According to the first lines of output after starting the alignment, everything seems to be fine, as all FASTQ files were detected:
Input files to be analysed (in current folder '/home/chuddy/bioinformatics/lamarck-project'):
fastq/trim/Ctrl-1_R1_001_val_1.fq.gz
fastq/trim/Ctrl-1_R2_001_val_2.fq.gz
fastq/trim/Ctrl-2_R1_001_val_1.fq.gz
fastq/trim/Ctrl-2_R2_001_val_2.fq.gz
fastq/trim/F1-1_R1_001_val_1.fq.gz
fastq/trim/F1-1_R2_001_val_2.fq.gz
fastq/trim/F1-2_R1_001_val_1.fq.gz
fastq/trim/F1-2_R2_001_val_2.fq.gz
Library is assumed to be strand-specific (directional), alignments to strands complementary to the original top or bottom strands will be ignored (i.e. not performed!)

After 887 minutes of running time, I received a BAM file for the first sample, which looked okay (also judging by the detection of C in CpG context, etc.), but the run then stopped.

What did I do wrong? Normally, the alignment of the second sample should start immediately after the first has finished.
Since 887 minutes is a long time, I also wonder how I can speed things up.
I find it difficult to estimate what my mobile workstation is capable of. I used --parallel 3 to be on the safe side, although I have 24 CPUs and approx. 62 GB of memory. I am working with the mouse genome (mm10, from Ensembl).

Bismark version: v0.24.1
Bowtie 2 version: 2.5.1

If anything else is needed to help me, please let me know and I will happily provide it.

Best
Tom

@FelixKrueger
Owner

Hi Tom,

Updating to v0.24.2 (https://github.com/FelixKrueger/Bismark/releases/tag/v0.24.2) should fix the issue that the run stops after the first set of files (there was an exit 0 in the wrong scope...).

I suppose you might get away with --parallel 4 on that machine, but I would monitor closely whether some of the alignment threads run out of memory (OOM). Good luck!
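
If upgrading is not possible right away, one possible workaround (a sketch, not something proposed in this thread) is to call bismark once per read pair in a loop, so that the next sample starts as soon as the previous one has finished. The function name align_each_pair and its optional third argument are hypothetical; the bismark options are the ones from the script in the question, and the file suffixes are assumed to match those shown there.

```shell
# Align every R1/R2 pair found in directory $1 against the genome folder $2,
# one bismark call per pair. $3 is optional: the program to run (default:
# bismark); passing "echo" gives a dry run that only prints the arguments.
# The body runs in a subshell "( ... )" so its variables do not leak out.
align_each_pair() (
    dir=$1
    genome=$2
    aligner=${3:-bismark}
    for r1 in "$dir"/*_R1_001_val_1.fq.gz; do
        [ -e "$r1" ] || continue            # the glob matched nothing
        r2="${r1%_R1_001_val_1.fq.gz}_R2_001_val_2.fq.gz"
        if [ ! -e "$r2" ]; then
            echo "missing mate for $r1: $r2" >&2
            exit 1
        fi
        # Stop at the first failing sample instead of silently moving on.
        "$aligner" --genome "$genome" \
            -1 "$r1" -2 "$r2" \
            -o BAM/prededuplicate/ --temp_dir BAM/ \
            --parallel 3 -q --score_min L,0,-0.2 --maxins 500 || exit 1
    done
)

# Possible usage:
#   align_each_pair fastq/trim ~/bioinformatics/ref_genomes/mouse_38/genome
```

This produces one BAM file and one report per pair, just as the comma-separated call is meant to, but does not depend on how bismark handles the file list internally.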
