bismark aligning comma-separated list of fastq files stops after first sample finished #637

chuddy-ibk · 2023-11-09T09:05:22Z

Dear colleagues,

I am re-running a WGBS pipeline to see how well it can be replicated with the partial code I have.

I am not stuck, but I want the code to be more efficient, so that I do not have to start the next sample (pair) by hand each time one sample has been aligned. So I use the following script:

echo "Bismark aligning"
input_files_1=""
input_files_2=""
for file in fastq/trim/*_R1_001_val_1.fq.gz; do
input_files_1+="${file},"
done
for file in fastq/trim/*_R2_001_val_2.fq.gz; do
input_files_2+="${file},"
done
input_files_1=${input_files_1%,} # Remove the trailing comma
input_files_2=${input_files_2%,}

bismark --genome ~/bioinformatics/ref_genomes/mouse_38/genome \
-1 "${input_files_1}" -2 "${input_files_2}" \
-o BAM/prededuplicate/ --temp_dir BAM/ \
--parallel 3 -q --score_min L,0,-0.2 --maxins 500

The input_files_1 variable then holds the following file names (comma-separated, as requested in bismark --help):
fastq/trim/Ctrl-1_R1_001_val_1.fq.gz,fastq/trim/Ctrl-2_R1_001_val_1.fq.gz,fastq/trim/F1-1_R1_001_val_1.fq.gz,fastq/trim/F1-2_R1_001_val_1.fq.gz
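For comparison, the same comma-separated lists can be built with Bash arrays, which avoids trimming the trailing comma and makes it easy to check that every R1 file has a matching R2 file. This is only a sketch: the helper names `join_by_comma`, `mate_of` and `build_lists` are made up for illustration, and the file-name pattern is assumed to be the one from the script above.

```bash
# Hypothetical helper: join all arguments with commas.
join_by_comma() {
    local IFS=','
    printf '%s\n' "$*"
}

# Hypothetical helper: derive the R2 file name from an R1 file name,
# assuming the naming used above (_R1_001_val_1 / _R2_001_val_2).
mate_of() {
    printf '%s\n' "${1/_R1_001_val_1/_R2_001_val_2}"
}

# Build both lists in one pass so the pairs stay in the same order.
# Sets the global variables input_files_1 and input_files_2.
build_lists() {
    local dir=$1 r1 r2
    local -a reads_1=() reads_2=()
    shopt -s nullglob   # an unmatched glob expands to nothing
    for r1 in "$dir"/*_R1_001_val_1.fq.gz; do
        r2=$(mate_of "$r1")
        [[ -e $r2 ]] || { echo "missing mate for $r1" >&2; return 1; }
        reads_1+=("$r1")
        reads_2+=("$r2")
    done
    (( ${#reads_1[@]} > 0 )) || { echo "no R1 files in $dir" >&2; return 1; }
    input_files_1=$(join_by_comma "${reads_1[@]}")
    input_files_2=$(join_by_comma "${reads_2[@]}")
}
```

After `build_lists fastq/trim`, the two variables can be passed to the bismark call unchanged via `-1 "${input_files_1}" -2 "${input_files_2}"`.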

According to the first lines printed after starting the alignment, everything seems to be fine, as all FASTQ files were detected:
Input files to be analysed (in current folder '/home/chuddy/bioinformatics/lamarck-project'):
fastq/trim/Ctrl-1_R1_001_val_1.fq.gz
fastq/trim/Ctrl-1_R2_001_val_2.fq.gz
fastq/trim/Ctrl-2_R1_001_val_1.fq.gz
fastq/trim/Ctrl-2_R2_001_val_2.fq.gz
fastq/trim/F1-1_R1_001_val_1.fq.gz
fastq/trim/F1-1_R2_001_val_2.fq.gz
fastq/trim/F1-2_R1_001_val_1.fq.gz
fastq/trim/F1-2_R2_001_val_2.fq.gz
Library is assumed to be strand-specific (directional), alignments to strands complementary to the original top or bottom strands will be ignored (i.e. not performed!)

After 887 minutes of running time, I received a BAM file, which looked okay, also judging by the detection of C in CpG context, etc.

What did I do wrong? Normally the alignment of the second sample should start immediately after the first has finished.
Since 887 minutes is a long time, I also wonder how I can speed things up.
I have difficulty estimating what my mobile workstation is capable of. I used --parallel 3 to be on the safe side, although I have 24 CPUs and approx. 62 GB of memory. I am working with the mouse genome (mm10, from Ensembl).

Bismark Version: v0.24.1
Bowtie2 version: 2.5.1

If anything else is needed to help me, please tell me and I will happily deliver.

Best
Tom

FelixKrueger · 2023-11-09T10:17:00Z

Hi Tom,

Updating to v0.24.2 (https://github.com/FelixKrueger/Bismark/releases/tag/v0.24.2) should fix the issue that the run stops after the first set of files (there was an exit 0 in the wrong scope...).
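To check whether an installation already includes that fix, the installed version can be compared against v0.24.2. The following is a rough sketch, not part of Bismark: `version_at_least` and `installed_bismark_version` are hypothetical helper names, the comparison relies on GNU `sort -V`, and the parsing assumes that `bismark --version` prints a version string of the form `v0.24.1`, as quoted in the report above.

```bash
# Hypothetical helper: succeed if version $1 is at least version $2.
# A leading "v" is ignored; ordering comes from GNU sort's version sort.
version_at_least() {
    local have=${1#v} want=${2#v}
    [[ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)" == "$want" ]]
}

# Hypothetical helper: extract something like "v0.24.1" from the output
# of `bismark --version` (assumed output format, see above).
installed_bismark_version() {
    bismark --version | grep -oE 'v[0-9]+(\.[0-9]+)+' | head -n 1
}

# Usage sketch (needs bismark on PATH, so it is left commented out):
# if ! version_at_least "$(installed_bismark_version)" v0.24.2; then
#     echo "Bismark older than v0.24.2: a comma-separated run may stop after the first pair" >&2
# fi
```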

I suppose you might get away with --parallel 4 on that machine, but I would monitor closely whether some of the alignment threads run OOM. Good luck!
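One simple way to keep an eye on memory during a run on Linux is to poll `MemAvailable` in `/proc/meminfo` while the alignment is running. This is a minimal sketch under stated assumptions: `mem_available_gib` and `watch_memory` are hypothetical helpers, and the default threshold of 8 GiB and the 60-second interval are arbitrary placeholders, not recommendations.

```bash
# Hypothetical helper: print available memory in whole GiB, read from a
# meminfo-style file (defaults to /proc/meminfo on Linux). Values are in kB.
mem_available_gib() {
    awk '/^MemAvailable:/ { printf "%d\n", $2 / 1048576 }' "${1:-/proc/meminfo}"
}

# Hypothetical helper: while process $1 is alive, warn on stderr whenever
# available memory drops below $2 GiB, checking every $3 seconds.
watch_memory() {
    local pid=$1 min_gib=${2:-8} interval=${3:-60} avail
    while kill -0 "$pid" 2>/dev/null; do
        avail=$(mem_available_gib)
        if (( avail < min_gib )); then
            echo "$(date '+%F %T') low memory: ${avail} GiB available" >&2
        fi
        sleep "$interval"
    done
}
```

The bismark command could then be started in the background with `&` and watched with `watch_memory $! 8 60`; if warnings appear, lowering `--parallel` is the obvious first thing to try.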

FelixKrueger closed this as completed Nov 9, 2023
