
bonito basecall model refinement preprocessing memory issues #361

Open
CodingKaiser opened this issue Aug 28, 2023 · 1 comment

@CodingKaiser

Hello,

I am currently refining a basecalling model and am running into memory-management issues during the preprocessing basecalling step.

Specifically, I have a folder of pod5 files totalling ~24GB that I am passing to bonito basecaller in the following way:

```
bonito basecaller [email protected] --save-ctc --min-accuracy-save-ctc 0.9 -v --alignment-threads 10 --device 'cuda' --reference ~/Documents/genomes/T7_V01146.1.fasta ./T7/pod5s/ > ./T7/bonito_mapped_hac_ctc/basecalls_ctc.bam
```

However, after the initial basecalling, the process is killed after exhausting the available RAM on my machine:

> reading pod5
> outputting aligned bam
> loading model [email protected]
> model basecaller params: {'batchsize': 512, 'chunksize': 10000, 'overlap': 500, 'quantize': None}
> loading reference
> calling: 1290710 reads [59:34, 361.04 reads/s]Killed

For now, I am attempting to subset the initial data, but this is obviously not ideal, since it discards potentially useful signal that could otherwise be used in the training step. It appears that bonito train only accepts a single --directory, so breaking the basecalling up by pod5 file (or similar) would also not work. Is there an alternative approach?
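For reference, the batching I am experimenting with is roughly the following sketch (the helper name and batch size are illustrative, not part of bonito):

```python
from pathlib import Path
import shutil

def split_pod5_dir(src, dst_root, batch_size=100):
    """Split a flat directory of pod5 files into numbered batch
    subdirectories, so each batch can be basecalled separately."""
    files = sorted(Path(src).glob("*.pod5"))
    for i in range(0, len(files), batch_size):
        batch_dir = Path(dst_root) / f"batch_{i // batch_size:04d}"
        batch_dir.mkdir(parents=True, exist_ok=True)
        for f in files[i:i + batch_size]:
            # copy rather than move, to keep the original data intact
            shutil.copy(f, batch_dir / f.name)
```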

Thanks in advance for your input.

All the best,
Falko Noé

@andrewgalbraith21

Hello Falko,

You could run bonito basecalling on each directory separately, merge the resulting .npy files, and then train on the merged set. There's already a post with the code to merge .npy files.
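A minimal merging sketch might look like the following. It assumes each --save-ctc run writes chunks.npy, references.npy, and reference_lengths.npy, and that references are zero-padded to each run's own maximum length (so they need re-padding to a common width before concatenation); check these assumptions against your actual output files:

```python
import numpy as np
from pathlib import Path

def merge_ctc_dirs(dirs, out_dir):
    """Concatenate bonito --save-ctc training arrays from several runs."""
    chunks = [np.load(Path(d) / "chunks.npy") for d in dirs]
    refs = [np.load(Path(d) / "references.npy") for d in dirs]
    lens = [np.load(Path(d) / "reference_lengths.npy") for d in dirs]

    # each run pads references to its own max length; pad all runs
    # out to the global max so the arrays can be stacked
    max_len = max(r.shape[1] for r in refs)
    refs = [np.pad(r, ((0, 0), (0, max_len - r.shape[1]))) for r in refs]

    np.save(Path(out_dir) / "chunks.npy", np.concatenate(chunks))
    np.save(Path(out_dir) / "references.npy", np.concatenate(refs))
    np.save(Path(out_dir) / "reference_lengths.npy", np.concatenate(lens))
```

You could then point bonito train at the single merged output directory.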
