alignment problem with reference genome #661

Silvia-Giorgia-Signorini · 2024-03-12T10:35:16Z

Hello Felix,
I’m a PhD student and I’m currently working on epigenetics data (DNA methylation) with limpets. I work in parallel with two different species, Patella caerulea and Patella ulyssiponensis, but I have some problems with the alignment of P. ulyssiponensis sequences to its reference genome. I know that it is a problem of the reference genome because I’ve aligned the sequence of P. ulyssiponensis to the other reference genome that I have (P. caerulea) and it worked well, even if the map efficiency was low (10.8%). The genome preparation was the same for the two species. And I also know that the script for the alignment is ok because it worked well with P. caerulea.

Final Alignment report

Sequence pairs analysed in total: 13359954
Number of paired-end alignments with a unique best hit: 5061646
Mapping efficiency: 37.9%
Sequence pairs with no alignments under any condition: 6812837
Sequence pairs did not map uniquely: 1485471
Sequence pairs which were discarded because genomic sequence could not be extracted: 5061646

Number of sequence pairs with unique best (first) alignment came from the bowtie output:
CT/GA/CT: 0 ((converted) top strand)
GA/CT/CT: 0 (complementary to (converted) top strand)
GA/CT/GA: 0 (complementary to (converted) bottom strand)
CT/GA/GA: 2531542 ((converted) bottom strand)

Number of alignments to (merely theoretical) complementary strands being rejected in total: 0

Final Cytosine Methylation Report

Total number of C's analysed: 0

Total methylated C's in CpG context: 0
Total methylated C's in CHG context: 0
Total methylated C's in CHH context: 0
Total methylated C's in Unknown context: 0

Total unmethylated C's in CpG context: 0
Total unmethylated C's in CHG context: 0
Total unmethylated C's in CHH context: 0
Total unmethylated C's in Unknown context: 0

Can't determine percentage of methylated Cs in CpG context if value was 0
Can't determine percentage of methylated Cs in CHG context if value was 0
Can't determine percentage of methylated Cs in CHH context if value was 0
Can't determine percentage of methylated Cs in unknown context (CN or CHN) if value was 0

Bismark completed in 0d 3h 25m 51s

Here you can find other details:

Reference genome: downloaded from NCBI (https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_963678685.1/)
Genome preparation script: bismark_genome_preparation --bowtie2 --verbose /cluster/scratch/ssignorini/testuly

Do you have any suggestion?
I really thank you in advance for your availability.
Silvia Signorini

FelixKrueger · 2024-03-12T10:56:25Z

Hi Silvia, something here definitely looks fishy (no pun intended). Could you provide me a with a few reads of your read library (ideally raw, untrimmed reads, and let me know which kind of library prep had been performed). Maybe 200K sequences (gzipped), as an email attachement?

I will in the meantime try to get the reference sequence and have it prepared. Cheers, Felix

Silvia-Giorgia-Signorini · 2024-03-12T14:06:04Z

Thanks a lot for your fast reply!
I have prepared individual libraries through the NEBNext Enzymatic Methyl-seq Kit (EM-seq™) and sequenced on an Illumina Novaseq 6000 sequencer and attached you can find a subsample of an individual (R1 and R2, raw data). Do you need more information?
Thanks a lot,
Silvia
subsampleR1_100K.fq.gz
subsampleR2_100K.fq.gz

FelixKrueger · 2024-03-12T15:14:24Z

I took your files, trimmed 10bp from both 5' ends, and ran alignments: (directional):

Sequence pairs analysed in total:       99872
Number of paired-end alignments with a unique best hit: 36615
Mapping efficiency:     36.7%

Sequence pairs with no alignments under any condition:  47409
Sequence pairs did not map uniquely:    15848
Sequence pairs which were discarded because genomic sequence could not be extracted:    0

Number of sequence pairs with unique best (first) alignment came from the bowtie output:
CT/GA/CT:       18493   ((converted) top strand)
GA/CT/CT:       0       (complementary to (converted) top strand)
GA/CT/GA:       0       (complementary to (converted) bottom strand)
CT/GA/GA:       18122   ((converted) bottom strand)

Number of alignments to (merely theoretical) complementary strands being rejected in total:     0

Final Cytosine Methylation Report
=================================
Total number of C's analysed:   1769028

Total methylated C's in CpG context:    74903
Total methylated C's in CHG context:    1838
Total methylated C's in CHH context:    7128
Total methylated C's in Unknown context:        17

Total unmethylated C's in CpG context:  192829
Total unmethylated C's in CHG context:  296885
Total unmethylated C's in CHH context:  1195445
Total unmethylated C's in Unknown context:      430

C methylated in CpG context:    28.0%
C methylated in CHG context:    0.6%
C methylated in CHH context:    0.6%
C methylated in unknown context (CN or CHN):    3.8%


Bismark completed in 0d 0h 1m 47s

So all works well, there is a 50/50 split between top and bottom strand. I am not exactly sure what went wrong on your side, but something did... Do you have enough resources available at all?

Silvia-Giorgia-Signorini · 2024-03-13T19:48:18Z

I apologize for my late reply but I was trying to figure out what was my problem. I think I had an issue with the original fasta file of the reference genome, now I fixed it thanks to your indications.
I really thank you for your time!
Best wishes
Silvia

FelixKrueger · 2024-03-13T20:21:33Z

That's good to hear! All the best

FelixKrueger closed this as completed Mar 13, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

alignment problem with reference genome #661

alignment problem with reference genome #661

Silvia-Giorgia-Signorini commented Mar 12, 2024

FelixKrueger commented Mar 12, 2024

Silvia-Giorgia-Signorini commented Mar 12, 2024

FelixKrueger commented Mar 12, 2024 •

edited

Loading

Silvia-Giorgia-Signorini commented Mar 13, 2024

FelixKrueger commented Mar 13, 2024

alignment problem with reference genome #661

alignment problem with reference genome #661

Comments

Silvia-Giorgia-Signorini commented Mar 12, 2024

Final Alignment report

Final Cytosine Methylation Report

FelixKrueger commented Mar 12, 2024

Silvia-Giorgia-Signorini commented Mar 12, 2024

FelixKrueger commented Mar 12, 2024 • edited Loading

Silvia-Giorgia-Signorini commented Mar 13, 2024

FelixKrueger commented Mar 13, 2024

FelixKrueger commented Mar 12, 2024 •

edited

Loading