Is it normal for there to be interactions between haps? #34
Hi @zoumingr, thank you so much for providing such detailed information. It could save us a lot of time. In short, this is not normal.
I can see that the mapping and filtering were performed according to our instructions, so most multi-mapping reads should have been filtered out. Furthermore, Hi-C signals between allelic contig pairs usually appear as a diagonal pattern, and they should not be as strong as those in the contact map you provided.
The `--phasing_weight` parameter does not alter the Hi-C signals on the heatmap. It is only a weight that controls how much confidence is placed in the phasing result obtained from hifiasm. I suspect that the heterozygosity of the species may be too low to resolve the haplotypes with hifiasm + Hi-C. In that case, hifiasm might split a single chromosome into two parts placed in different `hap*.gfa` files, and you may only be able to assemble 50 chromosomes. To verify this assumption, could you please tell me the expected genome size (2n) of your species? Please also provide the log files generated by hifiasm. I would like to check the k-mer histograms in the logs to confirm the heterozygosity level and the estimated genome size.

Best wishes,
Hi, Xiaofei,

This reminds me of something. When I used 3D-DNA to scaffold the concatenated genome, it could only generate 50 chromosomes (2n=100), even though the total sequence length was the sum of the two haploid lengths. This might be because 3D-DNA merged each pair of homologous chromosomes. If the two parental haplotypes are highly similar, can we still obtain all 100 chromosomes? For example, could we do so by not providing hifiasm's phasing information to HapHiC (i.e., omitting `--gfa "hap1.p_ctg.gfa,hap2.p_ctg.gfa"`)?

Below is the log of the hifiasm assembly:

[M::ha_analyze_count] lowest: count[7] = 951718
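Checking heterozygosity from a k-mer histogram such as the one in the hifiasm log comes down to locating its peaks. In a heterozygous diploid, a heterozygous peak typically appears at about half the depth of the homozygous peak. The following is a minimal sketch. It assumes the histogram is available as a `{depth: count}` dict with consecutive depths and treats everything up to the first trough as the sequencing-error tail. The function name `find_peaks` and the toy counts are hypothetical.

```python
def find_peaks(hist: dict[int, int]) -> list[int]:
    """Return depths that are local maxima of a k-mer histogram, skipping the error tail.

    The low-depth error tail is assumed to end at the first local minimum (trough).
    """
    depths = sorted(hist)
    counts = [hist[d] for d in depths]
    # Walk down the initial error tail while counts keep decreasing.
    start = 0
    while start + 1 < len(counts) and counts[start + 1] < counts[start]:
        start += 1
    peaks = []
    for i in range(max(start, 1), len(counts) - 1):
        # A peak rises above its left neighbour and is not lower than its right one.
        if counts[i] > counts[i - 1] and counts[i] >= counts[i + 1]:
            peaks.append(depths[i])
    return peaks


# Made-up toy histogram: error tail at depths 1-3, then peaks at 4X and 8X.
toy_hist = {1: 900, 2: 400, 3: 100, 4: 300, 5: 200, 6: 150, 7: 250, 8: 600, 9: 300, 10: 50}
print(find_peaks(toy_hist))  # [4, 8]
```

Here the 4X peak sits at half the depth of the 8X peak, which is the pattern expected for heterozygous versus homozygous k-mers in a diploid. Real histograms are noisy, so a dedicated tool such as GenomeScope is a better choice than this naive scan.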
You have generated ~92.6 Gb of HiFi data. Based on the histograms, the main peak appears at ~29X depth, with no noticeable peak at lower depth. The estimated genome size should therefore be ~3.2 Gb. This is close to the combined length of your two haplotype assemblies (~1.46 Gb + ~1.79 Gb ≈ 3.25 Gb), so my initial assumption was incorrect: both haplotypes have already been assembled. I'm curious about the strong Hi-C signals observed between homologous chromosomes. Several aspects should be verified:
You may also consider:
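For reference, the ~3.2 Gb estimate in the reply above is simple arithmetic: total sequenced bases divided by the depth of the main k-mer peak. Below is a worked sketch using the numbers from this thread (92.6 Gb, ~29X). It compares the result against the total lengths of the two haplotype assemblies reported in the question. The function name `estimate_genome_size` is hypothetical.

```python
def estimate_genome_size(total_bases: float, peak_depth: float) -> float:
    """Back-of-the-envelope genome size: sequenced bases / k-mer peak depth."""
    return total_bases / peak_depth


total_hifi_bases = 92.6e9  # ~92.6 Gb of HiFi data (from this thread)
peak_depth = 29            # main k-mer peak at ~29X (from this thread)
estimate = estimate_genome_size(total_hifi_bases, peak_depth)

# Total lengths of the two haplotype assemblies (from the statistics in the question).
hap1_total = 1_461_626_638
hap2_total = 1_792_589_528
assembled = hap1_total + hap2_total

print(f"estimated: {estimate / 1e9:.2f} Gb, assembled (hap1 + hap2): {assembled / 1e9:.2f} Gb")
# estimated: 3.19 Gb, assembled (hap1 + hap2): 3.25 Gb
```

This ignores the small difference between base coverage and k-mer coverage. For long HiFi reads that difference is minor, but the result is still only a rough check.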
Q:
![image](https://private-user-images.githubusercontent.com/49975785/343793506-4090cbb2-4df7-4d14-a41a-e8c02dde0a28.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjA4NTUzNTYsIm5iZiI6MTcyMDg1NTA1NiwicGF0aCI6Ii80OTk3NTc4NS8zNDM3OTM1MDYtNDA5MGNiYjItNGRmNy00ZDE0LWE0MWEtZThjMDJkZGUwYTI4LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA3MTMlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNzEzVDA3MTczNlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWExOTEzYjRmNWZmZTMxYWZmMjg5MjZiY2I1YWRhMzZkNDY4MGY5NmE4NmI1NDI4OTIwMTFmZDc4OGQyYWJkMzMmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.4XZ7sh_cZ0j9ySqYK4XIT2axm0Y1k1n6XrhkXm5ykIo)
HapHiC is very convenient for processing Hi-C data.
We concatenated the assemblies of the two haplotypes and scaffolded them with HapHiC. The resulting Hi-C heatmap shows strong interactions between the two haplotypes (see the figure above). This occurs with both `--phasing_weight 0` and `--phasing_weight 1`. Is this normal?
The karyotype of the species: 2n=100
The commands and parameters used in running HapHiC: haphic pipeline genome.fas HiC.filtered.bam ${nchrs} --quick_view --threads 20 --processes 60 --gfa "hap1.p_ctg.gfa,hap2.p_ctg.gfa" --phasing_weight 0 --correct_nrounds 2
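The `genome.fas` passed to HapHiC here is a concatenation of both haplotypes' contigs, alongside the hifiasm GFA files given to `--gfa`. Below is a minimal sketch of building such a FASTA from the `S` (segment) lines of GFA files. It keeps contig names unchanged so that, presumably, they still match the GFA files. The function names `gfa_segments` and `write_fasta` are hypothetical illustrations, not HapHiC utilities.

```python
import io
from collections.abc import Iterable
from typing import TextIO


def gfa_segments(handle: TextIO) -> Iterable[tuple[str, str]]:
    """Yield (name, sequence) pairs from the S (segment) lines of a GFA file."""
    for line in handle:
        if line.startswith("S\t"):
            fields = line.rstrip("\n").split("\t")
            yield fields[1], fields[2]


def write_fasta(records: Iterable[tuple[str, str]], out: TextIO, width: int = 60) -> None:
    """Write (name, sequence) pairs as FASTA, wrapping sequence lines at `width`."""
    for name, seq in records:
        out.write(f">{name}\n")
        for i in range(0, len(seq), width):
            out.write(seq[i:i + width] + "\n")


# Toy in-memory GFA; only the S line is converted, header (H) and read (A) lines are skipped.
gfa = "H\tVN:Z:1.0\nS\th1tg000001l\tACGTACGT\tLN:i:8\nA\th1tg000001l\t0\t+\tread1\t0\t8\n"
buf = io.StringIO()
write_fasta(gfa_segments(io.StringIO(gfa)), buf, width=4)
print(buf.getvalue(), end="")
```

For real data, you would open `hap1.p_ctg.gfa` and `hap2.p_ctg.gfa` in turn and write both into the same output handle.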
The log files generated by HapHiC:
2024-06-27 08:50:28 <HapHiC_pipeline.py> [main] Pipeline started, HapHiC version: 1.0.3 (update: 2024.05.20)
2024-06-27 08:50:28 <HapHiC_pipeline.py> [main] Python version: 3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:45:18) [GCC 12.3.0]
2024-06-27 08:50:28 <HapHiC_pipeline.py> [haphic_cluster] Step1: Execute preprocessing and Markov clustering for contigs...
2024-06-27 08:50:28 <HapHiC_cluster.py> [run] Program started, HapHiC version: 1.0.3 (update: 2024.05.20)
2024-06-27 08:50:28 <HapHiC_cluster.py> [run] Python version: 3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:45:18) [GCC 12.3.0]
2024-06-27 08:50:28 <HapHiC_cluster.py> [detect_format] The file for Hi-C read alignments is detected as being in BAM format
2024-06-27 08:50:28 <HapHiC_cluster.py> [run] Ultra-long data are not supported now when assembly correction is enabled
2024-06-27 08:50:28 <HapHiC_cluster.py> [parse_fasta] Parsing input FASTA file...
2024-06-27 08:51:14 <HapHiC_cluster.py> [parse_gfa] Parsing input gfa file(s)...
2024-06-27 08:51:41 <HapHiC_cluster.py> [parse_bam_for_correction] Parsing input BAM file for contig correction...
2024-06-27 09:21:02 <HapHiC_cluster.py> [correct_assembly] Performing assembly correction...
2024-06-27 09:22:31 <HapHiC_cluster.py> [correct_assembly] Correction round 1, breakpoints are detected in 6 contig(s)
2024-06-27 09:22:31 <HapHiC_cluster.py> [break_and_update_ctgs] Breaking contigs and updating data...
2024-06-27 09:22:35 <HapHiC_cluster.py> [correct_assembly] Correction round 2, breakpoints are detected in 0 contig(s)
2024-06-27 09:22:35 <HapHiC_cluster.py> [correct_assembly] Generating corrected assembly file...
2024-06-27 09:22:35 <HapHiC_cluster.py> [correct_assembly] 6 contigs were broken into 12 contigs. Writing corrected assembly to corrected_asm.fa...
2024-06-27 09:22:59 <HapHiC_cluster.py> [stat_fragments] Making some statistics of fragments (contigs / bins)
2024-06-27 09:22:59 <HapHiC_cluster.py> [stat_fragments] bin_size is set to 0, no fragments will be split
2024-06-27 09:23:01 <HapHiC_cluster.py> [parse_alignments_for_ctgs] Parsing input alignments...
2024-06-27 09:41:54 <HapHiC_cluster.py> [output_pickle] Writing HT_link_dict to HT_links.pkl...
2024-06-27 09:41:55 <HapHiC_cluster.py> [run] Program finished in 3086.3505272865295s
2024-06-27 09:41:55 <HapHiC_pipeline.py> [haphic_reassign] Step2: Reassign and rescue contigs...
2024-06-27 09:41:55 <HapHiC_reassign.py> [run] Program started, HapHiC version: 1.0.3 (update: 2024.05.20)
2024-06-27 09:41:55 <HapHiC_reassign.py> [run] Python version: 3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:45:18) [GCC 12.3.0]
2024-06-27 09:41:55 <HapHiC_cluster.py> [parse_fasta] Parsing input FASTA file...
2024-06-27 09:42:18 <HapHiC_cluster.py> [parse_gfa] Parsing input gfa file(s)...
2024-06-27 09:42:18 <HapHiC_reassign.py> [run] Program finished in 23.16846799850464s
2024-06-27 09:42:18 <HapHiC_pipeline.py> [haphic_sort] Step3: Order and orient contigs within each group...
2024-06-27 09:42:18 <HapHiC_sort.py> [run] Program started, HapHiC version: 1.0.3 (update: 2024.05.20)
2024-06-27 09:42:18 <HapHiC_sort.py> [run] Python version: 3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:45:18) [GCC 12.3.0]
2024-06-27 09:42:18 <HapHiC_sort.py> [run] Checking the path of ALLHiC...
2024-06-27 09:42:18 <HapHiC_sort.py> [run] ALLHiC has been found in /mnt/01_srcs/xxx/HapHiC/scripts
2024-06-27 09:42:18 <HapHiC_sort.py> [parse_fasta] Parsing fasta file...
2024-06-27 09:42:26 <HapHiC_sort.py> [run] Loading input pickle file...
2024-06-27 09:42:26 <HapHiC_sort.py> [run] Parsing group files and clm files...
2024-06-27 09:42:28 <HapHiC_sort.py> [run] Program will be executed in multiprocessing mode (processes=2)
2024-06-27 09:42:28 <HapHiC_sort.py> [fast_sort] [group1_1461626638bp] Performing fast sorting...
2024-06-27 09:42:28 <HapHiC_sort.py> [fast_sort] [group1_1461626638bp] Checking the content of input group file...
2024-06-27 09:42:28 <HapHiC_sort.py> [fast_sort] [group1_1461626638bp] Starting fast sorting iterations...
2024-06-27 09:42:28 <HapHiC_sort.py> [fast_sort] [group2_1792589528bp] Performing fast sorting...
2024-06-27 09:42:28 <HapHiC_sort.py> [fast_sort] [group2_1792589528bp] Checking the content of input group file...
2024-06-27 09:42:28 <HapHiC_sort.py> [fast_sort] [group2_1792589528bp] Starting fast sorting iterations...
<class 'networkx.utils.decorators.argmap'> compilation 34:3: FutureWarning:
shortest_path will return an iterator that yields
(node, path) pairs instead of a dictionary when source
and target are unspecified beginning in version 3.5
To keep the current behavior, use:
<class 'networkx.utils.decorators.argmap'> compilation 34:3: FutureWarning:
shortest_path will return an iterator that yields
(node, path) pairs instead of a dictionary when source
and target are unspecified beginning in version 3.5
To keep the current behavior, use:
2024-06-27 10:00:37 <HapHiC_sort.py> [run] Program finished in 1098.1858050823212s
2024-06-27 10:00:37 <HapHiC_pipeline.py> [haphic_build] Step4: Build final scaffolds (pseudomolecules)...
2024-06-27 10:00:37 <HapHiC_build.py> [run] Program started, HapHiC version: 1.0.3 (update: 2024.05.20)
2024-06-27 10:00:37 <HapHiC_build.py> [run] Python version: 3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:45:18) [GCC 12.3.0]
2024-06-27 10:00:37 <HapHiC_cluster.py> [parse_fasta] Parsing input FASTA file...
2024-06-27 10:01:01 <HapHiC_build.py> [parse_tours] Parsing tour files...
2024-06-27 10:01:01 <HapHiC_build.py> [build_final_scaffolds] Building final scaffolds...
2024-06-27 10:01:43 <HapHiC_build.py> [run] Program finished in 66.46069979667664s
2024-06-27 10:01:43 <HapHiC_pipeline.py> [main] HapHiC pipeline finished in 4274.942216873169s
The methods (commands) used for Hi-C read mapping and filtering:
bwa mem -5SP -t 60 genome.fas ${read1} ${read2} | samblaster | samtools view - -@ 20 -S -h -b -F 3340 -o HiC.bam
filter_bam HiC.bam 1 --nm 3 --threads 20 | samtools view - -b -@ 20 -o HiC.filtered.bam
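The `-F 3340` mask in the `samtools view` command drops several alignment categories at once. Decoding it with the standard SAM FLAG bits shows which ones. Among them are PCR duplicates, which samblaster marks with bit 0x400. The `passes_filter` helper below assumes that `filter_bam HiC.bam 1 --nm 3` keeps alignments with MAPQ >= 1 and edit distance NM < 3. That reading of the arguments, the dict-based record format and both helper names are assumptions for illustration, not HapHiC's code.

```python
# Standard SAM FLAG bits, as defined in the SAM specification.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper pair",
    0x4: "unmapped",
    0x8: "mate unmapped",
    0x10: "reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "secondary",
    0x200: "QC fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag: int) -> list[str]:
    """List the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


def passes_filter(record: dict, min_mapq: int = 1, max_nm: int = 3) -> bool:
    """Hypothetical filter: drop -F 3340 categories, then require MAPQ >= min_mapq and NM < max_nm."""
    if record["flag"] & 3340:
        return False
    return record["mapq"] >= min_mapq and record["nm"] < max_nm


print(decode_flag(3340))
# ['unmapped', 'mate unmapped', 'secondary', 'duplicate', 'supplementary']
```

BWA assigns MAPQ 0 to reads that map equally well to more than one location. A MAPQ >= 1 cutoff therefore removes most multi-mapping reads, including those falling in regions that are identical between haplotypes.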
The method used for genome assembly (e.g., hifiasm + Hi-C) and the assembly utilized for scaffolding (e.g., p_ctg, hap.p_ctg or p_utg):
hifiasm -t 60 --h1 s_1.fq.gz --h2 s_2.fq.gz ./ccs.fastq
Statistics for the assembly input into HapHiC (e.g., N10-N90 and L10-L90):
#hap1:

| StatType | ContigLength (bp) | ContigNumber |
| --- | ---: | ---: |
| N50 | 14292390 | 32 |
| N60 | 12748816 | 42 |
| N70 | 9949659 | 55 |
| N80 | 6898132 | 73 |
| N90 | 3149213 | 103 |
| Longest | 46849406 | 1 |
| Total | 1461626638 | 985 |
| Length>=1kb | 1461626638 | 985 |
| Length>=2kb | 1461626638 | 985 |
| Length>=5kb | 1461626638 | 985 |

#hap2:

| StatType | ContigLength (bp) | ContigNumber |
| --- | ---: | ---: |
| N50 | 16020022 | 37 |
| N60 | 12619360 | 50 |
| N70 | 9100361 | 66 |
| N80 | 6754290 | 89 |
| N90 | 3293374 | 127 |
| Longest | 43036230 | 1 |
| Total | 1792589528 | 727 |
| Length>=1kb | 1792589528 | 727 |
| Length>=2kb | 1792589528 | 727 |
| Length>=5kb | 1792589528 | 727 |
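
In these tables, each Nx row gives the length of the contig at which the cumulative length first reaches x% of the total, summing from the longest contig downward. ContigNumber gives how many contigs that takes (the corresponding Lx). Below is a small sketch of that calculation on made-up toy lengths, not the real contig set. The function name `nx_lx` is hypothetical.

```python
def nx_lx(lengths: list[int], x: float) -> tuple[int, int]:
    """Return (Nx, Lx): the contig length at which the cumulative sum (longest first)
    reaches x% of the total, and the number of contigs needed to get there."""
    total = sum(lengths)
    threshold = total * x / 100
    cumulative = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        cumulative += length
        if cumulative >= threshold:
            return length, count
    raise ValueError("empty length list")


toy_lengths = [50, 30, 10, 5, 5]  # total = 100
for x in (50, 80, 90):
    print(x, nx_lx(toy_lengths, x))
# 50 (50, 1)
# 80 (30, 2)
# 90 (10, 3)
```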