pore-C data #5
Hi @socialhang, I have a plan to support Pore-C data, but it is not a high priority at the moment. Best regards, |
Hi @socialhang, I have conducted an experiment on the test data of https://github.com/epi2me-labs/wf-pore-c. You can try the following steps with your own FASTA and FASTQ files:
(1) Execute the wf-pore-c pipeline to align Pore-C data and create a mock paired-end BAM file:
source ~/software/miniconda3/bin/activate nextflow
# modify --fastq, --ref, --cutter, and --chunk_size
nextflow run epi2me-labs/wf-pore-c \
--fastq /work/software/genome/wf-pore-c-265a1a4/test_data/porec_test.concatemers.fastq --chunk_size 100 \
--ref /work/software/genome/wf-pore-c-265a1a4/test_data/porec_test.fasta \
--cutter NlaIII \
--paired_end_minimum_distance 100 --paired_end_maximum_distance 200 --hi_c --mcool --paired_end
conda deactivate
Then, you will obtain a paired-end BAM file named
(2) Rename the Hi-C reads in
python3 rename_reads_for_bam.py null.ns.bam | samtools view - -@ 14 -S -h -b -F 3340 -o HiC.bam
/path/to/HapHiC/utils/filter_bam.py HiC.bam 1 --threads 14 | samtools view -b -@ 14 -o HiC.filtered.bam
Finally, you can use
Here is the custom script:
Please be aware that this method has not been fully tested. I welcome your feedback on any attempts you make. This will be of great help. Once this method is validated, I will integrate it into HapHiC. Best regards, |
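For readers wondering what `-F 3340` in the samtools command above excludes, the following is a minimal sketch that decodes the mask using the standard FLAG bits from the SAM specification. The helper names `decode_flag_mask` and `is_excluded` are made up for illustration and are not part of samtools or HapHiC.

```python
# Standard SAM FLAG bits, as defined in the SAM specification.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag_mask(mask):
    """Return the names of all FLAG bits set in `mask`, lowest bit first."""
    return [name for bit, name in sorted(SAM_FLAG_BITS.items()) if mask & bit]


def is_excluded(flag, mask=3340):
    """Mimic `samtools view -F mask`: drop a record if ANY bit of the mask is set."""
    return (flag & mask) != 0


print(decode_flag_mask(3340))
# ['unmapped', 'mate_unmapped', 'secondary', 'duplicate', 'supplementary']
```

In other words, the command keeps only primary, non-duplicate alignments where both the read and its mate are mapped.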
Thanks a lot! I will try today! |
Hello @zengxiaofei, I'm back! Sorry, the wf-pore-c software could not run on my server. Could I use falign instead of wf-pore-c to generate the BAM file? |
Hi @socialhang, You could try it. It seems that Best, |
OK! |
Hello @zengxiaofei, I produced a paired-end BAM file using falign, but some errors occurred.
I think it's because of my BAM format. Could you help me identify the errors in my BAM file? Thanks a lot! |
Hi @socialhang, Could you please paste or upload Best, |
OK, just a minute. |
Is the |
Since my bam file is 400Gb in size, |
Please use this script (modify_bam_for_falign.py.gz) to modify the bam file:
python3 modify_bam_for_falign.py small.1.bam | samtools view -b -o new.bam
|
OK! Thanks! I will try! |
Some errors occurred:
|
It seems that a position in the BAM file (the fourth column for pos or the eighth column for mpos) exceeds the length of the contig. The error message shows that the |
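A quick way to look for this kind of problem is to compare columns 4 and 8 of each record against the contig lengths declared in the `@SQ` header lines. The sketch below works on SAM text (for example the output of `samtools view -h`). The function name `find_out_of_range_records` and the demo records are made up for illustration, and this is not the check HapHiC itself performs.

```python
def find_out_of_range_records(sam_lines):
    """Report records whose pos (column 4) or mpos (column 8) exceeds the contig length.

    `sam_lines` is an iterable of SAM text lines, header included.
    Returns a list of (read_name, field, contig, position, contig_length) tuples.
    """
    contig_lengths = {}
    problems = []
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            # @SQ header lines carry the contig name (SN) and length (LN).
            if line.startswith("@SQ"):
                tags = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
                contig_lengths[tags["SN"]] = int(tags["LN"])
            continue
        fields = line.split("\t")
        qname, rname, pos = fields[0], fields[2], int(fields[3])
        rnext, pnext = fields[6], int(fields[7])
        if rnext == "=":  # "=" means the mate maps to the same contig
            rnext = rname
        for label, contig, position in (("pos", rname, pos), ("mpos", rnext, pnext)):
            length = contig_lengths.get(contig)  # None for "*" or unknown contigs
            if length is not None and position > length:
                problems.append((qname, label, contig, position, length))
    return problems


# Made-up records: read1 has a mate position beyond the end of a 1000 bp contig.
demo = [
    "@SQ\tSN:ctg1\tLN:1000",
    "read1\t65\tctg1\t950\t60\t50M\t=\t1200\t0\t*\t*",
    "read2\t129\tctg1\t100\t60\t50M\t=\t400\t0\t*\t*",
]
print(find_out_of_range_records(demo))
# [('read1', 'mpos', 'ctg1', 1200, 1000)]
```

For a very large BAM file, it may be more practical to run this on a small subset of records first.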
Hi @zengxiaofei, I've found the problem and I've updated the script. |
Could you please provide a full view of the Hi-C contact heatmap that includes scaffold boundaries (called superscaffold in Juicebox, and surrounded by blue outlines)? |
Could you give me an email address? I will send you the Hi-C heatmap privately. @zengxiaofei |
You can send an email to xiaofei_zeng#whu.edu.cn (replace # with @). |
@socialhang or @zengxiaofei, have you managed to adjust the parameters to improve your contact map and distinguish between the different haplotypes? |
Hi @LikeLucasLab, The primary challenge faced when working with Pore-C data lies in achieving accurate mapping. We conducted some tests using falign, and adapted HapHiC to be compatible with this aligner through a Python script. Although Pore-C data showed performance comparable to NGS-based Hi-C data in scaffolding haplotype-collapsed assemblies, it exhibited much lower mapping accuracy in the context of distinguishing haplotypes, resulting in much stronger Hi-C signals being observed between haplotigs. We tried different MAPQ filtering criteria, but the results did not improve significantly. There may be some adjustments in HapHiC for Pore-C data in future updates. When using the current version of HapHiC for scaffolding haplotype-phased assemblies, please always add the |
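To make "MAPQ filtering" concrete, here is a minimal sketch of a pair-aware filter on SAM text: a read pair is kept only if both mates reach a minimum MAPQ (column 5). The function name `filter_pairs_by_mapq` and the demo records are made up for illustration. This is not the implementation of HapHiC's `filter_bam.py`, and it holds all records in memory, so it is only suitable for small test files.

```python
from collections import defaultdict


def filter_pairs_by_mapq(sam_lines, min_mapq):
    """Keep header lines plus read pairs in which BOTH mates have MAPQ >= min_mapq."""
    header = []
    by_name = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):
            header.append(line)
        else:
            # Group records by read name (column 1) so mates are judged together.
            by_name[line.split("\t", 1)[0]].append(line)
    kept = []
    for records in by_name.values():
        mapqs = [int(r.split("\t")[4]) for r in records]
        if len(records) == 2 and all(q >= min_mapq for q in mapqs):
            kept.extend(records)
    return header + kept


# Made-up records: pairA passes at MAPQ >= 10, pairB fails because one mate has MAPQ 0.
mapq_demo = [
    "@SQ\tSN:ctg1\tLN:1000",
    "pairA\t65\tctg1\t100\t60\t50M\t=\t400\t0\t*\t*",
    "pairA\t129\tctg1\t400\t30\t50M\t=\t100\t0\t*\t*",
    "pairB\t65\tctg1\t200\t60\t50M\t=\t600\t0\t*\t*",
    "pairB\t129\tctg1\t600\t0\t50M\t=\t200\t0\t*\t*",
]
for record in filter_pairs_by_mapq(mapq_demo, min_mapq=10):
    print(record.split("\t")[0])
```

Raising `min_mapq` removes more ambiguous alignments, but as noted above, stricter thresholds alone did not resolve the inter-haplotig signal in the tests described here.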
Hi @zengxiaofei |
Hi @csxie-666, As an update to this issue, I currently recommend using minimap2 rather than falign for mapping Pore-C reads. HapHiC now supports the BAM files output by minimap2 via a Python script.
According to the author's description, herro was exclusively trained on human data (lbcb-sci/herro#23). I'm not sure whether it generalizes to other species and ploidy levels. I'm trying out another haplotype-aware ONT read correction tool called DeChat on ONT ultra-long reads. This tool has been validated on simulated autopolyploids. However, I have run into difficulty running the program (refer to this issue: LuoGroup2023/DeChat#3). It is worth noting that Pore-C data may differ significantly from ONT whole-genome sequencing data in the context of read correction. This is because Pore-C reads consist of sequences originating from different loci, resulting in lower coverage between overlapping Pore-C reads compared to ONT whole-genome sequencing reads. Best, |
I have included the best practice for handling Pore-C reads in the updated README.md. |
Hello, @zengxiaofei
Can HapHiC take Pore-C data as input? If not, what adjustments do I need to make?
Thank you in advance for your response.