metaWRAP is a great tool for metagenomic analysis, but I found a bug in the script 'fix_megahit_contig_naming.py', which is used in the 'assembly' module. The script reads and modifies the contig ID and determines the contig length in the first loop iteration; in the next iteration it reads the sequence and stores it with its ID in the dictionary. Because of this, the last contig would never be stored, so you wrote 'dic[name]=tmp_contig' to store it. However, this line has no length check, so the last contig is written to the final file regardless of whether it is long enough.
For example, the last contig in the assembly result produced by megahit is 599 bp long and the threshold is 1000 bp, but this contig is still written to the final output file.
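To illustrate the kind of fix I mean, here is a minimal sketch, not the actual metaWRAP code. The function `filter_contigs`, the helper `store`, and the sample headers are all hypothetical, and for simplicity the length is measured from the sequence itself rather than parsed from the header. The point is only that the final contig goes through the same length check as every other contig.

```python
def filter_contigs(lines, min_len):
    """Return {name: sequence} for contigs of at least min_len bp.

    Hypothetical sketch: names and structure are illustrative, not metaWRAP's.
    """
    kept = {}
    name, chunks = None, []

    def store(name, chunks):
        # Single place where the length check happens, so the last
        # record is filtered exactly like every other record.
        seq = "".join(chunks)
        if name is not None and len(seq) >= min_len:
            kept[name] = seq

    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            store(name, chunks)  # flush the previous contig
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)

    store(name, chunks)  # flush the final contig with the same check
    return kept


# Illustrative input: the last contig is 599 bp, below the 1000 bp threshold.
fasta = [
    ">k141_1 flag=1 multi=2.0000 len=1200",
    "A" * 1200,
    ">k141_2 flag=1 multi=1.0000 len=599",
    "C" * 599,
]
result = filter_contigs(fasta, min_len=1000)
print(sorted(result))  # ['k141_1']
```

With the check applied in one shared helper, the 599 bp contig is dropped even though it is the last record in the file.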
By the way, I found another bug in 'assembly.sh'. In the first screenshot, I can see that you handle the case where metawrap assembly uses only metaspades (elif [ "$metaspades_assemble" = true ]; then). However, the default value of the variable $megahit_assemble is 'True'. This means there are only two possible situations: 1) only megahit is used: '--metaspades' is not specified, and '--megahit' is specified or not specified; 2) both megahit and metaspades are used: '--metaspades' is specified, and '--megahit' is specified or not specified. I think the default value of $megahit_assemble should be 'False' rather than 'True', so that a third situation exists: 3) only metaspades is used, when '--metaspades' is specified and '--megahit' is not specified.
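The reasoning above can be checked with a small model. This is a Python sketch of the flag logic as I understand it, not a translation of 'assembly.sh'; `selected_assemblers` and `reachable` are hypothetical names, and I am assuming that '--megahit' and '--metaspades' each simply set their variable to true.

```python
from itertools import product


def selected_assemblers(megahit_default, metaspades_flag, megahit_flag):
    """Model which assemblers run for one combination of command-line flags."""
    # Assumption: passing --megahit sets the variable to true, so the
    # effective value is the default OR the flag.
    megahit = megahit_default or megahit_flag
    metaspades = metaspades_flag
    chosen = []
    if megahit:
        chosen.append("megahit")
    if metaspades:
        chosen.append("metaspades")
    return tuple(chosen)


def reachable(megahit_default):
    """Enumerate every flag combination and collect the possible outcomes."""
    return {
        selected_assemblers(megahit_default, ms, mh)
        for ms, mh in product([False, True], repeat=2)
    }


with_true = reachable(True)
with_false = reachable(False)

print(("metaspades",) in with_true)   # False: metaspades-only is unreachable
print(("metaspades",) in with_false)  # True: metaspades-only becomes possible
```

In this model, a default of 'False' also makes an empty selection possible when neither flag is given, so the script would probably need to fall back to megahit or report an error in that case.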