[Enhancement] Support xz and/or zstd in addition to gzip #274

Closed
corneliusroemer opened this issue Dec 23, 2021 · 3 comments

Comments

@corneliusroemer
Contributor

For SARS-CoV-2 sequences gzip isn't really a good compression method. Everyone uses xz but zstd also works well.

It'd be great if seqkit supported one or both of these compression formats for input/output compression as well.

@shenwei356
Owner

We use xopen for I/O, which recognizes gzip data by its magic number.
The Go packages for zstd and xz should be easy to integrate.

@corneliusroemer
Contributor Author

That would be amazing! I can imagine this being useful for many people.

@shenwei356
Owner

Sorry for the delay, but I've implemented it. 🎆

$ seqkit stats ../tests/hairpin.* -e
file                     format  type  num_seqs    sum_len  min_len  avg_len  max_len
../tests/hairpin.fa      FASTA   RNA     28,645  2,949,871       39      103    2,354
../tests/hairpin.fa.gz   FASTA   RNA     28,645  2,949,871       39      103    2,354
../tests/hairpin.fa.xz   FASTA   RNA     28,645  2,949,871       39      103    2,354
../tests/hairpin.fa.zst  FASTA   RNA     28,645  2,949,871       39      103    2,354
