DNA sequence hashing #144
Comments
Yes, it's easy to implement: just call a fast hash function on any sequence (string). Note that the result is irreversible.
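To illustrate the "fast hash function on a string" idea, here is a minimal sketch in Python. The function name `seq_hash` is hypothetical, and BLAKE2b from the standard library is only a stand-in: it is not necessarily the hash the tool uses internally.

```python
import hashlib


def seq_hash(seq: str) -> int:
    """Return a 64-bit integer digest of a sequence string.

    Assumption: BLAKE2b with an 8-byte digest stands in for whatever
    fast hash function is actually used; any stable hash would do.
    """
    digest = hashlib.blake2b(seq.encode("ascii"), digest_size=8).digest()
    # Interpret the 8 digest bytes as one unsigned integer.
    return int.from_bytes(digest, "big")


if __name__ == "__main__":
    for s in ("ACGT", "ACGA"):
        print(s, seq_hash(s))
```

Equal sequences always give the same number, but the sequence cannot be recovered from it, and collisions between different sequences are unlikely rather than impossible.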
Implemented in […]. For DNA/RNA transform, use […].
I use this to get [id][hash]
Feature request:
Given a DNA sequence, convert into a number by hashing in base 2 as shown below:
mapping = {'A': '00', 'T': '01', 'G': '10', 'C': '11'}
Produce a value of zero if the sequence contains non-ACGT characters.
For RNA, we could have T=U in case there are Us.
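The mapping described above could be sketched roughly as follows in Python. This is only an illustration of the request, not existing seqkit behaviour: the names `MAPPING` and `encode_2bit` are hypothetical, and upper-casing the input is an added assumption.

```python
# Hypothetical helper illustrating the requested 2-bit encoding.
MAPPING = {"A": 0b00, "T": 0b01, "G": 0b10, "C": 0b11}


def encode_2bit(seq: str) -> int:
    """Pack a DNA/RNA sequence into an integer, two bits per base.

    Returns 0 if any character is not A, C, G, T (or U, treated as T).
    Assumption: lower-case input is accepted and upper-cased first.
    """
    value = 0
    for base in seq.upper().replace("U", "T"):
        code = MAPPING.get(base)
        if code is None:
            return 0  # non-ACGT character found
        value = (value << 2) | code  # append two bits on the right
    return value


if __name__ == "__main__":
    for s in ("ACGT", "ACGU", "ACGN"):
        print(s, encode_2bit(s))
```

One caveat with this scheme: because A maps to 00, leading As do not change the number, so "A", "AA", and any invalid sequence all give 0. The value is only unique among sequences of the same length, and it grows by two bits per base, so long sequences produce very large integers.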
Example:
Maybe the output should be tabular, like the output of seqkit fx2tab?
Maybe this is already implemented somehow internally in the deduplication code; I'm not sure. It would be useful (for me) for giving each unique DNA sequence a short(ish) numerical value that is unique to it. Thx