Add Binary Value of each Mnemonic and its Wordlist Index Value #132
Comments
Thanks for these interesting feature suggestions. Overall I disagree with adding these features but remain open to being convinced otherwise.
Currently entering a wordlist and changing languages in the tool does use a direct index lookup. In the tests (see this test) the word 'abandon' (first in English list) is converted to 'abaco' (first in Italian list) when Italian is selected. So the tool does act as a direct conversion tool between languages in this way. I don't think adding the list of indexes in the interface is useful because the mnemonic is what is recorded, not the indexes. That's the point of mnemonics! So if a user were to record their indexes for safekeeping bip39 has basically failed at the intended purpose.
Is this the same as the 'BIP39 Seed' field just below the optional passphrase field? Representing the mnemonic phrase in any form other than the words seems overall detrimental since it encourages poor user behaviour and does not facilitate any additional use (at least not that I can see, feel free to point out any additional uses).
Please outline the ways it complements existing backups because currently I'm not convinced this is in the interest of users of this tool. Mnemonics are the secret and are the backup, not other numbers or encodings despite many such forms being possible. |
I think perhaps point 1 "Return word list index values for each word in mnemonic sentence" could be easily achieved by adding another 'language' to the list called 'numbers' which shows the index numbers instead of the word. This is a pretty elegant way to achieve that goal without too many negative side effects. |
Thanks for the reply and interesting feedback! Much appreciate the dialogue and willingness to explore how adding a few functions to the existing tool could further empower its potential range of uses and depending on the underlying evolving needs of its users.
Regarding Point 1 (Index values as numbers)
Noted regarding this point, thanks for the education regarding the language differences. The only question I have left here that could be worth exploring is whether foreign-language apps could detect other languages more easily by letting users input either the index number or the matching foreign word. (For example, if an Italian crypto wallet allowed importing English words such as “abandon” in your example, would it be easier for them to request the word or the matching index number “0”?) It certainly wouldn’t be easier for users unless they also had the index value from the wordlist, and since the tool already uses the index lookup in the backend, bringing this to the front end for users could make sense at some point in the future. (P.S. I just saw your latest comment addressing how this could be added in a subtle manner; that sounds like a potentially viable option worth considering!)
I totally agree that the point of mnemonics is the convenience they provide: phrases that are easy to read, write and relay, versus having to do that with the underlying initial seed (if the tool didn’t exist). I’d like to highlight that this suggestion was meant to complement the purpose served by BIP39 and not by any means to replace it. Individual users can decide how to use their data, and I think it could empower them if other formats of the same data are available (not just for creative purposes such as obscuring their mnemonic, but perhaps also for verification or redundancy).
Regarding Point 2 (showing the filtered entropy for a mnemonic generated by the BIP39 tool)
I am not sure. I'd say no unless the BIP39 seed can also be used to derive the raw binary equivalent (and/or its subsequently filtered entropy)? Below is a good argument for why showing the filtered entropy for a Mnemonic Sentence (MS) generated by the BIP39 tool could be useful: For example, using the same entropy and MS from point 2 above, given the case where “Supply my own source of entropy” is selected in the current version of the BIP39 tool, and the user enters into the “Entropy” field the following 264-bit string (256-bit entropy plus 8-bit hashed checksum): 10101010101001010000100000111110101111011111011110110010101010010111111110111111011110000100010001000101010001001000101101001010101010100101010000001000100100110011001100111100001111001000101111000010100101011101010100101010101101010101010010101010000111100c1b002d The above string is duplicated correctly in the “filtered entropy” section as well as “entropy section” and the resulting BIP39 mnemonic is shown correctly: wink fashion differ love acid stool spy rich copy horn goose curious input act athlete rare quiz school crucial amateur trend valve basic army
I would argue by pointing out that the tool already displays other forms of the mnemonic (in non-mnemonic form), such as the "filtered entropy" shown when a user supplies their own source of entropy, since these represent the mathematical corollary of the resulting mnemonic (provided they are used correctly in terms of calculating the checksum, as the tool warns). Likewise, for a mnemonic generated by the BIP39 tool, showing the matching filtered entropy would just be the inverse of that function and could serve more advanced users. I agree that putting it front and center alongside the generated mnemonic words wouldn’t be ideal, which is why I propose below where it could best fit.
Considering that the tool currently shows the “filtered entropy” value for a generated mnemonic when a user supplies their own source of entropy, I don’t think it would be unreasonable to also show the filtered entropy when the entropy/mnemonic was generated by the BIP39 tool (and not just for those importing their own entropy). Basically, you’d just be enabling the inverse of a function already present in the tool, making the feature more versatile for those who want it when they are not supplying their own source of entropy (i.e. when relying on the pseudo-random number generator used by the BIP39 standalone tool to generate the initial entropy and the resulting filtered entropy, with the checksum encoded in the last word).
Point 2 Example
For example, if a user generated the following mnemonic as noted in point 1 using the tool: lunch anger issue giggle scout cloth once marriage busy save notice farm syrup rally garment tennis price rather unusual brother whisper issue orphan toe
There is no way - at least that I am aware of - for them to obtain the filtered entropy from the tool, even though you offer this option to users who supply their own entropy in the advanced feature section. Perhaps it could be conveniently housed there in the advanced section? I think this subtle change - if warranted - could be of value for those who already rely on the (P)RNG used in the tool to generate the mnemonic, in addition to those who supply their own entropy source and can already access it. |
The word. Because a) users will / should only have the word not the index, and b) the specific word does matter. The addresses change if the language is changed because the derived seed depends on the words. Quoting BIP39 - From mnemonic to seed
So this also means the prior idea I had to show indexes/numbers as a 'language' instead of words won't work.
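The dependence of the seed on the exact words can be illustrated with a minimal sketch of the "From mnemonic to seed" step as BIP39 specifies it (PBKDF2-HMAC-SHA512, 2048 iterations, salt "mnemonic" plus the optional passphrase). The function name is only illustrative, and the two single-word inputs are fragments used for demonstration, not valid mnemonics.

```python
import hashlib
import unicodedata


def mnemonic_to_seed(mnemonic: str, passphrase: str = "") -> bytes:
    """Derive the 64-byte BIP39 seed from the mnemonic text itself."""
    # Both inputs are NFKD-normalised and UTF-8 encoded before hashing.
    password = unicodedata.normalize("NFKD", mnemonic).encode("utf-8")
    salt = unicodedata.normalize("NFKD", "mnemonic" + passphrase).encode("utf-8")
    return hashlib.pbkdf2_hmac("sha512", password, salt, 2048, dklen=64)


# Same wordlist index (0) in two languages, but different text,
# so the derived seeds (and therefore all addresses) differ.
seed_english = mnemonic_to_seed("abandon")
seed_italian = mnemonic_to_seed("abaco")
print(seed_english != seed_italian)
```

Because the words are hashed as text, an index-only "language" would feed different bytes into the key derivation and so could not reproduce the same seed.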
This tool (as far as I know) is the only tool that does this. More to the point, no other tool / wallet does conversion between languages so this is not even a consideration for them. This doesn't negate the point to show indexes, but it's something to consider.
Great point, I completely agree with you on this.
Right, the seed can't be reversed back into the mnemonic, so you're looking for some new information to be displayed. I understand now.
True. The tool is basically just a lot of different encodings and conversions of a single root piece of entropy.
You've provided a fairly nice suggestion - populate the entropy field with the value from the PRNG if 'generate' is clicked. Even though the user doesn't see it, they can choose to see it if they want to by revealing the hidden entropy section. Good idea! I think adding the word indexes as an extra field in the entropy section will also work nicely. Thanks for working through this and providing useful examples and justifications. It's people like yourself that make this tool useful. Proposed changes to implement for this issue:
|
Thanks for the positive feedback, and glad to see the implementations moving along! I have three questions that came up during some manual tests. I was trying to map the entropy+checksum pasted above in the point 2 example to the mnemonic and was having difficulty finding the direct link between the entropy and the 24-word mnemonic sentence, in terms of how the 11-bit groups (264 / 24 = 11 bits per word, as per the above snapshot) are used to encode the index value (and its related word) as per BIP39, for the mnemonic used in our example: wink fashion differ love acid stool spy rich copy horn goose curious input act athlete rare quiz school crucial amateur trend valve basic army
Question 1: I am not sure if it is entirely the bitarray.js function that encodes the 11-bit strings as 32-bit words or if it is affected by other data. For example, do you have any handy method, such as pointing out which binary string from the 264-bit array above in point 2 corresponds to the word "wink", the first word in the mnemonic sentence?
Raw Entropy showing as hex value if entropy contains any hex values (should it show as "mixed" input instead?)
I also noticed that the BIP39 tool was interpreting the inputted entropy as hex values (under "entropy type") and therefore stretching the "raw binary" shown by four times, with extra padding showing each bit as a 4-bit string (i.e. 1 as 0001). Should this instead be mixed entropy, with only the hex values converted to binary instead of treating it all as hex? Perhaps not relevant, but I thought to ask just in case.
Question 2: More importantly, should the first 8 bits of the SHA2-hashed checksum "0c1b002d" not be in hex format? Because changing it to binary "00001100" (the first 8 bits of 00001100000110110000000000101101, converted from hex) makes the 264-bit string hex-free: 101010101010010100001000001111101011110111110111101100101010100101111111101111110111100001000100010001010100010010001011010010101010101001010100000010001001001100110011001111000011110010001011110000101001010111010101001010101011010101010100101010100001111000001100
but results in a different mnemonic: custom close toddler rival item wisdom can opinion there music rail priority amazing blind usual giant joy smooth oxygen undo fatigue pond immense add
Question 3: While both could be "correct" mnemonics in a different context, which do you think is correct here for the purpose of the point 2 example, with the proper checksum value derived from the original 256-bit starting point? I know this is handled by the code in software, but I thought that mapping it manually could also help users (including myself) better understand it on the surface. I appreciate any light you can shed on this! |
I'm just going to gather some facts together before diving into this... The entropy in question is
Where does this entropy initially come from? Is it an example from a webpage? I'm just curious because I never could make this example entropy match the example mnemonic. Can you please outline the steps to go from the supplied entropy to the mnemonic (using any tool, not necessarily this one)? As you stated, this is a mix of binary and hex, which requires some disclaimer from me: the tool does not work with mixed entropy because of the reason given in the top comment of entropy.js: "Automatically uses lowest entropy to avoid issues such as interpretting 0101 as hexadecimal which would be 16 bits when really it's only 4 bits of binary entropy." I don't consider using mixed entropy a reasonable interface since it will always involve some 'magical' degree of interpretation. To directly address the questions:
In the jsbip39 library being used by this tool, the toMnemonic method illustrates what's happening. The words come directly from the entropy. So the first word is simply the first 11 bits of the entropy. Those 11 bits are converted to a number between 0 and 2047, and used to look up the word at that index. It's a direct map from entropy to words. The only exception is the checksum bits which are appended to the raw entropy, so it affects the final word in the mnemonic. The first 11 bits of the example entropy are "10101010101" which is index 1365 which is line 1366 and is word 'primary' - so I don't know why the first word of the example mnemonic is 'wink'.
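The direct mapping described above can be sketched in a few lines. This is a minimal illustration and not the jsbip39 code; `bits_to_indexes` is a hypothetical helper name, and looking up the word itself would additionally need the wordlist.

```python
def bits_to_indexes(bits: str) -> list[int]:
    """Split a binary string into 11-bit groups and return each group as a number 0..2047."""
    if len(bits) == 0 or len(bits) % 11 != 0:
        raise ValueError("expected a non-empty multiple of 11 bits")
    if set(bits) - {"0", "1"}:
        raise ValueError("expected only the characters 0 and 1")
    return [int(bits[i:i + 11], 2) for i in range(0, len(bits), 11)]


# The first 11 bits of the example entropy give the first word's index.
first_index = bits_to_indexes("10101010101")[0]
print(first_index)      # 1365 (zero-based index)
print(first_index + 1)  # 1366 (line number in a one-based listing of the wordlist)
```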
The checksum is added automatically to the entropy by the jsbip39 library. See jsbip39.js#L88. Entropy entered into this tool is pure entropy, not bip39-checksummed-entropy. Do not enter the checksum. If you want to do that manually for some other reason, use the same encoding for the checksum as the entropy (hex or binary or whatever). It's also worth pointing out that each hex character represents 4 bits, so the checksum "0c1b002d" is 32 bits, not 8. Another point is that the checksum is not always 8 bits. The amount of checksum to add to the original entropy depends on how much entropy there is in the first place. I recommend reading the section of BIP39 titled Generating the mnemonic for more info.
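As a rough sketch of the rule in "Generating the mnemonic" (checksum length = ENT / 32 bits, taken from the start of SHA-256 of the entropy bytes), the following shows how the checksum bits could be computed by hand. The function name is illustrative only; the all-zero entropy values are just convenient test inputs.

```python
import hashlib


def checksum_bits(entropy: bytes) -> str:
    """Return the BIP39 checksum of raw entropy as a binary string."""
    ent_bits = len(entropy) * 8
    if ent_bits not in (128, 160, 192, 224, 256):
        raise ValueError("BIP39 entropy must be 128, 160, 192, 224 or 256 bits")
    cs_len = ent_bits // 32  # 4 bits for 128-bit entropy, 8 bits for 256-bit entropy
    digest = hashlib.sha256(entropy).digest()
    digest_bits = "".join(f"{byte:08b}" for byte in digest)
    return digest_bits[:cs_len]


print(checksum_bits(bytes(16)))  # 4 checksum bits for 128 bits of zero entropy
print(checksum_bits(bytes(32)))  # 8 checksum bits for 256 bits of zero entropy

# Each hex character carries 4 bits, so an 8-character hex string is 32 bits.
print(len("0c1b002d") * 4)  # 32
```

Note that the hash is taken over the entropy as bytes, not over the text of the binary string.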
Neither. Please explain the original mapping used between entropy and mnemonic, because I could never get it to match. Until it's clearer about how to investigate the original it's hard to look much further into it. |
Okay, great feedback here and thanks again for your time and patience. Bottom line: it looks like including the checksum in the raw entropy field was causing the incorrect mnemonic starting with “wink”, which couldn’t be reproduced since the tool computes the checksum on its own, as you stated.
Resolved:
I was eventually able to correctly reproduce the mnemonic that you mentioned should start with “primary” from that same initial entropy in two different ways (one correct, and one incorrect but perhaps worth exploring), first using your suggestion of not including the checksum:
(1st way) Initial 256-bit ENT
1010101010100101000010000011111010111101111101111011001010101001011111111011111101111000010001000100010101000100100010110100101010101010010101000000100010010011001100110011110000111100100010111100001010010101110101010010101010110101010101001010101000011110
Resulting 24-word MS: primary choose autumn know kite feed year upper dust clay carpet next pipe affair error guide develop fun pistol prevent prize prevent position silly
I believe this is the correct mnemonic and how the tool should be used, where “silly” is the correct checksum word. If so, since it maps to the binary number 11001000101 (decimal 1605 in the wordlist), whose right-most 8 bits are 01000101, I believe revealing the checksum value could also be useful (please see further below), because it would otherwise be unclear on the front end how that happens.
Proposed enhancements:
Therefore, now that this has been resolved, if you agree with the above (?), I would suggest the following tiny additions that could help users:
Even though the checksum values will still be the values we were referencing earlier (which simply didn't need to be pasted into the raw entropy field) it would help bring full circle the functionality of this enhancement by revealing those values either as: 8bit hex CS of above ENT:
or Converted to 32-bits:
Or shown as some other value related to the last word? Overall, I think both of these pointers (especially revealing the checksum value) would help fully achieve what we set out to do with the commits you have made from this thread so far, and would help with manual verification of the resulting outputs.
Additional feedback from testing, potentially worth exploring if relevant:
(2nd way) Initial ENT + 32-bit binary checksum (288 bits)
Although this is incorrect since the checksum shouldn’t be entered, it turns out that if we use the previously mentioned 8-character hex checksum converted to 32-bit binary as the checksum, then the resulting 288-bit string maps to the mnemonic correctly except for what appears to be an error/bug: 27 words are returned instead of 24, and the last word (after omitting the last three) is “screen” instead of the correct checksum word “silly”.
primary choose autumn know kite feed year upper dust clay carpet next pipe affair error guide develop fun pistol prevent prize prevent position screen
Obviously, that is not in line with what BIP39 suggests, as it is beyond the limit. As you also pointed out, the checksum is proportional to the entropy size (CS length in bits = ENT/32), and because the initial entropy in question is 256 bits, its checksum shouldn’t be longer than 8 bits. That is why I tried using only the first 8 bits in earlier testing (00001100), but that produced a different mnemonic (not realizing that I didn’t have to enter the checksum value at all, since the tool calculates it). So for those mistakenly trying a 288-bit string as in the above example, below are some thoughts that came to mind:
Takeaway optional suggestion:
This suggestion is less important but I thought could be of value for those who could otherwise run into errors if attempting to supply their own entropy where the string includes checksum as a binary value where the tool might not be able to distinguish the mistake by the user as their final checksum word will be wrong (as in the above example). |
Hi Ian, Nice work with the updated version Release v0.3.1, looks great: Not sure if you've had time to look at the additional suggestions in the last comment above. I look forward to your comments if you have time, thank you. |
Yes I have and agree they should be added. Will do so when I have time. Thanks for the added suggestions. |
Ian, my pleasure, thank you! Was also thinking if it would be feasible to be able to import/paste a mnemonic into the tool at some stage (into the BIP39 Mnemonic field) in the future in order to reveal its raw/filtered entropy? Comparable to how wallets allow an import for recovery, or would the mnemonic alone not be sufficient to calculate the original ENT (even if the xpub and xprv were accessible using the mnemonic?) Thanks again. |
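On the question of whether the mnemonic alone is sufficient: under the BIP39 encoding the words carry the entropy directly, so the original ENT can be recovered from the word indexes without any xpub or xprv. The sketch below assumes the words have already been converted to their zero-based wordlist indexes; `indexes_to_entropy` is a hypothetical helper, not a function of the tool.

```python
import hashlib


def indexes_to_entropy(indexes: list[int]) -> bytes:
    """Recover the raw entropy from zero-based word indexes, verifying the checksum."""
    if len(indexes) not in (12, 15, 18, 21, 24):
        raise ValueError("expected 12, 15, 18, 21 or 24 words")
    if any(not 0 <= i <= 2047 for i in indexes):
        raise ValueError("indexes must be in the range 0..2047")
    bits = "".join(f"{i:011b}" for i in indexes)
    cs_len = len(bits) // 33  # one checksum bit per 32 entropy bits
    entropy_bits, checksum = bits[:-cs_len], bits[-cs_len:]
    entropy = int(entropy_bits, 2).to_bytes(len(entropy_bits) // 8, "big")
    # The checksum is at most 8 bits, so the first digest byte is enough.
    expected = f"{hashlib.sha256(entropy).digest()[0]:08b}"[:cs_len]
    if checksum != expected:
        raise ValueError("checksum mismatch: not a valid BIP39 mnemonic")
    return entropy


# Eleven words at index 0 followed by index 3 decode to 128 zero bits.
print(indexes_to_entropy([0] * 11 + [3]).hex())
```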
This thread has been super helpful .. thanks to both Ian and hatgit! I second the request to add the checksum value when providing manual entropy. I would also take this a step further...because it’s good to see the steps involved :
EDIT: MISTAKE I MADE (SOLVED): The hash tool I was using was treating the binary as a string, and not as a binary number. Site I used to generate the CORRECT hash of a binary number: https://cryptii.com/hash-function |
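The mistake described in the edit above can be reproduced with a short sketch: hashing the characters "0" and "1" as text gives a different digest from hashing the bytes those bits represent, and only the latter matches what BIP39 uses for the checksum. The helper names and the all-zero input are illustrative assumptions.

```python
import hashlib


def hash_bits_as_text(bits: str) -> str:
    """SHA-256 of the ASCII characters '0' and '1' (the mistaken approach)."""
    return hashlib.sha256(bits.encode("ascii")).hexdigest()


def hash_bits_as_bytes(bits: str) -> str:
    """SHA-256 of the bytes that the binary string represents."""
    if len(bits) % 8 != 0:
        raise ValueError("expected a whole number of bytes")
    raw = int(bits, 2).to_bytes(len(bits) // 8, "big")
    return hashlib.sha256(raw).hexdigest()


bits = "0" * 128
print(hash_bits_as_text(bits))   # digest of a 128-character string
print(hash_bits_as_bytes(bits))  # digest of 16 zero bytes
```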
v0.3.4 has some updates for this issue
I didn't add the full sha256(entropy) value to the ui, just the checksum. This is a cleaner interface and I think works well because the entropy binary is now grouped into 11 bits, and the last part of the entropy is very clearly not 11 bits. The checksum binary directly follows that and is of a length that 'fills in' the 'missing' entropy binary. Feedback welcome but I now consider this feature to be complete. |
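The grouping described here can be mimicked with a small sketch: split the entropy bits into 11-bit groups and count how many bits the final, shorter group is missing, which is exactly the number of checksum bits. This is an illustration of the idea only, not the tool's display code, and `group_entropy` is a hypothetical name.

```python
def group_entropy(entropy_bits: str) -> tuple[list[str], int]:
    """Group entropy bits by 11 and report how many bits the last group lacks."""
    if not entropy_bits:
        raise ValueError("expected at least one bit")
    groups = [entropy_bits[i:i + 11] for i in range(0, len(entropy_bits), 11)]
    missing = 11 - len(groups[-1])  # bits the checksum has to fill in
    return groups, missing


groups, missing = group_entropy("0" * 256)
print(len(groups), len(groups[-1]), missing)  # 24 groups, last has 3 bits, 8 missing
```

For 256 bits of entropy the last group holds 3 bits and the 8 checksum bits complete it; for 128 bits the last group holds 7 bits and 4 checksum bits complete it.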
Here is the official BIP39 list, just FYI, which is used in this software (note that a line-numbered view of this list is not zero-indexed, but in the software it is; see below for an explanation): https://github.com/bitcoin/bips/blob/master/bip-0039/english.txt It sounds like the list you are using is not zero-indexed (so all index values are likely offset by one place), because the index value 1591, which is the 11-bit binary number "11000110111", corresponds correctly to the word "shoulder", whereas the prior index value 1590 (binary "11000110110") is the word "short". Please check your list: the first entry should be index 0, binary "00000000000", for the word "abandon", and the last entry should be index 2047 (the 2048th word), binary "11111111111", for the word "zoo". In summary, neither list is wrong; one is zero-indexed and the other is not (hence the offset needs to be applied). In a perfect world the official BIP39 wordlist would have been presented as zero-indexed because that is how it is coded in all the software: the range of 2^11 numbers (i.e. all 2048 11-bit binary numbers) starts at 0 and ends at 2047 (the number 2048 is the first 12-bit number). |
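The offset between a one-based line number and the zero-based index the software uses can be checked with a small sketch; both helper names are hypothetical.

```python
def line_to_index(line_number: int) -> int:
    """Convert a one-based line number in the wordlist file to a zero-based index."""
    if not 1 <= line_number <= 2048:
        raise ValueError("line number must be in the range 1..2048")
    return line_number - 1


def index_to_bits(index: int) -> str:
    """Render a zero-based wordlist index as its 11-bit binary string."""
    if not 0 <= index <= 2047:
        raise ValueError("index must be in the range 0..2047")
    return format(index, "011b")


print(index_to_bits(1590))                 # 11000110110
print(index_to_bits(1591))                 # 11000110111
print(index_to_bits(line_to_index(1)))     # 00000000000 (first word)
print(index_to_bits(line_to_index(2048)))  # 11111111111 (last word)
```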
What do you mean exactly for calculating entropy/binary? If you are trying to deconstruct words into binary, perhaps this list can help but you need to be mindful of so many things to avoid making mistakes so use at your own risk. For example, there is a checksum which is deterministic. (i.e. depending on the number of words, the last few bits are calculated from the preceding bits using a hash function). |
you are not using the tool correctly, I never said to input those numbers to check. If you want to deconstruct your words into binary, see my last reply. Also, if you already have the words, what are you trying to recover? If the words are not working for you, where did you generate them and how? (i.e. did you choose the entropy or was it generated automatically) I am just curious at this point and not offering to help you recover anything as that is not a service I provide. |
I understand, and apologies accepted. As a novice you are deep down the rabbit hole here, and I am not sure why, though even experts find themselves in that same position at times. However, this tool is not for beginners and it is imperative to know why and how to use it. If you are trying to log in to your wallet using a recovery phrase and the phrase doesn't work, then either the wallet is broken or the words are wrong, or both. The first step could be to identify whether the recovery phrase is correct (e.g. the Ledger wallet from Ledger.com has a function where you can verify whether a BIP39 recovery phrase is valid), then log in to a trusted wallet (e.g. using the same Ledger device). If you don't see any funds in the related addresses generated, you may need to change the "derivation path" and try again; otherwise identify how you created this recovery phrase and the method you used to back it up. For example, did you write just the words down, or did you write down the index values and now you are trying to find the words? The best practice is to write the words down. Hope this helps and good luck. P.S. Remember this tool is not a wallet. It is used for educational purposes, can help with recovery in select special cases, and can be used offline for generating valid BIP39 mnemonics, but things can go wrong if you use the advanced features incorrectly. |
Regarding the recovery phrase you created using SafePal, I cannot comment as I never heard of them before and don't know whether their processes are cryptographically secure or whether their recovery phrases are BIP39 compliant. I think you should reach out to them and ask for a technical explanation on how funds could be lost if an upgrade failed even if someone had their recovery phrases properly backed up, as that sounds like their older software could have been faulty and they had to apply some conversion to correct it (I am not sure, just speculating), as their site warns: https://safepal.io/upgrade |
P.S. my last suggestion if all else fails (and if using the Ledger device outlined above which is safer doesn't help), use at your own risk, is an advanced one that is very technical and can cause you to lose your money if an attacker observed your valid recovery phrase:
However, only enter the first 128 bits (or 256 bits for a 24-word recovery phrase) from left to right (assuming that you read from left to right and that your computer is not configured for Arabic/Hebrew or other right-to-left languages, otherwise you will have to do it in reverse order), and then see if the last word shown by the tool matches the one you have. If not, enter the last 4 bits, change the "mnemonic length" value from 3 Words (Raw Entropy) to 12 words, and see if that produces the matching recovery phrase. In both cases you can check the derived addresses for the supported cryptocurrency and see if there is any balance further down for each derivation path. Here is an example below; don't send any money to this phrase, it is just a random one to demonstrate the tool: |
This tool is for advanced users only and doesn't allow you to withdraw any funds anyway, so you may be in the wrong place, unless you have tried other wallets with the recovery phrase you got from SafePal, it isn't working for you, and you are trying to troubleshoot why your recovery phrase or the wallet is faulty (as I mentioned before) and to check the related derived addresses for your recovery phrase. While you should never share your recovery phrase with anyone, have you tried another self-custodial wallet (e.g. https://blockchain.com) to access your account? Again, this is not a place for customer support, and Mr Coleman may delete our entire conversation unless it's relevant to the software improvement process. |
Good luck! |
A proposal to add two functions to further improve the usefulness of the standalone HTML tool https://github.com/iancoleman/bip39/blob/master/bip39-standalone.html
(not sure if this has already been considered, sharing just in case)
Point 1) Return word list index values for each word in mnemonic sentence (ms)
If the option could be added to return the number from the wordlist index - for each mnemonic word returned, it could be useful for users across other wordlist-languages and/or in the event that the wordlist order was changed at the application source.
For example, using this 24 ms:
lunch anger issue giggle scout cloth once marriage busy save notice farm syrup rally garment tennis price rather unusual brother whisper issue orphan toe
the associated index number for each word would follow (commas are just for ease of reading here and could be spaces):
1003,30,951,790,1745,412,1229,1147,252,1564,1192,710,1754,1476,800,1841,1290,1458,1918,241,2025,951,1238,1770
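As a rough sketch of what Point 1 asks for, the lookup below converts each word to its zero-based position in a wordlist. The helper name is hypothetical, and the four-word list is only a stand-in for the start of the English wordlist; a real lookup would load all 2048 words from the english.txt file.

```python
def mnemonic_to_indexes(mnemonic: str, wordlist: list[str]) -> list[int]:
    """Return the zero-based wordlist index of every word in the mnemonic."""
    positions = {word: index for index, word in enumerate(wordlist)}
    indexes = []
    for word in mnemonic.split():
        if word not in positions:
            raise ValueError(f"word not in wordlist: {word}")
        indexes.append(positions[word])
    return indexes


# Stand-in for the first four entries of the English list (assumption: the
# full list would be read from english.txt, one word per line).
sample_wordlist = ["abandon", "ability", "able", "about"]
print(",".join(str(i) for i in mnemonic_to_indexes("about abandon able", sample_wordlist)))
```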
Point 2) Return the binary string that was encoded to each word and/or the bip39 seed in binary format.
This could be added as an optional checkbox that, when ticked, also displays the 11-bit binary number for each word, including the hashed checksum (or shows it in the existing section where users can supply their own entropy). For example, in the current version of the HTML tool, users can check the box to supply their own source of entropy, but if a user wanted to see the entropy generated by the tool itself, this is not currently available. It could be a useful feature for security, to back up, verify or cross-reference in other applications, etc.
For example, if the tool generated the following 24-word MS:
wink fashion differ love acid stool spy rich copy horn goose curious input act athlete rare quiz school crucial amateur trend valve basic army
The user could opt to see the associated 264-bit string including the concatenated checksum:
10101010101001010000100000111110101111011111011110110010101010010111111110111111011110000100010001000101010001001000101101001010101010100101010000001000100100110011001100111100001111001000101111000010100101011101010100101010101101010101010010101010000111100c1b002d
The index numbers for each word proposed in point 1 above, and their full entropy string proposed in point 2, could complement existing backup recovery sentences in a number of ways, in cases where users found using such data to be advantageous despite obvious trade-offs between security risks/ convenience.