How to securely encrypt a file with an insecure password in Rust (kerkour.com)
54 points by sylvain_kerkour on Jan 19, 2022 | 36 comments



This is inaccurate and borderline dangerous advice.

The output of a KDF does not contain more entropy than the input. You cannot "create" entropy with a KDF. Sure it _looks_ random, but no actual randomness was added during the process.

A KDF adds some security against brute force attacks by making them more expensive, but it does not add entropy, it does not increase the search space of a brute force attack.
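To put rough numbers on that distinction, here is a minimal sketch. The function names are made up for illustration, and the 20-bit password and per-guess costs are assumed values, not measurements. The count of candidate passwords stays the same with or without a KDF; only the cost per candidate changes.

```rust
/// Upper bound on the entropy of a key derived deterministically from a
/// password and a public salt: it cannot exceed the password's own entropy.
fn max_key_entropy_bits(password_bits: f64, key_len_bits: f64) -> f64 {
    password_bits.min(key_len_bits)
}

/// Number of candidates an attacker must be prepared to try.
fn search_space(password_bits: f64) -> f64 {
    2f64.powf(password_bits)
}

/// Worst-case time to try every candidate at a given cost per guess.
fn exhaustive_search_secs(password_bits: f64, secs_per_guess: f64) -> f64 {
    search_space(password_bits) * secs_per_guess
}

fn main() {
    // Assumption: about a million equally likely passwords (20 bits).
    let password_bits = 20.0;
    println!(
        "256-bit key derived from it: at most {} bits of entropy",
        max_key_entropy_bits(password_bits, 256.0)
    );
    // Assumed costs: 1 microsecond for a fast hash, 1 second for a slow KDF.
    for (label, cost) in [("fast hash", 1e-6), ("slow KDF", 1.0)] {
        println!(
            "{}: {} guesses, {:.1} s worst case",
            label,
            search_space(password_bits),
            exhaustive_search_secs(password_bits, cost)
        );
    }
}
```

The guess count printed on both lines is identical; the KDF only multiplies the time.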



I'd generally look at anything like this as a code smell. If you're looking for simple file encryption in Rust, and you'd consider doing something as bespoke as this, just use `rage` (and its `age` crate). As a bonus, you get interop with Go (the reference implementation of age is in Go).

https://github.com/str4d/rage

Having said this, I want to put a word in for a design change I think all of these tools should consider: don't accept user-provided passphrases by default. Instead, generate passphrases for the user, with a wordlist and entropy target.

Encrypting programs can still accept a (bad) passphrase with an option! But it shouldn't be the default behavior.


> Having said this, I want to put a word in for a design change I think all of these tools should consider: don't accept user-provided passphrases by default. Instead, generate passphrases for the user, with a wordlist and entropy target.

I like this idea, but proper passphrase generation is hard:

1. My personal estimation is that there are maybe 2^11 to 2^12 words that nearly all English speakers know and spell the same way; note that this is much smaller than the vocabularies of the lowest percentile, but two people with 5k-word vocabularies will have an intersection smaller than 5k words, and there are words spelled differently in different dialects. Also, rarely used words are harder to recall.

2. At least for me, I confuse similarly sounding words on recall

3. There are lots of non-English speakers

4. You probably don't want your passphrase generator spitting out double-entendres

For my own personal use, I took the 2k most common words in the English language, used metaphone to remove words that were likely to sound the same, and I don't care about #3 and #4. This left me with more than 512 but fewer than 1024 words. At 9 bits of entropy per word, a 5-word passphrase is memorable for most people and gives you 45 bits of entropy total. I'm not a cryptographer, but I suspect 45 bits is "good enough" with something non-fancy like PBKDF2-HMAC-SHA1 and an improvement over a prompt, but I haven't solved all of the problems on the list. I think #4 can be safely ignored for a free product, but that concern specifically caused us to swap out passphrases for an alphanumeric password for a tool used at work.
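The arithmetic behind those figures is words times log2(wordlist size), assuming every word is drawn uniformly and independently. A small sketch, with a made-up function name:

```rust
/// Entropy in bits of a passphrase made of `words` words, each drawn
/// uniformly and independently from a list of `wordlist_len` entries.
fn passphrase_entropy_bits(wordlist_len: u32, words: u32) -> f64 {
    f64::from(words) * f64::from(wordlist_len).log2()
}

fn main() {
    // 512 and 1024 bracket the filtered list described above;
    // 2048 and 4096 correspond to the 2^11 to 2^12 estimate.
    for wordlist_len in [512u32, 1024, 2048, 4096] {
        println!(
            "{:>4} words in list, 5-word passphrase: {:.1} bits",
            wordlist_len,
            passphrase_entropy_bits(wordlist_len, 5)
        );
    }
}
```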


This is all true, but my take would be that you can't do worse than users will do with a password prompt. By all means, leave an option for fussy users to provide their own.


This is a good idea. It could use a standardized word list, like the original Diceware list [1] or EFF's Diceware wordlists [2]. Of course there's also an XKCD for that [3].

[1]: https://theworld.com/~reinhold/diceware.html

[2]: https://www.eff.org/deeplinks/2016/07/new-wordlists-random-p...

[3]: https://xkcd.com/936/


I guess some context is that this is intended to be educational and the author also has a book on the topic.


Not my crate, but I'll plug secrecy[1] as a better alternative to calling `zeroize()` manually on each variable containing secret material. That provides `Secret<S: Zeroize>`, which provides reliable zero-on-drop semantics.

[1]: https://docs.rs/secrecy/latest/secrecy/
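For readers unfamiliar with the idea, zero-on-drop can be sketched with the standard library alone. This is not the secrecy or zeroize API; `wipe` and `WipeOnDrop` are hypothetical names, and the sketch does nothing about copies left behind by moves or reallocations, which is part of why a vetted crate is preferable.

```rust
use std::sync::atomic::{compiler_fence, Ordering};

/// Overwrites a buffer with zeros using volatile writes, so the compiler
/// is not free to drop the stores as dead code.
fn wipe(buf: &mut [u8]) {
    for byte in buf.iter_mut() {
        // SAFETY: `byte` is a valid, exclusive reference to a u8.
        unsafe { std::ptr::write_volatile(byte, 0) };
    }
    // Discourage the compiler from reordering later operations before the wipe.
    compiler_fence(Ordering::SeqCst);
}

/// Hypothetical wrapper: the secret is wiped whenever the value is dropped,
/// including on early returns and panics that unwind.
struct WipeOnDrop(Vec<u8>);

impl WipeOnDrop {
    fn expose(&self) -> &[u8] {
        &self.0
    }
}

impl Drop for WipeOnDrop {
    fn drop(&mut self) {
        wipe(&mut self.0);
    }
}

fn main() {
    let key = WipeOnDrop(vec![0x42; 32]);
    println!("key length: {}", key.expose().len());
    // `key` goes out of scope here and its buffer is zeroed before being freed.
}
```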


Alternatively, you could use secrets[1] (full disclosure, I'm the author). The secrecy crate does very little to prevent you from accidentally creating copies of secret data. Additionally, my crate punts to libsodium's memory management[2] API, so it provides all the features libsodium does: memory is prevented from being paged to disk, memory is protected by `mprotect` when not being accessed, memory is protected with guard pages before and after, and memory is preceded by a canary to protect against underflow. secrets' functionality and protections are essentially a superset of secrecy's.

Additionally, there's support for stack-allocated secrets which lack some of the extra protections that heap-allocated secrets have, but is often a much more appropriate approach for short-lived secrets.

A downside of this is that it relies on unsafe code in order to call into libsodium and convert back and forth between pointers. And of course it has a libsodium dependency.

[1]: https://docs.rs/secrets/latest/secrets/

[2]: https://doc.libsodium.org/memory_management


As an aside, the code is also wrong and will fail in some rare cases because it assumes that the `read` syscall will only return a partial result if you are at the end of the file. That assumption is false, and it will lead to data loss when it happens.

Full post with details here: https://ayende.com/blog/196289-C/dont-assume-the-result-of-r...
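One common way to avoid that assumption is to keep calling `read` until the buffer is full or `read` returns 0, which is the only reliable end-of-file signal. A minimal sketch follows; `read_full` and the `Trickle` test reader are names invented here and are not taken from the article's code.

```rust
use std::io::{self, Read};

/// Fills `buf` as far as possible. Returns fewer than `buf.len()` bytes
/// only when the underlying reader has reached end of file.
fn read_full<R: Read>(reader: &mut R, buf: &mut [u8]) -> io::Result<usize> {
    let mut filled = 0;
    while filled < buf.len() {
        match reader.read(&mut buf[filled..]) {
            Ok(0) => break, // end of file
            Ok(n) => filled += n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(filled)
}

/// Reader that hands out at most `max` bytes per call, to simulate short reads.
struct Trickle<'a> {
    data: &'a [u8],
    max: usize,
}

impl Read for Trickle<'_> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.max.min(buf.len()).min(self.data.len());
        buf[..n].copy_from_slice(&self.data[..n]);
        self.data = &self.data[n..];
        Ok(n)
    }
}

fn main() -> io::Result<()> {
    let mut reader = Trickle { data: b"0123456789", max: 3 };
    let mut chunk = [0u8; 8];
    // Each individual read returns at most 3 bytes, yet the chunk comes back full.
    let n = read_full(&mut reader, &mut chunk)?;
    println!("first chunk: {} bytes", n); // 8
    // Only the genuinely last chunk is short.
    let n = read_full(&mut reader, &mut chunk)?;
    println!("last chunk: {} bytes", n); // 2
    Ok(())
}
```

With a loop like this, "chunk shorter than the buffer" really does mean "last chunk". The standard library's `Read::read_exact` is an alternative when a short final chunk should be treated as an error.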


If the salt is stored with the file, how is this safe against a brute force attack against the low entropy password?


The purpose of a "salt" is just to randomize the hash; an attacker can precalculate a dictionary for a hash function H, but they can't plausibly precalculate 2^128 dictionaries for the family of hash functions H_nonce.

People get hung up on this, because the nonce looks like it could serve as a key; if you keep the nonce hidden, an attacker can't brute-force your hash at all. The obvious response to that is: if you can keep your nonce secret like that, just get rid of the passwords, and key your system with actual keys. Or, store your passwords in the super-secure place you store the nonces.

These discussions quickly rabbithole into analyses of the varying levels of security between filesystems, HSMs, program memory, networked filesystems, the kernel, VM boundaries, and the difficulty for an attacker of assembling all these components at once. It's all pretty silly. But the answer to your question is simple: a salt (or nonce) isn't a key; that's not the purpose it serves in the design. If you really want to key your password hash, you don't need to muck with the salt to do that.
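A toy illustration of the precomputation point. Loud assumption: `DefaultHasher` stands in for a real password hash only so the sketch needs no external crates; it is not cryptographic and must never be used this way in practice. The function names and the salt value are invented for the example.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::Hasher;

/// Toy stand-in for a salted password hash (NOT cryptographically secure).
fn toy_hash(salt: &[u8], password: &str) -> u64 {
    let mut h = DefaultHasher::new();
    h.write(salt);
    h.write(password.as_bytes());
    h.finish()
}

/// Attacker's lookup table from hash to password for one specific salt.
fn build_table<'a>(salt: &[u8], dictionary: &[&'a str]) -> HashMap<u64, &'a str> {
    dictionary.iter().map(|&p| (toy_hash(salt, p), p)).collect()
}

fn main() {
    let dictionary = ["123456", "hunter2", "Password1"];

    // Table computed ahead of time, before the attacker has seen any salt.
    let precomputed = build_table(b"", &dictionary);

    // The victim's hash uses a random salt stored next to the file (made-up value).
    let salt = b"per-file-random-salt";
    let stored = toy_hash(salt, "hunter2");

    // The precomputed table is useless against the salted hash...
    println!("precomputed table: {:?}", precomputed.get(&stored));

    // ...but the salt is public, so the attacker can redo the work for this
    // one file. The salt forces per-target work; it does not act as a key.
    let rebuilt = build_table(salt, &dictionary);
    println!("table rebuilt with the salt: {:?}", rebuilt.get(&stored));
}
```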


In this case, the security is in the memory and/or time hardness of the KDF: a motivated attacker could use the salt with their dictionary, but would have to be willing to wait (on average) a decent, perhaps deterring, amount of time (or similar for memory).

Edit: The defaults for the argon2 crate are here[1]. They seem to prioritize time cost over memory cost. Some random searching online suggests that a time cost of 3 corresponds to roughly 2 seconds on modern hardware, so running a 100k dictionary with a time cost of 3 would require ~27.7 hours for the average find (50k guesses) or ~55.5 hours for the worst-case find. So, this isn't a very good scheme against a motivated (or parallel) attacker and an exceptionally weak password.
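The estimate is simple enough to reproduce. In this sketch the 2 seconds per guess is the rough figure quoted above, not a benchmark, and `attack_hours` and the worker counts are hypothetical.

```rust
/// Returns (average, worst-case) hours to run a dictionary through the KDF,
/// assuming the work splits evenly across `parallel_workers`.
fn attack_hours(candidates: u64, secs_per_guess: f64, parallel_workers: u32) -> (f64, f64) {
    let worst = candidates as f64 * secs_per_guess / f64::from(parallel_workers) / 3600.0;
    (worst / 2.0, worst)
}

fn main() {
    // Assumed figure from the comment: ~2 s per Argon2 evaluation.
    for workers in [1u32, 8, 64] {
        let (avg, worst) = attack_hours(100_000, 2.0, workers);
        println!(
            "{:>2} worker(s): average {:.1} h, worst case {:.1} h",
            workers, avg, worst
        );
    }
}
```

A single worker gives about 27.8 and 55.6 hours, which matches the numbers above up to rounding; the time shrinks linearly with the number of workers.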


I think the short answer is "don't be in a dictionary". Using a unique password is critically important.

Let's imagine that you increased the time by 10x. That's 277 hours for a password. That's not very long at all - 12 days. Even if you increased by 100x, 120 days is not crazy, and presumably attackers can go way faster than your assumption.

A KDF isn't going to be enough to save you if you're using a top 100k password and the attacker can bruteforce offline.


"Don't be in a dictionary" is trivially easy to solve with generated passphrases: just pick a bunch of random words and string them together. You can generate an arbitrary amount of "entropy" this way.

Of course, users won't do this for themselves, which is why tools that do passphrase encryption should generate passwords by default, and accept user-provided passwords only as a non-default option.

Passphrases still have value, even when they're long strings of words: they're easy to write down, easy to repeat aloud, and easier than a random string to "recognize" visually.
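A sketch of what "generate by default" could look like using only the standard library. Assumptions: the eight-word list is a placeholder (a real tool would ship a full list such as the Diceware or EFF ones), randomness comes from `/dev/urandom` (Unix-only), and the function names are invented. Rejection sampling keeps every word equally likely.

```rust
use std::fs::File;
use std::io::{self, Read};

/// Draws a uniformly distributed index in 0..n from a source of random bytes.
/// Values that would introduce modulo bias are rejected and redrawn.
fn uniform_index<R: Read>(rng: &mut R, n: usize) -> io::Result<usize> {
    assert!(n > 0 && n <= 65_536, "wordlist must have 1..=65536 entries");
    let n = n as u32;
    // Largest multiple of n that fits in the 16-bit range.
    let limit = 65_536 / n * n;
    loop {
        let mut buf = [0u8; 2];
        rng.read_exact(&mut buf)?;
        let v = u32::from(u16::from_be_bytes(buf));
        if v < limit {
            return Ok((v % n) as usize);
        }
    }
}

/// Picks `words` entries independently and joins them with '-'.
fn generate_passphrase<R: Read>(
    rng: &mut R,
    wordlist: &[&str],
    words: usize,
) -> io::Result<String> {
    let mut picked = Vec::with_capacity(words);
    for _ in 0..words {
        picked.push(wordlist[uniform_index(rng, wordlist.len())?]);
    }
    Ok(picked.join("-"))
}

fn main() -> io::Result<()> {
    // Placeholder list: 8 words is only 3 bits per word, far too few for real use.
    let wordlist = ["apple", "brick", "cloud", "delta", "ember", "flint", "grove", "heron"];
    let mut rng = File::open("/dev/urandom")?; // assumption: Unix-like system
    println!("{}", generate_passphrase(&mut rng, &wordlist, 5)?);
    Ok(())
}
```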


It should probably be regarded as "relatively secure" encryption. You can't have truly secure encryption if your passphrase is garbage.


My understanding is that the key derivation function chosen, argon2, is much more expensive to compute than something like the SHA family of hashes. This is a desirable property in a KDF precisely because it makes brute forcing much more difficult.

Further, argon2id incorporates strategies to make GPU parallelization less effective.

Obviously this won't protect you against something like a dictionary attack, there's nothing that can magically protect you if you choose a low-entropy password, just something that can make the process more difficult.


It does seem like talking about dictionary attacks in the article would be helpful. Without some reference as to how many argon2 hashes per minute is reasonably possible with the shown settings, we're flying a little blind.


I performed a rough estimate in my comment up the thread, using ~2s per Argon2id with a time cost of 3. TL;DR is that you probably wouldn't want to have an extremely common password with this scheme.


There's no KDF in the world that can protect you if your password is in a top-10,000 list or exposed elsewhere alongside your username.


Right, but that's sort of what I was getting at. The article doesn't talk much about the password other than it's "insecure". It's probably worth mentioning that a dictionary attack at some multiple of ~2 per second per core is possible. So it's not just top-10,000 list, but maybe "top million" or more that's a bad idea.


Yes, I think that's what the GP was trying to say. The post doesn't qualify "insecure" meaning "not best practices" vs. "insecure" meaning "your password is an extremely common one."


> there's nothing that can magically protect you if you choose a low-entropy password

Ignoring the extra time for decryption there's no difference between a unique low-entropy password that takes 2 years to bruteforce and a high-entropy password that takes 2 years to bruteforce.


Yeah, if you use a very common password like "hunter2" or "Password1", then even with a KDF that takes 100ms to generate the key, it's still very feasible to run through the 100k most common passwords and compromise it within a few hours.

If we're talking a more random but still short password (for example, just 8 random alphanumeric characters is log2(26*2+10)*8 = ~48 bits), then the KDF becomes very attractive to help skyrocket the brute forcing cost to something more similar to trying to brute force the 256 bit key instead.


Good question!

I will add a conclusion with an explanation of the security of the system.

TL;DR: You can play with Argon2's parameters to use the resources that fit your requirements.


How does a KDF add entropy to a passphrase?

I would expect the entropy to remain the same.


It doesn't, this article is inaccurate


It adds a salt. Otherwise entropy shouldn't really be a factor - your input password could be more entropic than your output hash, in terms of Shannon entropy / number of bits needed to represent the values.


I am referring to the diagram saying low entropy passphrase -> high entropy key

Since the salt is included in the output of the KDF, is it really adding entropy to the key?


Yeah, I found the diagram a bit confusing as well. I don't think that a KDF is adding any entropy.


The salt adds no entropy.


Tangential thought: if you're using a passphrase you're not going to ever type manually, for example something you're going to generate once and stick in a secret management system, why not build the passphrase using all possible UTF-8 characters as your corpus? Seems like restricting yourself to ASCII characters is just giving an advantage to those attempting to brute force the passphrase.


> why not build the passphrase using all possible UTF-8 characters as your corpus? Seems like restricting yourself to ASCII characters is just giving an advantage to those attempting to brute force the passphrase.

Restricting yourself to ascii means you don't need to worry about text encoding. Who knows when you end up needing to paste it, or when something decides to be helpful and messes up the encodings.


This doesn't make much sense to me. The point of a passphrase is to be readable/writeable by a human. If you don't need that, you just want a binary key (which can be base64 encoded/decoded to be read/written by a human).

Using all utf-8 characters seems like it combines the downsides of both of these (not really human readable/writeable but also not using the full key space).


I'd be happy with just 32 bytes of random alphanumeric ASCII; it doesn't really need improvement. If that gives too big an advantage, then use more.


Because all possible UTF-8 characters give you passphrases that are hard to write down, hard to transcribe from paper onto keyboard or vice versa, hard to repeat aloud, and hard to recognize visually.

The tradeoff, for a given level of target entropy, is longer passphrases (with smaller character sets) versus "less humane" passphrases (with larger ones).
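To make that tradeoff concrete, this sketch computes how many uniformly random symbols are needed to reach a target entropy. The 128-bit target is an arbitrary choice for illustration, `symbols_needed` is a made-up name, and the Unicode figure counts every scalar value (1,112,064), which is an upper bound on what a real generator could sensibly emit.

```rust
/// Number of uniformly random symbols from an alphabet of `alphabet_size`
/// needed to reach at least `target_bits` of entropy.
fn symbols_needed(alphabet_size: u32, target_bits: f64) -> u32 {
    (target_bits / f64::from(alphabet_size).log2()).ceil() as u32
}

fn main() {
    let target_bits = 128.0; // assumed target, for illustration only
    let alphabets = [
        ("alphanumeric ASCII", 62u32),
        ("printable ASCII", 95),
        ("7776-word list", 7_776),
        ("all Unicode scalar values", 1_112_064),
    ];
    for (label, size) in alphabets {
        println!(
            "{:<26} {:>2} symbols",
            label,
            symbols_needed(size, target_bits)
        );
    }
}
```

Under these assumptions, going from alphanumeric ASCII to all of Unicode cuts the length from 22 symbols to 7, at the cost of every usability property listed above.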



