Decouple offline hash strength from online #4814

Merged: 16 commits, Nov 11, 2023

Conversation

@rnewson rnewson (Member) commented Oct 19, 2023

Overview

Allow arbitrarily strong on-disk password hashes without impacting database request performance.

This PR introduces PBKDF2 with SHA-256 (our existing version only uses SHA-1, which is now deprecated).

This PR also introduces an in-memory cache of password hashes with a separate, and deliberately low, iteration
count. The first request for a given user is forced to perform a slow password hash check (as is any attacker that
might have access to the on-disk password hashes). Subsequent hashes are faster. Entries in the password hash
cache are time-limited, unused entries are automatically deleted, and there is a capacity bound.
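The scheme described above could be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the module name, iteration counts, and ETS layout are assumptions, and the real cache's per-user keying, time-limited expiry, and capacity bound are elided.

```erlang
-module(two_tier_hash_sketch).
-export([verify/3]).

%% Illustrative iteration counts only; the real values are configurable.
-define(SLOW_ITERATIONS, 600000). %% on-disk strength
-define(FAST_ITERATIONS, 10).     %% in-memory cache strength

verify(Password, Salt, StoredSlowHash) ->
    ensure_cache(),
    case ets:lookup(hash_cache_sketch, Salt) of
        [{_, FastHash}] ->
            %% Hot path: one cheap PBKDF2 computation per request.
            FastHash =:= pbkdf2(Password, Salt, ?FAST_ITERATIONS);
        [] ->
            %% Cold path: full-strength check (as slow for us as for an
            %% attacker with the on-disk hashes), then populate the cache.
            case pbkdf2(Password, Salt, ?SLOW_ITERATIONS) =:= StoredSlowHash of
                true ->
                    ets:insert(hash_cache_sketch,
                               {Salt, pbkdf2(Password, Salt, ?FAST_ITERATIONS)}),
                    true;
                false ->
                    false
            end
    end.

pbkdf2(Password, Salt, Iterations) ->
    crypto:pbkdf2_hmac(sha256, Password, Salt, Iterations, 32).

ensure_cache() ->
    case ets:whereis(hash_cache_sketch) of
        undefined -> ets:new(hash_cache_sketch, [named_table, public]);
        _ -> ok
    end.
```

Note the plain `=:=` comparison is not constant-time; it is kept here only to keep the sketch short.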

Testing recommendations

Covered by automated tests.

Related Issues or Pull Requests

N/A

Checklist

  • Code is written and works correctly
  • Changes are covered by tests
  • Any new configurable parameters are documented in rel/overlay/etc/default.ini
  • Documentation changes were made in the src/docs folder
  • Documentation changes were backported (separated PR) to affected branches

@nickva nickva (Contributor) left a comment

Very nice! It's about time we moved on from SHA-1.

I had a few general first impressions:

  • I wonder if we could decouple the cache mechanism from adding a SHA-256 PBKDF2 variant, so it would be two separate PRs?

On the cache component:

  • Some users or services may already generate random passwords and salt values, say for API keys. Perhaps we can avoid forcing them to go through the extra ETS table lookup and an extra gen_server cast for every request.

On the built-in SHA256 PBKDF2 bit:

@rnewson rnewson force-pushed the decouple_offline_hash_strength_from_online branch from 0bba0da to 40f4c39 Compare October 20, 2023 16:46
@rnewson rnewson (Member, Author) commented Oct 20, 2023

I'd considered https://github.com/esl/fast_pbkdf2/ before I noticed they'd made it part of crypto.erl. It just hadn't occurred to me that the built-in version would be worse. I will switch to fast_pbkdf2 instead, which does the NIF yielding goop we need.

@rnewson rnewson (Member, Author) commented Oct 20, 2023

I deliberately separated this into multiple commits, but I feel that adding pbkdf2-sha256 with high iteration counts without the cache would lead to problems for folks.

The password cache could be optional; I could add a toggle so admins who want to can disable it. Given that the work effort for a cached item is quite small (though not as small as the single-pass SHA-1), I didn't think it was necessary, but it's also easy to add.

@nickva nickva (Contributor) commented Oct 20, 2023

> I'd considered https://github.com/esl/fast_pbkdf2/ before I noticed they'd made it part of crypto.erl. It just hadn't occurred to me that it would be worse. I will switch to fast_pbkdf2 instead, which does the NIF yieldy goop we need.

Or even upgrading our pbkdf2 could work too but yeah that one does seem to do better.

Here is an escript that shows the abnormal behavior:

#!/usr/bin/env escript
%% -*- erlang -*-

-mode(compile).

hash() ->
    crypto:pbkdf2_hmac(sha256, <<"password">>, <<"salt">>, 1500000, 32),
    %timer:sleep(1),
    %erlang:yield(),
    hash().

spawn_hasher() ->
    spawn_link(fun() ->
        timer:sleep(round(rand:uniform() * 10)),
        hash()
    end).

main([TimeSecStr, HashersStr]) ->
    TimeSec = list_to_integer(TimeSecStr),
    Hashers = list_to_integer(HashersStr),
    HasherPids = [spawn_hasher() || _ <- lists:seq(1, Hashers)],
    io:format(" * ~p hashers started, measuring scheduler usage for ~p seconds~n", [Hashers, TimeSec]),
    T0 = erlang:monotonic_time(millisecond),
    msacc:start(TimeSec * 1000),
    T1 = erlang:monotonic_time(millisecond),
    io:format(" * killing ~p hashers~n", [Hashers]),
    [begin unlink(Pid), exit(Pid, kill) end || Pid <- HasherPids],
    io:format(" * displaying msacc stats~n", []),
    msacc:print(msacc:stats(), #{system => true}),
    io:format(" * time it took to run: ~p~n", [(T1-T0)/1000]),
    io:format(" * done~n").

It should finish in about 10 seconds, but it just blocked until I stopped it by hand:

% time ./pbkdfprocs.escript 10 4
 * 4 hashers started, measuring scheduler usage for 10 seconds
^C
./pbkdfprocs.escript 10 4  51.52s user 0.38s system 99% cpu 52.347 total

I'll make an OTP issue; otherwise the function, as is, is completely unusable for its intended purpose.

@nickva nickva (Contributor) commented Oct 20, 2023

Upstream OTP issue: erlang/otp#7769

@big-r81 big-r81 (Contributor) commented Oct 20, 2023

Some additional info about the iteration count: https://cheatsheetseries.owasp.org/cheatsheets/Password_Storage_Cheat_Sheet.html#pbkdf2
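For context, that OWASP cheat sheet (as of 2023) recommends on the order of 600,000 iterations for PBKDF2-HMAC-SHA256. A hypothetical default.ini fragment picking values along those lines might look like this; the key names below are illustrative assumptions, not necessarily the ones this PR adds, so consult the PR's rel/overlay/etc/default.ini changes for the real settings.

```ini
[chttpd_auth]
; illustrative names only -- see the PR's default.ini for the real keys
password_scheme = pbkdf2
pbkdf2_prf = sha256
iterations = 600000
```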

@nickva nickva (Contributor) commented Oct 20, 2023

> I deliberately separated into multiple commits but I feel adding pbkdf2-sha256 with high iterations without the cache would lead to problems for folks.

The cache would be for the increased default iteration count? It would make sense to add it even for a higher SHA-1 iteration count. SHA-256 is a bit more expensive, but still within the same order of magnitude as SHA-1.

> timer:tc(fun() ->  crypto:pbkdf2_hmac(sha256, <<"password">>, <<"salt">>, 10000, 32) end).
{7010,
 <<94,192,43,145,164,181,156,111,89,221,95,190,76,166,73,
   236,228,250,133,104,205,184,186,54,207,65,66,...>>}

> timer:tc(fun() ->  crypto:pbkdf2_hmac(sha, <<"password">>, <<"salt">>, 10000, 20) end).
{4951,
 <<162,194,100,97,134,130,132,116,183,84,89,26,84,124,24,
   241,50,216,141,116>>}

> the password cache could be optional, I could add a toggle so admins that want to can disable it. given that the work effort for a cached item is quite small (though not as small as the single-pass sha1) I didn't think it was necessary, but it's also easy to add.

The main concern is that concurrent requests will all check the single ETS table, which also ends up being updated constantly by the access-time (atime) bookkeeping. It seems that if users have generated random 128-bit salts and passwords for API keys, they should be able to use SHA-256 or SHA-512 with a low iteration count to avoid the extra hash and ETS table check.
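The access pattern nickva is pointing at could look roughly like this sketch (not the PR's code; the module and table layout are assumptions): every cache hit performs an ETS read plus a write to refresh the entry's access time, so each authenticated request touches the shared table twice.

```erlang
-module(atime_cache_sketch).
-export([new/0, insert/3, lookup_with_atime/2]).

%% Create the shared cache table. Entries are {Key, Hash, ATime}.
new() ->
    ets:new(cache, [set, public]).

insert(Table, Key, Hash) ->
    ets:insert(Table, {Key, Hash, erlang:monotonic_time(second)}).

lookup_with_atime(Table, Key) ->
    case ets:lookup(Table, Key) of
        [{Key, Hash, _OldATime}] ->
            %% Refresh the access time so an expiry sweep keeps hot
            %% entries; this write is the contention concern above.
            ets:update_element(Table, Key, {3, erlang:monotonic_time(second)}),
            {ok, Hash};
        [] ->
            not_found
    end.
```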

@rnewson rnewson force-pushed the decouple_offline_hash_strength_from_online branch 22 times, most recently from 6516106 to 99e095c Compare October 24, 2023 14:00
@rnewson rnewson (Member, Author) commented Nov 10, 2023

I can't trigger the conflict cases locally, but I've added a catch on the front of couch_password_hasher's call to AuthModule:update_user_creds.

@nickva nickva (Contributor) commented Nov 10, 2023

Makes sense that we can just ignore the conflict, as both updates should have the same content. Even if a conflict is generated, it should merge cleanly.

> I can't trigger the conflict cases locally but I've added a catch on the front of couch_password_hasher's call to AuthModule:update_user_creds.

That would still dump the error in the logs. The error report comes from the error logger, because we throw inside the spawn_monitor-ed process here and don't catch it:

{_, Ref} = spawn_monitor(fun() ->
    case fabric:update_doc(dbname(), UserDoc, []) of
        {ok, _} ->
            exit(ok);
        Else ->
            exit(Else)
    end
end),

We could do a try ... catch conflict -> exit(conflict) end.
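The suggestion above might be sketched out like this. `UpdateFun` is a stand-in for the real `fabric:update_doc/3` call so the sketch is self-contained; a thrown `conflict` becomes a clean exit reason instead of an uncaught error that lands in the logs.

```erlang
-module(catch_conflict_sketch).
-export([update_in_child/1]).

%% Run an update in a monitored child process, converting a thrown
%% conflict into a normal exit reason rather than a crash report.
update_in_child(UpdateFun) ->
    {_, Ref} = spawn_monitor(fun() ->
        try UpdateFun() of
            {ok, _} -> exit(ok);
            Else -> exit(Else)
        catch
            throw:conflict -> exit(conflict)
        end
    end),
    receive
        {'DOWN', Ref, process, _Pid, Reason} -> Reason
    end.
```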

@rnewson rnewson force-pushed the decouple_offline_hash_strength_from_online branch 2 times, most recently from 5ddf3f9 to fea50ab Compare November 10, 2023 18:26
@nickva nickva (Contributor) left a comment

+1

@rnewson rnewson force-pushed the decouple_offline_hash_strength_from_online branch from fea50ab to dbcbc9a Compare November 10, 2023 23:25
@rnewson rnewson merged commit 5fd3579 into main Nov 11, 2023
15 checks passed
@rnewson rnewson deleted the decouple_offline_hash_strength_from_online branch November 11, 2023 10:02