Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Implement per database encryption for primary data #2685

Closed
wants to merge 24 commits into from

Conversation

eiri
Copy link
Member

@eiri eiri commented Mar 20, 2020

Overview

This PR adds an option for enabling encryption for primary data. The encryption done with aes-256-gcm algorithm per document with document keys (Data Encryption Keys) derived from an individual database's key (Key Encryption Key). KEK supplied at time of the database creation and stored along with the rest of its configuration in encrypted "wrapped" form.

The encoding and decoding of documents handled in a dedicated module that also acts as a cache of "unwrapped" KEKs. The general management of KEKs implemented through epi plugin interface to allow a possibility for implementation of interfaces to alternative Key Management Services, e.g. Hashicorps Vault or SecretHub.

The default epi plugin generates KEK at a databae creation time and encrypts it with a Master Encryption Key specified in CouchDB configuration.

Testing recommendations

Make target make check-fdb should pass

@eiri
Copy link
Member Author

eiri commented Mar 20, 2020

@davisp @rnewson This PR is still work in progress, but can you please take a look if general shape of things seems fine to you?

;
; Keep both files as read-only and owned by couch process.
; key_file = /var/secured/mount/location/key.dat
; iv_file = /var/secured/mount/location/iv.dat
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. not sure why these are external and not inline given the other files are readable by the same user.
  2. storing an IV here makes no immediate sense to me so I worry we have a crypto problem to come.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Config is writable, my idea is that key will be kept in read-only file with an option to have that file on a separate encrypted mount point.

How do you suggest to store IV?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there shouldn't be one.

AddSize = sum_add_rev_sizes([NewWinner | ToUpdate]),
RemSize = sum_rem_rev_sizes(ToRemove),
incr_stat(Db, <<"sizes">>, <<"external">>, AddSize - RemSize),
incr_stat(Db, <<"update_count">>, 1),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This increment will only happen if the transaction succeeds.

_ ->
UpdateCounter = get_stat(Db, <<"update_count">>),
BinBody = term_to_binary(Body,
[{compressed, 0}, {minor_version, 1}]),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please compress the body.

UpdateCounter = get_stat(Db, <<"update_count">>),
BinBody = term_to_binary(Body,
[{compressed, 0}, {minor_version, 1}]),
{ok, Encoded} = fabric2_encryption:encode(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

encode is not an acceptable synonym for encrypt.

false ->
{Body, DiskAtts, Deleted};
_ ->
UpdateCounter = get_stat(Db, <<"update_count">>),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that we fetch this stat and use it without guaranteeing it is unique is a serious problem. the update happens elsewhere.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stats are using atomic operations, if the counter got updated since the start of this transaction it'll get aborted here and retried, so counter guaranteed to be unique in a single transaction.

I have a question though: using read-write on atomic ops kills its performance benefits - can we use uuid instead of update counter, is it cryptographically solid enough? that would allow us to avoid "external" monitonic dependancy all together

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would not have a guarantee of uniqueness, which is the only reason we have "update counter" instead of "rev" from the original proposal.

is_binary(DocId),
is_integer(UpdateCounter), UpdateCounter > 0,
is_binary(DocBody) ->
gen_server:call(?MODULE,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doing all the crypto work in a single gen_server sounds like a recipe for performance issues, even if we do spawn off worker processes there.

could we instead a) set the sensitive flag on self(), b) fetch the unwrapped key c) perform the operation in process d) unset sensitive flag?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is because of KEK cache. Its erts table set to protected, so only gen server can read it, fetching unwrapped key from worker would require to change ets to private, but then any proc could read from it, there are no option to limit this, unfortunally.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you mean private there. a protected table can be read by all processes.

in the current code you hand the unwrapped key to another process (spawn_monitor). It doesn't appear to be more protected in that process than if we did the work in the calling process.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, private, thank you. it's more protected in sense that I can't remsh and read unwrapped key from ets. if you think this is not a concern then I'll make worker to do the unwrap fetch.

end.


encode(WrappedKEK, DbName, DocId, UpdateCounter, DocBody)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this function performs a cryptographic encryption not a mere encoding. rename to encrypt for clarity.

{encode, WrappedKEK, DbName, DocId, UpdateCounter, DocBody}).


decode(WrappedKEK, DbName, DocId, UpdateCounter, DocBody)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

decrypt



handle_call({get_wrapped_kek, DbName}, _From, #{cache := Cache} = St) ->
{ok, KEK, WrappedKEK} = fabric2_encryption_plugin:get_wrapped_kek(DbName),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we avoid doing slow external work within the gen_server loop?


init(_) ->
process_flag(sensitive, true),
process_flag(trap_exit, true),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's try to avoid building another supervisor of our own.



unwrap_kek(WrappedKEK) ->
case get_mek_iv() of
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reusing the same Key and IV for different data is a cryptographic no-no.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(we can use the key wrapping and unwrapping code from aegis instead)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is Master key though, not databases' Key encryption key, it's not touching data, it only used to encrypt generated per db KEKs. I can change aes_ctr encryption of KEK I'm using here for aegis wrapping, but it'll still be the same key for wrap/unwrap operation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we must use a standard key wrapping technique, not one of your own invention.

@davisp
Copy link
Member

davisp commented Mar 23, 2020

I'm gonna leave all the encryption related stuff to @rnewson and others that know better. My only concern was already called out about having a single gen_server handling all of that work. Though looks like there's been changes there already.

@rnewson rnewson self-requested a review March 24, 2020 10:22
Copy link
Member

@rnewson rnewson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR raised several key management issues that were left unspecified in the original IBM proposal for encryption at rest within CouchDB 4.0. That proposal uses a proprietary key management system (IBM Key Protect) that the Apache CouchDB project will not accept being coupled to.

The key management scheme for Apache CouchDB will be as follows;

Master Key

The master key is needed whenever CouchDB starts and is cached in memory for as long as CouchDB remains running. CouchDB will call an external script (using os:cmd()), if present, at a location configured in the .ini config file, and the script is required to print the master key to standard out (and nothing else). CouchDB will ship with a few example implementations of this script;

  • printing the contents of a local file.
  • fetching the contents of a url and printing the response body.

Per-Database Key's

These will be generated by CouchDB by calling crypto:strong_rand_bytes(32) and using the resultant value directly as the encryption key for a specific database. Each database will have its own independently generated key.
This value will be wrapped by the master key using the AES Key Wrap algorithm described in NIST Special Publication 800-38F (https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-38F.pdf) so that administrators can, if they choose to, run in a FIPS 140-2 compliant manner (if they provide a FIPS 140-2 validated module, of course).
The wrapped value will be stored in the database where other database metadata is stored.

Per-Document Key's

These keys are derived from the per-database key using the key derivation function specified in FIPS 140-2 Annex D, Recommendation for Key Derivation Using Pseudorandom Functions, Section 5.1 (KDF in Counter Mode).

The key derivation function will be passed;

KI
The per-database key
Label
The string "couchdb-aes256-gcm-encryption-key"
Context
The document id and document revision concatenated together, separated by a single `0x00` octet
h
256
r
16

@eiri eiri force-pushed the prototype/fdb-encryption branch from 513114c to 865b560 Compare April 3, 2020 20:46
eiri added 23 commits April 7, 2020 02:33
When encryption provided registred in epi's provider chain
it intercepts the call chain preventing alternative plugins
from working. This changes it to be just a default module,
so it'll be used when no other encryption plugins registered.
Instead of serializing and encrypting doc body separately
this changes approach to encrypt the whole stored term prior chunkifying.

This is now possible because we not depend on a stored in term update counter
to derive a key anymore and saves a term/bin convertion per doc write/read.
- Set sensitive flags on workers
- Use longer timeout for all the calls
- Don't trap exit
- No superfluous init_st
- Properly drain waiters on the server termination
If getting KEK requires a database name, then depending on an encryption
provider it could be also necessary for unwrapping it too, so pass it to
unwrap function just in case.
@eiri eiri force-pushed the prototype/fdb-encryption branch from 865b560 to 086a4e7 Compare April 7, 2020 05:45
@eiri
Copy link
Member Author

eiri commented Apr 17, 2020

Closing in favour of prototype/fdb-layer...aegis

@eiri eiri closed this Apr 17, 2020
@eiri eiri deleted the prototype/fdb-encryption branch April 27, 2020 14:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants