Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expiring cache #2331

Merged
merged 1 commit into from
Dec 11, 2019
Merged

Conversation

jaydoane
Copy link
Contributor

Overview

This implements an FDB backed key value cache, where each entry has a "stale" and "expires" time associated with it. Once the current time exceeds the "expires" time, the entry is automatically removed.

Potentially useful for implementing e.g. interfaces to external systems such as OAuth 2.

Testing recommendations

make check apps=couch_expiring_cache

Related Issues or Pull Requests

Checklist

@jaydoane jaydoane requested a review from nickva November 25, 2019 07:52
@nickva
Copy link
Contributor

nickva commented Nov 25, 2019

@jaydoane nice works, small and easy to understand!

Made a few comments in-line.

Was also wondering if you thought about building it more as a library, such that it would plug into to another application's supervisor with something like:

couch_expiring_cache:start_link(Name, Options) when is_atom(Name) ->
    gen_server:start_link({local, Name}, ...) -> ets:new(Name, [named_table, ...]).

Then we wouldn't need a way to disable or enable the whole application and it would be started only as needed. What do you think? If you'd rather keep it as is, that'll work just. Just thought to explore another option.

Copy link
Member

@garrensmith garrensmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work. Some minor feedback.


-spec lookup(Name :: binary(), Key :: binary()) ->
not_found | {fresh, Val :: binary()} | {stale, Val :: binary()} | expired.
lookup(Name, Key) ->
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea that @nickva mentions above about adding an optional transaction. I think it should also apply to the lookup.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

b939216 enables optional transactions for both insert and lookup operations.

]).


-define(XC, 53). % coordinate with fabric2.hrl
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit nervous that we have it here instead of fabric2.hrl, it could be easy for someone else to later use the namespace for something else. Also XC isn't a particluarly clean name. I have no idea what it means. Could you expand it a little.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it's not a great solution. I just copied this from couch_jobs, but maybe we should just make it a practice to directly modify fabric2.hrl as necessary?

Also, expanded XC to EXPIRING_CACHE in 7144ceb


-include_lib("couch/include/couch_db.hrl").
-include_lib("couch/include/couch_eunit.hrl").
-include_lib("eunit/include/eunit.hrl").
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any chance you could make this one of the new shiny exunit tests instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would love nothing more than to use elixir here, but the expiring cache has no HTTP interface, so unfortunately it would be extremely inconvenient :(

Copy link
Member

@garrensmith garrensmith Dec 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its possible to do that now https://github.com/apache/couchdb/tree/prototype/fdb-layer/test/elixir#how-to-write-exunit-tests

You don't need an http interface to use the exunit tests.

@jaydoane
Copy link
Contributor Author

jaydoane commented Dec 2, 2019

Was also wondering if you thought about building it more as a library, such that it would plug into to another application's supervisor something like:

couch_expiring_cache:start_link(Name, Options) when is_atom(Name) ->
    gen_server:start_link({local, Name}, ...) -> ets:new(Name, [named_table, ...]).

@nickva this is an interesting idea, and I hadn't considered adding the cache to an existing supervisor tree, as your snippet seems to imply. I'm guessing that Options could be a map containing optional overrides for period, jitter, and batch size. I think this would also imply a subtle change to the data model, where each Name gets its own gen_server on each node, and

(?EXPIRING_CACHE, ?EXP, ExpiresTS, Name, Key) := ()
would become
(?EXPIRING_CACHE, ?EXP, Name, ExpiresTS, Key) := ()

The main drawback I could think of with such an approach is that unexpired entries of abandoned/changed Names. would never be cleared without some explicit garbage collection mechanism. Alternately, it could use a system like couch_jobs, and explicitly manage names, but that adds complexity.

@nickva
Copy link
Contributor

nickva commented Dec 5, 2019

@jaydoane I was thinking it would be like having N expiring caches, one for each application that needs it. Each would have its own timeouts, etc. Presumably the expiring cache for OAuth2 vs some other auth mechanism could need different expiry and stale timeouts. The model would then look like:

(?EXPIRING_CACHE, Name, ?PK, Key) := (Val, StaleTS, ExpireTS)
(?EXPIRING_CACHE, Name, ?EXP, ExpireTS, Key) := ()

So there would be one for OAuth2 with something like Name = oauth2_xcache. If there another auth library application it would have Name = otherlib_xcache. Each application would then be in charge of starting its own expiry cache child in its supervisor, cleaning up stale entries, updating entries, etc.

@jaydoane
Copy link
Contributor Author

jaydoane commented Dec 5, 2019

(?EXPIRING_CACHE, Name, ?PK, Key) := (Val, StaleTS, ExpireTS)
(?EXPIRING_CACHE, Name, ?EXP, ExpireTS, Key) := ()

@nickva I think the main issue with this schema is that if application Name is retired, or changes its name while there are still unexpired cache entries, those entries will remain as garbage indefinitely in FDB without a way to get cleaned up. We could of course create a registry for cache names similar to how job types are registered, but that adds complexity. Not sure if there's a simpler way to do it, or whether it's even a real concern. I suppose we could just create a cleanup function that should be called (manually) when a Name is retired/changed?

EDIT: It occurs to me that it's possible to clear all keys under the (?EXPIRING_CACHE) prefix, which is sort of the "nuclear option" for key cleanup, but since this is just a cache, it seems like a relatively benign way to ensure any garbage from abandoned Names gets cleared. In short, I agree that overall this change is a substantial improvement, and an update is in the works!

timer_ref => schedule_remove_expired(Period, MaxJitter),
oldest_ts => 0,
elapsed => 0,
largest_elapsed => 0,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does largest_elapsed track? Is it a stat for debugging?

Copy link
Contributor Author

@jaydoane jaydoane Dec 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, since we don't currently have a way to collect metrics, it's just a simple way to see how long it's taking (in the worst case) to delete expired entries. If it gets close to 5 seconds, we might need to decrease the period, and/or lower the batch size.

-spec insert(Tx :: jtx(), Name :: binary(), Key :: binary(), Value :: binary(),
StaleTS :: millisecond(), ExpiresTS :: millisecond()) -> ok.
insert(Tx, Name, Key, Value, StaleTS, ExpiresTS) ->
couch_jobs_fdb:tx(couch_jobs_fdb:get_jtx(Tx), fun(JTx) ->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want, maybe add some guards on the types of arguments.

is_binary/1 for Name, Key and Value and is_integer/1 for StaleTs and ExpiresTS

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 601 to 611

[expiring_cache]
;
; How often to remove expired entries
; period_msec = 5000
;
; Jitter applied when removing expired entries
; max_jitter_msec = 1000
;
; Maximum number of entries removed in a single transaction
; batch_size = 1000
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it's a library, let's skip updating the default.ini changes. Each application can then have its own section

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Argh, I forgot to remove this when I converted to a library. Done.

@nickva
Copy link
Contributor

nickva commented Dec 11, 2019

@jaydoane Looks great. See a few minor notes.

To run tests I had to add it to the subdirs in rebar:

--- a/rebar.config.script
+++ b/rebar.config.script
@@ -92,6 +92,7 @@ SubDirs = [
     "src/fabric",
     "src/couch_jobs",
+    "src/couch_expiring_cache",

Otherwise tests ran and passed.

Created an expiring cache manually and saw that entries were being returned as fresh, stale, expired then cleaned up and returned as non_found.

 couch_expiring_cache:insert(<<"auth-decision">>, <<"a">>, <<"x">>, erlang:system_time(millisecond) + 6000, erlang:system_time(millisecond) + 10000), couch_expiring_cache:lookup(<<"auth-decision">>, <<"a">>).
{fresh,<<"x">>}
([email protected])79> couch_expiring_cache:lookup(<<"auth-decision">>, <<"a">>).                        {fresh,<<"x">>}                                                                                  couch_expiring_cache:lookup(<<"auth-decision">>, <<"a">>).
{fresh,<<"x">>}
([email protected])81> couch_expiring_cache:lookup(<<"auth-decision">>, <<"a">>).
{fresh,<<"x">>}
([email protected])89> couch_expiring_cache:lookup(<<"auth-decision">>, <<"a">>).
{stale,<<"x">>}
([email protected])90> couch_expiring_cache:lookup(<<"auth-decision">>, <<"a">>).
{stale,<<"x">>}
([email protected])91> couch_expiring_cache:lookup(<<"auth-decision">>, <<"a">>).
{stale,<<"x">>}
([email protected])92> couch_expiring_cache:lookup(<<"auth-decision">>, <<"a">>).
{stale,<<"x">>}
([email protected])93> couch_expiring_cache:lookup(<<"auth-decision">>, <<"a">>).
{stale,<<"x">>}
([email protected])94> couch_expiring_cache:lookup(<<"auth-decision">>, <<"a">>).
{stale,<<"x">>}
([email protected])103> couch_expiring_cache:lookup(<<"auth-decision">>, <<"a">>).
{stale,<<"x">>}
([email protected])104> couch_expiring_cache:lookup(<<"auth-decision">>, <<"a">>).
{stale,<<"x">>}
([email protected])105> couch_expiring_cache:lookup(<<"auth-decision">>, <<"a">>).
expired
([email protected])106> couch_expiring_cache:lookup(<<"auth-decision">>, <<"a">>).
expired
([email protected])107> couch_expiring_cache:lookup(<<"auth-decision">>, <<"a">>).
not_found
([email protected])109> couch_expiring_cache:lookup(<<"auth-decision">>, <<"a">>).
not_found

Nice work!

+1 (after the minor fixes from the comments)

This is a library for creating an FDB backed key value cache, where
each entry has a `stale` and `expires` time associated with it. Once
the current time exceeds the `expires` time, the entry is automatically
removed. The `stale` time can be used to indicate that a refresh is
necessary, while still returning a non-expired value. It is potentially
useful for implementing e.g. caches to external systems of record, such
as OAuth 2.
@jaydoane jaydoane merged commit b3899c4 into apache:prototype/fdb-layer Dec 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants