Allow warmers to trigger hooks #322

camilleryr · 2024-01-03T16:36:41Z

A) Sorry that this didnt start with an issue - feel free to close this and we can move any discussion over there, but I put together this PR as a POC to ensure that this solved our issue on our application side

B) The issue we were running into is that we wanted to use a Hook to react to when a Warmer ran and found that Warmers did not trigger hooks - this issue seems to be caused by the fact that warmers are initialized with a cache spec before the hooks are associated to their pids, so the notification filters these out thinking that they are not running.

This PR addresses that particular issue by having the warmer call put_many with just the cache name instead of the cache spec so that the current state will always be used to avoid any possibility of a race condition between the warmer running and the hooks starting and being added to the Overseer's cache state

Additionally, this PR rearranges the lifecycle of the warmer a bit to ensure that the first run will happen after any hooks have been started / attached so that notifications will not be missed

Please let me know if there are any questions / comments / changes you would like to see - and thank you for all of the work you do by maintaining this project

whitfin · 2024-01-03T23:20:43Z

Hey @camilleryr!

I've only glanced over this so far, I'll take a deeper look in future. Thank you for contributing! No worries about no corresponding issue, I don't really care; I just prefer that to save people doing unnecessary work.

My initial thoughts (again, with only a quick glance over):

This PR addresses that particular issue by having the warmer call put_many with just the cache name instead of the cache spec so that the current state will always be used to avoid any possibility of a race condition between the warmer running and the hooks starting and being added to the Overseer's cache state

Yep makes sense. I'm not entirely a fan of calling via the name, I'm wondering if we can do something like :provision that hooks have (but also I really hate that interface, so maybe not). Cache warmers are (in theory) called less frequently, so the overhead of using the name is very likely irrelevant in practice - even if the perfectionist in me would like to avoid it.

Additionally, this PR rearranges the lifecycle of the warmer a bit to ensure that the first run will happen after any hooks have been started / attached so that notifications will not be missed

I'm not sure I'm a fan of this, it feels a bit awkward and unnatural to me. I understand there are cases where you might want this, but you could also manually implement this waiting without having to change it for everyone (and thus add the maintenance burden). I guess I'm curious what your hook is doing; it feels like you implicitly know a warmer is running on startup so I'm curious why you'd have to be notified about it?

Please let me know if there are any questions / comments / changes you would like to see

Although I understand why you did it, I'm not a huge fan of create_hook in the test code - I'd rather have the modules be explicitly defined in the same test file rather than hiding them away. Debugging some of that common stuff is already a headache (particularly because it's async), so I prefer to have it all compact. I know this is just my personal preference though, so I hope you don't take this as a negative!

camilleryr · 2024-01-04T00:15:05Z

This PR addresses that particular issue by having the warmer call put_many with just the cache name instead of the cache spec so that the current state will always be used to avoid any possibility of a race condition between the warmer running and the hooks starting and being added to the Overseer's cache state

Yep makes sense. I'm not entirely a fan of calling via the name, I'm wondering if we can do something like :provision that hooks have (but also I really hate that interface, so maybe not). Cache warmers are (in theory) called less frequently, so the overhead of using the name is very likely irrelevant in practice - even if the perfectionist in me would like to avoid it.

I agree that this is a suboptimal solution, but I also though about the performance impact and had the same though about how frequently these are used! Im open to other approaches, but this seemed like the path of least resistance to ensuring that we would never call put_many with a stale state from the warmer

Additionally, this PR rearranges the lifecycle of the warmer a bit to ensure that the first run will happen after any hooks have been started / attached so that notifications will not be missed

I'm not sure I'm a fan of this, it feels a bit awkward and unnatural to me. I understand there are cases where you might want this, but you could also manually implement this waiting without having to change it for everyone (and thus add the maintenance burden). I guess I'm curious what your hook is doing; it feels like you implicitly know a warmer is running on startup so I'm curious why you'd have to be notified about it?

I am generally sensitive to the potential for race conditions - so when approaching how to allow warmers to trigger hooks I wanted to guarantee that the fix would work as expected every time. Without some coordination between the overseer and incubator there would be a potential for the warmer to run and have the put_many call happen before the Overseer.update call that occurs in Cachex.start_link which would result in the notification being missed. That being said - I don't think there is any tangible change for an existing user - the hooks and warmers still function the same way, but are just orchestrated in a way that enforces ordering. The additional waiting time would be only at startup and would be (I assume) negligible. The maintenance cost is definitely negotiable though!

As far as what we are trying to achieve - We have a work load that would rely heavily on this cached data, and ideally we would not start the workload until the warmer has ran. We could utilize the async: false attribute for the warmer for this, but that would disrupt the start up of our application and is generally not an approach that fits into our design. My initial though was that we could send a message from the warmer to kick off our work, but since we would be sending a message before the call to put_many has completed, we could end up starting our work before the cache is ready. These changes are designed to allow us to have a hook listening for the put_many notification to then send a message to our workload orchestrator that we are ready to start. (Once again, sensitive to the potential for race conditions)

Please let me know if there are any questions / comments / changes you would like to see

Although I understand why you did it, I'm not a huge fan of create_hook in the test code - I'd rather have the modules be explicitly defined in the same test file rather than hiding them away. Debugging some of that common stuff is already a headache (particularly because it's async), so I prefer to have it all compact. I know this is just my personal preference though, so I hope you don't take this as a negative!

Lol - I thought the same thing, but went with the helper to follow what I thought of as a convention here - I can definitely pull that out and replace it with a explicitly defined module

I understand if you dont think this warrants inclusion in cachex proper, but thought I would propose some changes since these pieces did not compose in the way I anticipated!

Thanks for the review and I am looking forward to more of your thoughts.

whitfin · 2024-01-04T00:44:39Z

Most of this makes sense to me, I'll give it another look over and see if anything else comes up. It seems like the required changes basically boil down to this (correct me if I'm wrong, of course):

Hooks should be available prior to Warmers.
- I think I consider it a bug that it doesn't already work this way?
- It feels like any change here is a definite improvement.
Warmers should use the latest version of cache state.
- I definitely consider this one a bug!
- There's no good reason to not use the name
- The overhead is so small, it will only ever cause negative side effects (like this PR) by using the state

Then there's the third change regarding startup, but I'm not exactly sure what I think about this. Thinking out loud on this a bit more (disclaimer: kinda sick, and it's 1am, so feel free to tell me if I'm being stupid):

I see the dilemma with the current call to Overseer.update/2 being at the end of the chain so anything prior could race, but I'd rather do something to fix this cause rather than just sync the hooks/warmers stuff up specifically.

Technically Overseer.update/2 is sequential, so maybe we could have hooks self-register on startup instead - in theory then as long as they're started before the Warmers, it should all be up to date (right?). There is definitely a hit at cache startup here, but it's probably a lot more robust than linking it all after the fact. Note to self to check potential blocking cycles here, but I think it's safe.

I haven't really looked at how this would appear in the codebase, but rather than Informant.link/1 after the main tree is started, we'd just refresh from the Overseer before passing back. It sounds like it would look a lot cleaner, but it might not be in practice. Looking back at it, it's definitely weird l that there's some bookkeeping to be done after calling Supervisor.start_link/3 - it'd be nice if we could somehow address that here. Feel free to let me know your thoughts!

I've marked this for v3.7.0 which should be in the next couple of weeks (waiting for some follow up on a couple of other threads just to tie those up). Hopefully this timeline works for you? We can definitely improve a lot of this even if we don't fully merge everything as-is right now.

camilleryr · 2024-01-04T14:17:51Z

Then there's the third change regarding startup, but I'm not exactly sure what I think about this.

I think you are correct that its better to address the cause of the issue rather than to code around it! I have reimplemented this PR to try and have the Informant register the Hooks as they start up. To do this, I added a wrapper around the GenServer.start_link call that will capture the returned pid and call Overseer.update

This would not have been my preferred approach, as I think I would have put this in init rather than start_link - but that would require a bit of structural changes to facilitate, since the current init function is overridable, so we would need to create a GenHook type behaviour that delegates to the individual Hook implementation - its doable, but would require more code and I dont think there are really any down sides to where this is currently placed - let me know if you think there is a more suitable place to patch this in

Additionally - I pulled out that create_hook helper and implemented the module in the test suite - then found that there was a preexisting helper that not only created a hook - but created exactly the hook I needed, so I ended up reusing that - while this might use an abstraction that you eventually want to do away with - at least its a pre existing one!

Hope you feel better soon - and as always, let me know your thoughts on this and thank you for your time.

camilleryr · 2024-03-21T02:05:43Z

Wanted to check in and see if you had any additional thoughts on this work

whitfin · 2024-03-21T02:48:52Z

Hi @camilleryr! I was actually planning on looking at this today, what strange timing.

I can run through this now with my thoughts, then you can adapt as you see fit - then I'll probably get this merged in this weekend (maybe with some further changes from my side if necessary).

I'll leave a quick review, and if you could resolve the conflicts (which don't seem to make sense to me...) it'd be great! I should note I'm also happy to pick this up if your time is limited, just let me know!

whitfin

I left a couple of comments on the initial changes.

That being said - in general I don't really like the new flow of hooks starting first before the main cache table.

I think I would prefer doing something where the warmers aren't automatically fired on startup, and are instead fired after the start_link call on the line below here.

I think this makes this a lot easier, not much has to change beyond what we have - it's just some changes to warmers and then a tweak here to add an initial warming (which can be done via Cachex.warm/2 that I added recently).

Do you have any thoughts on this approach instead?

whitfin · 2024-03-21T02:50:36Z

lib/cachex/warmer.ex

@@ -99,11 +102,11 @@ defmodule Cachex.Warmer do

 # set pairs without options
 {:ok, pairs} ->
- Cachex.put_many(cache, pairs)
+ Cachex.put_many(name, pairs)


This should stay as state when possible; using the name has overhead that the struct does not.

This change was included because there was a race condition caused by the use of the cache here - we had perviously discussed that the performance impacts of this change were mostly likely irrelevant due the frequency of warmers - happy to try and address this in some other way if you have thoughs

Warmers should use the latest version of cache state.

I definitely consider this one a bug!

There's no good reason to not use the name

The overhead is so small, it will only ever cause negative side effects (like this PR) by using the state

Yeah, totally spaced on this one - the bug also sort of exists on L117, to a lesser extent. I think the best approach here is to wrap everything in the case body in a Cachex.execute(name) block, so that we have a copy of the latest state regardless.

That also means that we can purely store the warmer's state and the timer ref as the server state, with just the cache name:

{ name, state, timer }

Note to self that this kinda sucks because provisions don't work with warmers, only hooks. I don't think it should be changed in this PR, but I have filed #332 to follow up on to clean this type of thing up a bit more.

lib/cachex/services/overseer.ex

lib/cachex/actions/warm.ex

camilleryr · 2024-03-21T13:10:58Z

lib/cachex/warmer.ex

+ def warm(pid, opts) do
+ if opts[:respect_async_warmers] do
+ GenServer.call(pid, :cachex_warmer, :infinity)
+ else
+ send(pid, :cachex_warmer)
+ end
+ end


We need to be able to block for async: false workers, so I added this api - if you would prefer we could move this if/else block to the Actions.Warm module to avoid the additional function here.

camilleryr · 2024-03-21T13:12:35Z

lib/cachex/warmer.ex

+ _from,
+ {cache, warmer(async: async) = warmer, timer_ref}
+ ) do
+ if timer_ref, do: Process.cancel_timer(timer_ref)


This is an unrelated fix - but currently every time that Cachex.warm is called we would stack timers causing additional runs of the warmer, by tracking and potentially canceling a timer, we should just reset the interval whenever Cachex.warm

Good catch on this one, I'm going to revisit this after merge - I can clean up the Cachex.warm/2 implementation a bit more.

camilleryr · 2024-03-21T13:14:05Z

I refactored this, hopefully its closer to what you were describing. With the exception of the updates to the stats tests, I think that this is the cleanest implementation yet - thanks for the guidance, and let me know what additional changes you would like to see.

whitfin · 2024-03-21T17:01:07Z

lib/cachex/warmer.ex

+ _from,
+ {cache, warmer(async: async) = warmer, timer_ref}
+ ) do
+ if timer_ref, do: Process.cancel_timer(timer_ref)


Good catch on this one, I'm going to revisit this after merge - I can clean up the Cachex.warm/2 implementation a bit more.

whitfin · 2024-03-21T17:04:55Z

lib/cachex/warmer.ex

@@ -99,11 +102,11 @@ defmodule Cachex.Warmer do

 # set pairs without options
 {:ok, pairs} ->
- Cachex.put_many(cache, pairs)
+ Cachex.put_many(name, pairs)


Yeah, totally spaced on this one - the bug also sort of exists on L117, to a lesser extent. I think the best approach here is to wrap everything in the case body in a Cachex.execute(name) block, so that we have a copy of the latest state regardless.

That also means that we can purely store the warmer's state and the timer ref as the server state, with just the cache name:

{ name, state, timer }

lib/cachex/warmer.ex

whitfin · 2024-03-21T17:08:06Z

lib/cachex.ex

@@ -302,6 +302,7 @@ defmodule Cachex do
 {:ok, pid} = Supervisor.start_link(__MODULE__, cache, name: name),
 {:ok, link} = Informant.link(cache),
 ^link <- Overseer.update(name, link),
+ _ = warm(cache, respect_async_warmers: true),


First instinct is already that this looks a lot better. Minor nitpick but would prefer Cachex.warm rather than warm to make it more obvious we're re-using the public API.

lib/cachex/actions/warm.ex

whitfin · 2024-03-21T17:17:06Z

lib/cachex/warmer.ex

@@ -99,11 +102,11 @@ defmodule Cachex.Warmer do

 # set pairs without options
 {:ok, pairs} ->
- Cachex.put_many(cache, pairs)
+ Cachex.put_many(name, pairs)


Note to self that this kinda sucks because provisions don't work with warmers, only hooks. I don't think it should be changed in this PR, but I have filed #332 to follow up on to clean this type of thing up a bit more.

test/cachex/actions/stats_test.exs

lib/cachex/actions/warm.ex

whitfin · 2024-03-22T22:37:13Z

Looks good, thanks @camilleryr!

whitfin · 2024-03-23T00:09:06Z

This will be shipped in the next release, which is looking like it will be v4.0.0. Your work here has raised a lot of ugliness with the existing warmer implementation, so I'm going to take this opportunity to refactor all of it before release - so thank you once again for your time and patience with all of this!

whitfin self-requested a review January 3, 2024 23:20

whitfin added this to the v3.7.0 milestone Jan 3, 2024

whitfin added the enhancement label Jan 3, 2024

whitfin self-assigned this Jan 3, 2024

whitfin requested review from whitfin and removed request for whitfin January 3, 2024 23:21

camilleryr force-pushed the main branch 2 times, most recently from e4e778a to 8dfa226 Compare January 4, 2024 14:07

camilleryr force-pushed the main branch from 8dfa226 to 741a144 Compare January 4, 2024 14:31

whitfin added the bug label Jan 4, 2024

whitfin reviewed Mar 21, 2024

View reviewed changes

camilleryr force-pushed the main branch 2 times, most recently from 2291ea2 to 8af1324 Compare March 21, 2024 13:06

camilleryr commented Mar 21, 2024

View reviewed changes

lib/cachex/actions/warm.ex Outdated Show resolved Hide resolved

camilleryr commented Mar 21, 2024

View reviewed changes

whitfin reviewed Mar 21, 2024

View reviewed changes

camilleryr force-pushed the main branch 3 times, most recently from e41692e to a0d3717 Compare March 22, 2024 11:50

Refactor warmers behavior on startup

5aaef29

camilleryr force-pushed the main branch from a0d3717 to 5aaef29 Compare March 22, 2024 12:19

whitfin merged commit 5754f3e into whitfin:main Mar 22, 2024
15 checks passed

whitfin modified the milestones: v3.7.0, v4.0.0 Mar 23, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow warmers to trigger hooks #322

Allow warmers to trigger hooks #322

camilleryr commented Jan 3, 2024 •

edited

Loading

whitfin commented Jan 3, 2024

camilleryr commented Jan 4, 2024

whitfin commented Jan 4, 2024

camilleryr commented Jan 4, 2024

camilleryr commented Mar 21, 2024

whitfin commented Mar 21, 2024 •

edited

Loading

whitfin left a comment

whitfin Mar 21, 2024

camilleryr Mar 21, 2024

whitfin Mar 21, 2024

whitfin Mar 21, 2024

camilleryr Mar 21, 2024

camilleryr Mar 21, 2024

whitfin Mar 21, 2024

camilleryr commented Mar 21, 2024

whitfin Mar 21, 2024

whitfin Mar 21, 2024

whitfin Mar 21, 2024

whitfin Mar 21, 2024

whitfin commented Mar 22, 2024 •

edited

Loading

whitfin commented Mar 23, 2024 •

edited

Loading

Allow warmers to trigger hooks #322

Allow warmers to trigger hooks #322

Conversation

camilleryr commented Jan 3, 2024 • edited Loading

whitfin commented Jan 3, 2024

camilleryr commented Jan 4, 2024

whitfin commented Jan 4, 2024

camilleryr commented Jan 4, 2024

camilleryr commented Mar 21, 2024

whitfin commented Mar 21, 2024 • edited Loading

whitfin left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

camilleryr commented Mar 21, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

whitfin commented Mar 22, 2024 • edited Loading

whitfin commented Mar 23, 2024 • edited Loading

camilleryr commented Jan 3, 2024 •

edited

Loading

whitfin commented Mar 21, 2024 •

edited

Loading

whitfin commented Mar 22, 2024 •

edited

Loading

whitfin commented Mar 23, 2024 •

edited

Loading