Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow warmers to trigger hooks #322

Merged
merged 1 commit into from
Mar 22, 2024
Merged

Allow warmers to trigger hooks #322

merged 1 commit into from
Mar 22, 2024

Conversation

camilleryr
Copy link
Contributor

@camilleryr camilleryr commented Jan 3, 2024

A) Sorry that this didnt start with an issue - feel free to close this and we can move any discussion over there, but I put together this PR as a POC to ensure that this solved our issue on our application side

B) The issue we were running into is that we wanted to use a Hook to react to when a Warmer ran and found that Warmers did not trigger hooks - this issue seems to be caused by the fact that warmers are initialized with a cache spec before the hooks are associated to their pids, so the notification filters these out thinking that they are not running.

This PR addresses that particular issue by having the warmer call put_many with just the cache name instead of the cache spec so that the current state will always be used to avoid any possibility of a race condition between the warmer running and the hooks starting and being added to the Overseer's cache state

Additionally, this PR rearranges the lifecycle of the warmer a bit to ensure that the first run will happen after any hooks have been started / attached so that notifications will not be missed

Please let me know if there are any questions / comments / changes you would like to see - and thank you for all of the work you do by maintaining this project

@whitfin
Copy link
Owner

whitfin commented Jan 3, 2024

Hey @camilleryr!

I've only glanced over this so far, I'll take a deeper look in future. Thank you for contributing! No worries about no corresponding issue, I don't really care; I just prefer that to save people doing unnecessary work.

My initial thoughts (again, with only a quick glance over):

This PR addresses that particular issue by having the warmer call put_many with just the cache name instead of the cache spec so that the current state will always be used to avoid any possibility of a race condition between the warmer running and the hooks starting and being added to the Overseer's cache state

Yep makes sense. I'm not entirely a fan of calling via the name, I'm wondering if we can do something like :provision that hooks have (but also I really hate that interface, so maybe not). Cache warmers are (in theory) called less frequently, so the overhead of using the name is very likely irrelevant in practice - even if the perfectionist in me would like to avoid it.

Additionally, this PR rearranges the lifecycle of the warmer a bit to ensure that the first run will happen after any hooks have been started / attached so that notifications will not be missed

I'm not sure I'm a fan of this, it feels a bit awkward and unnatural to me. I understand there are cases where you might want this, but you could also manually implement this waiting without having to change it for everyone (and thus add the maintenance burden). I guess I'm curious what your hook is doing; it feels like you implicitly know a warmer is running on startup so I'm curious why you'd have to be notified about it?

Please let me know if there are any questions / comments / changes you would like to see

Although I understand why you did it, I'm not a huge fan of create_hook in the test code - I'd rather have the modules be explicitly defined in the same test file rather than hiding them away. Debugging some of that common stuff is already a headache (particularly because it's async), so I prefer to have it all compact. I know this is just my personal preference though, so I hope you don't take this as a negative!

@whitfin whitfin self-requested a review January 3, 2024 23:20
@whitfin whitfin added this to the v3.7.0 milestone Jan 3, 2024
@whitfin whitfin self-assigned this Jan 3, 2024
@whitfin whitfin requested review from whitfin and removed request for whitfin January 3, 2024 23:21
@camilleryr
Copy link
Contributor Author

This PR addresses that particular issue by having the warmer call put_many with just the cache name instead of the cache spec so that the current state will always be used to avoid any possibility of a race condition between the warmer running and the hooks starting and being added to the Overseer's cache state

Yep makes sense. I'm not entirely a fan of calling via the name, I'm wondering if we can do something like :provision that hooks have (but also I really hate that interface, so maybe not). Cache warmers are (in theory) called less frequently, so the overhead of using the name is very likely irrelevant in practice - even if the perfectionist in me would like to avoid it.

I agree that this is a suboptimal solution, but I also though about the performance impact and had the same though about how frequently these are used! Im open to other approaches, but this seemed like the path of least resistance to ensuring that we would never call put_many with a stale state from the warmer

Additionally, this PR rearranges the lifecycle of the warmer a bit to ensure that the first run will happen after any hooks have been started / attached so that notifications will not be missed

I'm not sure I'm a fan of this, it feels a bit awkward and unnatural to me. I understand there are cases where you might want this, but you could also manually implement this waiting without having to change it for everyone (and thus add the maintenance burden). I guess I'm curious what your hook is doing; it feels like you implicitly know a warmer is running on startup so I'm curious why you'd have to be notified about it?

I am generally sensitive to the potential for race conditions - so when approaching how to allow warmers to trigger hooks I wanted to guarantee that the fix would work as expected every time. Without some coordination between the overseer and incubator there would be a potential for the warmer to run and have the put_many call happen before the Overseer.update call that occurs in Cachex.start_link which would result in the notification being missed. That being said - I don't think there is any tangible change for an existing user - the hooks and warmers still function the same way, but are just orchestrated in a way that enforces ordering. The additional waiting time would be only at startup and would be (I assume) negligible. The maintenance cost is definitely negotiable though!

As far as what we are trying to achieve - We have a work load that would rely heavily on this cached data, and ideally we would not start the workload until the warmer has ran. We could utilize the async: false attribute for the warmer for this, but that would disrupt the start up of our application and is generally not an approach that fits into our design. My initial though was that we could send a message from the warmer to kick off our work, but since we would be sending a message before the call to put_many has completed, we could end up starting our work before the cache is ready. These changes are designed to allow us to have a hook listening for the put_many notification to then send a message to our workload orchestrator that we are ready to start. (Once again, sensitive to the potential for race conditions)

Please let me know if there are any questions / comments / changes you would like to see

Although I understand why you did it, I'm not a huge fan of create_hook in the test code - I'd rather have the modules be explicitly defined in the same test file rather than hiding them away. Debugging some of that common stuff is already a headache (particularly because it's async), so I prefer to have it all compact. I know this is just my personal preference though, so I hope you don't take this as a negative!

Lol - I thought the same thing, but went with the helper to follow what I thought of as a convention here - I can definitely pull that out and replace it with a explicitly defined module

I understand if you dont think this warrants inclusion in cachex proper, but thought I would propose some changes since these pieces did not compose in the way I anticipated!

Thanks for the review and I am looking forward to more of your thoughts.

@whitfin
Copy link
Owner

whitfin commented Jan 4, 2024

Most of this makes sense to me, I'll give it another look over and see if anything else comes up. It seems like the required changes basically boil down to this (correct me if I'm wrong, of course):

  1. Hooks should be available prior to Warmers.
    • I think I consider it a bug that it doesn't already work this way?
    • It feels like any change here is a definite improvement.
  2. Warmers should use the latest version of cache state.
    • I definitely consider this one a bug!
    • There's no good reason to not use the name
    • The overhead is so small, it will only ever cause negative side effects (like this PR) by using the state

Then there's the third change regarding startup, but I'm not exactly sure what I think about this. Thinking out loud on this a bit more (disclaimer: kinda sick, and it's 1am, so feel free to tell me if I'm being stupid):

I see the dilemma with the current call to Overseer.update/2 being at the end of the chain so anything prior could race, but I'd rather do something to fix this cause rather than just sync the hooks/warmers stuff up specifically.

Technically Overseer.update/2 is sequential, so maybe we could have hooks self-register on startup instead - in theory then as long as they're started before the Warmers, it should all be up to date (right?). There is definitely a hit at cache startup here, but it's probably a lot more robust than linking it all after the fact. Note to self to check potential blocking cycles here, but I think it's safe.

I haven't really looked at how this would appear in the codebase, but rather than Informant.link/1 after the main tree is started, we'd just refresh from the Overseer before passing back. It sounds like it would look a lot cleaner, but it might not be in practice. Looking back at it, it's definitely weird l that there's some bookkeeping to be done after calling Supervisor.start_link/3 - it'd be nice if we could somehow address that here. Feel free to let me know your thoughts!

I've marked this for v3.7.0 which should be in the next couple of weeks (waiting for some follow up on a couple of other threads just to tie those up). Hopefully this timeline works for you? We can definitely improve a lot of this even if we don't fully merge everything as-is right now.

@camilleryr
Copy link
Contributor Author

Then there's the third change regarding startup, but I'm not exactly sure what I think about this.

I think you are correct that its better to address the cause of the issue rather than to code around it! I have reimplemented this PR to try and have the Informant register the Hooks as they start up. To do this, I added a wrapper around the GenServer.start_link call that will capture the returned pid and call Overseer.update

This would not have been my preferred approach, as I think I would have put this in init rather than start_link - but that would require a bit of structural changes to facilitate, since the current init function is overridable, so we would need to create a GenHook type behaviour that delegates to the individual Hook implementation - its doable, but would require more code and I dont think there are really any down sides to where this is currently placed - let me know if you think there is a more suitable place to patch this in

Additionally - I pulled out that create_hook helper and implemented the module in the test suite - then found that there was a preexisting helper that not only created a hook - but created exactly the hook I needed, so I ended up reusing that - while this might use an abstraction that you eventually want to do away with - at least its a pre existing one!

Hope you feel better soon - and as always, let me know your thoughts on this and thank you for your time.

@camilleryr
Copy link
Contributor Author

Wanted to check in and see if you had any additional thoughts on this work

@whitfin
Copy link
Owner

whitfin commented Mar 21, 2024

Hi @camilleryr! I was actually planning on looking at this today, what strange timing.

I can run through this now with my thoughts, then you can adapt as you see fit - then I'll probably get this merged in this weekend (maybe with some further changes from my side if necessary).

I'll leave a quick review, and if you could resolve the conflicts (which don't seem to make sense to me...) it'd be great! I should note I'm also happy to pick this up if your time is limited, just let me know!

Copy link
Owner

@whitfin whitfin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a couple of comments on the initial changes.

That being said - in general I don't really like the new flow of hooks starting first before the main cache table.

I think I would prefer doing something where the warmers aren't automatically fired on startup, and are instead fired after the start_link call on the line below here.

I think this makes this a lot easier, not much has to change beyond what we have - it's just some changes to warmers and then a tweak here to add an initial warming (which can be done via Cachex.warm/2 that I added recently).

Do you have any thoughts on this approach instead?

@@ -99,11 +102,11 @@ defmodule Cachex.Warmer do

# set pairs without options
{:ok, pairs} ->
Cachex.put_many(cache, pairs)
Cachex.put_many(name, pairs)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should stay as state when possible; using the name has overhead that the struct does not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change was included because there was a race condition caused by the use of the cache here - we had perviously discussed that the performance impacts of this change were mostly likely irrelevant due the frequency of warmers - happy to try and address this in some other way if you have thoughs

  1. Warmers should use the latest version of cache state.

    • I definitely consider this one a bug!
    • There's no good reason to not use the name
    • The overhead is so small, it will only ever cause negative side effects (like this PR) by using the state

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, totally spaced on this one - the bug also sort of exists on L117, to a lesser extent. I think the best approach here is to wrap everything in the case body in a Cachex.execute(name) block, so that we have a copy of the latest state regardless.

That also means that we can purely store the warmer's state and the timer ref as the server state, with just the cache name:

{ name, state, timer }

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self that this kinda sucks because provisions don't work with warmers, only hooks. I don't think it should be changed in this PR, but I have filed #332 to follow up on to clean this type of thing up a bit more.

lib/cachex/services/overseer.ex Outdated Show resolved Hide resolved
Comment on lines 70 to 76
def warm(pid, opts) do
if opts[:respect_async_warmers] do
GenServer.call(pid, :cachex_warmer, :infinity)
else
send(pid, :cachex_warmer)
end
end
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to be able to block for async: false workers, so I added this api - if you would prefer we could move this if/else block to the Actions.Warm module to avoid the additional function here.

_from,
{cache, warmer(async: async) = warmer, timer_ref}
) do
if timer_ref, do: Process.cancel_timer(timer_ref)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an unrelated fix - but currently every time that Cachex.warm is called we would stack timers causing additional runs of the warmer, by tracking and potentially canceling a timer, we should just reset the interval whenever Cachex.warm

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch on this one, I'm going to revisit this after merge - I can clean up the Cachex.warm/2 implementation a bit more.

@camilleryr
Copy link
Contributor Author

I refactored this, hopefully its closer to what you were describing. With the exception of the updates to the stats tests, I think that this is the cleanest implementation yet - thanks for the guidance, and let me know what additional changes you would like to see.

_from,
{cache, warmer(async: async) = warmer, timer_ref}
) do
if timer_ref, do: Process.cancel_timer(timer_ref)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch on this one, I'm going to revisit this after merge - I can clean up the Cachex.warm/2 implementation a bit more.

@@ -99,11 +102,11 @@ defmodule Cachex.Warmer do

# set pairs without options
{:ok, pairs} ->
Cachex.put_many(cache, pairs)
Cachex.put_many(name, pairs)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, totally spaced on this one - the bug also sort of exists on L117, to a lesser extent. I think the best approach here is to wrap everything in the case body in a Cachex.execute(name) block, so that we have a copy of the latest state regardless.

That also means that we can purely store the warmer's state and the timer ref as the server state, with just the cache name:

{ name, state, timer }

lib/cachex/warmer.ex Outdated Show resolved Hide resolved
lib/cachex.ex Outdated
@@ -302,6 +302,7 @@ defmodule Cachex do
{:ok, pid} = Supervisor.start_link(__MODULE__, cache, name: name),
{:ok, link} = Informant.link(cache),
^link <- Overseer.update(name, link),
_ = warm(cache, respect_async_warmers: true),
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First instinct is already that this looks a lot better. Minor nitpick but would prefer Cachex.warm rather than warm to make it more obvious we're re-using the public API.

lib/cachex/actions/warm.ex Outdated Show resolved Hide resolved
@@ -99,11 +102,11 @@ defmodule Cachex.Warmer do

# set pairs without options
{:ok, pairs} ->
Cachex.put_many(cache, pairs)
Cachex.put_many(name, pairs)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self that this kinda sucks because provisions don't work with warmers, only hooks. I don't think it should be changed in this PR, but I have filed #332 to follow up on to clean this type of thing up a bit more.

test/cachex/actions/stats_test.exs Outdated Show resolved Hide resolved
lib/cachex/actions/warm.ex Outdated Show resolved Hide resolved
@whitfin
Copy link
Owner

whitfin commented Mar 22, 2024

Looks good, thanks @camilleryr!

@whitfin whitfin merged commit 5754f3e into whitfin:main Mar 22, 2024
15 checks passed
@whitfin
Copy link
Owner

whitfin commented Mar 23, 2024

This will be shipped in the next release, which is looking like it will be v4.0.0. Your work here has raised a lot of ugliness with the existing warmer implementation, so I'm going to take this opportunity to refactor all of it before release - so thank you once again for your time and patience with all of this!

@whitfin whitfin modified the milestones: v3.7.0, v4.0.0 Mar 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants