The following use cases should be kept in mind for the refactor:
- It should actually not be a pile of spaghetti.
- It should be possible to implement modules that recognize that one LM query may be usable to answer multiple requests: for example, when the same query is repeated, or when two loglikelihood queries share the same context but have different single-token continuations (there are a load of these; anything that continues with " yes" and " no" fits the bill). While it would be a violation of the abstraction to actually look at how many tokens the continuation is, it still makes sense to aggregate requests with the same context and let the LM know somehow that these are potentially optimizable; even if it's not a single-token continuation, it might still be possible to cache the context or something. (This will require moderate changes to the LM interface.) Tentatively, I'm thinking of a flag the evaluator can pass along per-request to say "hey, this should be cached for x uses" (the evaluator counts how many times it expects the context to be reused), so only the things that actually get reused get cached, and they get evicted when no longer necessary. There is a chance that a new LM implementation might not handle this in exactly the same way and get out of sync on the count, but that would just introduce inefficiency rather than break anything. Any better proposals are welcome. A minimal sketch of the idea follows this list.
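
A minimal Python sketch of the reuse-count idea, purely to make the proposal concrete: `reuse_count`, `encode_context`, and `loglikelihood_from_state` are all hypothetical names, nothing here exists in the current interface.

```python
from collections import Counter

def annotate_reuse(requests):
    """Attach a reuse count to each (context, continuation) request so
    the LM knows which contexts are worth caching."""
    counts = Counter(ctx for ctx, _ in requests)
    return [(ctx, cont, counts[ctx]) for ctx, cont in requests]

class CachingLM:
    """Hypothetical wrapper: caches an encoded context for exactly the
    number of expected reuses and evicts it once the count runs out."""

    def __init__(self, lm):
        self.lm = lm
        self._cache = {}  # context -> (encoded_state, remaining_uses)

    def loglikelihood(self, context, continuation, reuse_count=1):
        # reuse_count is the evaluator's estimate of how many requests
        # share this context; 1 means "don't bother caching".
        if context in self._cache:
            state, remaining = self._cache[context]
        else:
            # encode_context is an assumed hook on the underlying LM.
            state = self.lm.encode_context(context)
            remaining = reuse_count
        remaining -= 1
        if remaining > 0:
            self._cache[context] = (state, remaining)
        else:
            # Evict once the expected reuses are exhausted. If an LM
            # implementation miscounts, the entry is simply re-encoded
            # later: inefficient, but nothing breaks.
            self._cache.pop(context, None)
        # loglikelihood_from_state is likewise an assumed hook.
        return self.lm.loglikelihood_from_state(state, continuation)
```

The evaluator would call something like `annotate_reuse` over the full request list up front, then dispatch each request with its count; eviction is driven entirely by the count, so a miscounting implementation only pays a re-encoding cost.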