core/identity: refactor identity manager #5091
Conversation
Thanks for taking this on, I think this is a good approach.
Just a few questions / suggestions from me... I think I almost understand the new scheduler logic but I'm not quite there yet.
var managerUser manager.User
managerUser.User, _ = user.Get(ctx, s.dataBrokerClient, sess.GetUserId())
if managerUser.User == nil {
u, _ := user.Get(ctx, s.dataBrokerClient, sess.GetUserId())
Thanks for simplifying this... not sure why I still had the manager.User here.
select {
case uuis.reset <- struct{}{}:
default:
}
This effectively de-duplicates Reset() calls, is that right? Since reset is a buffered channel, this should never "miss" a reset: sending to the reset channel never blocks unless there's already a pending item in the channel buffer?
(not requesting any changes, just want to make sure I understand how this works)
Added a comment. Yes, we're basically using the channel as a signaling mechanism. There are other ways to accomplish this, but using channels allows us to use the select statement.
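The pattern under discussion can be sketched in isolation. This is a minimal, self-contained illustration (the `scheduler` type here is hypothetical, not the actual PR code): a buffered channel of capacity 1 plus a non-blocking send means repeated `Reset()` calls coalesce into a single pending signal, and the caller never blocks.

```go
package main

import "fmt"

// scheduler is a hypothetical minimal sketch of the signaling pattern:
// reset is a channel with a buffer of 1, used purely as a flag.
type scheduler struct {
	reset chan struct{}
}

// Reset signals the scheduler loop to restart its wait. The
// non-blocking send means many calls collapse into one pending signal.
func (s *scheduler) Reset() {
	select {
	case s.reset <- struct{}{}: // record a pending reset
	default: // a reset is already pending; coalesce
	}
}

func main() {
	s := &scheduler{reset: make(chan struct{}, 1)}
	// Five Reset calls in a row: none block, and only one signal is queued.
	for i := 0; i < 5; i++ {
		s.Reset()
	}
	fmt.Println("pending signals:", len(s.reset))
}
```

Because the buffer holds at most one item, the consuming loop sees "at least one reset happened since I last checked" rather than a count, which is exactly what a timer reset needs.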
case <-uuis.reset:
	ticker.Reset(uuis.updateUserInfoInterval)
I think I'm still missing something: when do we need to reset the ticker?
We reset after the user is updated. This fixes an unnecessary update in the original scheduler: when a session is refreshed it also updates the user data, in which case it is better to reset the timer and wait another 10 minutes before updating again. The original scheduler would keep to its 10-minute interval regardless.
Thanks, got it.
Co-authored-by: Kenneth Jenkins <[email protected]>
Summary
Implement a new version of the identity manager. The legacy version can be enabled via a runtime flag, but the new version is the default.
The identity manager is responsible for two things: refreshing user sessions and updating user information. A typical Pomerium session lasts for 14 hours, whereas a typical IdP OIDC token is valid for 1 hour, so we refresh the user's IdP token so that they don't have to log in again. We also periodically (every 10 minutes) update the user's info from the IdP so that policy evaluation is always against relatively fresh data.
The identity manager receives session and user data through a Syncer connected to the Databroker. Every new session that gets created is pushed to the identity manager.
The legacy implementation maintained a sorted list of scheduled session refreshes and user updates. On each loop iteration it would retrieve the next ready session, refresh it, and move on to users once all soon-to-expire sessions had been refreshed. The problem with this design is that if refreshing sessions or updating user information takes a long time, it blocks the identity manager from doing anything else. In particular, if all sessions expire at roughly the same time and each refresh call takes several seconds, we could easily fail to refresh sessions in time, leading to session tokens expiring during authorization and users being logged out.
Rather than attempting to refresh multiple sessions or update multiple users concurrently, the new identity manager uses an entirely different approach.
For each new session we start an independent refreshSessionScheduler, and for each user an independent updateUserInfoScheduler. These schedulers run in their own goroutines and, when ready, invoke a method on the Manager. This removes the bottleneck, since all sessions can now refresh concurrently. It also makes the code easier to understand.

The downside of this approach is that we will be starting lots of goroutines: one for each session and user object. A goroutine uses roughly 2KB of RAM, so 10,000 sessions and 1,000 users would mean roughly 21MB of RAM. In addition to memory usage there is the added work put on the Go scheduler, which ought to be negligible. Based on these back-of-the-napkin calculations I don't think this needs further optimization, but we could pursue that in a subsequent PR if desired.
Related issues
User Explanation
The new identity manager should change no functionality.
Checklist