Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core/identity: refactor identity manager #5091

Merged
merged 20 commits into from
May 2, 2024

Conversation

calebdoxsey
Copy link
Contributor

Summary

Implement a new version of the identity manager. The legacy version can be enabled via a runtime flag, but the new version is the default.

The identity manager is responsible for two things: refreshing user sessions and updating user information. A typical Pomerium session lasts for 14 hours, whereas a typical IdP OIDC token is valid for 1 hour. So we refresh the user's IdP token so that they don't have to login again. We also periodically (every 10 minutes) update the user's info from the IdP so that policy evaluation is always against relatively fresh data.

The identity manager receives session and user data from a Syncer to the Databroker. Every new session that gets created is pushed to the identity manager.

The legacy implementation maintained a sorted list of scheduled session refreshes and user updates. Each loop iteration we would retrieve the next ready session, refresh it, and then move on to users once all soon-to-expire sessions have been refreshed. The problem with this design is that if refreshing sessions or updating user information takes a long time, it blocks the identity manager from doing anything else. In particular if a session has an hour to be refreshed and all the sessions roughly expire at the same time, and calls to refresh a session take several seconds, we could very easily fail to schedule in time, leading to the session token expiring during Authorization and users being logged out.

Rather than attempting to refresh multiple sessions or update multiple users concurrently, the new identity manager uses an entirely different approach.

For each new session we start an independent refreshSessionScheduler and for each user we start an independent updateUserInfoScheduler. These schedulers run in their own goroutines and when ready invoke a method on the Manager. This removes the bottleneck as all sessions can now refresh concurrently. It also makes the code easier to understand.

The downside of this approach is that we will be starting lots of goroutines. A goroutine for each session and user object. A goroutine uses roughly 2KB of RAM, so for 10,000 sessions and 1,000 users this would mean roughly 21MB of RAM. In addition to memory usage there's the added work put on the Go scheduler, which ought to be negligible. Based on these back-of-the-napkin calculations I don't think this needs further optimization, but we could pursue that in a subsequent PR if that's desired.

Related issues

User Explanation

The new identity manager should change no functionality.

Checklist

  • reference any related issues
  • updated docs
  • updated unit tests
  • updated UPGRADING.md
  • add appropriate tag (improvement / bug / etc)
  • ready for review

@calebdoxsey calebdoxsey added enhancement New feature or request performance labels Apr 26, 2024
@calebdoxsey calebdoxsey requested a review from a team as a code owner April 26, 2024 21:47
@coveralls
Copy link

coveralls commented Apr 26, 2024

Coverage Status

coverage: 56.73% (-0.6%) from 57.377%
when pulling 87fa449 on cdoxsey/identity-manager-refactor-1
into 8b3a791 on main.

Copy link
Contributor

@kenjenkins kenjenkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking this on, I think this is a good approach.

Just a few questions / suggestions from me... I think I almost understand the new scheduler logic but I'm not quite there yet.

internal/identity/legacymanager/manager.go Outdated Show resolved Hide resolved
var managerUser manager.User
managerUser.User, _ = user.Get(ctx, s.dataBrokerClient, sess.GetUserId())
if managerUser.User == nil {
u, _ := user.Get(ctx, s.dataBrokerClient, sess.GetUserId())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for simplifying this... not sure why I still had the manager.User here.

internal/identity/manager/data.go Show resolved Hide resolved
internal/identity/manager/datastore_test.go Outdated Show resolved Hide resolved
Comment on lines +40 to +43
select {
case uuis.reset <- struct{}{}:
default:
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This effectively de-duplicates Reset() calls, is that right? Because reset is a buffered channel, this should never "miss" a reset, because sending to the reset channel should never block unless there's already a pending item in the channel buffer?

(not requesting any changes, just want to make sure I understand how this works)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment. Yes we're basically using the channel as a signaling mechanism. There are other ways to accomplish this, but using channels allows us to use the select statement.

Comment on lines +58 to +59
case <-uuis.reset:
ticker.Reset(uuis.updateUserInfoInterval)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm still missing something— when do we need to reset the ticker?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We reset after the user is updated. This fixes an unnecessary update in the original scheduler. When a session is refreshed it also updates the user data, in which case it would be better to reset the timer and wait another 10 minutes before doing it again. The original scheduler would keep to its 10 minute interval regardless.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, got it.

Comment on lines +58 to +59
case <-uuis.reset:
ticker.Reset(uuis.updateUserInfoInterval)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, got it.

internal/identity/manager/manager.go Outdated Show resolved Hide resolved
@calebdoxsey calebdoxsey merged commit a95423b into main May 2, 2024
16 checks passed
@calebdoxsey calebdoxsey deleted the cdoxsey/identity-manager-refactor-1 branch May 2, 2024 16:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants