core/identity: refactor identity manager #5091
Conversation
Thanks for taking this on, I think this is a good approach.
Just a few questions / suggestions from me... I think I almost understand the new scheduler logic but I'm not quite there yet.
var managerUser manager.User
managerUser.User, _ = user.Get(ctx, s.dataBrokerClient, sess.GetUserId())
if managerUser.User == nil {
u, _ := user.Get(ctx, s.dataBrokerClient, sess.GetUserId())
Thanks for simplifying this... not sure why I still had the manager.User here.
select {
case uuis.reset <- struct{}{}:
default:
}
This effectively de-duplicates Reset() calls, is that right? Since reset is a buffered channel, this should never "miss" a reset: sending to the reset channel never blocks unless there's already a pending item in the channel buffer?
(not requesting any changes, just want to make sure I understand how this works)
Added a comment. Yes, we're basically using the channel as a signaling mechanism. There are other ways to accomplish this, but using channels allows us to use the select statement.
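The pattern under discussion can be sketched in isolation. This is a minimal, self-contained illustration (the `scheduler` type here is hypothetical, not the actual PR code): a buffered channel of capacity 1 plus a non-blocking send means repeated `Reset()` calls coalesce into a single pending signal, and the caller never blocks.

```go
package main

import "fmt"

// scheduler is a hypothetical minimal sketch of the signaling pattern:
// reset is a channel with a buffer of 1, used purely as a flag.
type scheduler struct {
	reset chan struct{}
}

// Reset signals the scheduler loop to restart its wait. The
// non-blocking send means many calls collapse into one pending signal.
func (s *scheduler) Reset() {
	select {
	case s.reset <- struct{}{}: // record a pending reset
	default: // a reset is already pending; coalesce
	}
}

func main() {
	s := &scheduler{reset: make(chan struct{}, 1)}
	// Five Reset calls in a row: none block, and only one signal is queued.
	for i := 0; i < 5; i++ {
		s.Reset()
	}
	fmt.Println("pending signals:", len(s.reset))
}
```

Because the buffer holds at most one item, the consuming loop sees "at least one reset happened since I last checked" rather than a count, which is exactly what a timer reset needs.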
case <-uuis.reset:
	ticker.Reset(uuis.updateUserInfoInterval)
I think I'm still missing something: when do we need to reset the ticker?
We reset after the user is updated. This fixes an unnecessary update in the original scheduler: when a session is refreshed it also updates the user data, in which case it is better to reset the timer and wait another 10 minutes before updating again. The original scheduler would keep to its 10-minute interval regardless.
Thanks, got it.
Co-authored-by: Kenneth Jenkins <[email protected]>
Summary
Implement a new version of the identity manager. The legacy version can be enabled via a runtime flag, but the new version is the default.
The identity manager is responsible for two things: refreshing user sessions and updating user information. A typical Pomerium session lasts for 14 hours, whereas a typical IdP OIDC token is valid for 1 hour, so we refresh the user's IdP token so that they don't have to log in again. We also periodically (every 10 minutes) update the user's info from the IdP so that policy evaluation is always against relatively fresh data.
The identity manager receives session and user data through a Syncer connected to the Databroker. Every new session that gets created is pushed to the identity manager.
The legacy implementation maintained a sorted list of scheduled session refreshes and user updates. On each loop iteration it would retrieve the next ready session, refresh it, and move on to users once all soon-to-expire sessions had been refreshed. The problem with this design is that if refreshing sessions or updating user information takes a long time, it blocks the identity manager from doing anything else. In particular, if all sessions expire at roughly the same time and each refresh call takes several seconds, we could easily fail to refresh sessions in time, leading to session tokens expiring during authorization and users being logged out.
Rather than attempting to refresh multiple sessions or update multiple users concurrently, the new identity manager uses an entirely different approach.
For each new session we start an independent refreshSessionScheduler, and for each user an independent updateUserInfoScheduler. These schedulers run in their own goroutines and, when ready, invoke a method on the Manager. This removes the bottleneck, since all sessions can now refresh concurrently. It also makes the code easier to understand.

The downside of this approach is that we will be starting lots of goroutines: one for each session and user object. A goroutine uses roughly 2KB of RAM, so 10,000 sessions and 1,000 users would mean roughly 21MB of RAM. In addition to memory usage there is the added work put on the Go scheduler, which ought to be negligible. Based on these back-of-the-napkin calculations I don't think this needs further optimization, but we could pursue that in a subsequent PR if desired.
Related issues
User Explanation
The new identity manager should change no functionality.
Checklist