-
Notifications
You must be signed in to change notification settings - Fork 88
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Audit log proposal II #5062
base: main
Are you sure you want to change the base?
Audit log proposal II #5062
Conversation
Co-authored-by: Dimitri POSTOLOV <[email protected]>
Co-authored-by: Dimitri POSTOLOV <[email protected]>
Co-authored-by: Dimitri POSTOLOV <[email protected]>
|
docs/proposals/proposal-audit-log.md
Outdated
INDEX idx_target_id target_id TYPE set(0) GRANULARITY 64, | ||
) ENGINE = MergeTree () | ||
ORDER BY event_time | ||
TTL timestamp + INTERVAL 3 MONTH; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TTL timestamp + INTERVAL 3 MONTH; | |
TTL timestamp + INTERVAL 2 YEAR; |
@dotansimha how long should we keep the audit logs for? Is there any requirement in SOC-2?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed to 2 years. If it is too expensive we can always lower it
docs/proposals/proposal-audit-log.md
Outdated
target_id UUID, | ||
target_name STRING, | ||
schema_version_id UUID, | ||
event_action STRING NOT NULL, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
event_action STRING NOT NULL, | |
event_action LowCardinality(String) CODEC(ZSTD(1)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you get my point
const { createClient } = require('@clickhouse/client'); | ||
const clickhouse = createClient() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we do have our own clickhouse client (packages/services/api/src/modules/operations/providers/clickhouse-client.ts)
const { createClient } = require('@clickhouse/client'); | ||
const clickhouse = createClient() | ||
|
||
type AuditLogEvent = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to be support all events in a backwards compatible manner, so I guess we will need something more strict than string
for an event type and for the details, something to avoid "reading 'foo' of undefined" errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed eventAction
to be AuditLogEventAction.
That can be a generated enum from the GQL schema.
|
||
|
||
```sql | ||
CREATE TABLE audit_log ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmmm, I am more in favor of computing the message from the data. The thought behind this is that we want to make parts of the message interactive (e.g. link to the organization or user etc. from the UI). But for export purposes also having a message within the table might make sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same opinion as @n1ru4l.
Also storing the message would be useless if we ever do i18n in the future. Rendering the message is trivial and we can share the code for it between FE/BE.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can have both. If we store the message in clickhouse, we can export data from it in CSV format and ship it to S3, with no IO reading/writing/transforms, and still produce a message dynamically in React.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Imo we should store messages and use them for file exports (text and csv)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok I will add the column for that
docs/proposals/proposal-audit-log.md
Outdated
export class AuditLog { | ||
logAuditEvent (event: AuditLogEvent) { | ||
const { userId, organizationId, projectId, targetId, schemaVersionId, eventAction, details } = event; | ||
const userEmail = userId ? await getUserEmail(userId) : null; // get user email would be cached in redis for 10 minutes to avoid hitting the DB too often |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Every action that requires a user, has the user info cached already, so it won't be a problem. I will explain more once you will start implementing these in the codebase.
}; | ||
} | ||
|
||
// sample usage - always await unless performance is critical |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think we should always await
but also introduce a prometheus metric to measure how long it took to insert an event and for what action type.
docs/proposals/proposal-audit-log.md
Outdated
| USER_REMOVED | Admin **[email protected]** removed user **[email protected]**. | | ||
| ORG_TRANSFERRED | Admin **[email protected]** transferred ownership. | | ||
| SCHEMA_CHECKED | CI made a schema check for **Project Alpha**. | | ||
| SCHEMA_PUBLISH | User **[email protected]** published a new schema version for **Project Alpha**. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| SCHEMA_PUBLISH | User **[email protected]** published a new schema version for **Project Alpha**. | | |
| SCHEMA_PUBLISHED | User **[email protected]** published a new schema version for **Project Alpha**. | |
@n1ru4l do you think we should use past tense everywhere? SCHEMA_PUBLISHED
vs PUBLISH_SCHEMA
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I renamed this for consistency with all the other ones
docs/proposals/proposal-audit-log.md
Outdated
event_time: Date! | ||
user_id: String | ||
user_email: String | ||
organization_id: String |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
organization_id: String | |
organization_id: String! |
I think we will always have organization_id
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
correct. We can always make it non nullable if we introduce event for something like user signup
docs/proposals/proposal-audit-log.md
Outdated
id UUID PRIMARY KEY, | ||
url TEXT, | ||
filters JSON, | ||
valid_until TIMESTAMP WITH TIME ZONE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
valid_until TIMESTAMP WITH TIME ZONE, | |
expires_at TIMESTAMP WITH TIME ZONE, |
maybe? But I like the idea of using S3+TTL and a column in PG to make sure we hide the export
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
renamed
await clickhouse.query({ | ||
query, | ||
format: 'CSV', | ||
}).stream().pipe(writableStream); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to include formatted messages in the CSV or TEXT output, we will have to go row by row and render it.
Maybe storing full messages in clickhouse is good option to avoid such scenario?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added it now to clickhouse table
```sql | ||
CREATE TABLE audit_log_export ( | ||
id UUID PRIMARY KEY, | ||
url TEXT, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it would have to be a presigned url
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes writableStream is not defined, but it would be something like in this SO answer: https://stackoverflow.com/a/50291380/671457
} | ||
|
||
type Mutation { | ||
exportAuditLogsToFile(filter: AuditLogFilter): AuditLogFileExport |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess here, we will show a link to the file, right? And filters, to render the thing correctly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I think we want to give users ability to filter in exports too, not just in UI.
url is a field in the AuditLogFileExport, so FE can pick that when doing the mutation
I have made some changes based on your feedback @kamilkisiela. |
what do you think @n1ru4l? I have only seen one comment from you. Do you reckon I could start implementing this feature based on this proposal? |
All good. Next time you can message me on Slack, because I missed this comment and it's been a week. |
Sorry, will do next time. So I will close this PR as it seems everything is addressed. |
event_time: Date! | ||
user_id: String | ||
user_email: String | ||
organization_id: String! | ||
project_id: String | ||
project_name: String | ||
target_id: String | ||
target_name: String | ||
schema_version_id: String | ||
event_action: AuditLogEventAction! | ||
event_details: JSON |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lets stay consistent and use camel case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking at migrations here:
https://github.com/kamilkisiela/graphql-hive/tree/d07cc6b1c2992227d13c79325927e7f6ebb857de/packages/migrations/src/clickhouse-actions
it seems we are using snake case for clickhouse tables.
userEmail: String | ||
projectId: String | ||
targetId: String | ||
eventAction: String |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this could be AuditLogEventAction
? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will change it
edges: [ExportResultEdge] | ||
pageInfo: PageInfo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lets make these non nullable
| USER_REMOVED | Admin **[email protected]** removed user **[email protected]**. | | ||
| ORGANIZATION_TRANSFERRED | Admin **[email protected]** transferred ownership to **[email protected]**. | | ||
| SCHEMA_CHECKED | CI made a schema check for **Project Alpha**. | | ||
| SCHEMA_PUBLISHED | User **[email protected]** published a new schema version for **Project Alpha**. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
schema publishes are not linked to an account, but access token, so we also need to cover that an action belongs to an access token, and make it possible to filter for actions done through an access token? 🤔
target_name STRING, | ||
schema_version_id LowCardinality(String) CODEC(ZSTD(1)), | ||
event_action LowCardinality(String) CODEC(ZSTD(1)), | ||
event_details JSON, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Set setting allow_experimental_object_type = 1 in order to allow it. (ILLEGAL_COLUMN) (version 24.4.3.25 (official build))"
}
actually if we want to use JSON, we would have to enable allow_experimental_object_type
.
Do you want to enable that or change this to plain string @kamilkisiela ?
I would go with allow_experimental_object_type
if it was up to me.
actually after reading up on it, they will replace it with something else. So let's use plain String for now ClickHouse/ClickHouse#54864
just for review purposes.
If you prefer to read as markdown open here: https://github.com/capaj/graphql-hive/blob/audit-log-proposal-2/docs/proposals/proposal-audit-log.md