
Use derived_tstamp as the primary tstamp in Redshift #420

Open
yalisassoon opened this issue Feb 19, 2016 · 5 comments
Comments

@yalisassoon
Member

Currently we use the collector_tstamp for the root_tstamp value.
It would be preferable to use the derived_tstamp once all our client-side trackers support generating a dvce_sent_tstamp (because at that point, from an analytics perspective, you're only interested in the derived_tstamp).
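
For context, here is a minimal Python sketch of how derived_tstamp relates to the other timestamps. It assumes the commonly documented Snowplow definition derived_tstamp = collector_tstamp - (dvce_sent_tstamp - dvce_created_tstamp). The function name derive_tstamp and the handling of missing or negative queue times are illustrative assumptions, not the actual enrichment code.

```python
from datetime import datetime, timedelta
from typing import Optional


def derive_tstamp(
    collector_tstamp: datetime,
    dvce_created_tstamp: Optional[datetime],
    dvce_sent_tstamp: Optional[datetime],
) -> Optional[datetime]:
    """Hypothetical helper: estimate when the event happened, corrected for device clock skew.

    Returns None when the tracker did not send both device timestamps,
    i.e. the case where no derived value can be computed.
    """
    if dvce_created_tstamp is None or dvce_sent_tstamp is None:
        return None
    # Time the event spent queued on the device, measured on the device's own
    # clock, so a constant offset between device and collector clocks cancels out.
    queued = dvce_sent_tstamp - dvce_created_tstamp
    if queued < timedelta(0):
        # Assumption: a negative queue time is treated as unreliable,
        # so we fall back to the collector's receive time.
        return collector_tstamp
    return collector_tstamp - queued
```

For example, if a device clock runs an hour slow, an event created at 09:00 and sent at 09:05 (device time) that reaches the collector at 10:05:02 gets a derived_tstamp of 10:00:02.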

We need to figure out how to migrate from collector_tstamp to derived_tstamp, e.g. what happens for existing users whose older events have no derived_tstamp values.
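
One possible migration rule, sketched in Python, assumes that root_tstamp simply falls back to collector_tstamp whenever derived_tstamp is missing. In Redshift SQL the same rule would be COALESCE(derived_tstamp, collector_tstamp). The helpers choose_root_tstamp and backfill are hypothetical names used only for illustration.

```python
from datetime import datetime
from typing import Iterable, List, Optional, Tuple

# Hypothetical row shape: (event_id, derived_tstamp or None, collector_tstamp)
Row = Tuple[str, Optional[datetime], datetime]


def choose_root_tstamp(derived_tstamp: Optional[datetime], collector_tstamp: datetime) -> datetime:
    """Assumed rule: prefer derived_tstamp, fall back to collector_tstamp for old events."""
    return derived_tstamp if derived_tstamp is not None else collector_tstamp


def backfill(rows: Iterable[Row]) -> Tuple[List[Tuple[str, datetime]], int]:
    """Return (event_id, root_tstamp) pairs plus how many rows needed the fallback."""
    out: List[Tuple[str, datetime]] = []
    fallbacks = 0
    for event_id, derived, collector in rows:
        if derived is None:
            fallbacks += 1
        out.append((event_id, choose_root_tstamp(derived, collector)))
    return out, fallbacks
```

Counting the fallbacks gives a rough sense of how much historical data would keep a collector-based root_tstamp. Note that this rule leaves root_tstamp with mixed semantics across old and new events, which is part of the open migration question.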

@yalisassoon
Member Author

This impacts the table's SORTKEY as well.

@alexanderdean alexanderdean self-assigned this Feb 19, 2016
@tdevitt

tdevitt commented May 31, 2016

I noticed sql-runner still appears to use collector_tstamp rather than derived_tstamp as well, so you guys may want to update that when you get to this issue.

@bogaert

bogaert commented Jul 5, 2016

We need to test whether this leads to tables that are more unsorted after inserting new events.
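
One simple way to quantify this is sketched below in Python. For each incoming batch, it counts the rows whose sort key is not greater than the current table maximum, since those rows cannot simply be appended after the existing sorted data. This is a rough proxy chosen for illustration (late_row_fraction is a hypothetical helper). After an actual load, the unsorted column of Redshift's SVV_TABLE_INFO view is the more authoritative measure.

```python
from datetime import datetime
from typing import Optional, Sequence


def late_row_fraction(existing_max: Optional[datetime], batch: Sequence[datetime]) -> float:
    """Fraction of a new batch whose sort key is <= the current table maximum.

    existing_max is None for an empty table, in which case no row counts as late.
    """
    if not batch or existing_max is None:
        return 0.0
    late = sum(1 for ts in batch if ts <= existing_max)
    return late / len(batch)
```

Collector timestamps are assigned on arrival, so across successive loads they should stay close to monotonic and give fractions near zero. Derived timestamps can fall before the previous load's maximum for events that were queued on the device, and that gap is what the comparison would surface.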

@bogaert

bogaert commented Nov 21, 2016

It'd be interesting to explore the impact of this patch: https://forums.aws.amazon.com/ann.jspa?annID=4157

Data loading enhancement. If you load your data in sort key order using a compound sort key with only one sort column, you might now reduce or even eliminate the need to vacuum as the COPY command automatically adds new rows in sort order to the table's sorted region

This should eliminate the need to vacuum if we continue to use the collector timestamp, and reduce (but not eliminate) this need if we were to switch to the derived timestamp.
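
The announcement's all-or-nothing behaviour can be modelled roughly in Python. This sketch assumes the conditions as we understand them from the Redshift documentation on loading data in sort key order: a single-column compound sort key, a table that is fully sorted or empty, and every new row sorting strictly after every existing row. simulate_copies is a hypothetical helper. It does not model VACUUM or any other conditions Redshift may apply.

```python
from datetime import datetime
from typing import List, Optional, Sequence


def simulate_copies(batches: Sequence[Sequence[datetime]]) -> List[bool]:
    """Return, per COPY batch, whether it would land in the sorted region.

    Simplified model (assumption): a batch is appended in sort order only if the
    table is still fully sorted (or empty) and every new row is strictly greater
    than every existing row. VACUUM is not modelled, so once a batch lands in the
    unsorted region the table stays partly unsorted for the rest of the run.
    """
    results: List[bool] = []
    table_max: Optional[datetime] = None
    fully_sorted = True
    for batch in batches:
        if not batch:
            # Nothing to load, so nothing lands in the unsorted region.
            results.append(True)
            continue
        appended = fully_sorted and (table_max is None or min(batch) > table_max)
        results.append(appended)
        if not appended:
            fully_sorted = False
        batch_max = max(batch)
        table_max = batch_max if table_max is None else max(table_max, batch_max)
    return results
```

Under this model, a single late derived_tstamp is enough to push a whole batch, and every batch after it, into the unsorted region until the next VACUUM. Strictly increasing collector timestamps keep every batch in the sorted region.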

@alexanderdean
Member

This is likely no longer a good idea, but moving it over for review.

@alexanderdean alexanderdean transferred this issue from snowplow/snowplow Apr 28, 2021

4 participants