Strange behavior produced when transforming a date column #2

Closed
fsaad opened this issue Jan 9, 2017 · 3 comments


fsaad commented Jan 9, 2017

Using loom.transform on a date field produces strange columns. The raw data and schema I provided to loom.tasks.transform are:

>> dataset.csv
name,       petal_width,    timestamp
setosa,     0.200,          '2014-01-03'
setosa,     0.299,          '2014-01-04'
versicolor, 0.200,          '2014-01-08'
versicolor, 0.200,          '2014-01-09'
virginica,  0.200,          '2014-02-10'
virginica,  0.200,          '2017-12-11'

>> schema.csv
Feature Name,   Type
name,           categorical
petal_width,    real
timestamp,      date

(The actual files don't have extraneous whitespace after the commas; I added it above for readability.)

The result in ingest/rows_csv/dataset.csv.gz shows a strange transform for timestamp:

name,       petal_width,    timestamp.absolute, timestamp.day,  timestamp.month,    timestamp.week, timestamp.year
setosa,     0.200,          -87.0,              0,              3,                  4,              1
setosa,     0.299,          -86.0,              0,              4,                  5,              1
versicolor, 0.200,          -82.0,              0,              8,                  2,              1
versicolor, 0.200,          -81.0,              0,              9,                  3,              1
virginica,  0.200,          -49.0,              0,              10,                 0,              2
virginica,  0.200,          1351.0,             0,              11,                 0,              12

The timestamp.month column appears to be the day, the timestamp.year column appears to be the month, etc.

Maybe the issue is over here:
https://github.com/posterior/loom/blob/master/loom/transforms.py#L339-L343

Member

fritzo commented Jan 10, 2017

Hi Feras, thanks for pointing at the helpful context in transforms.py. From the code there, it looks to me like the numbers are correct but the suffixes are being truncated: 'timestamp.month' should really be 'timestamp.mod.month', i.e. the timestamp modulo month (the day of the month), which is intended. That makes sense, since 'timestamp.day' would mean 'timestamp.mod.day', i.e. the hour, which is always zero because your data only has day resolution. (Loom is wary about assuming all data has only day resolution, and avoids dropping categorical columns with only a single value, since those columns often result from subsampling, and Loom wants inference to be invariant with respect to subsampling.) It looks like Loom has pretty weak tests of transform name consistency, and I'm not sure where the '.mod' is elided.
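To make the "modulo" reading concrete, here is a minimal Python sketch that reproduces the numbers in the report. It is not Loom's actual code: `date_features` is a hypothetical helper, and `ASSUMED_EPOCH` is only inferred from the `timestamp.absolute` values above (2014-01-03 maps to -87.0, so the reference date would be 2014-03-31).

```python
from datetime import datetime

# Assumption: reference date inferred from the reported `timestamp.absolute`
# values in this issue, not taken from Loom's source.
ASSUMED_EPOCH = datetime(2014, 3, 31)


def date_features(text, name="timestamp"):
    """Hypothetical helper: split a date string into absolute and modular parts."""
    date = datetime.strptime(text, "%Y-%m-%d")
    return {
        name + ".absolute": float((date - ASSUMED_EPOCH).days),  # days since epoch
        name + ".mod.day": date.hour,        # position within the day (hour)
        name + ".mod.month": date.day,       # position within the month (day)
        name + ".mod.week": date.weekday(),  # position within the week, Monday = 0
        name + ".mod.year": date.month,      # position within the year (month)
    }


for text in ["2014-01-03", "2014-02-10", "2017-12-11"]:
    print(text, date_features(text))
```

Read this way, each `.mod.X` column is the position of the date within period X, which is why the "month" column holds the day and the "year" column holds the month.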

Member

fritzo commented Jan 10, 2017

Note that Loom probably ought to transform to cyclic von Mises variables, rather than categoricals, but Loom does not implement the von Mises distribution (unlike BayesDB).
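As a rough illustration of why a cyclic representation can fit better than a categorical one, the sketch below maps a modular value onto a circle and measures distance along it. This is a plain angular encoding, not a von Mises model and not something Loom implements; `to_angle` and `circular_distance` are hypothetical helper names.

```python
import math


def to_angle(value, period, start=0):
    """Map a cyclic value (e.g. month 1..12) to an angle in [0, 2*pi)."""
    return 2.0 * math.pi * ((value - start) % period) / period


def circular_distance(a, b):
    """Smallest absolute angle between two angles, in radians."""
    diff = abs(a - b) % (2.0 * math.pi)
    return min(diff, 2.0 * math.pi - diff)


jan = to_angle(1, period=12, start=1)
jul = to_angle(7, period=12, start=1)
dec = to_angle(12, period=12, start=1)

# December and January are neighbours on the circle; January and July are opposite.
print(circular_distance(jan, dec), circular_distance(jan, jul))
```

A categorical encoding treats the twelve months as unordered labels, whereas the angular form keeps December adjacent to January.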

Author

fsaad commented Jan 10, 2017

> I'm not sure where the '.mod' is elided.

It was elided in my transcription of the rows.csv.gz file into the GitHub issue; the file does show timestamp.mod.day rather than timestamp.day, etc. Sorry for the confusion.
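One way to avoid this kind of transcription slip is to print the header straight from the gzipped file. The sketch below uses only the Python 3 standard library; `read_header` is a hypothetical helper and the path in the usage comment is a placeholder, not a path Loom guarantees.

```python
import csv
import gzip


def read_header(path):
    """Return the column names from the first row of a gzipped CSV file."""
    with gzip.open(path, "rt", newline="") as f:
        return next(csv.reader(f))


# Example usage (placeholder path):
# print(read_header("ingest/rows_csv/dataset.csv.gz"))
```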

> From the code there, it looks to me like the numbers are correct but the suffixes are being truncated: 'timestamp.month' should really be 'timestamp.mod.month', i.e. the timestamp modulo month, which is intended.

I see; since the current behavior is intended I'll close the ticket.

fsaad closed this as completed on Jan 10, 2017