
Cannot cast drive.start_field and drive.end_field to integer or smallint #45

Open
albertlyu opened this issue Aug 10, 2014 · 8 comments


@albertlyu

I'm looking to make some schema and data type tweaks to nfldb so that I can more easily serve nfldb data to Rails' Active Record models. I'm doing just fine with automatically casting user-defined data types to varchars, like player.position, player.status, game.day_of_week, and so on. But it seems that I'm not having much luck with the numeric user-defined data types for fields like drive.start_field, drive.end_field, and drive.pos_time.

Is there any way to do this? Here's what I've tried so far:

nfldb=> ALTER TABLE drive ALTER COLUMN start_field TYPE int;
ERROR:  column "start_field" cannot be cast automatically to type integer
HINT:  Specify a USING expression to perform the conversion.

So I tried specifying a USING expression, but here's the error:

nfldb=> ALTER TABLE drive ALTER COLUMN start_field TYPE int USING start_field::int;
ERROR:  cannot cast type field_pos to integer

It seems that I need to go one level deeper, not just setting the data type for the user-defined type field_pos, but also for the domain constraint field_offset.
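
For context, my understanding is that the relevant definitions look roughly like this (a sketch pieced together from the error messages; the component name and CHECK bounds are guesses on my part, not copied from nfldb's schema):

CREATE DOMAIN field_offset AS smallint
    CHECK (VALUE >= -50 AND VALUE <= 50);   -- bounds assumed, not the real constraint

CREATE TYPE field_pos AS (
    pos field_offset                         -- a composite ("row") type wrapping a single field
);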

The business reason for altering data types is that Rails Active Record Migrations can't recognize user-defined types in PostgreSQL, so won't know to interpret start_field and end_field as integers. I'm making a bunch of other table and column alters too, and have taken care to write changes to revert to the original schema so that nfldb-update still works.

Thanks in advance!

@BurntSushi
Owner

> The business reason for altering data types is that Rails Active Record Migrations can't recognize user-defined types in PostgreSQL, so won't know to interpret start_field and end_field as integers. I'm making a bunch of other table and column alters too, and have taken care to write changes to revert to the original schema so that nfldb-update still works.

This sounds like a great project! Thank you for working on it. Bringing nfldb into the Rails world seems like a great thing.

With respect to your problem... The issue is that the field_pos type is actually a "row" type with a single column. You can see the difference with these two SELECT queries:

This selects the full "row" (notice the parentheses; you could also think of start_field as a tuple holding a single value):

SELECT start_field FROM drive LIMIT 1;
 start_field 
-------------
 (-30)
(1 row)

And you can access the individual components of the row or tuple:

SELECT (start_field).pos FROM drive LIMIT 1;
 pos 
-----
 -30
(1 row)

And now you can cast:

SELECT (start_field).pos::int FROM drive LIMIT 1;
 pos 
-----
 -30
(1 row)

The issue is that you were trying to cast a tuple to an integer directly. This problem would have been clearer if it weren't a 1-tuple, but creating custom types is useful for automatic serialization with Python's PostgreSQL driver, psycopg2.
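
To make the distinction concrete, here's a quick illustration with a hypothetical two-field composite type (not part of nfldb's schema, just an example):

CREATE TYPE point2d AS (x int, y int);

-- A composite value is a row, not a scalar, so it can't be cast to int directly:
SELECT ROW(3, 4)::point2d::int;    -- ERROR:  cannot cast type point2d to integer

-- You have to select the component you want:
SELECT (ROW(3, 4)::point2d).x;     -- returns 3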

@albertlyu
Author

Hi -- thanks, that was a very helpful explanation! It makes sense now why some fields are encapsulated by parentheses -- they're tuples with only one element! start_time and end_time are more obviously 2-tuples of data type game_time as (phase, elapsed), and so on (e.g. (Q1,0)).
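
For example, pulling the pieces of start_time out separately (using the field names above):

SELECT start_time FROM drive LIMIT 1;                                 -- e.g. (Q1,0)
SELECT (start_time).phase, (start_time).elapsed FROM drive LIMIT 1;   -- e.g. Q1 and 0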

So I can cast (start_field).pos directly to an integer, but I am now getting a different error when altering the column using this method. Do you think you can explain the following?

ALTER TABLE drive ALTER COLUMN start_field SET DATA TYPE int USING (start_field).pos::int;

ERROR:  column notation .pos applied to type integer, which is not a composite type
LINE 1: ...BLE drive ALTER COLUMN start_field SET DATA TYPE int USING (...
                                                             ^

My guess is that this ALTER statement is trying to alter the column (a tuple) and change the data type by using an individual component of the tuple, instead of the tuple itself. But all I want is the component of the tuple, not the 'row' itself (composite type is to row/record as data type is to field?).
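
(For what it's worth, here's how I've been poking at the types in psql; these are just standard psql meta-commands, nothing schema-specific beyond the names already mentioned above:)

\d drive           -- shows start_field with type field_pos
\d field_pos       -- shows the composite type's single pos column (domain field_offset)
\dD field_offset   -- shows the domain's underlying base type and constraint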

Perhaps I'm not understanding how USING works when changing the data type of a column. Curiously, the following also return the same error message, though they may in effect be executing the same command:

ALTER TABLE drive ALTER COLUMN start_field SET DATA TYPE int USING (start_field).pos;

ERROR:  column notation .pos applied to type integer, which is not a composite type
LINE 1: ...BLE drive ALTER COLUMN start_field SET DATA TYPE int USING (...
                                                             ^
ALTER TABLE drive ALTER COLUMN start_field TYPE int USING (start_field).pos;

ERROR:  column notation .pos applied to type integer, which is not a composite type
LINE 1: ...BLE drive ALTER COLUMN start_field TYPE int USING (start_fie...
                                                             ^

Any thoughts?

@BurntSushi
Owner

I'm stumped. Can you brute force it instead?

nfldb_test=# alter table drive add column start_field_scalar field_offset;
ALTER TABLE
nfldb_test=# update drive set start_field_scalar = (start_field).pos;
UPDATE 39368
nfldb_test=# alter table drive drop column start_field;
ALTER TABLE
nfldb_test=# alter table drive rename column start_field_scalar to start_field;
ALTER TABLE

Resulting table:

nfldb_test=# \d drive
               Table "public.drive"
    Column     |         Type         | Modifiers 
---------------+----------------------+-----------
 gsis_id       | gameid               | not null
 drive_id      | usmallint            | not null
 start_time    | game_time            | not null
 end_field     | field_pos            | 
 end_time      | game_time            | not null
 pos_team      | character varying(3) | not null
 pos_time      | pos_period           | 
 first_downs   | usmallint            | not null
 result        | text                 | 
 penalty_yards | smallint             | not null
 yards_gained  | smallint             | not null
 play_count    | usmallint            | not null
 time_inserted | utctime              | not null
 time_updated  | utctime              | not null
 start_field   | field_offset         | 

@albertlyu
Author

Sorry for the late response, but I will resort to adding/dropping columns if I can't figure out another way. Feel free to close this if you want to keep the issue list limited to recent or high-priority issues.

@albertlyu
Author

Thanks again for the help!

@BurntSushi
Owner

No worries, I'm just doing some tidying. Feel free to close this when you reach a resolution.

I noodled on this a little more, and I speculate that it may be impossible to do with a single ALTER TABLE command. I think the problem is that ALTER COLUMN will only let you change the type if the in-memory representation is the same. I would have thought that a tuple with a single field would have the same representation as a single field itself, but perhaps that was a bad assumption.

In any case, I found it difficult to get proper documentation on the matter. The ALTER TABLE docs are broad without much depth.

@albertlyu
Author

So is it possible to do this with multiple ALTER TABLE commands without dropping the columns? Like altering the type of the field or the tuple in subsequent ALTER TABLE commands? Ideally I'd like one ALTER statement per table, altering the columns one by one, but there doesn't seem to be a straightforward way to do that. Maybe it's not worth the trouble to find a 'clean' way of altering the data types out of tuples. Hmm.

@BurntSushi
Owner

If my speculation about ALTER COLUMN not being allowed to change the in-memory representation of a column is correct (and (value,) has a different representation from value), then no, it cannot be done with just ALTER commands.

I emphasize that this is speculation. Haven't really had time to dig deeper.

The method I suggested above is a bit brutish, but if it's contained inside a single transaction, then it's guaranteed that you won't corrupt your database by being left in an intermediate state. (You should be thankful that you aren't dealing with SQLite. Its ALTER TABLE is so limited that you can't change a column's type at all: you have to create a temporary table, copy data to it, drop the existing table, create a new table with the schema changes, copy the old data into the new table with whatever transformations are necessary, and then drop the temporary table. Blech.)

If you're writing migrations and they handle transactions correctly, then there really isn't a major downside to just using my approach. In fact, it's probably pretty close to what would happen under the hood if the ALTER TABLE command worked.
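
Something like this, if it helps (just a sketch of the same column swap wrapped in a transaction; I'm using smallint as the target type here rather than field_offset, since the whole point is to end up with a plain type Rails understands):

BEGIN;
ALTER TABLE drive ADD COLUMN start_field_scalar smallint;
UPDATE drive SET start_field_scalar = (start_field).pos;
ALTER TABLE drive DROP COLUMN start_field;
ALTER TABLE drive RENAME COLUMN start_field_scalar TO start_field;
-- and likewise for end_field
COMMIT;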
