Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to create joins #171

Open
expectopatronum opened this issue Aug 14, 2016 · 2 comments
Open

How to create joins #171

expectopatronum opened this issue Aug 14, 2016 · 2 comments

Comments

@expectopatronum
Copy link

First I'd like to thank you for your great work!

I'm working on a small project where we have quite a complex SQL query which aggregates statistics from the play and agg_play table for each game. So far I save it as csv and later have to import it to python. I'd like to skip this step and do it directly with the nfldb methods.

I tried a few things but currently I can not get it to work like I would.

  1. I found this file https://github.com/BurntSushi/nfldb/blob/master/tests/test_sql.py but it seems that the join methods are not visible when importing nfldb. I copied join and joins_to to my source file and tried to use. But it doesn't return anything and it also doesn't change anything. But since it is in the test folder maybe it shouldn't be used?

  2. I tried to loop over games and then over e.g. play like this.

db = nfldb.connect()
q = nfldb.Query(db)

q.game(season_year=2012, season_type='Regular')
q.player(full_name='Peyton Manning')

for g in q.as_games():
    q.play(gsis_id=g.gsis_id)
    for p in q.as_aggregate():
        features = [g.gsis_id, g.week, g.season_type, g.season_year,g.start_time, g.home_team, g.home_score, g.home_score_q1, g.home_score_q2, g.home_score_q3, g.home_score_q4,
                    p.passing_tds, p.rushing_tds, p.kicking_fga, p.kicking_fgm, p.fumbles_tot, p.fumbles_lost, p.kicking_xpa]
        print '\t'.join(str(f) for f in features)

But this does not work because of course in the loop the query is set to only this game and I am not able to loop over play and play as_aggregate() in nested loops, this does not make much sense....

My desired output would be like the table in the screenshot. Do you see any way how this can be done with the nfldb library?

screen shot 2016-08-14 at 21 30 42

@ochawkeye
Copy link
Contributor

I have no idea if it is the most efficient but why not just create a second query once you have a single game in hand?

import nfldb

db = nfldb.connect()
q = nfldb.Query(db)

q.game(season_year=2012, season_type='Regular')
q.player(full_name='Peyton Manning')

for g in q.as_games():
    game_info = map(str, [g.gsis_id, g.week, g.season_type, g.season_year, g.start_time, g.home_team, g.home_score_q1, g.home_score_q2, g.home_score_q3, g.home_score_q4])

    new_q = nfldb.Query(db)
    new_q.game(gsis_id=g.gsis_id)
    new_q.player(full_name='Peyton Manning')
    for p in new_q.as_aggregate():
        player_info = map(str, [p.passing_tds, p.rushing_tds, p.kicking_fga, p.kicking_fgm, p.fumbles_tot, p.fumbles_lost, p.kicking_xpa])
    print game_info + player_info
['2012121606', '15', 'Regular', '2012', '2012-12-16 12:00:00-06:00', 'BAL', '0', '0', '3', '14', '1', '0', '0', '0', '0', '0', '0']
['2012101500', '6', 'Regular', '2012', '2012-10-15 19:30:00-05:00', 'SD', '10', '14', '0', '0', '3', '0', '0', '0', '0', '0', '0']
['2012111810', '11', 'Regular', '2012', '2012-11-18 15:25:00-06:00', 'DEN', '0', '17', '7', '6', '3', '0', '0', '0', '0', '0', '0']
['2012102811', '8', 'Regular', '2012', '2012-10-28 19:20:00-05:00', 'DEN', '7', '10', '7', '10', '3', '0', '0', '0', '0', '0', '0']
['2012110404', '9', 'Regular', '2012', '2012-11-04 12:00:00-06:00', 'CIN', '0', '3', '10', '10', '3', '0', '0', '0', '0', '0', '0']
['2012112504', '12', 'Regular', '2012', '2012-11-25 12:00:00-06:00', 'KC', '6', '0', '3', '0', '2', '0', '0', '0', '0', '0', '0']
['2012093008', '4', 'Regular', '2012', '2012-09-30 15:05:00-05:00', 'DEN', '10', '0', '21', '6', '3', '0', '0', '0', '0', '0', '0']
['2012090912', '1', 'Regular', '2012', '2012-09-09 19:20:00-05:00', 'DEN', '0', '7', '7', '17', '2', '0', '0', '0', '0', '0', '0']
['2012120209', '13', 'Regular', '2012', '2012-12-02 15:05:00-06:00', 'DEN', '7', '0', '21', '3', '3', '0', '0', '0', '0', '0', '0']
['2012123012', '17', 'Regular', '2012', '2012-12-30 15:25:00-06:00', 'DEN', '7', '14', '14', '3', '3', '0', '0', '0', '0', '0', '0']
['2012111100', '10', 'Regular', '2012', '2012-11-11 12:00:00-06:00', 'CAR', '7', '0', '0', '7', '1', '0', '0', '0', '1', '1', '0']
['2012092311', '3', 'Regular', '2012', '2012-09-23 15:25:00-05:00', 'DEN', '5', '6', '0', '14', '2', '0', '0', '0', '0', '0', '0']
['2012100709', '5', 'Regular', '2012', '2012-10-07 15:25:00-05:00', 'NE', '7', '10', '14', '0', '3', '0', '0', '0', '1', '1', '0']
['2012122311', '16', 'Regular', '2012', '2012-12-23 15:05:00-06:00', 'DEN', '7', '7', '7', '13', '3', '0', '0', '0', '0', '0', '0']
['2012091700', '2', 'Regular', '2012', '2012-09-17 19:30:00-05:00', 'ATL', '10', '10', '7', '0', '1', '0', '0', '0', '0', '0', '0']
['2012120600', '14', 'Regular', '2012', '2012-12-06 19:20:00-06:00', 'OAK', '0', '7', '0', '6', '1', '0', '0', '0', '0', '0', '0']

@sansbacon
Copy link

I use materialized views for weekly and season-long statistics and then add indexes on the materialized views. Here is an example for weekly targets:

CREATE MATERIALIZED VIEW cs_targets AS

WITH aggwt AS (
SELECT
pp.gsis_id, pp.player_id, pp.team, sum(pp.receiving_tar) as tar, sum(pp.receiving_rec) as rec
FROM
play_player AS pp
INNER JOIN game as g ON pp.gsis_id = g.gsis_id
WHERE g.season_type = 'Regular' AND g.season_year = 2016
GROUP BY
pp.gsis_id, pp.player_id, pp.team
)

SELECT
aggwt.gsis_id, g.season_year, g.week, aggwt.player_id, p.full_name, p.position, aggwt.team,
(CASE WHEN aggwt.team = g.home_team THEN g.away_team ELSE g.home_team END) as opp,
aggwt.tar, aggwt.rec

FROM aggwt
INNER JOIN player as p ON aggwt.player_id = p.player_id
INNER JOIN game as g ON aggwt.gsis_id = g.gsis_id

WHERE
p.position in ('RB', 'WR', 'TE')

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants