Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQL DB from NFLDB #54

Open
iliketowel opened this issue Sep 8, 2014 · 3 comments
Open

SQL DB from NFLDB #54

iliketowel opened this issue Sep 8, 2014 · 3 comments

Comments

@iliketowel
Copy link

Hi all, this is less of an "issue" and more of a question, but I'm not quite sure of the best place to put it. I created a query to include most plays/results of the games. But I wanted to confirm that I'm pulling the data the best way possible. This is the (rather long) query I've made.

You'll see that I pull from PlayPlayer, and then pull the other data from there. Is this the best place to get the most accurate information?

Thanks,
Eric

SELECT
P.player_id, P.profile_url,
P.gsis_name,
P.full_name,
P."position",
CASE WHEN P.birthdate = '' THEN '1/1/1900'
WHEN P.birthdate = '//' THEN '1/1/1900'
Else P.birthdate
End as "birthdate",
P.years_pro,
G.gsis_id,
G.gsis_ID || PP.play_id as "GameplayID",
PP.team,
PP.passing_att,
PP.passing_cmp,
PP.passing_yds,
PP.passing_tds,
PP.passing_int,
PP.passing_cmp_air_yds,
PP.passing_twopta,
PP.passing_twoptm,
PP.passing_sk,
PP.passing_sk_yds,
PP.passing_incmp,
PP.passing_incmp_air_yds,
CASE WHEN PL.description like '% left%' THEN 'Left'
WHEN PL.description like '% middle%' THEN 'Middle'
WHEN PL.description like '% right%' THEN 'Right'
END as "play_direction",
PP.receiving_tar,
PP.receiving_rec,
PP.receiving_yds,
PP.receiving_tds,
PP.receiving_yac_yds,
PP.receiving_twopta,
PP.receiving_twoptm,
PP.rushing_att,
PP.rushing_yds,
PP.rushing_tds,
PP.rushing_twopta,
PP.rushing_twoptm,
PP.rushing_loss,
PP.rushing_loss_yds,
G.start_time,
G.week,
G.day_of_week,
G.season_year,
G.season_type,
G.home_team,
G.home_score,
G.away_team,
G.away_score,
PP.fumbles_tot,
PP.fumbles_lost,
PP.defense_qbhit,
PP.defense_tkl_loss,
PP.defense_tkl_loss_yds,
PP.defense_sk,
PP.defense_sk_yds,
PP.defense_ffum,
PP.defense_frec,
PP.defense_frec_yds,
PP.defense_frec_tds,
PP.defense_int,
PP.defense_int_tds,
PP.defense_int_yds,
PL.time,
PL.pos_team,
PL.yardline,
PL.down,
PL.yards_to_go,
PL.first_down,
PL.penalty,
PL.penalty_yds,
PL.description,
CASE WHEN PL.description like '%(Shotgun)%' THEN 'Shotgun'
END as "Shotgun",
PL.Note
FROM
public.play_player PP,
public.game G,
public.player P,
public.play PL
WHERE
PP.player_id = P.player_id AND
G.gsis_id = PP.gsis_id AND
G.gsis_id = PL.gsis_id AND
PL.play_id = PP.play_id AND
(PP.passing_att > 0 or
PP.rushing_att > 0 or
PP.receiving_tar > 0 or
PP.passing_sk > 0 or
PP.passing_twopta > 0 or
PP.rushing_twopta > 0 or
PL.note like 'TD' or
PL.note like '%FUMBLE%' or
PP.defense_qbhit > 0 or
PP.defense_tkl_loss > 0 or
PP.defense_tkl_loss_yds > 0 or
PP.defense_sk > 0 or
PP.defense_sk_yds > 0 or
PP.defense_ffum > 0 or
PP.defense_frec > 0 or
PP.defense_frec_yds > 0 or
PP.defense_frec_tds > 0 or
PP.defense_int > 0 or
PP.defense_int_tds > 0 or
PP.defense_int_yds > 0)
ORDER BY
G.start_time,
G.gsis_id,
PL.time

@BurntSushi
Copy link
Owner

Could you describe your query in English? It looks like this could be much simpler if you use the nfldb.Query interface.

@iliketowel
Copy link
Author

I was trying to pull in every statistic recorded, other than tackles. I was trying to tie every play to public.play_player so that I would have all recorded statistics, however based on what I've seen and played with, the public.play_player may not be 100% accurate. I'm had thought you had written before that there was another way to get more accurate statistical data.

On a football level... I have to mentally get over the Jets loss on Sunday... it was very... Jetlike.

@BurntSushi
Copy link
Owner

There is no way to freely access 100% accurate data. If you really need 100%, then you've got to pay Stats, Inc. for it.

I don't have a figure on the accuracy of nfldb's data, but a very rough guess is >95%.

There are some ideas I have for future work on increasing accuracy, but they will not happen any time soon.

I will try to respond again when I have more time to actually answer your question here. :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants