Different plays with same play_id in 2000 pbp data #243

Closed
ajreinhard opened this issue May 27, 2021 · 4 comments
Labels
NFL data issue Error in the underlying data, not a bug

Comments

@ajreinhard

It looks like there might be multiple plays in the 2000 play-by-play data that have been given the same play_id. No other seasons have a similar issue.

library(nflfastR)
#> Warning: replacing previous import 'vctrs::data_frame' by 'tibble::data_frame'
#> when loading 'dplyr'
library(tidyverse)
#> Warning: package 'tidyverse' was built under R version 3.6.3
#> Warning: package 'ggplot2' was built under R version 3.6.3
#> Warning: package 'tibble' was built under R version 3.6.3
#> Warning: package 'dplyr' was built under R version 3.6.3
  
pbp_df <- load_pbp(2000)

pbp_df %>% 
  group_by(game_id, play_id) %>% 
  filter(n() > 1) %>% 
  ungroup() %>% 
  select(game_id, play_id, down, ydstogo, yardline_100, game_seconds_remaining, desc) %>% 
  arrange(game_id, play_id)
#> # A tibble: 12 x 7
#>    game_id   play_id  down ydstogo yardline_100 game_seconds_re~ desc           
#>    <chr>       <dbl> <dbl>   <dbl>        <dbl>            <dbl> <chr>          
#>  1 2000_03_~    2767     1      10           16              900 (15:00) T.Couc~
#>  2 2000_03_~    2767     2      10           16              900 (15:00) T.Couc~
#>  3 2000_03_~    2768    NA       0           16               NA Timeout #1 by ~
#>  4 2000_03_~    2768     3       4           10              900 (15:00) T.Couc~
#>  5 2000_06_~    1825    NA       0           48               NA Timeout #3 by ~
#>  6 2000_06_~    1825     4       9           69             1846 (:46) T.Barnha~
#>  7 2000_06_~    1825     1      10           52             1833 (:33) D.McNabb~
#>  8 2000_06_~    1825     2       8           50             1816 (:16) D.McNabb~
#>  9 2000_11_~    2323     2       3           94             1429 (8:49) B.Gries~
#> 10 2000_11_~    2323     3       1           96             1386 (8:06) T.Davis~
#> 11 2000_11_~    2323     4       2           95             1349 (7:29) J.Elam ~
#> 12 2000_11_~    2323    NA       0           30             1345 J.Elam kicks 5~
@guga31bb
Member

guga31bb commented May 28, 2021

Sadly, there is nothing to be done about this one; see #33 (comment).

Note that fewer games are affected by this since we switched to a different data source for 2001-2010, but I don't think there's any way to fix the games from 2000.

@mrcaseb
Member

mrcaseb commented May 28, 2021

Since we are talking about 12 plays in 3 games, I think we could hard-code the correct order of the problematic plays using the game books. Maybe we can choose a new play_id? Is there any system to the play IDs?

Example: [two screenshots]
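
A minimal sketch of what such a hard-coded fix could look like, using a made-up toy table rather than the real 2000 data. The `fixes` lookup, the `new_play_id` column, the game ids, and the descriptions are all hypothetical; in practice the correct order would have to be read off the game books, and nothing here reflects how nflfastR actually handles these plays.

```r
library(dplyr)

# Toy stand-in for affected rows (game ids and descriptions are invented).
pbp_toy <- tibble(
  game_id = c("g1", "g1", "g1", "g2", "g2"),
  play_id = c(100, 200, 200, 50, 60),
  desc    = c("kickoff", "second down pass", "first down run", "kickoff", "run")
)

# Hypothetical hand-made lookup: one row per duplicated play, identified by
# its description, with the replacement id that restores the intended order.
fixes <- tibble(
  game_id     = c("g1", "g1"),
  play_id     = c(200, 200),
  desc        = c("first down run", "second down pass"),
  new_play_id = c(200, 201)
)

pbp_fixed <- pbp_toy %>%
  left_join(fixes, by = c("game_id", "play_id", "desc")) %>%
  # Use the hard-coded id where one exists, otherwise keep the original.
  mutate(play_id = coalesce(new_play_id, play_id)) %>%
  select(-new_play_id) %>%
  arrange(game_id, play_id)

# Re-run the duplicate check from the original report on the fixed table.
dupes <- pbp_fixed %>%
  count(game_id, play_id) %>%
  filter(n > 1)
```

The final check matters because a newly chosen id could collide with an id already used in the same game; `dupes` should have zero rows before a lookup like this is accepted.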

@guga31bb
Member

guga31bb commented Jun 3, 2021

I think there's a system to the play IDs, because they can be linked to external sources (ESPN, Big Data Bowl), but for games from 2000 this isn't that relevant, so hard-coding some play IDs is probably fine.

@tanho63 tanho63 added the NFL data issue Error in the underlying data, not a bug label Dec 8, 2021
@guga31bb
Member

guga31bb commented Jul 7, 2022

Closing, since this is the same issue as #33.

@guga31bb guga31bb closed this as completed Jul 7, 2022