Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rethink Player Stats Logic #445

Open
mrcaseb opened this issue Nov 18, 2023 · 3 comments
Open

Rethink Player Stats Logic #445

mrcaseb opened this issue Nov 18, 2023 · 3 comments

Comments

@mrcaseb
Copy link
Member

mrcaseb commented Nov 18, 2023

We see new issues in #444 and already had lots of problems caused by the fact that we summarize play stats into a tidy form.

I think for player stats, we should make a transition to a new concept.

We load raw game data, extract the play stats and row bind them. This will make it pretty easy and straightforward to correctly summarize player stats and team stats.

@guga31bb
Copy link
Member

Yep this makes more sense

@mrcaseb
Copy link
Member Author

mrcaseb commented Jun 28, 2024

Thought about this for a while and there are several design decisions to be made.

  1. Function name: I would like to wrap all stats in calculate_stats. This function should output everything that the three currently implemented player stats functions return
  2. Functions args: That's a bit tricky. If we want to keep model based stats like epa, we need the actual pbp data but most stats will be computed using playstats data from this release. If pbp is the function argument, then we have to download playstats and filter down to the game ids in pbp. If we use seasons as argument, we have to download both pbp and playstats for those seasons.
  3. Summary level: currently we either summarise all or weekly. We do not care about multiple seasons and leave it to the user to supply relevant pbp data subsets. Generally, I would like to apply the pattern in the kicking stats function
    grp_vars <- if (isTRUE(weekly)){
    list("season", "week", "season_type", "player_id", "team")
    } else if (isFALSE(weekly)){
    list("player_id", "team")
    }
    where we set the grouping variables dynamically with rlang::data_syms()
  4. Allow team stats as well? It's mostly just a different grouping variable

My current thought process is something like this :

calculate_stats <- function(seasons = nflreadr::most_recent_season(),
                            summary_level = c("season", "week"),
                            stat_type = c("player", "team")){
  
  pbp <- nflreadr::load_pbp(seasons = seasons)
  
  playstats <- # a load_pbp pendant for playstats from https://github.com/nflverse/nflverse-pbp/releases/download/playstats/play_stats_{season}.rds
  
    
  # set grouping variables based off summary_level and stat_type
  # 
  # sumarise epa stats and dakota using pbp
  # 
  # summarise all other stats using playstats. That's a big call to summarise
  # where we create all sorts of stats with the various stat IDs
  # 
  # load player data if stat_type is player to joing player info
  # 
  # join everything 
  
}

@guga31bb
Copy link
Member

This all sounds logical to me!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants