offense_yds aggregate double counts passing/receiving yds #85

Open
hvivian opened this issue May 21, 2015 · 2 comments

@hvivian

hvivian commented May 21, 2015

According to the nfldb wiki, offense_yds is computed by summing

"nfldb.PlayPlayer.passing_yds, nfldb.PlayPlayer.rushing_yds, nfldb.PlayPlayer.receiving_yds and nfldb.PlayPlayer.fumbles_rec_yds".
However, a completed pass is credited once as passing_yds (to the passer) and again as the same number of receiving_yds (to the receiver). Counting both inflates total yardage when aggregating PlayPlayers across multiple players, for example when computing a team's total yardage for an entire game.
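To see why the sum inflates, consider a hypothetical drive with one 20-yard completion and one 5-yard run. The sketch below models each player's aggregated stats as a plain dataclass (`PlayerLine` is a hypothetical stand-in, not part of nfldb) and applies the offense_yds formula quoted from the wiki:

```python
from dataclasses import dataclass


@dataclass
class PlayerLine:
    """Hypothetical stand-in for one player's aggregated PlayPlayer stats."""
    name: str
    passing_yds: int = 0
    rushing_yds: int = 0
    receiving_yds: int = 0
    fumbles_rec_yds: int = 0

    @property
    def offense_yds(self) -> int:
        # Same four components the nfldb wiki lists for offense_yds.
        return (self.passing_yds + self.rushing_yds
                + self.receiving_yds + self.fumbles_rec_yds)


# One 20-yard completion (credited to both passer and receiver) and one 5-yard run.
lines = [
    PlayerLine("QB", passing_yds=20),
    PlayerLine("WR", receiving_yds=20),
    PlayerLine("RB", rushing_yds=5),
]

summed_offense_yds = sum(p.offense_yds for p in lines)
yards_gained = sum(p.rushing_yds + p.receiving_yds + p.fumbles_rec_yds for p in lines)

print(summed_offense_yds)  # 45: the completion is counted twice
print(yards_gained)        # 25: what the offense actually gained
```

Each per-player offense_yds value is correct on its own; the inflation only appears once the values of the passer and the receiver are added together.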

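import nfldb

db = nfldb.connect()
q = nfldb.Query(db)
q.game(season_year=2014, season_type='Regular', week=16)
q.game(home_team='CIN').play(pos_team='CIN')

# Aggregated PlayPlayer stats, one entry per Bengals player in the game.
agg = q.as_aggregate()

# offense_yds includes passing_yds, so each completion is counted twice.
total_yds = sum([play.offense_yds for play in agg])
# Leave out passing_yds so each completion is counted only once.
total_yds_true = sum([play.rushing_yds + play.receiving_yds + play.fumbles_rec_yds for play in agg])

print(total_yds)
print(total_yds_true)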

Results in:

499
353

The ESPN box score agrees that the second result is accurate.
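If offense_yds is exactly the four-term sum the wiki describes, the gap between the two totals should equal the team's passing yardage (499 - 353 = 146 here). The helper below is a hypothetical sketch that reports that breakdown; it assumes only that each aggregated object exposes passing_yds, rushing_yds, receiving_yds and fumbles_rec_yds, and it is exercised on made-up `SimpleNamespace` stand-ins rather than real nfldb objects:

```python
from types import SimpleNamespace


def yardage_breakdown(aggs):
    """Split summed offense_yds into real team yardage and the double-counted part.

    `aggs` is any iterable of objects with passing_yds, rushing_yds,
    receiving_yds and fumbles_rec_yds attributes (assumed nfldb names).
    """
    aggs = list(aggs)
    passing = sum(p.passing_yds for p in aggs)
    gained = sum(p.rushing_yds + p.receiving_yds + p.fumbles_rec_yds for p in aggs)
    return {
        # What summing offense_yds would give, assuming the wiki's definition.
        "summed_offense_yds": passing + gained,
        # Completions counted once, via the receiver's receiving_yds.
        "team_yds": gained,
        # The extra yards that summing offense_yds adds.
        "double_counted": passing,
    }


# Hypothetical stand-ins: a 30-yard completion plus a 2-yard QB scramble.
sample = [
    SimpleNamespace(passing_yds=30, rushing_yds=2, receiving_yds=0, fumbles_rec_yds=0),
    SimpleNamespace(passing_yds=0, rushing_yds=0, receiving_yds=30, fumbles_rec_yds=0),
]
print(yardage_breakdown(sample))
```

The same function should, presumably, accept the `agg` list from the query above, since it reads the same attributes that query already uses.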

@ochawkeye
Contributor

Summing individual player statistics doesn't match your expectation here, but I'm not sure how this could be addressed other than by documenting how the aggregated data can and should be used. offense_yds is derived from PlayPlayer statistics, so it isn't the best candidate for calculating Play-level statistics.

You are correct that total offense_yds breaks down when you aggregate over multiple players, but any change along the lines of what you propose would break aggregation over a single player, which, in my opinion, is the derived statistic's primary use.

import nfldb

db = nfldb.connect()
q = nfldb.Query(db)
q.game(season_year=2014, season_type='Regular', week=16)
q.game(home_team='CIN').play(pos_team='CIN')
# Restrict the aggregate to a single player.
q.player(full_name='Andy Dalton')
agg = q.as_aggregate()

total_yds = sum([play.offense_yds for play in agg])
total_yds_true = sum([play.rushing_yds + play.receiving_yds + play.fumbles_rec_yds for play in agg])

print(total_yds)
print(total_yds_true)

Output:

171
25

Similarly, in week 3 of 2014, Andy Dalton had 169 yards passing, 3 yards rushing, and 18 yards receiving. The only way to arrive at his 190 total yards for that game is to use all of the statistics that currently go into offense_yds.
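The single-player arithmetic can be checked directly. This sketch plugs in the week 3 figures quoted in the comment; fumbles_rec_yds = 0 is an assumption, since the comment does not mention any fumble yardage:

```python
# Andy Dalton, 2014 week 3, figures as quoted in the comment.
# fumbles_rec_yds = 0 is an assumption; the comment lists no fumble yardage.
passing_yds, rushing_yds, receiving_yds, fumbles_rec_yds = 169, 3, 18, 0

# Wiki definition: all four components.
offense_yds = passing_yds + rushing_yds + receiving_yds + fumbles_rec_yds
# The proposed team-level formula, applied to one player.
without_passing = rushing_yds + receiving_yds + fumbles_rec_yds

print(offense_yds)      # 190, his total yards for the game
print(without_passing)  # 21, which drops the 169 yards he produced as a passer
```

For one player, every component is yardage that player produced, so nothing is double counted; the team-level formula would undercount him instead.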

If you are compiling full team stats for a game, I think you will need to sum the play yards yourself rather than rely on individual player statistics adding up to the number you are looking for.

@hvivian
Author

hvivian commented May 21, 2015

Makes sense, thanks for clearing that up. I hadn't really considered that use case.
