Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding custom stats to database #262

Open
CMorton737 opened this issue Sep 11, 2017 · 5 comments
Open

Adding custom stats to database #262

CMorton737 opened this issue Sep 11, 2017 · 5 comments

Comments

@CMorton737
Copy link

@ochawkeye will probably be the best guy to help me out with this question

I use nfldb for fantasy analysis. I have my own processes where I compute scores for players, but it can be cumbersome when I have to iterate through every play to calculate 'tiered' statistics, such as distance of each kicking_fgm stat. I'd like to be able to use the .as_aggregate() function, as it is so much faster than calculating it play-by-play, especially when calculating multi-season stats. I figure the best thing to do is to create a new {kicking_fgm_50 : 1} stat and add it to the stats dictionary in the nfldb database.

I see two paths forward to address this:
1.) I can modify nflgame to compute these stats as it parses the raw json and populates nfldb
2.) I can write an independent script (update-tiered-stats.py) that will iterate through every play in the database and insert each tiered stat to its original play statistics set in the database

I like one because it is clean and straightforward, but I don't know how much I like the idea of editing the base code of the nflgame package. Not my usual M.O.

I like two because it is a modular add-on script that can be shared online. However it seems inefficient and a bit of a lazy hack.

So, what are your thoughts on the best way to go about doing this? or maybe you have another idea as to the most efficient way to keep track of FF scoring using the nflx packages?

@etsweeney
Copy link

Someone will have a better answer, but in the meantime you can check out what @JessePresnell did with his forked code, as it looks like he added some fantasy scoring.

https://github.com/JessePresnell/nfldb/commits/master

@CMorton737
Copy link
Author

CMorton737 commented Sep 13, 2017

Thanks. This is a bit removed from what I'm trying to do, but interesting nonetheless!

My project is a combination of nfldb and an ESPN FF API. I have one database of player/game information (nfldb), and another database of specific league information for different leagues. Since scoring is specific to a leagues settings, the scoring procedures must be dynamic.

I'll add a bit more information
I would ultimately like to have my code operate as such:

league_scoring = {'passing_tds': 4, 'rushing_yds': 0.1,..., kicking_fgm_50: 4}

q.game(season_year = 2017, week = 1).player(player_id = 'xyz')
for pp in q.as_aggregate():
    for stat in pp.fields:
        score[stat] = league_scoring[stat] * getattr(pp, stat)

Where the statistic kicking_fgm_50 was already calculated and inserted into the nfldb database, rather than having to compute kicking_fgm_50 every time I want to compute a score

I'm just looking to discuss the best way to get tiered statistics such as kicking_fgm_50, passing_tds_50, punting_inside_20, etc into the database before querying so that when I compute the score it becomes a simple "stat_score = stat_points * stat_value" operation.

@sansbacon
Copy link

You can create a trigger function that does the calculations when new rows are inserted into the applicable table. This will slow down the inserts, though.

You could also create a postgresql function that does the calculations for you based on various parameters. Then query the function instead of the underlying table.

As an aside, why is it a problem to recalculate in python code? I would be surprised if performance was a huge issue for a small number of players and statistics. If you find that you are doing the same calculations over and over, you can cache the function results.

@JessePresnell
Copy link

Hey @CMorton737 --

I've updated my query a bit, but I ran into a similar use case where I had multiple scoring systems. The best thing to do is take the structure of the fantasy.sql script and create views that will automatically update named for the scoring system you use. In my case individual_dk represents individual level fantasy stats for Draft Kings. You could do the same for ESPN, Fan Duel, NFL, or whatever scoring system you may use. Creating views seems more optimal than writing additional parsing functionality as the data that's already been parsed is sufficient to build your scoring tables out with. Using views ensures the data stays dynamic.

As for referencing those tables inside of nflgame, I'd have to toy with the code, but I'm sure it's possible.

@CMorton737
Copy link
Author

@sansbacon I like the idea of using a trigger function. Assuming by 'slowing down inserts' you mean during a live game or when populating nfldb, it which case it I don't see that as much of an issue. I'll definitely look into it.

I see it as an issue because my ultimate goal is to use my FF analysis package as a backend for a small fantasy football stats website where a user can enter their league ID and get info relevant to their league/waiver wire. Since NFL player statistics are global, I figure it would be best to calculate the tiered stats one time and insert them to the database, rather than calculate them every time a user requests a score (which could end up being quite frequent)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants