-
Notifications
You must be signed in to change notification settings - Fork 263
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
projections database add-on #77
Comments
I would definitely say to do it as a separate package that depends on
Nope. As long as there are no name collisions, you should be OK. If you modify existing tables... that might be harder. ... however, one of the nice properties of
The query system is by far the most complex piece of ... but I wrote all that from the perspective of modifying We can collaborate however you like, but if you want to chat synchronously, you can visit me on IRC/FreeNode on |
Thanks for the prompt response. That's all about what I expected (/hoped) to hear. I have a good amount of experience using databases, mostly interactively, but this will be my first time creating a database application. I think the best way to start is to not worry about the query system at first. I'll write some code to set up new tables and start collecting scraping scripts to populate them. Then when that seems to be working we can consider if it would be worthwhile to tie it into the query system. So until then, I hopefully won't need much help from you. One question, though, what do you mean about a wrapper around Thanks again for your offer to help, hopefully this doesn't turn out to be too much harder than I'm anticipating... |
Yeah, it could update the projections. It's mostly as a convenience and you can do it as a separate transaction.
Definitely. It might be worth checking out |
Out of curiosity, @hsharrison, how would you map game IDs and player IDs for player-game projections from external sources outside of NFL.com? Are you considering some sort of fuzzy string matching, or some sort of association table / cross-reference table to maintain your ID mappings? Or are those questions TBD at a later date? |
Well, what's prompting this is that if I've been doing a lot of it already with csv files. I've been maintaining a table of mis-matching names by hand, mapping alternative spellings onto player IDs. I'll probably want to work on a more robust solution eventually, but it's what I'll start with at least. So far it has 54 entries after processing data from a few different sources. |
@hsharrison Do you know about the fuzzy player name searching in |
Yeah, I saw that. When I started I wasn't sure if it would be robust and I had much smaller dataset so only had a few entries in the association table. Now that I have a longer list it would be a good opportunity to test its robustness. I was a little disappointed that you could only filter by current team, necessitating leaving the comparisons open to the entire database, but it might still be robust enough. I'll give it a shot. |
Yeah, it'd be nice to open that up a bit... It might be possible, but it would require accessing the |
+1 for making past teams / rosters available. Not sure what it would look like, but having that functionality in some form would add a lot of flexibility, I think. |
@rkmaddox It's only possible to the extent that a player has recorded a statistic. So, e.g., offensive linemen are usually missing. |
Well, I've started a repo: https://github.com/hsharrison/nfldb-projections No code yet, just a README stub. The only thing of substance is a description of the new tables I'm thinking of adding. Curious if anyone has any comments. It does seem overly complex but I can't think of a better way to introduce the concept of fantasy points into the system. At this point, we can take any discussion to the new repo, and ping back here with updates and/or questions about nfldb internals. |
Ah right, hadn't really thought about that quirk of the data source. |
@BurntSushi to keep track of the add-on version, would you recommend adding a value to |
@hsharrison Unless you have a really good reason for doing so, I would very much try to stay away from modifying existing tables. |
Makes sense, thanks. |
The ERD is a big spaghetti mess but I think I've got all the major issues sorted bar one: handling games that aren't in nfldb yet. I'm at a bit of a loss here. The best plan I can think of is to use some placeholder value in What do you think? Is there better option that I'm missing? This feels like an ugly hack. |
Could you outline the use case here? Can it just not add info for games that don't exist in the database? Generally speaking, games can be in nfldb as soon as the schedule for them is known. For example, when the season starts, an entry for every game in the season gets added as soon as the schedule is released. Similarly for playoffs---you just have to update the |
Well, that solves the problem then! I just saw the comment in |
Are you looking at the
Unfortunately, the update script is rather complex. It has to be smart about what it updates, or else it will be really slow. e.g., It will bulk insert schedule data (which are just rows in the And once a game is over, it never changes. An interesting property of this approach is that it is idempotent. If you run
|
I was looking at this comment just a few lines below that docstring. In short, games more than 15 minutes in the future are skipped. But now I see that's just for purposes of checking for more info rather than not inserting the game at all. So I'm not sure why I didn't have the wild-card games in my database until I updated again just now. I thought I had updated yesterday or the day before, but I guess not. Well, it looks like this was a false alarm. I'll just make sure |
That's because I just updated |
Ooooh, everything makes sense now. Awesome. By the way, I've spent the last couple days looking through your code extensively (in order to mimic your approach wherever possible), and I have to say it's been way easier than I expected to figure out what's going on. So thanks for all the hard work you put into this, the final product is a joy to work with. |
No problem. :-) But you may not be saying that once you get to SQL generation. :-) This one is pretty crazy: https://github.com/BurntSushi/nfldb/blob/master/nfldb/sql.py#L177 |
Yeah, I almost added a line "but I haven't got the query system yet". :-) |
Still making progress... Any idea how I could handle DST "players"? The only option I have seems to be allowing the Alternatively, would you be willing to hardcode these rows into the Edit: would an intermediary table that is basically a concatenation of |
I definitely do not want to be adding defensive teams as players in the In all the code I've written, I've handled "defense players" as a special case unfortunately. I haven't yet discovered a good abstraction over regular players and defensive team "players." The other option is to have two projection tables. One for regular players and one for defensive teams. I'd probably try that. |
Fair enough.
Just curious, what's the rationale? A pro seems to be that I could then split the offensive statistics from the defensive categories. The con is that queries become a lot more complicated if one is interested in fantasy points across all "positions". Right now I'm leaning toward the intermediary table with a trigger to keep it updated when players are added. I think it will keep things easier, but I'm curious what your reason is to prefer splitting them. |
Hmm, I got a second opinion in favor of the option of splitting them, so I'm willing to accept my intuition here is wrong. |
I absolutely think you should go with your intuition. I'm not happy with any solution I've ever come up with---I'm just tossing out ideas. :-)
That second point is precisely the issue. If you split them, then running one query to grab everything is going to be difficult (so I agree with you there). It may be much easier to say, "If you want projections for individual players, then write this kind of SQL. And if you want projections for defensive teams, then write this kind of SQL." If you end up developing a library or a UI around this stuff, that's where you might be able to merge them under one common interface. Your idea of using one id column to store either a player id or a team id is interesting. You're right to point out a foreign key problem. Maybe you just don't use a foreign key for that column? Another option is for each of your projection tables to have two columns: I really don't see a clear answer here. I'd suggest trying both and see which one works out better. :-) |
Alright, thanks for the advice. Sounds like our thoughts on this are similar. I'll give..something...a shot. |
Still thinking, but putting this link here for posterity as it touches upon all the options we just mentioned: http:https://stackoverflow.com/questions/441001/possible-to-do-a-mysql-foreign-key-to-one-of-two-possible-tables |
Hey, thanks for this awesome library. I've been thinking of starting a new project, something similar to this but for keeping track of experts' projections.
I just thought I'd ask for advice, as I'd want it to integrate with
nfldb
--at least key off of game ID and player ID. But how tight is the question. Are things going to screw up if I had more tables to the database? And how hard would it be to leverage the query system as well? Appreciate any thoughts you might have.As for how to make it available, I think the best option is a separate package with a dependency of
nfldb
. But if you think it would make more sense as a pull request, or even a fork, let me know.The text was updated successfully, but these errors were encountered: