
importing nfldb on iPython Notebook (Jupyter) #159

Open
theSanchize3 opened this issue Jun 12, 2016 · 7 comments
theSanchize3 commented Jun 12, 2016

I decided to switch to IPython Notebook (Jupyter) because it was very annoying having to repeatedly wait 20+ minutes to execute code when I wasn't even editing the part of the code that takes up the bulk of the time. However, when I tried to import nfldb, it gave me an error. The first time, it said that no nfldb module exists. I added the path that was needed (roughly as sketched below), but now I have a different error.
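Adding the path looked something like this (the directory here is just the site-packages folder that shows up in the traceback below; on another setup it would be wherever nfldb actually lives):

import sys

# append the directory that contains the nfldb package before importing it
sys.path.append(r'C:\Python27\lib\site-packages')

import nfldb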

ImportError                               Traceback (most recent call last)
<ipython-input-14-92a941991d5d> in <module>()
----> 1 import nfldb

C:\Python27\lib\site-packages\nfldb\__init__.py in <module>()
     59 from __future__ import absolute_import, division, print_function
     60 
---> 61 from nfldb.db import __pdoc__ as __db_pdoc__
     62 from nfldb.db import api_version, connect, now, set_timezone, schema_version
     63 from nfldb.db import Tx

C:\Python27\lib\site-packages\nfldb\db.py in <module>()
      7 import sys
      8 
----> 9 import psycopg2
     10 from psycopg2.extras import RealDictCursor
     11 from psycopg2.extensions import TRANSACTION_STATUS_INTRANS

C:\Python27\lib\site-packages\psycopg2\__init__.py in <module>()
     48 # Import the DBAPI-2.0 stuff into top-level module.
     49 
---> 50 from psycopg2._psycopg import BINARY, NUMBER, STRING, DATETIME, ROWID
     51 
     52 from psycopg2._psycopg import Binary, Date, Time, Timestamp

ImportError: DLL load failed: %1 is not a valid Win32 application.

I'm not sure what this means. I tried importing psycopg2 directly, but that returned this error:

ImportError                               Traceback (most recent call last)
<ipython-input-15-bd284aa2cf56> in <module>()
----> 1 import psycopg2

C:\Python27\lib\site-packages\psycopg2\__init__.py in <module>()
     48 # Import the DBAPI-2.0 stuff into top-level module.
     49 
---> 50 from psycopg2._psycopg import BINARY, NUMBER, STRING, DATETIME, ROWID
     51 
     52 from psycopg2._psycopg import Binary, Date, Time, Timestamp

ImportError: DLL load failed: %1 is not a valid Win32 application.

Please help with this.

Thanks.

ochawkeye (Contributor) commented

> very annoying having to repeatedly wait 20+ minutes to execute code when I wasn't even editing the part of the code that takes up the bulk of the time

What is your script doing that takes that long to run? A common query rarely takes more than a few seconds to complete. If you are collecting the entirety of the database on each pass, maybe you should consider rethinking your query to make it more efficient. If I could see your code, I might be able to provide a little more input.

> I tried importing psycopg2, but that returned this error:

For this issue, I'd start there...if the environment is not set up properly, how is nfldb supposed to interact with the database?
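For what it's worth, "DLL load failed: %1 is not a valid Win32 application" on Windows usually means a 32-bit/64-bit mismatch between the Python interpreter the notebook kernel is running and the compiled psycopg2 extension it is trying to load. A quick diagnostic sketch (standard library only, nothing nfldb-specific):

import platform
import struct
import sys

print(sys.executable)              # which Python the notebook kernel is actually running
print(platform.architecture()[0])  # '32bit' or '64bit' for that interpreter
print(struct.calcsize('P') * 8)    # pointer size in bits; same information another way

If the kernel is 64-bit and the installed psycopg2 build is 32-bit (or the other way around), installing a psycopg2 build that matches the kernel should clear this import error.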

theSanchize3 (Author) commented Jun 13, 2016

So I got nfldb working with Jupyter Notebook and ran this:

import nfldb
import pandas as pd
from pandas import DataFrame as df
from sqlalchemy import create_engine
from nfldb.query import guess_position
from pandas.io import sql

engine = create_engine('postgresql://nfldb:hohoho19@localhost:5432/nfldb')
db = nfldb.connect()

playerList = []
bigList = []

def Get_Yds_Per(yds,att):
    # float() so Python 2 integer division doesn't truncate the average
    if att != 0:
        return float(yds) / att
    else:
        return 0

#I know I can make this part shorter, but I just haven't gotten to it yet.
q = nfldb.Query(db)
q.game(season_year=2015,season_type='Regular')
q.player(position__ne='K').player(position__ne='P').player(position__ne='DE').player(position__ne='CB')
q.player(position__ne='OLB').player(position__ne='FS').player(position__ne='DT').player(position__ne='T')
q.player(position__ne='SS').player(position__ne='NT').player(position__ne='OG').player(position__ne='MLB')
q.player(position__ne='LB').player(position__ne='C').player(position__ne='ILB').player(position__ne='LS')
q.player(position__ne='DB').player(position__ne='LB').player(position__ne='G')
for g in q.as_players():
    playerList.append([str(g.full_name),str(g.team),str(g.position)])

#This is the part that takes over 15 minutes.

for i in range(1,18):
    for p in playerList:
        q = nfldb.Query(db)
        q.game(season_year=2015,season_type='Regular',week=i)
        q.player(full_name=p[0])
        aaTeam = p[1]
        aaPos = p[2]
        if aaPos == 'UNK' or aaPos == 'UKN':
            aaPos = guess_position(q.as_play_players())
        for aa in q.as_play_players():
            aaTeam = aa.team
        q = nfldb.Query(db)
        q.game(season_year=2015,season_type='Regular',week=i)
        q.player(full_name=p[0])
        for a in q.as_aggregate():
            bigList.append([p[0],aaTeam,aaPos,i,a.passing_cmp,a.passing_att,a.passing_yds,
                            Get_Yds_Per(a.passing_yds,a.passing_att),a.passing_int,
                            a.passing_tds,a.passing_sk,a.rushing_att,a.rushing_yds,
                            Get_Yds_Per(a.rushing_yds,a.rushing_att),a.rushing_tds,
                            a.receiving_rec,a.receiving_tar,a.receiving_yds,
                            Get_Yds_Per(a.receiving_yds,a.receiving_rec),a.receiving_tds,
                            a.fumbles_rec+a.fumbles_lost,a.fumbles_lost])

The reason it was difficult without Jupyter Notebook is that I put this into a DataFrame, and I had to wait 15+ minutes every time I wanted to change something or do something with the DataFrame.
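The DataFrame step itself is basically just the following (the column names here are placeholders, one per field appended to bigList above):

import pandas as pd

columns = ['name', 'team', 'pos', 'week',
           'pass_cmp', 'pass_att', 'pass_yds', 'pass_ypa', 'pass_int',
           'pass_tds', 'pass_sk',
           'rush_att', 'rush_yds', 'rush_ypa', 'rush_tds',
           'rec', 'rec_tar', 'rec_yds', 'rec_ypr', 'rec_tds',
           'fumbles', 'fumbles_lost']
stats = pd.DataFrame(bigList, columns=columns)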

I figured out how to get this to work in Jupyter Notebook, but if you know how to shorten the amount of time that the query takes that'd be great.

ochawkeye (Contributor) commented

Well, without deconstructing the code too much, what you are doing is generating a list of more than 1800 players who appeared during the 2015 season.

With that list of players, you are then going through each of the seventeen weeks of the 2015 season 1800 times. When you do the .player() query, nfldb goes through every play of the week - regardless of whether or not the player appeared in that game - looking for the player you are interested in. A list of plays is produced, and then each of those plays is evaluated, combining the stats with as_aggregate(). There's a lot of overhead in looking at all of the plays of the week for every single player.

I'll try to spend some time with this tonight to see if I can't give some better tips.
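In the meantime, here is a rough, untested sketch of one direction: query each week once and read every player's aggregated line out of that single result, instead of re-querying the whole week for each of the ~1800 players. It assumes that as_aggregate() with only a game filter returns one aggregated stat line per player, and it skips the 'UNK' position handling from the original code:

bigList = []
skill_positions = ('QB', 'RB', 'WR', 'TE', 'FB')

for week in range(1, 18):
    q = nfldb.Query(db)
    q.game(season_year=2015, season_type='Regular', week=week)
    # one aggregated row per player who recorded a statistic that week
    for pp in q.as_aggregate():
        pos = str(pp.player.position)
        if pos not in skill_positions:
            # players listed as 'UNK' would still need the guess_position
            # treatment from the original code
            continue
        bigList.append([str(pp.player.full_name), str(pp.team), pos, week,
                        pp.passing_cmp, pp.passing_att, pp.passing_yds,
                        pp.passing_int, pp.passing_tds, pp.passing_sk,
                        pp.rushing_att, pp.rushing_yds, pp.rushing_tds,
                        pp.receiving_rec, pp.receiving_tar, pp.receiving_yds,
                        pp.receiving_tds, pp.fumbles_rec + pp.fumbles_lost,
                        pp.fumbles_lost])
# the Get_Yds_Per() averages can be computed from these columns afterwards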

theSanchize3 (Author) commented

Ok, thank you.

ochawkeye (Contributor) commented

#I know I can make this part shorter, but I just haven't gotten to it yet.
q = nfldb.Query(db)
q.game(season_year=2015,season_type='Regular')
q.player(position__ne='K').player(position__ne='P').player(position__ne='DE').player(position__ne='CB')
q.player(position__ne='OLB').player(position__ne='FS').player(position__ne='DT').player(position__ne='T')
q.player(position__ne='SS').player(position__ne='NT').player(position__ne='OG').player(position__ne='MLB')
q.player(position__ne='LB').player(position__ne='C').player(position__ne='ILB').player(position__ne='LS')
q.player(position__ne='DB').player(position__ne='LB').player(position__ne='G')
for g in q.as_players():
    playerList.append([str(g.full_name),str(g.team),str(g.position)])

You could replace that block with something like this:

q = nfldb.Query(db)
q.game(season_year=2015, season_type='Regular')
playerList = [[g.full_name, g.team, g.position]
              for g in q.as_players()
              if str(g.position) in ['QB', 'RB', 'TE', 'WR', 'FB']]

Though, as you note, that won't do anything to pare down the amount of time you spend in the time-intensive chunk of code (though on my laptop that chunk takes closer to 9 seconds per week, not 15 minutes).

You're doing the right thing by substituting the play's team for the player's team found in the metadata. I'm not really able to get it down much below how you are doing it. Using this approach comes in at closer to 5-6 seconds per week, though I suspect you might be trying to perform this on a web-based database, which no doubt contributes immensely to the long run times.

theSanchize3 (Author) commented

Ok, thanks for the help. I shortened it as you suggested.


cagz125 commented Mar 5, 2017

@theSanchize3 how did you end up solving your Jupyter Notebook issue mentioned above?
