
importing nfldb on iPython Notebook (Jupyter) #159

Open
theSanchize3 opened this issue Jun 12, 2016 · 7 comments
theSanchize3 commented Jun 12, 2016

I decided to switch to IPython Notebook (Jupyter) because it was very annoying having to repeatedly wait 20+ minutes to execute code when I wasn't even editing the part of the code that takes up the bulk of the time. However, when I tried to import nfldb, it gave me an error. The first time, it said that no nfldb module exists. I added the path that was needed (roughly as sketched below), but now I have a different error.
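Adding the path looked something like this (the directory here is just the site-packages folder that shows up in the traceback below; on another setup it would be wherever nfldb actually lives):

import sys

# append the directory that contains the nfldb package before importing it
sys.path.append(r'C:\Python27\lib\site-packages')

import nfldb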

ImportError                               Traceback (most recent call last)
<ipython-input-14-92a941991d5d> in <module>()
----> 1 import nfldb

C:\Python27\lib\site-packages\nfldb\__init__.py in <module>()
     59 from __future__ import absolute_import, division, print_function
     60 
---> 61 from nfldb.db import __pdoc__ as __db_pdoc__
     62 from nfldb.db import api_version, connect, now, set_timezone, schema_version
     63 from nfldb.db import Tx

C:\Python27\lib\site-packages\nfldb\db.py in <module>()
      7 import sys
      8 
----> 9 import psycopg2
     10 from psycopg2.extras import RealDictCursor
     11 from psycopg2.extensions import TRANSACTION_STATUS_INTRANS

C:\Python27\lib\site-packages\psycopg2\__init__.py in <module>()
     48 # Import the DBAPI-2.0 stuff into top-level module.
     49 
---> 50 from psycopg2._psycopg import BINARY, NUMBER, STRING, DATETIME, ROWID
     51 
     52 from psycopg2._psycopg import Binary, Date, Time, Timestamp

ImportError: DLL load failed: %1 is not a valid Win32 application.

I'm not sure what this means. I tried importing psycopg2 directly, but that returned this error:

ImportError                               Traceback (most recent call last)
<ipython-input-15-bd284aa2cf56> in <module>()
----> 1 import psycopg2

C:\Python27\lib\site-packages\psycopg2\__init__.py in <module>()
     48 # Import the DBAPI-2.0 stuff into top-level module.
     49 
---> 50 from psycopg2._psycopg import BINARY, NUMBER, STRING, DATETIME, ROWID
     51 
     52 from psycopg2._psycopg import Binary, Date, Time, Timestamp

ImportError: DLL load failed: %1 is not a valid Win32 application.

Please help with this.

Thanks.

ochawkeye (Contributor) commented

> very annoying having to repeatedly wait 20+ minutes to execute code when I wasn't even editing the part of the code that takes up the bulk of the time

What is your script doing that takes that long to run? A common query rarely takes more than a few seconds to complete. If you are collecting the entirety of the database on each pass, maybe you should consider rethinking your query to make it more efficient. If I could see your code, I might be able to provide a little more input.

> I tried importing psycopg2, but that returned this error:

For this issue, I'd start there...if the environment is not set up properly, how is nfldb supposed to interact with the database?
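For what it's worth, "DLL load failed: %1 is not a valid Win32 application" on Windows usually means a 32-bit/64-bit mismatch between the Python interpreter the notebook kernel is running and the compiled psycopg2 extension it is trying to load. A quick diagnostic sketch (standard library only, nothing nfldb-specific):

import platform
import struct
import sys

print(sys.executable)              # which Python the notebook kernel is actually running
print(platform.architecture()[0])  # '32bit' or '64bit' for that interpreter
print(struct.calcsize('P') * 8)    # pointer size in bits; same information another way

If the kernel is 64-bit and the installed psycopg2 build is 32-bit (or the other way around), installing a psycopg2 build that matches the kernel should clear this import error.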

theSanchize3 (Author) commented Jun 13, 2016

So I got nfldb working with Jupyter Notebook and ran this:

import nfldb
import pandas as pd
from pandas import DataFrame as df
from sqlalchemy import create_engine
from nfldb.query import guess_position
from pandas.io import sql

engine = create_engine('postgresql://nfldb:hohoho19@localhost:5432/nfldb')
db = nfldb.connect()

playerList = []
bigList = []

def Get_Yds_Per(yds,att):
    # float() so Python 2 integer division doesn't truncate the average
    if att != 0:
        return float(yds) / att
    else:
        return 0

#I know I can make this part shorter, but I just haven't gotten to it yet.
q = nfldb.Query(db)
q.game(season_year=2015,season_type='Regular')
q.player(position__ne='K').player(position__ne='P').player(position__ne='DE').player(position__ne='CB')
q.player(position__ne='OLB').player(position__ne='FS').player(position__ne='DT').player(position__ne='T')
q.player(position__ne='SS').player(position__ne='NT').player(position__ne='OG').player(position__ne='MLB')
q.player(position__ne='LB').player(position__ne='C').player(position__ne='ILB').player(position__ne='LS')
q.player(position__ne='DB').player(position__ne='LB').player(position__ne='G')
for g in q.as_players():
    playerList.append([str(g.full_name),str(g.team),str(g.position)])

#This is the part that takes over 15 minutes.

for i in range(1,18):
    for p in playerList:
        q = nfldb.Query(db)
        q.game(season_year=2015,season_type='Regular',week=i)
        q.player(full_name=p[0])
        aaTeam = p[1]
        aaPos = p[2]
        if aaPos == 'UNK' or aaPos == 'UKN':
            aaPos = guess_position(q.as_play_players())
        for aa in q.as_play_players():
            aaTeam = aa.team
        q = nfldb.Query(db)
        q.game(season_year=2015,season_type='Regular',week=i)
        q.player(full_name=p[0])
        for a in q.as_aggregate():
            bigList.append([p[0],aaTeam,aaPos,i,a.passing_cmp,a.passing_att,a.passing_yds,
                            Get_Yds_Per(a.passing_yds,a.passing_att),a.passing_int,
                            a.passing_tds,a.passing_sk,a.rushing_att,a.rushing_yds,
                            Get_Yds_Per(a.rushing_yds,a.rushing_att),a.rushing_tds,
                            a.receiving_rec,a.receiving_tar,a.receiving_yds,
                            Get_Yds_Per(a.receiving_yds,a.receiving_rec),a.receiving_tds,
                            a.fumbles_rec+a.fumbles_lost,a.fumbles_lost])

The reason it was difficult without Jupyter Notebook is that I put this into a DataFrame, and I had to wait 15+ minutes every time I wanted to change something or do something with the DataFrame.
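The DataFrame step itself is basically just the following (the column names here are placeholders, one per field appended to bigList above):

import pandas as pd

columns = ['name', 'team', 'pos', 'week',
           'pass_cmp', 'pass_att', 'pass_yds', 'pass_ypa', 'pass_int',
           'pass_tds', 'pass_sk',
           'rush_att', 'rush_yds', 'rush_ypa', 'rush_tds',
           'rec', 'rec_tar', 'rec_yds', 'rec_ypr', 'rec_tds',
           'fumbles', 'fumbles_lost']
stats = pd.DataFrame(bigList, columns=columns)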

I figured out how to get this to work in Jupyter Notebook, but if you know how to shorten the amount of time that the query takes that'd be great.

ochawkeye (Contributor) commented

Well, without deconstructing the code too much, what you are doing is generating a list of more than 1800 players who appeared during the 2015 season.

With that list of players, you are then going through each of the seventeen weeks of the 2015 season 1800 times. When you do the .player() query, nfldb goes through every play of the week - regardless of whether or not the player appeared in that game - looking for the player you are interested in. A list of plays is produced, and then each of those plays is evaluated, combining the stats with as_aggregate(). There's a lot of overhead in looking at all of the plays of the week for every single player.

I'll try to spend some time with this tonight to see if I can't give some better tips.
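In the meantime, here is a rough, untested sketch of one direction: query each week once and read every player's aggregated line out of that single result, instead of re-querying the whole week for each of the ~1800 players. It assumes that as_aggregate() with only a game filter returns one aggregated stat line per player, and it skips the 'UNK' position handling from the original code:

bigList = []
skill_positions = ('QB', 'RB', 'WR', 'TE', 'FB')

for week in range(1, 18):
    q = nfldb.Query(db)
    q.game(season_year=2015, season_type='Regular', week=week)
    # one aggregated row per player who recorded a statistic that week
    for pp in q.as_aggregate():
        pos = str(pp.player.position)
        if pos not in skill_positions:
            # players listed as 'UNK' would still need the guess_position
            # treatment from the original code
            continue
        bigList.append([str(pp.player.full_name), str(pp.team), pos, week,
                        pp.passing_cmp, pp.passing_att, pp.passing_yds,
                        pp.passing_int, pp.passing_tds, pp.passing_sk,
                        pp.rushing_att, pp.rushing_yds, pp.rushing_tds,
                        pp.receiving_rec, pp.receiving_tar, pp.receiving_yds,
                        pp.receiving_tds, pp.fumbles_rec + pp.fumbles_lost,
                        pp.fumbles_lost])
# the Get_Yds_Per() averages can be computed from these columns afterwards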

theSanchize3 (Author) commented

Ok, thank you.

ochawkeye (Contributor) commented

#I know I can make this part shorter, but I just haven't gotten to it yet.
q = nfldb.Query(db)
q.game(season_year=2015,season_type='Regular')
q.player(position__ne='K').player(position__ne='P').player(position__ne='DE').player(position__ne='CB')
q.player(position__ne='OLB').player(position__ne='FS').player(position__ne='DT').player(position__ne='T')
q.player(position__ne='SS').player(position__ne='NT').player(position__ne='OG').player(position__ne='MLB')
q.player(position__ne='LB').player(position__ne='C').player(position__ne='ILB').player(position__ne='LS')
q.player(position__ne='DB').player(position__ne='LB').player(position__ne='G')
for g in q.as_players():
    playerList.append([str(g.full_name),str(g.team),str(g.position)])

You could replace that block with something like this:

q = nfldb.Query(db)
q.game(season_year=2015, season_type='Regular')
playerList = [[g.full_name, g.team, g.position]
              for g in q.as_players()
              if str(g.position) in ['QB', 'RB', 'TE', 'WR', 'FB']]

Though, as you note, that won't do anything to pare down the amount of time you spend in the time-intensive chunk of code (though on my laptop that chunk takes closer to 9 seconds per week, not 15 minutes).

You're doing the right thing by substituting the play's team for the player's team found in the metadata. I'm not really able to get it down much below how you are doing it. Using this approach comes in at closer to 5-6 seconds per week, though I suspect you might be trying to perform this on a web-based database, which no doubt contributes immensely to the long run times.

theSanchize3 (Author) commented

Ok, thanks for the help. I shortened it as you suggested.


cagz125 commented Mar 5, 2017

@theSanchize3 how did you end up solving your Jupyter Notebook issue mentioned above?
