nflgame-update-players socket timeout error #136

Open · andr3w321 opened this issue Sep 10, 2015 · 6 comments

@andr3w321
Sorry to keep opening issues with your projects (which are great, btw; thanks for all your hard work). I got nflvid working now, but I can't seem to get nflgame-update-players to work properly. I suspect a lot of my issues were related to hacking together old score data downloads, but I've since completely reinstalled all your programs with a fresh db.

$ nflgame-update-players 
Loading games for REG 2015 week 1
Downloading team rosters...
/home/dan/.local/lib/python2.7/site-packages/bs4/__init__.py:166: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

To get rid of this warning, change this:

 BeautifulSoup([your markup])

to this:

 BeautifulSoup([your markup], "lxml")

  markup_type=markup_type))
23/32 complete. (71.88%)Traceback (most recent call last):
  File "/usr/local/bin/nflgame-update-players", line 4, in <module>
    nflgame.update_players.run()
  File "/home/dan/.local/lib/python2.7/site-packages/nflgame/update_players.py", line 398, in run
    for i, (team, soup) in enumerate(pool.imap(fetch, teams), 1):
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 659, in next
    raise value
httplib2.ServerNotFoundError: Unable to find the server at www.nfl.com

Sometimes it ends in a socket.timeout instead of the "Unable to find the server at www.nfl.com" error.

$ nflgame-update-players --full-scan
Loading players in games since 2009, this may take a while...
Traceback (most recent call last):
  File "/home/dan/.local/bin/nflgame-update-players", line 4, in <module>
    nflgame.update_players.run()
  File "/home/dan/.local/lib/python2.7/site-packages/nflgame/update_players.py", line 348, in run
    for _, schedule in nflgame.sched.games.itervalues():
ValueError: too many values to unpack

I've also tried $ nflgame-update-players --simultaneous-reqs 1, which gives the same error. Sometimes it gets further in the download, sometimes not as far...

32/32 complete. (100.00%)
Done!
Fetching GSIS identifiers for players not in nflgame...
90/330 complete. (27.27%)Traceback (most recent call last):
  File "/home/dan/.local/bin/nflgame-update-players", line 4, in <module>
    nflgame.update_players.run()
  File "/home/dan/.local/lib/python2.7/site-packages/nflgame/update_players.py", line 426, in run
    for i, (purl, gid) in enumerate(pool.imap(fetch, purls), 1):
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 659, in next
    raise value
httplib2.ServerNotFoundError: Unable to find the server at www.nfl.com
@BurntSushi
Owner

I have no clue. It is working fine for me here. Can you access www.nfl.com in a browser? It looks like some kind of connectivity problem...

@andr3w321
Author

Yes. Maybe I'm being rate limited by nfl.com right now? I'll try again this weekend from a different IP address.

@BurntSushi
Owner

FWIW, I've never seen nfl.com do rate limiting in the years I've been doing this. Specifically, as long as you're starting with an existing player database, the number of requests sent to nfl.com should be pretty small.

@andr3w321
Author

Good to know. I did try running nfldb-update and nflgame-update-players a few times already today with a much larger database, compiled here: BurntSushi/nfldb#82. I think that may have been making WAY more requests than intended.

@andr3w321
Author

I was able to get this to work today after a few tries. Maybe it just needs to catch the error and retry the GET if the first one fails (a sketch of that idea is below, after the output).

$ nflgame-update-players
Loading games for REG 2015 week 1
Finding (profile id -> gsis id) mapping for players...
Traceback (most recent call last):
File "/home/dan/.local/bin/nflgame-update-players", line 4, in
nflgame.update_players.run()
File "/home/dan/.local/lib/python2.7/site-packages/nflgame/update_players.py", line 376, in run
for i, t in enumerate(pool.imap(fetch, players.items()), 1):
File "/usr/lib/python2.7/multiprocessing/pool.py", line 659, in next
raise value
httplib2.ServerNotFoundError: Unable to find the server at www.nfl.com
$ nflgame-update-players
Loading games for REG 2015 week 1
Finding (profile id -> gsis id) mapping for players...
83/101 complete. (82.18%)Traceback (most recent call last):
File "/home/dan/.local/bin/nflgame-update-players", line 4, in
nflgame.update_players.run()
File "/home/dan/.local/lib/python2.7/site-packages/nflgame/update_players.py", line 376, in run
for i, t in enumerate(pool.imap(fetch, players.items()), 1):
File "/usr/lib/python2.7/multiprocessing/pool.py", line 659, in next
raise value
httplib2.ServerNotFoundError: Unable to find the server at www.nfl.com
$ nflgame-update-players
Loading games for REG 2015 week 1
Finding (profile id -> gsis id) mapping for players...
101/101 complete. (100.00%)
Done!
Downloading team rosters...
/home/dan/.local/lib/python2.7/site-packages/bs4/__init__.py:166: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

To get rid of this warning, change this:

 BeautifulSoup([your markup])

to this:

 BeautifulSoup([your markup], "lxml")

  markup_type=markup_type))
32/32 complete. (100.00%)
Done!
Fetching GSIS identifiers for players not in nflgame...
222/222 complete. (100.00%)
Done!
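
Here's a minimal sketch of that catch-and-retry idea, written against httplib2 since that's the library the tracebacks point at; get_with_retry and its parameters are hypothetical, not anything in nflgame:

import socket
import time

import httplib2

def get_with_retry(url, tries=3, delay=2):
    '''Hypothetical helper: retry a GET a few times before giving up.'''
    http = httplib2.Http(timeout=10)
    for attempt in range(1, tries + 1):
        try:
            # request() returns a (response, content) pair on success.
            return http.request(url, 'GET')
        except (httplib2.ServerNotFoundError, socket.timeout):
            if attempt == tries:
                raise  # out of attempts; let the error propagate
            time.sleep(delay)  # brief pause before retrying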

@andr3w321
Author

I know this is an old issue, but I found the source of the problem. It's sort of unrelated to nflgame: I hit the same issue with other scrapers, and it was due to my poor wifi connection and lost packets. The solution was to retry all GET requests if the first one fails. For a code sample, I replaced all

req = requests.get(url)

with

import requests
from requests.adapters import HTTPAdapter

def retry_request(url):
    '''Get a url and return the response, retrying up to 3 times if it fails initially.'''
    session = requests.Session()
    adapter = HTTPAdapter(max_retries=3)
    session.mount("http://", adapter)   # apply the retrying adapter to both schemes
    session.mount("https://", adapter)
    return session.get(url=url)

req = retry_request(url)
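
One caveat on this fix: a bare max_retries=3 only retries failed DNS lookups and socket connections; requests will not retry once data has made it to the server. To also retry on 5xx responses, requests can take a urllib3 Retry object in place of the count. A sketch, assuming a requests version recent enough to bundle urllib3's Retry (retrying_session is just an illustrative name):

import requests
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.util.retry import Retry

def retrying_session(tries=3):
    '''Build a session that retries connection errors and 5xx responses.'''
    session = requests.Session()
    retries = Retry(total=tries, backoff_factor=1,
                    status_forcelist=[500, 502, 503, 504])
    adapter = HTTPAdapter(max_retries=retries)
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session

req = retrying_session().get(url)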
