Link to images #44
```python
import nfldb

url_start = 'http://www.nfl.com/players/profile?id='

db = nfldb.connect()
q = nfldb.Query(db)
q.game(season_year=2014, season_type='Preseason', week=0)
for pp in q.limit(20).as_aggregate():
    print('%s: %s%s' % (pp.player, url_start, pp.player.profile_id))
```
I do not know how they decide the folder structure of the actual photos, though, so I think you'd have to scrape those yourself.
Yeah, it's interesting: the profile IDs are different from the photo IDs they are attached to. I would actually be able to bring it over directly if the ID in the image were the same. As it stands, I can bring over the page, but I have no way of tying in the image.
I agree this would be a nice addition. It's an easy but tedious change that requires
You're welcome to take a crack at it, otherwise you have two choices:
I should caution you: unlike statistical data, images are copyrighted content (as are logos and video footage). Therefore, it isn't a good idea to show them on a public web site. If it's just for your personal private use, then you're OK.
Thanks, I may give it a shot. One question though: is this something I would have to pull from the url, or would this also exist somewhere in the GSIS data?
I don't know what the "GSIS data" is. Do you mean the NFL.com gamecenter feed? There is virtually no player data in the JSON feed other than an abbreviated name (e.g.,
@iliketowel I would very much encourage you to take a crack at it. I will happily mentor you through it. The easiest way is to log on to IRC/FreeNode. Otherwise, we could do it over the issue tracker or via email.
I'm going to give it a shot. I'll let you know when I run into issues. But it probably won't be until tomorrow at the earliest.
The photo URL isn't actually as cryptic as I once believed. Included in the source of the player profile page, residing right alongside the
For Dominique Rodgers-Cromartie, the following can be found:
The photo URL looks like
Testing this out sight unseen has been successful for a number of players, though your mileage may vary. For example, Tom Brady's profile page
Grabbing that
which takes us right to Tom Terrific's beautiful mug.
It would probably be worthwhile to extract both pieces, in case there are some images that don't follow the pattern. I've also seen the ESB id used in other places (I think the XML gamebook files).
The ESB ID is equivalent to the actual profile_id on the nfl.com profile pages, and the link to the images with the ESB ID is actually quite simple: the image is always
What I've been struggling with is how to either pull that ID from the data the way the rest of the information is pulled in the script, or how to add the ESB ID into the script.
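The folder pattern observed in this thread (the first three letters of the ESB ID become single-letter nested directories) can be sketched as a small helper. The base URL comes from the example links above; treat the pattern itself as an observation, not a documented API:

```python
def headshot_url(esb_id):
    # Observed pattern from this thread: the first three letters of the
    # ESB ID become nested folders, e.g. MAN738705 -> M/A/N/MAN738705.jpg.
    base = 'http://static.nfl.com/static/content/public/static/img/getty/headshot'
    return '%s/%s/%s/%s/%s.jpg' % (base, esb_id[0], esb_id[1], esb_id[2], esb_id)

print(headshot_url('KAE371576'))
```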
@ochawkeye URK! Beware. Both of those functions issue a new request. You really don't want to do that. Ideally you'd issue one request and retrieve all information possible.

@iliketowel I will try to write something up that will guide you. In the meantime, forget about nflgame. Instead, pick a profile page that has an image, read the documentation for requests and bs4, and try something like:

```python
import bs4
import requests

html = requests.get('profile_url').text
soup = bs4.BeautifulSoup(html, 'html.parser')
# do stuff with soup (see the beautifulsoup4 docs for examples)
```

(That won't work verbatim. I'm just sketching out pseudo code and I probably got the function names wrong.)
redacted :)
I'm just confirming here, you mean "Should Not", right? |
Whoops, sorry, yes you're right. Start without (If you work on
I know I'm over my skis here, but here's my new function to collect both IDs:

```python
def gsis_and_esb_ids(profile_url):
    resp, content = new_http().request(profile_url, 'GET')
    if resp['status'] != '200':
        return None, None

    gid, esb = None, None
    m = re.search(r'GSIS\s+ID:\s+([0-9-]+)', content)
    n = re.search(r'ESB\s+ID:\s+([A-Z][A-Z][A-Z][0-9]+)', content)
    if m is not None:
        gid = m.group(1).strip()
    if n is not None:
        esb = n.group(1).strip()

    # Discard identifiers that don't have the expected length.
    if gid is not None and len(gid) != 10:
        gid = None
    if esb is not None and len(esb) != 9:
        esb = None
    return gid, esb

def run():
    ...
    if len(purls) > 0:
        eprint('Fetching GSIS and ESB identifiers for players not in nflgame...')

        def fetch(purl):
            gid, esb = gsis_and_esb_ids(purl)
            return purl, gid, esb

        for i, (purl, gid, esb) in enumerate(pool.imap(fetch, purls), 1):
            progress(i, len(purls))
```
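As a quick sanity check of the two regexes used above, here they are run against a hypothetical text fragment shaped like the identifiers on a profile page (the IDs themselves are made up for illustration):

```python
import re

# Made-up fragment resembling the identifier block on a profile page.
sample = '''
    ESB ID: MAN738705
    GSIS ID: 00-0012345
'''
m = re.search(r'GSIS\s+ID:\s+([0-9-]+)', sample)
n = re.search(r'ESB\s+ID:\s+([A-Z][A-Z][A-Z][0-9]+)', sample)
print(m.group(1), n.group(1))  # 00-0012345 MAN738705
```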
That looks pretty reasonable, although I'd probably use a looser regex:
In my experience, NFL.com isn't always terribly consistent with their identifiers...
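The looser-regex suggestion matters because small formatting differences defeat strict patterns. A hypothetical illustration (both regexes here are examples I've made up, not the ones from this repo):

```python
import re

strict = r'ESB\s+ID:\s+([A-Z][A-Z][A-Z][0-9]+)'
loose = r'ESB\s*ID\s*:?\s*([A-Z][A-Z][A-Z][0-9]+)'

# A variant with no space after the colon slips past the strict pattern.
sample = 'ESB ID:MAN738705'
print(re.search(strict, sample))          # None
print(re.search(loose, sample).group(1))  # MAN738705
```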
I'm clearly doing something wrong, because I get an error as soon as I try to do
I installed beautifulsoup4 when I installed nfldb, but is there some other sort of install I need to do separately?
When there is an import error like this, it's likely that you simply haven't installed the package. Each search result is a package you can install; the package name is what you can use to install it. So once we think we know the package we want, it's time to install it, just like you installed the others.
And then you should be able to run
So, I'm still on the first part. I got as far as this:
Which prints the link:
But I don't know how to pull out only the "MAN738705" (or 738705) part.
I think if you use a regular expression, something like this works:

```python
import re

s = "http://static.nfl.com/static/content/public/static/img/getty/headshot/M/A/N/MAN738705.jpg"
m = re.search(r'([^/]+)\.[^/]+$', s)
print(m.group(1))
```

Output: MAN738705
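An equivalent without a regex, if you prefer: split the URL's path with the standard library. This assumes the URL has no trailing query string:

```python
import posixpath

s = "http://static.nfl.com/static/content/public/static/img/getty/headshot/M/A/N/MAN738705.jpg"
# basename() grabs the last path segment; splitext() drops the extension.
name = posixpath.splitext(posixpath.basename(s))[0]
print(name)  # MAN738705
```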
Okay, so, I'm not clear on the next step. I have the ability to create a static pull of the ID, but how do I do that dynamically for all of the players? I suspect it's something about the website, but I'm not sure what.
@iliketowel That piece you thankfully don't need to worry about; nflgame/update_players.py will actually do it for you. The next step is to take the code you used to extract the ID from the HTML and merge it into the existing gsis_id function:

```python
def gsis_id(profile_url):
    resp, content = new_http().request(profile_url, 'GET')
    if resp['status'] != '200':
        return None
    m = re.search(r'GSIS\s+ID:\s+([0-9-]+)', content)
    if m is None:
        return None
    gid = m.group(1).strip()
    if len(gid) != 10:  # Can't be valid...
        return None
    return gid
```

Here's what you might want to do (notice the name change of the function!):

```python
def nfl_ids_for_player(profile_url):
    resp, content = new_http().request(profile_url, 'GET')
    if resp['status'] != '200':
        return None
    m = re.search(r'GSIS\s+ID:\s+([0-9-]+)', content)
    if m is None:
        return None
    gid = m.group(1).strip()
    if len(gid) != 10:  # Can't be valid...
        return None
    # Your code goes here...
    soup = ...
    esb_id = ...
    return {'gsis_id': gid, 'esb_id': esb_id}
```

So at this point, I started going deeper (because you have to change the places where
If you could do the above and submit a pull request to the
@iliketowel @BurntSushi I'm not sure where this ended up, or if it went offline or what, but I'm in the market for just this thing. I know it's over 2 years old, but I'd be happy to help contribute where possible to get something working. For personal use, of course.
FWIW, I've been using this data locally in a Postgres DB and found a pretty straightforward way to inject the ESB IDs using some modification of the above code and psycopg2. From that I can just apply a generic URL to have the avatars render wherever I query them. I'm not sure anyone's interested in my janky Python code, but the above references were super helpful in getting it working.
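For anyone curious, here's a minimal sketch of the approach the commenter describes. The table and column names are assumptions about a local nfldb-style schema; building the statement separately from executing it keeps the logic testable without a database:

```python
# Hypothetical helper: build the UPDATE that writes an ESB ID onto a
# player row. Table/column names are assumptions, not nfldb's actual schema.
def esb_update(gsis_id, esb_id):
    sql = 'UPDATE player SET esb_id = %s WHERE player_id = %s'
    return sql, (esb_id, gsis_id)

# With psycopg2 it would be executed roughly like this (not run here):
#   import psycopg2
#   conn = psycopg2.connect('dbname=nfldb')
#   with conn.cursor() as cur:
#       cur.execute(*esb_update('00-0012345', 'MAN738705'))
#   conn.commit()
```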
I'm not sure if this data is available in the database (or if there is even a 'profile' table). But I notice that all the players have a link to their webpage (profile_url): http://www.nfl.com/player/playername/playerid/profile.
I wanted to pull in the image that's connected for all players, e.g. http://static.nfl.com/static/content/public/static/img/getty/headshot/K/A/E/KAE371576.jpg for Colin Kaepernick. I'm curious if there is something that isn't currently brought in that would have that information. I was hoping to use this in my dashboard.