Nishant173/football-data-extraction

football-data-extraction

  • Extracts football data from understat
  • The raw JSON data is pulled via the understat module created by Amos Bastian
  • Extracts various kinds of football data from the top 5 leagues, starting from the 2014/15 season.
  • The top 5 leagues are: ['EPL', 'Bundesliga', 'La Liga', 'Serie A', 'Ligue 1']
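
As a rough illustration of the coverage described above, the sketch below checks a league name against that list and turns a starting year into a season label. The names SUPPORTED_LEAGUES, FIRST_SEASON_START, season_label and is_supported_league are hypothetical and are not taken from this repository.

```python
# Hypothetical helpers; none of these names come from the repository.
SUPPORTED_LEAGUES = ["EPL", "Bundesliga", "La Liga", "Serie A", "Ligue 1"]
FIRST_SEASON_START = 2014  # the 2014/15 season


def season_label(start_year: int) -> str:
    """Return a label such as '2014/15' for the season starting in start_year."""
    if start_year < FIRST_SEASON_START:
        raise ValueError("Data is only available from the 2014/15 season onwards")
    return f"{start_year}/{(start_year + 1) % 100:02d}"


def is_supported_league(name: str) -> bool:
    """Case-insensitive check against the top 5 leagues."""
    return name.strip().lower() in {league.lower() for league in SUPPORTED_LEAGUES}
```

For example, season_label(2019) gives '2019/20', and is_supported_league("epl") is True.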

Usage

  • You'll have to first install Python 3.6 (or higher) via any of these links:
    1. Python 3.7 for Windows 10
    2. Python 3.8 for Windows 10/8/7
    3. Python for Mac OS
    4. Python 3.6 for Mac OS
  • Make sure your environment variables are set up correctly - search for another resource on how to install Python, if necessary.
  • Open a command-line/terminal in the football-data-extraction directory, and run pip install -r requirements.txt to install all dependencies. If you're unfamiliar with the command-line for Windows, check this out.
  • You might have to regenerate IDs of players/teams every season, by running any one of the following commands inside the understat_wrangler directory:
    1. python3 regenerate_ids.py
    2. python regenerate_ids.py
    3. py regenerate_ids.py
  • Open the user_inputs.csv file in the understat_wrangler directory, and feed in your inputs, regarding which data you'd like to extract.
  • You can then pull wrangled stats from understat by running any one of the following commands inside the understat_wrangler directory:
    1. python3 run.py
    2. python run.py
    3. py run.py
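
Which of the three commands works depends on how Python is installed on your machine. A minimal sketch for checking this, assuming only the standard library: the helper name available_launchers is hypothetical and is not part of this repository.

```python
import shutil

# The three launcher names listed above, in the same order.
CANDIDATE_LAUNCHERS = ["python3", "python", "py"]


def available_launchers(candidates=CANDIDATE_LAUNCHERS):
    """Return the launcher names that resolve to an executable on PATH."""
    return [name for name in candidates if shutil.which(name) is not None]


if __name__ == "__main__":
    found = available_launchers()
    if found:
        print(f"Try: {found[0]} run.py")
    else:
        print("No Python launcher found on PATH; check your environment variables.")
```

If the list comes back empty, the environment variables mentioned earlier are the first thing to check.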

Code structure

  • The source code is present in the understat_wrangler directory
  • The extract.py file is used to extract raw JSON data via the understat module. You can check out the understat documentation as well.
  • The transform.py file is used to transform/wrangle the raw JSON data into human-readable Excel/CSV files.
  • The pipeline.py file ties the extraction and transformation steps together, and stores the resulting Excel/CSV files, as desired.
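
The extract, transform and store flow described above can be sketched roughly as follows. This is not the repository's actual code: the function names, the record fields and the placeholder records are all made up for illustration, and the sketch assumes raw values arrive as strings and writes CSV only.

```python
import csv
import io

# Hypothetical stand-ins for extract.py, transform.py and pipeline.py.
# The record fields below are placeholders, not the repository's real schema.


def extract():
    """Stand-in for fetching raw JSON; returns hard-coded placeholder records."""
    return [
        {"id": "1", "player_name": "Player A", "goals": "3", "xG": "2.5"},
        {"id": "2", "player_name": "Player B", "goals": "1", "xG": "1.75"},
    ]


def transform(raw_records):
    """Convert string fields from the raw records into typed, tabular rows."""
    return [
        {
            "player_name": rec["player_name"],
            "goals": int(rec["goals"]),
            "xG": round(float(rec["xG"]), 2),
        }
        for rec in raw_records
    ]


def pipeline(out_file):
    """Run extract then transform, and write the rows as CSV to out_file."""
    rows = transform(extract())
    writer = csv.DictWriter(out_file, fieldnames=["player_name", "goals", "xG"])
    writer.writeheader()
    writer.writerows(rows)
    return len(rows)


if __name__ == "__main__":
    buffer = io.StringIO()
    pipeline(buffer)
    print(buffer.getvalue())
```

Keeping the three steps as separate functions mirrors the file split above: the fetch step can be swapped out without touching the wrangling or the output code.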

Which data can be extracted?

  • On this page by Amos Bastian, you can see which data is being extracted.
  • I've displayed the information that I reckon is necessary from said page below

  • Upcoming league fixtures data
  • League players data
  • League results data
  • Match players data (data about all players in a given match)
  • Match shots data (data about all shots taken in a given match)
  • Player grouped stats data
  • Player shots data
  • Player stats data
  • Stats data (time-series of stats)
  • Upcoming team fixtures data
  • Team players data
  • Team results data
  • Team stats data
  • Team data (by league/season)
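
The data categories listed above appear to line up with methods on the Understat class of the understat package. The mapping below is a sketch based on my reading of that package's documentation, so treat the method names as assumptions and verify them against the version you have installed. The names DATA_CATEGORY_TO_METHOD and resolve_method are hypothetical and do not come from this repository.

```python
# Assumed mapping from the data categories above to method names on the
# understat package's Understat class; verify against the installed version.
DATA_CATEGORY_TO_METHOD = {
    "league fixtures": "get_league_fixtures",
    "league players": "get_league_players",
    "league results": "get_league_results",
    "match players": "get_match_players",
    "match shots": "get_match_shots",
    "player grouped stats": "get_player_grouped_stats",
    "player shots": "get_player_shots",
    "player stats": "get_player_stats",
    "stats": "get_stats",
    "team fixtures": "get_team_fixtures",
    "team players": "get_team_players",
    "team results": "get_team_results",
    "team stats": "get_team_stats",
    "teams": "get_teams",
}


def resolve_method(client, category):
    """Look up the bound method for a category on any understat-like client."""
    try:
        method_name = DATA_CATEGORY_TO_METHOD[category.strip().lower()]
    except KeyError:
        raise ValueError(f"Unknown data category: {category!r}") from None
    return getattr(client, method_name)
```

With the real package, client would be an Understat instance created inside an aiohttp.ClientSession, and the returned method is a coroutine function whose result has to be awaited.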

Contributing and Collaborating

  • Feel free to make a pull request with details about what you intend to change/add.
  • Feel free to open an issue if you'd like to report a problem or reach out to me.
