
Empowering football analytics through Transfermarkt data crawling, robust database design, and advanced analytics, yielding valuable insights and accurate predictions


MiladNooraei/Quera-Football

 
 


Football Analytics Project: From Transfermarkt Data Scraping to Predictive Insights

Welcome to the Football Analytics project! This repository documents a four-phase project: scraping football data from the Transfermarkt website, organizing it in a relational database, analyzing it statistically, and applying machine learning to it.

Phases of the Project

Phase 1: Data Collection

In the first phase, we used the Python libraries Beautiful Soup 4 (BS4) and Selenium to scrape data for the top five European leagues, those of Spain, Germany, Italy, France, and England. The following data was collected:

Club Data
├── Big Five League from season 15/21
├── All Clubs
├── All players and their positions
├── All Rankings
├── All Squads
├── All total and average market values
├── All Ages and average ages
├── All Stadiums and their capacity
├── All coaches
├── All club victories and prizes
├── All club income | expenditure | OverallBalance
└── All foreign players
Players Data
├── Player Name
├── Player Full Name
├── Player ID
├── Player Shirt Number
├── Date of Birth
├── Citizenship
├── Place Of Birth
├── Caps
├── Goals
├── Other Positions
├── Foot
├── Outfitter
├── Agent
├── Contract Joined
├── Contract Expires
├── Date Of Last Contract
├── Height
├── Current Club
├── All Players Transfer Data (Season / Date / Market Value / Fee / Left / Joined)
├── All Players Stats
│   ├── Appearances
│   ├── Goals In Each Season
│   ├── Assists
│   ├── Yellow Card | Second Yellow Card | Red Card
│   ├── Minutes Played
│   └── Goals Conceded | Clean Sheets
└── ...
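
As a rough illustration of the Beautiful Soup side of this phase, the sketch below parses a small squad table into player records. The HTML snippet, its class name, the link format, and the `parse_squad` helper are all hypothetical stand-ins chosen for the example; the real Transfermarkt markup and the project's actual scraping code may look quite different.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a simplified stand-in for a squad table,
# not the real Transfermarkt HTML.
SAMPLE_HTML = """
<table class="squad">
  <tr><th>Number</th><th>Player</th><th>Position</th></tr>
  <tr><td>1</td><td><a href="/player-a/profil/spieler/101">Player A</a></td><td>Goalkeeper</td></tr>
  <tr><td>9</td><td><a href="/player-b/profil/spieler/202">Player B</a></td><td>Centre-Forward</td></tr>
</table>
"""


def parse_squad(html):
    """Turn each data row of the (assumed) squad table into a dict."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="squad")
    players = []
    for row in table.find_all("tr"):
        cells = row.find_all("td")
        if len(cells) != 3:
            continue  # the header row has <th> cells only, so skip it
        link = cells[1].find("a")
        players.append({
            "shirt_number": int(cells[0].get_text(strip=True)),
            "name": link.get_text(strip=True),
            # Assumed link format: the numeric player ID is the last path segment.
            "player_id": int(link["href"].rstrip("/").split("/")[-1]),
            "position": cells[2].get_text(strip=True),
        })
    return players


players = parse_squad(SAMPLE_HTML)
```

In a live run the HTML string would come from an HTTP response or, for pages that need a browser, from Selenium's `driver.page_source`, rather than from an inline constant.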

Phase 2: Data Processing and Database Design

We began this phase by drawing an ER diagram to guide the design of a structured MySQL database. After cleaning the data, we used SQLAlchemy to create the database that the later analysis builds on.
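
A minimal sketch of what a SQLAlchemy schema for this kind of data could look like is shown below. The `Club` and `Player` models, their columns, and the sample rows are assumptions made for illustration and are not the project's actual ER design. The example uses an in-memory SQLite engine so it runs without a server; it assumes SQLAlchemy 1.4 or newer.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Club(Base):
    """Hypothetical club table: one club has many players."""
    __tablename__ = "clubs"
    id = Column(Integer, primary_key=True)
    name = Column(String(100), nullable=False)
    players = relationship("Player", back_populates="club")


class Player(Base):
    """Hypothetical player table with a foreign key to its club."""
    __tablename__ = "players"
    id = Column(Integer, primary_key=True)
    name = Column(String(100), nullable=False)
    position = Column(String(50))
    club_id = Column(Integer, ForeignKey("clubs.id"))
    club = relationship("Club", back_populates="players")


# In-memory SQLite keeps the sketch runnable. A MySQL URL would look like
# "mysql+pymysql://user:password@host/dbname" (driver and credentials are placeholders).
engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)

with Session(engine) as session:
    club = Club(name="Example FC")
    club.players.append(Player(name="Player A", position="Goalkeeper"))
    session.add(club)  # the attached player is saved together with the club
    session.commit()
    stored = session.query(Player).filter_by(position="Goalkeeper").one()
    stored_summary = (stored.name, stored.club.name)
```

Defining the tables as mapped classes keeps the schema in one place, and switching from SQLite to MySQL should only require changing the engine URL and installing a MySQL driver.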

Phase 3: Statistical Analysis

In this phase, we used statistical analysis to answer questions about the collected data. The main questions we addressed were:

  • Player Participation Analysis in the 2021-2022 Season: Distribution of Match Appearances and Percentage of Involvement
  • Exploring the Relationship Between Goals Scored and Estimated Player Value: A Linear Regression Analysis Using 2021-2022 Season Data
  • Analyzing the Relationship Between Goals Scored and Estimated Market Value for Strikers in the 2021-2022 Season Using Linear Regression
  • Exploring Estimated Player Prices Distribution by Position for the 2021-2022 Season Data
  • Goal Scoring Analysis Across Different Leagues in the 2021-2022 Season
  • Player Acquisition Costs Analysis across Seasons 2017-2018 to 2021-2022 in Football Leagues
  • Discrepancy Between Player Transfer Fees and Actual Values in Football Industry: A Comparative Analysis
  • Identifying Players with Performance in the Top 30% but Market Value in the Bottom 40%
  • Comparing the Performance Distribution of the Players Identified Above with the Overall Player Population
  • Comparing the Position Distribution of the Players Identified Above with the Overall Player Population
  • Identifying Underperforming Players in Top 5 European Leagues Based on Performance Metrics
  • Performance Comparison of Experienced and Young Football Players After Transfers to New Teams
  • Performance Comparison of Teams in UEFA Champions League and Domestic Leagues

Phase 4: Machine Learning Insights

In this final phase, we applied machine learning techniques to three questions:

  1. Predicting Player Market Value
  2. Player Position Classification
  3. Player Similarity Clustering
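
To make the third task concrete, here is a small k-means sketch that groups players by per-match statistics. The source does not say which clustering algorithm or features the project used, so k-means, the two features, and the sample numbers are all assumptions made for illustration.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means; centroids start at the first k points for reproducibility."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Invented features per player: (goals per match, assists per match).
features = [
    (0.80, 0.10), (0.10, 0.50), (0.70, 0.20),
    (0.20, 0.60), (0.90, 0.15), (0.15, 0.55),
]

labels, centroids = kmeans(features, k=2)
```

In practice, features measured on different scales (for example minutes played next to goals) would usually be standardized before clustering, since Euclidean distance otherwise favors the largest-valued feature.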

Authors
