FarzanehSoltanzadeh/Quera-Football

Football Analytics Project: From Transfermarkt Data Scraping to Predictive Insights

Welcome to the Football Analytics project! This repository documents a four-phase project that turns football data into insights: scraping data from the Transfermarkt website, processing it into a MySQL database, analyzing it statistically, and applying machine learning models.

Phases of the Project

Phase 1: Data Collection

In the initial phase, we used the Python libraries Beautiful Soup 4 (BS4) and Selenium to scrape data from Transfermarkt for the top five European leagues: Spain, Germany, Italy, France, and England. The following data was collected:

Club Data
├── Big Five League from season 15/21
├── All Clubs
├── All players and their positions
├── All Rankings
├── All Squads
├── All total and average market values
├── All Ages and average ages
├── All Stadiums and their capacity
├── All coaches
├── All club victories and prizes
├── All club income | expenditure | overall balance
└── All foreign players
Players Data
├── Player Name
├── Player Full Name
├── Player ID
├── Player Shirt Number
├── Date of Birth
├── Citizenship
├── Place Of Birth
├── Caps
├── Goals
├── Other Positions
├── Foot
├── Outfitter
├── Agent
├── Contract Joined
├── Contract Expires
├── Date Of Last Contract
├── Height
├── Current Club
├── All Players Transfer Data (Season / Date / Market Value / Fee / Left / Joined)
├── All Players Stats
│   ├── Appearances
│   ├── Goals In Each Season
│   ├── Assists
│   ├── Yellow Card | Second Yellow Card | Red Card
│   ├── Minutes Played
│   └── Goals Conceded | Clean Sheets
└── ...
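
As a minimal sketch of the Beautiful Soup side of this phase, the snippet below parses a small inline HTML table into player records. The markup, the CSS class names (`items`, `name`, `position`), and the `parse_squad` helper are hypothetical stand-ins, not Transfermarkt's actual page structure or the project's scraper. In practice the HTML would come from an HTTP response or from the page source rendered by Selenium.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a squad page; the real layout differs.
SAMPLE_HTML = """
<table class="items">
  <tr><td class="name">Player A</td><td class="position">Centre-Forward</td></tr>
  <tr><td class="name">Player B</td><td class="position">Goalkeeper</td></tr>
</table>
"""


def parse_squad(html: str) -> list[dict[str, str]]:
    """Extract name/position pairs from each row of the (assumed) squad table."""
    soup = BeautifulSoup(html, "html.parser")
    players = []
    for row in soup.select("table.items tr"):
        name = row.select_one("td.name")
        position = row.select_one("td.position")
        if name is None or position is None:
            continue  # skip header rows or rows missing either cell
        players.append({
            "name": name.get_text(strip=True),
            "position": position.get_text(strip=True),
        })
    return players


if __name__ == "__main__":
    for player in parse_squad(SAMPLE_HTML):
        print(player)
```

Skipping rows that lack either cell keeps the parser tolerant of header rows and layout changes, which matters when scraping many club pages in one run.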

Phase 2: Data Processing and Database Design

In this phase, we first developed an ER diagram to guide the design of a structured MySQL database. After cleaning the data, we used SQLAlchemy to create the database, which serves as the foundation for our analysis.
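
The cleaning step is not described in detail here, so the following is only an illustrative sketch of one typical task: converting scraped market-value strings into numbers before they are loaded into the database. The input formats (`€25.00m`, `€800k`, `-`) and the `parse_market_value` helper are assumptions, not the project's code.

```python
import re
from typing import Optional

# Multipliers for the (assumed) suffixes found in scraped value strings.
_MULTIPLIERS = {"k": 1_000, "m": 1_000_000, "bn": 1_000_000_000}
_VALUE_RE = re.compile(r"^€?\s*(\d+(?:\.\d+)?)\s*(k|m|bn)?$", re.IGNORECASE)


def parse_market_value(raw: str) -> Optional[int]:
    """Convert a string such as '€25.00m' to euros; return None if unparseable."""
    match = _VALUE_RE.match(raw.strip())
    if match is None:
        return None  # e.g. '-' or an empty cell
    number, suffix = match.groups()
    multiplier = _MULTIPLIERS[suffix.lower()] if suffix else 1
    return round(float(number) * multiplier)


if __name__ == "__main__":
    for raw in ["€25.00m", "€800k", "-"]:
        print(raw, "->", parse_market_value(raw))
```

Returning `None` for unparseable cells, rather than raising, lets missing values map naturally to `NULL` in a database column.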

Phase 3: Statistical Analysis

In this phase, we used statistical analysis to answer questions about the collected data. Below are some of the key questions we addressed:

  • Player Participation Analysis in the 2021-2022 Season: Distribution of Match Appearances and Percentage of Involvement
  • Exploring the Relationship Between Goals Scored and Estimated Player Value: A Linear Regression Analysis Using 2021-2022 Season Data
  • Analyzing the Relationship Between Goals Scored and Estimated Market Value for Strikers in the 2021-2022 Season Using Linear Regression
  • Exploring Estimated Player Prices Distribution by Position for the 2021-2022 Season Data
  • Goal Scoring Analysis Across Different Leagues in the 2021-2022 Season
  • Player Acquisition Costs Analysis across Seasons 2017-2018 to 2021-2022 in Football Leagues
  • Discrepancy Between Player Transfer Fees and Actual Values in the Football Industry: A Comparative Analysis
  • Identifying Players with Performance in the Top 30% but Market Value in the Bottom 40%
  • Comparing the Performance Distribution of Players Obtained in the Previous Parts with the Overall Player Population
  • Comparing the Position Distribution of Players Obtained in the Previous Parts with the Overall Player Population
  • Identifying Underperforming Players in Top 5 European Leagues Based on Performance Metrics
  • Performance Comparison of Experienced and Young Football Players After Transfers to New Teams
  • Performance Comparison of Teams in UEFA Champions League and Domestic Leagues
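
Two of the questions above can be illustrated with a short standard-library sketch: fitting a least-squares line of market value against goals, and flagging players whose performance is in the top 30% while their market value is in the bottom 40%. The sample records, the use of goals as the performance score, and the rank-based percentile definition are all assumptions made for illustration; they are not the project's data or method.

```python
from statistics import mean

# Made-up records: (player, goals, market value in millions of euros).
PLAYERS = [
    ("A", 2, 5.0), ("B", 5, 12.0), ("C", 9, 20.0), ("D", 12, 25.0),
    ("E", 15, 40.0), ("F", 20, 55.0), ("G", 1, 3.0), ("H", 18, 6.0),
    ("I", 7, 15.0), ("J", 4, 9.0),
]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def percentile_ranks(values):
    """Map each value to the fraction of values that are <= it."""
    n = len(values)
    return [sum(v <= x for v in values) / n for x in values]


def undervalued(players, perf_top=0.30, value_bottom=0.40):
    """Names of players in the top `perf_top` by goals and bottom `value_bottom` by value."""
    goal_rank = percentile_ranks([p[1] for p in players])
    value_rank = percentile_ranks([p[2] for p in players])
    return [
        p[0]
        for p, g, v in zip(players, goal_rank, value_rank)
        if g > 1 - perf_top and v <= value_bottom
    ]


if __name__ == "__main__":
    slope, intercept = fit_line([p[1] for p in PLAYERS], [p[2] for p in PLAYERS])
    print(f"value ~ {slope:.2f} * goals + {intercept:.2f}")
    print("undervalued:", undervalued(PLAYERS))
```

A single-feature fit like this is only a starting point; a real analysis would also check the residuals and consider additional predictors such as age or position.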

Phase 4: Machine Learning Insights

In this phase, we applied machine learning techniques to three tasks:

  1. Predicting Player Market Value
  2. Player Position Classification
  3. Player Similarity Clustering
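
As an illustration of the third task only, here is a small dependency-free k-means sketch that groups players by two features. This README does not say which algorithms or features were used, so k-means, the two per-90 features, the sample values, and the choice of initial centroids are all assumptions.

```python
import math

# Made-up feature vectors: (goals per 90 minutes, assists per 90 minutes).
POINTS = [
    (0.80, 0.10), (0.70, 0.20), (0.90, 0.15),  # forward-like profiles
    (0.10, 0.50), (0.05, 0.60), (0.15, 0.55),  # playmaker-like profiles
]


def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign points to the nearest centroid, then recompute centroids."""
    labels = []
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            if not members:
                new_centroids.append(old)  # keep the centroid of an empty cluster
                continue
            new_centroids.append(
                tuple(sum(dim) / len(members) for dim in zip(*members))
            )
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


if __name__ == "__main__":
    labels, centroids = kmeans(POINTS, [POINTS[0], POINTS[3]])
    print(labels)
    print(centroids)
```

With real player data, features on different scales (for example minutes played versus goals) would normally be standardized before clustering so that no single feature dominates the distance.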

About

Quera Bootcamp, Project 1
