This repository contains the code and data for a project on English Premier League player information and statistics for the 2020-21 season. The project uses the EPL_20_21.csv dataset, which can be downloaded from Kaggle: https://www.kaggle.com/datasets/rajatrc1705/english-premier-league202021?select=EPL_20_21.csv

The dataset is a collection of basic but crucial stats from the English Premier League 2020-21 season. It covers every player who played in the EPL that season, with standard stats such as Goals, Assists, xG, xA, Passes Attempted, Pass Accuracy and more. The columns described here are:
- Position: The position in which the player regularly plays. The values in this dataset are FW (Forward), MF (Midfielder), DF (Defender) and GK (Goalkeeper).
- Starts: The number of times the player was named in the starting 11 by the manager.
- Mins: The number of minutes played by the player.
- Goals: The number of goals scored by the player.
- Assists: The number of times the player assisted a teammate in scoring a goal.
- Passes_Attempted: The number of passes attempted by the player.
- Perc_Passes_Completed: The percentage of attempted passes that the player completed to a teammate.
- xG: The expected number of goals from the player in a match.
- xA: The expected number of assists from the player in a match.
- Yellow_Cards: The number of yellow cards the player received. A yellow card is given by the referee for indiscipline, technical fouls, or other minor fouls.
- Red_Cards: The number of red cards the player received. A red card is given for accumulating two yellow cards in a single game, or for a major foul.
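
As a quick illustration of how these columns can be used, here is a minimal sketch that derives per-90-minute rates from the minutes, goals and assists columns. The two rows are made-up placeholder values, not real player data. The `Name` column, the helper `add_per_90` and the derived column names are hypothetical, and the sketch assumes the minutes column is spelled `Mins`. With the real file, the inline frame would be replaced by `pd.read_csv("EPL_20_21.csv")`.

```python
import pandas as pd

# Placeholder rows for illustration only (not real player statistics).
players = pd.DataFrame(
    {
        "Name": ["Player A", "Player B"],
        "Position": ["FW", "MF"],
        "Mins": [1800, 900],
        "Goals": [10, 2],
        "Assists": [4, 5],
    }
)


def add_per_90(df: pd.DataFrame, columns: list[str], minutes_col: str = "Mins") -> pd.DataFrame:
    """Return a copy of df with a '<column>_per_90' rate for each given column."""
    out = df.copy()
    # Number of full 90-minute matches each player has effectively played.
    nineties = out[minutes_col] / 90
    for col in columns:
        # Players with zero minutes get NaN instead of an infinite rate.
        out[f"{col}_per_90"] = out[col] / nineties.where(nineties > 0)
    return out


players_per_90 = add_per_90(players, ["Goals", "Assists"])
print(players_per_90[["Name", "Goals_per_90", "Assists_per_90"]])
```

Normalizing by minutes makes players with very different playing time comparable, which raw totals such as Goals do not.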

The following tools were used for this analysis:
- Python 3
- Pandas
- NumPy
- Matplotlib
- Seaborn
- Plotly
- scikit-learn (Sklearn)

To run this project, you need Python 3 installed on your machine. Install the required libraries by running the following command:

    pip install pandas numpy matplotlib seaborn plotly scikit-learn

To run the analysis, execute the notebook. It generates several visualizations that help illustrate the analysis of the data.

The analysis includes the following tasks:
- Data loading and cleaning
- Exploratory data analysis
- Feature engineering
- Correlation analysis
- Outlier detection
- Model building
- Model evaluation
- Model tuning
- Visualization of results
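
The correlation analysis and outlier detection steps can be sketched as follows. This is only an illustration, not the notebook's actual code: the rows are made-up placeholder values, the helper `iqr_outliers` is hypothetical, and the 1.5 × IQR rule is one common outlier convention that the notebook may or may not use.

```python
import pandas as pd

# Placeholder values for illustration only (not real player statistics).
stats = pd.DataFrame(
    {
        "Goals": [0, 1, 2, 3, 4, 23],
        "xG": [0.02, 0.05, 0.10, 0.15, 0.20, 0.70],
        "Mins": [500, 900, 1400, 2000, 2600, 3000],
    }
)

# Correlation analysis: pairwise Pearson correlation between numeric columns.
corr = stats.corr()
print(corr.round(2))


def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Boolean mask marking values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


# Outlier detection: keep only the rows whose Goals value is flagged.
goal_outliers = stats[iqr_outliers(stats["Goals"])]
print(goal_outliers)
```

In football data, flagged rows are often genuine top performers rather than errors, so it is usually worth inspecting them before dropping anything.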
The analysis includes visualizations using Matplotlib, Plotly and Seaborn.
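
For the model building, evaluation and tuning tasks listed above, a minimal scikit-learn sketch might look like the one below. This README does not say which model or target the notebook uses, so everything here is an assumption: the data is synthetic (generated with NumPy, not taken from the dataset), and ridge regression with a small grid of `alpha` values simply stands in for whatever model the notebook builds.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (not real player statistics): two features and a
# target that depends on them linearly, plus a little noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 2))
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(0, 0.1, size=200)

# Hold out a quarter of the rows for the final evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Model building and tuning: 5-fold cross-validated search over the
# regularization strength of a ridge regression.
param_grid = {"alpha": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(Ridge(), param_grid, cv=5, scoring="r2")
search.fit(X_train, y_train)

# Model evaluation: score the refit best estimator on the held-out rows.
test_r2 = r2_score(y_test, search.predict(X_test))
print("best alpha:", search.best_params_["alpha"])
print("test R^2:", round(test_r2, 3))
```

Tuning on cross-validation folds of the training split and scoring once on the held-out split keeps the reported score from being inflated by the search itself.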

Contributions to this project are welcome. If you notice any errors or have ideas for additional analyses, please open an issue or submit a pull request.